jSpellCorrect - It's a simple statistical spelling corrector.


This is a Java implementation of the statistical spelling correction algorithm that Peter Norvig describes in his essay "How to Write a Spelling Corrector".


24/04/2008 - The spelling corrector has been converted to an OSGi bundle. The ZIP archive actually contains three bundles: one for the interface, one for the spelling checker, and one example that uses a service tracker. The source code will be uploaded later.
22/10/2007 - Uploaded version 0.4, which brings a major speed increase.
18/06/2007 - The new 0.2 release fixes a bug in the max method and adds some performance information.
15/06/2007 - Initial release.


Please see Mr. Norvig's essay.
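
For readers who want the gist without leaving this page, here is a minimal sketch of the approach from the essay: count word frequencies in a training text, generate every string within one or two simple edits of the input, and pick the known candidate with the highest count. This is not the jSpellCorrect source. The class name NorvigSketch and its method names are made up for illustration, and the sketch only handles lowercase a-z words.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not the jSpellCorrect implementation.
class NorvigSketch {
    private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz";
    private final Map<String, Integer> counts = new HashMap<>();

    /** Counts every lowercase a-z word found in the training text. */
    void train(String text) {
        for (String w : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!w.isEmpty()) counts.merge(w, 1, Integer::sum);
        }
    }

    /** All strings one delete, transpose, replace, or insert away from w. */
    static Set<String> edits1(String w) {
        Set<String> out = new HashSet<>();
        for (int i = 0; i <= w.length(); i++) {
            String head = w.substring(0, i);
            String tail = w.substring(i);
            if (!tail.isEmpty()) out.add(head + tail.substring(1));      // delete
            if (tail.length() > 1) {                                     // transpose
                out.add(head + tail.charAt(1) + tail.charAt(0) + tail.substring(2));
            }
            for (char c : ALPHABET.toCharArray()) {
                if (!tail.isEmpty()) out.add(head + c + tail.substring(1)); // replace
                out.add(head + c + tail);                                   // insert
            }
        }
        return out;
    }

    /** Keeps only the words that were seen during training. */
    private Set<String> known(Set<String> words) {
        Set<String> out = new HashSet<>();
        for (String w : words) {
            if (counts.containsKey(w)) out.add(w);
        }
        return out;
    }

    /** Returns the most frequent known word within two edits, or the input itself. */
    String correct(String word) {
        if (counts.containsKey(word)) return word;
        Set<String> e1 = edits1(word);
        Set<String> candidates = known(e1);
        if (candidates.isEmpty()) {
            // Nothing known at distance one, so try distance two.
            for (String e : e1) candidates.addAll(known(edits1(e)));
        }
        String best = word;
        int bestCount = 0;
        for (String c : candidates) {
            int n = counts.get(c);
            // Ties go to whichever candidate the set happens to yield first.
            if (n > bestCount) { best = c; bestCount = n; }
        }
        return best;
    }
}
```

After training on a text such as "the quick brown fox", correct("teh") returns "the", because swapping the last two letters yields a word that was seen in training. A word with no known neighbour within two edits is returned unchanged.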




Performance Tuning
The Python implementation gives these results:
{'bad': 68, 'bias': None, 'unknown': 15, 'secs': 16, 'pct': 74, 'n': 270}
{'bad': 130, 'bias': None, 'unknown': 43, 'secs': 26, 'pct': 67, 'n': 400}
That is about 17 Hz.
My Java implementation (0.2) gives these:
{'bad': 69, 'bias': 0, 'unknown': 54, 'secs': 53, 'pct': 74, 'n': 270}
{'bad': 130, 'bias': 0, 'unknown': 87, 'secs': 80, 'pct': 67, 'n': 400}
That is about 5 Hz.
As you can see, the Java implementation is considerably slower. I'm not sure why, but I'm still trying to improve the performance.

NEW: Version 0.4

The new release brings a major speedup, due to a small optimization.
{'bad': 68, 'bias': 0, 'unknown': 53, 'secs': 18, 'pct': 74, 'n': 270}
{'bad': 130, 'bias': 0, 'unknown': 87, 'secs': 27, 'pct': 67, 'n': 400}
This works out to about 15 Hz. Quite good, eh?
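
The Hz figures appear to be words processed per second, that is n divided by secs from the result lines. That reading is an assumption on my part, but it matches the rounded numbers quoted here. A small sketch of the arithmetic follows; the class name ThroughputSketch is hypothetical and not part of jSpellCorrect.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of jSpellCorrect.
class ThroughputSketch {
    /** Words processed per second: test-set size divided by elapsed seconds. */
    static double wordsPerSecond(int n, int secs) {
        return (double) n / secs;
    }

    public static void main(String[] args) {
        // {n, secs} pairs copied from the result lines in this section.
        String[] labels = {"Python", "Java 0.2", "Java 0.4"};
        int[][][] runs = {
            {{270, 16}, {400, 26}},
            {{270, 53}, {400, 80}},
            {{270, 18}, {400, 27}},
        };
        for (int i = 0; i < labels.length; i++) {
            for (int[] run : runs[i]) {
                System.out.printf(Locale.ROOT, "%-8s n=%d secs=%d -> %.1f words/s%n",
                        labels[i], run[0], run[1], wordsPerSecond(run[0], run[1]));
            }
        }
    }
}
```

Under this reading, the 400-word run of version 0.2 comes out at exactly 5.0 words per second and the 270-word run of version 0.4 at exactly 15.0.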


jSpellCorrect 0.4 (OSGi) (md5sum: n/a)
jSpellCorrect 0.4 (md5sum: n/a)
jSpellCorrect 0.3 (md5sum: n/a)
jSpellCorrect 0.2 (md5sum: 88b7b98c687d9ccd96a0f30366c171a3)
jSpellCorrect 0.1 (md5sum: 3bb50da1b7de861e16d57614b74838c4)


At the moment I don't think much documentation is necessary. Just download the JAR file and use this code to get started:
import org.gauner.jSpellCorrect.ToySpellingCorrector;
ToySpellingCorrector sc = new ToySpellingCorrector();
// train some data from a text file
// train a single word
sc.trainSingle("some word");
// get the best suggestion
JavaDoc will be provided later.

Public SVN

A public SVN is available at this location.



Contact me at lkml_AT_ds_DOT_gauner_DOT_org with questions about this Java implementation. For more information about the algorithm itself, see Peter Norvig's essay.

Here you can tell me whatever you want. If you want me to be able to contact you, remember to provide a contact address, e.g. Jabber or mail.
