jSpellCorrect - It's a simple statistical spelling corrector.


This is a Java implementation of the statistical spelling correction algorithm that Peter Norvig describes in his essay "How to Write a Spelling Corrector".


24/04/2008 - The spelling corrector was converted to an OSGi bundle. The ZIP archive actually contains three bundles: one for the interface, one for the spelling checker, and one example that uses a service tracker. The source code will be uploaded later.
22/10/2007 - Uploaded Version 0.4. This one brings a major speed increase.
18/06/2007 - Release 0.2 fixes a bug in the max method and adds some performance information.
15/06/2007 - Initial release.


Please see Mr. Norvig's essay.
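
For readers who want a quick picture before reading the essay, here is a minimal sketch of the approach in Java. It is not the jSpellCorrect source: the class name NorvigSketch and its methods train, edits1 and correct are hypothetical names chosen for this illustration. It assumes lowercase ASCII words and follows the essay's outline: count word frequencies, generate candidates one and two edits away, and pick the most frequent known candidate.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only. NorvigSketch and its methods are hypothetical
// names and are NOT part of the jSpellCorrect API.
public class NorvigSketch {
    private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz";
    private final Map<String, Integer> counts = new HashMap<>();

    // Count every lowercase ASCII word found in the training text.
    public void train(String text) {
        Matcher m = Pattern.compile("[a-z]+").matcher(text.toLowerCase());
        while (m.find()) {
            counts.merge(m.group(), 1, Integer::sum);
        }
    }

    // All strings one edit away: deletes, transposes, replaces, inserts.
    static List<String> edits1(String w) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i <= w.length(); i++) {
            String a = w.substring(0, i);
            String b = w.substring(i);
            if (!b.isEmpty()) out.add(a + b.substring(1));                 // delete
            if (b.length() > 1) {
                out.add(a + b.charAt(1) + b.charAt(0) + b.substring(2));   // transpose
            }
            for (char c : ALPHABET.toCharArray()) {
                if (!b.isEmpty()) out.add(a + c + b.substring(1));         // replace
                out.add(a + c + b);                                        // insert
            }
        }
        return out;
    }

    // Most frequent known word among the candidates, or null if none is known.
    private String best(Iterable<String> candidates) {
        String best = null;
        int bestCount = 0;
        for (String c : candidates) {
            int n = counts.getOrDefault(c, 0);
            if (n > bestCount) {
                best = c;
                bestCount = n;
            }
        }
        return best;
    }

    // Known word first, then distance 1, then distance 2, else the input itself.
    public String correct(String word) {
        if (counts.containsKey(word)) return word;
        List<String> e1 = edits1(word);
        String s = best(e1);
        if (s != null) return s;
        List<String> e2 = new ArrayList<>();
        for (String e : e1) e2.addAll(edits1(e));
        s = best(e2);
        return s != null ? s : word;
    }

    public static void main(String[] args) {
        NorvigSketch sc = new NorvigSketch();
        sc.train("the spelling corrector corrects spelling");
        System.out.println(sc.correct("speling"));   // prints "spelling"
        System.out.println(sc.correct("korrector")); // prints "corrector"
    }
}
```

The sketch builds the full list of distance-2 candidates in memory, which is simple but wasteful. A real implementation would probably filter against the dictionary while generating them.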


Performance Tuning
The Python implementation gives these results:
{'bad': 68, 'bias': None, 'unknown': 15, 'secs': 16, 'pct': 74, 'n': 270}
{'bad': 130, 'bias': None, 'unknown': 43, 'secs': 26, 'pct': 67, 'n': 400}
About 17 Hz (words corrected per second).
My Java implementation (0.2) gives these:
{'bad': 69, 'bias': 0, 'unknown': 54, 'secs': 53, 'pct': 74, 'n': 270}
{'bad': 130, 'bias': 0, 'unknown': 87, 'secs': 80, 'pct': 67, 'n': 400}
About 5 Hz.
As you can see, the Java implementation is considerably slower. I'm not sure why, but I'm still trying to improve the performance.

NEW: Version 0.4

The new release brings a major speedup, due to a small optimization.
{'bad': 68, 'bias': 0, 'unknown': 53, 'secs': 18, 'pct': 74, 'n': 270}
{'bad': 130, 'bias': 0, 'unknown': 87, 'secs': 27, 'pct': 67, 'n': 400}
This results in about 15 Hz. Quite good, eh?
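
The Hz figures are simply the number of test words ('n') divided by the run time in seconds ('secs'). A small sketch of that arithmetic, using only the numbers quoted above; the class name Throughput and the helper wordsPerSecond are made up for this example:

```java
import java.util.Locale;

// Illustrative arithmetic only. Throughput and wordsPerSecond are
// hypothetical names, not part of jSpellCorrect.
public class Throughput {
    // Words corrected per second for one test run.
    static double wordsPerSecond(int n, int secs) {
        return (double) n / secs;
    }

    public static void main(String[] args) {
        // Each row: label, n and secs for test set 1, n and secs for test set 2.
        Object[][] runs = {
            {"Python",   270, 16, 400, 26},
            {"Java 0.2", 270, 53, 400, 80},
            {"Java 0.4", 270, 18, 400, 27},
        };
        for (Object[] r : runs) {
            double first = wordsPerSecond((Integer) r[1], (Integer) r[2]);
            double second = wordsPerSecond((Integer) r[3], (Integer) r[4]);
            System.out.printf(Locale.ROOT, "%-8s %.1f Hz / %.1f Hz%n", r[0], first, second);
        }
    }
}
```

The first value in each row is the first test set and matches the rounded figures quoted above. The second test set comes out slightly lower for each implementation.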


jSpellCorrect 0.4 (OSGi) (md5sum: n/a)
jSpellCorrect 0.4 (md5sum: n/a)
jSpellCorrect 0.3 (md5sum: n/a)
jSpellCorrect 0.2 (md5sum: 88b7b98c687d9ccd96a0f30366c171a3)
jSpellCorrect 0.1 (md5sum: 3bb50da1b7de861e16d57614b74838c4)


At the moment I don't think much documentation is necessary. Just download the JAR file and use this code to get started:
import org.gauner.jSpellCorrect.ToySpellingCorrector;
ToySpellingCorrector sc = new ToySpellingCorrector();
// train some data from a text file
// train a single word
sc.trainSingle("some word");
// get the best suggestion
JavaDoc will be provided later.

Public SVN

A public SVN repository is available at this location.



Contact me at lkml_AT_ds_DOT_gauner_DOT_org for questions about this Java implementation. For more information about the algorithm itself, see Peter Norvig's essay.

