About
This is a Java implementation of Peter Norvig's statistical spelling correction algorithm, which he described in his essay "How to Write a Spelling Corrector".
News
24/04/2008 - The spelling corrector was converted to an OSGi bundle. The ZIP archive actually contains three bundles: one for the interface, one for the spelling checker, and one example that uses a service tracker. The source code will be uploaded later.
22/10/2007 - Uploaded version 0.4. This one brings a major speed increase.
18/06/2007 - The new 0.2 release fixes a bug in the max method. Also added some performance information.
15/06/2007 - Initial release.
Features
Please see Mr. Norvig's essay.
Background
Please see Mr. Norvig's essay.
Todo
Performance Tuning
The Python implementation gives these results:
{'bad': 68, 'bias': None, 'unknown': 15, 'secs': 16, 'pct': 74, 'n': 270}
{'bad': 130, 'bias': None, 'unknown': 43, 'secs': 26, 'pct': 67, 'n': 400}
About 17 Hz.
My Java implementation (0.2) gives these:
{'bad': 69, 'bias': 0, 'unknown': 54, 'secs': 53, 'pct': 74, 'n': 270}
{'bad': 130, 'bias': 0, 'unknown': 87, 'secs': 80, 'pct': 67, 'n': 400}
About 5 Hz.
As you can see, the Java implementation is considerably slower. I'm not sure why, but I'm still trying to improve the performance.
NEW: Version 0.4
The new release brings a major speedup, due to a small optimization.
{'bad': 68, 'bias': 0, 'unknown': 53, 'secs': 18, 'pct': 74, 'n': 270}
{'bad': 130, 'bias': 0, 'unknown': 87, 'secs': 27, 'pct': 67, 'n': 400}
This results in about 15 Hz. Quite good, eh?
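The "Hz" figures above appear to be test words processed per second, that is 'n' divided by 'secs'. This reading is an assumption, but it matches the quoted numbers (270 / 16 is roughly 17, 270 / 53 roughly 5, 270 / 18 exactly 15). A minimal sketch of that arithmetic, using a hypothetical helper class that is not part of jSpellCorrect:

```java
import java.util.Locale;

// Hypothetical helper for illustration; it is not part of jSpellCorrect.
final class Throughput {
    // Assumption: "Hz" means test words processed per second (n / secs).
    static double wordsPerSecond(int n, int secs) {
        if (secs <= 0) {
            throw new IllegalArgumentException("secs must be positive");
        }
        return (double) n / secs;
    }

    public static void main(String[] args) {
        // The n and secs values are copied from the result lines above.
        String[] labels = {"Python", "Python", "Java 0.2", "Java 0.2", "Java 0.4", "Java 0.4"};
        int[][] runs = {{270, 16}, {400, 26}, {270, 53}, {400, 80}, {270, 18}, {400, 27}};
        for (int i = 0; i < runs.length; i++) {
            double rate = wordsPerSecond(runs[i][0], runs[i][1]);
            System.out.println(String.format(Locale.ROOT,
                    "%-8s n=%d secs=%d -> %.1f words/s",
                    labels[i], runs[i][0], runs[i][1], rate));
        }
    }
}
```

Note that the second test set (n = 400) comes out slightly slower than the first in every case, so the single "Hz" figures are best read as rough averages.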
Download
jSpellCorrect 0.4 (OSGi) (md5sum: n/a)
jSpellCorrect 0.4 (md5sum: n/a)
jSpellCorrect 0.3 (md5sum: n/a)
jSpellCorrect 0.2 (md5sum: 88b7b98c687d9ccd96a0f30366c171a3)
jSpellCorrect 0.1 (md5sum: 3bb50da1b7de861e16d57614b74838c4)
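For the releases that list an md5sum, the download can be checked against it. The following is a small sketch using only the JDK's MessageDigest class. The class name Md5Check and its command-line arguments are made up for this example and are not shipped with jSpellCorrect.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical utility for illustration; not part of jSpellCorrect.
final class Md5Check {
    // Returns the MD5 digest of the given bytes as lowercase hex.
    static String md5Hex(byte[] data) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest(data)) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java Md5Check <downloaded-file> <expected-md5>");
            return;
        }
        String actual = md5Hex(Files.readAllBytes(Paths.get(args[0])));
        System.out.println(actual.equalsIgnoreCase(args[1]) ? "OK" : "MISMATCH: " + actual);
    }
}
```

Run it with the path of the downloaded archive and the md5sum listed next to that release.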
Documentation
At the moment I don't think much documentation is necessary. Just download the JAR file and use this code to get started:

    import org.gauner.jSpellCorrect.ToySpellingCorrector;

    ToySpellingCorrector sc = new ToySpellingCorrector();
    // train some data from a text file
    sc.trainFile("/tmp/big.txt");
    // train a single word
    sc.trainSingle("some word");
    // get the best suggestion
    System.out.println(sc.correct("Cads"));
    System.out.println(sc.correct("Dok"));
    System.out.println(sc.correct("Speling"));

JavaDoc will be provided later.
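To give an idea of what happens behind a call like correct(), here is a compact sketch of the approach from Norvig's essay: count word frequencies in the training text, generate all strings within one or two simple edits of the input, and return the known candidate seen most often. This is not the jSpellCorrect source. The class NorvigSketch, its method names, and the alphabetical tie-breaking are assumptions made for this illustration only.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; NOT the jSpellCorrect implementation.
final class NorvigSketch {
    private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz";
    private static final Pattern WORD = Pattern.compile("[a-z]+");
    private final Map<String, Integer> counts = new HashMap<>();

    // Count every lowercase word in the training text.
    void train(String text) {
        Matcher m = WORD.matcher(text.toLowerCase(Locale.ROOT));
        while (m.find()) {
            counts.merge(m.group(), 1, Integer::sum);
        }
    }

    // All strings one edit away: deletion, transposition, replacement, insertion.
    static Set<String> edits1(String w) {
        Set<String> out = new HashSet<>();
        for (int i = 0; i <= w.length(); i++) {
            String head = w.substring(0, i);
            String tail = w.substring(i);
            if (!tail.isEmpty()) {
                out.add(head + tail.substring(1));
            }
            if (tail.length() > 1) {
                out.add(head + tail.charAt(1) + tail.charAt(0) + tail.substring(2));
            }
            for (char c : ALPHABET.toCharArray()) {
                if (!tail.isEmpty()) {
                    out.add(head + c + tail.substring(1));
                }
                out.add(head + c + tail);
            }
        }
        return out;
    }

    // Most frequent known candidate, or null if none is known.
    // Ties are broken alphabetically so the result is deterministic.
    private String best(Set<String> candidates) {
        String best = null;
        int bestCount = 0;
        for (String c : candidates) {
            int n = counts.getOrDefault(c, 0);
            if (n > bestCount || (n > 0 && n == bestCount && c.compareTo(best) < 0)) {
                best = c;
                bestCount = n;
            }
        }
        return best;
    }

    // Known word, else best candidate at distance 1, else distance 2, else the input.
    String correct(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (counts.containsKey(w)) {
            return w;
        }
        Set<String> e1 = edits1(w);
        String guess = best(e1);
        if (guess != null) {
            return guess;
        }
        Set<String> e2 = new HashSet<>();
        for (String e : e1) {
            e2.addAll(edits1(e));
        }
        guess = best(e2);
        return guess != null ? guess : w;
    }

    public static void main(String[] args) {
        NorvigSketch sc = new NorvigSketch();
        sc.train("spelling spelling spelling spilling the cat sat");
        System.out.println(sc.correct("Speling"));
        System.out.println(sc.correct("teh"));
    }
}
```

Building the full distance-2 candidate set is the expensive step, so a real implementation will want to avoid or prune it wherever it can.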
Public SVN
A public SVN repository is available at this location.
Contact
Contact me at lkml_AT_ds_DOT_gauner_DOT_org for questions about this Java implementation. For more information about the algorithm itself, see Peter Norvig's essay.
Here you can tell me whatever you want. If you want me to be able to contact you, remember to provide a contact address, e.g. Jabber or mail.