jSpellCorrect

jSpellCorrect - a simple statistical spelling corrector.


About

This is a Java implementation of Peter Norvig's statistical spelling correction algorithm, which he describes in his essay "How to Write a Spelling Corrector".

News

24/04/2008 - The spelling corrector was converted to an OSGi bundle. The ZIP archive actually contains three bundles: one for the interface, one for the spelling checker, and one example using a service tracker. The source code will be uploaded later.
22/10/2007 - Uploaded Version 0.4. This one brings a major speed increase.
18/06/2007 - The new 0.2 Release fixes a bug in the max method. Also added some performance information.
15/06/2007 - Initial release.

Features

Please see Mr. Norvig's essay.

Background

Please see Mr. Norvig's essay.

Todo

Performance Tuning
The Python implementation gives these results:
{'bad': 68, 'bias': None, 'unknown': 15, 'secs': 16, 'pct': 74, 'n': 270}
{'bad': 130, 'bias': None, 'unknown': 43, 'secs': 26, 'pct': 67, 'n': 400}
About 17Hz (test words per second, n / secs).
My Java implementation (0.2) gives these:
{'bad': 69, 'bias': 0, 'unknown': 54, 'secs': 53, 'pct': 74, 'n': 270}
{'bad': 130, 'bias': 0, 'unknown': 87, 'secs': 80, 'pct': 67, 'n': 400}
About 5Hz.
As you can see, the Java implementation is considerably slower. I'm not sure why, but I'm still trying to improve the performance.

NEW: Version 0.4

The new release brings a major speedup, due to a small optimization.
{'bad': 68, 'bias': 0, 'unknown': 53, 'secs': 18, 'pct': 74, 'n': 270}
{'bad': 130, 'bias': 0, 'unknown': 87, 'secs': 27, 'pct': 67, 'n': 400}
This results in about 15Hz. Quite good, eh?
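
The "Hz" figures are simply n divided by secs from the result lines above. A small sketch that redoes the arithmetic for all three runs; the class and method names are made up for illustration and are not part of jSpellCorrect:

```java
import java.util.Locale;

public class Throughput {
    // Test words handled per second: n words divided by elapsed seconds.
    static double wordsPerSecond(int n, int secs) {
        return (double) n / secs;
    }

    public static void main(String[] args) {
        String[] labels = {"Python", "Java 0.2", "Java 0.4"};
        // {n, secs, n, secs} for the two test sets, copied from the result lines.
        int[][] runs = {
            {270, 16, 400, 26},
            {270, 53, 400, 80},
            {270, 18, 400, 27},
        };
        for (int i = 0; i < labels.length; i++) {
            int[] r = runs[i];
            System.out.printf(Locale.ROOT, "%-9s %5.1f and %5.1f words/s%n",
                    labels[i],
                    wordsPerSecond(r[0], r[1]),
                    wordsPerSecond(r[2], r[3]));
        }
    }
}
```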

Download

jSpellCorrect 0.4 (OSGi) (md5sum: n/a)
jSpellCorrect 0.4 (md5sum: n/a)
jSpellCorrect 0.3 (md5sum: n/a)
jSpellCorrect 0.2 (md5sum: 88b7b98c687d9ccd96a0f30366c171a3)
jSpellCorrect 0.1 (md5sum: 3bb50da1b7de861e16d57614b74838c4)
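
To check a download against one of the md5sum values listed above, something like the following should do. It is only a sketch using the JDK's java.security.MessageDigest; the class name Md5Check and its command-line arguments are made up for illustration, and the file path is whatever you saved the download as.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Check {
    // Return the MD5 digest of the given bytes as lowercase hex.
    static String md5Hex(byte[] data) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest(data)) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to the downloaded file (hypothetical, pick your own)
        // args[1]: the md5sum published for that release
        if (args.length != 2) {
            System.err.println("usage: java Md5Check <file> <expected-md5>");
            return;
        }
        String actual = md5Hex(Files.readAllBytes(Paths.get(args[0])));
        System.out.println(actual.equalsIgnoreCase(args[1]) ? "OK" : "MISMATCH: " + actual);
    }
}
```

Reading the whole file into memory is fine for a small JAR; for large files a streaming read would be the better choice.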

Documentation

At the moment I don't think much documentation is necessary. Just download the JAR file and use this code to get started:
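import org.gauner.jSpellCorrect.ToySpellingCorrector;

// "Example" is just a placeholder class name
public class Example {
    // "throws Exception" is a precaution in case training from a file can fail
    public static void main(String[] args) throws Exception {
        ToySpellingCorrector sc = new ToySpellingCorrector();
        // train on the words in a text file
        sc.trainFile("/tmp/big.txt");
        // train a single word
        sc.trainSingle("some word");
        // print the best suggestion for each misspelled word
        System.out.println(sc.correct("Cads"));
        System.out.println(sc.correct("Dok"));
        System.out.println(sc.correct("Speling"));
    }
}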
JavaDoc will be provided later.
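
If you are curious what a corrector of this kind does internally, here is a minimal, self-contained sketch of the approach from Norvig's essay: count word frequencies, generate every string one edit away, and pick the most frequent known candidate. It is not the jSpellCorrect source. The class name MiniCorrector and its methods are made up for illustration, and only edit distance 1 is covered.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MiniCorrector {
    private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz";
    private static final Pattern WORD = Pattern.compile("[a-z]+");
    private final Map<String, Integer> counts = new HashMap<>();

    // Count every lowercase word found in the training text.
    public void train(String text) {
        Matcher m = WORD.matcher(text.toLowerCase(Locale.ROOT));
        while (m.find()) {
            counts.merge(m.group(), 1, Integer::sum);
        }
    }

    // All strings one delete, transpose, replace or insert away from word.
    static Set<String> edits1(String word) {
        Set<String> out = new HashSet<>();
        for (int i = 0; i <= word.length(); i++) {
            String left = word.substring(0, i);
            String right = word.substring(i);
            if (!right.isEmpty()) {
                out.add(left + right.substring(1)); // delete
            }
            if (right.length() > 1) {
                // transpose the two characters after the split point
                out.add(left + right.charAt(1) + right.charAt(0) + right.substring(2));
            }
            for (char c : ALPHABET.toCharArray()) {
                if (!right.isEmpty()) {
                    out.add(left + c + right.substring(1)); // replace
                }
                out.add(left + c + right); // insert
            }
        }
        return out;
    }

    // A known word is returned as-is; otherwise the most frequent known word
    // one edit away; otherwise the (lowercased) input is returned unchanged.
    public String correct(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (counts.containsKey(w)) {
            return w;
        }
        String best = w;
        int bestCount = 0;
        for (String candidate : edits1(w)) {
            int c = counts.getOrDefault(candidate, 0);
            if (c > bestCount) {
                best = candidate;
                bestCount = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        MiniCorrector sc = new MiniCorrector();
        sc.train("the spelling of a word matters, and spelling mistakes are common");
        System.out.println(sc.correct("speling"));
        System.out.println(sc.correct("teh"));
    }
}
```

Norvig's essay also falls back to candidates two edits away before giving up; that step is left out here to keep the sketch short.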

Public SVN

A public SVN repository is available at this location.

Contact

Contact me at lkml_AT_ds_DOT_gauner_DOT_org for questions about this Java implementation. For more information about the algorithm itself, see Peter Norvig's essay.

Here you can tell me whatever you want. If you want me to be able to contact you, remember to provide a contact address, e.g. Jabber or mail.



