spotcamping.blogg.se

Apache Lucene spell check language

My initial interest in spell checking algorithms started when I had to fix some bugs in the spell checking code at work over a year ago. We use Jazzy, a Java implementation of GNU Aspell, as our spell checking library. Jazzy uses a combination of Metaphone and Levenshtein distance (aka edit distance) to match a misspelled word to a set of words in its dictionary.

An alternative approach to spell checking is the use of n-grams. Basically, you break up each dictionary word into sequences of characters of size n, moving your pointer forward one character at a time, and store them in an index. When the user enters a misspelled word, you do the same thing with the input word, then match the n-grams generated against the n-grams in your dictionary. The idea is that a misspelled word would have only one, two or three characters misspelled, transposed or missing, so by comparing the n-grams, you would get matches to correctly spelled words that are close to it.

This is a newer approach to spell checking, which has become feasible with the availability of open source search libraries such as Lucene, since it requires the functionality to tokenize your dictionary words and store them in an index for quick retrieval. I first read about this approach in Tom White's article "Did you mean Lucene?" quite some time back, but I never completely understood it then, probably because I was new to Lucene. However, what I found attractive about it was its compactness and performance.
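To make the n-gram idea described above a little more concrete, here is a minimal sketch in plain Java. It does not use Lucene or Jazzy at all: the class name `NGramSpellSketch`, its methods, and the in-memory map standing in for the index are all made up for illustration. Ranking candidates by the number of shared n-grams is my own simplifying assumption, and Lucene's actual spell checker is more elaborate than this.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration class; not part of Lucene or Jazzy.
class NGramSpellSketch {
    private final int n;
    // Stand-in for a search index: n-gram -> dictionary words containing it.
    private final Map<String, Set<String>> index = new HashMap<>();

    NGramSpellSketch(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        this.n = n;
    }

    // Slide a window of size n over the word, one character at a time.
    // Words shorter than n produce no n-grams.
    static Set<String> ngrams(String word, int n) {
        Set<String> grams = new LinkedHashSet<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    // Index a correctly spelled dictionary word under each of its n-grams.
    void add(String word) {
        for (String gram : ngrams(word, n)) {
            index.computeIfAbsent(gram, k -> new HashSet<>()).add(word);
        }
    }

    // Break the input into n-grams the same way, then rank dictionary words
    // by how many distinct n-grams they share with it (ties alphabetical).
    List<String> suggest(String input, int max) {
        Map<String, Integer> shared = new HashMap<>();
        for (String gram : ngrams(input, n)) {
            for (String word : index.getOrDefault(gram, Collections.<String>emptySet())) {
                shared.merge(word, 1, Integer::sum);
            }
        }
        List<String> ranked = new ArrayList<>(shared.keySet());
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(shared.get(b), shared.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(max, ranked.size())));
    }
}
```

With n = 2 and a toy dictionary of "lucene", "license", "search" and "spelling", calling `suggest("lucine", 5)` returns "lucene" first (it shares the bigrams lu, uc and ne with the input), followed by "spelling" (which shares only in). A real index would also want some way to handle words shorter than n, which this sketch simply ignores.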








