org.apache.lucene.search.spell
Class CompassSpellChecker

java.lang.Object
  extended by org.apache.lucene.search.spell.CompassSpellChecker

public class CompassSpellChecker
extends Object

Spell checker class (main class),
initially inspired by David Spencer's code.

Example Usage:

  CompassSpellChecker spellcheck = new CompassSpellChecker(spellIndexDirectory);
  // To index a field of a user index (writer is an IndexWriter supplied by the caller):
  spellcheck.indexDictionary(writer, new LuceneDictionary(my_lucene_reader, a_field));
  // To index a file containing words:
  spellcheck.indexDictionary(writer, new PlainTextDictionary(new File("myfile.txt")));
  String[] suggestions = spellcheck.suggestSimilar("misspelt", 5);
 

Version:
1.0

Field Summary
static String F_WORD
          Field name for each word in the ngram index.
(package private)  Directory spellIndex
          the spell index
 
Constructor Summary
CompassSpellChecker(Directory spellIndex)
          Use the given directory as a spell checker index.
CompassSpellChecker(Directory spellIndex, boolean indexing)
           
CompassSpellChecker(Searcher searcher, IndexReader reader)
           
 
Method Summary
 void clearIndex()
          Removes all terms from the spell check index.
 void close()
           
 boolean exist(String word)
          Check whether the word exists in the index.
 StringDistance getStringDistance()
           
 void indexDictionary(IndexWriter writer, Dictionary dict)
          Indexes the data from the given Dictionary.
 void setAccuracy(float minScore)
          Sets the minimum score a suggestion must reach (0 < minScore < 1); the default is 0.5.
 void setSpellIndex(Directory spellIndex)
          Use a different index as the spell checker index or re-open the existing index if spellIndex is the same value as given in the constructor.
 void setStringDistance(StringDistance sd)
           
 String[] suggestSimilar(String word, int numSug)
          Suggest similar words.
 String[] suggestSimilar(String word, int numSug, IndexReader ir, String field, boolean morePopular)
          Suggest similar words (optionally restricted to a field of an index).
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

F_WORD

public static final String F_WORD
Field name for each word in the ngram index.

See Also:
Constant Field Values
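
As a rough illustration of what an n-gram index stores, the sketch below splits a word into overlapping character n-grams. It is a minimal, hypothetical example: NGramSketch and ngrams are made-up names, and the gram sizes and field layout this class actually uses are not specified here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of CompassSpellChecker.
class NGramSketch {

    // Returns every substring of length n, moving one character at a time.
    // Words shorter than n yield an empty list.
    static List<String> ngrams(String word, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("spell", 3));
    }
}
```

Two words that share many such grams are likely to be close in spelling, which is what makes a gram-based lookup useful for fetching candidates.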

spellIndex

Directory spellIndex
the spell index

Constructor Detail

CompassSpellChecker

public CompassSpellChecker(Searcher searcher,
                           IndexReader reader)

CompassSpellChecker

public CompassSpellChecker(Directory spellIndex)
                    throws IOException
Use the given directory as a spell checker index. The directory is created if it doesn't exist yet.

Parameters:
spellIndex - the directory to use as the spell checker index
Throws:
IOException

CompassSpellChecker

public CompassSpellChecker(Directory spellIndex,
                           boolean indexing)
                    throws IOException
Throws:
IOException
Method Detail

close

public void close()

setStringDistance

public void setStringDistance(StringDistance sd)

getStringDistance

public StringDistance getStringDistance()

setSpellIndex

public void setSpellIndex(Directory spellIndex)
                   throws IOException
Use a different index as the spell checker index or re-open the existing index if spellIndex is the same value as given in the constructor.

Parameters:
spellIndex - the directory to use as the spell checker index
Throws:
IOException

setAccuracy

public void setAccuracy(float minScore)
Sets the minimum score a suggestion must reach (0 < minScore < 1); the default is 0.5.
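
To make the idea of a minimum score concrete, the sketch below turns an edit distance into a similarity between 0 and 1 and drops candidates that fall below the threshold. The normalization 1 - distance / max(length) is an assumption chosen for illustration; the real score depends on the configured StringDistance. MinScoreSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of CompassSpellChecker.
class MinScoreSketch {

    // Classic dynamic-programming Levenshtein edit distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed normalization: 1.0 for identical words, 0.0 for nothing in common.
    static float similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0f;
        }
        return 1.0f - (float) levenshtein(a, b) / maxLen;
    }

    // Keeps only the candidates whose similarity to word is at least minScore.
    static List<String> filter(String word, List<String> candidates, float minScore) {
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            if (similarity(word, candidate) >= minScore) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> candidates = Arrays.asList("misspell", "spelt", "xyz");
        System.out.println(filter("misspelt", candidates, 0.5f));
        System.out.println(filter("misspelt", candidates, 0.7f));
    }
}
```

Under this assumed scoring, raising minScore yields fewer but closer suggestions; lowering it admits more distant words.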


suggestSimilar

public String[] suggestSimilar(String word,
                               int numSug)
                        throws IOException
Suggest similar words.

The Lucene similarity used to fetch the most relevant n-grammed terms is not the same as the edit distance strategy used to pick the best matching word from the hits that Lucene found, so one usually has to request several suggestions (a larger numSug) to get the true best match.

That is, if numSug == 1, don't count on that suggestion being the best one; set this value to at least 5 for a good suggestion.

Parameters:
word - the word you want a spell check done on
numSug - the number of suggested words
Returns:
String[] - the suggested words
Throws:
IOException
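
The advice about numSug can be illustrated with a small self-contained sketch. It is not this class's implementation: the hit list stands in for candidates returned by the n-gram search in an assumed, made-up order, and RerankSketch, levenshtein, and best are hypothetical names. It shows that the first hit in retrieval order need not be the closest one by edit distance.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not part of CompassSpellChecker.
class RerankSketch {

    // Classic dynamic-programming Levenshtein edit distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Takes the first numSug hits in retrieval order, then re-sorts them
    // by edit distance to the queried word (ties keep retrieval order).
    static List<String> best(String word, List<String> hits, int numSug) {
        List<String> top = new ArrayList<>(hits.subList(0, Math.min(numSug, hits.size())));
        top.sort(Comparator.comparingInt((String h) -> levenshtein(word, h)));
        return top;
    }

    public static void main(String[] args) {
        // Assumed retrieval order; invented for this example.
        List<String> hits = Arrays.asList("misspelling", "dispelt", "misspell", "misspent", "spelt");
        System.out.println(best("misspelt", hits, 1));
        System.out.println(best("misspelt", hits, 5));
    }
}
```

With numSug set to 1, the only candidate considered is "misspelling" (edit distance 4); with numSug set to 5, the closer "misspell" (edit distance 1) is ranked first.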

suggestSimilar

public String[] suggestSimilar(String word,
                               int numSug,
                               IndexReader ir,
                               String field,
                               boolean morePopular)
                        throws IOException
Suggest similar words (optionally restricted to a field of an index).

The Lucene similarity used to fetch the most relevant n-grammed terms is not the same as the edit distance strategy used to pick the best matching word from the hits that Lucene found, so one usually has to request several suggestions (a larger numSug) to get the true best match.

That is, if numSug == 1, don't count on that suggestion being the best one; set this value to at least 5 for a good suggestion.

Parameters:
word - the word you want a spell check done on
numSug - the number of suggested words
ir - the IndexReader of the user index (can be null; see the field parameter)
field - the field of the user index; if field is not null, the suggested words are restricted to the words present in this field
morePopular - if true, return only the suggested words that are as frequent as or more frequent than the searched word (applies only in restricted mode, i.e. when ir != null and field != null)
Returns:
String[] - the suggested words, sorted first by edit distance and then (in restricted mode only) by their popularity in the field of the user index
Throws:
IOException
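
The two sort criteria and the morePopular filter can be sketched independently of Lucene. In this hypothetical example, Candidate carries a precomputed edit distance and a frequency in the user field; every name and number is invented, and the code illustrates the documented criteria rather than the class's actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not part of CompassSpellChecker.
class PopularitySketch {

    static final class Candidate {
        final String word;
        final int distance; // edit distance to the searched word
        final int freq;     // frequency in the field of the user index

        Candidate(String word, int distance, int freq) {
            this.word = word;
            this.distance = distance;
            this.freq = freq;
        }
    }

    // Optionally drops candidates rarer than the searched word, then sorts
    // by edit distance (ascending) and, on ties, by frequency (descending).
    static List<String> rank(List<Candidate> candidates, int wordFreq, boolean morePopular) {
        List<Candidate> kept = new ArrayList<>();
        for (Candidate c : candidates) {
            if (morePopular && c.freq < wordFreq) {
                continue;
            }
            kept.add(c);
        }
        Comparator<Candidate> byDistance = Comparator.comparingInt((Candidate c) -> c.distance);
        Comparator<Candidate> byFreqDesc = Comparator.comparingInt((Candidate c) -> c.freq).reversed();
        kept.sort(byDistance.thenComparing(byFreqDesc));

        List<String> words = new ArrayList<>();
        for (Candidate c : kept) {
            words.add(c.word);
        }
        return words;
    }

    public static void main(String[] args) {
        List<Candidate> candidates = Arrays.asList(
                new Candidate("spelt", 3, 40),
                new Candidate("misspent", 1, 2),
                new Candidate("misspell", 1, 9));
        System.out.println(rank(candidates, 0, false));
        System.out.println(rank(candidates, 5, true));
    }
}
```

Here "misspell" and "misspent" tie on edit distance, so the more frequent "misspell" comes first; with morePopular enabled and a searched-word frequency of 5, "misspent" is dropped.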

clearIndex

public void clearIndex()
                throws IOException
Removes all terms from the spell check index.

Throws:
IOException

exist

public boolean exist(String word)
              throws IOException
Check whether the word exists in the index.

Parameters:
word - the word to look up
Returns:
true iff the word exists in the index
Throws:
IOException

indexDictionary

public void indexDictionary(IndexWriter writer,
                            Dictionary dict)
                     throws IOException
Indexes the data from the given Dictionary.

Parameters:
writer - the IndexWriter used to add the dictionary words
dict - Dictionary to index
Throws:
IOException


Copyright (c) 2004-2008 The Compass Project.