org.compass.core.lucene
Class LuceneEnvironment.SearchEngineIndex

java.lang.Object
  extended by org.compass.core.lucene.LuceneEnvironment.SearchEngineIndex
Enclosing class:
LuceneEnvironment

public abstract static class LuceneEnvironment.SearchEngineIndex
extends Object

Environment settings specific to the search engine index.


Field Summary
static String CACHE_INTERVAL_INVALIDATION
          Sets how often (in milliseconds) the index manager will check if the index cache needs to be invalidated.
static long DEFAULT_CACHE_INTERVAL_INVALIDATION
          The default cache interval invalidation.
static String INDEX_MANAGER_SCHEDULE_INTERVAL
          The index manager schedule interval (in seconds) at which actions related to the index manager happen (such as global cache interval checks).
static String MAX_BUFFERED_DELETED_TERMS
          Determines the minimal number of delete terms required before the buffered in-memory delete terms are applied and flushed.
static String MAX_BUFFERED_DOCS
          Determines the minimal number of documents required before the buffered in-memory documents are flushed as a new Segment.
static String MAX_FIELD_LENGTH
          The maximum number of terms that will be indexed for a single field in a document.
static String MAX_MERGE_DOCS
          Determines the largest segment (measured by document count) that may be merged with other segments.
static String MERGE_FACTOR
          Determines how often segment indices are merged by addDocument().
static String RAM_BUFFER_SIZE
          Determines the amount of RAM that may be used for buffering added documents before they are flushed as a new Segment.
static String TERM_INDEX_INTERVAL
          Expert: Set the interval between indexed terms.
static String USE_COMPOUND_FILE
          Setting to turn on usage of a compound file.
static String USE_CONCURRENT_OPERATIONS
          Should concurrent operations be performed during a transaction against the search engine index store.
static String WAIT_FOR_CACHE_INVALIDATION_ON_INDEX_OPERATION
          Defaults to false.
 
Constructor Summary
LuceneEnvironment.SearchEngineIndex()
           
 
Method Summary
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

MAX_MERGE_DOCS

public static final String MAX_MERGE_DOCS

Determines the largest segment (measured by document count) that may be merged with other segments. Small values (e.g., less than 10,000) are best for interactive indexing, as this limits the length of pauses while indexing to a few seconds. Larger values are best for batched indexing and speedier searches.

The default value is Integer.MAX_VALUE.

See Also:
Constant Field Values

MERGE_FACTOR

public static final String MERGE_FACTOR
Determines how often segment indices are merged by addDocument(). With smaller values, less RAM is used while indexing, and searches on unoptimized indices are faster, but indexing speed is slower. With larger values, more RAM is used during indexing, and while searches on unoptimized indices are slower, indexing is faster. Thus larger values (> 10) are best for batch index creation, and smaller values (< 10) for indices that are interactively maintained.

Defaults to 10.
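
The trade-off above can be illustrated with a simplified model. This is a sketch under stated assumptions, not Lucene's actual merge policy: every flush is assumed to create one segment, and whenever mergeFactor segments of the same level exist they are merged into one segment of the next level. The class and method names are hypothetical.

```java
// Simplified model of logarithmic segment merging (an assumption of this sketch,
// not Lucene's real merge policy). All names here are hypothetical.
final class SegmentMergeModel {

    // Returns {segmentsLeft, mergesPerformed} after the given number of flushes.
    static int[] simulate(int flushes, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
        int[] levels = new int[33]; // levels[i] = number of segments at level i
        int merges = 0;
        for (int i = 0; i < flushes; i++) {
            levels[0]++; // each flush adds one level-0 segment
            int level = 0;
            // Cascade: mergeFactor segments at one level become one segment at the next.
            while (levels[level] == mergeFactor) {
                levels[level] = 0;
                levels[level + 1]++;
                merges++;
                level++;
            }
        }
        int segments = 0;
        for (int count : levels) {
            segments += count;
        }
        return new int[] {segments, merges};
    }

    public static void main(String[] args) {
        for (int mergeFactor : new int[] {2, 10}) {
            int[] result = simulate(99, mergeFactor);
            System.out.println("mergeFactor=" + mergeFactor
                    + " segments=" + result[0] + " merges=" + result[1]);
        }
    }
}
```

With 99 flushes, this model leaves 18 segments after 9 merges for a factor of 10, and 4 segments after 95 merges for a factor of 2: the smaller factor does more merge work while indexing but leaves fewer segments to search.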

See Also:
Constant Field Values

MAX_BUFFERED_DOCS

public static final String MAX_BUFFERED_DOCS
Determines the minimal number of documents required before the buffered in-memory documents are flushed as a new Segment. Large values generally give faster indexing.

When this is set, the writer will flush every maxBufferedDocs added documents. Pass in IndexWriter.DISABLE_AUTO_FLUSH to prevent triggering a flush due to number of buffered documents. Note that if flushing by RAM usage is also enabled, then the flush will be triggered by whichever comes first.

Disabled by default (writer flushes by RAM usage).

See Also:
Constant Field Values

MAX_BUFFERED_DELETED_TERMS

public static final String MAX_BUFFERED_DELETED_TERMS

Determines the minimal number of delete terms required before the buffered in-memory delete terms are applied and flushed. If there are documents buffered in memory at the time, they are merged and a new segment is created.

Disabled by default (writer flushes by RAM usage).

See Also:
Constant Field Values

TERM_INDEX_INTERVAL

public static final String TERM_INDEX_INTERVAL
Expert: Set the interval between indexed terms. Large values cause less memory to be used by IndexReader, but slow random-access to terms. Small values cause more memory to be used by an IndexReader, and speed random-access to terms.

This parameter determines the amount of computation required per query term, regardless of the number of documents that contain that term. In particular, it is the maximum number of other terms that must be scanned before a term is located and its frequency and position information may be processed. In a large index with user-entered query terms, query processing time is likely to be dominated not by term lookup but rather by the processing of frequency and positional data. In a small index or when many uncommon query terms are generated (e.g., by wildcard queries) term lookup may become a dominant cost.

In particular, numUniqueTerms/interval terms are read into memory by an IndexReader, and, on average, interval/2 terms must be scanned for each random term access.
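
The two figures quoted for TERM_INDEX_INTERVAL (numUniqueTerms/interval terms held in memory, and interval/2 terms scanned on average) can be worked through with a small calculation. This is a back-of-the-envelope sketch only; the class name, method names, and the index size of one million unique terms are assumptions made for illustration, and nothing here calls Compass or Lucene.

```java
// Back-of-the-envelope model of the TERM_INDEX_INTERVAL trade-off.
// All names and the sample index size are hypothetical.
final class TermIndexEstimate {

    // Roughly numUniqueTerms / interval terms are read into memory by a reader.
    static long termsInMemory(long numUniqueTerms, int interval) {
        return numUniqueTerms / interval;
    }

    // On average, interval / 2 terms are scanned for each random term access.
    static double averageTermsScanned(int interval) {
        return interval / 2.0;
    }

    public static void main(String[] args) {
        long uniqueTerms = 1_000_000L; // assumed index size, for illustration only
        for (int interval : new int[] {32, 128, 512}) {
            System.out.printf("interval=%d inMemory=%d avgScanned=%.1f%n",
                    interval,
                    termsInMemory(uniqueTerms, interval),
                    averageTermsScanned(interval));
        }
    }
}
```

Raising the interval shrinks the in-memory term count proportionally while the average scan length grows by the same factor.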

See Also:
IndexWriter.DEFAULT_TERM_INDEX_INTERVAL, Constant Field Values

RAM_BUFFER_SIZE

public static final String RAM_BUFFER_SIZE
Determines the amount of RAM that may be used for buffering added documents before they are flushed as a new Segment. Generally for faster indexing performance it's best to flush by RAM usage instead of document count and use as large a RAM buffer as you can.

When this is set, the writer will flush whenever buffered documents use this much RAM. Pass in IndexWriter.DISABLE_AUTO_FLUSH to prevent triggering a flush due to RAM usage. Note that if flushing by document count is also enabled, then the flush will be triggered by whichever comes first.

The default value is IndexWriter.DEFAULT_RAM_BUFFER_SIZE_MB.
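
The "whichever comes first" rule shared by MAX_BUFFERED_DOCS and RAM_BUFFER_SIZE can be sketched as a small decision function. This is an illustrative model, not the writer's implementation: the class, its fields, and the DISABLED sentinel (a local stand-in for IndexWriter.DISABLE_AUTO_FLUSH, with an assumed value) are hypothetical.

```java
// Illustrative model of the two flush triggers; not Lucene's or Compass's code.
final class FlushTrigger {

    // Local stand-in for IndexWriter.DISABLE_AUTO_FLUSH; the value is an assumption of this sketch.
    static final int DISABLED = -1;

    private final int maxBufferedDocs;
    private final double ramBufferSizeMb;

    FlushTrigger(int maxBufferedDocs, double ramBufferSizeMb) {
        this.maxBufferedDocs = maxBufferedDocs;
        this.ramBufferSizeMb = ramBufferSizeMb;
    }

    // A flush is due as soon as either enabled limit is reached.
    boolean shouldFlush(int bufferedDocs, double bufferedMb) {
        boolean byDocs = maxBufferedDocs != DISABLED && bufferedDocs >= maxBufferedDocs;
        boolean byRam = ramBufferSizeMb != DISABLED && bufferedMb >= ramBufferSizeMb;
        return byDocs || byRam;
    }

    public static void main(String[] args) {
        // Flush by RAM only: the document count is ignored.
        FlushTrigger ramOnly = new FlushTrigger(DISABLED, 16.0);
        System.out.println(ramOnly.shouldFlush(1_000_000, 15.9));
        // Both limits enabled: the document limit is reached first here.
        FlushTrigger both = new FlushTrigger(1000, 16.0);
        System.out.println(both.shouldFlush(1000, 1.0));
    }
}
```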

See Also:
Constant Field Values

USE_COMPOUND_FILE

public static final String USE_COMPOUND_FILE
Setting to turn on usage of a compound file. When on, multiple files for each segment are merged into a single file once the segment creation is finished. This is done regardless of what directory is in use.

Default value is true.

See Also:
Constant Field Values

USE_CONCURRENT_OPERATIONS

public static final String USE_CONCURRENT_OPERATIONS
Should concurrent operations be performed during a transaction against the search engine index store. Defaults to true.

See Also:
Constant Field Values

MAX_FIELD_LENGTH

public static final String MAX_FIELD_LENGTH
The maximum number of terms that will be indexed for a single field in a document. This limits the amount of memory required for indexing, so that collections with very large files will not crash the indexing process by running out of memory.

Note that this effectively truncates large documents, excluding from the index terms that occur further in the document. If you know your source documents are large, be sure to set this value high enough to accommodate the expected size. If you set it to Integer.MAX_VALUE, then the only limit is your memory, but you should anticipate an OutOfMemoryError.

By default, no more than 10,000 terms will be indexed for a field.
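
The truncation effect of MAX_FIELD_LENGTH can be pictured as keeping only the first N terms of a field. The following is a minimal sketch of that behavior under the assumption that a field is already split into a list of terms; the class and method names are hypothetical and no analyzer or index is involved.

```java
import java.util.Arrays;
import java.util.List;

// Minimal model of field truncation; names are hypothetical.
final class FieldLengthLimit {

    // Keeps only the first maxFieldLength terms; later terms never reach the index.
    static List<String> indexedTerms(List<String> terms, int maxFieldLength) {
        return terms.subList(0, Math.min(terms.size(), maxFieldLength));
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("the", "quick", "brown", "fox", "jumps");
        // With a limit of 3, "fox" and "jumps" are dropped and cannot be found by a search.
        System.out.println(indexedTerms(terms, 3));
        // A limit above the field size leaves the field untouched.
        System.out.println(indexedTerms(terms, 10_000));
    }
}
```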

See Also:
Constant Field Values

CACHE_INTERVAL_INVALIDATION

public static final String CACHE_INTERVAL_INVALIDATION
Sets how often (in milliseconds) the index manager will check if the index cache needs to be invalidated. Defaults to 5000. Setting it to 0 means that the cache will check if it needs to be invalidated all the time. Setting it to -1 means that the cache will never check if it needs to be invalidated. Note that this is perfectly fine if a single instance is manipulating the index: it works because the cache is invalidated when a transaction is committed and a dirty operation has occurred.
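
The three cases for CACHE_INTERVAL_INVALIDATION (a positive interval, 0, and -1) can be summarized as a single decision. This is a sketch of the documented semantics only, not the index manager's code; the class name, method name, and timestamp parameters are hypothetical.

```java
// Sketch of the documented interval semantics; names are hypothetical.
final class CacheCheckInterval {

    // Decides whether an invalidation check is due at nowMillis.
    static boolean shouldCheck(long intervalMillis, long lastCheckMillis, long nowMillis) {
        if (intervalMillis == -1) {
            return false; // never check
        }
        if (intervalMillis == 0) {
            return true; // check all the time
        }
        return nowMillis - lastCheckMillis >= intervalMillis;
    }

    public static void main(String[] args) {
        long lastCheck = 10_000L;
        System.out.println(shouldCheck(5000L, lastCheck, 12_000L)); // interval not yet elapsed
        System.out.println(shouldCheck(5000L, lastCheck, 15_000L)); // interval elapsed
        System.out.println(shouldCheck(0L, lastCheck, 10_000L));    // always checks
        System.out.println(shouldCheck(-1L, lastCheck, 99_000L));   // never checks
    }
}
```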

See Also:
Constant Field Values

DEFAULT_CACHE_INTERVAL_INVALIDATION

public static final long DEFAULT_CACHE_INTERVAL_INVALIDATION
The default cache interval invalidation.

See Also:
CACHE_INTERVAL_INVALIDATION, Constant Field Values

INDEX_MANAGER_SCHEDULE_INTERVAL

public static final String INDEX_MANAGER_SCHEDULE_INTERVAL
The index manager schedule interval (in seconds) at which actions related to the index manager happen (such as global cache interval checks). If set to -1, no scheduling will happen.

See Also:
Constant Field Values

WAIT_FOR_CACHE_INVALIDATION_ON_INDEX_OPERATION

public static final String WAIT_FOR_CACHE_INVALIDATION_ON_INDEX_OPERATION
Defaults to false. If set to true, index manager operations (including replace index) wait for all other Compass instances to invalidate their caches. The wait time is the same as the INDEX_MANAGER_SCHEDULE_INTERVAL.

See Also:
Constant Field Values
Constructor Detail

LuceneEnvironment.SearchEngineIndex

public LuceneEnvironment.SearchEngineIndex()


Copyright (c) 2004-2008 The Compass Project.