Chapter 1. Introduction

In today's applications, search is becoming a "must have" requirement. Users expect applications (rich clients, web based, server side, ...) to provide snappy and relevant search results the same way Google does for the web. Whether it is recipe management software, a trading application, or a content management driven web site, users expect search results across the whole business domain model of the application.

Java developers, on the other hand, need to implement it. While Java developers have become used to simplified development models, with Hibernate, Spring Framework, and EJB3 to name a few, until now there has been no simple to use Java search engine solution. Compass aims to fill this gap.

Many applications, once they start using a search engine to implement that elusive search box, find that the search engine can also be used for many data extraction operations. Once a search engine holds a valid representation of the application business model, it often makes sense to execute simple queries against it instead of going to the actual data store (usually a database). Two prime examples are Jira and Confluence, which perform many of their reporting and (naturally) search operations using a search engine instead of the usual database operations.

1.1. Overview

Compass provides a breadth of features geared towards integrating search engine functionality. The next diagram shows the different Compass modules, followed by a short description of each one.

Figure: Overview of Compass

Compass Core is the most fundamental part of Compass. It holds Lucene extensions for a transactional index, a search engine abstraction, an ORM-like API, transaction management integration, different mapping technologies (OSEM, XSEM and RSEM), and more. The aim of Compass Core is to be usable within different scenarios and environments, and to simplify the core operations done with a search engine.

The aim of Compass Gps is to integrate with different content sources. Its prime feature is the integration with different ORM frameworks (Hibernate, JPA, JDO, OJB), allowing for almost transparent integration between a search engine and an ORM view of content that resides in a database. Other features include a JDBC integration, which allows database content to be indexed using configurable SQL expressions responsible for extracting the content.

Compass Spring integrates Compass with the Spring Framework. Spring, being an easy to use application framework, provides a simpler development model (based on dependency injection and much more). Compass integrates with Spring in the same manner that ORM framework integration is done within the Spring Framework code base. It also integrates with the Spring transaction abstraction layer, AOP support, and MVC library.

1.2. I use ...

The following sections are intended as a brief introduction and a navigation map for people who are familiar with, or use, the following technologies:

1.2.1. ... Lucene

Search Engine Abstraction

Compass created a search engine abstraction, with its main (and only) implementation using Lucene. Lucene is an amazing, fast, and stable search engine (or IR library), yet the main problem with integrating Lucene with our application is its low-level usage and API.

For people who use or know Lucene, it is important to explain the new terms that Compass introduces. Resource is the Compass abstraction on top of a Lucene Document, and Property is the Compass abstraction on top of a Lucene Field. Neither adds much on top of the actual Lucene implementation, except that a Resource is associated with an Alias. For more information, please read Chapter 5, Search Engine.

RSEM - Resource/Search Engine Mapping

Resource is the lowest level data object used in Compass, and all the different mapping technologies are geared towards generating it. Compass comes with a low level mapping technology called RSEM (Resource/Search Engine Mapping), which allows resource mapping definitions to be defined declaratively. RSEM can be used when an existing system already uses Lucene (the upgrade to Compass should be minimal), or when an application does not have a rich domain model (Object or XML).

An additional feature, built on top of the Compass converter framework, is that a Property value does not have to be a String (as in a Lucene Field). Objects can be used as values, with specific or default converters applied to them. For more information, please read Chapter 9, RSEM - Resource/Search Engine Mapping.

Simple API

Compass exposes a very simple API. If you have experience with an ORM tool (Hibernate, JPA, ...), you should feel very comfortable with the Compass API. Lucene, in contrast, has three main classes: IndexReader, Searcher and IndexWriter. It is difficult, especially for developers unfamiliar with Lucene, to understand how to perform operations against the index while still having a performant system. Compass has a single interface, with all operations available through it. Compass also shields the user from the gory details of opening and closing readers, searchers and writers, as well as caching and invalidating them. For more information, please read Chapter 2, Introduction, and Chapter 12, Working with objects.

Transactional Index and Integration

Lucene is not transactional. This causes problems when trying to integrate Lucene with other transactional resources (like a database or messaging). Compass provides support for two phase commit transactions (read_committed and serializable), implemented on top of Lucene index segments. The implementation provides fast commits (faster than Lucene), though it does require the concept of Optimizers to keep the index in check. For more information, please read Section 5.7, “Transaction”, and Section 5.10, “Optimizers”.

On top of providing support for a transactional index, Compass provides integration with different transaction managers (like JTA), and provides a local one. For more information, please read Chapter 11, Transaction.

Fast Updates

In Lucene, in order to perform an update, you must first delete the old Document and then create a new Document. This is not trivial, especially because two different interfaces are used to perform the delete (IndexReader) and create (IndexWriter) operations, and it is also very delicate in terms of performance. Thanks to Compass support for a transactional index, and the fact that each saved Resource in Compass must be identifiable (through the use of a mapping definition), executing an update using Compass is both simple (the operation is called save) and fast.
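The difference between the two styles of update can be pictured with a minimal sketch. It is a plain in-memory model, not Compass or Lucene code: the class SaveSketch, its Entry type, and the alias plus id identity scheme are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for an index; this is not Compass or Lucene code.
class SaveSketch {
    // One indexed entry, identified by alias plus id (an assumed identity scheme).
    static final class Entry {
        final String alias;
        final String id;
        final String content;

        Entry(String alias, String id, String content) {
            this.alias = alias;
            this.id = id;
            this.content = content;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    // Appends a new entry without checking for an existing one with the same identity.
    void create(String alias, String id, String content) {
        entries.add(new Entry(alias, id, content));
    }

    // Removes every entry with the given identity.
    void delete(String alias, String id) {
        entries.removeIf(e -> e.alias.equals(alias) && e.id.equals(id));
    }

    // Update in one call: delete whatever matches the identity, then create the new entry.
    void save(String alias, String id, String content) {
        delete(alias, id);
        create(alias, id, content);
    }

    // Number of entries currently stored for the given identity.
    long count(String alias, String id) {
        return entries.stream()
                .filter(e -> e.alias.equals(alias) && e.id.equals(id))
                .count();
    }
}
```

Calling create twice with the same identity leaves two entries behind, while calling save twice leaves exactly one. This only works because every entry carries an identity, which is the reason the text gives for requiring identifiable Resources.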

All Support

When working with Lucene, there is no way to search on all the fields stored in a Document. One must programmatically create a synthetic field that aggregates all the other fields in order to provide an "all" field, and must also specify it when querying the index. Compass does all of this for you: by default, Compass creates that "all" field, and it acts as the default search field. Of course, in the spirit of being as configurable as possible, the "all" property can be enabled or disabled, have a different name, or not act as the default search property. One can also exclude certain mappings from participating in the "all" property.
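As a rough illustration of what building such a synthetic field by hand involves, the following sketch aggregates property values into one extra property. The class AllPropertySketch and the method withAll are hypothetical names and do not reflect how Compass implements the feature internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;

// Hypothetical helper that builds a synthetic "all" property; not part of Compass.
class AllPropertySketch {
    // Returns a copy of the properties with one extra property that joins the
    // values of every non-excluded property, separated by spaces.
    static Map<String, String> withAll(Map<String, String> properties,
                                       String allName,
                                       Set<String> excluded) {
        StringJoiner all = new StringJoiner(" ");
        for (Map.Entry<String, String> property : properties.entrySet()) {
            if (!excluded.contains(property.getKey())) {
                all.add(property.getValue());
            }
        }
        Map<String, String> result = new LinkedHashMap<>(properties);
        result.put(allName, all.toString());
        return result;
    }
}
```

The allName and excluded parameters stand in for two of the configuration options mentioned above: a different name for the property, and mappings that do not participate in it.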

Index Fragmentation

When building a Lucene enabled application, sometimes (for performance reasons) the index actually consists of several indexes. Compass will automatically fragment the index into several sub indexes using a configurable sub index hashing function, which hashes different searchable objects (Resource, mapped object, or an XmlObject) into a sub index (or several of them). For more information, please read Section 5.6, “Index Structure”.
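One simple form that a sub index hashing function could take is a modulo over a hash of the object identity. The sketch below assumes that form; the class name, the prefix_N naming of sub indexes, and the alias plus id identity are illustrative assumptions, not the Compass implementation.

```java
// Hypothetical modulo-based sub index hashing; names and naming scheme are assumptions.
class SubIndexHashSketch {
    // Maps an object identity (alias plus id) to one of `size` sub indexes,
    // named prefix_0 through prefix_(size - 1).
    static String subIndexFor(String prefix, String alias, String id, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        int hash = (alias + "#" + id).hashCode();
        // floorMod keeps the bucket non-negative even when the hash is negative.
        int bucket = Math.floorMod(hash, size);
        return prefix + "_" + bucket;
    }
}
```

The same identity always lands in the same sub index, so a later update or delete of that object touches only one of the smaller indexes.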

1.2.2. ... Domain Model

One of the main features of Compass is OSEM (Object/Search Engine Mapping). Using either annotations or XML definitions (or a combination of both), mappings from a rich domain model into a search engine can be defined. For more information, please read Chapter 6, OSEM - Object/Search Engine Mapping.
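The idea of annotation driven mapping can be sketched with plain Java reflection. The @Indexed annotation, the Author class, and the toProperties method below are hypothetical stand-ins defined only for this example; they are not the annotations that Compass itself provides (see Chapter 6 for those).

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of annotation driven object mapping; not the Compass OSEM API.
class OsemSketch {
    // Stand-in annotation marking a field as searchable (an assumption for this sketch).
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Indexed {
    }

    // Example domain class: only annotated fields end up in the search engine.
    static final class Author {
        @Indexed
        String name;
        @Indexed
        String biography;
        String password; // not annotated, so never indexed

        Author(String name, String biography, String password) {
            this.name = name;
            this.biography = biography;
            this.password = password;
        }
    }

    // Collects the non-null annotated fields of an object as name/value properties.
    static Map<String, String> toProperties(Object object) {
        Map<String, String> properties = new TreeMap<>();
        for (Field field : object.getClass().getDeclaredFields()) {
            if (!field.isAnnotationPresent(Indexed.class)) {
                continue;
            }
            try {
                field.setAccessible(true);
                Object value = field.get(object);
                if (value != null) {
                    properties.put(field.getName(), String.valueOf(value));
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + field.getName(), e);
            }
        }
        return properties;
    }
}
```

The domain class stays a plain object, and the mapping lives entirely in metadata, which is the property that makes this style attractive for a rich domain model.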

1.2.3. ... Xml Model

One of the main features of Compass is XSEM (Xml/Search Engine Mapping). If your application is built around XML data, you can map it directly to the search engine using simple XML based mapping definitions that rely on XPath expressions. For more information, please read Chapter 7, XSEM - Xml to Search Engine Mapping.
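To give a feel for XPath based mapping, the sketch below evaluates one XPath expression per property against an XML string, using the standard javax.xml.xpath API from the JDK. The class XsemSketch and the idea of passing the mapping as a plain Map are assumptions for illustration; Compass uses its own mapping definitions.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical sketch of XPath driven property extraction; not the Compass XSEM API.
class XsemSketch {
    // For every entry (property name -> XPath expression), evaluates the expression
    // against the XML text and stores the resulting string under the property name.
    static Map<String, String> extract(String xml, Map<String, String> xpathByProperty) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Map<String, String> properties = new LinkedHashMap<>();
        for (Map.Entry<String, String> mapping : xpathByProperty.entrySet()) {
            try {
                // The XML is re-parsed for each expression, which keeps the sketch short.
                String value = xpath.evaluate(mapping.getValue(),
                        new InputSource(new StringReader(xml)));
                properties.put(mapping.getKey(), value);
            } catch (XPathExpressionException e) {
                throw new IllegalArgumentException(
                        "Cannot evaluate XPath for property " + mapping.getKey(), e);
            }
        }
        return properties;
    }
}
```

For example, mapping "id" to /book/@id and "title" to /book/title over a small book document yields one property per expression, without any Java class representing the book.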

1.2.4. ... No Model

If no specific domain model is defined for the application (for example, in a messaging system based on properties), RSEM (Resource/Search Engine Mapping) can be used. A Resource can be thought of as a fancy hash map, allowing completely open data to be saved in Compass. A resource mapping definition needs to be defined for each "type" of resource, with at least one resource id definition (a resource must be identifiable). Additional resource property mappings can be defined, with a declarative definition of their characteristics (search engine, converter, ...). For more information, please read Chapter 9, RSEM - Resource/Search Engine Mapping.
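The "fancy hash map" view, together with the rule that a resource must be identifiable, can be modelled in a few lines. ResourceSketch below is a hypothetical class written for this example and is not the Compass Resource interface; the alias#id identity format is likewise an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of a resource as an open property map; not the Compass Resource interface.
class ResourceSketch {
    final String alias; // plays the role of the resource "type"
    private final Map<String, String> properties = new LinkedHashMap<>();

    ResourceSketch(String alias) {
        this.alias = alias;
    }

    // Any property name is accepted, which is what makes the data completely open.
    ResourceSketch set(String name, String value) {
        properties.put(name, value);
        return this;
    }

    String get(String name) {
        return properties.get(name);
    }

    // Builds the identity from the property that the (assumed) mapping declares as the id.
    // A resource without that property cannot be identified and is rejected.
    String identity(String idProperty) {
        String id = properties.get(idProperty);
        if (id == null) {
            throw new IllegalStateException(
                    "Resource with alias '" + alias + "' has no id property '" + idProperty + "'");
        }
        return alias + "#" + id;
    }
}
```

A message with arbitrary headers could be stored as new ResourceSketch("message").set("messageId", "42").set("subject", "hello"), with "messageId" playing the role of the resource id.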

1.2.5. ... ORM Framework

Built on top of Compass Core, Compass Gps (which is aimed at integrating Compass with other data sources) integrates with the most popular ORM frameworks. The integration consists of two main features:

Index Operation

Automatically indexes data from the database into the search engine, reading it through the ORM framework and writing it with Compass (and OSEM). Objects that have both OSEM and ORM definitions will be indexed, with the ability to provide custom filters.

Mirror Operation

For ORM frameworks that support event registration (most do), Compass will automatically register its own event listeners to reflect any changes made to the database using the ORM API into the search engine.

For more information, please read Chapter 15, Introduction. Some of the supported ORM frameworks are covered in Chapter 17, Embedded Hibernate, Chapter 19, JPA (Java Persistence API), Chapter 20, Embedded OpenJPA, and Chapter 23, iBatis.
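The mirror operation described above is essentially the listener pattern applied to persistence events. The following sketch shows the shape of it with hypothetical types: Store stands in for an ORM session that supports event registration, and IndexMirror for the listener that keeps a search index in step. None of these names come from Compass or from any ORM framework.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of mirroring data store changes into an index via listeners.
class MirrorSketch {
    // Events a store can emit after a change has been applied.
    interface ChangeListener {
        void onSave(String id, String content);

        void onDelete(String id);
    }

    // Stand-in for an ORM session that supports event registration.
    static final class Store {
        private final Map<String, String> rows = new HashMap<>();
        private final List<ChangeListener> listeners = new ArrayList<>();

        void register(ChangeListener listener) {
            listeners.add(listener);
        }

        void save(String id, String content) {
            rows.put(id, content);
            for (ChangeListener listener : listeners) {
                listener.onSave(id, content);
            }
        }

        void delete(String id) {
            rows.remove(id);
            for (ChangeListener listener : listeners) {
                listener.onDelete(id);
            }
        }
    }

    // Listener that reflects every store change into an in-memory "index".
    static final class IndexMirror implements ChangeListener {
        final Map<String, String> index = new HashMap<>();

        @Override
        public void onSave(String id, String content) {
            index.put(id, content);
        }

        @Override
        public void onDelete(String id) {
            index.remove(id);
        }
    }
}
```

Application code keeps calling save and delete on the store as before; once the mirror is registered, the index follows along without any explicit indexing calls.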

1.2.6. ... Spring Framework

The aim of the Compass::Spring module is to provide seamless integration with the Spring Framework (as if a Spring developer wrote it :)).

The first level of integration is very similar to the ORM integration that Spring provides, with a LocalCompassBean, which allows Compass to be configured within a Spring context, and a CompassDaoSupport class. For more information, please read Chapter 24, Introduction and Chapter 25, DAO Support.

The Spring AOP integration provides simple advices that help to mirror data changes done within a Spring powered application. For applications with a data source or a tool that no Gps device works with (or whose Gps device does not have mirroring capabilities, like iBatis), the mirror advices can make synchronizing changes between the data source and the Compass index simpler. For more information, please read Chapter 29, Spring AOP.

The Spring PlatformTransactionManager abstraction integration uses SpringSyncTransactionFactory to register synchronization with the ongoing Spring transaction. This allows Compass to work in environments where Spring specific transaction managers are used, like HibernateTransactionManager. For more information, please read Chapter 26, Spring Transaction.

For web applications that use Spring MVC, Compass provides search and index controllers. The index controller can automatically perform the index operation on a CompassGps; only the initiator view and the result view need to be written. The search controller can automatically perform the search operation (with pagination), requiring only the search initiator and search results views (usually the same one). For more information, please read Chapter 30, Spring MVC Support.

Finally, LocalCompassBean can be configured using the new schema based configuration introduced in Spring 2.