Getting started with Lucene – Part 2

September 11th, 2008

In this post I will highlight some of Lucene’s search functionality. See part one of this series for how to create indexes with Lucene.

Searching in Lucene involves submitting a search query to the IndexSearcher class. The IndexSearcher executes this query against an index and returns search results (hits). Here is a prototype implementation:

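import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public Hits searchIndex(Query q) throws Exception
{
  // Opens a new searcher on every call for brevity (see the note on reuse below)
  IndexSearcher searcher = new IndexSearcher("c:/lucene/index");
  return searcher.search(q);
}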

The IndexSearcher constructor takes the path of the index to search. IndexSearcher is thread-safe, and the Lucene API documentation recommends opening a single IndexSearcher and using it for all searches, rather than opening a new one per query as the prototype above does for brevity.

The Query class is an abstract class that encapsulates the user’s search criteria. The simplest way to build a concrete query is with the QueryParser class. The following code creates a query for all employees whose first name is Judy:

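import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.queryParser.QueryParser;

QueryParser parser = new QueryParser("firstName", new SimpleAnalyzer());
Query query = parser.parse("Judy");
Hits hits = searchIndex(query);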

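The first argument to the QueryParser constructor is the default field: the document field that is searched when the query text does not name a field explicitly (QueryParser also understands the field:value syntax, e.g. lastName:Test). The analyzer passed as the second argument should normally be the same one used when the index was created; otherwise the query terms may not line up with the indexed terms, for example when one analyzer lowercases text and the other does not.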
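To see why the default field and the analyzer matter, here is a toy model in plain Java. It does not use Lucene, and the class ToyQueryParser and its methods are hypothetical. It roughly mimics SimpleAnalyzer (split at non-letter characters, then lowercase) and sends text without a field: prefix to the default field, so "Judy" is looked up as the term firstName:judy. If the index had been built with an analyzer that kept the original case, that lowercased term would not match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model only, NOT Lucene's QueryParser: it shows how the default field
// and the analyzer decide which terms are actually looked up.
class ToyQueryParser {

    // Roughly what SimpleAnalyzer does: split at non-letters, then lowercase.
    static List<String> simpleAnalyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Turns "value" or "field:value" into "field:term" strings; text without
    // an explicit field prefix is searched in defaultField.
    static List<String> parse(String query, String defaultField) {
        String field = defaultField;
        String text = query;
        int colon = query.indexOf(':');
        if (colon > 0) {
            field = query.substring(0, colon);
            text = query.substring(colon + 1);
        }
        List<String> terms = new ArrayList<>();
        for (String token : simpleAnalyze(text)) {
            terms.add(field + ":" + token);
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(parse("Judy", "firstName"));          // [firstName:judy]
        System.out.println(parse("lastName:Test", "firstName")); // [lastName:test]
    }
}
```

The real QueryParser handles a much richer syntax (boolean operators, phrases, wildcards); the sketch only isolates the field and analyzer steps.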

The Hits class encapsulates the search results. Iterating over Hits gets you to the “interesting” stuff, the matching documents:

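import org.apache.lucene.document.Document;

for (int i = 0; i < hits.length(); i++)
{
  // hits.doc(i) returns the stored fields of the i-th result
  Document d = hits.doc(i);
  System.out.println(d.getField("firstName").stringValue());
}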

QueryParser does a good job of interpreting user-entered search expressions. When it proves too limiting, Lucene also provides an API for building and combining queries programmatically. Let’s say a user wants to find all employees with first name Judy and last name Test:

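import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

Query fnQuery = new TermQuery(new Term("firstName", "Judy"));
Query lnQuery = new TermQuery(new Term("lastName", "Test"));
BooleanQuery query = new BooleanQuery();
// MUST: a document matches only if both clauses match
query.add(fnQuery, BooleanClause.Occur.MUST);
query.add(lnQuery, BooleanClause.Occur.MUST);
// TermQuery does not analyze its text, so "Judy" must equal the indexed term
// exactly (a field indexed with SimpleAnalyzer would contain "judy")
Hits hits = searchIndex(query);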
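The following toy model (plain Java, no Lucene; ToyIndex and its methods are hypothetical) sketches two ideas behind the example above, assuming the fields were indexed with a lowercasing analyzer such as SimpleAnalyzer. First, when every clause is MUST, a document matches only if it matches all clauses, which behaves like intersecting the sets of matching document ids. Second, an exact term lookup uses its text verbatim, so the unanalyzed "Judy" does not find a value that was indexed as "judy".

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// A toy inverted index, NOT Lucene's implementation.
class ToyIndex {
    // field -> term -> ids of documents containing that term
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();

    // Indexes one single-word field value, lowercased as SimpleAnalyzer would.
    void add(int docId, String field, String value) {
        String term = value.toLowerCase(Locale.ROOT);
        postings.computeIfAbsent(field, f -> new HashMap<>())
                .computeIfAbsent(term, t -> new HashSet<>())
                .add(docId);
    }

    // Exact lookup, like a TermQuery: the term is used exactly as given.
    Set<Integer> termQuery(String field, String term) {
        return postings.getOrDefault(field, new HashMap<>())
                       .getOrDefault(term, new HashSet<>());
    }

    // Every clause {field, term} is MUST: intersect the matching doc ids.
    Set<Integer> mustAll(String[][] clauses) {
        Set<Integer> result = null;
        for (String[] clause : clauses) {
            Set<Integer> docs = termQuery(clause[0], clause[1]);
            if (result == null) {
                result = new TreeSet<>(docs); // copy, never mutate postings
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add(1, "firstName", "Judy");
        index.add(1, "lastName", "Test");
        index.add(2, "firstName", "Judy");
        index.add(2, "lastName", "Smith");

        // Lowercased terms match the indexed form: prints [1]
        System.out.println(index.mustAll(new String[][] {
            {"firstName", "judy"}, {"lastName", "test"}}));
        // Raw user input is not analyzed, so nothing matches: prints []
        System.out.println(index.mustAll(new String[][] {
            {"firstName", "Judy"}, {"lastName", "Test"}}));
    }
}
```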

By default, search results are ordered by decreasing relevance. This can easily be changed with the overloaded IndexSearcher search methods that accept a Sort. The following code sorts the results of the query above on the firstName field:

import org.apache.lucene.search.Sort;

Sort sort = new Sort("firstName");
Hits sortedHits = indexSearcher.search(query, sort); // indexSearcher: a shared, open IndexSearcher
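As a rough illustration of the two orderings, here is a plain-Java sketch (not Lucene; ToySort and its Result class are hypothetical stand-ins for entries in Hits). The default order puts the highest relevance score first, while sorting on a field orders results by that field’s value in ascending order and ignores the score.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// A toy model of result ordering, NOT Lucene code.
class ToySort {
    // Hypothetical stand-in for one search result.
    static final class Result {
        final String firstName;
        final float score;

        Result(String firstName, float score) {
            this.firstName = firstName;
            this.score = score;
        }

        @Override
        public String toString() {
            return firstName + "(" + score + ")";
        }
    }

    // Default ordering: highest relevance score first.
    static List<Result> byRelevance(List<Result> hits) {
        List<Result> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Result r) -> r.score).reversed());
        return sorted;
    }

    // Field ordering: ascending by firstName, regardless of score.
    static List<Result> byFirstName(List<Result> hits) {
        List<Result> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparing((Result r) -> r.firstName));
        return sorted;
    }

    public static void main(String[] args) {
        List<Result> hits = new ArrayList<>();
        hits.add(new Result("Judy", 0.9f));
        hits.add(new Result("Alice", 0.4f));
        hits.add(new Result("Mark", 1.3f));
        System.out.println(byRelevance(hits)); // [Mark(1.3), Judy(0.9), Alice(0.4)]
        System.out.println(byFirstName(hits)); // [Alice(0.4), Judy(0.9), Mark(1.3)]
    }
}
```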

An important thing to remember is that fields used for sorting must not be tokenized; index them as a single term (Field.Index.UN_TOKENIZED in the Lucene 2.x API). Otherwise you may run into this error: “there are more terms than documents in field “XXXXXX”, but it’s impossible to sort on tokenized fields”.
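The reason can be sketched with a simplified model: sorting needs one value per document, and a tokenized field can give a single document several terms. The plain-Java class ToySortKeys below is hypothetical and stricter than Lucene. As the error message suggests, Lucene’s own check compares the number of terms with the number of documents instead of inspecting each document.

```java
import java.util.Map;
import java.util.TreeMap;

// A simplified model, NOT Lucene's internals: to sort on a field, each
// document needs exactly one term in that field to serve as its sort key.
class ToySortKeys {

    // termsToDocs maps each indexed term of one field to the docs containing it.
    // Returns docId -> sort key, or throws if a document has several terms.
    static Map<Integer, String> buildSortKeys(Map<String, int[]> termsToDocs) {
        Map<Integer, String> keys = new TreeMap<>();
        for (Map.Entry<String, int[]> e : termsToDocs.entrySet()) {
            for (int doc : e.getValue()) {
                String previous = keys.put(doc, e.getKey());
                if (previous != null) {
                    throw new IllegalStateException("document " + doc
                        + " has several terms (" + previous + ", " + e.getKey()
                        + "); cannot pick one sort key");
                }
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        // Untokenized field: the whole value is one term per document.
        Map<String, int[]> untokenized = new TreeMap<>();
        untokenized.put("Judy Test", new int[] {1});
        untokenized.put("Adam Smith", new int[] {2});
        System.out.println(buildSortKeys(untokenized)); // {1=Judy Test, 2=Adam Smith}

        // Tokenized field: each document ends up with two terms.
        Map<String, int[]> tokenized = new TreeMap<>();
        tokenized.put("judy", new int[] {1});
        tokenized.put("test", new int[] {1});
        tokenized.put("adam", new int[] {2});
        tokenized.put("smith", new int[] {2});
        try {
            buildSortKeys(tokenized);
        } catch (IllegalStateException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```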

Categories: Getting Started