Balaji Varanasi

Sadly Singleton

September 18th, 2008 Balaji Varanasi No comments

Here is some code I came across today:

public class SingletonWannabe
{
  private static String SERVER_URL;
  private static int port;
  private static String protocol;

  public SingletonWannabe(String url, int port, String protocol) {
    this.SERVER_URL = url;
    this.port = port;
    this.protocol = protocol;
  }

  public static String getServerUrl() {
    return SERVER_URL;
  }

  public static int getPort() {
    return port;
  }

  public static String getProtocol() {
    return protocol;
  }
}

This was being used as a singleton in an application 🙁
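For contrast, here is a minimal sketch of what an actual singleton holding the same three values might look like. The class name `ServerConfig` and the hard-coded sample values are assumptions made for illustration; they are not taken from the application in question.

```java
// Hypothetical sketch: one eagerly created, immutable instance.
final class ServerConfig {
    // Sample values, assumed for illustration only.
    private static final ServerConfig INSTANCE =
            new ServerConfig("example.com", 8080, "http");

    private final String serverUrl;
    private final int port;
    private final String protocol;

    // Private constructor: callers cannot create new instances or overwrite the state.
    private ServerConfig(String serverUrl, int port, String protocol) {
        this.serverUrl = serverUrl;
        this.port = port;
        this.protocol = protocol;
    }

    static ServerConfig getInstance() {
        return INSTANCE;
    }

    String getServerUrl() { return serverUrl; }
    int getPort() { return port; }
    String getProtocol() { return protocol; }
}
```

Unlike the class above, every call to `ServerConfig.getInstance()` returns the same object, and nothing outside the class can change its values.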

Categories: Bad Code Tags:

Getting started with Lucene – Part 2

September 11th, 2008 Balaji Varanasi No comments

In this post I will highlight some of Lucene’s search functionality. Refer to part one of this series for creating indexes using Lucene.

Searching in Lucene involves submitting a search query to the IndexSearcher class. The IndexSearcher executes this query against an index and returns search results (hits). Here is a prototype implementation:

public Hits searchIndex(Query q) throws Exception
{
  IndexSearcher searcher = new IndexSearcher("c:/lucene/index");
  return searcher.search(q);
}

The IndexSearcher constructor takes the path to the index it needs to search against. The IndexSearcher class is thread-safe, and the Lucene documentation recommends opening a single IndexSearcher and using it for all searches.

Query is an abstract class that encapsulates the user's search input. The simplest way to generate a concrete query is to use the QueryParser class. The following code generates a query for all the employees whose first name is Judy:

QueryParser parser = new QueryParser("firstName", new SimpleAnalyzer());
Query query = parser.parse("Judy");
Hits hits = searchIndex(query);

The first parameter to the QueryParser is the name of the document field against which the query is made. For better results, the analyzer passed as the second parameter should be of the same type as the one used while creating the index.

The Hits class encapsulates search results. Hits can be easily iterated over to get to the “interesting” stuff:

for (int i = 0; i < hits.length(); i++)
{
  Document d = hits.doc(i);
  System.out.println(d.getField("firstName").stringValue());
}

QueryParser does a good job of interpreting user-entered search expressions. If developers find QueryParser limiting, Lucene provides a nice API to generate and combine queries programmatically. Let’s say a user wants to find all the employees with first name Judy and last name Test:

Query fnQuery = new TermQuery(new Term("firstName", "judy"));
Query lnQuery = new TermQuery(new Term("lastName", "test"));
BooleanQuery query = new BooleanQuery();
query.add(fnQuery, BooleanClause.Occur.MUST);
query.add(lnQuery, BooleanClause.Occur.MUST);
// Notice that a TermQuery is not analyzed before the search is executed.
// The terms are lowercase here because SimpleAnalyzer lowercased them at index time.
Hits hits = searchIndex(query);
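Conceptually, two MUST clauses keep only the documents that match both terms. The following plain-Java sketch shows that idea using sets of document ids. It does not use Lucene and is not how Lucene evaluates queries; the class name, method name, and ids are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of MUST semantics; not Lucene's implementation.
final class MustClauses {
    // Returns the ids that appear in every clause's match set.
    static Set<Integer> matchAll(List<Set<Integer>> clauseMatches) {
        if (clauseMatches.isEmpty()) {
            return new TreeSet<Integer>();
        }
        Set<Integer> result = new TreeSet<Integer>(clauseMatches.get(0));
        for (Set<Integer> matches : clauseMatches.subList(1, clauseMatches.size())) {
            result.retainAll(matches); // keep only ids also matched by this clause
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed ids: document 2 is Mike Test, document 3 is Judy Test.
        Set<Integer> firstNameJudy = new TreeSet<Integer>(Arrays.asList(3));
        Set<Integer> lastNameTest = new TreeSet<Integer>(Arrays.asList(2, 3));
        System.out.println(matchAll(Arrays.asList(firstNameJudy, lastNameTest))); // prints [3]
    }
}
```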

By default the returned search results are ordered by decreasing relevance. This however can be easily changed using overloaded search methods in IndexSearcher. The following code sorts the results of the above query on first name field:

IndexSearcher indexSearcher = new IndexSearcher("c:/lucene/index");
Sort sort = new Sort("firstName");
Hits sortedHits = indexSearcher.search(query, sort);

An important thing to remember is that fields used for sorting must not be tokenized. Otherwise you will run into this exception: “there are more terms than documents in field “XXXXXX”, but it’s impossible to sort on tokenized fields”.

Categories: Getting Started Tags: Lucene, Search

Changing Maven4MyEclipse Web Project Directory Structure

September 11th, 2008 Balaji Varanasi No comments

When you create a MyEclipse Web Project with Maven capabilities, the generated directory structure does not match the “standard” Maven Web Project structure. I find this a little annoying, and here is what I did to change the directory structure:

1. Under the “src” folder, create two folders, main and test. Underneath each of them, create two folders, java and resources.
2. Go to the project properties and, under Java Build Path, remove the “src” folder as a source folder. Then make the java and resources folders source folders.
3. Create a webapp folder underneath src/main. Create a WEB-INF folder under webapp and a classes folder under WEB-INF. Move the web.xml under WebRoot/WEB-INF to the webapp/WEB-INF folder.
4. In the .mymetadata file located in the project root folder (use the Navigator view to get to the file in MyEclipse), change the value of the webrootdir attribute to /src/main/webapp.
5. In the .classpath file, change the classpathentry of kind “output” from “WebRoot/WEB-INF/classes” to “src/main/webapp/WEB-INF/classes”.
6. Delete the WebRoot directory and restart MyEclipse.

The original pom.xml generated as part of the Maven4MyEclipse Web Project has several entries (sourceDirectory, resource directory, etc.) that reflect the MyEclipse Web Project directory structure. These entries can be safely deleted. Once this is done, the new project can be used to hot-deploy the war file. And yes, dependencies declared as “test” will not end up in the lib directory of the hot-deployed war file.
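The target folder layout from the steps above can also be created with a few lines of plain Java. This is only a sketch of the folder structure; it does not touch .mymetadata, .classpath, or the pom.xml, and the class and method names are hypothetical.

```java
import java.io.File;

// Hypothetical helper: creates the standard Maven web-project folders.
final class MavenWebLayout {
    static final String[] FOLDERS = {
        "src/main/java",
        "src/main/resources",
        "src/main/webapp/WEB-INF/classes",
        "src/test/java",
        "src/test/resources"
    };

    // Returns true if every folder exists once the method finishes.
    static boolean create(File projectRoot) {
        boolean ok = true;
        for (String folder : FOLDERS) {
            File dir = new File(projectRoot, folder);
            // mkdirs() returns false when the folder already exists, so also check isDirectory().
            ok &= dir.mkdirs() || dir.isDirectory();
        }
        return ok;
    }

    public static void main(String[] args) {
        // Pass the project root as the first argument; defaults to the current directory.
        File root = new File(args.length > 0 ? args[0] : ".");
        System.out.println(create(root) ? "Layout created" : "Could not create layout");
    }
}
```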

Categories: Maven, Maven4MyEclipse, MyEclipse, Solutions Log Tags: Maven4MyEclipse

Getting started with Lucene – Part 1

September 8th, 2008 Balaji Varanasi No comments

Apache Lucene is a popular open source text search engine that can be easily embedded in applications needing search functionality. Lucene is not a full-blown, out-of-the-box web site search engine or crawler. Instead, as you will soon see, Lucene exposes a small API to create and search indexes. In this first part of the series, I will show how Lucene can be used to create indexes.

Before any searches can be performed on large amounts of data, it is essential to convert the data into an easy-to-look-up format. This conversion process is called indexing (the result is much like a book index). Indexes created by Lucene contain a collection of documents and are usually stored as a set of files on the file system. A Lucene document itself is a sequence of name-value pairs called fields. The strings in a field are referred to as terms.

Let’s say we are writing an employee search application that allows employees to look up each other’s information. The first step in the process is indexing the employee information. For the sake of simplicity, let’s assume that the employee information is available as a list of Employee objects. Here is the prototype method for creating a Lucene index (using version 2.3.2 of the Lucene API):
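To make the vocabulary concrete, here is a toy, in-memory sketch of documents, fields, and terms in plain Java. It only illustrates the lookup idea; the class `ToyIndex` and its methods are invented here, values are simply split on whitespace and lowercased, and none of this reflects how Lucene lays out its index files.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy inverted index: maps "field:term" to the ids of the documents containing it.
final class ToyIndex {
    private final Map<String, List<Integer>> postings = new HashMap<String, List<Integer>>();
    private int nextDocId = 0;

    // A document is a set of name-value pairs; each value is split into terms.
    int addDocument(Map<String, String> fields) {
        int docId = nextDocId++;
        for (Map.Entry<String, String> field : fields.entrySet()) {
            for (String term : field.getValue().toLowerCase().split("\\s+")) {
                if (term.isEmpty()) {
                    continue;
                }
                String key = field.getKey() + ":" + term;
                List<Integer> ids = postings.get(key);
                if (ids == null) {
                    ids = new ArrayList<Integer>();
                    postings.put(key, ids);
                }
                if (!ids.contains(docId)) {
                    ids.add(docId);
                }
            }
        }
        return docId;
    }

    // Looking up a term is a single map access instead of a scan over all documents.
    List<Integer> lookup(String field, String term) {
        List<Integer> ids = postings.get(field + ":" + term.toLowerCase());
        return ids == null ? Collections.<Integer>emptyList() : ids;
    }
}
```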

public void createIndex() throws Exception
  {
    // Create a writer
    IndexWriter writer = new IndexWriter("c:/lucene/index/", new SimpleAnalyzer());

    // Add documents to the index
    addDocuments(writer);

    // Lucene recommends calling optimize upon completion of indexing
    writer.optimize();
    // clean up
    writer.close();
  }

IndexWriter is the heart of Lucene indexing. It creates a new index and exposes an API to add documents to the index. The first parameter to the constructor is the file system path where Lucene needs to store the index files. Before Lucene can index text, the text needs to be broken down into tokens, which is done by an Analyzer. Out of the box, Lucene provides a variety of analyzers such as SimpleAnalyzer, StandardAnalyzer, and StopAnalyzer. An analyzer is specified as the second parameter to the writer constructor.
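As a rough illustration of what an analyzer does, the sketch below splits text at non-letter characters and lowercases each token, which is the behavior documented for SimpleAnalyzer. It is plain Java, not Lucene code, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an analyzer: letters-only, lowercased tokens.
final class LetterTokenizer {
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A non-letter character ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Judy Test-Smith")); // prints [judy, test, smith]
    }
}
```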

The next step in the process is adding the documents to the index. Here is a prototype implementation:

  public void addDocuments(IndexWriter writer) throws Exception
  {
    for(Employee e : employeeList)
    {
      // Create a document
      Document document = new Document();
      // Add fields to the document
      document.add(new Field("firstName", e.getFirstName(), Field.Store.YES, Field.Index.TOKENIZED));
      document.add(new Field("lastName", e.getLastName(), Field.Store.YES, Field.Index.TOKENIZED));
      document.add(new Field("phoneNumber", e.getPhoneNumber(), Field.Store.YES, Field.Index.UN_TOKENIZED));
      // Add the document to the index
      writer.addDocument(document);
    }
  }

In the above code, the first two parameters to the Field constructor specify the field name and the field value, and the last two parameters specify how the field needs to be stored and indexed. When storing a field we have three options:
          Field.Store.YES – Original value is stored in the index
          Field.Store.COMPRESS – Original value is stored in the index in a compressed form
          Field.Store.NO – Field value is not stored in the index

Similarly, when indexing, we have a few options:
          Field.Index.NO – Field value is not indexed (useful for data like primary keys)
          Field.Index.TOKENIZED – Field value is analyzed and indexed (commonly used option)
          Field.Index.UN_TOKENIZED – Field value is not analyzed but indexed (useful for indexing “keywords” or data such as phone numbers)
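The difference between these indexing options can be sketched in plain Java. The enum and method below are invented for illustration and only mimic the idea; the tokenizing branch assumes a letters-only, lowercasing analyzer similar to SimpleAnalyzer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of the indexing options; not Lucene code.
final class IndexModes {
    enum Mode { NO, TOKENIZED, UN_TOKENIZED }

    // Returns the terms that would become searchable for a field value.
    static List<String> searchableTerms(String value, Mode mode) {
        switch (mode) {
            case NO:
                // Nothing is searchable.
                return Collections.emptyList();
            case UN_TOKENIZED:
                // The whole value is indexed as a single term.
                return Collections.singletonList(value);
            default:
                // TOKENIZED: split at non-letters and lowercase (assumed analyzer behavior).
                List<String> terms = new ArrayList<String>();
                for (String t : value.toLowerCase().split("[^\\p{L}]+")) {
                    if (!t.isEmpty()) {
                        terms.add(t);
                    }
                }
                return terms;
        }
    }
}
```

Under these assumptions, a phone number such as 123-456-8910 yields no terms at all when tokenized, but a single exact-match term when left un-tokenized.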

Putting it all together:

import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class EmployeeIndexer {

  // Path to the index directory
  private static final String INDEX_DIRECTORY = "c:/lucene/index";

  private List<Employee> employeeList = new ArrayList<Employee>();

  public EmployeeIndexer() {
    employeeList.add(new Employee("Jane", "Doe", "123-456-8910"));
    employeeList.add(new Employee("John", "Smith", "123-456-8910"));
    employeeList.add(new Employee("Mike", "Test", "123-456-8910"));
    employeeList.add(new Employee("Judy", "Test", "123-456-8910"));
  }

  public void createIndex() throws Exception {
    // Create a writer
    IndexWriter writer = new IndexWriter(INDEX_DIRECTORY, new SimpleAnalyzer());

    // Add documents to the index
    addDocuments(writer);

    // Lucene recommends calling optimize upon completion of indexing
    writer.optimize();
    // clean up
    writer.close();
  }

  public void addDocuments(IndexWriter writer) throws Exception {
    for(Employee e : employeeList) {
      // Create a document
      Document document = new Document();
      // Add fields to the document
      document.add(new Field("firstName", e.getFirstName(), Field.Store.YES, Field.Index.TOKENIZED));
      document.add(new Field("lastName", e.getLastName(), Field.Store.YES, Field.Index.TOKENIZED));
      document.add(new Field("phoneNumber", e.getPhoneNumber(), Field.Store.YES, Field.Index.UN_TOKENIZED));
      // Add the document to the index
      writer.addDocument(document);
    }
  }

  public class Employee {
    private String firstName;
    private String lastName;
    private String phoneNumber;

    public Employee(String firstName, String lastName, String phoneNumber) {
      this.firstName = firstName;
      this.lastName = lastName;
      this.phoneNumber = phoneNumber;
    }

    public String getFirstName() {
      return firstName;
    }
    public void setFirstName(String firstName) {
      this.firstName = firstName;
    }
    public String getLastName() {
      return lastName;
    }
    public void setLastName(String lastName) {
      this.lastName = lastName;
    }
    public String getPhoneNumber() {
      return phoneNumber;
    }
    public void setPhoneNumber(String phoneNumber) {
      this.phoneNumber = phoneNumber;
    }
  }
}

Categories: Getting Started Tags: Indexing, Lucene

Running database tests faster using TestNG

September 1st, 2008 Balaji Varanasi No comments

In a recent project, I was doing some integration testing against a database using DBUnit and JUnit. I was dealing with large datasets and as the tests grew, testing became painfully slow. Culprit: I had JUnit update the database with data for every test.

The data access layer for this project had lots of complex queries, but the methods I was testing had two distinct behaviors: methods that read data from the database and methods that wrote data to the database. This simple observation made me realize that I could cut down the testing time by
– Grouping tests into “read” and “write” groups
– Refreshing the database once and then running ALL the tests in the “read” group (even better, running them in parallel)
– Refreshing the database before running each and every test in the “write” group
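The saving is easy to estimate. The sketch below counts database refreshes under both strategies for a suite with a given number of read and write tests; the class and method names are hypothetical.

```java
// Hypothetical arithmetic sketch: how many times the database gets refreshed.
final class RefreshCount {
    // Refresh before every single test.
    static int perTest(int readTests, int writeTests) {
        return readTests + writeTests;
    }

    // Refresh once for the whole read group, then before each write test.
    static int grouped(int readTests, int writeTests) {
        int readRefreshes = readTests > 0 ? 1 : 0;
        return readRefreshes + writeTests;
    }

    public static void main(String[] args) {
        System.out.println(perTest(2, 3)); // prints 5
        System.out.println(grouped(2, 3)); // prints 4
    }
}
```

The gain grows with the number of read tests: with 50 read tests and 3 write tests, the grouped approach needs 4 refreshes instead of 53.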

Since JUnit does not provide a way to implement the above idea, I tried TestNG. Here is a simplified version of the Java code I ended up with:

public class RepositoryImplTest {

  private void setupDataBase() {
    // DBUnit code to refresh data
  }

  @BeforeClass(groups={"database.read"}, alwaysRun=false)
  public void setupForRead() {
    setupDataBase();
  }

  @BeforeMethod(groups={"database.write"}, alwaysRun=false)
  public void setupForWrite() {
    setupDataBase();
  }

  @Test(groups="database.read")
  public void findAllXXX() { }

  @Test(groups="database.read")
  public void findByXXX() { }

  @Test(groups="database.write")
  public void updateXXX() { }

  @Test(groups="database.write")
  public void removeXXX() { }

  @Test(groups="database.write")
  public void createXXX() { }
}

Here is the sample testng.xml file for the above code:

Categories: Solutions Log Tags: TestNG

Maven4MyEclipse Bug

August 27th, 2008 Balaji Varanasi No comments

I have not been very impressed with MyEclipse’s Maven4MyEclipse plugin. The most disappointing thing is that after checking out an existing project from source control, I cannot add Maven “capabilities” using the context menu (which can easily be done with m2eclipse).

For the last couple of days, I have been running into a weird bug with the “Java Maven Wizard”: when I try to check out an existing project from SVN using “Find/Check Out As” -> “Check Out as a project configured using the New Project Wizard” -> “Java Maven Project” into a location other than the default location provided by the wizard, the wizard still checks out the project into the default location.

Hopefully I will find a solution on the Maven4MyEclipse forum.

Update: Looks like the same issue surfaces for newly created Java Maven projects also.

Categories: Maven, MyEclipse Tags:

WebLogic anonymous user permissioning

August 26th, 2008 Balaji Varanasi No comments

Problem: Accessing MBeans in WebLogic versions 8.1 SP5 and after results in
javax.naming.NoPermissionException: User <anonymous> does not have permission on weblogic.management.home to perform lookup operation.

Solution: A quick fix is to enable anonymous admin lookup in WebLogic Server. In WebLogic 10, this option is available under the Security tab of the domain.

Categories: Solutions Log Tags:
