A common use-case for Lucene is performing a full-text search on one or more database tables. Although MySQL ships with full-text search functionality, it quickly breaks down for anything but the simplest queries, and it offers little control over field boosting, relevance ranking and the like.
The biggest challenge is usually deciding how to represent your data in Lucene. Most likely, this will involve denormalizing your tables into a flat Lucene Document.
The pseudocode will look something like this:
String sql = "select id, firstname, lastname, phone, email from person";
ResultSet rs = stmt.executeQuery(sql);
while (rs.next()) {
    Document doc = new Document();
    // id is a lookup key, so store it but don't tokenize it
    doc.add(new Field("id", rs.getString("id"), Field.Store.YES, Field.Index.UN_TOKENIZED));
    doc.add(new Field("firstname", rs.getString("firstname"), Field.Store.YES, Field.Index.TOKENIZED));
    // ... repeat for each column in the result set
    writer.addDocument(doc);
}
When it comes to displaying search results to the user, you have 2 choices: store everything you need for display in the Lucene index (which is what the Field.Store.YES flags above do) and render results straight from the stored fields, or store only the primary key and use it to fetch the display data from the database at search time. The first is faster at query time but duplicates data that can go stale; the second keeps the database authoritative at the cost of an extra round-trip per results page.
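If you fetch display data from the database, it's worth doing it in a single query per results page rather than one query per hit. A minimal sketch, assuming the stored "id" values have already been pulled out of the matching documents (the class and method names here are hypothetical, and a real implementation would use a PreparedStatement rather than string concatenation):

```java
import java.util.List;
import java.util.stream.Collectors;

public class SearchResultFetcher {
    // Given the "id" values stored in the matching Lucene documents,
    // build one SQL query that fetches all display rows in a single
    // round-trip. Table and column names mirror the pseudocode above.
    static String buildFetchQuery(List<String> ids) {
        String in = ids.stream()
                .map(id -> "'" + id.replace("'", "''") + "'") // naive escaping, sketch only
                .collect(Collectors.joining(", "));
        return "select id, firstname, lastname, phone, email from person where id in (" + in + ")";
    }
}
```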
The approach outlined above loads the entire result set into memory, which can easily become a problem for larger data sets. You might need to do some paging/offset in your SQL, or use a streaming result set if your JDBC driver supports one.
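One way to page is to run the indexing loop once per page of rows. A sketch of building the paged query, assuming MySQL LIMIT/OFFSET syntax (note the ORDER BY, which paging needs for a stable row order; the class and method names are hypothetical):

```java
public class PagedIndexer {
    // Build one page of the indexing query so the driver never has to
    // materialize the whole table at once. Page numbering starts at 0.
    static String pageSql(int pageSize, int page) {
        return "select id, firstname, lastname, phone, email from person"
                + " order by id limit " + pageSize + " offset " + (pageSize * page);
    }
}
```

The caller would loop, incrementing page until executeQuery returns an empty result set.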
You also want a try/catch block within the ResultSet while loop, to ensure that if an exception is thrown while adding one document, it doesn't kill the entire indexing run.
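The pattern can be sketched like this, with the per-record indexing work abstracted behind a callback so the skip-and-continue logic is visible on its own (the class name is hypothetical, and in real code you would catch the checked exceptions writer.addDocument can throw rather than RuntimeException):

```java
import java.util.List;
import java.util.function.Consumer;

public class ResilientIndexLoop {
    // Apply the indexing action to each record; a failure on one record
    // is logged and skipped instead of aborting the whole run.
    // Returns the number of records indexed successfully.
    static int indexAll(List<String> records, Consumer<String> addDocument) {
        int indexed = 0;
        for (String record : records) {
            try {
                addDocument.accept(record); // e.g. build the Document, then writer.addDocument(doc)
            } catch (RuntimeException e) {
                System.err.println("Skipping record " + record + ": " + e.getMessage());
                continue;
            }
            indexed++;
        }
        return indexed;
    }
}
```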
It's often a good idea to concatenate all the fields into a single "contents" field and search on that by default, while also leaving the individual fields intact so that field-scoped searches still work.
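Building that catch-all value is just a join over the column values, skipping nullable columns; inside the loop above you would then add it as one more field, e.g. doc.add(new Field("contents", contents, Field.Store.NO, Field.Index.TOKENIZED)), alongside the per-column fields. A sketch (the class name is hypothetical):

```java
public class ContentsField {
    // Join the column values into one space-separated string for the
    // catch-all "contents" field; null or empty columns contribute nothing.
    static String buildContents(String... columnValues) {
        StringBuilder sb = new StringBuilder();
        for (String v : columnValues) {
            if (v == null || v.isEmpty()) continue;
            if (sb.length() > 0) sb.append(' ');
            sb.append(v);
        }
        return sb.toString();
    }
}
```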
It is usually best to have separate Lucene indexes for searches on different types of data, rather than adding them all to a single index distinguished by a "type" field.
Here's why:
- Term statistics, and therefore relevance scores, aren't skewed by documents of unrelated types sharing the same index.
- Each index can be optimized, rebuilt or reindexed independently, without touching the others.
- Queries don't need an extra type filter, and a forgotten filter can't leak the wrong kind of results.
© Copyright 2024 Kelvin Tan - Lucene, Solr and Elasticsearch consultant