Computerworld - Twitter, Facebook, the Library of Congress — all of these institutions have mind-numbing amounts of structured and unstructured data that must be indexed and searched quickly. In Twitter’s case, that’s about 300 million new pieces of information to index every day.
So it’s not surprising that such institutions would venture into the seemingly untamed world of open-source search applications, not just for the cost savings, but also for the ability to customize and modify applications quickly. Plus, open source has an active community that can help solve related problems.
But what about other enterprise users? Some 80% of the information in the typical enterprise is now unstructured, including texts, emails, blogs and videos, and that percentage is rising, according to Gartner. All of this data potentially holds value, and today every website is expected to return relevant search results as fast as the best Internet search engines do. “People need search technology [in] virtually everything they do today. Everybody thinks search [capability] is going to be embedded in everything,” says Whit Andrews, an analyst at Gartner.
Right now, most organizations have very constrained search capabilities, which are usually based on SQL queries or specific forms or reports. “That paradigm is soon going to break because the amount of data is just too big, and it’s happening much too quickly in a 24/7 environment,” he adds.
Enterprises of all types are starting to explore open-source search applications to gain insight into their collections of structured and unstructured data. One such product is Lucene Solr, an open-source search platform developed under the Apache Software Foundation and commercially supported by Lucid Imagination, a San Mateo, Calif.-based software company.
Interest in open-source search applications began to take off three years ago. “That’s when we saw creation of Lucid Imagination, which formed as a commercial support resource” for open-source software, says Greg Olson, senior director at Olliance Group, an open-source consulting firm and a unit of Black Duck Software. “That’s a good indicator of mainstream demand for services or a solution around a raw technology like Lucene.”
Make no mistake — Lucene is for heavy hitters of search, Andrews says. “Lucene matters for people who need a very sophisticated search offering or product. Its typical [user] is a vendor that needs enormous scale in its application of technologies. It’s a great place to use Lucene — you need to be able to search a bazillion things. Where you don’t see Lucene used is when an intranet needs a search by next Thursday.”
A few other players offer lighter-weight search tools based on the same Lucene open-source technology. For instance, online retailer Zappos.com uses Lucene Solr to power search for the 63 million customer inquiries it handles each month. Internally, however, the company deploys the open-source search engine Elasticsearch for “non-website-critical systems or non-performance-bound types of services,” says Aye Thu, search team lead.