http://blog.case.edu/bmb12/2006/03/more_on_merquery

A bunch of traffic has been directed to this blog due to the post about Merquery. There seems to be discussion going on in that post's comments and also over at Jacob Kaplan-Moss' post.

As predicted, a lot of people are only seeing the "reinvention" aspect of Merquery. And admittedly, there is a lot that would need to be reinvented based on the goals I wrote down.

The really novel part about Merquery is that it's easy to drop into a Python web application. Imagine you're going through a TurboGears tutorial and all you have to do is add one line to get full-text indexing and search for your database tables. Cool!

So here's the reformulated plan. Write adapters for the nice-looking Python indexing engines mentioned so far, such as PyLucene, Hype, and Xapwrap. Make using any of them look the same (so they're easy to swap in and out), and make the most basic indexing setup a one-liner. Then, add a pure-Python indexer to the package as a side project, for those people who don't want dependencies. (All three of the existing libraries mentioned above still require the library they wrap to be installed.)

Unlike the current interfaces for those indexing libraries, these adapters don't have to be completely general (yet). If they only cover SQLObject classes and the Django database API, that's already a great accomplishment, even though they would be less flexible than the generic interfaces already provided. This will allow Django and TurboGears developers to stick with what they know rather than worry about getting an indexer working with their underlying database.
(Hey, we've got to start somewhere; might as well have mass appeal right from the get-go.)

Here's an idea of what some customization of a developer's search engine might look like:

    class Person(SQLObject):
        firstName = StringCol(notNone=True)
        lastName = StringCol(notNone=True)

    nameSearch = Merquery.LuceneIndex(first=Person.firstName, last=Person.lastName)

(I have no idea why that space is there. Sorry for my blog being so ugly.)

In this example, the developer has customized the index by giving Person.firstName strings the field name 'first' and Person.lastName strings the field name 'last'. So to find people with 'Beck' in their name but not 'Brian Beck', this would work:

    beck -first:brian

Developers could just pass query strings like the above directly from their forms to the index:

    results = nameSearch.query("beck -first:brian")

Since LuceneIndex knows we passed in SQLObject columns, it will know to return results as a ranked list of SQLObject instances:

    results[0].firstName, results[0].lastName

Obviously this example might not be very realistic, since firstName and lastName are just strings and we could accomplish this with SQL. But the same ideas apply to fields storing big documents, etc., where things like term frequency and proximity become important.

Thoughts?

Update: I made a Merquery Google Group so discussion can now happen in a centralized place. I was also kind enough to make the first typo on there.
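
To make the fielded-query idea from the example above a bit more concrete, here is a rough pure-Python sketch. Everything in it is hypothetical: ToyIndex, its add() and query() methods, and the namedtuple standing in for the SQLObject class are made up for illustration and are not Merquery's, PyLucene's, or Lucene's actual API. It also simplifies heavily: every plain term is treated as required, a leading '-' excludes, fields are passed as attribute-name strings rather than column objects, and there is no ranking at all.

```python
import re
from collections import namedtuple

# Stand-in for the SQLObject class in the post (assumption: plain attributes).
Person = namedtuple("Person", ["firstName", "lastName"])


class ToyIndex:
    """Hypothetical in-memory index; not a real Merquery or Lucene interface."""

    def __init__(self, **fields):
        # Maps a search field name ('first') to an attribute name ('firstName').
        self.fields = fields
        self.docs = []  # list of (object, {field name: set of lowercase tokens})

    def add(self, obj):
        # Tokenize each configured attribute into lowercase words.
        tokens = {
            field: set(re.findall(r"\w+", str(getattr(obj, attr)).lower()))
            for field, attr in self.fields.items()
        }
        self.docs.append((obj, tokens))

    @staticmethod
    def _matches(tokens, field, term):
        # A term with no field prefix may match in any field.
        if field is None:
            return any(term in toks for toks in tokens.values())
        return term in tokens.get(field, set())

    def query(self, text):
        required, excluded = [], []
        for part in text.lower().split():
            negated = part.startswith("-")
            if negated:
                part = part[1:]
            if not part:
                continue
            field, sep, term = part.partition(":")
            if not sep:  # no 'field:' prefix present
                field, term = None, part
            (excluded if negated else required).append((field, term))

        # Unranked: results come back in insertion order.
        return [
            obj
            for obj, tokens in self.docs
            if all(self._matches(tokens, f, t) for f, t in required)
            and not any(self._matches(tokens, f, t) for f, t in excluded)
        ]


nameSearch = ToyIndex(first="firstName", last="lastName")
for person in [Person("Brian", "Beck"), Person("Jeff", "Beck"), Person("Brian", "Eno")]:
    nameSearch.add(person)

results = nameSearch.query("beck -first:brian")
print(results[0].firstName, results[0].lastName)  # prints "Jeff Beck"
```

A real adapter would hand the query string to the underlying engine's own parser instead of splitting on whitespace, and would map hits back to database rows; the point here is only what the field names passed to the constructor buy you at query time.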