Notice: All articles on this blog are reposts, collected purely as reference material.

[Django] More on Merquery
Category: Software Technology

Posted by lhwork on 2007/2/1 11:50:24

Source: http://blog.case.edu/bmb12/2006/03/more_on_merquery

A bunch of traffic has been directed to this blog because of the post about Merquery. It seems there has been discussion going on in that post's comments and also over at Jacob Kaplan-Moss' post.

As predicted, a lot of people are only seeing the "reinvention" aspect of Merquery. And admittedly, there is a lot that would need to be reinvented based on the goals I wrote down.

The really novel part about Merquery is that it's easy to drop into a Python web application. Imagine you're going through a TurboGears tutorial and all you have to do to add full-text indexing and search to your database tables is add one line. Cool!

So here's the reformulated plan. Write adapters for the nice-looking Python indexing engines mentioned so far, such as PyLucene, Hype, and Xapwrap. Make using any of them look the same (so they're easy to swap in and out), and make the most basic desirable indexing setup a one-liner. Then, as a side project, add a pure-Python indexer to the package for people who don't want dependencies. (All three of the libraries mentioned above still require the library they wrap to be installed.)

Unlike the current interfaces for those indexing libraries, these adapters don't have to be completely general (yet). If they only provide adapters for SQLObject classes and the Django database API, that's already a great accomplishment, even though these adapters are less flexible than the generic interfaces already provided. This will allow Django and TurboGears developers to stick with what they know rather than worry about getting an indexer working with their underlying database.
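
One way to picture the "easy to swap in and out" goal is a single abstract interface that every backend adapter implements. The sketch below is a minimal illustration under that assumption: `Index`, `PurePythonIndex`, `add`, and `query` are hypothetical names, not part of Merquery, PyLucene, Hype, or Xapwrap, and the toy in-memory inverted index merely stands in for the proposed dependency-free indexer.

```python
import re
from abc import ABC, abstractmethod
from collections import defaultdict


class Index(ABC):
    """Hypothetical interface that every backend adapter would implement."""

    @abstractmethod
    def add(self, doc_id, text):
        """Index `text` under the identifier `doc_id`."""

    @abstractmethod
    def query(self, terms):
        """Return ids of documents containing all `terms`, best match first."""


def tokenize(text):
    # Lowercase, then keep runs of letters and digits as terms.
    return re.findall(r"[a-z0-9]+", text.lower())


class PurePythonIndex(Index):
    """Toy in-memory inverted index with no third-party dependencies."""

    def __init__(self):
        # term -> {doc_id: number of times the term occurs in that document}
        self._postings = defaultdict(dict)

    def add(self, doc_id, text):
        for term in tokenize(text):
            counts = self._postings[term]
            counts[doc_id] = counts.get(doc_id, 0) + 1

    def query(self, terms):
        terms = [t.lower() for t in terms]
        if not terms:
            return []
        # Boolean AND: intersect the posting lists of every term.
        hits = set(self._postings.get(terms[0], {}))
        for term in terms[1:]:
            hits &= set(self._postings.get(term, {}))

        def score(doc_id):
            # Crude ranking: total occurrences of the query terms.
            return sum(self._postings[t][doc_id] for t in terms)

        # Highest score first; doc id breaks ties so ordering is stable.
        return sorted(hits, key=lambda d: (-score(d), d))


if __name__ == "__main__":
    index = PurePythonIndex()
    index.add(1, "Brian Beck")
    index.add(2, "Jeff Beck beck")
    index.add(3, "Brian Eno")
    print(index.query(["beck"]))  # [2, 1]
```

Adapters for PyLucene, Hype, or Xapwrap would subclass `Index` the same way and translate `add`/`query` into calls on the wrapped library, so application code written against `Index` would not need to change when the backend does.
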
(Hey, we've got to start somewhere; might as well have mass appeal right from the get-go.)

Here's an idea of what some customization of a developer's search engine might look like:

    from sqlobject import SQLObject, StringCol
    import Merquery  # hypothetical: the package proposed in this post

    class Person(SQLObject):
        firstName = StringCol(notNone=True)
        lastName = StringCol(notNone=True)

    nameSearch = Merquery.LuceneIndex(first=Person.firstName,
                                      last=Person.lastName)

In this example, the developer has customized the index by giving Person.firstName strings the field name 'first' and Person.lastName strings the field name 'last'. So to find people with 'Beck' in their name but not 'Brian Beck', this would work:

    beck -first:brian

Developers could just pass query strings like the above directly from their forms to the index:

    results = nameSearch.query("beck -first:brian")

Since LuceneIndex knows we passed in SQLObject columns, it will know to return results as a ranked list of SQLObject instances:

    results[0].firstName, results[0].lastName

Obviously this example might not be very realistic, since firstName and lastName are just strings and we could accomplish this with plain SQL. But the same ideas apply to fields storing big documents, etc., where things like term frequency and proximity become important.

Thoughts?

Update: I made a Merquery Google Group so discussion can now happen in a centralized place. I was also kind enough to make the first typo on there.
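
The query string `beck -first:brian` reads like Lucene's query syntax: a `field:term` pair restricts a term to one field, and a leading `-` excludes matches. As a rough illustration only, here is a pure-Python sketch of that small subset, filtering plain dicts that stand in for SQLObject rows. The names `Clause`, `parse_query`, `matches`, and `search` are hypothetical, not Merquery's or PyLucene's API. The sketch also assumes an unqualified term may match in any field, whereas Lucene's classic query parser sends unqualified terms to a configured default field.

```python
import re
from typing import NamedTuple, Optional


class Clause(NamedTuple):
    field: Optional[str]  # None means "match in any field"
    term: str
    negated: bool         # True for a leading "-", i.e. "must not match"


def parse_query(query):
    """Split a query such as 'beck -first:brian' into clauses."""
    clauses = []
    for token in query.split():
        negated = token.startswith("-")
        if negated:
            token = token[1:]
        if not token:
            continue  # ignore a stray "-"
        field, sep, term = token.partition(":")
        if not sep:  # no "field:" prefix, so search every field
            field, term = None, token
        clauses.append(Clause(field, term.lower(), negated))
    return clauses


def words(text):
    # Naive tokenization of stored text: lowercase runs of letters/digits.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches(record, clauses):
    """True if `record` (a dict of field name -> text) satisfies every clause."""
    for clause in clauses:
        if clause.field is None:
            found = any(clause.term in words(v) for v in record.values())
        else:
            found = clause.term in words(record.get(clause.field, ""))
        # A plain clause needs found=True; a negated clause needs found=False.
        if found == clause.negated:
            return False
    return True


def search(records, query):
    clauses = parse_query(query)
    return [r for r in records if matches(r, clauses)]


if __name__ == "__main__":
    people = [
        {"first": "Brian", "last": "Beck"},
        {"first": "Jeff", "last": "Beck"},
        {"first": "Brian", "last": "Eno"},
    ]
    print(search(people, "beck -first:brian"))  # [{'first': 'Jeff', 'last': 'Beck'}]
```

Unlike the proposed `nameSearch.query(...)`, this sketch only filters and does not rank; real query parsers also handle phrases, boolean operators, and text analysis that it leaves out.
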

