- Subject: Re: Full text search?
- From: PA <petite.abeille@...>
- Date: Mon, 28 Feb 2005 14:40:10 +0100
On Feb 08, 2005, at 07:49, Steve Donovan wrote:
> That's what a pure Lua implementation would do; the index
> could be plain text, containing the words as indices (or
> hashes). That would certainly be fast enough for
> most things - it's a question of scaling and whether
> one can afford the memory etc.
Just as a follow-up, here is my first draft implementation of a
diminutive text search in Lua:
http://dev.alt.textdrive.com/file/LUPad/LUPIndex.lua
In a nutshell, the indices are stored in gdbm: the key is a document
id and the value is a sample of the text.
When indexed, the text is broken down along non-alphanumeric
boundaries, and each token is added to a counted set. Finally, a string
representation of the set is stored, with the most frequent tokens first.
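
To make the indexing step concrete, here is a rough sketch of it. This is
not the code from LUPIndex.lua: the function name index_value is made up,
a plain Lua table stands in for gdbm, and the lowercasing and the
alphabetical tie-break are my own assumptions. It needs Lua 5.1 or later.

```lua
-- Hypothetical helper: turn a document's text into its index value.
local function index_value(text)
  local counts = {}   -- token -> number of occurrences (the "counted set")
  local tokens = {}   -- distinct tokens, in order of first appearance

  -- Break the text along non-alphanumeric boundaries.
  -- Lowercasing is an assumption, to make matching case-insensitive.
  for token in text:lower():gmatch("%w+") do
    if not counts[token] then
      counts[token] = 0
      tokens[#tokens + 1] = token
    end
    counts[token] = counts[token] + 1
  end

  -- Most frequent tokens first; ties are broken alphabetically
  -- (an assumption) so the result is deterministic.
  table.sort(tokens, function(a, b)
    if counts[a] ~= counts[b] then
      return counts[a] > counts[b]
    end
    return a < b
  end)

  return table.concat(tokens, " ")
end

-- A plain table stands in for the gdbm database: document id -> value.
local store = {}
store["doc1"] = index_value("Lua is small. Lua is fast, and Lua embeds well.")
print(store["doc1"])  --> lua is and embeds fast small well
```

Storing only the distinct tokens keeps each value short, and the ordering
means the words that matter most for a document sit at the front of it.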
During search, each value is evaluated with a simple find(). The
resulting ids are ranked according to the index of the value.
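
The search step could look roughly like the sketch below. Again, this is
an illustration and not the LUPIndex.lua code: the search function and the
sample store are made up, and I am assuming that "index of the value"
means the position at which find() matches inside the stored string, so
that an earlier match (a more frequent token) ranks higher.

```lua
-- Hypothetical search over a table of document id -> index value.
local function search(store, term)
  local hits = {}
  term = term:lower()

  for id, value in pairs(store) do
    -- Plain find (4th argument true): the term is not treated as a pattern.
    -- Note that this matches substrings, not only whole tokens.
    local pos = value:find(term, 1, true)
    if pos then
      hits[#hits + 1] = { id = id, pos = pos }
    end
  end

  -- Rank by match position; ties are broken by id (an assumption).
  table.sort(hits, function(a, b)
    if a.pos ~= b.pos then
      return a.pos < b.pos
    end
    return a.id < b.id
  end)

  local ids = {}
  for i, hit in ipairs(hits) do
    ids[i] = hit.id
  end
  return ids
end

-- Sample index values, most frequent token first (made-up data).
local store = {
  doc1 = "lua is and embeds fast small well",
  doc2 = "gdbm a and is key lua store value",
  doc3 = "search text",
}

print(table.concat(search(store, "lua"), " "))  --> doc1 doc2
```

Every query scans every stored value, so the cost grows linearly with the
number of documents.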
Very brain-dead, but it kind of works :)
Cheers
--
PA, Onnay Equitursay
http://alt.textdrive.com/