Re: Most efficient way to recognize a line by its first (up to) three characters

lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]

Subject: Re: Most efficient way to recognize a line by its first (up to) three characters
From: Sean Conner <sean@...>
Date: Tue, 11 Mar 2014 14:15:40 -0400

It was thus said that the Great Shmuel Zeigerman once stated:
> On 11/03/2014 19:25, meino.cramer@gmx.de wrote:
> >
> >Hi,
> >
> >(using Lua 5.2.2 under Linux)
> >
> >from a serial line I receive A LOT of lines
> >consisting of data in ASCII format.
> >
> >The first one to three characters decide, how
> >I have to interpret the data.
> >Since data are transmitted FAST I want to make
> >the recognition code which branches into the
> >different data processing parts as efficient
> >as possible. The code will run on a embedded
> >system, so I will not have the memory for huge
> >look-up table or such...
> >And I will do it in pure Lua since this is the
> >reason why I do this in pure Lua! ;)
> >
> >What is the "best" solution?
> >('best' in "" since this also a product of the
> >point of view ;)
> >
> 
> For simplicity sake, let it always be 3 characters to decide.
> Define as many 3-character identifiers as you need.
> Then it is a simple if-else:
> 
>     local prefix, info = line:sub(1,3), line:sub(4)
>     if prefix == "abc" then
>       -- do something
>     elseif prefix == "123" then
>       -- do something else
>     elseif ...
> 
> (Strings equality test is very cheap in Lua.)

  The "best" result depends upon both the number of possible choices, and
the frequency of those choices.  With three letters (ignoring case), there
are 17,576 possible choices.  If the three bytes can be arbitrary values,
then you are talking about 16,777,216 possible values.

  Now, ignoring the implementation language for a moment---if the number of
actual 3-byte values is limited to say, 8 or less, then a series of if-else
statements would probably be best.  As the number increases, then the
fastest method would be either a hash table or sorted array (and use binary
search).  The hash table has an amortized cost of O(1) but at the expense of
memory; the sorted array with binary search has the minimal memory usage and
a fixed cost of O(ln) (which isn't that bad actually---a million choices is
only 20 comparisons).  There are perfect hash tables that can minimize the
memory usage, but they tend to be only good for a small set of keys and from
what I can see, still use more memory than a sorted array.

  Now, if you do have a lot of patterns, and say, two or three are most
likely (say, over 30% will be one particlar pattern) then you might get
better performance checking explicitely for that pattern (using if) and in
the else case, use a hash for the other patterns (but this is something to
bench mark).

  With respect to Lua, if you have, say, 8 or more starting patterns, then I
would start with:

do_table = 
{
  ['pattern1'] = function() ... end,
  ['pattern2'] = function() ... end,
  ...
}

then your main loop would look something like:

	while true do
	  local key = serialport:read(3)
	  do_table[key]()
	end

  Another benefit:  new patters are easy to add.

  -spc

References:
- Most efficient way to recognize a line by its first (up to) three characters, meino . cramer
- Re: Most efficient way to recognize a line by its first (up to) three characters, Shmuel Zeigerman

Prev by Date: Re: Most efficient way to recognize a line by its first (up to) three characters
Next by Date: Re: Most efficient way to recognize a line by its first (up to) three characters
Previous by thread: Re: Most efficient way to recognize a line by its first (up to) three characters
Next by thread: Re: Most efficient way to recognize a line by its first (up to) three characters
Index(es):
- Date
- Thread