lua-users home
lua-l archive



I just wrote a simple app to read out data from a CSV file (although semicolon-separated).. haven't benchmarked it, but it parses a few thousand lines from a bunch of files in "no time" :)

The nitty-gritty is in this function:

-- Appends the values of every line in the open file handle f to the
-- column tables in rows; fc and rows are upvalues (see the PS below).
local function read_lines(f)
	local i = rows.__n                           -- rows stored so far
	for line in f:lines() do
		local value = line:gmatch( "([^;]*);?" ) -- iterator over ;-separated fields
		local empty_row = true
		for c = 1, #fc do                        -- one field per known column
			local v = value()
			if v and v:len() > 0 then
				empty_row = false
				rows[fc[c]][i+1] = v
			end
		end

		if not empty_row then                    -- only count rows with data
			i = i + 1
		end
	end

	rows.__n = i                                 -- remember the new row count
end

PS. fc holds the columns for the current file (parsed out in a different function), and the rows table holds the values for all files _and_ columns, so it's a kind of combo-mombo krunchbin (yadi-yadi).. ;)
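
For anyone who wants to run read_lines on its own, here is a minimal sketch of how fc and rows might be set up; the column names are made up and not from the real app, and both tables have to be in scope before read_lines is defined:

local fc = { "date", "amount", "comment" }   -- made-up column names

local rows = { __n = 0 }                     -- one table per column, plus a row counter
for _, name in ipairs(fc) do
	rows[name] = {}
end

-- then, per file:
-- local f = assert(io.open("data.csv"))
-- read_lines(f)
-- f:close()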

//A

askok@dnainternet.net wrote:

Thanks for the background, so I'm kind of comparing apples and oranges?

Anyway, I had done the benchmarking, and up until the last yards Lua was indeed always faster. Once qr// was used in Perl, it wasn't.

Which led me to think: is there anything that could be done? In general terms, I too would prefer Lua over anything, but it also needs to be "faster or the same speed" as the competitor. Otherwise, a transition would certainly, and justifiably, be questioned.

The actual issue is CSV parsing (comma-separated values). Just the simple "%d+,%d+,[^,]," kind. Ideas for a better approach in Lua than string.match? (Making a C module just for cutting out the comma-separated fields would be fast, but it is not really an alternative, at least not in comparison to other languages.)
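
One idea, just as a sketch and not benchmarked: instead of describing the whole line in a single string.match pattern, split the fields with gmatch. The split_csv helper below is only illustrative and does not handle quoted fields with embedded commas:

local function split_csv(line)
	local fields = {}
	-- appending one trailing comma makes every field, including a
	-- final empty one, end with the same terminator
	for field in (line .. ","):gmatch("([^,]*),") do
		fields[#fields + 1] = field
	end
	return fields
end

-- split_csv("123,456,abc")  -->  { "123", "456", "abc" }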

-asko



On Thu, 9 Nov 2006 23:23:12 +0100
 Karel Tuma <ktuma@email.cz> wrote:
just benchmark it :)
lua of course wins, but not always.

the thing with compiling regexes: they're compiled into FSA (finite state automata),
basically a tree of states whose nodes can also link across to other nodes.
an FSA compiler is considered non-trivial in terms of implementation size.

but lua patterns are sort of NFA (nondeterministic finite automata); they're
already compiled as they are. due to that, referring back to some state or
sub-expression is very limited, making lua patterns "less powerful" than pcre's,
but enough for the usual daywork, and when you need something more
you've to code the logic all by yourself.
to be exact, FSA is slower for "simple" expressions whereas NFA is slower on
the complex ones; some pcre implementations even use both and choose
between them depending on the expression (!).
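
(to make the "code the logic yourself" point concrete, a small made-up example: lua patterns have no '|' alternation, so something like pcre's /^(GET|POST) / becomes a capture plus a table lookup)

local allowed = { GET = true, POST = true }

local function method_of(request_line)
	local m = request_line:match("^(%u+) ")   -- capture the leading word
	if m and allowed[m] then return m end     -- the "alternation" is a lookup
	return nil
end

-- method_of("GET /index.html HTTP/1.0")  --> "GET"
-- method_of("DELETE /x HTTP/1.0")        --> nil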

hey, this is lua, we want the simple and fast, right?

add to this perl's suckiness at pretty much everything else and lua is the winner
(I'm using lua for parsing 1GB+ logs using mmap(); look at
http://blua.leet.cz/sep/STRHOOK_PATCH.patch to get the idea)

On Thu, Nov 09, 2006 at 09:58:22PM +0100, Asko Kauppi wrote:

I didn't find any earlier discussion of precompiled regular expressions and Lua.

Some background:

In huge log file handling, Lua loses to Perl (not by much!), seemingly because it has no concept of precompiling a regular expression pattern and then applying it.

In Perl, one can:

    my $re = qr/^\d+\s/;
    $var =~ $re;          # $re is a precompiled, optimized regex, applied to $var

or:

    $var =~ /^\d+\s/o;    # 'o' for once: compile once, cache, reuse

Lua:

    string.match( var, "%d+%s" )    -- is "%d+%s" analyzed anew each time?


Is Lua losing in speed, since it does not have this concept, or have the authors been so clever that it's already "there", just invisible? :) We're talking BIG files, say 1 million lines or more.
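
One way to get a rough number, if anyone wants to measure it (the file name and the pattern below are just placeholders):

local t0 = os.clock()
local hits = 0
for line in io.lines("big.log") do            -- placeholder file name
	if line:match("%d+%s") then               -- same pattern applied per line
		hits = hits + 1
	end
end
print(string.format("matched %d lines in %.2f s", hits, os.clock() - t0))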

-asko