- Subject: parsing improvement
- From: Lionel Duboeuf <lionel.duboeuf@...>
- Date: Fri, 29 May 2015 20:25:49 +0200
Hello all,
In case I'm not doing this efficiently, and to learn best practices:
I have a character stream formatted like this:
...<6 orange/> <2 20/> <1 1/> <2 20/> <5 false/> <1 0/> <16 orange mechanics/> <2 25/>...
which corresponds to a row/column format like this:
t = {
  { col1 = "orange", col2 = 20 },
  { col1 = 1, col2 = 20 },
  { col1 = false, col2 = 0 },
  { col1 = "orange mechanics", col2 = 25 },
  -- ...
}
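The stream carries only text, while the target table holds numbers and booleans, so each raw value has to be converted at some point (the parsing code below stores the raw strings). A minimal sketch, assuming the type can be inferred from the text alone; `decode` is a hypothetical helper, and under this rule a value meant to stay the string "20" would become the number 20:

```lua
-- Hypothetical converter: guess a Lua type from a raw field value.
-- Assumes "true"/"false" are booleans and numeric text is a number.
local function decode(raw)
  if raw == "true" then
    return true
  elseif raw == "false" then
    return false
  end
  -- tonumber returns nil for non-numeric text such as "orange",
  -- in which case the raw string is kept
  return tonumber(raw) or raw
end
```

It could be applied when filling each row, e.g. `row[cols[j]] = decode(val)`.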
To do so, I parse it like this:
local pos = 1 -- current position in the stream (data)
local rs = {}
local nbRows = 4
local cols = { "col1", "col2" }
for i = 1, nbRows do
  local row = {}
  for j = 1, #cols do
    -- e is the index of the space that ends "<N "; sNbByte captures N
    local _, e, sNbByte = string.find(data, "<(%d+)%s", pos)
    if not e then error("malformed stream at position " .. pos) end
    local nbByte = tonumber(sNbByte)
    -- the value starts right after the space; sub returns "" when nbByte is 0
    row[cols[j]] = string.sub(data, e + 1, e + nbByte)
    pos = e + nbByte + 1 -- just after the value
  end
  rs[i] = row
end
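A variant of the same loop, sketched as a function: `string.match` accepts the same start offset as `string.find` but returns only the captures, and an empty `()` capture yields the position where the value begins, so the index arithmetic shrinks to one line. `parseRows` is a hypothetical name; values stay raw strings, and a closing `/>` is assumed to follow every value.

```lua
-- Hypothetical function: parse nbRows rows of #cols fields each, starting at pos.
-- Returns the rows and the position just after the last field read.
local function parseRows(data, pos, cols, nbRows)
  local rs = {}
  for i = 1, nbRows do
    local row = {}
    for j = 1, #cols do
      -- sLen: byte count of the value; valStart: index of its first byte
      local sLen, valStart = string.match(data, "<(%d+)%s()", pos)
      if not sLen then
        error("malformed stream at position " .. pos)
      end
      local valEnd = valStart + tonumber(sLen) - 1
      row[cols[j]] = string.sub(data, valStart, valEnd) -- "" when the length is 0
      pos = valEnd + 3 -- skip the closing "/>" (assumed present)
    end
    rs[i] = row
  end
  return rs, pos
end

local data = "<6 orange/> <2 20/> <1 1/> <2 20/> <5 false/> <1 0/> " ..
             "<16 orange mechanics/> <2 25/>"
local rows, nextPos = parseRows(data, 1, { "col1", "col2" }, 4)
-- rows[4].col1 == "orange mechanics", rows[4].col2 == "25"
```

Passing `pos` in and returning the next position lets the caller resume where the parse stopped, without ever splitting `data`.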
I did some benchmarks and found that using gmatch and iterating through
the captures is more efficient, but gmatch cannot be used when I need to
specify a starting offset (as string.find allows), and I don't want to
split my string, to avoid copies.
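If the appeal of gmatch is mainly its iterator style, a similar iterator that starts at an arbitrary offset can be built on `string.match`, which does take a start position. (Lua 5.4 later added an optional `init` argument to `string.gmatch`, but a pattern alone cannot use the length prefix, so a value containing `/>` would still break it.) A sketch; `fields` is a hypothetical name, and the offset 13 only applies to the sample string below:

```lua
-- Hypothetical iterator: yields the raw value of each "<N value/>" field,
-- starting at byte offset init, without copying the rest of the string.
local function fields(data, init)
  local pos = init or 1
  return function()
    local sLen, valStart = string.match(data, "<(%d+)%s()", pos)
    if not sLen then return nil end -- no more fields: ends the for loop
    local valEnd = valStart + tonumber(sLen) - 1
    pos = valEnd + 3 -- skip the closing "/>" (assumed present)
    -- an empty value is "", which is not nil, so iteration continues
    return string.sub(data, valStart, valEnd), pos
  end
end

local data = "<6 orange/> <2 20/> <16 orange mechanics/> <2 25/>"
local values = {}
for v in fields(data, 13) do -- 13 is the offset of "<2 20/>"
  values[#values + 1] = v
end
```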
Any advice would be much appreciated.
Thanks
Lionel