parsing improvement

lua-l archive


Subject: parsing improvement
From: Lionel Duboeuf <lionel.duboeuf@...>
Date: Fri, 29 May 2015 20:25:49 +0200

Hello all,

In case I'm doing this inefficiently, and to learn best practices:
I have a character stream that is formatted like this:

...<6 orange/> <2 20/> <1 1/> <2 20/> <5 false/> <1 0/> <16 orange mechanics/> <2 25/>...


Each field is "<N value/>", where N is the length of the value in bytes.
This corresponds to a row/column format like this:
t = {
    { col1 = "orange", col2 = 20 },
    { col1 = 1, col2 = 20 },
    { col1 = false, col2 = 0 },
    { col1 = "orange mechanics", col2 = 25 },
    -- ...
}
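The table above shows typed values (numbers and a boolean), while the stream itself carries only text. A minimal sketch of one possible conversion step follows. `coerce` is a hypothetical helper, not something from the original code, and it assumes that any field that looks like a number or a boolean should become one.

```lua
-- Hypothetical helper (an assumption for illustration): turn a raw field
-- string into a boolean or number when it looks like one, else keep it.
local function coerce(s)
  if s == "true" then return true end
  if s == "false" then return false end
  -- tonumber returns nil for non-numeric text, so fall back to the string.
  -- Note: tonumber also accepts hex ("0x10") and exponent ("1e3") forms.
  return tonumber(s) or s
end

local row = { col1 = coerce("orange"), col2 = coerce("20") }
```

If a column must stay textual even when it contains digits, a per-column type list would be a safer design than guessing from the content.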



To do so, I parse it like this:

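-- sample input; in practice data is the stream read so far
local data = "<6 orange/> <2 20/> <1 1/> <2 20/> <5 false/> <1 0/> <16 orange mechanics/> <2 25/>"
local pos = 1 -- current position in the stream

local rs = {}
local nbRows = 4
local cols = { "col1", "col2" }

for i = 1, nbRows do
  local row = {}

  for j = 1, #cols do
    -- find the next "<N " header; e is the index of the whitespace after N
    local _, e, sNbByte = string.find(data, "<(%d+)%s", pos)
    assert(e, "malformed stream")
    local nbByte = tonumber(sNbByte)

    -- the value is the nbByte bytes after the whitespace ("" when nbByte is 0)
    local val = string.sub(data, e + 1, e + nbByte)
    pos = e + nbByte + 1 -- just after the value, on the closing "/"

    row[cols[j]] = val
  end

  rs[i] = row
end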

I did some benchmarks and found that using gmatch and iterating through the captures is more efficient, but gmatch is not usable when we need to specify a starting offset (as string.find allows), and I don't want to split my string, to avoid copies.
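One way to get gmatch-style iteration from an arbitrary offset is to wrap string.find, which does accept a start position, in a closure. The sketch below is only an illustration under that assumption: `fields_from` and its signature are hypothetical names, and it copies only each extracted value, never the remainder of the stream. (Lua 5.4 added an optional init argument to string.gmatch; the sketch does not rely on it.)

```lua
-- Hypothetical iterator (name and signature are illustrative): yields each
-- length-prefixed field of `data`, starting the scan at byte offset `init`.
local function fields_from(data, init)
  local pos = init or 1
  return function()
    -- e is the index of the whitespace that follows the byte count
    local _, e, len = string.find(data, "<(%d+)%s", pos)
    if not e then return nil end          -- no more fields: end the loop
    len = tonumber(len)
    pos = e + len + 1                     -- resume on the closing "/"
    return string.sub(data, e + 1, e + len), pos
  end
end

-- Usage: skip an 8-byte prefix and collect the values that follow it.
local data = "skip me <6 orange/> <2 20/> <16 orange mechanics/> <2 25/>"
local values = {}
for val in fields_from(data, 9) do
  values[#values + 1] = val
end
```

Because the scan jumps over each value using its declared length, a value that happens to contain text such as "<3 " is not mistaken for a new field header.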


Any advice would be much appreciated.

thanks

lionel

Follow-Ups:
- Re: parsing improvement, Dirk Laurie
- Re: parsing improvement, Sean Conner
