- Subject: Re: Parsing strings written by string.format("%q")
- From: sur-behoffski <sur_behoffski@...>
- Date: Mon, 5 Dec 2016 23:19:17 +1030
(In response to the discussion, especially the posts by Daurnimator and
"David Given"; apologies for breaking the message thread.)
> I want to write out strings to a file and read them back in again, in an
> ASCII-safe way. I'm writing them out with string.format("%q"), because
> it's cheap and easy[*], but now I need to read them back in again.
In one current project, I write the data out as an executable Lua script
(after all, Lua partly descends from SOL, a data-description language with
BibTeX-inspired syntax) and simply require() it to recover the data.
However, that relies on the writer and reader trusting each other, whereas
you're looking for a bullet-proof parser for input that may not be trusted.
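A minimal sketch of that write-then-load round trip, assuming a flat table with string keys and string values and any of Lua 5.1 to 5.4; save_data and load_data are hypothetical names, not code from the project described here. It uses load() with an empty environment and text-only mode on 5.2+, and loadstring() plus setfenv() on 5.1. That keeps the chunk away from globals such as os and io, but it is still not bullet-proof against hostile input: the chunk can loop forever, or reach string methods through the string metatable and build huge strings.

```lua
-- Hypothetical helpers (not from the original post): serialise a flat table
-- of string keys/values as a Lua chunk using %q, then read it back by running
-- the chunk in an empty environment.
local function save_data(t)
  local keys = {}
  for k in pairs(t) do keys[#keys + 1] = k end
  table.sort(keys)                        -- deterministic output order
  local out = { "return {" }
  for _, k in ipairs(keys) do
    out[#out + 1] = ("  [%q] = %q,"):format(k, t[k])
  end
  out[#out + 1] = "}"
  return table.concat(out, "\n")
end

local function load_data(text)
  if text:byte(1) == 27 then              -- refuse precompiled (binary) chunks
    return nil, "binary chunk"
  end
  local env = {}                          -- no globals visible to the chunk
  local f, err
  if setfenv then                         -- Lua 5.1 / LuaJIT
    f, err = loadstring(text, "=data")
    if f then setfenv(f, env) end
  else                                    -- Lua 5.2+: text mode only
    f, err = load(text, "=data", "t", env)
  end
  if not f then return nil, err end
  local ok, result = pcall(f)             -- runtime errors become nil, msg
  if not ok then return nil, result end
  return result
end
```

The empty environment only limits what the chunk can name; it does not limit CPU time or memory, which is why a dedicated parser is still the better answer for untrusted files.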
In my project, I'm "fingerprinting" a machine:
-- Obtain raw-text general information about this machine
M.CPUInfo = assert(ReadFile("/proc/cpuinfo"))
M.MemInfo = assert(ReadFile("/proc/meminfo"))
M.PCIInfo = assert(PE.exec_quietly("lspci"))
M.USBInfo = assert(PE.exec_quietly("lsusb"))
M.IfConfigInfo = assert(PE.exec_quietly("ifconfig"))
M.FSTabInfo = assert(ReadFile("/etc/fstab"))
M.PartitionsInfo = assert(ReadFile("/proc/partitions"))
(PE.exec and PE.exec_quietly are value-added wrappers around luaposix's
pipe (x3) / fork / dup2 (x3) / close (x3) / execp sequence, much of it
shamelessly stolen from an earlier version of luaposix.)
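For trying the fingerprinting lines without the PE module, here are rough stand-ins, assuming a POSIX shell. ReadFile and exec_quietly are hypothetical sketches built on the standard io.open and io.popen, not the luaposix-based implementation described above; they return nil plus a message on failure so that the assert() calls still work.

```lua
-- Hypothetical stand-ins, not the author's PE module.

-- Read a whole file; return nil plus an error message on failure.
local function ReadFile(path)
  local f, err = io.open(path, "rb")
  if not f then
    return nil, err
  end
  local data = f:read("*a")   -- "*a" is still accepted by Lua 5.3/5.4
  f:close()
  return data
end

-- Run a shell command, discarding its stderr, and return its stdout.
-- io.popen goes through the shell and is platform-dependent, so `cmd`
-- must come from a trusted source.
local function exec_quietly(cmd)
  local p, err = io.popen(cmd .. " 2>/dev/null", "r")
  if not p then
    return nil, err
  end
  local out = p:read("*a")
  -- Lua 5.2+ reports a non-zero exit status here; Lua 5.1 always returns true.
  local ok, _, code = p:close()
  if not ok then
    return nil, "command failed with status " .. tostring(code)
  end
  return out
end
```

Unlike the fork/exec approach, io.popen cannot capture stdout and stderr separately, which is one reason to prefer luaposix for anything beyond quick experiments.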
[The disk-partition information comes in handy because I usually boot
from a "live" Linux CD/DVD, modified to have Lua and all the necessary
scripts in place so that everything works neatly. However, device names
(sda, sdb, etc.) can be assigned in a different order from one boot to
the next, so I needed to be able to identify partitions unambiguously.
Anyway...]
The raw-text output is quoted verbatim, alongside a fair amount of code
that characterises the disk drives (e.g. MBR, msdos/gpt partition table,
partition types/UUIDs and filesystem types, integrated with information
from fstab such as mount point, backup pass and mount options).
I wanted the output of these utilities to appear verbatim, and so
decided to use:
  "[[" .. "]]" quoting; except that I found one of those markers in
  the raw text, and so tried
  "[=[" .. "]=]" quoting; except that I found one of those markers in
  the raw text, and had to try ...
  "[==[" .. "]==]" quoting; except that I found one of those markers
  in the raw text, and had to try ...
You can see where this is heading.
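The escalation can be automated. Below is a minimal sketch, with long_quote as a hypothetical name, that tries levels 0, 1, 2, ... until the text contains neither the opening nor the closing bracket of that level. Strictly, only the closing bracket can end a long string in Lua 5.2 and later, but Lua 5.1 also rejects a nested "[[" at level 0, so both are checked. Two details are easy to miss. A text ending in "]" (or "]=") can merge with the closing bracket, so the check runs on text .. close. And the lexer turns any \r, \r\n or \n\r inside a long string into \n, so text containing carriage returns cannot be reproduced byte-for-byte; the sketch returns nil for it and leaves the fallback (e.g. %q) to the caller.

```lua
-- Hypothetical helper: wrap `text` in the lowest-level long bracket that
-- reproduces it exactly, or return nil plus a reason.
local function long_quote(text)
  -- Long strings normalise \r, \r\n and \n\r to \n, so refuse CRs.
  if text:find("\r", 1, true) then
    return nil, "text contains carriage returns"
  end
  -- Level #text + 1 always fits: its "=" run is longer than the text.
  for level = 0, #text + 1 do
    local eq = ("="):rep(level)
    local open, close = "[" .. eq .. "[", "]" .. eq .. "]"
    -- The lexer stops at the first `close`, so it may only occur at the
    -- very end of text .. close; the opener is avoided for Lua 5.1.
    if not text:find(open, 1, true)
       and (text .. close):find(close, 1, true) == #text + 1 then
      -- A newline right after the opener is skipped by the lexer, so
      -- always emit one; a leading newline in `text` then survives.
      return open .. "\n" .. text .. close
    end
  end
end
```

For example, "x]]y" forces level 1 and comes out as "[=[" followed by a newline, x]]y and "]=]". The cost is one plain find per level tried, and the level only grows when the text itself contains matching brackets.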
Perhaps a "%Q" specifier could be added to string.format (with a matching
pattern-match item), built around long-bracket literals of whatever level
the text requires. This would satisfy the "ASCII-safe" needs above, at
least for text that is already plain ASCII.
Some mechanism for bailing out if the run of "=" signs gets too long
("[" .. ("="):rep(999999999) .. "[" ?!) would be needed, given that the
string is untrusted.
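On the reading side, the bail-out can be done without load() at all. A minimal sketch, with read_long_string and its max_level parameter (default 32, an arbitrary choice) as hypothetical names: it parses one long-bracket literal from the start of untrusted text, counts the "=" signs one byte at a time and gives up as soon as they exceed max_level, then finds the matching close with a plain string.find. Rather than copy Lua's newline normalisation, it rejects carriage returns in the body.

```lua
-- Hypothetical reader for one long-bracket literal at the start of `src`.
-- It never calls load(), so untrusted input cannot run code.
-- Returns the string and the index just past the literal, or nil, message.
local function read_long_string(src, max_level)
  max_level = max_level or 32
  if src:sub(1, 1) ~= "[" then
    return nil, "expected '['"
  end
  -- Count "=" one byte at a time so an absurdly long run is rejected
  -- early instead of being scanned or copied in full.
  local level, i = 0, 2
  while src:byte(i) == 61 do              -- 61 is the byte value of "="
    level = level + 1
    if level > max_level then
      return nil, "long-bracket level exceeds " .. max_level
    end
    i = i + 1
  end
  if src:sub(i, i) ~= "[" then
    return nil, "malformed opening long bracket"
  end
  i = i + 1
  -- Like the Lua lexer, drop a single newline directly after the opener.
  if src:sub(i, i) == "\n" then
    i = i + 1
  end
  local close = "]" .. ("="):rep(level) .. "]"
  local j = src:find(close, i, true)      -- plain search, linear time
  if not j then
    return nil, "unterminated long string"
  end
  local body = src:sub(i, j - 1)
  -- Lua would normalise \r, \r\n and \n\r to \n; reject CRs instead.
  if body:find("\r", 1, true) then
    return nil, "carriage return inside long string"
  end
  return body, j + #close
end
```

Because it returns the index just past the literal, a caller can carry on scanning the surrounding file from that point.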
Any comments?
sur-behoffski