Also, is it a good idea to depend on a "sane locale" for something that
is possibly security relevant? (Not a rhetorical question: I am not sure
what a "not sane" locale is, or how easily the locale can be butchered
by an attacker.)

Because I do not know enough to judge this, I wrote this lengthy
function, which is supposed to behave like a gsub with plain=true:

string.replace = function(base, search, replace)
    --[[-- A replace function that will replace all occurrences of the search
           string in the given base string with the replace string.

           Unlike string:gsub, no special pattern processing will take
           place - this stupidly searches exactly for the search string as is,
           and only if it is found character by character exactly as written,
           the occurrence will be replaced. ]]
    if #search == 0 then
        return base
    end
    local startindex = 1
    while true do
        local index = base:find(search, startindex, true)
        if index ~= nil then
            local result = ""
            if index > 1 then
                result = result .. base:sub(1, index-1)
            end
            result = result .. replace
            -- check if there is something left after the replaced piece:
            if index - 1 + #replace < #base + #replace - #search then
                -- there is!
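                -- resume just past the inserted replacement so it is never
                -- re-scanned (also covers an empty replace string)
                assert(#result + 1 >= startindex)
                startindex = #result + 1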
                result = result .. base:sub(index + #search)
                base = result
                -- since stuff is left at the end, continue searching!
            else
                -- we reached the end:
                return result
            end
        else
            -- no further occurrences left!
            return base
        end
    end
end
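
For comparison, the same idea can be sketched more compactly by collecting the pieces in a table instead of rebuilding base on every match. This is only a sketch: plain_replace is a hypothetical name, not a standard library function. It assumes nothing beyond string.find with its plain flag and table.concat.

```lua
-- Sketch only: plain_replace is a hypothetical helper, not part of Lua.
local function plain_replace(base, search, replace)
    if #search == 0 then
        return base
    end
    local pieces = {}
    local pos = 1
    while true do
        -- the fourth argument 'true' turns off pattern matching in find
        local first, last = base:find(search, pos, true)
        if not first then
            break
        end
        pieces[#pieces + 1] = base:sub(pos, first - 1) -- text before the match
        pieces[#pieces + 1] = replace                  -- inserted verbatim
        pos = last + 1                                 -- continue after the match
    end
    pieces[#pieces + 1] = base:sub(pos)                -- remaining tail
    return table.concat(pieces)
end

print(plain_replace("a.b.c", ".", "%1")) --> a%1b%1c
```

Because the replacement is never scanned again, a replace string that contains the search string cannot cause an endless loop.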

I would rather not do that, but I felt it was a better idea than not
being sure whether everything is always escaped properly in all cases.
With a plain option for gsub (like the one find has), I would just have
used that instead.
About the %p pattern... I still cannot judge whether it is safe in all
circumstances. If it is, then that would be a great solution too!
(certainly shorter)
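
The shorter %p approach discussed in this thread might look roughly like the sketch below. The helper names escape_pattern, escape_replacement and replace_plain are hypothetical. The sketch assumes a locale in which '%p' matches every character that is special in a pattern, which is exactly the point under discussion. It also escapes '%' in the replacement string, since gsub treats that character specially there.

```lua
-- Sketch only: all three helper names are hypothetical.
local function escape_pattern(s)
    -- prefix every punctuation character with '%' so it is matched literally
    return (s:gsub("%p", "%%%0"))
end

local function escape_replacement(s)
    -- in a gsub replacement string only '%' has a special meaning
    return (s:gsub("%%", "%%%%"))
end

local function replace_plain(base, search, replace)
    if #search == 0 then
        return base
    end
    -- the outer parentheses drop gsub's second result (the match count)
    return (base:gsub(escape_pattern(search), escape_replacement(replace)))
end

print(replace_plain("a.b.c", ".", "%")) --> a%b%c
```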

Regards,
Jonas Thiem


On Thu, Aug 21, 2014 at 4:23 PM, Jonas Thiem <jonasthiem@googlemail.com> wrote:
> Hm I see. Still, wouldn't a plain option be more consistent with find
> and easier to use for beginners?
>
> On Thu, Aug 21, 2014 at 4:21 PM, Roberto Ierusalimschy
> <roberto@inf.puc-rio.br> wrote:
>>> I think this already demonstrates my point. Coming up with a regex
>>> that is safe and escapes everything is not trivial.
>>>
>>> [...]
>>> >
>>> > On my system, '%p' does not match '[+$^]', so '%p' should become '[%p+$^]'.
>>
>> This seems like a bug in his system (or else he is using some weird
>> locale...). '%p' corresponds to 'ispunct', and the C standard says this:
>>
>>   In an implementation that uses the seven-bit US ASCII character set, the
>>   printing characters are those whose values lie from 0x20 (space) through
>>   0x7E (tilde);
>>
>>   [...]
>>
>>   In the "C" locale, ispunct returns true for every printing character for
>>   which neither isspace nor isalnum is true.
>>
>> So, '[+$^]' must be all punctuations (and therefore match '%p').
>>
>> If you assume a correct libC and a sane locale, '%p' is all you need.
>>
>> -- Roberto
>>
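
Whether '%p' really covers every character with a special meaning in patterns can be checked on a given system. The sketch below is one way to do so; unmatched_magic is a hypothetical helper, and the call to os.setlocale("C") is an assumption made so that the check runs in the locale the quoted standard text describes.

```lua
-- Characters that have a special meaning in Lua patterns.
local magic = "^$()%.[]*+-?"

-- Hypothetical helper: list the magic characters that '%p' fails to match.
local function unmatched_magic()
    local missing = {}
    for c in magic:gmatch(".") do
        if not c:find("%p") then
            missing[#missing + 1] = c
        end
    end
    return missing
end

os.setlocale("C") -- assumption: test under the "C" locale
local missing = unmatched_magic()
print(#missing == 0 and "%p covers all magic characters"
      or "not matched by %p: " .. table.concat(missing, " "))
```

Removing the os.setlocale call, or passing a different locale name, shows how the current environment behaves instead.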