lua-users home
lua-l archive



Rici Lake wrote:
I can't help thinking that all the proposed solutions are a lot more complicated than necessary.

So are yours, below, once you extend the functionality... :-)

Also, this is a perfect use case for Mike Pall's patch to string.gsub, with which I concur (although I might extend it a bit....)

I agree.

Anyway, here's the simplest html escaper I know of (just the three vital characters):

do
   local escapes = {["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;"}
   local function escape(c) return escapes[c] end
   function html_escape(str) return (str:gsub("[&<>]", escape)) end
end

Indeed, but if you extend the list of escaped characters, you also have to extend the pattern passed to gsub, and it is easy to forget one of the two later. One of the "complexities" of my solutions was building this pattern automatically from the table.
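
A minimal sketch of what building the pattern automatically could look like, assuming Lua 5.1 or later and assuming every key of the table is a single character. The name html_escape_auto and the extra `"` entry are only illustrations, not code from the thread:

```lua
do
   local escapes = {["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;", ['"'] = "&quot;"}

   -- Collect the table keys into a character class such as [&<>"].
   local chars = {}
   for c in pairs(escapes) do
      -- Prefix non-alphanumeric characters with % so that characters that
      -- are magic inside a class (], ^, -, %) are taken literally.
      chars[#chars + 1] = (c:gsub("%W", "%%%0"))
   end
   local pattern = "[" .. table.concat(chars) .. "]"

   local function escape(c) return escapes[c] end
   -- html_escape_auto is a hypothetical name used for this sketch only.
   function html_escape_auto(str) return (str:gsub(pattern, escape)) end
end
```

With this arrangement, adding an entry to the table is enough; the character class follows along. The order in which pairs visits the keys does not matter inside a character class.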

Can't get much simpler than that, except that with Mike's patch you wouldn't need the function "escape"; you could just provide the table "escapes" as the last argument to gsub. (By the way, the redundant parentheses in the last return statement are deliberate; they avoid returning the second return value of gsub.)
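
For illustration, here is how the table form and the parentheses behave, assuming a Lua whose string.gsub accepts a table as the replacement (Lua 5.1 and later do). The names html_escape_tbl and html_escape_raw are made up for this sketch:

```lua
local escapes = {["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;"}

-- Table form: each matched character is looked up in `escapes` directly,
-- so no helper function is needed.
function html_escape_tbl(str) return (str:gsub("[&<>]", escapes)) end

-- Same call without the parentheses: the number of substitutions
-- is returned as a second value.
function html_escape_raw(str) return str:gsub("[&<>]", escapes) end

print(html_escape_tbl("a<b>c"))  -- a&lt;b&gt;c
print(html_escape_raw("a<b>c"))  -- a&lt;b&gt;c    2
```

The extra value matters as soon as the result is passed straight to another function, for example table.insert(t, html_escape_raw(s)), where the count would be taken as an additional argument.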

If the string to be escaped is ISO-8859-1, and you really want to escape high-ascii numerically, just extend the escapes table:

do
   local escapes = {["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;"}
   for i = 128, 255 do escapes[string.char(i)] = "&#"..i..";" end
   local function escape(c) return escapes[c] end
   function html_escape(str) return (str:gsub("[&<>\128-\255]", escape)) end
end

If you really want named escapes, insert them after the for loop, but I don't see the point; with numeric escapes you don't need to worry about browser support.

Some recent entities, like &euro;, may not be known to old browsers. Using the named entity at least lets the user see the string &euro;, which is easier to understand than the numeric entity.

However:

do
   local escapes = {["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;"}
   for i = 128, 255 do escapes[string.char(i)] = "&#"..i..";" end
   escapes['á'] = "&aacute;"  -- assumes this file is saved as ISO-8859-1, so 'á' is one byte
   -- etc.
   local function escape(c) return escapes[c] end
   function html_escape(str) return (str:gsub("[&<>\128-\255]", escape)) end
end
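
A variant sketch that keys the named entities by byte value instead of by a literal character, so that it does not depend on the encoding of the Lua source file. The `named` table (only three entries, as a sample) and the name html_escape_named are assumptions for illustration; Lua 5.1 or later is assumed:

```lua
do
   local escapes = {["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;"}
   -- Numeric escapes for every high byte first...
   for i = 128, 255 do escapes[string.char(i)] = "&#" .. i .. ";" end
   -- ...then named entities replace the numeric ones where a name is listed.
   -- Keys are ISO-8859-1 byte values: 225 = a-acute, 233 = e-acute, 241 = n-tilde.
   local named = {[225] = "aacute", [233] = "eacute", [241] = "ntilde"}
   for byte, name in pairs(named) do
      escapes[string.char(byte)] = "&" .. name .. ";"
   end
   local function escape(c) return escapes[c] end
   -- html_escape_named is a hypothetical name used for this sketch only.
   function html_escape_named(str)
      return (str:gsub("[&<>\128-\255]", escape))
   end
end
```

Bytes without an entry in `named` keep their numeric escape, so the string is still scanned only once.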

Perhaps it is better to just output straight in UTF-8:

do
   local escapes = {["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;"}
   for i = 128, 191 do escapes[string.char(i)] = "\194"..string.char(i) end
   for i = 192, 255 do escapes[string.char(i)] = "\195"..string.char(i-64) end
   local function escape(c) return escapes[c] end
   function html_escape(str) return (str:gsub("[&<>\128-\255]", escape)) end
end
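
The two loops above are the special case, for ISO-8859-1 bytes 128 to 255, of the general two-byte UTF-8 encoding. A small sketch of that arithmetic, with latin1_to_utf8 as a hypothetical helper name and Lua 5.1 or later assumed:

```lua
-- Two-byte UTF-8 form of a code point i in the range 128..255:
--   first byte  = 192 + floor(i / 64)   (194 or 195 in this range)
--   second byte = 128 + i % 64
function latin1_to_utf8(i)
   return string.char(192 + math.floor(i / 64), 128 + i % 64)
end

-- Check that the formula agrees with the hard-coded prefixes used above.
for i = 128, 191 do
   assert(latin1_to_utf8(i) == "\194" .. string.char(i))
end
for i = 192, 255 do
   assert(latin1_to_utf8(i) == "\195" .. string.char(i - 64))
end
```

For example, byte 233 (e-acute in ISO-8859-1) becomes the pair "\195\169".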

In no case should it be necessary to scan the string more than once.

Indeed.
Good solutions, as usual...

--
Philippe Lhoste
--  (near) Paris -- France
--  http://Phi.Lho.free.fr
--  --  --  --  --  --  --  --  --  --  --  --  --  --