On 22 Apr 2008, at 09:52, Jim Whitehead II wrote:
On Mon, Apr 21, 2008 at 3:05 AM, Bertrand Mansion <golgote@mamasam.com> wrote:

On 21 Apr 2008, at 01:06, Jim Whitehead II wrote:


I currently have the need to strip HTML tags from a given Lua string,
ideally allowing a specific subset (such as <p>, <b>, etc.).  There
are a number of implementations of this, a PHP version in particular:

http://uk2.php.net/strip_tags

Does anyone have something like this in Lua, or some example LPEG code
for a specific tag that I could use?  A naive solution is relatively
simple using pattern matching, but I'd like to be able to handle odd
cases like this:

<a href="blah" onClick="<script src='foo'></script>">Link</a>

I'd like to avoid stripping the <script> tag in this case, since it
occurs as an attribute of another tag.


Either you strip tags or you don't. Since <script> is inside <a>, if you
strip <a>, you strip <script> at the same time.

Actually, that isn't the case.  Using an XML parser you can absolutely
strip one and not the other, because the "tag" inside the attribute
isn't a tag at all.  With a proper ruleset you can actually distill
things down to a point where you have what you need.

This <a href="blah" onClick="<script src='foo'></script>">Link</a> is not even XML, so I wonder how an XML parser could treat the <script src='foo'></script> part as a tag. It is just the value of an attribute and, as such, it should be escaped first to be well-formed. It is not a tag and never will be, except perhaps with Jim Whitehead II's XML parser :)
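
To make the point concrete, here is a minimal sketch in plain Lua (no LPeg) of a tag stripper that treats quoted attribute values as opaque, so a "<" or ">" inside an attribute never starts or ends a tag. The function name strip_tags and its allowed table are hypothetical, chosen only for this illustration; the sketch does not handle comments, CDATA sections or <script> bodies, and it is not a substitute for a real parser.

```lua
-- Hypothetical helper for illustration: strip_tags(html, allowed)
-- removes every tag except those whose lowercase name is a key in `allowed`.
local function strip_tags(html, allowed)
  allowed = allowed or {}
  local out = {}
  local pos, len = 1, #html
  while pos <= len do
    local lt = html:find("<", pos, true)   -- plain find, no pattern magic
    if not lt then
      out[#out + 1] = html:sub(pos)        -- no more tags: copy the rest
      break
    end
    out[#out + 1] = html:sub(pos, lt - 1)  -- text before the tag

    -- Scan to the '>' that closes this tag, skipping quoted attribute values.
    local i, quote, close = lt + 1, nil, nil
    while i <= len do
      local c = html:sub(i, i)
      if quote then
        if c == quote then quote = nil end -- end of the quoted value
      elseif c == '"' or c == "'" then
        quote = c                          -- start of a quoted value
      elseif c == ">" then
        close = i
        break
      end
      i = i + 1
    end

    if not close then
      out[#out + 1] = html:sub(lt)         -- unterminated tag: keep as text
      break
    end

    local tag = html:sub(lt, close)
    local name = tag:match("^</?%s*([%w:_%-]+)")
    if name and allowed[name:lower()] then
      out[#out + 1] = tag                  -- allowed tag: keep it verbatim
    end
    pos = close + 1
  end
  return table.concat(out)
end

local sample = [[<a href="blah" onClick="<script src='foo'></script>">Link</a>]]
print(strip_tags(sample))              -- everything stripped
print(strip_tags(sample, { a = true })) -- <a> kept, attribute left untouched
```

With an empty allowed table the whole <a> element is stripped, attribute included; with a allowed, the tag is kept verbatim and the "<script>" text inside onClick is left alone because it was never scanned as a tag.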


--
Bertrand Mansion
Mamasam
Work : http://www.mamasam.com
Blog : http://golgote.freeflux.net