On 22 Apr 2008, at 09:52, Jim Whitehead II wrote:
> On Mon, Apr 21, 2008 at 3:05 AM, Bertrand Mansion <golgote@mamasam.com> wrote:
>> On 21 Apr 2008, at 01:06, Jim Whitehead II wrote:
>>> I currently have the need to strip HTML tags from a given Lua string, ideally allowing a specific subset (such as <p>, <b>, etc.). There are a number of implementations of this, a PHP version in particular:
>>>
>>> http://uk2.php.net/strip_tags
>>>
>>> Does anyone have something like this in Lua, or some example LPEG code for a specific tag that I could use? A naive solution is relatively simple using pattern matching, but I'd like to be able to handle odd cases like this:
>>>
>>> <a href="blah" onClick="<script src='foo'></script>">Link</a>
>>>
>>> I'd like to avoid stripping the <script> tag in this case, since it occurs as an attribute of another tag.
>>
>> Either you strip tags or you don't. Since <script> is inside <a>, if you strip <a>, you strip <script> at the same time.
>
> Actually, that isn't the case. Using an XML parser you can absolutely strip one and not the other, because the "tag" inside the attribute isn't a tag at all. With a proper ruleset you can actually distill things down to a point where you have what you need.

This <a href="blah" onClick="<script src='foo'></script>">Link</a> is not even XML. I wonder how an XML parser could treat the part <script src='foo'></script> as a tag. It is just the value of an attribute and, as such, it should first be escaped to be correct. But it is not, and never will be, a tag, except with Jim Whitehead II's XML parser :)
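
To make the point about attribute values concrete, here is a minimal sketch of a quote-aware tag stripper. It is written in Python rather than Lua or LPEG purely for illustration, and the names `strip_tags`, `allowed` and `TAG_NAME` are hypothetical, not taken from PHP or from any library. It assumes attribute values are properly quoted and makes no attempt to handle comments, CDATA sections or doctypes.

```python
import re

# Start of a tag: '<' or '</' followed directly by a tag name.
TAG_NAME = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)")

def strip_tags(html, allowed=()):
    """Remove tags from html, keeping tags whose name is in `allowed`."""
    allowed = {name.lower() for name in allowed}
    out = []
    i, n = 0, len(html)
    while i < n:
        m = TAG_NAME.match(html, i) if html[i] == "<" else None
        if m is None:
            # Ordinary text, including a stray '<' that starts no tag.
            out.append(html[i])
            i += 1
            continue
        # Find the '>' that closes this tag, ignoring anything that
        # appears inside a quoted attribute value.
        j, quote = m.end(), None
        while j < n:
            c = html[j]
            if quote:
                if c == quote:
                    quote = None
            elif c in "\"'":
                quote = c
            elif c == ">":
                break
            j += 1
        if j == n:
            # Unterminated tag: keep the remainder as text.
            out.append(html[i:])
            break
        if m.group(1).lower() in allowed:
            out.append(html[i:j + 1])
        i = j + 1
    return "".join(out)

sample = "<a href=\"blah\" onClick=\"<script src='foo'></script>\">Link</a>"
print(strip_tags(sample))                  # Link
print(strip_tags(sample, allowed=("a",)))  # the whole <a> element, unchanged
```

Because the scanner only looks for the closing > outside quotes, the <script> text is never seen as a tag: it goes away when <a> is stripped and stays, untouched, when <a> is allowed. A naive pattern such as <[^>]*> would instead stop at the first > inside the attribute and leave ">Link behind.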

--
Bertrand Mansion
Mamasam
Work : http://www.mamasam.com
Blog : http://golgote.freeflux.net