- Subject: Re: Error installing htmlparser
- From: steve donovan <steve.j.donovan@...>
- Date: Mon, 25 Nov 2013 10:30:29 +0200
On Sun, Nov 24, 2013 at 10:41 PM, Craig Barnes <craigbarnes85@gmail.com> wrote:
>> [1] http://stevedonovan.github.io/Penlight/api/modules/pl.xml.html#parsehtml
> Doesn't work for me. Am I doing something wrong?
Nope, it's an actual bug. It was expecting DOCTYPE in caps, which of
course is not how HTML works (the doctype is case-insensitive). After
that it parses the well-formed HTML fine, but I must emphasize that
this is a 'relaxed' mode of a dinky XML parser and really cannot cope
with any badly-formed HTML. So I can't recommend it for people who
need to deal with the real web.
It coped ok with the Slashdot front page, but that's fairly decent HTML.
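To illustrate the DOCTYPE issue: Lua patterns have no case-insensitive flag, so a check has to spell out both cases of each letter. This is only a sketch of the idea, not the actual Penlight fix; `nocase` and `strip_doctype` are hypothetical helper names.

```lua
-- Hypothetical helper (not Penlight code): turn a word into a Lua pattern
-- that matches it in any letter case, e.g. "ab" -> "[Aa][Bb]".
local function nocase(word)
  return (word:gsub("%a", function(c)
    return "[" .. c:upper() .. c:lower() .. "]"
  end))
end

-- Matches a leading <!DOCTYPE ...> declaration regardless of case.
local doctype_pat = "^%s*<!" .. nocase("doctype") .. "[^>]*>"

-- Hypothetical helper: remove a leading doctype declaration, if present.
local function strip_doctype(s)
  -- gsub returns the new string and a count; the parentheses keep only the string
  return (s:gsub(doctype_pat, "", 1))
end

print(strip_doctype("<!doctype html><html></html>"))
print(strip_doctype("<!DOCTYPE html><html></html>"))
```

Both calls should leave just the `<html></html>` part; a string without a doctype passes through unchanged.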
The result of parsing the well-formed HTML is the following LOM table:
{
  tag = "html",
  attr = {
    lang = "en"
  },
  {
    tag = "head",
    {
      tag = "meta",
      attr = { charset = "utf-8" },
      empty = 1,
    },
    {
      tag = "title",
      "Test",
    },
  },
  {
    tag = "body",
    {
      tag = "h1",
      "Test",
    }
  }
}
(Cleaned up from pretty.dump)
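For anyone unfamiliar with the LOM layout: each element is a table with a `tag` field, an optional `attr` table, and its children (strings or further element tables) in the array part. A rough sketch of walking such a table in plain Lua follows; `find_tag` and `text_of` are hypothetical helpers written for this example, not Penlight functions, and `doc` is just the table above typed in by hand.

```lua
-- The LOM table from above, entered as a literal.
local doc = {
  tag = "html", attr = { lang = "en" },
  { tag = "head",
    { tag = "meta", attr = { charset = "utf-8" }, empty = 1 },
    { tag = "title", "Test" },
  },
  { tag = "body",
    { tag = "h1", "Test" },
  },
}

-- Hypothetical helper: depth-first search for the first element named `name`.
local function find_tag(node, name)
  if type(node) ~= "table" then return nil end
  if node.tag == name then return node end
  for _, child in ipairs(node) do
    local hit = find_tag(child, name)
    if hit then return hit end
  end
  return nil
end

-- Hypothetical helper: concatenate all text found beneath an element.
local function text_of(node)
  local parts = {}
  for _, child in ipairs(node) do
    if type(child) == "string" then
      parts[#parts + 1] = child
    elseif type(child) == "table" then
      parts[#parts + 1] = text_of(child)
    end
  end
  return table.concat(parts)
end

print(text_of(find_tag(doc, "title")))       --> Test
print(find_tag(doc, "meta").attr.charset)    --> utf-8
```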
steve d.