- Subject: Re: Lua pattern to match an url
- From: nobody <nobody+lua-list@...>
- Date: Wed, 25 Dec 2019 23:19:19 +0100
> On 25. Dec 2019, at 22:44, Philippe Verdy <verdy_p@wanadoo.fr> wrote:
>
> It matches too many things […]
When dealing with _valid HTML,_ heuristically, any string that starts with 'http://', ends with '.mp3' and doesn't contain whitespace is almost certainly exactly a URL pointing at (something that claims to be) an MP3. (The other pattern works, too.) [So a somewhat better pattern than what I initially suggested would be "http://%S+%.mp3", which also excludes line breaks.]
When you're not dealing with random / adversarial strings, that is good enough and you don't have to care about all those intricacies. From what I gathered, the goal is one-off semi-manual extraction of links from HTML generated by some other party, so even potential errors don't really matter… (The human in the loop can notice / fix things.)
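For illustration, here is a minimal sketch of how the "http://%S+%.mp3" pattern could be applied with string.gmatch. The sample HTML, the example.com URLs and the helper name extract_mp3_links are made up for this example and are not from the original thread.

```lua
-- Hypothetical sample input; the markup and URLs are invented for illustration.
local html = [[
<a href="http://example.com/music/track01.mp3">Track 1</a>
<a href="http://example.com/about.html">About</a>
<a href="http://example.com/music/track02.mp3">Track 2</a>
]]

-- Hypothetical helper: collect every substring that starts with "http://",
-- contains no whitespace, and ends with ".mp3".
local function extract_mp3_links(text)
  local links = {}
  for url in text:gmatch("http://%S+%.mp3") do
    links[#links + 1] = url
  end
  return links
end

for _, url in ipairs(extract_mp3_links(html)) do
  print(url)
end
```

Note that %S+ is greedy: if two links sit on the same line with no whitespace between them, a single match can run from the first 'http://' to the last '.mp3'. Using the lazy form "http://%S-%.mp3" stops at the first '.mp3' instead. For a one-off extraction with a human checking the result, either is probably fine.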
-- nobody