- Subject: Re: Lua pattern to match an url
- From: Philippe Verdy <verdy_p@...>
- Date: Wed, 25 Dec 2019 22:37:28 +0100
It matches too many things: not all characters are valid in a URI. Look at the specs in the relevant RFC, and see the RFC describing URL-encoding for why other punctuation and whitespace characters in the ASCII range must be escaped. A dot pattern would match them unexpectedly, making your matched URI much longer than expected.
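To illustrate, here is a minimal sketch of a stricter pattern. The helper name `find_mp3_urls` and the sample HTML are hypothetical, and the character class (alphanumerics plus `-`, `.`, `_`, `~`, `/`, `:` and `%`) is only an assumption about what the target URLs contain; it is not the full set allowed by the RFC.

```lua
-- Hypothetical helper: collect every http(s) URL ending in ".mp3" from a string.
-- Assumed character class: alphanumerics, "-", ".", "_", "~", "/", ":" and "%".
local function find_mp3_urls(html)
  local urls = {}
  -- The "-" after the set is a lazy repetition: stop at the first ".mp3".
  for url in html:gmatch('https?://[%w%-%._~/:%%]-%.mp3') do
    urls[#urls + 1] = url
  end
  return urls
end

-- Made-up sample input, for illustration only.
local sample = [[
<a href="http://midia.cmais.com.br/assets/audio/default/CENA_00087___P___24_12_10_1293450448.mp3">audio</a>
<a href="http://www.a.com/page.html">page</a> <a href="http://www.a.com/b/c/d.mp3">other</a>
]]

for _, url in ipairs(find_mp3_urls(sample)) do
  print(url)
end
```

Because the quote character and whitespace are not in the set, the match attempted at the `page.html` link fails instead of running on to the next `.mp3`. With `http://.-%.mp3`, that same attempt would swallow the markup between the two links.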
Oh! Wonderful! Problem solved. The dot is really lacking.
Thank you very much.
2019-12-25 16:49 GMT-03:00, luciano de souza <luchyanus@gmail.com>:
> I know my English needs to be improved. I'm sure my difficulty in
> explaining is due to it.
> I said "lua pattern"; perhaps it's better to say "Lua string
> pattern" or "Lua regex".
> I get a webpage:
>
> local content, status = http.request(url)
>
> Now I have the entire content of the webpage in the "content" variable.
> By means of Lua string patterns, I would like to get all URLs starting
> with "http" and ending with ".mp3". I know in advance that the target
> URLs end with ".mp3".
> I tried something like:
> for url in content:gmatch('(http[a-zA-Z0-9_/]-)%.mp3') do
>   print(url)
> end
>
> But the above string pattern can't match urls like:
> http://midia.cmais.com.br/assets/audio/default/CENA_00087___P___24_12_10_1293450448.mp3
>
> 2019-12-25 16:17 GMT-03:00, luciano de souza <luchyanus@gmail.com>:
>> Actually, the problem is not how to download the file. This can be
>> done as follows:
>>
>> local http = require('socket.http')
>>
>> local url =
>> 'http://midia.cmais.com.br/assets/audio/default/CENA_00087___P___24_12_10_1293450448.mp3'
>>
>> local content, status = http.request(url)
>> if status == 200 then
>>   local file = io.open('file.mp3', 'wb')
>>   file:write(content)
>>   file:close()
>> end
>>
>> The problem is that the URLs have to be found in an HTML file. In my
>> example, I used a known URL, but to find it a web page needs to be
>> scanned.
>> I have this URL:
>> http://culturafm.cmais.com.br/cena-brasileira/cena-brasileira.
>> Inside the HTML, I have lots of URLs. Some of them are direct links to
>> audio files. I don't need MIME types because I know the links end
>> with ".mp3".
>> So my problem is which pattern matches URLs starting with "http" and
>> ending with ".mp3". After obtaining the list of URLs pointing to audio
>> files, I can download each of them as I have done in the above
>> example.
>> Simplifying: if I have the URL 'http://www.a.com/b/c/d.mp3' or
>> something like that, which Lua pattern matches it?
>> Perhaps by explaining more than necessary in my first message, I was
>> confusing. Sorry!
>>
>> 2019-12-25 15:12 GMT-03:00, Philippe Verdy <verdy_p@wanadoo.fr>:
>>> There's no standard requiring download URLs to end in .mp3. The standard
>>> mechanism is the MIME type returned when querying a URL.
>>>
>>> You can query the MIME type of an http(s) URL without downloading it by
>>> using a HEAD request rather than GET.
>>>
>>> URLs have a standard syntax for parsing them, which distinguishes the
>>> protocol, the host name or address, a possible port number, a path and a
>>> query string. Not all of these parts are required, and none of them
>>> indicates a MIME type. Even if the path is frequently used to hint at
>>> the type, this is not required: the actual mp3 you request may be
>>> selected by the query string, and both may use a randomized encoding
>>> defined by the server, possibly depending on the user's session, i.e.
>>> cookies or additional parameters in query strings or in encoded form
>>> data submitted outside the URL, such as authentication parameters or
>>> user preferences set by form input variables (possibly hidden). Each
>>> web site then defines its own encodings and API for paths and query
>>> strings as well as form data.
>>>
>>> When your request succeeds, the download will send the binary mp3, its
>>> MIME type, and possibly a suggested name for storing the file in your
>>> local file system. The answer may also return an error status or an
>>> HTML page (such as a log-on form, or the reason why your request was
>>> denied by the server).
>>>
>>> On Wed, 25 Dec 2019 at 18:02, luciano de souza <luchyanus@gmail.com>
>>> wrote:
>>>
>>>> Hello all,
>>>> Cultura FM radio has some interesting audio recordings about classical
>>>> music. I'd like to download them automatically.
>>>> The steps are:
>>>> 1. Get the page with http.request;
>>>> 2. Match URLs starting with 'http://' and ending with '.mp3';
>>>> 3. Record the URLs in a file.
>>>> My problem is step 2. I could not find a pattern to match URLs
>>>> like:
>>>>
>>>> http://midia.cmais.com.br/assets/audio/default/CENA_00087___P___24_12_10_1293450448.mp3
>>>>
>>>> Let me show you my attempt:
>>>>
>>>> local http = require('socket.http')
>>>>
>>>> local target = 'http://culturafm.cmais.com.br/cena-brasileira/cena-brasileira'
>>>>
>>>> local content, status = http.request(target)
>>>>
>>>> if status == 200 then
>>>>   local file = io.open('url.txt', 'w')
>>>>   local pattern = '(http://[a-zA-Z0-9_/]-%.mp3)'
>>>>   for url in content:gmatch(pattern) do
>>>>     file:write(url, '\n') -- one URL per line
>>>>   end
>>>>   file:close()
>>>> end
>>>>
>>>> Would someone know a Lua pattern to match URLs starting with "http://"
>>>> and ending with '.mp3'?
>>>> Best regards,
>>>>
>>>> --
>>>> Luciano de Souza
>>>>
>>>>
>>>
>>
>>
>> --
>> Luciano de Souza
>>
>
>
> --
> Luciano de Souza
>
--
Luciano de Souza