I know my English needs to improve, and I'm sure that is why my
explanation was hard to follow.
When I said "Lua pattern", perhaps I should have said "Lua string
pattern" or "Lua regex".
I fetch a web page:

local content, status = http.request(url)

Now the "content" variable holds the entire content of the web page.
Using Lua string patterns, I would like to get all URLs that start
with "http" and end with ".mp3". I know in advance that the target
URLs end with ".mp3".
I tried something like:
for url in content:gmatch('(http[a-zA-Z0-9_/]-)%.mp3') do
    print(url)
end

But the above string pattern can't match URLs like:
http://midia.cmais.com.br/assets/audio/default/CENA_00087___P___24_12_10_1293450448.mp3
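
The attempt above fails because the set [a-zA-Z0-9_/] contains neither
":" nor ".", so the match cannot get past "http" to reach "://" or the
dots in the host name. A minimal sketch of one pattern that might work,
assuming the URLs in the page are delimited by quotes, whitespace or
angle brackets (as in typical HTML attributes). The helper name
find_mp3_urls and the sample HTML are hypothetical, made up for
illustration:

```lua
-- Hypothetical helper (not from the thread): collect every URL in `html`
-- that starts with http:// or https:// and ends with .mp3.
local function find_mp3_urls(html)
  local urls = {}
  -- https?://     the scheme, with an optional "s"
  -- [^%s"'<>]-    shortest run of characters that are not whitespace,
  --               quotes or angle brackets (assumed URL delimiters)
  -- %.mp3         a literal ".mp3"
  for url in html:gmatch('https?://[^%s"\'<>]-%.mp3') do
    urls[#urls + 1] = url
  end
  return urls
end

-- Made-up sample page: two audio links and one ordinary link.
local sample = [[
<a href="http://midia.cmais.com.br/assets/audio/default/CENA_00087___P___24_12_10_1293450448.mp3">audio</a>
<a href="http://www.a.com/page.html">not audio</a>
<a href='http://www.a.com/b/c/d.mp3'>audio</a>
]]

for _, url in ipairs(find_mp3_urls(sample)) do
  print(url)
end
```

Because "-" is a lazy quantifier, each match stops at the first ".mp3"
it reaches, so a link such as "x.mp3?id=1" would be returned as
"x.mp3". The negated set keeps a match from running across the closing
quote of an unrelated link into a later ".mp3".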

2019-12-25 16:17 GMT-03:00, luciano de souza <luchyanus@gmail.com>:
> Actually, the problem is not how to download the file. This can be
> done as follows:
>
> local http = require('socket.http')
>
> local url =
> 'http://midia.cmais.com.br/assets/audio/default/CENA_00087___P___24_12_10_1293450448.mp3'
>
> local content, status = http.request(url)
> if status == 200 then
>     local file = io.open('file.mp3', 'wb')
>     file:write(content)
>     file:close()
> end
>
> The problem is that the URLs have to be found inside an HTML file. In
> my example I used a known URL, but to find it a web page needs to be
> scanned.
> I have this URL:
> http://culturafm.cmais.com.br/cena-brasileira/cena-brasileira.
> Inside the HTML there are lots of URLs. Some of them are direct links
> to audio files. I don't need MIME types because I know the links end
> with ".mp3".
> So my problem is which pattern matches URLs that start with "http" and
> end with ".mp3". After obtaining the list of URLs pointing to audio
> files, I can download each of them as I did in the above example.
> Simplifying: if I have the URL 'http://www.a.com/b/c/d.mp3' or
> something like that, which Lua pattern matches it?
> Perhaps by explaining more than necessary in my first message, I was
> confusing. Sorry!
>
> 2019-12-25 15:12 GMT-03:00, Philippe Verdy <verdy_p@wanadoo.fr>:
>> There's no standard requiring download URLs to end in .mp3. The standard
>> mechanism is the MIME type returned when querying a URL.
>>
>> You can query the MIME type of an http(s) URL without downloading it by
>> using a HEAD request rather than GET.
>>
>> URLs have a standard syntax for parsing them, which distinguishes the
>> protocol, the host name or address, a possible port number, a path and a
>> query string. All are required but none of them indicate a MIME type. And
>> if the path part is frequently used to indicate the MIME type, this is not
>> required, as the actual mp3 you request may be selected by the query
>> string, and both may use a randomized encoding defined by the server and
>> possibly depending on the user's session, i.e. cookies or additional
>> parameters in query strings or in encoded form data submitted outside the
>> URL, such as authentication parameters or user preferences set by form
>> input variables (possibly hidden). Each web site then defines its own
>> encodings and API for path and query strings as well as form data.
>>
>> When your request succeeds, the download will send the binary mp3, its
>> MIME type, and possibly a suggested name for storing the file in your
>> local file system. The answer may also return an error status or an HTML
>> page (such as a log-on form, or the reason why your request was denied by
>> the server).
>>
>>
>> Le mer. 25 déc. 2019 à 18:02, luciano de souza <luchyanus@gmail.com> a
>> écrit :
>>
>>> Hello all,
>>> Cultura FM radio has some interesting audio recordings about classical
>>> music. I'd like to download them automatically.
>>> The steps are:
>>> 1. Get the page with http.request;
>>> 2. Match URLs that start with 'http://' and end with '.mp3';
>>> 3. Record the URLs in a file.
>>> My problem is step 2. I could not find a pattern to match URLs like:
>>>
>>> http://midia.cmais.com.br/assets/audio/default/CENA_00087___P___24_12_10_1293450448.mp3
>>>
>>> Let me show you my attempt:
>>>
>>> local http = require('socket.http')
>>>
>>> local target = 'http://culturafm.cmais.com.br/cena-brasileira/cena-brasileira'
>>>
>>> local content, status = http.request(target)
>>>
>>> if status == 200 then
>>>         local file = io.open('url.txt', 'w')
>>>         local pattern = '(http://[a-zA-Z0-9_/]-%.mp3)'
>>>         for url in content:gmatch(pattern) do
>>>                 file:write(url, '\n')
>>>         end
>>>         file:close()
>>> end
>>>
>>> Would someone know a Lua pattern to match URLs that start with "http://"
>>> and end with '.mp3'?
>>> Best regards,
>>>
>>> --
>>> Luciano de Souza
>>>
>>>
>>
>
>
> --
> Luciano de Souza
>


-- 
Luciano de Souza