- Subject: reading big files
- From: "Meer, H. van der" <H.vanderMeer@...>
- Date: Fri, 21 Jun 2013 09:04:26 +0000
In one of my programs I read files on the order of 1 GB or more. Measuring the program's behaviour, I concluded that a significant fraction of the processing time goes into reading the file. If possible I would like to reduce this (or learn to live with it, which is also an option).
In the processing stage I take parts of about 200 bytes at a time out of a larger buffer (buffersize is set to about 10 MB). Enlarging buffersize seems to do more harm than good. This is roughly how it is done:
    while true do
      -- Fill the buffer when empty and check for end of file.
      if bufferptr > #bufferin then
        bufferin = stream:read(buffersize)
        if bufferin == nil then
          break
        end
        bufferptr = packetoffset
      end
      -- consume buffer content in chunks of about 200 bytes
    end
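For readers who want something runnable, the loop above can be sketched as follows. This is only an illustration under stated assumptions: the fixed PACKET_SIZE of 200 bytes, the function name count_packets and the leftover handling for packets that straddle two blocks are all hypothetical and not taken from the actual program.

```lua
-- Sketch only: PACKET_SIZE and the fixed-length packet layout are assumptions.
local PACKET_SIZE = 200

-- Read `stream` in blocks of `buffersize` bytes and cut them into packets.
-- Returns the number of complete packets and the count of trailing bytes.
local function count_packets(stream, buffersize)
  local packets = 0
  local leftover = ""   -- tail of the previous block that held no full packet
  while true do
    local block = stream:read(buffersize)
    if block == nil then break end   -- end of file
    -- Only concatenate when needed, to avoid copying the whole block.
    local bufferin = (leftover == "") and block or (leftover .. block)
    local bufferptr = 1
    while bufferptr + PACKET_SIZE - 1 <= #bufferin do
      local packet = bufferin:sub(bufferptr, bufferptr + PACKET_SIZE - 1)
      -- ... process `packet` here ...
      packets = packets + 1
      bufferptr = bufferptr + PACKET_SIZE
    end
    leftover = bufferin:sub(bufferptr)
  end
  return packets, #leftover
end

-- Usage with a small scratch file: 1000 bytes read in blocks of 300.
local path = os.tmpname()
local out = assert(io.open(path, "wb"))
out:write(string.rep("x", 1000))
out:close()
local stream = assert(io.open(path, "rb"))
print(count_packets(stream, 300))   -- 5 packets, 0 bytes left over
stream:close()
os.remove(path)
```

Note that the concatenation of leftover and block copies the block whenever a packet straddles a boundary; a real implementation might prefer to handle the straddling packet separately.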
I tried to see what is happening by looking into liolib.c (version 5.2.2). However, I am not sure I understood the code well enough, so I would appreciate it if someone more knowledgeable could correct me where I err.
My impression is that stream:read(buffersize) first reads a chunk of size LUAL_BUFFERSIZE into a buffer and then, at each fread step in function read_all, enlarges the buffer (up to a certain maximum). In the process luaL_prepbuffsize is called and the data are transferred with a memcpy to a new, larger block of memory. Do I see correctly that, when reading very large chunks, these successive steps memcpy the data over and over until everything is read?
If correct, this might explain why enlarging buffersize seems more counterproductive than beneficial. In that case I see the following options:
1. Enlarging LUAL_BUFFERSIZE to 10 MB. But since it is a compile-time constant, that would be inefficient, as nearly all other reads are very small compared to the one in the code snippet above.
2. Rewriting the Lua source to make LUAL_BUFFERSIZE a settable variable. But I cannot foresee all the consequences of this and, moreover, I am no fan of fiddling with the Lua code. It is inconvenient (new releases) and imho unwise.
3. Writing a C module specific to these reads: do an fread into a malloc'ed chunk and hand that over directly.
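Before choosing among these options, a small timing harness might help confirm whether larger read requests are really slower. The sketch below is an assumption-laden illustration: time_read is a hypothetical helper, the scratch file and the list of sizes are placeholders, and os.clock reports CPU time rather than wall-clock time, so time spent waiting on the disk may not show up.

```lua
-- Hypothetical helper: read a whole file in requests of `buffersize` bytes
-- and return the CPU time used plus the number of bytes read.
local function time_read(path, buffersize)
  local stream = assert(io.open(path, "rb"))
  local started = os.clock()
  local total = 0
  while true do
    local block = stream:read(buffersize)
    if block == nil then break end   -- end of file
    total = total + #block
  end
  local elapsed = os.clock() - started
  stream:close()
  return elapsed, total
end

-- Placeholder input: a 4 MB scratch file. For a meaningful measurement,
-- substitute the path of a real multi-gigabyte file.
local path = os.tmpname()
local out = assert(io.open(path, "wb"))
out:write(string.rep("x", 4 * 1024 * 1024))
out:close()

-- Compare a few request sizes; the values are arbitrary examples.
for _, size in ipairs({ 64 * 1024, 1024 * 1024, 4 * 1024 * 1024 }) do
  local seconds, bytes = time_read(path, size)
  print(string.format("buffersize=%d bytes=%d cpu=%.4fs", size, bytes, seconds))
end
os.remove(path)
```

Running the same comparison on the real input, ideally several times to even out file-system caching, would show whether the slowdown grows with buffersize as the memcpy hypothesis predicts.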
But perhaps there is a better solution, and I would very much appreciate hearing your opinion.
Hans van der Meer