- Subject: Re: ?problem with read("*l") with binary data
- From: Tom Oehser <tom@...>
- Date: Tue, 3 Apr 2001 15:22:16 +0000 (America/New_York)
> From: Roberto Ierusalimschy <roberto@i...>
> Actually, read("*l") does not handle binary data. We thought that the
> concept of "line" itself does not make sense in a binary file (that's
> why the manual did not explicitly explain this "limitation"). To read
> binary data, you can use read('*all') (to read the whole file) or
> read(NUMBER), to read a fixed number of bytes.
But this is problematic when implementing a utility that must handle
arbitrary data that is presumed to *usually* be lines of ASCII. Take
'grep', for example: *usually* the person running the program is
targeting lines, but it also has to be able to give the binary offset
and the line number when used on a file that contains nulls. In
line-oriented utilities it makes the most sense to read by lines, keeping
a byte counter and a line counter and scanning one line at a time.
Reading the whole file is not an option when file sizes may exceed the
virtual memory limits, and reading by number is problematic when a
pattern might straddle two buffers.

I'll end up just writing my own read_line() that doesn't use the C
null-terminated-string functions. Actually, I'll probably modify libio,
because what is the point of having two read_line() functions?

The same kind of issues apply to 'wc', for example: it needs to look at
lines, but it should also be able to give just a byte count of an
arbitrary file. Granted, if you are writing for known data and known
uses, you can use *l only where the data has no nulls. But writing
utilities for arbitrary use doesn't give me that luxury, and I want them
to work such that I can regression test against the GNU C versions. And
it isn't really desirable to do what I've been doing, that is, calling
read(1) and building my own string one character at a time, or reading
into a table one byte per entry... yuck. Certainly it won't *break*
anything for libio to allow nulls in a string on *l. I'll send a patch
once I fix it...
-Tom
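
For illustration only, here is a minimal sketch in plain C of the kind of
NUL-safe line reader described above. It is not the libio patch and does
not reflect Lua's actual source; the names read_line_binary, counts and
count_lines_bytes are hypothetical. The idea is simply to read with getc()
and carry an explicit length, so that no null-terminated-string function
ever decides where the line ends.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read one line from f into a growable buffer.
 * Embedded NUL bytes are kept, because the length is tracked in *len
 * rather than found with strlen(). The newline itself is not stored;
 * *had_newline reports whether the line ended with one.
 * Returns 1 if a line was read, 0 at end of file, -1 if out of memory. */
static int read_line_binary(FILE *f, char **buf, size_t *cap,
                            size_t *len, int *had_newline)
{
    int c;

    assert(f != NULL && buf != NULL && cap != NULL);
    assert(len != NULL && had_newline != NULL);

    *len = 0;
    *had_newline = 0;
    while ((c = getc(f)) != EOF) {
        if (c == '\n') {
            *had_newline = 1;
            return 1;
        }
        if (*len == *cap) {
            /* Grow the buffer geometrically. */
            size_t ncap = *cap ? *cap * 2 : 128;
            char *nbuf = realloc(*buf, ncap);
            if (nbuf == NULL)
                return -1;
            *buf = nbuf;
            *cap = ncap;
        }
        (*buf)[(*len)++] = (char)c;
    }
    /* A final line without a trailing newline still counts as a line. */
    return *len > 0 ? 1 : 0;
}

/* Hypothetical 'wc'-style totals: newline count and byte count. */
typedef struct {
    unsigned long lines;
    unsigned long bytes;
} counts;

/* Scan f line by line, keeping a line counter and a byte counter.
 * Returns 0 on success, -1 if out of memory. */
static int count_lines_bytes(FILE *f, counts *out)
{
    char *buf = NULL;
    size_t cap = 0, len = 0;
    int nl = 0, r;

    out->lines = 0;
    out->bytes = 0;
    while ((r = read_line_binary(f, &buf, &cap, &len, &nl)) == 1) {
        out->bytes += (unsigned long)len + (nl ? 1UL : 0UL);
        if (nl)
            out->lines++;
    }
    free(buf);
    return r == 0 ? 0 : -1;
}
```

A 'grep'-style caller could keep the same two counters and report the
running byte total as the offset of each matching line. Note that the
files should be opened in binary mode ("rb") on platforms where text mode
translates line endings, otherwise the byte offsets will be off.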