- Subject: Re: question about Unicode
- From: Robert Raschke <r.raschke@...>
- Date: Fri, 8 Dec 2006 09:41:09 +0000
Glenn Maynard wrote:
> On Thu, Dec 07, 2006 at 03:44:05PM -0500, Brian Weed wrote:
>> Asko Kauppi wrote:
>> >But there may be some identifier "stamp" that can be used to know a
>> >file is UTF-8, no?
>> There are two that I know of. I don't know how "standard" they are.
>> One is called a BOM Header, which is some binary code in the first 2
>> bytes of the "text" file.
>
> Three: 0xEF 0xBB 0xBF. Don't use that unless you're writing
> Windows-specific stuff and you really need to be compatible with
> other Windows applications that expect it--it's not "binary" any
> more than any other UTF-8 character, but text file encodings do not
> have headers! (And if you--the reader, not Brian Weed--do use this,
> make it a save-time option and disable it by default if possible.)
>
Just yesterday I broke down and added a UTF-8 BOM (0xEF 0xBB 0xBF)
"handler" to luaL_loadfile(), because I had foolishly said in my
documentation to save config files (i.e., Lua sources in disguise) as
UTF-8. The resulting flurry of support requests about errors like
test-utf8bom.conf:1: `=' expected near `»'
or
test-utf8bom.conf:1: unexpected symbol near `ï'
which appeared because people were editing the files in Notepad, made
my last few months reasonably uncomfortable.
All I do now is, upon loading the file, look for those dreaded three
bytes and skip them.
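A minimal sketch of the "look for the three bytes and skip them" step, assuming the whole config file has already been read into memory. The helper name `utf8_bom_length` is hypothetical and not part of the Lua API; it only reports how many leading bytes to drop. This is not the actual patch described above, which modified luaL_loadfile() itself.

```c
#include <stddef.h>
#include <string.h>

/* The three bytes Notepad writes at the start of a "UTF-8" file:
   U+FEFF encoded as UTF-8. */
static const unsigned char UTF8_BOM[3] = {0xEF, 0xBB, 0xBF};

/* Hypothetical helper: return the number of leading bytes to skip in a
   chunk already read into memory, i.e. 3 if buf starts with a UTF-8 BOM
   and 0 otherwise. Buffers shorter than 3 bytes never match, and buf is
   not dereferenced when len is too small. */
size_t utf8_bom_length(const char *buf, size_t len)
{
    if (len >= sizeof UTF8_BOM &&
        memcmp(buf, UTF8_BOM, sizeof UTF8_BOM) == 0)
        return sizeof UTF8_BOM;
    return 0;
}
```

With `skip = utf8_bom_length(buf, len)`, the remaining bytes could then be handed to `luaL_loadbuffer(L, buf + skip, len - skip, chunkname)`. Using `"@"` followed by the file name as `chunkname` (as luaL_loadfile does) keeps error messages pointing at the config file. This approach leaves the Lua library itself unpatched, at the cost of reading the file into memory first.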
Robby