Well, that’s true of the ZWNBSP *codepoint* U+FEFF, which of course encodes to 0xEF/0xBB/0xBF. But what about dumb encoders that encode a big-endian UTF-16 sequence into UTF-8 and emit a byte-swapped encoding for the BOM? The problem is that UTF-8 *should* be decoded directly: UTF-8 -> codepoint array. Instead it’s (shudder) often used to decode UTF-8 -> UTF-16 -> (byte-swap based on BOM) -> codepoint array. It’s one reason I detest Unicode.

—Tim
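
P.S. For concreteness, a quick Python sketch (my own illustration, nothing normative; the byte strings are made-up examples) of what a byte-swapped BOM looks like at the UTF-8 byte level, next to the direct decode path:

```python
# U+FEFF (ZWNBSP / BOM) encodes to EF BB BF in UTF-8.
print("\ufeff".encode("utf-8").hex())      # efbbbf

# An encoder that gets the UTF-16 byte order wrong sees the BOM as U+FFFE
# instead, which encodes to EF BF BE -- the byte-swapped BOM in UTF-8 form.
print("\ufffe".encode("utf-8").hex())      # efbfbe

# The sane decode path: UTF-8 bytes -> codepoint array, with no UTF-16 or
# byte-order logic anywhere in the middle.
data = b"\xef\xbb\xbfHi"                   # hypothetical input starting with a BOM
codepoints = [ord(c) for c in data.decode("utf-8")]
print([hex(cp) for cp in codepoints])      # ['0xfeff', '0x48', '0x69']
```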