On 12/7/06, Roberto Ierusalimschy <roberto@inf.puc-rio.br> wrote:
If I understand correctly, even Asian languages use ASCII punctuation (dots, spaces, newlines, commas, etc.), which takes 1 byte in UTF-8 but 2 in UTF-16. So even for these languages, UTF-8 is not as much less compact as it seems.
I don't know about other Asian languages, but Japanese has special punctuation characters. There is even a wide character for space. Here are some of them with their ASCII equivalents; I hope your mail reader groks them.

  .   = 。
  ,   = 、
  " " = 「　」

(Note the wide space within the Japanese-style quotes.)

I believe newline is the same in Japanese character sets as it is in ASCII, and I presume this extends into UTF-8.

However, as some of the other readers have pointed out, many of the multibyte characters express denser ideas, so the number of ideas per byte is probably not too different from European languages. Here are some characters the Japanese use frequently, with their English equivalents. I have chosen non-Sino characters to try to make my point more relevant to the English-speaking readership.

  ☎ or ℡ = Tel (when listing telephone numbers)
  a 〜 b  = a to b, or from a to b

Ken Smith
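
As a rough check on the byte counts discussed above, here is a small Python sketch that encodes the characters mentioned in this thread and reports their size in UTF-8 and UTF-16. The helper name `encoded_sizes` and the sample table are illustrative assumptions, not anything from the original messages. The `utf-16-le` codec is used so that no byte-order mark is counted.

```python
# Compare encoded sizes of ASCII and Japanese punctuation.
# `encoded_sizes` and `SAMPLES` are hypothetical names chosen for this sketch.

SAMPLES = [
    ("ASCII full stop", "."),
    ("ASCII comma", ","),
    ("ASCII space", " "),
    ("newline", "\n"),
    ("ideographic full stop", "\u3002"),  # 。
    ("ideographic comma", "\u3001"),      # 、
    ("left corner bracket", "\u300c"),    # 「
    ("right corner bracket", "\u300d"),   # 」
    ("ideographic space", "\u3000"),      # wide space
    ("black telephone", "\u260e"),        # ☎
    ("telephone sign", "\u2121"),         # ℡
    ("wave dash", "\u301c"),              # 〜
]


def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, excluding any BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


if __name__ == "__main__":
    for name, ch in SAMPLES:
        utf8_len, utf16_len = encoded_sizes(ch)
        print(f"{name:22} U+{ord(ch):04X}  utf-8: {utf8_len}  utf-16: {utf16_len}")
```

Under these assumptions, each ASCII character comes out as 1 byte in UTF-8 and 2 in UTF-16, while each of the Japanese punctuation marks and symbols listed takes 3 bytes in UTF-8 and 2 in UTF-16. How that balances out for real text depends on the mix of characters, which this sketch does not measure.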