lua-l archive

No, keep strings as UTF-8 blobs. Erlang has found that this is the way to
keep things efficient, since most data you are dealing with does not need
serious per-codepoint analysis. lpeg can parse at the byte level and
identify the parts of the string that need more intensive processing.

Tony.
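Tony's point about byte-level parsing can be illustrated with a small sketch. It makes two substitutions for illustration: plain Python `bytes` stand in for a Lua string, and a simple split stands in for an lpeg grammar. `codepoint_counts` is a hypothetical helper. The sketch relies on a property of UTF-8: every byte of a multi-byte sequence is 0x80 or above, so an ASCII delimiter found at the byte level can never fall inside a multi-byte character. Only fields that actually contain non-ASCII bytes are decoded.

```python
# Sketch only: Python bytes stand in for a Lua string, and bytes.split
# stands in for an lpeg pattern. codepoint_counts is a hypothetical helper.

def codepoint_counts(data: bytes, sep: bytes = b",") -> list[tuple[bytes, int, bool]]:
    """Split UTF-8 data on an ASCII separator without decoding it first.

    Returns (field, codepoint_count, was_decoded) for each field.
    Splitting is safe because every byte of a multi-byte UTF-8 sequence
    is >= 0x80 and therefore never equals an ASCII separator byte.
    """
    results = []
    for field in data.split(sep):
        if field.isascii():
            # ASCII-only: byte length equals codepoint count, so no decode.
            results.append((field, len(field), False))
        else:
            # Only fields with non-ASCII bytes pay for a full UTF-8 decode.
            results.append((field, len(field.decode("utf-8")), True))
    return results
```

For example, `codepoint_counts("name,café,日本".encode("utf-8"))` decodes only the second and third fields. The ASCII field `name` is counted from its byte length alone.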

Off topic, but I believe Erlang stores each character as a 32-bit value, according to this:

http://schemecookbook.org/Erlang/StringBasics

"To understand why Erlang string handling is less efficient than a language like Perl, you need to know that each character uses 8 bytes of memory. That's right -- 8 bytes, not 8 bits! Erlang stores each character as a 32-bit integer, with a 32-bit pointer for the next item in the list (remember, strings are lists of characters.)"
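The quoted figure comes from simple per-character arithmetic. The sketch below compares it with the UTF-8 blob approach from Tony's message. It assumes two machine words per character (the integer plus the next-pointer, as the quote describes) and a 4-byte word, matching the quote's 32-bit figures. It ignores list and blob headers and GC overhead. The function names are hypothetical.

```python
# Rough memory estimate following the quoted figures. Assumptions: one list
# cell per character = 2 machine words (value + next pointer); headers, GC
# overhead and sharing are ignored; word_size=4 matches the 32-bit quote.

def list_string_bytes(text: str, word_size: int = 4) -> int:
    """Bytes for a string stored as a list of codepoints: 2 words each."""
    return len(text) * 2 * word_size

def utf8_blob_bytes(text: str) -> int:
    """Bytes for the same text stored as a UTF-8 byte blob (payload only)."""
    return len(text.encode("utf-8"))
```

Under these assumptions, `"hello"` costs 40 bytes as a list but 5 bytes as a UTF-8 blob, and the two-character string `"日本"` costs 16 bytes versus 6. With an 8-byte word (`word_size=8`), the list figures double.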