The other thing is: if your text editor stores files in UTF-8, the é is probably occupying two bytes (0xC3 0xA9). In that case %w won't work regardless of locale, because %w classifies one byte at a time; Lua strings are essentially byte vectors. (Or dare I say immutable byte tuples? :)
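A minimal sketch of the byte-level behaviour, assuming Lua 5.1 or later (where string.gfind was renamed string.gmatch) and the default "C" locale, which the standalone interpreter uses unless something calls os.setlocale. The é is written as decimal escapes so the result doesn't depend on how your editor saves the file:

```lua
-- Assumes Lua 5.1+ and the default "C" locale.
-- "\195\169" is the UTF-8 encoding of é (bytes 0xC3 0xA9).
local e_acute = "\195\169"

print(#e_acute)                     --> 2   (two bytes, one visible character)
print(string.byte(e_acute, 1, -1))  --> 195	169

-- Same loop as in the original script, using the 5.1 name string.gmatch.
-- %w tests each byte on its own; neither 195 nor 169 is alphanumeric
-- in the "C" locale, so the é is silently skipped.
local words = {}
for word in string.gmatch("Walter " .. e_acute, "%w+") do
  words[#words + 1] = word
end
print(#words, words[1])             --> 1	Walter
```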
Either way, deciding what counts as a "word" character can be quite challenging. The C standard library function on which %w is based, isalnum(), is used to find letters which can "normally be part of an identifier", that is, an identifier in a programming language. Human identifiers (that is, names) can be quite a bit more complex. For example, O'Reilly and Dell'omo are pretty common surnames in some places, and the ' character would also throw off %w.
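One crude workaround, sketched here under the assumption that the input is UTF-8, is to treat every byte from 0x80 to 0xFF as a word byte (so multibyte letters stay in one piece) and to allow apostrophes inside words. The helper name split_words is hypothetical, not part of any library, and the heuristic is deliberately rough: non-ASCII punctuation such as « or a no-break space is also made of bytes >= 0x80 and will be glued into words.

```lua
-- Hypothetical helper; assumes Lua 5.1+ and UTF-8 input.
local function split_words(s)
  local words = {}
  -- %w: ASCII letters and digits (in the "C" locale)
  -- \128-\255: every byte of a multibyte UTF-8 sequence
  -- ': apostrophes, so O'Reilly and Dell'omo stay whole
  for chunk in string.gmatch(s, "[%w\128-\255']+") do
    -- strip apostrophes used as quote marks ('quoted' -> quoted);
    -- the chained gsub keeps only the string, dropping the match count
    local word = chunk:gsub("^'+", ""):gsub("'+$", "")
    if word ~= "" then
      words[#words + 1] = word
    end
  end
  return words
end

local t = split_words("Walter \195\169 O'Reilly Dell'omo 'quoted'")
for i, w in ipairs(t) do
  print(i, w)
end
```

The result is Walter, é, O'Reilly, Dell'omo and quoted. A real solution would need proper Unicode character classes, which Lua's patterns don't provide.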
R.

On 4-Nov-05, at 7:56 AM, Walter Cruz wrote:
Hi all.

I'm using the lua package from Debian (I'm using unstable), but there's something strange. This little script:

_______
x = "Walter é"
print(x)
t={}
for word in string.gfind(x, "%w+") do
  table.insert(t,word)
end
table.foreach(t,print)
___________

returns:

_____
Walter é
1 Walter
_______

The accented char is lost. I have downloaded Lua 5.1 beta and compiled it, but the behaviour is the same. I don't know what is causing that :(

[]' - Walter