David Given wrote:
Marco Antonio Abreu wrote:
When a field value has one accented char, it truncates the last one ('Flávia' comes out like 'Fl??vi', where ?? are special chars); if the text has two accented chars, it has the last two chars cut, and so on...

This is a classic symptom of UTF-8 misparsing.
Kind of. In fact, the problem is that LuaCOM is truncating characters.

The issue is this. There's a function to convert from BSTR (UTF-16 strings, as used by COM) to Lua strings. When converting "Flávia", it computes its size (6 characters) and converts it to UTF-8 (which gives a 7-byte string, since "á" takes two bytes), BUT it pushes just 6 bytes to Lua (instead of the required 7).
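So the strings got truncated depending (roughly) on the number of non-ASCII code points present: each one costs at least one extra byte in UTF-8, and that many bytes are lost from the end.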
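To make the arithmetic concrete, here is a minimal sketch of the mistake in Python. This is not LuaCOM's actual code (which is C++); the function names `bstr_to_lua_buggy` and `bstr_to_lua_fixed` are hypothetical, and Python's `str.encode` stands in for the real UTF-16 to UTF-8 conversion. It only illustrates what happens when the UTF-16 length is reused as the UTF-8 byte count.

```python
def bstr_to_lua_buggy(text: str) -> bytes:
    """Mimic the bug: reuse the UTF-16 length as the UTF-8 byte count."""
    # Length of the string in 16-bit units, as a BSTR would report it.
    utf16_units = len(text.encode("utf-16-le")) // 2
    utf8 = text.encode("utf-8")
    # Too short whenever any character needs more than one byte in UTF-8.
    return utf8[:utf16_units]


def bstr_to_lua_fixed(text: str) -> bytes:
    """Hand over the whole converted buffer, using its own byte length."""
    return text.encode("utf-8")


name = "Fl\u00e1via"  # "Flávia" with a precomposed U+00E1

print(len(name.encode("utf-16-le")) // 2)  # 6 UTF-16 code units
print(len(name.encode("utf-8")))           # 7 UTF-8 bytes
print(bstr_to_lua_buggy(name))             # b'Fl\xc3\xa1vi'  (final "a" lost)
print(bstr_to_lua_fixed(name))             # b'Fl\xc3\xa1via'
```

The buggy result matches the reported 'Fl??vi': the two bytes of "á" show up as two odd characters when not decoded as UTF-8, and the trailing "a" is gone.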
I'll push a fix for that to LuaCOM.

Regards,
Ignacio Burgueño