Sent: Tuesday, April 29, 2008 11:41 AM
Subject: Re: Unpooled strings?
One last note (because I thought of it afterward): if it's the cost of hashing the strings that bothers you (rather than the memory used by copying your long strings), the hash is not computed on the whole string but only on a sample of a few tens of its characters (I don't remember exactly how many), so hashing a 100 kB string costs the same as hashing a 100 B string.
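From memory, Lua 5.1's lstring.c does roughly this (the step size grows with the length, so the sampled characters are actually spread through the string rather than taken only from the beginning):

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of Lua 5.1's string hash (lstring.c, luaS_newlstr).
 * The step grows with the length, so at most ~32 characters are
 * ever mixed in, regardless of how long the string is. */
static unsigned int lua51_hash(const char *str, size_t l) {
    unsigned int h = (unsigned int)l;   /* seed with the length */
    size_t step = (l >> 5) + 1;         /* long string: don't hash all chars */
    size_t l1;
    for (l1 = l; l1 >= step; l1 -= step)
        h = h ^ ((h << 5) + (h >> 2) + (unsigned char)str[l1 - 1]);
    return h;
}

/* Count how many characters the loop above actually reads. */
static size_t chars_sampled(size_t l) {
    size_t step = (l >> 5) + 1, n = 0, l1;
    for (l1 = l; l1 >= step; l1 -= step) n++;
    return n;
}
```

So `chars_sampled(100 * 1024)` is still no more than 32 — hashing cost is effectively constant past a few dozen bytes.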
You should also check Mike Pall's faster string hashing algorithm. It generates a better hash from three 32-bit words of the string, but unfortunately isn't ANSI C (due to requiring unaligned loads).
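The idea is roughly this sketch (the constants and mixing below are mine, purely illustrative, not Mike's; I use memcpy rather than true unaligned loads, so it stays ANSI C at some speed cost):

```c
#include <stdint.h>
#include <string.h>

static uint32_t rot(uint32_t x, int k) { return (x << k) | (x >> (32 - k)); }

/* Illustrative only: mix the length with three 32-bit words read
 * from the start, middle and end of the string, so the work done
 * is constant no matter how long the string is. */
static uint32_t word_hash3(const char *str, size_t len) {
    uint32_t a = 0, b = 0, c = 0, h = (uint32_t)len;
    if (len >= 4) {
        memcpy(&a, str, 4);                /* first 4 bytes */
        memcpy(&b, str + len / 2 - 2, 4);  /* middle 4 bytes */
        memcpy(&c, str + len - 4, 4);      /* last 4 bytes */
    } else {
        memcpy(&a, str, len);              /* short string: all of it */
    }
    h ^= a; h = rot(h, 13) * 0x9e3779b1u;  /* mixing constants are */
    h ^= b; h = rot(h, 17) * 0x85ebca6bu;  /* illustrative, borrowed */
    h ^= c; h = rot(h, 5)  * 0xc2b2ae35u;  /* from common hash mixers */
    return h ^ (h >> 16);
}
```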
http://lua-users.org/lists/lua-l/2007-12/msg00238.html

Jerome is right, though: any function in the string library except string.len is likely to [greatly] exceed the time taken to intern the string, making the optimization moot.
Also, there's quite a bit of complexity there, and features are lost. E.g. without modifying the core, you could never get equality to work correctly between such strings (which, not being interned, would always require a full memory compare anyway), nor table hashing, for that matter. Mind you, this may not be an issue for multi-hundred-kB strings that are very likely to be unique.
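To illustrate what interning buys: in a minimal hypothetical intern pool (sketched with a linear scan here, where real Lua uses a hash table), equality between interned strings reduces to a pointer compare, because each distinct string is stored exactly once:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal intern pool, for illustration only.
 * Every distinct string is stored once, so equality between two
 * interned strings is a pointer compare instead of a memcmp over
 * the whole contents. */
#define POOL_MAX 256
static const char *pool[POOL_MAX];
static size_t pool_len = 0;

static const char *intern(const char *s) {
    size_t i;
    for (i = 0; i < pool_len; i++)
        if (strcmp(pool[i], s) == 0)
            return pool[i];               /* already present: reuse */
    assert(pool_len < POOL_MAX);
    {
        char *copy = malloc(strlen(s) + 1);  /* copy cost paid once here */
        strcpy(copy, s);
        pool[pool_len++] = copy;
        return copy;
    }
}
```

With this, `intern("foo") == intern("foo")` holds as a plain pointer comparison; a non-interned userdata string has no such canonical address, so `==` on it has to fall back to comparing the full contents.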
One thing puzzles me, though: if they're constant, why not just push them at the start of the program? On an embedded processor I can see the problem, but otherwise memcpy isn't -that- slow. If they're not constant, I'm intrigued as to how you can guarantee their safety in the case where they aren't "forgotten". The only real use I can see for non-interned strings is to make them fully mutable, which would bring large performance gains for some applications.
If you're still keen, though, check http://lua-users.org/wiki/SpeedingUpStrings by Rici Lake. It provides something similar to what you want, but modifies the internals instead of using userdata (i.e. it should be faster). Do note, though, that the project was shelved because the gains were not as large as expected.
- Alex