Re: string hash

lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]

Subject: Re: string hash
From: spir <denis.spir@...>
Date: Fri, 14 Dec 2012 10:22:04 +0100

On 13/12/2012 08:21, Richter, Jörg wrote:

Strings are stored in a hash table but the string hash value is reduced
modulo the size of the hash table and so it helps to compare string hash
values for colliding strings before resorting to memcmp. See
	http://www.lua.org/source/5.2/lstring.c.html#luaS_newlstr


After reading this I applied the same optimization to my hash table
implementation.
Are you sure this is worthwhile? My unrepresentative performance tests are
somewhat slower (2-3%) than without this extra compare.

But I have no performance tests for Lua to double check.

Perhaps anyone can run their tests with "h == ts->tsv.hash &&" commented out.

    Jörg

So, here are some measures. One issue is: what to choose as supposedlyrepresentative input data? In particular, the proportion of equal strings (ones,which are actually stored in the pool, either because we mean to look them up orby chance) determines the number of petential memcmp's and thus the possibleinfluence of checking the hash before. So, I decided to feed the pool withrandom strings of fix length=7 (so that len check does not pre-filter, but it'sunreal) among an alphabet of varying number of letters. Which gives:


without hash check:

#strings:1000000  #letters:6  %equal:27
time  : 0.450

#strings:1000000  #letters:7  %equal:58
time  : 0.470

#strings:1000000  #letters:8  %equal:80
time  : 0.480

with hash check:

#strings:1000000  #letters:6  %equal:27
time  : 0.450

#strings:1000000  #letters:7  %equal:58
time  : 0.470

#strings:1000000  #letters:8  %equal:80
time  : 0.490

%equal is the proportion of equal strings. Numbers are fairly constant amongdifferent runs. Time is CPU time (just diff in clock()). So, about no influenceat all in this test case. At best (or worst) I get ~ 2% in disfavour of checkingthe hash with high proportions of equal strings. Conclusion: in absence of moreinfo (such as the reason why it's there in Lua), let it down.

However, I initially placed the hash in the string struct (as is done in Lua, apoint I stepped on by chance and was the reason for my question) for anotherreason: i keep the hash anyway to avoid re-hashing when the pool grows, so whynot put it in the string instead of on the cell struct (node)? If ever useful,it's there, while someone having access to the string but not to the cell wouldjust not have it...


Denis

Follow-Ups:
- RE: string hash, Richter, Jörg

References:
- string hash, spir
- Re: string hash, Luiz Henrique de Figueiredo
- RE: string hash, Richter, Jörg

Prev by Date: Re: string hash
Next by Date: Re: Conventional name proposal: merge
Previous by thread: Re: string hash
Next by thread: RE: string hash
Index(es):
- Date
- Thread