Sorry your browser is not supported!

You are using an outdated browser that does not support modern web technologies, in order to use this site please update to a new browser.

Browsers supported include Chrome, FireFox, Safari, Opera, Internet Explorer 10+ or Microsoft Edge.

AppGameKit Classic Chat / Hashing, creating an associative array?

Author
Message
Phaelax
DBPro Master
21
Years of Service
User Offline
Joined: 16th Apr 2003
Location: Metropia
Posted: 27th Feb 2014 15:18
Basically, I'm trying to use a string as an array's index. I've looked up a few simple hashing algorithms online, but how do I wrap that to a number within the array's range?

Le Verdier
12
Years of Service
User Offline
Joined: 10th Jan 2012
Location: In the mosh-pit
Posted: 27th Feb 2014 16:51 Edited at: 27th Feb 2014 16:53
Basically the returned hash should be a number so the index can be i=h mod arraysize
I read somewhere that arraysize as prime number give better result.
From here two options, depending of what it used for:
-The hash is intended to be unique: the program should complain the developper to chose another string
-Handle hash collision: the elements of the array are the start of a chained list. All the entries of the list have to be compared. Eg

Type entry
Nextentry
Entrystring$
Resultdata
Endtype

The point is for large arrays, there is only a few (depends of the entry count and array size) entries to compare, plus the hash compute cost

Phaelax
DBPro Master
21
Years of Service
User Offline
Joined: 16th Apr 2003
Location: Metropia
Posted: 27th Feb 2014 18:41
I got this hash algorithm online somewhere, maybe it's not correct. Because when I did try to mod the hash by the array size, most of my indices were 0.

Basically, I need to look up an array item based on a given string. I figured hashing would be better than looping through all elements until I find a match. Sorting and a binary search can speed that up but I still think this would be better if I can get it to work.




Le Verdier
12
Years of Service
User Offline
Joined: 10th Jan 2012
Location: In the mosh-pit
Posted: 27th Feb 2014 19:38
5 is too small !!
I tried 223 and the values seem OK..

This technique is worth for large amount of data..

Marl
12
Years of Service
User Offline
Joined: 19th Nov 2011
Location: Bradford, UK
Posted: 27th Feb 2014 19:48 Edited at: 27th Feb 2014 19:54
IMO, It's overly complicated, particularly for your needs.

You could probably get away with something like like;

then use;

index = mod( hash( string$ ) , arraySize )

But as Le Verdier mentions, you have to code for duplicate hash values so unless you're using massive arrays a look-up would probably be a better option.

Edit:
you could change the addition to;

Basically, you are just mixing it up so that words like "time" and "mite" produce different hashes even though they have the same letters (plain ascii adding would produce the same values)
JimHawkins
14
Years of Service
User Offline
Joined: 26th Jul 2009
Location: Hull - UK
Posted: 27th Feb 2014 19:59
It's usual theory that your array size should be at least twice as big as the largest amount of data. You also need to allow for collisions, and then have a non-hashed linked list of the collision entries.

It's really not worth it for small numbers of items. Very powerful for large systems, though. I use an array with 50000 slots for our server software to match a computer name to other data. Takes less than a millisecond, on average.

-- Jim - When is there going to be a release?
The Daddy
15
Years of Service
User Offline
Joined: 13th Jan 2009
Location: Essex
Posted: 27th Feb 2014 20:27
Jim

Do you use a hash table or sql?

www.bitmanip.com
All the juicy you could ever dream of!
Le Verdier
12
Years of Service
User Offline
Joined: 10th Jan 2012
Location: In the mosh-pit
Posted: 27th Feb 2014 20:32 Edited at: 27th Feb 2014 20:34
It is not related to strings but it used hashing in my agttc game. It is not string but 3d coords.
It is very efficient for sorting the nearest objects out of hundreds of items.



Phaelax
DBPro Master
21
Years of Service
User Offline
Joined: 16th Apr 2003
Location: Metropia
Posted: 27th Feb 2014 20:35
I thought collisions were only a problem if I had a lot of data. And if I'd have to double my array size, then a hash table isn't exactly memory efficient it looks like.

If I had a couple hundred elements, you think a lookup would still be a better option? I don't have enough data to know yet exactly how big this array can be. Could be 5, could be 1k.

JimHawkins
14
Years of Service
User Offline
Joined: 26th Jul 2009
Location: Hull - UK
Posted: 27th Feb 2014 23:58 Edited at: 27th Feb 2014 23:59
There's no such thing as a perfect hashing algorithm my elders and betters tell me.

This is the fairly simple one I use:



This produces an unsigned number which we can then MOD with the size of the hashtable (in my case a big file) to get a starting point. If the slot is empty we simply put the data in. If it's not, we have to traverse a linked list off this point and find an empty slot. We then link the previous entry down to this index.

It's quite complex to code the insertion:


It's absolutely not worth going to all this trouble for small lists. If the data is sorted, a binary search is very fast. If it's coming in randomly, a tree can be good, but if it's sorted a tree just becomes a list (degenerate).

Comparing an unsigned number is always massively faster than comparing strings, so for less than (say) 1000 items if you have two arrays - the strings(string) and the hashcode (integer) which are kept in sync you can:

* Compute the hashcode for the string you're searching for.
* Traverse the integer array. If the numbers match, then compare the strings.
* If you found it - fine - if not either add it or say you didn't find it.

A huge amount depends upon how many data items you want to compare, so there's no easy answer to which algorithm to use. The same thing applies to sorting. If you have <50 items, a Bubble Sort may actually be simpler and quicker than a Quicksort.

-- Jim - When is there going to be a release?

Login to post a reply

Server time is: 2024-05-02 15:17:28
Your offset time is: 2024-05-02 15:17:28