There's no such thing as a perfect hashing algorithm, my elders and betters tell me.
This is the fairly simple one I use:
function StrToHash(const s: AnsiString): Cardinal;
var
  i: integer;
  u: AnsiString;
begin
  u := UpperCase(s); // can't assign to a const parameter, so work on a copy
  result := Length(u) * 111;
  if u = '' then exit;
  for i := Length(u) downto 1 do
    result := result + (Ord(u[i]) * 2049); // a plain sum, so anagrams hash alike
end;
This produces an unsigned number which we can then MOD by the size of the hash table (in my case a big file of fixed-size records) to get a starting slot. If the slot is empty we simply put the data in. If it's not, we have to follow the chain off this point until we find an empty slot, then link the previous entry down to this index.
It's quite complex to code the insertion:
function AddStudentRecord(hfile: TFileStream; thiskey: AnsiString; staffkey: AnsiString; thisini: AnsiString): boolean;
var
  initial, index, traverse, j, previous: cardinal;
  recfind, rectraverse, recnew: THashRecord;
begin
  result := false;
  if not assigned(hfile) then exit;
  FillChar(recnew, HashRecordSize, #0);
  for j := 1 to Length(thiskey) do
    recnew.key[j - 1] := thiskey[j];
  for j := 1 to Length(staffkey) do
    recnew.staffid[j - 1] := staffkey[j];
  if Length(thisini) > 0 then
  begin
    for j := 1 to Length(thisini) do
      recnew.ini[j - 1] := thisini[j];
    recnew.hasIni := 1;
  end;
  recnew.isActive := 1;
  WLog(thiskey);
  // compute the primary hash key
  initial := StrToHash(thiskey);
  index := initial mod HashTableMax;
  hfile.Seek(index * HashRecordSize, soBeginning);
  hfile.Read(recfind, HashRecordSize);
  // do we have a collision?
  if recfind.isActive = 1 then
  begin
    inc(collisions);
    previous := index;
    repeat
      traverse := (previous + 1) mod HashTableMax;
      WLog('bucket: ' + IntToStr(traverse));
      hfile.Seek(traverse * HashRecordSize, soBeginning);
      hfile.Read(rectraverse, HashRecordSize);
      if rectraverse.isActive = 1 then previous := traverse;
    until rectraverse.isActive <> 1;
    // now put the new data in and update the pointer in the last traversed record
    WLog(format('previous = %d current = %d', [previous, traverse]));
    hfile.Seek(traverse * HashRecordSize, soBeginning);
    hfile.Write(recnew, HashRecordSize);
    hfile.Seek(previous * HashRecordSize, soBeginning); // should now point at the previous active record
    hfile.Read(rectraverse, HashRecordSize);
    rectraverse.uplink := traverse;
    hfile.Seek(previous * HashRecordSize, soBeginning); // could wind back instead, but this builder is not time-critical
    hfile.Write(rectraverse, HashRecordSize);
  end
  else
  begin
    // simple insertion: the home slot is free
    hfile.Seek(index * HashRecordSize, soBeginning);
    hfile.Write(recnew, HashRecordSize);
    WLog('simple: ' + IntToStr(index));
  end;
  result := true;
end;
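Since the post only shows insertion, here's a hedged sketch of the same scheme in Python for anyone who wants to play with it: hash, probe forward on a collision, link the chain down to the new slot. The lookup probes from the home slot until the key or an empty slot turns up (it assumes records are never deleted). Names like `HASH_TABLE_MAX`, `is_active` and `uplink` just mirror my Pascal identifiers, and an in-memory list stands in for the file:

```python
HASH_TABLE_MAX = 8  # tiny table for illustration; the real file is far larger

def str_to_hash(s):
    # Python port of StrToHash: length * 111 plus 2049 * each character code
    s = s.upper()
    return len(s) * 111 + sum(ord(c) * 2049 for c in s)

def make_table():
    return [{"key": None, "is_active": 0, "uplink": None}
            for _ in range(HASH_TABLE_MAX)]

def add_record(table, key):
    index = str_to_hash(key) % HASH_TABLE_MAX
    if table[index]["is_active"] != 1:
        table[index] = {"key": key, "is_active": 1, "uplink": None}
        return index
    # collision: probe forward for a free slot (no full-table guard in this
    # sketch), remembering the last active slot we passed
    previous = index
    traverse = (previous + 1) % HASH_TABLE_MAX
    while table[traverse]["is_active"] == 1:
        previous = traverse
        traverse = (previous + 1) % HASH_TABLE_MAX
    table[traverse] = {"key": key, "is_active": 1, "uplink": None}
    table[previous]["uplink"] = traverse  # link the chain down to the new slot
    return traverse

def find_record(table, key):
    # probe from the home slot; an empty slot means the key isn't there
    index = str_to_hash(key) % HASH_TABLE_MAX
    for _ in range(HASH_TABLE_MAX):
        rec = table[index]
        if rec["is_active"] != 1:
            return None
        if rec["key"] == key:
            return index
        index = (index + 1) % HASH_TABLE_MAX
    return None
```

Note that because StrToHash is a plain sum, anagrams like "AB" and "BA" always collide, which makes them handy for testing the probe path.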
It's absolutely not worth going to all this trouble for small lists. If the data is sorted, a binary search is very fast. If it's coming in randomly, a tree can be good, but if the input is already sorted a plain binary tree just degenerates into a list.
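For the sorted-array case, binary search is a few lines; a sketch in Python using the stdlib `bisect` module (the Pascal equivalent is the usual low/high/mid loop):

```python
from bisect import bisect_left

def find_sorted(items, key):
    # binary search: O(log n) comparisons into an already-sorted list
    i = bisect_left(items, key)
    if i < len(items) and items[i] == key:
        return i
    return None
```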
Comparing an unsigned number is always massively faster than comparing strings, so for fewer than (say) 1000 items you can keep two parallel arrays - the strings and their hash codes (unsigned integers) - in sync and:
* Compute the hash code for the string you're searching for.
* Traverse the integer array. Only when a number matches do you compare the strings.
* If you found it - fine. If not, either add it or report that it wasn't found.
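The steps above can be sketched like this (Python rather than Pascal, and the helper names are mine; the point is that the expensive string comparison only happens on a hash-code match):

```python
def str_to_hash(s):
    # same shape as the Pascal StrToHash: case-insensitive sum of char codes
    s = s.upper()
    return len(s) * 111 + sum(ord(c) * 2049 for c in s)

def add_synced(strings, hashcodes, s):
    # keep the two parallel arrays in sync
    strings.append(s)
    hashcodes.append(str_to_hash(s))

def find_with_hashcodes(strings, hashcodes, target):
    h = str_to_hash(target)
    for i, code in enumerate(hashcodes):
        # cheap integer compare first; string compare only on a match
        if code == h and strings[i].upper() == target.upper():
            return i
    return -1  # not found; caller can add it or report failure
```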
A huge amount depends upon how many data items you want to compare, so there's no easy answer to which algorithm to use. The same thing applies to sorting. If you have <50 items, a Bubble Sort may actually be simpler and quicker than a Quicksort.
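And for completeness, the small-list case really is just a few lines - a bubble sort sketch in Python (in practice you'd reach for the library sort, but this shows how little code the "simpler" option needs):

```python
def bubble_sort(a):
    # in-place bubble sort with early exit when a pass makes no swaps
    n = len(a)
    for end in range(n - 1, 0, -1):
        swapped = False
        for i in range(end):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swapped = True
        if not swapped:
            break  # already sorted
    return a
```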
-- Jim - When is there going to be a release?