Version: v1.00
Date: 01-sep-2000
Author: Derek Parnell
These routines help manage hash tables. See "Hash Routines" for details.
An error was detected. See Hash_Errors for details.
Insert mode: Keys must be unique, duplicate keys ignored.
Callback action: Insert the returned data.
Callback action: Replace the existing with the returned data.
Callback action: Ignore the insert request.
Used by calling routines to request the default hashing algorithms
Set Default number of dimensions for record estimates.
Set Max Key length for precalculated seeds.
The data returned by an Insert callback function was not a 2-element sequence
The supplied TableId is not any of the current Ids
The supplied TableId refers to a deleted table.
One of the Table dimensions is not an atom/integer
One of the Table dimensions is than 1.
A unknown Insert mode was supplied.
The action code in the Insert callback data is not recognised.
The data to be replaced wasn't found.
Wrong number of hash values supplied.
An unknown Default ID was supplied.
An invalid value for the specific default ID was supplied.
Unable to restore system state because there is no state to restore from.
These routines help manage hash tables. A Hash table is like a RAM-based database, but with limitations. It is usually only for small numbers of records, and for small record sizes, but it is very fast. Individual records can only be retrieved if you know their key; there is no FindFirst, FindNext type of functions, because there is no assumed relationship between the records in a table. However, there is a function to retrieve all the stored records, optionally with a user- written filtering function.
Typically, hash tables are used for syntax analysis, but can be used in any application that has to retrieve data based on unique keys. The record data to be stored can be anything, as these routine do not use this data in any way at all. In fact, the data probably should not include the record's key as this is stored by the routines anyway, so it would be redundant and wasteful.
These routines store any error messages in a global sequence called Hash_Errors. After calling any of the hash routines, you can check this sequence and if its length is not zero it means that at least one error was detected. Each entry in this sequence is a three-element sequence - the first is the error code, the third is a sequence of relevant data relating to the particular error, and the second is an English text message relating to the error.
The reason they are called hash tables is that each record's key is converted into a "Hash Key" that is, in turn, used to identify a specific queue that the record is stored in. The conversion algorithms examine each key calculate a number for the key. The number is divided by the number of queues available to see which queue the record can be stored in. Because it is possible that different keys can all map into the same queue number, some queues store multple records. The more effecient hash tables are those that have the fewest number of records per queue.
This library helps you to distribute records over the queues by allowing you to define a hash table to use multple algorihms. Each table created with this library has a set of "dimensions", and each dimension uses a different algorithm. You not only specify how many dimensions to use, but also their relative weightings. For example, a one dimension table defined with {x}, a 2-dim table as {x,y}, a 3-dim table as {x,y,z}, and so on. There is no limit to the number of dimensions except memory. In practice, a maximum or 4-6 is realistic. These dimensions also determine the number of queues to use. The total number of queues is the product of the dimension values, ie. a 3-dim table of {x,y,z} has x * y * z queues. You can bias the algorithm(s) to use by having larger dimension values. For example, a table of {3,19,3} has 3 * 19 * 3 (171) queues and the second dimension's algorithm is more heavily relied upon to distribute the keys than the other two. Generally, the more dimensions and the more queues, the better the key distribution. But you must trade-off memory for speed.
To make it easier, the table defintion funtion also allows you to supply an estimate of the number of records you think will be stored, it then calculates a dimension set for the table that should be suitable for the data, however it tends to be a bit memory hungry. Some functions allow you to supply your own hash keys. You might do this if the default algorithms don't suit your data. When you do this, you must supply a hash key set with exactly the same number of elements as there are in the table's dimension. Plus, each element must be in the range 1 .. N, where N is the value in the respective dimension. Thus for a table defined with dimensions of {3,5,7}, you must supply a hash-key-set with three elements and the first must be from 1 - 3, the 2nd from 1-5, and the 3rd from 1-7.
The hash key is based on the dimensions defined for this table.
You would typically use this to store the hash key for common data keys
to save the lookup routine recalculating it.
This function does not store any data!
Examples:-integer lHT sequence lHashKey lHT = Hash_NewTable( {7,5,91} ) -- This creates a hash table lHashKey = Hash_Calc ( lHT , "Derek") -- Returns the hash key for "Derek"
See Also: Hash_DeleteTable, Hash_Errors, Hash_Fetch, Hash_GetTable, Hash_Insert, Hash_NewTable, Hash_Replace, Hash_RestoreState, Hash_SaveState, Hash_SetDefault, Hash_Stats, Hash_TableStats
You must supply the Table to use. Once a table is deleted, its Table ID cannot be used again.
Examples:- integer lHT -- Hash Table ID Hash_DeleteTable ( lHT )
See Also: Hash_Calc, Hash_Errors, Hash_Fetch, Hash_GetTable, Hash_Insert, Hash_NewTable, Hash_Replace, Hash_RestoreState, Hash_SaveState, Hash_SetDefault, Hash_Stats, Hash_TableStats
This sequence is updated whenever an error is detected. Each new error is appended, enabling this to handle multiple errors in a single call. All previous errors are deleted when you call a Hash function. Each entry in this consists of a 3-element sequence.
[1] An ErrorCode
[2] A sequence of data for the error
[3] A message template in English
sequence lRC lRC = Hash_...( ... ) if length(lRC) != 0 then for i = 1 to length(lRC) do printf(2, "Hash Err #%d, '%s'\n", {lRC[i][1], sprintf(lRC[i][3], lRC[i][2]) } ) end for else ... -- Function worked. end if
See Also: Hash_Calc, Hash_DeleteTable, Hash_Fetch, Hash_GetTable, Hash_Insert, Hash_NewTable, Hash_Replace, Hash_RestoreState, Hash_SaveState, Hash_SetDefault, Hash_Stats, Hash_TableStats
info It uses the the data Key. If Hashkey is kHASH_DEF the routine first calculates a hash key for this data key, otherwise it uses whatever hashing data you provide. See HashKeys for more details If the returned sequence is empty, then the key has not been inserted into the hash table, otherwise the returned data is in the first element. I've done it this way so that I can support empty sequences as valid data. That is, an empty data sequence would be returned as {{}} whereas No-Data-Found is returned as {} .
Examples:-integer lHT -- Hash Table ID sequence lMyData lMyData = Hash_Fetch ( lHT , "Derek", kHASH_DEF) if length(lMyData) = 0 then --- no data found. else --- data is in lMyData[1] !!! N.B. Not in lMyData, but in lMyData's 1st element. end if
See Also: Hash_Calc, Hash_DeleteTable, Hash_Errors, Hash_GetTable, Hash_Insert, Hash_NewTable, Hash_Replace, Hash_RestoreState, Hash_SaveState, Hash_SetDefault, Hash_Stats, Hash_TableStats
You must supply the Table to use, and an optional filtering routine, and optional data for the filter function. To use a filtering routine, the second parameter must be a valid routine_id. This routine is passed three parameters, the record's key, the record's data and the Extras. The routine must return a sequence. If the sequence is empty, the record will be skiped. Otherwise the returned sequence must contain a single element, and this element is returned to the caller of Hash_GetTable. Each element in the returned sequence is a two-element sequence containing the record's key and data. The order of the returned records is random; it is not sorted.
Examples:- function NumFilter( sequence pKey, sequence pData, sequence pExtra) if IsNumeric(pKey) then return {pData} else return{} end if end functionfunction AlphaFilter( sequence pKey, sequence pData, sequence pExtra) Only select alpha-keys that match the 'type' in pExtra. if not IsNumeric(pKey) then if equal(pData[3], pExtra[1]) then return {pData} else return {} end if else return{} end if end function
integer lHT -- Hash Table ID sequence lNums, lWords lNums = Hash_GetTable ( lHT, routine_id("NumFilter"), {}) lWords = Hash_GetTable ( lHT, routine_id("AlphaFilter"), {"CONST"})
See Also: Hash_Calc, Hash_DeleteTable, Hash_Errors, Hash_Fetch, Hash_Insert, Hash_NewTable, Hash_Replace, Hash_RestoreState, Hash_SaveState, Hash_SetDefault, Hash_Stats, Hash_TableStats
You must supply the Table to use, the data's Key, the data to store, the Insertion Mode, and optionally the key's hash value. You can use kHASH_DEF to have this routine calculate the hash for you. See HashKeys for more detail. The Insertion Mode is either kHASH_UNIQUE or a routine_id. A value of kHASH_UNIQUE inserts the data and its key if the key has not already been inserted, otherwise it just ignores it. A routine_id allows the use of more sophosticated control of the data. With a callback routine, you can elect to modify the new data prior to insertion, or update existing data, or ignore new data. The callback routine receives three parameters - the key, any existing data, and the new data. It must return a sequence of two elements - the first must be either kHASH_REPLACE, kHASH_INSERT, or kHASH_IGNORE. The second must be a sequence - for kHASH_INSERT and kHASH_REPLACE this the data put into the table, for kHASH_IGNORE it can be anything because it is just ignored.
Examples:- #1 - Simple Insertion... integer lHT -- Hash Table ID integer lRC lRC = Hash_Insert ( lHT , "Derek", {1,2,"xyz"}, kHASH_UNIQUE, kHASH_DEF) if lRC = kHASH_ERROR then --- an error detected. See Hash_Errors for details. elsif lRC = kHASH_INSERT --- data has been succesfully inserted. else --- data already exists end if #2 - Insertion with callback... function MyCallback(sequence pKey, sequence pExisting, sequence pNew) This counts the occurances of non-numeric records. if length(pExisting) = 0 then -- New Record if IsNumeric(pKey) then return {{kHASH_INSERT, pNew} ) Don't add a counter. else return {{kHASH_INSERT, append(pNew, 1)} ) Attach a counter to the data. end if else if length(pExisting[1]) = 4 then pExisting[1][4] += 1 -- Increment the counter return {{kHASH_REPLACE, pExisting[1]} ) else return {{kHASH_IGNORE, {}} ) end if end if end functioninteger lHT -- Hash Table ID integer lRC lRC = Hash_Insert ( lHT , "Derek", {1,2,"xyz"}, routine_id("MyCallback"), kHASH_DEF) if lRC = kHASH_ERROR then --- an error detected. See Hash_Errors for details. elsif lRC = kHASH_INSERT --- data has been succesfully inserted. else --- data already exists end if
See Also: Hash_Calc, Hash_DeleteTable, Hash_Errors, Hash_Fetch, Hash_GetTable, Hash_NewTable, Hash_Replace, Hash_RestoreState, Hash_SaveState, Hash_SetDefault, Hash_Stats, Hash_TableStats
'Dimensions' is either a sequence containing a list of atoms or a single integer. If a sequence then the number of atoms specifies the number of sub-tables to use. Each sub-table uses a different algorithm to generate a hashing index. The value of each atom specifies the number of queues in its sub-table. The queues hold the inserted entries. Which queue is actually used for a given entry depends on the number of sub-tables and the number of queues in each sub-table. By adjusting these dimensions, you can influence the efficiency of finding entries in a table. The aim is to get the entries evenly spread amonsgt to queues and to have a minimum of entries per queue. If an integer, 'Dimensions' is an estimate of the number of entries expected to be stored. This function calculates a default queue length for four sub-tables. The default is biased towards efficient lookup so it uses more memory than may be required. This id must be used in subsequent hashing functions. Zero is returned when the function finds something wrong with the 'Dimensions' parameter.
Examples:-integer lHT lHT = Hash_NewTable( {7,5,91} ) -- This creates a hash table with 3 sub-tables Sub table #1 has 7 queues, Sub table #2 has 5 queues, Sub table #3 has 91 queues lHT = Hash_NewTable ( 7000 ) -- This creates a hash table that is expected to store about 7000 entries.
See Also: Hash_Calc, Hash_DeleteTable, Hash_Errors, Hash_Fetch, Hash_GetTable, Hash_Insert, Hash_Replace, Hash_RestoreState, Hash_SaveState, Hash_SetDefault, Hash_Stats, Hash_TableStats
After your data has been inserted into your table via a call to Hash_Insert, you can use this function to fetch the store data. You must supply the Table the data was stored in, the data's Key, the new data, and optionally the key's hash value. You can use kHASH_DEF to have this routine calculate the hash for you. See HashKeys for more detail.
Examples:-integer lHT -- Hash Table ID integer lRC lRC = Hash_Replace ( lHT , "Derek", {1,2,"xyz"}, kHASH_DEF) if lRC = kHASH_ERROR then --- an error detected. See Hash_Errors for details. else --- data has been succesfully replaced. end if
See Also: Hash_Calc, Hash_DeleteTable, Hash_Errors, Hash_Fetch, Hash_GetTable, Hash_Insert, Hash_NewTable, Hash_RestoreState, Hash_SaveState, Hash_SetDefault, Hash_Stats, Hash_TableStats
The state, saved by Hash_SaveState, is restore to allow re-enterant code to work. This must be used by callback routines if they call any other Hash function. Hash_Errors is updated with code HASH_ERR_NOSTATE if there is no state to restore.
Examples:- function InsertCallback( ... ) ... Hash_SaveState() lRC = Hash_Fetch ( lHT, "GrandTotal", kHASH_DEF) lRC[1][1] += 1 lRC = Hash_Replace( lHT, "GrandTotal", lRC[1], kHASH_DEF) Hash_RestoreState() ... return ... end function
See Also: Hash_Calc, Hash_DeleteTable, Hash_Errors, Hash_Fetch, Hash_GetTable, Hash_Insert, Hash_NewTable, Hash_Replace, Hash_SaveState, Hash_SetDefault, Hash_Stats, Hash_TableStats
This allows re-enterant code to work. This must be used by callback routines if they call any other Hash function.
Examples:- function InsertCallback( ... ) ... Hash_SaveState() lRC = Hash_Fetch ( lHT, "GrandTotal", kHASH_DEF) lRC[1][1] += 1 lRC = Hash_Replace( lHT, "GrandTotal", lRC[1], kHASH_DEF) Hash_RestoreState() ... return ... end function
See Also: Hash_Calc, Hash_DeleteTable, Hash_Errors, Hash_Fetch, Hash_GetTable, Hash_Insert, Hash_NewTable, Hash_Replace, Hash_RestoreState, Hash_SetDefault, Hash_Stats, Hash_TableStats
You must supply two parameters, the first contains the code for the default being set, the second is the new value for that default.
kHASH_DEF_DIM "Default number of dimensions when record estimates are supplied to Hash_NewTable function." Must be an integer greater than zero.
kHASH_DEF_SEEDKEYLEN "Maximum length of key for pre-calculated seeds." Must be an integer greater than zero.
Examples:- Hash_SetDefault ( kHASH_DEF_DIM, 3 ) -- Set the default dimensions to 3.
See Also: Hash_Calc, Hash_DeleteTable, Hash_Errors, Hash_Fetch, Hash_GetTable, Hash_Insert, Hash_NewTable, Hash_Replace, Hash_RestoreState, Hash_SaveState, Hash_Stats, Hash_TableStats
This is the same as Hash_TableStats , but returns the data for every table in use. See Hash_TableStats for the details.
Examples:- sequence lRC lRC = Hash_Stats () if length(lRC) = 0 then --- an error detected. See Hash_Errors for details. else for i = 1 to length(lRC) do printf(1, "Records inserted for %s: %d, Avg Q len=%f\n", {sprint(lRC[1][11]), lRC[i][1], lRC[i][6]}) end for end if
See Also: Hash_Calc, Hash_DeleteTable, Hash_Errors, Hash_Fetch, Hash_GetTable, Hash_Insert, Hash_NewTable, Hash_Replace, Hash_RestoreState, Hash_SaveState, Hash_SetDefault, Hash_TableStats
Can be used to determine the best set of table dimensions to use with the given set of data being inserted. You must supply the Table to use.
[1] ==> Total records inserted
[2] ==> # of Queues with records
[3] ==> # of Empty Queues
[4] ==> Maximum record count in a queue
[5] ==> Minimum record count in a (non-empty) queue
[6] ==> Average number of records per queue
[7] ==> Standard Deviation of records in queues
[8] ==> # of queues with higher than average record count
[9] ==> # of queues with lower than average record count
[10]==> Queue Spread. A sequence containing queue counts per queue size. That is, the # of queues with 1 record, 2 records, 3 records ...
[11]==> Table Definition. A sequence containing the table's dimesions.
Examples:- integer lHT -- Hash Table ID sequence lRC lRC = Hash_TableStats ( lHT) if length(lRC) = 0 then --- an error detected. See Hash_Errors for details. else printf(1, "Records inserted: %d, Avg Q len=%f\n", {lRC[1], lRC[6]}) end if
See Also: Hash_Calc, Hash_DeleteTable, Hash_Errors, Hash_Fetch, Hash_GetTable, Hash_Insert, Hash_NewTable, Hash_Replace, Hash_RestoreState, Hash_SaveState, Hash_SetDefault, Hash_Stats