A DYNAMIC HASHING ALGORITHM TO MINIMIZE PAGING -- PART I

by TIMOTHY A. BRUNNER

The author, a NaSPA member, is the product development manager for LEGENT's DASD Performance Optimizer products -- PMO, QuickFetch and CPO. He has an M.S. in Computer Science from the University of Pittsburgh.

This article, presented in two parts, generically discusses a practical hashing algorithm so that it can be implemented on any operating system where paging is a potential problem. Part I is a detailed description of the algorithm itself, while Part II will outline an example of putting it to use and discuss implementation considerations.

Abstract

During the 1980s, the evolution of popular operating systems like MVS and VM has resulted in significant increases in the amount of virtual storage available to system users and their programs. This evolution is due to the exponentially growing amount of data handled by data processing installations. This explosion in the amount of data has made the effective storing and accessing of data much more critical. Good data storage algorithms, such as hashing, must not only address the initial storing of data into a data structure but also the dynamic addition of data. Poor techniques can often lead to significant paging while searching for a specific data item. This article discusses a practical hashing algorithm designed to keep paging to a minimum. This algorithm, implemented in LEGENT's Program Management Optimizer (PMO) product, can determine whether an item exists in a hash table (and if it does, retrieve it) by accessing just a single page of the table over 90 percent of the time. Although the original implementation of the algorithm was on an MVS system, the algorithm is presented generically here so that it can be implemented on any operating system where paging is a potential problem.

Introduction

Searching techniques play an important role in the amount of paging that results from the use of a data structure. The searching technique must not only be sensitive to paging when scanning for an item that exists in the data structure, but must also be able to determine that an item doesn't exist by scanning as few pages as possible. The searching algorithm needs to be insensitive to the size of the data structure: the average number of pages accessed when searching a data structure should be the same regardless of the data structure's size. Any increase in paging when data is dynamically added to a data structure must be held to a minimum.

Techniques such as sequential searches, binary searches and linked list searches don't meet the above criteria. These searches often require that items from a significant portion of a data structure be scanned when searching for a particular item. This occurs whether the item exists or not; in most cases, a larger portion of the data structure is scanned when the item doesn't exist.

Hashing algorithms, by their very nature, usually require that only a small number of items in a data structure be scanned to determine whether a particular item exists. However, collision resolution needs to be addressed effectively to ensure that the number of scanned items is kept low. Poor strategies can cause long scans of the data structure and may impede the addition and removal of items. The three collision-resolution policies most often employed are open addressing, the use of buckets and chaining [see Reference 1].
With open addressing, an item's hash address can collide with another item already occupying the address. If this is the case, the new item is placed in an empty entry at an address other than its hash address. The sequence of addresses used to resolve collisions, known as the probe sequence [see Reference 1], may be any permutation of the addresses in the table.

The open addressing policy can create a problem when searching for items that don't exist in the table. To determine that an item doesn't exist, the probe sequence used to resolve a collision for this item must be scanned until an empty entry is found. If most of the hash table entries are occupied, a lengthy search may result. The number of pages referenced during a search could increase rapidly as items are dynamically added to the hash table. Adding an item results in longer scans for searches beginning at that item's hash address. Furthermore, it can also lead to longer scans for searches beginning at other hash addresses -- specifically, those hash addresses where the new item's entry had previously been the first empty entry in their probe sequence. The length of each of these scans increases until the next empty entry in the probe sequence for that hash address is reached. If the hash table is full, the probe sequence for each hash address consists of all addresses in the table. Hence, the entire table must be scanned to ensure that an item isn't in the table.

The use of buckets entails dividing a table of N entries into M groups with an average size of B entries. Each group is called a bucket. Each item in the table hashes to a bucket instead of an address, and a collision means two items hash to the same bucket. Searching involves determining which bucket would contain an item if it exists, then scanning that bucket's entries for the item. Any traditional searching technique can be used to scan the items in a bucket. If an item is dynamically added to the data structure, only one bucket is affected; the searches for all other buckets take no longer than before the item was added.

By storing a bucket's header and all of its entries on the same page, scans of a bucket can be resolved by referencing a single page. Of course, there is always the possibility that all entries for a bucket won't fit on the same page. If this occurs, an adequate overflow policy must be used. One approach is to store the overflow entries for all buckets on the same page or group of pages. However, if more than one page is used for the overflow entries, every attempt should be made to keep the overflow entries for each bucket on the same page.

Chaining involves establishing a linked list at each hash address. Items which hash to the same address are chained together in the linked list for that address. In most implementations, chaining can be thought of as a special application of the use of buckets: each hash address can be thought of as a bucket, and the average size of each bucket is determined by dividing the number of entries in the table by the number of hash addresses. The items in each bucket are chained together into a linked list, which is searched sequentially from start to end. As is generally true with buckets, if all entries in a bucket's linked list are stored on the same page, searches can be resolved by referencing only one page. If a chain extends to an overflow page, the last link on the originating page points to the first entry on the overflow page.
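As a minimal illustration of the chaining approach, the C sketch below shows a bucket header, a chained entry and a search of one bucket's linked list. The structure and field names are illustrative assumptions, not taken from any particular product; a real implementation would carry application data in each entry.

    #include <stddef.h>

    /* Illustrative layout for chaining: each bucket header anchors a
       linked list of the entries that hash to that bucket.            */
    struct entry {
        unsigned long key;       /* the data item's key                 */
        struct entry *next;      /* next entry in this bucket's chain   */
        /* ... application data would follow here ...                   */
    };

    struct bucket_header {
        struct entry *first;     /* head of this bucket's linked list   */
    };

    /* Scan one bucket's chain for a key.  When every entry on the
       chain resides on the same page as the header, the whole scan
       references a single page.                                        */
    static struct entry *search_bucket(const struct bucket_header *b,
                                       unsigned long key)
    {
        struct entry *e;
        for (e = b->first; e != NULL; e = e->next)
            if (e->key == key)
                return e;        /* item found                          */
        return NULL;             /* item isn't in the table             */
    }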
The hashing algorithm described here embodies a chaining strategy designed to minimize the number of pages referenced during searching operations. A lesser objective of this strategy is to keep each linked list to a reasonable length. Each data item hashes to a bucket. The implementer of the algorithm determines the average bucket size using the guideline that all items in an average size bucket should be able to fit on the same page. If a single bucket is assigned to each page, then the average bucket size should not be more than the maximum number of data items which will fit on a page. However, as discussed below, the searching operation proceeds more efficiently with a greater number of buckets.

With more buckets, the average length of each linked list is shorter. This is important for two reasons. First, if there is a single bucket for each page and that bucket overflows to another page, then paging may increase when that bucket is searched. To verify that an item which hashes to that bucket isn't in the data structure, the overflow page must always be referenced. Normally, there is an inverse relationship between the number of buckets per page and the probability of any single bucket overflowing to another page. Hence, increasing the number of buckets per page causes a lower percentage of buckets to overflow. In addition, fewer items hash to each bucket. This results in a lower percentage of items hashing to the overflowing buckets, and hence, the overflow pages are referenced less often. The second reason for more buckets is that with shorter linked lists, searches proceed with fewer compares, resulting in less CPU time. The only trade-off with more buckets is that each bucket has a header which requires a small amount of space -- normally enough space to store the address of the first entry in the bucket's linked list.

The Hashing Algorithm

This algorithm determines the size of each hashing data structure from the initial number of data items to be placed in the data structure. Each data item will occupy one entry in the data structure. In addition to the initially used entries, the algorithm allows the implementer to specify an approximate additional percentage of entries to be added to each data structure. Beyond this percentage, the algorithm may internally allocate even a few more entries, since the size of each data structure is always a whole number of pages. These additional entries aren't used initially, but they allow for the dynamic addition of data once the data structure has been built.

The algorithm requires the implementer to specify:

o the initial number of entries used in the data structure (#ENTRIES);
o the approximate additional percentage of entries (APXADDPCT);
o an average number of entries in a bucket, i.e., the average size of each linked list (BKTSIZE);
o a page size (PAGESIZE);
o an entry size (ENTRYSIZE); and
o a length for each bucket header (BKTHDRLEN).

The variable names (shown in parentheses) are used throughout the remainder of this article to refer to each user-specified field.

A minimum estimate of the number of buckets in the data structure can be determined by calculating the value for:

    CEIL ((1+APXADDPCT)*#ENTRIES/BKTSIZE)

This value ensures that enough buckets are allocated so that the average number of entries in each bucket won't be more than BKTSIZE. However, as discussed below, the number of buckets is usually increased to facilitate the hash function.
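As a rough sketch of these user-specified fields and the minimum bucket estimate, the C fragment below groups the parameters into a structure and evaluates the CEIL expression above. The structure and function names are assumptions made for illustration only.

    #include <math.h>

    /* Illustrative grouping of the user-specified sizing fields.        */
    typedef struct {
        unsigned entries;     /* #ENTRIES   - initial number of items    */
        double   apxaddpct;   /* APXADDPCT  - e.g. 0.25 for 25 percent   */
        unsigned bktsize;     /* BKTSIZE    - average entries per bucket */
        unsigned pagesize;    /* PAGESIZE   - usable bytes per page      */
        unsigned entrysize;   /* ENTRYSIZE  - bytes per entry            */
        unsigned bkthdrlen;   /* BKTHDRLEN  - bytes per bucket header    */
    } hash_parms;

    /* Minimum estimate of the number of buckets:
       CEIL ((1+APXADDPCT)*#ENTRIES/BKTSIZE)                             */
    static unsigned min_buckets(const hash_parms *p)
    {
        return (unsigned)ceil((1.0 + p->apxaddpct) * p->entries / p->bktsize);
    }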
The objective of the algorithm's hash function is to provide the most even distribution of the data items among the buckets. Experimental evidence [see Reference 2] has shown that a hash function known as the division-remainder method is very reliable with regard to uniformly distributing data in a hash table. This method divides a key for the data item by the size of the hash table (i.e., the number of buckets) and takes the remainder as the bucket address for the data item. If the key consists of more than one computer word (i.e., more than 4 bytes on an MVS system), then the separate computer words can be added together or combined by means of the "exclusive or" function, and the result used as the dividend. In theory, using a prime number as the divisor provides the most even distribution. However, experimental studies have shown that an arbitrary odd number results in just as uniform a distribution [see Reference 2]. The first prime greater than or equal to the minimum estimate described above is used as the number of buckets (#BUCKETS) in the algorithm presented here; the first odd number meeting this criterion may also be used. (Of course, with the prime number approach, the hash tables are often allocated with more buckets, and thus are slightly larger. However, this allows for more dynamic additions of data items.)

In almost all practical cases, a distribution where each bucket is assigned an identical number of items isn't achievable. This algorithm is designed to minimize the number of bucket overflows to second pages which may result from a distribution where there are variations in the number of data items assigned to each bucket. The algorithm accomplishes this by hashing an average of BKTSIZE entries to each bucket and by assigning multiple buckets to each page. (The BKTSIZE value may need to be lowered to fit multiple buckets on each page; for details, refer to the equation for BKTPERPG.) The algorithm allocates approximately enough entries on each page so that each bucket can have BKTSIZE entries on that page. This allows several buckets on each page to contain more entries than the BKTSIZE value, since other buckets on the same page will often have fewer entries than the BKTSIZE value.

PAGESIZE, ENTRYSIZE, BKTSIZE and BKTHDRLEN determine the number of buckets per page (BKTPERPG) and entries per page (ENTPERPG). The total size of the data structure (#PAGES) is the minimum number of pages necessary to support #BUCKETS and #ENTRIES, given the values for BKTPERPG and ENTPERPG. The actual equations used to calculate these fields are:

    #BUCKETS = first prime >= CEIL ((1+APXADDPCT)*#ENTRIES/BKTSIZE)
    BKTPERPG = ROUND (PAGESIZE/(ENTRYSIZE*BKTSIZE+BKTHDRLEN))
    ENTPERPG = FLOOR ((PAGESIZE-BKTPERPG*BKTHDRLEN)/ENTRYSIZE)
    #PAGES   = MAX (CEIL (#BUCKETS/BKTPERPG), CEIL (#ENTRIES/ENTPERPG))

The last page in the data structure is the first choice for overflows. Due to the ceiling function used in the calculation for #PAGES, the last page often has fewer buckets assigned to it than BKTPERPG; in some cases, no buckets are assigned to it at all. If this page is full, overflows are resolved from the page with the greatest number of unused entries.
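A minimal C sketch of these sizing equations appears below. The helper names (next_prime_ge, size_table) and the use of simple trial division for the prime search are illustrative assumptions; a production implementation would choose whatever primality test suits its table sizes.

    #include <math.h>

    static int is_prime(unsigned n)
    {
        unsigned d;
        if (n < 2) return 0;
        for (d = 2; d * d <= n; d++)
            if (n % d == 0) return 0;
        return 1;
    }

    /* First prime greater than or equal to n.                          */
    static unsigned next_prime_ge(unsigned n)
    {
        while (!is_prime(n)) n++;
        return n;
    }

    /* Compute #BUCKETS, BKTPERPG, ENTPERPG and #PAGES from the
       user-specified fields.  pagesize is the usable page size
       (on MVS, 4092 bytes once the page's free-entry pointer,
       described below, is deducted).                                   */
    static void size_table(unsigned entries, double apxaddpct,
                           unsigned bktsize, unsigned pagesize,
                           unsigned entrysize, unsigned bkthdrlen,
                           unsigned *buckets, unsigned *bktperpg,
                           unsigned *entperpg, unsigned *pages)
    {
        unsigned minb = (unsigned)ceil((1.0 + apxaddpct) * entries / bktsize);
        unsigned pgs_for_bkts, pgs_for_ents;

        *buckets  = next_prime_ge(minb);                            /* #BUCKETS */
        *bktperpg = (unsigned)floor((double)pagesize /
                        (entrysize * bktsize + bkthdrlen) + 0.5);   /* ROUND    */
        *entperpg = (pagesize - *bktperpg * bkthdrlen) / entrysize; /* FLOOR    */

        pgs_for_bkts = (*buckets + *bktperpg - 1) / *bktperpg;      /* CEIL     */
        pgs_for_ents = (entries  + *entperpg - 1) / *entperpg;      /* CEIL     */
        *pages = pgs_for_bkts > pgs_for_ents ? pgs_for_bkts : pgs_for_ents;
    }

For example, with hypothetical values of #ENTRIES = 1000, APXADDPCT = 0.25, BKTSIZE = 5, PAGESIZE = 4092, ENTRYSIZE = 48 and BKTHDRLEN = 4, this sketch yields #BUCKETS = 251 (the first prime >= 250), BKTPERPG = ROUND(4092/244) = 17, ENTPERPG = FLOOR((4092-68)/48) = 83 and #PAGES = MAX(CEIL(251/17), CEIL(1000/83)) = 15.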
The use of the ROUND function in calculating BKTPERPG may result in each page having several entries fewer than the number needed for each bucket to have BKTSIZE entries on that page (i.e., a total of at least BKTPERPG*BKTSIZE entries on that page). This will occur if the ROUND function causes the resulting value from the division to be rounded up to the next integer. Storage is provided on the last page(s) of the data structure for the additional entries necessary to support all buckets. Since the algorithm usually over-allocates the size of the data structure, this shouldn't be a problem. However, if there are too many overflows, the FLOOR function can be used in place of the ROUND function. If the BKTPERPG value is one, it should be increased by lowering the BKTSIZE value. Larger values for BKTPERPG usually result in a lower percentage of buckets overflowing to second pages.

To find the first unused entry on a page quickly, the first word on the page contains its address. When the storage for the data structure is allocated, the first word on each page contains the address of the first entry on the page. As the hashing algorithm assigns data items to buckets on a page, the entries on the page are used sequentially. Since the first word of each page is used in this manner, the available page size (PAGESIZE) for bucket headers and entries must be adjusted accordingly. (On an MVS system, the available page size would be 4092 bytes, since 4 bytes are used for the address of the first unused entry.)

The hash function used in the algorithm to assign data items to buckets must return both:

o the relative number of the page within the data structure that contains the bucket; and
o the relative number of the bucket within this page.

Mathematically, the hash function is defined as:

    (PAGE,BUCKET) = HASH (KEY)

where:

    KEY is the data item's key. The key may be adjusted through addition or the "exclusive or" function so that it occupies only one computer word.
    PAGE = FLOOR (REMAINDER(KEY,#BUCKETS)/BKTPERPG)
    BUCKET = REMAINDER (REMAINDER(KEY,#BUCKETS),BKTPERPG)

The function REMAINDER (KEY,#BUCKETS) determines the absolute number of the bucket where KEY hashes. Its returned value is in the range from zero to #BUCKETS minus one. PAGE, the relative page, has a value between zero and #PAGES minus one. BUCKET, the relative bucket within this page, has a value between zero and BKTPERPG minus one.
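A minimal C sketch of this hash function follows, assuming the key has already been folded into a single word as described above; the type and function names are illustrative.

    /* Map a one-word key to (relative page, relative bucket within
       that page), following the definition above.                      */
    typedef struct {
        unsigned page;      /* 0 .. #PAGES-1   */
        unsigned bucket;    /* 0 .. BKTPERPG-1 */
    } hash_addr;

    static hash_addr hash_key(unsigned long key,
                              unsigned buckets,      /* #BUCKETS */
                              unsigned bktperpg)     /* BKTPERPG */
    {
        unsigned long abs_bucket = key % buckets;    /* REMAINDER(KEY,#BUCKETS) */
        hash_addr h;

        h.page   = (unsigned)(abs_bucket / bktperpg);   /* FLOOR     */
        h.bucket = (unsigned)(abs_bucket % bktperpg);   /* REMAINDER */
        return h;
    }

For example, with #BUCKETS = 251 and BKTPERPG = 17 (the hypothetical values used earlier), a key whose remainder is 40 hashes to relative page 2, bucket 6.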
Even though the algorithm allocates additional entries to allow data items to be dynamically added, there is always the possibility that the hash table will become full. There are several ways this can be handled. One possible solution is to allocate an additional page of storage to the hash table each time it becomes full. However, this results in larger average bucket sizes, and as more additional pages are allocated, more bucket chains will likely extend to multiple pages. A better solution is to rebuild the hash table each time it becomes full. Each time the hash table is rebuilt, #ENTRIES is set to the number of data items to be initially placed in the new data structure. Although the rebuilding introduces some overhead, it results in the searching constantly being done at the same efficient level. If the hash tables are becoming full quite frequently, it's a good idea to raise the APXADDPCT value. Figure 1 shows a generic layout of a hash table.

This concludes the presentation of a hashing algorithm which permits the dynamic addition of data and is sensitive to paging. The second part of this article will demonstrate how the implementation of this algorithm in LEGENT's PMO product achieved impressive results.

References

1. Data Structure Techniques, "Hash Tables," pages 141-164, Thomas A. Standish, Addison-Wesley, 1980.
2. Data Structure Theory and Practice, "Scatter Storage Techniques," pages 438-441, A.T. Berztiss, Academic Press, Second Edition, 1975.