A DYNAMIC HASHING ALGORITHM TO MINIMIZE PAGING -- PART I

by TIMOTHY A. BRUNNER

The author, a NaSPA member, is the product development manager for LEGENT's DASD Performance Optimizer products -- PMO, QuickFetch and CPO. He has an M.S. in Computer Science from the University of Pittsburgh.

This article, presented in two parts, generically discusses a practical hashing algorithm so that it can be implemented on any operating system where paging is a potential problem. Part I is a detailed description of the algorithm itself, while Part II will outline an example of putting it to use and discuss implementation considerations.

Abstract

During the 1980s, the evolution of popular operating systems like MVS and VM has resulted in significant increases in the amount of virtual storage available to system users and their programs. This evolution is due to the exponentially growing amount of data handled by data processing installations. This explosion in the amount of data has made the effective storing and accessing of data much more critical. Good data storage algorithms, such as hashing, must not only address the initial storing of data into a data structure but also the dynamic addition of data. Poor techniques can often lead to significant paging while searching for a specific data item. This article discusses a practical hashing algorithm designed to keep paging to a minimum. This algorithm, implemented in LEGENT's Program Management Optimizer (PMO) product, can determine whether an item exists in a hash table (and if it does, retrieve it) by accessing just a single page of the table over 90 percent of the time. Although the original implementation of the algorithm was on an MVS system, the algorithm is presented generically here so that it can be implemented on any operating system where paging is a potential problem.

Introduction

Searching techniques play an important role in the amount of paging that results from the use of a data structure. The searching technique must not only be sensitive to paging when scanning for an item that exists in the data structure, but must also be able to determine that an item doesn't exist by scanning as few pages as possible. The searching algorithm needs to be insensitive to the size of the data structure: the average number of pages accessed when searching a data structure should be the same regardless of the data structure's size. Any increase in paging when data is dynamically added to a data structure must be held to a minimum.

Techniques such as sequential searches, binary searches and linked list searches don't meet the above criteria. These searches often require that items from a significant portion of a data structure be scanned when searching for a particular item. This occurs whether the item exists or not; in most cases, a larger portion of the data structure is scanned when the item doesn't exist.

Hashing algorithms, by their very nature, usually require that only a small number of items in a data structure be scanned to determine whether a particular item exists. However, collision resolution needs to be addressed effectively to ensure that the number of scanned items is kept low. Poor strategies can cause long scans of the data structure and may impede the addition and removal of items. The three collision-resolution policies most often employed are open addressing, the use of buckets and chaining [see Reference 1].
With open addressing, an item's hash address can collide with another item already occupying the address. If this is the case, the new item is placed in an empty entry at an address other than its hash address. The sequence of addresses used to resolve collisions, known as the probe sequence [see Reference 1], may be any permutation of the addresses in the table.

The open addressing policy can create a problem when searching for items that don't exist in the table. To determine that an item doesn't exist, the probe sequence used to resolve a collision for this item must be scanned until an empty entry is found. If most of the hash table entries are occupied, a lengthy search may result. The number of pages referenced during a search could increase rapidly as items are dynamically added to the hash table. Adding an item results in longer scans for searches beginning at that item's hash address. Furthermore, it can also lead to longer scans for searches beginning at other hash addresses -- specifically, those hash addresses where the new item's entry had previously been the first empty entry in their probe sequence. The length of each of these scans increases until the next empty entry in the probe sequence for that hash address is reached. If the hash table is full, the probe sequence for each hash address consists of all addresses in the table. Hence, the entire table must be scanned to ensure that an item isn't in the table.

The use of buckets entails dividing a table of N entries into M groups with an average size of B entries. Each group is called a bucket. Each item in the table hashes to a bucket instead of an address, and a collision means two items hash to the same bucket. Searching involves determining which bucket would contain an item if it exists, then scanning that bucket's entries for the item. Any traditional searching technique can be used to scan the items in a bucket. If an item is dynamically added to the data structure, only one bucket is affected; the searches for all other buckets take no longer than before the item was added.

By storing a bucket's header and all of its entries on the same page, scans of a bucket can be resolved by referencing a single page. Of course, there is always the possibility that all entries for a bucket won't fit on the same page. If this occurs, an adequate overflow policy must be used. One approach is to store the overflow entries for all buckets on the same page or group of pages. However, if more than one page is used for the overflow entries, every attempt should be made to keep the overflow entries for each bucket on the same page.

Chaining involves establishing a linked list at each hash address. Items which hash to the same address are chained together in the linked list for that address. In most implementations, chaining can be thought of as a special application of the use of buckets: each hash address can be thought of as a bucket, and the average size of each bucket is determined by dividing the number of entries in the table by the number of hash addresses. The items in each bucket are chained together into a linked list, which is searched sequentially from start to end. As is generally true with buckets, if all entries in a bucket's linked list are stored on the same page, searches can be resolved by referencing only one page. If a chain extends to an overflow page, the last link on the originating page points to the first entry on the overflow page.
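As a minimal illustration of the chaining approach, the C sketch below shows a bucket header, a chained entry and a search of one bucket's linked list. The structure and field names are illustrative assumptions, not taken from any particular product; a real implementation would carry application data in each entry.

    #include <stddef.h>

    /* Illustrative layout for chaining: each bucket header anchors a
       linked list of the entries that hash to that bucket.            */
    struct entry {
        unsigned long key;       /* the data item's key                 */
        struct entry *next;      /* next entry in this bucket's chain   */
        /* ... application data would follow here ...                   */
    };

    struct bucket_header {
        struct entry *first;     /* head of this bucket's linked list   */
    };

    /* Scan one bucket's chain for a key.  When every entry on the
       chain resides on the same page as the header, the whole scan
       references a single page.                                        */
    static struct entry *search_bucket(const struct bucket_header *b,
                                       unsigned long key)
    {
        struct entry *e;
        for (e = b->first; e != NULL; e = e->next)
            if (e->key == key)
                return e;        /* item found                          */
        return NULL;             /* item isn't in the table             */
    }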
The hashing algorithm described here embodies a chaining strategy designed to minimize the number of pages referenced during searching operations. A lesser objective of this strategy is to keep each linked list to a reasonable length. Each data item hashes to a bucket. The implementer of the algorithm determines the average bucket size using the guideline that all items in an average size bucket should be able to fit on the same page. If a single bucket is assigned to each page, then the average bucket size should not be more than the maximum number of data items which will fit on a page. However, as discussed below, the searching operation proceeds more efficiently with a greater number of buckets.

With more buckets, the average length of each linked list is shorter. This is important for two reasons. First, if there is a single bucket for each page and that bucket overflows to another page, then paging may increase when that bucket is searched. To verify that an item which hashes to that bucket isn't in the data structure, the overflow page must always be referenced. Normally, there is an inverse relationship between the number of buckets per page and the probability of any single bucket overflowing to another page. Hence, increasing the number of buckets per page causes a lower percentage of buckets to overflow. In addition, fewer items hash to each bucket. This results in a lower percentage of items hashing to the overflowing buckets, and hence, the overflow pages are referenced less often. The second reason for more buckets is that with shorter linked lists, searches proceed with fewer compares, resulting in less CPU time. The only trade-off with more buckets is that each bucket has a header which requires a small amount of space -- normally enough space to store the address of the first entry in the bucket's linked list.

The Hashing Algorithm

This algorithm determines the size of each hashing data structure from the initial number of data items to be placed in the data structure. Each data item will occupy one entry in the data structure. In addition to the initially used entries, the algorithm allows the implementer to specify an approximate additional percentage of entries to be added to each data structure. Beyond this percentage, the algorithm may internally allocate even a few more entries, since the size of each data structure is always a whole number of pages. These additional entries aren't used initially, but they allow for the dynamic addition of data once the data structure has been built.

The algorithm requires the implementer to specify:

o the initial number of entries used in the data structure (#ENTRIES);
o the approximate additional percentage of entries (APXADDPCT);
o an average number of entries in a bucket, i.e., the average size of each linked list (BKTSIZE);
o a page size (PAGESIZE);
o an entry size (ENTRYSIZE); and
o a length for each bucket header (BKTHDRLEN).

The variable names (shown in parentheses) are used throughout the remainder of this article to refer to each user-specified field.

A minimum estimate of the number of buckets in the data structure can be determined by calculating the value for:

    CEIL ((1+APXADDPCT)*#ENTRIES/BKTSIZE)

This value ensures that enough buckets are allocated so that the average number of entries in each bucket won't be more than BKTSIZE. However, as discussed below, the number of buckets is usually increased to facilitate the hash function.
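As a rough sketch of these user-specified fields and the minimum bucket estimate, the C fragment below groups the parameters into a structure and evaluates the CEIL expression above. The structure and function names are assumptions made for illustration only.

    #include <math.h>

    /* Illustrative grouping of the user-specified sizing fields.        */
    typedef struct {
        unsigned entries;     /* #ENTRIES   - initial number of items    */
        double   apxaddpct;   /* APXADDPCT  - e.g. 0.25 for 25 percent   */
        unsigned bktsize;     /* BKTSIZE    - average entries per bucket */
        unsigned pagesize;    /* PAGESIZE   - usable bytes per page      */
        unsigned entrysize;   /* ENTRYSIZE  - bytes per entry            */
        unsigned bkthdrlen;   /* BKTHDRLEN  - bytes per bucket header    */
    } hash_parms;

    /* Minimum estimate of the number of buckets:
       CEIL ((1+APXADDPCT)*#ENTRIES/BKTSIZE)                             */
    static unsigned min_buckets(const hash_parms *p)
    {
        return (unsigned)ceil((1.0 + p->apxaddpct) * p->entries / p->bktsize);
    }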
The objective of the algorithm's hash function is to provide the most even distribution of the data items among the buckets. Experimental evidence [see Reference 2] has shown that a hash function known as the division-remainder method is very reliable with regard to uniformly distributing data in a hash table. This method divides a key for the data item by the size of the hash table (i.e., the number of buckets) and takes the remainder as the bucket address for the data item. If the key consists of more than one computer word (i.e., more than 4 bytes on an MVS system), then the separate computer words can be added together or combined by means of the "exclusive or" function, and the result used as the dividend. In theory, using a prime number as the divisor provides the most even distribution. However, experimental studies have shown that an arbitrary odd number results in just as uniform a distribution [see Reference 2]. The first prime greater than or equal to the minimum estimate described above is used as the number of buckets (#BUCKETS) in the algorithm presented here; the first odd number meeting this criterion may also be used. (Of course, with the prime number approach, the hash tables are often allocated with more buckets, and thus are slightly larger. However, this allows for more dynamic additions of data items.)

In almost all practical cases, a distribution where each bucket is assigned an identical number of items isn't achievable. This algorithm is designed to minimize the number of bucket overflows to second pages which may result from a distribution where there are variations in the number of data items assigned to each bucket. The algorithm accomplishes this by hashing an average of BKTSIZE entries to each bucket and by assigning multiple buckets to each page. (The BKTSIZE value may need to be lowered to fit multiple buckets on each page; for details, refer to the equation for BKTPERPG.) The algorithm allocates approximately enough entries on each page so that each bucket can have BKTSIZE entries on that page. This allows several buckets on each page to contain more entries than the BKTSIZE value, since other buckets on the same page will often have fewer entries than the BKTSIZE value.

PAGESIZE, ENTRYSIZE, BKTSIZE and BKTHDRLEN determine the number of buckets per page (BKTPERPG) and entries per page (ENTPERPG). The total size of the data structure (#PAGES) is the minimum number of pages necessary to support #BUCKETS and #ENTRIES, given the values for BKTPERPG and ENTPERPG. The actual equations used to calculate these fields are:

    #BUCKETS = first prime >= CEIL ((1+APXADDPCT)*#ENTRIES/BKTSIZE)
    BKTPERPG = ROUND (PAGESIZE/(ENTRYSIZE*BKTSIZE+BKTHDRLEN))
    ENTPERPG = FLOOR ((PAGESIZE-BKTPERPG*BKTHDRLEN)/ENTRYSIZE)
    #PAGES   = MAX (CEIL (#BUCKETS/BKTPERPG), CEIL (#ENTRIES/ENTPERPG))

The last page in the data structure is the first choice for overflows. Due to the ceiling function used in the calculation for #PAGES, the last page often has fewer buckets assigned to it than BKTPERPG; in some cases, no buckets are assigned to it at all. If this page is full, overflows are resolved from the page with the greatest number of unused entries.
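A minimal C sketch of these sizing equations appears below. The helper names (next_prime_ge, size_table) and the use of simple trial division for the prime search are illustrative assumptions; a production implementation would choose whatever primality test suits its table sizes.

    #include <math.h>

    static int is_prime(unsigned n)
    {
        unsigned d;
        if (n < 2) return 0;
        for (d = 2; d * d <= n; d++)
            if (n % d == 0) return 0;
        return 1;
    }

    /* First prime greater than or equal to n.                          */
    static unsigned next_prime_ge(unsigned n)
    {
        while (!is_prime(n)) n++;
        return n;
    }

    /* Compute #BUCKETS, BKTPERPG, ENTPERPG and #PAGES from the
       user-specified fields.  pagesize is the usable page size
       (on MVS, 4092 bytes once the page's free-entry pointer,
       described below, is deducted).                                   */
    static void size_table(unsigned entries, double apxaddpct,
                           unsigned bktsize, unsigned pagesize,
                           unsigned entrysize, unsigned bkthdrlen,
                           unsigned *buckets, unsigned *bktperpg,
                           unsigned *entperpg, unsigned *pages)
    {
        unsigned minb = (unsigned)ceil((1.0 + apxaddpct) * entries / bktsize);
        unsigned pgs_for_bkts, pgs_for_ents;

        *buckets  = next_prime_ge(minb);                            /* #BUCKETS */
        *bktperpg = (unsigned)floor((double)pagesize /
                        (entrysize * bktsize + bkthdrlen) + 0.5);   /* ROUND    */
        *entperpg = (pagesize - *bktperpg * bkthdrlen) / entrysize; /* FLOOR    */

        pgs_for_bkts = (*buckets + *bktperpg - 1) / *bktperpg;      /* CEIL     */
        pgs_for_ents = (entries  + *entperpg - 1) / *entperpg;      /* CEIL     */
        *pages = pgs_for_bkts > pgs_for_ents ? pgs_for_bkts : pgs_for_ents;
    }

For example, with hypothetical values of #ENTRIES = 1000, APXADDPCT = 0.25, BKTSIZE = 5, PAGESIZE = 4092, ENTRYSIZE = 48 and BKTHDRLEN = 4, this sketch yields #BUCKETS = 251 (the first prime >= 250), BKTPERPG = ROUND(4092/244) = 17, ENTPERPG = FLOOR((4092-68)/48) = 83 and #PAGES = MAX(CEIL(251/17), CEIL(1000/83)) = 15.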
The use of the ROUND function in calculating BKTPERPG may result in each page having several entries fewer than the number needed for each bucket to have BKTSIZE entries on that page (i.e., a total of at least BKTPERPG*BKTSIZE entries on that page). This will occur if the ROUND function causes the resulting value from the division to be rounded up to the next integer. Storage is provided on the last page(s) of the data structure for the additional entries necessary to support all buckets. Since the algorithm usually over-allocates the size of the data structure, this shouldn't be a problem. However, if there are too many overflows, the FLOOR function can be used in place of the ROUND function. If the BKTPERPG value is one, it should be increased by lowering the BKTSIZE value. Larger values for BKTPERPG usually result in a lower percentage of buckets overflowing to second pages.

To find the first unused entry on a page quickly, the first word on the page contains its address. When the storage for the data structure is allocated, the first word on each page contains the address of the first entry on the page. As the hashing algorithm assigns data items to buckets on a page, the entries on the page are used sequentially. Since the first word of each page is used in this manner, the available page size (PAGESIZE) for bucket headers and entries must be adjusted accordingly. (On an MVS system, the available page size would be 4092 bytes, since 4 bytes are used for the address of the first unused entry.)

The hash function used in the algorithm to assign data items to buckets must return both:

o the relative number of the page within the data structure that contains the bucket; and
o the relative number of the bucket within this page.

Mathematically, the hash function is defined as:

    (PAGE,BUCKET) = HASH (KEY)

where:

    KEY is the data item's key. The key may be adjusted through addition or the "exclusive or" function so that it occupies only one computer word.
    PAGE = FLOOR (REMAINDER(KEY,#BUCKETS)/BKTPERPG)
    BUCKET = REMAINDER (REMAINDER(KEY,#BUCKETS),BKTPERPG)

The function REMAINDER (KEY,#BUCKETS) determines the absolute number of the bucket where KEY hashes. Its returned value is in the range from zero to #BUCKETS minus one. PAGE, the relative page, has a value between zero and #PAGES minus one. BUCKET, the relative bucket within this page, has a value between zero and BKTPERPG minus one.
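A minimal C sketch of this hash function follows, assuming the key has already been folded into a single word as described above; the type and function names are illustrative.

    /* Map a one-word key to (relative page, relative bucket within
       that page), following the definition above.                      */
    typedef struct {
        unsigned page;      /* 0 .. #PAGES-1   */
        unsigned bucket;    /* 0 .. BKTPERPG-1 */
    } hash_addr;

    static hash_addr hash_key(unsigned long key,
                              unsigned buckets,      /* #BUCKETS */
                              unsigned bktperpg)     /* BKTPERPG */
    {
        unsigned long abs_bucket = key % buckets;    /* REMAINDER(KEY,#BUCKETS) */
        hash_addr h;

        h.page   = (unsigned)(abs_bucket / bktperpg);   /* FLOOR     */
        h.bucket = (unsigned)(abs_bucket % bktperpg);   /* REMAINDER */
        return h;
    }

For example, with #BUCKETS = 251 and BKTPERPG = 17 (the hypothetical values used earlier), a key whose remainder is 40 hashes to relative page 2, bucket 6.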
Even though the algorithm allocates additional entries to allow data items to be dynamically added, there is always the possibility that the hash table will become full. There are several ways this can be handled. One possible solution is to allocate an additional page of storage to the hash table each time it becomes full. However, this results in larger average bucket sizes, and as more additional pages are allocated, more bucket chains will likely extend to multiple pages. A better solution is to rebuild the hash table each time it becomes full. Each time the hash table is rebuilt, #ENTRIES is set to the number of data items to be initially placed in the new data structure. Although the rebuilding introduces some overhead, it results in the searching constantly being done at the same efficient level. If the hash tables are becoming full quite frequently, it's a good idea to raise the APXADDPCT value. Figure 1 shows a generic layout of a hash table.

This concludes the presentation of a hashing algorithm which permits the dynamic addition of data and is sensitive to paging. The second part of this article will demonstrate how the implementation of this algorithm in LEGENT's PMO product achieved impressive results.

References

1. Data Structure Techniques, "Hash Tables," pages 141-164, Thomas A. Standish, Addison-Wesley, 1980.
2. Data Structure Theory and Practice, "Scatter Storage Techniques," pages 438-441, A.T. Berztiss, Academic Press, Second Edition, 1975.