What is Database Indexing by Hash Indexing approach?
Table of contents
Hash indexing is a database indexing technique that uses a hash function to compute an index value for each record in a table. The index value is used to locate the record quickly when a search is performed. In this answer, we will explain the hash indexing approach and provide an example of how it works.
Under the Hood Structure
The hash index structure consists of an array of buckets, each of which contains a linked list of records that have the same index value. The hash function takes the key value of a record and computes an index value, which is used to locate the bucket that contains the linked list of records with the same index value. When a record is inserted into the table, its key is hashed to determine the bucket where it should be stored. If there are no other records in the bucket with the same hash value, the record is inserted at the head of the linked list. If there are other records in the bucket with the same hash value, the new record is inserted at the end of the list.
Example
Suppose we have a table of employees with the following attributes:
EmployeeID
FirstName
LastName
Department
Salary
We want to create a hash index on the EmployeeID
attribute. We start by choosing a hash function that takes an EmployeeID
value and generates a unique hash code for that value. For example, we might use the following hash function:
hash(EmployeeID) = EmployeeID % 10
This function takes the EmployeeID
value and returns the remainder when it is divided by 10. This gives us a hash code in the range 0-9.
Next, we create an array of 10 buckets to hold the records:
Bucket[0] -> null
Bucket[1] -> null
...
Bucket[9] -> null
When a record is inserted into the table, its EmployeeID
value is hashed using the hash function to determine the bucket where it should be stored. For example, if the EmployeeID
is 123456, the hash code would be 6 (hash(123456) = 123456 % 10 = 6
), so the record would be stored in bucket 6.
If there are no other records in bucket 6, the new record is inserted at the head of the linked list:
Bucket[0] -> null
Bucket[1] -> null
...
Bucket[5] -> null
Bucket[6] -> [123456, John, Smith, Sales, 50000, null]
Bucket[7] -> null
...
Bucket[9] -> null
If there are other records in bucket 6, the new record is inserted at the end of the linked list:
Bucket[0] -> null
Bucket[1] -> null
...
Bucket[5] -> null
Bucket[6] -> [123456, John, Smith, Sales, 50000, null] -> [789012, Jane, Doe, Marketing, 60000, null]
Bucket[7] -> null
...
Bucket[9] -> null
To retrieve a record from the hash table, the EmployeeID
value is hashed using the hash function to determine the bucket where the record should be stored. For example, if we want to retrieve the record with EmployeeID
123456, we hash the value to get a hash code of 6, then traverse the linked list in bucket 6 until we find the record with the matching EmployeeID
:
hash(123456) = 6
Process
When a search is performed on a table that has a hash index, the search key is passed to the hash function, which computes an index value. The index value is used to locate the bucket that contains the linked list of records with the same index value. The linked list is then searched sequentially until the record with the matching key value is found.
Design
The design of a hash index depends on several factors, including the size of the table, the expected number of records, and the characteristics of the data.
The number of buckets is chosen to balance memory usage and search performance. Collisions occur when two or more records have the same index value, and are handled using chaining or open addressing.