Section 20b

Hash Tables That Probe


 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Continuing Down The Path

Probing

As mentioned in the previous topic, one of the two fundamental ways to handle collisions is to probe. That is, once the index has been calculated, and it is found that there is data already located in that particular array element, the collision management system simply looks for a subsequent element that could hold the new data. And while there are a wide set of variations on this theme, we will only look at a couple of them to get a feel for how they work.

Linear Probing

Let's say that the index 25 is generated for a certain new data quantity. However, when the system attempts to add the data at that index, other valid data is already found there. The response to this circumstance is simple: Try index 26, and if that doesn't work, try index 27, and so on. It must be assumed that the array has enough room for all the data so by continuing with this strategy, an empty element will be found and the data will be stored there.

That sounds nice and in fact, works just fine. However, there are two considerations to handle. The first one is that when it comes time to access the data, the same strategy must be used along with some kind of search operation. Something like the following.

// loop while value not found at the current index
// AND value not equal to search value

// increment index

// return found value

This isn't so bad since you know you have narrowed the search already by hashing to that particular index. Nevertheless, the system is now iteratively searching for the value which obviously increases the complexity of this access operation.

The other problem with linear probing is called clustering. This is the condition where several values could be found in a row if a hashing operation tended to result in indices at or around a small group of locations. As the indices are generated, and the probing continues, several data quantities might be found in one area of the array while other parts of the array are sparsely filled or nearly empty.

This leads to another possible probing action called quadratic probing.

Quadratic Probing

Quadratic probing starts out the same. An index is hashed and generated, the system attempts to place the new data at that location, and it is found that the location already holds valid data. This time instead of moving by ones from the index location, the probing process iterates by squares of the incremented values. Example: Instead of probing by ones: index 25, then 26 (+1), then 27 (+2), then 28 (+3), etc., the probing is conducted by squares of the probe values: index 25, then 26 (+12 [1]), then 29 (+22 [4]), then 34 (+32 [9]), and so on.

The good news is that this strategy will significantly reduce clustering. However, since the offsets are increasing exponentially, it would be easy to go off the end of the array pretty quickly. Not a problem. The system will conduct modulo operations (i.e., find the remainder) for each newly calculated index, roll around to the beginning of the array, and continue searching for an open element.

Simple Strategies

Neither of these strategies are terribly complicated or difficult. The programmer just needs to develop the code and verify that the probing is conducted correctly. In addition, as mentioned earlier in this topic, there is more to managing a hash table than just adding data. One must consider how the data will be retrieved, modified, and sometimes removed. Once again, accomplishing these tasks just calls for analysis of where the data is, how you get to it with the probing considered, and then how to manage what ends up being normal array operations that you have conducted previously. No prob.