Searching Through Sorted Data

Now that you have sorted these character sets, you can take the opportunity to search through them in at least one better way. In your findLowestValue function, you had no choice but to iterate as many times as there were items (i.e. N times); this is called a linear search and is the least efficient of any kind of search since it is assumed on average to take N / 2 iterations to find its data. This can take a while if you have many more data items. In the case of the findLowestValue function, the list was presumed to be unsorted and there was no guarantee that any particular character would be at any particular place.

However, since you can trust your sorting processes, you can know that the characters will be in alphabetical or other appropriate order. Certainly this is true about sorting names, account numbers, and checking account balances as well.

Consider the search efficiency improvement if you could divide your testing process in half every time you implemented one test. That may not mean much for ten or twenty items, but if you have three million people in your state and you have to find a certain driver's license number, it could mean the difference between a quick computer response or a potentially long waiting period. Think about the process described below.

   Start a loop that iterates until the bottom test value
   is greater than the top test value and/or the item is found

      Divide the list of data in half

       If the item is equal to the value being sought,
       stop the program and return the result

       If the item being sought is greater than the bottom half of the list,
       then remove the bottom half of the list from the part being searched

       If the item being sought is less than the top half of the list,
       then remove the top half of the list from the part being searched

   Repeat

If this sounds like a naturally recursive process, it is. However, start by considering the use of loops to solve the problem. Consider the following program code/pseudo code segment. It is not complicated, but it divides the list in half every time it runs a test iteration.

   int loopBinarySearch( const char letters[],
                                            int numLetters, char searchChar )
      {
       // initialize function/variables
       int lowerIndex = 0, upperIndex = numLetters - 1, middleIndex;

       // Loop while the lower index is not greater than the upper index
       while( lowerIndex <= upperIndex )
          {
           // set the middle index between the upper and lower indices
           middleIndex = ( upperIndex + lowerIndex ) / 2;

           // check for the search item found
           if( searchChar == letters[ middleIndex ] )
              {
               // return its index
               return middleIndex;
              }

           // otherwise, check if the search item is less than the middle index
           else if( searchChar < letters[ middleIndex ] )
              {
               // set the upper index one BELOW the middle index
               upperIndex = middleIndex - 1;
              }

           // otherwise assume the search item is greater than the middle index
           else
              {
               // set the lower index to one ABOVE the middle index
               lowerIndex = middleIndex + 1;
              }
          }
       // end search loop

       // assume search failure, and return this result
       return SEARCH_FAILED;
      }

Watch this video to see the loop-driven binary search work.

This process works well and is simple to implement using loops, like other searches. But consider an analysis of a search that goes from beginning to end searching for a letter. Your best case condition would be if the letter 'a' was being sought; you would implement one iteration. Your worst case would be if the letter 'z' were being sought; then you would have to search the entire alphabet for it if you did it one at a time, in a linear way. On average -- as mentioned above -- you would have to search N / 2 times for every letter.

In this binary search, you cut the list in half every time you iterate. You can do the math on this if you would like, but it turns your searching process from N to Log₂(N). As a short refresher, Log₂(N) = X is the reversed way of saying 2^X = N when you do not know what the X is. In the example of the alphabet, you know that there are 26 characters in the alphabet. Since 2⁵ is 32, you will have no more than five iterations to get the worst case search condition from this binary search. If you extend on this analysis a little, since 2¹⁰ is 1,024, it means that you can search a list with 1,024 items in it with only ten iterations. . . and again observe that this is the worst case condition.

Now, having considered the value of the binary search, and the loop version of it, it is time to implement recursion. The binary search is a natural for recursion because its general case is so much like its base case. While we may not be using recursion heavily, you have been introduced to it, so you will probably enjoy seeing the simplicity of this process.

You divide a list in half, check to see which side your test value falls in, then divide the list in half, and repeat. Consider the following recursive function.

   int RecBinarySearch( const char letters[], int lowerIndex,
                                           int upperIndex, char searchChar )
      {
       // find the index of the middle of the list
       int middleIndex = ( lowerIndex + lowerIndex ) / 2;

       // if the bottom catches up with the top, the item is not in the list
       if( lowerIndex > upperIndex )
          {
           // return the failed indication
           return SEARCH_FAILED;
          }

       // if the item is at the middle index
       if( searchChar == letters[ middleIndex ] )
          {
           // send back its index
           return middleIndex;
          }

       // if the searchItem value is LESS than the middle item,
       else if( searchChar < letters[ middleIndex ] )
          {
           // call this function to search the bottom half of the list
           return RecBinarySearchHelper( letters, lowerIndex,
                                              middleIndex - 1, searchChar );
          }

       // if the searchItem value is MORE than the middle item
       else
          {
           // call this function to search the top half of the list
           return recBinarySearchHelper( letters, middleIndex + 1,
                                                   upperIndex, searchChar );
          }
      }

The base case is that if it is not found, either look in the lower half if the value is below the center of the list, or look in the upper half if the value is above the center of the list. If it is not to be found, the limiting condition is if lowerIndex becomes greater than upperIndex. If that occurs, the function will return a failed indication, as did the loop function.

There is an interesting side note to throw in at this point. This function has quite a few parameters, some of which will always be the same. However, they are needed for each function call, so the programmer was required to put them in to start with. To make this a little more programmer friendly, the function shown previously can be made into what is called a helper function that can do the job, but cover up or properly modify some of the parameters that are required.

If another function, called RecBinarySearch was created and called the above function, the parameter overhead could be reduced. RecBinarySearch would be called as follows.

   searchIndex = recBinarySearch( charList, numItems, searchChar );

Then, the function recBinarySearch would be defined as follows. The three variables are defined for clarity; the function could be called with the values directly.

   int recBinarySearch( const char letters[], int numLetters, char searchChar )
      {
       int lowerIndex = 0;
       int upperIndex = numLetters - 1;

       return recBinarySearchHelper( letters, lowerIndex, upperIndex, searchChar );
      }

In this particular case, the function recBinarySearchHelper is a helper function (but exactly the same as the recBinarySearch shown above) because it takes care of the real business requiring one more parameter, and one slightly adjusted parameter. The programmer can just call the function with the list, the search item, and the size of the list. The helper function adds the necessary lower starting parameter, and turns the number of items into the upper starting parameter. The helper function makes these adjustments easier by keeping them out of the way of the programmer using the function.

Once again, the recursive process made this operation at least as easy to do as the looping process. Unlike the first recursive processes you observed in this book, the process is not a linear "walk" through a set of data. It is a process that takes big steps toward the solution to be found.

Watch this video to see the recursion-driven binary search work.

Searching and Sorting Through All of This

This introduced you to an interesting searching strategy. All of the searching and sorting strategies provided are not particularly fast, but they are easy to study, as you have seen, and working with them will make both your loop and your array management skills better. They are very worth your while to practice writing them as much as possible.

Section 11i

Array Operations - The Binary Search

Searching Through Sorted Data

Searching and Sorting Through All of This