I want to evolve a generic binary search algorithm out of a conventional one. The starting point is a C++ binary search algorithm which takes an integer array, the number of elements in the array and the value searched for as arguments. binary_search returns a constant pointer to the element - if found - the nil pointer else.
const int* binary_search (const int* array, int n, int x) { const int* lo = array, *hi = array + n, *mid; while(lo != hi) { mid = lo + (hi - lo) / 2; if (x == *mid) return mid; if (x < *mid) hi = mid; else lo = mid + 1; } return 0; }
Let us look at the assumptions this algorithm makes about its environment. binary_search only works with integer arrays. To make it work with arrays of arbitrary types we transform binary_search in a template function.
template<class T> const T* binary_search (const T* array, int n, const T& x) { const T* lo = array, *hi = array + n, *mid; while(lo != hi) { mid = lo + (hi - lo) / 2; if (x == *mid) return mid; if (x < *mid) hi = mid; else lo = mid + 1; } return 0; }
Now the algorithm is designed for use with arrays of different types. In case of not finding the value searched for, a special pointer - nil - is returned. This requires that such a value exists. Since we don't want to make this assumption, in case of an unsuccessful search we return the pointer array + n (yes, a past-the-end pointer) instead.
template<class T> const T* binary_search (const T* array, int n, const T& x) { const T* lo = array, *hi = array + n, *mid; while(lo != hi) { mid = lo + (hi - lo) / 2; if (x == *mid) return mid; if (x < *mid) hi = mid; else lo = mid + 1; } return array + n; }
Instead of handing over array as pointer to the first element and a size, we could also specify a pointer to the first and past the last element to approach STL's iterator concept.
template<class T> const T* binary_search (T* first, T* last, const T& value) { const T* lo = array, *hi = array + n, *mid; while(lo != hi) { mid = lo + (hi - lo) / 2; if (value == *mid) return mid; if (value < *mid) hi = mid; else lo = mid + 1; } return last; }
To specify a pointer to the end of a container instead of handing over its size has the advantage that it has not to be possible to compute last out of first with first+n. This is important for containers that don't allow random access to their elements. Because our binary_search needs random access to the elements of the container, this is of little importance in our example. Another advantage is that the difference type (here int) doesn't have to be explicitly handed over, so the user of binary_search doesn't even have to know it. The difference type is the type which is used to express the type of the difference of two arbitrary iterators (pointers), for example last - first could be of the type signed long.
The last step to fully adapt the algorithm to the STL style is to change the first and last pointer type from pointers to the value type to an appropriate iterator type. By this step, the information of how the algorithm steps from one element to the next is torn away from the algorithm implementation and is hidden in the iterator objects. Now, no assumptions about the mechanism to iterate through the elements are made. This mechanism is handed over to the algorithm by the iterator objects. So, the algorithm is separated from the container it works on, all the operations that deal with iterators are provided by the iterator objects themselves.
Since binary_search needs random access to the elements of the container it is called for and so iterators handed over to binary_search have to satisfy the requirements of random access iterators, we name the type of first and last "RandomAccessIterator":
template<class RandomAccessIterator, class T> RandomAccessIterator binary_search (RandomAccessIterator first, RandomAccessIterator last, const T& value) { RandomAccessIterator not_found = last, mid; while(first != last) { mid = first + (last - first) / 2; if (value == *mid) return mid; if (value < *mid) last = mid; else first = mid + 1; } return not_found; }
The only assumptions the algorithm makes are the random access to elements of type T between the two iterators (pointers) first and last and that operator== and operator< are defined for type T and the value type of the iterator.
This generic binary search algorithm hasn't lost anything of its functionality, especially not when dealing with built in types.
int x[10]; // array of ten integer values int search_value; // value searched for // initialize variables int* i = binary_search (&x[0], &x[10], search_value); if (i == &x[10]) cout << "value not found"; else cout << "value found";
All the STL algorithms are constructed like our example algorithm - they try to make as few assumptions as possible about the environment they are run in.
Johannes Weidl (J.Weidl@infosys.tuwien.ac.at) - Apr 16, 1996