While learning Python I am attending several online courses on computer science, data structures, and algorithms. One of the topics covered is binary search.
Binary search is all about finding items in a sorted list quickly. The key here is sorted. If I know the list is sorted, I can perform binary search and be assured I can find the item very quickly. There is a formula for how many guesses (x) it will take to find an item from a list containing n items.
It will take x guesses, where x is the smallest integer such that 2x > n.
For example, if I have a sorted list of 10,000 integers. Worst case is that it takes 14 guesses to find any number in the list.
2x > 10000
213 = 8192 < 10000 < 214 = 16384
Therefore, x = 14. The maximum number of guesses is 14.
To show this, I can write a sample application in Python that takes a random sample of 10,000 integers, sorts them, and then randomly picks an integer for binary search to find.
import random def binary_search(lst, integer): lower_bound = 0 upper_bound = len(lst) - 1 number_guesses = 0 while True: index = (lower_bound + upper_bound) // 2 guess = lst[index] number_guesses += 1 if guess == integer: break if guess > integer: upper_bound = index - 1 else: lower_bound = index + 1 return number_guesses print 'Number Guesses' for _ in range(30): integers = random.sample(range(50000), 10000) integer_to_find = random.choice(integers) sorted_integers = sorted(integers) guesses = binary_search(sorted_integers, integer_to_find) print '{:6} {:7}'.format(integer_to_find, guesses)
When I run this code I get 30 samples of binary search. Below is sample output showing the number to find and the number of guesses it took to find the number. In the worst case, it only takes 14 guesses to find the number.
Number Guesses 38942 12 2934 13 8562 12 28035 13 18510 9 48935 13 11319 14 15053 13 42115 14 21968 13 35387 14 46800 13 35737 12 46956 12 39489 14 16086 13 30011 11 43529 11 37821 2 23551 14 4016 13 7491 13 25928 14 17469 5 4688 13 995 13 38501 13 38212 11 42038 14 42596 12
If I change the sample size to 1000 integers, it will take at most 10 guesses.
29 = 512 < 1000 < 210 = 1024
Number Guesses 6323 9 22853 7 25667 10 43520 10 13567 10 22766 10 32693 10 48141 10 35584 10 18677 10 44205 6 2849 10 11923 10 828 10 46147 6 20458 10 10536 9 36059 10 18923 9 17181 9 24803 8 18641 10 9734 8 42318 10 27811 10 42693 10 15971 10 23070 8 18226 9 34827 8
As the number of items (n) grows exponentially, it is amazing how quickly binary search still finds the number. Using asymptotic notation, binary search is O(log n), which is really nice.
# of items (n) | # of guesses ( 2x > n) |
---|---|
10 | 4 |
100 | 7 |
1,000 | 10 |
10,000 | 14 |
100,000 | 17 |
1,000,000 | 20 |
10,000,000 | 24 |
100,000,000 | 27 |
Binary search is brilliant!