编程珠玑-错误的二分查找算法(下)

微信公众号:科文路。转发须注明出处。

源自《编程珠玑》第 2 版第 5 章,《编程小事》

(上)部分主要介绍二分查找算法原理以及一些编程理念,(中)讲解测试方法,并揪出错误,(下)部分通过运行时间分析来验证算法时间复杂度。

本文成文仓促,后续若有修正,将在文末阅读原文的博客文章上进行。

编程珠玑-错误的二分查找算法(上)编程珠玑-错误的二分查找算法(中)中,经过充分的测试,一份看起来还不错的二分查找算法代码已经给出。

本文将通过计时的方式评价这段代码,验证其时间复杂度是否真的为 O(log n)。具体来说,算法的单次查询时间应该在数组长度指数级增长时,呈线性增长。例如,n 增大 10 倍,时间应增长 log 10 = 1 个单位;n 增大 100 倍,时间应增长 log 100 = 2 个单位 。注意,这里的 log 标记只为说明是对数关系,其底不一定是 10

测试代码在下一小节。

测试设置

在 Python 代码中,被查数组的大小被设置为 [100, 1000, ..., 10000000]。为求合理的平均查询时间,设置查询次数为 1000 次。

运行程序后,得到的结果为

1
2
3
4
5
6
7
8
9
10
11
array length (n): 100, n_tests: 1000, duration_time: 0.001632
array length (n): 200, n_tests: 1000, duration_time: 0.001884
array length (n): 400, n_tests: 1000, duration_time: 0.002418
array length (n): 800, n_tests: 1000, duration_time: 0.002643
array length (n): 1600, n_tests: 1000, duration_time: 0.002892
array length (n): 3200, n_tests: 1000, duration_time: 0.003116
array length (n): 6400, n_tests: 1000, duration_time: 0.003387
array length (n): 12800, n_tests: 1000, duration_time: 0.003663
array length (n): 25600, n_tests: 1000, duration_time: 0.003982
array length (n): 51200, n_tests: 1000, duration_time: 0.004291
array length (n): 102400, n_tests: 1000, duration_time: 0.004760

绘图,可以看到单次运行时间随数组大小的指数级增长,呈线性。当数组大小每增加 10 倍,单次查询时间增加 0.001 左右。故与时间复杂度为 O(log n) 的推论吻合。

plot for average time

测试代码

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
# -*- coding: utf-8 -*-
# ------------------------------------------------------------------
# File Name: binary_search_test.py
# Author: https://xlindo.com/kewenlu
# Version: ver0_1
# Created: 2021/12/29
# Description: To test the time complexity for the binary search
# algorithm.
# License: GPLv3
# ------------------------------------------------------------------

import matplotlib.pyplot as plt
import random
import time


def binary_search_right(arr, t):
"""binary search algorithm

Args:
arr (list): list to be searched
t (int): target number

Returns:
int: index for the target. If not exists, -1
"""
l, r = 0, len(arr) - 1
while l <= r:
m = round(l + (r - l) / 2) # rounding
if arr[m] < t:
l = m + 1 # adjust the searching range to the right section
elif arr[m] > t:
r = m - 1 # adjust the searching range to the left section
else:
return m # found
return -1 # not found


def time_counter(n, n_tests):
"""To test the average running time for the search

Args:
n (int): array to be searched
n_tests (int): repeated times for searching

Returns:
int: the average elapsed time
"""
arr = []
for i in range(n):
arr.append(i)

# let searching target randomly
arr_rand = [i for i in range(n)]
random.shuffle(arr_rand)

start_time = time.time()
for n_test in range(n_tests):
for i in arr_rand:
assert(binary_search_right(arr, i) == arr[i])
duration_time = time.time() - start_time
print("array length (n): %d, n_tests: %d, duration_time: %lf" %
(n, n_tests, duration_time / n))
return duration_time / n


# test and plot the result
arr_n = [100 * 2 ** n for n in range(11)] # 100, 1 000, ... for array size
time_cnt = [time_counter(100 * 2 ** n, 1000)
for n in range(11)] # 1 000 times searching

arr_idx = [n for n in range(11)]

plt.title("The time that increases linearly with the length\nof the array increases exponentially")
plt.xlabel("Array size (100*2^n)")
plt.ylabel("Average elapsed time")
plt.plot(arr_idx, time_cnt)
plt.scatter(arr_idx, time_cnt)

plt.show()

总结

通过三篇文章,我们从概念出发,编写了一个看起来没有问题的二分查找算法、通过脚手架迭代改正了算法中的问题,并在最后通过测试程序验证了算法的时间复杂度为 O(log n)

全文较长,但其中的程序设计的流程和思想值得每一位读者再次阅读、再次回味。

编程珠玑-错误的二分查找算法(下)

https://xlindo.com/kewenlu2022/posts/c22533ae/

Author

xlindo

Posted on

2021-12-29

Updated on

2023-05-10

Licensed under

Comments