Say we have a m by n matrix (m > n), that is sorted by columns and also rows.
find whether an element exists in the matrix, in O(m + m log (n/m)) time.
Let (x,y) denote the element in the y'th index of the x'th array. or more simply
(x,y) = A[y][x]
1) for any element \((x,y) \in A\) the following must hold:
\[(x',y) \geq (x,y) \Leftrightarrow x' > x \]
likewise,
$$(x,y') \geq (x,y) \Leftrightarrow y' > y$$
Therefore it follows that given each array (row wise, col wise) is sorted you can say
$$(x,y) \geq (w,z), w \in {0...x-1} \land z \in {0...y-1}$$
So it must be possible to perform a 2d binary search over the 2d array.
Lets pick the middle (row,col) - we have three situations could happen (just like normal bsearch).
Consider:
10, 17, 18, 20, 28, 29, 30
11, 19, 20, X, 29, 34, 38
15, 26, 26, 26, 32, 39, 40
19, 31, 36, 37, 37, 41, 47
For Key > X
_, _, _, _, 28, 29, 30
_, _, _, _, 29, 34, 38
15, 26, 26, 26, 32, 39, 40
19, 31, 36, 37, 37, 41, 47
We can therefore ignore everything to the left and up. Resulting in three new sub-matrices of size (n/2 * m/2) each
$$(x \cdot {mid}, y \cdot {mid}+1) , (x \cdot {hi}, y \cdot {hi})$$
_, _, _, _, _, _, _
_, _, _, _, _, _, _
_, _, _, 26, 32, 39, 40
_, _, _, 37, 37, 41, 47
$$(x \cdot {lo}, y \cdot {mid}+1),(x \cdot {mid}-1, y \cdot {hi})$$
_, _, _, _, _, _, _
_, _, _, _, _, _, _
15, 26, 26, _, _, _, _
19, 31, 36, _, _, _, _
$$(x \cdot {mid}+1, y \cdot {lo}), (x \cdot {hi}, y \cdot {mid})$$
_, _, _, _, 28, 29, 30
_, _, _, _, 29, 34, 38
_, _, _, _, _, _, _
_, _, _, _, _, _, _
For Key < X, we can similarly ignore everything right and down. giving us:
10, 17, 18, 20, 28, 29, 30
11, 19, 20, _, _, _, _
15, 26, 26, _, _, _, _
19, 31, 36, _, _, _, _
$$(x \cdot {lo}, y\cdot {mid}+1),(x \cdot {mid}-1, y \cdot {hi})$$ $$(x \cdot {lo}, y \cdot {lo}), (x \cdot {mid}-1, y \cdot {mid})$$ $$(x \cdot {mid}, y \cdot {lo}), (x \cdot {hi}, y \cdot {mid}-1)$$
See the algorithm below
lo_i = lo_y = 0
hi_x = n-1, hi_y = m-1
bsearch2(A, key, lo_x, hi_x, lo_y, hi_y)
def bsearch2(A, key, lo_x, hi_x, lo_y, hi_y):
if (lo_x > hi_x or lo_y > hi_y):
return False
mid_x = lo_x + (hi_x - lo_x) / 2
mid_y = lo_y + (hi_y - lo_y) / 2
mid_item = A[mid_y][mid_x]
if (key == mid_item):
return True
if (key > mid_item):
return bsearch2(A, mid_x, hi_x, mid_y+1, hi_y) or
bsearch2(A, key, lo_x, mid_x-1, mid_y+1, hi_y) or
bsearch2(A, key, mid_x+1, hi_x, lo_y, mid_y)
if (key < mid_item):
return bsearch2(A, key, lo_x, mid_x-1, mid_y+1, hi_y) or
bsearch2(A, key, mid_x, hi_x, lo_y, mid_y-1) or
bsearch2(A, key, lo_x, mid_x-1, lo_y, mid_y)
At each step we are splitting the n x m matrix into three n/2 x m/2 matrices.
$$let\ k = m \cdot n$$
then at each step:
$$T(n) = 3 \cdot T(\frac{n}{2}) + O(1)$$ $$O(n^{\frac{{log}_3}{{log}_4}})$$
Well that wasn't what we wanted, lets try something else
This is:
A = m * arrays of size n
m = 3, n = 9
if A is:
---------- 9 ----------
[[1,2,3,4,5,6,7,8, 9], |
[2,3,4,5,6,7,8,9 ,10], 3
[3,4,5,6,7,8,9,10,11]] |
Then for each sub-array A[i], split A[i] like so:
[[1,2,3], [4,5,6], [7,8,9]]
into m groups of size (n/m)
Following, we can modify the method in (c) to do the moving left part with a look ahead of n/m amount. When n = m it becomes just O(n), or similarly for any constant c * m = n. Where c is equal to the look ahead (n/m). Subsequently it is then possible to binary search this lookahead to determine if the key exists in that range / or we should move down in that range also.
The following is a short algorithmic description:
i = 0
j = n-1
return solve(A, key, i, j, m, n)
def solve(A, key, i, j):
cur = A[i][j]
# found it
if (key == cur)
return True
# move down
if (key > cur and i+1 < m):
return solve(A, key, i+1)
# bsearch left (n/m) places O(log(n/m))
# then use that new value as our j
# if j > all then j++
# if j < all then j = j-m-2
# else in the middle
j = bsearch(A[i], key, j-1-m, j-1)
if (key < cur and j > 0)
return solve(A, key, i, j-1)
return False
The total number of elements is \(n \cdot m, n \geq m\)
The work conducted on each row (overall avg in the worst case) is: \(\log \frac{n}{m}\)
We can show that the worst possible case arises when we perform the manhatten (m + n) distance from the top right to bottom left.
[........X]
[.........] start top left
[.........]
[.........]
[.......X.] only able to move 1 in binary search
[.........]
[.........]
[.........] only able to move 1 in binary search - hit bottom
[......X..]
[.........]
[.........] jump n/m
[..X......]
[.........]
[.........] jump n/m
[X........]
If (m = n) then we cannot do any better and it is O(n) anyway - worst case is the reverse diagonal path n + m = 2n = n
If (n > m) then in the worst case we hit A[m-1]j (bottom row) in O(m) steps (log n/m work at each step) after which we can effectively binary search m intervals of (m/n) in \(O(m \log (\frac{n}{m}))\) time.
The complexity is: \(O(m + m \cdot \log {\frac{n}{m}})\)