Wamicide

10 Apr 2020

Say we have a m by n matrix (m > n), that is sorted by columns and also rows.

find whether an element exists in the matrix, in O(m + m log (n/m)) time.

Let (x,y) denote the element in the y'th index of the x'th array. or more simply

(x,y) = A[y][x]

1) for any element $(x,y) \in A$ the following must hold:

\[(x',y) \geq (x,y) \Leftrightarrow x' > x \]

likewise,

$$(x,y') \geq (x,y) \Leftrightarrow y' > y$$

Therefore it follows that given each array (row wise, col wise) is sorted you can say

$$(x,y) \geq (w,z), w \in {0...x-1} \land z \in {0...y-1}$$

So it must be possible to perform a 2d binary search over the 2d array.

Lets pick the middle (row,col) - we have three situations could happen (just like normal bsearch).

Consider:

    10, 17, 18, 20, 28, 29, 30
    11, 19, 20,  X, 29, 34, 38
    15, 26, 26, 26, 32, 39, 40
    19, 31, 36, 37, 37, 41, 47

For Key > X

     _,  _,  _,  _, 28, 29, 30
     _,  _,  _,  _, 29, 34, 38
    15, 26, 26, 26, 32, 39, 40
    19, 31, 36, 37, 37, 41, 47

We can therefore ignore everything to the left and up. Resulting in three new sub-matrices of size (n/2 * m/2) each

$$(x \cdot {mid}, y \cdot {mid}+1) , (x \cdot {hi}, y \cdot {hi})$$

     _,  _,  _,  _,  _,  _,  _
     _,  _,  _,  _,  _,  _,  _
     _,  _,  _, 26, 32, 39, 40
     _,  _,  _, 37, 37, 41, 47

$$(x \cdot {lo}, y \cdot {mid}+1),(x \cdot {mid}-1, y \cdot {hi})$$

     _,  _,  _,  _,  _,  _,  _
     _,  _,  _,  _,  _,  _,  _
    15, 26, 26,  _,  _,  _,  _
    19, 31, 36,  _,  _,  _,  _

$$(x \cdot {mid}+1, y \cdot {lo}), (x \cdot {hi}, y \cdot {mid})$$

     _,  _,  _,  _, 28, 29, 30
     _,  _,  _,  _, 29, 34, 38
     _,  _,  _,  _,  _,  _, _
     _,  _,  _,  _,  _,  _, _

For Key < X, we can similarly ignore everything right and down. giving us:

    10, 17, 18, 20, 28, 29, 30
    11, 19, 20,  _,  _,  _,  _
    15, 26, 26,  _,  _,  _,  _
    19, 31, 36,  _,  _,  _,  _

$$(x \cdot {lo}, y\cdot {mid}+1),(x \cdot {mid}-1, y \cdot {hi})$$ $$(x \cdot {lo}, y \cdot {lo}), (x \cdot {mid}-1, y \cdot {mid})$$ $$(x \cdot {mid}, y \cdot {lo}), (x \cdot {hi}, y \cdot {mid}-1)$$

See the algorithm below

lo_i = lo_y = 0
hi_x = n-1, hi_y = m-1

bsearch2(A, key, lo_x, hi_x, lo_y, hi_y)

def bsearch2(A, key, lo_x, hi_x, lo_y, hi_y):
    if (lo_x > hi_x or lo_y > hi_y):
        return False
    mid_x = lo_x + (hi_x - lo_x) / 2
    mid_y = lo_y + (hi_y - lo_y) / 2
    mid_item = A[mid_y][mid_x]
    if (key == mid_item):
        return True
    if (key > mid_item):
        return bsearch2(A, mid_x, hi_x, mid_y+1, hi_y)         or
               bsearch2(A, key, lo_x, mid_x-1, mid_y+1, hi_y)  or
               bsearch2(A, key, mid_x+1, hi_x, lo_y, mid_y)
    if (key < mid_item):
        return bsearch2(A, key, lo_x, mid_x-1, mid_y+1, hi_y)  or
               bsearch2(A, key, mid_x, hi_x, lo_y, mid_y-1)    or
               bsearch2(A, key, lo_x, mid_x-1, lo_y, mid_y)

At each step we are splitting the n x m matrix into three n/2 x m/2 matrices.

$$let\ k = m \cdot n$$

then at each step:

$$T(n) = 3 \cdot T(\frac{n}{2}) + O(1)$$ $$O(n^{\frac{{log}_3}{{log}_4}})$$

Well that wasn't what we wanted, lets try something else

This is:

A = m * arrays of size n
m = 3, n = 9

if A is:
---------- 9 ----------
[[1,2,3,4,5,6,7,8, 9],    |
[2,3,4,5,6,7,8,9 ,10],    3
[3,4,5,6,7,8,9,10,11]]    |

Then for each sub-array A[i], split A[i] like so:

[[1,2,3], [4,5,6], [7,8,9]]

into m groups of size (n/m)

Following, we can modify the method in (c) to do the moving left part with a look ahead of n/m amount. When n = m it becomes just O(n), or similarly for any constant c * m = n. Where c is equal to the look ahead (n/m). Subsequently it is then possible to binary search this lookahead to determine if the key exists in that range / or we should move down in that range also.

The following is a short algorithmic description:

i = 0
j = n-1
return solve(A, key, i, j, m, n)

def solve(A, key, i, j):
    cur = A[i][j]

    # found it
    if (key == cur)
        return True
    # move down
    if (key > cur and i+1 < m):
        return solve(A, key, i+1)

    # bsearch left (n/m) places O(log(n/m))
    # then use that new value as our j
    # if j > all then j++
    # if j < all then j = j-m-2
    # else in the middle
    j = bsearch(A[i], key, j-1-m, j-1)
    if (key < cur and j > 0)
        return solve(A, key, i, j-1)
    return False

The total number of elements is $n \cdot m, n \geq m$

The work conducted on each row (overall avg in the worst case) is: $\log \frac{n}{m}$

We can show that the worst possible case arises when we perform the manhatten (m + n) distance from the top right to bottom left.

[........X]
[.........]   start top left
[.........]

[.........]
[.......X.]  only able to move 1 in binary search
[.........]

[.........]
[.........]  only able to move 1 in binary search - hit bottom
[......X..]

[.........]
[.........]  jump n/m
[..X......]

[.........]
[.........]  jump n/m
[X........]

If (m = n) then we cannot do any better and it is O(n) anyway - worst case is the reverse diagonal path n + m = 2n = n

If (n > m) then in the worst case we hit A[m-1]j (bottom row) in O(m) steps (log n/m work at each step) after which we can effectively binary search m intervals of (m/n) in $O(m \log (\frac{n}{m}))$ time.

The complexity is: $O(m + m \cdot \log {\frac{n}{m}})$