1/27/08

Trivially Size-Balanced Maps

I tested my interpreter for the Hi language with a translation of the trivially size-balanced map implementation originally written in OCAML.
I developed the code when I was looking for some datastructure which would:

  1. allow for low-cost sharing of slightly modified versions of a mapping,

  2. have reasonably fast insertion and look-up algorithms,

  3. have a fast look-up on size,

  4. be very low on space overhead.


Years ago, I looked at one implementation by Daan Leijen, but it seemed to have a lot of overhead, and not a lot of merit in a referentially transparent language. So, I decided to go another way.

A trivially size balanced map is a structure which holds key-value pairs in an ordered binary tree.

A node in the tree is either an empty leaf node or a node which holds a key value pair, a left branch and a right branch, and the number of key value pairs in that branch. A key and value pair is stored in order in the tree by recursively traversing the tree until an empty leaf node is found. On the way up, nodes are rebalanced (rotated).




| I. a II. c |
| / \ / \ |
| b c -> a e |
| / \ / \ |
| d e b d |




In the picture above, a rotation to the left is shown. Of course, such a rotation is only done if the right-branch I.c of the tree I.a is larger than the left-branch I.b of that tree. If we try to rotate pending the size of the branches then a rotation should be performed only if the size of the resulting branch II.a is smaller than the size of the original branch I.c. The size of I.c is 1 + size(d) + size(e) and the size of II.a is 1 + size(d) + size(b). This gives a wonderfully small and fast test: 1 + size(d) + size(b) < 1 + size(d) + size(e) which simplifies to size(b) < size(e).

The resulting data-structure has roughly size and time complexities between ordered binary trees and perfectly balanced ordered binary trees. On the good side, the size is a good estimate for the depth of a small tree. So, small trees are nearly perfectly balanced which, in most scenarios, keeps almost all trees pretty close to perfectly balanced trees (I hope).

Below the Hi code. The typechecker is still under development, so some small faults may occur in the types.

Note. I never did any formal analysis and I know of some degenerate cases. Also, I never gave much thought about double rotations. But, somehow, the thing works out pretty well with a substantially low overhead in almost all practical cases.



import "util_list.hi"

/********************************************************************/

namespace opt (

type opt = \a => [ just a | nothing ]

)

/********************************************************************/

namespace map (

using system

type map = \a \b =>
[ empty
| node int a b (map a b) (map a b) ]

def empty_test: map a b -> bool =
[ empty -> true
| _ -> false ]

def choose: map a b -> a =
[ node n k v l r -> k ]

def maximum: map a b -> (a, b) =
[ node n k v l empty -> (k,v)
| node n k v l r -> maximum r ]

def minimum: map a b -> (a, b) =
[ node n k v empty r -> (k,v)
| node n k v l r -> minimum l ]

def size: map a b -> int =
[ empty -> 0
| node n k v l r -> n ]

def look: ::ord a => map a b -> a -> opt.opt b =
[ empty, a -> opt.nothing
| node s k v l r, a ->
if a < k then look l a
else if a > k then look r a
else opt.just v ]

def has: ::ord a => map a b -> a -> bool =
[ m, k ->
[ opt.nothing -> false
| opt.just _ -> true ] (look m k) ]

def member: ::ord a => map a b -> a -> b -> bool =
[ m, k, v ->
[ opt.nothing -> false
| opt.just w -> w == v ] (look m k) ]

def nth: ::ord a => map a b -> a -> b =
[ m, k -> [ opt.just v -> v ] (look m k) ]

def nth_total: ::ord a => a -> map a b -> b -> a =
[ m, k, d -> [ opt.just v -> v | _ -> d ] (look m k) ]

def new_node: a -> b -> map a b -> map a b -> map a b =
[ k, v, l, r -> node (1 + size l + size r) k v l r ]

def balance0: a -> b -> map a b -> map a b -> map a b =
[ k, v, l, r ->
[ node sx kx vx lx rx ->
if size l < size rx
then new_node kx vx (new_node k v l lx) rx
else new_node k v l r
| _ -> new_node k v l r ] r ]

def balance1: a -> b -> map a b -> map a b -> map a b =
[ k, v, l, r ->
[ node sx kx vx lx rx ->
if size r < size lx
then new_node kx vx lx (new_node k v rx r)
else new_node k v l r
| _ -> new_node k v l r ] l ]

def balance: a -> b -> map a b -> map a b -> map a b =
[ k, v, l, r ->
let sl = size l in
let sr = size r in
if sl < sr then balance0 k v l r
else if sl > sr then balance1 k v l r
else new_node k v l r ]

def glue: map a b -> map a b -> map a b =
[ l, r ->
[ empty, _ -> r
| _, empty -> l
| node sx kx vx lx rx, node sy ky vy ly ry ->
if (sx < sy) then balance ky vy (glue l ly) ry
else balance kx vx lx (glue rx r) ] l r ]

def change: ::ord a => a -> (opt.opt b -> opt.opt b) -> map a b -> map a b =
[ a, f, empty ->
[ opt.nothing -> empty
| opt.just v -> balance a v empty empty ]
(f opt.nothing)
| a, f, node s k v l r ->
if a < k then balance k v (change a f l) r
else if a > k then balance k v l (change a f r)
else [ opt.nothing -> glue l r
| opt.just z -> balance k z l r ] (f (opt.just v)) ]

def insert: ::ord a => a -> b -> map a b -> map a b =
[ k, v -> change k [x -> opt.just v] ]

def delete: ::ord a => a -> map a b -> map a b =
[ k -> change k [ v -> opt.nothing ] ]

def foldr: (a -> b -> c -> c) -> c -> map a b -> c =
[ f, z, empty -> z
| f, z, node s k v l r ->
foldr f (f k v (foldr f z r)) l ]

def foldl: (a -> b -> c -> c) -> c -> map a b -> c =
[ f, z, empty -> z
| f, z, node s k v l r ->
foldl f (f k v (foldl f z l)) r ]

def foldm: (a -> b -> c -> c -> c) -> c -> map a b -> c =
[ f, z, empty -> z
| f, z, node s k v l r ->
f k v (foldm f z l) (foldm f z r) ]

def fold: (a -> b -> c -> c) -> c -> map a b -> c =
foldr

def to_up_list: map a b -> list (a, b) =
[ m -> foldr [ k, v, l -> list.cons (k,v) l ] list.nil m ]

def to_down_list: map a b -> list (a, b) =
[ m -> foldl [ k, v, l -> list.cons (k,v) l ] list.nil m ]

def to_list: map a b -> list (a, b) =
to_up_list

def from_list: list (a, b) -> map a b =
[ list.nil -> empty
| list.cons (k,v) xx -> insert k v (from_list xx) ]

def domain: map a b -> a list =
[ m -> list.map [ (a,b) -> a ] (to_list m) ]

def range: map a b -> b list =
[ m -> list.map [ (a,b) -> b ] (to_list m) ]

def filter: ::ord a => (a -> b -> bool) -> map a b -> map a b =
[ p -> fold
[ k, v, m -> if p k v then insert k v m else m ]
empty ]

def partition: ::ord a =>
(a -> b -> c) -> map a b -> (c, map a b) map =
[ p, m ->
let f, p, k, v, m =
let i = p k v in
insert i (insert k v (nth_total (empty) m i)) m
in fold (f p) empty m ]

def divide: ::ord a =>
(a -> b -> bool) -> map a b -> (map a b, map a b) =
[ p, m -> let d = partition p m in
(nth_total empty d false,
nth_total empty d true) ]

def union: ::ord a => map a b -> map a b -> map a b =
[ m0, m1 ->
fold [ k, v, m -> insert k v m] m0 m1 ]

def subtract: ::ord a => map a b -> map a b -> map a b =
[ m0, m1 ->
filter [ k, v -> not (member m1 k v) ] m0 ]

def intersection: ::ord a => map a b -> map a b -> map a b =
[ m0, m1 -> filter (member m0) m1 ]

def inverse: ::ord a ::ord b => map a b -> map b a =
[ m -> let f, k, v, m0 = insert v k m0
in fold f empty m ]

def apply: ::ord c => (a -> b) -> map c a -> map c b =
[ f, m -> fold
[ k, v, m -> insert k (f v) m ] empty m ]

def compose: ::ord a ::ord c => map a b -> map c a -> map c b =
[ m0, m1 -> apply (nth m0) m1 ]

def decompose: ::ord a => map a b -> (map int b, map a int) =
[ m ->
foldr
[ k, v, (m0, m1) ->
(insert (size m0) v m0, insert k (size m0) m1) ]
(empty, empty) m
]

)

using system

def make_map: int -> map int int =
[ 0 -> map.empty
| n -> map.insert n (n+1) (make_map (n-1)) ]

def main: map int int = map.nth (map.apply [x -> x * 2 ] (make_map 10)) 5