Set Abstraction for Totally Ordered Types

This is a shameless transcription into Scheme of Ocaml’s set module (from here, described here) by Xavier Leroy. There is a GNU license on the Ocaml code which may limit what you can do with my transcription.

The Scheme expression defines a value set which is a function that takes as its only argument, a function, f, which takes two elements and returns an integer that is negative, 0 or positive depending or their relative order. For such set members x and y, xRy ⇔ f(x, y) ≤ 0. It is necessary that R be a preorder. I.e. xRy & yRz ⇒ xRz and also xRx. The function set returns a list of set tools for sets of such elements as follows:

(apply (lambda (empty empty? mem add singleton remove
   union inter diff compare equal subset iter fold
   for_all exists filter partition cardinal elements
   min_elt max_elt choose split)
  ......
) (set (lambda (i j) (if (< i j) -1 (if (> i j) 1 0)))))

Code situated at the dots can refer to these set primitives. In the following identifiers beginning with s denote sets as processed by these routines.

empty = ∅ (the empty set)
(empty? s) iff s = ∅
(mem e s) iff e ∊ s
(add e s) = s ∪ {e}
(singleton e) = {e}
(remove e s) = {x | x∊s ∧ x≠e}
(union s1 s2) = s1∪s2
(inter s1 s2) = s1∩s2
(diff s1 s2) {x | x∊s1 ∧ x∉s2}
(compare s1 s2) is a total ordering on sets based on the original argument to set.
(equal s1 s2) iff s1 = s2
(subset s1 s2) iff s1 ⊆ s2
(iter f s) Performs function f on each element of s. Ignores returned values and returns nothing.
(fold f s v) Iterates over s applying f to each member, feeding the yield of one application into the next. where f is a function which takes an element and value of some type and returns a value of that type. v is the initial value and fold returns the last value.
(for_all p s) iff p (a predicate) yields true on each element of s. (p is applied in order only until it fails.)
(exists p s) iff p (a predicate) yields true on some element of s. (p is applied in order only until it succeeds.)
(filter p s) = {x | x∊s ∧ p(x)}
(partition p s) = (cons s1 s2) where s1={x | x∊s ∧ p(x)} and s2={x | x∊s ∧ ~p(x)}
(cardinal s) = the number of elements in s (= |s|)
(elements s) a sorted list of the elements of s
(min_elt s) the least element of s by supplied ordering
(max_elt s) the largest element of s by supplied ordering
(choose s) some element of s
(split e s) = (list s1 b s2) s1 = {x | x∊s ∧ x < e}, s2 = {x | x∊s ∧ x > e} and b = (e ∊ s)
(find e s) = “none-such” if e∉s else e' where e' is the oldest element added to the set that is equivalent to e according to the comparator.

It took me two days to transcribe the code and two more to weed out enough transcription errors to pacify these too crude test cases. The tests are adapted here for the repository form of Set. I once wrote a less extensive version of b-tree logic and I am glad that I did not have to debug all of the special cases handled in the Ocaml code.

This describes the interface but with some Ocaml jargon. For instance in the entry for add the text elt -> t -> t means that add takes two arguments; the first is an element of the base type elt and the second a set whose type is t of such elements. It returns a value of type t which is a set.

Both the Ocaml code and Scheme code produce immutable sets. For instance the add function returns a new set while the input set remains unchanged. When the set is large most of the storage is shared between the new and old sets.

Here is some code to bind the names at the top level to support interactive debugging. It assumes that t denotes the yield of set. Here is some code to deal with large sets of integers.