Same Fringe Problem

While considering the clever ‘same fringe problem’ it occurred to me that a list pattern might be applied to a tree. By ‘tree’ I mean here an integer or a pair of trees. (Scheme or Lisp jargon) By contrast a list is either empty (a special value) or the pair of the first element and a list of the rest.

The list pattern that we try to emulate is that car of a list is the first element and cdr of that list is a list without the first element. Can we find an analog for the first and remainder of a tree? Our definition of a tree precludes an empty tree and an empty fringe; we live with that.

This is an exercise in designing an algorithm in English instead of some computer language. We seek routines tf and tr for ‘the first of the tree’ and ‘the rest of the tree’. cdr is non generative which means that x and (cdr x) share storage, which is very good. That seems impossible for tr but it is possible for (tr t) to share a good deal of t.

; (define (r t) (cons (caar t) (cons (cdar t) (cdr t))))

A function r to produce (a . (b . c)) from ((a . b) . c) requires just two new cons cells. t and (r t) have the same fringe. Further the ‘left height’ of (a . (b . c)) is one less than the left height of ((a . b) . c). (The left height of a number is 0 and the left height of (x . y) is one plus the left height of x.) Iterating r enough times gives us (x . y) where x is a number and the first member of the fringe and the fringe of y is the rest of the fringe of the original tree. It seems strategic that we produce (tr x) as we produce (tf x). We thus seek a function trf that produces (trf x) = (cons (tf x) (tr x)). That will serve the purposes of many car-cdr patterns. Since trees cannot be empty (trf x) returns #f when x is a number.

This Scheme code works pretty well. The function of r is incorporated in routine tfr. It compares two mega leaf trees in about one second.

To generate non-trivial test cases we need generators of random tree with the same, or tweaked fringe. We construct the fringe first, as an array of integers. We then recursively bisect the array with random cut-points until we get down to singletons. This gives us a probability distribution over arrays of the same fringe size. Any valid tree has a positive probability. This code generates such trees but requires this framework. (rt n "seed" m) returns a tree with n fringe numbers with the string “seed” controlling the splits. The m^th leaf is incremented by one if 0 ≤ m < n.

When testing trees with 10⁶ elements, it takes 3 sec to generate each tree, and 1.2 sec to compare them. Perhaps I have a performance problem in my random number generators. The recursive routine s inside tfr creates 1999974 cons cells when I compare two different trees with the same fringe of 10⁶ numbers. Only 2 cells per maximum tree depth are germane to the subsequent computation and this determines worst case memory usage.

Here is John McCarthy’s solution, transcribed from Lisp to Scheme. It is much like mine, as fast, and much earlier. I found it here.

Scheme can also wield continuations to achieve coroutines. A bit of assembler code provides coroutines in C.

McCarthy’s solution is rather easy to understand but not as easy as a well packaged coroutine. The coroutine solution is more general and the coroutine logic itself, while opaque, is general to a class of coroutine problems. This coroutine solution is peculiar also in that it seems to require set!. Perhaps this is true and may explain why Haskell lacks continuations.

And an OCaml version which is about the same speed. With ocamlopt generation and comparing together take 1.53 sec.

Here is a similar solution, older than mine but younger than McCarthy’s.

Here is another perspective on the problem.