Building An Index of OCaml’s Syntactic Categories

As I learn a language, I feel a need to know where the various language constructs (syntactic categories) can be used in programs. The official language definition seems to lack an index, and in particular a hyperlinked index leading from the BNF category definitions back to the places where each category is used. Here I describe an OCaml program that modifies a copy of the HTML manual to provide such back links. Here is the output of the first version of the program. As far as I can tell, the only visible change is that each defining occurrence of a category name is now a hyperlink to a list of links to the category definitions that employ that category. Chapter 6 holds most of these definitions, but the program does not rely on that fact.
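At its core, building the back links is an inversion: each definition names the categories it uses, and the index must map each category to the definitions that use it. The following is a minimal sketch of that inversion, not the author's index.ml. It assumes the grammar has already been extracted from the HTML as (defined category, categories used) pairs; the names invert, users_of and sample are hypothetical, and the grammar fragment is illustrative only.

```ocaml
(* Minimal sketch of the back-link index. ASSUMPTION: the grammar has
   already been extracted from the HTML as (defined category, categories
   used in its definition) pairs; that extraction is not shown. *)

(* Invert "definition D uses category C" into "C is used by D". *)
let invert (defs : (string * string list) list) : (string, string list) Hashtbl.t =
  let index = Hashtbl.create 64 in
  List.iter
    (fun (defined, used) ->
      List.iter
        (fun c ->
          let users = try Hashtbl.find index c with Not_found -> [] in
          (* Record each using definition only once per category. *)
          if not (List.mem defined users) then
            Hashtbl.replace index c (defined :: users))
        used)
    defs;
  index

(* Sorted list of the definitions that employ category [c]; empty if none. *)
let users_of index c =
  List.sort compare (try Hashtbl.find index c with Not_found -> [])

(* Hypothetical grammar fragment, for illustration only. *)
let sample =
  [ ("expr", ["value-path"; "constant"; "expr"; "pattern"]);
    ("pattern", ["value-name"; "constant"; "pattern"]);
    ("definition", ["let-binding"; "expr"]) ]

let () =
  let index = invert sample in
  List.iter
    (fun c ->
      Printf.printf "%s is used in: %s\n" c (String.concat ", " (users_of index c)))
    ["expr"; "constant"; "pattern"]
```

Running it prints one line per category, for example "expr is used in: definition, expr". In the modified manual, each such list is instead a page of links reached from the defining occurrence of the category.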

There remain bugs to fix and obvious improvements to make, but I pause here to present what I have done in case I get distracted. (2012 Oct) I got distracted.

Running the Program

I have run this only on a Mac with OCaml version 3.12.0. The program assumes that the working directory contains a subdirectory man holding the HTML tree of the manual. Running the program produces a modified tree in a new directory man2. These are the commands I use:
rm -r man2
ocamlc unix.cma str.cma index.ml
./a.out
cp man/*.css man/.ht* man/*.gif man2
To save space on my website, I ran the following shell commands in directory man2:
perl -pi -w -e 's/libref/http:\/\/caml.inria.fr\/pub\/docs\/manual-ocaml\/libref/g' *.html
rm -r libref
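The perl command prefixes every occurrence of libref with the address of the library reference on caml.inria.fr, so links into the library documentation no longer need a local copy; that is what allows me to omit the large subdirectory man2/libref. The result is the version linked above.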
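For readers without perl, the same rewrite can be done in OCaml. This is a hedged sketch using only the standard library (no Str); replace_all, rewrite_file and the file name relink.ml are hypothetical. Like the one-liner, it replaces every literal occurrence of libref, not only those inside href attributes.

```ocaml
(* Hypothetical OCaml counterpart of the perl one-liner; Stdlib only. *)

(* Replace every occurrence of the literal [pat] in [s] with [rep],
   scanning left to right and never rescanning inserted text, which is
   how Perl's s/pat/rep/g behaves for a literal pattern. *)
let replace_all ~pat ~rep s =
  let n = String.length s and m = String.length pat in
  if m = 0 then invalid_arg "replace_all: empty pattern";
  let buf = Buffer.create (n + 64) in
  let rec go i =
    if i > n - m then Buffer.add_substring buf s i (n - i)  (* copy the tail *)
    else if String.sub s i m = pat then begin
      Buffer.add_string buf rep;
      go (i + m)
    end else begin
      Buffer.add_char buf s.[i];
      go (i + 1)
    end
  in
  go 0;
  Buffer.contents buf

let libref_url = "http://caml.inria.fr/pub/docs/manual-ocaml/libref"

(* Rewrite one file in place, as perl -pi does. *)
let rewrite_file path =
  let ic = open_in_bin path in
  let contents = really_input_string ic (in_channel_length ic) in
  close_in ic;
  let oc = open_out_bin path in
  output_string oc (replace_all ~pat:"libref" ~rep:libref_url contents);
  close_out oc

(* Usage, run in man2:  ocaml relink.ml *.html *)
let () =
  Array.iteri (fun i path -> if i > 0 then rewrite_file path) Sys.argv
```

Like the perl command, this is not idempotent: the URL itself contains libref, so a second run would prefix it again.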

The Program Itself

Here is a notepad where I accumulated lore while writing the program; for now it will have to serve as the only explanation of the program's logic. I had to learn more about regular expressions than I wanted, but in the end they served.
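The program links against Str (note str.cma on the ocamlc line). To give the flavor of that regular-expression work, here is a minimal sketch that collects the category names referenced by links in a page of HTML. The link markup it matches is an assumption, not taken from the manual, so use_re would need adjusting to the real markup; use_re and category_uses are hypothetical names.

```ocaml
(* Build with Str, as the program does:  ocamlc str.cma refs.ml
   ASSUMPTION: a reference to a category is marked up as
   <a href="...#NAME">; the manual's actual markup may differ. *)
let use_re = Str.regexp "<a href=\"[^\"#]*#\\([A-Za-z0-9_-]+\\)\">"

(* Every category NAME referenced in [html], in order of appearance. *)
let category_uses html =
  let rec loop pos acc =
    match Str.search_forward use_re html pos with
    | exception Not_found -> List.rev acc
    | _ ->
      (* Read the group before any other Str call overwrites the match state. *)
      let name = Str.matched_group 1 html in
      loop (Str.match_end ()) (name :: acc)
  in
  loop 0 []

let () =
  let sample =
    "<p><a href=\"#expr\">expr</a> ::= <a href=\"lex.html#constant\">constant</a></p>" in
  print_endline (String.concat " " (category_uses sample))
```

On the sample string it prints "expr constant". A Str match that fails raises Not_found, which here simply ends the scan.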

Blemishes and Faults

References to categories occur formally in category definitions, but also informally in the prose that usually follows a definition and gives its semantics. Indexing those informal references is also useful, but I want the index to distinguish the two kinds. This was the purpose of the field ex, but the logic behind it is at best incomplete. Perhaps I will follow each link with a pair of counts, one for formal and one for informal references.
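One possible shape for that pair of counts, sketched under assumptions: the caller already knows whether a reference is formal (inside a BNF definition) or informal (in the following prose), and the types and functions below (kind, counts, record, label) are hypothetical. They are not the program's field ex.

```ocaml
(* Hypothetical: which kind of text a reference to a category appeared in. *)
type kind = Formal | Informal

(* Per-category tally of formal and informal references. *)
type counts = { mutable formal : int; mutable informal : int }

(* Fetch the tally for [category], creating a zeroed one on first use. *)
let counts_for tbl category =
  match Hashtbl.find_opt tbl category with
  | Some c -> c
  | None ->
    let c = { formal = 0; informal = 0 } in
    Hashtbl.add tbl category c;
    c

(* Count one reference of the given kind. *)
let record tbl category kind =
  let c = counts_for tbl category in
  match kind with
  | Formal -> c.formal <- c.formal + 1
  | Informal -> c.informal <- c.informal + 1

(* Link text carrying both counts; a category never referenced gets no counts. *)
let label tbl category =
  match Hashtbl.find_opt tbl category with
  | None -> category
  | Some c -> Printf.sprintf "%s (%d formal, %d informal)" category c.formal c.informal
```

After two formal references and one informal reference to expr, label gives "expr (2 formal, 1 informal)", which could serve as the text of the back link.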