Recursive Tree Hash

By recursive tree I mean here the trees produced by a finite BNF grammar. There is a remarkable and seemingly efficient algorithm for testing whether two non terminals have exactly the same productions, despite the fact that the interesting BNF rules yield an infinite set of productions.

Motivations

I have spent too much time tracking down bugs due to the C compiler seeing different declarations for some parameter type when compiling independent sections of code. The argument is coerced to the parameter type as seen at the call site but the called routine indeed expects some other type. There are conventions which avoid this pitfall, but those conventions are themselves scarcely less daunting to enforce.

I thought first of modifying the loader to compare declarations of the external symbols and a hash of the declaration seemed strategic. Compilers would calculate this hash, usually from prototypes. Type mismatches would be detected at load time.

Type equivalence must be dealt with. It would be good not to require character for character agreement.
typedef float F; F x; and float x; should not produce different hashes for the type of x. Here is a case to consider.

It seems necessary that the hash of a type produced by a type constructor, such as struct be a function of the hashes of the types of the fields therein, including perhaps, some dependence on the spelling of the field names, depending on the language rules and how strictly you wish to enforce them.

But then recursive types such as
struct br{struct br * left; struct br * right;}
present a problem.

Mark Miller has suggested a cute trick to solve this problem: Choose a finite field for hash values; make the hash of a struct a linear combination of the hashes of the field types, and then you can solve the set of recursive equations and assign a hash to each type that conforms to the rules. I think there is no obvious cryptographically secure version of this.

There remain a few problems which seem to have simple solutions as far as I have seen. There are questions of language semantics such as the equivalence of:

union{float f; int i; char c;}
union{float f; union{int i; char c;} u;}

The C types are different; the analogs in Algol68 are the same type. Either notion can be handled by this hash plan.