Here are some core pieces of a transclusion finder. Notes describing the mathematical parts of the algorithm, together with a brief account of which routines implement which parts of the math, are available as PDF, PostScript, HTML, and RTF. The HTML version is slightly more up to date.

Here are the code fragments proper. I am sorry that I have not had time to make them presentable. They have been run successfully on a moderately large Unix directory of ASCII files.
Fast.c, fl.c, FM.c, Lmain.c, New.c, sstream.c, p.c, sort.c, local.h, Umain.c, m.h
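The files above are not reproduced here, so the following is only a rough illustration of the general idea, not the author's code or algorithm. It is a minimal C sketch of one common way to find passages shared by two texts: hash every K-character window of the first text with a polynomial rolling hash, sort those hashes, then roll the same hash over the second text and look each window up, confirming candidate matches with `memcmp`. The window length `K`, the base `BASE`, and the names `Window` and `count_shared` are hypothetical choices made for this sketch.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical window length: passages shorter than K characters are ignored. */
#define K 8
/* Hypothetical hash base; arithmetic is modulo 2^64 via unsigned wraparound. */
#define BASE 257ULL

typedef struct {
    uint64_t hash; /* rolling hash of the K-character window */
    size_t pos;    /* start offset of the window in text a */
} Window;

/* Order windows by hash, then by position, so equal hashes are adjacent. */
static int cmp_window(const void *x, const void *y) {
    const Window *p = x, *q = y;
    if (p->hash < q->hash) return -1;
    if (p->hash > q->hash) return 1;
    return (p->pos > q->pos) - (p->pos < q->pos);
}

/* Count the K-character windows of b that also occur somewhere in a.
   Hash matches are verified with memcmp, so collisions cannot inflate the count. */
static size_t count_shared(const char *a, const char *b) {
    size_t na = strlen(a), nb = strlen(b);
    if (na < K || nb < K) return 0;

    size_t nw = na - K + 1;
    Window *w = malloc(nw * sizeof *w);
    if (!w) return 0;

    /* top = BASE^(K-1), the weight of the character leaving the window. */
    uint64_t top = 1;
    for (int i = 0; i < K - 1; i++) top *= BASE;

    /* Hash every window of a. */
    uint64_t h = 0;
    for (size_t i = 0; i < na; i++) {
        if (i >= K) h -= top * (unsigned char)a[i - K];
        h = h * BASE + (unsigned char)a[i];
        if (i + 1 >= K) {
            w[i + 1 - K].hash = h;
            w[i + 1 - K].pos = i + 1 - K;
        }
    }
    qsort(w, nw, sizeof *w, cmp_window);

    /* Roll over b and binary-search each window's hash among a's windows. */
    size_t shared = 0;
    h = 0;
    for (size_t j = 0; j < nb; j++) {
        if (j >= K) h -= top * (unsigned char)b[j - K];
        h = h * BASE + (unsigned char)b[j];
        if (j + 1 < K) continue;

        size_t start = j + 1 - K;
        size_t lo = 0, hi = nw; /* lower bound of the first entry with hash >= h */
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (w[mid].hash < h) lo = mid + 1; else hi = mid;
        }
        for (size_t m = lo; m < nw && w[m].hash == h; m++) {
            if (memcmp(a + w[m].pos, b + start, K) == 0) {
                shared++;
                break;
            }
        }
    }
    free(w);
    return shared;
}
```

For example, `count_shared("the quick brown fox", "a quick brown dog")` returns 6: the common passage `" quick brown "` is 13 characters long and so contains six 8-character windows. Sorting the hashes of one text and binary-searching them keeps memory linear in that text, and the `memcmp` check trades a little time for exact results.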

How to find differences between remote files without transmitting them first.