Data abstraction is a term that is about 20 years old. The idea that the term refers to is as old as computers. It is the notion that a program should hide the details of how it remembers the situation that it is charged with handling. The first big programs that I dealt with, in 1955, would write simulation data onto a magnetic tape in order to resume the simulation later. The tape format would consist almost entirely of floating point numbers organized as they appeared in core, arranged for the convenience of the program. The size of the tape block was likely the only clue about the number of points in a row of the simulation, and the number of records between ‘file marks’ told how many rows there were.

A program written to produce pictures of the simulation would most likely be written by a programmer who had learned the ‘secret’ tape format. This learning violated the data abstraction of the program. It was also the only way to go then. The problems with violating the abstraction were the same then as now.
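To make the ‘secret’ concrete, here is a minimal sketch in Python. It is not the 1955 format: the function names are hypothetical, a list of byte strings stands in for the tape, and 8-byte little-endian floats stand in for whatever the machine kept in core. The point is only that the reader works solely because it repeats the writer's private assumptions.

```python
import struct

# Assumed layout, for illustration only: each record is one row of
# 8-byte little-endian floats and nothing else. No header, no field names.

def write_rows(rows):
    """Simulation side: pack each row as raw floats, one record per row."""
    return [struct.pack(f"<{len(row)}d", *row) for row in rows]

def read_rows(records):
    """Picture side: recover the rows using knowledge of the layout."""
    rows = []
    for rec in records:
        n = len(rec) // 8  # record size is the only clue to points per row
        rows.append(list(struct.unpack(f"<{n}d", rec)))
    return rows

tape = write_rows([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
grid = read_rows(tape)
```

If the simulation later switched to 4-byte floats, or put a count at the front of each record, `read_rows` would quietly produce wrong numbers. That is the cost of violating the abstraction, then as now.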

I very much dislike XML, which I suppose attempts to ameliorate this tension. The best that I can see in it is an attempt to convey clues about the role of data by the spelling of the tags which organize the data. I grant that this works sometimes but do not grant that the benefit is worth the cost. If there were something along the lines of an open ontological collaboration which would propose tag meanings in a variety of areas, then the XML thrust might bring better benefits. My mail application (Apple’s Mail program) is remarkably clever about extracting time, place and subject information from mailed event announcements. This information is delivered to the iCal program, which disposes of it intelligently. I don’t know whether XML is part of this useful collaboration but I suppose it is the sort of cross-application message that XML was invented for.

I am still uncomfortable with programs that try heuristically to guess what is going on. I admit that I find some such programs useful and will probably find more of them useful in the future. Such guessing is a main thrust of AI. The main cleverness of Apple’s Mail program lies in finding the information in freely formatted text, not in how that information is formatted for delivery to iCal. If the Mail program were to deliver these messages to me, instead of only to iCal, then XML might be indicated and useful, as I could then choose among programs to read such messages.

Part of my discomfort with this XML plan is the idea that the spelling of field names is enough to explain the meaning of messages, or the logic of programs. The abhorrence of English documentation among most programmers is a very bad situation for today’s information technology (as I have said elsewhere).
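A small sketch of what the spelling of a tag does and does not tell a receiving program. The message and its tag names are invented for illustration; the parsing uses Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical event message; the tag names are made up for this example.
message = """<event>
  <subject>Design review</subject>
  <place>Room 4</place>
  <time>7</time>
</event>"""

# Parsing yields tag names paired with strings, and nothing more.
fields = {child.tag: child.text for child in ET.fromstring(message)}
```

The parser hands back the string "7" under the name "time". Whether that means seven in the morning, seven in the evening, or a duration of seven hours is something the tag does not say, and only documentation or an agreed meaning for the tag can settle it.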

Perhaps my bias is that people, including programmers, should speak to each other in their native language, English here, while programs and computers should speak to each other in their native language, binary here!