Indices, Indexing

07 Oct 1998 13:34

From the Latin indicare, to indicate, to point out, itself from in, meaning not just what its English cognate does but also having the sense of ``into,'' and dicare, to proclaim --- an index is something which tells you the way to another thing; a pointer. Hence (!) a list of directions, specifically books, in which one should not go, like the Index Librorum Prohibitorum, the list of books prohibited to members of the Catholic Church, first printed in 1559, and officially continued until (if memory serves) 1966. (The literary and intellectual quality of the books listed had by then fallen off so badly that it was no longer much of a service to maintain it.) Indices in books developed at about the same time, and from the same cause (I think) as indices of books, namely printing, and having plenty of books around. It would be fascinating (well, to someone easily absorbed by trifles, like myself) to learn about the history of index-making and the decisions as to what goes into the index (I suspect, knowing the tastes of the time, that it began with authors). In any case, by the end of the eighteenth century, if not before, indexing had reached a peak which has never really been surpassed, in the construction of properly analytical indices, which serve as elaborate and effective means of cross-reference, summary and explication. Here is one especially favorable example, from Adam Smith's Wealth of Nations:

Agriculture, the labour of, does not admit of such subdivisions as manufactures, 6; this impossibility of separation, prevents agriculture from improving equally with manufactures, 6; natural state of, in a new colony, 92; requires more knowledge and experience than most mechanical professions, and yet is carried on without any restrictions, 127; the terms of rent, how adjusted between landlord and tenant, 144; is extended by good roads and navigable canals, 147; under what circumstances pasture land is more valuable than arable, 149; gardening not a very gainful employment, 152--3; vines the most profitable article of culture, 154; estimates of profit from projects, very fallacious, ib.; cattle and tillage mutually improve each other, 220;
etc. If you want to know what Smith thought about agriculture, this is much more useful than just a list ``6, 92, 127, 144, 147, 149, 152--3, 154, 220,'' etc. And it is infinitely superior to searching the text of the book for ``agriculture.'' (Of the references above, only two pages contain the word.) That is to say, we actually know a reliable, easy-to-use way of doing searches and making connections between different documents, one almost providentially adapted to hypertext, and have known it for over two hundred years, but don't use it at all when it comes to hypertexts. (I know of not one hypertext with an index, let alone an annotated, analytical index.) Instead we use keyword searching, and badly at that. (Library catalogues, which also rely on such searches, at least have standardized keywords for subjects; the Web does not.) The advantage of keyword searches is that they're easy to program and require no human thought beyond writing (at the very outside) a few hundred lines of Perl. The disadvantages appear when the results are to be used by human beings, as opposed to a few hundred more lines of Perl. I have no notion of what to do about this, at least until Perl adds concept matching to regular expression matching (if Mr. Wall isn't too busy...?). One thing which does occur to me is that it's fairly easy to write scripts which build keyword indices, which people could then elaborate into useful documents.
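
By way of illustration, here is a minimal sketch of such a script, in Python rather than Perl. Everything in it is an assumption made for the example: the function names build_keyword_index and format_index, the very short stop-word list, and the sample pages, which are invented sentences loosely echoing the index entry above, not Smith's actual text. It produces only the bare ``word, page, page'' skeleton; the analytical annotations would still have to be supplied by a person.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; a usable one would be far longer.
STOP_WORDS = {"the", "of", "and", "a", "to", "in", "is", "not", "such", "as", "each"}


def build_keyword_index(pages):
    """Map each lower-cased word to the sorted page numbers it appears on.

    `pages` is a dict of {page_number: page_text}.
    """
    index = defaultdict(set)
    for number, text in pages.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in STOP_WORDS:
                index[word].add(number)
    # Alphabetical order, as in a printed index.
    return {word: sorted(numbers) for word, numbers in sorted(index.items())}


def format_index(index):
    """Render the index as plain-text lines: 'word, 6, 92'."""
    return "\n".join(
        word + ", " + ", ".join(str(n) for n in numbers)
        for word, numbers in index.items()
    )


# Invented sample pages, for illustration only.
SAMPLE_PAGES = {
    6: "The labour of agriculture does not admit of such subdivisions as manufactures.",
    147: "Good roads and navigable canals extend cultivation.",
    220: "Cattle and tillage mutually improve each other.",
}

if __name__ == "__main__":
    print(format_index(build_keyword_index(SAMPLE_PAGES)))
```

Note that in this made-up sample the word ``agriculture'' turns up on only one of the three pages, though all three bear on the subject; that is exactly the gap a human indexer would have to close when elaborating the skeleton.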