out of the land of the dex
Oct. 17th, 2006 08:55 amI finally tidied up and submitted the index yesterday - the index that has been hanging over my head since Dr. Academic first asked me last year to compile it, and has been weighing far more heavily on my time for the last six weeks since the proofs arrived. The indexable part of the text is about 250 pages, but it's a mightily fact- and name-filled book. The index has about 750 main entries, and a total of about 5500 page references. I hope the publisher doesn't want to cut it. But at least for the moment I'm done, which means it's time for my indexing rant:
I took a course in indexing in library school. It was by far the most useless course I took there. Neither the professor, nor the textbook, had any real interest in teaching us how to index. The emphasis was all on the arrangement of the index after it's compiled: the differences between word-by-word and letter-by-letter alphabetization, and the exact rules for each; the advantages and disadvantages of run-in versus indented sub-entries; varying systems for inverting foreign names (a topic already covered in the cataloging rules); how far to indent the run-on lines; and so forth. All things an indexer needs to consider, to be sure, but covered in oddly minute detail in the weird absence of the main topic. It reminded me - and I said so at the time - of a cookbook that begins, "First, cook the meal. Now, we will discuss the finer points of setting the table." Or, more relevantly for some of you, a writing class that's all about spelling corrections, formatting the document, and whether to submit it on paper or electronically.
I've remained interested in indexing - it bears some similarities to cataloging - and I've indexed one commercially published book before, a collection of reviews. But just about everything I've read on indexing has had the same weird absence of advice on how to index, omitted in favor of discussion of how to arrange the index. Perhaps, at most, there will be advice on the color highlighter to use when marking up the proofs.
The publisher informed me that they follow the Chicago Manual of Style, so I read its chapter on indexing. Sure enough, it's mostly about arranging the index. But buried in all that are a few nuggets that help with the actual indexing. Chicago defines light and heavy indexing and tells you how many references by average per text page are deemed to constitute each. This was a useful guideline. It specifically warns against the most sure-fire sign of a badly compiled index: long strings of undifferentiated page references. I set myself a limit of seven at the maximum, and often split into sub-entries at much smaller numbers than that.
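The seven-reference limit is easy to check mechanically once the entries are in a spreadsheet. What follows is only a minimal sketch in Python, assuming a hypothetical layout in which each main heading maps to the list of page references that have not yet been put under any sub-entry. The function name and sample data are illustrative, with locators borrowed from the Asimov example quoted further on. It flags the candidates for splitting, but choosing the sub-entries is still a matter of judgment.

```python
# Hypothetical layout: main heading -> page locators not yet assigned to a sub-entry.
MAX_UNDIFFERENTIATED = 7  # the self-imposed limit described above


def entries_needing_subentries(index, limit=MAX_UNDIFFERENTIATED):
    """Return the headings whose undifferentiated locator list exceeds the limit."""
    return sorted(
        heading for heading, locators in index.items() if len(locators) > limit
    )


# Illustrative sample only; a page range such as "194-207" counts as one locator.
sample = {
    "Campbell, John W., Jr.": [
        "189", "192", "194-207", "212", "219", "223-25", "650", "661", "669",
    ],
    "Calvin, John": ["290"],
}

for heading in entries_needing_subentries(sample):
    print(f"{heading}: {len(sample[heading])} undifferentiated references")
```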
One might think that indexing is simply a matter of going through the book looking for names. Some indexers certainly do; I've seen indexes compiled that way. They are called "bad indexes." But good indexing requires thought. In the middle of discussing something else, Chicago informs the reader that "names or terms that occur in passing references and scene-setting elements that are not essential to the theme of a work need not be indexed," and kindly but parsimoniously gives one example. That's a start, though there's much more that could be said on the subject of proper names. But what about other topics? When a book's topic is large and complex but not clearly differentiated into parts, how do you decide which terms to use and which pages to list it for, when it permeates the whole book? Even a proper name: C.S. Lewis is named on literally 4/5ths of the pages in this book, so how does one construct sub-entries?
There's no advice or guidance on such questions. It helps to know the text well, and one of the reasons indexing is time-consuming is the necessity for the indexer to become familiar with the book. I had a head start here, as I'd read the text five or six times through already during its long gestation, making comments and looking for glitches. But this only helped me backtrack more, as the way to index something became clear halfway through, and I could say, "I know there were other instances of this earlier." Having a searchable PDF of the whole text was a great help, though I actually did the indexing from a printout in a three-ring binder. My highlighting pen, as it happens, was pink, this being the most common color after yellow at the office supply store, and much easier to see.
Indexers are legendarily supposed to be bad. There's some truth to this. I tried to get a sense of how to index by deconstructing the indexes to two of my own articles, as those were texts I know well. These were published in books that were indexed by their editors. I went through the indexes looking for the page numbers that my article covered, typed the numbers and entries into a spreadsheet, and rearranged it by page number, to see what topics were chosen for indexing. One of the indexes was pretty good; the other was haphazard at best, picking random names irrelevant to the topic, leaving out basic concepts, and so on.
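The spreadsheet exercise described above, turning an index inside out so it reads by page rather than by heading, can also be sketched in a few lines of Python. This assumes the index has already been typed in as (heading, page) pairs; the function name, headings, and page numbers below are invented purely for illustration.

```python
def entries_for_pages(index_rows, first_page, last_page):
    """Return (page, heading) pairs that fall within the page range, sorted by page.

    index_rows is an iterable of (heading, page) tuples, one per page reference.
    """
    rows = [
        (page, heading)
        for heading, page in index_rows
        if first_page <= page <= last_page
    ]
    return sorted(rows)


# Hypothetical rows typed in from a printed index.
rows = [
    ("alphabetization", 112),
    ("cataloging", 47),
    ("cross-references", 105),
    ("sub-entries", 103),
    ("sub-entries", 110),
]

# Suppose the article of interest runs from page 100 to page 115.
for page, heading in entries_for_pages(rows, 100, 115):
    print(page, heading)
```

Reading the result in page order shows at a glance which topics the indexer picked up from each page, and which pages were passed over entirely.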
Isaac Asimov tells in his autobiography of having hired a professional indexer for one of his first non-fiction books, and being so appalled at the results that he vowed to index all his further books himself. But I can't tell you where in his two-volume autobiography he says this, for although the books are indexed, Asimov himself was a terrible indexer. Each has two separate indexes (itself against the advice of Chicago and most other index-arrangement authorities). One is the "Name Index," by which Asimov means persons, though he doesn't say this. No places or institutions need apply. (Can you find from the index when he started working for the Navy Yard or for Boston University, say? No.) All the names are given, however irrelevant to Asimov's life story ("Calvin, John, 290") or however unlikely that readers will look them up ("Edith, 534; Eileen, 165"). And of course they're totally undifferentiated: "Campbell, John W., Jr., 189, 192, 194-207, 212, 219, 223-25 ..." and so on in a regular stream up to "650, 661, 669, 677, 687, 699, 701," and that's just the first volume. The other is the "Title Index," by which Asimov means - but again doesn't bother to explain specifically - the titles of his own stories and books. The clumsy part here comes when dealing with a story whose title changed somewhere in the writing or editing or publishing process. Asimov mindlessly indexes it just under whatever title it's referred to by on the page. No cross-references. You have to look at all the pages referred to and find the comment about the title change to discover that there's another title with a whole bunch more references.
I tried to avoid such pitfalls. My assignment is full of names and titles, but I tried to be intelligent about them. There are many long lists of names. If the individual names seemed relevant, I indexed them; if not, not. But I always indexed the list as a unit for whatever point it was there to make. Similarly with book titles. If a book by an Inkling was mentioned, I indexed it, directly under title. Almost always I made an entry for the author as well, but not sub-entered under book title - that's already covered - but by topic. Why is the book mentioned at this point? Is it as an example of a work written in collaboration, or with feedback, from other Inklings? Was it reviewed by another Inkling, or did somebody mention it favorably, or for that matter criticize it? Into the proper subcategory of the author entry it goes. If the book's title entry gets long, make sub-entries for that too.
One wants to be consistent in sub-entry terminology, but I felt no need to be consistent in depth of indexing. If a minor entry has just 4 or 5 references, don't bother to make sub-entries unless there's an important point to be brought out. If, among the many sub-entries on a major topic, there are 3 or 4 hair-splitting distinctions with only one or two references each, bring 'em together. But if they have 5 references each, keep them split, or split them if they haven't been already.
Nor is there any need to be reciprocal. The subtopic is the subtopic of that particular main entry. If A is quoted describing B's personality, the sub-entry under A should say, "on B" unless this needs to be further divided. But the sub-entry under B should say "personality," not "A on." If a reader wants to know what A said about B, look under A, not B. And so on.
This index was done in three stages. First, marking the proofs. Second, typing the entries and page numbers into a spreadsheet, which I then had the computer alphabetize (computers alphabetize word-by-word, so as the publisher requested letter-by-letter I manually rearranged it in final copy - not a difficult task). After completing the spreadsheet I thought I was mostly done, and had only to cut-and-paste into a Word document with some minor editing. Wrong. Stage three, the actual writing of the index, was the hardest and took by far the longest. Some entries were short and simple, but I always searched automatically through the PDF to see if I'd missed an obscure name reference or distinctive use of a word. Sometimes I had. But with the sub-entries, all my notes in the database were ad hoc. Constant rethinking was required to sort them out, and often I had to go back and make changes. There was nothing mechanical about this. It took thought and time.
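The difference between the two alphabetization schemes comes down to the sort key. Here is a simplified sketch in Python, not a full implementation of any style manual's rules: it handles ASCII letters and digits only and simply ignores commas and other punctuation, whereas real letter-by-letter rules treat some punctuation specially. The function names and sample headings are illustrative.

```python
import re


def word_by_word_key(heading):
    """Compare one word at a time, so a word break sorts ahead of any letter."""
    return [re.sub(r"[^a-z0-9]", "", word) for word in heading.lower().split()]


def letter_by_letter_key(heading):
    """Ignore spaces and punctuation and compare the remaining characters as one string."""
    return re.sub(r"[^a-z0-9]", "", heading.lower())


headings = ["New York", "Newark", "New Zealand", "Newman, John Henry"]

print(sorted(headings, key=word_by_word_key))
print(sorted(headings, key=letter_by_letter_key))
```

With these sample headings, the word-by-word key puts "New York" and "New Zealand" ahead of "Newark," while the letter-by-letter key puts "Newark" and "Newman, John Henry" first.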
I did some proofreading against the spreadsheet and found a few other errors when checking the printout for something. I'm sure there are others left. But I hope the index will be useful. This book is going to remake Inklings studies, and I get to guide people through its pages.
no subject
Date: 2006-10-17 06:16 pm (UTC)I've done a substantial amount of indexing, with no formal training at all, and have never had a single complaint from author or publisher, so apparently I have some talent for it. And I think that's the prime factor that makes the difference between a good and a bad indexer: an underlying ability. Technique can be learned, but it isn't sufficient.
If I had to define it, I'd say it includes an ability to accurately see patterns (humans may be pattern-seeking animals, but some of them are pretty inept at it, IMHO), some of the same wide-ranging trivial knowledge that a good editor/copyeditor also needs, and an ability to hold a lot of information in a kind of mental buffer/storage bank while working.
I find indexing to be extremely demanding, even exhausting, and I actively enjoy editing/copyediting, so I do mainly the latter.
no subject
Date: 2006-10-17 06:40 pm (UTC)Of course, if electronic texts were the norm, there'd be no need for that. But they aren't. *Other* aspects of indexing, the ones requiring judgement and skill, will remain relevant of course.
Your comments on the naming and lack of explanation of Asimov's two indexes are also quite to the point.
I've never constructed an index myself (which may be just as well), and hadn't considered the use of the length of the list of page references after an entry as a cue to when to split it into sub-entries. I've certainly suffered from people who didn't bother. So I have definitely learned something about indexing from your short article.
Seems like a good place to post this
Date: 2006-10-17 06:54 pm (UTC)http://www.pmla.org/altsource.html
no subject
Date: 2006-10-17 06:58 pm (UTC)Perhaps *you* should write the text on how to index.
Got here via
no subject
Date: 2006-10-17 10:22 pm (UTC)I certainly agree about inept pattern-seeking. There are several types of this that drive me crazy; the one that's currently bugging me most is people who cannot grasp the concept of inductive logic. (This lack is behind a lot of the religious-right objection to science, I think.)
no subject
Date: 2006-10-17 10:27 pm (UTC)The failures of Asimov's name index, of course, were not at all necessary to allow it to be used for egoscanning. That's the problem.
no subject
Date: 2006-10-17 10:40 pm (UTC)no subject
Date: 2006-10-17 11:28 pm (UTC)Re: Seems like a good place to post this
Date: 2006-10-18 02:13 am (UTC)re: indexing
Creating an index is as much an art as it is a science. It sounds like you have mastered probably the most difficult part, which is learning how to think about it: from considering the material, understanding what the audience will want/need, and then editing the index consistently. Knowing the material as well as you do certainly helps.
You're correct about the Chicago Manual of Style. Its concerns focus on matters which I tend to think of as copyediting, layout, and production. While these are important, they're not what you needed to become an indexer.
You might also consider that indexing is a sort of obscure (or even arcane?) skill that can take as much time to learn as you have to give to the effort. The American Society of Indexers (ASI) offers a "Training in Indexing" course that takes anywhere from five months to three years, requiring 120 to 150 hours of work to complete. That's brand new.
It used to be that the USDA's courses were the big source of formal training. The Graduate School at the USDA still offers courses in indexing, editing, library stuff -- quite the variety. For that matter, I see from the ASI page that even UC/Berkeley offers a distance learning course (3 college credits).
Me, I took a one-day introductory workshop geared towards tech writers through my local tech writers' professional group (the STC). In it we learned, quite incidentally, that indexing a user manual can highlight a poor copyediting job -- indeed, even the lack of a database administrator for the system being documented.
/fannish-professional splurt of info
no subject
Date: 2006-10-18 02:59 am (UTC)I'm not impressed at the thought of graduate schools which offer courses in indexing. Remember, I took a graduate course in indexing which was totally useless.
One hopes for better from the ASI course, though I see it's divided into three units, of which only one is actually about, you know, indexing.
no subject
Date: 2006-10-18 03:30 am (UTC)re: the ASI course
Well, a lot of the thought and directions for indexing comes from a good analysis of the document and the people who will be using the index. That's what I suspect comprises much of their Unit A. Unit B seems to be the meat of index construction.
no subject
Date: 2006-10-21 09:49 pm (UTC)The indexing chapter does briefly discuss the actual process of indexing, though mostly in general principles with few illustrative examples. The discussion on p. 155-6 of when not to index a term misses the point of the particular example given: the reason it shouldn't be indexed is not that it vaguely "provides no information," but that it specifically is itself a cross-reference to another part of the text. Either this should have been more specific, or a broader range of examples should have been given.
What the chapter does discuss in some detail, besides the ordering of the index, is the choice of terminology, another useful point, though I have to say that a guide that considers it necessary to advise not to use index entries beginning with "What is a ...?" (p. 157) is intended for nitwits. But I must admit that there are a lot of nitwits in the world.