Extraction vs. Assignment – To be or not to be (an index)

Wow! I have to evaluate the effectiveness of indexing by extraction and indexing by assignment by contrasting the advantages and disadvantages of natural language and controlled vocabulary. Sometimes I like to read our tasks aloud to people who have no idea what “representation and organization of knowledge and information” means. The blank stares and rapid retreats always give me a little evil chuckle.

I guess the best place to begin is to define “indexing by extraction” and “indexing by assignment.” Soergel describes two different ways of indexing. The first is “entity-oriented indexing,” which is analogous to “indexing by extraction” and involves “preparing entities representations.” An indexer observes the various aspects of the entity to be indexed and prepares a list that a searcher can peruse to determine the relevance of that entity to their search. Among other things, the problem with this type of indexing is deciding what information should be included. Soergel also talks about “request-oriented indexing,” which would be analogous to “indexing by assignment.” In request-oriented indexing, an indexer anticipates query statements and builds an index of resources based on what searchers may be looking for. One of the major challenges with request-oriented indexing, however, is anticipating which query statements a searcher will use. Even without throwing “natural language” and “controlled vocabulary” into the mix, discussions of indexing by extraction and indexing by assignment are already difficult.

At the same time, as information professionals, we need to at least try. So, here’s my try.

Controlled vocabulary and indexing by extraction seem to go hand in hand. If an indexer has access to a controlled vocabulary, finding applicable words and phrases within a document will be much easier. In fact, it would involve simply marking each occurrence of a term from the vocabulary. At the extreme, however, the result would be much closer to a concordance than an index. Not surprisingly, indexing by extraction will more often lead to a concordance-like result than to a topical index.
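To make that concrete for myself, here is a minimal sketch of what "marking each occurrence of a term from the vocabulary" might look like. The function name, the sample sentence, and the three-term vocabulary are all my own inventions for illustration, not anything from Soergel, and a real indexer would obviously use more judgment than a pattern match.

```python
import re
from collections import defaultdict


def extract_index(text, vocabulary):
    """Record the character offset of every occurrence of each vocabulary term.

    Hypothetical helper: matching is case-insensitive and limited to
    whole words or phrases, which is an assumption made for this sketch.
    """
    index = defaultdict(list)
    for term in vocabulary:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            index[term].append(match.start())
    # Terms that never occur in the text are simply absent from the result.
    return dict(index)


# Made-up sample document and controlled vocabulary.
sample = "Head injuries are common in baseball. A baseball can cause a concussion."
vocab = ["baseball", "concussion", "helmet"]
result = extract_index(sample, vocab)
```

Notice that the output is just a list of positions per term. Nothing in it says what the document is about, which is why the result feels more like a concordance than a topical index.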

It should be noted that indexing by extraction may also help build a controlled vocabulary, so the relationship between the two is very “chicken vs. egg.” This type of indexing is most advantageous for searchers who are already familiar with the controlled vocabulary. Often these query makers have developed a type of internal thesaurus to help them navigate the index.

Like the relationship between entity-oriented indexing and controlled vocabulary, a connection exists between indexing by assignment (request-oriented indexing) and natural language. In indexing by assignment, the indexer must anticipate what questions may be asked and how the materials may relate to those questions. It is very important that the indexer be familiar with a variety of perspectives from which an entity may be approached. An article about head injuries among baseball players may be applicable to team doctors or trainers, baseball players, brain surgeons, eye doctors, parents of little leaguers, or psychologists, to name a few. Searchers from each perspective may ask different questions for which the article may provide relevant information. It is then the responsibility of the indexer to anticipate all of these varied query statements. Of course, this seems like a much more appropriate way to index until you consider the immense amount of knowledge the indexer must hold and the limited amount of time available.
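Here is a rough sketch of how I picture the assignment side, using the baseball article as the example. Everything in it is hypothetical: the checklist of anticipated requests, the descriptors, and the rule that a descriptor applies when the indexer judges that the article covers all of its concepts. The set of "judged concepts" stands in for the indexer's own reading of the article, which is the part no code can really do.

```python
# Hypothetical checklist of anticipated requests.
# Each descriptor maps to the concepts an entity must cover to be relevant.
ANTICIPATED_REQUESTS = {
    "sports medicine": {"athlete", "injury"},
    "youth sports safety": {"injury", "children"},
    "neurosurgery": {"brain", "surgery"},
}


def assign_descriptors(judged_concepts, requests=ANTICIPATED_REQUESTS):
    """Return every descriptor whose required concepts the entity covers.

    `judged_concepts` represents the indexer's judgment about the entity;
    the descriptors need not appear anywhere in the entity's own text.
    """
    return sorted(
        descriptor
        for descriptor, needed in requests.items()
        if needed <= judged_concepts  # subset test: all needed concepts present
    )


# The indexer's (made-up) reading of the head-injury article.
article_concepts = {"athlete", "injury", "brain", "children"}
assigned = assign_descriptors(article_concepts)
```

The checklist is where all the hard work hides. Every perspective the indexer fails to anticipate is a descriptor that never gets assigned, no matter how relevant the article would have been.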

There are, not surprisingly, huge debates about which type of indexing is better. Soergel mentions a third alternative that involves both anticipating queries and analyzing entities in advance. It seems that this may represent a happy medium. Perhaps an even happier medium would involve analyzing the body of searchers and determining what would best suit their needs. If one is anticipating searches from individuals within a specific discipline, a controlled vocabulary may be appropriate. If, however, a wider range of perspectives is anticipated, it may be more appropriate to index by request.

In my opinion, a good index should include both indexing by extraction and indexing by assignment. That index…the good one…can be both useful to a broad segment of searchers and instructive.

  Josh says:

    Well, you can do request-oriented indexing by abstraction or by assignment. This orientation is just using what the index will be used for as a guide, not the terms themselves.

