~~ INFORMATION FLOW #3 ~ July 2002 ~~~ Ramana Rao ~~~ @ inxight.com

I recently encountered the following quote on the Web. It was
apparently written by Heinlein as part of an essay that was looking
forward 50 years:

"The greatest crisis facing us is not Russia, not the Atom bomb,
not corruption in government, not encroaching hunger, nor the
morals of the young. It is a crisis in the *organization* and
*accessibility* of human knowledge. We own an enormous
"encyclopedia" -- which isn't even arranged alphabetically. Our
"file cards" are spilled on the floor, nor were they ever in
order. The answers we want may be buried somewhere in the heap,
but it might take a lifetime to locate two already known facts,
place them side by side, and derive a third fact, the one we
urgently need. Call it the crisis of the librarian."

-- Robert A. Heinlein, 1950 or 1952
-- via http://www-ec.njit.edu/~robertso/infosci/heinlein.html

Fifty years later, it's still the crisis of modern reality. The
future, now as always, depends on understanding the natural and human
world, identifying important questions/problems, answering/solving
them, discovering and inventing, creating beautiful and useful things,
and making good decisions. Which of these doesn't depend on access to
knowledge?

~~~ IN THIS ISSUE ~~~ July 2002 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Vocabularies of Cataloging AND/OR Cataloging of Vocabularies
* Under construction (e.g. am I late with this issue?)
* Links

~~~ Vocabularies of Cataloging AND/OR Cataloging of Vocabularies

Librarians have been "cataloging resources" for a long time. The
diversity and richness of resources and thus the intellectual
challenge of the activity have been considerable, even before the day
of Yahoos and Bloggers. So, with words like directories,
vocabularies, ontologies, thesauri, and taxonomies coming at us from
all different directions as the answer to organizing information, I
thought I'd go back and clear my head by looking at the vocabularies
of cataloging.

Two resources were particularly useful in putting this article
together. The first is a "dictionary" in the lay sense, but the
second is not what most would recognize as a thesaurus. Shortly, you
will see that it is a thesaurus in a more technical sense.

Joan Reitz's Online Dictionary of Library and Information Science

ASIS Thesaurus of Information Science, 2nd Edition

Exploring these resources, you would quickly discover that there are
two kinds of "languages" used in classifying and indexing resources:
classification schemes and controlled vocabularies. Classification
schemes order a collection of resources by assigning unique,
sortable, structured codes (e.g. a call number) to each resource.
You have probably encountered the two most common library systems: the
Dewey Decimal Classification and the Library of Congress
Classification systems [see links].

Controlled vocabularies are used to label a resource with one or more
subject headings or descriptors. These labels typically show up in
the card catalog as subject headings or in digital library
collections as index terms or keywords. Examples of controlled
vocabularies include the Library of Congress Subject Headings, the
ASIS Thesaurus, and the Medical Subject Headings [see links].

Though the difference between the two kinds of languages is becoming
increasingly blurred, the essential distinction remains important.
The primary task of a classification scheme is to pick a *class* for
an object, whereas that of a controlled vocabulary is to pick all
*terms* that apply. In the library, you (the librarian) have shelves
within stacks within rooms, and the job is to put a book onto the
right shelf along with others of its class. When you (the patron)
browse a shelf, you get a sense of all that is available on a topic,
and if enough is there, a sense of the topic itself.

In the case of a controlled vocabulary, you (the librarian) have a
book and you must pick sanctioned terms in sanctioned combinations
from, say, a five-volume bible called the "Red Books" to put on the
book's catalog card. When you (the patron) look at the subject
headings for a given book, you get a sense of what the particular book
is about. The following example illustrates both kinds of systems:

   Dewey Classes (hierarchical)
     500  Natural Sciences and Mathematics
       590  Animals
         599  Mammals
           599.8  Primates

   Dewey Call Number
     [599.8072 J192b 2001]
     Beauty and the beasts: women, apes, and evolution
     by Carole Jahme

   Library of Congress Subject Headings
     Women primatologists
     Human-animal relationships
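To make the idea of "sortable, structured codes" concrete, here is a
minimal Python sketch of the Dewey example above. It rests on a
simplifying assumption: that each significant digit of a Dewey number
narrows its class, so trailing zeros in the three-digit main number are
placeholders (500 is "5", 590 is "59"). Real DDC notation has
exceptions, and the helper names (significant_digits, within) are
hypothetical, not part of any library.

```python
# Sketch only: assumes every significant digit of a Dewey-style number
# narrows its class. Real DDC notation has exceptions.

def significant_digits(code: str) -> str:
    """Hierarchy-bearing digits of a Dewey-style number (hypothetical helper).

    Trailing zeros in the three-digit main number are treated as
    placeholders (500 -> "5", 590 -> "59"); digits after the decimal
    point are always kept (599.8 -> "5998").
    """
    main, _, decimal = code.partition(".")
    if decimal:
        return main + decimal
    return main.rstrip("0")


def within(code: str, cls: str) -> bool:
    """True if `code` nests under `cls` by digit-prefix containment."""
    return significant_digits(code).startswith(significant_digits(cls))


# Classes and call number from the Dewey example above.
classes = ["500", "590", "599", "599.8"]
call_number = "599.8072"  # class portion of [599.8072 J192b 2001]

path = [c for c in classes if within(call_number, c)]
print(path)  # ['500', '590', '599', '599.8']

# With a fixed three-digit main number, plain string order matches
# numeric shelf order for these class numbers.
shelf = sorted(["599.8072", "510", "599.8", "599"])
print(shelf)  # ['510', '599', '599.8', '599.8072']
```

The point of the sketch is that one code does double duty: it places
the book at a single spot in the hierarchy and at a single spot on the
shelf.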

Classification schemes and controlled vocabularies encounter common
issues of control, coordination, and structure across elements. A
classification scheme provides a list (enumerative) or tree
(hierarchical) structure of classes, which represent, say, all topics
of interest. In hierarchical structures, higher level classes
represent broad topics, and lower level classes represent successively
finer subtopics. A class may be linked to other classes through "see
also" links. Increasingly, a resource is classified into multiple
classes, especially in electronic collections, which are unconstrained
by physical location.

A controlled vocabulary provides a set of terms (words or phrases)
that may be used to describe an object. A thesaurus, in particular,
will prescribe "preferred terms" for indexing that are "used for"
their synonyms. The terms are organized in a hierarchy of "broader
term" and "narrower term" relationships, and cross-linked to "related
terms". This structure allows indexers to navigate terms to find
permitted and specific terms as well as additional terms that might
apply.
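The thesaurus relationships just described (use/used-for, broader and
narrower terms, related terms) map naturally onto a small data
structure. The Python sketch below is illustrative only: the terms are
made-up entries, not taken from the ASIS Thesaurus or LCSH, and it
assumes each term has at most one broader term, which real thesauri
do not always respect.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Thesaurus:
    """Minimal thesaurus sketch: USE/UF, BT/NT, and RT relationships."""
    use: dict[str, str] = field(default_factory=dict)           # non-preferred -> preferred
    broader: dict[str, str] = field(default_factory=dict)       # term -> its broader term (BT)
    related: dict[str, set[str]] = field(default_factory=dict)  # term -> related terms (RT)

    def preferred(self, term: str) -> str:
        """Map an entry term to the preferred term an indexer must use."""
        return self.use.get(term, term)

    def used_for(self, term: str) -> list[str]:
        """Non-preferred synonyms that point at `term` (the UF list)."""
        return sorted(t for t, p in self.use.items() if p == term)

    def narrower(self, term: str) -> list[str]:
        """NT: terms whose broader term is `term`."""
        return sorted(t for t, b in self.broader.items() if b == term)

    def broader_chain(self, term: str) -> list[str]:
        """Walk BT links upward to the top of the hierarchy (assumes no cycles)."""
        chain = []
        while term in self.broader:
            term = self.broader[term]
            chain.append(term)
        return chain

    def related_terms(self, term: str) -> list[str]:
        """RT links are symmetric, so look in both directions."""
        out = set(self.related.get(term, set()))
        out |= {t for t, rts in self.related.items() if term in rts}
        return sorted(out)


# Hypothetical entries, for illustration only.
t = Thesaurus(
    use={"Primate studies": "Primatology"},
    broader={"Primates": "Mammals", "Apes": "Primates"},
    related={"Primatology": {"Primates"}},
)
print(t.preferred("Primate studies"))  # Primatology
print(t.broader_chain("Apes"))         # ['Primates', 'Mammals']
print(t.narrower("Primates"))          # ['Apes']
print(t.related_terms("Primates"))     # ['Primatology']
```

An indexer who starts from "Primate studies" is redirected to the
preferred term, and from there can move up, down, or sideways to find
the most specific terms that apply.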

Matters quickly get complicated when combinations of terms are needed
to specify a subject. Subject heading systems often prescribe strict
rules on ordering and interpreting combinations, as in the heading
"Primates--Research". This kind of coordination at indexing time is
called "pre-coordination". Alternatively, "post-coordination" systems
assign multiple terms (e.g. "Primates", "Research"), which the user
combines at search time.
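A toy comparison may help show where the coordination happens. In this
Python sketch, the document IDs are hypothetical and the
pre-coordinated lookup is deliberately naive (exact match on the whole
heading, no parsing of subdivisions); real catalogs are more
sophisticated.

```python
# Pre-coordination: the indexer builds one combined heading at
# indexing time; this naive lookup must match the heading as built.
precoordinated = {
    "book-a": {"Primates--Research"},
    "book-b": {"Primates"},
    "book-c": {"Research"},
}

# Post-coordination: the indexer assigns single terms; the user
# combines them at search time, here with a Boolean AND.
postcoordinated = {
    "book-a": {"Primates", "Research"},
    "book-b": {"Primates"},
    "book-c": {"Research"},
}


def search_pre(index: dict[str, set[str]], heading: str) -> list[str]:
    """Documents carrying exactly this pre-coordinated heading."""
    return sorted(doc for doc, headings in index.items() if heading in headings)


def search_post(index: dict[str, set[str]], *terms: str) -> list[str]:
    """Documents indexed with every requested term (AND of the terms)."""
    wanted = set(terms)
    return sorted(doc for doc, assigned in index.items() if wanted <= assigned)


print(search_pre(precoordinated, "Primates--Research"))      # ['book-a']
print(search_pre(precoordinated, "Research"))                # ['book-c']
print(search_post(postcoordinated, "Primates", "Research"))  # ['book-a']
print(search_post(postcoordinated, "Research"))              # ['book-a', 'book-c']
```

The post-coordinated index answers both the single-term and the
combined query, but an AND of terms cannot record how the terms relate
to each other, which is exactly what the ordering and interpretation
rules of a pre-coordinated heading capture.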

This matter of coordination suggests a deeper issue. Resources have
multiple characteristics that are independent of one another. And
thus, users might want to access the resource along any number of
dimensions. The problem can't be solved by putting the resource in
multiple places or picking multiple terms from a single hierarchy.
Hierarchies get quite twisted and elements get baroque when you
attempt to combine the multiple dimensions of an object.

This brings us to "faceted classification" schemes. These move beyond
"hierarchical" or "enumerative" classification schemes by decomposing
objects into their different facets and assigning classes or terms
independently for each of these facets. More to come in the next
issue "near the middle of next month."
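As a preview of the faceted idea, here is a small Python sketch in
which each resource gets one value per facet, assigned independently,
and any combination of facets can be used to select or group
resources. The facet names (subject, form, year) and the two
"hypothetical" titles are assumptions for illustration; only the first
title and its 2001 date come from the example above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Resource:
    title: str
    facets: dict  # facet name -> value, each assigned independently


def select(resources: list[Resource], **constraints) -> list[str]:
    """Titles whose facet values match every given constraint."""
    return [
        r.title
        for r in resources
        if all(r.facets.get(name) == value for name, value in constraints.items())
    ]


def facet_counts(resources: list[Resource], facet: str) -> Counter:
    """How many resources carry each value of one facet (for browsing)."""
    return Counter(r.facets.get(facet) for r in resources)


items = [
    Resource("Beauty and the beasts", {"subject": "Primates", "form": "Book", "year": 2001}),
    Resource("Title B (hypothetical)", {"subject": "Primates", "form": "Video", "year": 1998}),
    Resource("Title C (hypothetical)", {"subject": "Whales", "form": "Book", "year": 2001}),
]

print(select(items, subject="Primates"))
# ['Beauty and the beasts', 'Title B (hypothetical)']
print(select(items, form="Book", year=2001))
# ['Beauty and the beasts', 'Title C (hypothetical)']
print(facet_counts(items, "form"))
```

No single hierarchy has to decide whether "form" nests under "subject"
or the other way around; each dimension stays independent, and the
user picks the combination at access time.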

~~~ Under Construction ~~~~~~~~~~~~~~~~~~~~~~~~

Speaking of "near the middle of the month," this newsletter shipped on
May 15th, June 19th, and now July 22nd. Having been an engineer, I can
argue that the 22nd is within the middle half of the month. But in
any case, the trend doesn't look good... I will buck the trend next
month.

I've been refining my thinking on this "Information Flow" activity,
not just the newsletter, but also other vehicles for communication
including my Web site, print articles, and live presentations (six or
seven this fall). Here's my "hierarchy of topics":

Information Flow -- about intelligent approaches to information
access that make people, organizations, and society more
productive and more creative.

* Information Science and Architecture -- organizing information
for access and use

* Knowledge Work and Workers -- workers, workplaces, and how
knowledge work really happens

* Information Visualization and Design -- using visual techniques
and graphic design to increase the bandwidth of interaction

* Content Access Technologies -- software technologies for
categorizing, organizing, analyzing, and tagging content

* Software Architecture and Design -- how representation,
computation, and infrastructure impact information access

* Power Tools -- tools for individuals that enhance productivity
and make them smarter, more informed, and more creative

* Broader, Wilder -- the broader design, business, and cultural
context and the wilder edge that reveal possible futures

Moving my project and its various vehicles forward requires further
refinement of its directions and, particularly, of what might be of
most interest. I'd love to hear your thoughts.

~~~ Links ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Examples of Classification Schemes
Besides the two most common systems used in libraries, I've included
the ACM Classification System. Take a look at the "How to Classify
..." link on the ACM site for instructions to article authors.
LCC -- http://lcweb.loc.gov/catdir/cpso/lcco/lcco.html
DDC -- http://www.oclc.org/dewey/about/thousands.htm
ACM -- http://www.acm.org/class/1998/overview.html

Examples of Controlled Vocabularies
Look at the picture of the "big red books" at the first link; there
are lots of subject headings in there. Search through the ASIS
Thesaurus for "index language" and "organization of information".
LCSH -- http://lcweb.loc.gov/cds/lcsh.html#lcsh20
ASIS -- http://www.asis.org/Publications/Thesaurus/tnhome.htm

Listing of Classification Schemes and Controlled Vocabularies

Ramana Rao (rao @ inxight.com) is Founder & CTO of Inxight Software, Inc.
Copyright 2002 Ramana Rao. All rights reserved. Reproduction of
material from Information Flow without permission is prohibited.
Forward this issue in its entirety freely.
See: http://www.ramanarao.com
Archive: http://www.ramanarao.com/informationflow/archive/2002-07.html
Unsubscribe: mailto:leave-informationflow@envoy.inxight.com