~~~ Ramana Rao's INFORMATION FLOW ~~~ Issue #4 ~~ August 2002 ~~

"Ranganathan’s objection to the prevailing classification
systems, such as Dewey Decimal Classification and Library of
Congress Classification, was that they tried to enumerate all
possible subjects and provide preconceived pigeonholes to
accommodate all documents. But this enumerative approach made
little allowance for the addition of new topics. Thus, these
systems couldn’t easily accommodate the explosion of
knowledge occurring in the twentieth century ...

As so often happens in scientific discovery, this vague
notion was fully conceptualized only with the help of an
unlikely catalyst. For Isaac Newton, according to legend, the
catalyst was a falling apple. For Friedrich Kekule,
discoverer of the benzene ring, it was a snake with its tail
in its mouth that appeared to him in a dream. For
Ranganathan, it was a toy erector set at Selfridge’s, the
London department store. There he saw a salesperson create
an entirely new toy with each new combination of metal
strips, nuts, and bolts. The experience made Ranganathan
realize that his classification scheme should likewise
consist of elements that could be freely combined to meet the
needs of each specific subject."

-- Eugene Garfield's 1984 remarks on S.R. Ranganathan's
development of the Colon Classification System (1933)

~~~ IN THIS ISSUE ~~~ August 2002

* Facets and Multiple Angles of Access
* Software for Classification and Extraction
* Web Seminar on the Problem of Underutilized Content
* Links and Resources

~~~ Facets and Multiple Angles of Access

Last month, I described two kinds of systems used in classifying
and indexing content: classification schemes and controlled
vocabularies. I promised to continue the discussion this month
on an approach called "faceted classification" that addresses the
fact that information resources are usually complex objects with
many dimensions and characteristics.

Stepping back, it is wise to keep reminding yourself that the
ultimate purpose of both classification systems and controlled
vocabularies is to facilitate access. So ask yourself, over and
over, how a seeker might look for a resource, and you will
naturally calibrate hypothetical issues against real goals.

Faceted systems address the fact that information seekers might
seek a resource from any number of angles corresponding to its
rich structure and multidimensionality. By capturing the
distinct characteristics or dimensions as "facets", a faceted
system provides for greater flexibility at access time than a
system that tries to combine all elements into categories or
subject headings at index time.

As a simple example, consider a collection of recipes. You might
seek a recipe based on its ingredients (beef or eggplant), its
cuisine (French or Indian), how it is prepared (baked or
grilled), occasion (Christmas or Summer), or course (appetizer or
dessert). The Epicurious site enables access to its 14,000
recipes by such facets.
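
A minimal sketch of how such facet-based access might work, in
Python. The recipe records, the facet names, and the helpers
`find` and `facet_counts` are all made up for illustration;
nothing here reflects how Epicurious is actually built.

```python
from collections import Counter

# Hypothetical recipe records: each facet holds one or more values.
RECIPES = [
    {"name": "Beef Bourguignon", "ingredient": {"beef", "wine"},
     "cuisine": {"French"}, "method": {"braised"}, "course": {"main"}},
    {"name": "Baingan Bharta", "ingredient": {"eggplant", "tomato"},
     "cuisine": {"Indian"}, "method": {"grilled"}, "course": {"main"}},
    {"name": "Ratatouille", "ingredient": {"eggplant", "tomato"},
     "cuisine": {"French"}, "method": {"baked"}, "course": {"main"}},
    {"name": "Tarte Tatin", "ingredient": {"apple"},
     "cuisine": {"French"}, "method": {"baked"}, "course": {"dessert"}},
]


def find(recipes, **wanted):
    """Return recipes matching every requested facet value, in any combination."""
    return [r for r in recipes
            if all(value in r[facet] for facet, value in wanted.items())]


def facet_counts(recipes, facet):
    """Count how many recipes carry each value of one facet."""
    return Counter(value for r in recipes for value in r[facet])


# The seeker can start from any angle and narrow by any other.
french_eggplant = find(RECIPES, cuisine="French", ingredient="eggplant")
print([r["name"] for r in french_eggplant])

# Counts per facet value are what a browsing interface would show next.
print(facet_counts(find(RECIPES, cuisine="French"), "method"))
```

Because no facet is privileged, the same records answer "French
dishes with eggplant" and "baked desserts" equally well, without
anyone having decided in advance which question comes first.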

Many other potential applications appear on the Internet,
including wines, cheeses, and the ecommerce catalogs of boutique
or niche stores selling luxury gifts, specialized clothing,
music, videos, even books. At first blush, such examples make
faceted systems seem quite simple. At least for these
collections, we have a relatively well-bounded task to support
(e.g. finding an object to buy) and a reasonably articulated
cultural context for defining the facets themselves (e.g. wines
being known by regions).

The apparent simplicity of these cases belies the actual
challenge of developing a faceted system. Even here there are
hard questions (e.g. what does the user really want to do?), and
the difficulty grows for a less clearly contextualized set of
resources, say, a diverse collection of documents on an
intranet. It is considerably
harder to characterize the more abstract knowledge tasks and to
analyze the distinct facets of not just the document's content,
but also of the document as an information container. Consider a
preliminary outline of possibilities:

Tasks or Goals --

Getting an overview
Orienting toward relevant subsections
Assimilating a subject area
Finding specific facts
Evaluating authority or credibility

Facets of Document Content --

Subjects or Topics
People, Organizations
Locations (or Places)
Time Periods

Facets of Document Form or Provenance --

Publication Date
Intended Audience
Genre or Form
Price, Usage Rights

Against the above decomposition, which is by no means exhaustive,
you can test key points:

* a seeker might seek, select, or absorb an information resource
based on a variety of characteristics

* a single hierarchical structure that somehow combined all
characteristics would be hard to develop and would likely bias
access in ways that would defeat certain uses

* defining a collection of facets that are usable and useful is
also a challenging endeavor

Marcia Bates recently used the analogy: "faceted classification
is to hierarchical classification as relational databases are to
hierarchical databases." Ironically, this analogy resonates
with a computer scientist while falling flat for everybody else.
What I hear most simply is: storage::indexing should not limit
the way that records::resources can be accessed.
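
To make the analogy a little more concrete, here is a small
sketch in Python. The documents and facet names are invented,
and this is only one way to illustrate the point: a single
hierarchy fixes the order of access at filing time, while
per-facet indexes let a seeker enter from any angle.

```python
from collections import defaultdict

# Hypothetical intranet documents described by three facets.
DOCS = {
    "memo-17": {"topic": "pricing", "audience": "sales", "year": "2002"},
    "report-4": {"topic": "pricing", "audience": "executives", "year": "2001"},
    "guide-9": {"topic": "security", "audience": "sales", "year": "2002"},
}

# Hierarchical filing: one fixed path, topic first, then audience.
hierarchy = defaultdict(lambda: defaultdict(list))
for doc_id, facets in DOCS.items():
    hierarchy[facets["topic"]][facets["audience"]].append(doc_id)

# Faceted indexing: one inverted index per facet, none privileged.
index = defaultdict(lambda: defaultdict(set))
for doc_id, facets in DOCS.items():
    for facet, value in facets.items():
        index[facet][value].add(doc_id)


def lookup(**wanted):
    """Intersect the per-facet indexes for whichever facets were given."""
    hits = set(DOCS)
    for facet, value in wanted.items():
        hits &= index[facet][value]
    return sorted(hits)


# In the hierarchy, "everything for sales" means walking every topic branch.
for_sales_via_tree = sorted(
    doc
    for by_audience in hierarchy.values()
    for doc in by_audience.get("sales", [])
)
print(for_sales_via_tree)

# With facets, the same question is a direct lookup, as is any combination.
print(lookup(audience="sales"))
print(lookup(year="2002", topic="pricing"))
```

The tree can still answer the audience question, but only by
visiting every branch; a question about year cannot be answered
from the tree at all, because year was never part of the path.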

I hope this article is a useful overview for you. Feel free to
tell me what you think: good, bad, or ugly. If I've just whetted
your appetite, you will find many relevant Web resources below.

~~~ Software for Classification and Extraction

In IF#1, I discussed the category of software called Enterprise
Categorization (or Taxonomy), which supports the cataloging of
enterprise content. IF#3 and the above article on library
cataloging provide a good platform for thinking about challenges
for building software that will help not only information
publishers and catalogers but also information consumers.

I will expand on this topic next month, but I'd like to leave you
with a taste of what it might mean for software to support
multiple angles of access:

* Controlled vocabularies are a fundamentally different kind of
language from classification schemes. Furthermore, supporting
facets increases the challenges. The forces that created these
varied representational devices should be heeded in the design
of indexing and retrieval software.

* Beyond filing documents into monolithic hierarchical
classification systems focused on major topic, software can
extract a variety of properties related to both the content and
context of the document.
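
As a deliberately naive sketch of that second point, the Python
below pulls a few facet values out of raw text with pattern
matching. The lookup lists, facet names, and the function
`extract_facets` are assumptions made for illustration; real
extraction software relies on much richer linguistic analysis.

```python
import re

# Hypothetical lookup lists; a real system would use far larger resources.
KNOWN_PLACES = {"London", "Madras", "Philadelphia"}
GENRE_CUES = {"memo": "Memo", "agenda": "Meeting notes", "abstract": "Research paper"}


def extract_facets(text):
    """Pull a few facet values out of raw text with simple pattern matching."""
    words = set(re.findall(r"[A-Za-z]+", text))
    lowered = {w.lower() for w in words}
    return {
        # Content facet: four-digit years mentioned in the text.
        "time_periods": sorted(set(re.findall(r"\b(?:19|20)\d{2}\b", text))),
        # Content facet: place names found in the lookup list.
        "locations": sorted(words & KNOWN_PLACES),
        # Form facet: genre guessed from the first cue word present, if any.
        "genre": next((g for cue, g in GENRE_CUES.items() if cue in lowered), None),
    }


sample = "Memo: the London office reviewed the 2001 and 2002 pricing reports."
print(extract_facets(sample))
```

Each extracted property becomes one more angle of access, without
forcing the document into a single slot in a topic hierarchy.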

~~~ Web Seminar on the Problem of Underutilized Content

Overload or Under Use? Industry expert Mike Maziarka of CAP
Ventures, a leading analyst firm, and I will be giving a Web
seminar about the challenges enterprises face in leveraging their
information assets. Sponsored by Inxight, this Webinar focuses
on broad business and technology perspectives on the problem of
underutilized content. Sign up at:

~> http://www.inxight.com/news/seminar_register.php

~~~ Links and Resources

After the Dot-Bomb: Getting Web Information Retrieval Right

Marcia Bates, a respected professor of information science,
makes seven recommendations for more effective retrieval
systems, the first related to faceted classification. Lou
Rosenfeld adds his remarks on his blog.

~> http://firstmonday.org/issues/issue7_7/bates/
~> http://louisrosenfeld.com/home/bloug_archive/000100.html

Examples of navigation systems based on facets

The Epicurious recipe database and Tower Music's catalogs are
good simple examples. Tower's implementation is based on a
product from Endeca. Another company, bpallen technologies,
showcases their product on other kinds of collections.

~> http://eat.epicurious.com/recipes/browse_home/index.ssf
~> http://towermusic.endeca.com/towermusic?n=0
~> http://www.bpallen.com/product.html

Intricacies of Faceted Systems

Amanda Maple, a music librarian, surveyed the literature on
faceted systems for the Music Library Association, to support
the faceting in the Music Thesaurus project. Music, along with
architecture and art, is among the more traditional areas of
librarianship that call out for faceted systems. The Getty's
Art & Architecture Thesaurus includes some facets that are
concrete and straightforward, while others are abstract and
seem potentially elusive.

~> http://theme.music.indiana.edu/tech_s/mla/facacc.rev
~> http://www.musiclibraryassoc.org/BCC/BCC-Historical/BCC94/94WGFAM1.html
~> http://www.getty.edu/research/tools/vocabulary/aat/about.html

Ranganathan and the Colon Classification System

The requirement of supporting multiple angles of access is so
important that all classification systems must eventually
grapple with facets. The Colon Classification System, first
published in 1933 by the great Indian library scientist
S.R. Ranganathan, is designed from the ground up around
facets. These articles by the famous Eugene Garfield focus on
this topic, but also provide a much broader glimpse into the
history of library science.

~> http://www.garfield.library.upenn.edu/essays/v7p037y1984.pdf
~> http://www.garfield.library.upenn.edu/essays/v7p045y1984.pdf

Information Architects Unite on Faceted Systems

Faceted Classification has certainly been top of mind for
several thought leaders in the Information Architecture
community. A number have written easily digested short pieces
on faceted systems.

~> http://www.adaptivepath.com/publications/essays/
~> http://www.eleganthack.com/archives/002780.html
~> http://www.peterme.com/archives/00000063.html

