FACETED ACCESS: A REVIEW OF THE LITERATURE
by Amanda Maple
[This paper was presented by Amanda Maple during the Music Library Association Annual Meeting, 10 February 1995, at the open meeting sponsored by the Working Group on Faceted Access to Music. A handout accompanying this paper was distributed at that meeting; that handout has been incorporated into this version of the paper.]
The purpose of this paper is to define what is meant by facet analysis, and to review briefly the history of facet analysis within the context of other types of subject analysis in libraries and within the context of information retrieval research.
Subject access to information has traditionally been provided in one of two ways: through classification, and through assigning terms or phrases from a standardized vocabulary such as a list of subject headings or a thesaurus. With the advent of online catalogs and databases, subject access is now also provided through keyword indexing and searching. Classification, the first device mentioned, involves the development and use of a scheme for the systematic organization of knowledge. (Taylor p 576) Arlene Taylor identified three approaches to classification: enumerative, hierarchical, and analytico-synthetic. Enumerative classification attempts to assign headings for every subject and alphabetically enumerates them. Hierarchical classification uses a more philosophical approach based on the inherent organization of the subject being classified, and establishes logical rules for dividing topics into classes, divisions, and subdivisions. Analytico-synthetic classification assigns terms to individual concepts and provides rules for the local cataloger to use in constructing headings for composite subjects. Traditional classification systems in this country are basically enumerative, though many contain some elements of hierarchy and faceting. (Taylor pp 319-321)
Ranganathan was the first to introduce the word "facet" into library and information science, and the first to consistently develop the theory of facet analysis. A facet is, simply put, a category. Taylor defines facets as "clearly defined, mutually exclusive, and collectively exhaustive aspects, properties, or characteristics of a class or specific subject." Ranganathan demonstrated that analysis, which is the process of breaking down subjects into their elemental concepts, and synthesis, the process of recombining those concepts into subject strings, could be applied to all subjects, and demonstrated that this process could be systematized. (Taylor pp 320-21; Foskett p 390) The phrase "analytico-synthetic classification" derives from these two processes: analysis and synthesis.
The basic features of facet analysis, or analytico-synthetic classification, are: the analysis of compound subjects into terms, the organization of those terms into facets, the display of relationships between the terms, and the synthesis of terms into compound subject headings. Each facet has a distinctive notation and a facet indicator to show the sequence of facets. The facet indicator provides context for the term, that is, it shows which facet the term comes from. The notation, or set of symbols added to the classification system, represents concepts and gives each concept a filing value. Another function of the notation is to express the hierarchy of the classes. A notation that shows the hierarchy is called an expressive notation. Since the 1960s, all major classification schemes except the Library of Congress Classification have been partially or entirely restructured on a faceted basis. In fact, many of the subdivisions in Library of Congress subject headings show evidence of a faceted approach. (Taylor p 322; Aitchison and Gilchrist p 57)
The use of facets in information retrieval did not originate with Ranganathan. In the 18th century, a Frenchman named Condorcet devised what we would now call a faceted classification scheme for organizing information about objects or facts. (Whitrow) The Dewey Decimal Classification, first published in 1876, contained elements of facet analysis. Dewey recognized four facets common to all basic classes: bibliographic form, time, place, and general subjects (such as statistics or research) that at times are related to other subjects. (Foskett pp 176-7) Dewey provided for "number building" to combine two or more facets to express a complex subject. (Taylor p 320) The Universal Decimal Classification, based on the Dewey Decimal Classification and first published in 1905, was intended to be an international classification scheme. It also had elements of a faceted structure, and partly influenced Ranganathan's thinking. (Foskett p 349; Vickery pp 12-14)
The first edition of Ranganathan's Colon Classification was published in 1933. Ranganathan postulated five fundamental categories of facets: Personality (for things and kinds of things); Matter (physical materials and abstract properties; for example, the wood of a table, the table's shape, the table's color); Energy (for example, actions); Space; and Time. (Foskett p 391) Ranganathan devised a system of notation assigned to each facet and a system of symbols, including the colon, to serve as facet indicators within the synthesized classification mark. The classification mark represented the subject heading string that was synthesized from individual facets. Ranganathan also devised a citation order for determining the sequence of the facets when synthesizing subject headings. (Langridge pp 41-42)
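As a rough illustration of synthesis under a citation order, the following Python sketch builds a classification mark from whichever facets a subject happens to have. It is not the Colon Classification itself: the base class "X", the notations "21", "4", and "N5", and the assignment of indicator symbols to facets are all assumptions made for the example and have not been checked against Ranganathan's schedules.

```python
# Citation order of Ranganathan's five fundamental categories.
CITATION_ORDER = ["personality", "matter", "energy", "space", "time"]

# Hypothetical facet indicators: the symbol-to-facet assignment is
# illustrative only, not copied from the Colon Classification schedules.
INDICATORS = {
    "personality": ",",
    "matter": ";",
    "energy": ":",
    "space": ".",
    "time": "'",
}


def synthesize(base_class, facets):
    """Build a class mark by appending facets in citation order.

    `facets` maps a facet name to a (hypothetical) notation string.
    Facets that are absent from the subject are simply skipped.
    """
    mark = base_class
    for name in CITATION_ORDER:
        if name in facets:
            mark += INDICATORS[name] + facets[name]
    return mark


# Facets may be supplied in any order; the citation order fixes the result.
example = synthesize("X", {"time": "N5", "energy": "4", "personality": "21"})
print(example)  # X,21:4'N5
```

The point of the sketch is only that the indicator tells a reader which facet each piece of notation came from, and that the citation order, not the cataloger's whim, decides the sequence.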
Ranganathan's theories influenced several classification schemes in the 20th century. The British Catalogue of Music Classification, developed by Eric Coates for the British National Bibliography, was a fully faceted scheme influenced by the Colon Classification. It was not designed as a library classification scheme, but rather for use in the national bibliography. It was used to classify the British Catalogue of Music from 1957 to 1982, and since then has influenced the revisions of both the Dewey Decimal Classification Class 780 and the Bliss Classification. (Elliker pp 1279, 1287, 1315-16) The recently revised Class 780 for music of the 20th edition of the Dewey Decimal Classification is a fully faceted scheme. (Redfern 1991) The first edition of the Bliss Classification was published in 1953 by Henry Bliss, librarian at the College of the City of New York. The second edition, under the editorship of Jack Mills, has been in progress since 1977 and is also a fully faceted scheme. (Thomas pp 4, 8; Langridge p 70)
We have quickly reviewed facet analysis in the context of library classification systems. Information needs experienced during World War II led to research in information retrieval outside the world of libraries, especially in the sciences and applied sciences. Some of this research was conducted by members of a British group called the Classification Research Group, which was formed in 1952 "to discuss the principles and practice of classification, unhampered by allegiance to any particular published scheme." (Vickery pp 10-11) During the 1950s, the group concentrated on the construction and use of classification schemes devoted to specialized subject areas.
One of the early and important developments in information science research was the coordinate index, which is a list of terms that can be coordinated, or combined, during indexing or during searching. These coordinate indexes developed into subject-based terminology lists showing semantic relationships between terms, and provided the basis for standards for the construction and use of thesauri. In precoordinate indexing, terms from the coordinate index are combined at the time of indexing into subject strings. Postcoordinate indexing involves the coordination of terms by the searcher at the time of searching. Keyword indexes in online catalogs and databases also act as de facto coordinate indexes, and in most online catalogs today, the process of postcoordination is done through the use of Boolean operators. (Taylor p 453; Foskett p 502)
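Postcoordination with a Boolean AND can be sketched as the intersection of posting sets in a small inverted index. The Python below is a minimal illustration, not a description of any particular catalog; the documents and their indexing terms are invented.

```python
from collections import defaultdict


def build_index(documents):
    """Map each indexing term to the set of document ids that carry it."""
    index = defaultdict(set)
    for doc_id, terms in documents.items():
        for term in terms:
            index[term.lower()].add(doc_id)
    return index


def search_and(index, *terms):
    """Postcoordinate search: intersect the posting sets (Boolean AND)."""
    postings = [index.get(term.lower(), set()) for term in terms]
    if not postings:
        return set()
    return set.intersection(*postings)


# Hypothetical documents, each indexed with single-concept terms.
documents = {
    1: ["photographs", "albums"],
    2: ["photographs", "conservation"],
    3: ["albums", "binding"],
}
index = build_index(documents)
print(sorted(search_and(index, "photographs", "albums")))  # [1]
```

The terms are stored separately at indexing time and are combined only when the searcher asks for them together, which is what distinguishes this from a precoordinated subject string.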
Much of the research in information science after World War II centered around defining and displaying the relationships between terms. There are two kinds of relationships between terms: semantic and syntactic. Semantic relationships hold between concepts, such as water, sea, and river, whose connection is permanent by definition; they arise from the definition of the subjects involved, and are not dependent on any particular document content. Syntactic relationships, on the other hand, hold between otherwise unrelated concepts that are brought together as composite subjects in the documents being indexed. These relationships are not permanent, but rather ad hoc. The following summary gives examples of the types of relationships between terms.
RELATIONSHIPS BETWEEN TERMS
(from Foskett, chapters 5-6)
Foskett described three groups of semantic relationships: equivalence, hierarchical, and affinitive/associative. In equivalence relationships, more than one term denotes the same concept. These relationships are shown through cross-references in an alphabetical tool, and through juxtaposition in a classified tool. Hierarchical relationships are of two kinds: genus/species and whole/part. These relationships are shown through hierarchies in classified tools and with Broader and Narrower Term codes in alphabetical tools. Foskett described several kinds of affinitive/associative relationships; these relationships are denoted by Related Term codes. (Foskett pp 72-78)
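The three groups of semantic relationships map naturally onto simple lookup tables. The Python sketch below is one possible arrangement, with invented music terms; it assumes each term has at most one Broader Term, which real thesauri do not always guarantee.

```python
# All terms and relationships below are invented for illustration.

# Equivalence: non-preferred term -> preferred term (a "USE" reference).
USE = {"fiddles": "violins"}

# Hierarchical: term -> its single Broader Term (an assumed simplification).
BROADER = {
    "violins": "string instruments",
    "violas": "string instruments",
    "string instruments": "musical instruments",
}

# Affinitive/associative: term -> Related Terms.
RELATED = {"violins": ["bows"]}


def preferred(term):
    """Resolve an equivalence relationship to the preferred term."""
    return USE.get(term, term)


def broader_chain(term):
    """Follow Broader Term links upward to the top of the hierarchy."""
    chain = []
    current = preferred(term)
    while current in BROADER:
        current = BROADER[current]
        chain.append(current)
    return chain


def narrower_terms(term):
    """Derive Narrower Terms by inverting the Broader Term table."""
    target = preferred(term)
    return sorted(t for t, b in BROADER.items() if b == target)


print(broader_chain("fiddles"))
# ['string instruments', 'musical instruments']
print(narrower_terms("string instruments"))
# ['violas', 'violins']
```

Storing only the Broader Term links and deriving the Narrower Terms from them is a design choice made here to keep the two directions from drifting out of step.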
Syntactic relationships are displayed according to the syntax of a normal sentence, either through the syntax of the subject string (in precoordinate indexing), or through devices such as facet indicators (in postcoordinate indexing). When postcoordinate systems do not provide for the display of syntactic relationships, users cannot distinguish between different contexts for the same term. For example, a keyword search for "photographs" and "albums" should allow users to specify whether they want photographs of albums or albums of photographs. Another problem with postcoordinate systems occurs when two different composite subjects are represented in one document. When each composite subject is analyzed into terms, false drops can occur unless there is a mechanism for linking the terms to their respective composite subject. (Foskett pp 442-4) In their manual for thesaurus construction, Aitchison and Gilchrist described devices for creating links to avoid false drops and for indicating roles to avoid retrieving terms out of context, but they cautioned that these devices can cause unforeseen and undesired reduction in recall and are time-consuming for indexers. (Aitchison and Gilchrist pp 65-67)
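The false-drop problem, and the effect of a linking device, can be shown with a small Python sketch. It assumes a very simple form of link, a number shared by all terms that belong to the same composite subject; the documents and terms are hypothetical, and the devices described by Aitchison and Gilchrist are not reproduced in detail.

```python
# Each document lists (term, link) pairs; terms that share a link number
# belong to the same composite subject. All data here is hypothetical.
DOCUMENTS = {
    "doc-A": [("photographs", 1), ("conservation", 1),
              ("albums", 2), ("binding", 2)],
    "doc-B": [("photographs", 1), ("binding", 1)],
}


def search_unlinked(query):
    """Plain Boolean AND: ignores which composite subject a term came from."""
    wanted = set(query)
    return sorted(
        doc_id for doc_id, pairs in DOCUMENTS.items()
        if wanted <= {term for term, _ in pairs}
    )


def search_linked(query):
    """Require all query terms to occur under a single link number."""
    wanted = set(query)
    hits = []
    for doc_id, pairs in DOCUMENTS.items():
        by_link = {}
        for term, link in pairs:
            by_link.setdefault(link, set()).add(term)
        if any(wanted <= terms for terms in by_link.values()):
            hits.append(doc_id)
    return sorted(hits)


query = ["photographs", "binding"]
print(search_unlinked(query))  # ['doc-A', 'doc-B']  (doc-A is a false drop)
print(search_linked(query))    # ['doc-B']
```

Here doc-A is about the conservation of photographs and the binding of albums, so retrieving it for "photographs AND binding" is a false drop; the linked search rejects it because the two terms never share a link.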
In addition to procedures for displaying relationships between terms, information science research also developed rules for establishing terms to denote concepts. The various rules for establishing terms and for displaying relationships between terms, and rules for devising and organizing facets, resulted in international standards for the construction and use of thesauri.
To review what Taylor and others have written about the differences between thesauri and subject heading lists: thesauri are composed of terms that represent single concepts, whereas many subject headings represent compound subjects containing more than one concept. The relationships between terms in thesauri are defined and displayed according to rules, whereas the relationships between subject headings are at best shown inconsistently and, according to Mary Dykstra, are impossible by definition to establish (because headings are not terms).
Thesauri are usually limited to coverage of a particular discipline, whereas subject headings attempt to cover the entire realm of recorded knowledge. And though there are international standard guidelines for the creation of thesauri, there are none for subject heading lists. (Taylor pp 454-5; Dykstra "LC Subject Headings" p 44)
An example of an information retrieval system that preserves both semantic and syntactic relationships is PRECIS, an acronym for Preserved-Context Indexing System. PRECIS was developed under the direction of Derek Austin during the late 1960s before the advent of online catalogs. Its design was influenced by the work of the Classification Research Group, and its purpose was to create printed lists of precoordinated subject headings in which the semantic context of each term, and the syntax of each string, were defined and preserved. PRECIS consists of rules for the analysis of a subject into individual terms, and rules for the synthesis of these terms into a "mini-abstract" or "precis" of the document content. Each term in the string was coded to show its syntactic relationship to the other terms in the string. The computer was programmed to use these syntax codes (called role operators) to print out as many multiple entry strings as were needed so that every important term in the string was placed in the lead position in the printed index, while preserving the syntax of each subject string. Example 1 shows the use of PRECIS to express the subject, "The renovation of houses by developers in Darmstadt."
***************************
EXAMPLE 1
Use of PRECIS to express the subject:
The renovation of houses by developers in Darmstadt

                     Concepts      Roles
                     _____________________________________
PRECIS Analysis:     Renovation    Action
                     Houses        Object of the action
                     Developers    Agent of the action
                     Darmstadt     Location

Resulting coding:    (0) Darmstadt
                     (1) houses
                     (2) renovation $v by $w of
                     (3) developers

Index entries generated by computer:

    Darmstadt
        Houses. Renovation by developers
    Houses. Darmstadt
        Renovation by developers
    Renovation. Houses. Darmstadt
        By developers
    Developers. Darmstadt
        Renovation of houses

(example from Dykstra, "Handling the Stuff Itself")
***************************
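The idea of placing every term in the lead position while keeping its context can be approximated with a simple rotation, sketched below in Python. This is a deliberately reduced illustration and not the PRECIS algorithm: it ignores the role operators, omits the $v and $w connectives, and does not reproduce the special form that the Developers entry takes in Example 1. The function name and the dictionary keys are assumptions made for the sketch.

```python
def rotate_entries(terms):
    """Generate one index entry per term in a context-ordered string.

    `terms` runs from widest context to narrowest. For each lead term, the
    wider-context terms become the qualifier (nearest context first) and the
    narrower-context terms become the display line.
    """
    entries = []
    for i, lead in enumerate(terms):
        qualifier = [t.capitalize() for t in reversed(terms[:i])]
        display = [t.capitalize() for t in terms[i + 1:]]
        entries.append({
            "lead": lead.capitalize(),
            "qualifier": ". ".join(qualifier),
            "display": ". ".join(display),
        })
    return entries


# The coded string from Example 1, without connectives.
for entry in rotate_entries(["Darmstadt", "houses", "renovation", "developers"]):
    heading = entry["lead"]
    if entry["qualifier"]:
        heading += ". " + entry["qualifier"]
    print(heading)
    if entry["display"]:
        print("    " + entry["display"])
```

For the Houses and Renovation entries the sketch yields the same lead and qualifier as Example 1 ("Houses. Darmstadt" and "Renovation. Houses. Darmstadt"); the display lines differ because the connectives that produce "Renovation by developers" are left out.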
PRECIS also involved the development and maintenance of a thesaurus to record semantic term relationships. These relationships were also coded and used by the computer to generate cross-references. (Taylor pp 459-461; Foskett pp 254-272; Dykstra "Handling the Stuff" pp 169-171) A thorough description of how PRECIS works can be found in Austin's PRECIS manual.
Current research in information retrieval suggests that constructing a thesaurus that shows both semantic and syntactic relationships will have potentially great benefits to our users in the future. The syntactic coding and programming in PRECIS, for example, could be adapted for use in postcoordinate systems. In Example 2, Dykstra suggested that in a postcoordinate system coded for syntactic relationships, the results of Boolean searches could be displayed in syntactically organized word strings that allow the user to choose from among the items retrieved only those that really match the query. This display is quite an improvement over the hodgepodge of hits that would result from a search for the terms TEACHER, STUDENT, and ASSESSMENT in most online catalogs today.
***************************
EXAMPLE 2
An example of how a postcoordinate system coded for syntactic relationships could display the results of a Boolean search for:
TEACHER AND STUDENT AND ASSESSMENT

Results displayed:

    Student teachers. Assessment.                    13 items
    Students. Assessment by teachers.                 3 items
    Teachers. Assessment by students.                21 items
    United States. Students. Interpersonal
        relationships with teachers. Assessment.      7 items
    Universities, Students and teachers.
        Assessment by administrators.                14 items

(example from Dykstra, "PRECIS in the Online Catalog")
***************************
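A display of this kind amounts to grouping the hits of a Boolean search by their subject strings and counting each group. The Python sketch below shows that step only. The records are invented, and prefix matching is used as a crude stand-in for whatever term matching a real system would apply.

```python
from collections import Counter

# Hypothetical records: (record id, syntactically ordered subject string).
RECORDS = [
    (1, "Students. Assessment by teachers"),
    (2, "Teachers. Assessment by students"),
    (3, "Teachers. Assessment by students"),
    (4, "Student teachers. Assessment"),
    (5, "Teachers. Training"),
]


def matches(subject, query_terms):
    """True if every query term is a prefix of some word in the string."""
    words = subject.lower().replace(".", " ").split()
    return all(
        any(word.startswith(term.lower()) for word in words)
        for term in query_terms
    )


def grouped_results(query_terms):
    """Boolean AND search, with hits grouped and counted by subject string."""
    counts = Counter(s for _, s in RECORDS if matches(s, query_terms))
    return sorted(counts.items())


for subject, count in grouped_results(["TEACHER", "STUDENT", "ASSESSMENT"]):
    print(f"{subject}.  {count} items")
```

Because the strings keep their syntax, the searcher can tell assessment of students by teachers from assessment of teachers by students before opening a single record.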
Other recent research in information retrieval also supports the use of syntactic as well as semantic relationships. PLEXUS is an example of how a thesaurus that contains both kinds of relationships can be used to develop an expert system for referral. PLEXUS is an experimental database on the subject of gardening developed by the Central Information Service of the University of London. Every term in the thesaurus is assigned to a role facet, and a structure of relationships between terms is mapped. Example 3 shows the "role" facets in PLEXUS, and an example of the logical process programmed into the system to interact with the user. (Vickery, Brooks, and Vickery pp 161-2)
***************************
EXAMPLE 3
The "role" facets in PLEXUS are:
    Object           (rose, aphids)
    Part of object   (seeds, cuttings)
    Operations       (pruning, digging)
    Process          (die, flowering)
    Interaction      (X infesting Y)
    Instrument       (spade, weedkiller)
    Attribute        (herbaceous, silver)
    Environment      (garden, greenhouse)
    Use              (agriculture, domestic)
    Time             (winter, May)
    Location         (London, tropics)
The term "blossom" is in both the Process facet and the Part facet. If a searcher enters the term "blossom," PLEXUS is programmed to follow these steps:
    IF there is another process concept present
        THEN assume blossom is a part
    ELSE if there is an operations concept present
        THEN assume blossom is a part
    ELSE if there is an object7 or object1 present (i.e., general term like plant or tree or specific plant name)
        THEN assume blossom is a process
    ELSE assume blossom is a part
(example from Vickery, Brooks, and Vickery)
***************************
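The rule in Example 3 translates almost line for line into code. The Python sketch below is only a restatement of those steps, not the PLEXUS implementation: the function name is invented, and the two object categories named in the example (object7 and object1) are collapsed into a single "object" role as a simplifying assumption.

```python
def blossom_role(other_roles):
    """Pick the facet for "blossom" from the facets of the other query terms.

    `other_roles` is a collection of role-facet names already assigned to
    the remaining terms of the query.
    """
    roles = {role.lower() for role in other_roles}
    if "process" in roles:
        return "part"
    if "operations" in roles:
        return "part"
    if "object" in roles:  # stands in for both object categories
        return "process"
    return "part"


# "rose blossom": only an object is present, so blossom is read as a process.
print(blossom_role(["object"]))                # process
# "pruning rose blossom": an operation is present, so blossom is a part.
print(blossom_role(["object", "operations"]))  # part
```

The order of the tests matters: an operations or process concept overrides the presence of an object, exactly as the IF/ELSE sequence in the example specifies.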
Computer programs already exist to search up and down hierarchies. Four of the works cited in the bibliography, those by Godert, Ingwersen and Wormell, Paice, and Rada and Barlow, all report recent research into the use of thesauri enriched with syntactic as well as semantic relationships to improve information retrieval. One conclusion we can draw from this direction in research is that doing the intellectual work necessary to establish syntactic and semantic relationships in a thesaurus would open up many possibilities for revolutionary improvements in information retrieval.
The Art & Architecture Thesaurus (AAT) was recently published in its 2nd edition, accompanied by a Guide to Indexing and Cataloging with the Art and Architecture Thesaurus. The AAT is a controlled vocabulary for the description of fine art and architecture. The terminology in the AAT is arranged in seven facets, which are further subdivided into thirty-three subfacets or hierarchies. The AAT is available in print or in electronic format. The print edition is arranged in two displays: volumes 1-2 show the thirty-three hierarchies, and volumes 3-5 are the alphabetical displays that contain information about each term and serve as an index to the hierarchies. The Guide provides thorough instructions, with examples, for indexing and searching with the AAT, including over 150 pages of actual cataloging records, mostly MARC records, showing the use of AAT terms for cataloging bibliographic materials, electronic materials, film, maps, paintings, personal papers, and many other items. The AAT has served as a model for the conceptual development of the Music Thesaurus.
In his recent article in Notes, Calvin Elliker summarized the seven facets for music put forth by the International Association of Music Libraries Classification Subcommission in 1974: performing medium; forms and genres; time; space; purpose; occasion; and contents. He also summarized the seven concepts that Redfern proposed for organizing scores in Organising Music in Libraries, which are composers; instruments; size of ensemble; forms; musical character; space; and time. For the purposes of his study, Elliker added to these the concept of "format." (Elliker p 1270) In Chapter 5 of his Music Cataloging, Richard Smiraglia reviewed several writers' suggestions for organizing music, including Redfern's, and conflated them into the following elements: intellectual form, containing facets for medium of performance and for form of composition; topicality, containing facets for topic or character, for historical style period, and for cultural influences; intended audience; and physical form, containing facets for printed music and for recorded music. As Smiraglia wrote, "promises of truly faceted systems, those that allow the regular use of all relevant elements, are the hope of the future for subject retrieval of musical works." (Smiraglia pp 63-66, 72)
SELECTED BIBLIOGRAPHY ON FACET ANALYSIS
Aitchison, Jean and Alan Gilchrist. Thesaurus Construction: A
Practical Manual. 2nd ed. London: Aslib, 1987.
Art & Architecture Thesaurus. 2nd ed. 5 vols. New York: Oxford
University Press, 1994.
Austin, Derek. PRECIS: A Manual of Concept Analysis and Subject
Indexing. London: The British Library Bibliographic Services
Division, 1984.
Dykstra, Mary. "'Handling the Stuff Itself': Toward Automated
Textual Analysis." In Tools for Knowledge Organization and
the Human Interface. Vol. 2. Proceedings, 1st International
ISKO Conference, Darmstadt, 14-17 August 1990. Ed. Robert
Fugmann. Frankfurt/Main: Indeks Verlag, 1991, pp. 168-174.
_____. "LC Subject Headings Disguised as a Thesaurus." Library
Journal 113 (March 1, 1988): 42-46.
_____. "PRECIS in the Online Catalog." Cataloging &
Classification Quarterly 10, no. 1/2 (1989): 81-94.
Elliker, Calvin. "Classification Schemes for Scores: Analysis of
Structural Levels." Notes 50 (June 1994): 1269-1320.
Foskett, A.C. The Subject Approach to Information. 4th ed.
Hamden, Connecticut: Linnet Books, 1982.
Godert, Winfried. "Facet Classification in Online Retrieval."
International Classification 18, no. 2 (1991): 98-109.
Guide to Indexing and Cataloging with the Art and Architecture
Thesaurus. Eds. Toni Petersen and Patricia J. Barnett.
New York: Oxford University Press, 1994.
Ingwersen, Peter and Irene Wormell. "Ranganathan in the
Perspective of Advanced Information Retrieval." Libri 42
(July-September 1992): 184-201.
Langridge, D.W. Subject Analysis: Principles and Procedures.
London: Bowker-Saur, 1989.
Paice, Chris D. "A Thesaural Model of Information Retrieval."
Information Processing & Management 27 (1991): 433-447.
Rada, Roy and Judith Barlow. "Document Ranking Using an Enriched
Thesaurus." Journal of Documentation 47 (September 1991):
240-253.
Ranganathan, S.R. Colon Classification, Basic Classification.
6th ed. New York: Asia Publishing House, 1963.
_____. Prolegomena to Library Classification. 3rd ed. New York:
Asia Publishing House, 1967.
Redfern, Brian. "On First Looking Into Dewey Decimal
Classification 20, Class 780: A Review Article." Brio 28
(Spring/Summer 1991): 19-28.
_____. Organising Music in Libraries. Rev. ed. 2 vols.
Hamden, Connecticut: Linnet Books, 1978.
Smiraglia, Richard P. Music Cataloging: The Bibliographic Control
of Printed and Recorded Music in Libraries. Englewood,
Colorado: Libraries Unlimited, 1989.
Taylor, Arlene G. Introduction to Cataloging and Classification.
8th ed. Englewood, Colorado: Libraries Unlimited, 1992.
Thomas, Alan R. "Bliss Bibliographic Classification 2nd Edition:
Principal Features and Applications." Cataloging &
Classification Quarterly 15, no. 4 (1992): 3-17.
Vickery, B.C. Faceted Classification Schemes. Rutgers Series on
Systems for the Intellectual Organization of Information.
Vol. 5. Ed. Susan Artandi. New Brunswick, New Jersey:
Graduate School of Library Service, Rutgers, the State
University, 1966.
Vickery, A., H.M. Brooks, and B.C. Vickery. "An Expert System for
Referral: the PLEXUS Project." In Intelligent Information
Systems: Progress and Prospects. Ed. Roy Davies. Chichester:
Ellis Horwood, 1986, pp. 154-183.
Whitrow, Magda. "Historical Studies in Documentation: an
Eighteenth-Century Faceted Classification System." Journal
of Documentation 39 (June 1983): 88-94.
STANDARDS FOR THESAURUS CONSTRUCTION
British Standards Institution. British Standard Guide to
Establishment and Development of Monolingual Thesauri.
BS 5723. 1987.
International Organization for Standardization. Documentation:
Guidelines for the Establishment and Development of
Monolingual Thesauri. 2nd ed. ISO 2788. 1986.
National Information Standards Organization. Guidelines for the
Construction, Format, and Management of Monolingual
Thesauri: An American National Standard.
ANSI/NISO Z39.19-1993. 1994.
Last updated February 4, 2000