Derived Indexing

Contents

Introduction to Derived Indexing
Title-Based Indexing

Keyword in Context (KWIC) Indexing
Keyword Out of Context (KWOC)
Keyword Augmented in Context (KWAC)
Key-Term Alphabetical (KEYTALPHA)

Merits of Keyword Indexing System
Demerits of Keyword Indexing
Search Strategy for Keyword Indexes
Conclusion
Citation Indexing
Advantages of Citation Indexing

0 Introduction to Derived Indexing

We have to encode the subject of a document in order to place the document itself, or our record of it, in our store. This means that we must in some way be able to specify the subject. Generally, an indexer has neither the time to read every document added to the stock nor sufficient knowledge of every subject covered. The indexer therefore uses shortcuts such as the contents page, preface or introduction, or the publisher's blurb on the book cover; an abstract, in the case of a journal article or technical report; or the claims, in the case of a patent specification. All of these give some indication of the subject and suggest lines of thought to pursue further, for example in a dictionary or encyclopedia.

While indexing we may rely solely on information which is manifest in the document, without attempting to add to this from our own knowledge or other sources. This is derived indexing, that is, indexing derived directly from the document. There are some ways in which derived indexing has been used to produce printed indexes, particularly in computer-based systems. These are now often found in online systems, but the principles remain the same.

However, during the process of indexing, it is common practice to distinguish between the intellectual and the clerical effort involved in an IR system, and computers enable us to carry out the clerical operations at high speed. Derived indexing reduces intellectual effort to a minimum and is thus well suited to computer operation, which enables us to get a variety of outputs from a single input.

Examples of derived indexing are title-based indexing and citation indexing.

1 Title-Based Indexing

There is one part of a document in which authors themselves usually try to define the subject: the title. The title is in effect a one-line summary of the document and so can serve as an index point; this is how title indexes came into use. The method is simple: the important terms representing the subject are selected from the title and rotated to prepare entries, a task that is easily carried out by computer. Examples of title indexes are KWIC (Keyword in Context), KWOC (Keyword Out of Context), and KEYTALPHA (Key-Term Alphabetical).

It is important to note that titles are not always worded so as to represent the subject, so title-based indexes are good only if the subject is clearly expressed in the words of the title. Title indexing is also referred to as keyword indexing.

The keyword indexing system was originally developed by Andrea Crestadoro in 1856, under the name ‘Keywords in Titles’. He used it for the catalogue of the Manchester Public Library. H.P. Luhn of IBM revived this system under the name Keyword in Context (KWIC) in 1958. KWIC was adopted by the American Chemical Society in 1960 for its publication ‘Chemical Titles’.

Keyword indexing was a significant development in the area of subject indexing, because the whole process of generating entries can be mechanised and carried out by computer.

1.1 Keyword in Context (KWIC) Indexing

The Keyword in Context indexing system is based on the principle that the title of a document represents its contents. The title is treated as a one-line abstract of the document, and the significant words in the title are taken to indicate its subject. A KWIC index makes an entry under each significant word in the title, along with the remaining part of the title to keep the context intact. The entries are derived by taking each significant term in turn as the lead term, together with the entire context.

(a) Structure

Each entry in a KWIC index consists of three parts:

i) Keyword: A significant word of the title, which serves as the approach or access term.

ii) Context: The rest of the terms of the title, provided along with the keyword, which specify the context of the document.

iii) Identification or Location Code: A code (usually the serial number of the entry) which provides the address of the document where its full bibliographical details will be available.

In order to indicate the end of the title, a “/” symbol is used. The identification code is put on the extreme right to indicate the location of the document.

(b) Indexing Process

The KWIC indexing system consists of three steps:

Step I : Keyword selection

Step II : Entry generation

Step III : Filing

Step I: First of all, the significant words or keywords are selected from the title. This is done by omitting articles, prepositions, conjunctions and other non-significant words or terms. In a manual system the selection is done by the editor, who marks the keywords. When a computer is used for preparing the index, the selection is done by means of a ‘stop list’ of non-significant terms stored in it. A stop list consists of articles, prepositions and certain other common words which are stopped from becoming keywords. Another method of providing the correct terms for entries is human intervention at the input stage, wherein the editor indicates the key terms, which are then picked up by the computer.

Step II: After the selection of keywords, the computer moves the title laterally in such a way that a significant word (keyword) for a particular entry always appears either on the extreme left-hand side or in the centre. The same thing can be performed manually following the structure of KWIC to generate entries.

Step III: After all the index entries for a document are generated, each entry is filed at its appropriate place in the alphabetical sequence.

Example: Classification of Books in a University Library (with identification code 1279)

Step I :
Classification Books University Library

Step II :
CLASSIFICATION of Books in a University Library 1279
BOOKS in a University Library/Classification of 1279
UNIVERSITY Library/Classification of Books in a 1279
LIBRARY/Classification of Books in a University 1279

Step III :
BOOKS in a University Library/Classification of 1279
CLASSIFICATION of Books in a University Library 1279
LIBRARY/Classification of Books in a University 1279
UNIVERSITY Library/Classification of Books in a 1279
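
The three steps can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure of any particular KWIC program: the function names and the short stop list are assumptions made for the example, and a production stop list would be much longer. It prints one entry per keyword for the title used in the worked example, with the keyword in capitals on the left and "/" marking the end of the title.

```python
# Hypothetical, deliberately short stop list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "for", "and", "or", "to"}


def select_keywords(title):
    """Step I: keep every word of the title that is not on the stop list."""
    return [word for word in title.split() if word.lower() not in STOP_WORDS]


def kwic_entries(title, code):
    """Steps II and III: rotate the title under each keyword, then file A-Z."""
    words = title.split()
    entries = []
    for i, word in enumerate(words):
        if word.lower() in STOP_WORDS:
            continue
        # The keyword leads the entry, followed by the rest of the title.
        lead = " ".join([word.upper()] + words[i + 1:])
        # Words that preceded the keyword wrap round after the "/" marker.
        wrapped = " ".join(words[:i])
        context = f"{lead}/{wrapped}" if wrapped else lead
        entries.append(f"{context} {code}")
    # Filing: alphabetical order, ignoring case.
    return sorted(entries, key=str.lower)


title = "Classification of Books in a University Library"
print(select_keywords(title))
for entry in kwic_entries(title, 1279):
    print(entry)
```

Because selection depends only on the stop list, adding or removing a word from `STOP_WORDS` directly changes which entries are generated.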

The keyword may also be in the centre as follows:

Classification of BOOKS in a University Library 1279

University Library/CLASSIFICATION of Books in a 1279

in a University LIBRARY/Classification of Books 1279

of Books in a UNIVERSITY Library/Classification 1279

Some variations in the keyword in context indexing system have been introduced to overcome its limitations and to improve its working. Important among the variants are:

1. KWOC (Keyword Out of Context)

2. KWAC (Keyword Augmented in Context)

3. Key-term Alphabetical (KEYTALPHA)

1.2 Keyword Out of Context (KWOC)

In the KWOC system, the keyword or access point is taken out of the title and placed at the extreme left, at the beginning of the line. It is followed by the complete title, which provides the full context. The keyword and the context are written either on the same line or on two successive lines. Both formats are displayed below.

Example-Title: Computerisation of Libraries in India

FORMAT 1

COMPUTERISATION Computerisation of libraries in India 1289

INDIA Computerisation of libraries in India 1289

LIBRARIES Computerisation of libraries in India 1289

FORMAT 2

COMPUTERISATION

Computerisation of libraries in India 1289

INDIA

Computerisation of libraries in India 1289

LIBRARIES

Computerisation of libraries in India 1289

These entries are then filed in an alphabetical sequence in the file of the KWOC index.

It should be noted that the changing of format in KWOC index has provided only limited improvement. Since it follows the same indexing technique there is hardly any difference in its retrieval efficiency.
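
A minimal sketch of KWOC entry generation follows, covering both formats. The function name, the `two_lines` flag and the short stop list are assumptions made for illustration only.

```python
# Hypothetical, deliberately short stop list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "for", "and", "or", "to"}


def kwoc_entries(title, code, two_lines=False):
    """Return KWOC entries: the keyword first, then the complete title."""
    # Each significant word becomes a heading; the set removes repeats and
    # sorting files the entries alphabetically.
    keywords = sorted(
        {word.upper() for word in title.split() if word.lower() not in STOP_WORDS}
    )
    # Format 1 keeps keyword and title on one line; format 2 uses two lines.
    separator = "\n" if two_lines else " "
    return [f"{keyword}{separator}{title} {code}" for keyword in keywords]


title = "Computerisation of libraries in India"
print("\n".join(kwoc_entries(title, 1289)))
print()
print("\n".join(kwoc_entries(title, 1289, two_lines=True)))
```

The only difference from KWIC is that the title is never rotated, which is why the retrieval behaviour of the two indexes is essentially the same.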

1.3 Keyword Augmented in Context (KWAC)

The acronym KWAC also stands for Keyword and Context. The KWAC system provides for the enrichment of the keywords of the title with additional significant words taken either from the abstract of the document or from its contents. Since titles do not always represent the contents of a document fully, the enrichment minimises this limitation. The problem of false retrieval, which is inherent in a purely title-based indexing system, is solved to some extent.

For example, consider a document with the title ‘Expert System’. In this case the title does not clearly express the contents of the document, so the abstract or even the text itself may be consulted to find significant words, which are then added to the title to make it expressive. The title above may, for instance, become ‘Expert System in Library’, and the index is then prepared from the enriched title either by the KWIC or by the KWAC system.
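
The enrichment step can be sketched as below. Several details are assumptions made purely for illustration and are not prescribed by the KWAC system: the added terms are supplied by the indexer as a list, they are shown in square brackets after the title, the keyword is printed ahead of the context, and the identification code 1290 is invented for the example.

```python
# Hypothetical, deliberately short stop list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "for", "and", "or", "to"}


def kwac_entries(title, added_terms, code):
    """One entry per keyword, drawn from the title plus the added terms."""
    # Assumed display convention: added terms follow the title in brackets.
    context = f"{title} [{', '.join(added_terms)}]" if added_terms else title
    title_keywords = [w for w in title.split() if w.lower() not in STOP_WORDS]
    # Access points come from the title AND from the enrichment terms.
    keywords = sorted({k.upper() for k in title_keywords + list(added_terms)})
    return [f"{keyword} {context} {code}" for keyword in keywords]


# "Library" is the term the indexer found in the abstract or text.
for entry in kwac_entries("Expert System", ["Library"], 1290):
    print(entry)
```

The document now becomes retrievable under LIBRARY, an access point that the unenriched title could never have produced.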

1.4 Key-Term Alphabetical (KEYTALPHA)

In the Key-Term Alphabetical index, keywords are arranged side by side without forming a sentence. Entries are prepared containing only keywords and location excluding the context.

Example: Computerisation of libraries in India

The Keytalpha index entries are:

COMPUTERISATION, INDIA, LIBRARIES 1289

INDIA, LIBRARIES, COMPUTERISATION 1289

LIBRARIES, COMPUTERISATION, INDIA 1289
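
One way to produce entries of this shape is to sort the keywords alphabetically and then rotate the list so that each keyword leads in turn. The sketch below assumes that reading of the example; the function name and the short stop list are hypothetical.

```python
# Hypothetical, deliberately short stop list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "for", "and", "or", "to"}


def keytalpha_entries(title, code):
    """Return one entry per keyword: keywords only, no surrounding context."""
    keywords = sorted(
        word.upper() for word in title.split() if word.lower() not in STOP_WORDS
    )
    # Rotate the sorted list so every keyword appears once in lead position.
    return [
        ", ".join(keywords[i:] + keywords[:i]) + f" {code}"
        for i in range(len(keywords))
    ]


for entry in keytalpha_entries("Computerisation of libraries in India", 1289):
    print(entry)
```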

Merits of Keyword Indexing System

1. The principal merit of keyword indexing is the speed with which it can be produced.

2. The system automatically generates the entries.

3. The system is easy to operate.

4. It does not require any intellectual labour on the part of the indexer.

5. The keyword index reflects current terminology in a particular subject field, since the words used as access points are those used by the author in the title.

6. No controlled vocabulary is required.

7. Users may not remember the exact order of keywords in titles and subject headings, but are likely to remember the keywords themselves. Keyword access is therefore more likely to result in successful retrieval than a search without it.

Demerits of Keyword Indexing

1. As the entries are prepared from the title of a document, they may not represent its embodied thought when the title is catchy, fanciful, non-expressive or vague.

2. The computer has to be given a proper ‘stop list’; otherwise unwanted entries may be prepared.

3. As the entries are arranged alphabetically, information on a specific topic gets scattered throughout the index.

4. Searching for related subjects, in order to narrow or broaden the search, also presents problems, since no recognised hierarchical structure is incorporated in the index.

Search Strategy for Keyword Indexes

In keyword indexes, the significant terms of the titles of documents are arranged alphabetically, each with its context and identification number. There is no vocabulary control and, therefore, related or identical subjects are scattered throughout the index file. There is no reference system to connect or correlate related or identical topics. These limitations should be kept in mind while formulating a search strategy: the user should search under the synonyms of a word and also under related terms. When titles are improved and supplemented by the editors, the search yields better results. Keyword indexes do not provide for the coordination of two or more search words, and this limitation should also be kept in mind. Users of these indexes should further be prepared to search under alternative spellings, singular and plural forms, synonyms and near-synonyms. Because of the uncontrolled vocabulary, the number of search terms is considerably enlarged, necessitating more search effort.
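
The extra search effort can be illustrated with a small sketch that expands one search word into the variants a user would have to try. Everything here is an assumption for illustration: the plural rule is deliberately crude (it only adds or strips a final "s"), only the -isation/-ization spelling alternation is handled, the synonym table must be supplied by the searcher, and the index and its identification codes are invented.

```python
def search_variants(term, synonyms=None):
    """Expand a search word into spelling, number and synonym variants."""
    synonyms = synonyms or {}
    term = term.lower()
    variants = {term}
    # Crude singular/plural variant: add or strip a final "s".
    variants.add(term[:-1] if term.endswith("s") else term + "s")
    # British/American spelling variant for -isation/-ization words.
    for variant in list(variants):
        if "isation" in variant:
            variants.add(variant.replace("isation", "ization"))
        if "ization" in variant:
            variants.add(variant.replace("ization", "isation"))
    # Synonyms and near-synonyms come from a table the searcher provides.
    for variant in list(variants):
        variants.update(synonyms.get(variant, []))
    return sorted(variants)


def search(index, term, synonyms=None):
    """Look up every variant, since the index has no cross-references."""
    hits = set()
    for variant in search_variants(term, synonyms):
        hits.update(index.get(variant, ()))
    return sorted(hits)


# Invented index: keyword -> identification codes.
index = {"computerization": [1301], "automation": [1302], "libraries": [1289]}
synonym_table = {"computerisation": ["automation"]}
print(search_variants("computerisation", synonym_table))
print(search(index, "computerisation", synonym_table))
```

A single search word becomes five lookups here, which is the enlargement of the search that an uncontrolled vocabulary imposes on the user.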

Conclusion

Despite its deficiencies, the keyword index has been quite popular during the last four decades. A number of evaluation studies have indicated that keyword indexes may offer several advantages over other kinds of index. The continued growth of machine-readable databases has shown that keyword indexes work well in practice. The problem of unexpressive titles is solved to a considerable extent by editorial intervention. It is true that keyword indexes as such will not facilitate a comprehensive search, but producing any index that supports comprehensive searching takes time, money and effort. The keyword index was never envisaged as a comprehensive subject index. It is a mechanism for providing a quick and specific subject approach to information, which is what Luhn envisaged it to be.

2 Citation Indexing

A citation index is an ordered list of cited articles, each accompanied by a list of the articles that cite it. The cited article is identified as the reference and the citing article as the source. The index exploits the association of ideas between cited and citing articles: whenever a recent paper cites an earlier paper, there is some relation of ideas between the two.

Examples of Citation Index:

1. Science Citation Index. Philadelphia: Institute for Scientific Information, 1963-

2. Social Sciences Citation Index. Philadelphia: Institute for Scientific Information, 1973-

Citation indexes have proved to compare well with other kinds of index and can be prepared without much complication. They are also amenable to computer manipulation.

Citation indexing provides subject access to bibliographic records in an indirect but powerful manner. Since the citation or reference to another scholar’s work implies an intellectual connection between citing and cited publications, one can make the fundamental assumption that the citing and cited publications deal with either the same or closely related subjects.
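
Because a citation index is derived entirely from reference lists, building one amounts to inverting a source-to-references table. The sketch below shows that inversion; the function name and the sample papers are invented for illustration, and a real index would use full bibliographic identifiers.

```python
from collections import defaultdict


def build_citation_index(references):
    """Invert {source: [cited, ...]} into {cited: [sources citing it]}."""
    index = defaultdict(list)
    for source, cited_items in references.items():
        for cited in cited_items:
            index[cited].append(source)
    # Order both the cited items and their citing sources for filing.
    return {cited: sorted(sources) for cited, sources in sorted(index.items())}


# Invented sample data: each source paper with its list of references.
references = {
    "Paper C": ["Paper A", "Paper B"],
    "Paper D": ["Paper A"],
}
for cited, sources in build_citation_index(references).items():
    print(cited, "<-", ", ".join(sources))
```

A searcher who already knows Paper A can look it up and move forward in time to Paper C and Paper D, on the assumption that they deal with related subjects.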

Advantages of Citation Indexing

1. Citation indexing eliminates the need for intellectual indexing; it has the potential of being automated to a large degree.

2. Citation indexing overcomes the problems of vocabulary and semantic difficulties.

3. It overcomes the language barrier, because citation patterns, especially in scientific disciplines, are similar across languages.

4. Literature searches using citation indexing are highly effective in gathering a large number of relevant documents quickly.

5. Objective factors such as the number of citations and frequency of being cited can be used in introducing various weighting and other procedures to improve the quality and effectiveness of retrieval.
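
The last point can be illustrated with a simple weighting: ranking cited items by the number of sources that cite them. This is only one of many possible weightings and is an assumption of this sketch, as are the function name and the invented sample data.

```python
def rank_by_citations(citation_index):
    """Rank cited items by how many sources cite them, most cited first."""
    counts = {cited: len(sources) for cited, sources in citation_index.items()}
    # Ties are broken alphabetically so the ranking is reproducible.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Invented citation index: cited item -> sources citing it.
citation_index = {
    "Paper A": ["Paper C", "Paper D"],
    "Paper B": ["Paper C"],
}
for cited, count in rank_by_citations(citation_index):
    print(cited, count)
```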

REFERENCES

Information Access Through The Subject : An Annotated Bibliography / by Salman Haider. - Online : OpenThesis, 2015. (408 pages ; 23 cm.)

LIBRARIANSHIP STUDIES & INFORMATION TECHNOLOGY
