Saturday, March 11, 2017

Problems of Natural Language in Indexing

Information Access Through The Subject


Derived indexing is based on the natural language of the documents, which sometimes proves problematic in the subject indexing process. These problems prompted a move towards assigned indexing. They can be categorized under two heads:
  • Problems inherent in the language
  • Problems pertaining to relationships

  • Problems of Natural Language in Indexing
    • Problems inherent in the language
      • Synonyms
      • Homographs
      • Use of Plural-Singular Forms
      • Multi-Worded Concept
      • Complex Subject
    • Problems Pertaining to Relationships
      • Semantic Relationships
      • Syntax

Problems inherent in the language

Specific problems encountered in this connection are:
a) synonyms,  b) homographs,  c) singular and plural forms,  d) multi-worded concepts, and  e) complex subjects

a) Synonyms are terms with the same or similar meanings. Such terms are present in every subject; near synonyms are the most common. True synonyms, which mean exactly the same thing and are used in precisely the same context, are rather unusual. Some situations in which synonyms arise are:

i) Some subjects have one stem and several derivatives, for example computing, computed, computation. Sometimes it is acceptable to treat such words as equivalent to one another, and at other times it is important to differentiate between them.

ii) Some subjects have both common and technical names, and these must be recognized in subject indexing so that, depending on the clientele for whom the index is meant, the appropriate names are reflected in the index. Examples are ‘Sodium Chloride’ and ‘Salt’.

iii) Usage patterns of terms also present a problem. The indexer should try to keep pace with changes in normal usage, e.g. ‘Wireless’ to ‘Radio’ to ‘Transistor’.

iv) Some concepts are named or spelled differently in different varieties of one language. American and British English are examples of such differences in usage, for example, lift (British) and elevator (American), or catalog (American) and catalogue (British).

v) Near synonyms are two or more words having nearly the same meaning, e.g. salary, wage, income.
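One common way to handle synonyms in an assigned index is to pick a single preferred term and point every other form to it. The sketch below illustrates that idea with the examples above; the `USE_REFERENCES` table, the choice of which term is preferred, and the sample documents are all assumptions made for illustration, not part of any particular indexing system.

```python
# Hypothetical "use" references: non-preferred term -> preferred term.
# Which term is preferred is an assumption made for illustration; a real
# index would choose according to its clientele.
USE_REFERENCES = {
    "salt": "sodium chloride",
    "wireless": "radio",
    "lift": "elevator",
    "catalogue": "catalog",
}


def preferred_term(term):
    """Return the preferred index term for any form a searcher might use."""
    key = term.strip().lower()
    return USE_REFERENCES.get(key, key)


def build_index(documents):
    """Map each preferred term to the set of document ids filed under it."""
    index = {}
    for doc_id, terms in documents.items():
        for term in terms:
            index.setdefault(preferred_term(term), set()).add(doc_id)
    return index


# Made-up documents, each described in its own author's words.
docs = {"d1": ["Salt"], "d2": ["Sodium Chloride"], "d3": ["Lift"]}
index = build_index(docs)
print(sorted(index["sodium chloride"]))  # d1 and d2 are filed together
```

Near synonyms such as salary, wage, and income are deliberately left out of the table: whether to merge them is a judgment the indexer has to make case by case.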

b) Homographs are words which have the same spelling but different meanings. In normal language usage, the meaning of a homograph is established by the context in which the term is used. Examples are Pitch (Cricket) and Pitch (Music); Tank (Military Vehicles) and Tank (Water tank); Bear (to carry) and Bear (animal).
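In an index, the missing context is usually supplied by a parenthetical qualifier, as in the examples above. The following is a minimal sketch of that practice; the `QUALIFIED_HEADINGS` list and the `senses` helper are hypothetical names introduced here for illustration.

```python
# Qualified headings taken from the examples in the text.
QUALIFIED_HEADINGS = [
    ("Pitch", "Cricket"),
    ("Pitch", "Music"),
    ("Tank", "Military Vehicles"),
    ("Tank", "Water tank"),
    ("Bear", "to carry"),
    ("Bear", "animal"),
]


def senses(word):
    """List every qualified heading whose base word matches `word`."""
    wanted = word.strip().lower()
    return [
        f"{base} ({qualifier})"
        for base, qualifier in QUALIFIED_HEADINGS
        if base.lower() == wanted
    ]


# A searcher who types a bare homograph is shown each sense to choose from.
print(senses("pitch"))
```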

c) Use of Plural and Singular Forms: Generally, the plural and singular forms of the same noun are regarded as equivalent, but there are some situations in which it is necessary to treat them as distinct. A related problem arises when a concept is expressed both as a noun and as an adjective, e.g. heat and hot.

d) Multi-Worded Concept: Some subjects cannot adequately be described by one word, and require two or more words to specify them fully. Examples are: Information Retrieval, Underwater Colour Photography, Algebraic Topology, etc. In such cases, no matter which word in the term is used as the main approach point in the index, the user might choose to seek the subject under the second or third word of the multi-word term first.
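One way to give the user an approach point under every word is to generate a rotated entry for each word of the term. The sketch below shows simple rotation only; the function name `rotated_entries` and the comma-inverted display form are assumptions for illustration, and real indexing systems apply more refined rules.

```python
def rotated_entries(term):
    """Return one index entry per word, each word leading in turn.

    The words that preceded the lead word are moved after a comma, so
    every entry still carries the full multi-word term.
    """
    words = term.split()
    entries = []
    for i in range(len(words)):
        lead = " ".join(words[i:])
        tail = " ".join(words[:i])
        entries.append(f"{lead}, {tail}" if tail else lead)
    return entries


for entry in rotated_entries("Underwater Colour Photography"):
    print(entry)
```

With this approach, a user who looks under "Photography" or "Colour" still finds the subject, at the cost of one extra entry per additional word.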

e) Complex Subject: Complex subjects contain more than one unit concept in them and a number of terms may be used to fully describe these subjects. Each of these concepts might form a potential search key in the index. E.g. ‘History of Science’.

Problems Pertaining to Relationships

There are two main categories of relationships between subjects. These are known as syntactic relationships and semantic relationships. A syntactic relationship exists between terms in a statement: the terms are sequenced and interrelated so that the statement becomes meaningful. A semantic relationship exists between terms that are defined in a vocabulary as having meanings that are in some way related.

The statements “They are eating” and “They eating are” exemplify where syntactic rules operate. Although both statements contain the same words, according to the rules of syntax only one of them is correct and meaningful.

Examples of semantic relationships appear among terms such as heating, electric heating, plasma heating, heat, and temperature. The meanings of these terms do not completely overlap with one another. Nevertheless, we recognize that they are related in some manner.

Semantic Relationships: Relationships between Meanings

An aspect of meaning that has particular relevance for indexing, because of its bearing on vocabulary control, is the relationships between meanings, and therefore between the denotations of the words used to represent them. Foskett (1996) noted three categories of relationships:
  • Equivalent
  • Hierarchical
  • Affinitive/Associative
Two expressions are equivalent when they denote the same referent. Synonyms are an obvious case of equivalence, as are variations in spelling and word form, and acronyms and abbreviations. Examples of such equivalences are ‘I.D. Cards’ and ‘Identification cards’; ‘SDI’ and ‘Selective Dissemination of Information’. We have to take note that there are degrees of equivalence: some words overlap the meaning of others but do not mean the same thing, such as ‘animals’ and ‘zoology’.

There are two hierarchical relationships: genus-species and whole-part. For instance, ‘Homo Sapiens’ is a species of the genus ‘Homo’. All members of ‘Homo Sapiens’ belong to the genus ‘Homo’, but only some members of the genus ‘Homo’ belong to the species ‘Homo Sapiens’. An example of the whole-part relationship is a camera and its lens.

An example of a group of words that bear affinitive relationships is teaching, learning, education, training, and teachers. The difficulty that arises with these kinds of relationships is that, unlike equivalence and hierarchical relationships, affinitive relationships are dependent on context. Education may imply training, and training is frequently a part of education, but it cannot be assumed that training necessarily implies education. Decisions about affinitive relationships cannot easily be built into the indexing language or into the system of which the language is a part; they must be made on an individual basis.
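The three categories can be pictured as fields on a thesaurus record. The sketch below is illustrative only: the `TermRecord` class, its field names, and the tiny `THESAURUS` are assumptions, loosely modelled on the "used for", "broader", "narrower", and "related" links commonly found in thesauri, and populated with the examples from the text.

```python
from dataclasses import dataclass, field


@dataclass
class TermRecord:
    term: str
    used_for: list = field(default_factory=list)  # equivalent forms
    broader: list = field(default_factory=list)   # genus, or the whole
    narrower: list = field(default_factory=list)  # species, or the parts
    related: list = field(default_factory=list)   # affinitive/associative


# A hypothetical miniature thesaurus built from the examples above.
THESAURUS = {
    record.term: record
    for record in [
        TermRecord("Selective Dissemination of Information", used_for=["SDI"]),
        TermRecord("Homo", narrower=["Homo Sapiens"]),
        TermRecord("Homo Sapiens", broader=["Homo"]),
        TermRecord("Education", related=["Teaching", "Learning", "Training"]),
    ]
}


def search_terms(term):
    """Expand a term with its equivalents and narrower terms."""
    record = THESAURUS[term]
    return [record.term] + record.used_for + record.narrower


def suggestions(term):
    """Related terms are only offered as suggestions, never added silently."""
    return list(THESAURUS[term].related)


print(search_terms("Homo"))
print(suggestions("Education"))
```

Keeping related terms out of `search_terms` is a design choice that mirrors the point above: equivalence and hierarchy can be applied mechanically, while affinitive links depend on context and are left to the searcher.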


Syntax

A statement consists of elements from the vocabulary of a language joined together in such a way that it has more meaning than a simple list of the same elements. The additional meaning is given to the statement by its structure, or syntax, because it shows the relationships between the elements.

i) Encyclopaedia of Bibliography
ii) Bibliography of Encyclopaedia

i) The lady is smiling
ii) The smiling is lady

The above examples show the importance of syntax, i.e. the sentences convey a particular significant meaning only when the terms are written in a particular order.
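The same point can be made concrete by comparing two ways of storing a statement: as an unordered set of words, which discards syntax, and as an ordered sequence, which keeps it. This is a small sketch for illustration; the helper names `as_bag` and `as_sequence` are hypothetical.

```python
def as_bag(phrase):
    """Represent a phrase as an unordered set of words (syntax is lost)."""
    return frozenset(phrase.lower().split())


def as_sequence(phrase):
    """Represent a phrase as an ordered tuple of words (syntax is kept)."""
    return tuple(phrase.lower().split())


a = "Encyclopaedia of Bibliography"
b = "Bibliography of Encyclopaedia"

# The two titles use exactly the same words...
same_words = as_bag(a) == as_bag(b)
# ...but they are different statements once word order is taken into account.
same_statement = as_sequence(a) == as_sequence(b)

print(same_words, same_statement)
```

An index that records only which words occur cannot tell these two subjects apart; one that records their order can.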


Information Access Through The Subject : An Annotated Bibliography / by Salman Haider. - Online : OpenThesis, 2015. (408 pages ; 23 cm.)

Annotated bibliography titled Information Access Through The Subject covering Subject Indexing, Subject Cataloging, Classification, Artificial Intelligence, Expert Systems, and Subject Approaches in Bibliographic and Non-Bibliographic Databases etc. 

The MLIS thesis is available and discussed in the following places:
Information Access Through The Subject

The project "annotated bibliography" was worked out as a Master of Library & Information Science (MLIS) dissertation in the Department of Library and Information Science, A.M.U, India. Information Access Through The Subject is a very much appreciated work (see Testimonials). It earned the author the S. Bashiruddin – P. N. Kaula Gold Medal, Post Graduate Merit Scholarship, First Division, and IInd Position in the MLIS program.



  • Written 2017-03-11

  • Victoria Frâncu, Librarian at Central University Library of Bucharest, Bucharest, Romania [In LinkedIn Group - Information Science and LIS, March 23, 2017] -- I really enjoyed reading this article which I find interesting and informative for the problems it presents. I particularly appreciated the way the syntax and semantic relationships are explained and illustrated.
