Hindi-centric Automated Computer Translation of Indian Languages (Tamil, Kannada, Malayalam, Telugu ...)

Thanjai Nalankilli

TAMIL TRIBUNE, November 2016 (ID. 2016-11-01); Updated 2017-05-08
Executive Summary

1. Introduction

2. A Few Questions to TIFAC

3. There is no Language Family Called "Indian Languages" (from Linguistic Perspective)

4. English-to-Tamil versus English-to-Hindi-to-Tamil

5. Quality of Computer Translations Degraded by this Multi-Step Approach

6. Do Not Centralize all Translations through Hindi/Sanskrit

7. Oppose the Meta-Language Approach


ISO - International Organization for Standardization

TIFAC - Technology Information, Forecasting and Assessment Council

[The author is from Tamil Nadu, so Tamil is used as the example, but the conclusions of this article also apply to some other Indian languages.]


Executive Summary

TIFAC, an organization affiliated with the Indian government, has suggested that automated computer translation from English into all Indian languages go through a meta-language. Although the meta-language was not specified, our experience with the Indian government gives us reason to expect that it would be either Hindi or Sanskrit. This Hindi-centric approach should be opposed because all other Indian languages would become permanently dependent on Hindi, and the quality of their translations would suffer.

1. Introduction

The Indian language localization community (those involved in creating Internet content in local languages such as Hindi, Telugu, Tamil, Marathi, Kannada, Bengali ...) met in New Delhi (India) on September 24-25, 2016. A proposal by the executive director of TIFAC, Dr. Prabhat Ranjan, concerned this writer. "Ranjan's team found English to Hindi translation easier when documents are first translated into another Indian language. Based on this experience, Ranjan bounced the idea of agreeing on a meta language to ease the translation process." [Reference 3]. Dr. Ranjan did not name a candidate for the meta-language. Our fear is that the Indian government or some other organization or individual may try to elevate Hindi or Sanskrit to that role.

TIFAC is an autonomous body under the Department of Science and Technology of the Government of India. The Indian government's efforts to establish Hindi/Sanskrit as a super-language over all other Indian languages are no secret. Home Minister Rajnath Singh said that Sanskrit is the mother of all Indian languages and that he considers Hindi the elder sister of all regional languages because it is closer to Sanskrit (Hindustan Times; September 16, 2015). This statement may be true for SOME Indian languages but is false for others, for example, Tamil. One should not bunch linguistically unrelated languages into a single family.

Any effort to establish Hindi/Sanskrit as a meta language through which all automated computer translations flow should be opposed, and international standardization bodies like ISO should not accept it. These bodies should not become unwitting tools in the hands of the Indian government or other vested interests.

2. A Few Questions to TIFAC

The summary of Dr. Prabhat Ranjan's speech says, "Our team found English to Hindi translation easier when documents are first translated into another Indian language". [Reference 3] What was that "another Indian language"? It would not be a surprise if it was Sanskrit or another Indo-Aryan language. Not all Indian languages are related to Sanskrit/Hindi. Tamil, for one, has very little in common with Sanskrit/Hindi. Professor George L. Hart of the University of California, Berkeley, United States of America (USA) said, "Tamil arose as an entirely independent tradition, with almost no influence from Sanskrit or other languages" [Reference 1]. He also says, "Tamil language's separate identity and character have been cultivated and preserved from its beginnings to the present" [Reference 2].

Just because a small percentage of words are common to Sanskrit and Tamil, one cannot conclude that either language is derived from the other. Many English words are used in Tamil these days; that does not mean Tamil is derived from English.

3. There is no Language Family Called "Indian Languages" (from Linguistic Perspective)

There is no language family or language group called "Indian Languages" from the perspective of linguistics. "Indian languages" is a political or geographic label. The two main language families in India (or South Asia) are Indo-Aryan and Dravidian. The Indo-Aryan languages are spoken by about 75% of Indians and the Dravidian languages by about 20%. Hindi, Sanskrit and a number of other languages belong to the former; Tamil, Telugu, Kannada, Malayalam and some other languages belong to the latter.

Any suggestion of translating English to Indian languages through Sanskrit/Hindi (or any "Indian meta language") is without merit. It has no scientific rationale. Using data, if any exists, on translating English to an Indo-Aryan language through Hindi/Sanskrit to justify translating English to Tamil through Hindi/Sanskrit is voodoo science, voodoo linguistics. It is unacceptable.

4. English-to-Tamil versus English-to-Hindi-to-Tamil

For the sake of argument, let us say that Hindi-to-Tamil automated computer translation is cheaper than English-to-Tamil translation. Is English-to-Hindi-to-Tamil translation, which adds the cost of the English-to-Hindi step, still cheaper than English-to-Tamil? I doubt it. Such a scenario should be studied and established for each and every Indian language, be it Telugu or Malayalam or Kannada or Bengali or Oriya or Manipuri. We cannot accept this meta language suggestion on the basis of undemonstrated hypotheses.

5. Quality of Computer Translations Degraded by this Multi-Step Approach

It is an established fact that quality is degraded during automated computer translation. If one translates a text from English to Russian and then translates the Russian text back to English, the result is not the same as the original English text. The same would happen in the proposed multi-step meta language approach. A direct translation would be truer to the original text than an indirect one. That is, a Tamil translation made directly from English would be closer to the original English text than one made by first translating English to Hindi and then Hindi to Tamil. So Hindi would end up with higher-quality translations, and the other Indian languages like Marathi, Oriya, Bengali, Kannada, Malayalam and Telugu would end up with lower-quality translations. We cannot allow such a systematic two-tier system, with a higher level of translation for Hindi/Sanskrit and a lower level for the other languages.
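
The compounding of errors described above can be illustrated with a rough sketch. It assumes, purely for illustration, that each translation hop preserves the meaning of a sentence with some fixed probability (called "fidelity" here) and that errors at different hops are independent. The function name and the numbers are hypothetical; they are not measurements of any real translation system.

```python
from math import prod


def pipeline_fidelity(hop_fidelities):
    """Expected fraction of sentences that survive every hop intact.

    Toy model: each hop keeps the meaning with the given probability,
    and errors at different hops are assumed to be independent.
    """
    return prod(hop_fidelities)


# Hypothetical per-hop fidelities, chosen only for illustration.
direct = pipeline_fidelity([0.90])         # English -> Tamil
pivoted = pipeline_fidelity([0.90, 0.90])  # English -> Hindi -> Tamil

print(f"direct:  {direct:.2f}")
print(f"pivoted: {pivoted:.2f}")
```

Under these assumptions the pivoted route can never be more faithful than its weakest hop, so it could beat the direct route only if the product of its two hops exceeded the fidelity of the direct English-to-Tamil system. Whether that ever holds is an empirical question that would have to be answered separately for each language.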

6. Do Not Centralize all Translations through Hindi/Sanskrit

The Indian government or some other organization or individual may use the meta-language suggestion to make all Indian languages depend on Hindi/Sanskrit for their automated computer translations. In the days of the Roman Empire, "all roads led to Rome". In the same way, the Indian government may make all translations pass through Hindi. That is unacceptable.

7. Oppose the Meta-Language Approach

We explained in the preceding sections how centralizing translations through Sanskrit/Hindi would be detrimental to Indian languages. Politicians, scholars and the public should oppose this approach. Tamil scholars and computer specialists active in automated computer translation of Indian languages should contact the Unicode standards organizations and request that they consult not only the Indian government but also the state governments on language-related matters. India is a multi-lingual country. States were reorganized on a linguistic basis in the 1950s so that each major language has a state where it can flourish. State governments are there to nurture and protect these languages. So it is appropriate that recommendations on languages come from the states, directly or through the Indian central government. People who know the language


1. Statement on the Status of Tamil as a Classical Language (by George L. Hart), University of California at Berkeley, April 2000.

2. Sanskrit and Tamil (by George L. Hart), University of California at Berkeley, November 25, 2010.

3. Gilt-Conference

