
Tamil

Hindi-centric Automated Computer Translation of Indian Languages (Tamil, Kannada, Malayalam, Telugu ...)

Thanjai Nalankilli

TAMIL TRIBUNE, November 2016 (ID. 2016-11-01)
www.tamiltribune.com


OUTLINE

Abbreviations

Executive Summary

1. Introduction

2. A Few Questions to TIFAC

3. There is no Language Family Called "Indian Languages" (from Linguistic Perspective)

4. English-to-Tamil versus English-to-Hindi-to-Tamil

5. Quality of Computer Translations Degraded by this Multi-Step Approach

6. Do Not Centralize all Translations through Hindi/Sanskrit

7. Oppose the Meta-Language Approach


ABBREVIATIONS

ISO - International Organization for Standardization

TIFAC - Technology Information, Forecasting and Assessment Council



[The author is from Tamil Nadu, so the Tamil language is used as an example, but the conclusions of this article apply to some other Indian languages too.]


EXECUTIVE SUMMARY

An Indian government-affiliated organization, TIFAC, has suggested that automated computer translation from English into all Indian languages go through a meta-language (Hindi or Sanskrit?). This Hindi-centric approach should be opposed because all Indian languages would forever become dependent on Hindi, and the quality of translations would also suffer.

1. Introduction

The Indian language localization community (those involved in creating Internet content in local languages like Hindi, Telugu, Tamil, Marathi, Kannada, Bengali ...) met in New Delhi, India, on September 24-25, 2016. A statement by the executive director of TIFAC, Dr. Prabhat Ranjan, raised concern in this writer. He said, "Our team found English to Hindi translation easier when documents are first translated into another Indian language". Ranjan went on to suggest agreeing on a meta language to ease the translation process. This suggestion is nothing but a sinister plan to either Hindi-fy or Sanskrit-ise the process of translating English texts into Indian languages. The meta language he floated would be either Hindi or Sanskrit. I have no doubt about it.

TIFAC is an autonomous body under the Department of Science and Technology of the Government of India. The Indian government's efforts to establish Hindi/Sanskrit as a super-language over all other Indian languages are no secret. Indian Home Minister Rajnath Singh said that Sanskrit is the mother of all Indian languages and that he considers Hindi the elder sister of all regional languages because it is closer to Sanskrit (Hindustan Times; September 16, 2015). This statement may be true for SOME Indian languages but is false for others, for example, Tamil. One should not bunch linguistically unrelated languages into a single family.

This effort to establish Hindi/Sanskrit as a meta language through which all automated computer translations flow should be opposed, and international standardization bodies like ISO should not accept it. These bodies should not become unwitting tools in the hands of the Indian government or other vested interests.

2. A Few Questions to TIFAC

The summary of Dr. Prabhat Ranjan's speech says, "Our team found English to Hindi translation easier when documents are first translated into another Indian language". What was that "another Indian language"? It would not be a surprise if it was Sanskrit or another Indo-Aryan language. Not all Indian languages are related to Sanskrit/Hindi, and his conclusion would not hold for those that are not. Tamil, for one, has very little in common with Sanskrit/Hindi. Professor George L. Hart of the University of California, Berkeley, United States of America (USA), said, "Tamil arose as an entirely independent tradition, with almost no influence from Sanskrit or other languages" [Reference 1]. He also says, "Tamil language's separate identity and character have been cultivated and preserved from its beginnings to the present" [Reference 2].

Just because a small percentage of words are common to Sanskrit and Tamil, one cannot conclude that either is derived from the other. Many English words are used in Tamil these days; that does not mean Tamil is derived from English.

3. There is no Language Family Called "Indian Languages" (from Linguistic Perspective)

There is no language family or language group called "Indian Languages" from the perspective of linguistics. "Indian languages" is a political or geographic label. The two main language families in India (or South Asia) are Indo-Aryan and Dravidian. The Indo-Aryan languages are spoken by about 75% of Indians and the Dravidian languages by about 20%. Hindi, Sanskrit and a number of other languages belong to the former, and Tamil, Telugu, Kannada, Malayalam and some other languages belong to the latter.

So Dr. Ranjan's suggestion of translating English to Indian languages through Sanskrit/Hindi (or any "Indian meta language") is without merit. It has no scientific rationale. Using data, if any exist, on translating English to an Indo-Aryan language through Hindi/Sanskrit to justify translating English to Tamil through Hindi/Sanskrit is voodoo science, voodoo linguistics. It is unacceptable.

4. English-to-Tamil versus English-to-Hindi-to-Tamil

For the sake of argument, let us say that Hindi-to-Tamil automated computer translation is cheaper than English-to-Tamil translation. Is English-to-Hindi-to-Tamil translation, with its extra step, cheaper than English-to-Tamil? I doubt it. Such a scenario should be studied and established for each and every Indian language, be it Telugu or Malayalam or Kannada or Bengali or Oriya or Manipuri. We cannot accept this meta language suggestion based on undemonstrated hypotheses.

5. Quality of Computer Translations Degraded by this Multi-Step Approach

It is well established that automated computer translation loses some quality at every step. If one translates a text from English to Russian and then translates the Russian text back to English, the result will not be the same as the original English text. The same would happen in the proposed multi-step meta language approach. A direct translation would be truer to the original text than the indirect multi-step approach proposed by TIFAC executive director Dr. Ranjan. That is, a Tamil translation made directly from English would be closer to the original English text than one made by first translating English to Hindi and then Hindi to Tamil. So Hindi would end up with higher-quality translations, and other Indian languages like Marathi, Oriya, Bengali, Kannada, Malayalam and Telugu would end up with lower-quality translations. We cannot allow such a systematic two-tier approach: a higher level of translation for Hindi/Sanskrit and a lower level for the other languages.

6. Do Not Centralize all Translations through Hindi/Sanskrit

The unspecified meta-language in Dr. Ranjan's proposal is nothing but a Trojan horse hiding Hindi/Sanskrit. The attempt to make all Indian languages depend on Hindi/Sanskrit for their automated computer translations should come as no surprise. Dr. Ranjan's proposal is a charade to make Hindi the center of all translations and the other Indian languages feeder or secondary languages dependent on Hindi. "All roads lead to Rome," it was said of the Roman Empire. The Indian government-funded TIFAC wants all translations to pass through Hindi. It is unacceptable.

7. Oppose the Meta-Language Approach

We explained in the preceding sections how centralizing translations through Sanskrit/Hindi would be detrimental to Indian languages. Politicians, scholars and the public should oppose this approach. Tamil scholars and computer specialists active in automated computer translation of Indian languages should contact the Unicode standards organizations and request them to consult not only the Indian government but also the state governments on language-related matters. India is a multi-lingual country. States were reorganized on a linguistic basis in the 1950s so that each major language has a state where it can flourish. State governments are there to nurture and protect these languages. So it is appropriate that recommendations on languages come from the states, directly or through the Indian central government. People who know the language



REFERENCES

1. Statement on the Status of Tamil as a Classical Language (by George L. Hart), University of California at Berkeley, April 2000.

2. Sanskrit and Tamil (by George L. Hart), University of California at Berkeley, November 25, 2010.


Thanjai Nalangkilli

If you would like to translate this article to Tamil for us, please write us. Your help would be greatly appreciated.


This is a "Category B" article. Free to publish as long as the entire article, author's name and Tamil Tribune name and URL (http://www.tamiltribune.com) are included (no permission needed).




Your comments on this article or any other matter relating to Tamil are welcome

(e-mail to: tamiltribuneatasia.com Please replace "at" with the @ sign.)


Copyright © 2017 by TAMIL TRIBUNE.

All rights reserved. (http://www.tamiltribune.com/gen/permit.html)