Indian Journal of Science and Technology, Vol 10(16), DOI: 10.17485/ijst/2017/v10i16/111895, April 2017, www.indjst.org
ISSN (Print): 0974-6846 | ISSN (Online): 0974-5645

Approaches for Improving Hindi to English Machine Translation System

Rajesh Kumar Chakrawarti¹ and Pratosh Bansal²

¹Faculty of Computer Engineering, Institute of Engineering and Technology, Devi Ahilya Vishwavidyalaya, Indore – 452017, Madhya Pradesh, India; rajesh_kr_chakra@yahoo.com
²Department of Information Technology, Institute of Engineering and Technology, Devi Ahilya Vishwavidyalaya, Indore – 452017, Madhya Pradesh, India; pratosh@hotmail.com

Abstract
Objectives: To provide approaches for effective Hindi-to-English Machine Translation (MT) that can help make MT systems inexpensive and easy to implement. Methods/Statistical Analysis: The structures of the Hindi and English languages have been studied thoroughly, as have the possible processing steps for natural languages. The methods, rules, approaches, tools and resources related to MT are discussed in detail. Findings: MT is the automatic translation of text from one language into another. India is a country of great cultural and linguistic diversity, where more than 20 regional languages are spoken along with several dialects. Hindi is widely spoken across the states of the country, and a large body of literature, poetry and other valuable texts is available in Hindi, which creates an opportunity to translate these texts into English. At the same time, the new generation is learning English rapidly and is keen to learn it in a simple, lucid manner. Several efforts have been made in this direction; although a large number of approaches and solutions exist for MT, there is still huge scope for improvement. This paper addresses the challenges of MT and the efforts made to solve them, which motivates researchers to implement new Hindi-to-English Machine Translation systems. Application/Improvements: Efficient, inexpensive and easy translation of the available Hindi literature, poetry and other valuable texts into English. Because children can learn the culture through this poetry and literature, machine translation of these texts can have a significant impact.
                Keywords: English Language, Hindi Language, Machine Translation, Translation-Rules and Translation Approaches

1. Introduction

India is one of the finest examples of a multilingual and socially diverse country. People from different regions speak different languages, and the spoken language can change every few tens of kilometers. In India, Hindi is the national language and is spoken by most people, while English is an internationally accepted language used for communication throughout the world. The Constitution of India accepts these two languages, Hindi and English, as official languages, and official communication between the central and state governments is also carried out in them. State governments may use their own regional languages for their work, and most newspapers are also published in various regional languages. The 22 regional languages spoken in various regions are Assamese, Bengali, Bodo, Dogri, Gujarati, Hindi (which is also an official language), Kannada, Kashmiri, Konkani, Maithili, Malayalam, Manipuri, Marathi, Nepali, Oriya, Punjabi, Sanskrit, Santali, Sindhi, Tamil, Telugu and Urdu. Hence there is a great demand for better Machine Translation systems to improve communication and the exchange of information with other countries, states and the central government¹,².

Machine Translation is a key research area in the field of Natural Language Processing (NLP). It is the computerized, automated translation of text or documents from one language (called the source language) into another language (called the target language).
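To make the source-to-target idea concrete, the following minimal sketch shows the simplest, direct (word-to-word) style of translation combined with one reordering rule, from Hindi's Subject-Object-Verb order to English's Subject-Verb-Object order, applied to the example sentence “पृथ्वी सोना चाहता है।” analysed in Sections 1.1 and 1.2. It assumes the clause has already been split into subject, object and verb group; the three-entry lexicon and the function name translate_sov_to_svo are hypothetical, and a practical system would also need morphological analysis, parsing and word sense selection.

```python
# Hypothetical three-entry bilingual lexicon: Hindi word -> English word.
# सोना is ambiguous ("gold" as a noun, "to sleep" as a verb); this toy
# lexicon simply hard-codes the noun sense.
LEXICON = {
    "पृथ्वी": "Prithvi",   # proper noun: transliterated, not translated
    "सोना": "gold",
    "चाहता है": "wants",   # verb group (participle + auxiliary) as one unit
}


def translate_sov_to_svo(subject: str, obj: str, verb: str) -> str:
    """Translate a pre-segmented Hindi SOV clause word by word and
    emit the English words in SVO order.

    Words missing from the lexicon are passed through unchanged.
    """
    s, o, v = (LEXICON.get(word, word) for word in (subject, obj, verb))
    return f"{s} {v} {o}"  # Hindi S-O-V becomes English S-V-O


if __name__ == "__main__":
    # Hindi order: S = पृथ्वी, O = सोना, V = चाहता है
    print(translate_sov_to_svo("पृथ्वी", "सोना", "चाहता है"))  # Prithvi wants gold
```

Because the lexicon fixes one sense of सोना, this sketch would output “Prithvi wants gold” even where “wants to sleep” is meant; choosing between such senses is the job of Word Sense Disambiguation.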

The work in the machine translation area has been going on for several decades, but efficient machine translation is still a challenging task. India offers one of the largest markets for Machine Translation³. Figure 1 represents a block diagram of a simple Machine Translation system.

Figure 1. A simple Machine Translation (MT) System.

Machine Translation faces challenges at every level of Natural Language Processing: phonetics and phonology, morphology, syntax, semantics, pragmatics and discourse. Among these, ambiguity (at the semantic level) is the biggest. In addition, two languages may differ structurally, which is called the translation divergence problem. Machine Translation systems deal with both ambiguity and linguistic divergence under the umbrella of Natural Language Processing⁴.

In India, we feel that the most important Machine Translation tasks are Hindi→English and Hindi→regional language translation.

1.1 Hindi-to-English Translation

Hindi is our national language. People speak different regional languages, but Hindi is the main official language for standard communication. Outside India, Hindi is also understood in countries such as Pakistan, Bangladesh and Nepal.

The default structure of a Hindi sentence is Subject-Object-Verb (SOV), e.g.

“पृथ्वी सोना चाहता है।” (“Prithvi wants gold”), where S = पृथ्वी (Prithvi), O = सोना (gold) and V = चाहना (to want).

Indian languages (primarily Hindi) have the following characteristics:

• Highly inflectional,
• Rich morphology, and
• Relatively free word order.

These characteristics make Hindi-to-English Machine Translation more complex. A Hindi text may convey different senses depending on the context, and the word order of a spoken statement in an Indian language may vary from speaker to speaker⁵,⁶. Figure 2 represents a block diagram of a Hindi-to-English Machine Translation system.

Figure 2. Hindi → English Machine Translation.

1.2 English-to-Hindi Translation

English is a major, internationally accepted language that is spoken and used in all kinds of communication among almost all countries. It is arguably the only language that is popular among people all over the world.

The default structure of an English sentence is Subject-Verb-Object (SVO), e.g.

“Prithvi wants gold”, where S = Prithvi, V = want and O = gold.

English has the following main characteristics:

• Highly positional (word order signals grammatical roles), and
• Rudimentary (poor) morphology.

English-to-Hindi Machine Translation involves long-distance verb movement, because the verb moves from the middle of the English (SVO) sentence to the end of the Hindi (SOV) sentence. Hindi also marks gender agreement (for example, the verb form चाहता is used with a masculine subject and चाहती with a feminine one), which English does not. These morphological issues can be resolved by enriching the English source-side resources with linguistic factors⁵,⁶. Figure 3 shows a block diagram of an English-to-Hindi Machine Translation system.

Figure 3. English → Hindi Machine Translation.

Hindi↔English Machine Translation can be improved by incorporating a technique called Word Sense Disambiguation (WSD), defined as the task of identifying the correct sense of a word from its context. WSD algorithms can be broadly classified as knowledge/dictionary-based, supervised, semi-supervised and unsupervised approaches. However, there is no restriction on using these approaches
either singly or in combination. Earlier, combinations of approaches have also produced good results⁷,⁸.

Over the last three decades, a great deal of research, including many research projects, has been carried out in India in the area of Machine Translation. Although this work has produced some good Machine Translation systems, each has its own advantages, disadvantages and limitations, and “It is not possible to have fully automatic, qualitative, and general-purpose Machine Translation”⁵. Hence, there is still scope for further research in this area, and many research projects are under way to overcome these disadvantages and limitations. These opportunities also motivate the teaching of Machine Translation from an Indian perspective to students and researchers⁹.

Many surveys of Machine Translation have been carried out from the Indian perspective, and they fall into four groups. The first group covers resources, services and tools for Machine Translation systems throughout India; it is a rigorous collection from the Indian perspective¹⁰. The second group covers Word Sense Disambiguation approaches that can be used to improve Machine Translation systems¹¹, describing the type of approach (knowledge-based, supervised, minimally supervised, unsupervised, hybrid, etc.), corpus or WordNet details, features, advantages, disadvantages and limitations, and new techniques within each approach. The third group covers the Machine Translation approaches used to develop systems¹²⁻¹⁵, describing the approach (direct, rule-based, corpus-based, hybrid, etc.), its features, advantages, disadvantages and limitations, and new techniques within it. The fourth group covers the Machine Translation systems developed in India, listing each system's name, year of development, developers and/or organization, funding agency, place of development, domains/applications, approaches/techniques, tools/resources used and features¹⁴⁻¹⁷. All of these surveys also give web links for using the systems. The literature in this paragraph is based on survey papers only; the next paragraph is based on actual research, research projects and resources.

Machine Translation systems face ambiguity and divergence issues at all levels of Natural Language Processing⁴,¹⁸. Multilingual systems are also bound by resource constraints, for example WordNet, which is costly and slow to process. Anglabharti is an English-to-Indian-languages machine-aided translation system¹⁹ that uses a rule-based (pseudo-interlingua) method. It produces good results, but sometimes produces more than one target sentence for a given English source sentence. The Computer Assisted Translation System Mantra, which translates English texts into Hindi in the domain of Personnel Administration, was developed using a rule-based (transfer-based) method²⁰; research on this system has opened new areas for adding further facilities. The Anusaaraka system, which makes documents in one Indian language accessible in another Indian language, was developed using a direct (word-to-word) method²¹. This system also produces good results, and its entry into common use would have major implications. The Universal Networking Language (UNL)-based (interlingua) Machine Translation system is used for translation from English into Indian languages; although it is a good system, language divergence between the source language and UNL, and between UNL and the target language, causes problems²². AnglaHindi, which is part of the Anglabharti project, is responsible for English-to-Hindi translation²³ and was developed using a hybrid rule-based and example-based method. MaTra is a fully automatic system for English-Hindi Machine Translation (MT) of general-purpose texts²⁴, developed using a rule-based (transfer-based) method. Statistical Machine Translation systems from Google, Microsoft, Worldlingo and IBM include Google Translate, Bing Translator, Worldlingo and IBM Server, respectively.

Machine Translation approaches are classified as direct translation, rule-based (transfer-based and interlingua-based) translation, corpus-based (statistical and example-based) translation and hybrid translation (a combination of two or more approaches)²⁵. These systems and approaches each have their own features, advantages, disadvantages and limitations. The Statistical Machine Translation (SMT) model³,¹⁴ and its word-based, phrase-based and hierarchical phrase-based variants, among others, provide the basis for improving Machine Translation systems and are also helpful in developing new ones.

A number of online applications are available for Hindi-to-English Machine Translation, and Table 1 analyzes their effectiveness. For example, the Hindi statement “पृथ्वी सोना चाहता है।” was translated into English using the online applications listed in the table. The output shows that most of the applications failed to produce the desired translation. Only Google Translate produces a comparatively good result:
“Earth wants to sleep”. However, it does not recognize the proper noun “पृथ्वी”, so it produces “Earth” where it should write “Prithvi”; it also chooses the verb sense of “सोना” (“to sleep”) instead of the noun sense (“gold”). The remaining applications produce improper results. Hence there is a clear need for an enhanced Hindi-to-English Machine Translator that can provide better and more appropriate results.

WordNet is an online lexical database for English that covers four main parts of speech (PoS): (i) nouns, (ii) verbs, (iii) adjectives and (iv) adverbs, organized into sets of synonyms²⁶. HindiWordNet is an online lexical database for Hindi built on the model of the English WordNet; like WordNet, it covers the four main parts of speech of Hindi, (i) nouns, (ii) verbs, (iii) adjectives and (iv) adverbs, organized into sets of synonyms. IndoWordNet is a linked structure of the wordnets of major Indian languages²⁷.

Word Sense Disambiguation algorithms and applications are categorized as knowledge/dictionary-based, supervised, semi-supervised, unsupervised and hybrid approaches⁷, each with its own features, advantages, disadvantages and limitations. A critical analysis of these approaches helps in choosing the appropriate one for improving Machine Translation systems²⁸. An experimental study of graph connectivity for unsupervised Word Sense Disambiguation shows that such methods can help improve Machine Translation²⁹.

Concept map construction might help improve Machine Translation, because it combines ideas and knowledge that are related to each other in some respect, creating a semantic binding between them. With a concept map, concepts that belong to the same domain can be interlinked³⁰,³¹.

A proposed Chinese-Japanese Sign Language translation system suggests research directions for similar translation tasks, such as a Hindi→English Sign Language Translation System³². Bilingual Hindi-English (Hinglish) Machine Translation is an important research direction for separating the pure component languages from mixed-language text³³.

BLEU (Bilingual Evaluation Understudy) is the main metric for the automatic evaluation of Machine Translation systems, and some other metrics are also helpful. Different techniques under BLEU play an important role in evaluating Machine Translation systems⁶,³⁴.

A large body of ancient literature exists in Hindi. It is written in the “Devanagari lipi (script)”, which had been developed during the 15th century, and most books, novels, volumes, etc. are in this script. In the modern era there is a huge demand for English translations, and research in this area has increased over recent decades³⁵.

Poetry translation is one of the hardest kinds of machine translation. A great deal of poetry is available in Hindi, and considerable work has been done in this direction, but available systems still require better mechanisms for translating poetry into English³⁶.

Many researchers, institutions and research organizations have started working on Machine Translation systems for Hindi-to-English, English-to-Hindi and Hindi-to-regional-language translation (and vice versa), and have obtained very satisfactory results. The prominent institutions and research organizations that have worked, and are still working, in the area of Machine Translation are as follows²,⁵,¹⁷:

• Technology Development for Indian Languages (TDIL) project by the Department of Electronics and Information Technology (DeitY), Ministry of Communications and Information Technology, Government of India.
• Department of Computer Science and Engineering, Indian Institute of Technology (IIT), Kanpur, Bombay and Delhi.
• Department of Computer and Information Sciences, University of Hyderabad (UoH), Hyderabad.
• Language Technologies Research Center (LTRC), International Institute of Information Technology (IIIT), Hyderabad.
• Centre for Development of Advanced Computing (CDAC), Pune, Noida and Bangalore.
• National Center for Software Technology (NCST) (now CDAC), Bombay.
• Department of Computer Science and Engineering, Jadavpur University, Kolkata.
• Machine Learning Lab, CSA, Indian Institute of Science (IISc), Bangalore.
• AU-KBC Research Centre, Chennai.
• Department of Computer Science and Application, Utkal University, Bhubaneswar.
• Advanced Center for Technical Development of Punjabi Language, Literature and Culture, Punjabi University, Patiala.
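The “सोना” error in the Table 1 example is a word-sense problem: the word can be the noun “gold” or the verb “to sleep”, and “पृथ्वी सोना चाहता है।” on its own admits both readings. The sketch below is a simplified Lesk-style overlap count, one classic knowledge/dictionary-based WSD technique, offered as an illustration rather than as the method of any work cited here. The sense labels and the hand-written Hindi “signature” word sets are hypothetical stand-ins for the glosses and examples a real system would take from Hindi WordNet or IndoWordNet.

```python
from __future__ import annotations

# Hypothetical sense inventory for सोना: each sense maps to a small
# "signature" of Hindi words that tend to occur near that sense.
SENSES = {
    "gold (noun)": {"धातु", "गहना", "गहनों", "आभूषण", "कीमती", "खरीदना"},
    "to sleep (verb)": {"नींद", "रात", "बिस्तर", "थका", "थकान", "आराम"},
}


def tokenize(sentence: str) -> list[str]:
    """Split on whitespace after replacing the danda (।) with a space."""
    return sentence.replace("।", " ").split()


def simplified_lesk(sentence: str, senses: dict[str, set[str]]) -> str | None:
    """Return the sense whose signature shares the most words with the
    sentence, or None if no sense has a unique, non-zero overlap."""
    context = set(tokenize(sentence))
    scores = {sense: len(context & signature) for sense, signature in senses.items()}
    best = max(scores.values())
    winners = [sense for sense, score in scores.items() if score == best]
    if best == 0 or len(winners) > 1:
        return None  # not enough context to choose a sense
    return winners[0]


if __name__ == "__main__":
    examples = [
        "पृथ्वी गहनों के लिए सोना चाहता है।",   # "Prithvi wants gold for jewellery"
        "पृथ्वी थका हुआ है और सोना चाहता है।",  # "Prithvi is tired and wants to sleep"
        "पृथ्वी सोना चाहता है।",                 # the Table 1 sentence alone
    ]
    for sentence in examples:
        print(simplified_lesk(sentence, SENSES))
    # gold (noun)
    # to sleep (verb)
    # None
```

The third result shows why the paper's example needs wider context, or a supervised model trained on sense-tagged text: with no disambiguating words nearby, an overlap method cannot decide between the two senses.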
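The BLEU metric mentioned above scores a candidate translation by its clipped (modified) n-gram precisions against one or more reference translations, combined as a geometric mean and multiplied by a brevity penalty that punishes overly short output. The following minimal sketch implements that standard definition at sentence level without smoothing; it is not the specific BLEU variants cited in the paper, and whitespace tokenization, the function names and the default max_n = 4 are assumptions. Published evaluations normally compute BLEU over a whole test corpus with an established toolkit.

```python
from __future__ import annotations

import math
from collections import Counter


def ngram_counts(tokens: list[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: str, references: list[str], max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU with uniform weights 1/max_n."""
    cand = candidate.split()
    refs = [ref.split() for ref in references]
    if not cand or not refs:
        return 0.0

    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate shorter than n tokens
        # Clip each candidate n-gram by its highest count in any one reference.
        max_ref = Counter()
        for ref in refs:
            for gram, count in ngram_counts(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0  # a zero precision makes the geometric mean zero
        log_precision_sum += math.log(clipped / total) / max_n

    # Brevity penalty against the reference length closest to the candidate's.
    c = len(cand)
    r = min((abs(len(ref) - c), len(ref)) for ref in refs)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum)


if __name__ == "__main__":
    reference = ["Prithvi wants gold"]
    for output in ["Prithvi wants gold", "Earth wants gold", "Earth wants to sleep"]:
        print(output, round(sentence_bleu(output, reference, max_n=2), 4))
    # Prithvi wants gold 1.0
    # Earth wants gold 0.5774
    # Earth wants to sleep 0.0
```

The example uses max_n=2 because a three-word sentence has no 4-grams. “Earth wants to sleep” shares only the unigram “wants” with the reference and no bigram, so its unsmoothed score is zero; smoothed BLEU variants avoid such zeros for short sentences.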