International Journal of Innovative Technology and Exploring Engineering (IJITEE)
ISSN: 2278-3075, Volume-8 Issue-10, August 2019

Advanced Tamil POS Tagger for Language Learners

M. Rajasekar, A. Udhayakumar
Revised Manuscript Received on August 01, 2019
Dr. A. Udhayakumar, Professor and Controller of the Examinations at Hindustan Institute of Technology and Science, Chennai, India.
M. Rajasekar, Research Scholar at Hindustan Institute of Technology and Science, Chennai, India.
Retrieval Number: J8886088101920/19©BEIESP
DOI: 10.35940/ijitee.J8886.0881019
Published By: Blue Eyes Intelligence Engineering & Sciences Publication

Abstract - Machine translation is one of the most important applications of Natural Language Processing (NLP), an emerging technology. Machine translation is the translation of text from one language into another by machines. Part-of-speech (POS) tagging, the topic of this research, is one of the most basic and important tasks in machine translation. Put simply, POS tagging assigns a part of speech to each word in a given sentence. In this research work, I developed a POS tagger for the Tamil language. Numerous studies have already addressed the same topic. I approach it with a different and very detailed implementation that identifies most of the fine-grained grammatical categories of Tamil. The tagger is also useful for learning basic Tamil grammar.

Keywords - Natural Language Processing, Machine Translation, Parts of Speech Tagging, POS Tagger for Tamil.

I. INTRODUCTION
Part-of-speech (POS) tagging is an important process in the field of Natural Language Processing. In computational linguistics, POS tagging (also called grammatical tagging) is the process of assigning a grammatical tag to every word of a given sentence. POS tagging is one of the harder processes in Natural Language Processing, because some words take more than one grammatical tag (POS tag) depending on where they appear. For example, "book" is a noun in one place and a verb in another:
The book (noun) is on the table, and Ramu books (verb) the tickets for Robo 2.
Many NLP researchers have already built POS taggers using different approaches. English is commonly said to have nine parts of speech: noun, pronoun, verb, adverb, adjective, preposition, article, conjunction, and interjection. Across previous research on POS tagging, English tagsets distinguish between 42 and 150 parts of speech. POS tagging is an important step in natural language parsing, machine translation, speech recognition, information retrieval, and other areas of computational linguistics.

II. POS TAGGING IN TAMIL
Tamil is one of the Dravidian languages and one of the longest-surviving languages in the world. It has a rich classical literature documented over more than 2000 years. Tamil is also morphologically very rich, so assigning grammatical information to a word is complex: Tamil words are formed from a root word with or without one or more affixes. Building a POS tagger for Tamil is therefore very challenging. The main challenges in Tamil POS tagging are handling the complexity of word structure and resolving the ambiguity of words [1].

III. OBJECTIVES
The main objective is to build an improved POS tagger for Tamil language learners. We analyzed classical Tamil grammar, collected the actual parts of speech of the Tamil language, and used them for POS tagging. Other goals are:
- To provide an improved machine-aided POS tagger for Tamil.
- To build a tool that helps students learn Tamil grammar easily.
- To build a helpful tool for Tamil language learners.
- To advance computational research in Tamil linguistics.

IV. RELATED WORKS
Several POS taggers already exist for Dravidian languages. For Tamil, a rule-based POS tagger was developed by Arulmozhi et al., 2004 [2]. A POS tagger for Classical Tamil was developed and tested by R. Akilan et al., 2012 [3]. A POS tagger and chunker for Tamil was developed by Dhanalakshmi V et al., 2013 [4]. Arulmozhi et al., 2006 [5] developed a hybrid POS tagger for Tamil that combines the HMM technique with a rule-based system. These existing systems mainly rely on similar methods, mostly rule-based. Several generalized tagsets have also been developed, namely the AUKBC, Vasuranganathan, CIIL, and Amrita POS tagsets. All of these tagsets were designed with a focus on general English tagsets, and we identified the following problems with them:
1. All tags are defined as English-language tags only.
2. The tagsets are not deep, although grammatical information in Tamil is far more varied than English tagsets capture.
3. The tagsets are limited and do not describe Tamil words in detail.

V. BUREAU OF INDIAN STANDARDS (BIS) TAG SETS
In 2010, the Bureau of Indian Standards (BIS) authorized a common tagset for the parts of speech of Indian languages [6]. Most experts in Natural Language Processing were involved in creating this tagset, and research works related to POS tags must follow it. We also followed the BIS tagset and derived our main tags from it. The BIS tagset for Tamil is shown in Table 1.
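The main-tag/sub-tag structure of the BIS tagset listed in Table 1 maps naturally onto a dictionary. That structure also makes it easy to check whether a tag produced by a tagger conforms to the standard. The following is a minimal Python sketch and not part of the described system. Tags are written as hypothetical `Main.Sub` strings using the names from Table 1 rather than the official BIS tag codes, and `is_valid_tag` is an illustrative helper.

```python
from typing import Dict, Tuple

# Main tags and their sub tags, using the names listed in Table 1.
# The "Main.Sub" string notation checked below is hypothetical,
# not the official BIS tag code format.
BIS_TAMIL: Dict[str, Tuple[str, ...]] = {
    "Noun": ("Common", "Proper", "Nloc"),
    "Pronoun": ("Personal", "Reflective", "Relative", "Reciprocal", "Wh-word"),
    "Demonstrative": ("Deictic", "Relative", "Wh-word"),
    "Verb": ("Finite", "Non-Finite", "Verbal Participle",
             "Relative Participle Verb", "Conditional Verb",
             "Infinitive Verb", "Gerund", "Verbal Noun", "Auxiliary"),
    "Adjective": (),
    "Adverb": (),
    "Preposition": (),
    "Conjunction": ("Coordinator", "Subordinator"),
    "Particles": ("Default", "Classifiers", "Interjection",
                  "Intensifier", "Negation"),
    "Quantifiers": ("General", "Cardinals", "Ordinals"),
    "Residuals": ("Foreign", "Symbol", "Punctuation", "Unknown", "Echo words"),
}


def is_valid_tag(tag: str) -> bool:
    """Accept 'Main' or 'Main.Sub' when both parts appear in the table."""
    parts = tag.split(".")
    if len(parts) == 1:
        return parts[0] in BIS_TAMIL
    if len(parts) == 2:
        main, sub = parts
        # An unknown main tag yields an empty tuple, so the check fails.
        return sub in BIS_TAMIL.get(main, ())
    return False  # deeper nesting is not part of this two-level scheme
```

For example, `is_valid_tag("Conjunction.Subordinator")` returns `True`. `is_valid_tag("Article")` returns `False`, because the Tamil tagset has no article category, even though English counts articles among its nine parts of speech.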
                                                                                          
S. No | Main Tag | Sub Tags
1 | Noun | Common, Proper, Nloc
2 | Pronoun | Personal, Reflective, Relative, Reciprocal, Wh-word
3 | Demonstrative | Deictic, Relative, Wh-word
4 | Verb | Finite, Non-Finite, Verbal Participle, Relative Participle Verb, Conditional Verb, Infinitive Verb, Gerund, Verbal Noun, Auxiliary
5 | Adjective | -
6 | Adverb | -
7 | Preposition | -
8 | Conjunction | Coordinator, Subordinator
9 | Particles | Default, Classifiers, Interjection, Intensifier, Negation
10 | Quantifiers | General, Cardinals, Ordinals
11 | Residuals | Foreign, Symbol, Punctuation, Unknown, Echo words
Table 1. BIS Tagsets for Tamil

VI. PROPOSED TAG SETS
We need a tagset that gives full grammatical information for Tamil literature. It should work at a basic level and satisfy all the grammar rules of the Tamil language. This motivated us to develop our own HITS POS Tagset for the Tamil language. The proposed tagsets for Tamil are as follows:

[Table 2. Noun Tags (Literature view)]

S. No | Description | Details in English
1 | Noun of Things | PorulPeyar
2 | Noun of Place | Idappeyar
3 | Noun of Date/Year | Kaalappeyar
4 | Noun of Parts | ChinaiPeyar
5 | Noun of Qualities | Kunappeyar
6 | Action / Verbal Noun | ThozhilPeyar
Table 3. Noun Tags (Grammar view)

S. No | Description | Details in English
1 | First Person | Thanmai
2 | Second Person | Munnilai
3 | Third Person | Padarkai
4 | Superset Male Single | Aanpaal
5 | Superset Female Single | Penpaal
6 | Superset Plural | Palarpaal
7 | Subordinate Single | Ondranpaal
8 | Subordinate Plural | Palavinpaal
Table 4. Pronoun Tags

S. No | Description | Details in English
1 | Direct Verb | TherinilaiVinaimutru
2 | Indirect Verb | KurippuVinaimutru
3 | Verb Finite | Vinaimutru
4 | Verb Infinite | Vinaieccham
5 | Present Tense | Nigazhkaalam
6 | Past Tense | IrandhaKaalam
7 | Future Tense | EthirKaalam
Table 5. Verb Tags

S. No | Description | Details in English
1 | Participle Male | an, aan
2 | Participle Female | l, aal, i
3 | Participle Plural Human | ar, aar, pa, maar
4 | Participle Non-Human | Thu
Table 6. Participle Tags

S. No | Description | Details in English
1 | Attrib. Word Doubler | IrattaiKilavi
2 | Attrib. Word Chains | Adukkuthtodar
3 | Attrib. Word Coining | PuNarchi
4 | Coining, Addition | Thondral
5 | Coining, Alteration | Thirithal
6 | Coining, Delete | Keduthal
Table 7. Attribute Tags

[Table 8. Special Letters Tags]

S. No | Description | Details in English
1 | Punctuate. Comma | KaaLpulli
2 | Punctuate. Semicolon | Aaripulli
3 | Punctuate. Colon | Mukkalpulli
4 | Punctuate. Full Stop | Muttruppulli
5 | Punctuate. Question Mark | Vinaakkuri
6 | Punctuate. Exclamation Mark | Viyappukkuri
7 | Punctuate. Double Quotation | IrattaiMerkolKuri
8 | Punctuate. Single Quotation | OttraiMerkolKuri
9 | Punctuate. Bracket | Adaipukkuri
10 | Punctuate. History Mark | Varalaatrukkuri
11 | Punctuate. Hyphen | OttraiSamakkuri
12 | Punctuate. Plus Sign | Siluvaikkuri
13 | Punctuate. Star Mark | Natchatthirakkuri
14 | Punctuate. Braces | IrattaiInaippukkuri
Table 9. Punctuation Tags

These tagsets define Tamil grammar in full detail. Tags may occur singly or in combination. There are 52 root tags in the HITS Tagset. The HITS Tagset is mainly focused on Tamil literature and covers most of the grammatical categories of the Tamil language.

VII. ARCHITECTURE OF TAMIL POS TAGGER
The overall architecture of the proposed Tamil POS tagger is shown in Figure 1.

[Figure 1. Overall Architecture]

As Figure 1 shows, the input is a Tamil word or sentence. The system splits sentences into separate words for further processing. It then checks whether each word is a noun, a verb, or another grammatical category, and forwards the word to the corresponding process, where it is tagged by its own tagging module. Finally, the system returns the tagged output for the given words or sentences.

A. System Description:

[Figure 2. Approaches in POS Tagging]

There are three types of approach to POS tagger development: 1. rule-based, 2. stochastic, and 3. hierarchical. Of these three, we chose the rule-based approach to design the overall system. The core system follows these steps:
Step 1: The system receives a word or sentences as input from the end user.
Step 2: The whole input, whether a word or a sentence, is checked against the corpus annotation. If it is found, the system shows the tagged information for it. If it is not available in the corpus, the input goes to the chunking process.
Step 3: In the chunking phase, the sentences are split into words, and each word is checked against the corpus annotation.
Step 4: Each word is checked first against the noun corpus, then against the verb corpus, and then against the adjective, adverb, and all other corpora. As soon as a tag is found in any corpus, checking stops for that word. Finally, the system shows the words with their tags.

B. Tagger Development:
We developed an end-user environment for interacting with the POS tagger. It is built entirely on web technologies, so it can be used on any kind of device. We used HTML with PHP scripts as the development core and MS Access for data storage. The front-end user interface provides Tamil keys on the web page. The front-end view is shown in Figure 3.

[Figure 3. Front End]

This front end is convenient for users, who can easily type Tamil words with it.

[Figure 4. Front end 2]

C. Output of the POS Tagger:
With this user-friendly POS tagger, users can easily type Tamil words and see the tagged output for the given input. Figure 5 shows the output for the given words.

[Figure 5. Output of POS Tagger]

D. Corpus Development:
To build this POS tagger, we needed a large Tamil-English parallel corpus annotated with the appropriate POS tags. I developed a parallel corpus containing around 1.8 lakh (180,000) root words with POS tags. When these root words pass through the morphological analysis phase, they generate about 15 times as many morphemes with their POS tags. Our focus, however, is on detailed grammatical tags for the Tamil words in the corpus.
Morphological analysis of a word is the next step in POS tagging. Nouns and verbs are regenerated as morphs of the form Root + Prefix + Infix + Suffix + Stem + etc., which vary with tense and person. For example, the noun Kannan is generated as:
1. Kannan Ai
2. Kannan Aal
3. Kannan ukku
4. Kannan ukkaga
5. Kannan udaya
6. Kannan in
7. Kannan idam

VIII. TESTING OF POS TAGGER
The developed POS tagger was tested for accuracy on a set of words. Some examples are given below.
In this way, we tested around 10,000 root words for accuracy. The tagger achieved 97.04% accuracy when compared with manual POS tagging of the same words. Compared with other POS taggers for Tamil, ours tags more words with their correct POS tags and adds deeper grammatical definitions for Tamil words.

IX. RESULTS AND ANALYSIS
The POS tagger for Tamil was developed to help Tamil language learners understand grammatical POS tagging. The proposed method is implemented with a manually assigned set of tags. The system checks each word in the given sentence and finds its tag. The system was tested on a set of documents containing the numbers of words listed in Table 10, which also gives the evaluation results of our POS tagger. The analysis of the evaluation is shown in the chart in Figure 6.

Word Type | Noun / Pronoun | Verb / Adverb | Attributes / Preposition | Punctuation / Others
Tested | 4578 | 3967 | 1098 | 45
Correct | 4423 | 3812 | 997 | 42
Accuracy (%) | 96.61 | 96.09 | 90.80 | 93.33
Table 10. Test and Evaluation

[Figure 6. Analysis Chart]

X. CONCLUSION
This paper describes an improved and efficient POS tagger for the Tamil language. The corpus contains around 1.8 lakh words. The system was tested and compared with manual POS tagging.
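The four steps of the core system in Section VII-A can be sketched as a dictionary-lookup cascade with a simple suffix-stripping fallback for inflected nouns such as the Kannan forms in Section VII-D. This is a minimal Python sketch under stated assumptions, not the authors' PHP/MS Access implementation. The corpora are tiny hypothetical dictionaries of romanized Tamil, and labels such as `Noun.Proper` are illustrative rather than HITS tag codes. Whitespace splitting stands in for the chunking step, and the one-suffix stripping rule is an assumption. The `accuracy` helper mirrors the comparison against manual tagging described in Sections VIII and IX.

```python
from typing import Dict, List, Optional, Tuple

Tagged = List[Tuple[str, Optional[str]]]

# Hypothetical annotated corpus of whole inputs (Step 2), romanized Tamil.
ANNOTATED_INPUTS: Dict[str, Tagged] = {
    "kannan vandhaan": [("kannan", "Noun.Proper"), ("vandhaan", "Verb.Finite")],
}

# Hypothetical root-word corpora, searched in this fixed order (Step 4).
# "padi" is deliberately listed as both a noun (step) and a verb (read).
CATEGORY_CORPORA: List[Tuple[str, Dict[str, str]]] = [
    ("noun", {"kannan": "Noun.Proper", "veedu": "Noun.Common",
              "padi": "Noun.Common"}),
    ("verb", {"vandhaan": "Verb.Finite", "padi": "Verb"}),
    ("adjective", {"nalla": "Adjective"}),
    ("adverb", {"vegamaaga": "Adverb"}),
]

# Romanized noun case suffixes from the "Kannan" example (an assumption),
# tried longest first so a longer match is preferred over a shorter one.
CASE_SUFFIXES = sorted(["ai", "aal", "ukku", "ukkaga", "udaya", "in", "idam"],
                       key=len, reverse=True)


def lookup_root(root: str) -> Optional[str]:
    """Step 4: noun corpus first, then verb, then the others; first hit wins."""
    for _category, corpus in CATEGORY_CORPORA:
        if root in corpus:
            return corpus[root]
    return None


def tag_word(word: str) -> Optional[str]:
    """Look the word up directly, else strip one case suffix and retry."""
    tag = lookup_root(word)
    if tag is not None:
        return tag
    for suffix in CASE_SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            root_tag = lookup_root(word[: -len(suffix)])
            # Case suffixes attach to nouns, so only accept a noun root.
            if root_tag is not None and root_tag.startswith("Noun"):
                return f"{root_tag}+{suffix}"
    return None  # unknown word


def tag_input(text: str) -> Tagged:
    """Steps 1-3: look up the whole input, else chunk it into words."""
    normalized = " ".join(text.lower().split())  # Step 1: input from the user
    if normalized in ANNOTATED_INPUTS:  # Step 2: whole-input match
        return list(ANNOTATED_INPUTS[normalized])
    # Step 3: chunking (plain whitespace splitting is an assumption)
    return [(word, tag_word(word)) for word in normalized.split()]


def accuracy(predicted: List[Optional[str]], manual: List[str]) -> float:
    """Percentage of predicted tags that agree with a manual tagging."""
    if not manual or len(predicted) != len(manual):
        raise ValueError("need two non-empty tag lists of equal length")
    agree = sum(p == m for p, m in zip(predicted, manual))
    return 100.0 * agree / len(manual)
```

For example, `tag_input("nalla veedu")` returns `[("nalla", "Adjective"), ("veedu", "Noun.Common")]`, and `tag_word("kannanidam")` returns `"Noun.Proper+idam"`. Because the noun corpus is always consulted first, the hypothetical ambiguous entry `padi` always receives its noun tag. This illustrates why word ambiguity is named as a main challenge in Section II. Applying the same correct-over-tested ratio to Table 10 gives, for instance, 4423/4578 ≈ 96.61% for nouns and pronouns. Pooling all four columns gives 9274 correct out of 9688 tested, about 95.73%.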