                International Journal of Scientific & Engineering Research Volume 8, Issue 5, May-2017                                                                         19 
                                                                                                                                 
                ISSN 2229-5518
NATURAL LANGUAGE PROCESSING USING PYTHON

Vismaya V¹, Darvin Reynald J²
¹Student (B.Tech), Department of IT, Sri Krishna College of Technology, Coimbatore
²Student (B.Sc), Department of Computer Science Application & Software Systems, Sri Krishna Arts & Science College, Coimbatore
mayadevan1210@gmail.com, wowdarvin@gmail.com
                 
                 
Abstract - This paper focuses on a simplified Natural Language Processing (NLP) system using Python and Raspberry Pi. Natural language processing systems have been used across a wide range of industries, from medical and defense to consumer and corporate. Most NLP systems in current use require subsidiary processing hardware and a default OS. The system proposed in this paper is a standalone, open source NLP system that can be accessed in remote locations using a simple hardware component. The processes involved, namely voice extraction, speech-to-text conversion, text processing, database management and speech synthesis, are explained in detail along with the Python modules used to build the system. By minimizing the hardware components and using open source software, a universal, adaptable NLP system is proposed.

Keywords: NLP (Natural Language Processing), Raspberry Pi, speech-to-text conversion, speech synthesis.

I INTRODUCTION

Natural Language Processing (NLP) is an area of research and application that explores how computers can be used to understand and manipulate natural language speech or text to do useful things. The foundations of NLP lie in a number of disciplines, namely computer and information sciences, linguistics, mathematics, electrical and electronic engineering, artificial intelligence and robotics, and psychology. NLP researchers aim to gather knowledge on how human beings use and manipulate natural languages to perform desired tasks, so that appropriate tools and techniques can be developed. Applications of NLP include a number of fields of study, such as multilingual and cross-language information retrieval (CLIR), machine translation, natural language text processing and summarization, user interfaces, speech recognition, artificial intelligence and expert systems.

II LITERATURE REVIEW

NLP researchers aim to gather knowledge on how human beings tend to understand and use language, so that appropriate tools and techniques can be developed to make computer systems understand and manipulate natural languages to perform the desired tasks [1][4]. Phonological rules are captured through machine learning on training sets. Pronunciation dictionaries are also used for both text-to-speech and automatic speech recognition. Sounds as well as words can be predicted using conditional probability theory [7][6]. The input to a speech recognizer is a series of acoustic waves. The waves are sampled, quantized and converted to a spectral representation. Conditional probability is then used to evaluate each vector of the spectral representation against a stored set of phonetic representations. Decoding is the process of finding the optimal word sequence for the given
                                                                   IJSER © 2017 
                                                                 http://www.ijser.org 
                                                                    
                 
sequence of input observations. Each successful match is later used in embedded training, a method for training speech recognizers [2][3].

Python and the NLTK module are required for the following tasks. The NLTK module is used as follows for part-of-speech tagging and categorizing words (the tokenizer and tagger data must first be downloaded once with nltk.download()):

    >>> import nltk
    >>> text = nltk.word_tokenize("And now for something completely different")
    >>> nltk.pos_tag(text)

Table 1. Part of Speech tagging and categorizing words
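The literature review describes evaluating observations with conditional probabilities and decoding as finding the optimal word sequence for a sequence of observations. In hidden Markov model (HMM) recognizers this decoding step is commonly done with the Viterbi algorithm. The sketch below is a minimal Viterbi decoder over a hypothetical two-state toy model; the names STATES, START, TRANS, EMIT and the energy labels are invented for illustration, and real recognizers such as PocketSphinx decode far larger models over acoustic feature vectors.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy model: hidden states are "silence" vs "speech" frames and
# observations are quantized frame energies. All probabilities are made up.
STATES = ["SIL", "SPEECH"]
START = {"SIL": 0.6, "SPEECH": 0.4}
TRANS = {  # P(next state | current state)
    "SIL": {"SIL": 0.7, "SPEECH": 0.3},
    "SPEECH": {"SIL": 0.4, "SPEECH": 0.6},
}
EMIT = {  # P(observation | state)
    "SIL": {"low": 0.5, "mid": 0.4, "high": 0.1},
    "SPEECH": {"low": 0.1, "mid": 0.3, "high": 0.6},
}
EXAMPLE_OBS = ["low", "mid", "high"]


def _log(p: float) -> float:
    """Natural log that maps probability 0 to -inf (an impossible path)."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(
    observations: Sequence[str],
    states: Sequence[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> Tuple[List[str], float]:
    """Return the most probable hidden-state sequence and its log-probability."""
    # best[t][s]: log-probability of the best path that ends in state s at time t
    best: List[Dict[str, float]] = [
        {s: _log(start_p[s]) + _log(emit_p[s].get(observations[0], 0.0)) for s in states}
    ]
    back: List[Dict[str, str]] = [{}]  # back[t][s]: best predecessor of s at time t

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Extend every path by one step using P(s | prev) and keep the best one.
            prev = max(states, key=lambda p: best[t - 1][p] + _log(trans_p[p].get(s, 0.0)))
            best[t][s] = (
                best[t - 1][prev]
                + _log(trans_p[prev].get(s, 0.0))
                + _log(emit_p[s].get(observations[t], 0.0))
            )
            back[t][s] = prev

    # Trace the back-pointers from the most probable final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]
```

Calling viterbi(EXAMPLE_OBS, STATES, START, TRANS, EMIT) labels the first two frames as silence and the last as speech, with a path probability of 0.01512. Log-probabilities are summed instead of multiplying raw probabilities so that long utterances do not underflow.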
                 
The main intention of designing the Raspberry Pi board was to encourage learning, experimentation and innovation among students. The Raspberry Pi board is portable and low cost, and most of the chips used in Raspberry Pi computers are of the kind used in mobile phones [8].

III CATEGORIZING THE COMPONENTS

In this section, the components required for the process are categorized as hardware and software, according to how each part is used.

A) HARDWARE COMPONENTS
The components needed for the NLP implementation can be summarized as follows:

1) Raspberry Pi
Unlike the CPU, the graphics processing unit (GPU) on the Pi is comparable to that of a high-specification mobile device. It can run 3D games and play high-definition video. With the right software, a TV and a broadband link, you can have BBC iPlayer, YouTube and other video services at your fingertips. Python is intended as an integral part of the 'standard' teaching toolkit. The external view of the Raspberry Pi is shown in Fig. 1.

Fig. 1. Raspberry Pi 2

The original Raspberry Pi Model B comes with 512 MB of RAM; the Raspberry Pi 2 has 1 GB. Programs are stored on the SD card; when the Pi is powered on, they are copied into the much faster RAM, where they remain until the computer is turned off and the RAM is cleared. One of the most convenient aspects of the Raspberry Pi is that you can convert it from a media player to a desktop computer just by swapping the SD card, which is easier than removing a laptop's hard disk. A single chip contains the Pi's memory, central processing unit and graphics chip. The processor used in the Pi is slower than those in the iPad and similar devices, but it is fast enough to do the job. The architecture of the Raspberry Pi is shown in Fig. 2.

Fig. 2. Architecture of Raspberry Pi
                                                                        
2) MICROPHONE
In general, a microphone is any device capable of recording a voice. It is used as the input device for voice. On a desktop computer the microphone driver is usually installed from a CD, but on the Raspberry Pi it is downloaded as a driver when required. The microphone is then given a name (for instance, source in the code below) by which it is called during the process.

SPEECH RECOGNITION FROM MICROPHONE:

    import speech_recognition as sr

    # Obtain audio from the microphone
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Say something!")
        audio = r.listen(source)

3) SPEAKER
The speaker is used as the output device for playing the spoken response produced by text-to-speech conversion.

B) SOFTWARE COMPONENTS

1) LINUX
Linux is an open source operating system for computers, mainframes, servers, mobile devices and embedded devices. The Linux OS includes the Linux kernel as well as supporting tools and libraries. Popular Linux distributions include Debian, Ubuntu, Fedora and Red Hat. Here we are using Debian, which is described below.

2) PYTHON
One of the advantages of Python is that it allows us to type directly into the interactive interpreter. We can also access the Python interpreter through a graphical interface called IDLE (Integrated Development and Learning Environment). Python syntax closely resembles the English language. In this paper, the functions are called using Python.

3) DEBIAN
An operating system is the set of basic programs and utilities that make a computer run. At the core of an operating system is the kernel. The kernel is the most fundamental program on the computer and lets you start other programs. Debian systems use the Linux kernel, which is a piece of software; FreeBSD, by contrast, is an operating system including a kernel and other software. Work is also in progress to provide Debian for other kernels, such as the Hurd. The Hurd is a collection of servers that run on top of a microkernel to implement different features. The system can be pictured as a tower: at the base is the kernel, on top of it are all the basic tools, next is the software that runs on the computer, and at the top of the tower is Debian.

4) POCKETSPHINX
Pocketsphinx is a lightweight speech recognition engine. It is a library that depends on another library called SphinxBase, so to install Pocketsphinx you need to install both Pocketsphinx and SphinxBase. Pocketsphinx can be used on Linux, Windows, macOS, iPhone and Android. In this paper, Pocketsphinx is used as the speech-to-text conversion engine. It is converted into an image file and extracted for execution.

5) IBM
The IBM Speech to Text service provides an API that enables you to add IBM's speech recognition capabilities to your applications. The service transcribes speech from various languages and audio formats to text with low latency. This service can also be used instead of Pocketsphinx, as it provides both broadband and narrowband models.
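The audio captured by the microphone code above is later saved, phrase by phrase, as a temporary file that the Sphinx decoder reads. A minimal standard-library sketch of that step follows, assuming the phrase is already available as raw 16-bit PCM bytes (with the SpeechRecognition library, audio.get_raw_data(convert_rate=16000, convert_width=2) is one way to obtain such bytes). The 16 kHz mono format is an assumption chosen because PocketSphinx's default English acoustic model expects it; save_phrase and describe are hypothetical helper names.

```python
import tempfile
import wave
from typing import Tuple

# Assumed capture format: 16 kHz, 16-bit, mono PCM, which PocketSphinx's
# default English acoustic model expects.
SAMPLE_RATE = 16000
SAMPLE_WIDTH = 2  # bytes per sample (16-bit)
CHANNELS = 1


def save_phrase(pcm: bytes, sample_rate: int = SAMPLE_RATE) -> str:
    """Write one detected phrase (raw PCM bytes) to a temporary WAV file; return its path."""
    # delete=False keeps the file after closing so the decoder can open it later;
    # the caller is responsible for removing it afterwards.
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        path = tmp.name
    with wave.open(path, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return path


def describe(path: str) -> Tuple[int, int, int, int]:
    """Return (channels, sample width in bytes, sample rate, frame count) of a WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getnchannels(), wav.getsampwidth(), wav.getframerate(), wav.getnframes()
```

For example, one second of silence, bytes(SAMPLE_RATE * SAMPLE_WIDTH), is written to a file that describe reports as (1, 2, 16000, 16000).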
RECOGNIZING SPEECH USING SPHINX:

    # Recognize speech using Sphinx (offline, through PocketSphinx);
    # r, sr and audio come from the microphone code above
    try:
        print("Sphinx thinks you said " + r.recognize_sphinx(audio))
    except sr.UnknownValueError:
        print("Sphinx could not understand audio")
    except sr.RequestError as e:
        print("Sphinx error; {0}".format(e))

PROCESSING TECHNIQUE:
The whole conversion process is classified into two main sections:
1)      Speech to text recognition
2)      Text to speech conversion

Speech to text recognition
•       Before the process begins, we must install the speech recognition module, which here is Pocketsphinx. Installing Pocketsphinx is easy; it requires three components altogether: sphinxbase, pocketsphinx and pocketsphinx-python.
•       SphinxBase is the base package that all of the other Sphinx programs use.
•       PocketSphinx is the lightweight recognizer used to decode phrases quickly.
•       pocketsphinx-python is the wrapper that allows us to program the recognizer in Python.
Speech recognition can be achieved in many ways on Linux (and so on the Raspberry Pi). One such setup covers:
•       Speech Recognition Toolkit
•       Installing build tools and required libraries
•       Building Sphinxbase
•       Building PocketSphinx
•       Creating a Language Model
The user speaks the input into the microphone. The voice is detected, and the code sets up the microphone and saves each detected phrase as a temporary file. This file is decoded by the Sphinx decoder and translated into a list of strings in the predefined state. SphinxBase is used as the basic layer for the conversion of speech to text.
A language model containing all the predefined sentences is created at the beginning. The recognized text is matched against this model and verified. If the text matches, a positive response is picked from the database. If the input text does not match the database, the response is searched for via online speech recognition modules, and the matched result is sent for further processing. Below is the systematic representation of the input-output module:

[Fig. 3 block diagram. Blocks: MIC, Speech to text module, Raspberry Pi, Python programming, Text to speech module, Speaker]
Fig. 3. Python Programming Block Diagram

Text to speech conversion
The converted and processed text is now converted back to speech. To do this, a tool called Festival is used.
Festival is a free text-to-speech tool. When we pass a text file to Festival, it converts the contents of the text file into voice. Installing Festival is also very simple:
•      sudo apt-get install festival
This is used to install Festival.
•      Try out Festival with:
    echo "Just what do you think you're doing, Dave?" | festival --tts
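The matching step of the processing technique (compare the recognized text with the predefined sentences, return a stored response on a match, otherwise fall back to an online service) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the RESPONSES dictionary is a hypothetical stand-in for the database, online_lookup is a hypothetical hook for the online fallback, and the fuzzy match using Python's difflib is an added technique, not mentioned in the paper, for tolerating small recognition errors.

```python
import difflib
from typing import Callable, Dict, Optional

# Hypothetical stand-in for the database of predefined sentences and responses.
RESPONSES: Dict[str, str] = {
    "hello": "Hello! How can I help you?",
    "what is your name": "I am a Raspberry Pi assistant.",
    "what time is it": "Let me check the clock.",
}


def normalize(text: str) -> str:
    """Lower-case, drop punctuation and collapse whitespace so inputs match stored keys."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def respond(
    text: str,
    database: Dict[str, str],
    online_lookup: Optional[Callable[[str], str]] = None,
    cutoff: float = 0.8,
) -> Optional[str]:
    """Return a stored response for recognized text, or fall back to an online lookup."""
    query = normalize(text)
    # 1. Exact match against the predefined sentences.
    if query in database:
        return database[query]
    # 2. Close match (added assumption) to tolerate small recognition errors.
    close = difflib.get_close_matches(query, list(database), n=1, cutoff=cutoff)
    if close:
        return database[close[0]]
    # 3. No match: hand the query to the online fallback, if one is configured.
    if online_lookup is not None:
        return online_lookup(query)
    return None
```

With this sketch, respond("Hello!", RESPONSES) returns the stored greeting, a slightly misrecognized "what is you name" still matches "what is your name", and an unknown sentence returns None unless an online_lookup function is supplied.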