Global Journal of Computer Science and Technology: F
Graphics & Vision
Volume 17 Issue 2 Version 1.0 Year 2017
Type: Double Blind Peer Reviewed International Research Journal
Publisher: Global Journals Inc. (USA)
Online ISSN: 0975-4172 & Print ISSN: 0975-4350
Towards Arabic Alphabet and Numbers Sign Language
Recognition
By Ahmad Hasasneh & Sameh Taqatqa
Palestine Ahliya University
Abstract- This paper proposes a new Arabic sign language recognition system based on Restricted
Boltzmann Machines and the direct use of tiny images. Restricted Boltzmann Machines are able to
code images as a superposition of a limited number of features taken from a larger alphabet.
Repeating this process in a deep architecture (Deep Belief Networks) leads to an efficient sparse
representation of the initial data in the feature space. A complex classification problem in the input
space is thus transformed into an easier one in the feature space. After appropriate coding, a
softmax regression in the feature space should be sufficient to recognize a hand sign from the
input image. To our knowledge, this is the first attempt to show that feature extraction from tiny
images using a deep architecture is a simpler alternative approach to Arabic sign language
recognition that deserves to be considered and investigated.
Keywords: Arabic sign language recognition, restricted Boltzmann machines, deep belief
networks, softmax regression, classification, sparse representation.
GJCST-F Classification: I.5, I.7.5
© 2017. Ahmad Hasasneh & Sameh Taqatqa. This is a research/review paper, distributed under the terms of the Creative Commons
Attribution-Noncommercial 3.0 Unported License (http://creativecommons.org/licenses/by-nc/3.0/), permitting all non-commercial use,
distribution, and reproduction in any medium, provided the original work is properly cited.
Authors: Information Technology Department, Palestine Ahliya University, Bethlehem, West Bank, Palestine.
e-mails: ahasasneh@paluniv.edu.ps, sameh@paluniv.edu.ps

I. Introduction

Sign language continues to be the best method of communication for the deaf and hearing impaired. Hand gestures, rather than speech, enable deaf people to communicate during their daily lives. In our society, Arabic Sign Language (ArSL) is known only to deaf people and specialists; thus the community of deaf people is narrow. To help people with normal hearing communicate effectively with the deaf and the hearing-impaired, numerous systems have been developed for translating diverse sign languages from around the world. Several review papers discussing such systems have been published; see [1]–[7].

Generally, the process of ArSL recognition (ArSLR) can be achieved through two main phases: detection and classification. In the detection phase, each given image is pre-processed and enhanced, and then the regions of interest (ROI) are segmented using a segmentation algorithm. The output of the segmentation process can thus be used to perform the sign recognition process. Indeed, the accuracy and speed of detection play an important role in obtaining an accurate and fast recognition process. In the recognition phase, a set of features (patterns) for each segmented hand sign is first extracted and then used to recognize the sign. These features can be used as a reference to understand the differences among the classes.

The recognition and documentation of ArSL have received attention only recently, and few attempts have investigated and addressed this problem; see, for example, [8]–[11]. ArSL recognition is therefore a major requirement for the future of ArSL. It facilitates communication between the deaf and people with normal hearing by translating the alphabet and number signs of Arabic sign language into text or speech. To achieve that goal, this paper proposes a new Arabic sign recognition system based on new machine learning methods and the direct use of tiny images.

The rest of the paper is organized as follows. Section 2 presents the current approaches to Arabic alphabet sign language recognition (ArASLR). Section 3 describes the proposed model for ArASLR. Conclusions and future work are presented in Section 4.

II. Current Approaches

Studies in Arabic sign language recognition, although not as advanced as those devoted to other scripts (e.g. Latin), have recently attracted interest [8]–[11]. Current research in ArSLR has only been satisfactory for alphabet recognition, with accuracy exceeding 98%. Isolated Arabic word recognition has only been successful with medium-size vocabularies (fewer than 300 signs). On the other hand, continuous ArSLR is still in its early stages, with very restrictive conditions.

Current approaches to sign language recognition usually fall into two major categories. The first is sensor-based approaches, which employ sensors attached to a glove. Look-up table software is usually provided with the glove to be used for hand gesture recognition. Recent sensor-based approaches can be found, for instance, in [11]–[14]. The second category, vision-based analysis, uses video cameras to capture the movement of the hand. This is sometimes aided by making the signer wear a glove with painted areas indicating the positions of the fingers and the wrist; those measurements are then used in the recognition process. Image-based techniques face a number of challenges, including lighting conditions, image background, face and hand segmentation, and different types of noise.
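
As a rough illustration of the face and hand segmentation challenge noted above, the following sketch marks skin-colored pixels by thresholding the chrominance channels of the YCbCr color space, the space used for skin-blob extraction by one of the systems reviewed in this section. It is a minimal sketch and not the method of any cited work: the function names are hypothetical, the conversion uses the standard full-range BT.601 (JPEG) formulas, and the Cb/Cr thresholds are illustrative values often quoted for skin detection that would need tuning for real lighting conditions and backgrounds.

```python
def rgb_to_cbcr(r, g, b):
    """Return the (Cb, Cr) chrominance of an 8-bit RGB pixel (full-range BT.601)."""
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def is_skin(pixel, cb_range=(77.0, 127.0), cr_range=(133.0, 173.0)):
    """Classify one (R, G, B) pixel; the default thresholds are assumptions."""
    cb, cr = rgb_to_cbcr(*pixel)
    return cb_range[0] <= cb <= cb_range[1] and cr_range[0] <= cr <= cr_range[1]


def skin_mask(image):
    """Map an image (rows of (R, G, B) tuples) to a binary mask of 0s and 1s."""
    return [[1 if is_skin(px) else 0 for px in row] for row in image]


def bounding_box(mask):
    """Return (top, left, bottom, right) of the masked region, or None if empty."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for row in mask for x, v in enumerate(row) if v]
    if not rows:
        return None
    return min(rows), min(cols), max(rows), max(cols)


# Toy 3x3 image: two skin-toned pixels on a saturated blue background.
SKIN, BLUE = (224, 172, 105), (0, 0, 255)
image = [
    [BLUE, BLUE, BLUE],
    [BLUE, SKIN, SKIN],
    [BLUE, BLUE, BLUE],
]
mask = skin_mask(image)
roi = bounding_box(mask)  # region of interest passed on to feature extraction
print(mask)
print(roi)
```

Because luminance is ignored, such a rule is somewhat tolerant of brightness changes, but it cannot separate the hand from the face or from skin-colored backgrounds, which is one reason segmentation remains a difficulty for image-based systems.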
©2017 Global Journals Inc. (US)
Among image-based approaches, the authors of [15] introduced a method for automatic recognition of the Arabic sign language alphabet. For feature extraction, Hu moments were used, followed by support vector machines (SVMs) to perform the classification. A correct recognition rate of 87% was achieved. The authors of [16] developed a neuro-fuzzy system. The proposed system includes five main steps: image acquisition, filtering, segmentation, and hand outline detection, followed by feature extraction. Bare hands were considered in the experiments, achieving a recognition accuracy of 93.6%. In [17], the authors proposed an adaptive neuro-fuzzy inference system for alphabet sign recognition. A colored glove was used to simplify the segmentation process, and geometric features were extracted from the hand region. The recognition rate was improved to 95.5%. In [18], the authors developed an image-based ArSL system that does not use visual markings. The images of bare hands are processed to extract a set of features that are translation, rotation, and scaling invariant. A recognition accuracy of 97.5% was achieved on a database of 30 Arabic alphabet signs. In [19], the authors used recurrent neural networks for alphabet recognition. A database of 900 samples, covering 30 gestures performed by two signers, was used in their experiments. The Elman network achieved an accuracy rate of 89.7%, while a fully recurrent network improved the accuracy to 95.1%. The authors extended their work by considering the effect of different artificial neural network structures on the recognition accuracy. In particular, they extracted 30 features from colored gloves and achieved an overall recognition rate of 95% [20].

A recent review of the different systems and methods for the automatic recognition of Arabic sign language can be found in [7]. It highlights the main challenges characterizing Arabic sign language as well as potential future research directions. Recent works on image-based recognition of the Arabic sign language alphabet can be found in [9], [10], [21]–[25]. In particular, Naoum et al. [9] propose an ArSLR system using KNN. To achieve good recognition performance, they proposed combining this algorithm with a glove-based analysis technique. The system starts by computing histograms of the images. Profiles extracted from these histograms are then used as input to a KNN classifier. Mohandes [10] proposes a more sophisticated recognition algorithm to achieve high ArSLR performance. It is the first attempt to recognize two-handed signs from the Unified Arabic Sign Language Dictionary, using the CyberGlove and SVMs to perform the recognition. PCA is used for feature extraction. The authors in [21] proposed an Arabic sign language alphabet recognition system that converts signs into voice. The technique is much closer to a real-life setup; however, recognition is not performed in real time. The system focuses on static and simple moving gestures. The inputs are color images of the gestures. To extract the skin blobs, the YCbCr color space is used. The Prewitt edge detector is used to extract the hand shape. To convert the image area into feature vectors, principal component analysis (PCA) is used, with a K-Nearest Neighbor (KNN) algorithm in the classification stage. Furthermore, the authors in [22] and [23] proposed a pulse-coupled neural network (PCNN) ArSLR system able to compensate for lighting nonhomogeneity and background brightness. The proposed system showed invariance under geometrical transforms, bright background, and lighting conditions, achieving a recognition accuracy of 90%. Moreover, the authors in [24] introduced an Arabic Alphabet and Numbers Sign Language Recognition (ArANSLR) system. The proposed algorithm consists of skin detection, background exclusion, face and hands extraction, feature extraction, and classification using a Hidden Markov Model (HMM). The algorithm divides the rectangle surrounding the hand shape into zones; the best number of zones was found to be 16. The HMM observation is created by sorting the zone numbers in ascending order of the number of white pixels in each zone. Experimental results showed that the proposed algorithm achieves a 100% recognition rate.

On the other hand, new systems for facilitating human-machine interaction have been introduced recently. In particular, the Microsoft Kinect and the Leap Motion Controller (LMC) have attracted special attention. The Kinect system uses an infrared emitter and depth sensors, in addition to a high-resolution video camera. The LMC uses two infrared cameras and three LEDs to capture information within its interaction range. However, the LMC does not provide images of detected objects. The LMC has recently been used for Arabic alphabet sign recognition with promising results [25].

Having presented the existing image-based approaches to ArASLR, we note that these approaches generally include two main phases: coding and classification. We have also seen that most of the coding methods are based on hand-crafted feature extractors, which are empirical detectors. By contrast, a set of recent methods based on deep architectures of neural networks makes it possible to build the feature extractor from theoretical considerations.

ArSLR therefore requires projecting images onto an appropriate feature space that allows accurate and rapid classification. In contrast to the empirical methods mentioned above, new machine learning methods have recently emerged that are strongly related to the way natural systems code images [26]. These methods are based on the consideration that natural image statistics are not Gaussian, as they would be if images had a completely random structure [27]. The auto-similar structure of natural images allowed the evolution
to build optimal codes. These codes are made of statistically independent features, and many different methods have been proposed to construct them from image datasets. Imposing locality and sparsity constraints on these features is very important. This is probably because even simple algorithms based on such constraints can produce linear signatures similar to the notion of receptive field in natural systems. Recent years have seen growing interest in computer vision algorithms that rely on local sparse image representations, especially for the problems of image classification and object recognition [28]–[32]. Moreover, from a generative point of view, the effectiveness of local sparse coding, for instance for image reconstruction [33], is justified by the fact that a natural image can be reconstructed from the smallest possible number of features. It has been shown that Independent Component Analysis (ICA) produces localized features. Moreover, it is efficient for distributions with high kurtosis, which are representative of natural image statistics dominated by rare events such as contours; however, the method is linear and not recursive. These two limitations are lifted by Deep Belief Networks (DBNs) [34], which introduce nonlinearities in the coding scheme and exhibit multiple layers. Each layer is made of a Restricted Boltzmann Machine (RBM), a simplified version of a Boltzmann machine proposed by Smolensky [35] and Hinton [36]. Each RBM is able to build a generative statistical model of its inputs using a relatively fast learning algorithm, Contrastive Divergence (CD), first introduced by Hinton [36]. Another important characteristic of the codes used in natural systems, the sparsity of the representation [26], is also achieved in DBNs. Moreover, it has been shown that these approaches remain robust when extracting local, sparse, efficient features from tiny images [37]. This model has been successfully used in [32] to achieve semantic place recognition. The hope is to demonstrate that DBNs coupled with tiny images can also be successfully used in the context of ArASLR.

III. Proposed Model

The methodology of this research includes four main stages (see Figure 1), which can be summarized as follows: 1) data collection and image acquisition, 2) image pre-processing, 3) feature extraction, and finally 4) gesture recognition.

a) Description of the Database

The Arabic sign language alphabet displayed in Figure 2, left [38], will be used to investigate the performance of the proposed model. In this database, the signer performs each letter separately. Mostly, letters are represented by a static posture, and the vocabulary size is limited. In this section, several methods for image-based Arabic sign language alphabet recognition are discussed. Even though the Arabic alphabet consists of only 28 letters, Arabic sign language uses 39 signs. The 11 additional signs represent basic signs combining two letters. For example, the two letters “ال” are quite common in Arabic (similar to the article “the” in English). Therefore, most literature on ArASLR uses these basic 39 signs.

b) Image Pre-processing

The typical input dimension for a DBN is approximately 1000 units (e.g. 30x30 pixels). Dealing with smaller patches could make the model unable to extract interesting features. Using larger patches can be extremely time-consuming during feature learning. Additionally, the proliferation of connection weights has a negative effect on the convergence of the CD algorithm. The question is therefore how to scale down realistic images (e.g. 300x300 pixels) to make them appropriate for DBNs.
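
One simple way to approach the scaling question raised in the Image Pre-processing subsection is block averaging: each pixel of the tiny image is the mean of a block of pixels in the original image. The sketch below is a minimal illustration under stated assumptions, not the pre-processing procedure of the paper, which this section does not specify. The function names, the grayscale list-of-rows image format, and the 30x30 target size (taken from the example dimensions quoted in that subsection) are all illustrative choices.

```python
def to_tiny_image(image, out_h, out_w):
    """Downscale a 2-D grayscale image (list of rows) by averaging pixel blocks.

    Assumes the input height and width are exact multiples of the output size.
    """
    in_h, in_w = len(image), len(image[0])
    if in_h % out_h or in_w % out_w:
        raise ValueError("input size must be a multiple of the output size")
    block_h, block_w = in_h // out_h, in_w // out_w
    tiny = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            block_sum = sum(
                image[y][x]
                for y in range(r * block_h, (r + 1) * block_h)
                for x in range(c * block_w, (c + 1) * block_w)
            )
            row.append(block_sum / (block_h * block_w))
        tiny.append(row)
    return tiny


def flatten_and_scale(tiny, max_value=255.0):
    """Flatten a tiny image into one input vector with values in [0, 1]."""
    return [pixel / max_value for row in tiny for pixel in row]


# Synthetic 300x300 grayscale image standing in for a realistic hand image.
image = [[(x + y) % 256 for x in range(300)] for y in range(300)]
tiny = to_tiny_image(image, 30, 30)
vector = flatten_and_scale(tiny)
print(len(vector))  # 900 visible units, close to the ~1000 quoted above
```

Averaging rather than subsampling acts as a crude low-pass filter, which limits aliasing when the reduction factor is large (here 10 in each dimension).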
Figure 1: Proposed model
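
The feature-extraction stage of the proposed model relies on RBMs trained with Contrastive Divergence (CD). The sketch below shows what one CD step might look like for a single RBM, written in plain Python for readability. It is a minimal sketch under stated assumptions, not the authors' implementation: the class and method names are hypothetical, units are assumed binary, the learning rate and weight initialization are arbitrary, and one-step CD (CD-1) with probabilities in the reconstruction phase is one common variant among several.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class TinyRBM:
    """Binary RBM trained with one-step Contrastive Divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.n_visible, self.n_hidden = n_visible, n_hidden
        # Small random weights, zero biases (an assumed initialization).
        self.W = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.vis_bias = [0.0] * n_visible
        self.hid_bias = [0.0] * n_hidden

    def hidden_probs(self, v):
        """P(h_j = 1 | v) for every hidden unit j."""
        return [sigmoid(self.hid_bias[j]
                        + sum(v[i] * self.W[i][j] for i in range(self.n_visible)))
                for j in range(self.n_hidden)]

    def visible_probs(self, h):
        """P(v_i = 1 | h) for every visible unit i."""
        return [sigmoid(self.vis_bias[i]
                        + sum(h[j] * self.W[i][j] for j in range(self.n_hidden)))
                for i in range(self.n_visible)]

    def cd1_update(self, v0, lr=0.1):
        """Apply one CD-1 update for input v0; return the reconstruction error."""
        ph0 = self.hidden_probs(v0)                        # positive phase
        h0 = [1.0 if self.rng.random() < p else 0.0 for p in ph0]
        v1 = self.visible_probs(h0)                        # reconstruction
        ph1 = self.hidden_probs(v1)                        # negative phase
        for i in range(self.n_visible):
            for j in range(self.n_hidden):
                self.W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
            self.vis_bias[i] += lr * (v0[i] - v1[i])
        for j in range(self.n_hidden):
            self.hid_bias[j] += lr * (ph0[j] - ph1[j])
        return sum((a - b) ** 2 for a, b in zip(v0, v1))


# Toy usage: two complementary 6-pixel binary patterns.
patterns = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]]
rbm = TinyRBM(n_visible=6, n_hidden=2, seed=0)
for epoch in range(200):
    for v in patterns:
        error = rbm.cd1_update(v)
features = rbm.hidden_probs(patterns[0])  # code that a classifier would consume
```

A DBN would stack several such layers, training each RBM on the hidden activations of the layer below, and the top-level activations would then feed the softmax regression mentioned in the abstract. For images of realistic size, a vectorized numerical library would replace these Python loops.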