A Metagrammar for Vietnamese LTAG
Lê Hồng Phương, LORIA/INRIA Lorraine, Nancy, France, lehong@loria.fr
Nguyễn Thị Minh Huyền, Hanoi University of Science, Hanoi, Vietnam, huyenntm@vnu.edu.vn
Azim Roussanaly, LORIA/INRIA Lorraine, Nancy, France, azim@loria.fr
Proceedings of The Ninth International Workshop on Tree Adjoining Grammars and Related Formalisms
Tübingen, Germany. June 6-8, 2008.

Abstract
We present in this paper an initial investigation into the use of a metagrammar for explicitly sharing abstract grammatical specifications for the Vietnamese language. We first introduce the essential syntactic mechanisms of the Vietnamese language. We then show that the basic subcategorization frames of Vietnamese can be compactly represented by classes using the XMG formalism (eXtensible MetaGrammar). Finally, we report on the implementation of the first metagrammar producing verbal elementary trees recognizing basic Vietnamese sentences.

1 Introduction
Metagrammars (MG) have recently emerged as a means to develop wide-coverage LTAG for well-studied languages like English, French and Italian (Candito, 1999; Kinyon, 2003). MGs help avoid redundancy and reduce the effort of grammar development by making use of common properties of LTAG elementary trees.

We present in this paper an initial investigation into the use of a metagrammar for explicitly sharing abstract grammatical specifications for the Vietnamese language. We use the eXtensible MetaGrammar (XMG) tool, which was developed by Crabbé (Crabbé, 2005; Parmentier and L. Roux, 2005), to compile a TAG for Vietnamese. The resulting grammar is called vnMG and is freely available online at http://www.loria.fr/~lehong/tools/vnMG.php.

Only in recent years have Vietnamese researchers begun to be involved in the domain of natural language processing in general and in the task of parsing Vietnamese in particular. No work on formalizing Vietnamese grammar was reported before (Nguyen et al., 2004). In (Lê et al., 2006), basic declarative structures and complement clauses of Vietnamese sentences were modeled using about thirty elementary trees, representing as many subcategorization frames. We show in this paper that these basic subcategorization frames can be compactly represented by classes in the XMG formalism.

We first introduce the essential syntactic mechanisms of the Vietnamese language. We then show that the basic subcategorization frames of Vietnamese can be compactly represented by classes using the XMG formalism. We then report on the implementation of the first metagrammar producing verbal elementary trees recognizing basic Vietnamese sentences, before concluding.

2 Vietnamese Subcategorizations
As for other isolating languages, the most important source of syntactic information in Vietnamese is word order. The basic word order is Subject – Verb – Object. A verb is always placed after the subject in both predicative and question forms. In a noun phrase, the main noun precedes the adjectives and the genitive follows the governing noun. The other syntactic means are function words, reduplication, and, in the case of spoken language, prosody (Nguyễn et al., 2006).

From the point of view of functional grammar, the syntactic structure of Vietnamese follows a topic-oriented structure. It belongs to the topic-prominent languages as described by (Li and Thompson, 1976). In those languages, topics are
coded in the surface structure and they tend to control co-referentiality. The topic-oriented “double subject” construction is a basic sentence type. For example, “Cậu ấy khoẻ mạnh, là sinh viên y khoa / He strong, be student medicine”, which means “He is strong, he is a medical student”. In Vietnamese, passive voice and cleft subject sentences are rare or non-existent.

In general, Vietnamese predicates may be classified into three types, depending on whether a copula is needed to connect them with their subjects in the declarative and negative forms (Nguyễn, 2004). Complex predicates can be constructed to form coordinated predicative structures starting from these basic types of predicates. We briefly present these three types of Vietnamese predicates in the following subsections.

2.1 First Type Predicates
The first type predicates are predicates which connect directly to their subjects without the need of a copula in both the declarative and negative forms. For example:
• Declarative form: Tôi đọc sách. / I am reading books.
• Negative form: Tôi không đọc sách. / I am not reading books.
These predicates are realized by verbal phrases or adjectival phrases. The fact that an adjective can be a predicate is a peculiarity of Vietnamese in comparison with Western languages. In English or French, for instance, only verbal phrases can be predicates; adjectives in these languages always signify properties of subjects and always follow the verb “to be” in English or “être” in French.

2.2 Second Type Predicates
The second type predicates are predicates which are connected to their subjects by the copula “là” in the declarative form and by the copulas “không là”, “không phải”, or “không phải là” in the negative form. Predicates of this type are rather rich. They can be:
• Nouns or noun phrases: Tôi là sinh viên. / I am a student.
• Verbs, adjectives, verbal phrases or adjectival phrases: Van xin là yếu đuối. / Begging is feeble. Học cũng là làm việc. / To study is to work.

2.3 Third Type Predicates
The third type predicates are predicates which connect directly to their subjects in the declarative form; in the negative form, however, they are connected to their subjects by a copula. Predicates of this type are usually:
• A clause: Nó vẫn tên là Quýt. / His name is still Quýt.
• A composition of a numeral and a noun: Lê này mười ngàn đồng. / This pear costs ten thousand dongs.
• A composition of a preposition and a noun: Lúa này của chị Hoa. / This is the rice of Ms. Hoa.
• An expression: Thằng ấy đầu bò đầu bướu lắm. / That guy is very stubborn.

2.4 Subcategorizations
In the first LTAG for Vietnamese, presented in (Lê et al., 2006), each subcategorization is represented by the same structure of elementary trees associated with a given predicate. We consider the subject to be subcategorized in the same way as the other arguments. Verbs thus anchor elementary trees composed of a node for the subject and one or more nodes for each of their essential complements.

We follow the de facto standard in TAG, in which each subcategorization is represented by a family of elementary trees. We define families of verbal elementary trees in Table 1.

We present in the next section a metagrammar that generates this set of elementary trees.

3 A Metagrammar for Verbal Trees
The subcategorizations of elementary trees describe only “canonical” constructions of predicative elements, without taking into account relative or question structures. For the purpose of this investigation, we restrict ourselves at the first stage to developing only the verb spines and argument realizations shown in the subcategorizations presented in the previous section.

We have developed an XMG metagrammar that consists of 11 classes (or tree fragments). The
metagrammar is currently able to produce the same set of elementary trees described in Table 1, including intransitive, transitive and ditransitive families with and/or without optional complements. As an illustration, the declarative transitive structure in Figure 1 can be defined by combining a canonical subject fragment with an active verb and a canonical object fragment.

[Figure: three tree fragments rooted in S, joined by "+": a canonical subject fragment, an active verb fragment and a canonical object fragment]

This combination is conveniently expressed by a statement in the XMG language:

TransitiveVerb = Subject ∧ ActiveVerb ∧ Object

Subcategorization                               Family            Example
Intransitive                                    N0 V              ngủ / sleep
With a nominal complement                       N0 V N1           đọc / to read
With a clausal complement                       N0 V S1           tin / to believe
With modal complement                           N0 V0 V1          mong / to wish
Ditransitive                                    N0 V N1 N2        cho / to give
Ditransitive with a preposition                 N0 V N1 O N2      vay / to borrow
Ditransitive with a verbal complement           N0 V0 N1 V1       lãnh đạo / to lead
Ditransitive with an adjectival complement      N0 V N1 A         làm / to make
Movement verbs with a nominal complement        N0 V0 V1 N1       ra / to go out
Movement verbs with an adjectival complement    N0 V0 A V1        trở nên / to become
Movement ditransitive                           N0 V0 N1 V1 N2    chuyển / to transfer

Table 1: Subcategorizations of Vietnamese verbs

S( N0↓ [tôi]  PredP( V⋄ [đọc]  N1↓ [sách] ) )

Figure 1: Declarative transitive structure αn0Vn1

4 Conclusion and Future Work
This paper presents an initial investigation into the use of the XMG formalism for developing a first metagrammar producing an LTAG for Vietnamese which recognizes basic verbal constructions. We have shown that the essential subcategorization frames of Vietnamese predicates can be effectively encoded by means of XMG classes while retaining basic properties of the realized verbal trees. This suggests that various syntactic phenomena of Vietnamese can be covered in a Vietnamese MG.

The first evaluation of the MG for Vietnamese is promising, but the lexical coverage has to be improved further. Moreover, the grammar coverage needs to be revised by refining the constraints on ungrammatical syntactic constructions. Although there are not many tree fragments in the current metagrammar, we find that the current MG overgenerates some undesired structures. The MG will also be extended to deal with constructions not yet covered, such as adjectival and noun phrase constructions. We also intend to generate a test suite to document the grammars and perform realistic evaluations.

There is existing work on the development of metagrammars for less frequently studied languages like Korean and Yiddish and their relations to a German grammar (Kinyon, 2006). The authors showed that cross-linguistic generalizations, for example the verb-second phenomenon, can be incorporated into a multilingual MG. We think that a comparison of the Vietnamese MG with this work would be useful. In particular, studying the relative position of verbs and arguments in Vietnamese and relating it to this work would be beneficial.
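As a rough illustration of the combination TransitiveVerb = Subject ∧ ActiveVerb ∧ Object described in Section 3, the following Python sketch merges three tree fragments into the tree of Figure 1. It is not XMG code and does not reproduce the XMG constraint solver. The fragment shapes are assumptions read off Figure 1, nodes are identified simply by their label (substitution and anchor marks are omitted), sibling order is taken from first appearance, and the names conjoin and bracket are hypothetical.

```python
# Each fragment maps a parent label to its ordered list of child labels.
# Assumed shapes, chosen to be consistent with Figure 1.
SUBJECT = {"S": ["N0", "PredP"]}                  # canonical subject
ACTIVE_VERB = {"S": ["PredP"], "PredP": ["V"]}    # verb spine
OBJECT = {"PredP": ["V", "N1"]}                   # canonical object


def conjoin(*fragments):
    """Merge fragments, identifying nodes that carry the same label."""
    merged = {}
    for fragment in fragments:
        for parent, children in fragment.items():
            slot = merged.setdefault(parent, [])
            for child in children:
                if child not in slot:  # an already known child is shared
                    slot.append(child)
    return merged


def bracket(tree, node="S"):
    """Render the merged tree in bracketed notation, starting at `node`."""
    children = tree.get(node, [])
    if not children:
        return node
    inner = " ".join(bracket(tree, child) for child in children)
    return f"{node}({inner})"


# TransitiveVerb = Subject AND ActiveVerb AND Object
transitive_verb = conjoin(SUBJECT, ACTIVE_VERB, OBJECT)
print(bracket(transitive_verb))  # prints S(N0 PredP(V N1))
```

Under these assumptions the merged structure has the same shape as the tree in Figure 1. A real metagrammar compiler works on tree descriptions with dominance and precedence constraints rather than on label identity, so this sketch only conveys how shared nodes let small fragments be reused across families.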
References
Marie-Hélène Candito. 1999. Représentation modulaire et paramétrable de grammaires électroniques lexicalisées : application au français et à l’italien. Doctoral Dissertation, Université Paris 7.
Benoit Crabbé. 2005. Représentation informatique de grammaires fortement lexicalisées. Doctoral Dissertation, Université Nancy 2.
Nguyễn Thị Minh Huyền, Laurent Romary, Mathias Rossignol and Vũ Xuân Lương. 2006. A Lexicon for Vietnamese Language Processing. Language Resources and Evaluation, Vol. 40, No. 3–4.
Alexandra Kinyon and Owen Rambow. 2003. Using the MetaGrammar to generate cross-language and cross-framework annotated test-suites. In Proc. LINC-EACL, Budapest.
Alexandra Kinyon and Carlos A. Prolo. 2002. A Classification of Grammar Development Strategies. Proceedings of the Workshop on Grammar Engineering, Taipei, Taiwan.
Alexandra Kinyon, Owen Rambow, Tatjana Scheffler, SinWon Yoon and Aravind K. Joshi. 2006. The Metagrammar Goes Multilingual: A Cross-Linguistic Look at the V2-Phenomenon. Proceedings of the Eighth International Workshop on Tree Adjoining Grammar and Related Formalisms, Sydney, Australia.
Lê Hồng Phương, Nguyễn Thị Minh Huyền, Laurent Romary and Azim Roussanaly. 2006. A Lexicalized Tree-Adjoining Grammar for Vietnamese. Proceedings of LREC 2006, Genoa, Italy.
Thanh Bon Nguyen, Thi Minh Huyen Nguyen, Laurent Romary and Xuan Luong Vu. 2004. Developing Tools and Building Linguistic Resources for Vietnamese Morpho-Syntactic Processing. Proceedings of LREC 2004, Lisbon, Portugal.
Charles N. Li and Sandra A. Thompson. 1976. Subject and topic: a new typology of language. In Charles N. Li (ed.), Subject and Topic. London/New York: Academic Press, pp. 457-489.
Yannick Parmentier and Joseph L. Roux. 2005. XMG: a Multi-formalism Metagrammar Framework. Proceedings of the Tenth ESSLLI Student Session.
Nguyễn Minh Thuyết and Nguyễn Văn Hiệp. 2004. Thành phần câu tiếng Việt. NXB Giáo dục, Hà Nội, Vietnam.