Friday, February 15, 2019

Universal Dependencies

Tjo3ya: Creating article


'''Universal Dependencies''', frequently abbreviated as '''UD''', is an international cooperative project to create [[Treebank|treebanks]] of the world's languages. The treebanks are openly accessible and can be used for automated text processing in [[natural language processing]] (NLP) and for research into natural language syntax and grammar, especially typological studies. The UD webpage introduces UD's development and goals as follows:<ref>Most of the content about the UD project provided in this article is based directly on information in the UD webpage [http://bit.ly/2N61V4c here].</ref>

::“Universal Dependencies (UD) is a project that is developing cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on an evolution of (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008). The general philosophy is to provide a universal inventory of categories and guidelines to facilitate consistent annotation of similar constructions across languages, while allowing language-specific extensions when necessary.”<ref>This passage is from the UD webpage, from the subsection entitled "Short introduction to UD"; it can be found [http://bit.ly/2ST6HaB here].</ref>

The passage makes clear that the UD annotation scheme is a type of [[dependency grammar]] (as opposed to a [[phrase structure grammar]]) and that it aims for annotation that is consistent across languages. As of February 2019, the UD inventory contains just over 100 treebanks covering more than 70 languages.

==Dependencies==
The UD annotation scheme produces syntactic analyses of sentences in terms of the dependencies of dependency grammar. Each dependency is characterized in terms of a syntactic function, which is shown using a label on the dependency edge. For example:<ref>The three example analyses that appear in this section have been taken from the UD webpage, from the subsection entitled "UD annotation guidelines: Simple clauses"; they can be found [http://bit.ly/2N8qkpN here], examples 18, 21, and 23.</ref>

[[File:UD picture 1.jpg|UD picture 1]]

This analysis shows that ''she'' and ''is'' are dependents of the adjective ''nice''. The pronoun ''she'' is identified as a nominal subject (nsubj) and the verb ''is'' as the copula (cop). A second, similar example is next:

[[File:UD picture 2.jpg|UD picture 2]]

This analysis identifies ''it'' as the subject (nsubj), ''is'' as the copula (cop), and ''for'' as a case marker (case), all of which are shown as dependents of the root word ''her'', which is a pronoun. The next example includes an expletive and an oblique object:

[[File:UD picture 3.jpg|UD picture 3]]

This analysis identifies ''there'' as an expletive (expl), ''food'' as a nominal subject (nsubj), ''kitchen'' as an oblique object (obl), and ''in'' as a case marker (case). Note also that the copula ''is'' is the root of the sentence here, unlike in the first two examples above, where it is a dependent of the root.
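The three analyses described above can be written down as head-and-label tables. The following is a minimal sketch in Python, not part of the UD tooling: the <code>Token</code> type and the helper functions are hypothetical names introduced for illustration, and the full sentences ("She is nice", "It is for her", "There is food in the kitchen") are assumed from the words named in the text. The determiner ''the'' and its <code>det</code> label are likewise an assumption, since the text does not mention them.

```python
from typing import NamedTuple


class Token(NamedTuple):
    idx: int     # 1-based position in the sentence
    form: str    # the word as written
    head: int    # idx of the governing word; 0 marks the root
    deprel: str  # label on the dependency edge


# Assumed sentences; only the words and labels come from the text.
NICE = [
    Token(1, "She", 3, "nsubj"),
    Token(2, "is", 3, "cop"),
    Token(3, "nice", 0, "root"),
]
FOR_HER = [
    Token(1, "It", 4, "nsubj"),
    Token(2, "is", 4, "cop"),
    Token(3, "for", 4, "case"),
    Token(4, "her", 0, "root"),
]
KITCHEN = [
    Token(1, "There", 2, "expl"),
    Token(2, "is", 0, "root"),
    Token(3, "food", 2, "nsubj"),
    Token(4, "in", 6, "case"),
    Token(5, "the", 6, "det"),  # assumption: not mentioned in the text
    Token(6, "kitchen", 2, "obl"),
]


def root(tokens):
    """Return the single token whose head is 0."""
    roots = [t for t in tokens if t.head == 0]
    if len(roots) != 1:
        raise ValueError("expected exactly one root")
    return roots[0]


def dependents(tokens, head_idx):
    """Return (form, deprel) pairs for the tokens governed by head_idx."""
    return [(t.form, t.deprel) for t in tokens if t.head == head_idx]


for sentence in (NICE, FOR_HER, KITCHEN):
    r = root(sentence)
    print(r.form, dependents(sentence, r.idx))
```

Calling <code>root</code> on each table makes the contrast visible: the copula is a dependent in the first two tables and the root in the third.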

These examples can give only an impression of the UD project and its annotation scheme. Producing any treebank of natural language texts involves numerous thorny issues. UD provides inventories of the parts of speech and of the syntactic functions, guidelines for analyzing difficult syntactic phenomena such as coordination and ellipsis, and two layers of dependency analysis, basic and enhanced. In its annotation choices generally, the emphasis is on producing cross-linguistically consistent dependency analyses in order to facilitate structural parallelism across diverse languages.

==Controversy==
Within the dependency grammar community, the UD annotation scheme is controversial. The main bone of contention concerns the analysis of function words. UD chooses to subordinate function words to content words, a practice that is contrary to most works in the tradition of dependency grammar.<ref>The controversy surrounding UD and the status of function words in dependency grammar in general are discussed at length in [http://bit.ly/2STY9Aq Osborne & Gerdes (2019)].</ref>
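The difference at issue can be illustrated with the second example from the previous section. The sketch below, in Python, compares the UD analysis with one possible function-head analysis. The function-head structure is an assumption made here for illustration and is not taken from the UD webpage or from Osborne & Gerdes; the sentence "It is for her" is assumed from the words named above, and <code>depth</code> and <code>changed_heads</code> are hypothetical helper names.

```python
# Each analysis maps a word to its head; None marks the root.
# UD (content-head): function words depend on the content word "her".
UD_HEADS = {"It": "her", "is": "her", "for": "her", "her": None}

# Assumed function-head alternative: the verb is the root and the
# preposition governs the pronoun.
FUNCTION_HEAD = {"It": "is", "is": None, "for": "is", "her": "for"}


def depth(heads, word):
    """Count the edges between a word and the root of its tree."""
    steps = 0
    while heads[word] is not None:
        word = heads[word]
        steps += 1
    return steps


def changed_heads(a, b):
    """List the words whose head differs between two analyses."""
    return sorted(w for w in a if a[w] != b[w])


print(changed_heads(UD_HEADS, FUNCTION_HEAD))
print(max(depth(UD_HEADS, w) for w in UD_HEADS))
print(max(depth(FUNCTION_HEAD, w) for w in FUNCTION_HEAD))
```

Under these assumptions every word receives a different head in the two analyses, and the UD tree is flatter, since all function words attach directly to the content word.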

==Notes==


==References==

*de Marneffe, Marie-Catherine, Bill MacCartney and Christopher D. Manning. 2006. Generating Typed Dependency Parses from Phrase Structure Parses. In the Proceedings of the Language Resources and Evaluation Conference (LREC) 2006, 449–454. Genoa.
*de Marneffe, Marie-Catherine and Christopher D. Manning. 2008. The Stanford typed dependency representation. Proceedings of the COLING Workshop on Cross-Framework and Cross-Domain Parser Evaluation, 92–97. Manchester. DOI: http://bit.ly/2N8jeSr
*de Marneffe, Marie-Catherine, Timothy Dozat, Natalia Silveira, Katri Haverinen, Filip Ginter, Joakim Nivre, and Christopher D. Manning. 2014. Universal Stanford Dependencies: A cross-linguistic typology. In The International Conference on Language Resources and Evaluation (LREC) 2014, 4585–4592.
*Osborne, Timothy & Kim Gerdes. 2019. The status of function words in dependency grammar: A critique of Universal Dependencies (UD). Glossa: A Journal of General Linguistics 4(1), 17. DOI: http://bit.ly/2SREEZa.
*Petrov, Slav, Dipanjan Das, and Ryan McDonald. 2012. A universal part-of-speech tagset. In The International Conference on Language Resources and Evaluation (LREC) 2012, 2089–2096. Istanbul.
*Zeman, Daniel. 2008. Reusable tagset conversion using tagset drivers. In The International Conference on Language Resources and Evaluation (LREC) 2008, 213–218. Marrakech.


from Wikipedia - New pages [en] http://bit.ly/2N5LeG2