Annotation

In PropBank, we identify the arguments of predicates (e.g. verbs, eventive nouns) and label them with semantic roles that show their relationship to the predicate. Semantic arguments are labeled on a verb-by-verb basis: each verb has its own frame file, which defines verb-specific semantic roles that account for each of the verb's subcategorization frames. Training supervised systems for shallow semantic analysis on PropBank’s semantic roles has been shown to yield good results (see CoNLL 2005 and 2008). PropBank currently includes four language projects: English, Chinese, Hindi/Urdu, and Arabic.
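As a rough illustration of how verb-specific roles relate to an annotated sentence, the sketch below models a roleset and one annotation in Python. The class names, the token-span convention, and the role descriptions for `break.01` are assumptions made for this example; they are not the format used by the PropBank frame files or by the annotation tools.

```python
from dataclasses import dataclass, field


@dataclass
class Roleset:
    """One sense of a predicate and its verb-specific roles (illustrative only)."""
    roleset_id: str
    roles: dict  # role label -> human-readable description


@dataclass
class Annotation:
    """A predicate instance with labeled argument spans over a token list."""
    tokens: list
    predicate_index: int
    roleset: Roleset
    # role label -> (start, end) token offsets, end exclusive
    arguments: dict = field(default_factory=dict)

    def validate(self):
        """Reject any label the roleset does not define."""
        for label in self.arguments:
            if label not in self.roleset.roles:
                raise ValueError(f"{label} is not defined for {self.roleset.roleset_id}")

    def labeled_spans(self):
        """Return each role label with the text of its argument span."""
        return {
            label: " ".join(self.tokens[start:end])
            for label, (start, end) in self.arguments.items()
        }


# Hypothetical roleset; the role descriptions are assumptions for this example.
break_01 = Roleset("break.01", {"ARG0": "breaker", "ARG1": "thing broken"})

annotation = Annotation(
    tokens=["John", "broke", "the", "window"],
    predicate_index=1,
    roleset=break_01,
    arguments={"ARG0": (0, 1), "ARG1": (2, 4)},
)
annotation.validate()
spans = annotation.labeled_spans()
```

Here `spans` maps `ARG0` to "John" and `ARG1` to "the window". Keeping the role inventory on the roleset rather than on the annotation mirrors the idea that roles are defined once per verb sense and then applied to each instance.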

We currently have two annotation tools that have been used at several universities: a PropBank annotation tool, Jubilee, and a PropBank frame file editor, Cornerstone. Both tools are available as open source projects.

Funded by GALE, NIH, and HHS
Funded by GALE
Funded by the NSF
Arabic PropBank Project: Funded by GALE


Funded by GALE and NSF

Word sense ambiguity remains a major obstacle to accurate information extraction, summarization, and machine translation. While WordNet has been an important resource in this area, its subtle, fine-grained sense distinctions have not lent themselves to high agreement between human annotators or to high automatic tagging performance. Building on results showing that grouping fine-grained WordNet senses into more coarse-grained senses improves inter-annotator agreement (ITA) and system performance (Palmer et al., 2004; Palmer et al., 2006), we have developed a process for rapid sense inventory creation and annotation that also provides critical links between the grouped word senses and the Omega ontology.
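The effect of sense grouping on agreement can be sketched with a toy calculation. Everything in the example below is invented for illustration: the sense labels, the grouping, and the two annotators' tags are hypothetical, and agreement is computed as a simple proportion of matching tags, which may differ from the measure used in the cited work.

```python
def agreement(tags_a, tags_b):
    """Proportion of instances on which two annotators chose the same tag."""
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("tag lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(tags_a, tags_b))
    return matches / len(tags_a)


# Hypothetical mapping from fine-grained senses to coarse-grained groups.
SENSE_GROUPS = {
    "call#1": "group_A",
    "call#2": "group_A",
    "call#3": "group_B",
    "call#4": "group_B",
}

# Invented tags from two annotators for the same four instances.
annotator_1 = ["call#1", "call#3", "call#2", "call#4"]
annotator_2 = ["call#2", "call#3", "call#1", "call#3"]

fine_ita = agreement(annotator_1, annotator_2)
coarse_ita = agreement(
    [SENSE_GROUPS[s] for s in annotator_1],
    [SENSE_GROUPS[s] for s in annotator_2],
)
```

In this toy data the annotators often disagree on the fine-grained sense but always pick senses from the same group, so `coarse_ita` is higher than `fine_ita`.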

Funded by GALE

The first level of OntoNotes analysis will capture the syntactic structure of the text, following the approach taken in the Penn Treebank. The Penn Treebank project, which began in 1989, has produced over three million words of skeletally parsed text from various genres. Among many other uses, the one-million-word corpus of English Wall Street Journal text included in Treebank-2 has fueled widespread and productive research efforts to improve the performance of statistical parsing engines. Treebanking efforts following the same general approach have more recently been applied to other languages, including Chinese and Arabic.

The Penn treebanking approach has been ported to Colorado, where we are currently treebanking clinical notes for the Medical Informatics projects.
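Penn Treebank parses are commonly written as nested, labeled brackets. The sketch below reads one such bracketed string into nested tuples and recovers the words; the function names and the example sentence are assumptions for illustration, and the reader ignores details of real treebank files such as empty elements and function tags.

```python
def parse_bracketed(text):
    """Parse a bracketed tree string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    position = 0

    def read_node():
        nonlocal position
        if tokens[position] != "(":
            raise ValueError("expected '('")
        position += 1
        label = tokens[position]
        position += 1
        children = []
        while tokens[position] != ")":
            if tokens[position] == "(":
                children.append(read_node())  # nested constituent
            else:
                children.append(tokens[position])  # a word
                position += 1
        position += 1  # consume ')'
        return (label, children)

    tree = read_node()
    if position != len(tokens):
        raise ValueError("unexpected text after the tree")
    return tree


def leaves(tree):
    """Return the words of a parsed tree in left-to-right order."""
    _, children = tree
    words = []
    for child in children:
        if isinstance(child, str):
            words.append(child)
        else:
            words.extend(leaves(child))
    return words


# Invented example sentence in bracketed notation.
example = "(S (NP (NNP John)) (VP (VBD broke) (NP (DT the) (NN window))))"
tree = parse_bracketed(example)
words = leaves(tree)
```

For the example string, `tree[0]` is the root label `S` and `words` holds the four tokens of the sentence in order.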

Clinical annotation

Incorporating the findings of the above efforts, two projects are developing semantic annotations in the clinical domain for materials such as radiology and pathology notes. Annotation guidelines are being developed in these projects.