Textual Entailment: A Perspective on Applied Text Understanding
Ido Dagan, Bar-Ilan University, Israel
Applied textual entailment was proposed as the task of recognizing whether the meaning or truth of one text can be expressed by, or
inferred from, another text. This talk will first discuss the potential relevance of
textual entailment, suggesting that it generically captures major semantic inferences
needed by many text understanding applications. Textual entailment may also provide useful
variations for some classical semantic problems, such as word sense disambiguation,
ontology learning and the semantic interpretation of syntactic constructs.
Interestingly, the methodologies that evolved for testing human text comprehension also
seem to be "entailment-based". Altogether, this approach may promote an alternative to
common text understanding practices: rather than interpreting texts into explicitly
stipulated semantic representations, the focus may shift to context-sensitive modeling
of entailment relationships between linguistic constructs.
In the second part of the talk, we will review the ongoing benchmarks of the PASCAL Recognising Textual Entailment Challenges and our initial research on a probabilistic setting for textual entailment and on the acquisition of entailment relations.
The Generation of Referring Expressions: Past, Present and Future
Robert Dale, Macquarie University, Australia
The task of referring expression generation is concerned with determining what semantic
content should be used in a reference to an intended referent so that the hearer will be
able to identify the referent. The task has been a focus of interest within natural
language generation at least since the early 1980s, in part because the problem appears
relatively well-defined. Over the last 25 years, a range of algorithms and approaches
have been proposed and explored; and yet, even a casual analysis of real human-authored
texts suggests that we have a long way to go in terms of providing an explanation for the range of real linguistic behaviour that we find. In this talk, I'll review research in the area to date, try to characterise where we are now, and point to directions for future research in the area.
NLP: An Information Extraction Perspective
Ralph Grishman, New York University
Information extraction -- identifying specified types of events
or relations from free text -- poses dual and related challenges:
adapting systems to new event types, and pulling out information about
these events with high accuracy. In this talk we consider how
these challenges relate to some of the basic problems of natural
language processing -- integrating different types of linguistic
knowledge; improving reference resolution; recognizing paraphrase
-- and how these problems are being addressed by recent research.
Natural Language Processing and Knowledge
Makoto Nagao, National Institute of Information and Communications Technology
Natural language processing (NLP) requires many varieties of knowledge.
Linguistic knowledge such as grammar and dictionaries has been
developed and shared among the researchers in NLP for a decade or so.
But general knowledge for use in NLP has not. When we consider
man-machine dialogue we have to prepare lots of knowledge, and also
strong inference functions such as logical inference and common sense reasoning.
In this talk I will first explain some new developments in knowledge for computational linguistics, then discuss what kind of knowledge is required for a dialogue system. Information retrieval on the Web is an important technology. A main research interest in current information retrieval is how a system can discard the huge amount of retrieved information that does not fit the retrieval purpose well, and focus on the essential information. However, we always have to be careful about the quality of information on the Web. Therefore, an important next step will be to check and indicate how reliable the information obtained from the Internet is.
Estimating the confidence degree of information is a very difficult problem. I will discuss some possible ways of estimating it. Natural language processing technologies, as well as logical and common sense reasoning, are involved in the estimation.
Linguistic Challenges for Computationalists
John Nerbonne, University of Groningen
Even now techniques are in common use in computational
linguistics which could lead to important advances in pure linguistics,
especially language acquisition and sociolinguistics, if they
were applied with intelligence and persistence. Reliable techniques for
assaying similarities and differences among linguistic varieties are
useful not only in dialectology and sociolinguistics, but also in
studies of first and second language learning and in the study of
language contact. These techniques would be even more valuable if they
indicated not only relative degrees of similarity, but also the direction of
deviation (contamination). Given the current tendency in linguistics to
wish to confront the data of language use more directly, techniques are
needed which can handle large amounts of noisy data and extract reliable
measures from them. The current focus in Computational Linguistics on
useful applications is a very good thing, but some further attention to
linguistic use of computational techniques would be very rewarding.
Dataset Profiling, and What Term Burstiness Can Tell You about Your Data
Anne de Roeck, Open University, UK
The performance of Information Retrieval and Natural Language Processing techniques is very sensitive to the characteristics of the
data on which they are used. Though well established, this knowledge has never impacted on evaluation: the literature routinely reports,
and compares, experimental evaluation results without reference to the impact of the underlying datasets or collections. This in turn raises a collection of
methodological and practical problems around replicability. These could be addressed if we had reliable ways of profiling datasets, using
measures that highlight differences between collections. A first step is to investigate what such measures might look like.
In this talk, I will show that even standard textual datasets such as the TIPSTER collection differ in ways that challenge widely accepted assumptions about the general applicability of techniques, and that similar differences will show up between different languages. In exploring what might be suitable profiling measures, I will set out some desirable properties that such measures should have. I will then review some work on term burstiness and explore what the behaviour of some very frequent terms, and variations in burstiness patterns in the occurrence of a term, can tell us about genres and datasets.