Temporal Information Extraction for Question Answering Using Syntactic Dependencies in an LSTM-based Architecture

Yuanliang Meng, Anna Rumshisky, Alexey Romanov

Department of Computer Science, University of Massachusetts Lowell
{ymeng,arum,aromanov}@cs.uml.edu

Download paper here

Citation

@InProceedings{meng2017QATemp,
  title={Temporal Information Extraction for Question Answering Using Syntactic Dependencies in an LSTM-based Architecture},
  author={Yuanliang Meng and Anna Rumshisky and Alexey Romanov},
  booktitle={Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year={2017}
}

Code

Our code is available at https://github.com/text-machine-lab/TEA

The code was originally written for internal use, so it may be harder to set up and navigate than formally released software. Please feel free to ask questions or open issues.

Abstract

In this paper, we propose a set of simple LSTM-based models with a uniform architecture to recover different kinds of temporal relations from text. Using the shortest dependency path between entities as input, the same architecture is used to extract intra-sentence, cross-sentence, and document creation time relations. A “double-checking” technique reverses entity pairs in classification, boosting the recall of positive cases and reducing misclassifications between opposite classes. An efficient pruning algorithm resolves conflicts globally. Evaluated on QA-TempEval (SemEval-2015 Task 5), our proposed technique outperforms state-of-the-art methods by a large margin. We also conduct intrinsic evaluation and post state-of-the-art results on TimeBank-Dense.
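
As a rough illustration of the input representation, the sketch below extracts the shortest dependency path between two tokens using spaCy for parsing and networkx for the path search. These libraries, the function name, and the example token indices are our own choices for the sketch, not necessarily the parser or preprocessing used in the paper.

import networkx as nx
import spacy

# requires: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

def shortest_dependency_path(doc, i, j):
    # Treat the dependency tree as an undirected graph and return the
    # tokens on the shortest path between token i and token j.
    edges = [(tok.i, child.i) for tok in doc for child in tok.children]
    graph = nx.Graph(edges)
    return [doc[k].text for k in nx.shortest_path(graph, source=i, target=j)]

doc = nlp("Louis Freeh said Friday that the raid is not linked with the probe.")
print(shortest_dependency_path(doc, 2, 9))  # path between "said" and "linked"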

Result Highlights

In the QA evaluation, we achieved an F1 score of 0.47 over all genres of test data; the system correctly answers 114 out of 294 questions. The best previously published system achieved an F1 score of 0.40, answering 99 questions correctly. Our system does not require an extra Time Expression Reasoner (TREFL) step.

On the TimeBank-Dense dataset, we achieved an F1 score of 0.505 with uniform networks and 0.517 with fine-tuned networks; the best previously published score is 0.511.

Datasets

QA-TempEval (SemEval 2015 Task 5) data

Training files are annotated with event tags, temporal expression tags (timexes), and temporal relation tags (TLINKs); not all possible TLINKs are labeled. Test files are not annotated.

TimeBank-Dense data

Similar to the annotated data in QA-TempEval, except that all temporal relations within a sentence or across consecutive sentences are labeled.

Example of labels:

Director of the U.S. Federal Bureau of Investigation (FBI) Louis Freeh <EVENT eid="e1" class="REPORTING">said</EVENT> here <TIMEX3 tid="t2" type="DATE" value="1998-08-21">Friday</TIMEX3> that U.S. air <EVENT eid="e45" class="OCCURRENCE">raid</EVENT> on Afghanistan and Sudan is not directly <EVENT eid="e2" class="OCCURRENCE">linked</EVENT> with the <EVENT eid="e46" class="OCCURRENCE">probe</EVENT> into the <TIMEX3 tid="t3" type="DATE" value="1998-08-07">August 7</TIMEX3> <EVENT eid="e47" class="OCCURRENCE">bombings</EVENT> in east Africa.

<MAKEINSTANCE eventID="e1" eiid="ei1" tense="PAST" aspect="NONE" polarity="POS" pos="UNKNOWN"/>

<MAKEINSTANCE eventID="e45" eiid="ei45" tense="NONE" aspect="NONE" polarity="POS" pos="NOUN"/>

<TLINK lid="l14" relType="IS_INCLUDED" eventInstanceID="ei1" relatedToTime="t2" origin="USER"/>

<TLINK lid="l19" relType="AFTER" eventInstanceID="ei1" relatedToEventInstance="ei45" origin="USER"/>

Approach

Use HeidelTime to annotate timexes in test data.
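
A minimal sketch of invoking HeidelTime from Python is shown below. The jar name and the -t/-l/-dct flags are recalled from the HeidelTime standalone documentation and should be checked against your version; this is not necessarily how our pipeline calls it.

import subprocess

def run_heideltime(doc_path, dct):
    # Annotate timexes in one raw-text file with the HeidelTime standalone jar.
    # Flag names below are assumptions based on the HeidelTime manual; check
    # your version's usage message before relying on them.
    cmd = ["java", "-jar", "de.unihd.dbs.heideltime.standalone.jar",
           doc_path,
           "-t", "NEWS",      # document type
           "-l", "ENGLISH",   # language
           "-dct", dct]       # document creation time, e.g. "1998-08-21"
    return subprocess.run(cmd, capture_output=True, text=True).stdout  # TimeML output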

Build an LSTM-based tagger to annotate events, using word embeddings and lexical features as input.
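
The sketch below shows one way such a tagger could look in Keras; the layer sizes and the way lexical features are combined with the word embeddings are illustrative placeholders rather than the exact network from the paper.

from tensorflow import keras
from tensorflow.keras import layers

def build_event_tagger(vocab_size, n_lexical_feats, n_classes,
                       emb_dim=200, lstm_units=128, max_len=100):
    # Token-level tagger: word embeddings concatenated with lexical features,
    # fed to a bidirectional LSTM with a per-token softmax over event classes.
    word_ids = keras.Input(shape=(max_len,), dtype="int32")
    lex_feats = keras.Input(shape=(max_len, n_lexical_feats))
    emb = layers.Embedding(vocab_size, emb_dim)(word_ids)
    x = layers.Concatenate()([emb, lex_feats])
    x = layers.Bidirectional(layers.LSTM(lstm_units, return_sequences=True))(x)
    out = layers.TimeDistributed(layers.Dense(n_classes, activation="softmax"))(x)
    model = keras.Model([word_ids, lex_feats], out)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    return model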

System Overview

Figure: system overview

Figure: workflow of the LSTM models