A Ph.D. Student at UMass Lowell
This work presents a new dataset for computational humor, specifically comparative humor ranking,
which attempts to eschew the ubiquitous binary approach to humor detection.
The dataset consists of tweets submitted as humorous responses to a Comedy Central TV show.
While a strong token-level RNN system achieves only 55% accuracy,
a character-level CNN system achieves 63.7% accuracy,
likely because the dataset contains many puns, which a character-level model can capture.
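Since the task is comparative rather than binary, systems are scored on how often they pick the funnier of two tweets. Here is a minimal sketch of that pairwise evaluation; the function names and the toy length-based scorer are illustrative, not the actual models described above.

```python
# Pairwise evaluation for comparative humor ranking (illustrative sketch).
# Each example is a pair of tweets plus a label saying which one is funnier.

def pairwise_accuracy(pairs, score):
    """Fraction of pairs where the model's score picks the funnier tweet.

    pairs: list of (tweet_a, tweet_b, label), with label 0 if tweet_a is
           funnier and 1 if tweet_b is funnier.
    score: a function mapping a tweet to a humor score (higher = funnier).
    """
    correct = 0
    for tweet_a, tweet_b, label in pairs:
        predicted = 0 if score(tweet_a) >= score(tweet_b) else 1
        correct += (predicted == label)
    return correct / len(pairs)

# Toy scoring function standing in for a trained model: tweet length.
toy_score = len

pairs = [
    ("short joke", "a much longer attempt at a joke", 1),
    ("another long-winded punchline here", "meh", 0),
]
print(pairwise_accuracy(pairs, toy_score))  # 1.0 on this toy data
```

The 55% and 63.7% figures above are accuracies of exactly this pairwise kind, so a random baseline sits at 50%.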
We are running a SemEval 2017 task using this dataset: SemEval-2017 Task 6. Everybody is welcome to participate!
As part of the Text Machine team, I participated in the
Semantic Textual Similarity task at SemEval-2016.
We built four systems: a small feature-based system that leverages word alignment and machine
translation quality evaluation metrics,
two end-to-end LSTM-based systems, and an ensemble system. The ensemble substantially
outperformed the base systems,
obtaining a weighted Pearson correlation of 0.738 and placing 7th out of 115.
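To make the setup concrete, here is a minimal sketch of combining base-system similarity scores and evaluating them with Pearson correlation. The simple averaging combiner and the toy scores are assumptions for illustration; the actual ensemble in our submission was more involved.

```python
from statistics import mean

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

def ensemble(score_lists, weights=None):
    """Weighted average of per-system similarity scores (toy combiner)."""
    n = len(score_lists)
    weights = weights or [1 / n] * n
    return [sum(w * s[i] for w, s in zip(weights, score_lists))
            for i in range(len(score_lists[0]))]

# Toy scores from three hypothetical base systems, plus gold judgements.
feature_based = [4.1, 2.0, 3.5, 0.8]
lstm_a        = [4.5, 1.5, 3.0, 1.2]
lstm_b        = [3.9, 2.2, 3.8, 0.5]
gold          = [4.0, 2.0, 3.5, 1.0]

combined = ensemble([feature_based, lstm_a, lstm_b])
print(round(pearson(combined, gold), 3))
```

The reported 0.738 is this correlation computed per evaluation subset and weighted by subset size.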
This paper demonstrates the effectiveness of a Long Short-Term Memory (LSTM) language model in our
initial efforts to generate unconstrained rap lyrics.
The goal is to generate lyrics that are similar in style to those of a given rapper,
yet not identical to existing lyrics: this is the task of ghostwriting.
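That stylistic-but-not-identical requirement is usually controlled at decoding time. The sketch below shows temperature sampling from a language model's output distribution; the logits, vocabulary, and temperature value are made up for illustration, standing in for an LSTM's actual output.

```python
import math
import random

def sample_next(logits, temperature=1.0, rng=random):
    """Sample a token index from a language model's output logits.

    Lower temperature concentrates probability on the model's most likely
    (most imitative) continuations; higher temperature adds variety, which
    is what keeps ghostwritten lyrics from copying existing ones.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)                      # subtract max for numerical safety
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    r = rng.random()
    acc = 0.0
    for i, e in enumerate(exps):
        acc += e / total
        if r < acc:
            return i
    return len(exps) - 1

# Toy vocabulary and fake logits standing in for an LSTM's output step.
vocab = ["flow", "rhyme", "mic", "beat"]
logits = [2.0, 1.5, 0.5, 0.1]
random.seed(0)
print(vocab[sample_next(logits, temperature=0.8)])
```

Generation then repeats this step, feeding each sampled token back into the model as the next input.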
The Knowledge Evolution project is an experiment in tracking and mapping the evolution of knowledge
as well as the reputations and intellectual networks of the past.
The project uses the history of the Library of Congress book acquisitions and classification,
and the text of historical and contemporary editions of Encyclopedia Britannica and Wikipedia.