Anna Rogers (Gladkova)

I am a post-doctoral associate at the Text Machine Lab in the Computer Science Department of the University of Massachusetts Lowell. I work at the intersection of linguistics, natural language processing, and machine learning. I hold a Ph.D. from the Department of Language and Information Sciences at the University of Tokyo (Japan).


My current projects span intrinsic evaluation of word embeddings, compositionality, and temporal and analogical reasoning. I also lead annotation projects for sentiment analysis and temporal reasoning.

2018
Rogers, A., Hosur Ananthakrishna, Sh., & Rumshisky, A. (2018). What's in Your Embedding, And How It Predicts Task Performance. In Proceedings of the 27th International Conference on Computational Linguistics (pp. 2690–2703). http://aclweb.org/anthology/C18-1228
Rogers, A., Romanov, A., Rumshisky, A., Volkova, S., Gronas, M., & Gribov, A. (2018). RuSentiment: An Enriched Sentiment Analysis Dataset for Social Media in Russian. In Proceedings of the 27th International Conference on Computational Linguistics (pp. 755–763). http://aclweb.org/anthology/C18-1064
Karpinska, M., Li, B., Rogers, A., & Drozd, A. (2018). Subcharacter Information in Japanese Embeddings: When Is It Worth It? In Proceedings of the Workshop on the Relevance of Linguistic Structure in Neural Architectures for NLP (pp. 28–37). http://www.aclweb.org/anthology/W18-2905
2017
Rogers, A., Drozd, A., & Li, B. (2017). The (Too Many) Problems of Analogical Reasoning with Word Vectors. In Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017) (pp. 135–148). http://www.aclweb.org/anthology/S17-1017
Li, B., Liu, T., Zhao, Z., Tang, B., Drozd, A., Rogers, A., & Du, X. (2017). Investigating different syntactic context types and context representations for learning word embeddings. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (pp. 2411–2421). http://www.aclweb.org/anthology/D17-1256
Rogers, A. (2017). Multilingual computational lexicography: frame semantics meets distributional semantics (Ph.D. dissertation). University of Tokyo, Tokyo.

2016
Gladkova, A., Drozd, A., & Matsuoka, S. (2016). Analogy-based detection of morphological and semantic relations with word embeddings: what works and what doesn’t. In Proceedings of the NAACL-HLT SRW (pp. 47–54). San Diego, California, June 12-17, 2016: ACL. https://doi.org/10.18653/v1/N16-2002
Gladkova, A., & Drozd, A. (2016). Intrinsic evaluations of word embeddings: what can we do better? In Proceedings of The 1st Workshop on Evaluating Vector Space Representations for NLP (pp. 36–42). Berlin, Germany: ACL. https://doi.org/10.18653/v1/W16-2507
Drozd, A., Gladkova, A., & Matsuoka, S. (2016). Word embeddings, analogies, and machine learning: beyond king - man + woman = queen. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers (pp. 3519–3530). Osaka, Japan, December 11-17. https://www.aclweb.org/anthology/C/C16/C16-1332.pdf
Santus, E., Gladkova, A., Evert, S., & Lenci, A. (2016). The CogALex-V shared task on the corpus-based identification of semantic relations. In Proceedings of the Workshop on Cognitive Aspects of the Lexicon (pp. 69–79). Osaka, Japan, December 11-17: ACL. http://www.aclweb.org/anthology/W/W16/W16-53.pdf#page=83

2015
Drozd, A., Gladkova, A., & Matsuoka, S. (2015). Discovering aspectual classes of Russian verbs in untagged large corpora. In Proceedings of 2015 IEEE International Conference on Data Science and Data Intensive Systems (DSDIS) (pp. 61–68). https://doi.org/10.1109/DSDIS.2015.30
Drozd, A., Gladkova, A., & Matsuoka, S. (2015). Python, performance, and Natural Language Processing. In Proceedings of the 5th Workshop on Python for High-Performance and Scientific Computing (pp. 1:1–1:10). New York, NY, USA: ACM. https://doi.org/10.1145/2835857.2835858

What's in your embedding, and how it predicts task performance.

Linguistic Diagnostics (LD) is a new methodology for evaluation, error analysis, and development of word embedding models, implemented in an open-source Python library. In a large-scale experiment with 14 datasets, LD successfully highlights the differences in the output of the GloVe and word2vec algorithms that correlate with their performance on different NLP tasks.

Project page

Analogical reasoning with word embeddings: why king - man + woman does NOT equal queen.

A series of 5 papers demonstrates that the famous linear vector offset model fails for most linguistic relations, is biased by cosine similarity, and underestimates the amount of information captured in word embeddings (which makes word analogies a dubious benchmark). Incorporating subword information is shown to be beneficial for morphological relations in English and Japanese.
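For readers unfamiliar with the vector offset method, here is a minimal sketch of how "king - man + woman" is usually evaluated: the offset vector is computed and the vocabulary word with the highest cosine similarity to it is returned. The function names and the 2-d toy vectors are invented for illustration and are not taken from the papers or from any trained model. The toy values were deliberately chosen so that the answer depends on whether the query words themselves are excluded from the candidates.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def vector_offset_analogy(vectors, a, a_star, b, exclude_sources=True):
    """Answer 'a is to a_star as b is to ?' with the vector offset method.

    The target is b - a + a_star; the answer is the vocabulary word whose
    vector has the highest cosine similarity to that target.
    """
    target = [vb - va + vas
              for va, vas, vb in zip(vectors[a], vectors[a_star], vectors[b])]
    candidates = [w for w in vectors
                  if not (exclude_sources and w in (a, a_star, b))]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# Hypothetical toy vectors (assumption: made up for this example only).
TOY = {
    "man": [1.0, 0.5],
    "woman": [1.0, 0.8],
    "king": [3.0, 0.2],
    "queen": [3.0, 1.0],
    "apple": [0.2, 3.0],
}

# Standard setup: the three query words are removed from the candidates.
print(vector_offset_analogy(TOY, "man", "woman", "king"))
# Same query, but with the query words left in the candidate set.
print(vector_offset_analogy(TOY, "man", "woman", "king", exclude_sources=False))
```

With these toy vectors the first call returns "queen" and the second returns "king": the offset is small relative to the vector for "king", so the nearest neighbour of the target is the source word itself unless it is filtered out.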

Project page

Vecto: a new open-source Python library for training, evaluating, and working with word embeddings

Vecto is an ongoing project that aims to provide a one-stop toolkit for working with word embeddings. A major part of the project is a framework for reproducible research on distributional semantic representations, with experiment metadata collected and logged automatically.

Project page

RuSentiment: the largest sentiment analysis dataset for Russian social media, enriched with active learning.

RuSentiment is currently the largest openly available sentiment dataset for Russian social media (~30K posts), diversified with active learning. We also present a lightweight 5-class annotation scheme that enables speedy and consistent annotation (250-350 posts per hour with Fleiss' kappa 0.654), with ready-to-use sentiment annotation guidelines for English and Russian.
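As background on the agreement figure, Fleiss' kappa compares the observed agreement among a fixed number of annotators with the agreement expected by chance. The sketch below follows the standard textbook formula. The function name and the example counts are assumptions made for illustration and are not RuSentiment data or project code.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of annotation counts.

    counts[i][j] is the number of annotators who put item i into category j.
    Every item must be labelled by the same number of annotators (at least 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    # Observed agreement: share of agreeing annotator pairs, averaged over items.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall category proportions.
    n_categories = len(counts[0])
    proportions = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(p * p for p in proportions)

    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical example: 4 posts, 3 annotators, 5 sentiment classes.
example = [
    [3, 0, 0, 0, 0],
    [0, 2, 1, 0, 0],
    [0, 0, 3, 0, 0],
    [1, 0, 0, 2, 0],
]
print(round(fleiss_kappa(example), 3))
```

A value of 1 means perfect agreement, and values at or below 0 mean agreement no better than chance.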

Project page
Programming & scripting

Python, R, Bash

Machine learning

scikit-learn, PyTorch

Data

JSON, XML, MySQL

Theoretical frameworks

Distributional semantics, frame semantics, cognitive linguistics, sociolinguistics, pragmatics

Languages

English, Japanese, French, Ukrainian, Russian

What's in your embedding, and how it predicts task performance.
27 September 2018: UMass Amherst (USA). [SLIDES] [VIDEO]
A version of this talk was also presented on 30 August 2018 at the IT University of Copenhagen (Denmark).


Distributional compositional semantics in the age of word embeddings.
7 May 2018: Tutorial T4 at LREC 2018, Miyazaki, Japan.
Tutorial website: http://text-machine.cs.uml.edu/lrec2018_t4/index.html


Detecting linguistic relations with analogies: what works and what doesn't.
15 July 2016: Google Tokyo seminar, Tokyo, Japan.

[SLIDES]
Organization

RepEval 2017 workshop, CogALex-V shared task


Reviewing

COLING, *SEM, RepEval, Language Resources and Evaluation


Courses and tutorials

T4 LREC 2018 tutorial, ESSLLI 2019 courses "Introduction to NLP with Python" and "Advanced NLP with Python"