Automatic Production of Analogy Tests for Natural Language Processing

Waseda University, Prof. Yves Lepage

Biography

Yves Lepage obtained his Ph.D. in 1989 from Grenoble University, France, and completed his habilitation thesis in 2003 at the same university. He is currently a professor at Waseda University, Graduate School of Information, Production and Systems, in Japan. His research group concentrates on the study of analogy and its application to natural language processing (NLP), in particular to machine translation. The second main project in his lab is the design and implementation of a writing aid system to assist NLP researchers who are not native English speakers in writing academic articles in English. Yves Lepage is a member of the French and Japanese Natural Language Processing Associations and of the Information Processing Society of Japan. He was editor-in-chief of the French journal on natural language processing, TAL, from 2008 to 2016.

Abstract

This talk will focus on analogy test sets in natural language processing (NLP). Although analogy had been studied earlier in linguistics and NLP, it became popular thanks to the introduction of word embeddings and the over-repeated example of male : female :: king : queen. Analogy test sets between words have been created and have served as benchmarks to assess the quality of word embeddings. They ought to be considered useful linguistic resources. We will justify extending them to other linguistic units such as chunks, phrases, or sentences. We will then present our current work on designing techniques to automatically produce such analogy test sets.
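
As a rough illustration of how a word analogy test set can serve as a benchmark for word embeddings, the sketch below solves a : b :: c : ? with the common vector-offset heuristic (b - a + c, then nearest neighbour by cosine similarity). This is not the method presented in the talk. The vectors, the word list, and the names `TOY_EMBEDDINGS`, `solve_analogy`, and `accuracy` are all invented for illustration.

```python
import math

# Toy 2-D vectors, invented for illustration only (not real embeddings).
TOY_EMBEDDINGS = {
    "male": [1.0, 0.0],
    "female": [1.0, 1.0],
    "king": [3.0, 0.0],
    "queen": [3.0, 1.0],
    "apple": [0.0, 3.0],
}


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def solve_analogy(a, b, c, embeddings):
    """Return the word d that best completes a : b :: c : d.

    Uses the vector-offset heuristic: the target is b - a + c, and the
    answer is the closest remaining word by cosine similarity.
    """
    target = [
        vb - va + vc
        for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])
    ]
    # The three query words are excluded from the candidates.
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(embeddings[w], target))


def accuracy(test_set, embeddings):
    """Fraction of analogies (a, b, c, d) whose predicted d is correct."""
    correct = sum(
        solve_analogy(a, b, c, embeddings) == d for a, b, c, d in test_set
    )
    return correct / len(test_set)


# A tiny hypothetical test set of (a, b, c, d) quadruples.
TOY_TEST_SET = [
    ("male", "female", "king", "queen"),
    ("king", "queen", "male", "female"),
]

if __name__ == "__main__":
    print(solve_analogy("male", "female", "king", TOY_EMBEDDINGS))
    print(accuracy(TOY_TEST_SET, TOY_EMBEDDINGS))
```

A real evaluation would load trained embeddings and a much larger test set; the share of correctly completed analogies is then reported as the benchmark score.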

The Low Resource NLP Toolbox, 2021 Version

Carnegie Mellon University, Assoc. Prof. Graham Neubig

Biography

Graham Neubig is an associate professor at the Language Technologies Institute of Carnegie Mellon University. His work focuses on natural language processing, specifically multilingual models that work in many different languages, and natural language interfaces that allow humans to communicate with computers in their own language.

Abstract

While machine learning methods have made significant improvements in natural language processing across the board, these advances are unevenly distributed. Languages with lots of data, be it raw text or annotated corpora, tend to have much better systems as it is easier to train models on these larger datasets. In this talk I will discuss the "low resource NLP toolbox, 2021 version", a suite of the most recent machine learning methods that may be used to build effective NLP systems even in the face of strict resource constraints.