Interpreting Predictions of NLP Models
Eric Wallace, Matt Gardner, Sameer Singh
0:00-22:00 Part 1: Overview of Interpretability
22:00-1:01:51 Part 2: What Part of An Input Led to a Prediction? (Saliency Maps)
1:01:51-1:15:22 Zoom Question Answer
1:15:22-1:35:32 Part 2: What Part of An Input Led to a Prediction? (Perturbation Methods)
1:35:32-1:38:25 Zoom Question Answer
1:38:25-2:03:18 Part 3: What Decision Rules Led to a Prediction?
2:03:18-2:26:37 Zoom Question Answer
2:26:37-2:51:38 Part 4: What Training Examples Caused a Prediction?
2:51:38-2:58:50 Zoom Question Answer
2:58:50-3:17:00 Part 5: Implementing Interpretations
3:17:00-3:19:37 Zoom Question Answer
3:19:37-3:50:40 Part 6: Open Problems
3:53:32 Zoom Question Answer
Although neural NLP models are highly expressive and empirically successful, they also systematically fail in counterintuitive ways and are opaque in their decision-making process. This tutorial will provide a background on interpretation techniques, i.e., methods for explaining the predictions of NLP models. We will first situate example-specific interpretations in the context of other ways to understand models (e.g., probing, dataset analyses). Next, we will present a thorough study of example-specific interpretations, including saliency maps, input perturbations (e.g., LIME, input reduction), adversarial attacks, and influence functions. Alongside these descriptions, we will walk through source code that creates and visualizes interpretations for a diverse set of NLP tasks. Finally, we will discuss open problems in the field, e.g., evaluating, extending, and improving interpretation methods.
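To make the perturbation idea concrete, here is a minimal leave-one-out sketch: score each token by how much the model's prediction changes when that token is removed. The `toy_sentiment_score` lexicon model below is a hypothetical stand-in for a real NLP model, not code from the tutorial.

```python
def toy_sentiment_score(tokens):
    """Hypothetical stand-in for a real model: crude lexicon-based score."""
    positive = {"great", "good", "excellent"}
    negative = {"bad", "awful", "terrible"}
    hits = sum(t in positive for t in tokens) - sum(t in negative for t in tokens)
    return hits / (len(tokens) or 1)  # "confidence" in the positive class

def leave_one_out_saliency(tokens, model=toy_sentiment_score):
    """Importance of token i = prediction change when token i is removed."""
    base = model(tokens)
    return {
        (i, tok): base - model(tokens[:i] + tokens[i + 1:])
        for i, tok in enumerate(tokens)
    }

saliency = leave_one_out_saliency("the movie was great".split())
most_important = max(saliency, key=lambda k: abs(saliency[k]))
print(most_important)  # the token "great" dominates this toy prediction
```

Gradient-based saliency maps replace the explicit re-evaluation with a single backward pass through the model, and methods like LIME generalize the perturbations to many randomly masked variants with a fitted linear surrogate.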