

University of Southern California | Viterbi CS | USC NLP


The Data, Interpretability, Language and Learning (DILL) lab, led by Swabha Swayamdipta, explores questions at the intersection of language models, NLP and machine learning.

Check out our latest publications and open positions.

Here are some questions we have worked on recently:

  • What do we understand about the geometries of language models?
    Has language model generation reached a performance saturation, or do language models still make systematic errors owing to their design? We studied the softmax bottleneck as one concrete limitation, how it affects language generation, and how it justifies truncation sampling. Can we build better language generators? The softmax bottleneck also leads to logits leaking information in closed LLM APIs.
  • What are the limits of the generative capabilities of LLMs?
    How do language models handle specific distributions of language, such as ambiguous language, comparative language or the language of explanations? Can language models generate structured data?
  • How reliable is comparative generative evaluation?
    What cannot be measured cannot be improved. Can we reliably compare the generative performance of two different models, either in close-ended generation tasks such as summarization or in open-ended generation? What makes model A better than model B, or are our test sets misleading us?
  • What does our data tell us about our models?
    What makes a data collection valuable for instruction tuning or finetuning large language models? Is all human feedback equally valuable under PPO or DPO? Our Dataset Cartography offers point estimates, and V-Information offers both point and aggregate estimates of data quality. How can we build similar estimates for generative models? Are all modalities and all data necessary in multimodal settings?
  • How can our models help us understand our society?
    How far can language models go in helping us understand complex social phenomena such as homelessness? Is it possible to create collaborative setups between humans and generative models to this end? What role does conversational and social context play in this understanding? Can socio-technical solutions work well for all?


Jun 15, 2024 Matt’s paper on Logits of API-Protected LLMs Leak Proprietary Information and Urja’s paper on Crowd-Calibrator: Can Annotator Disagreement Inform Calibration in Subjective Tasks? both accepted to COLM’24! 🎉
Jun 15, 2024 DILL Lab submits 3 papers to EMNLP. Preprints out soon!
May 15, 2024 Xinyue’s paper on Structure-Conditioned Generation with FrameNet gets accepted to ACL’24! 🎉
Apr 19, 2024 Jaspreet received a best poster award at ShowCAIS 2024 for her work on OATH Frames. 🎉
Mar 29, 2024 DILL lab submits 2 papers to COLM’24.
Feb 16, 2024 DILL hosts Yanai Elazar, YI at the Allen Institute for AI.
Feb 15, 2024 DILL Lab submits three papers to ACL’24.
Jan 30, 2024 Matt gave an invited talk at CMU LTI.
Jan 16, 2024 Matt’s paper on the softmax bottleneck gets accepted to ICLR’24. 🎉
Dec 13, 2023 Yoonsoo’s paper on video summarization now accepted to ICASSP’24. 🎉
Dec 01, 2023 We hosted Kawin Ethayarajh, a PhD student at Stanford.
Nov 18, 2023 The DILL lab had a pre-Thanksgiving get-together with a dinner potluck.
Nov 17, 2023 Attended SoCalNLP 2023, where members of the DILL lab presented 4 papers.
Oct 26, 2023 We hosted Sarah Wiegreffe, a postdoctoral researcher from AI2.
Sep 29, 2023 Two new preprints on arXiv: on video summarization and on the softmax bottleneck.
Aug 31, 2023 We hosted Julia Mendelsohn, a PhD student from UMich who spoke about her work on Computational Analysis of Nuanced Political Rhetoric.
Jun 19, 2023 We had a summer ice cream social at Culver City Downtown with many new members.
Jun 14, 2023 Urja Khurana, a PhD student from Vrije Universiteit, Amsterdam, is visiting our lab this summer, working on hate speech detection.
Apr 29, 2023 We had a summer barbecue social along with the GLAMOR lab at the Kenneth Hahn State Park.
Apr 18, 2023 We hosted Danish Pruthi, incoming Assistant Professor at IISc Bangalore, in our lab.
Apr 04, 2023 Sayan and Jaspreet hosted the first DILL Lab Office hours for USC undergrads interested in research.
Mar 07, 2023 Friend of the lab, Suchin Gururangan, gave an invited talk at the group meeting.
Feb 28, 2023 DILL attended the ACM Undergrad Research Event to reach out to undergrads and master's students interested in the lab.
Feb 21, 2023 New USC PhD student Jiarui Zhang gave an invited talk at the group meeting.
Jan 24, 2023 Friend of the lab, Eunice Jun, gave an invited talk at the group meeting.
Jan 12, 2023 Warm welcome to our latest PhD student, Brihi Joshi, now both at INK and DILL labs.
Jan 09, 2023 Software engineering PhD student Tooraj Helmi will be a guest at DILL for the spring semester.
Nov 22, 2022 DILL celebrated Thanksgiving together with a couple of guests (Ali Omrani and Souti Chattopadhyay) at Swabha’s.
Nov 22, 2022 Sayan and Jaspreet presented a brief overview of their research projects at the USC-NLP lunch.
Nov 18, 2022 We attended the SoCalNLP Symposium where Sayan and Jaspreet presented their latest research posters, and Swabha gave an invited talk.
Aug 15, 2022 First day at USC for the entire DILL Lab!
Apr 15, 2022 We now have four new lab members!