DILL Lab

The Data, Interpretability, Language and Learning (DILL) Lab, led by Swabha Swayamdipta, explores questions at the intersection of language models, natural language processing (NLP) and machine learning.

Check out our latest publications and open positions.

Here are some questions we have worked on recently:

What do we understand about the geometries of language models?

Has language model generation reached performance saturation, or do language models still make systematic errors owing to their design? We studied the softmax bottleneck as one concrete limitation: how it affects language generation and how it justifies truncation sampling. Can we build better language generators? The softmax bottleneck also causes the logits of closed, API-protected LLMs to leak information.
What are the limits of the generative capabilities of LLMs?

How do language models handle specific distributions of language, such as ambiguous language, comparative language or the language of explanations? Can language models generate structured data?
How reliable is comparative generative evaluation?

What cannot be measured cannot be improved. Can we reliably compare the generative performance of two different models, either in closed-ended generation tasks such as summarization or in open-ended generation? What makes model A better than model B, or are our test sets misleading us to some extent?
What does our data tell us about our models?

What makes a data collection valuable for instruction tuning or finetuning large language models? Is all human feedback equally valuable under PPO or DPO? Our Dataset Cartography offers point estimates, and V-Information offers both point and aggregate estimates of data quality. How can we build similar estimates for generative models? Are all modalities and all data necessary in multimodal settings?
How can our models help us understand our society?

How far can language models go in helping us understand complex social phenomena such as homelessness? Is it possible to create collaborative setups between humans and generative models to this end? What role does conversational and social context play in this understanding? Can socio-technical solutions work well for all?

news

Nov 14, 2024	OATH-Frames wins an Outstanding Paper Award at EMNLP 2024!
Sep 20, 2024	DILL has 3 papers accepted to EMNLP 2024! Congrats to Jaspreet and Brihi for OATH-Frames, Sayan for Separability, and Aryan and Danny for OOD detection with NNK-means!
Sep 16, 2024	Welcome to our new postdoc: Greg Yauney!
Sep 15, 2024	Jaspreet and Brihi’s work on homelessness and OATH-Frames receives USC media coverage.
Aug 26, 2024	Welcome to our 3 new PhD students in the DILL Lab: Xinyue Cui, Atharva Kulkarni and Muru Zhang.
Jun 15, 2024	Matt’s paper “Logits of API-Protected LLMs Leak Proprietary Information” and Urja’s paper “Crowd-Calibrator: Can Annotator Disagreement Inform Calibration in Subjective Tasks?” are both accepted to COLM’24! 🎉
Jun 15, 2024	DILL Lab submits 3 papers to EMNLP. Preprints out soon!
May 15, 2024	Xinyue’s paper on structure-conditioned generation with FrameNet gets accepted to ACL’24! 🎉
Apr 19, 2024	Jaspreet received a best poster award at ShowCAIS 2024 for her work on OATH-Frames. 🎉
Mar 29, 2024	DILL lab submits 2 papers to COLM’24.
Feb 16, 2024	DILL hosts Yanai Elazar, a Young Investigator at the Allen Institute for AI.
Feb 15, 2024	DILL Lab submits three papers to ACL’24.
Jan 30, 2024	Matt gave an invited talk at CMU LTI.
Jan 16, 2024	Matt’s paper on the softmax bottleneck gets accepted to ICLR’24. 🎉
Dec 13, 2023	Yoonsoo’s paper on video summarization is now accepted to ICASSP’24. 🎉
Dec 01, 2023	We hosted Kawin Ethayarajh, a PhD student at Stanford.
Nov 18, 2023	The DILL lab had a pre-Thanksgiving get-together with a dinner potluck.
Nov 17, 2023	We attended SoCalNLP 2023, where members of the DILL lab presented 4 papers.
Oct 26, 2023	We hosted Sarah Wiegreffe, a postdoctoral researcher from AI2.
Sep 29, 2023	Two new preprints on arXiv: one on video summarization and one on the softmax bottleneck.
Aug 31, 2023	We hosted Julia Mendelsohn, a PhD student from UMich who spoke about her work on Computational Analysis of Nuanced Political Rhetoric.
Jun 19, 2023	We had a summer ice cream social in Downtown Culver City with many new members.
Jun 14, 2023	Urja Khurana, a PhD student from Vrije Universiteit Amsterdam, is visiting our lab this summer to work on hate speech detection.
Apr 29, 2023	We had a summer barbecue social along with the GLAMOR lab at the Kenneth Hahn State Park.
Apr 18, 2023	We hosted Danish Pruthi, incoming Assistant Professor at IISc Bangalore, in our lab.
Apr 04, 2023	Sayan and Jaspreet hosted the first DILL Lab office hours for USC undergrads interested in research.
Mar 07, 2023	Suchin Gururangan, a friend of the lab, gave an invited talk at the group meeting.
Feb 28, 2023	DILL attended the ACM Undergrad Research Event to reach out to undergrads and master’s students interested in the lab.
Feb 21, 2023	New USC PhD student Jiarui Zhang gave an invited talk at the group meeting.
Jan 24, 2023	Eunice Jun, a friend of the lab, gave an invited talk at the group meeting.
Jan 12, 2023	A warm welcome to our latest PhD student, Brihi Joshi, now in both the INK and DILL labs.
Jan 09, 2023	Software engineering PhD student Tooraj Helmi will be a guest at DILL for the spring semester.
Nov 22, 2022	DILL celebrated Thanksgiving together with a couple of guests (Ali Omrani and Souti Chattopadhyay) at Swabha’s.
Nov 22, 2022	Sayan and Jaspreet presented a brief overview of their research projects at the USC-NLP lunch.
Nov 18, 2022	We attended the SoCalNLP Symposium where Sayan and Jaspreet presented their latest research posters, and Swabha gave an invited talk.
Aug 15, 2022	First day at USC for the entire DILL Lab!
Apr 15, 2022	We now have four new lab members!