DILL Lab
The Data, Interpretability, Language and Learning (DILL) lab, led by Swabha Swayamdipta, explores questions at the intersection of language models, NLP, and machine learning.
Check out our latest publications and open positions.
Here are some questions we have worked on recently:
- What do we understand about the geometries of language models?
  - Has language model generation reached performance saturation, or do language models still make systematic errors owing to their design? We studied the softmax bottleneck as one concrete limitation: how it affects language generation and how it justifies truncation sampling. Can we build better language generators? The softmax bottleneck also causes the logits exposed by closed LLM APIs to leak information.
- What are the limits of the generative capabilities of LLMs?
  - How do language models handle specific distributions of language, such as ambiguous language, comparative language, or the language of explanations? Can language models generate structured data?
- How reliable is comparative generative evaluation?
  - What cannot be measured cannot be improved. Can we reliably compare the generative performance of two different models, either in closed-ended generation tasks such as summarization or in open-ended generation? What makes model A better than model B, or are our test sets somewhat misleading us?
- What does our data tell us about our models?
  - What makes a data collection valuable for instruction tuning or finetuning large language models? Is all human feedback equally valuable under PPO or DPO? Our Dataset Cartography offers point estimates, and V-Information offers both point and aggregate estimates of data quality. How can we build similar estimates for generative models? Are all modalities and all data necessary in multimodal settings?
- How can our models help us understand our society?
  - How far can language models go in helping us understand complex social phenomena such as homelessness? Is it possible to create collaborative setups between humans and generative models to this end? What role does conversational and social context play in this understanding? Can socio-technical solutions work well for all?
News
Date | News |
---|---|
Nov 14, 2024 | OATH-Frames wins an Outstanding Paper Award at EMNLP 2024! |
Sep 20, 2024 | DILL has 3 acceptances to EMNLP 2024! Congrats to Jaspreet and Brihi for OATH-Frames, Sayan for Separability, and Aryan and Danny for OOD detection with NNK-means! |
Sep 16, 2024 | Welcome to our new postdoc: Greg Yauney! |
Sep 15, 2024 | Jaspreet and Brihi’s work on homelessness and OATH-Frames gets some USC media coverage. |
Aug 26, 2024 | Welcome to our 3 new PhD students in the DILL Lab: Xinyue Cui, Atharva Kulkarni and Muru Zhang. |
Jun 15, 2024 | Matt’s paper “Logits of API-Protected LLMs Leak Proprietary Information” and Urja’s paper “Crowd-Calibrator: Can Annotator Disagreement Inform Calibration in Subjective Tasks?” are both accepted to COLM’24! 🎉 |
Jun 15, 2024 | DILL Lab submits 3 papers to EMNLP. Preprints out soon! |
May 15, 2024 | Xinyue’s paper on Structure-Conditioned Generation with FrameNet gets accepted to ACL’24! 🎉 |
Apr 19, 2024 | Jaspreet received a best poster award at ShowCAIS 2024 for her work on OATH Frames. 🎉 |
Mar 29, 2024 | DILL lab submits 2 papers to COLM’24. |
Feb 16, 2024 | DILL hosts Yanai Elazar, a Young Investigator at the Allen Institute for AI. |
Feb 15, 2024 | DILL Lab submits three papers to ACL’24. |
Jan 30, 2024 | Matt gave an invited talk at CMU LTI. |
Jan 16, 2024 | Matt’s paper on the softmax bottleneck gets accepted to ICLR’24. 🎉 |
Dec 13, 2023 | Yoonsoo’s paper on video summarization is now accepted to ICASSP’24. 🎉 |
Dec 01, 2023 | We hosted Kawin Ethayarajh, a PhD student at Stanford. |
Nov 18, 2023 | The DILL lab had a pre-Thanksgiving get-together with a potluck dinner. |
Nov 17, 2023 | We attended SoCalNLP 2023, where members of the DILL lab presented 4 papers. |
Oct 26, 2023 | We hosted Sarah Wiegreffe, a postdoctoral researcher from AI2. |
Sep 29, 2023 | Two new preprints on arXiv: one on video summarization and one on the softmax bottleneck. |
Aug 31, 2023 | We hosted Julia Mendelsohn, a PhD student from UMich who spoke about her work on Computational Analysis of Nuanced Political Rhetoric. |
Jun 19, 2023 | We had a summer ice cream social at Culver City Downtown with many new members. |
Jun 14, 2023 | Urja Khurana, a PhD student from Vrije Universiteit Amsterdam, is visiting our lab this summer to work on hate speech detection. |
Apr 29, 2023 | We had a summer barbecue social along with the GLAMOR lab at the Kenneth Hahn State Park. |
Apr 18, 2023 | We hosted Danish Pruthi, an incoming Assistant Professor at IISc Bangalore, in our lab. |
Apr 04, 2023 | Sayan and Jaspreet hosted the first DILL Lab office hours for USC undergrads interested in research. |
Mar 07, 2023 | Suchin Gururangan, a friend of the lab, gave an invited talk at the group meeting. |
Feb 28, 2023 | DILL attended the ACM Undergrad Research Event to reach out to undergrads and master's students interested in the lab. |
Feb 21, 2023 | New USC PhD student Jiarui Zhang gave an invited talk at the group meeting. |
Jan 24, 2023 | Eunice Jun, a friend of the lab, gave an invited talk at the group meeting. |
Jan 12, 2023 | Warm welcome to our latest PhD student, Brihi Joshi, now both at INK and DILL labs. |
Jan 09, 2023 | Software engineering PhD student Tooraj Helmi will be a guest at DILL for the spring semester. |
Nov 22, 2022 | DILL celebrated Thanksgiving together with a couple of guests (Ali Omrani and Souti Chattopadhyay) at Swabha’s. |
Nov 22, 2022 | Sayan and Jaspreet presented a brief overview of their research projects at the USC-NLP lunch. |
Nov 18, 2022 | We attended the SoCalNLP Symposium where Sayan and Jaspreet presented their latest research posters, and Swabha gave an invited talk. |
Aug 15, 2022 | First day at USC for the entire DILL Lab! |
Apr 15, 2022 | We now have four new lab members! |