I am a Research Scientist at UC Berkeley, working with Prof. Trevor Darrell. I completed my PhD at the Max Planck Institute for Informatics under the supervision of Prof. Bernt Schiele. My research is at the intersection of vision and language. I am interested in a variety of tasks, including image and video description, visual grounding, and visual question answering. Recently, I have been focusing on building explainable models and addressing bias in existing vision and language models.
My old MPII homepage is here.
You can reach me via firstname.lastname at berkeley.edu
- I gave a talk at the 2nd Workshop on Video Turing Test: Toward Human-Level Video Story Understanding, in conjunction with ECCV 2020.
- 1 paper accepted to ECCV 2020.
- I gave talks at the Visual Question Answering and Dialog Workshop and The End-of-End-to-End: A Video Understanding Pentathlon, in conjunction with CVPR 2020.
- 1 paper accepted to CVPR 2020.
- I was recognized as a Best Reviewer at NeurIPS 2019.
- Our work on “Robust Change Captioning” is one of the Best Paper Nominations at ICCV 2019!
- I was recognized as an Outstanding Reviewer at ICCV 2019.
- 2 papers accepted to ICCV 2019, including 1 Oral.
- I co-organized the Workshop on Closing the Loop Between Vision and Language and The Large Scale Movie Description Challenge (LSMDC), at ICCV 2019.
- A short paper accepted to ACL 2019.
- I was recognized as an Outstanding Reviewer at CVPR 2019.
- 1 paper accepted to CVPR 2019 for an Oral presentation.
- I co-organized the Workshop on Fairness, Accountability, Transparency, and Ethics in Computer Vision at CVPR 2019.
- I was recognized as a Best Reviewer at EMNLP 2018.
- 1 paper accepted to ACCV 2018.
- 1 paper accepted to NeurIPS 2018.
- 1 paper accepted to EMNLP 2018.
- I was recognized as an Outstanding Reviewer at CVPR 2018.
- 2 papers accepted to ECCV 2018.
- 2 papers accepted to CVPR 2018, including one spotlight.
- I am honored to be a recipient of the Otto Hahn Medal for 2017.
- Identity-Aware Multi-Sentence Video Description.
Jae Sung Park, Trevor Darrell, Anna Rohrbach.
- Advisable Learning for Self-driving Vehicles by Internalizing Observation-to-Action Rules.
Jinkyu Kim, Suhong Moon, Anna Rohrbach, Trevor Darrell, John Canny.
- Language-Conditioned Graph Networks for Relational Reasoning.
Ronghang Hu, Anna Rohrbach, Trevor Darrell, Kate Saenko.
- Robust Change Captioning.
Dong Huk Park, Trevor Darrell, Anna Rohrbach.
ICCV 2019, Oral, Best Paper Nomination.
- Are You Looking? Grounding to Multiple Modalities in Vision-and-Language Navigation.
Ronghang Hu, Daniel Fried, Anna Rohrbach, Dan Klein, Trevor Darrell, Kate Saenko.
- Adversarial Inference for Multi-Sentence Video Description.
Jae Sung Park, Marcus Rohrbach, Trevor Darrell, Anna Rohrbach.
CVPR 2019, Oral.
- Speaker-Follower Models for Vision-and-Language Navigation.
Daniel Fried*, Ronghang Hu*, Volkan Cirik*, Anna Rohrbach, Jacob Andreas, Louis-Philippe Morency, Taylor Berg-Kirkpatrick, Kate Saenko, Dan Klein**, and Trevor Darrell**.
NeurIPS 2018, * and ** indicate equal contribution.
- Video Object Segmentation with Language Referring Expressions.
Anna Khoreva, Anna Rohrbach, and Bernt Schiele.
- Object Hallucination in Image Captioning.
Anna Rohrbach*, Lisa Anne Hendricks*, Kaylee Burns, Trevor Darrell, and Kate Saenko.
EMNLP 2018, * indicates equal contribution.
- Women also Snowboard: Overcoming Bias in Captioning Models.
Lisa Anne Hendricks*, Kaylee Burns*, Kate Saenko, Trevor Darrell, Anna Rohrbach.
ECCV 2018, * indicates equal contribution.
- Textual Explanations for Self-Driving Vehicles.
Jinkyu Kim, Anna Rohrbach, Trevor Darrell, John Canny, and Zeynep Akata.
- Multimodal Explanations: Justifying Decisions and Pointing to the Evidence.
Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Anna Rohrbach, Bernt Schiele, Trevor Darrell, and Marcus Rohrbach.
CVPR 2018, Spotlight.
- Fooling Vision and Language Models Despite Localization and Attention Mechanisms.
Xiaojun Xu, Xinyun Chen, Chang Liu, Anna Rohrbach, Trevor Darrell, and Dawn Song.
- Generating Descriptions with Grounded and Co-Referenced People.
Anna Rohrbach, Marcus Rohrbach, Siyu Tang, Seong Joon Oh, and Bernt Schiele.
- Grounding of Textual Phrases in Images by Reconstruction.
Anna Rohrbach, Marcus Rohrbach, Ronghang Hu, Trevor Darrell, and Bernt Schiele.
ECCV 2016, Oral.
- Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding.
Akira Fukui*, Dong Huk Park*, Daylen Yang*, Anna Rohrbach*, Trevor Darrell, and Marcus Rohrbach.
EMNLP 2016, * indicates equal contribution.
- A Dataset for Movie Description.
Anna Rohrbach, Marcus Rohrbach, Niket Tandon, and Bernt Schiele.
The complete list of publications is available on my Google Scholar profile.