I am a Research Scientist at UC Berkeley, working with Prof. Trevor Darrell. I completed my PhD at the Max Planck Institute for Informatics under the supervision of Prof. Bernt Schiele. My research is at the intersection of vision and language. I am interested in a variety of tasks, including image and video description, visual grounding, visual question answering, and others. Recently, I have been focusing on building explainable models, addressing bias in existing vision and language models, and detecting semantic mismatch in the context of multimodal misinformation.
My old MPII homepage is here.
Also see the web page for our Berkeley group here.
You can reach me via firstname.lastname at berkeley.edu
- We are holding another LSMDC challenge this year; the submission servers are open!
- I am co-organizing the 4th Workshop on Closing the Loop Between Vision and Language (in conjunction with ICCV 2021); we invite paper submissions!
- I gave a talk at the 2021 VizWiz Grand Challenge Workshop, in conjunction with CVPR 2021.
- I gave a talk at the 2nd Workshop on Advances in Language and Vision Research (ALVR), in conjunction with NAACL 2021.
- Recognized as an Outstanding Reviewer at CVPR 2021.
- 1 paper accepted to ICML 2021.
- I am serving as an Area Chair for ICCV 2021.
- I gave a talk at the 2nd Workshop on Video Turing Test: Toward Human-Level Video Story Understanding, in conjunction with ECCV 2020.
- 1 paper accepted to ECCV 2020.
- I gave talks at the Visual Question Answering and Dialog Workshop and The End-of-End-to-End: A Video Understanding Pentathlon, in conjunction with CVPR 2020.
- 1 paper accepted to CVPR 2020.
- I was recognized as a Best Reviewer at NeurIPS 2019.
- Our work on “Robust Change Captioning” is one of the Best Paper Nominations at ICCV 2019!
- I was recognized as an Outstanding Reviewer at ICCV 2019.
- 2 papers accepted to ICCV 2019, including 1 Oral.
- I co-organized the Workshop on Closing the Loop Between Vision and Language and The Large Scale Movie Description Challenge (LSMDC), at ICCV 2019.
- A short paper accepted to ACL 2019.
- I was recognized as an Outstanding Reviewer at CVPR 2019.
- 1 paper accepted to CVPR 2019 for an Oral presentation.
- I co-organized the Workshop on Fairness Accountability Transparency and Ethics in Computer Vision (at CVPR 2019).
- I was recognized as a Best Reviewer at EMNLP 2018.
- 1 paper accepted to ACCV 2018.
- 1 paper accepted to NeurIPS 2018.
- 1 paper accepted to EMNLP 2018.
- I was recognized as an Outstanding Reviewer at CVPR 2018.
- 2 papers accepted to ECCV 2018.
- 2 papers accepted to CVPR 2018, including one Spotlight.
- I am honored to be a recipient of the Otto Hahn Medal for 2017.
Sheng Shen, Liunian Harold Li, Hao Tan, Mohit Bansal, Anna Rohrbach, Kai-Wei Chang, Zhewei Yao, Kurt Keutzer.
Medhini Narasimhan, Anna Rohrbach, Trevor Darrell.
- DETReg: Unsupervised Pretraining with Region Priors for Object Detection.
Amir Bar, Xin Wang, Vadim Kantorov, Colorado J. Reed, Roei Herzig, Gal Chechik, Anna Rohrbach, Trevor Darrell, Amir Globerson.
- NewsCLIPpings: Automatic Generation of Out-of-Context Multimodal Media.
Grace Luo, Trevor Darrell, Anna Rohrbach.
- Compositional Video Synthesis with Action Graphs.
Amir Bar, Roei Herzig, Xiaolong Wang, Anna Rohrbach, Gal Chechik, Trevor Darrell, Amir Globerson.
- Identity-Aware Multi-Sentence Video Description.
Jae Sung Park, Trevor Darrell, Anna Rohrbach.
- Advisable Learning for Self-driving Vehicles by Internalizing Observation-to-Action Rules.
Jinkyu Kim, Suhong Moon, Anna Rohrbach, Trevor Darrell, John Canny.
- Language-Conditioned Graph Networks for Relational Reasoning.
Ronghang Hu, Anna Rohrbach, Trevor Darrell, Kate Saenko.
- Robust Change Captioning.
Dong Huk Park, Trevor Darrell, Anna Rohrbach.
ICCV 2019, Oral, Best Paper Nomination.
- Are You Looking? Grounding to Multiple Modalities in Vision-and-Language Navigation.
Ronghang Hu, Daniel Fried, Anna Rohrbach, Dan Klein, Trevor Darrell, Kate Saenko.
- Adversarial Inference for Multi-Sentence Video Description.
Jae Sung Park, Marcus Rohrbach, Trevor Darrell, Anna Rohrbach.
CVPR 2019, Oral.
- Speaker-Follower Models for Vision-and-Language Navigation.
Daniel Fried*, Ronghang Hu*, Volkan Cirik*, Anna Rohrbach, Jacob Andreas, Louis-Philippe Morency, Taylor Berg-Kirkpatrick, Kate Saenko, Dan Klein**, and Trevor Darrell**.
NeurIPS 2018; * and ** indicate equal contribution.
- Video Object Segmentation with Language Referring Expressions.
Anna Khoreva, Anna Rohrbach, and Bernt Schiele.
- Object Hallucination in Image Captioning.
Anna Rohrbach*, Lisa Anne Hendricks*, Kaylee Burns, Trevor Darrell, and Kate Saenko.
EMNLP 2018, * indicates equal contribution.
- Women also Snowboard: Overcoming Bias in Captioning Models.
Lisa Anne Hendricks*, Kaylee Burns*, Kate Saenko, Trevor Darrell, Anna Rohrbach.
ECCV 2018, * indicates equal contribution.
- Textual Explanations for Self-Driving Vehicles.
Jinkyu Kim, Anna Rohrbach, Trevor Darrell, John Canny, and Zeynep Akata.
- Multimodal Explanations: Justifying Decisions and Pointing to the Evidence.
Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Anna Rohrbach, Bernt Schiele, Trevor Darrell, and Marcus Rohrbach.
CVPR 2018, Spotlight.
- Fooling Vision and Language Models Despite Localization and Attention Mechanisms.
Xiaojun Xu, Xinyun Chen, Chang Liu, Anna Rohrbach, Trevor Darrell, and Dawn Song.
- Generating Descriptions with Grounded and Co-Referenced People.
Anna Rohrbach, Marcus Rohrbach, Siyu Tang, Seong Joon Oh, and Bernt Schiele.
- Grounding of Textual Phrases in Images by Reconstruction.
Anna Rohrbach, Marcus Rohrbach, Ronghang Hu, Trevor Darrell, and Bernt Schiele.
ECCV 2016, Oral.
- Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding.
Akira Fukui*, Dong Huk Park*, Daylen Yang*, Anna Rohrbach*, Trevor Darrell, and Marcus Rohrbach.
EMNLP 2016, * indicates equal contribution.
- A Dataset for Movie Description.
Anna Rohrbach, Marcus Rohrbach, Niket Tandon, and Bernt Schiele.
The complete list of publications is available on my Google Scholar profile.