I am a Research Scientist at UC Berkeley, working with Prof. Trevor Darrell. I completed my PhD at the Max Planck Institute for Informatics under the supervision of Prof. Bernt Schiele. My research is at the intersection of vision and language. I am interested in a variety of tasks, including image and video description, visual grounding, and visual question answering. Recently, I have been focusing on building explainable models and addressing bias in existing vision and language models.
My old MPII homepage is here.
You can reach me via firstname.lastname at berkeley.edu
- I gave a talk at the 2nd Workshop on Video Turing Test: Toward Human-Level Video Story Understanding, in conjunction with ECCV 2020.
- 1 paper accepted to ECCV 2020.
- I gave talks at the Visual Question Answering and Dialog Workshop and The End-of-End-to-End: A Video Understanding Pentathlon, in conjunction with CVPR 2020.
- 1 paper accepted to CVPR 2020.
- I was recognized as a Best Reviewer at NeurIPS 2019.
- Our work on “Robust Change Captioning” is one of the Best Paper Nominations at ICCV 2019!
- I was recognized as an Outstanding Reviewer at ICCV 2019.
- 2 papers accepted to ICCV 2019, including 1 Oral.
- I co-organized the Workshop on Closing the Loop Between Vision and Language and The Large Scale Movie Description Challenge (LSMDC), at ICCV 2019.
- A short paper accepted to ACL 2019.
- I was recognized as an Outstanding Reviewer at CVPR 2019.
- 1 paper accepted to CVPR 2019 for an Oral presentation.
- I co-organized the Workshop on Fairness, Accountability, Transparency, and Ethics in Computer Vision at CVPR 2019.
- I was recognized as a Best Reviewer at EMNLP 2018.
- 1 paper accepted to ACCV 2018.
- 1 paper accepted to NeurIPS 2018.
- 1 paper accepted to EMNLP 2018.
- I was recognized as an Outstanding Reviewer at CVPR 2018.
- 2 papers accepted to ECCV 2018.
- 2 papers accepted to CVPR 2018, including one spotlight.
- I am honored to be a recipient of the Otto Hahn Medal for 2017.
- Identity-Aware Multi-Sentence Video Description.
Jae Sung Park, Trevor Darrell, Anna Rohrbach.
- Advisable Learning for Self-driving Vehicles by Internalizing Observation-to-Action Rules.
Jinkyu Kim, Suhong Moon, Anna Rohrbach, Trevor Darrell, John Canny.
- Language-Conditioned Graph Networks for Relational Reasoning.
Ronghang Hu, Anna Rohrbach, Trevor Darrell, Kate Saenko.
- Robust Change Captioning.
Dong Huk Park, Trevor Darrell, Anna Rohrbach.
ICCV 2019, Oral, Best Paper Nomination.
- Are You Looking? Grounding to Multiple Modalities in Vision-and-Language Navigation.
Ronghang Hu, Daniel Fried, Anna Rohrbach, Dan Klein, Trevor Darrell, Kate Saenko.
- Adversarial Inference for Multi-Sentence Video Description.
Jae Sung Park, Marcus Rohrbach, Trevor Darrell, Anna Rohrbach.
CVPR 2019, Oral.
- Speaker-Follower Models for Vision-and-Language Navigation.
Daniel Fried*, Ronghang Hu*, Volkan Cirik*, Anna Rohrbach, Jacob Andreas, Louis-Philippe Morency, Taylor Berg-Kirkpatrick, Kate Saenko, Dan Klein**, and Trevor Darrell**.
NeurIPS 2018, * and ** indicate equal contribution.
- Video Object Segmentation with Language Referring Expressions.
Anna Khoreva, Anna Rohrbach, and Bernt Schiele.
- Object Hallucination in Image Captioning.
Anna Rohrbach*, Lisa Anne Hendricks*, Kaylee Burns, Trevor Darrell, and Kate Saenko.
EMNLP 2018, * indicates equal contribution.
- Women also Snowboard: Overcoming Bias in Captioning Models.
Lisa Anne Hendricks*, Kaylee Burns*, Kate Saenko, Trevor Darrell, Anna Rohrbach.
ECCV 2018, * indicates equal contribution.
- Textual Explanations for Self-Driving Vehicles.
Jinkyu Kim, Anna Rohrbach, Trevor Darrell, John Canny, and Zeynep Akata.
- Multimodal Explanations: Justifying Decisions and Pointing to the Evidence.
Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Anna Rohrbach, Bernt Schiele, Trevor Darrell, and Marcus Rohrbach.
CVPR 2018, Spotlight.
- Fooling Vision and Language Models Despite Localization and Attention Mechanisms.
Xiaojun Xu, Xinyun Chen, Chang Liu, Anna Rohrbach, Trevor Darrell, and Dawn Song.
- Generating Descriptions with Grounded and Co-Referenced People.
Anna Rohrbach, Marcus Rohrbach, Siyu Tang, Seong Joon Oh, and Bernt Schiele.
- Grounding of Textual Phrases in Images by Reconstruction.
Anna Rohrbach, Marcus Rohrbach, Ronghang Hu, Trevor Darrell, and Bernt Schiele.
ECCV 2016, Oral.
- Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding.
Akira Fukui*, Dong Huk Park*, Daylen Yang*, Anna Rohrbach*, Trevor Darrell, and Marcus Rohrbach.
EMNLP 2016, * indicates equal contribution.
- A Dataset for Movie Description.
Anna Rohrbach, Marcus Rohrbach, Niket Tandon, and Bernt Schiele.
The complete list of publications is available on my Google Scholar profile.