I am a Postdoctoral Scholar at UC Berkeley, working with Prof. Trevor Darrell. I completed my PhD at the Max Planck Institute for Informatics under the supervision of Prof. Bernt Schiele. My research is at the intersection of vision and language. I am interested in a variety of tasks, including image and video description, visual grounding, and visual question answering. Recently, I have been focusing on building explainable models and addressing bias in existing vision and language models.

My old MPII homepage is here.

You can reach me via firstname.lastname at


Technical Reports

Selected Publications

  • Adversarial Inference for Multi-Sentence Video Description.
    Jae Sung Park, Marcus Rohrbach, Trevor Darrell, and Anna Rohrbach.
    CVPR 2019, Oral.
  • Speaker-follower models for vision-and-language navigation.
    Daniel Fried*, Ronghang Hu*, Volkan Cirik*, Anna Rohrbach, Jacob Andreas, Louis-Philippe Morency, Taylor Berg-Kirkpatrick, Kate Saenko, Dan Klein**, and Trevor Darrell**.
    NeurIPS 2018, * and ** indicate equal contribution.
  • Video object segmentation with language referring expressions.
    Anna Khoreva, Anna Rohrbach, and Bernt Schiele.
    ACCV 2018.
  • Object hallucination in image captioning.
    Anna Rohrbach*, Lisa Anne Hendricks*, Kaylee Burns, Trevor Darrell, and Kate Saenko.
    EMNLP 2018, * indicates equal contribution.
  • Women also Snowboard: Overcoming Bias in Captioning Models.
    Lisa Anne Hendricks*, Kaylee Burns*, Kate Saenko, Trevor Darrell, and Anna Rohrbach.
    ECCV 2018, * indicates equal contribution.
  • Textual explanations for self-driving vehicles.
    Jinkyu Kim, Anna Rohrbach, Trevor Darrell, John Canny, and Zeynep Akata.
    ECCV 2018.
  • Multimodal explanations: Justifying decisions and pointing to the evidence.
    Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Anna Rohrbach, Bernt Schiele, Trevor Darrell, and Marcus Rohrbach.
    CVPR 2018, Spotlight.
  • Fooling vision and language models despite localization and attention mechanisms.
    Xiaojun Xu, Xinyun Chen, Chang Liu, Anna Rohrbach, Trevor Darrell, and Dawn Song.
    CVPR 2018.
  • Generating descriptions with grounded and co-referenced people.
    Anna Rohrbach, Marcus Rohrbach, Siyu Tang, Seong Joon Oh, and Bernt Schiele.
    CVPR 2017.
  • Grounding of textual phrases in images by reconstruction.
    Anna Rohrbach, Marcus Rohrbach, Ronghang Hu, Trevor Darrell, and Bernt Schiele.
    ECCV 2016, Oral.
  • Multimodal compact bilinear pooling for visual question answering and visual grounding.
    Akira Fukui*, Dong Huk Park*, Daylen Yang*, Anna Rohrbach*, Trevor Darrell, and Marcus Rohrbach.
    EMNLP 2016, * indicates equal contribution.
  • A dataset for movie description.
    Anna Rohrbach, Marcus Rohrbach, Niket Tandon, and Bernt Schiele.
    CVPR 2015.

The complete list of publications is available on my Google Scholar profile.