Preprints

2026

  1. How You Move Tells What You’ll Do: Trajectory-Conditioned Egocentric Prediction
    Sejoon Jun, Hai Nguyen-Truong, Luigi Seminara, and Lorenzo Torresani
    2026
  2. RECIPE: Procedural Planning via Grounding in Instructional Video
    Luigi Seminara, Antonino Furnari, and Lorenzo Torresani
    2026
  3. EvoGround: Self-Evolving Video Agents for Video Temporal Grounding
    Minjoon Jung, Byoung-Tak Zhang, and Lorenzo Torresani
    2026

Publications

2025

  1. PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding
    J. H. Cho, A. Madotto, E. Mavroudi, T. Afouras, T. Nagarajan, M. Maaz, Y. Song, and others
    In Advances in Neural Information Processing Systems, Spotlight (<3.5%), 2025
  2. BIMBA: Selective-Scan Compression for Long-Range Video Question Answering
    M. M. Islam, T. Nagarajan, H. Wang, G. Bertasius, and Lorenzo Torresani
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
  3. Enrich and Detect: Video Temporal Grounding with Multimodal LLMs
    S. Pramanick, E. Mavroudi, Y. Song, R. Chellappa, Lorenzo Torresani, and T. Afouras
    In IEEE/CVF International Conference on Computer Vision (ICCV), Highlight (<2.5%), 2025
  4. VITED: Video Temporal Evidence Distillation
    Y. Lu, Y. Song, W. Wang, Lorenzo Torresani, and T. Nagarajan
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025

2024

  1. UNICORN: A Unified Causal Video-Oriented Language-Modeling Framework for Temporal Video-Language Tasks
    Y. Xiong, Y. Nie, H. Liu, B. Wang, J. Chen, R. Jin, C.-J. Hsieh, Lorenzo Torresani, and J. Lei
    In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
  2. 4Diff: 3D-Aware Diffusion Model for Third-to-First Viewpoint Translation
    F. Cheng, M. Luo, H. Wang, A. Dimakis, Lorenzo Torresani, G. Bertasius, and others
    In European Conference on Computer Vision (ECCV), 2024
  3. Video ReCap: Recursive Captioning of Hour-Long Videos
    M. M. Islam, N. Ho, X. Yang, T. Nagarajan, Lorenzo Torresani, and G. Bertasius
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
  4. Learning to Segment Referred Objects from Narrated Egocentric Videos
    Y. Shen, H. Wang, X. Yang, M. Feiszli, E. Elhamifar, Lorenzo Torresani, and E. Mavroudi
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
  5. Step Differences in Instructional Video
    T. Nagarajan and Lorenzo Torresani
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
  6. Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives
    K. Grauman, A. Westbury, Lorenzo Torresani, K. Kitani, J. Malik, T. Afouras, and others
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024