Zhiyuan Yan
📖 PhD Student, 2024.09–2028.07 (expected)
🏫 Peking University, previously at CUHK-SZ
💡 Multimodal, AIGC Detection, AIGC, AI4Science
Email / Github / Scholar / OpenReview
About Me
  • I am a second-year CS PhD student at Peking University (PKU), supervised by Prof. Li Yuan (Deep Learning) and Prof. Fanyang Mo (AI4Science). I have published 10+ CCF-A papers (as first/co-first author) at top international AI conferences, with 1000+ Google Scholar citations in total, including an oral paper at ICML 2025 (top 1%).

  • I have extensive industrial experience with multimodal foundation models. I am currently part of the Meituan LongCat Multimodal Foundation Model Team through the Beidou Program, where I am responsible for developing the native unified multimodal system. Previously, I worked at Baidu ERNIE (working directly with the CTO) through the “ERNIE Star Program,” and before that, I was a research intern at Tencent Youtu Lab and Tencent AI Lab.

  • My current research interests are: (1) Multimodal, especially unified multimodal generation and understanding: My representative work in this area is UAE, a unified multimodal system with large-scale long-context SFT and RL-based post-training designed for unified multimodal generation and understanding. Through this work, I have gained extensive hands-on experience with, and deep insights into, SFT, RL, and data curation for unified models. (2) AI for Science: Leveraging advanced multimodal techniques to address key challenges in scientific domains, especially organic chemistry, such as structure elucidation from spectroscopy and molecular representation modeling.

  • My previous research direction was AIGC detection and deepfake detection: I developed generalizable and interpretable methods for detecting AI-generated images and videos. My work in this area includes six first-author papers accepted at CCF-A venues. My most impactful contribution, DeepfakeBench, integrates 37 SOTA detection methods and 13 datasets into a unified modular codebase with standardized training and evaluation protocols, and has nearly 1k stars on GitHub.

  • Research Highlights (🧑‍💻 Co-first Author, 📮 Corresponding Author)
    See Google Scholar for my full publication list.
    Unified Multimodal Model as Auto-Encoder
    Zhiyuan Yan, et al.
    ArXiv, 2025
    arXiv / Project Page
    Orthogonal Subspace Decomposition for Generalizable AI-Generated Image Detection
    Zhiyuan Yan, et al.
    ICML, Oral 🏆, 2025
    arXiv / Project Page
    GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation
    Zhiyuan Yan, et al.
    Tech Report, 2025
    arXiv / Project Page
    ImgEdit: A Unified Image Editing Dataset and Benchmark
    Yang Ye 🧑‍💻, et al., Zhiyuan Yan 🧑‍💻
    NeurIPS, 2025
    arXiv / Project Page
    Navigating Chemical-Linguistic Sharing Space with Heterogeneous Molecular Encoding
    Liuzhenghao Lv, et al., Zhiyuan Yan
    Nature Communications (in review), 2024
    arXiv / Project Page
    DeepfakeBench: A Comprehensive Benchmark of Deepfake Detection
    Zhiyuan Yan, et al.
    NeurIPS, 2023
    arXiv / Project Page