About

I am a machine learning systems research scientist at ByteDance, where I am part of the TopSeed program. I am also a postdoc at Tsinghua University, working with Professor Jidong Zhai on distributed machine learning compilers for LLMs.

I completed my Ph.D. in the School of CS at Peking University, where I was advised by Prof. Yun Liang. From September 2023 to January 2024, I was a visiting Ph.D. student in SAMPL at the University of Washington, where I worked with Professor Luis Ceze on LLM serving and optimization. My recent publications investigate new algorithms, abstractions, and frameworks for efficient code generation on CPU and GPU. My research has appeared at MICRO, ASPLOS, ISCA, HPCA, TPDS, DAC, and MLSys. I received my B.S. degree from the department of Computer Intelligence Science at Peking University. I serve as a PC member of ChinaSys, a reviewer for TPDS and TACO, and a sub-reviewer for MICRO, PPoPP, MLSys, ICS, and ICCAD.

Email: zheng.size [AT] bytedance.com or zhengsz [AT] pku.edu.cn or zhengsz [AT] mail.tsinghua.edu.cn

I am seeking highly motivated full-time employees and research interns. If interested, please contact me directly!

Research Interests

  • High-performance Inference System: System design for large language and vision models
  • AI Compiler: Compiler design for the next generation of accelerators
  • Distributed Systems: Computation-communication co-optimization and automation

Awards

  • July 2024 Outstanding Doctoral Dissertation Award of Peking University
  • July 2024 Outstanding Ph.D. Graduate of both Beijing and Peking University

News

  • March 2025 Our paper Qtenon: Towards Low-Latency Architecture Integration for Accelerating Hybrid Quantum-Classical Computing has been accepted by ISCA 2025!
  • February 2025 Our paper Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts has been accepted by MLSys 2025!
  • February 2025 Our paper TileLink: Generating Efficient Compute-Communication Overlapping Kernels using Tile-Centric Primitives has been accepted by MLSys 2025!
  • January 2025 Joined the Tsinghua-ByteDance Joint PostDoc Program
  • December 2024 Visited Vinod at NVIDIA Redmond and gave a technical talk
  • September 2024 Our paper ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction has been accepted by NeurIPS 2024!
  • August 2024 Joined ByteDance as a TopSeed researcher