About

I am a machine learning systems research scientist at ByteDance, where I am part of the TopSeed program. I am also a postdoctoral researcher at Tsinghua University, working with Professor Jidong Zhai on distributed machine learning compilers for LLMs. I lead the Triton-distributed project at ByteDance.

I completed my Ph.D. in the School of CS at Peking University, where I was advised by Prof. Yun Liang. From September 2023 to January 2024, I was a visiting Ph.D. student in SAMPL at the University of Washington, where I worked with Professor Luis Ceze on LLM serving and optimization. My recent publications investigate new algorithms, abstractions, and frameworks for efficient code generation on CPU and GPU. My research has appeared in MICRO, ASPLOS, ISCA, HPCA, TPDS, DAC, and MLSys. I received my B.S. degree from the Department of Computer Intelligence Science at Peking University. I am a PC member of ChinaSys, a reviewer for TPDS and TACO, and a sub-reviewer for MICRO, PPoPP, MLSys, ICS, and ICCAD.

Email: zheng.size [AT] bytedance.com or zhengsz [AT] pku.edu.cn or zhengsz [AT] mail.tsinghua.edu.cn

I am seeking highly motivated full-time employees and research interns. If interested, please contact me directly!

Research Interests

High-performance Inference System: System design for large language and vision models
AI Compiler: Compiler design for the next generation of accelerators
Distributed Systems: Computation-communication co-optimization and automation

Awards

July 2024 Outstanding Doctoral Dissertation Award of Peking University
July 2024 Outstanding Ph.D. Graduate of both Beijing and Peking University

News

May 2025 Our project Triton-distributed has been released
May 2025 Our papers ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference and MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design have been accepted by ICML 2025!
March 2025 Our paper Qtenon: Towards Low-Latency Architecture Integration for Accelerating Hybrid Quantum-Classical Computing has been accepted by ISCA 2025!
February 2025 Our paper Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts has been accepted by MLSys 2025!
February 2025 Our paper TileLink: Generating Efficient Compute-Communication Overlapping Kernels using Tile-Centric Primitives has been accepted by MLSys 2025!
January 2025 Joined the Tsinghua-ByteDance Joint PostDoc Program
December 2024 Visited Vinod at NVIDIA Redmond and gave a technical talk
September 2024 Our paper ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction has been accepted by NeurIPS 2024!
August 2024 Joined ByteDance as a TopSeed researcher

Si-Ze Zheng
