About

I lead the Triton-distributed project at ByteDance Seed.

I completed my Ph.D. in the School of CS at Peking University, where I was advised by Prof. Yun Liang. From September 2023 to January 2024, I was a visiting Ph.D. student in SAMPL at the University of Washington, where I worked with Prof. Luis Ceze on LLM serving and optimization. Afterward, I briefly worked at DeepSeek AI as a research intern. In 2024, I joined ByteDance Seed as a Machine Learning System Research Scientist. My recent publications investigate new algorithms, abstractions, and frameworks for efficient training and inference on CPU and GPU. My research has appeared at MICRO, ASPLOS, ISCA, HPCA, TPDS, DAC, and MLSys. I received my B.S. degree from the Department of Computer Intelligence Science at Peking University. I am a PC member of ChinaSys; a reviewer for ICLR, TPDS, and TACO; and a sub-reviewer for MICRO, PPoPP, MLSys, ICS, and ICCAD.

Email: zheng.size [AT] outlook.com

Research Interests

High-performance Inference System: System design for large language and vision models
AI Compiler: Compiler design for the next generation of accelerators
Distributed Systems: Computation-communication co-optimization and automation

Awards

July 2024 Outstanding Doctoral Dissertation Award of Peking University
July 2024 Outstanding Ph.D. Graduate of both Beijing and Peking University

News

May 2025 Our project Triton-distributed has been released
May 2025 Our papers ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference and MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design have been accepted by ICML 2025!
March 2025 Our paper Qtenon: Towards Low-Latency Architecture Integration for Accelerating Hybrid Quantum-Classical Computing has been accepted by ISCA 2025!
February 2025 Our paper Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts has been accepted by MLSys 2025!
February 2025 Our paper TileLink: Generating Efficient Compute-Communication Overlapping Kernels using Tile-Centric Primitives has been accepted by MLSys 2025!
December 2024 Visited Vinod at NVIDIA Redmond and gave a technical talk
September 2024 Our paper ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction has been accepted by NeurIPS 2024!

Si-Ze Zheng
