About
I am a Senior Staff Research Scientist at ByteDance Seed. I also work with Professor Jidong Zhai as a postdoc at Tsinghua University.
My current research interests include Agentic Compilation, Kernel Agent Design and LLM Post-training, High-performance Inference Systems and Kernel Implementation for accelerators, and Distributed Compiler Design and Kernel Optimization. I lead the Triton-distributed project at ByteDance Seed.
I completed my Ph.D. in the School of CS at Peking University, where I was advised by Prof. Yun Liang. From September 2023 to January 2024, I worked with Professor Luis Ceze on LLM serving and optimization as a visiting Ph.D. student in SAMPL at the University of Washington. After that, I worked briefly at DeepSeek AI as a research intern. My publications investigate new algorithms, abstractions, and frameworks for efficient training and inference on CPU and GPU. My research has appeared in MICRO, ASPLOS, ISCA, HPCA, HPDC, ICML, TPDS, DAC, and MLSys. I am a PC member of MICRO 2026 and ChinaSys 2025. I have served as a reviewer for ICML (golden award), ICLR, TPDS, TACO, and DAC. I received my B.S. degree from the department of Computer Intelligence Science at Peking University.
Email: zheng.size [AT] outlook.com
Research Interests
- Agentic Compilation and Kernel Agent Design: Building intelligent compilation workflows and kernel-generation agents for modern AI systems
- LLM Post-training and High-performance Inference Systems: Optimizing large language models from post-training to efficient serving on accelerators
- Distributed Compiler Design and Kernel Optimization: Co-designing distributed compilers, runtime systems, and accelerator kernels
Awards
- July 2024 Outstanding Doctoral Dissertation Award of Peking University
- July 2024 Outstanding Ph.D. Graduate of both Beijing and Peking University
News
- May 2026 Our paper DITRON: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs has been accepted by ICML 2026!
- April 2026 Our paper UniEP: Unified Expert-Parallel MoE MegaKernel for LLM Training has been accepted by HPDC 2026!
- May 2026 Serving as a PC member of MICRO 2026
- May 2025 ByteDance Seed released our project Triton-distributed
- May 2025 Our papers ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference and MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design have been accepted by ICML 2025!
- March 2025 Our paper Qtenon: Towards Low-Latency Architecture Integration for Accelerating Hybrid Quantum-Classical Computing has been accepted by ISCA 2025!
- February 2025 Our paper Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts has been accepted by MLSys 2025!
- February 2025 Our paper TileLink: Generating Efficient Compute-Communication Overlapping Kernels using Tile-Centric Primitives has been accepted by MLSys 2025!
- December 2024 Visited Vinod at Nvidia Redmond and gave a technical talk
- September 2024 Our paper ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction has been accepted by NeurIPS 2024!
