Triton-distributed: Programming Overlapping Kernels on Distributed AI Systems with the Triton Compiler Size Zheng, Wenlei Bao, Qi Hou, Xuegui Zheng, Jin Fang, Chenhui Huang, Tianqi Li, Haojie Duanmu, Renze Chen, Ruifan Xu, Yifan Guo, Ningxin Zheng, Ziheng Jiang, Xinyi Di, Dongyang Wang, Jianxi Ye, Haibin Lin, Li-Wen Chang, Liqiang Lu, Yun Liang, Jidong Zhai, Xin Liu. arXiv 2025[PDF][link]
[Preprint 4]
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model DeepSeek-AI. arXiv 2024[PDF][link]
[Preprint 3]
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference Hanshi Sun, Li-Wen Chang, Wenlei Bao, Size Zheng, Ningxin Zheng, Xin Liu, Harry Dong, Yuejie Chi, Beidi Chen. arXiv 2024[PDF][link]
[Preprint 2]
Atom: Low-Bit Quantization for Efficient and Accurate LLM Serving Yilong Zhao, Chien-Yu Lin, Kan Zhu, Zihao Ye, Lequn Chen, Size Zheng, Luis Ceze, Arvind Krishnamurthy, Tianqi Chen, Baris Kasikci. arXiv 2023[PDF][link]
[Preprint 1]
HASCO: Towards Agile HArdware and Software CO-design for Tensor Computation Qingcheng Xiao, Size Zheng, Bingzhe Wu, Pengcheng Xu, Xuehai Qian, Yun Liang. CoRR 2021[PDF][link]
DynaMo: Runtime Switchable Quantization for MoE with Cross-Dataset Adaptation Zihao Zheng, Xiuping Cui, Size Zheng, Maoliang Li, Jiayu Chen, Yun Liang, Xiang Chen. DATE 2026[PDF][link]
[Conference 25]
LATIAS: A General Architecture-Operator Model for Spatial Accelerators with Complex Topology and Memory Hierarchy Chengrui Zhang, Liancheng Jia, Chu Wang, Tianqi Li, Renze Chen, Xiuping Cui, Size Zheng, Shengen Yan, Xiuhong Li, Yu Wang, Xiang Chen, Yun Liang. DATE 2026[PDF][link]
[Conference 24]
MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production Chao Jin, Ziheng Jiang, Zhihao Bai, Zheng Zhong, Juncai Liu, Xiang Li, Ningxin Zheng, Xi Wang, Cong Xie, Qi Huang, Wen Heng, Yiyuan Ma, Wenlei Bao, Size Zheng, Yanghua Peng, Haibin Lin, Xuanzhe Liu, Xin Jin, Xin Liu. EuroSys 2026[PDF][link]
[Conference 23]
SnakeMan: Applying Relation-centric Notation to Model and Optimize Data Swizzle in the Cache of Modern NPU Hanyu Zhang, Fangxu Guo, Liqiang Lu, Long Wang, Yunfei Du, Zhe Wang, Jinghan Zhang, Jie Zhang, Chenli Xue, Chengpeng Wu, Ziyi Zhang, Yun Liang, Size Zheng, Jianwei Yin. HPCA 2026[PDF][link]
SpecPIM: Accelerating Speculative Inference on PIM-Enabled System via Architecture-Dataflow Co-Exploration Cong Li, Zhe Zhou, Size Zheng, Jiaxi Zhang, Yun Liang, Guangyu Sun. ASPLOS 2024[PDF][link]
[Conference 13]
MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN Renze Chen, Zijian Ding, Size Zheng, Chengrui Zhang, Jingwen Leng, Xuanzhe Liu, Yun Liang. ASPLOS 2024[PDF][link]
[Conference 12]
MoteNN: Memory Optimization via Fine-grained Scheduling for Deep Neural Networks on Tiny Devices Renze Chen, Zijian Ding, Size Zheng, Meng Li, Yun Liang. DAC 2024[PDF][link]
[Conference 11]
Atom: Low-Bit Quantization for Efficient and Accurate LLM Serving Yilong Zhao, Chien-Yu Lin, Kan Zhu, Zihao Ye, Lequn Chen, Size Zheng, Luis Ceze, Arvind Krishnamurthy, Tianqi Chen, Baris Kasikci. MLSys 2024[PDF][link]
[Conference 10]
SpREM: Exploiting Hamming Sparsity for Fast Quantum Readout Error Mitigation Hanyu Zhang, Liqiang Lu, Siwei Tan, Size Zheng, Jia Yu, Jianwei Yin. DAC 2024[PDF][link]
[Conference 9]
TileFlow: A Framework for Modeling Fusion Dataflow via Tree-based Analysis Size Zheng, Siyuan Chen, Siyuan Gao, Liancheng Jia, Guangyu Sun, Runsheng Wang, Yun Liang. MICRO 2023[PDF][link]
[Conference 8]
Memory and Computation Coordinated Mapping of DNNs onto Complex Heterogeneous SoC Size Zheng, Siyuan Chen, Yun Liang. DAC 2023[PDF][link]
ARES: A Mapping Framework of DNNs towards Diverse PIMs with General Abstractions Xiuping Cui, Size Zheng, Tianyu Jia, Le Ye, Yun Liang. ICCAD 2023[PDF][link]
[Conference 5]
Rubick: A Synthesis Framework for Spatial Architectures via Dataflow Decomposition Zizhang Luo, Liqiang Lu, Size Zheng, Jieming Yin, Jason Cong, Jianwei Yin, Yun Liang. DAC 2023[PDF][link]
HASCO: Towards Agile HArdware and Software CO-design for Tensor Computation Qingcheng Xiao, Size Zheng, Bingzhe Wu, Pengcheng Xu, Xuehai Qian, Yun Liang. ISCA 2021[PDF][link]
[Conference 2]
FlexTensor: An Automatic Schedule Exploration and Optimization Framework for Tensor Computation on Heterogeneous System Size Zheng, Yun Liang, Shuo Wang, Renze Chen, Kaiwen Sheng. ASPLOS 2020[PDF][link][Google Scholar]
[Conference 1]
SuSy: A Programming Model for Productive Construction of High-Performance Systolic Arrays on FPGAs Yi-Hsiang Lai, Hongbo Rong, Size Zheng, Weihao Zhang, Xiuping Cui, Yunshan Jia, Jie Wang, Brendan Sullivan, Zhiru Zhang, Yun Liang, Youhui Zhang, Jason Cong, Nithin George, Jose Alvarez, Christopher J. Hughes, Pradeep Dubey. ICCAD 2020[PDF][link]
At ByteDance Seed
First Author
[Preprint 1]
Triton-distributed: Programming Overlapping Kernels on Distributed AI Systems with the Triton Compiler Size Zheng, Wenlei Bao, Qi Hou, Xuegui Zheng, Jin Fang, Chenhui Huang, Tianqi Li, Haojie Duanmu, Renze Chen, Ruifan Xu, Yifan Guo, Ningxin Zheng, Ziheng Jiang, Xinyi Di, Dongyang Wang, Jianxi Ye, Haibin Lin, Li-Wen Chang, Liqiang Lu, Yun Liang, Jidong Zhai, Xin Liu. arXiv 2025[PDF][link]
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference Hanshi Sun, Li-Wen Chang, Wenlei Bao, Size Zheng, Ningxin Zheng, Xin Liu, Harry Dong, Yuejie Chi, Beidi Chen. arXiv 2024[PDF][link]
At DeepSeek
Co-Author
[Preprint 1]
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model DeepSeek-AI. arXiv 2024[PDF][link]
At University of Washington
Co-Author
[Conference 1]
Atom: Low-Bit Quantization for Efficient and Accurate LLM Serving Yilong Zhao, Chien-Yu Lin, Kan Zhu, Zihao Ye, Lequn Chen, Size Zheng, Luis Ceze, Arvind Krishnamurthy, Tianqi Chen, Baris Kasikci. MLSys 2024[PDF][link]
[Preprint 1]
Atom: Low-Bit Quantization for Efficient and Accurate LLM Serving Yilong Zhao, Chien-Yu Lin, Kan Zhu, Zihao Ye, Lequn Chen, Size Zheng, Luis Ceze, Arvind Krishnamurthy, Tianqi Chen, Baris Kasikci. arXiv 2023[PDF][link]
At Peking University
First Author
[Conference 6]
vMCU: Coordinated Memory Management and Kernel Optimization for DNN Inference on MCUs Size Zheng, Renze Chen, Meng Li, Zihao Ye, Luis Ceze, Yun Liang. MLSys 2024[PDF][link]
[Conference 5]
TileFlow: A Framework for Modeling Fusion Dataflow via Tree-based Analysis Size Zheng, Siyuan Chen, Siyuan Gao, Liancheng Jia, Guangyu Sun, Runsheng Wang, Yun Liang. MICRO 2023[PDF][link]
[Conference 4]
Memory and Computation Coordinated Mapping of DNNs onto Complex Heterogeneous SoC Size Zheng, Siyuan Chen, Yun Liang. DAC 2023[PDF][link]
SpecPIM: Accelerating Speculative Inference on PIM-Enabled System via Architecture-Dataflow Co-Exploration Cong Li, Zhe Zhou, Size Zheng, Jiaxi Zhang, Yun Liang, Guangyu Sun. ASPLOS 2024[PDF][link]
[Conference 6]
MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN Renze Chen, Zijian Ding, Size Zheng, Chengrui Zhang, Jingwen Leng, Xuanzhe Liu, Yun Liang. ASPLOS 2024[PDF][link]
[Conference 5]
MoteNN: Memory Optimization via Fine-grained Scheduling for Deep Neural Networks on Tiny Devices Renze Chen, Zijian Ding, Size Zheng, Meng Li, Yun Liang. DAC 2024[PDF][link]
[Journal 3]
Rubick: A Unified Infrastructure for Analyzing, Exploring, and Implementing Spatial Architectures via Dataflow Decomposition Liqiang Lu, Zizhang Luo, Size Zheng, Jieming Yin, Jason Cong, Yun Liang, Jianwei Yin. TCAD 2023[PDF][link]
[Conference 4]
ARES: A Mapping Framework of DNNs towards Diverse PIMs with General Abstractions Xiuping Cui, Size Zheng, Tianyu Jia, Le Ye, Yun Liang. ICCAD 2023[PDF][link]
[Conference 3]
Rubick: A Synthesis Framework for Spatial Architectures via Dataflow Decomposition Zizhang Luo, Liqiang Lu, Size Zheng, Jieming Yin, Jason Cong, Jianwei Yin, Yun Liang. DAC 2023[PDF][link]
[Preprint 1]
HASCO: Towards Agile HArdware and Software CO-design for Tensor Computation Qingcheng Xiao, Size Zheng, Bingzhe Wu, Pengcheng Xu, Xuehai Qian, Yun Liang. CoRR 2021[PDF][link]
[Journal 2]
NeoFlow: A Flexible Framework for Enabling Efficient Compilation for High Performance DNN Training Size Zheng, Renze Chen, Yicheng Jin, Anjiang Wei, Bingyang Wu, Xiuhong Li, Shengen Yan, Yun Liang. TPDS 2021[PDF][link]
[Conference 2]
HASCO: Towards Agile HArdware and Software CO-design for Tensor Computation Qingcheng Xiao, Size Zheng, Bingzhe Wu, Pengcheng Xu, Xuehai Qian, Yun Liang. ISCA 2021[PDF][link]
[Conference 1]
SuSy: A Programming Model for Productive Construction of High-Performance Systolic Arrays on FPGAs Yi-Hsiang Lai, Hongbo Rong, Size Zheng, Weihao Zhang, Xiuping Cui, Yunshan Jia, Jie Wang, Brendan Sullivan, Zhiru Zhang, Yun Liang, Youhui Zhang, Jason Cong, Nithin George, Jose Alvarez, Christopher J. Hughes, Pradeep Dubey. ICCAD 2020[PDF][link]
SnakeMan: Applying Relation-centric Notation to Model and Optimize Data Swizzle in the Cache of Modern NPU Hanyu Zhang, Fangxu Guo, Liqiang Lu, Long Wang, Yunfei Du, Zhe Wang, Jinghan Zhang, Jie Zhang, Chenli Xue, Chengpeng Wu, Ziyi Zhang, Yun Liang, Size Zheng, Jianwei Yin. HPCA 2026[PDF][link]