Publications


Preprint

[P2]
Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
Yilong Zhao, Chien-Yu Lin, Kan Zhu, Zihao Ye, Lequn Chen, Size Zheng*, Luis Ceze, Arvind Krishnamurthy, Tianqi Chen, Baris Kasikci. arXiv 2023 [PDF] [link]
[P1]
HASCO: Towards Agile HArdware and Software CO-design for Tensor Computation
Qingcheng Xiao, Size Zheng*, Bingzhe Wu, Pengcheng Xu, Xuehai Qian, Yun Liang. CoRR 2021 [PDF] [link]

Journal

[J3]
Rubick: A Unified Infrastructure for Analyzing, Exploring, and Implementing Spatial Architectures via Dataflow Decomposition
Liqiang Lu, Zizhang Luo, Size Zheng*, Jieming Yin, Jason Cong, Yun Liang, Jianwei Yin. TCAD 2023 [PDF] [link]
[J2]
NeoFlow: A Flexible Framework for Enabling Efficient Compilation for High Performance DNN Training
Size Zheng*, Renze Chen, Yicheng Jin, Anjiang Wei, Bingyang Wu, Xiuhong Li, Shengen Yan, Yun Liang. TPDS 2021 [PDF] [link]
[J1]
Accelerating convolutional neural networks on FPGAs (in Chinese)
Liqiang Lu, Size Zheng*, Qingcheng Xiao, Deming Chen, Yun Liang. SCIENTIA SINICA Informationis 2019 [PDF] [link]

Conference

[C15]
SpecPIM: Accelerating Speculative Inference on PIM-Enabled System via Architecture-Dataflow Co-Exploration
Cong Li, Zhe Zhou, Size Zheng*, Jiaxi Zhang, Yun Liang, Guangyu Sun. to appear in ASPLOS 2024 [PDF] [link]
[C14]
MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN
Renze Chen, Zijian Ding, Size Zheng*, Chengrui Zhang, Jingwen Leng, Xuanzhe Liu, Yun Liang. to appear in ASPLOS 2024 [PDF] [link]
[C13]
vMCU: Coordinated Memory Management and Kernel Optimization for DNN Inference on MCUs
Size Zheng*, Renze Chen, Meng Li, Zihao Ye, Luis Ceze, Yun Liang. to appear in MLSys 2024 [PDF] [link]
[C12]
Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
Yilong Zhao, Chien-Yu Lin, Kan Zhu, Zihao Ye, Lequn Chen, Size Zheng*, Luis Ceze, Arvind Krishnamurthy, Tianqi Chen, Baris Kasikci. to appear in MLSys 2024 [PDF] [link]
[C11]
SpREM: Exploiting Hamming Sparsity for Fast Quantum Readout Error Mitigation
Hanyu Zhang, Liqiang Lu, Siwei Tan, Size Zheng*, Jia Yu, Jianwei Yin. to appear in DAC 2024 [PDF] [link]
[C10]
MoteNN: Memory Optimization via Fine-grained Scheduling for Deep Neural Networks on Tiny Devices
Renze Chen, Zijian Ding, Size Zheng*, Meng Li, Yun Liang. to appear in DAC 2024 [PDF] [link]
[C9]
TileFlow: A Framework for Modeling Fusion Dataflow via Tree-based Analysis
Size Zheng*, Siyuan Chen, Siyuan Gao, Liancheng Jia, Guangyu Sun, Runsheng Wang, Yun Liang. MICRO 2023 [PDF] [link]
[C8]
ARES: A Mapping Framework of DNNs towards Diverse PIMs with General Abstractions
Xiuping Cui, Size Zheng*, Tianyu Jia, Le Ye, Yun Liang. ICCAD 2023 [PDF] [link]
[C7]
Memory and Computation Coordinated Mapping of DNNs onto Complex Heterogeneous SoC
Size Zheng*, Siyuan Chen, Yun Liang. DAC 2023 [PDF] [link]
[C6]
Rubick: A Synthesis Framework for Spatial Architectures via Dataflow Decomposition
Zizhang Luo, Liqiang Lu, Size Zheng*, Jieming Yin, Jason Cong, Jianwei Yin, Yun Liang. DAC 2023 [PDF] [link]
[C5]
Chimera: An Analytical Optimizing Framework for Effective Compute-intensive Operators Fusion
Size Zheng*, Siyuan Chen, Peidi Song, Renze Chen, Xiuhong Li, Shengen Yan, Dahua Lin, Jingwen Leng, Yun Liang. HPCA 2023 [PDF] [link]
[C4]
AMOS: Enabling Automatic Mapping for Tensor Computations On Spatial Accelerators with Hardware Abstraction
Size Zheng*, Renze Chen, Anjiang Wei, Yicheng Jin, Qin Han, Liqiang Lu, Bingyang Wu, Xiuhong Li, Shengen Yan, Yun Liang. ISCA 2022 [PDF] [link]
[C3]
HASCO: Towards Agile HArdware and Software CO-design for Tensor Computation
Qingcheng Xiao, Size Zheng*, Bingzhe Wu, Pengcheng Xu, Xuehai Qian, Yun Liang. ISCA 2021 [PDF] [link]
[C2]
SuSy: A Programming Model for Productive Construction of High-Performance Systolic Arrays on FPGAs
Yi-Hsiang Lai, Hongbo Rong, Size Zheng*, Weihao Zhang, Xiuping Cui, Yunshan Jia, Jie Wang, Brendan Sullivan, Zhiru Zhang, Yun Liang, Youhui Zhang, Jason Cong, Nithin George, Jose Alvarez, Christopher J. Hughes, Pradeep Dubey. ICCAD 2020 [PDF] [link]
[C1]
FlexTensor: An Automatic Schedule Exploration and Optimization Framework for Tensor Computation on Heterogeneous System
Size Zheng*, Yun Liang, Shuo Wang, Renze Chen, Kaiwen Sheng. ASPLOS 2020 [PDF] [link]