Publications


Preprint

[P3]
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
Hanshi Sun, Li-Wen Chang, Wenlei Bao, Size Zheng, Ningxin Zheng, Xin Liu, Harry Dong, Yuejie Chi, Beidi Chen. arXiv 2024 [PDF] [link]
[P2]
Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
Yilong Zhao, Chien-Yu Lin, Kan Zhu, Zihao Ye, Lequn Chen, Size Zheng, Luis Ceze, Arvind Krishnamurthy, Tianqi Chen, Baris Kasikci. arXiv 2023 [PDF] [link]
[P1]
HASCO: Towards Agile HArdware and Software CO-design for Tensor Computation
Qingcheng Xiao, Size Zheng, Bingzhe Wu, Pengcheng Xu, Xuehai Qian, Yun Liang. CoRR 2021 [PDF] [link]

Journal

[J3]
Rubick: A Unified Infrastructure for Analyzing, Exploring, and Implementing Spatial Architectures via Dataflow Decomposition
Liqiang Lu, Zizhang Luo, Size Zheng, Jieming Yin, Jason Cong, Yun Liang, Jianwei Yin. TCAD 2023 [PDF] [link]
[J2]
NeoFlow: A Flexible Framework for Enabling Efficient Compilation for High Performance DNN Training
Size Zheng, Renze Chen, Yicheng Jin, Anjiang Wei, Bingyang Wu, Xiuhong Li, Shengen Yan, Yun Liang. TPDS 2021 [PDF] [link]
[J1]
Accelerating convolutional neural networks on FPGAs (in Chinese)
Liqiang Lu, Size Zheng, Qingcheng Xiao, Deming Chen, Yun Liang. SCIENTIA SINICA Informationis 2019 [PDF] [link]

Conference

[C16]
ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction
Renze Chen, Zhuofeng Wang, Beiquan Cao, Tong Wu, Size Zheng, Xiuhong Li, Xuechao Wei, Shengen Yan, Meng Li, Yun Liang. NeurIPS 2024 [PDF] [link]
[C15]
SpecPIM: Accelerating Speculative Inference on PIM-Enabled System via Architecture-Dataflow Co-Exploration
Cong Li, Zhe Zhou, Size Zheng, Jiaxi Zhang, Yun Liang, Guangyu Sun. ASPLOS 2024 [PDF] [link]
[C14]
MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN
Renze Chen, Zijian Ding, Size Zheng, Chengrui Zhang, Jingwen Leng, Xuanzhe Liu, Yun Liang. ASPLOS 2024 [PDF] [link]
[C13]
vMCU: Coordinated Memory Management and Kernel Optimization for DNN Inference on MCUs
Size Zheng, Renze Chen, Meng Li, Zihao Ye, Luis Ceze, Yun Liang. MLSys 2024 [PDF] [link]
[C12]
Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
Yilong Zhao, Chien-Yu Lin, Kan Zhu, Zihao Ye, Lequn Chen, Size Zheng, Luis Ceze, Arvind Krishnamurthy, Tianqi Chen, Baris Kasikci. MLSys 2024 [PDF] [link]
[C11]
SpREM: Exploiting Hamming Sparsity for Fast Quantum Readout Error Mitigation
Hanyu Zhang, Liqiang Lu, Siwei Tan, Size Zheng, Jia Yu, Jianwei Yin. DAC 2024 [PDF] [link]
[C10]
MoteNN: Memory Optimization via Fine-grained Scheduling for Deep Neural Networks on Tiny Devices
Renze Chen, Zijian Ding, Size Zheng, Meng Li, Yun Liang. DAC 2024 [PDF] [link]
[C9]
TileFlow: A Framework for Modeling Fusion Dataflow via Tree-based Analysis
Size Zheng, Siyuan Chen, Siyuan Gao, Liancheng Jia, Guangyu Sun, Runsheng Wang, Yun Liang. MICRO 2023 [PDF] [link]
[C8]
ARES: A Mapping Framework of DNNs towards Diverse PIMs with General Abstractions
Xiuping Cui, Size Zheng, Tianyu Jia, Le Ye, Yun Liang. ICCAD 2023 [PDF] [link]
[C7]
Memory and Computation Coordinated Mapping of DNNs onto Complex Heterogeneous SoC
Size Zheng, Siyuan Chen, Yun Liang. DAC 2023 [PDF] [link]
[C6]
Rubick: A Synthesis Framework for Spatial Architectures via Dataflow Decomposition
Zizhang Luo, Liqiang Lu, Size Zheng, Jieming Yin, Jason Cong, Jianwei Yin, Yun Liang. DAC 2023 [PDF] [link]
[C5]
Chimera: An Analytical Optimizing Framework for Effective Compute-intensive Operators Fusion
Size Zheng, Siyuan Chen, Peidi Song, Renze Chen, Xiuhong Li, Shengen Yan, Dahua Lin, Jingwen Leng, Yun Liang. HPCA 2023 [PDF] [link]
[C4]
AMOS: Enabling Automatic Mapping for Tensor Computations On Spatial Accelerators with Hardware Abstraction
Size Zheng, Renze Chen, Anjiang Wei, Yicheng Jin, Qin Han, Liqiang Lu, Bingyang Wu, Xiuhong Li, Shengen Yan, Yun Liang. ISCA 2022 [PDF] [link]
[C3]
HASCO: Towards Agile HArdware and Software CO-design for Tensor Computation
Qingcheng Xiao, Size Zheng, Bingzhe Wu, Pengcheng Xu, Xuehai Qian, Yun Liang. ISCA 2021 [PDF] [link]
[C2]
SuSy: A Programming Model for Productive Construction of High-Performance Systolic Arrays on FPGAs
Yi-Hsiang Lai, Hongbo Rong, Size Zheng, Weihao Zhang, Xiuping Cui, Yunshan Jia, Jie Wang, Brendan Sullivan, Zhiru Zhang, Yun Liang, Youhui Zhang, Jason Cong, Nithin George, Jose Alvarez, Christopher J. Hughes, Pradeep Dubey. ICCAD 2020 [PDF] [link]
[C1]
FlexTensor: An Automatic Schedule Exploration and Optimization Framework for Tensor Computation on Heterogeneous System
Size Zheng, Yun Liang, Shuo Wang, Renze Chen, Kaiwen Sheng. ASPLOS 2020 [PDF] [link]