Open Source Projects

Triton-distributed

⭐ 1.2k stars MIT License

Triton-distributed is a distributed compiler based on Triton for parallel systems. It provides a set of easy-to-use primitives to support the development of distributed compute-communication overlapping kernels, enabling efficient parallel computation on modern AI systems.

The project offers both low-level and high-level primitives for programming communication kernels, so users can combine communication with computation to design overlapping kernels. Triton-distributed can achieve performance comparable to or better than that of hand-tuned libraries.
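To make the idea of compute-communication overlap concrete, here is a minimal conceptual sketch in plain Python. It does not use the Triton-distributed API; `fetch_chunk`, `compute`, and `overlapped_sum_of_squares` are hypothetical names chosen for illustration, and a background thread stands in for the communication engine. The pattern is to request the next chunk before processing the current one, so the transfer and the computation can proceed at the same time.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_chunk(remote, i):
    """Hypothetical communication step: copy chunk i from a 'remote' buffer."""
    return list(remote[i])


def compute(chunk):
    """Hypothetical compute step: sum of squares over one chunk."""
    return sum(x * x for x in chunk)


def overlapped_sum_of_squares(remote):
    """Process chunks while the next chunk is being fetched in the background."""
    if not remote:
        return 0
    total = 0
    # A single worker thread plays the role of the communication engine.
    with ThreadPoolExecutor(max_workers=1) as comm:
        pending = comm.submit(fetch_chunk, remote, 0)
        for i in range(len(remote)):
            chunk = pending.result()  # wait for the transfer of chunk i
            if i + 1 < len(remote):
                # Start fetching chunk i + 1 before computing on chunk i.
                pending = comm.submit(fetch_chunk, remote, i + 1)
            total += compute(chunk)
    return total


result = overlapped_sum_of_squares([[1, 2], [3, 4], [5]])
```

In a real distributed kernel the transfer would run on dedicated hardware (for example, copy engines or network interfaces) rather than a Python thread, so this sketch shows only the scheduling pattern, not the achievable speedup.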

Key Features

Low-level and high-level primitives for distributed compute-communication overlapping kernels
Support for single-node and cross-node operations (GEMM, MoE, Flash-Decoding)
High performance: comparable to or better than hand-tuned libraries
Easy-to-use API for programming communication kernels
Support for multiple backends (NVIDIA, AMD GPUs)
Comprehensive documentation and tutorials

📦 GitHub Repository 📚 Documentation

Other Open Source Projects

FlexTensor

MIT License

FlexTensor is an automatic schedule exploration and optimization framework for tensor computation on heterogeneous systems. It optimizes tensor computation programs without human intervention, so programmers can work at a high-level programming abstraction without considering hardware platform details.

FlexTensor systematically explores the optimization design space, which is composed of many different schedules for different hardware. It then combines several exploration techniques, including heuristic and machine learning methods, to find an optimized schedule configuration.
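As a rough illustration of what exploring a schedule space means, the sketch below searches tile sizes for a matrix multiplication. It is not FlexTensor's API or algorithm: the cost model `toy_cost`, the cache size, and every function name are assumptions made for this example, and random sampling followed by greedy hill climbing stands in for the heuristic and machine learning methods mentioned above.

```python
import itertools
import random


def divisors(n):
    """All positive divisors of n in ascending order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def schedule_space(dims):
    """Every (tile_m, tile_n, tile_k) that evenly divides the problem size."""
    return list(itertools.product(*(divisors(d) for d in dims)))


def toy_cost(schedule, dims, cache_elems=4096):
    """Hypothetical cost: elements moved, penalized if a tile set overflows cache."""
    tm, tn, tk = schedule
    m, n, k = dims
    footprint = tm * tk + tk * tn + tm * tn
    traffic = (m // tm) * (n // tn) * (k // tk) * footprint
    return traffic * 10 if footprint > cache_elems else traffic


def neighbors(schedule, dims):
    """Schedules that differ by one step in one tile size."""
    for axis, dim in enumerate(dims):
        divs = divisors(dim)
        idx = divs.index(schedule[axis])
        for j in (idx - 1, idx + 1):
            if 0 <= j < len(divs):
                yield schedule[:axis] + (divs[j],) + schedule[axis + 1:]


def explore(dims, trials=20, seed=0):
    """Random sampling to pick a start point, then greedy hill climbing."""
    rng = random.Random(seed)
    space = schedule_space(dims)
    samples = rng.sample(space, min(trials, len(space)))
    best = min(samples, key=lambda s: toy_cost(s, dims))
    improved = True
    while improved:
        improved = False
        for cand in neighbors(best, dims):
            if toy_cost(cand, dims) < toy_cost(best, dims):
                best, improved = cand, True
                break
    return best


dims = (64, 64, 64)
best = explore(dims)
```

The returned schedule is only a local optimum under the toy cost model. A real framework would measure or predict performance on the target hardware instead.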

📦 GitHub Repository

compiler-and-arch

compiler-and-arch is a curated list of tutorials, papers, talks, and open-source projects for emerging compiler and architecture research. This repository serves as a comprehensive resource for researchers and practitioners interested in compiler design and computer architecture.

📦 GitHub Repository

AMOS

AMOS (Automatic Mapping Generation, Verification, and Exploration for ISA-based Spatial Accelerators) is a framework for automatic mapping generation and optimization for spatial accelerators. It provides tools for exploring and optimizing mappings for various hardware accelerators.

📦 GitHub Repository

MatmulTutorial

MatmulTutorial is an easy-to-understand TensorOp Matmul tutorial that provides comprehensive guides for understanding matrix multiplication operations on modern accelerators. It offers detailed explanations and examples for implementing efficient matrix multiplication kernels.

📦 GitHub Repository

Si-Ze Zheng
