Projects
An overview of our research directions.
Featured

LLM Serving
Systems for low-latency, high-throughput LLM serving, including emerging paradigms such as compound AI systems, agentic workflows, and inference-time scaling.

Novel Model Architectures
Cross-stack optimizations and innovations for State Space Models (SSMs, e.g., Mamba) and Attention-SSM hybrid models.

ML for Systems
Applying machine learning to systems problems such as caching, resource management, and network protocol design.

Accelerator Toolkit
Expanding the accelerator toolkit for AI serving.