SGLang is an open-source, high-performance inference engine for serving large language models (LLMs) and multimodal models at scale. It runs on over 400,000 GPUs worldwide, with deployments at xAI, NVIDIA, Microsoft Azure, and others. RadixArk, the company commercializing SGLang (founded by Ying Sheng and Banghua Zhu and backed by $100M in seed funding), extends it with an end-to-end AI infrastructure platform covering inference, training, and post-training pipelines.
“The 2026 serving stack, vLLM versus SGLang, and which one fits your workload.”
“Our team did substantial engineering optimization and successfully ran both the inference and RL pipelines on the day DeepSeek V4 was released.”
Source: AI-extracted from podcast, newsletter, and paper summaries. May contain errors.