Red Hat and AWS team up to deliver high-performance, hardware-agnostic GenAI inference at enterprise scale.
Red Hat has expanded its partnership with AWS to bring enterprise-grade GenAI inference to organizations that need consistent performance across diverse hardware environments. The collaboration tackles a core challenge for IT teams: scaling GenAI workloads without being locked into specific chips or architectures. By aligning Red Hat AI with AWS’s custom silicon, the companies aim to give enterprises a unified inference layer that can run next-generation models at lower cost and higher speed.
The initiative centers on Red Hat AI Inference Server, powered by vLLM, now optimized for AWS Inferentia2 and Trainium3. These chips are designed for high-throughput, low-latency inference, enabling production deployments with up to 30–40% better price performance than comparable GPU instances. This shift addresses the mounting pressure on enterprises to run inference at scale as model sizes grow and workload intensity increases. It also gives teams a way to standardize model serving across hybrid environments without re-architecting each deployment.
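To make the serving model concrete, here is a minimal sketch of what standing up a vLLM-based endpoint typically looks like; the model name and port are illustrative assumptions, not details from the announcement, and the exact flags for Inferentia/Trainium targets depend on the Neuron-enabled build being used.

```shell
# Hedged sketch: launch vLLM's OpenAI-compatible server for a single model.
# Model name and port are placeholders; hardware-specific options (e.g. for
# AWS Neuron devices) vary by build and are omitted here.
vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000

# Any OpenAI-style client can then query the endpoint:
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Llama-3.1-8B-Instruct",
       "prompt": "Summarize our Q3 results.",
       "max_tokens": 32}'
```

Because the endpoint speaks the OpenAI API shape, the same client code can target the server whether it is backed by GPUs or AWS silicon, which is the portability point the collaboration emphasizes.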
Red Hat has also infused GenAI capabilities across OpenShift, including a new AWS Neuron operator, OpenShift AI integrations, and OpenShift Service on AWS. These updates simplify how teams provision accelerators, schedule workloads, and orchestrate AI services. Enterprises gain a supported pathway to deploy LLMs, tune them, batch workloads, or run retrieval-augmented generation without deep platform engineering. The amazon.ai Certified Ansible Collection further reduces operational friction by automating model deployment and configuration at scale.
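As one hedged sketch of how workloads land on accelerator nodes once the Neuron operator is installed: AWS's Neuron device plugin advertises Inferentia/Trainium devices to Kubernetes as the `aws.amazon.com/neuron` extended resource, so a pod requests one like any other device. The image and pod names below are placeholders, not part of the announcement.

```shell
# Hedged sketch: schedule an inference pod onto a Neuron-equipped node by
# requesting the aws.amazon.com/neuron resource exposed by the Neuron
# device plugin. Image and names are illustrative placeholders.
cat <<'EOF' | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: genai-inference
spec:
  containers:
  - name: inference-server
    image: example.com/inference-server:latest   # placeholder image
    resources:
      limits:
        aws.amazon.com/neuron: 1   # one Inferentia/Trainium device
EOF
```

This is the provisioning step the operator and OpenShift AI integrations are meant to simplify: the scheduler, not the application team, matches the request to a node with free accelerator capacity.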
Both companies are also contributing upstream to improve vLLM support for AWS silicon, reinforcing open-source foundations for scalable inference. The work powers llm-d, an open project enabling distributed inference and now commercially supported in OpenShift AI 3. This expanded collaboration positions Red Hat and AWS to meet rising demand for efficient GenAI, giving enterprises flexible, cost-optimized infrastructure for modern AI across cloud and on-prem environments.