Graphics Processing Units (GPUs) are no longer just for gaming or graphics rendering. In today’s cloud environments, GPUs power artificial intelligence (AI), machine learning (ML), high-performance computing (HPC), and even virtual desktops.
For cloud service providers (CSPs), system integrators, and enterprises, choosing the right GPUs can directly influence performance, scalability, and cost-efficiency. This guide will help you understand GPU options and how to choose the right one for your cloud workloads.
Why GPUs in the Cloud?
Cloud environments are evolving beyond basic compute and storage. Customers increasingly demand GPU-backed services for:
- AI/ML & Deep Learning: Model training, inferencing, computer vision, and natural language processing.
- Virtual Desktops (VDI): GPU acceleration for graphics-intensive applications.
- Cloud Gaming & Streaming: High-quality rendering and video delivery.
- Media & Graphics Production: 3D rendering, editing, and transcoding.
- HPC & Simulations: Scientific modeling, engineering simulations, and large-scale data processing.
- Enterprise & SaaS: GPU-powered apps, Dev/Test environments, and education workloads.
For CSPs, offering GPUs means new revenue streams, differentiated services, and meeting data sovereignty requirements while giving customers self-service access to specialized hardware.
GPU Deployment Models
Apache CloudStack and similar cloud platforms now integrate GPU management into IaaS clouds, but the right GPU type depends heavily on the workload. Here are the common GPU deployment and usage models:
1. Passthrough GPUs
- One or more full physical GPUs are dedicated to an instance (virtual machine).
- Best for workloads requiring large VRAM, low latency, or real-time processing (e.g., robotics, video transcoding, complex AI training).
- Offers strong isolation and performance but limited scalability.
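On KVM/libvirt, passthrough is typically done by binding the GPU to vfio-pci and attaching it to the guest as a hostdev. A minimal sketch of the guest XML follows; the PCI address is a placeholder for your actual GPU, and the host must boot with the IOMMU enabled (e.g. intel_iommu=on or amd_iommu=on):

```xml
<!-- Attach the physical GPU at the (placeholder) PCI address 0000:3b:00.0
     to the guest via VFIO. The device must be bound to the vfio-pci
     driver on the host before the guest starts. -->
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x3b' slot='0x00' function='0x0'/>
  </source>
</hostdev>
```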
2. Shared / Virtualised GPUs
Multiple tenants share the underlying GPU hardware via virtualisation technologies. The mechanism differs between vendors and GPU models; common variants include:
- SR-IOV (AMD MxGPU, Intel Flex) – Hardware-based virtual functions with strong isolation.
- vGPU (NVIDIA) – Software-driven GPU partitions for AI inferencing, VDI, and media workloads.
- MIG (NVIDIA) – Multi-Instance GPU providing fully isolated GPU slices (ideal for multi-tenant AI).
- Time-Sliced Sharing – Simpler, lower-concurrency GPU sharing suitable for bursty workloads.
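To see why partitioned sharing such as MIG matters for multi-tenancy, here is a small illustrative Python sketch. It is a toy model of fixed, isolated GPU slices, not a real vendor API; the sizes and tenant names are made up:

```python
from dataclasses import dataclass, field

@dataclass
class PartitionedGpu:
    """Toy model of MIG-style partitioning: a GPU is split into
    fixed-size, isolated slices that tenants claim exclusively."""
    total_mem_gb: int
    slice_mem_gb: int
    allocations: dict = field(default_factory=dict)  # tenant -> slice count

    @property
    def total_slices(self) -> int:
        return self.total_mem_gb // self.slice_mem_gb

    @property
    def free_slices(self) -> int:
        return self.total_slices - sum(self.allocations.values())

    def allocate(self, tenant: str, slices: int) -> bool:
        # Unlike time-slicing, capacity is hard-partitioned: a request
        # that exceeds the remaining slices is refused outright.
        if slices > self.free_slices:
            return False
        self.allocations[tenant] = self.allocations.get(tenant, 0) + slices
        return True

# An 80 GB GPU split into 10 GB slices yields 8 isolated instances.
gpu = PartitionedGpu(total_mem_gb=80, slice_mem_gb=10)
assert gpu.allocate("tenant-a", 3)
assert gpu.allocate("tenant-b", 5)
assert not gpu.allocate("tenant-c", 1)  # GPU fully partitioned
```

Time-sliced sharing would instead queue tenant-c behind the others, trading isolation and predictable performance for higher admission.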
| Comparison Criteria | Passthrough | SR-IOV | MDEV (Mediated Device) |
|---|---|---|---|
| GPU access and technology | VM → direct access to the GPU through the hypervisor; the GPU is passed through to the instance as a PCI device | VM → connected via Virtual Functions (VFs); the GPU exposes a Physical Function (PF) split into multiple VFs | VM gets a shared GPU or vGPU slice via mdev emulation; the host OS partitions the GPU using a vendor driver (NVIDIA vGPU, Intel/AMD mdev) |
| I/O | Uses IOMMU for secure DMA and device isolation | Managed by the hypervisor + IOMMU | Uses VFIO-mdev with IOMMU protection (needs vendor drivers) |
| Multi-tenancy | Single tenant | Multi-tenant (hardware-dependent) | Multi-tenant (flexible partitioning) |
| Performance | High | High / near-native | Near-native / fair |
| Isolation | Strong | Strong | Hardware-dependent |
Key GPU Vendors
These are the current and upcoming GPU vendors in the ecosystem:
- NVIDIA: Current market leader, offering the CUDA ecosystem with strong AI/ML and vGPU support.
- AMD: Growing contender with ROCm stack and MxGPU virtualisation.
- Intel: Emerging player with oneAPI and Flex GPUs for SR-IOV virtualisation.
- Apple: Niche player with the Metal-based API/stack, mainly for consumer devices.
- Others: Qualcomm (Adreno) and a few others in Android/mobile or proprietary ecosystems.
| | NVIDIA | AMD | Intel |
|---|---|---|---|
| Platform/Tech. | CUDA | ROCm | oneAPI |
| Virtualisation | vGPU (GRID) | MxGPU (SR-IOV) | Basic/Flex |
| Current standing | Market leader | Growing contender | Catching up |
| Ecosystem | PyTorch, TensorFlow… | Improving support in PyTorch, etc. | Lagging behind |
Choosing the Right GPU for Your Workload
When selecting GPUs, match workload requirements with the right GPU type, for example:
- High-performance AI training & real-time apps → Passthrough high-end GPUs (e.g., NVIDIA H100, A100).
- AI inferencing, VDI, and virtual apps → Shared GPUs (NVIDIA L40, A10, AMD MI300X, Intel Flex).
- Cloud Gaming, Education, Remote Apps → vGPU or SR-IOV-based shared GPUs.
- Enterprise-scale Multi-Tenant AI → NVIDIA MIG-enabled GPUs (A100, H100).
- Graphics/Media workloads → Quadro-based vGPUs (NVIDIA Q-series).
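The mapping above can be sketched as a simple lookup. This is an illustrative helper, not a CloudStack API; the workload category names are made up, and the GPU lists mirror the examples in the bullets:

```python
# Illustrative workload-to-GPU-type selector mirroring the guidance above.
# Categories and recommendations are examples, not an exhaustive catalogue.
RECOMMENDATIONS = {
    "ai-training":     ("passthrough", ["NVIDIA H100", "NVIDIA A100"]),
    "ai-inference":    ("shared",      ["NVIDIA L40", "NVIDIA A10", "AMD MI300X", "Intel Flex"]),
    "vdi":             ("shared",      ["NVIDIA L40", "NVIDIA A10", "Intel Flex"]),
    "cloud-gaming":    ("vgpu/sr-iov", ["NVIDIA vGPU", "Intel Flex"]),
    "multi-tenant-ai": ("mig",         ["NVIDIA A100", "NVIDIA H100"]),
    "graphics-media":  ("vgpu",        ["NVIDIA Q-series"]),
}

def recommend(workload: str) -> tuple[str, list[str]]:
    """Return (deployment model, example GPUs) for a workload category."""
    try:
        return RECOMMENDATIONS[workload]
    except KeyError:
        raise ValueError(f"unknown workload: {workload!r}")

model, gpus = recommend("ai-training")
print(model, gpus)  # passthrough ['NVIDIA H100', 'NVIDIA A100']
```

In a real cloud you would extend this with VRAM, licensing, and tenancy constraints rather than a flat lookup.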
Practical Considerations
When building a GPU-enabled IaaS cloud, also consider:
- NUMA placement – Improper GPU-CPU memory alignment can severely affect performance.
- Hypervisor support – Ensure GPU drivers align with chosen OS and hypervisor (e.g., RHEL, Ubuntu with KVM).
- Licensing & Ecosystem – NVIDIA’s vGPU licensing vs. AMD/Intel’s open approaches.
- Scalability & Limits – Your chosen IaaS or cloud management platform will typically enforce tenant limits; for example, Apache CloudStack provides limit/quota controls for GPU resource usage (account.gpus, max.project.gpus).
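The NUMA point above can be made concrete with a small sketch. On Linux, each PCI device reports its NUMA node in /sys/bus/pci/devices/&lt;addr&gt;/numa_node; this illustrative check (not part of any scheduler) flags a GPU and VM placed on different nodes:

```python
# Illustrative NUMA-affinity check: pinning a guest's vCPUs and memory
# to the same NUMA node as its GPU avoids cross-socket traffic on DMA.
def numa_aligned(gpu_numa_node: int, vm_numa_node: int) -> bool:
    """Return True when the GPU and the VM share a NUMA node.
    A node value of -1 means the platform reports no NUMA locality."""
    if gpu_numa_node == -1 or vm_numa_node == -1:
        return True  # nothing to align against
    return gpu_numa_node == vm_numa_node

# A GPU on node 1 paired with a VM pinned to node 0 would pay
# cross-node memory latency on every transfer.
assert numa_aligned(1, 1)
assert not numa_aligned(1, 0)
assert numa_aligned(-1, 0)
```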
KVM is a popular and growing hypervisor choice in the IaaS/cloud space. A key consideration is whether your GPU vendor, model, and virtualisation technology are supported on KVM, whether via passthrough or one of the sharing mechanisms:
| GPU Considerations with KVM | MDEV | SR-IOV | VFIO Passthrough |
|---|---|---|---|
| Sharing | Yes | Yes | No |
| Uses VFIO | Yes (vfio_mdev) | Yes (vfio-pci) | Yes (vfio-pci) |
| Granularity | Fine-grained software-defined slices | Hardware-based VFs | Entire physical device |
| Device support | Software-defined (via driver) | Hardware-defined | Full passthrough |
| Needs IOMMU | Yes | Yes | Yes |
| Examples | NVIDIA vGPU | AMD MxGPU, Intel Flex | Full GPU passthrough |
| Guest driver | Vendor vGPU driver | Vendor driver | Vendor driver |
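For the MDEV column, a vGPU slice is attached to a libvirt/KVM guest as an mdev hostdev. A minimal sketch follows; the UUID is a placeholder for an mdev instance created beforehand (e.g. with mdevctl, using a type your vendor driver supports):

```xml
<!-- Attach a previously created mediated device (vGPU slice) to the guest.
     The UUID below is a placeholder; create the mdev instance first with
     the vendor's supported mdev type. -->
<hostdev mode='subsystem' type='mdev' model='vfio-pci'>
  <source>
    <address uuid='4b20d080-1b54-4048-85b3-a6a62d165c01'/>
  </source>
</hostdev>
```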
Challenges & The Road Ahead
While GPU integration into IaaS clouds is advancing rapidly, challenges remain:
- Testing and validation across diverse GPU hardware.
- Improving GPU resource management and orchestration.
- Supporting advanced features like live migration of GPU-enabled instances.
- Expanding multi-hypervisor support and richer GPU metrics.
- Inconsistent virtualisation technologies and specifications across vendors.
Future cloud platforms will continue to mature GPU integration, making GPU-backed workloads as seamless as traditional CPU and storage provisioning.
Conclusion
Choosing the right GPU for your cloud depends on balancing performance, scalability, and workload type. Passthrough GPUs excel at raw power and isolation, while shared and virtualized GPUs enable multi-tenant efficiency and flexibility. NVIDIA, AMD, and Intel each offer unique strengths, and the decision ultimately rests on the nature of workloads you plan to support.
By carefully aligning GPU types with workload demands, cloud providers and enterprises can unlock new opportunities in AI, VDI, HPC, and beyond — building clouds that are ready for the next generation of compute.
Rohit Yadav oversees the Software Engineering function at ShapeBlue, providing leadership and mentorship to our ever-growing Engineering Team. He has been a PMC member of the project since 2015. Rohit is the author & maintainer of the CloudStack CloudMonkey project and has been instrumental in the development of many of CloudStack’s flagship features. Rohit regularly speaks at events, focussing on developer access to the project, and has also mentored Google Summer of Code students.