Graphics Processing Units (GPUs) are no longer just for gaming or graphics rendering. In today’s cloud environments, GPUs power artificial intelligence (AI), machine learning (ML), high-performance computing (HPC), and even virtual desktops.
For cloud service providers (CSPs), system integrators, and enterprises, choosing the right GPUs can directly influence performance, scalability, and cost-efficiency. This guide will help you understand GPU options and how to choose the right one for your cloud workloads.
Why GPUs in the Cloud?
Cloud environments are evolving beyond basic compute and storage. Customers increasingly demand GPU-backed services for:
- AI/ML & Deep Learning: Model training, inferencing, computer vision, and natural language processing.
- Virtual Desktops (VDI): GPU acceleration for graphics-intensive applications.
- Cloud Gaming & Streaming: High-quality rendering and video delivery.
- Media & Graphics Production: 3D rendering, editing, and transcoding.
- HPC & Simulations: Scientific modeling, engineering simulations, and large-scale data processing.
- Enterprise & SaaS: GPU-powered apps, Dev/Test environments, and education workloads.
For CSPs, offering GPUs means new revenue streams, differentiated services, and meeting data sovereignty requirements while giving customers self-service access to specialized hardware.
GPU Deployment Models
Apache CloudStack and similar cloud platforms now integrate GPU management into IaaS clouds, but the right GPU type depends heavily on the workload. Here are the common GPU deployment and usage models:
1. Passthrough GPUs
- One or more full physical GPUs are dedicated to an instance (virtual machine).
- Best for workloads requiring large VRAM, low latency, or real-time processing (e.g., robotics, video transcoding, complex AI training).
- Offers strong isolation and performance but limited scalability.
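On KVM/libvirt, passthrough is typically done by binding the GPU to vfio-pci and attaching it to the guest as a hostdev. A minimal sketch of the guest XML follows; the PCI address is a placeholder for your actual GPU, and the host must boot with the IOMMU enabled (e.g. intel_iommu=on or amd_iommu=on):

```xml
<!-- Attach the physical GPU at the (placeholder) PCI address 0000:3b:00.0
     to the guest via VFIO. The device must be bound to the vfio-pci
     driver on the host before the guest starts. -->
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x3b' slot='0x00' function='0x0'/>
  </source>
</hostdev>
```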
2. Shared / Virtualised GPUs
Multiple tenants share the underlying GPU hardware via virtualisation technologies. The mechanism differs between vendors and GPU models; common variants include:
- SR-IOV (AMD MxGPU, Intel Flex) – Hardware-based virtual functions with strong isolation.
- vGPU (NVIDIA) – Software-driven GPU partitions for AI inferencing, VDI, and media workloads.
- MIG (NVIDIA) – Multi-Instance GPU providing fully isolated GPU slices (ideal for multi-tenant AI).
- Time-Sliced Sharing – Simpler, lower-concurrency GPU sharing suitable for bursty workloads.
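To see why partitioned sharing such as MIG matters for multi-tenancy, here is a small illustrative Python sketch. It is a toy model of fixed, isolated GPU slices, not a real vendor API; the sizes and tenant names are made up:

```python
from dataclasses import dataclass, field

@dataclass
class PartitionedGpu:
    """Toy model of MIG-style partitioning: a GPU is split into
    fixed-size, isolated slices that tenants claim exclusively."""
    total_mem_gb: int
    slice_mem_gb: int
    allocations: dict = field(default_factory=dict)  # tenant -> slice count

    @property
    def total_slices(self) -> int:
        return self.total_mem_gb // self.slice_mem_gb

    @property
    def free_slices(self) -> int:
        return self.total_slices - sum(self.allocations.values())

    def allocate(self, tenant: str, slices: int) -> bool:
        # Unlike time-slicing, capacity is hard-partitioned: a request
        # that exceeds the remaining slices is refused outright.
        if slices > self.free_slices:
            return False
        self.allocations[tenant] = self.allocations.get(tenant, 0) + slices
        return True

# An 80 GB GPU split into 10 GB slices yields 8 isolated instances.
gpu = PartitionedGpu(total_mem_gb=80, slice_mem_gb=10)
assert gpu.allocate("tenant-a", 3)
assert gpu.allocate("tenant-b", 5)
assert not gpu.allocate("tenant-c", 1)  # GPU fully partitioned
```

Time-sliced sharing would instead queue tenant-c behind the others, trading isolation and predictable performance for higher admission.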
| Comparison Criteria | Passthrough | SR-IOV | MDEV (Mediated Device) |
|---|---|---|---|
| GPU access and technology | VM → direct access to the GPU through the hypervisor; the GPU is passed through to the instance as a PCI device | VM → connected via Virtual Functions (VFs); the GPU exposes a Physical Function (PF) split into multiple VFs | VM gets a shared GPU or vGPU slice via mdev emulation; the host OS partitions the GPU using a vendor driver (NVIDIA vGPU, Intel/AMD mdev) |
| I/O | Uses IOMMU for secure DMA and device isolation | Managed by the hypervisor + IOMMU | Uses VFIO-mdev with IOMMU protection (needs vendor drivers) |
| Multi-tenancy | Single tenant | Multi-tenant (hardware-dependent) | Multi-tenant (flexible partitioning) |
| Performance | High | High / near-native | Near-native / fair |
| Isolation | Strong | Strong | Hardware-dependent |
Key GPU Vendors
These are the current and upcoming GPU vendors in the ecosystem:
- NVIDIA: Current market leader, offering the CUDA ecosystem with strong AI/ML and vGPU support.
- AMD: Growing contender with ROCm stack and MxGPU virtualisation.
- Intel: Emerging player with oneAPI and Flex GPUs for SR-IOV virtualisation.
- Apple: Niche player with the Metal-based API/stack, mainly for consumer devices.
- Others: Qualcomm (Adreno) and a few others in Android/mobile or proprietary ecosystems.
| | NVIDIA | AMD | Intel |
|---|---|---|---|
| Platform/Tech. | CUDA | ROCm | oneAPI |
| Virtualisation | vGPU (GRID) | MxGPU (SR-IOV) | Basic/Flex |
| Current standing | Market leader | Growing contender | Catching up |
| Ecosystem | PyTorch, TensorFlow… | Improving support in PyTorch, etc. | Lagging behind |
Choosing the Right GPU for Your Workload
When selecting GPUs, match workload requirements with the right GPU type, for example:
- High-performance AI training & real-time apps → Passthrough high-end GPUs (e.g., NVIDIA H100, A100).
- AI inferencing, VDI, and virtual apps → Shared GPUs (NVIDIA L40, A10, AMD MI300X, Intel Flex).
- Cloud Gaming, Education, Remote Apps → vGPU or SR-IOV-based shared GPUs.
- Enterprise-scale Multi-Tenant AI → NVIDIA MIG-enabled GPUs (A100, H100).
- Graphics/Media workloads → Quadro-based vGPUs (NVIDIA Q-series).
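The mapping above can be sketched as a simple lookup. This is an illustrative helper, not a CloudStack API; the workload category names are made up, and the GPU lists mirror the examples in the bullets:

```python
# Illustrative workload-to-GPU-type selector mirroring the guidance above.
# Categories and recommendations are examples, not an exhaustive catalogue.
RECOMMENDATIONS = {
    "ai-training":     ("passthrough", ["NVIDIA H100", "NVIDIA A100"]),
    "ai-inference":    ("shared",      ["NVIDIA L40", "NVIDIA A10", "AMD MI300X", "Intel Flex"]),
    "vdi":             ("shared",      ["NVIDIA L40", "NVIDIA A10", "Intel Flex"]),
    "cloud-gaming":    ("vgpu/sr-iov", ["NVIDIA vGPU", "Intel Flex"]),
    "multi-tenant-ai": ("mig",         ["NVIDIA A100", "NVIDIA H100"]),
    "graphics-media":  ("vgpu",        ["NVIDIA Q-series"]),
}

def recommend(workload: str) -> tuple[str, list[str]]:
    """Return (deployment model, example GPUs) for a workload category."""
    try:
        return RECOMMENDATIONS[workload]
    except KeyError:
        raise ValueError(f"unknown workload: {workload!r}")

model, gpus = recommend("ai-training")
print(model, gpus)  # passthrough ['NVIDIA H100', 'NVIDIA A100']
```

In a real cloud you would extend this with VRAM, licensing, and tenancy constraints rather than a flat lookup.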
Practical Considerations
When building a GPU-enabled IaaS cloud, also consider:
- NUMA placement – Improper GPU-CPU memory alignment can severely affect performance.
- Hypervisor support – Ensure GPU drivers align with chosen OS and hypervisor (e.g., RHEL, Ubuntu with KVM).
- Licensing & Ecosystem – NVIDIA’s vGPU licensing vs. AMD/Intel’s open approaches.
- Scalability & Limits – Your chosen IaaS or cloud management platform will typically enforce tenant limits; for example, Apache CloudStack provides limit/quota controls for GPU resource usage (account.gpus, max.project.gpus).
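The NUMA point above can be made concrete with a small sketch. On Linux, each PCI device reports its NUMA node in /sys/bus/pci/devices/&lt;addr&gt;/numa_node; this illustrative check (not part of any scheduler) flags a GPU and VM placed on different nodes:

```python
# Illustrative NUMA-affinity check: pinning a guest's vCPUs and memory
# to the same NUMA node as its GPU avoids cross-socket traffic on DMA.
def numa_aligned(gpu_numa_node: int, vm_numa_node: int) -> bool:
    """Return True when the GPU and the VM share a NUMA node.
    A node value of -1 means the platform reports no NUMA locality."""
    if gpu_numa_node == -1 or vm_numa_node == -1:
        return True  # nothing to align against
    return gpu_numa_node == vm_numa_node

# A GPU on node 1 paired with a VM pinned to node 0 would pay
# cross-node memory latency on every transfer.
assert numa_aligned(1, 1)
assert not numa_aligned(1, 0)
assert numa_aligned(-1, 0)
```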
KVM is a popular and growing hypervisor choice in the IaaS/cloud space. A key consideration is whether your GPU vendor, model, and virtualisation technology are supported on KVM, whether via passthrough or one of the sharing mechanisms:
| GPU Considerations with KVM | MDEV | SR-IOV | VFIO Passthrough |
|---|---|---|---|
| Sharing | Yes | Yes | No |
| Uses VFIO | Yes (vfio_mdev) | Yes (vfio-pci) | Yes (vfio-pci) |
| Granularity | Fine-grained software-defined slices | Hardware-based VFs | Entire physical device |
| Device support | Software-defined (via driver) | Hardware-defined | Full passthrough |
| Needs IOMMU | Yes | Yes | Yes |
| Examples | NVIDIA vGPU | AMD MxGPU, Intel Flex | Full GPU passthrough |
| Guest driver | Vendor vGPU driver | Vendor driver | Vendor driver |
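For the MDEV column, a vGPU slice is attached to a libvirt/KVM guest as an mdev hostdev. A minimal sketch follows; the UUID is a placeholder for an mdev instance created beforehand (e.g. with mdevctl, using a type your vendor driver supports):

```xml
<!-- Attach a previously created mediated device (vGPU slice) to the guest.
     The UUID below is a placeholder; create the mdev instance first with
     the vendor's supported mdev type. -->
<hostdev mode='subsystem' type='mdev' model='vfio-pci'>
  <source>
    <address uuid='4b20d080-1b54-4048-85b3-a6a62d165c01'/>
  </source>
</hostdev>
```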
Challenges & The Road Ahead
While GPU integration into IaaS clouds is advancing rapidly, challenges remain:
- Testing and validation across diverse GPU hardware.
- Improving GPU resource management and orchestration.
- Supporting advanced features like live migration of GPU-enabled instances.
- Expanding multi-hypervisor support and richer GPU metrics.
- Inconsistent virtualisation technologies and specifications across vendors.
Future cloud platforms will continue to mature GPU integration, making GPU-backed workloads as seamless as traditional CPU and storage provisioning.
Conclusion
Choosing the right GPU for your cloud depends on balancing performance, scalability, and workload type. Passthrough GPUs excel at raw power and isolation, while shared and virtualized GPUs enable multi-tenant efficiency and flexibility. NVIDIA, AMD, and Intel each offer unique strengths, and the decision ultimately rests on the nature of workloads you plan to support.
By carefully aligning GPU types with workload demands, cloud providers and enterprises can unlock new opportunities in AI, VDI, HPC, and beyond — building clouds that are ready for the next generation of compute.
Rohit Yadav oversees the Software Engineering function at ShapeBlue, providing leadership and mentorship to our ever-growing Engineering Team. He has been a PMC member of the project since 2015. Rohit is the author & maintainer of the CloudStack CloudMonkey project and has been instrumental in the development of many of CloudStack’s flagship features. Rohit regularly speaks at events, focussing on developer access to the project, and has also mentored Google Summer of Code students.