For a while, the CloudStack community has been working on adding support for containers. ShapeBlue successfully implemented the CloudStack Container Service and donated it to the project in 2016, but it was not completely integrated into the codebase. However, with the recent CloudStack 4.14 LTS release, the CloudStack Kubernetes Service (CKS) plugin adds full Kubernetes integration to CloudStack – allowing users to run containerized services using Kubernetes clusters through CloudStack.

CKS adds several new APIs (and updates to the UI) to provision Kubernetes clusters with minimal configuration by the user. It also provides the ability to add and manage different Kubernetes versions, meaning not only deploying clusters with a chosen version, but also the option to upgrade an existing cluster to a newer version.

The integration

CKS leverages CloudStack’s plugin framework and is disabled by default (for both fresh installs and upgrades) – it can be enabled using a global setting. It also adds global settings to set the template for Kubernetes cluster node virtual machines on different hypervisors; to set the default network offering for a new Kubernetes cluster network; and to set different timeout values for the lifecycle operations of a Kubernetes cluster:

cloud.kubernetes.service.enabled Indicates whether the CKS plugin is enabled or not (management server restart needed)
Name of the template to be used for creating Kubernetes cluster nodes on HyperV
Name of the template to be used for creating Kubernetes cluster nodes on KVM
Name of the template to be used for creating Kubernetes cluster nodes on VMware
Name of the template to be used for creating Kubernetes cluster nodes on XenServer
Name of the network offering that will be used to create the isolated network in which Kubernetes cluster VMs will be launched
cloud.kubernetes.cluster.start.timeout Timeout interval (in seconds) in which start operation for a Kubernetes cluster should be completed
cloud.kubernetes.cluster.scale.timeout Timeout interval (in seconds) in which scale operation for a Kubernetes cluster should be completed
cloud.kubernetes.cluster.upgrade.timeout Timeout interval (in seconds) in which upgrade operation for a Kubernetes cluster should be completed*
cloud.kubernetes.cluster.experimental.features.enabled Indicates whether experimental features for Kubernetes clusters, such as a Docker private registry, are enabled or not

* There can be some variation in how strictly cloud.kubernetes.cluster.upgrade.timeout is obeyed, as the upgrade on a cluster node must finish (successfully or not) before CloudStack can report the status of the cluster upgrade.
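To illustrate how a lifecycle timeout such as cloud.kubernetes.cluster.start.timeout behaves, here is a hypothetical sketch of the polling logic (function and parameter names are invented for illustration, not part of CKS):

```python
import time

def wait_for_cluster_state(poll_state, desired="Running", timeout=3600, interval=5):
    """Poll a cluster-state callback until it reports the desired state or the
    timeout (in seconds, mirroring cloud.kubernetes.cluster.start.timeout) expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if poll_state() == desired:
            return True
        time.sleep(interval)
    return False  # the operation would be reported as failed
```

In CKS the equivalent check runs on the management server, which marks the operation failed once the configured timeout is exceeded.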

Once the initial configuration is complete and the plugin is enabled, the UI shows a new ‘Kubernetes Service’ tab and the new APIs become accessible:

Under the hood

Provisioning a Kubernetes cluster can in itself be a complex process, depending on the tool used (minikube, kubeadm, kubespray, etc.). CKS simplifies and automates the complete process, using the kubeadm tool for provisioning clusters and performing lifecycle operations. As the kubeadm documentation states:

Kubeadm performs the actions necessary to get a minimum viable cluster up and running. By design, it cares only about bootstrapping, not about provisioning machines. Likewise, installing various nice-to-have addons, like the Kubernetes Dashboard, monitoring solutions, and cloud-specific addons, is not in scope.

Therefore, all orchestration for cluster node virtual machines is taken care of by CloudStack, and it is only CloudStack that decides the host or storage for the node virtual machines. CKS uses the kubectl tool for communicating with the Kubernetes cluster to query its state, active nodes, version, etc. Kubectl is a command-line tool for controlling Kubernetes clusters.

For node virtual machines, CKS requires a CoreOS-based template. CoreOS has been chosen as it provides a Docker installation and the networking rules needed by Kubernetes. Considering the current state of CoreOS, support for a different host OS could be added in the future.

Networking for the Kubernetes cluster is provisioned using the Weave Net CNI plugin.

The prerequisites

To successfully provision a Kubernetes cluster using CKS, there are a few prerequisites and conditions that must be met:

  1. The template registered for a node virtual machine must be a public template.
  2. Currently supported Kubernetes versions are 1.11.x to 1.16.x. At present, v1.17 and above might not work due to their incompatibility with the weave-net plugin.
  3. A multi-master, HA cluster can be created using Kubernetes versions 1.16.x only.
  4. While creating a multi-master, HA cluster over a shared network, an external load-balancer must be manually set up. This load-balancer should have port-forwarding rules for SSH and Kubernetes API server access. CKS assumes SSH access to cluster nodes is available on ports 2222 to (2222 + cluster node count - 1). Similarly, for API access, port 6443 must be forwarded to the master nodes. On a CloudStack isolated network, these rules are provisioned automatically.
  5. Currently, only a CloudStack isolated or shared network can be used for deployment of a Kubernetes cluster. The network must have the Userdata service enabled.
  6. For CoreOS, a minimum of 2 CPU cores and 2GB of RAM is required for deployment of a virtual machine. Therefore, a suitable service offering must be created and used while deploying a Kubernetes cluster.
  7. Node virtual machines must have Internet access at the time of cluster provisioning, scale and upgrade operations, as kubeadm cannot perform certain cluster provisioning steps without it.
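The port-forwarding convention from prerequisite 4 can be sketched as follows (the function names are invented for illustration; CKS itself provisions the equivalent rules automatically on isolated networks):

```python
def node_ssh_port(node_index, base_port=2222):
    """External load-balancer port forwarded to SSH (22) on the node with the
    given zero-based index, per the CKS convention described above."""
    return base_port + node_index

def lb_forwarding_rules(node_count):
    """Sketch of the rules an external load-balancer for a shared-network,
    multi-master cluster would need: one SSH rule per node, plus the
    Kubernetes API server on port 6443 forwarded to the master nodes."""
    rules = [{"public_port": node_ssh_port(i), "private_port": 22, "node": i}
             for i in range(node_count)]
    rules.append({"public_port": 6443, "private_port": 6443, "node": "masters"})
    return rules
```

For a three-node cluster this yields SSH rules on 2222, 2223 and 2224, plus the API rule on 6443.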

The flow

After completing the initial configuration and confirming the requirements, an administrator can proceed with adding supported Kubernetes versions and deploying a Kubernetes cluster. Kubernetes versions can only be added and managed by an administrator; other users only have permission to list the supported versions. Each Kubernetes version in CKS is added as an ISO containing all the Kubernetes binaries and Docker images for the given Kubernetes release. Using an ISO with the required binaries allows faster installation of Kubernetes on the node virtual machines: kubeadm normally needs an active Internet connection on the master nodes during cluster provisioning, and shipping the binaries and Docker images on an ISO avoids downloading them from the Internet. To facilitate the creation of an ISO for a given Kubernetes release, a new script has been added to the cloudstack-common packages. More about this script can be found in the CloudStack documentation.

Add Kubernetes cluster form in CloudStack UI:

Once there is at least one enabled and ready Kubernetes version and the node VM template in place, CKS will be ready to deploy Kubernetes clusters, which can be created using either the UI or API. Several parameters such as Kubernetes version, compute offering, network, size, HA support, node VM root disk size, etc. can be configured while creating the cluster.

Different operations can be performed on a successfully created Kubernetes cluster, such as start/stop, retrieval of the cluster kubeconfig, scale, upgrade or destroy. Both the UI and the API provide the means to do so.

Kubernetes cluster details tab in CloudStack UI:

Once a Kubernetes cluster has been successfully provisioned, CKS deploys the Kubernetes Dashboard UI. A user can download the cluster’s kubeconfig and use it to access the cluster locally, or to deploy services on the cluster. Alternatively, the kubectl tool can be used along with the kubeconfig file to access the Kubernetes cluster via the command line. Instructions for both kubectl and Kubernetes Dashboard access are available on the Kubernetes cluster details page in CloudStack.

Kubernetes Dashboard UI accessible for Kubernetes clusters deployed with CKS:

The new APIs

CKS adds a number of new APIs for performing different operations on Kubernetes supported versions and Kubernetes clusters.

Kubernetes version related APIs:

addKubernetesSupportedVersion Available only to Admin, this API allows adding a new supported Kubernetes version
deleteKubernetesSupportedVersion Available only to Admin, this API allows deletion of an existing supported Kubernetes version
updateKubernetesSupportedVersion Available only to Admin, this API allows update of an existing supported Kubernetes version
listKubernetesSupportedVersions Lists Kubernetes supported versions

Kubernetes cluster related APIs:

createKubernetesCluster For creating a Kubernetes cluster
startKubernetesCluster For starting a stopped Kubernetes cluster
stopKubernetesCluster For stopping a running Kubernetes cluster
deleteKubernetesCluster For deleting a Kubernetes cluster
getKubernetesClusterConfig For retrieving Kubernetes cluster config
scaleKubernetesCluster For scaling a created, running or stopped Kubernetes cluster
upgradeKubernetesCluster For upgrading a running Kubernetes cluster
listKubernetesClusters For listing Kubernetes clusters
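Any of these APIs can be called programmatically using CloudStack's standard request signing (URL-encode and sort the parameters case-insensitively, lowercase the query string, HMAC-SHA1 it with the secret key, base64-encode the digest). A sketch follows; the endpoint and keys are placeholders:

```python
import base64
import hashlib
import hmac
import urllib.parse

def sign_request(params, secret_key):
    """Compute the CloudStack API signature for a parameter dict."""
    query = "&".join(
        f"{k}={urllib.parse.quote(str(v), safe='*')}"
        for k, v in sorted(params.items(), key=lambda kv: kv[0].lower())
    )
    digest = hmac.new(secret_key.encode(), query.lower().encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()

# Hypothetical call to listKubernetesClusters (endpoint and keys are placeholders):
params = {"command": "listKubernetesClusters", "response": "json", "apikey": "API_KEY"}
params["signature"] = sign_request(params, "SECRET_KEY")
url = "https://cloudstack.example.com/client/api?" + urllib.parse.urlencode(params)
```

The resulting URL can then be fetched with any HTTP client; the same signing scheme applies to every API listed above.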

The CloudStack Kubernetes Service adds a new dimension to CloudStack, allowing cloud operators to provide their users with Kubernetes offerings – but this is just the beginning! There are already ideas for improvements within the community, such as support for different CloudStack zone types, support for VPC networks, and use of a Debian-based or user-defined host OS template for node virtual machines. If you have an improvement to suggest, please log it in the CloudStack GitHub project.

More details about CloudStack Kubernetes Service can be found in the CloudStack documentation.

About the author

Abhishek Kumar is a Software Engineer at ShapeBlue, the Cloud Specialists. Apart from spending most of his time implementing new features and fixing bugs in Apache CloudStack, he likes reading about technology and politics. Outside work he spends most of his time with family and tries to work out regularly.

There is currently significant effort going on in the Apache CloudStack community to develop a new, modern UI (user interface) for CloudStack: Project Primate. In this article, I discuss why this new UI is required, the history of the project and how it will be included in future CloudStack releases.

There are a number of key dates that current users of CloudStack should take note of and plan for, which are listed towards the end of this article.


We also recently held a webinar on this subject.


The current CloudStack UI

The current UI for Apache CloudStack was developed in 2012/13 as a single browser page UI “handcrafted” in JavaScript. Despite becoming the familiar face of CloudStack, the UI has always had limitations, such as no browser history, poor rendering on tablets/phones and loss of context on refresh. Its look and feel, although good for when it was created, has become dated. However, by far the biggest issue with the existing UI is that its 90,000 lines of code have become very difficult to maintain and extend for new CloudStack functionality. This has resulted in some new CloudStack functionality being developed as API-only, and a disproportionate amount of effort being required to develop new UI functionality.

How to build a new UI for CloudStack?

A UI R&D project was undertaken by Rohit Yadav in early 2019. Rohit is the creator and maintainer of CloudMonkey (CloudStack CLI tool) and he set off to use the lessons he’d learnt creating CloudMonkey to evaluate the different options for creating a new UI for CloudStack.

Rohit’s initial R&D work identified a set of overall UI requirements and also a set of design principles.

UI Requirements:

  • Clean enterprise admin & user UI
  • Intuitive to use
  • To match existing CloudStack UI functionality and features
  • Separate UI code from core Management server code so the UI becomes a client to the CloudStack API
  • API auto-discovery of new CloudStack functionality
  • Config- and role-based rendering of buttons, actions, views etc.
  • Dashboard, list and detail views
  • URL router and browser history driven
  • Local-storage based notification and polling
  • Dynamic language translations
  • Support desktop, tablet and mobile screen form factors

Design principles:

  • Declarative programming and web-component based
  • API discovery and param-completion like CloudMonkey
  • Auto-generated UI widgets, views, behaviour
  • Data-driven behaviour and views, buttons, actions etc. based on role-based permissions
  • Easy to learn, develop, customise, extend and maintain
  • Use modern development methodologies, frameworks and tooling
  • No DIY frameworks, reuse opensource project(s)

A number of different JavaScript frameworks were evaluated for implementation, with Vue.JS being chosen due to the speed and ease with which it could be harnessed to create a modern UI. Ant Design was also chosen, as it provides off-the-shelf, enterprise-class UI building blocks and components.

Project Primate

VM Instance details in Primate

Out of these initial principles came the first iteration of Project Primate, a new Vue-based UI for Apache CloudStack. Rohit presented the first cut of Primate at the CloudStack Collaboration Conference in Las Vegas in September 2019 to much excitement and enthusiasm from the community.

Unlike the old UI, Primate is not part of the core CloudStack Management server code, giving a much more modular and flexible approach. This allows Primate to be “pointed” at any CloudStack API endpoint, or even multiple versions of the UI to be used concurrently. The API auto-discovery allows Primate to recognise new functionality in the CloudStack API, much like CloudMonkey currently does.
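API auto-discovery essentially means querying the listApis API and comparing the result with what the client already knows. A minimal sketch (the response shape is simplified and illustrative, not the exact CloudStack JSON):

```python
def discover_new_commands(list_apis_response, known_commands):
    """Given a (simplified) listApis response, return the API commands the
    client does not yet know about -- roughly how an auto-discovering UI or
    CLI can pick up new server-side functionality without a code change."""
    available = {api["name"] for api in list_apis_response["api"]}
    return sorted(available - set(known_commands))

# Simplified sample of what a listApis response might contain:
sample = {"api": [{"name": "listVirtualMachines"},
                  {"name": "createKubernetesCluster"}]}
```

A client built this way only needs the response metadata to render new commands, rather than a UI code change per feature.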

Primate is designed to work across all browsers, tablets and phones. From a developer perspective, the codebase should be about a quarter that of the old UI and, most importantly, the Vue.JS framework is far easier for developers to work with.

Adoption of Project Primate by Apache CloudStack

Primate is now being developed by CloudStack community members in a Special Interest Group (SIG). Members of that group include developers from EWERK, PCExtreme, IndiQus, SwissTXT and ShapeBlue.

In late October, the CloudStack community voted to adopt Project Primate as the new UI for Apache CloudStack and deprecate the old UI. The code was donated to the Apache Software Foundation and the following plan for replacement of the old UI was agreed:

Technical preview – Winter 2019 LTS release

A technical preview of the new UI will be included with the Winter 2019 LTS release of CloudStack (targeted for Q1 2020 and based on the 4.14 release of CloudStack). The technical preview will have feature parity with the existing UI. The release will still ship with the existing UI for production use, but CloudStack users will be able to deploy the new UI in parallel for testing and familiarisation purposes. The release will also include a formal advance deprecation notice of the existing UI.

At this stage, the CloudStack community will also stop taking feature requests for new functionality in the existing UI. Any new feature development in CloudStack will be based on the new UI. In parallel to this, work will be done on the UI upgrade path and documentation.

General Availability  – Summer 2020 LTS release

The summer 2020 LTS release of CloudStack will ship with the production release of the new UI. It will also be the last version of CloudStack to ship with the old UI. This release will also have the final deprecation notice for the old UI.

Old UI deprecated – Winter 2020 LTS release

The old UI code base will be removed from the Winter 2020 LTS release of CloudStack, and will not be available in releases from then onwards.

It is worth noting that, as the new Primate UI is a discrete client for CloudStack that uses API discovery, the UI will no longer be bound to the core CloudStack code. This may mean that, long term, the UI may adopt its own release cycle, independent of core CloudStack releases. This long-term release strategy is yet to be decided by the CloudStack community.

What CloudStack users need to do

As the old UI is being deprecated, organisations need to plan to migrate to the new CloudStack UI.

What actions specific organisations need to take depends on their use of the current UI. Many organisations only use the CloudStack UI for admin purposes, choosing other solutions to present to their end-users. It is expected that the amount of training required for admins to use the new UI will be minimal and therefore such organisations will not need to extensively plan the deployment of the new UI.

For organisations that do use the CloudStack UI to present to their users, more considered planning is suggested. Although the new UI gives a much-enhanced and intuitive experience, it is anticipated that users may need documentation updates, etc., and the new UI will need to be extensively tested with any 3rd-party integrations at user sites.

A summary of support for the old and new UIs is below:

CloudStack version | Likely release date | Ships with old UI | Ships with new UI | LTS support until*
Winter 2019 LTS | Q1 2020 | Yes | Technical Preview | c. Sept 2021
Summer 2020 LTS | Q2/3 2020 | Yes (although will contain no new features from previous version) | Yes | c. Feb 2022
Winter 2020 LTS | Q1 2021 | No | Yes | c. Sept 2022

*LTS support cycle from the Apache CloudStack community. Providers of commercial support services (such as ShapeBlue) may have different cycles.

Anybody actively developing new functionality for CloudStack needs to be aware that changes to the old UI code will not be accepted after the Winter 2019 LTS release.

Get involved

Primate on an iPhone

As development of Project Primate is still ongoing, I encourage CloudStack users to download and run the Primate UI before release (it is not recommended to use the new UI in production environments until it reaches GA). The code and install documentation can be found in the project repository. This provides a unique opportunity to view the work to date, contribute ideas and test in your environment before the release date. Anybody wishing to join the SIG can do so on the mailing list.




In my previous post, I described the new ‘Open vSwitch with DPDK support’ on CloudStack for KVM hosts. There, I focused on describing the feature, as it was new to CloudStack, and also explained the necessary configuration on the KVM agents’ side to enable DPDK support.

DPDK (Data Plane Development Kit) is a set of libraries and NIC drivers for fast packet processing in userspace. Using DPDK along with OVS brings benefits to networking performance on VMs and networking appliances. DPDK support in CloudStack requires that the KVM hypervisor is running on DPDK-compatible hardware.

In this post, I will describe the new functions which ShapeBlue has introduced for the CloudStack 4.13 LTS. With these new features, DPDK support is extended, allowing administrators to:

  • Create service offerings with additional configurations. In particular, the additional configurations required by DPDK can be included in service offerings
  • Select the DPDK vHost User mode to use on each VM deployment, from these service offerings
  • Perform live migrations of DPDK-enabled VMs between DPDK-enabled hosts

CloudStack new additions for DPDK support

First, note that DPDK support works together with additional VM configurations. Please ensure that the global setting ‘enable.additional.vm.configuration’ is turned on.

As a reminder from the previous post, DPDK support is enabled on VMs with additional configuration details/keys:

  • ‘extraconfig-dpdk-numa’
  • ‘extraconfig-dpdk-hugepages’

One of the new additions for DPDK support is the ability for the administrator to select the vHost user mode to use for DPDK via service offerings. The vHost user mode describes a client/server model between Open vSwitch with DPDK and QEMU, in which one acts as the client while the other acts as the server. The server creates and manages the vHost user sockets, and the client connects to the sockets created by the server.

Additional configurations on service offerings

CloudStack allows VM XML additional configurations and the DPDK vHost user mode to be stored on service offerings as details and used on VM deployments from that service offering. Additional configurations and the DPDK vHost user mode must be passed as service offering details to the ‘createServiceOffering’ API by the administrator.

For example, the following format is valid:

(cloudmonkey)> create serviceoffering name=NAME displaytext=TEXT domainid=DOMAIN hosttags=TAGS
serviceofferingdetails[0].key=DPDK-VHOSTUSER serviceofferingdetails[0].value=server
serviceofferingdetails[1].key=extraconfig-dpdk-numa serviceofferingdetails[1].value=NUMACONF
serviceofferingdetails[2].key=extraconfig-dpdk-hugepages serviceofferingdetails[2].value=HUGEPAGESCONF

Please note:

  • Each additional configuration value must be URL UTF-8 encoded (NUMACONF and HUGEPAGESCONF in the example above).
  • The DPDK vHost user mode key must be: “DPDK-VHOSTUSER”, and its possible values are “client” and “server”. Its value is passed to the KVM hypervisors. If it is not passed, then “server” mode is assumed. Please note this value must not be encoded.
  • Additional configurations on VMs are additive to the additional configurations on service offerings.
  • If one or more additional configurations have the same name (or key), then the additional configurations on the VM take precedence over those on the service offering.
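The additive and precedence rules in the notes above amount to a simple dictionary merge. A sketch (the function name is invented for illustration):

```python
def effective_extra_configs(offering_details, vm_details):
    """Merge additional configurations: entries from both sources are additive,
    and when a key exists in both, the VM-level value wins over the
    service-offering value."""
    merged = dict(offering_details)   # start from the service offering details
    merged.update(vm_details)         # VM-level details take precedence
    return merged
```

For example, a VM-level ‘extraconfig-dpdk-numa’ overrides the offering's value, while an offering-only key such as ‘DPDK-VHOSTUSER’ still applies.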

On VM deployment, the DPDK vHost user mode is passed to the KVM host. Based on its value:

  • When DPDK vHost user mode = “server”:
    • OVS with DPDK acts as the server, while QEMU acts as the client. This means that VM’s interfaces are created in ‘client’ mode.
    • The DPDK ports are created with type: ‘dpdkvhostuser’
  • When DPDK vHost user mode = “client”:
    • OVS with DPDK acts as the client, and QEMU acts as the server.
    • If Open vSwitch is restarted, it can reconnect to the existing sockets on the server, and standard connectivity can be resumed.
    • The DPDK ports are created with type: ‘dpdkvhostuserclient’
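The relationship between the vHost user mode and the OVS port type can be summarised in a small lookup. This is an illustrative sketch only, with “server” as the assumed default when no mode is passed:

```python
OVS_DPDK_PORT_TYPE = {
    "server": "dpdkvhostuser",        # OVS is the vhost-user server, QEMU the client
    "client": "dpdkvhostuserclient",  # OVS connects to the socket QEMU serves
}

def dpdk_port_type(vhost_user_mode=None):
    """Port type OVS creates for a VM NIC, defaulting to server mode when the
    DPDK-VHOSTUSER detail is absent."""
    return OVS_DPDK_PORT_TYPE[vhost_user_mode or "server"]
```

Client mode is generally preferable for resilience, since an OVS restart does not orphan the sockets.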

Live migrations of DPDK-enabled VMs

Another useful function of DPDK support is live migration between DPDK-enabled hosts. This is made possible by introducing a new host capability on DPDK-enabled hosts (enablement was described in the previous post). CloudStack uses the DPDK host capability to determine which hosts are DPDK-enabled.

However, the management server also needs a mechanism to decide whether a VM is DPDK-enabled before allowing live migration to DPDK-enabled hosts. The decision is made on the following criteria:

  • The VM is running on a DPDK-enabled host.
  • The VM possesses the DPDK-required configuration in its VM details or service offering details.

This allows administrators to live migrate these VMs to suitable hosts.


As the previous post describes, DPDK support was initially introduced in CloudStack 4.12. This blog post covers the DPDK support extension for CloudStack 4.13 LTS, introducing more flexibility and improving its usability. As CloudStack only recently started supporting DPDK, more additions are expected in future versions.

Future work may involve UI support for the previously described features. Please note that it is currently not possible to pass additional configuration to VMs or service offerings using the CloudStack UI; it is only available through the API.

For references, please check PRs:

About the author

Nicolas Vazquez is a Senior Software Engineer at ShapeBlue, the Cloud Specialists, and is a committer in the Apache CloudStack project. Nicolas spends his time designing and implementing features in Apache CloudStack.


The original CloudMonkey was contributed to the Apache CloudStack project on 31 Oct 2012 under the Apache License 2.0. It is written in Python and shipped via the Python CheeseShop (PyPI), and since its inception has gone through several refactors and rewrites. While this has worked well over the years, installation and usage have been limited to a few modern platforms due to the dependency on Python 2.7, meaning it is hard to install on older distributions such as CentOS 6.

Over the past two years, several attempts have been made to make the code compatible across Python 2.6, 2.7 and 3.x. However, it proved to be a maintenance and release challenge – keeping the code compatible across all the platforms, all the Python versions and the varied dependency versions, whilst also keeping it easy to install and use. During late 2017, an experimental CloudMonkey rewrite called cmk was written in Go, a modern, statically typed and compiled programming language which can produce cross-platform standalone binaries. Finally, in early 2018, after reaching a promising state, the results of the experiment were shared with the community to build support and gather feedback for moving the CloudMonkey codebase to Go and deprecating the Python version.

During 2018, two Go-based ports were written using two different readline and prompt libraries. The alpha/beta builds were shared with the community, who tested them, reported bugs and provided valuable feedback (especially around tab-completion) which drove the final implementation. With the new rewrite, CloudMonkey for the first time ships as a single executable file for Windows, which can be easily installed and used with largely the same user experience as on Linux or Mac OSX. The rewrite aims to maintain command-line backward compatibility as a drop-in replacement for the legacy Python-based CloudMonkey (i.e. shell scripts using legacy CloudMonkey can also use the modern CloudMonkey cmk). The legacy Python-based CloudMonkey will continue to be available for installation via pip, but it will not be maintained moving forward.

CloudMonkey 6.0 requires a final round of testing and bug-fixing before the release process will commence. The beta binaries are available for testing here: 

Major changes in CloudMonkey 6.0

  • Ships as standalone 32-bit and 64-bit binaries targeting Windows, Linux and Mac including ARM support (for example, to run on Raspberry Pi)
  • Drop-in replacement for legacy Python-based CloudMonkey as a command line tool
  • Interactive selection of API commands, arguments, and argument options
  • JSON is the default API response output format
  • Improved help docs output when ‘-h’ is passed to an API command
  • Added a new output format ‘column’ that renders API responses in a columnar format like modern CLIs such as kubectl and docker
  • Added new set option ‘debug’ to enable debug mode, set option ‘display’ renamed as ‘output’
  • New CloudMonkey configuration file locking mechanism to avoid file corruption when multiple cmk instances run
  • New configuration folder ~/.cmk to avoid conflict with legacy Python-based version

Features removed in CloudMonkey 6.0:

  • Removed XML output format.
  • Removed CloudMonkey logging API requests and responses to a file.
  • Coloured output removed.
  • Removed set options: color (coloured output), signatureversion and expires (no longer acceptable API parameters), paramcompletion (API parameter completion is now enabled by default), cache_file (the default cache file, now at ~/.cmk/cache), history_file (the history file) and log_file (API log file).

About the author

Rohit Yadav is a Software Architect at ShapeBlue, the Cloud Specialists, and is a committer and PMC member of Apache CloudStack. Rohit spends most of his time designing and implementing features in Apache CloudStack.


This blog describes a new feature to be introduced in the CloudStack 4.12 release (already in the current master branch of the CloudStack repository). This feature will provide support for the Data Plane Development Kit (DPDK) in conjunction with Open vSwitch (OVS) for guest VMs and is targeted at the KVM hypervisor.

The Data Plane Development Kit (DPDK) is a set of libraries and NIC drivers for fast packet processing in userspace. Using DPDK along with OVS brings benefits to networking performance on VMs and networking appliances. In this blog, we will introduce how DPDK can be used on guest VMs once the feature is released.

Please note – DPDK support in CloudStack requires that the KVM hypervisor is running on DPDK compatible hardware.

Enable DPDK support

This feature extends the Open vSwitch feature in CloudStack with DPDK integration. As a prerequisite, Open vSwitch needs to be installed on KVM hosts and enabled in CloudStack. In addition, administrators need to install DPDK libraries on KVM hosts before configuring the CloudStack agents, and I will go into the configuration in detail.

KVM Agent Configuration

An administrator can follow this guide to enable DPDK on a KVM host:


  • Install OVS on the target KVM host
  • Configure CloudStack agent by editing the /etc/cloudstack/agent/ file:
    • # network.bridge.type=openvswitch
  • Install DPDK. Installation guide can be found on this link:


Edit the /etc/cloudstack/agent/ file, where <OVS_PATH> is the path in which your OVS ports are created, typically /var/run/openvswitch/:

  • # openvswitch.dpdk.enable=true
    # openvswitch.dpdk.ovs.path=<OVS_PATH>

Restart CloudStack agent so that changes take effect:

# systemctl restart cloudstack-agent

DPDK inside guest VMs

Now that CloudStack agents have been configured, users are able to deploy their guest VMs using DPDK. In order to achieve this, they will need to pass extra configurations to enable DPDK:

  • Enable “HugePages” on the VM
  • NUMA node configuration

As of 4.12, passing extra configurations to VM deployments is allowed. In the case of KVM, the extra configurations are added to the VM XML domain. The CloudStack API methods deployVirtualMachine and updateVirtualMachine support the new optional parameter extraconfig and work in the following way:

# deploy virtualmachine ... extraconfig=<URL_UTF-8_ENCODED_CONFIGS>

CloudStack will expect a URL UTF-8 encoded string which can contain multiple extra configurations. For example, if a user wants to enable DPDK, they will need to pass the two extra configurations mentioned above. An example NUMA configuration is the following:


<cpu mode='host-passthrough'>
  <numa>
    <cell id='0' cpus='0' memory='9437184' unit='KiB' memAccess='shared'/>
  </numa>
</cpu>

…which is then URL UTF-8 encoded into the string that CloudStack expects on VM deployments.
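As a sketch, the encoding can be produced with Python's urllib; the XML snippet here is illustrative:

```python
import urllib.parse

# Illustrative NUMA extra configuration (as a single-line XML string)
numa_config = (
    "<cpu mode='host-passthrough'>"
    "<numa><cell id='0' cpus='0' memory='9437184' unit='KiB' memAccess='shared'/></numa>"
    "</cpu>"
)

# URL UTF-8 encode the snippet before passing it as the extraconfig parameter
encoded = urllib.parse.quote(numa_config, safe="")
```

The `encoded` value is what would be supplied to deployVirtualMachine's extraconfig parameter.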


KVM networking verification

Administrators can verify how OVS ports are created with DPDK support on DPDK enabled hosts, in which users have deployed DPDK enabled guest VMs. These port names start with “csdpdk”:


# ovs-vsctl show
Port "csdpdk-1"
   tag: 30
   Interface "csdpdk-1"
      type: dpdkvhostuser
Port "csdpdk-4"
   tag: 30
   Interface "csdpdk-4"
      type: dpdkvhostuser

About the author

Nicolas Vazquez is a Senior Software Engineer at ShapeBlue, the Cloud Specialists, and is a committer in the Apache CloudStack project. Nicolas spends his time designing and implementing features in Apache CloudStack.


We published the original blog post on KVM networking in 2016 – but in the meantime we have moved on a generation in CentOS and Ubuntu operating systems, and some of the original information is therefore out of date. In this revisit of the original blog post we cover new configuration options for CentOS 7.x as well as Ubuntu 18.04, both of which are now supported hypervisor operating systems in CloudStack 4.11. Ubuntu 18.04 has replaced the legacy networking model with the new Netplan implementation, and this does mean different configuration both for linux bridge setups as well as OpenvSwitch.

KVM hypervisor networking for CloudStack can sometimes be a challenge, considering KVM doesn’t quite have the same mature guest networking model found in the likes of VMware vSphere and Citrix XenServer. In this blog post we’re looking at the options for networking KVM hosts using bridges and VLANs, and dive a bit deeper into the configuration for these options. Installation of the hypervisor and CloudStack agent is pretty well covered in the CloudStack installation guide, so we’ll not spend too much time on this.

Network bridges

On a linux KVM host, guest networking is accomplished using network bridges. These are similar to vSwitches on a VMware ESXi host or networks on a XenServer host (in fact networking on a XenServer host is also accomplished using bridges).

A KVM network bridge is a Layer-2 software device which allows traffic to be forwarded between ports internally on the bridge and the physical network uplinks. The traffic flow is controlled by MAC address tables maintained by the bridge itself, which determine which hosts are connected to which bridge port. The bridges allow for traffic segregation using traditional Layer-2 VLANs as well as SDN Layer-3 overlay networks.


Linux bridges vs OpenVswitch

The bridging on a KVM host can be accomplished using traditional linux bridge networking or by adopting the OpenVswitch back end. Traditional linux bridges have been implemented in the linux kernel since version 2.2, and have been maintained through the 2.x and 3.x kernels. Linux bridges provide all the basic Layer-2 networking required for a KVM hypervisor back end, but they lack some automation options and are configured on a per-host basis.

OpenVswitch was developed to address this, and provides additional automation in addition to new networking capabilities like Software Defined Networking (SDN). OpenVswitch allows for centralised control and distribution across physical hypervisor hosts, similar to distributed vSwitches in VMware vSphere. Distributed switch control does require additional controller infrastructure like OpenDaylight, Nicira, VMware NSX, etc. – which we won’t cover in this article as it’s not a requirement for CloudStack.

It is also worth noting Citrix started using the OpenVswitch backend in XenServer 6.0.

Network configuration overview

For this example we will configure the following networking model, assuming a linux host with four network interfaces which are bonded for resilience. We also assume all switch ports are trunk ports:

  • Network interfaces eth0 + eth1 are bonded as bond0.
  • Network interfaces eth2 + eth3 are bonded as bond1.
  • Bond0 provides the physical uplink for the bridge “cloudbr0”. This bridge carries the untagged host network interface / IP address, and will also be used for the VLAN tagged guest networks.
  • Bond1 provides the physical uplink for the bridge “cloudbr1”. This bridge handles the VLAN tagged public traffic.

The CloudStack zone networks will then be configured as follows:

  • Management and guest traffic is configured to use KVM traffic label “cloudbr0”.
  • Public traffic is configured to use KVM traffic label “cloudbr1”.

In addition to the above it’s important to remember CloudStack itself requires internal connectivity from the hypervisor host to system VMs (Virtual Routers, SSVM and CPVM) over the link local subnet. This is done over a host-only bridge “cloud0”, which is created by CloudStack when the host is added to a CloudStack zone.



Linux bridge configuration – CentOS

In the following CentOS example we have changed the NIC naming convention back to the legacy “eth0” format rather than the new “eno16777728” format. This is a personal preference – and is generally done to make automation of configuration settings easier. The configuration suggested throughout this blog post can also be implemented using the new NIC naming format.

Across all CentOS versions the “NetworkManager” service is also generally disabled, since this has been found to complicate KVM network configuration and cause unwanted behaviour:

# systemctl stop NetworkManager
# systemctl disable NetworkManager

To enable bonding and bridging, CentOS 7.x requires the following modules to be installed / loaded:

# modprobe --first-time bonding
# yum -y install bridge-utils

If IPv6 isn’t required we also add the following lines to /etc/sysctl.conf:

net.ipv6.conf.all.disable_ipv6 = 1 
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

In CentOS the linux bridge configuration is done with configuration files in /etc/sysconfig/network-scripts/. Each of the four individual NIC interfaces is configured as follows (eth0 / eth1 / eth2 / eth3 are all configured the same way). Note there is no IP configuration against the NICs themselves – these purely point to the respective bonds:

# vi /etc/sysconfig/network-scripts/ifcfg-eth0
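The file contents did not survive in this copy of the post. A typical slave-interface configuration for this model (all values assumed for illustration) would look something like:

```shell
DEVICE=eth0
NAME=eth0
TYPE=Ethernet
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
NM_CONTROLLED=no
HOTPLUG=no
```

The equivalent files for eth1 (also MASTER=bond0) and eth2 / eth3 (MASTER=bond1) differ only in DEVICE, NAME and MASTER.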

The bond configurations are specified in the equivalent ifcfg-bond scripts and specify bonding options as well as the upstream bridge name. In this case we’re just setting a basic active-passive bond (mode=1) with up/down delays of zero and status monitoring every 100ms (miimon=100). Note there are a multitude of bonding options – please refer to the CentOS / RedHat official documentation to tune these to your specific use case.

# vi /etc/sysconfig/network-scripts/ifcfg-bond0
BONDING_OPTS="mode=active-backup miimon=100 updelay=0 downdelay=0"
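Only the BONDING_OPTS line survives above. A fuller sketch of the file (bridge name per the model described earlier, other values assumed for illustration) might be:

```shell
DEVICE=bond0
NAME=bond0
TYPE=Bond
BONDING_MASTER=yes
BOOTPROTO=none
ONBOOT=yes
BONDING_OPTS="mode=active-backup miimon=100 updelay=0 downdelay=0"
BRIDGE=cloudbr0
NM_CONTROLLED=no
```

The bond1 file is analogous, with DEVICE=bond1 and BRIDGE=cloudbr1.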

The same goes for bond1:

# vi /etc/sysconfig/network-scripts/ifcfg-bond1
BONDING_OPTS="mode=active-backup miimon=100 updelay=0 downdelay=0"

Cloudbr0 is configured in the ifcfg-cloudbr0 script. In addition to the bridge configuration we also specify the host IP address, which is tied directly to the bridge since it is on an untagged VLAN:

# vi /etc/sysconfig/network-scripts/ifcfg-cloudbr0
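The bridge file contents are missing from this copy. A hedged sketch, with illustrative IP addressing, could look like:

```shell
DEVICE=cloudbr0
TYPE=Bridge
ONBOOT=yes
BOOTPROTO=none
IPADDR=192.168.100.20
NETMASK=255.255.255.0
GATEWAY=192.168.100.1
DNS1=192.168.100.5
STP=no
DELAY=0
NM_CONTROLLED=no
```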

Cloudbr1 does not have an IP address configured hence the configuration is simpler:

# vi /etc/sysconfig/network-scripts/ifcfg-cloudbr1
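Again the file body is missing here; since no IP address is carried on this bridge, a sketch (values assumed) is simply:

```shell
DEVICE=cloudbr1
TYPE=Bridge
ONBOOT=yes
BOOTPROTO=none
STP=no
DELAY=0
NM_CONTROLLED=no
```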

Optional tagged interface for storage traffic

If a dedicated VLAN tagged IP interface is required for e.g. storage traffic, this can be accomplished by creating a VLAN on top of the bond and tying this to a dedicated bridge. In this case we create a new bridge on bond0 using VLAN 100:

# vi /etc/sysconfig/network-scripts/ifcfg-bond0.100
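The VLAN sub-interface file is missing from this copy; a sketch (device name follows the usual parent.vlanid convention, values assumed) could be:

```shell
DEVICE=bond0.100
VLAN=yes
BOOTPROTO=none
ONBOOT=yes
BRIDGE=cloudbr100
NM_CONTROLLED=no
```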

The bridge can now be configured with the desired IP address for storage connectivity:

# vi /etc/sysconfig/network-scripts/ifcfg-cloudbr100
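A hedged sketch of the storage bridge file, with an illustrative storage subnet address:

```shell
DEVICE=cloudbr100
TYPE=Bridge
ONBOOT=yes
BOOTPROTO=none
IPADDR=10.0.100.20
NETMASK=255.255.255.0
STP=no
DELAY=0
NM_CONTROLLED=no
```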

Internal bridge cloud0

When using linux bridge networking there is no requirement to configure the internal “cloud0” bridge – this is all handled by CloudStack.

Network startup

Note – once all network startup scripts are in place and the network service is restarted you may lose connectivity to the host if there are any configuration errors in the files, hence make sure you have console access to rectify any issues.

To make the configuration live restart the network service:

# systemctl restart network

To check the bridges use the brctl command:

# brctl show
bridge name bridge id STP enabled interfaces
cloudbr0 8000.000c29b55932 no bond0
cloudbr1 8000.000c29b45956 no bond1

The bonds can be checked with:

# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:0c:xx:xx:xx:xx
Slave queue ID: 0

Slave Interface: eth1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:0c:xx:xx:xx:xx
Slave queue ID: 0

Linux bridge configuration – Ubuntu

With the 18.04 “Bionic Beaver” release Ubuntu have retired the legacy way of configuring networking through /etc/network/interfaces in favour of Netplan. This changes how networking is configured, although the principles around bridge configuration are the same as in previous Ubuntu versions.

First of all ensure correct hostname and FQDN are set in /etc/hostname and /etc/hosts respectively.

To stop network bridge traffic from traversing iptables / arptables on the host, add the following lines to /etc/sysctl.conf:

# vi /etc/sysctl.conf
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-arptables = 0

Ubuntu 18.04 ships with the “bridge-utils” package installed and the bridge / bonding kernel modules loaded by default, hence there is no requirement to add anything to /etc/modules.

In Ubuntu 18.04 all interface, bond and bridge configuration is done using cloud-init and the Netplan configuration in /etc/netplan/XX-cloud-init.yaml. As for CentOS we are configuring basic active-passive bonds (mode=1) with status monitoring every 100ms (miimon=100), and configuring bridges on top of these. As before the host IP address is tied to cloudbr0:

# vi /etc/netplan/50-cloud-init.yaml
network:
    ethernets:
        eth0:
            dhcp4: no
        eth1:
            dhcp4: no
        eth2:
            dhcp4: no
        eth3:
            dhcp4: no
    bonds:
        bond0:
            dhcp4: no
            interfaces:
                - eth0
                - eth1
            parameters:
                mode: active-backup
                primary: eth0
        bond1:
            dhcp4: no
            interfaces:
                - eth2
                - eth3
            parameters:
                mode: active-backup
                primary: eth2
    bridges:
        cloudbr0:
            addresses: [,]
            nameservers:
                search: [mycloud.local]
                addresses: [,]
            interfaces:
                - bond0
        cloudbr1:
            dhcp4: no
            interfaces:
                - bond1
    version: 2

Optional tagged interface for storage traffic

To add an optional VLAN tagged interface for storage traffic, add a VLAN and a new bridge to the above configuration:

# vi /etc/netplan/50-cloud-init.yaml
    vlans:
        bond100:
            id: 100
            link: bond0
    bridges:
        cloudbr100:
            dhcp4: no
            interfaces:
                - bond100

Internal bridge cloud0

When using linux bridge networking the internal “cloud0” bridge is again handled by CloudStack, i.e. there’s no need for specific configuration to be specified for this.

Network startup

Note – once all network startup scripts are in place and the network service is restarted you may lose connectivity to the host if there are any configuration errors in the files, hence make sure you have console access to rectify any issues.

To apply the configuration, reload Netplan with:

# netplan apply

To check the bridges use the brctl command:

# brctl show
bridge name	bridge id		STP enabled	interfaces
cloud0		8000.000000000000	no
cloudbr0	8000.52664b74c6a7	no		bond0
cloudbr1	8000.2e13dfd92f96	no		bond1
cloudbr100	8000.02684d6541db	no		bond100

To check the VLANs and bonds:

# cat /proc/net/vlan/config
VLAN Dev name | VLAN ID
bond100 | 100 | bond0
# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth1
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 10
Permanent HW addr: 00:0c:xx:xx:xx:xx
Slave queue ID: 0

Slave Interface: eth0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 10
Permanent HW addr: 00:0c:xx:xx:xx:xx
Slave queue ID: 0


OpenVswitch bridge configuration – CentOS

The OpenVswitch version in the standard CentOS repositories is relatively old (version 2.0). To install a newer version, either locate and install it from a third party CentOS/Fedora/RedHat repository, or alternatively download and compile the packages from the OVS website (notes on how to compile the packages can be found in the OVS documentation).

Once packages are available, install and enable OVS with:

# yum localinstall openvswitch-<version>.rpm
# systemctl start openvswitch
# systemctl enable openvswitch

In addition to this the bridge module should be prevented from loading. Experience has shown that simply blacklisting the module does not stop it from being loaded; to force this, set the module install target to /bin/false. Please note the CloudStack agent install depends on the bridge module being in place, hence this step should be carried out after agent install.

echo "install bridge /bin/false" > /etc/modprobe.d/bridge-blacklist.conf

As with linux bridging above the following examples assumes IPv6 has been disabled and legacy ethX network interface names are used. In addition the hostname has been set in /etc/sysconfig/network and /etc/hosts.

Add the initial OVS bridges using the ovs-vsctl toolset:

# ovs-vsctl add-br cloudbr0
# ovs-vsctl add-br cloudbr1
# ovs-vsctl add-bond cloudbr0 bond0 eth0 eth1
# ovs-vsctl add-bond cloudbr1 bond1 eth2 eth3

This will configure the bridges in the OVS database, but the settings will not be persistent. To make the settings persistent we need to configure the network configuration scripts in /etc/sysconfig/network-scripts/, similar to when using linux bridges.

Each individual network interface has a generic configuration – note there is no reference to bonds at this stage. The following ifcfg-eth script applies to all interfaces:

# vi /etc/sysconfig/network-scripts/ifcfg-eth0
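The interface file body is missing from this copy; a generic sketch (values assumed, no bond reference at this stage) might be:

```shell
DEVICE=eth0
TYPE=Ethernet
BOOTPROTO=none
ONBOOT=yes
NM_CONTROLLED=no
HOTPLUG=no
```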

The bonds reference the interfaces as well as the upstream bridge. In addition the bond configuration specifies the OVS specific settings for the bond (active-backup, no LACP, 100ms status monitoring):

# vi /etc/sysconfig/network-scripts/ifcfg-bond0
BOND_IFACES="eth0 eth1"
OVS_OPTIONS="bond_mode=active-backup lacp=off other_config:bond-detect-mode=miimon other_config:bond-miimon-interval=100"
# vi /etc/sysconfig/network-scripts/ifcfg-bond1
BOND_IFACES="eth2 eth3"
OVS_OPTIONS="bond_mode=active-backup lacp=off other_config:bond-detect-mode=miimon other_config:bond-miimon-interval=100"
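Only the BOND_IFACES and OVS_OPTIONS lines survive above. A fuller sketch of ifcfg-bond0 using the OVS initscripts conventions (DEVICETYPE / TYPE / OVS_BRIDGE keys; other values assumed) could be:

```shell
DEVICE=bond0
ONBOOT=yes
BOOTPROTO=none
DEVICETYPE=ovs
TYPE=OVSBond
OVS_BRIDGE=cloudbr0
BOND_IFACES="eth0 eth1"
OVS_OPTIONS="bond_mode=active-backup lacp=off other_config:bond-detect-mode=miimon other_config:bond-miimon-interval=100"
HOTPLUG=no
```

The bond1 file is analogous, with OVS_BRIDGE=cloudbr1 and BOND_IFACES="eth2 eth3".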

The bridges are now configured as follows. The host IP address is specified on the untagged cloudbr0 bridge:

# vi /etc/sysconfig/network-scripts/ifcfg-cloudbr0
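The bridge file body is missing here; a hedged sketch with illustrative addressing:

```shell
DEVICE=cloudbr0
ONBOOT=yes
DEVICETYPE=ovs
TYPE=OVSBridge
BOOTPROTO=static
IPADDR=192.168.100.20
NETMASK=255.255.255.0
HOTPLUG=no
```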

Cloudbr1 is configured without an IP address:

# vi /etc/sysconfig/network-scripts/ifcfg-cloudbr1
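Again the file body is missing; since this bridge carries no IP address, a sketch (values assumed) is simply:

```shell
DEVICE=cloudbr1
ONBOOT=yes
DEVICETYPE=ovs
TYPE=OVSBridge
BOOTPROTO=none
HOTPLUG=no
```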

Internal bridge cloud0

Under CentOS 7.x and CloudStack 4.11 the cloud0 bridge is automatically configured, hence no additional configuration steps are required.

Optional tagged interface for storage traffic

If a dedicated VLAN tagged IP interface is required for e.g. storage traffic this is accomplished by creating a VLAN tagged fake bridge on top of one of the cloud bridges. In this case we add it to cloudbr0 with VLAN 100:

# ovs-vsctl add-br cloudbr100 cloudbr0 100
# vi /etc/sysconfig/network-scripts/ifcfg-cloudbr100
OVS_OPTIONS="cloudbr0 100"
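Only the OVS_OPTIONS line survives above. A fuller sketch of the fake bridge file (parent bridge and VLAN per OVS_OPTIONS; IP addressing illustrative) could be:

```shell
DEVICE=cloudbr100
ONBOOT=yes
DEVICETYPE=ovs
TYPE=OVSBridge
BOOTPROTO=static
IPADDR=10.0.100.20
NETMASK=255.255.255.0
OVS_OPTIONS="cloudbr0 100"
HOTPLUG=no
```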

Additional OVS network settings

To finish off the OVS network configuration specify the hostname, gateway and IPv6 settings:

# vi /etc/sysconfig/network
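The file body is missing from this copy; a sketch covering hostname, gateway and IPv6 (values assumed for illustration):

```shell
NETWORKING=yes
HOSTNAME=kvmhost1.mycloud.local
GATEWAY=192.168.100.1
NETWORKING_IPV6=no
IPV6INIT=no
```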

VLAN problems when using OVS

Kernel versions older than 3.3 had some issues with VLAN traffic propagating between KVM hosts. This has not been observed in CentOS 7.5 (kernel version 3.10) – however if this issue is encountered look up the OVS VLAN splinter workaround.

Network startup

Note – as mentioned for linux bridge networking – once all network startup scripts are in place and the network service is restarted you may lose connectivity to the host if there are any configuration errors in the files, hence make sure you have console access to rectify any issues.

To make the configuration live restart the network service:

# systemctl restart network

To check the bridges use the ovs-vsctl command. The following shows the optional cloudbr100 on VLAN 100:

# ovs-vsctl show
    Bridge "cloudbr0"
        Port "cloudbr0"
            Interface "cloudbr0"
                type: internal
        Port "cloudbr100"
            tag: 100
            Interface "cloudbr100"
                type: internal
        Port "bond0"
            Interface "veth0"
            Interface "eth0"
    Bridge "cloudbr1"
        Port "bond1"
            Interface "eth1"
            Interface "veth1"
        Port "cloudbr1"
            Interface "cloudbr1"
                type: internal
    Bridge "cloud0"
        Port "cloud0"
            Interface "cloud0"
                type: internal
    ovs_version: "2.9.2"

The bond status can be checked with the ovs-appctl command:

# ovs-appctl bond/show bond0
---- bond0 ----
bond_mode: active-backup
bond may use recirculation: no, Recirc-ID : -1
bond-hash-basis: 0
updelay: 0 ms
downdelay: 0 ms
lacp_status: off
active slave mac: 00:0c:xx:xx:xx:xx(eth0)

slave eth0: enabled
active slave
may_enable: true

slave eth1: enabled
may_enable: true

To ensure that only OVS bridges are used also check that linux bridge control returns no bridges:

# brctl show
bridge name	bridge id		STP enabled	interfaces

As a final note – the CloudStack agent also requires the following two lines added to /etc/cloudstack/agent/ after install:
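The two lines themselves were not reproduced in this copy. For an OVS back end they are normally the bridge type and VIF driver settings, as commonly documented for CloudStack KVM agents (verify against your CloudStack version):

```properties
network.bridge.type=openvswitch
libvirt.vif.driver=com.cloud.hypervisor.kvm.resource.OvsVifDriver
```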


OpenVswitch bridge configuration – Ubuntu

As discussed earlier in this blog post Ubuntu 18.04 introduced Netplan as a replacement to the legacy “/etc/network/interfaces” network configuration. Unfortunately Netplan does not support OVS, hence the first challenge is to revert Ubuntu to the legacy configuration method.

To disable Netplan first of all add “netcfg/do_not_use_netplan=true” to the GRUB_CMDLINE_LINUX option in /etc/default/grub. The following example also shows the use of legacy interface names as well as IPv6 being disabled:

GRUB_CMDLINE_LINUX="net.ifnames=0 biosdevname=0 ipv6.disable=1 netcfg/do_not_use_netplan=true"

Then rebuild GRUB and reboot the server:

grub-mkconfig -o /boot/grub/grub.cfg
reboot

To set the hostname, first of all edit “/etc/cloud/cloud.cfg” and set it to preserve the system hostname:

preserve_hostname: true

Thereafter set the hostname with hostnamectl:

hostnamectl set-hostname --static --transient --pretty <hostname>

Now remove Netplan, then install OVS from the Ubuntu repositories as well as the “ifupdown” package to get standard network functionality back:

apt-get purge nplan
apt-get install openvswitch-switch
apt-get install ifupdown

As for CentOS we need to blacklist the bridge module to prevent standard bridges being created. Please note the CloudStack agent install depends on the bridge module being in place, hence this step should be carried out after agent install.

echo "install bridge /bin/false" > /etc/modprobe.d/bridge-blacklist.conf

To stop network bridge traffic from traversing IPtables / ARPtables also add the following lines to /etc/sysctl.conf:

# vi /etc/sysctl.conf
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-arptables = 0

Same as for CentOS we first of all add the OVS bridges and bonds from command line using the ovs-vsctl command line tools. In this case we also add the additional tagged fake bridge cloudbr100 on VLAN 100:

# ovs-vsctl add-br cloudbr0
# ovs-vsctl add-br cloudbr1
# ovs-vsctl add-bond cloudbr0 bond0 eth0 eth1 bond_mode=active-backup other_config:bond-detect-mode=miimon other_config:bond-miimon-interval=100
# ovs-vsctl add-bond cloudbr1 bond1 eth2 eth3 bond_mode=active-backup other_config:bond-detect-mode=miimon other_config:bond-miimon-interval=100
# ovs-vsctl add-br cloudbr100 cloudbr0 100

As for linux bridges, all network configuration is applied in “/etc/network/interfaces”:

# vi /etc/network/interfaces
# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
iface eth0 inet manual
iface eth1 inet manual
iface eth2 inet manual
iface eth3 inet manual

auto cloudbr0
allow-ovs cloudbr0
iface cloudbr0 inet static
  ovs_type OVSBridge
  ovs_ports bond0

allow-cloudbr0 bond0 
iface bond0 inet manual 
  ovs_bridge cloudbr0 
  ovs_type OVSBond 
  ovs_bonds eth0 eth1 
  ovs_option bond_mode=active-backup other_config:miimon=100

auto cloudbr1
allow-ovs cloudbr1
iface cloudbr1 inet manual

allow-cloudbr1 bond1 
iface bond1 inet manual 
  ovs_bridge cloudbr1 
  ovs_type OVSBond 
  ovs_bonds eth2 eth3 
  ovs_option bond_mode=active-backup other_config:miimon=100

Network startup

Since Ubuntu 14.04 the bridges have started automatically without any requirement for additional startup scripts. Since OVS uses the same toolset across both CentOS and Ubuntu the same processes as described earlier in this blog post can be utilised.

# ovs-appctl bond/show bond0
# ovs-vsctl show

To ensure that only OVS bridges are used also check that linux bridge control returns no bridges:

# brctl show
bridge name	bridge id		STP enabled	interfaces

As mentioned earlier the following also needs added to the /etc/cloudstack/agent/ file:


Internal bridge cloud0

In Ubuntu there is no requirement to add additional configuration for the internal cloud0 bridge, CloudStack manages this.

Optional tagged interface for storage traffic

Additional VLAN tagged interfaces are again accomplished by creating a VLAN tagged fake bridge on top of one of the cloud bridges. In this case we add it to cloudbr0 with VLAN 100 at the end of the interfaces file:

# ovs-vsctl add-br cloudbr100 cloudbr0 100
# vi /etc/network/interfaces
auto cloudbr100
allow-cloudbr0 cloudbr100
iface cloudbr100 inet static
  ovs_type OVSIntPort
  ovs_bridge cloudbr0
  ovs_options tag=100


As KVM is becoming more stable and mature, more people are going to start looking at using it rather than the more traditional XenServer or vSphere solutions, and we hope this article will assist in configuring host networking. As always we’re happy to receive feedback, so please get in touch with any comments, questions or suggestions.

About The Author

Dag Sonstebo is a Cloud Architect at ShapeBlue, The Cloud Specialists. Dag spends most of his time designing, implementing and automating IaaS solutions based on Apache CloudStack.


CloudStack 4.11.1 introduces a new security enhancement on top of the new CA framework to secure live KVM VM migrations. This feature allows live migration of guest VMs across KVM hosts using secured TLS enabled libvirtd process. Without this feature, the live migration of guest VMs across KVM hosts would use an unsecured TCP connection, which is prone to man-in-the-middle attacks leading to leakage of critical VM data (the VM state and memory). This feature brings stability and security enhancements for CloudStack and KVM users.


The initial implementation of the CA framework was limited to the provisioning of X509 certificates to secure the KVM/CPVM/SSVM agent(s)  and the CloudStack management server(s). With the new enhancement, the X509 certificates are now also used by the libvirtd process on the KVM host to secure live VM migration to another secured KVM host.

The migration URI used by two secured KVM hosts is qemu+tls:// as opposed to qemu+tcp:// that is used by an unsecured host. We’ve also enforced that live VM migration is allowed only between either two secured KVM hosts or two unsecured hosts, but not between KVM hosts with a different security configuration. Between two secured KVM hosts, the web of trust is established by the common root CA certificate that can validate the server certificate chain when live VM migration is initiated.

As part of the process of securing a KVM host, the CA framework issues X509 certificates and provisions them to the host, and libvirtd is reconfigured to listen on the default TLS port of 16514 and to use the same X509 certificates as used by the cloudstack-agent. In an existing environment, the admin will need to ensure that the default TLS port 16514 is not blocked; in a fresh environment suitable iptables rules and other configurations are done via cloudstack-setup-agent using a new '-s' flag.
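In normal operation the management server drives this reconfiguration for you; purely as an illustration of the flag mentioned above, a manual invocation would look roughly like the following (the exact set of additional flags depends on the environment, so treat this as a sketch rather than a complete command):

```shell
# cloudstack-setup-agent -s
```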

Starting CloudStack 4.11.1, hosts that don’t have both cloudstack-agent and libvirtd processes secured and in Up state will show up in ‘Unsecure’ state in the UI (and in host details as part of listHosts API response):

This will allow admins to easily identify and secure hosts using a new ‘provision certificate’ button that can be used from the host’s details tab in the UI:

After a KVM host is successfully secured it will show up in the Up state:

As part of the onboarding and securing process, after securing all the KVM hosts the admin can also enforce authentication strictness of client X509 certificates by the CA framework, by setting the global setting ‘ca.plugin.root.auth.strictness' to true (this does not require restarting of the management server(s)).

About the author

Rohit Yadav is a Software Architect at ShapeBlue, the Cloud Specialists, and is a committer and PMC member of Apache CloudStack. Rohit spends most of his time designing and implementing features in Apache CloudStack.


Last year we implemented a new CA Framework in CloudStack 4.11 to make communications between CloudStack management servers and their hypervisor agents more secure. As part of that work, we introduced the ability for CloudStack agents to connect to multiple management servers, avoiding the use of an external load balancer.

We’ve now extended the CA Framework by implementing load balancing sorting algorithms which are applied to the list of management servers before it is sent to the indirect agents. This allows the CloudStack management servers to balance the agent load between themselves, with no reliance on an external load balancer. This will be available in CloudStack 4.11.1. The new functionality also introduces the notion of a preferred management server for agents, and a background mechanism to check and eventually connect to the preferred management server (assumed to be the first on the list the agent receives).


The CloudStack administrator is responsible for setting the list of management servers to connect to and an algorithm (to sort the management servers list) from the CloudStack management server using global configurations.

Management server perspective

This feature uses (and introduces) these configurations:

  • ‘’: The algorithm to be applied to the list of management servers on ‘host’ configuration before being sent to the agents. Allowed algorithm values are:
    • ‘static’: Each agent receives the same list as provided in the ‘host’ configuration; therefore, no load balancing is performed.
    • ’roundrobin’: The agents are evenly spread across management servers.
    • ‘shuffle’: Randomly sorts the list before it is sent to each agent.
  • ‘’: The interval in seconds after which agent should check and try to connect to its preferred host.

Any changes to these global configurations are dynamic and do not require restarting the management server.

There are three cases in which new lists are propagated to the agents:

  • Addition of a host
  • Connection or reconnection of an agent
  • A change on the ‘host’ or ‘’ configurations

Agents perspective

Agents receive the list of management servers, the algorithm and the check interval (if provided) and persist them in their local configuration file as:
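The persisted entries were not reproduced in this copy. As an illustration, the management server list is kept under the ‘host’ key mentioned above (addresses hypothetical; the algorithm and interval keys are not reproduced here):

```properties
host=10.1.1.1,10.1.1.2,10.1.1.3
```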


The first management server on the list is considered the preferred host. The check interval for the preferred host should be greater than 0, in which case it is persisted on the ‘’ key. If the interval is greater than 0 and the host the agent is connected to is not the preferred host, the agent will attempt to connect to the preferred host.

When connection is established between an agent and a management server, the agent sends its list of management servers. The management server checks if the list the agent has is up to date, sending the updated list if it is outdated. This behaviour ensures that each agent should get the updated version of the list of management servers even after any failure.


Assuming a test environment consisting of:

  • 3 management servers: M1, M2 and M3
  • 4 KVM hosts: H1, H2, H3 and H4

The ‘host’ global configuration should be set to ‘M1,M2,M3’

If the CloudStack administrator does not want load balancing between agents and management servers, they would set the ‘static’ algorithm in the ‘’ global configuration. Each agent receives the same list (M1,M2,M3), and will be connected to the same management server.

If the CloudStack administrator wishes to balance connections between agents and management servers, the ’roundrobin’ algorithm is recommended. In this case:

  • H1 receives the list (M1, M2, M3)
  • H2 receives the list (M2, M3, M1)
  • H3 receives the list (M3, M1, M2)
  • H4 receives the list (M1, M2, M3)

There is also a ‘shuffle’ algorithm, in which the list is randomized before being sent to any agent. With this algorithm the CloudStack administrator has no control over the load balancing, so it is not recommended for production use at the moment.

Combined with the algorithm, the CloudStack administrator can also set the ‘’ global configuration to ‘X’. This ensures that every X seconds, each agent will check whether the management server it is connected to is the first element of its list (the preferred host). If there is a mismatch, the agent will attempt to connect to the preferred host.

About the author

Nicolas Vazquez is a Senior Software Engineer at ShapeBlue, the Cloud Specialists, and is a committer in the Apache CloudStack project. Nicolas spends his time designing and implementing features in Apache CloudStack.

Version 4.11 of Apache CloudStack has been released with some exciting new features and a long list of improvements and fixes. It includes more than 400 commits, 220 pull requests, and fixes more than 250 issues. This version has been worked on for 8 months and is the first of the 4.11 LTS releases, which will be supported until 1 July 2019.

We’ve been heavily involved in this release at ShapeBlue; our engineering team has contributed a number of the major new features and our own Rohit Yadav has been the 4.11 Release Manager.

As well as some really interesting new features, CloudStack 4.11 has significant performance and reliability improvements to the Virtual Router.

This is far from an exhaustive list, but here are the headline items that we think are most significant.

New Features and Improvements

  • Support for XenServer 7.1 and 7.2, and improved support for VMware 6.5.
  • Host-HA framework and HA-provider for KVM hosts with NFS as primary storage, and a new background polling task manager.
  • Secure agents communication: new certificate authority framework and a default built-in root CA provider.
  • New network type – L2.
  • CloudStack metrics exporter for Prometheus.
  • Cloudian Hyperstore connector for CloudStack.
  • Annotation feature for CloudStack entities such as hosts.
  • Separation of volume snapshot creation on primary storage and backing operation on secondary storage.
  • Limit admin access from specified CIDRs.
  • Expansion of Management IP Range.
  • Dedication of public IPs to SSVM and CPVM.
  • Support for separate subnet for SSVM and CPVM.
  • Bypass secondary storage template copy/transfer for KVM.
  • Support for multi-disk OVA template for VMware.
  • Storage overprovisioning for local storage.
  • LDAP mapping with domain scope, and mapping of LDAP group to an account.
  • Move user across accounts.
  • Support for “VSD managed” networks with Nuage Networks.
  • Extend config drive support for user data, metadata, and password (Nuage networks).
  • Nuage domain template selection per VPC and support for network migration.
  • Managed storage enhancements.
  • Support for watchdog timer to KVM Instances.
  • Support for Secondary IPv6 Addresses and Subnets.
  • IPv6 Prefix Delegation support in basic networking.
  • Ability to specify a MAC address while deploying a VM or adding a NIC to a VM.
  • VMware dvSwitch security policies configuration in network offering
  • Allow more than 7 NICs to be added to a VMware VM.
  • Network rate usage for guest offering for VRs.
  • Usage metrics for VM snapshot on primary storage.
  • Enable Netscaler inline mode.
  • NCC integration in CloudStack.
  • The retirement of the Midonet network plugin.

UI Improvements

  • High precision of metrics percentage in the dashboard:
  • Event timeline – filter related events:

  • Navigation improvements between related entities:
  • Bulk operation support for stopping and destroying VMs (note: minor known issue where manual refresh required afterwards):
  • List view improvements and additional columns with state icon:

Structural Improvements

  • Embedded Jetty and improved CloudStack management server configuration.
  • Improved support for Java 8 for building artifacts/modules, packaging, and in the systemvm template.
  • New Debian 9 based systemvm template:
    • Patches system VMs without reboot, reducing VR/system VM startup time to a few tens of seconds.
    • Faster console proxy startup and service availability.
    • Improved support for redundant virtual routers, conntrackd and keepalived.
    • Improved strongswan provided VPN (s2s and remote access).
    • Packer based systemvm template generation and reduced disk size.
    • Several optimizations and improvements.

Documentation and Downloads

The official installation, administration and API documentation can be found below: 

The release notes can be found at: 

The instruction and links to use ShapeBlue provided (noredist) packages repository can be found at: 


CloudStack usage is a complementary service which tracks end user consumption of CloudStack resources and summarises this in a separate database for reporting or billing. The usage database can be queried directly, through the CloudStack API, or it can be integrated into external billing or reporting systems.

For background information on the usage service please refer to the CloudStack documentation set:

In this blog post we will go a step further and deep dive into how the usage service works, how you can run usage reports from the database either directly or through the API, and also how to troubleshoot this.

Please note – in this blog post we will be discussing the underlying database structure for the CloudStack management and usage services. Whilst these have separate databases they do in some cases share table names – hence please note the databases referenced throughout – e.g. cloud.usage_event versus cloudstack_usage.usage_event, etc.



As per the official CloudStack documentation the usage service is simply installed and started. On CentOS/RHEL this is done as follows:

# yum install cloudstack-usage
# chkconfig cloudstack-usage on
# service cloudstack-usage start

whilst on a Debian/Ubuntu server:

# apt-get install cloudstack-usage
# update-rc.d cloudstack-usage defaults
# service cloudstack-usage start

Once configured, the usage service uses the same MySQL connection details as the main CloudStack management service. These are added automatically when the management service is configured with the “cloudstack-setup-databases” script. The usage service installation simply adds a symbolic link to the same db.properties file as is used by cloudstack-management:

# ls -l /etc/cloudstack/usage/
total 4
lrwxrwxrwx. 1 root root   40 Sep  8 08:18 db.properties -> /etc/cloudstack/management/db.properties
lrwxrwxrwx. 1 root root   30 Sep  8 08:18 key -> /etc/cloudstack/management/key
-rw-r--r--. 1 root root 2968 Jul 12 10:36 log4j-cloud.xml

Please note that whilst the cloudstack-usage and cloudstack-management services share the same configuration file, it still contains individual settings for each service:

# grep -i usage /etc/cloudstack/usage/db.properties
# usage database settings (DB host IP address)
db.usage.password=ENC(Encrypted password)
# usage database tuning parameters
db.usage.failOverReadOnly=false
...

Note that the above settings would need to be changed if:

  • the usage DB is installed on a different MySQL server than the main CloudStack database
  • if the usage database is using a different set of login credentials

Also note that the passwords in the file above are encrypted using the method specified during the “cloudstack-setup-databases” script run – hence this also uses the referenced “key” file as shown in the above folder listing.

Application settings

Once installed the usage service is configured with the following global settings in CloudStack:

  • enable.usage.server:
    • Switches usage service on/off
    • true|false
  • usage.aggregation.timezone:
    • Timezone used for usage aggregation.
    • Refer to for formatting.
    • Defaults to “GMT”.
  • usage.execution.timezone:
    • Timezone for usage job execution.
    • Refer to for formatting.
  • usage.sanity.check.interval:
    • Interval (in days) to check sanity of usage data.
  • usage.snapshot.virtualsize.select:
    • Set the value to true if snapshot usage needs to consider virtual size, else physical size is considered.
    • true|false – defaults to false.
  • usage.stats.job.aggregation.range:
    • The range of time for aggregating the user statistics specified in minutes (e.g. 1440 for daily, 60 for hourly. Default is 60 minutes).
    • Please note this setting would be changed in a chargeback situation where VM resources are charged on an hourly/daily/monthly basis.
  • usage.stats.job.exec.time:
    • The time at which the usage statistics aggregation job will run as an HH:MM time, e.g. 00:30 to run at 12:30am.
    • Default is 00:15.
    • Please note this time follows the setting in usage.execution.timezone above.

Please note – if any of these settings are updated then only the cloudstack-usage service needs to be restarted (i.e. there is no need to restart cloudstack-management).

Usage types

To track the resources utilised in CloudStack, every API call where a resource is created, destroyed, stopped, started, requested or released is recorded in the cloud.usage_event table. This table has entries for every event since the CloudStack instance was first created, hence it may grow to become quite big.

During processing, every event in this table is assigned a usage type. The usage types are listed in the CloudStack documentation, or they can simply be queried using the CloudStack “listUsageTypes” API call:

# cloudmonkey list usagetypes
count = 19
| usagetypeid | description                             |
|  1          |  Running Vm Usage                       |
|  2          |  Allocated Vm Usage                     |
|  3          |  IP Address Usage                       |
|  4          |  Network Usage (Bytes Sent)             |
|  5          |  Network Usage (Bytes Received)         |
|  6          |  Volume Usage                           |
|  7          |  Template Usage                         |
|  8          |  ISO Usage                              |
|  9          |  Snapshot Usage                         |
| 10          |  Security Group Usage                   |
| 11          |  Load Balancer Usage                    |
| 12          |  Port Forwarding Usage                  |
| 13          |  Network Offering Usage                 |
| 14          |  VPN users usage                        |
| 21          |  VM Disk usage(I/O Read)                |
| 22          |  VM Disk usage(I/O Write)               |
| 23          |  VM Disk usage(Bytes Read)              |
| 24          |  VM Disk usage(Bytes Write)             |
| 25          |  VM Snapshot storage usage              |

Please note these usage types are calculated depending on the nature of the resource used, e.g.:

  • “Running VM usage” will simply count the hours a single VM instance is used.
  • “Volume usage” will however track both the size of each volume in addition to the time utilised.
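To illustrate the difference, the running-hours figure for a VM can be reproduced from the START/STOP timestamps alone. The following is a hypothetical sketch (not CloudStack code), using a VM started 2017-09-08 11:14:41 and stopped 2017-09-26 13:44:48:

```python
from datetime import datetime

# VM.START and VM.STOP timestamps as recorded in cloud.usage_event
started = datetime(2017, 9, 8, 11, 14, 41)
stopped = datetime(2017, 9, 26, 13, 44, 48)

# "Running VM usage" is simply the elapsed time in hours...
run_hours = (stopped - started).total_seconds() / 3600
# ...whereas "Volume usage" would additionally be weighted by the
# volume size (e.g. hours * size in GB, giving GbHours).
print(round(run_hours, 4))  # 434.5019
```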

Process flow


From a high-level point of view the usage service processes data already generated by the CloudStack management service, copies it to the cloud_usage database, then processes and aggregates the data into the cloud_usage.cloud_usage table:



Using a running VM instance as example the data process flow is as follows.

Usage_event table entries

CloudStack management writes all events to the cloud.usage_event table. This happens whether the cloudstack-usage service is running or not.

In this example we will track the VM with instance ID 17. The resource tracked – be it a VM, a volume, a port forwarding rule, etc. – is listed in the usage_event table as “resource_id”, which points to the main ID field in the vm_instance, volumes, etc. tables.

SELECT * FROM cloud.usage_event WHERE type LIKE '%VM%' AND resource_id = 17;

| id  | type       | account_id | created             | zone_id | resource_id | resource_name | offering_id | template_id | size | resource_type | processed | virtual_size |
| 68  | VM.CREATE  | 6          | 2017-09-08 11:14:31 | 1       | 17          | bbannervm12   | 17          | 5           | NULL | XenServer     | 0         | NULL         |
| 70  | VM.START   | 6          | 2017-09-08 11:14:41 | 1       | 17          | bbannervm12   | 17          | 5           | NULL | XenServer     | 0         | NULL         |
| 123 | VM.STOP    | 6          | 2017-09-26 13:44:48 | 1       | 17          | bbannervm12   | 17          | 5           | NULL | XenServer     | 0         | NULL         |
| 125 | VM.DESTROY | 6          | 2017-09-26 13:45:00 | 1       | 17          | bbannervm12   | 17          | 5           | NULL | XenServer     | 0         | NULL         |

Please note: a lot of the resources will obviously still be in use – i.e. they will not have a destroy/release entry. In this case the usage service considers the end date to be open, i.e. all calculations are up until today.

Usage_event copy

When the usage job runs (at “usage.stats.job.exec.time”) it first copies all new entries since the last processing time from the cloud.usage_event table to the cloud_usage.usage_event table.

The only difference between the two tables is the “processed” column – in the cloud database this is always 0; once the entry has been processed in the cloud_usage database the field is updated to 1.

In comparison – the entries in the cloud database:

SELECT * FROM cloud.usage_event WHERE id > 130;

| id  | type                    | account_id | created             | zone_id | resource_id | resource_name    | offering_id | template_id | size       | resource_type  | processed | virtual_size |
| 131 | VOLUME.CREATE           | 6          | 2017-09-26 13:45:44 | 1       | 31          | bbannerdata3     | 6           | NULL        | 2147483648 | NULL           | 0         | NULL         |
| 132 | NET.IPASSIGN            | 6          | 2017-09-26 13:46:05 | 1       | 17          | 10.1.34.77       | NULL        | 0           | 0          | VirtualNetwork | 0         | NULL         |
| 133 | VM.STOP                 | 8          | 2017-09-28 10:31:44 | 1       | 23          | secretprojectvm1 | 17          | 5           | NULL       | XenServer      | 0         | NULL         |
| 134 | NETWORK.OFFERING.REMOVE | 8          | 2017-09-28 10:31:44 | 1       | 23          | 4                | 18          | NULL        | 0          | NULL           | 0         | NULL         |

Compared to the same entries in cloud_usage:

SELECT * FROM cloud_usage.usage_event WHERE id > 130;

| id  | type                    | account_id | created             | zone_id | resource_id | resource_name    | offering_id | template_id | size       | resource_type  | processed | virtual_size |
| 131 | VOLUME.CREATE           | 6          | 2017-09-26 13:45:44 | 1       | 31          | bbannerdata3     | 6           | NULL        | 2147483648 | NULL           | 1         | NULL         |
| 132 | NET.IPASSIGN            | 6          | 2017-09-26 13:46:05 | 1       | 17          | 10.1.34.77       | NULL        | 0           | 0          | VirtualNetwork | 1         | NULL         |
| 133 | VM.STOP                 | 8          | 2017-09-28 10:31:44 | 1       | 23          | secretprojectvm1 | 17          | 5           | NULL       | XenServer      | 1         | NULL         |
| 134 | NETWORK.OFFERING.REMOVE | 8          | 2017-09-28 10:31:44 | 1       | 23          | 4                | 18          | NULL        | 0          | NULL           | 1         | NULL         |

Account copy

As part of this copy job the cloudstack-usage service will also make a copy of some of the columns in the cloud.account table, such that ownership of resources can easily be established during processing.

Usage summary and helper tables

In the first usage aggregation step all usage data per account and per usage type is summarised in helper tables. Continuing the example above the CREATE+DESTROY events as well as the VM START+STOP events are summarised in the “usage_vm_instance” table:


SELECT * FROM cloud_usage.usage_vm_instance WHERE vm_instance_id = 17;

| usage_type | zone_id | account_id | vm_instance_id | vm_name     | service_offering_id | template_id | hypervisor_type | start_date          | end_date            | cpu_speed | cpu_cores | memory |
| 1          | 1       | 6          | 17             | bbannervm12 | 17                  | 5           | XenServer       | 2017-09-08 11:14:41 | 2017-09-26 13:44:48 | NULL      | NULL      | NULL   |
| 2          | 1       | 6          | 17             | bbannervm12 | 17                  | 5           | XenServer       | 2017-09-08 11:14:31 | 2017-09-26 13:45:00 | NULL      | NULL      | NULL   |

Note the helper table has now summarised the data with the usage type mentioned above – and the start/end dates are contained in the same database row.

Please note – if a resource is still in use then the end date simply isn’t populated, i.e. all calculations will work on rolling end date of today.

If we now also compare the volume used by VM instance ID 17 we find this in the cloud_usage.usage_volume helper table:

SELECT usage_volume.*
FROM cloud_usage.usage_volume
LEFT JOIN cloud.volumes ON ( =
WHERE cloud.volumes.instance_id = 17;

| id | zone_id | account_id | domain_id | disk_offering_id | template_id | size        | created             | deleted             |
| 18 | 1       | 6          | 2         | NULL             | 5           | 21474836480 | 2017-09-08 11:14:31 | 2017-09-26 13:45:00 |

As the database selects above show, each helper table contains only the information pertinent to that specific usage type: cloud_usage.usage_vm_instance holds the VM service offering, template and hypervisor type, whilst cloud_usage.usage_volume holds the disk offering ID, template ID and size.

If a usage type for a resource has been started/stopped or requested/released multiple times then each period of use will be listed in the helper tables:


SELECT * FROM cloud_usage.usage_vm_instance WHERE vm_instance_id = 12;

| usage_type | zone_id | account_id | vm_instance_id | vm_name    | service_offering_id | template_id | hypervisor_type | start_date          | end_date            | cpu_speed | cpu_cores | memory |
| 1          | 1       | 6          | 12             | bbannervm2 | 17                  | 5           | XenServer       | 2017-09-08 09:30:37 | 2017-09-08 09:30:49 | NULL      | NULL      | NULL   |
| 1          | 1       | 6          | 12             | bbannervm2 | 17                  | 5           | XenServer       | 2017-09-08 11:14:03 | NULL                | NULL      | NULL      | NULL   |
| 2          | 1       | 6          | 12             | bbannervm2 | 17                  | 5           | XenServer       | 2017-09-08 09:30:20 | NULL                | NULL      | NULL      | NULL   |

Usage data aggregation

Once all helper tables have been populated the usage service now creates time aggregated database entries in the cloud_usage.cloud_usage table. In all simplicity this process:

  1. Analyses all entries in the helper tables.
  2. Splits up this data based on “usage.stats.job.aggregation.range” to create individual usage timeblocks.
  3. Repeats this process for all accounts and for all resources.

So – looking at the VM with ID=17 analysed above:

  • This had a running start date of 2017-09-08 11:14:41, an end date of 2017-09-26 13:44:48.
  • The usage service is set up with usage.stats.job.aggregation.range=1440, i.e. 24 hours.
  • The usage service will now create entries in the cloud_usage.cloud_usage table for every full and partial 24 hour period this VM was running.

SELECT * FROM cloud_usage.cloud_usage WHERE usage_id = 17 AND usage_type = 1;

| id   | zone_id | account_id | domain_id | description | usage_display | usage_type | raw_usage | vm_instance_id | vm_name | offering_id | template_id | usage_id | type | size | network_id | start_date | end_date | virtual_size | cpu_speed | cpu_cores | memory | quota_calculated |
| 64   | 1 | 6 | 2 | bbannervm12 running time (ServiceOffering: 17) (Template: 5) | 12.755278 Hrs | 1 | 12.755277633666992 | 17 | bbannervm12 | 17 | 5 | 17 | XenServer | NULL | NULL | 2017-09-08 00:00:00 | 2017-09-08 23:59:59 | NULL | NULL | NULL | NULL | 0 |
| 146  | 1 | 6 | 2 | bbannervm12 running time (ServiceOffering: 17) (Template: 5) | 24 Hrs | 1 | 24 | 17 | bbannervm12 | 17 | 5 | 17 | XenServer | NULL | NULL | 2017-09-09 00:00:00 | 2017-09-09 23:59:59 | NULL | NULL | NULL | NULL | 0 |
| 221  | 1 | 6 | 2 | bbannervm12 running time (ServiceOffering: 17) (Template: 5) | 24 Hrs | 1 | 24 | 17 | bbannervm12 | 17 | 5 | 17 | XenServer | NULL | NULL | 2017-09-10 00:00:00 | 2017-09-10 23:59:59 | NULL | NULL | NULL | NULL | 0 |
| 1271 | 1 | 6 | 2 | bbannervm12 running time (ServiceOffering: 17) (Template: 5) | 24 Hrs | 1 | 24 | 17 | bbannervm12 | 17 | 5 | 17 | XenServer | NULL | NULL | 2017-09-24 00:00:00 | 2017-09-24 23:59:59 | NULL | NULL | NULL | NULL | 0 |
| 1346 | 1 | 6 | 2 | bbannervm12 running time (ServiceOffering: 17) (Template: 5) | 24 Hrs | 1 | 24 | 17 | bbannervm12 | 17 | 5 | 17 | XenServer | NULL | NULL | 2017-09-25 00:00:00 | 2017-09-25 23:59:59 | NULL | NULL | NULL | NULL | 0 |
| 1427 | 1 | 6 | 2 | bbannervm12 running time (ServiceOffering: 17) (Template: 5) | 13.746667 Hrs | 1 | 13.74666690826416 | 17 | bbannervm12 | 17 | 5 | 17 | XenServer | NULL | NULL | 2017-09-26 00:00:00 | 2017-09-26 23:59:59 | NULL | NULL | NULL | NULL | 0 |

Since all of these entries are split into specific dates it is now relatively straightforward to run a report capturing all resource usage for an account over a specific time period, e.g. if a monthly bill is required.

Querying usage data through the API

The usage records can also be queried through the API using the “listUsageRecords” API call. This uses similar syntax to the above – but there are some differences:

  • The API call requires start and end dates; these are in a “yyyy-MM-dd HH:mm:ss” or simply a “yyyy-MM-dd” format.
  • The usage type is same as above, e.g. type=1 for running VMs.
  • Usage ID is however the UUID attached to the resource in question, e.g. in the following example VM ID 17 actually has UUID 4358f436-bc9b-4793-b1be-95fa9b074fd5 in the vm_instance table.
  • The API call can also be filtered for account/accountid/domain.

More information on the syntax can be found in .

The following API query will list the first three days’ worth of usage data listed in the table above:

# cloudmonkey list usagerecords type=1 startdate=2017-09-09 enddate=2017-09-10 usageid=4358f436-bc9b-4793-b1be-95fa9b074fd5
count = 3
| startdate                   | account | domainid                             | enddate                     | description                                                  | name        | virtualmachineid                     | offeringid                           | usagetype | domain     | zoneid                               | rawusage | templateid                           | usage         | usageid                              | type      | accountid                            |
| 2017-09-08'T'00:00:00+00:00 | bbanner | f3501b29-01f7-44ce-a266-9e3f12c17394 | 2017-09-08'T'23:59:59+00:00 | bbannervm12 running time (ServiceOffering: 17) (Template: 5) | bbannervm12 | 4358f436-bc9b-4793-b1be-95fa9b074fd5 | 60d9aaf1-7ff7-472e-b29f-6768d0cb5702 | 1         | Subdomain1 | d4b9d32e-d779-48b8-814d-d7847d55a684 | 12.755278| 47dd8c98-946e-11e7-b419-0666ae010714 | 12.755278 Hrs | 4358f436-bc9b-4793-b1be-95fa9b074fd5 | XenServer | 8c2d592f-78e1-4e92-a910-1e4b865240cf |
| 2017-09-09'T'00:00:00+00:00 | bbanner | f3501b29-01f7-44ce-a266-9e3f12c17394 | 2017-09-09'T'23:59:59+00:00 | bbannervm12 running time (ServiceOffering: 17) (Template: 5) | bbannervm12 | 4358f436-bc9b-4793-b1be-95fa9b074fd5 | 60d9aaf1-7ff7-472e-b29f-6768d0cb5702 | 1         | Subdomain1 | d4b9d32e-d779-48b8-814d-d7847d55a684 | 24       | 47dd8c98-946e-11e7-b419-0666ae010714 | 24 Hrs        | 4358f436-bc9b-4793-b1be-95fa9b074fd5 | XenServer | 8c2d592f-78e1-4e92-a910-1e4b865240cf |
| 2017-09-10'T'00:00:00+00:00 | bbanner | f3501b29-01f7-44ce-a266-9e3f12c17394 | 2017-09-10'T'23:59:59+00:00 | bbannervm12 running time (ServiceOffering: 17) (Template: 5) | bbannervm12 | 4358f436-bc9b-4793-b1be-95fa9b074fd5 | 60d9aaf1-7ff7-472e-b29f-6768d0cb5702 | 1         | Subdomain1 | d4b9d32e-d779-48b8-814d-d7847d55a684 | 24       | 47dd8c98-946e-11e7-b419-0666ae010714 | 24 Hrs        | 4358f436-bc9b-4793-b1be-95fa9b074fd5 | XenServer | 8c2d592f-78e1-4e92-a910-1e4b865240cf |
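Outside of CloudMonkey the same query can be made over plain HTTP, in which case the request has to be signed. As a rough sketch (the keys below are placeholders, not real credentials), CloudStack signs a request by URL-encoding the parameter values, lowercasing and sorting the key=value pairs, and HMAC-SHA1-signing the result with the account's secret key:

```python
import base64
import hashlib
import hmac
import urllib.parse

def sign_request(params, secret_key):
    """Build a signed CloudStack API query string (sketch of the
    documented signing scheme: encode values, lowercase, sort,
    HMAC-SHA1, then base64- and URL-encode the signature)."""
    pairs = [f"{k}={urllib.parse.quote(str(v), safe='*')}"
             for k, v in params.items()]
    to_sign = "&".join(sorted(p.lower() for p in pairs))
    digest = hmac.new(secret_key.encode(), to_sign.encode(),
                      hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode()
    return "&".join(pairs) + "&signature=" + urllib.parse.quote(signature, safe="")

# Placeholder keys - in reality these come from the CloudStack user account
query = sign_request(
    {
        "command": "listUsageRecords",
        "type": "1",
        "startdate": "2017-09-09",
        "enddate": "2017-09-10",
        "response": "json",
        "apikey": "APIKEY",
    },
    "SECRETKEY",
)
```

The resulting query string is then appended to the management server's API endpoint URL.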

Analysing and reporting on usage data

The usage data can be analysed in any reporting tool – from the various CloudStack billing platforms, to enterprise billing systems as well as simpler tools like Excel. Since the cloud_usage.cloud_usage data is fully aggregated into time utilised blocks, it is now just a question of summarising data based on usage type, accounts, service offerings, etc.

The following SQL queries are provided as examples only – in a real use case these will most likely need to be changed and refined to the specific reporting requirements.

Running VMs

To find usage data for all VMs running during the month of September we search for usage type=1 and group by VM instance. For each VM instance we summarise how many hours it has been running – however, in a real billing scenario this would most likely also be broken down into e.g. how many hours of VM usage have been utilised per VM service offering.

SELECT cloud_usage.account_id,
   account.account_name,
   cloud_usage.usage_id AS vm_instance_id,
   SUM(raw_usage) AS VMRunHours
FROM cloud_usage.cloud_usage
LEFT JOIN cloud_usage.account ON (cloud_usage.account_id =
WHERE start_date LIKE '2017-09%'
   AND usage_type = 1
GROUP BY account_id, usage_id
ORDER BY account_id ASC, vm_instance_id ASC;

Network utilisation

The following will summarise network usage for sent (usage type=4) and received (usage type=5) traffic on a per account basis, again this is listing for the month of September.

For network utilisation the usage is simply summarised as total Bytes sent or received:

SELECT cloud_usage.account_id,
   account.account_name,
   usage_type,
   SUM(raw_usage) AS TotalBytes
FROM cloud_usage.cloud_usage
LEFT JOIN cloud_usage.account ON (cloud_usage.account_id =
WHERE start_date LIKE '2017-09%'
   AND usage_type IN (4,5)
GROUP BY account_id, usage_type
ORDER BY account_id ASC;

Volume utilisation

For volume or general storage utilisation (applies to snapshots as well) the usage is calculated as storage hours – e.g. GbHours. In this example we again summarise for all volumes (usage type=6) on a per account and disk basis during the month of September. Please note in this case we have to do multiple joins (or nested WHERE statements) to look up volume IDs, VM name, etc.

SELECT cloud_usage.cloud_usage.account_id,
   account.account_name,
   cloud_usage.cloud_usage.usage_id, AS Instance_Name, AS Volume_Name,
   cloud_usage.cloud_usage.size/(1024*1024*1024) AS DiskSizeGb,
   SUM(cloud_usage.cloud_usage.raw_usage) AS TotalHours,
   SUM(cloud_usage.cloud_usage.raw_usage*cloud_usage.cloud_usage.size/(1024*1024*1024)) AS GbHours
FROM cloud_usage.cloud_usage
LEFT JOIN cloud_usage.account ON (cloud_usage.account_id =
LEFT JOIN cloud.volumes ON (cloud_usage.usage_id =
LEFT JOIN cloud.vm_instance ON (cloud.volumes.instance_id =
WHERE start_date LIKE '2017-09%' AND usage_type = 6
GROUP BY account_id, usage_id
ORDER BY account_id ASC, usage_id ASC;
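As a cross-check of the GbHours arithmetic, the 20 GB volume from the helper-table example (created 2017-09-08 11:14:31, deleted 2017-09-26 13:45:00) can be worked through in a few lines of Python:

```python
from datetime import datetime

# Figures taken from the cloud_usage.usage_volume example earlier in the post
size_bytes = 21474836480
created = datetime(2017, 9, 8, 11, 14, 31)
deleted = datetime(2017, 9, 26, 13, 45, 0)

size_gb = size_bytes / (1024 ** 3)                  # 20.0 GB
hours = (deleted - created).total_seconds() / 3600  # ~434.51 hours
gb_hours = hours * size_gb                          # ~8690.16 GbHours
```

This mirrors what the SQL above does per row: raw_usage (hours) multiplied by the volume size in GB.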



IP addresses, port forwarding rules and VPN users

For other usage types where – similar to VM running hours – we simply report on the total hours utilised, we again summarise the raw_usage; since the description field in the usage record is clear enough we don’t need to go looking elsewhere for this information. In the following example we report on IP address usage (usage type=3), port forwarding rules (12) and VPN users (14):

SELECT cloud_usage.account_id,
   account.account_name,
   usage_type,
   usage_id,
   description,
   SUM(cloud_usage.cloud_usage.raw_usage) AS TotalHours
FROM cloud_usage.cloud_usage
LEFT JOIN cloud_usage.account ON (cloud_usage.account_id =
WHERE start_date LIKE '2017-09%' AND usage_type IN (3,12,14)
GROUP BY account_id, usage_id
ORDER BY account_id ASC, usage_id ASC;


| account_id | account_name | usage_type | usage_id | description                                | TotalHours         |
| 6          | bbanner      | 14         | 1        | VPN User: bbannervpn1, Id: 1 usage time    | 542.4766664505005  |
| 6          | bbanner      | 14         | 2        | VPN User: brucesdogvpn1, Id: 2 usage time  | 1.7355557680130005 |
| 6          | bbanner      | 14         | 3        | VPN User: bruceswifevpn1, Id: 3 usage time | 540.7405557632446  |
| 6          | bbanner      | 14         | 4        | VPN User: stanleevpn1, Id: 4 usage time    | 540.7180547714233  |
| 6          | bbanner      | 12         | 9        | Port Forwarding Rule: 9 usage time         | 1.6469446420669556 |


Service management

As described earlier in this blog post the usage job will run at a time specified in the usage.stats.job.exec.time global setting.

Once the job has run it will update its own internal database table with the run time and the start/end times processed:

SELECT * FROM cloud_usage.usage_job;


(some columns omitted for brevity)

| id | host          | start_date          | end_date            | success | heartbeat           |
| 1  | acshostname/… | 2017-09-08 00:00:00 | 2017-09-08 23:59:59 | 1       | 2017-09-09 00:14:53 |
| 2  | acshostname/… | 2017-09-09 00:00:00 | 2017-09-09 23:59:59 | 1       | 2017-09-10 00:14:53 |
| 3  | acshostname/… | 2017-09-10 00:00:00 | 2017-09-10 23:59:59 | 1       | 2017-09-11 00:14:53 |
| 4  | acshostname/… | 2017-09-11 00:00:00 | 2017-09-11 23:59:59 | 1       | 2017-09-12 00:14:53 |
| 5  | acshostname/… | 2017-09-12 00:00:00 | 2017-09-12 23:59:59 | 1       | 2017-09-13 00:14:53 |

A couple of things to note on this list:

  • Start_millis and end_millis simply list the epoch timestamps of start_date and end_date. The epoch time is used by the usage service to determine cloud_usage.cloud_usage entries.
  • Exec_time lists how long the usage job ran for. This is useful in cases where the usage job processing time is longer than 24 hours – i.e. where usage job schedules may start overlapping.
  • The success field is set to 1 for success, 0 for failure.
  • Heartbeat lists when the job was run.
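For example, the start_millis value for the first job window above can be reproduced as follows (assuming the default GMT aggregation timezone):

```python
import calendar
from datetime import datetime

# start_date of the first usage job window, interpreted as GMT/UTC
start_date = datetime(2017, 9, 8, 0, 0, 0)
start_millis = calendar.timegm(start_date.timetuple()) * 1000
print(start_millis)  # 1504828800000
```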

When the cloudstack-usage service is restarted it will run checks against the usage_job table to determine:

  • If the last scheduled job was run. If it wasn’t, the job is run again, i.e. a service startup will run a single missed job.
  • Thereafter the usage job will run at its normal scheduled time.

Usage troubleshooting – general advice

Since this blog post covers topics around adding/updating/removing entries in the cloud and cloud_usage databases, we always advise CloudStack users to take MySQL dumps of both databases before doing any work – whether this is done directly in MySQL or via the usage API calls.

Database inconsistencies

Under certain circumstances (e.g. if the cloudstack-management service crashes) the cloud.usage_event table may have inconsistent entries, e.g.:

  • STOP entries without a START entry, or DESTROY entries without a CREATE.
  • Double entries – i.e. a VM has two START entries.

The usage logs will show where these failures occur. The fix for these issues is to add/delete entries as required in the cloud.usage_event table, e.g. adding a VM.START entry with the appropriate date stamp if one is missing, and so on.

Usage service logs

The usage service writes all logs to /var/log/cloudstack/usage/usage.log. These logs are relatively verbose and will outline all actions performed during the usage job:

DEBUG [usage.parser.IPAddressUsageParser] (Usage-Job-1:null) (logid:) Parsing IP Address usage for account: 2
DEBUG [usage.parser.IPAddressUsageParser] (Usage-Job-1:null) (logid:) Total usage time 86400000ms
DEBUG [usage.parser.IPAddressUsageParser] (Usage-Job-1:null) (logid:) Creating IP usage record with id: 3, usage: 24, startDate: Tue Oct 10 00:00:00 UTC 2017, endDate: Tue Oct 10 23:59:59 UTC 2017, for account: 2
DEBUG [usage.parser.VPNUserUsageParser] (Usage-Job-1:null) (logid:) Parsing all VPN user usage events for account: 2
DEBUG [usage.parser.VPNUserUsageParser] (Usage-Job-1:null) (logid:) No VPN user usage events for this period
DEBUG [usage.parser.VMSnapshotUsageParser] (Usage-Job-1:null) (logid:) Parsing all VmSnapshot volume usage events for account: 2
DEBUG [usage.parser.VMSnapshotUsageParser] (Usage-Job-1:null) (logid:) No VM snapshot usage events for this period
DEBUG [usage.parser.VMInstanceUsageParser] (Usage-Job-1:null) (logid:) Parsing all VMInstance usage events for account: 3
DEBUG [usage.parser.NetworkUsageParser] (Usage-Job-1:null) (logid:) Parsing all Network usage events for account: 3
DEBUG [usage.parser.VmDiskUsageParser] (Usage-Job-1:null) (logid:) Parsing all Vm Disk usage events for account: 3

Housekeeping of cloud_usage table

To carry out housekeeping of the cloud_usage.cloud_usage table the “removeRawUsageRecords” API call can be used to delete all usage entries older than a certain number of days. Note – since the cloud_usage table only contains fully parsed entries, deleting anything from this table will not lead to inconsistencies – it will just cut down on the number of usage records being reported on.

More information can be found in

The following example deletes all usage records older than 5 days:

# cloudmonkey removeRawUsageRecords interval=5
success = true

Regenerating usage data

The CloudStack API also has a call for regenerating usage records – generateUsageRecords. This can be utilised to rerun the usage job in case of job failure. More information can be found in the CloudStack documentation –

Please note the comment on the above documentation page:  “This will generate records only if there any records to be generated, i.e. if the scheduled usage job was not run or failed”. In other words this API call should not be made ad-hoc apart from in this specific situation.

# cloudmonkey generateUsageRecords startdate=2017-09-01 enddate=2017-09-30
success = true

Quota service

Anyone looking through the cloud_usage database will notice a number of quota_* tables. These are not directly linked to the usage service itself, they are rather consumed by the Quota service. This service was created to monitor usage of CloudStack resources based on a per account credit limit and a per resource credit cost.

For more information on the Quota service please refer to the official CloudStack documentation / CloudStack wiki:


The CloudStack usage service can seem complicated for someone just getting started with it. We hope this blog post has managed to explain the background processes and how to get useful data out of the service.

We always value feedback – so if you have any comments or questions around this blog post please feel free to get in touch with the ShapeBlue team.

About The Author

Dag Sonstebo is a Cloud Architect at ShapeBlue, The Cloud Specialists. Dag spends his time designing, implementing and automating IaaS solutions based around Apache CloudStack.