Apache CloudStack on RaspberryPi4 with KVM

IOT  and edge-computing present an unprecedented demand for cloud operators to provide orchestrated environments to devices not usually associated with IaaS platforms. In this article, Rohit Yadav explores the feasibility of building an IaaS orchestrated environment using Raspberry Pi4, a popular single-board ARM64  computer, often found in these use-cases.

Virtualization on Raspberry Pi4

Raspberry Pi4 can run the GNU/Linux kernel with KVM but, by default, KVM is not enabled in the ARM64 pre-built images. I found this during my research where I built Linux kernel with KVM enabled on the ARM64 RPi4 board and shared my findings with the Ubuntu kernel team and influence the build team to have KVM enabled by default. With the changes, now Ubuntu 19.10 ARM64 builds have KVM enabled.

First steps

To get started I had the following:

  • RPi4 board with cortex-72 quad-core 1.5Ghz processor and 4GB RAM.
  • Ubuntu 19.10 arm64 image installed on a Samsung EVO+ 128GB micro sd card (any 4GB+ class 10 u3/v30 sdcard will do).
  • An external USB-based SSD storage with high iops to be used as a primary/secondary storage.

The latest Ubutnu 19.10+ image with the latest linux kernel should have KVM enabled, if not the following arm64 kernel builds may be used: (newer build, fixed USB issue)

After basic installation, check and ensure that 64-bit mode is enabled:

# Edit mount-img/boot/firmware/config.txt and check/add:
# Save file and reboot RPi4

Ensure that KVM is available at /dev/kvm or by running kvm-ok.

Next, install basic packages and setup time:

apt-get install ntpdate openssh-server sudo vim htop tar iotop
ntpdate # update time
hostnamectl set-hostname cloudstack-mgmt

Disable automatic upgrades and unnecessary packages:

apt-get remove --purge unattended-upgrades snapd cloud-init
# Edit the file: /etc/apt/apt.conf.d/20auto-upgrades to the following
APT::Periodic::Update-Package-Lists "0";
APT::Periodic::Unattended-Upgrade "1";

Allow the root user for ssh access using password, fix /etc/ssh/sshd_config. Change and remember the root password:

passwd root

Reduce load on the RaspberryPi4 sdcard by changing how soon changes are committed to the disk, for example in /etc/fstab:

LABEL=writable  /        ext4   defaults,commit=60      0 0
LABEL=system-boot       /boot/firmware  vfat    defaults        0       1

CloudStack Support

CloudStack by default does not work with ARM64, this guide was based on a custom build that used this patch/pull-request:

Note: The PR has been merged CloudStack will support ARM64/RPi4 to be added as KVM host.

Setup Networking

Next, we need to setup host networking using Linux bridges that can handle CloudStack’s public, guest, management and storage traffic. For simplicity, we will use a single bridge cloudbr0 to be used for all traffic types on the same physical network. Install bridge utilities:

apt-get install bridge-utils

Note: This guide assumes that you’re in a network which is a typical RFC1918 private network.

Ubuntu 19.10

Starting Ubuntu bionic, admins can use netplan to configure networking. The default installation creates a file at /etc/netplan/50-cloud-init.yaml which you can comment, and create a file at /etc/netplan/01-netcfg.yaml applying your network specific changes:

   version: 2
   renderer: networkd
       dhcp4: false
       dhcp6: false
       optional: true
       addresses: []
         addresses: []
       interfaces: [eth0]
       dhcp4: false
       dhcp6: false
         stp: false
         forward-delay: 0

Tip: If you want to use VXLAN based traffic isolation, make sure to increase the MTU setting of the physical nics by 50 bytes (because VXLAN header size is 50 bytes). For example:

        macaddress: 00:01:2e:4f:f7:d0
      mtu: 1550
      dhcp4: false
      dhcp6: false
      mtu: 1550

Save the file and apply network config, finally reboot:

netplan generate
netplan apply

CloudStack Management Server Setup

Install CloudStack management server and MySQL server: (run as root)

apt-key adv --keyserver --recv-keys BDF0E176584DF93F
echo deb / > /etc/apt/sources.list.d/cloudstack.list
apt-get update
apt-get install mariadb-server

Note: the repository was added to just get the base packages, custom built 4.13 deb packages were then used to manually install the cloudstack-management, cloudstack-common and cloudstack-agent. In future, the support will be added and no custom builts will be necessary.

Make a note of the MySQL server’s root user password. Configure InnoDB settings in /etc/mysql/mariadb.conf.d/50-server.cnf:


server_id = 1
binlog-format = 'ROW'

Restart database:

systemctl restart mariadb

Installing management server may give dependency errors, so download and manually install:

apt-get install cloudstack-common libslf4j-java
dpkg -i libmysql-java_5.1.45-1_all.deb
apt-get download cloudstack-management
dpkg -i cloudstack-management*.deb
# edit /var/lib/dpkg/status and remove `mysql-client` from cloudstack-management section/dependencies
apt-get install -f
systemctl stop cloudstack-management # stop the automatic start after install

Setup database:

cloudstack-setup-databases cloud:cloud@localhost --deploy-as=root: -i

Storage Setup

In my setup, I’m using an external USB SSD for storage. I plug in the USB SSD storage, with the partition formatted as ext4, here’s the /etc/fstab:

LABEL=writable  /        ext4   defaults        0 0
LABEL=system-boot       /boot/firmware  vfat    defaults        0       1
UUID="91175b3a-ee2c-47a7-a1e5-f4528e127523" /export ext4 defaults 0 0

Then mount the storage as:

mkdir -p /export
mount -a

Install NFS server:

apt-get install nfs-kernel-server quota

Create exports:

echo "/export  *(rw,async,no_root_squash,no_subtree_check)" > /etc/exports
mkdir -p /export/primary /export/secondary
exportfs -a

Configure and restart NFS server:

sed -i -e 's/^RPCMOUNTDOPTS="--manage-gids"$/RPCMOUNTDOPTS="-p 892 --manage-gids"/g' /etc/default/nfs-kernel-server
sed -i -e 's/^STATDOPTS=$/STATDOPTS="--port 662 --outgoing-port 2020"/g' /etc/default/nfs-common
echo "NEED_STATD=yes" >> /etc/default/nfs-common
sed -i -e 's/^RPCRQUOTADOPTS=$/RPCRQUOTADOPTS="-p 875"/g' /etc/default/quota
service nfs-kernel-server restart

Seed systemvm template:

/usr/share/cloudstack-common/scripts/storage/secondary/cloud-install-sys-tmplt \
          -m /export/secondary -f systemvmtemplate- -h kvm \
          -o localhost -r cloud -d cloud

Setup KVM host

Install KVM and CloudStack agent, configure libvirt:

apt-get install qemu-kvm cloudstack-agent

By default due to platform/version issue cloudstack-agent will fail to start, please copy the following jars from the upstream project to /usr/share/cloudstack-agent/lib:


And remove the following:

rm -f /usr/share/cloudstack-agent/lib/jna-4.0.0.jar

Fix the following in /etc/cloudstack/agent/


Enable VNC for console proxy:

sed -i -e 's/\#vnc_listen.*$/vnc_listen = ""/g' /etc/libvirt/qemu.conf

Enable libvirtd in listen mode:

sed -i -e 's/.*libvirtd_opts.*/libvirtd_opts="-l"/' /etc/default/libvirtd

Configure default libvirtd config:

echo 'listen_tls=0' >> /etc/libvirt/libvirtd.conf
echo 'listen_tcp=1' >> /etc/libvirt/libvirtd.conf
echo 'tcp_port = "16509"' >> /etc/libvirt/libvirtd.conf
echo 'mdns_adv = 0' >> /etc/libvirt/libvirtd.conf
echo 'auth_tcp = "none"' >> /etc/libvirt/libvirtd.conf
systemctl restart libvirtd

Ensure the following options in the /etc/cloudstack/agent/


Configure Firewall

Configure firewall:

# configure iptables
iptables -A INPUT -s $NETWORK -m state --state NEW -p udp --dport 111 -j ACCEPT
iptables -A INPUT -s $NETWORK -m state --state NEW -p tcp --dport 111 -j ACCEPT
iptables -A INPUT -s $NETWORK -m state --state NEW -p tcp --dport 2049 -j ACCEPT
iptables -A INPUT -s $NETWORK -m state --state NEW -p tcp --dport 32803 -j ACCEPT
iptables -A INPUT -s $NETWORK -m state --state NEW -p udp --dport 32769 -j ACCEPT
iptables -A INPUT -s $NETWORK -m state --state NEW -p tcp --dport 892 -j ACCEPT
iptables -A INPUT -s $NETWORK -m state --state NEW -p tcp --dport 875 -j ACCEPT
iptables -A INPUT -s $NETWORK -m state --state NEW -p tcp --dport 662 -j ACCEPT
iptables -A INPUT -s $NETWORK -m state --state NEW -p tcp --dport 8250 -j ACCEPT
iptables -A INPUT -s $NETWORK -m state --state NEW -p tcp --dport 8080 -j ACCEPT
iptables -A INPUT -s $NETWORK -m state --state NEW -p tcp --dport 9090 -j ACCEPT
iptables -A INPUT -s $NETWORK -m state --state NEW -p tcp --dport 16514 -j ACCEPT

apt-get install iptables-persistent

# Disable apparmour on libvirtd
ln -s /etc/apparmor.d/usr.sbin.libvirtd /etc/apparmor.d/disable/
ln -s /etc/apparmor.d/usr.lib.libvirt.virt-aa-helper /etc/apparmor.d/disable/
apparmor_parser -R /etc/apparmor.d/usr.sbin.libvirtd
apparmor_parser -R /etc/apparmor.d/usr.lib.libvirt.virt-aa-helper

Launch Management Server

Start your cloud:

systemctl status cloudstack-management
tail -f /var/log/cloudstack/management/management-server.log

After management server is UP, proceed to and log in using the default credentials – username admin and password password.

Example: Advanced Zone Deployment

The following is an example of how you can setup an advanced zone in the network.

Setup Zone

Go to Infrastructure > Zone and click on add zone button, select advanced zone and provide following configuration:

Name - any name
Public DNS 1 -
Internal DNS1 -
Hypervisor - KVM

Setup Network

Use the default, which is VLAN isolation method on a single physical nic (on the host) that will carry all traffic types (management, public, guest etc).

Note: If you’ve iproute2 installed and host’s physical NIC MTUs configured, you can used VXLAN as well.

Public traffic configuration:

Gateway -
Netmask -
VLAN/VNI - (leave blank for vlan://untagged or in case of VXLAN use vxlan://untagged)
Start IP -
End IP -

Pod Configuration:

Name - any name
Gateway -
Start/end reserved system IPs - -

Guest traffic:

VLAN/VNI range: 500-800

Add Resources

Create a cluster with following:

Name - any name
Hypervisor - Choose KVM

Add your default/first host:

Hostname -
Username - root
Password - 

Note: root user ssh-access is disabled by default, please enable it.

Add primary storage:

Name - any name
Scope - zone-wide
Protocol - NFS
Server -
Path - /export/primary

Add secondary storage:

Provider - NFS
Name - any name
Server -
Path - /export/secondary

Next, click Launch Zone which will perform following actions:

Create Zone
Create Physical networks:
  - Add various traffic types to the physical network
  - Update and enable the physical network
  - Configure, enable and update various network provider and elements such as the virtual network element
Create Pod
Configure public traffic
Configure guest traffic (vlan range for physical network)
Create Cluster
Add host
Create primary storage (also mounts it on the KVM host)
Create secondary storage
Complete zone creation

Finally, confirm and enable the zone. Wait for the system VMs to come up, then you can proceed with your IaaS usage. You can build your own template or test the guest templates at:


In conclusion, this demonstrates that Raspberry Pi4 and other newer ARM64 IoT boards can in fact run Apache CloudStack and serve as KVM hosts and allow deploying an ARM64 based IaaS cloud for CloudStack and arm64-apps dev-testing, IoT, edge-computing and niche use-cases.

About the author

Rohit Yadav is a Software Architect at ShapeBlue, the Cloud Specialists, and is a committer and PMC member of Apache CloudStack. Rohit spends most of his time designing and implementing features in Apache CloudStack.


In the previous two parts of this article series, we have covered the complete Ceph installation process and implemented Ceph as an additional Primary Storage in CloudStack. In this final part, I will show you some examples of working with RBD images, and will cover some Ceph specifics, both in general and related to the CloudStack.

RBD image manipulations

In case you need to do some low-level client support, you can even try to mount that image as the local disk on any KVM (or Ceph) node. For this purpose, we use a tool, conveniently named, “rbd”, which is used to operate RBD images in general (i.e. to create new images, snapshots, clones, delete images, etc.).

From any KVM node, let’s attempt to use rbd kernel client to mount an image:

[root@kvm1 ~]# rbd map cloudstack/ad9a6725-4a65-4c8b-b60a-843ed88618be
rbd: sysfs write failed
RBD image feature set mismatch. Try disabling features unsupported by the kernel with "rbd feature disable".
In some cases useful info is found in syslog - try "dmesg | tail".
rbd: map failed: (6) No such device or address

As you can, see the above map command will fail. This happens as we are running on default kernel 3.x on the latest CentOS 7.6 and some of the new Ceph image features are not supported by the kernel client. Actually, even upgrading kernel to 5.0 will not bring Ceph kernel client to a usable state for our Mimic (or even Luminous) cluster – so one could wonder why kernel client exists at all….

So what we should do is to use rbd-nbd tool to map an image. Rbd-nbd is a client for RADOS block device (RBD) images similar to rbd kernel module, but unlike the rbd kernel module (which communicates with Ceph cluster directly), rbd-nbd uses NBD (generic block driver in kernel) to convert read/write requests to proper commands that are sent through network using librbd (user space client).

So, as stated, we are using librbd which is always on par with the cluster capabilities/features. But here the fun begins – the NBD kernel driver is not available by default with CentOS 7 (Red Hat decided to not include it with their kernel), so you have a couple of options: either you can rebuild the specific kernel version from the official kernel sources and extract the NBD kernel module or you can upgrade a kernel to the one provided by a well-known Elrepo repository, which I find simpler and easier to manage in the long run. For those of you possibly not familiar with Elrepo repository, this is a community repository, providing many different kinds of additional packages for Centos/RHEL, including fresh kernel versions – Elrepo kernel packages are built from the official sources while the kernel configuration is based upon the default RHEL configuration with added functionality enabled as appropriate.

A simple way to move to Elrepo kernel would be as following:

rpm --import
yum install
yum --enablerepo=elrepo-kernel install kernel-lt

If the above URLs become invalid, please find the new one at

Now that we’ve got the new kernel installed, let’s check the order of kernels offered for boot:

[root@kvm1 ~]# awk -F\' /^menuentry/{print\$2} /etc/grub2.cfg
CentOS Linux (4.4.178-1.el7.elrepo.x86_64) 7 (Core)
CentOS Linux (3.10.0-957.10.1.el7.x86_64) 7 (Core)
CentOS Linux (3.10.0-693.11.1.el7.x86_64) 7 (Core)
CentOS Linux (3.10.0-327.el7.x86_64) 7 (Core)
CentOS Linux (0-rescue-76b17df4966743ce9a20fe9a7098e2b6) 7 (Core)

Above we see our new kernel (version 4.4) on position 0, so let’s set it as the default one, and reboot:

grub2-set-default 0

Finally, with the NBD driver in place (as part of new kernel), let’s install rbd-nbd and mount our image:

[root@kvm1 ~]#  yum install rbd-nbd -y 
[root@kvm1 ~]#  rbd-nbd map cloudstack/ad9a6725-4a65-4c8b-b60a-843ed88618be

The above map command completed successfully (make sure that the image/volume is not mounted elsewhere to avoid file system corruption) and we can now operate /dev/nbd0 as any other locally attached drive, i.e.:

[root@kvm1 Ceph]# fdisk -l /dev/nbd0
Disk /dev/nbd0: 5368 MB, 5368709120 bytes, 10485760 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 4194304 bytes / 4194304 bytes
Disk label type: dos
Disk identifier: 0x264c895d
     Device Boot      Start         End      Blocks   Id  System
/dev/nbd0p1            8192    10485759     5238784   83  Linux

You can also show any other mapped images (if any):

[root@kvm1 Ceph]# rbd-nbd list-mapped 
id    pool       image                                snap device
11371 cloudstack ad9a6725-4a65-4c8b-b60a-843ed88618be -    /dev/nbd0

When done with this, unmap the RBD image with:

[root@kvm1 Ceph]# rbd-nbd unmap /dev/nbd0 

Alternatively, just to make the above exercise more complete, we can also attach the RBD image to our host with qemu-nbd (as you can guess this also requires NBD kernel module, a.k.a. newer kernel):

qemu-nbd --connect=/dev/nbd0 rbd:cloudstack/ad9a6725-4a65-4c8b-b60a-843ed88618be
qemu-nbd --disconnect /dev/nbd0

Again, the above tool talks to librbd, which relies on ceph.conf and admin key being present in /etc/ceph/ folder.

RBD image manipulations, for real this time

Away from the NBD magic and back to “rbd” tool – let’s briefly show some usage examples to manipulate RBD images.

At this point, I would expect you to be able to create a Compute Offering of your own, targeting Ceph as Primary Storage pool (similarly to how we created a Disk offering with “RBD” tag) and create a VM from a template. Assuming you have done so, let’s examine this VM’s ROOT image on Ceph.

Let’s list all volumes in our Ceph cluster:

[root@ceph1 ~]# rbd -p cloudstack ls

In my example above (an empty cluster), I have only 2 images present, so let’s examine them:

[root@ceph1 ~]# rbd info cloudstack/d9a1586d-a30b-4c52-99cc-c5ee6433fe18
rbd image 'd9a1586d-a30b-4c52-99cc-c5ee6433fe18':
        size 50 MiB in 13 objects
        order 22 (4 MiB objects)
        id: 1b1d26b8b4567
        block_name_prefix: rbd_data.1b1d26b8b4567
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
        create_timestamp: Mon Apr  8 21:40:21 2019

[root@ceph1 ~]# rbd info cloudstack/fb3ee723-5e4e-48b3-ad7d-936162656cb4
rbd image 'fb3ee723-5e4e-48b3-ad7d-936162656cb4':
        size 50 MiB in 13 objects
        order 22 (4 MiB objects)
        id: 1b250327b23c6
        block_name_prefix: rbd_data.1b250327b23c6
        format: 2
        features: layering
        create_timestamp: Mon Apr  8 21:50:10 2019
        parent: cloudstack/d9a1586d-a30b-4c52-99cc-c5ee6433fe18@cloudstack-base-snap

What we see above is the first image of 50 MB in size (here I’m using a very small template from our community friends at For the second image, we see that it has it’s “parent” set to the first image. What is happening here is that CloudStack will copy over the template from Secondary Storage (creating image d9a1586d-a30b-4c52-99cc-c5ee6433fe18), it will then create a snapshots from this image (“cloudstack-base-snap”, as shown above) and protect it (required from Ceph side), and then create an image clone ( image fb3ee723-5e4e-48b3-ad7d-936162656cb4) with a parent image being the previously created/protected snapshot. This is effectively a linked clone setup, in Ceph’s world.

Let’s quickly emulate above behavior manually – we will just create an empty file instead of copying a real template from Secondary Storage pool:

rbd create -p cloudstack mytemplate --size 100GB
rbd snap create cloudstack/mytemplate@cloudstack-base-snap
rbd snap protect cloudstack/mytemplate@cloudstack-base-snap
rbd clone cloudstack/mytemplate@cloudstack-base-snap cloudstack/myVMvolume

Finally, let’s check our “myVMvolume” image:

[root@ceph1 ~]# rbd info cloudstack/myVMvolume
rbd image 'myVMvolume':
        size 100 GiB in 25600 objects
        order 22 (4 MiB objects)
        id: fcba6b8b4567
        block_name_prefix: rbd_data.fcba6b8b4567
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
        create_timestamp: Mon Apr  8 22:08:00 2019
        parent: cloudstack/mytemplate@cloudstack-base-snap
        overlap: 100 GiB

The reason we have above protected a snapshots of the main template, is that it makes it impossible to delete this template file and thus cause major damage (effectively destroy all VMs from this template). So in order to revert the exercise from above, we would need to first remove a clone (VM image), unprotect and delete the snapshot, and finally delete the “template” image:

[root@ceph1 ~]# rbd rm cloudstack/myVMvolume
Removing image: 100% complete...done.
[root@ceph1 ~]# rbd snap unprotect cloudstack/mytemplate@cloudstack-base-snap
[root@ceph1 ~]# rbd snap rm cloudstack/mytemplate@cloudstack-base-snap
Removing snap: 100% complete...done.
[root@ceph1 ~]# rbd rm cloudstack/mytemplate
Removing image: 100% complete...done.

As you can see from above, the “rbd” tool is the tool to use to manipulate RBD images – for many other examples, please see

In the next sections, I will try to cover a bit of what to expect, dos and don’ts of using Ceph cluster – again, this is not meant to be a comprehensive guide, since CloudStack storage feature sets related to Ceph and also Ceph itself is constantly changing and improving.

Cloudstack with Ceph

CloudStack has supported Ceph for many years now and though in the early days, many of the CloudStack’s storage features were not on pair with NFS, with the wider adoption of Ceph as the Primary Storage solution for CloudStack, most of the missing features have been implemented. That being said, there are still some minor specifics with Ceph:

  • In contrast to NFS, which provides a file system, Ceph provides raw block devices for VMs and thus it’s not possible to make a full VM snapshot the way it can be done with NFS – i.e. there is no filesystem to which to write the content of the RAM memory or other metadata needed.
  • Continuing on above, since no file system, there is (currently) no way to write a heartbeat file the way it’s implemented on NFS (but some work is currently being planned around this)
  • Speaking of a simple volume snapshot, it’s currently not possible to really restore a volume from a snapshot (which became possible in 4.11 with NFS or SolidFire Managed Storage). But this is rather a simple thing to implement.
  • By using Ceph with KVM and CloudStack, there are two external libraries used – librbd (a user-space client, used by Qemu to speak to Ceph cluster) and rados-java (a Java wrapper for librados). Historically, there have been some issues/bugs with rados-java (though they have been resolved long time ago), so you should probably keep an eye on these two and make sure they are up to date.

Learning curve

Ceph can be considered a rather complex storage system to comprehend and it definitively has a steep learning curve. Even when using Ceph on its own (i.e. not with CloudStack), make sure you know the storage system well before relying on it in production and also make sure to be able to troubleshoot problematic situations when they arise. Anyone prepared to put in a bit of effort and planning can deploy a nice Ceph cluster, but it takes skills and a deeper understanding of how things work under the hood when it comes to troubleshooting unusual situations or how (for example) replacing failed disks (or nodes) can influence performance of the clients (in our case, CloudStack VMs). Based on my previous experience managing a production Ceph cluster, I would definitively strongly suggest taking the above recommendations very seriously. On the other hand, experimenting with such an advanced storage solution can be really interesting and rewarding.

Performance considerations

Ceph is a great distributed storage solution that performs very well under sequential IO workload, can scale indefinitely and has a very interesting architecture. However, as with any distributed storage solution, when you write data to a cluster, it actually takes time to write that data to first node, have that write operation replicated to other 2 nodes (replica size 3) and send back ACK to the client that the write is successful. In case you are using NVME devices, like some users in community, you can expect very low latencies in the ballpark of around 0.5ms, but in case you opt out for a HDD based solution (with journals on SSDs) as many users do when starting their Ceph journey, you can expect 10-30ms of latency, depending on many factors (cluster size, networking latency, SSD/journal latency, and so on). In other words, don’t expect miracles with commodity hardware – if something “works on commodity hardware”, that doesn’t mean it also performs well. Actually, until 2-3 years ago, Ceph was (unofficially) considered unsuitable for any more serious random IO workload, which is backed up by referenced guides published between major HW vendors and RedHat, where they clearly stated that Ceph is not the “best choice” when it comes to random IO. In contrast to sequential IO workload benchmarks (where Ceph actually shines), these reference guides were completely missing any benchmark data with random IO workload/pattern. That being said, with the introduction of BlueStore as the new storage backend in recent Ceph releases and with some more architectural changes, Ceph has become much more suitable for pure SSD (and NVME) clusters and performance has improved drastically.

I hope this Ceph article series has been useful and interesting and helped you get up and running. All in all Ceph is an interesting development in the storage space and can provide a true cloud storage solution if implemented correctly.

About the author

Andrija Panic is a Cloud Architect at ShapeBlue, the Cloud Specialists, and is a committer of Apache CloudStack. Andrija spends most of his time designing and implementing IaaS solutions based on Apache CloudStack.

In the previous article we covered some basics around Ceph and deployed a working Ceph cluster. In this article, we are going to finish the Ceph configuration needed for CloudStack and add it as a new Primary Storage pool. We are also going to deploy Ceph volumes via CloudStack and examine them. Finally, in part 3 (to be published soon), I will show you some examples of working with RBD images and will cover some Ceph specifics, both in general and related to the CloudStack.

Before proceeding with the actual work, let me first mention that CloudStack supports Ceph with KVM only, so most of the work we do below is KVM related. Let’s define the high-level steps to be done:

  • Create a dedicated RBD pool for CloudStack in which all RBD images (volumes) will be created
  • Create a dedicated authentication key for the previously created pool
  • Update / install required Ceph binaries on KVM nodes
  • Add Ceph as Primary Storage in CloudStack
  • Implement custom storage tag for Ceph Primary Storage
  • Create new Compute / Disk offerings with same storage tag in order to target Ceph

From any Ceph node…

Ceph groups RBD (RADOS block device) images in pools and manages authentication on a per pool level. Each image is collection of many RADOS objects, with each object having a default size of 4MB (configurable per image). At this moment we have no pools created. But before creating a pool, let’s go through some basics around the different kind of pools in Ceph.

There are 2 kind of pools, based on the way the objects are stored across cluster:

  • Replicated – makes sure that there are always total of N replicas/copies of an object
  • Erasure Coding – simplest way to think of this is a network RAID 5/6

Replicated pools are used for better performance at the expense of space consumption, and you can think of it as a network-based RAID 1, where we have n number of replicas of an object. On the other hand, erasure coding pools are usually used when using Ceph for S3 Object Storage purposes and for more space efficient storage where bigger latency and lower performance is acceptable, since it is similar to RAID 5 or RAID 6 (requires some computation power). Here, for example, we may have 4 chunks of actual data and 2 parity chunks (EC 4+2), with just 50% of space overhead, while (depending on the setup), we can still survive losing a Ceph node or even two.

So, let’s create a dedicated pool for CloudStack, set its replica size and finally initialize it:

ceph osd pool create cloudstack 64 replicated
ceph osd pool set cloudstack size 3
rbd pool init cloudstack

The commands above will create a replicated pool named “cloudstack” with total of 64 placement groups (more info on placement groups here) with a replica size of 3, which is recommended for a production cluster. Optionally, you can set replica size of 2 during testing, for somewhat increased performance and less space consumed on the cluster.

Next, let’s generate a dedicated authentication key for our CloudStack pool:

ceph auth get-or-create client.cloudstack mon 'profile rbd' osd 'profile rbd pool=cloudstack'

The command above will output a key to STDOUT only – please save the given key, since we will use it when adding Ceph to CloudStack later:

key = AQAFSZpc0t+BIBAAO95rOl+jgRwuOopojEtr/g==

Now that the pool for CloudStack is ready, we need to prepare KVM nodes with proper Ceph binaries as well as the write-back caching configuration.

From the Ceph admin node…

Starting from Centos 7.2  (and Ubuntu 14.04), libvirt / QEMU comes by default with support for RBD, so there’s no need to compile the binaries yourself. That being said, if we check KVM nodes with “rpm -qa | grep librbd1″ it will return an existing versions of ‘librbd1” package (version 10.2.5 in my case)  already installed, but most certainly it will not be the current version that corresponds to the cluster version we just installed (13.2.5 in this case). For the record, librbd is a user space Ceph client, to which the qemu / libvirt talks effectively.

Furthermore, if we run command “ceph features” from any Ceph node, it will return (in our fresh Mimic cluster) “luminous” as the minimum compatible release version for the client – that means that our Ceph client (librbd) needs to be of a minimum of “luminous” version (which translates to 12.2.0), but our current librbd version is 10.2.5 – so let’s upgrade it to same Mimic versions as the version of our cluster:

ceph-deploy install --cli kvm1  kvm2

The command above will add Mimic repo to my two KVM nodes and install only the cli binaries (“ceph-common” package). This will also trigger the upgrade of existing “librbd1” package to the correct version. In addition, please make sure that name resolution of the KVM nodes works from the Ceph admin node.

Optionally, if you don’t want to install Ceph cli tools on KVM nodes, you can just upgrade the “librbd1” package while having previously created a proper Ceph Mimic repository on each KVM node (i.e. clone repo file from any Ceph cluster node).

Some of you might want to be able to manage Ceph cluster from KVM nodes as well (beside being able to manage it from Ceph nodes) and to be able to interact with RBD images with via “rbd” or “qemu-img“ tools – in this case we need this “rbd” tool installed on KVM nodes (part of “ceph-common” package, already installed in previous step), then we need ceph.conf locally on KVM nodes in order for the “rbd” tool to know how to connect to cluster, which MONs to target, etc. and finally we need the admin authentication key – this is the file “ceph.client.admin.keyring” which was created on our Ceph admin node when we created our cluster initially (in folder /root/CEPH-CLUSTER, as mentioned in Part 1 of this article series).

Additionally, if we want to use qemu-img tool to examine RBD images, we can either have qemu-img installed on the Ceph cluster nodes or we have to provide the above mentioned ceph.conf and admin keys in their default location (/etc/ceph/) on the KVM nodes, where librbd (client) will pick them up automatically, so we don’t need to specify MON IP/URL and admin key on the command line.

If you don’t want to be able to manage your Ceph cluster from KVM nodes, simply don’t copy over the “ceph.client.admin.keyring” file to KVM nodes. The ceph.conf file is still a must due to RBD caching as explained later. I have decided to make my KVM nodes happy by providing them with ceph.conf and admin keys, as below:

ceph-deploy admin kvm1 kvm2

The command above will effectively just copy ceph.conf and ceph.client.admin.keyring files to /etc/ceph/ folder on KVM nodes. Actually, you can still operate RBD images and manage your cluster from KVM nodes even if you don’t have ceph.conf and admin key present locally – you can always pass required parameters on the command line to “rbd” or “qemu-img” tools, as shown later.

RBD caching

After we have pushed the ceph.conf file to KVM nodes, librbd will read it for any configuration directives under the “[client]” section of that file (beside the other sections), but that section is missing at this moment!

Before we proceed into configuring the RBD caching, let me do here a copy/paste from the original docs that is important to understand regarding RBD caching:

” The user space implementation of the Ceph block device (i.e., librbd) cannot take advantage of the Linux page cache, so it includes its own in-memory caching, called “RBD caching.” RBD caching behaves just like well-behaved hard disk caching. When the OS sends a barrier or a flush request, all dirty data is written to the OSDs. This means that using write-back caching is just as safe as using a well-behaved physical hard disk with a VM that properly sends flushes (i.e. Linux kernel >= 2.6.32). The cache uses a Least Recently Used (LRU) algorithm, and in write-back mode it can coalesce contiguous requests for better throughput. “

After digesting the above info, we can proceed into a brief configuration of caching. We can either fix it manually on each KVM node by adding the missing section in ceph.conf file, or we can do it in a more proper way by changing ceph.conf on the Ceph admin node and then pushing new file version to all KVM (and optionally Ceph cluster) nodes:

cat << EOM >> /root/CEPH-CLUSTER/ceph.conf
  rbd cache = true
  rbd cache writethrough until flush = true
ceph-deploy --overwrite-conf admin kvm1 kvm2

Please note the above “writethrough until flush = true”. This is a safety mechanism which will force writethrough cache mode until it receives the very first flush request from the VM OS (which means that the OS is sending proper flush requests to the underlying storage, i.e. kernel >= 2.6.32) and then cache mode will change to the write-back, which actually brings performance benefits.

If case you want to play more with RBD caching, please see here – where you can find some important default values which we didn’t explicitly configure i.e. default rbd cache size is 32 MB (this is per volume) – so in case of 50 VMs with 4 volumes each, that translates to 50 x 4 x 32MB =  6.4GB of additional RAM consumed on a KVM host – keep that in mind !

Finally, let’s add the Ceph to CloudStack as an additional Primary Storage – we can do it via GUI or optionally via CloudMonkey (API) as following:


Or via CloudMonkey:

create storagepool scope=zone zoneid=3c764ee1-6590-417d-b873-f073d0c550be hypervisor=KVM name=MyCephCluster provider=Defaultprimary url=rbd://cloudstack:AQAFSZpc0t-BIBAAO95rOl+jgRwuOopojEtr_g==@ tags=RBD

Most of the parameters are self-explanatory but let’s explain a few of them:

  • RADOS Monitor: This is the IP address (or DNS name) of the Ceph Monitor (MON) instance – in my case I have defined a very first MON instance (IP address of the Ceph1 node from my cluster) – but in production environment you will want to have an internal Round Robin DNS setup on some internal DNS server (i.e. single zone on Bind) – such that KVM nodes will resolve the ULR (i.e. mon.myceph.cluster) in a round robin fashion to multiple MON instances – this is the way to achieve high availability of Ceph MONs, though some manual DNS zone changes are needed in case of prolonged MON maintenance
  • RADOS Pool: This is the pool “cloudstack” which we created in the beginning of the article
  • RADOS User and RADOS Secret: This are the values from the authentication key which we generated in the beginning of the article, shown below again for your convenience

key = AQAFSZpc0t+BIBAAO95rOl+jgRwuOopojEtr/g==

The above command, used to add Ceph to CloudStack, will effectively do a few things:

  • On each KVM node, it will create a new storage pool in libvirt
  • The storage pool definition files (xml and the secret) will be written to /etc/libvirt/secrets/ folder as shown below
  • Every time CloudStack Agent is restarted, it will recreate the Ceph storage pool (even if you manually remove the files below)

[root@kvm1]# cat /etc/libvirt/secrets/ef9cfd17-abe1-343d-97a0-cee6c71a6dad.xml
<secret ephemeral='no' private='no'>
  <usage type='ceph'>

[root@kvm1]# cat /etc/libvirt/secrets/ef9cfd17-abe1-343d-97a0-cee6c71a6dad.base64

If we check the libvirt pool created above, we can see that it’s not persistent and it doesn’t start automatically – i.e. when you restart libvirt alone, it will not recreate / start the Ceph storage pool in libvirt– the CloudStack agent is the one doing this for us:

virsh # pool-info ef9cfd17-abe1-343d-97a0-cee6c71a6dad
Name:           ef9cfd17-abe1-343d-97a0-cee6c71a6dad
UUID:           ef9cfd17-abe1-343d-97a0-cee6c71a6dad
State:          running
Persistent:     no
Autostart:      no
Capacity:       299.99 GiB
Allocation:     68.19 MiB
Available:      286.02 GiB

Note that in the example above, I was actually using DNS name for the Ceph MON (ceph1.local) instead of the IP – Ceph MON’s DNS name is resolved to IP both when you add Ceph to CloudStack and every time you start a VM or attach new volume, etc. – so DNS resolution needs to be fast and stable here.

Now that we added Ceph to CloudStack, let’s create a Data disk offering with tag “RBD” – this will make sure that any new volume from this offering is created on storage pool with tag “RBD” – which is Ceph in our case . Here, we are using storage tags to avoid messing up with your existing CloudStack installation – but it’s not required otherwise:

(localcloud) SBCM5> > create diskoffering name=5GB-Ceph displaytext=5GB-Ceph storagetype=shared provisioningtype=thin customized=false disksize=5 tags=RBD
  "diskoffering": {
    "created": "2019-03-26T19:27:32+0000",
    "disksize": 5,
    "displayoffering": true,
    "displaytext": "5GB-Ceph",
    "id": "2c74becc-c39d-4aa8-beec-195b351bdaf0",
    "iscustomized": false,
    "name": "5GB-Ceph",
    "provisioningtype": "thin",
    "storagetype": "shared",
    "tags": "RBD"

Note the offering ID from above (2c74becc-c39d-4aa8-beec-195b351bdaf0) – and let’s create a disk from it:

(localcloud) SBCM5> > create volume diskofferingid=2c74becc-c39d-4aa8-beec-195b351bdaf0 name=MyFirstCephDisk zoneid=3c764ee1-6590-417d-b873-f073d0c550be
  "volume": {
    "account": "admin",
    "created": "2019-03-26T19:52:05+0000",
    "destroyed": false,
    "diskofferingdisplaytext": "5GB-Ceph",
    "diskofferingid": "2c74becc-c39d-4aa8-beec-195b351bdaf0",
    "diskofferingname": "5GB-Ceph",
    "displayvolume": true,
    "domain": "ROOT",
    "domainid": "401ce404-44c1-11e9-96c5-1e009001076e",
    "hypervisor": "None",
    "id": "47b1cfe5-6bab-4506-87b6-d85b77d9b69c",
    "isextractable": true,
    "jobid": "49a682ab-42f9-4974-8e42-452a13c97553",
    "jobstatus": 0,
    "name": "MyFirstCephDisk",
    "provisioningtype": "thin",
    "quiescevm": false,
    "size": 5368709120,
    "state": "Allocated",
    "storagetype": "shared",
    "tags": [],
    "type": "DATADISK",
    "zoneid": "3c764ee1-6590-417d-b873-f073d0c550be",
    "zonename": "ref-trl-1019-k-M7-apanic"

Finally, since volume creation is a lazy provisioning process (i.e. volume is created in DB only, not really on storage pool), let’s attach the disk to a running VM (using volume ID “47b1cfe5-6bab-4506-87b6-d85b77d9b69c” from previous command output), which will trigger the actual disk creation on our Ceph cluster (output shortened for brevity):

(localcloud) SBCM5> > attach volume id=47b1cfe5-6bab-4506-87b6-d85b77d9b69c virtualmachineid=19a67e20-c747-43bb-b149-c2b2294002f9
  "volume": {
    "jobstatus": 0,
    "name": "MyFirstCephDisk",
    "path": "47b1cfe5-6bab-4506-87b6-d85b77d9b69c",
    …  }

Note the “path” output field (which is usually the same as the ID of the volume, except in some special cases) – and let’s check our Ceph cluster if we can find this volume and check it’s properties.

From any KVM node…

[root@kvm1 ~]# rbd ls -p cloudstack
[root@kvm1 ~]# rbd info cloudstack/47b1cfe5-6bab-4506-87b6-d85b77d9b69c
rbd image '47b1cfe5-6bab-4506-87b6-d85b77d9b69c':
        size 5 GiB in 1280 objects
        order 22 (4 MiB objects)
        id: d43b4c04a8af
        block_name_prefix: rbd_data.d43b4c04a8af
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
        create_timestamp: Tue Mar 28 19:46:32 2019

We can also examine the Ceph RBD image with qemu-img tool:

[root@kvm1 ~]# qemu-img info rbd:cloudstack/47b1cfe5-6bab-4506-87b6-d85b77d9b69c
image: rbd:cloudstack/47b1cfe5-6bab-4506-87b6-d85b77d9b69c
file format: raw
virtual size: 5.0G (5368709120 bytes)
disk size: unavailable

As you can see in qemu-img command above, we did not specify any username and authentication keys, because we have our ceph.conf and the admin key files present in /etc/ceph/ folder. If you decided to opt-out of having these 2 files present on KMV nodes, you will have to use a cumbersome command as below:

qemu-img info rbd:cloudstack/47b1cfe5-6bab-4506-87b6-d85b77d9b69c:mon_host=

In the above command we are specifying the MON IP address, username and key for authentication.

Now that you got the basics of consuming Ceph from CloudStack, feel free to also create Compute Offerings and System Offerings for Virtual Routers, Secondary Storage VM, Console Proxy VM and experiment with volume migration from i.e. NFS to Ceph. Be sure to have storage tags under control.

I hope that this article series has been interesting so far. In part 3 (which will be the final part), I will show you some examples of working with RBD images and will cover some Ceph specifics, both in general and related to CloudStack.

About the author

Andrija Panic is a Cloud Architect at ShapeBlue, the Cloud Specialists, and is a committer of Apache CloudStack. Andrija spends most of his time designing and implementing IaaS solutions based on Apache CloudStack.

As well as NFS and various block storage solutions for Primary Storage, CloudStack has supported Ceph with KVM for a number of years now. Thanks to some great Ceph users in the community lots of previously missing CloudStack storage features have been implemented for Ceph (and lots of bugs squashed), making it the perfect choice for CloudStack if you are looking for easy scaling of storage and decent performance.

In this and my next article, I am going to cover all steps needed to actually install a Ceph cluster from scratch, and subsequently add it to CloudStack. In this article I will cover installation and basic configuration of a standalone Ceph cluster, whilst in part 2 I will go into creating a pool for a CloudStack installation, adding Ceph to CloudStack as an additional Primary Storage and creating Compute and Disk offerings for Ceph. In part 3, I will also try to explain some of the differences between Ceph and NFS, both from architectural / integration point of view, as well as when it makes sense (or doesn’t) to use it as the Primary Storage solution.

It is worth mentioning that the Ceph cluster we build in this first article can be consumed by any RBD client (not just CloudStack). Although in part 2 we move onto integrating your new Ceph cluster into CloudStack, this article is about creating a standalone Ceph cluster – so you are free to experiment with Ceph.

Firstly, I would like to share some high-level recommendations from very experienced community members, who have been using Ceph with CloudStack for a number of years:

  • Make sure that your production cluster is at least 10 nodes so as to minimize any impact on performance during data rebalancing (in case of disk or whole node failure). Having to rebalance 10% of data has a much smaller impact (and duration) than having to rebalance 33% of data; another reason is improved performance as data is distributed across more drives and thus read / write performance is better
  • Use 10GB networking or faster – a separate network for client and replication traffic is needed for optimal performance
  • Don’t rely on cache tiering, unless you have a very specific IO pattern / use case. Moving data in and out of cache tier can quickly create a bottleneck and do more harm than good
  • If running an older version of Ceph cluster (eg. FileStore based OSD), you will probably place your journals on SSDs. If so, make sure that you properly benchmark SSD for the synchronous IO write performance (Ceph writes to journal devices with O_DIRECT and D_SYNC flags). Don’t try to put too many journals on single SSD; consumer grade SSDs are unacceptable, since their synchronous write performance is usually extremely bad and they have proven to be exceptionally unreliable when used in a Ceph cluster as journal device

Before we continue, let me state that this first article is NOT meant to be a comprehensive guide on Ceph history, theory, installation or optimization, but merely a simple step-by-step guide for a basic installation, just to get us going. Still, in order to be able to better follow the article, it’s good to define some basics around Ceph architecture.

Ceph has a couple of different components and daemons, which serves different purposes, so let’s mention some of these (relevant for our setup):

  • OSD (Object Storage Daemon) – usually maps to a single drive (HDD, SDD, NVME) and it’s the one containing user data. As can be concluded from it’s name, there is a Linux process for each OSD running in a node. A node hosting only OSDs can be considered as a Storage or OSD node in Ceph’s terminology.
  • MON (Monitor daemon) – holds the cluster map(s), which provides to Ceph Clients and Ceph OSD Daemons with the knowledge of the cluster topology. To clarify this further, in the heart of Ceph is the CRUSH algorithm, which makes sure that OSDs and clients can calculate the location of specific chunk of data in the cluster (and connect to specific OSDs for read/write of data), without a need to read it’s position from somewhere (as opposite to a regular file systems which have pointers to the actual data location on a partition).

A couple of other things are worth mentioning:

  • For cluster redundancy, it’s required to have multiple Ceph MONs installed, always aiming for an odd number to avoid a chance of split-brain scenario. For smaller clusters, these could be placed on VMs or even collocated with other Ceph roles (i.e. OSD nodes), though busier clusters will need a dedicated, powerful servers/VMs. In contrast to OSDs, there can be only one MON instance per server/VM.
  • For improved performance, you might want to place MON’s database (LevelDB) on dedicated SSDs (versus the defaults of being placed on OS partition).
  • There are two ways that OSDs can manage the data they store. Starting with the Luminous 12.2.z release, the new default (and recommended) backend is BlueStore. Prior to Luminous, the default (and only option) was FileStore. With FileStore, data is first written to a Journal (which can be collocated with the OSD on same device or it can be a completely separate partition on a faster, dedicated device) and then later committed to OSD. With BlueStore, there is no true Journal per se, but a RocksDB key/value database (for managing OSD’s internal metadata). FileStore OSD will use XFS on top of it’s partition, while BlueStore write data directly to raw device, without a need for a file system. With it’s new architecture, BlueStore brings big speed improvement over FileStore.
  • When building and operating a cluster, you will probably want to have a dedicated server/VM used as the deployment or admin node. This node will host your deployment tools (be it a basic ceph-deploy tool or a full blown ansible playbook), as well as cluster definition and configuration files, which can be changed on central place (this node) and then pushed to cluster nodes as required.

Armed with above knowledge (and against all recommendations given previously) we are going to deploy a very minimalistic installation of Ceph cluster on top of 3 servers (VMs), with 1 volume per node being dedicated for an OSD daemon, and Ceph MONs collocated with the Operating System on the system volume. The reason for choosing such a minimalistic setup is the ability to quickly build a test cluster on top of 3 VMs (which most people will do when building their very first Ceph cluster) and to keep configuration as short as possible. Remember, we just want to be able to consume Ceph from CloudStack, and currently don’t care about performance or uptime / redundancy (beside some basic things, which we will cover explicitly).

Our setup will be as following:

  • We will already have a working CloudStack 4.11.2 installation (i.e. we expect you to have a working CloudStack installation)
  • We will add Ceph storage as an additional Primary Storage to CloudStack and create offerings for it
  • CloudStack Management Server will be used as Ceph admin (deployment) node
  • Management Server and KVM nodes details:
    • CloudStack Management Server: IP
    • KVM host1: IP, hostname “kvm1”
    • KVM host2: IP, hostname “kvm2”
  • Ceph nodes details (dedicated nodes):
    • 2 CPU, 4GB RAM, OS volume 20GB, DATA volume 100GB
    • Single NIC per node, attached to the CloudStack Management Network – i.e. there is no dedicated network for Primary Storage traffic between our KVM hosts and the Ceph nodes
    • Node1: IP, hostname “ceph1”
    • Node2: IP, hostname “ceph2”
    • Node3: IP, hostname “ceph3”
    • Single OSD (100GB) running on each node
    • MON instance running on each node
    • Ceph Mimic (13.latest) release
    • All nodes will be running latest CentOS 7 release, with default QEMU and Libvirt versions on KVM nodes

As stated above Ceph admin (deployment) node will be on CloudStack Management Server, but as you can guess, you can use a dedicated VM/Server for this purpose as well.

Before proceeding with the actual work, let’s define the high-level steps required to deploy a working Ceph cluster

  • Building the Ceph cluster:
    • Setting time synchronization, host name resolution and password-less login
    • Setting up firewall and SELinux
    • Creating a cluster definition file and auth keys on the deployment node
    • Installation of binaries on cluster nodes
    • Provisioning of MON daemons
    • Copying over the ceph.conf and admin keys to be able to manage the cluster
    • Provisioning of Ceph manager daemons (Ceph Dashboard)
    • Provisioning of OSD daemons
    • Basic configuration

We will cover configuration of KVM nodes in second article.

Let’s start!

On all nodes…

It is critical that the time is properly synchronized across all nodes. If you are running on hypervisor, your VMs might already be synced with the host, otherwise do it the old-fashioned way:

ntpdate -s
yum install ntp
systemctl enable ntpd
systemctl start ntpd

Make sure each node can resolve the name of each other node –  if not using DNS, make sure to populate /etc/hosts file properly across all 4 nodes (including admin node):

cat << EOM >> /etc/hosts ceph1 ceph2 ceph3

On CEPH admin node…

We start by installing ceph-deploy, a tool which we will use to deploy our whole cluster later:

cat << EOM > /etc/yum.repos.d/ceph.repo
name=Ceph noarch packages
yum install ceph-deploy -y

Let’s enable password-less login for root account – generate SSH keys and seed public key into /root/.ssh/authorized_keys file on all Ceph nodes (in production environment, you might want to use a user with limited privileges with sudo escalation):

ssh-keygen -f $HOME/.ssh/id_rsa -t rsa -N ''
ssh-copy-id root@ceph1
ssh-copy-id root@ceph2
ssh-copy-id root@ceph3

On all CEPH nodes…

Before beginning, ensure that SELINUX is set to permissive mode and verify that firewall is not blocking required connections between Ceph components:

firewall-cmd --zone=public --add-service=ceph-mon --permanent
firewall-cmd --zone=public --add-service=ceph --permanent
firewall-cmd --reload
setenforce 0

Make sure that you make SELINUX changes permanent, by editing /etc/selinux.config and setting ‘SELINUX=permissive’

As for the firewall, in case you are using different distribution or don’t consume firewalld, please refer to the networking configuration reference at

On CEPH admin node…

Let’s create cluster definition locally on admin node:

ceph-deploy new ceph1 ceph2 ceph3

This will trigger a ssh connection to each of above referenced Ceph nodes (to check for machine platform and IP addresses) and will then write a local cluster definition and the MON auth key in the current folder.  Let’s check the files generated:

# ls -la
-rw-r--r-- ceph.conf
-rw-r--r-- ceph-deploy-ceph.log
-rw------- ceph.mon.keyring

On Centos7, if you get the “ImportError: No module named pkg_resources” error message while running ceph-deploy tool, you might need to install missing packages:

yum install python-setuptools

In case that you have multiple network interfaces on Ceph nodes, you will be required to explicitly define public network (which accepts client’s connections) – in this case edit previously created ceph.conf on the local admin node to include public network setting:

echo "public network =" >> ceph.conf

If you only have one NIC in each Ceph node, the above line is not required.

Still on admin node, let’s start the installation of Ceph binaries across cluster nodes (no services started yet):

 ceph-deploy install ceph1 ceph2 ceph3 

Command above will also output the version of Ceph binaries installed on each node – make sure that you did not get a wrong Ceph version installed due to some other repos present (we are installing Mimic 13.2.5, which is latest as of the time of writing).

Let’s create (initial) MONs on all 3 Ceph nodes:

ceph-deploy mon create-initial

In order to be able to actually manage our Ceph cluster, let’s copy over the admin key and the ceph.conf files to all Ceph nodes:

ceph-deploy admin ceph1 ceph2 ceph3

On any CEPH node…

After previous step, you should be able to issue “ceph -s” from any Ceph node, and this will return the cluster health. If you are lucky enough, your cluster will be in HEALTH_OK state, but it might happen that your MON daemons will complain on time mismatch between the nodes, as following:

[root@ceph1 ~]# ceph -w
    id:     7f2d23c2-1f2e-4c03-821c-cab3d76f84fc
    health: HEALTH_WARN
            clock skew detected on mon.ceph1, mon.ceph3 

In this case, we should stop NTP daemon, force time update (a few times), and start NTP daemon again – and after doing this across all nodes, it would be required to restart Ceph monitors on each node, one by one (give it a few seconds between restart on different nodes) – below we are restarting all Ceph daemons – which effectively means just MONs since we deployed only MONs so far:

systemctl stop ntpd
ntpdate -s; ntpdate -s; ntpdate -s
systemctl start ntpd
systemctl restart

After time has been properly synchronized (with less then 0.05 seconds of time difference between the nodes), you should be able to see a cluster in HEALTH_OK state, as below:

[root@ceph1 ~]# ceph -s
    id:     7f2d23c2-1f2e-4c03-821c-cab3d76f84fc
    health: HEALTH_OK

On CEPH admin node…

Now that we are up and running with all Ceph monitors, let’s deploy Ceph manager daemon (Ceph dashboard, that comes with newer releases) on all nodes since they operate in active/standby configuration (we will configure it later):

ceph-deploy mgr create ceph1 ceph2 ceph3

Finally, let’s deploy some OSDs so our cluster can actually hold some data eventually:

ceph-deploy osd create --data /dev/sdb ceph1
ceph-deploy osd create --data /dev/sdb ceph2
ceph-deploy osd create --data /dev/sdb ceph3

Note in commands above, we reference /dev/sdb as the 100GB volume that is used for OSD.

As mentioned previously, newer versions of Ceph (as in our case) will use by default BlueStore as the storage backend, with (by default) collocating block data and RocksDB key/value database (for managing its internal metadata) on the same device (/dev/sdb in our case). In more complex setups, one can choose to separate RockDB DB on faster devices, while block data will remain on slower devices – somewhat similar with the older FileStore setups, where block data would be located on HDDs/SSDs devices, while Journals would be usually placed on SSD/NVME partitions.

On any CEPH node…

After previous step is done, we should get the output similar to below – confirming that we have a 300GB of space available:

[root@ceph1 ~]# ceph -s
    id:     7f2d23c2-1f2e-4c03-821c-cab3d76f84fc
    health: HEALTH_OK

    mon: 3 daemons, quorum ceph2,ceph1,ceph3
    mgr: ceph1(active)
    osd: 3 osds: 3 up, 3 in

    pools:   0 pools, 0 pgs
    objects: 0  objects, 0 B
    usage:   3.0 GiB used, 297 GiB / 300 GiB avail

Finally, let’s enable the Dashboard manager and set the username/password for authentication (which will be encrypted and stored in monitor’s DB) to be able to access it.
In our lab, we will disable SSL connections and keep it simple – but obviously in production environment, you would want to force SSL connections and also install proper SSL certificate:

ceph config set mgr mgr/dashboard/ssl false
ceph mgr module enable dashboard
ceph dashboard set-login-credentials admin password

Let’s login to the Dashboard manager on the active node (ceph1 in our case, as can be seen in the output from “ceph -s” command above):

And there you go – you now have a working Ceph cluster, which concludes part 1 of this Ceph article series. In part 2 (published soon), we will continue our work by creating a dedicated RBD pool and authentication keys for our CloudStack installation, add Ceph to CloudStack, finally consuming it with dedicated Compute / Disk offerings.

It’s worth mentioning that Ceph itself does provided additional services – i.e. it supports S3 object storage (requires installation / configuration of Ceph Object Gateway) as well as POSIX-compliant file system CephFS (requires installation/configuration of Metadata Server), but for CloudStack, we only need Rados Block Device (RBD) services from Ceph.

About the author

Andrija Panic is a Cloud Architect at ShapeBlue, the Cloud Specialists, and is a committer of Apache CloudStack. Andrija spends most of his time designing and implementing IaaS solutions based on Apache CloudStack.

Apache CloudStack is generally considered secure out of the box, however it does have the capability of protecting both system VM traffic as well as management traffic with TLS certificates. Prior to version 4.11 CloudStack used Tomcat as the web server and servlet container. With 4.11 this has been changed to embedded Jetty web server and servlet container that makes CloudStack more secure and independent of distribution provided Tomcat version which could be prone to security issues. This changes the management server’s TLS configuration.

In this blog post we’ll go through how to implement native TLS to protect both the system VMs – CPVM and SSVM – as well as the management server with public TLS certificates.

Please note since SSL is now being deprecated we now refer to the replacement – TLS – throughout this post.


The TLS configuration in CloudStack provides different functionality depending on the system role:

  • TLS protecting the management server is a Jetty configuration which protects the main web GUI and API endpoint only. The configuration of this is handled in the underlying embedded Jetty configuration files.
  • System VMs:
    • The Console Proxy VM TLS configuration provides secure HTTPS connection between the main console screen (your browser) and the CPVM itself.
    • The Secondary Storage VM SSL configuration provides a secure HTTPS connection for uploads and downloads of templates / ISOs / volumes to secondary storage as well as between zones.
    • System VM SSL configuration is carried out through global settings as well as the CloudStack GUI.

System VM HTTPS configuration

Global settings

The following global setting are required configured to allow system VM TLS configuration:

Global settingValue
consoleproxy.url.domaindomain used for CPVM (see below)
consoleproxy.sslEnabledSwitches SSL configuration of the CPVM on / off
Values true | false
secstorage.ssl.cert.domaindomain used for SSVM (see below)
secstorage.encrypt.copySwitches SSL configuration of the SSVM on / off
Values true | false

The URL configurations can take three formats – and these also determine what kind of TLS certificate is required.

  • Blank: if left blank / unconfigured the URLs used for CPVM and SSVM will simply be passed as the actual public IP addresses of the system VMs.
  • Static URL: e.g. or In these cases CloudStack rely on external URL load balancing / redirection and/or DNS resolution of the URL to the IP address of the CPVM or SSVM. This can be achieved in a number of different ways through load balancing appliances or scripted DNS updates.
    This configuration relies on:

    • The same URL used for both CPVM and SSVM, or
    • a multi-domain certificate provided to cover both URLs if different ones are used for CPVM and SSVM.
  • Dynamic URL: e.g. * In this case CloudStack will redirect the connections to the CPVM / SSVM to the URL “” where a/b/c/d represent the IP address, i.e. a real world URL would be
    This relies on two things:

    • DNS name resolution configured for the full public system VM IP range, such that all combinations of “” can be resolved. Please note in CloudStack version 4.11 the public IP range used purely by system VMs can be limited by reserving a subrange of public IP addresses just for system use.
    • An TLS wildcard certificate covering the full “” domain name.

Configuration process – GUI

The first step is to configure the four global settings above, then restart the CloudStack management service to make these settings live.

Next upload the TLS root certificate chain, the actual TLS certificate as well as the PKCS8 formatted private key using the “TLS certificate” button in the zone configuration. In this example we use a wildcard certificate.

Configuration process – API / CloudMonkey

The uploadCustomCertificate API call can be used to upload the TLS certificates. Please note the upload process does require at least two API calls – more depending on how many intermediary certificates are used. If you use CloudMonkey the certificates can be uploaded in cleartext – otherwise they have to be URLencoded when passed as part of a normal HTTP GET API call.

  • In the first API call the combined root and intermediary certificates are uploaded. In this API call the following parameters are passed – note we don’t pass the private key:
    • id=1
    • name: Give the certificate a name.
    • certificate: the root / intermediary certificate in cleartext, with all formatting / line breaks in place. In the example below we pass a chain of root and intermediary certificates.
    • domainsuffix: provide the suffix used, e.g. *
  • The second API call in our example uploads the site certificate. In this case we do not give the certificate a name (CloudStack automatically names this):
    • id=2
    • certificate: the issued site certificate, again in cleartext.
    • privatekey: the private key, same cleartext format as certificate.
    • domainsuffix: as above.
cloudmonkey upload customcertificate id=1 name=RootCertificate certificate='-----BEGIN CERTIFICATE-----

cloudmonkey upload customcertificate id=2 certificate='-----BEGIN CERTIFICATE-----
privatekey='-----BEGIN PRIVATE KEY-----
-----END PRIVATE KEY-----'


System VM restart

Once uploaded the CPVM and SSVM will automatically restart to pick up the new certificates. If the system VMs do not restart cleanly they can be destroyed and will come back online with the TLS configuration in place.

Testing the TLS protected system VMs

To test the Console Proxy VM simply open the console to any user VM or system VM. The popup browser window will at this point show up as a non-secure website. The reason for this is the actual console session is provided in an inline frame – and the popup page itself is presented from the unsecured management servers. If you however look at the page source for the page the HTTPS link which presents the inline frame shows up in the expected format:

Once the management server is also TLS protected the CPVM console popup window will also show as secured:

The SSVM will also now utilise the HTTPS links for all browser based uploads as well as downloads from secondary storage:

Securing the CloudStack management server GUI with HTTPS

Protecting the management servers requires creating a PCKS12 certificate and updating the underlying Jetty configuration to take this into account. This is described in

First of all combine the TLS private key file, certificate and root certificate chain into one file (in the specified order), then convert this into PKCS12 format and write it to a new CloudStack keystore. Enter a new keystore password when prompted:

# cat myprivatekey.key mycertificate.crt gd_bundle-g2-g1.crt > mycombinedcert.crt

# openssl pkcs12 -in mycombinedcert.crt -export -out mycombinedcert.pkcs12
Enter Export Password: ************
Verifying - Enter Export Password:************

# keytool -importkeystore -srckeystore mycombinedcert.pkcs12 -srcstoretype PKCS12 -destkeystore /etc/cloudstack/management/keystore.pkcs12 -deststoretype pkcs12
Importing keystore mycombinedcert.pkcs12 to /etc/cloudstack/management/keystore.pkcs12...
Enter destination keystore password:************
Re-enter new password:************
Enter source keystore password:************
Entry for alias 1 successfully imported.
Import command completed:  1 entries successfully imported, 0 entries failed or cancelled


Next edit /etc/cloudstack/management/ and update the following:

https.keystore.password=<enter the same password as used for conversion>


In addition automatic redirect from HTTP/port 8080 to HTTPS/port 8443 can also be configured in /usr/share/cloudstack-management/webapp/WEB-INF/web.xml. Add the following section before the section around line 22:



Lastly restart the management service:

# systemctl restart cloudstack-management


The TLS capabilities of CloudStack has been extended over the last few releases – and we hope this article helps to explain the new global settings and configuration procedure.

About The Author

Dag Sonstebo is  a Cloud Architect at ShapeBlue, The Cloud Specialists. Dag spends most of his time designing, implementing and automating IaaS solutions based on Apache CloudStack.


Upgrading CloudStack can sometimes be a little daunting – but as the 5P’s proverb goes – Proper Planning Prevents Poor Performance. With planning, testing and the right strategy upgrades will have a high chance of success and have minimal impact on your CloudStack end users.

The CloudStack upgrade process is documented in the release notes for each CloudStack version (e.g. version 4.6 to 4.9 upgrade guide). These upgrade notes should always be adhered to in detail since they cover both the base CloudStack upgrade as well as the required hypervisor upgrade changes. This blog article does however aim to add some advise and best practices to make the overall upgrade process easier.

The following, outlines some best practice steps which can – and should – be carried out in advance of the actual CloudStack upgrade. This ensures all steps are done accurately and without the pressure of an upgrade change window.

Strategy – in-place upgrade vs. parallel builds

The official CloudStack upgrade guides are based on an in-place upgrade process. This process works absolutely fine  – it does however have some drawbacks:

  • It doesn’t easily lend itself to technology refreshes – if you want to carry out hardware or OS level upgrades or configuration changes to your CloudStack management and MySQL servers at the time of upgrade this adds complexity and time.
  • To ensure you have rollback point in case of problems with the upgrade you need to either take VM snapshots of the infrastructure (if your management and MySQL servers are Virtual Machines) or otherwise back up all configurations.
  • In case of a rollback scenario you also have to ensure all upgraded system CloudStack logs as well as the database is backed up before reverting to any snapshots / backups – such that a post-mortem investigation can be carried out.

To get around these issues it is easier and a less risky to simple prepare and build new CloudStack management and MySQL servers in advance of the actual upgrade:

  • This means you can carry out the upgrade on brand new servers, whilst leaving the old servers offline for a much quicker rollback.
  • This also means in the case of a rollback scenario the newly built servers can simply be disabled and left for further investigation whilst the original infrastructure is brought back online.
  • Since most people run their CloudStack management infrastructure as Virtual Machines the cost of adding a handful of new VMs for this purpose is negligible compared to the benefits of a more efficient and less risky upgrade process.

By using this parallel build process the upgrade process will include a few more steps – these will be covered later in this article:

  • Upgrade starts with stopping and disabling CloudStack management on the original infrastructure.
  • The original database is then copied to the new MySQL server(s), before MySQL is disabled.
  • Once completed the new CloudStack management servers are configured to point to the new MySQL server(s).
  • The upgrade is then run on the new CloudStack management servers, pointing to the new MySQL server(s).
  • As an additional step this does require the CloudStack global “host” to be reconfigured.

New CloudStack management infrastructure build

Following the standard documented build procedure for the new / upgrade version of CloudStack:

Build new management servers but without carrying out the following steps:

  • Do not carry out the steps for seeding system templates.
  • Do not carry our the cloudstack-setup-databases step.
  • Do not run cloudstack-setup-management.

Build new MySQL servers:

  • Ensure the cloud username / password is configured.
  • Create new empty databases for cloud / cloud_usage / cloudbridge.
  • If possible configure the MySQL master-slave configuration.

 Upgrade lab testing

As for any technical project testing the upgrade prior to upgrading is important. If you don’t have a production equal lab to do this in it is worth the effort to build one.

A couple of things to keep in mind:

  • Try to make the upgrade lab as close to production as possible.
  • Make sure you use the same OS’es, the same patch level and the same software versions (especially Java and Tomcat).
  • Use the same hardware model for the hypervisors if possible. These don’t have to be the same CPU / memory spec. but should otherwise match as closely as possible.
  • Use the same storage back-ends for primary and secondary storage. If not possible at least use the same protocols (NFS, iSCSI. FC, etc.)
  • Try to match your production cloud infrastructure with regards to number of zones.
  • If your production infrastructure has multiple clusters or pods also try to match this.

Once the lab has been built make sure it is prepared with a test workload similar to production:

  • Configure a selection of test domain and accounts + users.
  • Create VM infrastructure similar to production – e.g. configure isolated and shared networks, VPCs, VPN connections etc.
  • Create a number of VMs across multiple guest OS’es – at the very minimum try to have both Windows and Linux VMs running.
  • Make sure you also have VMs with additional disks attached.
  • Keep in mind your post-upgrade testing may be destructive – i.e. you may delete or otherwise break VMs. To cover for this it is good practice to prepare a lot more VMs than you think you may need.

For the test upgrade:

  • Ensure you have rollback points – you may want run the test upgrade multiple times. An easy way is to take VM snapshots and SAN/NAS snapshots.
  • Using the parallel build process create a new management server and carry out the upgrade (see below).
  • Investigate and fix any problems encountered during the upgrade process and add this to the production upgrade plan.

Once upgraded carry out thorough functional, regression and user acceptance testing. The nature of these depend very much on the type, complexity and nature of the production cloud, but should at the very minimum include:

User actions
  • Logins with passwords created prior to the upgrade.
  • Creation of new domains / accounts / users.
  • Deletion of old and new domains / accounts / users.
All VM lifecycle actions
  • Create / delete new VMs as well as deletion of existing VMs.
  • Start / stop of new and existing VMs.
  • Additions and removals of disks.
  • Additions and removals of networks to VMs.
Network lifecycle actions
  • Creation of new isolated / shared / VPC networks.
  • Deletion of existing as well as new isolated / shared / VPC networks.
  • Connectivity testing from VMs across all network types.
Storage lifecycle actions
  • Additions and deletions of disks.
  • Test any integrated systems – billing portals, custom GUIs / portals, etc.

If any problems are encountered these need to be investigated and addressed prior to the production upgrade.

Production DB upgrade test

The majority of the CloudStack upgrade is done against the underlying databases. As a result the lab testing discussed above will not highlight any issues resulting from:

  • Schema changes
  • Database or table sizes
  • Internal database constraints

To test for this it is good practice to carry out the following test steps. This process should be carried out as close in time to the actual upgrade as possible:

  • Build a completely network isolated management server using the new upgrade version of CloudStack. For this test simply use the same server for management and MySQL.
  • Import a copy of the production databases.
  • Start the upgrade by running the standard documented cloudstack-setup-databases configuration command as well as cloudstack-setup-management. The latter will complete the server configuration and start the cloudstack-management service.
  • Monitor the CloudStack logs (/var/log/cloudstack/management/management-server.log).

This will highlight any issues with the underlying SQL upgrade scripts. If any problems are encountered these need to be investigated thoroughly and fixed.

Please note – if any problems are found and subsequently fixed the original database needs to be imported again and the full upgrade restarted, otherwise the upgrade process will attempt to carry out upgrade steps which have already been completed – which in most cases will lead to additional errors.

Once all issues have been resolved and the upgrade completes without errors the time taken for the upgrade should be noted. The database upgrade process can take from minutes up to multiple hours depending on the size of the production databases – something which needs to be taken into account when planning for the production upgrade. If the upgrade takes longer than expected it may also be prudent to consider database housekeeping of event and usage tables to cut down on size and thereby speeding up the upgrade process.

To prevent re-occurrence of problems during the production upgrade it is key that all database upgrade fixes and workarounds are documented and added to the production upgrade plan.

Install system VM templates

The official upgrade documentation specifies which system VM templates to upload prior to the upgrade.

Please note – whether you do the upgrade in-place or on a parallel build the system VM templates need to be uploaded on the original CloudStack infrastructure (i.e. it can not be done on the new infrastructure when using parallel builds).

In other words – the template upload can and should be done in the days prior to the upgrade. Uploading these will not have any adverse effect on the existing CloudStack instance – since the new templates are simply uploaded in a similar fashion to any other user template upload. Carrying this out in advance will also ensure the templates should be fully uploaded by the time the production upgrade is carried out, thereby cutting down the upgrade process time.

Other things to consider prior to upgrade

  • Always make sure the production CloudStack infrastructure is healthy prior to the upgrade – an upgrade will very seldom fix underlying issues – these will however quickly cause problems during the upgrade and lead to extended downtime and rollbacks.
  • Once the upgrade has been completed it is very difficult to backport any post-upgrade user changes back to the original database in case of a rollback scenario. In other words – it is prudent to disable user access to the CloudStack infrastructure until the upgrade has been deemed a success.
  • It is good practice to at the very minimum have the CloudStack MySQL servers configured in a master-slave configuration. If this has not been done on the original infrastructure it can be configured on the newly built MySQL servers prior to any database imports, again cutting down on processing time during the actual production upgrade.


Step 1 – review all documentation

The official upgrade documentation is updated for each version of CloudStack (e.g. CloudStack 4.6 to 4.9 upgrade guide) – and should always be taken into consideration during an upgrade.

Step 2 – confirm all system VM templates are in place

In the CloudStack GUI make sure the previously uploaded system VM templates are successfully uploaded:

  • Status: Download complete
  • Ready: Yes

Step 3 – stop all running CloudStack services

On all existing CloudStack servers stop and disable the cloudstack-management service:

# service cloudstack-management stop;
# service cloudstack-usage stop;
# chkconfig cloudstack-management off;
# chkconfig cloudstack-usage off;

Step 4 – back up existing databases and disable MySQL

On the existing MySQL servers back up all CloudStack databases:

# mysqldump -u root -p cloud > /root/cloud-backup.sql;
# mysqldump -u root -p cloud_usage > /root/cloud_usage-backup.sql;
# mysqldump -u root -p cloudbridge > /root/cloudbridge-backup.sql;

# service mysqld stop;
# chkconfig mysqld off;

Step 5 – copy and import databases on the new MySQL master server

# scp root@<original MySQL server IP>:/root/cloud*.sql;
# mysql –u root –p cloud < cloud-backup.sql;
# mysql –u root –p cloud_usage < cloud_usage-backup.sql;
# mysql –u root –p cloudbridge < cloudbridge-backup.sql;

Step 6 – update the “host” global setting

The parallel builds method introduces new management servers whilst disabling the old ones. If no load balancers are being used the “host” global setting requires to be updated to specify which new management server the hypervisors should check into:

# mysql –p;
# update cloud.configuration set value=”<new management server IP address | new loadbalancer VIP>” where name=”host”;

Step 7 – carry out all hypervisor specific upgrade steps

Following the official upgrade documentation upgrade all hypervisors as specified.

Step 8 – configure and start the first management server

On the first management server only carry out the following steps to configure connectivity to the new MySQL (master) server and start the management service. Note the “cloudstack-setup-databases” command is executed without the “–deploy-as” option:

# cloudstack-setup-databases <cloud db username>:<cloud db password>@<cloud db host>; 
# cloudstack-setup-management;
# service cloudstack-management status;

Step 9 – monitor startup

  • Monitor the CloudStack service startup:
    # tail –f /var/log/cloudstack/management/management-server.log
  • Open the CloudStack GUI and check and confirm all hypervisors check in OK.

Step 10 – update system VMs

  • Destroy the CPVM and make sure this is recreated with the new system VM template.
  • Destroy the SSVM and again makes sure this comes back online.
  • Restart guest networks “with cleanup” to ensure all Virtual Routers are recreated with the new template.
  • Checking versions of system VMs / VRs can be done by either:
    • Checking /etc/cloudstack-release locally on each appliance.
    • By checking the vm_instance / vm_templates table in the cloud database.
    • Updating the “minreq.sysvmtemplate.version” global setting to the new system VM template version – which will show which VRs require updated in the CloudStack GUI.

Step 11 – configure and start additional management servers

Following the same steps as in step 8 above configure and start the cloudstack-management service on any additional management servers.


Rollbacks of upgrades should only be carried out in worst case scenarios. A rollback is not an exact science and may require additional steps not covered in this article. In addition any rollback should be carried out as soon as possible after the original upgrade – otherwise the new infrastructure will simply have too many changes to VMs, networks, storage, etc. which can not be easily backported to the original database.

Step 1 – disable new CloudStack management / MySQL servers

On the new CloudStack management servers stop and disable the management service:

# service cloudstack-management stop;
# chkconfig cloudstack-management off;


On the new MySQL server stop and disable MySQL:

# service mysqld stop;
# chkconfig mysqld off;

Step 2 – tidy up new VMs on the hypervisor infrastructure

Using hypervisor specific tools identify all VMs (system VMs and VRs) created since the upgrade. These instances need to be deleted since the old management infrastructure and databases have no knowledge of them.

Step 3 – enable the old CloudStack management / MySQL servers:

On the original MySQL servers:

# service mysqld start;
# chkconfig mysqld on;


On the original CloudStack management servers:

# service cloudstack-management start;
# chkconfig cloudstack-management on;

Step 4 – restart system VMs and VRs

System VMs and VRs may restart automatically (since the original instances are gone). If these do not come online follow the procedure described in upgrade step 10 to recreate these.


Many CloudStack users are reluctant to upgrade their platforms – however the risk of not upgrading and falling behind on technology and security tends to be a greater risk in the long run. Following these best practices should reduce the overall workload, risk and stress of upgrades by both allowing a lot of work to be carried out upfront, as well as providing a relatively straight forward rollback procedure.

As always we’re happy to receive feedback , so please get in touch with any comments, questions or suggestions.

About The Author

Dag Sonstebo is a Cloud Architect at ShapeBlue, The Cloud Specialists. Dag spends most of his time designing, implementing and automating IaaS solutions based on on Apache CloudStack.

While, in my opinion VMware’s vSphere is the best performing and most stable hypervisor available, vCenter obstinately remains a single point of failure when using vSphere and it’s no different when leveraging vCenter in a CloudStack environment.  Therefore, very occasionally there is a requirement to rebuild a vCentre server which was previously running in your CloudStack environment.  Working with one of our clients we found out (the hard way) how to do this.

Replacing a Pre-Existing vCenter Instance

Recreating a vCenter Instance

The recovery steps below apply to both the Windows-based vCenter server or an appliance.

The first step, which I won’t cover in detail here, is to build a new vCenter server and attach the hosts to it.  Follow VMware standard procedure when building your new vCenter server.  To save some additional steps later on, reuse the host name and IP address of the previous vCenter server. Ensure that the permissions have been reapplied allowing CloudStack to connect at the vCentre level with full administrative privileges and that datacenter and cluster names have been recreated accurately.  When you re-add the hosts, the VMs running on those hosts will automatically be pulled into the vCentre inventory.


The environment that we were working on was using standard vSwitches; therefore the configuration of those switches was held on each of the hosts independently and pulled into the vCenter inventory when the hosts were re-added to vCenter.  Distributed vSwitches (dvSwitches) have their configuration held in the vCenter.  We did not have to deal with this complication and so the recreation of dvSwitches is not covered here.


The changes that are to be made can easily be undone, as they make no change and trigger no change in the physical environment.  However, it is never a bad idea to have a backup, you shouldn’t ever need to roll back the whole database, but it’s good to have a record of your before/after states.

Update vCenter Password in CloudStack

If you have changed the vCenter password, it’ll need updating in the CloudStack database in its encrypted form.  The following instructions are taken from the CloudStack documentation:

  • Generate the encrypted equivalent of your vCenter password:
$ java -classpath /usr/share/cloudstack-common/lib/jasypt-1.9.0.jar org.jasypt.intf.cli.JasyptPBEStringEncryptionCLI input="_your_vCenter_password_" password="`cat /etc/cloudstack/management/key`" verbose=false
  • Store the output from this step, we need to add this in cluster_details table and vmware_data_center tables in place of the plain text password
  • Find the ID of the row of cluster_details table that you have to update:
$ mysql -u <USERNAME> -p<PASSWORD>
select * from cloud.cluster_details;
  • Update the password with the new encrypted one
update cloud.cluster_details set value = '_ciphertext_from_step_1_' where id = _id_from_step_2_;
  • Confirm that the table is updated:
select * from cloud.cluster_details;
  • Find the ID of the correct row of vmware_data_center that you want to update
select * from cloud.vmware_data_center;
  • update the plain text password with the encrypted one:
update cloud.vmware_data_center set password = '_ciphertext_from_step_1_' where id = _id_from_step_5_;
  • Confirm that the table is updated:
select * from cloud.vmware_data_center;

Reallocating the VMware Datacenters to CloudStack

The next step is to recreate the property within vSphere which is used to track VMware datacenters which have been connected to CloudStack.   Adding a VMware datacenter to CloudStack creates this custom attribute and sets it to “true”.  CloudStack will check for this attribute and value and show an error on the management log if it attempts to connect to a vCenter server which does not have this attribute set.  While the property can be edited using the full vSphere client, it must be created using PowerCLI utility.

Using the PowerCLI utility do the following:


(you’ll be prompted for a username and password with sufficient permissions for the vCenter Server)

New-CustomAttribute -Name "" -TargetType Datacenter

Get-Datacenter <DATACENTER_NAME> | Set-Annotation -CustomAttribute "" -Value true

Reconnecting the ESXi Hosts to CloudStack

CloudStack will now be connected to the vCenter server, but the hosts will still probably all be marked as down.  This is because vCenter uses an internal identifier for hosts, so that names and IP addresses can change, but it still can keep track of which host is which.

This identifier appears in two places in CloudStack’s database; The ‘guid’ in the table and also in the cloud.host_details table against the ‘guid’ value. The host table can be queried as follows:

SELECT id,name,status,guid FROM WHERE hypervisor_type = 'VMware' AND status != 'Removed';

The guid takes the form:


You can find the new Host ID either though the PowerCLI again, or via a utility like Robware’s RVTools.  RVTools is a very light-weight way of viewing information from a vCenter. It also allows export to csv so is very handy for manual reporting and analysis. The screen grap below shows a view called vHosts; note the Object ID.  This is vCenter’s internal ID of the host.


Through PowerCLI you would use:


Get-VMHost | ft -Property Name,Id -autosize

Now you can update the CloudStack tables with the new host ids for each of your hosts. for  host ‘’ on vCenter ‘’ might previously have read:


should be updated to:


Remember to check the host and host_details table.

Once that you have updated the entries for all of the active hosts in the zone, you can start the management service again now and all of the hosts should reconnect, as their pointers have been fixed in the CloudStack database.


This article explains how to re-insert a lost or failed vCenter server into CloudStack.  This blog describes the use of PowerCLI to both set and retrieve values from vCenter.

This process could be used to migrate an appliance based vCenter to a Windows based environment in order to support additional capacity as well as replacing a broken vCenter installation.

About The Author

Paul Angus is VP Technology & Cloud Architect at ShapeBlue, The Cloud Specialists. He has designed and implemented numerous CloudStack environments for customers across 4 continents, based on Apache CloudstackCitrix Cloudplatform and Citrix CloudPortal.

When not building Clouds, Paul likes to create Ansible playbooks that build clouds

We recently came across a very unusual issue where a client had a major security breach on their network. As well as lots of other damage their CloudStack infrastructure was maliciously damaged beyond recovery. Luckily the hackers hadn’t manage to damage the backend XenServer hypervisors so they were quite happily still running user VMs and Virtual Routers, just not under CloudStack control.  The client had a good backup regime so the CloudStack database was available to recover some key CloudStack information from. As building a new CloudStack instance was the only option the challenge was now how to recover all the user VMs to such a state they could be imported into the new CloudStack instance under the original accounts.

Mapping out user VMs and VHD disks

The first challenge is to map out all the VMs and work out which VMs belong to which CloudStack account, and which VHD disks belong to which VMs. To do this first of all recover the original CloudStack database and then query the vm_instance, service_offering, account and domain  tables.

In short we are interested in:

  • VM instance ID and names
  • VM instance owner account ID and account name
  • VM instance owner domain ID and domain name
  • VM service offering, which determines the VM CPU / memory spec
  • VM volume ID, name, size, path, type (root disk or data) and state for all VM disks – root or data.

At the same time we are not interested in:

  • System VMs
  • User VMs in state “Expunging”, “Expunged”, “Destroyed” or “Error”. The “Error” state would indicate the VM was not healthy on the original infrastructure.
  • VM disk volumes which are in state “Expunged” or “Expunging”. Both of these would indicate the VM was in the process of being deleted on the original CloudStack instance.

From a SQL point of view we do this as follows:

SELECT as vmid, as vmname,
cloud.vm_instance.instance_name as vminstname,
cloud.vm_instance.display_name as vmdispname,
cloud.vm_instance.account_id as vmacctid,
cloud.account.account_name as vmacctname,
cloud.vm_instance.domain_id as vmdomainid, as vmdomname,
cloud.vm_instance.service_offering_id as vmofferingid,
cloud.service_offering.speed as vmspeed,
cloud.service_offering.ram_size as vmmem, as volid, as volname,
cloud.volumes.size as volsize,
cloud.volumes.path as volpath,
cloud.volumes.volume_type as voltype,
cloud.volumes.state as volstate
FROM cloud.vm_instance
right join cloud.service_offering on (
right join cloud.volumes on (
right join cloud.account on (
right join cloud.domain on (
and not (cloud.vm_instance.state='Expunging' or cloud.vm_instance.state='Destroyed' or cloud.vm_instance.state='Error')
and not (cloud.volumes.state='Expunged' or cloud.volumes.state='Expunging')
order by;

This will return a list of VMs and disks like the following:

25ppvm2i-5-25-VMppvm25peterparker2SPDM Inc150051231ROOT-252147483648010c12a4f-7bf6-45c4-9a4e-c1806c5dd54aROOTReady
26ppvm3i-5-26-VMppvm35peterparker2SPDM Inc150051232ROOT-26214748364807046409c-f1c2-49db-ad33-b2ba03a1c257ROOTReady
26ppvm3i-5-26-VMppvm35peterparker2SPDM Inc150051236ppdatavol5368709120b9a51f4a-3eb4-4d17-a36d-5359333a5d71DATADISKReady

This now gives us all the information required to import the VM into the right account once the VHD disk file has been recovered from the original primary storage pool.

Recovering VHD files

Using the information from the database query above we now know that e.g. the VM “ppvm3”:

  • Is owned by the account “peterparker” in domain “SPDM Inc”.
  • Used to have 1 vCPU @ 500MHz and 500MB vRAM.
  • Had two disks:
    • A root disk with ID 7046409c-f1c2-49db-ad33-b2ba03a1c257.
    • A data disk with ID b9a51f4a-3eb4-4d17-a36d-5359333a5d71.

If we now check the original primary storage repository we can see these disks:

-rw-r--r-- 1 root root 502782464 Nov 17 12:08 7046409c-f1c2-49db-ad33-b2ba03a1c257.vhd
-rw-r--r-- 1 root root 13312 Nov 17 11:49 b9a51f4a-3eb4-4d17-a36d-5359333a5d71.vhd

This should in theory make recovery easy. Unfortunately due to the nature of XenServer VHD disk chains it’s not that straight forward. If we tried to import the root VHD disks as a template this would succeed, but as soon as we try to spin up a new VM from this template we get the “insufficient resources” error from CloudStack. If we trace this back in the CloudStack management log or XenServer SMlog we will most likely find an error along the lines of “Got exception SR_BACKEND_FAILURE_65 ; Failed to load VDI”. The root cause of this is we have imported a VHD differencing disk, or in common terms a delta or “child” disk. These reference a parent VHD disk – which we so far have not recovered.

To fully recover healthy VHD images we have two options:

  1. If we have access to the original storage repository from a running XenServer we can use the “xe” command line tools to export each VDI image. This method is preferable as it involves less copy operations and less manual work.
  2. If we have no access from a running XenServer we can copy the disk images and use the “vhd-util” utility to merge files.

Recovery using XenServer

VHD file export using the built in XenServer tools is relatively straight forward. The “xe vdi-export” tool can be used to export and merge the disk in a single operation. The first step in the process is to map an external storage repository to the XenServer (normally the same repository which is used for upload of the VHD images to CloudStack later on), e.g. an external NFS share.

We now use the vdi-export option as follows:

# xe vdi-export uuid=7046409c-f1c2-49db-ad33-b2ba03a1c257 format=vhd filename=ppvm3root.vhd --progress
[|] ######################################################&amp;gt; (100% ETA 00:00:00)
Total time: 00:01:12
# xe vdi-export uuid=b9a51f4a-3eb4-4d17-a36d-5359333a5d71 format=vhd filename=ppvm3data.vhd --progress
[\] ######################################################&gt; (100% ETA 00:00:00)
Total time: 00:00:00
# ll
total 43788816
-rw------- 1 root root 12800 Nov 18 2015 ppvm3data.vhd
-rw------- 1 root root 1890038784 Nov 18 2015 ppvm3root.vhd

If we now utilise vhd-util to scan the disks we see they are both dynamic disks with no parent:

# vhd-util read -p -n ppvm3root.vhd
VHD Footer Summary:
Cookie : conectix
Features : (0x00000002) &lt;RESV&gt;
File format version : Major: 1, Minor: 0
Data offset : 512
Timestamp : Sat Jan 1 00:00:00 2000
Creator Application : 'caml'
Creator version : Major: 0, Minor: 1
Creator OS : Unknown!
Original disk size : 20480 MB (21474836480 Bytes)
Current disk size : 20480 MB (21474836480 Bytes)
Geometry : Cyl: 41610, Hds: 16, Sctrs: 63
: = 20479 MB (21474754560 Bytes)
Disk type : Dynamic hard disk
Checksum : 0xffffefb4|0xffffefb4 (Good!)
UUID : 4fc66aa3-ad5e-44e6-a4e2-b7e90ae9c192
Saved state : No
Hidden : 0

VHD Header Summary:
Cookie : cxsparse
Data offset (unusd) : 18446744073709
Table offset : 2048
Header version : 0x00010000
Max BAT size : 10240
Block size : 2097152 (2 MB)
Parent name :
Parent UUID : 00000000-0000-0000-0000-000000000000
Parent timestamp : Sat Jan 1 00:00:00 2000
Checksum : 0xfffff44d|0xfffff44d (Good!)

# vhd-util read -p -n ppvm3data.vhd
VHD Footer Summary:
Cookie : conectix
Features : (0x00000002) &lt;RESV&gt;
File format version : Major: 1, Minor: 0
Data offset : 512
Timestamp : Sat Jan 1 00:00:00 2000
Creator Application : 'caml'
Creator version : Major: 0, Minor: 1
Creator OS : Unknown!
Original disk size : 5120 MB (5368709120 Bytes)
Current disk size : 5120 MB (5368709120 Bytes)
Geometry : Cyl: 10402, Hds: 16, Sctrs: 63
: = 5119 MB (5368430592 Bytes)
Disk type : Dynamic hard disk
Checksum : 0xfffff16b|0xfffff16b (Good!)
UUID : 03fd60a4-d9d9-44a0-ab5d-3508d0731db7
Saved state : No
Hidden : 0

VHD Header Summary:
Cookie : cxsparse
Data offset (unusd) : 18446744073709
Table offset : 2048
Header version : 0x00010000
Max BAT size : 2560
Block size : 2097152 (2 MB)
Parent name :
Parent UUID : 00000000-0000-0000-0000-000000000000
Parent timestamp : Sat Jan 1 00:00:00 2000
Checksum : 0xfffff46b|0xfffff46b (Good!)

# vhd-util scan -f -m'*.vhd' -p
vhd=ppvm3data.vhd capacity=5368709120 size=12800 hidden=0 parent=none
vhd=ppvm3root.vhd capacity=21474836480 size=1890038784 hidden=0 parent=none

These files are now ready for upload to the new CloudStack instance.

Note: using the Xen API it is also in theory possible to download / upload a VDI image straight from XenServer, using the “export_raw_vdi” API call. This can be achieved using a URL like:

https://<account>:<password>@<XenServer IP or hostname>/export_raw_vdi?vdi=<VDI UUID>&format=vhd

At the moment this method unfortunately doesn’t download the VHD file as a sparse disk image, hence the VHD image is downloaded in it’s full original disk size, which makes this very space hungry method. It is also a relatively new addition to the Xen API and is marked as experimental. More information can be found on

Recovery using vhd-util

If all we have access to is the original XenServer storage repository we can utilise the “vhd-util” binary which can be downloaded from (note this is a slightly different version to the one built in to XenServer).

If we run this with the “read” option we can find out more information about what kind of disk this is and if it has a parent. For the root disk this results in the following information:

# vhd-util read -p -n 7046409c-f1c2-49db-ad33-b2ba03a1c257.vhd
VHD Footer Summary:
Cookie : conectix
Features : (0x00000002) &amp;amp;lt;RESV&amp;amp;gt;
File format version : Major: 1, Minor: 0
Data offset : 512
Timestamp : Sun Nov 15 23:06:44 2015
Creator Application : 'tap'
Creator version : Major: 1, Minor: 3
Creator OS : Unknown!
Original disk size : 20480 MB (21474836480 Bytes)
Current disk size : 20480 MB (21474836480 Bytes)
Geometry : Cyl: 41610, Hds: 16, Sctrs: 63
: = 20479 MB (21474754560 Bytes)
Disk type : Differencing hard disk
Checksum : 0xffffefe6|0xffffefe6 (Good!)
UUID : 2a2cb4fb-1945-4bad-9682-6ea059e64598
Saved state : No
Hidden : 0

VHD Header Summary:
Cookie : cxsparse
Data offset (unusd) : 18446744073709
Table offset : 1536
Header version : 0x00010000
Max BAT size : 10240
Block size : 2097152 (2 MB)
Parent name : cfe206b7-cacb-4291-a1c0-f248ff53ed4b.vhd
Parent UUID : f6f05652-20fa-4f5f-9784-d41734489b32
Parent timestamp : Fri Nov 13 12:16:48 2015
Checksum : 0xffffd82b|0xffffd82b (Good!)

From the above we notice two things about the root disk:

  • Disk type : Differencing hard disk
  • Parent name : cfe206b7-cacb-4291-a1c0-f248ff53ed4b.vhd

I.e. the VM root disk is a delta disk which relies on parent VHD disk cfe206b7-cacb-4291-a1c0-f248ff53ed4b.vhd.

If we run this against the data disk the story is slightly different:

# vhd-util read -p -n b9a51f4a-3eb4-4d17-a36d-5359333a5d71.vhd
VHD Footer Summary:
Cookie : conectix
Features : (0x00000002) &amp;amp;lt;RESV&amp;amp;gt;
File format version : Major: 1, Minor: 0
Data offset : 512
Timestamp : Mon Nov 16 01:52:51 2015
Creator Application : 'tap'
Creator version : Major: 1, Minor: 3
Creator OS : Unknown!
Original disk size : 5120 MB (5368709120 Bytes)
Current disk size : 5120 MB (5368709120 Bytes)
Geometry : Cyl: 10402, Hds: 16, Sctrs: 63
: = 5119 MB (5368430592 Bytes)
Disk type : Dynamic hard disk
Checksum : 0xfffff158|0xfffff158 (Good!)
UUID : 7013e511-b839-4504-ba88-269b2c97394e
Saved state : No
Hidden : 0

VHD Header Summary:
Cookie : cxsparse
Data offset (unusd) : 18446744073709
Table offset : 1536
Header version : 0x00010000
Max BAT size : 2560
Block size : 2097152 (2 MB)
Parent name :
Parent UUID : 00000000-0000-0000-0000-000000000000
Parent timestamp : Sat Jan 1 00:00:00 2000
Checksum : 0xfffff46d|0xfffff46d (Good!)

In other words the data disk is showing up with:

  • Disk type : Dynamic hard disk
  • Parent name : <blank>

This behaviour is typical for VHD disk chains. The root disk is created from an original template file, hence it has a parent disk, whilst the data disk was just created as a raw storage disk, hence has no parent.

Before moving forward with the recovery it is very important to make copies of both the differencing disks and parent disk to a separate location for further processing. 

The full recovery of the VM instance root disk relies on the differencing disk being coalesced or merged into the parent disk – but since the parent disk was a template disk it is used by a number of differencing disks the coalesce process will change this parent disk and render any other differencing disks unrecoverable.

Once we have copied the root VHD disk and it’s parent disk to a separate location we use the vhd-util “scan” option to verify we have all disks in the disk chain. The”scan” option will show an indented list of disks which gives a tree like view of disks and parent disks.

Please note if the original VM had a number of snapshots there might be more than two disks in the chain. If so use the process above to identify all the differencing disks and download them to the same folder.

# vhd-util scan -f -m'*.vhd' -p
vhd=cfe206b7-cacb-4291-a1c0-f248ff53ed4b.vhd capacity=21474836480 size=1758786048 hidden=1 parent=none
vhd=7046409c-f1c2-49db-ad33-b2ba03a1c257.vhd capacity=21474836480 size=507038208 hidden=0 parent=cfe206b7-cacb-4291-a1c0-f248ff53ed4b.vhd

Once all differencing disks have been copied we can now use the vhd-util “coalesce” option to merge the child difference disk(s) into the parent disk:

# ls -l
-rw-r--r-- 1 root root 507038208 Nov 17 12:45 7046409c-f1c2-49db-ad33-b2ba03a1c257.vhd
-rw-r--r-- 1 root root 1758786048 Nov 17 12:47 cfe206b7-cacb-4291-a1c0-f248ff53ed4b.vhd
# vhd-util coalesce -n 7046409c-f1c2-49db-ad33-b2ba03a1c257.vhd
# ls -l
-rw-r--r-- 1 root root 507038208 Nov 17 12:45 7046409c-f1c2-49db-ad33-b2ba03a1c257.vhd
-rw-r--r-- 1 root root 1863848448 Nov 17 13:36 cfe206b7-cacb-4291-a1c0-f248ff53ed4b.vhd

Note the vhd-util coalesce option has no output. Also note the size change of the parent disk cfe206b7-cacb-4291-a1c0-f248ff53ed4b.vhd.

Now the root disk has been merged it can be uploaded as a template to CloudStack to allow build of the original VM.

Import of VM into new CloudStack instance

We now have all details for the original VM:

  • Owner account details
  • VM virtual hardware specification
  • Merged root disk
  • Data disk

The import process is now relatively straight forward. For each VM:

  1. Ensure the account is created.
  2. In the context of the account (either via GUI or API):
    1. Import the root disk as a new template.
    2. Import the data disk as a new volume.
  3. Create a new instance from the uploaded template.
  4. Once the new VM instance is online attach the uploaded data disk to the VM.

In a larger CloudStack estate the above process is obviously both time consuming and resource intensive, but can to a certain degree be automated. As long as the VHD files were healthy to start off with it will however allow for successful recovery of XenServer based VMs between CloudStack instances.

About The Author

Dag Sonstebo is  a Cloud Architect at ShapeBlue, The Cloud Specialists. Dag spends most of his time designing and implementing IaaS solutions based on on Apache CloudStack.


Recently we’ve seen a few clients being tripped up by the System VM upgrades during the CloudStack upgrades in multi-zone deployments.

The issue occurs when the System VM is pre-deployed individually to multiple zones, rather than being a single template deployed to multiple zones. This may be done in error or because the specific version of CloudStack/CloudPlatform which you are upgrading from does not allow you to provision the template as zone-wide.

The result is that only the system VMs in one zone recreate properly after the upgrade and the other zones do not recreate the SSVM or CPVM when they are destroyed.

The vm_template table will have entries like this for a given System VM template:

 SELECT id,name,type,cross_zones,state FROM cloud.vm_template WHERE name like '%systemvm-xenserver%' AND removed IS NULL; 

634 systemvm-xenserver-4.5SYSTEM 0Active
635 systemvm-xenserver-4.5SYSTEM 0Active
636 systemvm-xenserver-4.5SYSTEM 0Active

And template_store_ref:

 SELECT,template_store_ref.store_id,template_store_ref.template_id,image_store.data_center_id, FROM template_store_ref JOIN image_store ON WHERE template_store_ref.destroyed=0; 


The tables should actually show the same template ID in each store:


and template_store_ref:

21634 1
32634 2
43634 3

In the case of the first table, CloudStack will take the template with id of 636 as being THE region-wide System VM template, and will not be able to find it in zones 1 and 2. and therefore not be able to recreate System VMs in those zones.  The management server log will report that the zone is not ready as it can’t find the template.

The above example assumes that there are 3 zones, with 1 secondary storage pool in each zone, if there are multiple secondary storage pools, then only one in each zone will have initially received the template.

You could edit the path in the template_store_ref to point to the location where the template has been downloaded to on the secondary storage, however, this will leave the system in a non-standard state.

The safest way forward is to set one of the templates to cross_zones=1 and then coax CloudStack into creating an SSVM in each zone (one at a time), which will then download that cross-zone template.


In this example I’ll use template 634 to be the ultimate region-wide template.

In order to download the cross_zone template we need to make one template the region-wide template at a time. So assuming that zone relating to secondary storage pool 3 upgraded properly, we first make template 635 the region-wide template. By changing the database to this:

UPDATE cloud.vm_template SET state='Inactive', cross_zones=1 WHERE id=634; UPDATE cloud.vm_template SET state='Active' WHERE id=635; UPDATE cloud.vm_template SET state='Inactive' WHERE id=636;

634systemvm-xenserver-4.5SYSTEM 1Inactive
635systemvm-xenserver-4.5SYSTEM 0Active
636systemvm-xenserver-4.5SYSTEM 0Inactive

Zone 2 will now be able to recreate the CPVM and SSVM, while zone 3 CPVM and SSVM are already running so are unaffected.
Once the CPVM and SSVM in zone 2 are up, you can now make template 634 the region-wide system VM template, by editing the database like this.

UPDATE cloud.vm_template SET state='Active' WHERE id=634; UPDATE cloud.vm_template SET state='Inactive' WHERE id=635;


The CPVM and SSVM in zone 1 will now be able to start.

The SSVM in zones 2 and 3 will need their cloud service restarting in order to prompt the re-scanning of the templates table. This can be done by a restart or stopping then starting the cloud service on the SSVM – DO NOT DESTROY the SSVM in zones 2 or 3 as they don’t have the correct template yet. The SSVMs in zones 2 and 3 will now start downloading template id 634 to their secondary storage pools.

Once the template 634 has downloaded to zones 2 and 3 (it was already in zone 1), CloudStack will be able to recreate system VMs in any zones.


This article explains how to recover from the upgrade failure scenario when system VMs will not recreate in zones in a multi-zone deployment.

About The Author

Paul Angus is VP Technology & Cloud Architect at ShapeBlue, The Cloud Specialists. He has designed and implemented numerous CloudStack environments for customers across 4 continents, based on Apache Cloudstack ,Citrix Cloudplatform and Citrix Cloudportal.

When not building Clouds, Paul likes to create Ansible playbooks that build clouds

As a cloud infrastructure scales to hundreds or thousands of servers, high availability becomes a key requirement of the production environments supporting multiple applications and services. Since the management servers use a MySQL database to store the state of all its objects, the database could become a single point of failure. The CloudStack manual recommends MySQL replication with manual failover in the event of database loss.

We have worked with Severalnines to produce what we believe is a better way.

In this blog post, we’ll show you how to deploy redundant CloudStack management servers with MariaDB Galera Cluster on CentOS 6.5 64bit. We will have two load balancer nodes fronting the management servers and the database servers. Since CloudStack relies on MySQL’s GET_LOCK and RELEASE LOCK, which are not supported by Galera, we will redirect all database requests to only one MariaDB node and automatically failover to another node in case the former goes down. So, we’re effectively getting the HA benefits of Galera clustering (auto-failover, full consistency between DB nodes, no slave lag), while avoiding the Galera limitations as we’re not concurrently accessing all the nodes. We will deploy a two-node Galera Cluster (plus an arbitrator on a separate ClusterControl node).

Our setup will look like this:

Note that this blog post does not cover the installation of hypervisor and storage hosts. Our setup consists of 4 servers:

  • lb1: HAproxy + keepalived (master)
  • lb2: HAproxy + keepalived (backup) + ClusterControl + garbd
  • mgm1: CloudStack Management + database server
  • mgm2: CloudStack Management + database server


Our main steps would be:

  1. Prepare 4 hosts
  2. Deploy MariaDB Galera Cluster 10.x with garbd onto mgm1, mgm2 and lb2 from lb2
  3. Configure Keepalived and HAProxy for database and CloudStack load balancing
  4. Install CloudStack Management #1
  5. Install CloudStack Management #2


Preparing Hosts

1. Add the following hosts definition in /etc/hosts of all nodes:		virtual-ip mgm.cloustack.local		lb1 haproxy1		lb2 haproxy2 clustercontrol		mgm1.cloudstack.local mgm1 mysql1		mgm2.cloudstack.local mgm2 mysql2

2. Install NTP daemon:

$ yum -y install ntp
$ chkconfig ntpd on
$ service ntpd start

3. Ensure each host is using a valid FQDN, for example on mgm1:

$ hostname --fqdn

Deploying MariaDB Galera Cluster

** The deployment of the database cluster will be done from lb2, i.e., the ClusterControl node.

1. To set up MariaDB Galera Cluster, go to the Severalnines Galera Configurator to generate a deployment package. In the wizard, we used the following values when configuring our database cluster (take note that we specified one of the DB nodes twice under Database Servers’ textbox):

Vendor                   : MariaDB
MySQL Version            : 10.x
Infrastructure           : none/on-premises 
Operating System         : RHEL6 - Redhat 6.4/Fedora/Centos 6.4/OLN 6.4/Amazon AMI 
Number of Galera Servers : 3
Max connections	     	 : 350
OS user                  : root
ClusterControl Server    :
Database Servers         :

At the end of the wizard, a deployment package will be generated and emailed to you.

2. Download and extract the deployment package:

$ wget
$ tar -xzf s9s-galera-mariadb-3.5.0-rpm.tar.gz

3. Before we proceed with the deployment, we need to perform some customization to fit the CloudStack database environment. Go to the deployment script’s MySQL configuration file at ~/s9s-galera-mariadb-3.5.0-rpm/mysql/config/my.cnf and ensure the following options exist under the [MYSQLD] section:


4. Then, go to ~/s9s-galera-mariadb-3.5.0-rpm/mysql/config/cmon.cnf.controller and remove the repeated value on mysql_server_addresses so it becomes as below:


5. Now we are ready to start the deployment:

$ cd ~/s9s-galera-codership-3.5.0-rpm/mysql/scripts/install/
$ bash ./ 2>&1 | tee cc.log

6. The DB cluster deployment will take about 15 minutes, and once completed, the ClusterControl UI is accessible at .

7. It is recommended to run Galera on at least three nodes. So, install garbd, a lightweight arbitrator daemon for Galera on the ClusterControl node from the ClusterControl UI. Go to Manage > Load Balancer > Install Garbd > choose the ClusterControl node IP address from the dropdown > Install Garbd.

You will now see your MariaDB Galera Cluster with garbd installed and binlog enabled (master) as per below:


Load Balancer and Virtual IP

1. Before we start to deploy the load balancers, make sure lb1 is accessible using passwordless SSH from ClusterControl/lb2. On lb2, copy the SSH keys to

$ ssh-copy-id -i ~/.ssh/id_rsa root@

2. Login to ClusterControl, drill down to the database cluster and click Add Load Balancer button. Deploy HAProxy on lb1 and lb2 similar to below:

** Take note that for RHEL, ensure you check Build from source? to install HAProxy from source. This will install the latest version of HAProxy.

3. Install Keepalived on lb1(master) and lb2(backup) with as virtual IP:

4. The load balancer nodes have now been installed, and are integrated with ClusterControl. You can verify this by checking out the ClusterControl summary bar:

5. By default, our script will configure the MySQL reverse proxy service to listen on port 33306 in active-active mode. We need to change this to active-passive multi master mode by declaring the second Galera node as backup, On lb1 and lb2, open /etc/haproxy/haproxy.cfg and append the word ‘backup’ into the last line:

	server check
	server check backup

6. We also need to add the load balancing definition for CloudStack. According to the documentation, we need to load balance port 8080 and 8025. To allow session stickiness, we will use source load balancing algorithm, where the same source address will be forwarded to the same management server unless it fails. On lb1 and lb2, open/etc/haproxy/haproxy.cfg and add the following lines:

frontend cloudstack_ui_8080
        bind *:8080
        mode http
        option httpchk OPTIONS /client
        option forwardfor
        option httplog
        balance source
        server mgm1.cloudstack.local maxconn 32 check inter 5000
        server mgm2.cloudstack.local maxconn 32 check inter 5000
frontend cloudstack_systemvm_8250
        bind *:8250
        mode tcp
        balance source
        server mgm1.cloudstack.local maxconn 32 check
        server mgm2.cloudstack.local maxconn 32 check

6. Restart HAProxy to apply the changes:

$ service haproxy restart

Or, you can just kill the haproxy process and let ClusterControl recover it.

7. Configure iptables to allow connections to port configured in HAProxy and Keepalived. Add the following lines:

$ iptables -I INPUT -m tcp -p tcp --dport 33306 -j ACCEPT
$ iptables -I INPUT -m tcp -p tcp --dport 8080 -j ACCEPT
$ iptables -I INPUT -m tcp -p tcp --dport 8250 -j ACCEPT
$ iptables -I INPUT -m tcp -p tcp --dport 80 -j ACCEPT
$ iptables -I INPUT -m tcp -p tcp --dport 443 -j ACCEPT
$ iptables -I INPUT -m tcp -p tcp --dport 9600 -j ACCEPT
$ iptables -I INPUT -i eth0 -d -j ACCEPT
$ iptables -I INPUT -p 112 -i eth0 -j ACCEPT
$ iptables -I OUTPUT -p 112 -o eth0 -j ACCEPT

Save the iptables rules:

$ service iptables save

Installing CloudStack Management Server #1

** The following steps should be performed on mgm1.

1. Add the CloudStack repository, create /etc/yum.repos.d/cloudstack.repo and insert the following information.


2. Install the CloudStack Management server:

$ yum -y install cloudstack-management

3. Create a root user with wildcard host for the CloudStack database setup. On mgm1, run the following statement:

$ mysql -u root -p

And execute the following statements:


4. On mgm1, run the following command to configure the CloudStack databases:

$ cloudstack-setup-databases cloud:cloudpassword@ --deploy-as=root:password

5. Setup the CloudStack management application:

$ cloudstack-setup-management

** Allow some time for the CloudStack application to bootstrap on each startup. You can monitor the process at/var/log/cloudstack/management/catalina.out.

6. Open the CloudStack management UI at virtual IP, with default user ‘admin’ and password ‘password’. Configure your CloudStack environment by following the deployment wizard and let CloudStack build the infrastructure:

If completed successfully, you should then be redirected to the CloudStack Dashboard:

The installation of the first management server is now complete. We’ll now proceed with the second management server.


Installing CloudStack Management Server #2

** The following steps should be performed on mgm2

1. Add the CloudStack repository, create /etc/yum.repos.d/cloudstack.repo and insert the following information.


2. Install the CloudStack Management server:

$ yum -y install cloudstack-management

3. Run the following command to setup the CloudStack database (note the absence of –deploy-as argument):

$ cloudstack-setup-databases cloud:cloudpassword@

4. Setup the CloudStack management application:

$ cloudstack-setup-management

** Allow some time for the CloudStack application to bootstrap on each startup. You can monitor the process at/var/log/cloudstack/management/catalina.out. At this point, this management server will automatically discover the other management server and form a cluster. Both management servers are load balanced and accessible via virtual IP,

Lastly, change the management host IP address on every agent host at/etc/cloudstack/agent/ to the virtual IP address similar to below:


Restart the cloudstack agent service to apply the change:

$ service cloudstack-agent restart

Verify the Setup

1. Check the HAProxy statistics by logging into the HAProxy admin page at lb1 host port 9600. The default username/password is admin/admin. You should see the status of nodes from the HAProxy point-of-view. Our Galera cluster is in master-standby mode, while the CloudStack management servers are load balanced:

2. Check and observe the traffic on your database cluster from the ClusterControl overview page at

Reproduced with the kind permission of Severalnines