Tag Archive for: CloudStack

KDDI Corporation Case Study CloudStack/ShapeBlue

Whilst agility and speed of service are key factors in choosing a cloud management platform, it is becoming apparent that the reliability of that platform is proving to be the vital differentiator for many service providers. KDDI Corporation, Japan’s global telecommunications pioneer and a Global Fortune 500 company chose CloudStack to guarantee the speed, agility and reliability of their cloud environment – KDDI Cloud Platform Service (KCPS). They engaged in a strategic partnership with ShapeBlue to take advantage of our deep expertise in CloudStack and years of experience in building large-scale, open-source cloud infrastructures.

“We were introduced to ShapeBlue by one of our partners,” said Masashi Nakamura, Group Leader of KDDI’s Platform Technology Department. “They are very reliable and flexible in their approach to what we need. Their fast response to our problems has made us very happy with the level of support we receive. And of course, as they are a leader in the CloudStack community they help to keep it active and the product keep developing.”

 

 

Carrier-Grade Cloud Platform Orchestrated by CloudStack

KCPS was launched in July 2012 and is built on Apache CloudStack utilizing the KVM and VMware hypervisors. This carrier-grade cloud platform provides the flexibility to migrate a variety of business applications making it possible to meet exact needs. KCPS has a one-stop contracting service that enables customers to quickly construct systems in conjunction with their global expansion plans, whilst maintaining high degrees of security and availability.

 

 

Masashi Nakamura - KDDI Corporation

Masashi Nakamura continued “We were already a leading provider of mobile and network services when we launched KCPS. We wanted to provide a cloud platform that serviced both existing customers and attracted new ones. We didn’t want to compete with the hyperclouds like AWS and Azure, but wanted to provide an agile, secure and stable platform that could be tailored to meet the needs and demanding requirements of our enterprise customers. KDDI’s mission is “to connect” everything and KCPS was designed with that aim in mind.
As part of our due diligence process we evaluated a number of products on which we could construct KCPS, but eventually chose CloudStack. The biggest advantages of using CloudStack was its inherent stability, flexibility and the fact that it uses widely operating network services. In addition, it can manage multiple hypervisors – we use KVM and Vmware – and is a much more mature product than many of its competitors”.

 

 

Enabling Services with High Availability – 99.999% Uptime for KDDI’s Enterprise Customers

KDDI is using KCPS to deliver a broad range of customer services including IaaS, Disaster Recovery (DRaaS), Operating System provision, security operations and datacenter connection services. The 99.999% uptime they offer is a vital factor in providing these services and is why enterprise customers choose them. This robustness and reliability are what attracts enterprise customers to operate mission-critical workloads on it.

Hiroyuki Kijima - KDDI CorporationHiroyuki Kijima, KDDI’s Product Manager for Private Cloud said “CloudStack is very stable and has a simple to use interface making it quick for our customers to get up to speed on. In addition, its API offers quick integration with existing customer networks and it is customisable, meaning it is easy to add additional services. Its flexible inter-cluster migration means we can easily migrate a customer’s resources when onboarding them

 

 

About KDDI Corporation

KDDI Corporation Logo - CloudStack Case StudyKDDI Corporation is Japan’s global telecommunications pioneer and a Global Fortune 500 company providing an international customer base with data centers, networks, content delivery, system integration, and more around the world. They started operations in 1953 and with 10,000+ employees and US$48 billion in annual revenue are one of Asia’s largest telecommunications providers. KDDI has developed Japan’s largest public cloud to deliver cloud services across Asia, Europe, and the US.

KDDI’s Cloud environment – KCPS (KDDI Cloud Platform Service) –  operates in Japan with thousands of physical hosts delivering a service with 99.999% availability.

In December 2020, Red Hat announced major changes to the roadmap of its CentOS distribution, including the early end of life of CentOS 8 in December 2021. Due to this sudden and unexpected change, new Linux distributions have been developed as possible alternatives to CentOS, which are binary-compatible with Red Hat Enterprise Linux 8 (RHEL8). One such distribution is Rocky Linux (named as a tribute to early CentOS co-founder Rocky McGaugh).

 

Rocky Linux Support in CloudStack

Led by Gregory Kurtzer, founder of the CentOS project, Rocky Linux is a community-supported, production-grade enterprise operating system designed to be 100% compatible with RHEL8. With its General Availability release in June, it will be a suitable replacement for CentOS 8 due to its binary compatibility and easy migration path.

In keeping with the CloudStack legacy of supporting popular OSes and hypervisors, support has been added for Rocky Linux. CloudStack can now be installed on Rocky Linux 8.4, which can also be used as a Management and Usage server, as well as a KVM hypervisor host. Additionally, there is no reason that CloudStack could not be installed and used on other EL8 binary-compatible versions such as RHEL 8, Oracle Linux 8, CentOS Stream, and Alma Linux 8. However, we have only tested and verified Rocky 8 explicitly.

 

 

openSUSE Support in CloudStack

Support has also been added for openSUSE Leap, an open-source community project with Linux-based distributions sponsored by SUSE Software Solutions Germany GmbH and other companies. OpenSUSE Leap is openSUSE’s regular release with guaranteed stability, and can be used as a Management and Usage Server as well as a KVM hypervisor host. Since openSUSE Leap is compatible with SUSE Linux Enterprise, CloudStack could be installed and used on SUSE Linux Enterprise as well. However, only OpenSUSE Leap has been tested and verified for most features.

 

The addition of these Operating Systems to the plethora of those already supported in CloudStack expands its influence, ensuring that CloudStack is the prevalent turnkey solution in the ever-changing world of Cloud Infrastructure.

These new features will be available in the 2021 Q3/4 LTS release of CloudStack along with several other additions and improvements.

CloudStack UI - Enable Bulk Actions

Want to start or stop a VM (Virtual Machine), or delete an offering in CloudStack? No big deal. However – what if you need to delete 20 offerings, or stop 20 VMs? Having to perform the same operation on each individual item is time-consuming, laborious and can be frustrating. Bearing this in mind, we’ve come up with a nice little enhancement to improve the user experience by making it possible to perform bulk operations via the UI.

Various views in the UI can already show checkboxes against items (such as VMs, events and alerts) making it easier for users to perform common operations on multiple items. In the next LTS release (Q3/4 2021), we’ve amped it up a notch. Not only have we extended the support for additional items, we have also added a progress view window to keep track of the status of the operations in flight.

With this enhancement, you can perform bulk actions on the following resources/operations:

  • VM: start, stop, restart, destroy
  • Kubernetes: start, stop, destroy
  • Instance groups: delete
  • SSH keypair: delete
  • Affinity groups: delete
  • Volumes: destroy
  • Snapshots: delete
  • VM snapshots: delete
  • Networks: restart, delete
  • VPC: restart, delete
  • Public IP Addresses: delete
  • VPN customer gateway and VPN users: delete
  • Firewall, port forwarding, LB rules: delete
  • Template and ISOs: delete (In the zone tab)
  • Events and Alerts: archive, delete
  • Projects: active, suspend, delete
  • Accounts: enable, disable, lock, delete
  • System VMs and Virtual Routers: stop, start, reboot and destroy
  • Internal LB: stop, start
  • Compute, system, disk and backup offerings: delete
  • Network and VPC offerings: enable, disable, delete

If an item supports bulk actions, it will have a checkbox as the first column in the list view. Checking this box will display the supported bulk operations available for that item. On clicking the button to perform the action (eg. start operation on VMs) it will display a pop-up window listing the resources that you have selected for validation.

Upon clicking OK, the operation begins, and the window will transition to the progress view.

In the bulk action dialog boxes, we’ve reversed the colours of the ‘Cancel’ and ‘OK’ buttons, to reduce the risk of performing bulk operations (like delete) on resources that you didn’t intend to. We’ve also added an eye-catching alert message for operations that can be categorized as destructive in nature, such as delete or power off.

When the progress dialog box is open the usual notifications that appear at the top right corner of the screen are suppressed. This is as we can see the status of the operation in the current window. However, the notification list will still be populated so users can understand the reason for any individual failures. We’ve also added a filter button to the Operation Status column which is useful when there are many resources, and you want to filter resources in a particular state (In Progress, Success, Failed).

Another cool addition to the user experience – we’ve added links to the notifications of failed operations or tasks. So, now you can just click on the highlighted name, and it’ll take you to the corresponding resource’s info/details view.

These enhancements to the UI will be available in the 2021 Q3/4 release of Apache CloudStack. Hope you’ll like it!

Thursday, May 27 was the first-ever vCSEUG, and the first time the community had met since February 2020 (the last pre-COVID meetup in Berlin). It was time to reconnect! We have missed the chance to interact with other community members, learn what’s new in CloudStack and hear from the companies using it. So, we needed to find a solution to overcome all barriers. Organizing a virtual event was not only a great chance for the EU User Group members to meet but also to invite CloudStack community members and contributors from all around the globe to join in.

About the Attendees and Speakers

The vCSEUG proved to be a huge success. People joined from 23 countries and 4 continents (by my count) – from Germany, UK, Switzerland, India, Bulgaria, Greece, Poland, Serbia, Brazil, Chile, Russia, USA, Canada, Japan, France, Uruguay, Korea … (apologies if I have missed anyone)! We also had a record number of registrations and attendees for a CloudStack User Group Event. Physical distance was not an issue for our speakers, who joined the event from 6 different countries.

As usual, this was a half-day event, but we chose to have shorter talks so that we could accommodate a wider range of topics, proving that when you want to learn new things and meet the CloudStack community, nothing can stop you. During the day, we enjoyed 6 sessions from industry leaders, with conversation continuing in the ‘vPub’ after the last talk.

Without the need for sponsorship for a venue and refreshments, instead, we surprised some of the attendees with great prizes provided by the event sponsors – ShapeBlue, LINBIT, Dimsi, StorPool and PC Extreme – all of whom helped us to make the virtual event happen.

 

vCSEUG Proved to be a Huge Success

You may ask: “What is the secret behind making a virtual event happen?”. We think it is the great community we have and its dedication to doing something valuable and exciting after a long time of being not able to meet. This was evident by the number of attendees and the quality of talks, questions and ongoing collaboration.

If you missed the event or some of the talks, we are happy to share recordings as usual and will make all slides available on SlideShare. Continue reading and discover more about the event sessions and our awesome speakers.

 

vCSEUG Talks and Presentations

 

What’s New in CloudStack 4.15 – Giles Sirett

vCSEUG started with a talk from Giles Sirett, Chairman of the CSEUG and PMC member, Apache CloudStack. Giles welcomed the attendees and shared in-depth insights about the new features and functionalities in CloudStack 4.15. He also provided info on when 4.15.1 and 4.16 are expected, presented the new VP of Apache CloudStack (Gabriel Brascher), talked through the latest integrations to CloudStack, improvements in the UI, new OS support, advanced capabilities of vSphere, OVF support, dynamic roles enhancements and more. Giles talk in full here:

 

Customising the CloudStack UI – Abhishek Kumar

The next talk came from Abhishek Kumar, Software Developer at ShapeBlue, who focused on customizing the new UI. It aimed to teach administrators how to tailor the UI aesthetics to their organization’s preferences. The session targeted advanced users, training them to alter the layout of the UI, add, remove, or restrict resource actions from the UI. You can read more on how to customize the CloudStack UI on our blog and watch a full recording of Abhishesk’s session:

 

From Мetal to Service: 100% automation with Apache CloudStack and Ansible – Rafael del Valle

The next talk was presented by Rafael del Valle, Co-Founder of Celpax. His session was “From metal to service: 100% automation with Apache CloudStack and Ansible”. Celpax.com has recently deployed Apache CloudStack on Hetzner+Premises with full metal to service automation. In this talk, Rafael presented their success story. Furthermore, he shared why they chose open-source technologies and what advantages they got.

 

 

KVM High Availability Regardless of Storage – Gabriel Brascher

One of the most highly-anticipated talks was by the new Apache Cloudstack VP – Gabriel Beims Bräscher – talking about KVM High Availability Regardless of Storage. One of the great advantages of CloudStack is that it is vendor-independent, meaning you can decide the technology stack above and below based on your experience or needs. Having High Availability enabled for KVM hosts can improve greatly the QoS by handling (fence/recover) a problematic Host as well as re-starting its stopped VMs on healthy hosts. However, there is a limitation on CloudStack HA for KVM – it relies mainly on NFS heartbeat script checks. Gabriel’s talk illustrated how CloudStack HA works for KVM hosts and presented a way of improving its implementation in a way that KVM HA works with any storage system pluggable on KVM, not just NFS.

 

CloudStack and Tungsten Fabric SDN Integration – Simon Weller, Radu Todirica

After a short break, we continued with CloudStack and Tungsten Fabric SDN Integration Update, presented by Radu Todirica and Simon Weller from Education Networks of America (ENA). Over the past year, ENA and EWERK have been collaborating on a new ACS plugin for the Tungsten Fabric SDN controller. Simon Weller and Radu Todirica provided an update on progress, feature overview and a live demo.

 

CloudStack Deployments for Edge Use Cases – Rudraksh Kulshreshtha

The honour of the last talk of the day was given to Rudraksh Kulshreshtha from IndiQus, who presented how to architect lean CloudStack deployments for Edge use cases.

 

After the last talk, and the usual round of questions and answers, we headed over to the vPub where conversation, collaboration and debate continued. We would like to thank all attendees, sponsors and people engaged in the event who made it happen.

See you soon on another virtual or maybe live event …

 

The latest version of XCP-ng – the opensource hypervisor based on XenServer – XCP-ng 8.2 was released in November 2020, and is the first long term support (LTS) version. As such it will receive support and updates for the next 5 years compared to only a year for a standard XCP release. The hypervisor is getting more and more popular in the open-source world thanks to its modern and user-friendly UI, scalability, live migration capabilities and security level.

XCP-ng 8.2 comes with a wide range of new capabilities including UEFI support, openflow controller access, native support for Gluster, ZFS, XFS, CephFS, new CPUs support and security improvements.

CloudStack has included support for XCP-ng since its early days. CloudStack 4.15.0 added support for  XCP-ng 8.0 and 8.1, and the upcoming 4.15.1 will add support for XCP-ng 8.2 LTS. 4.15.1 will also add guest OS mappings for new and missing operating systems for XCP-ng 8.1 and 8.2. Support for the following operating systems has been added:

  • SUSE Linux Enterprise Desktop 12 SP3 (64-bit)
  • SUSE Linux Enterprise Desktop 12 SP4 (64-bit)
  • SUSE Linux Enterprise Server 12 SP4 (64-bit)
  • Scientific Linux 7
  • NeoKylin Linux Server 7
  • CentOS 8
  • Debian Buster 10
  • SUSE Linux Enterprise 15 (64-bit)

XCP-ng Support in CloudStack

Additionally, CloudStack 4.15.1 will also come with fixes for dynamic scalable template behaviour for XCP-ng and XenServer hypervisor.

 

Introduction

In the previous two parts of this article series, we have covered the complete Ceph installation process and implemented Ceph as an additional Primary Storage in CloudStack. In this final part, I will show you some examples of working with RBD images, and will cover some Ceph specifics, both in general and related to the CloudStack.

RBD image manipulations

In case you need to do some low-level client support, you can even try to mount that image as the local disk on any KVM (or Ceph) node. For this purpose, we use a tool, conveniently named, “rbd”, which is used to operate RBD images in general (i.e. to create new images, snapshots, clones, delete images, etc.).

From any KVM node, let’s attempt to use rbd kernel client to mount an image:

[root@kvm1 ~]# rbd map cloudstack/ad9a6725-4a65-4c8b-b60a-843ed88618be
rbd: sysfs write failed
RBD image feature set mismatch. Try disabling features unsupported by the kernel with "rbd feature disable".
In some cases useful info is found in syslog - try "dmesg | tail".
rbd: map failed: (6) No such device or address

As you can, see the above map command will fail. This happens as we are running on default kernel 3.x on the latest CentOS 7.6 and some of the new Ceph image features are not supported by the kernel client. Actually, even upgrading kernel to 5.0 will not bring Ceph kernel client to a usable state for our Mimic (or even Luminous) cluster – so one could wonder why kernel client exists at all….

So what we should do is to use rbd-nbd tool to map an image. Rbd-nbd is a client for RADOS block device (RBD) images similar to rbd kernel module, but unlike the rbd kernel module (which communicates with Ceph cluster directly), rbd-nbd uses NBD (generic block driver in kernel) to convert read/write requests to proper commands that are sent through network using librbd (user space client).

So, as stated, we are using librbd which is always on par with the cluster capabilities/features. But here the fun begins – the NBD kernel driver is not available by default with CentOS 7 (Red Hat decided to not include it with their kernel), so you have a couple of options: either you can rebuild the specific kernel version from the official kernel sources and extract the NBD kernel module or you can upgrade a kernel to the one provided by a well-known Elrepo repository, which I find simpler and easier to manage in the long run. For those of you possibly not familiar with Elrepo repository, this is a community repository, providing many different kinds of additional packages for Centos/RHEL, including fresh kernel versions – Elrepo kernel packages are built from the official sources while the kernel configuration is based upon the default RHEL configuration with added functionality enabled as appropriate.

A simple way to move to Elrepo kernel would be as following:

rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
yum install https://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
yum --enablerepo=elrepo-kernel install kernel-lt

If the above URLs become invalid, please find the new one at https://elrepo.org/tiki/tiki-index.php.

Now that we’ve got the new kernel installed, let’s check the order of kernels offered for boot:

[root@kvm1 ~]# awk -F\' /^menuentry/{print\$2} /etc/grub2.cfg
CentOS Linux (4.4.178-1.el7.elrepo.x86_64) 7 (Core)
CentOS Linux (3.10.0-957.10.1.el7.x86_64) 7 (Core)
CentOS Linux (3.10.0-693.11.1.el7.x86_64) 7 (Core)
CentOS Linux (3.10.0-327.el7.x86_64) 7 (Core)
CentOS Linux (0-rescue-76b17df4966743ce9a20fe9a7098e2b6) 7 (Core)

Above we see our new kernel (version 4.4) on position 0, so let’s set it as the default one, and reboot:

grub2-set-default 0
reboot

Finally, with the NBD driver in place (as part of new kernel), let’s install rbd-nbd and mount our image:

[root@kvm1 ~]#  yum install rbd-nbd -y 
[root@kvm1 ~]#  rbd-nbd map cloudstack/ad9a6725-4a65-4c8b-b60a-843ed88618be
/dev/nbd0

The above map command completed successfully (make sure that the image/volume is not mounted elsewhere to avoid file system corruption) and we can now operate /dev/nbd0 as any other locally attached drive, i.e.:

[root@kvm1 Ceph]# fdisk -l /dev/nbd0
 
Disk /dev/nbd0: 5368 MB, 5368709120 bytes, 10485760 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 4194304 bytes / 4194304 bytes
Disk label type: dos
Disk identifier: 0x264c895d
 
     Device Boot      Start         End      Blocks   Id  System
/dev/nbd0p1            8192    10485759     5238784   83  Linux

You can also show any other mapped images (if any):

[root@kvm1 Ceph]# rbd-nbd list-mapped 
id    pool       image                                snap device
11371 cloudstack ad9a6725-4a65-4c8b-b60a-843ed88618be -    /dev/nbd0

When done with this, unmap the RBD image with:

[root@kvm1 Ceph]# rbd-nbd unmap /dev/nbd0 

Alternatively, just to make the above exercise more complete, we can also attach the RBD image to our host with qemu-nbd (as you can guess this also requires NBD kernel module, a.k.a. newer kernel):

qemu-nbd --connect=/dev/nbd0 rbd:cloudstack/ad9a6725-4a65-4c8b-b60a-843ed88618be
qemu-nbd --disconnect /dev/nbd0

Again, the above tool talks to librbd, which relies on ceph.conf and admin key being present in /etc/ceph/ folder.

RBD image manipulations, for real this time

Away from the NBD magic and back to “rbd” tool – let’s briefly show some usage examples to manipulate RBD images.

At this point, I would expect you to be able to create a Compute Offering of your own, targeting Ceph as Primary Storage pool (similarly to how we created a Disk offering with “RBD” tag) and create a VM from a template. Assuming you have done so, let’s examine this VM’s ROOT image on Ceph.

Let’s list all volumes in our Ceph cluster:

[root@ceph1 ~]# rbd -p cloudstack ls
d9a1586d-a30b-4c52-99cc-c5ee6433fe18
fb3ee723-5e4e-48b3-ad7d-936162656cb4

In my example above (an empty cluster), I have only 2 images present, so let’s examine them:

[root@ceph1 ~]# rbd info cloudstack/d9a1586d-a30b-4c52-99cc-c5ee6433fe18
rbd image 'd9a1586d-a30b-4c52-99cc-c5ee6433fe18':
        size 50 MiB in 13 objects
        order 22 (4 MiB objects)
        id: 1b1d26b8b4567
        block_name_prefix: rbd_data.1b1d26b8b4567
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
        op_features:
        flags:
        create_timestamp: Mon Apr  8 21:40:21 2019

[root@ceph1 ~]# rbd info cloudstack/fb3ee723-5e4e-48b3-ad7d-936162656cb4
rbd image 'fb3ee723-5e4e-48b3-ad7d-936162656cb4':
        size 50 MiB in 13 objects
        order 22 (4 MiB objects)
        id: 1b250327b23c6
        block_name_prefix: rbd_data.1b250327b23c6
        format: 2
        features: layering
        op_features:
        flags:
        create_timestamp: Mon Apr  8 21:50:10 2019
        parent: cloudstack/d9a1586d-a30b-4c52-99cc-c5ee6433fe18@cloudstack-base-snap

What we see above is the first image of 50 MB in size (here I’m using a very small template from our community friends at http://www.openvm.eu/). For the second image, we see that it has it’s “parent” set to the first image. What is happening here is that CloudStack will copy over the template from Secondary Storage (creating image d9a1586d-a30b-4c52-99cc-c5ee6433fe18), it will then create a snapshots from this image (“cloudstack-base-snap”, as shown above) and protect it (required from Ceph side), and then create an image clone ( image fb3ee723-5e4e-48b3-ad7d-936162656cb4) with a parent image being the previously created/protected snapshot. This is effectively a linked clone setup, in Ceph’s world.

Let’s quickly emulate above behavior manually – we will just create an empty file instead of copying a real template from Secondary Storage pool:

rbd create -p cloudstack mytemplate --size 100GB
rbd snap create cloudstack/mytemplate@cloudstack-base-snap
rbd snap protect cloudstack/mytemplate@cloudstack-base-snap
rbd clone cloudstack/mytemplate@cloudstack-base-snap cloudstack/myVMvolume

Finally, let’s check our “myVMvolume” image:

[root@ceph1 ~]# rbd info cloudstack/myVMvolume
rbd image 'myVMvolume':
        size 100 GiB in 25600 objects
        order 22 (4 MiB objects)
        id: fcba6b8b4567
        block_name_prefix: rbd_data.fcba6b8b4567
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
        op_features:
        flags:
        create_timestamp: Mon Apr  8 22:08:00 2019
        parent: cloudstack/mytemplate@cloudstack-base-snap
        overlap: 100 GiB

The reason we have above protected a snapshots of the main template, is that it makes it impossible to delete this template file and thus cause major damage (effectively destroy all VMs from this template). So in order to revert the exercise from above, we would need to first remove a clone (VM image), unprotect and delete the snapshot, and finally delete the “template” image:

[root@ceph1 ~]# rbd rm cloudstack/myVMvolume
Removing image: 100% complete...done.
[root@ceph1 ~]# rbd snap unprotect cloudstack/mytemplate@cloudstack-base-snap
[root@ceph1 ~]# rbd snap rm cloudstack/mytemplate@cloudstack-base-snap
Removing snap: 100% complete...done.
[root@ceph1 ~]# rbd rm cloudstack/mytemplate
Removing image: 100% complete...done.

As you can see from above, the “rbd” tool is the tool to use to manipulate RBD images – for many other examples, please see http://docs.ceph.com/docs/master/rbd/rados-rbd-cmds/

In the next sections, I will try to cover a bit of what to expect, dos and don’ts of using Ceph cluster – again, this is not meant to be a comprehensive guide, since CloudStack storage feature sets related to Ceph and also Ceph itself is constantly changing and improving.

Cloudstack with Ceph

CloudStack has supported Ceph for many years now and though in the early days, many of the CloudStack’s storage features were not on pair with NFS, with the wider adoption of Ceph as the Primary Storage solution for CloudStack, most of the missing features have been implemented. That being said, there are still some minor specifics with Ceph:

  • In contrast to NFS, which provides a file system, Ceph provides raw block devices for VMs and thus it’s not possible to make a full VM snapshot the way it can be done with NFS – i.e. there is no filesystem to which to write the content of the RAM memory or other metadata needed.
  • Continuing on above, since no file system, there is (currently) no way to write a heartbeat file the way it’s implemented on NFS (but some work is currently being planned around this)
  • Speaking of a simple volume snapshot, it’s currently not possible to really restore a volume from a snapshot (which became possible in 4.11 with NFS or SolidFire Managed Storage). But this is rather a simple thing to implement.
  • By using Ceph with KVM and CloudStack, there are two external libraries used – librbd (a user-space client, used by Qemu to speak to Ceph cluster) and rados-java (a Java wrapper for librados). Historically, there have been some issues/bugs with rados-java (though they have been resolved long time ago), so you should probably keep an eye on these two and make sure they are up to date.

Learning curve

Ceph can be considered a rather complex storage system to comprehend and it definitively has a steep learning curve. Even when using Ceph on its own (i.e. not with CloudStack), make sure you know the storage system well before relying on it in production and also make sure to be able to troubleshoot problematic situations when they arise. Anyone prepared to put in a bit of effort and planning can deploy a nice Ceph cluster, but it takes skills and a deeper understanding of how things work under the hood when it comes to troubleshooting unusual situations or how (for example) replacing failed disks (or nodes) can influence performance of the clients (in our case, CloudStack VMs). Based on my previous experience managing a production Ceph cluster, I would definitively strongly suggest taking the above recommendations very seriously. On the other hand, experimenting with such an advanced storage solution can be really interesting and rewarding.

Performance considerations

Ceph is a great distributed storage solution that performs very well under sequential IO workload, can scale indefinitely and has a very interesting architecture. However, as with any distributed storage solution, when you write data to a cluster, it actually takes time to write that data to first node, have that write operation replicated to other 2 nodes (replica size 3) and send back ACK to the client that the write is successful. In case you are using NVME devices, like some users in community, you can expect very low latencies in the ballpark of around 0.5ms, but in case you opt out for a HDD based solution (with journals on SSDs) as many users do when starting their Ceph journey, you can expect 10-30ms of latency, depending on many factors (cluster size, networking latency, SSD/journal latency, and so on). In other words, don’t expect miracles with commodity hardware – if something “works on commodity hardware”, that doesn’t mean it also performs well. Actually, until 2-3 years ago, Ceph was (unofficially) considered unsuitable for any more serious random IO workload, which is backed up by referenced guides published between major HW vendors and RedHat, where they clearly stated that Ceph is not the “best choice” when it comes to random IO. In contrast to sequential IO workload benchmarks (where Ceph actually shines), these reference guides were completely missing any benchmark data with random IO workload/pattern. That being said, with the introduction of BlueStore as the new storage backend in recent Ceph releases and with some more architectural changes, Ceph has become much more suitable for pure SSD (and NVME) clusters and performance has improved drastically.

I hope this Ceph article series has been useful and interesting and helped you get up and running. All in all Ceph is an interesting development in the storage space and can provide a true cloud storage solution if implemented correctly.

About the author

Andrija Panic is a Cloud Architect at ShapeBlue, the Cloud Specialists, and is a committer of Apache CloudStack. Andrija spends most of his time designing and implementing IaaS solutions based on Apache CloudStack.

Blog by Ivet Petrova, StorPool.

On June 13th, StorPool had the honour and privilege to host and organize the European Cloud Infrastructure and CloudStack User Group together with its partner ShapeBlue. The event was a get together of the local IT infrastructure experts and CloudStack users. Main focus were talks presenting best practices and useful information on how to build an efficient public or private infrastructure. In addition, worlds leading experts and contributors to the open-source Apache CloudStack Project presented its latest functionalities and updates in the project.

What is CloudStack? Key features and use cases

CloudStack is a scalable cloud orchestration platform for delivering turnkey infrastructure as a service clouds. As it is relatively easy to deploy and manage, it attracts the attention of people considering which cloud management system to use. Firstly, its architecture is highly scalable and reliable. The most massive known production cloud with CloudStack installation was reaching approx. 35 000 physical hosts and was running smoothly. Secondly, CloudStack is hypervisor agnostic. It supports KVM, Xen, VMware, HyperV, OVM, etc. Moreover, it also presents a REST API and is used for cloud infrastructure as a service, containers as a service, and many more use cases in which enterprises need a reliable solution to manage complex infrastructure and virtualizations.

CloudStack supports different storage options, and StorPool has its CloudStack integration. More about the story of building StorPool’s CloudStack integration, you can read here. 

CloudStack Market Growth

The European Cloud Infrastructure and CloudStack User Day started with a keynote session of Giles Sirett, CEO of ShapeBlue and widely recognized contributor to the Apache project. Giles talked us through the history of CloudStack, its main advantages, and the value it can bring to companies. After that, he made an overview of interesting use case and shared information for its releases and user communities. According to him, the most significant value of CloudStack is that it is a user-driven project and community, which makes it vibrant and rapidly developed. In conclusion, Giles also shared that CloudStack adoption is quickly growing and now it is used by some of the biggest companies globally.

Achieving the ultimate performance with KVM

Next to the stage was Boyan Krosnov, CPO of StorPool. In his session, he discussed a private cloud setup with KVM achieving 1M IOPS per hyper-converged (storage+compute) node. Besides, Boyan answered the question: What is the optimum architecture and configuration for performance and efficiency? His session was a deep technical dive into the ways for building an efficient and high-performance cloud infrastructure. Furthermore, Boyan explained why performance matters and how many companies even do not understand they are struggling with performance issues … until the moment their customers notify them for this.

During the presentation, the CPO of StorPool covered essential aspects of building cloud infrastructure, part of which were:

  • why the same hardware can bring you 10 times better performance  than before
  • how hardware, compute and networking affect the performance
  • tips and trick for getting ultimate KVM performance
  • …and many more

Boyan advised all participants in the event to pay attention on their cloud performance, apply possible optimizations for accelerating it and closely monitor the cloud performance.

CloudStack: A Service Managers Perspective

After a short break, we welcomed Maria Barta from Itelligence Global Managed Services GmbH. Maria presented a different perspective on CloudStack – “A Service Managers Perspective”. Agile business processes are becoming increasingly important in successful IT services. Itelligence GmbH, provides many different ultra-flexible and highly adaptable cloud solutions. To ensure customer / user satisfaction (i.e. availability, data security and product transparency) and simultaneously facilitate effective agile product development within their team, the role of the service manager is steadily evolving. As a conclusion, the talk provided an insight to the benefits and limitations of CloudStack in relation to the service manager objectives and Maria’s attempt to overcome these in her specific internal IaaS solution.

What’s new in CloudStack 4.13

Paul Angus, VP Technology of ShapeBlue and current VP of CloudStack. He was one of the most awaited speakers at the event. Mainly because he is the most experienced person in the community, who has exceptional knowledge in CloudStack. His session was focused on the new release of CloudStack. The 4.13 version is due for release this summer. With 100s of updates and new features, Paul went through user features. He also talked about operator features and integrations, demonstrating just how much work and development is going into CloudStack.

Paul also shared that version 4.14 most probably will arrive at the end of 2019 / beginning of 2020. He enjoyed great attention from the European CloudStack community and managed to give valuable pieces of advice to the admins dealing with complex cloud issues.

Challenges with high-density networks 

Last, but not least, Marian Marinov from SiteGround web hosting company shared his experience in the problems when managing high-density networks. In cloud environments, people consider the network as a given and almost limitless resource. You get an interface and you are told its bandwidth capacity. From the perspective of the client, this is true. But from the perspective of the provider, this is far from the truth. In his talk, Marian took a look at some DataCenter network designs and what technologies / protocols were used to battle the problem with high-density clouds. All participants in the event had the chance to learn about VXLAN and “L3” switching.

After the final official talk, we managed to organize a great networking event between the speakers and the event attendees. One more opportunity to learn new things for cloud infrastructure and about building a cloud with StorPool and CloudStack.

For StorPool’s team, it was a pleasure to be host and co-organizer of the event and to put the beginning of a new CloudStack community in Bulgaria.

Our presenters’ slides can be found here:

Giles Sirett – CloudStack EU User Group 13 june 2019 – Sofia

Boyan Krosnov – Achieving the ultimate performance with KVM

Maria Barta – CS Day Sofia_ CS – A service manager perspective_20190613

Paul Angus – CSEUG19-What’s coming in CloudStack

Marian Marinov – Challenges with high-density networks

 

In the previous article we covered some basics around Ceph and deployed a working Ceph cluster. In this article, we are going to finish the Ceph configuration needed for CloudStack and add it as a new Primary Storage pool. We are also going to deploy Ceph volumes via CloudStack and examine them. Finally, in part 3 (to be published soon), I will show you some examples of working with RBD images and will cover some Ceph specifics, both in general and related to the CloudStack.

Before proceeding with the actual work, let me first mention that CloudStack supports Ceph with KVM only, so most of the work we do below is KVM related. Let’s define the high-level steps to be done:

  • Create a dedicated RBD pool for CloudStack in which all RBD images (volumes) will be created
  • Create a dedicated authentication key for the previously created pool
  • Update / install required Ceph binaries on KVM nodes
  • Add Ceph as Primary Storage in CloudStack
  • Implement custom storage tag for Ceph Primary Storage
  • Create new Compute / Disk offerings with same storage tag in order to target Ceph

From any Ceph node…

Ceph groups RBD (RADOS block device) images in pools and manages authentication on a per pool level. Each image is collection of many RADOS objects, with each object having a default size of 4MB (configurable per image). At this moment we have no pools created. But before creating a pool, let’s go through some basics around the different kind of pools in Ceph.

There are 2 kind of pools, based on the way the objects are stored across cluster:

  • Replicated – makes sure that there are always total of N replicas/copies of an object
  • Erasure Coding – simplest way to think of this is a network RAID 5/6

Replicated pools are used for better performance at the expense of space consumption, and you can think of it as a network-based RAID 1, where we have n number of replicas of an object. On the other hand, erasure coding pools are usually used when using Ceph for S3 Object Storage purposes and for more space efficient storage where bigger latency and lower performance is acceptable, since it is similar to RAID 5 or RAID 6 (requires some computation power). Here, for example, we may have 4 chunks of actual data and 2 parity chunks (EC 4+2), with just 50% of space overhead, while (depending on the setup), we can still survive losing a Ceph node or even two.

So, let’s create a dedicated pool for CloudStack, set its replica size and finally initialize it:

ceph osd pool create cloudstack 64 replicated
ceph osd pool set cloudstack size 3
rbd pool init cloudstack

The commands above will create a replicated pool named “cloudstack” with total of 64 placement groups (more info on placement groups here) with a replica size of 3, which is recommended for a production cluster. Optionally, you can set replica size of 2 during testing, for somewhat increased performance and less space consumed on the cluster.

Next, let’s generate a dedicated authentication key for our CloudStack pool:

ceph auth get-or-create client.cloudstack mon 'profile rbd' osd 'profile rbd pool=cloudstack'

The command above will output a key to STDOUT only – please save the given key, since we will use it when adding Ceph to CloudStack later:

[client.cloudstack]
key = AQAFSZpc0t+BIBAAO95rOl+jgRwuOopojEtr/g==

Now that the pool for CloudStack is ready, we need to prepare KVM nodes with proper Ceph binaries as well as the write-back caching configuration.

From the Ceph admin node…

Starting from Centos 7.2  (and Ubuntu 14.04), libvirt / QEMU comes by default with support for RBD, so there’s no need to compile the binaries yourself. That being said, if we check KVM nodes with “rpm -qa | grep librbd1″ it will return an existing versions of ‘librbd1” package (version 10.2.5 in my case)  already installed, but most certainly it will not be the current version that corresponds to the cluster version we just installed (13.2.5 in this case). For the record, librbd is a user space Ceph client, to which the qemu / libvirt talks effectively.

Furthermore, if we run command “ceph features” from any Ceph node, it will return (in our fresh Mimic cluster) “luminous” as the minimum compatible release version for the client – that means that our Ceph client (librbd) needs to be of a minimum of “luminous” version (which translates to 12.2.0), but our current librbd version is 10.2.5 – so let’s upgrade it to same Mimic versions as the version of our cluster:

ceph-deploy install --cli kvm1  kvm2

The command above will add Mimic repo to my two KVM nodes and install only the cli binaries (“ceph-common” package). This will also trigger the upgrade of existing “librbd1” package to the correct version. In addition, please make sure that name resolution of the KVM nodes works from the Ceph admin node.

Optionally, if you don’t want to install Ceph cli tools on KVM nodes, you can just upgrade the “librbd1” package while having previously created a proper Ceph Mimic repository on each KVM node (i.e. clone repo file from any Ceph cluster node).

Some of you might want to be able to manage Ceph cluster from KVM nodes as well (beside being able to manage it from Ceph nodes) and to be able to interact with RBD images with via “rbd” or “qemu-img“ tools – in this case we need this “rbd” tool installed on KVM nodes (part of “ceph-common” package, already installed in previous step), then we need ceph.conf locally on KVM nodes in order for the “rbd” tool to know how to connect to cluster, which MONs to target, etc. and finally we need the admin authentication key – this is the file “ceph.client.admin.keyring” which was created on our Ceph admin node when we created our cluster initially (in folder /root/CEPH-CLUSTER, as mentioned in Part 1 of this article series).

Additionally, if we want to use qemu-img tool to examine RBD images, we can either have qemu-img installed on the Ceph cluster nodes or we have to provide the above mentioned ceph.conf and admin keys in their default location (/etc/ceph/) on the KVM nodes, where librbd (client) will pick them up automatically, so we don’t need to specify MON IP/URL and admin key on the command line.

If you don’t want to be able to manage your Ceph cluster from KVM nodes, simply don’t copy over the “ceph.client.admin.keyring” file to KVM nodes. The ceph.conf file is still a must due to RBD caching as explained later. I have decided to make my KVM nodes happy by providing them with ceph.conf and admin keys, as below:

ceph-deploy admin kvm1 kvm2

The command above will effectively just copy ceph.conf and ceph.client.admin.keyring files to /etc/ceph/ folder on KVM nodes. Actually, you can still operate RBD images and manage your cluster from KVM nodes even if you don’t have ceph.conf and admin key present locally – you can always pass required parameters on the command line to “rbd” or “qemu-img” tools, as shown later.

RBD caching

After we have pushed the ceph.conf file to KVM nodes, librbd will read it for any configuration directives under the “[client]” section of that file (beside the other sections), but that section is missing at this moment!

Before we proceed into configuring the RBD caching, let me do here a copy/paste from the original docs that is important to understand regarding RBD caching:

” The user space implementation of the Ceph block device (i.e., librbd) cannot take advantage of the Linux page cache, so it includes its own in-memory caching, called “RBD caching.” RBD caching behaves just like well-behaved hard disk caching. When the OS sends a barrier or a flush request, all dirty data is written to the OSDs. This means that using write-back caching is just as safe as using a well-behaved physical hard disk with a VM that properly sends flushes (i.e. Linux kernel >= 2.6.32). The cache uses a Least Recently Used (LRU) algorithm, and in write-back mode it can coalesce contiguous requests for better throughput. “

After digesting the above info, we can proceed into a brief configuration of caching. We can either fix it manually on each KVM node by adding the missing section in ceph.conf file, or we can do it in a more proper way by changing ceph.conf on the Ceph admin node and then pushing new file version to all KVM (and optionally Ceph cluster) nodes:

cat << EOM >> /root/CEPH-CLUSTER/ceph.conf
[client]
  rbd cache = true
  rbd cache writethrough until flush = true
EOM
 
ceph-deploy --overwrite-conf admin kvm1 kvm2

Please note the above “writethrough until flush = true”. This is a safety mechanism which will force writethrough cache mode until it receives the very first flush request from the VM OS (which means that the OS is sending proper flush requests to the underlying storage, i.e. kernel >= 2.6.32) and then cache mode will change to the write-back, which actually brings performance benefits.

If case you want to play more with RBD caching, please see here – where you can find some important default values which we didn’t explicitly configure i.e. default rbd cache size is 32 MB (this is per volume) – so in case of 50 VMs with 4 volumes each, that translates to 50 x 4 x 32MB =  6.4GB of additional RAM consumed on a KVM host – keep that in mind !

Finally, let’s add the Ceph to CloudStack as an additional Primary Storage – we can do it via GUI or optionally via CloudMonkey (API) as following:

 

Or via CloudMonkey:

create storagepool scope=zone zoneid=3c764ee1-6590-417d-b873-f073d0c550be hypervisor=KVM name=MyCephCluster provider=Defaultprimary url=rbd://cloudstack:AQAFSZpc0t-BIBAAO95rOl+jgRwuOopojEtr_g==@10.2.2.219/cloudstack tags=RBD

Most of the parameters are self-explanatory but let’s explain a few of them:

  • RADOS Monitor: This is the IP address (or DNS name) of the Ceph Monitor (MON) instance – in my case I have defined a very first MON instance (IP address of the Ceph1 node from my cluster) – but in production environment you will want to have an internal Round Robin DNS setup on some internal DNS server (i.e. single zone on Bind) – such that KVM nodes will resolve the ULR (i.e. mon.myceph.cluster) in a round robin fashion to multiple MON instances – this is the way to achieve high availability of Ceph MONs, though some manual DNS zone changes are needed in case of prolonged MON maintenance
  • RADOS Pool: This is the pool “cloudstack” which we created in the beginning of the article
  • RADOS User and RADOS Secret: This are the values from the authentication key which we generated in the beginning of the article, shown below again for your convenience

[client.cloudstack]
key = AQAFSZpc0t+BIBAAO95rOl+jgRwuOopojEtr/g==

The above command, used to add Ceph to CloudStack, will effectively do a few things:

  • On each KVM node, it will create a new storage pool in libvirt
  • The storage pool definition files (xml and the secret) will be written to /etc/libvirt/secrets/ folder as shown below
  • Every time CloudStack Agent is restarted, it will recreate the Ceph storage pool (even if you manually remove the files below)

[root@kvm1]# cat /etc/libvirt/secrets/ef9cfd17-abe1-343d-97a0-cee6c71a6dad.xml
<secret ephemeral='no' private='no'>
  <uuid>ef9cfd17-abe1-343d-97a0-cee6c71a6dad</uuid>
  <usage type='ceph'>
    <name>cloudstack@ceph1.local:6789/cloudstack</name>
  </usage>
</secret>

[root@kvm1]# cat /etc/libvirt/secrets/ef9cfd17-abe1-343d-97a0-cee6c71a6dad.base64
AQAFSZpc0t+BIBAAO95rOl+jgRwuOopojEtr/g==

If we check the libvirt pool created above, we can see that it’s not persistent and it doesn’t start automatically – i.e. when you restart libvirt alone, it will not recreate / start the Ceph storage pool in libvirt– the CloudStack agent is the one doing this for us:

virsh # pool-info ef9cfd17-abe1-343d-97a0-cee6c71a6dad
Name:           ef9cfd17-abe1-343d-97a0-cee6c71a6dad
UUID:           ef9cfd17-abe1-343d-97a0-cee6c71a6dad
State:          running
Persistent:     no
Autostart:      no
Capacity:       299.99 GiB
Allocation:     68.19 MiB
Available:      286.02 GiB

Note that in the example above, I was actually using DNS name for the Ceph MON (ceph1.local) instead of the IP – Ceph MON’s DNS name is resolved to IP both when you add Ceph to CloudStack and every time you start a VM or attach new volume, etc. – so DNS resolution needs to be fast and stable here.

Now that we added Ceph to CloudStack, let’s create a Data disk offering with tag “RBD” – this will make sure that any new volume from this offering is created on storage pool with tag “RBD” – which is Ceph in our case . Here, we are using storage tags to avoid messing up with your existing CloudStack installation – but it’s not required otherwise:

(localcloud) SBCM5> > create diskoffering name=5GB-Ceph displaytext=5GB-Ceph storagetype=shared provisioningtype=thin customized=false disksize=5 tags=RBD
{
  "diskoffering": {
    "created": "2019-03-26T19:27:32+0000",
    "disksize": 5,
    "displayoffering": true,
    "displaytext": "5GB-Ceph",
    "id": "2c74becc-c39d-4aa8-beec-195b351bdaf0",
    "iscustomized": false,
    "name": "5GB-Ceph",
    "provisioningtype": "thin",
    "storagetype": "shared",
    "tags": "RBD"
  }
}

Note the offering ID from above (2c74becc-c39d-4aa8-beec-195b351bdaf0) – and let’s create a disk from it:

(localcloud) SBCM5> > create volume diskofferingid=2c74becc-c39d-4aa8-beec-195b351bdaf0 name=MyFirstCephDisk zoneid=3c764ee1-6590-417d-b873-f073d0c550be
{
  "volume": {
    "account": "admin",
    "created": "2019-03-26T19:52:05+0000",
    "destroyed": false,
    "diskofferingdisplaytext": "5GB-Ceph",
    "diskofferingid": "2c74becc-c39d-4aa8-beec-195b351bdaf0",
    "diskofferingname": "5GB-Ceph",
    "displayvolume": true,
    "domain": "ROOT",
    "domainid": "401ce404-44c1-11e9-96c5-1e009001076e",
    "hypervisor": "None",
    "id": "47b1cfe5-6bab-4506-87b6-d85b77d9b69c",
    "isextractable": true,
    "jobid": "49a682ab-42f9-4974-8e42-452a13c97553",
    "jobstatus": 0,
    "name": "MyFirstCephDisk",
    "provisioningtype": "thin",
    "quiescevm": false,
    "size": 5368709120,
    "state": "Allocated",
    "storagetype": "shared",
    "tags": [],
    "type": "DATADISK",
    "zoneid": "3c764ee1-6590-417d-b873-f073d0c550be",
    "zonename": "ref-trl-1019-k-M7-apanic"
  }
}

Finally, since volume creation is a lazy provisioning process (i.e. volume is created in DB only, not really on storage pool), let’s attach the disk to a running VM (using volume ID “47b1cfe5-6bab-4506-87b6-d85b77d9b69c” from previous command output), which will trigger the actual disk creation on our Ceph cluster (output shortened for brevity):

(localcloud) SBCM5> > attach volume id=47b1cfe5-6bab-4506-87b6-d85b77d9b69c virtualmachineid=19a67e20-c747-43bb-b149-c2b2294002f9
{
  "volume": {
    …
    "jobstatus": 0,
    "name": "MyFirstCephDisk",
    "path": "47b1cfe5-6bab-4506-87b6-d85b77d9b69c",
    …  }
}

Note the “path” output field (which is usually the same as the ID of the volume, except in some special cases) – and let’s check our Ceph cluster if we can find this volume and check it’s properties.

From any KVM node…

[root@kvm1 ~]# rbd ls -p cloudstack
47b1cfe5-6bab-4506-87b6-d85b77d9b69c
 
[root@kvm1 ~]# rbd info cloudstack/47b1cfe5-6bab-4506-87b6-d85b77d9b69c
rbd image '47b1cfe5-6bab-4506-87b6-d85b77d9b69c':
        size 5 GiB in 1280 objects
        order 22 (4 MiB objects)
        id: d43b4c04a8af
        block_name_prefix: rbd_data.d43b4c04a8af
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
        op_features:
        flags:
        create_timestamp: Tue Mar 28 19:46:32 2019

We can also examine the Ceph RBD image with qemu-img tool:

[root@kvm1 ~]# qemu-img info rbd:cloudstack/47b1cfe5-6bab-4506-87b6-d85b77d9b69c
image: rbd:cloudstack/47b1cfe5-6bab-4506-87b6-d85b77d9b69c
file format: raw
virtual size: 5.0G (5368709120 bytes)
disk size: unavailable

As you can see in qemu-img command above, we did not specify any username and authentication keys, because we have our ceph.conf and the admin key files present in /etc/ceph/ folder. If you decided to opt-out of having these 2 files present on KMV nodes, you will have to use a cumbersome command as below:

qemu-img info rbd:cloudstack/47b1cfe5-6bab-4506-87b6-d85b77d9b69c:mon_host=10.2.2.219:auth_supported=Cephx:id=cloudstack:key=AQAFSZpc0t+BIBAAO95rOl+jgRwuOopojEtr/g==

In the above command we are specifying the MON IP address, username and key for authentication.

Now that you got the basics of consuming Ceph from CloudStack, feel free to also create Compute Offerings and System Offerings for Virtual Routers, Secondary Storage VM, Console Proxy VM and experiment with volume migration from i.e. NFS to Ceph. Be sure to have storage tags under control.

I hope that this article series has been interesting so far. In part 3 (which will be the final part), I will show you some examples of working with RBD images and will cover some Ceph specifics, both in general and related to CloudStack.

About the author

Andrija Panic is a Cloud Architect at ShapeBlue, the Cloud Specialists, and is a committer of Apache CloudStack. Andrija spends most of his time designing and implementing IaaS solutions based on Apache CloudStack.

Hello all, this is Abhishek Kumar, currently the newest member of the ShapeBlue family. It’s been over a month since I started working as a Software Engineer on Apache CloudStack at ShapeBlue, and I’m here to tell you about how it’s gone.

2019 has been an exciting year for me as I moved from the application development domain to infrastructure development. I always knew it will be a challenging task but also a rewarding one.

The beginning

It was last year that I moved to Gurugram, India, to work for a major med-tech company dealing with navigated intra-operative products. Prior to that, I’d been freelancing as a desktop and mobile application developer. Moving to Gurugram meant getting back in touch with some of the friends and batchmates from college who were already living and working in the city. Late last year one of these friends suggested to me the idea of applying to ShapeBlue, the company he had been working at for a number of years. I had previously heard about ShapeBlue and Apache Cloudstack from him, and I was interested in how the company works with a distributed team and how they contribute to open-source while delivering for their customers – they are community leaders in a sense. Initially I was quite unsure as I had never worked on something like this but after some deliberation, I decided to go through the compact yet effective hiring process of ShapeBlue. It involved two interviews, a coding challenge and a knowledge test on a subject that was chosen because they KNEW it was new to me (they were testing my ability to pick up new concepts very quickly). The whole process only took a week or so before I was hired as a Software Engineer at ShapeBlue!

The learning

This was my first experience of being a developer with infrastructure software and working as part of a large, open-source project. To be honest, to start with, everything was a bit overwhelming as it was mostly new to me – and the people I’m working with are probably the champions of the field. Within this first month, I’ve transitioned from C++ to Java, and have learned complex concepts of networking topologies, hypervisors and many other new subjects. I’ve not only been learning the fundamentals of the Apache Cloudstack project and working on customer projects but I’m also starting to contribute to the open-source community. A large part of this learning can be credited to the awesome training program that ShapeBlue provides for a new joinee. It is a very well-structured training course (called the “hackerbook”) that constitutes several chapters that explain a particular topic and then require the trainee to do some coding exercises to test the acquired knowledge. During the training period, a mentor is assigned to the trainee to clear any doubts, review progress and even have 1-2-1 sessions on complex topics. This contrasts with what I’ve experienced with previous employers and most programmers experience as well, where they are given access to a codebase and some limited documentation and left to figure things out on their own.

The challenges

As expected there have been a number of challenges. Moving from developing consumer-centric small applications to working on massive, complex infrastructure orchestration software would never have been easy. Then there are always those regular things one faces when moving to a new job: onboarding on company infra, learning new services and technologies to do daily tasks, following new practices & policies, etc. With Apache Cloudstack being an open-source project, it adds another dimension as it is not just your own organization but the larger community that you are dealing with.

Apart from the technical aspect I also find the social aspect of onboarding with a new organization a bit testing personally. Being a reserved, quiet, person, gelling with new people isn’t always easy for me. However, over the last month at ShapeBlue I can safely say that all these have been exciting challenges. While the technical aspects were taken care of with well-structured training, the social aspect took care of itself due the intrinsic flat organizational structure at ShapeBlue where everyone has equal say and has the freedom to communicate with anybody else in the company irrespective of their position.

The joy

I have liked being able to jump between my training course and real-world customer facing development. I was able to use the concepts I learned, during this period, in the customer project I’m working on. Within this short span of time, even though I don’t have the expertise that my team has, I still feel like I can make a contribution to the project we are working on. I can still participate in the development of new features for customers and contribute to open-source community to some extent.

Conclusions

My time so far at ShapeBlue has been nothing less than amazing! I could not wish for a better mix of challenges and rewards. Most days I do have to work hard to make sense of a very large codebase or some complex network concepts, but with enough effort, I can work my way through and go home satisfied. Being a software developer in the infrastructure domain can be challenging and learning to become a better and more efficient one might be even harder, but so far, I’m enjoying this job and loving this journey with my new work family: ShapeBlue!

As well as NFS and various block storage solutions for Primary Storage, CloudStack has supported Ceph with KVM for a number of years now. Thanks to some great Ceph users in the community lots of previously missing CloudStack storage features have been implemented for Ceph (and lots of bugs squashed), making it the perfect choice for CloudStack if you are looking for easy scaling of storage and decent performance.

In this and my next article, I am going to cover all steps needed to actually install a Ceph cluster from scratch, and subsequently add it to CloudStack. In this article I will cover installation and basic configuration of a standalone Ceph cluster, whilst in part 2 I will go into creating a pool for a CloudStack installation, adding Ceph to CloudStack as an additional Primary Storage and creating Compute and Disk offerings for Ceph. In part 3, I will also try to explain some of the differences between Ceph and NFS, both from architectural / integration point of view, as well as when it makes sense (or doesn’t) to use it as the Primary Storage solution.

It is worth mentioning that the Ceph cluster we build in this first article can be consumed by any RBD client (not just CloudStack). Although in part 2 we move onto integrating your new Ceph cluster into CloudStack, this article is about creating a standalone Ceph cluster – so you are free to experiment with Ceph.

Firstly, I would like to share some high-level recommendations from very experienced community members, who have been using Ceph with CloudStack for a number of years:

  • Make sure that your production cluster is at least 10 nodes so as to minimize any impact on performance during data rebalancing (in case of disk or whole node failure). Having to rebalance 10% of data has a much smaller impact (and duration) than having to rebalance 33% of data; another reason is improved performance as data is distributed across more drives and thus read / write performance is better
  • Use 10GB networking or faster – a separate network for client and replication traffic is needed for optimal performance
  • Don’t rely on cache tiering, unless you have a very specific IO pattern / use case. Moving data in and out of cache tier can quickly create a bottleneck and do more harm than good
  • If running an older version of Ceph cluster (eg. FileStore based OSD), you will probably place your journals on SSDs. If so, make sure that you properly benchmark SSD for the synchronous IO write performance (Ceph writes to journal devices with O_DIRECT and D_SYNC flags). Don’t try to put too many journals on single SSD; consumer grade SSDs are unacceptable, since their synchronous write performance is usually extremely bad and they have proven to be exceptionally unreliable when used in a Ceph cluster as journal device

Before we continue, let me state that this first article is NOT meant to be a comprehensive guide on Ceph history, theory, installation or optimization, but merely a simple step-by-step guide for a basic installation, just to get us going. Still, in order to be able to better follow the article, it’s good to define some basics around Ceph architecture.

Ceph has a couple of different components and daemons, which serves different purposes, so let’s mention some of these (relevant for our setup):

  • OSD (Object Storage Daemon) – usually maps to a single drive (HDD, SDD, NVME) and it’s the one containing user data. As can be concluded from it’s name, there is a Linux process for each OSD running in a node. A node hosting only OSDs can be considered as a Storage or OSD node in Ceph’s terminology.
  • MON (Monitor daemon) – holds the cluster map(s), which provides to Ceph Clients and Ceph OSD Daemons with the knowledge of the cluster topology. To clarify this further, in the heart of Ceph is the CRUSH algorithm, which makes sure that OSDs and clients can calculate the location of specific chunk of data in the cluster (and connect to specific OSDs for read/write of data), without a need to read it’s position from somewhere (as opposite to a regular file systems which have pointers to the actual data location on a partition).

A couple of other things are worth mentioning:

  • For cluster redundancy, it’s required to have multiple Ceph MONs installed, always aiming for an odd number to avoid a chance of split-brain scenario. For smaller clusters, these could be placed on VMs or even collocated with other Ceph roles (i.e. OSD nodes), though busier clusters will need a dedicated, powerful servers/VMs. In contrast to OSDs, there can be only one MON instance per server/VM.
  • For improved performance, you might want to place MON’s database (LevelDB) on dedicated SSDs (versus the defaults of being placed on OS partition).
  • There are two ways that OSDs can manage the data they store. Starting with the Luminous 12.2.z release, the new default (and recommended) backend is BlueStore. Prior to Luminous, the default (and only option) was FileStore. With FileStore, data is first written to a Journal (which can be collocated with the OSD on same device or it can be a completely separate partition on a faster, dedicated device) and then later committed to OSD. With BlueStore, there is no true Journal per se, but a RocksDB key/value database (for managing OSD’s internal metadata). FileStore OSD will use XFS on top of it’s partition, while BlueStore write data directly to raw device, without a need for a file system. With it’s new architecture, BlueStore brings big speed improvement over FileStore.
  • When building and operating a cluster, you will probably want to have a dedicated server/VM used as the deployment or admin node. This node will host your deployment tools (be it a basic ceph-deploy tool or a full blown ansible playbook), as well as cluster definition and configuration files, which can be changed on central place (this node) and then pushed to cluster nodes as required.

Armed with above knowledge (and against all recommendations given previously) we are going to deploy a very minimalistic installation of Ceph cluster on top of 3 servers (VMs), with 1 volume per node being dedicated for an OSD daemon, and Ceph MONs collocated with the Operating System on the system volume. The reason for choosing such a minimalistic setup is the ability to quickly build a test cluster on top of 3 VMs (which most people will do when building their very first Ceph cluster) and to keep configuration as short as possible. Remember, we just want to be able to consume Ceph from CloudStack, and currently don’t care about performance or uptime / redundancy (beside some basic things, which we will cover explicitly).

Our setup will be as following:

  • We will already have a working CloudStack 4.11.2 installation (i.e. we expect you to have a working CloudStack installation)
  • We will add Ceph storage as an additional Primary Storage to CloudStack and create offerings for it
  • CloudStack Management Server will be used as Ceph admin (deployment) node
  • Management Server and KVM nodes details:
    • CloudStack Management Server: IP 10.2.2.118
    • KVM host1: IP 10.2.3.135, hostname “kvm1”
    • KVM host2: IP 10.2.2.208, hostname “kvm2”
  • Ceph nodes details (dedicated nodes):
    • 2 CPU, 4GB RAM, OS volume 20GB, DATA volume 100GB
    • Single NIC per node, attached to the CloudStack Management Network – i.e. there is no dedicated network for Primary Storage traffic between our KVM hosts and the Ceph nodes
    • Node1: IP 10.2.2.119, hostname “ceph1”
    • Node2: IP 10.2.2.116, hostname “ceph2”
    • Node3: IP 10.2.3.159, hostname “ceph3”
    • Single OSD (100GB) running on each node
    • MON instance running on each node
    • Ceph Mimic (13.latest) release
    • All nodes will be running latest CentOS 7 release, with default QEMU and Libvirt versions on KVM nodes

As stated above Ceph admin (deployment) node will be on CloudStack Management Server, but as you can guess, you can use a dedicated VM/Server for this purpose as well.

Before proceeding with the actual work, let’s define the high-level steps required to deploy a working Ceph cluster

  • Building the Ceph cluster:
    • Setting time synchronization, host name resolution and password-less login
    • Setting up firewall and SELinux
    • Creating a cluster definition file and auth keys on the deployment node
    • Installation of binaries on cluster nodes
    • Provisioning of MON daemons
    • Copying over the ceph.conf and admin keys to be able to manage the cluster
    • Provisioning of Ceph manager daemons (Ceph Dashboard)
    • Provisioning of OSD daemons
    • Basic configuration

We will cover configuration of KVM nodes in second article.

Let’s start!

On all nodes…

It is critical that the time is properly synchronized across all nodes. If you are running on hypervisor, your VMs might already be synced with the host, otherwise do it the old-fashioned way:

ntpdate -s time.nist.gov
yum install ntp
systemctl enable ntpd
systemctl start ntpd

Make sure each node can resolve the name of each other node –  if not using DNS, make sure to populate /etc/hosts file properly across all 4 nodes (including admin node):

cat << EOM >> /etc/hosts
10.2.2.219 ceph1
10.2.2.116 ceph2
10.2.3.159 ceph3
EOM

On CEPH admin node…

We start by installing ceph-deploy, a tool which we will use to deploy our whole cluster later:

release=mimic
cat << EOM > /etc/yum.repos.d/ceph.repo
[ceph-noarch]
name=Ceph noarch packages
baseurl=https://download.ceph.com/rpm-$release/el7/noarch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
EOM
 
yum install ceph-deploy -y

Let’s enable password-less login for root account – generate SSH keys and seed public key into /root/.ssh/authorized_keys file on all Ceph nodes (in production environment, you might want to use a user with limited privileges with sudo escalation):

ssh-keygen -f $HOME/.ssh/id_rsa -t rsa -N ''
ssh-copy-id root@ceph1
ssh-copy-id root@ceph2
ssh-copy-id root@ceph3

On all CEPH nodes…

Before beginning, ensure that SELINUX is set to permissive mode and verify that firewall is not blocking required connections between Ceph components:

firewall-cmd --zone=public --add-service=ceph-mon --permanent
firewall-cmd --zone=public --add-service=ceph --permanent
firewall-cmd --reload
setenforce 0

Make sure that you make SELINUX changes permanent, by editing /etc/selinux.config and setting ‘SELINUX=permissive’

As for the firewall, in case you are using different distribution or don’t consume firewalld, please refer to the networking configuration reference at http://docs.ceph.com/docs/mimic/rados/configuration/network-config-ref/

On CEPH admin node…

Let’s create cluster definition locally on admin node:

mkdir CEPH-CLUSTER; cd CEPH-CLUSTER/
ceph-deploy new ceph1 ceph2 ceph3

This will trigger a ssh connection to each of above referenced Ceph nodes (to check for machine platform and IP addresses) and will then write a local cluster definition and the MON auth key in the current folder.  Let’s check the files generated:

# ls -la
-rw-r--r-- ceph.conf
-rw-r--r-- ceph-deploy-ceph.log
-rw------- ceph.mon.keyring

On Centos7, if you get the “ImportError: No module named pkg_resources” error message while running ceph-deploy tool, you might need to install missing packages:

yum install python-setuptools

In case that you have multiple network interfaces on Ceph nodes, you will be required to explicitly define public network (which accepts client’s connections) – in this case edit previously created ceph.conf on the local admin node to include public network setting:

echo "public network = 10.2.0.0/16" >> ceph.conf

If you only have one NIC in each Ceph node, the above line is not required.

Still on admin node, let’s start the installation of Ceph binaries across cluster nodes (no services started yet):

 ceph-deploy install ceph1 ceph2 ceph3 

Command above will also output the version of Ceph binaries installed on each node – make sure that you did not get a wrong Ceph version installed due to some other repos present (we are installing Mimic 13.2.5, which is latest as of the time of writing).

Let’s create (initial) MONs on all 3 Ceph nodes:

ceph-deploy mon create-initial

In order to be able to actually manage our Ceph cluster, let’s copy over the admin key and the ceph.conf files to all Ceph nodes:

ceph-deploy admin ceph1 ceph2 ceph3

On any CEPH node…

After previous step, you should be able to issue “ceph -s” from any Ceph node, and this will return the cluster health. If you are lucky enough, your cluster will be in HEALTH_OK state, but it might happen that your MON daemons will complain on time mismatch between the nodes, as following:

[root@ceph1 ~]# ceph -w
  cluster:
    id:     7f2d23c2-1f2e-4c03-821c-cab3d76f84fc
    health: HEALTH_WARN
            clock skew detected on mon.ceph1, mon.ceph3 

In this case, we should stop NTP daemon, force time update (a few times), and start NTP daemon again – and after doing this across all nodes, it would be required to restart Ceph monitors on each node, one by one (give it a few seconds between restart on different nodes) – below we are restarting all Ceph daemons – which effectively means just MONs since we deployed only MONs so far:

systemctl stop ntpd
ntpdate -s time.nist.gov; ntpdate -s time.nist.gov; ntpdate -s time.nist.gov
systemctl start ntpd
systemctl restart ceph.target

After time has been properly synchronized (with less then 0.05 seconds of time difference between the nodes), you should be able to see a cluster in HEALTH_OK state, as below:

[root@ceph1 ~]# ceph -s
  cluster:
    id:     7f2d23c2-1f2e-4c03-821c-cab3d76f84fc
    health: HEALTH_OK

On CEPH admin node…

Now that we are up and running with all Ceph monitors, let’s deploy Ceph manager daemon (Ceph dashboard, that comes with newer releases) on all nodes since they operate in active/standby configuration (we will configure it later):

ceph-deploy mgr create ceph1 ceph2 ceph3

Finally, let’s deploy some OSDs so our cluster can actually hold some data eventually:

ceph-deploy osd create --data /dev/sdb ceph1
ceph-deploy osd create --data /dev/sdb ceph2
ceph-deploy osd create --data /dev/sdb ceph3

Note in commands above, we reference /dev/sdb as the 100GB volume that is used for OSD.

As mentioned previously, newer versions of Ceph (as in our case) will use by default BlueStore as the storage backend, with (by default) collocating block data and RocksDB key/value database (for managing its internal metadata) on the same device (/dev/sdb in our case). In more complex setups, one can choose to separate RockDB DB on faster devices, while block data will remain on slower devices – somewhat similar with the older FileStore setups, where block data would be located on HDDs/SSDs devices, while Journals would be usually placed on SSD/NVME partitions.

On any CEPH node…

After previous step is done, we should get the output similar to below – confirming that we have a 300GB of space available:

[root@ceph1 ~]# ceph -s
  cluster:
    id:     7f2d23c2-1f2e-4c03-821c-cab3d76f84fc
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph2,ceph1,ceph3
    mgr: ceph1(active)
    osd: 3 osds: 3 up, 3 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0  objects, 0 B
    usage:   3.0 GiB used, 297 GiB / 300 GiB avail
    pgs:  

Finally, let’s enable the Dashboard manager and set the username/password for authentication (which will be encrypted and stored in monitor’s DB) to be able to access it.
In our lab, we will disable SSL connections and keep it simple – but obviously in production environment, you would want to force SSL connections and also install proper SSL certificate:

ceph config set mgr mgr/dashboard/ssl false
ceph mgr module enable dashboard
ceph dashboard set-login-credentials admin password

Let’s login to the Dashboard manager on the active node (ceph1 in our case, as can be seen in the output from “ceph -s” command above):

And there you go – you now have a working Ceph cluster, which concludes part 1 of this Ceph article series. In part 2 (published soon), we will continue our work by creating a dedicated RBD pool and authentication keys for our CloudStack installation, add Ceph to CloudStack, finally consuming it with dedicated Compute / Disk offerings.

It’s worth mentioning that Ceph itself does provided additional services – i.e. it supports S3 object storage (requires installation / configuration of Ceph Object Gateway) as well as POSIX-compliant file system CephFS (requires installation/configuration of Metadata Server), but for CloudStack, we only need Rados Block Device (RBD) services from Ceph.

About the author

Andrija Panic is a Cloud Architect at ShapeBlue, the Cloud Specialists, and is a committer of Apache CloudStack. Andrija spends most of his time designing and implementing IaaS solutions based on Apache CloudStack.