Tag Archive for: cloud computing

Blog by Ivet Petrova, StorPool.

On June 13th, StorPool had the honour and privilege of hosting and organizing the European Cloud Infrastructure and CloudStack User Group together with its partner ShapeBlue. The event was a get-together of local IT infrastructure experts and CloudStack users. The main focus was on talks presenting best practices and useful information on how to build an efficient public or private infrastructure. In addition, the world's leading experts and contributors to the open-source Apache CloudStack project presented its latest functionality and project updates.

What is CloudStack? Key features and use cases

CloudStack is a scalable cloud orchestration platform for delivering turnkey infrastructure-as-a-service clouds. As it is relatively easy to deploy and manage, it attracts the attention of people considering which cloud management system to use. Firstly, its architecture is highly scalable and reliable: the largest known production CloudStack cloud reached approximately 35,000 physical hosts and ran smoothly. Secondly, CloudStack is hypervisor agnostic – it supports KVM, Xen, VMware, Hyper-V, OVM and more. It also exposes a REST API and is used for cloud infrastructure as a service, containers as a service, and many other use cases in which enterprises need a reliable solution to manage complex infrastructure and virtualization.
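
As a small aside (not something covered in the talks themselves), here is a hedged illustration of that API in practice: two read-only calls issued from a configured CloudMonkey shell pointed at a CloudStack management server. The output will of course depend on your own cloud.

list zones
list hosts type=Routing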

CloudStack supports different storage options, and StorPool has its own CloudStack integration. You can read more about the story of building StorPool's CloudStack integration here.

CloudStack Market Growth

The European Cloud Infrastructure and CloudStack User Day started with a keynote session by Giles Sirett, CEO of ShapeBlue and a widely recognized contributor to the Apache project. Giles talked us through the history of CloudStack, its main advantages, and the value it can bring to companies. After that, he gave an overview of interesting use cases and shared information about its releases and user communities. According to him, the most significant value of CloudStack is that it is a user-driven project and community, which makes it vibrant and rapidly developed. In conclusion, Giles shared that CloudStack adoption is growing quickly and that it is now used by some of the biggest companies globally.

Achieving the ultimate performance with KVM

Next to the stage was Boyan Krosnov, CPO of StorPool. In his session, he discussed a private cloud setup with KVM achieving 1M IOPS per hyper-converged (storage + compute) node, and answered the question: what is the optimum architecture and configuration for performance and efficiency? His session was a deep technical dive into how to build an efficient and high-performance cloud infrastructure. Boyan also explained why performance matters and how many companies do not even realize they are struggling with performance issues… until the moment their customers notify them.

During the presentation, the CPO of StorPool covered essential aspects of building cloud infrastructure, part of which were:

  • why the same hardware can deliver 10 times better performance than before
  • how hardware, compute and networking affect performance
  • tips and tricks for getting ultimate KVM performance
  • …and many more

Boyan advised all participants in the event to pay attention to their cloud performance, apply possible optimizations to accelerate it, and monitor it closely.

CloudStack: A Service Manager's Perspective

After a short break, we welcomed Maria Barta from Itelligence Global Managed Services GmbH. Maria presented a different perspective on CloudStack – "A Service Manager's Perspective". Agile business processes are becoming increasingly important in successful IT services, and Itelligence GmbH provides many different ultra-flexible and highly adaptable cloud solutions. To ensure customer and user satisfaction (i.e. availability, data security and product transparency) and simultaneously facilitate effective agile product development within the team, the role of the service manager is steadily evolving. In conclusion, the talk provided an insight into the benefits and limitations of CloudStack in relation to service manager objectives, and Maria's approach to overcoming these in her specific internal IaaS solution.

What’s new in CloudStack 4.13

Paul Angus, VP Technology of ShapeBlue and current VP of Apache CloudStack, was one of the most awaited speakers at the event, mainly because he is one of the most experienced people in the community, with exceptional knowledge of CloudStack. His session focused on the new release of CloudStack – version 4.13, due for release this summer. With hundreds of updates and new features, Paul went through user features, operator features and integrations, demonstrating just how much work and development is going into CloudStack.

Paul also shared that version 4.14 will most probably arrive at the end of 2019 / beginning of 2020. He enjoyed great attention from the European CloudStack community and gave valuable advice to admins dealing with complex cloud issues.

Challenges with high-density networks 

Last, but not least, Marian Marinov from the web hosting company SiteGround shared his experience of the problems encountered when managing high-density networks. In cloud environments, people tend to consider the network a given and almost limitless resource: you get an interface and you are told its bandwidth capacity. From the perspective of the client this is true, but from the perspective of the provider it is far from the truth. In his talk, Marian looked at some data centre network designs and the technologies and protocols used to tackle the problems of high-density clouds. All participants in the event had the chance to learn about VXLAN and "L3" switching.

After the final official talk, we organized a great networking session between the speakers and the event attendees – one more opportunity to learn about cloud infrastructure and about building a cloud with StorPool and CloudStack.

For StorPool's team, it was a pleasure to host and co-organize the event and to mark the beginning of a new CloudStack community in Bulgaria.

Our presenters’ slides can be found here:

Giles Sirett – CloudStack EU User Group 13 june 2019 – Sofia

Boyan Krosnov – Achieving the ultimate performance with KVM

Maria Barta – CS Day Sofia_ CS – A service manager perspective_20190613

Paul Angus – CSEUG19-What’s coming in CloudStack

Marian Marinov – Challenges with high-density networks

 

In the previous article we covered some basics around Ceph and deployed a working Ceph cluster. In this article, we are going to finish the Ceph configuration needed for CloudStack and add it as a new Primary Storage pool. We are also going to deploy Ceph volumes via CloudStack and examine them. Finally, in part 3 (to be published soon), I will show you some examples of working with RBD images and will cover some Ceph specifics, both in general and related to the CloudStack.

Before proceeding with the actual work, let me first mention that CloudStack supports Ceph with KVM only, so most of the work we do below is KVM related. Let’s define the high-level steps to be done:

  • Create a dedicated RBD pool for CloudStack in which all RBD images (volumes) will be created
  • Create a dedicated authentication key for the previously created pool
  • Update / install required Ceph binaries on KVM nodes
  • Add Ceph as Primary Storage in CloudStack
  • Implement custom storage tag for Ceph Primary Storage
  • Create new Compute / Disk offerings with same storage tag in order to target Ceph

From any Ceph node…

Ceph groups RBD (RADOS Block Device) images in pools and manages authentication on a per-pool level. Each image is a collection of many RADOS objects, with each object having a default size of 4 MB (configurable per image). At this moment we have no pools created, but before creating one, let's go through some basics around the different kinds of pools in Ceph.

There are two kinds of pools, based on the way objects are stored across the cluster:

  • Replicated – makes sure that there are always a total of N replicas/copies of an object
  • Erasure Coded – the simplest way to think of this is as a network RAID 5/6

Replicated pools are used for better performance at the expense of space consumption – you can think of them as a network-based RAID 1, where we keep N replicas of an object. Erasure-coded pools, on the other hand, are usually used when using Ceph for S3 object storage, or for more space-efficient storage where higher latency and lower performance are acceptable, since they behave similarly to RAID 5 or RAID 6 (and require some computational power). Here, for example, we may have 4 chunks of actual data and 2 parity chunks (EC 4+2), with just 50% space overhead, while (depending on the setup) we can still survive losing one Ceph node or even two.
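
As an aside, a minimal sketch of creating an erasure-coded pool (purely for illustration – the pool name "ecpool" and profile name "ec-4-2" are arbitrary, and we do not use erasure coding anywhere in this article) would look roughly like this:

ceph osd erasure-code-profile set ec-4-2 k=4 m=2
ceph osd pool create ecpool 64 64 erasure ec-4-2

In our case, however, we will stick with a replicated pool.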

So, let’s create a dedicated pool for CloudStack, set its replica size and finally initialize it:

ceph osd pool create cloudstack 64 replicated
ceph osd pool set cloudstack size 3
rbd pool init cloudstack

The commands above will create a replicated pool named "cloudstack" with a total of 64 placement groups (more info on placement groups here) and a replica size of 3, which is recommended for a production cluster. Optionally, you can set a replica size of 2 during testing, for somewhat increased performance and less space consumed on the cluster.
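
If you want a quick sanity check of the pool settings afterwards, you can query them directly from any Ceph node:

ceph osd pool get cloudstack pg_num
ceph osd pool get cloudstack size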

Next, let’s generate a dedicated authentication key for our CloudStack pool:

ceph auth get-or-create client.cloudstack mon 'profile rbd' osd 'profile rbd pool=cloudstack'

The command above will output a key to STDOUT only – please save the given key, since we will use it when adding Ceph to CloudStack later:

[client.cloudstack]
key = AQAFSZpc0t+BIBAAO95rOl+jgRwuOopojEtr/g==
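
If you ever misplace the key, it can be printed again at any time from a Ceph node (this only reads the existing key, it does not regenerate it):

ceph auth get-key client.cloudstack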

Now that the pool for CloudStack is ready, we need to prepare KVM nodes with proper Ceph binaries as well as the write-back caching configuration.

From the Ceph admin node…

Starting from CentOS 7.2 (and Ubuntu 14.04), libvirt / QEMU comes with RBD support by default, so there's no need to compile the binaries yourself. That being said, if we check the KVM nodes with "rpm -qa | grep librbd1", it will return an existing version of the "librbd1" package (version 10.2.5 in my case) already installed – but most certainly it will not be the version that corresponds to the cluster version we just installed (13.2.5 in this case). For the record, librbd is the user-space Ceph client that QEMU / libvirt effectively talks to.

Furthermore, if we run the command "ceph features" from any Ceph node, it will return (in our fresh Mimic cluster) "luminous" as the minimum compatible release for clients – that means our Ceph client (librbd) needs to be at least of the "luminous" version (which translates to 12.2.0), but our current librbd version is 10.2.5 – so let's upgrade it to the same Mimic version as our cluster:

ceph-deploy install --cli kvm1 kvm2

The command above will add the Mimic repo to my two KVM nodes and install only the CLI binaries (the "ceph-common" package). This will also trigger the upgrade of the existing "librbd1" package to the correct version. In addition, please make sure that name resolution of the KVM nodes works from the Ceph admin node.
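
Once this completes, you can re-run the two checks mentioned earlier to confirm that the client is now new enough – the first command on a KVM node, the second on any Ceph node:

rpm -qa | grep librbd1
ceph features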

Optionally, if you don’t want to install Ceph cli tools on KVM nodes, you can just upgrade the “librbd1” package while having previously created a proper Ceph Mimic repository on each KVM node (i.e. clone repo file from any Ceph cluster node).

Some of you might also want to be able to manage the Ceph cluster from the KVM nodes (besides managing it from the Ceph nodes) and to interact with RBD images via the "rbd" or "qemu-img" tools. In that case we need the "rbd" tool installed on the KVM nodes (part of the "ceph-common" package, already installed in the previous step). We also need ceph.conf locally on the KVM nodes so that the "rbd" tool knows how to connect to the cluster, which MONs to target, etc., and finally we need the admin authentication key – the file "ceph.client.admin.keyring" which was created on our Ceph admin node when we created the cluster initially (in the folder /root/CEPH-CLUSTER, as mentioned in Part 1 of this article series).

Additionally, if we want to use the qemu-img tool to examine RBD images, we can either have qemu-img installed on the Ceph cluster nodes, or we have to provide the above-mentioned ceph.conf and admin key in their default location (/etc/ceph/) on the KVM nodes, where librbd (the client) will pick them up automatically, so that we don't need to specify the MON IP/URL and admin key on the command line.

If you don’t want to be able to manage your Ceph cluster from KVM nodes, simply don’t copy over the “ceph.client.admin.keyring” file to KVM nodes. The ceph.conf file is still a must due to RBD caching as explained later. I have decided to make my KVM nodes happy by providing them with ceph.conf and admin keys, as below:

ceph-deploy admin kvm1 kvm2

The command above will effectively just copy ceph.conf and ceph.client.admin.keyring files to /etc/ceph/ folder on KVM nodes. Actually, you can still operate RBD images and manage your cluster from KVM nodes even if you don’t have ceph.conf and admin key present locally – you can always pass required parameters on the command line to “rbd” or “qemu-img” tools, as shown later.

RBD caching

After we have pushed the ceph.conf file to the KVM nodes, librbd will read any configuration directives under the "[client]" section of that file (besides the other sections) – but that section is missing at this moment!

Before we proceed with configuring RBD caching, let me quote the original docs on a point that is important to understand about RBD caching:

” The user space implementation of the Ceph block device (i.e., librbd) cannot take advantage of the Linux page cache, so it includes its own in-memory caching, called “RBD caching.” RBD caching behaves just like well-behaved hard disk caching. When the OS sends a barrier or a flush request, all dirty data is written to the OSDs. This means that using write-back caching is just as safe as using a well-behaved physical hard disk with a VM that properly sends flushes (i.e. Linux kernel >= 2.6.32). The cache uses a Least Recently Used (LRU) algorithm, and in write-back mode it can coalesce contiguous requests for better throughput. “

After digesting the above info, we can proceed into a brief configuration of caching. We can either fix it manually on each KVM node by adding the missing section in ceph.conf file, or we can do it in a more proper way by changing ceph.conf on the Ceph admin node and then pushing new file version to all KVM (and optionally Ceph cluster) nodes:

cat << EOM >> /root/CEPH-CLUSTER/ceph.conf
[client]
  rbd cache = true
  rbd cache writethrough until flush = true
EOM
 
ceph-deploy --overwrite-conf admin kvm1 kvm2

Please note the "rbd cache writethrough until flush = true" setting above. This is a safety mechanism which forces writethrough cache mode until the very first flush request is received from the VM OS (which indicates the OS is sending proper flush requests to the underlying storage, i.e. kernel >= 2.6.32); the cache mode then changes to write-back, which is what actually brings the performance benefits.

In case you want to play more with RBD caching, please see here – you can find some important default values which we didn't explicitly configure, e.g. the default rbd cache size is 32 MB (per volume) – so in the case of 50 VMs with 4 volumes each, that translates to 50 x 4 x 32 MB = 6.4 GB of additional RAM consumed on a KVM host – keep that in mind!
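
For example, if you wanted to cap the per-volume cache at 16 MB instead of the default 32 MB, the "[client]" section could be extended with the "rbd cache size" option (value in bytes – treat this as an illustrative sketch rather than a recommendation):

[client]
  rbd cache = true
  rbd cache writethrough until flush = true
  rbd cache size = 16777216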

Finally, let's add Ceph to CloudStack as an additional Primary Storage – we can do it via the GUI, or optionally via CloudMonkey (API), as follows:

 

Or via CloudMonkey:

create storagepool scope=zone zoneid=3c764ee1-6590-417d-b873-f073d0c550be hypervisor=KVM name=MyCephCluster provider=Defaultprimary url=rbd://cloudstack:AQAFSZpc0t-BIBAAO95rOl+jgRwuOopojEtr_g==@10.2.2.219/cloudstack tags=RBD

Most of the parameters are self-explanatory but let’s explain a few of them:

  • RADOS Monitor: the IP address (or DNS name) of a Ceph Monitor (MON) instance – in my case I have used the very first MON instance (the IP address of the ceph1 node from my cluster). In a production environment you will want an internal round-robin DNS setup on some internal DNS server (e.g. a single zone on BIND), such that KVM nodes resolve the URL (e.g. mon.myceph.cluster) in round-robin fashion to multiple MON instances – this is the way to achieve high availability of Ceph MONs, though some manual DNS zone changes are needed in case of prolonged MON maintenance (a minimal DNS sketch is shown after the key below)
  • RADOS Pool: the pool "cloudstack" which we created at the beginning of the article
  • RADOS User and RADOS Secret: these are the values from the authentication key which we generated at the beginning of the article, shown below again for your convenience

[client.cloudstack]
key = AQAFSZpc0t+BIBAAO95rOl+jgRwuOopojEtr/g==
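
As a minimal sketch of the round-robin DNS idea mentioned above (the zone name "myceph.cluster" is hypothetical, and the IPs are the three MON addresses from part 1), a BIND zone would simply contain multiple A records for the same name:

; snippet from the zone file for myceph.cluster (illustrative only)
mon    IN A 10.2.2.219
mon    IN A 10.2.2.116
mon    IN A 10.2.3.159

KVM nodes would then use mon.myceph.cluster as the RADOS Monitor address and resolve it to the MON instances in turn.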

The create storagepool command above, used to add Ceph to CloudStack, will effectively do a few things:

  • On each KVM node, it will create a new storage pool in libvirt
  • The storage pool definition files (xml and the secret) will be written to /etc/libvirt/secrets/ folder as shown below
  • Every time CloudStack Agent is restarted, it will recreate the Ceph storage pool (even if you manually remove the files below)

[root@kvm1]# cat /etc/libvirt/secrets/ef9cfd17-abe1-343d-97a0-cee6c71a6dad.xml
<secret ephemeral='no' private='no'>
  <uuid>ef9cfd17-abe1-343d-97a0-cee6c71a6dad</uuid>
  <usage type='ceph'>
    <name>cloudstack@ceph1.local:6789/cloudstack</name>
  </usage>
</secret>

[root@kvm1]# cat /etc/libvirt/secrets/ef9cfd17-abe1-343d-97a0-cee6c71a6dad.base64
AQAFSZpc0t+BIBAAO95rOl+jgRwuOopojEtr/g==
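
You can also list and read the same secret through libvirt itself, using standard virsh commands (the UUID will differ in your environment):

virsh secret-list
virsh secret-get-value ef9cfd17-abe1-343d-97a0-cee6c71a6dad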

If we check the libvirt pool created above, we can see that it's not persistent and doesn't start automatically – i.e. if you restart libvirt alone, it will not recreate / start the Ceph storage pool in libvirt; the CloudStack agent is the one doing this for us:

virsh # pool-info ef9cfd17-abe1-343d-97a0-cee6c71a6dad
Name:           ef9cfd17-abe1-343d-97a0-cee6c71a6dad
UUID:           ef9cfd17-abe1-343d-97a0-cee6c71a6dad
State:          running
Persistent:     no
Autostart:      no
Capacity:       299.99 GiB
Allocation:     68.19 MiB
Available:      286.02 GiB

Note that in the example above, I was actually using a DNS name for the Ceph MON (ceph1.local) instead of the IP – the Ceph MON's DNS name is resolved to an IP both when you add Ceph to CloudStack and every time you start a VM, attach a new volume, etc. – so DNS resolution needs to be fast and stable here.

Now that we have added Ceph to CloudStack, let's create a Data disk offering with the tag "RBD" – this will make sure that any new volume from this offering is created on a storage pool with the tag "RBD", which is Ceph in our case. Here we are using storage tags to avoid interfering with your existing CloudStack installation – but they are not otherwise required:

(localcloud) SBCM5> > create diskoffering name=5GB-Ceph displaytext=5GB-Ceph storagetype=shared provisioningtype=thin customized=false disksize=5 tags=RBD
{
  "diskoffering": {
    "created": "2019-03-26T19:27:32+0000",
    "disksize": 5,
    "displayoffering": true,
    "displaytext": "5GB-Ceph",
    "id": "2c74becc-c39d-4aa8-beec-195b351bdaf0",
    "iscustomized": false,
    "name": "5GB-Ceph",
    "provisioningtype": "thin",
    "storagetype": "shared",
    "tags": "RBD"
  }
}

Note the offering ID from above (2c74becc-c39d-4aa8-beec-195b351bdaf0) – and let’s create a disk from it:

(localcloud) SBCM5> > create volume diskofferingid=2c74becc-c39d-4aa8-beec-195b351bdaf0 name=MyFirstCephDisk zoneid=3c764ee1-6590-417d-b873-f073d0c550be
{
  "volume": {
    "account": "admin",
    "created": "2019-03-26T19:52:05+0000",
    "destroyed": false,
    "diskofferingdisplaytext": "5GB-Ceph",
    "diskofferingid": "2c74becc-c39d-4aa8-beec-195b351bdaf0",
    "diskofferingname": "5GB-Ceph",
    "displayvolume": true,
    "domain": "ROOT",
    "domainid": "401ce404-44c1-11e9-96c5-1e009001076e",
    "hypervisor": "None",
    "id": "47b1cfe5-6bab-4506-87b6-d85b77d9b69c",
    "isextractable": true,
    "jobid": "49a682ab-42f9-4974-8e42-452a13c97553",
    "jobstatus": 0,
    "name": "MyFirstCephDisk",
    "provisioningtype": "thin",
    "quiescevm": false,
    "size": 5368709120,
    "state": "Allocated",
    "storagetype": "shared",
    "tags": [],
    "type": "DATADISK",
    "zoneid": "3c764ee1-6590-417d-b873-f073d0c550be",
    "zonename": "ref-trl-1019-k-M7-apanic"
  }
}

Finally, since volume creation is a lazy provisioning process (i.e. the volume is created in the DB only, not yet on the storage pool), let's attach the disk to a running VM (using the volume ID "47b1cfe5-6bab-4506-87b6-d85b77d9b69c" from the previous command output), which will trigger the actual disk creation on our Ceph cluster (output shortened for brevity):

(localcloud) SBCM5> > attach volume id=47b1cfe5-6bab-4506-87b6-d85b77d9b69c virtualmachineid=19a67e20-c747-43bb-b149-c2b2294002f9
{
  "volume": {
    …
    "jobstatus": 0,
    "name": "MyFirstCephDisk",
    "path": "47b1cfe5-6bab-4506-87b6-d85b77d9b69c",
    …  }
}

Note the "path" output field (which is usually the same as the ID of the volume, except in some special cases) – let's check our Ceph cluster to see if we can find this volume and examine its properties.

From any KVM node…

[root@kvm1 ~]# rbd ls -p cloudstack
47b1cfe5-6bab-4506-87b6-d85b77d9b69c
 
[root@kvm1 ~]# rbd info cloudstack/47b1cfe5-6bab-4506-87b6-d85b77d9b69c
rbd image '47b1cfe5-6bab-4506-87b6-d85b77d9b69c':
        size 5 GiB in 1280 objects
        order 22 (4 MiB objects)
        id: d43b4c04a8af
        block_name_prefix: rbd_data.d43b4c04a8af
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
        op_features:
        flags:
        create_timestamp: Tue Mar 28 19:46:32 2019

We can also examine the Ceph RBD image with qemu-img tool:

[root@kvm1 ~]# qemu-img info rbd:cloudstack/47b1cfe5-6bab-4506-87b6-d85b77d9b69c
image: rbd:cloudstack/47b1cfe5-6bab-4506-87b6-d85b77d9b69c
file format: raw
virtual size: 5.0G (5368709120 bytes)
disk size: unavailable

As you can see in the qemu-img command above, we did not specify any username or authentication key, because we have our ceph.conf and admin key files present in the /etc/ceph/ folder. If you decided to opt out of having these two files present on the KVM nodes, you will have to use a more cumbersome command, as below:

qemu-img info rbd:cloudstack/47b1cfe5-6bab-4506-87b6-d85b77d9b69c:mon_host=10.2.2.219:auth_supported=Cephx:id=cloudstack:key=AQAFSZpc0t+BIBAAO95rOl+jgRwuOopojEtr/g==

In the above command we are specifying the MON IP address, username and key for authentication.

Now that you have the basics of consuming Ceph from CloudStack, feel free to also create Compute Offerings and System Offerings for Virtual Routers, the Secondary Storage VM and the Console Proxy VM, and to experiment with volume migration from e.g. NFS to Ceph – just be sure to keep your storage tags under control.
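
As a hedged illustration of such a volume migration via CloudMonkey (the storage pool ID below is a placeholder – you would use the ID of your own Ceph Primary Storage, and the volume must be in a state that allows migration):

(localcloud) SBCM5> > migrate volume volumeid=47b1cfe5-6bab-4506-87b6-d85b77d9b69c storageid=<id-of-your-ceph-primary-storage>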

I hope that this article series has been interesting so far. In part 3 (which will be the final part), I will show you some examples of working with RBD images and will cover some Ceph specifics, both in general and related to CloudStack.

About the author

Andrija Panic is a Cloud Architect at ShapeBlue, the Cloud Specialists, and is a committer of Apache CloudStack. Andrija spends most of his time designing and implementing IaaS solutions based on Apache CloudStack.

Hello all, this is Abhishek Kumar, currently the newest member of the ShapeBlue family. It’s been over a month since I started working as a Software Engineer on Apache CloudStack at ShapeBlue, and I’m here to tell you about how it’s gone.

2019 has been an exciting year for me, as I moved from the application development domain to infrastructure development. I always knew it would be a challenging task, but also a rewarding one.

The beginning

It was last year that I moved to Gurugram, India, to work for a major med-tech company dealing with navigated intra-operative products. Prior to that, I’d been freelancing as a desktop and mobile application developer. Moving to Gurugram meant getting back in touch with some of the friends and batchmates from college who were already living and working in the city. Late last year one of these friends suggested to me the idea of applying to ShapeBlue, the company he had been working at for a number of years. I had previously heard about ShapeBlue and Apache Cloudstack from him, and I was interested in how the company works with a distributed team and how they contribute to open-source while delivering for their customers – they are community leaders in a sense. Initially I was quite unsure as I had never worked on something like this but after some deliberation, I decided to go through the compact yet effective hiring process of ShapeBlue. It involved two interviews, a coding challenge and a knowledge test on a subject that was chosen because they KNEW it was new to me (they were testing my ability to pick up new concepts very quickly). The whole process only took a week or so before I was hired as a Software Engineer at ShapeBlue!

The learning

This was my first experience of being a developer on infrastructure software and working as part of a large, open-source project. To be honest, everything was a bit overwhelming to start with, as it was mostly new to me – and the people I'm working with are probably the champions of the field. Within this first month, I've transitioned from C++ to Java, and have learned complex concepts of networking topologies, hypervisors and many other new subjects. I've not only been learning the fundamentals of the Apache CloudStack project and working on customer projects, but I'm also starting to contribute to the open-source community. A large part of this learning can be credited to the awesome training program that ShapeBlue provides for a new joiner. It is a very well-structured training course (called the "hackerbook") that consists of several chapters, each explaining a particular topic and then requiring the trainee to do some coding exercises to test the acquired knowledge. During the training period, a mentor is assigned to the trainee to clear any doubts, review progress and even have 1-2-1 sessions on complex topics. This contrasts with what I've experienced with previous employers, and with what most programmers experience, where they are given access to a codebase and some limited documentation and left to figure things out on their own.

The challenges

As expected, there have been a number of challenges. Moving from developing small, consumer-centric applications to working on massive, complex infrastructure orchestration software was never going to be easy. Then there are always the regular things one faces when moving to a new job: onboarding onto company infrastructure, learning new services and technologies for daily tasks, following new practices & policies, etc. With Apache CloudStack being an open-source project, there is another dimension, as it is not just your own organization but the larger community that you are dealing with.

Apart from the technical aspect, I also find the social aspect of onboarding with a new organization a bit testing personally. Being a reserved, quiet person, gelling with new people isn't always easy for me. However, over the last month at ShapeBlue I can safely say that all of these have been exciting challenges. While the technical aspects were taken care of with well-structured training, the social aspect took care of itself thanks to the intrinsically flat organizational structure at ShapeBlue, where everyone has an equal say and the freedom to communicate with anybody else in the company, irrespective of their position.

The joy

I have liked being able to jump between my training course and real-world customer facing development. I was able to use the concepts I learned, during this period, in the customer project I’m working on. Within this short span of time, even though I don’t have the expertise that my team has, I still feel like I can make a contribution to the project we are working on. I can still participate in the development of new features for customers and contribute to open-source community to some extent.

Conclusions

My time so far at ShapeBlue has been nothing less than amazing! I could not wish for a better mix of challenges and rewards. Most days I do have to work hard to make sense of a very large codebase or some complex network concepts, but with enough effort, I can work my way through and go home satisfied. Being a software developer in the infrastructure domain can be challenging and learning to become a better and more efficient one might be even harder, but so far, I’m enjoying this job and loving this journey with my new work family: ShapeBlue!

As well as NFS and various block storage solutions for Primary Storage, CloudStack has supported Ceph with KVM for a number of years now. Thanks to some great Ceph users in the community lots of previously missing CloudStack storage features have been implemented for Ceph (and lots of bugs squashed), making it the perfect choice for CloudStack if you are looking for easy scaling of storage and decent performance.

In this and my next article, I am going to cover all steps needed to actually install a Ceph cluster from scratch, and subsequently add it to CloudStack. In this article I will cover installation and basic configuration of a standalone Ceph cluster, whilst in part 2 I will go into creating a pool for a CloudStack installation, adding Ceph to CloudStack as an additional Primary Storage and creating Compute and Disk offerings for Ceph. In part 3, I will also try to explain some of the differences between Ceph and NFS, both from architectural / integration point of view, as well as when it makes sense (or doesn’t) to use it as the Primary Storage solution.

It is worth mentioning that the Ceph cluster we build in this first article can be consumed by any RBD client (not just CloudStack). Although in part 2 we move onto integrating your new Ceph cluster into CloudStack, this article is about creating a standalone Ceph cluster – so you are free to experiment with Ceph.

Firstly, I would like to share some high-level recommendations from very experienced community members, who have been using Ceph with CloudStack for a number of years:

  • Make sure that your production cluster has at least 10 nodes, so as to minimize the impact on performance during data rebalancing (in case of disk or whole-node failure). Having to rebalance 10% of the data has a much smaller impact (and shorter duration) than having to rebalance 33%; another reason is improved performance, as data is distributed across more drives and thus read / write performance is better
  • Use 10GB networking or faster – a separate network for client and replication traffic is needed for optimal performance
  • Don’t rely on cache tiering, unless you have a very specific IO pattern / use case. Moving data in and out of cache tier can quickly create a bottleneck and do more harm than good
  • If running an older version of Ceph (e.g. with FileStore-based OSDs), you will probably place your journals on SSDs. If so, make sure that you properly benchmark the SSDs for synchronous IO write performance (Ceph writes to journal devices with the O_DIRECT and D_SYNC flags) – a sample fio command is sketched after this list. Don't try to put too many journals on a single SSD, and avoid consumer-grade SSDs: their synchronous write performance is usually extremely bad, and they have proven to be exceptionally unreliable when used as journal devices in a Ceph cluster
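
For the journal SSD benchmark mentioned in the last point, a commonly used fio test for synchronous write performance looks roughly like the one below. It writes directly to the device, so it is destructive – only run it against an empty disk, and replace /dev/sdX with your actual device:

fio --name=journal-test --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting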

Before we continue, let me state that this first article is NOT meant to be a comprehensive guide on Ceph history, theory, installation or optimization, but merely a simple step-by-step guide for a basic installation, just to get us going. Still, in order to be able to better follow the article, it’s good to define some basics around Ceph architecture.

Ceph has a couple of different components and daemons, which serve different purposes, so let's mention those relevant for our setup:

  • OSD (Object Storage Daemon) – usually maps to a single drive (HDD, SSD, NVMe) and is the component containing user data. As can be concluded from its name, there is a Linux process for each OSD running on a node. A node hosting only OSDs can be considered a storage or OSD node in Ceph terminology.
  • MON (Monitor daemon) – holds the cluster map(s), which provide Ceph clients and Ceph OSD daemons with knowledge of the cluster topology. To clarify further: at the heart of Ceph is the CRUSH algorithm, which makes sure that OSDs and clients can calculate the location of a specific chunk of data in the cluster (and connect to specific OSDs to read/write that data) without needing to look up its position somewhere (as opposed to regular file systems, which have pointers to the actual data location on a partition).

A couple of other things are worth mentioning:

  • For cluster redundancy, it's required to have multiple Ceph MONs installed, always aiming for an odd number to avoid the chance of a split-brain scenario. For smaller clusters, these could be placed on VMs or even collocated with other Ceph roles (i.e. on OSD nodes), though busier clusters will need dedicated, powerful servers/VMs. In contrast to OSDs, there can be only one MON instance per server/VM.
  • For improved performance, you might want to place the MON's database (LevelDB) on dedicated SSDs (versus the default of placing it on the OS partition).
  • There are two ways that OSDs can manage the data they store. Starting with the Luminous 12.2.z release, the new default (and recommended) backend is BlueStore. Prior to Luminous, the default (and only) option was FileStore. With FileStore, data is first written to a journal (which can be collocated with the OSD on the same device, or can be a completely separate partition on a faster, dedicated device) and then later committed to the OSD. With BlueStore there is no true journal per se, but a RocksDB key/value database (for managing the OSD's internal metadata). A FileStore OSD uses XFS on top of its partition, while BlueStore writes data directly to the raw device, without the need for a file system. With its new architecture, BlueStore brings a big speed improvement over FileStore.
  • When building and operating a cluster, you will probably want a dedicated server/VM used as the deployment or admin node. This node hosts your deployment tools (be it the basic ceph-deploy tool or a full-blown Ansible playbook), as well as the cluster definition and configuration files, which can be changed in a central place (this node) and then pushed to cluster nodes as required.

Armed with the above knowledge (and against all the recommendations given previously), we are going to deploy a very minimalistic Ceph cluster on top of 3 servers (VMs), with 1 volume per node dedicated to an OSD daemon, and the Ceph MONs collocated with the operating system on the system volume. The reason for choosing such a minimalistic setup is the ability to quickly build a test cluster on top of 3 VMs (which is what most people will do when building their very first Ceph cluster) and to keep the configuration as short as possible. Remember, we just want to be able to consume Ceph from CloudStack, and currently don't care about performance or uptime / redundancy (besides some basic things, which we will cover explicitly).

Our setup will be as follows:

  • We will already have a working CloudStack 4.11.2 installation (i.e. we expect you to have a working CloudStack installation)
  • We will add Ceph storage as an additional Primary Storage to CloudStack and create offerings for it
  • CloudStack Management Server will be used as Ceph admin (deployment) node
  • Management Server and KVM nodes details:
    • CloudStack Management Server: IP 10.2.2.118
    • KVM host1: IP 10.2.3.135, hostname “kvm1”
    • KVM host2: IP 10.2.2.208, hostname “kvm2”
  • Ceph nodes details (dedicated nodes):
    • 2 CPU, 4GB RAM, OS volume 20GB, DATA volume 100GB
    • Single NIC per node, attached to the CloudStack Management Network – i.e. there is no dedicated network for Primary Storage traffic between our KVM hosts and the Ceph nodes
    • Node1: IP 10.2.2.219, hostname "ceph1"
    • Node2: IP 10.2.2.116, hostname “ceph2”
    • Node3: IP 10.2.3.159, hostname “ceph3”
    • Single OSD (100GB) running on each node
    • MON instance running on each node
    • Ceph Mimic (13.latest) release
    • All nodes will be running latest CentOS 7 release, with default QEMU and Libvirt versions on KVM nodes

As stated above, the Ceph admin (deployment) node will be on the CloudStack Management Server, but as you can guess, you could also use a dedicated VM/server for this purpose.

Before proceeding with the actual work, let's define the high-level steps required to deploy a working Ceph cluster:

  • Building the Ceph cluster:
    • Setting time synchronization, host name resolution and password-less login
    • Setting up firewall and SELinux
    • Creating a cluster definition file and auth keys on the deployment node
    • Installation of binaries on cluster nodes
    • Provisioning of MON daemons
    • Copying over the ceph.conf and admin keys to be able to manage the cluster
    • Provisioning of Ceph manager daemons (Ceph Dashboard)
    • Provisioning of OSD daemons
    • Basic configuration

We will cover configuration of KVM nodes in second article.

Let’s start!

On all nodes…

It is critical that time is properly synchronized across all nodes. If you are running on a hypervisor, your VMs might already be synced with the host; otherwise, do it the old-fashioned way:

ntpdate -s time.nist.gov
yum install ntp
systemctl enable ntpd
systemctl start ntpd

Make sure each node can resolve the name of every other node – if not using DNS, make sure to populate the /etc/hosts file properly across all 4 nodes (including the admin node):

cat << EOM >> /etc/hosts
10.2.2.219 ceph1
10.2.2.116 ceph2
10.2.3.159 ceph3
EOM

On CEPH admin node…

We start by installing ceph-deploy, a tool which we will use to deploy our whole cluster later:

release=mimic
cat << EOM > /etc/yum.repos.d/ceph.repo
[ceph-noarch]
name=Ceph noarch packages
baseurl=https://download.ceph.com/rpm-$release/el7/noarch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
EOM
 
yum install ceph-deploy -y

Let’s enable password-less login for root account – generate SSH keys and seed public key into /root/.ssh/authorized_keys file on all Ceph nodes (in production environment, you might want to use a user with limited privileges with sudo escalation):

ssh-keygen -f $HOME/.ssh/id_rsa -t rsa -N ''
ssh-copy-id root@ceph1
ssh-copy-id root@ceph2
ssh-copy-id root@ceph3

On all CEPH nodes…

Before beginning, ensure that SELinux is set to permissive mode and verify that the firewall is not blocking the required connections between Ceph components:

firewall-cmd --zone=public --add-service=ceph-mon --permanent
firewall-cmd --zone=public --add-service=ceph --permanent
firewall-cmd --reload
setenforce 0

Make sure that you make the SELinux change permanent by editing /etc/selinux/config and setting 'SELINUX=permissive'.
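
One way to make that change (a simple sed one-liner against the CentOS 7 config file):

sed -i 's/^SELINUX=.*/SELINUX=permissive/' /etc/selinux/config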

As for the firewall, in case you are using different distribution or don’t consume firewalld, please refer to the networking configuration reference at http://docs.ceph.com/docs/mimic/rados/configuration/network-config-ref/

On CEPH admin node…

Let’s create cluster definition locally on admin node:

mkdir CEPH-CLUSTER; cd CEPH-CLUSTER/
ceph-deploy new ceph1 ceph2 ceph3

This will trigger an SSH connection to each of the above-referenced Ceph nodes (to check the machine platform and IP addresses) and will then write a local cluster definition and the MON auth key in the current folder. Let's check the files generated:

# ls -la
-rw-r--r-- ceph.conf
-rw-r--r-- ceph-deploy-ceph.log
-rw------- ceph.mon.keyring

On CentOS 7, if you get the "ImportError: No module named pkg_resources" error message while running the ceph-deploy tool, you might need to install the missing packages:

yum install python-setuptools

In case you have multiple network interfaces on the Ceph nodes, you will be required to explicitly define the public network (which accepts clients' connections) – in this case, edit the previously created ceph.conf on the local admin node to include the public network setting:

echo "public network = 10.2.0.0/16" >> ceph.conf

If you only have one NIC in each Ceph node, the above line is not required.

Still on admin node, let’s start the installation of Ceph binaries across cluster nodes (no services started yet):

 ceph-deploy install ceph1 ceph2 ceph3 

The command above will also output the version of the Ceph binaries installed on each node – make sure that you did not get the wrong Ceph version installed due to some other repos being present (we are installing Mimic 13.2.5, the latest at the time of writing).

Let’s create (initial) MONs on all 3 Ceph nodes:

ceph-deploy mon create-initial

In order to be able to actually manage our Ceph cluster, let’s copy over the admin key and the ceph.conf files to all Ceph nodes:

ceph-deploy admin ceph1 ceph2 ceph3

On any CEPH node…

After the previous step, you should be able to issue "ceph -s" from any Ceph node, and this will return the cluster health. If you are lucky, your cluster will be in the HEALTH_OK state, but it might happen that your MON daemons complain about time mismatch between the nodes, as follows:

[root@ceph1 ~]# ceph -w
  cluster:
    id:     7f2d23c2-1f2e-4c03-821c-cab3d76f84fc
    health: HEALTH_WARN
            clock skew detected on mon.ceph1, mon.ceph3 

In this case, we should stop the NTP daemon, force a time update (a few times), and start the NTP daemon again. After doing this across all nodes, the Ceph monitors need to be restarted on each node, one by one (give it a few seconds between restarts on different nodes). Below we restart all Ceph daemons – which effectively means just the MONs, since we have deployed only MONs so far:

systemctl stop ntpd
ntpdate -s time.nist.gov; ntpdate -s time.nist.gov; ntpdate -s time.nist.gov
systemctl start ntpd
systemctl restart ceph.target

After time has been properly synchronized (with less than 0.05 seconds of time difference between the nodes), you should see the cluster in the HEALTH_OK state, as below:

[root@ceph1 ~]# ceph -s
  cluster:
    id:     7f2d23c2-1f2e-4c03-821c-cab3d76f84fc
    health: HEALTH_OK
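
On recent Ceph releases you can also ask the monitors directly for their view of clock skew at any time:

ceph time-sync-status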

On CEPH admin node…

Now that we are up and running with all Ceph monitors, let's deploy the Ceph manager daemon (which provides the Ceph Dashboard that comes with newer releases) on all nodes, since they operate in an active/standby configuration (we will configure the dashboard later):

ceph-deploy mgr create ceph1 ceph2 ceph3

Finally, let’s deploy some OSDs so our cluster can actually hold some data eventually:

ceph-deploy osd create --data /dev/sdb ceph1
ceph-deploy osd create --data /dev/sdb ceph2
ceph-deploy osd create --data /dev/sdb ceph3

Note in commands above, we reference /dev/sdb as the 100GB volume that is used for OSD.

As mentioned previously, newer versions of Ceph (as in our case) will use BlueStore as the storage backend by default, with block data and the RocksDB key/value database (used for managing the OSD's internal metadata) collocated on the same device (/dev/sdb in our case). In more complex setups, one can choose to place the RocksDB database on faster devices while the block data remains on slower devices – somewhat similar to older FileStore setups, where block data would be located on HDD/SSD devices while journals would usually be placed on SSD/NVMe partitions.
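
Purely for illustration (we do not do this in our minimal setup), ceph-deploy can place the RocksDB portion of a BlueStore OSD on a separate, faster device – the NVMe partition below is hypothetical:

ceph-deploy osd create --data /dev/sdb --block-db /dev/nvme0n1p1 ceph1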

On any CEPH node…

After the previous step is done, we should get output similar to the below, confirming that we have 300GB of space available:

[root@ceph1 ~]# ceph -s
  cluster:
    id:     7f2d23c2-1f2e-4c03-821c-cab3d76f84fc
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph2,ceph1,ceph3
    mgr: ceph1(active)
    osd: 3 osds: 3 up, 3 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0  objects, 0 B
    usage:   3.0 GiB used, 297 GiB / 300 GiB avail
    pgs:  

Finally, let's enable the Dashboard manager and set the username/password for authentication (which will be encrypted and stored in the monitors' DB) so that we can access it.
In our lab, we will disable SSL connections and keep it simple – but obviously, in a production environment, you would want to force SSL connections and also install a proper SSL certificate:

ceph config set mgr mgr/dashboard/ssl false
ceph mgr module enable dashboard
ceph dashboard set-login-credentials admin password

Let's log in to the Dashboard manager on the active node (ceph1 in our case, as can be seen in the output of the "ceph -s" command above).
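
If you are not sure which node currently runs the active manager, or which URL the dashboard is served on, you can ask the cluster:

ceph mgr services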

And there you go – you now have a working Ceph cluster, which concludes part 1 of this article series. In part 2 (to be published soon), we will continue by creating a dedicated RBD pool and authentication key for our CloudStack installation, adding Ceph to CloudStack, and finally consuming it with dedicated Compute / Disk offerings.

It's worth mentioning that Ceph itself provides additional services – it supports S3 object storage (requiring the installation / configuration of the Ceph Object Gateway) as well as the POSIX-compliant CephFS file system (requiring the installation / configuration of a Metadata Server) – but for CloudStack, we only need the RADOS Block Device (RBD) service from Ceph.

About the author

Andrija Panic is a Cloud Architect at ShapeBlue, the Cloud Specialists, and is a committer of Apache CloudStack. Andrija spends most of his time designing and implementing IaaS solutions based on Apache CloudStack.

Our first meetup of 2019 saw us at a new venue – Ticketmaster’s London HQ, and if you’re a music lover it certainly takes the prize for coolest meeting venue yet! Walls covered with pictures of rock stars and a stage complete with guitars and Marshall amps (not to mention pinball machines and a bar) created a real buzz of excitement before the meeting had even started. Once everyone had met up with friends, taken photos and finished lunch, Giles Sirett (CSEUG chairman) called the meeting to order, and kicked the day off with CloudStack news.

Giles talked us through the current and upcoming releases of CloudStack, and the new release of Cloudmonkey (6.0), before ‘unofficially’ announcing the new VP of Apache CloudStack – our very own Paul Angus! Moving onto market news, Giles introduced a thought-provoking topic, starting by referencing an article titled ‘What happened to OpenStack?’, before moving onto the different marketing approaches taken by the technologies.

We then heard about upcoming events – the next CSEUG will be in Sofia in June (register here), and we are currently looking for speakers. The CSEUG returns to London in October (and we are working with Ceph on making this another collaboration event), and we have CloudStack Collaboration Conferences in April (Brazil) and September (Las Vegas). Again – the Call For Participation is open for Las Vegas. All the information provided by Giles can be found by watching his talk:

Giles then introduced our first guest speaker onto the stage – Mike Rowell (Director, Platform Infrastructure) of our hosts Ticketmaster, with a talk titled ‘Our journey to a next generation cloud’. Mike did indeed take us on a journey, first explaining what challenges they needed to overcome, and what solutions they initially implemented, before discussing their investigations into a scalable cloud solution. These investigations led them to Apache CloudStack, and Mike went on to share what issues he experienced, as well as what other tools they use, such as Ansible, Terraform and Prometheus in the stack. Mike finished his talk by expanding on some features he would like to see in CloudStack.

Next to the stage was Bobby Stoyanov (ShapeBlue), talking us through some of the new features in CloudStack. These new features include: more sophisticated options for specifying pod and cluster while deploying a VM; running and retrieving diagnostics on the VR; sending additional configuration to VMs; and adding options to cleanup additional data disks when destroying a VM. It’s always great to hear about new features, and to see evidence of the continuing innovation and commitment to the project from the community. Bobby ‘dived deep’ on each feature, so I recommend you watch his talk:

After a short break, we welcomed Wido den Hollander from PCextreme, who talked about flexible networking for scaling a cloud environment. As Wido explained – regular layer 2 VLANs have their limitations when it comes to scalability, and VXLAN overcomes these limitations, making it easier to scale out your CloudStack deployment. As of CloudStack 4.12, VXLAN can use IPv6, and Wido talked about advanced networking + IPv6 + VXLAN, which he is putting into production right now with the 4.12 release. As usual, Wido covered his topic comprehensively, and if you want to hear more, watch his talk:

We then welcomed Boyan Ivanov (StorPool) with his talk ‘Latency: the #1 metric of your cloud’. As Boyan pointed out – no two clouds are the same. However, the leading clouds all have one thing in common: they deliver on metrics which matter to the customer. In this session Boyan examined and presented his findings on leading clouds, demonstrating why low latency is the thing that makes a cloud stand out. Watch Boyan’s talk:

Towards the end of Boyan’s talk we weren’t sure whether there would be a fifth talk, or we would be enjoying the hospitality of the Ticketmaster bar a little sooner than anticipated. All day, Grégoire Lamodière (DIMSI) had been struggling to get to London from Paris, due to disruption to Eurostar services. We had already moved his talk to the final slot of the day, and with just a few minutes to spare, he arrived! Grégoire’s talk was ‘Using message broker to extend cloud features’. As he explained, many use cases involve communication between CloudStack admin (provider) and instances (end user) regarding configuration, build and management. Grégoire presented the DIMSI team’s communication framework that enables managing user infrastructure on Windows and Linux systems from a centralized panel. Grégoire’s full talk is on our channel:

After the final ‘official’ talk of the day, Mike (playing the part of bar tender) opened the bar and we enjoyed a couple of drinks and the unofficial discussions started. We were then truly spoiled as Computacenter led us to a nearby pub and carried on buying the drinks! As usual, a fantastic event made so by the CloudStack community. Great attendance from all over Europe (including a heroic effort from Grégoire), and varied, interesting talks (thanks to Mike, Bobby, Wido, Boyan, and Grégoire). Huge thanks to Ticketmaster for hosting and providing a very cool venue (have I mentioned the slide?), and thanks to Computacenter for their generosity. We are already planning the next CSEUG which will be in Sofia, Bulgaria, on June 13 (registration open) – we are looking for talks, so if you want to come along and give a talk, please let me know at steve.roles@shapeblue.com. See you soon!

All the day’s talks were recorded, and are available on the ShapeBlue YouTube channel.

 

Our presenter’s slides can be found on SlideShare:

Giles: https://www.slideshare.net/ShapeBlue/giles-sirett-cloudstack-news

Mike: https://www.slideshare.net/ShapeBlue/mike-rowell-our-journey-to-a-next-generation-cloud

Bobby: https://www.slideshare.net/ShapeBlue/boris-stoyanov-some-new-features-in-apache-cloudstack

Boyan: https://www.slideshare.net/ShapeBlue/boyan-ivanov-latency-the-1-metric-of-your-cloud

Andrija Panic shares some thoughts on joining the ShapeBlue team

Hi there, this is Andrija from… well, ShapeBlue! I’ve been working here for a month now and I thought that I’d share my views of working for the company.

Before I move to the actual topic, let me share just a little bit of background about myself.

Before joining ShapeBlue, I was working as a Cloud System Engineer for two different Swiss-based Public Cloud providers, both utilizing CloudStack to provide IaaS services for local (Swiss) and international customers – many of which (as you can probably guess) were serious financial institutions (Switzerland being considered a big privacy and security center). We even had customers connecting all the way from South America to their infrastructure for daily business, all managed by CloudStack – and it just worked flawlessly!

During my time with the Swiss guys, I had the pleasure (together with my colleagues) of leading and building their CloudStack infrastructure from scratch, and here I gained some serious knowledge and experience on the topic. I also had the opportunity to work with some nice storage solutions, from NetApp SolidFire distributed all-flash storage (providing block-level storage to CloudStack VMs), to the Cloudian HyperStore S3 object storage solution providing (as you can guess from its name…) S3 object storage with 100% native S3 API compatibility. Both solutions had their integration challenges in an existing environment, and I was lucky enough to pull the strings and lead the work myself. A really fun time! Did I mention CloudStack? Yes, we did quite a decent job there – a lot of tweaks and improvements, migrations and decent customer support.

But after 5 years with CloudStack in a service provider environment, it was time for me to move on and improve my cloud-building skills even more, so my next logical step was to grab Giles Sirett, ShapeBlue's CEO, for a quick coffee at the last CloudStack conference (I didn't even have to pay for the coffee – it was free!). The rest is pretty much history – I'm now paving my way into consultancy as a Cloud Architect at ShapeBlue.

After spending a month here at ShapeBlue, I can honestly say that I'm nothing short of impressed with both the people (colleagues) and the processes inside ShapeBlue. I was already used to the Swiss being strict and very well organized, but my feeling is that ShapeBlue has taken this to a whole new level. When I joined the company, besides having a dedicated colleague as a mentor (hi there Dag – thanks for all your help!) helping me find my way around the company, I also got proper training on the many different tools and processes used in the company – from internal infrastructure, to customer support tools, processes and SLAs, to many other things in general. In fact, this was a revelation compared to the old RTFM-it-yourself way (that stands for Read The [insert asterisks ***] Manual, in case you were wondering) that I'd experienced at previous companies. The people at ShapeBlue are supportive, and the working atmosphere is just great, with tons of seriousness across the board but with a healthy dose of (mainly) British humour in the middle of the hard work – to wake you up and warm you up during these cold winter days. From time to time we even get cats jumping out of our Slack channel.

After being mostly in technical leadership positions in my previous jobs, I'm now, for the first time in my professional career, part of a team with people more experienced than me – and I'm really happy about that. It's always nice to be able to get some help when you need advice, but individual initiative and engagement are strongly respected at ShapeBlue. One of the interesting things is that the people in the ShapeBlue leadership team actually listen to engineers and take their advice and opinions on board – something you don't necessarily find in every company. It's a very collaborative, not authoritative, environment – something that everybody here respects.

So far, I have been tasked with quite a few interesting things to work on: from delivering the famous ShapeBlue Bootcamp to one of our new colleagues, to playing around with some of the more interesting CloudStack setups (with different hypervisors), to being included in some customer projects and support work – all in all, a good start!

In case you are still following me, here come a few personal things about me:

I’m based in Belgrade, Serbia (for all you techies, that is 44.0165° N, 21.0059° E ) – a country known for good cuisine, but mostly for ćevapi and šljivovica (national drink). Serbia is also home to Novak Djokovic, the world No. 1 in men’s singles tennis (this is the guy who regularly beats Roger Federer, for the record!).

In my free time I hang out with my 3 princesses, and sometimes I manage to squeeze in some time for the gym, music or very light electronics projects.

Talk to you later, Andrija.

There was a definite feel of Christmas in the air in London as we made our way to last Thursday’s (December 13) winter meetup of the CloudStack European User Group (CSEUG), and that only increased as we arrived at the BT Centre near St. Paul’s and saw the big Christmas tree in reception!

A great turnout for this, the last meetup of 2018, and a great representation of the CloudStack community in Europe with people travelling from Germany, Serbia, Glasgow, Switzerland and Latvia to name but a few. After a quick lunch we took our seats, and Giles Sirett (chairman of the user group) welcomed everyone and got the event started with introductions and CloudStack news.

Firstly, Giles spoke about software updates and new releases. CloudStack 4.11 is an LTS (long term support) release and included more than 250 new capabilities and a big step towards zero downtime upgrades, 4.11.2 has just been released (including 71 fixes), 4.11.3 is coming soon and 4.12 is in planning. Giles then mentioned CloudStack events starting with the recent CloudStack Collaboration Conference in September (Montreal), and events for 2019 – the next CSEUG in March (London), and the next Collaboration Conference in September (Las Vegas). During Giles’ presentation, Maurice Nettisheim (Head of Cloud Compute for BT) took to the stage to say a few words about BT’s ongoing use of CloudStack in their IaaS platform and their continued support and involvement in the CloudStack community.

Giles’ slides contain much more information:

After Giles, Paul Angus gave us an update on ShapeBlue’s CloudStack Container Service (CCS), with a walkthrough of the recently released update. This update brings CCS bang up to date by running the latest version of Kubernetes (v1.11.3) on the latest version of Container Linux. CCS also now makes use of CloudStack’s new CA framework to automatically secure the Kubernetes environments it creates. Paul’s talks and slides are always packed with detail:

Olivier Lambert of XCP-ng & Xen Orchestra took the floor next to tell us about the current state of the project. For those who are not familiar with it, XCP-ng is an open-source, community-powered hypervisor based on Xen. Upgrading from XenServer is easy (keeping all VMs, settings, etc.), and XCP-ng is 100% API compatible, requires no license and has no feature restrictions.

Please take a look through Olivier’s slides for much more on this fascinating subject:

After a short break, we welcomed Ingo Jochim and Andre Walter (itelligence) with their talk entitled ‘How our cloud works’. They talked through full automation of all infrastructure components of their CloudStack cloud with Ansible, check_mk, LDAP and more, with all functionality available through a customer portal, and also covered how the setup scales for larger landscapes.
Ingo and Andre’s slides right here:

Next up was Adam Dagnall (Cloudian) with ‘Advanced S3 compatible storage integration in CloudStack’. To provide tighter integration between the S3 compatible object store and CloudStack, Cloudian has developed a connector to allow users and their applications to utilize the object store directly from within the CloudStack platform in a single sign-on manner with self-service provisioning. Additionally, CloudStack templates and snapshots are centrally stored within the object store and managed through the CloudStack service. The object store offers protection of these templates and snapshots across data centres using replication or erasure coding. Adam went into the feature-set in great detail, and his slides provide much more information:

Last talk of the day, and the honours fell to Andrija Panic (Hiag Data) with ‘CloudStack – 5 years in production’. Andrija shared real-world experience of designing, deploying and managing a CloudStack public cloud, explaining how high availability for the CloudStack management components was implemented, and discussing the different storage technologies and networking models used, as well as the challenges faced. Andrija also presented alternative approaches to deploying CloudStack in terms of regions / zones / pods, touched on physical networking, and finally looked at the different CloudStack guest networking models available (from Basic Zone / Shared Networks to all of the Advanced Zone networking models) and when to use each of them.
Andrija went into a lot of detail and I encourage you to look through his slides:

After Andrija had finished answering questions, Giles wrapped things up and we moved to a local pub, where I am pleased to say that conversation and collaboration continued into the night, with what rapidly became the unofficial ‘CloudStack Christmas Party’! Huge thanks to BT for providing a first-rate venue and lunch, and to all our speakers, who make these events so interesting and such a success.

The next CloudStack User Group meetup will be on Thursday, March 14, and will be hosted by our friends at Ticketmaster here in London. Please register here!

All the talks were recorded and will be made available shortly on the ShapeBlue YouTube channel.

Integration testing – what it is and why the SDLC needs it

What is integration testing? It is a type of testing where multiple components are combined and tested working together. There are different aspects of integration testing depending on the project and component scale, but usually it comes down to validating that different modules can work together and / or independently. This type of testing pulls you out of the tunnel vision you can develop while working on a complex task and gives feedback on how your work integrates with the rest of the system.

Integration testing in CloudStack

Integration testing in CloudStack is done using a Python-based testing framework called Marvin. Marvin offers an API client and a structured test class model for executing different scenarios. Each CloudStack test class focuses on a different piece of functionality and contains multiple test cases to cover its features. Within the /test/integration directory, tests are separated by severity into two sub-directories: smoke and component (https://github.com/apache/cloudstack). Smoke tests focus only on the main features and their most critical functionality, while component tests go deep into each feature and execute more detailed tests covering more corner cases.
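To give a feel for what these classes look like, below is a minimal sketch of a Marvin-style smoke test. It follows the common pattern used in the smoke suite, but the test data keys ("account", "service_offerings", "virtual_machine", "ostype") and the offering name are illustrative and can differ between CloudStack versions:

```python
# A minimal sketch of a Marvin smoke test - the shape matters more than the details.
from nose.plugins.attrib import attr
from marvin.cloudstackTestCase import cloudstackTestCase
from marvin.lib.base import Account, ServiceOffering, VirtualMachine
from marvin.lib.common import get_domain, get_zone, get_template
from marvin.lib.utils import cleanup_resources


class TestDeployVM(cloudstackTestCase):

    @classmethod
    def setUpClass(cls):
        # The Marvin runner wires up the API client and parsed test data for us.
        cls.testClient = super(TestDeployVM, cls).getClsTestClient()
        cls.apiclient = cls.testClient.getApiClient()
        cls.services = cls.testClient.getParsedTestDataConfig()
        cls.zone = get_zone(cls.apiclient, cls.testClient.getZoneForTests())
        cls.domain = get_domain(cls.apiclient)
        cls.template = get_template(cls.apiclient, cls.zone.id, cls.services["ostype"])
        cls.account = Account.create(cls.apiclient, cls.services["account"],
                                     domainid=cls.domain.id)
        cls.offering = ServiceOffering.create(cls.apiclient,
                                              cls.services["service_offerings"]["tiny"])
        cls._cleanup = [cls.account, cls.offering]

    @classmethod
    def tearDownClass(cls):
        cleanup_resources(cls.apiclient, cls._cleanup)

    @attr(tags=["advanced", "basic"], required_hardware="false")
    def test_01_deploy_and_destroy_vm(self):
        """Deploy a VM from a template, check it is running, then expunge it."""
        vm = VirtualMachine.create(self.apiclient, self.services["virtual_machine"],
                                   zoneid=self.zone.id, templateid=self.template.id,
                                   serviceofferingid=self.offering.id,
                                   accountid=self.account.name,
                                   domainid=self.account.domainid)
        self.assertEqual(vm.state, "Running")
        vm.delete(self.apiclient, expunge=True)
```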

What is the benefit of these tests?

Over the years, our so-called “Marvin tests” have proven to be really valuable for validating pull requests, release testing and other testing scenarios, saving hours of manual validation and testing. The framework is also mostly agnostic to hypervisor, storage and networking, meaning it can be executed against different types with roughly the same success rate. The Marvin test pack comes with a wide range of coverage for different hypervisor / plugin / network / storage specifics, and more.

Downside

Tests need maintenance – and lots of it. As the code base changes, the Marvin tests also need attention. Execution time is also worth mentioning: a single component test run usually takes about a day on average, while the best-performing KVM runs take about 8 hours. Marvin tests are usually quite complex and rely on multiple components working together: they normally create a network and deploy a VM into it, and then work through the scenario inside it. This is time consuming, and different hypervisors perform differently.

Marvin

The Marvin test library is built as part of the project and can be installed as a Python package. Once installed, it requires a running management server and a config file. The management server is the API endpoint, or test subject, against which all test scenarios are executed, and the config file contains all the required environment details (more info here: https://cwiki.apache.org/confluence/display/CLOUDSTACK/Marvin+-+Testing+with+Python#Marvin-TestingwithPython-Installation). Marvin comes with several utilities that can be used while writing a test (e.g., utilities for deploying a VM or creating a network), plus a large amount of test data, and more. It also uses the API documentation to auto-generate its API bindings, so whenever you add a new API and rebuild the Marvin package, an API reference is generated automatically and the new API becomes usable in tests.
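As a small illustration of those auto-generated bindings (a sketch, assuming the test client that the Marvin runner injects into each test case): every API command gets a generated Cmd class whose parameters are plain attributes, and the API client exposes a method of the same name.

```python
from marvin.cloudstackAPI import listHosts
from marvin.cloudstackTestCase import cloudstackTestCase


class TestListHosts(cloudstackTestCase):

    def setUp(self):
        # The Marvin runner provides self.testClient when the test is executed.
        self.apiclient = self.testClient.getApiClient()

    def test_list_routing_hosts(self):
        # listHostsCmd is generated from the API documentation; any new API added
        # to CloudStack gets the same treatment when Marvin is rebuilt.
        cmd = listHosts.listHostsCmd()
        cmd.type = "Routing"
        hosts = self.apiclient.listHosts(cmd) or []
        for host in hosts:
            self.debug("host %s is %s" % (host.name, host.state))
```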

What’s new with Marvin

It’s fair to say that not much has gone on in the /marvin directory itself over the last couple of releases, but a lot has been done in terms of maintenance and new tests. Most new features in the latest releases of CloudStack come with a few Marvin tests to cover them. There were also great initiatives around the 4.9 and 4.11 releases to fix the smoke tests and make them healthier for the future. There have been 300+ commits in the /test directory since the start of 4.9.

It has always been hard to gather results for a particular code change quickly enough, which is why a new test attribute called ‘quick-test’ was introduced. It aims to deliver quick results to the developer and help determine whether their code is good enough to continue with, or whether further testing is required. The code changes can be found here: https://github.com/apache/cloudstack/pull/2209. Within the same PR, there is further segmentation that goes through all the files under /test/integration/ and adds categories to each file. For example, if you want to test VM deployment, you can simply execute the ‘deploy-vm’ label and the runner will go through each file and pick up the tests carrying that attribute. This allows users to do further regression testing in combination with other components being tested at the same time.
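As an illustration, tagging a test for this kind of selection is just a matter of adding attributes to it. The ‘quick-test’ and ‘deploy-vm’ labels below follow the ones described above, and the exact selection syntax (e.g. nose’s attrib plugin with "-a tags=deploy-vm") may vary with how your test run is driven:

```python
from nose.plugins.attrib import attr
from marvin.cloudstackTestCase import cloudstackTestCase


class TestQuickDeployVM(cloudstackTestCase):

    # Tests carrying these attributes can be picked up by attribute selection,
    # e.g. to run only the fast 'deploy-vm' checks against a code change.
    @attr(tags=["quick-test", "deploy-vm"], required_hardware="false")
    def test_quick_deploy_vm(self):
        # Body elided - deploy a small VM and assert it reaches the Running
        # state, as in the smoke test sketch earlier in this post.
        pass
```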

About the author

Boris Stoyanov is a Software Engineer in testing at ShapeBlue, the Cloud Specialists. Bobby spends his time testing features for the Apache CloudStack community and for ShapeBlue clients.

Thursday, September 13 saw us back at the Early Excellence Centre, Canada Water, for the (late) summer meetup of the CloudStack European User Group. As usual, a great turnout and a great representation of the community across Europe, with attendees travelling from Germany, Switzerland, Bulgaria, Latvia and Poland, and from further afield in Ukraine. There were even a few of us there from the UK!

After we’d caught up with old friends and greeted new ones, we had a bite to eat and took our seats for the talks. Giles Sirett (ShapeBlue CEO and chairman of the CloudStack European User Group) was first up, starting with introductions, a run through the day’s agenda, and CloudStack news – and these past few months have seen lots of activity and development, including the release of the latest LTS branch of CloudStack (4.11), with 4.11.2 due soon. CloudStack 4.11 included more than 250 new capabilities, such as the new host HA framework and Prometheus integration, whilst the 4.11.1 release brought us a step closer to ‘near zero downtime upgrades’ with a major refactor of the virtual router. Speaking of activity – approximately 800 downloads of CloudStack per month (over the last 6 months) show continued strong adoption of the technology.

Giles then looked to the future, talking through upcoming events… and we were in Montreal for the CloudStack Collaboration Conference just last week! It was a fabulous event in a great city, and please see my blog for a roundup and some more information. Of course Giles also mentioned our next user group meetup – London, December 13, hosted by our friends at BT (London). Giles finished up with a call for users of CloudStack to talk more about it. For more information on that, and everything Giles talked about, here are his slides:

Giles then introduced our first featured speaker of the day – Paul Angus (VP Technology at ShapeBlue), with his talk: Backup & Recovery in CloudStack. As Paul explained, CloudStack users currently only have snapshots as a form of VM backup. With the Backup and Recovery framework, end users will be presented with the features and functions they have come to expect outside of ‘the cloud’, while cloud providers will be able to leverage the advantages of using enterprise backup and recovery products. In his talk, Paul explained some of the capabilities of the forthcoming framework and the user experience, and demonstrated the Veeam plugin working with it. This is a highly anticipated feature, and Paul’s slides are a treasure trove of information and detail:

Following Paul, Dag Sonstebo took control of the laser pointer. Dag is a Cloud Architect here at ShapeBlue and had chosen as his topic the CloudStack usage service. Dag started by explaining that the usage service is used to track consumption of resources in Apache CloudStack for reporting and billing purposes, before giving an overview of how the service is installed and configured. Dag then dived deeper into how data is processed from the database into the different usage types (VMs, network usage, storage, etc.), before being aggregated into billable units or time slices in the usage database.

The talk included several examples on how to query and report on this usage data, and looked at general maintenance and troubleshooting of the service. This really was a deep dive, as evidenced by Dag’s extensive slides:
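To make the reporting side a little more concrete, here is a minimal, hedged sketch of pulling raw usage records over the CloudStack API (listUsageRecords) with the standard HMAC-SHA1 request signing. The endpoint, keys and dates are placeholders, the usage service needs to be enabled and to have run, and in practice you would more likely aggregate this data or use an existing client library:

```python
import base64
import hashlib
import hmac
import urllib.parse

import requests

# Placeholders - substitute your own management server URL and API credentials.
ENDPOINT = "https://cloud.example.com/client/api"
API_KEY = "your-api-key"
SECRET_KEY = "your-secret-key"


def api_call(command, **params):
    """Sign a CloudStack API call with HMAC-SHA1 and return the JSON response."""
    params.update({"command": command, "apikey": API_KEY, "response": "json"})
    # The signature is computed over the sorted, URL-encoded, lower-cased query string.
    query = "&".join(
        "%s=%s" % (k.lower(), urllib.parse.quote(str(params[k]), safe="*").lower())
        for k in sorted(params)
    )
    digest = hmac.new(SECRET_KEY.encode(), query.encode(), hashlib.sha1).digest()
    params["signature"] = base64.b64encode(digest).decode()
    return requests.get(ENDPOINT, params=params).json()


# Pull raw usage records for a billing period; usage type 1 is running-VM time
# in the standard CloudStack usage type numbering.
response = api_call("listUsageRecords",
                    startdate="2018-09-01", enddate="2018-09-30", type=1)
for record in response["listusagerecordsresponse"].get("usagerecord", []):
    print(record["description"], record["rawusage"])
```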

After a brief interlude to grab coffee and some fresh air, next up was Olivier Lambert, the creator of Xen Orchestra and XCP-ng. Starting by talking about Citrix XenServer, Olivier explained why he developed an alternative that is truly open-source. He talked us through Xen Orchestra, before moving onto XCP-ng – a fork of XenServer removing all restrictions that were put in place with the free Citrix version. This is an exciting project, already proven and widely adopted. Olivier and his team continue to develop new functionality with a fast-growing community and have an exciting roadmap in place for future development. Olivier’s slides from his presentation are right here:

After Olivier we welcomed Vladimir Melnik to the podium (all the way from Ukraine – I think the person who travelled the furthest). Vladimir is a co-founder of Tucha, the first IaaS provider in Ukraine, and his talk was ‘Building a redundant CloudStack management cluster’. Starting with a brief history of Tucha, Vlad covered building and maintaining an open-source-driven clustered environment for the Apache CloudStack management server with GNU/Linux, HAProxy, Heartbeat, BIND, OpenLDAP and other tools. Vladimir’s slides are both entertaining and very interesting:

The honour of the last talk of the day fell to Boyan Ivanov of StorPool, who provided advice on building software-defined clouds. Boyan posed the question ‘why software-defined?’ and went on to answer it quite comprehensively! Infrastructure is becoming more and more ‘software defined’, and Boyan illustrated how this should mean increased profitability, putting forward the business case for a software-defined stack. Boyan was also good enough to provide several tactical tips and a free reference design!

Take a look through Boyan’s slides:

Once Boyan had finished taking questions, we all headed out to the nearest hostelry, and conversation continued into the night. A nice touch (indicative of the great CloudStack community) was that when it was time to say goodbye most people said ‘see you in Montreal’!

Thanks to Early Excellence for providing a first-class venue and refreshments, and huge thanks to the day’s speakers – Paul, Dag, Olivier, Vlad and Boyan, all of whom were good enough to donate their time, and in most cases travel great distances to share their expertise.

The next meetup of the CloudStack European User Group will be in London, on Thursday, December 13 and you can register here. We are always looking out for speakers with interesting and relevant subjects, and if you are interested in talking, please contact us.

All talks were recorded in full and can be found on our ShapeBlue YouTube channel:

Giles Sirett: https://youtu.be/Ls_HakbyxUU

Paul Angus: https://youtu.be/ZVThUKPeC_w

Dag Sonstebo: https://youtu.be/I5I7eduWHRQ

Olivier Lambert: https://youtu.be/KWBCKvwvnUc

Vladimir Melnik: https://youtu.be/aBNMysDoi5w

Boyan Ivanov: https://youtu.be/wt4pqTZ57OY

Thanks for reading, and I hope to see you at the next event!


We’re here in Montreal for the CloudStack Collaboration Conference, and it’s been a fantastic event with more to come! We’ve had two full days of back-to-back talks over two tracks, with subjects ranging from storage, billing and diagnostics through to containers, automation and monitoring… and everything in between. Mike Tutkowski (CloudStack VP) set the tone with his keynote at the beginning of the first day, asking the question ‘why are we here?’ The answer? To learn, work together, share ideas and share problems. These fundamentals are what make for a great community, and what makes Apache CloudStack such a great product. We have never really known just how widely adopted CloudStack is, so we have (for the first time) undertaken some in-depth analysis, which Mike shared. In the last 12 months CloudStack management server packages were downloaded 116,796 times from 21,202 different IP addresses. We think this means that worldwide there are about 20,000 CloudStack clouds in production! Mike also mentioned several organisations that have recently adopted CloudStack, including Ticketmaster, from whom we saw a talk illustrating how they deployed their global cloud environment using Apache CloudStack.

The CloudStack community is full of smart, committed, talented people who are passionate about what they do, and this is clear from the quality and delivery of the talks, and the collaboration before and after. They aren’t just repeating facts or reading what has been written for them – they are talking from first-hand experience, often about features and functionality they have personally developed and committed to the project. Thanks to the community, CloudStack is constantly being improved and developed by these real-world users and operators.

So we’re into day three, which means no more CloudStack talks. However, as I said, the event is far from finished. Today (Wednesday) we have an all-day hackathon – a room full of people working together on shared goals, with the sole purpose of talking, sharing new ideas, and making CloudStack even better!

Every time I attend a CloudStack conference, I am privileged to spend time with a community who genuinely enjoy what they do, and I come away having made new friends, and having learnt something new. I am already excited about next year’s event, and seeing some of our new friends in London at our next CloudStack meetup (December 13).

Sincere thanks to the Apache Software Foundation (our conference co-locates every year with ApacheCon). It’s always a well-organised and well-attended event, and we are delighted to be associated with it. Thanks also to the city of Montreal – a beautiful city which I hope to visit again soon.

All the CloudStack talks were recorded and will be published to Apache.org and our YouTube channel very soon.