
Integration Testing within CloudStack – Marvin


Integration testing – What it is and why SDLC needs it.

What is integration testing? It is a type of testing in which multiple components are combined and tested working together. The focus varies with the scale of the project and its components, but it usually comes down to validating that different modules can work together and / or independently. This type of testing pulls a developer out of the tunnel vision that can set in while working on a complex task, and gives feedback on how the work integrates with the rest of the system.

Integration testing in CloudStack

Integration testing in CloudStack is done using Marvin, a Python-based testing framework. Marvin offers an API client and a structured test class model for executing different scenarios. Each CloudStack test class focuses on a different piece of functionality and contains multiple test cases to cover its features. Within the /test/integration directory (https://github.com/apache/cloudstack), tests are separated by severity into two sub-directories: smoke and component. Smoke tests focus only on the main features and their most critical functionality, while component tests go deep into each feature and execute more detailed tests covering more corner cases.
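As a rough illustration of that structure, here is a simplified, self-contained Python sketch of the shape a Marvin test class takes. Real tests subclass cloudstackTestCase and tag methods with nose's attr decorator; the minimal stand-in decorator and the TestDeployVM class below are illustrative only.

```python
import unittest

def attr(*args, **kwargs):
    """Minimal stand-in for nose.plugins.attrib.attr, which Marvin
    tests use to tag test methods (e.g. tags=["advanced", "smoke"])."""
    def decorate(func):
        func.__dict__.update(kwargs)
        for arg in args:
            func.__dict__[arg] = True
        return func
    return decorate

class TestDeployVM(unittest.TestCase):
    """Shape of a typical Marvin test class (simplified: real classes
    subclass cloudstackTestCase and talk to the API client)."""

    @classmethod
    def setUpClass(cls):
        # Real tests create accounts, networks and other resources here.
        cls.resources = ["account", "network"]

    @attr(tags=["advanced", "smoke"], required_hardware="false")
    def test_deploy_vm(self):
        # Real tests deploy a VM via the API client and verify its state.
        self.assertIn("network", self.resources)

def tags_of(test_func):
    """Read the tags a test method was annotated with."""
    return getattr(test_func, "tags", [])
```

The tags are what let a test runner pick out, say, only the smoke tests from a whole directory of test files.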

What is the benefit of these tests?

Over the years, our so-called “Marvin tests” have proven really valuable for validating pull requests, release testing and other testing scenarios, saving hours of manual validation and testing. The framework is also largely agnostic to hypervisor, storage and networking, meaning it can be executed against different types with roughly the same success rate. The Marvin test pack comes with a wide range of coverage for different hypervisor, plugin, network and storage specifics.

Downside

Tests need maintenance – and lots of it. As the code base changes, the Marvin tests also need attention. Execution time is also worth mentioning: a single component test run takes about a day on average, while the best performing KVM runs take about 8 hours. Marvin tests are usually very complex and rely on multiple components working together. They normally create a network and deploy a VM into it, within which they work through the scenario. This is time consuming, and different hypervisors perform differently.

Marvin

The Marvin test library is part of the CloudStack project and can be installed as a Python package. Once installed, it requires a running management server and a config file. The management server is the API endpoint, or test subject, against which all test scenarios are executed, and the config file contains all the required environment details (more info here: https://cwiki.apache.org/confluence/display/CLOUDSTACK/Marvin+-+Testing+with+Python#Marvin-TestingwithPython-Installation). Marvin comes with several utilities that can be used while writing a test (e.g., utilities for deploying a VM or creating a network), plus a large amount of test data, and more. It also uses the API documents to auto-generate its API references, so when you create a new API and build the Marvin package, it automatically creates an API reference and the new API becomes usable.
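For illustration, the sketch below builds the kind of environment config file Marvin reads. The section names follow the demo cfg files shipped with CloudStack (mgtSvr, dbSvr, zones, logger), but treat the exact keys and values as placeholders rather than an authoritative schema.

```python
import json

# Illustrative Marvin-style environment config; values are placeholders.
config = {
    "mgtSvr": [{"mgtSvrIp": "10.0.0.2", "port": 8096,
                "user": "root", "passwd": "password"}],
    "dbSvr": {"dbSvr": "10.0.0.2", "port": 3306,
              "user": "cloud", "passwd": "cloud", "db": "cloud"},
    "zones": [],   # zone/pod/cluster/host layout for the environment
    "logger": {"LogFolderPath": "/tmp/marvin-logs"},
}

def missing_sections(cfg):
    """Return the sections a Marvin run minimally needs but which are absent."""
    return [k for k in ("mgtSvr", "dbSvr", "zones") if k not in cfg]

# The config file on disk is plain JSON:
serialized = json.dumps(config, indent=2)
```

A quick sanity check like missing_sections saves a failed run caused by a config file that never pointed at a management server in the first place.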

What’s new with Marvin

It’s fair to say that not much has gone on in the /marvin directory over the last couple of releases, but there’s a lot being done in terms of maintenance and new tests. Most new features in the latest releases of CloudStack come with a few Marvin tests to cover them. There were also great initiatives around the 4.9 and 4.11 releases to fix the smoke tests and make them healthier for the future. There are 300+ commits in the /test directory since the start of 4.9.

It has always been time consuming to gather results for a code change quickly enough, which is why a new test attribute called ‘quick-test’ was introduced. It aims to deliver quick results to the developer and help determine whether their code is good enough to continue with, or whether further testing is required. The code changes can be found here: https://github.com/apache/cloudstack/pull/2209. Within the same PR, there’s further segmentation that goes through all the files under /test/integration/ and adds categories to each file. For example, if you want to test deployment of VMs, you can just execute the label ‘deploy-vm’ and it will go through each file and search for tests with the same attribute. This allows users to do further regression testing in combination with other components being tested at the same time.
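Conceptually, attribute-based selection works like the sketch below: the runner walks the test functions and keeps only those whose tag list contains the requested label. The decorator and test names here are hypothetical; the real tests use nose's attr decorator and are selected with the -a option.

```python
def tagged(*tags):
    """Illustrative stand-in for tagging a test with attribute labels."""
    def deco(f):
        f.tags = list(tags)
        return f
    return deco

def select_by_tag(test_funcs, label):
    """Keep only the tests carrying the requested label."""
    return [f.__name__ for f in test_funcs if label in getattr(f, "tags", [])]

@tagged("quick-test", "deploy-vm")
def test_deploy_vm_basic(): pass

@tagged("deploy-vm")
def test_deploy_vm_with_volume(): pass

@tagged("network")
def test_create_network(): pass

suite = [test_deploy_vm_basic, test_deploy_vm_with_volume, test_create_network]
```

Against the real test tree, running nosetests with -a tags=quick-test achieves the same effect.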

About the author

Boris Stoyanov is a Software Engineer in Testing at ShapeBlue, the Cloud Specialists. Bobby spends his time testing features for the Apache CloudStack community and for ShapeBlue clients.

CloudStack Test Automation with Trillian and Jenkins


In the previous post, we introduced and described Trillian, which can build various environments in which we can deploy a CloudStack zone and run Marvin-based integration tests. In this post, we’ll describe how we are using Jenkins and Trillian to test CloudStack builds in various environments.

Build Pipeline


Our build pipeline, shown in the attached flow diagram, consists of the following:

  • The CloudStack git repository.
  • A Jenkins job for building CloudStack deb/rpm packages for Ubuntu 14.04, CentOS6 and CentOS7. Another Jenkins job for building CloudStack systemvm templates.
  • A staging packages repository server for hosting the deb/rpm packages and the systemvm templates.
  • A Jenkins job which uses Trillian to deploy a configurable integration environment, as a project on a parent CloudStack environment. This currently supports environments based on one of the three most popular hypervisors: KVM (CentOS6, CentOS7 and Ubuntu 14.04 based hosts), XenServer (6.2 SP1 and 6.5 SP1) and VMware (5.5 U3 and 6.0 U1).
  • A semi-automated Jenkins job to trigger testing of the build.
  • Hipchat notification integration to receive updates from various Jenkins jobs.

Using Trillian with Jenkins

Trillian nests CloudStack environments in a parent CloudStack deployment. Within the parent CloudStack environment, it creates a project and deploys the following VMs into it:

  • A management server VM running the CloudStack management server, the usage server and the MySQL server
  • One or more VMs running nested hypervisors
  • (Optional) A Marvin VM to execute integration tests

Shared NFS storage is used to host primary and secondary storage pools at different paths for the CloudStack environment being tested.

When a build is kicked off, the Jenkins job pulls the latest Trillian from the Trillian git repository. It then applies a cloudstack.ini file in the Ansible directory that contains the URL of the parent CloudStack management server, along with the API key and secret key of the parent CloudStack environment. Next, the Ansible/group_vars/all file is patched for the specific parent CloudStack environment. This file contains configuration specific to the parent environment, such as the default credentials for the various hypervisor hosts, the MySQL server, the VM root password, networking information, shared NFS storage information, the base CloudStack repository URL, URLs for systemvm templates, URLs for various other VM templates, the Hipchat notification token, and various global settings for the management server being deployed.
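As a sketch of the first step: the cloudstack.ini read by the Ansible CloudStack modules (via the underlying 'cs' Python library) is a simple INI file with a [cloudstack] section holding the endpoint and the API/secret keys. The helper and the values below are placeholders, not our actual configuration.

```python
import configparser
import io

def render_cloudstack_ini(endpoint, key, secret):
    """Render a minimal cloudstack.ini with the parent cloud's details."""
    ini = configparser.ConfigParser()
    ini["cloudstack"] = {"endpoint": endpoint, "key": key, "secret": secret}
    buf = io.StringIO()
    ini.write(buf)
    return buf.getvalue()

# Placeholder parent-cloud details for illustration.
ini_text = render_cloudstack_ini(
    "http://parent-mgmt.example.com:8080/client/api",
    "PARENT_API_KEY", "PARENT_SECRET_KEY")
```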


Our Trillian Jenkins job is a parameterized job that accepts some of the above options before the build job is started. These options are used first to generate the Trillian host config, which is then used to deploy the environment by running an Ansible playbook with suitable options, for example:

$ ansible-playbook generate-cloudconfig.yml --extra-vars "env_name=$CONFIG_NAME env_version=$CLOUDSTACK_VERSION mgmt=$MS_COUNT db=0 $HYPERVISOR_OPTS hv=$HYPERVISOR_COUNT mgmt_os=$MS_OS env_accounts=all pri=$PRIMARY_STORAGE_COUNT sec=$SEC_STORAGE_COUNT baseurl_cloudstack=$CLOUDSTACK_REPO wait_till_setup=yes build_marvin=yes" -i localhost

$ ansible-playbook deployvms.yml -i ./hosts_$CONFIG_NAME --extra-vars "env_name=$CONFIG_NAME env_version=$CLOUDSTACK_VERSION"

We also have a Trillian-destroy Jenkins job that takes in the Trillian environment ID and runs an Ansible playbook to destroy the deployed VMs and related resources.

Testing CloudStack with Trillian

To make testing easier for CI systems such as Trillian and Bubble, we have introduced two new CloudStack packages: cloudstack-marvin and cloudstack-integration-tests. By building Marvin and the integration tests as deb/rpm packages, we make it easier for CI systems to install Marvin and run the integration tests specific to a given git tag/SHA or GitHub pull request.

The Trillian Jenkins job allows us to deploy a test environment with various permutations and combinations of compute, storage and network technologies, from a specific CloudStack repository built from a specific git tag or SHA. The build pipeline described in the previous section allows us to (1) start the build and test process by providing a git tag or SHA, which (2) triggers building of the CloudStack repository, which (3) on completion triggers deployment of a set of preset Trillian environments, which (4) on deployment starts a Jenkins job that runs the integration tests and gathers the reports.

 

About the author

Rohit Yadav is a Software Architect at ShapeBlue and an Apache CloudStack committer and PMC member. He has been contributing to CloudStack since 2012; some of the recent features he has developed for CloudStack include the metrics view, out-of-band management and dynamic roles. He is also the author and maintainer of CloudMonkey.

Trillian: Flexible, On-Demand Cloud Environment Creation


Marvin: “I think you ought to know I’m feeling very depressed.”
Trillian: “Well, we have something that may take your mind off it.”
Marvin: “It won’t work, I have an exceptionally large mind.“

Trillian was born from our need to create environments against which we could run CloudStack’s Marvin test framework, but the variety of uses for a versatile tool that creates cloud environments on demand quickly became clear. We have used nested virtualisation to hand-craft cloud environments for quite a while now; however, we needed automation around it so that environments could be created quickly, easily and consistently. Taking inspiration from the Hitchhiker’s Guide To The Galaxy, we grabbed our towels and started work.

We started with a list of high level use cases:

  • Test new feature software builds (manually and via Marvin)
  • Test community releases (manually and via Marvin)
  • Replicate failure scenarios for support clients
  • Evaluate new features
  • Evaluate complementary technologies

 

On top of this we had a number of other requirements:

  • Environments should be as close to production as possible
  • Support as many hypervisors as possible
  • Support as many CloudStack environment permutations as possible
  • Support ‘multi-tenancy’ such that a number of clouds can be created and/or running at the same time
  • Easy to connect to external integration points such as SolidFire storage, NetScalers, Cloudian S3 installations, etc.
  • ‘Hard code’ as little as possible
  • Make it portable/replicable so that we could share it
  • Minimise the number of tools/technologies used
  • Enable CI/CD integration (particularly with Jenkins)
  • Reasonably quick deployment times

 

So Trillian became a tool to build realistic, fully functioning CloudStack cloud environments from a simple command line statement, while still fulfilling the further requirements which we’d set out.

Now that Trillian is at a V1.0 stage, we’d like to share our work for anyone to use, and/or contribute to.

What we used

For an underlying hypervisor that could reliably support nested virtualisation, KVM and ESXi were the standout choices. Our development team’s go-to hypervisor is KVM, but after a certain amount of kernel hacking, the number of workarounds we needed to employ made us look to my personal favourite again: ESXi. Using a few tricks I have come across in the past, I knew that we could relatively simply create fully functioning nested clouds.

So our next question was orchestration. Well, we have the world’s best cloud orchestration platform at our fingertips (ahem) – CloudStack. But we still needed to drive it with something and there was still a lot of configuration to do. The flexibility that we required ruled out creating templates for every possible hypervisor and mgmt VM that we might need, so we went to another personal favourite – Ansible.

Ansible: The King of Config Management and Automation

Ansible allowed us to keep the number of tools down to pretty much one. We considered additional tools such as Packer, Terraform etc., but we felt we were just adding more and more layers and dependencies when we could simply use Ansible.

Our great friend Rene Moser added CloudStack modules into Ansible 2.0 and 2.1, giving us the ability to create individual projects to put each nested cloud in, and to create the nested cloud VMs directly from Ansible. With the updates and fixes in Ansible 2.1 we are able to configure vSphere environments on-the-fly as well. Simple SSH connections to XenServer and KVM hosts put them at our mercy, ditto the CloudStack management hosts, MySQL hosts, and Marvin hosts which all run on CentOS or Ubuntu.

Once we could create and configure every individual component it became a matter of stitching it all together in a way that remained flexible.

How it works

Our setup is split into two parts. First is our generate-cloudconfig play. This takes a set of Ansible extra-vars, checks that they look valid, and then generates a hosts file and a group_vars file that describe the environment you require. For very complicated environment architectures, these files can be changed manually to reflect components or architectures which cannot be described on a simple command line. An example instantiation of the play might be:
ansible-playbook generate-cloudconfig.yml -i localhost --extra-vars "env_name=cs49-vmw55-pga env_version=cs49 mgmt_os=6 hvtype=v vmware_ver=55u3 hv=2 pri=2 env_accounts=all"
This example creates a project called “cs49-vmw55-pga” based on CloudStack 4.9, with a CentOS 6(.8) mgmt server and 2 ESXi 5.5u3 hypervisor hosts. The cluster will have 2 primary storage pools, and all accounts will have permission to access the project. Trillian understands that a vSphere environment also requires a vCenter server, and adds that to the inventory.

A global variable file holds the mappings of CloudStack versions, host hypervisor types/versions and system VM URLs, abstracting many variables away from the user. However, EVERY variable can be overridden in the extra-vars. For instance, we can specify a specific repo to build the management server from using baseurl_cloudstack=http://10.2.0.4/shapeblue/cloudstack/testing/, or sec=2 to create 2 secondary storage pools.
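The precedence described above can be sketched as a simple merge: the global variable file supplies defaults, and anything passed as extra-vars wins. The variable names below mirror the examples in this post, but the merge logic is only illustrative, since Ansible implements its own variable precedence.

```python
# Defaults that the global variable file would supply (illustrative values).
DEFAULTS = {
    "sec": 1,                                           # secondary storage pools
    "pri": 1,                                           # primary storage pools
    "baseurl_cloudstack": "http://repo.example.com/cloudstack/",
}

def resolve_vars(extra_vars):
    """Merge command-line extra-vars over the global defaults."""
    merged = dict(DEFAULTS)
    merged.update(extra_vars)   # extra-vars win over defaults
    return merged
```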

Trillian’s Intelligence

There is one aspect that we haven’t covered here, and that’s the creation of a zone using these components that actually works, particularly when multiple clouds are in existence at the same time.

One way around this is to encapsulate (or nest) the nested clouds so that they can each use the same IP space without tripping over each other. However, this causes a massive performance hit and complicates deployment somewhat. The main problem for us with that approach, though, is that we need the environments to appear as close to a production cloud as possible, which includes ‘direct access’ to the public and management networks.

So we used two techniques. The first was to create a shared network on VLAN 4095 and have the parent CloudStack present that to the nested hypervisors for guest and public traffic. VLAN 4095 causes ESXi to trunk all VLANs, allowing us to pass guest traffic between nested guest VMs even if they’re on different physical hosts. The second was to create a (MySQL) database of VLAN ranges and management/public IP address ranges. The IP address ranges for system VMs share a common gateway but, crucially, do not overlap. When creating a new environment, we request a range of guest VLANs and IP address ranges for the public and management networks, and are returned unused ranges, ensuring that all cloud environments can co-exist. Once an environment has served its purpose, the play that removes the VMs also marks the used ranges as available again in the database.
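The booking logic can be sketched as follows. This is an illustrative in-memory stand-in for the MySQL table: allocation hands out an unused range and records its owner, and release frees it for reuse.

```python
class RangeAllocator:
    """Illustrative stand-in for the MySQL table of VLAN/IP ranges."""

    def __init__(self, ranges):
        # Each range is e.g. (1000, 1049), meaning guest VLANs 1000-1049.
        self.ranges = {r: None for r in ranges}   # range -> env name, or None if free

    def allocate(self, env_name):
        """Hand out the first unused range and mark it as taken."""
        for r, owner in self.ranges.items():
            if owner is None:
                self.ranges[r] = env_name
                return r
        raise RuntimeError("no free ranges")

    def release(self, env_name):
        """Mark all ranges booked by an environment as available again."""
        for r, owner in self.ranges.items():
            if owner == env_name:
                self.ranges[r] = None

alloc = RangeAllocator([(1000, 1049), (1050, 1099)])
```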

The Heavy Lifting

Once we’ve created the environment configuration files and been assigned a guest VLAN range and public and management IP ranges we now can build and configure everything.

First we build all of the VMs. These can include mgmt server(s), dedicated MySQL server(s) (the default is to run MySQL on the primary mgmt server), KVM hosts, XenServer hosts, ESXi hosts, a vCenter host and/or a Marvin host.

Next we install CloudStack and MySQL, and then either install KVM and the CloudStack agent on the KVM hosts, configure the XenServer hosts and create a pool, or configure the ESXi hosts and add them to the vCenter as a cluster.

Next we create the primary and secondary storage pools and seed the relevant system VM template files.

The final step is to create the zone on the mgmt server. This is done using an Ansible template of a CloudMonkey script, which takes all of the environment variables and produces a ready-to-run script, which Ansible then kindly runs. The Ansible template allows loops and conditionals, so it can deal with any number of hosts or storage pools.
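The templating step amounts to a loop over the inventory, as in this sketch. The command strings are abbreviated and illustrative rather than exact CloudMonkey syntax.

```python
def render_zone_script(hosts, primary_pools):
    """Emit one command line per host and per primary storage pool,
    mimicking what the Ansible template's loops do."""
    lines = ["create zone name=trillian-zone"]
    for h in hosts:
        lines.append("add host url=http://%s" % h)
    for p in primary_pools:
        lines.append("create storagepool url=nfs://%s" % p)
    return "\n".join(lines)
```

Because the script is generated from the inventory, adding a third hypervisor host to the environment simply produces one more line, with no change to the template itself.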

The result is a CloudStack environment which has running system VMs and is happily downloading the default templates.

For test and development purposes we have the additional arguments build_marvin=yes and wait_till_setup=yes. The former builds a Marvin host and generates the Marvin cfg file based on the deployed environment, while wait_till_setup polls for the system VMs to be in an ‘Up’ state and only returns when the environment is fully ready.
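The wait_till_setup behaviour can be sketched as a simple polling loop. get_states below stands in for the real listSystemVms API call and is an assumption for illustration.

```python
import time

def wait_till_setup(get_states, timeout=600, interval=5):
    """Poll the system VM states until all report 'Up' or the timeout
    expires. get_states is a stand-in for querying the management server."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        states = get_states()
        if states and all(s == "Up" for s in states):
            return True
        time.sleep(interval)
    return False
```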

Tips and Tricks

There were a number of tweaks we made here and there, and we’ll be updating our documentation to reflect them, in order to make the journey as easy as possible for anyone else who would like to use this. As a taster, we:

  • Enabled promiscuous mode and forged transmits on the parent hosts to allow traffic to and from nested guest VMs, then added VMware Labs’ dvfilter on the nested hosts to protect network performance
  • Created a GeneraliseXenServer script to allow us to clone XenServers
  • Created a GeneraliseESXi script to allow us to clone ESXi hosts
  • Enabled vmware.nested.virtualization in the parent CloudStack
  • Found that VMXNET3 vNICs are great, except for nested hosts, where they do weird and wonderful things. The E1000 is slower, but reliable.

Summary

So, Trillian gives us an extremely flexible way to quickly build cloud environments for a multitude of purposes. We’re at version 1.0, and it’s now at a quality where we’re ready to open-source it and share it with anyone who can make use of it.

We still have more plans, including Hyper-V support, and we’re working on our documentation and on making it as simple as we can to create the parent CloudStack configuration and templates.

If you’d like to take a look, download it, or submit a pull request, please go here: https://github.com/shapeblue/Trillian

Acknowledgements

Trillian was initially developed by Paul Angus, Dag Sonstebo and Glenn Wagner.

About The Author

Paul Angus is VP Technology & Cloud Architect at ShapeBlue, The Cloud Specialists. He has designed and implemented numerous CloudStack environments for customers across four continents, based on Apache CloudStack.

Some say that when not building Clouds, Paul likes to create Ansible playbooks that build clouds. And that he’s actually read A Brief History of Time.