
Aside from traditional storage solutions, CloudStack has supported managed storage for some time. In this article, we will touch on SolidFire support in CloudStack 4.13 and lay out the exact steps needed to add SolidFire to CloudStack as Primary Storage (for VMware, KVM and XenServer). We will also explain the difference between the “SolidFire” and “SolidFireShared” plugins and discuss their use cases.

There will be a follow-up article covering the different feature sets that each hypervisor offers when using SolidFire as Primary Storage, and it will also examine how things work under the hood.

SolidFire 101

SolidFire has been around for many years, and the fact that it was acquired by NetApp (in early 2016) speaks for itself. SolidFire is an iSCSI-based, all-flash, distributed SAN solution, providing granular QoS on a per-LUN basis. A minimal cluster consists of 4 nodes, and newer generations of SolidFire models are able to provide 100,000 IOPS per single node. That means up to 400,000 IOPS per 4-node SolidFire cluster in just 4U of rack space (all IOPS figures assume a 4K IO size).

Different models of SolidFire nodes are available – currently 3 models (all with 100,000 IOPS per node). The differences between models are the size of SSDs and the amount of system memory / read cache. For more info on the different node models available, please visit https://www.netapp.com/us/products/storage-systems/all-flash-array/solidfire-scale-out.aspx.

Importantly, SolidFire supports mixing and matching nodes. So – if you are short on space, you can add bigger nodes to your cluster, whilst if you are short of IOPS, you can expand your cluster with smaller nodes. As a distributed SAN it has the advantage of being able to scale very well.

A great aspect of SolidFire is its granular, per-volume QoS. For each volume (LUN) created on the cluster, you can set its minimum, maximum and burst IOPS values / limits. Let’s briefly explain this:

  • Min IOPS: defines the guaranteed IOPS performance under normal conditions and in most failure / expansion scenarios. This means that a dead SSD or node, or expanding the cluster with additional nodes (while data is being redistributed), will not affect a client’s IOPS, as the iSCSI client will always be able to reach the volume’s Min IOPS.
  • Max IOPS: defines the maximum sustained IOPS performance for a volume. For example, if a client is benchmarking the volume, the sustained IOPS it measures will level off at the volume’s Max IOPS.
  • Burst IOPS: defines the allowed burst IOPS performance for a volume / LUN. This is very useful for VM reboots, DB backups and similar scenarios which require short IO bursts. A volume accrues 1 second of burst credit (up to a maximum of 60 seconds) for every second that the volume runs below its Max IOPS limit.
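
To make the burst-credit rule concrete, here is a minimal Python sketch. It models only what the bullet above states (one second of credit per second spent below Max IOPS, capped at 60 seconds) plus two assumptions of ours: each second spent above Max IOPS spends one second of credit, and the cluster always has spare IOPS. It is an illustration, not SolidFire’s actual QoS scheduler, and simulate_burst is a hypothetical helper.

```python
from typing import List

MAX_BURST_CREDIT_S = 60  # burst credit cap described above, in seconds


def simulate_burst(demand: List[int], max_iops: int, burst_iops: int) -> List[int]:
    """Return the IOPS a volume is allowed each second under a simplified model.

    Assumptions (a sketch, not SolidFire's actual scheduler):
    - one second of credit is earned for every second demand stays below max_iops,
      up to MAX_BURST_CREDIT_S;
    - every second spent above max_iops spends one second of credit;
    - the cluster always has spare IOPS, so only the volume's own limits apply.
    """
    credit = 0
    delivered: List[int] = []
    for wanted in demand:
        if wanted < max_iops:
            # Running below Max IOPS: serve the request and bank one second of credit.
            credit = min(credit + 1, MAX_BURST_CREDIT_S)
            delivered.append(wanted)
        elif wanted > max_iops and credit > 0:
            # Above Max IOPS with credit left: allow up to Burst IOPS.
            credit -= 1
            delivered.append(min(wanted, burst_iops))
        else:
            # Exactly at Max IOPS, or no credit left: capped at Max IOPS.
            delivered.append(min(wanted, max_iops))
    return delivered


# Three quiet seconds, then a five-second spike (Max 2,000 / Burst 4,000).
print(simulate_burst([500, 500, 500, 5000, 5000, 5000, 5000, 5000], 2000, 4000))
# prints [500, 500, 500, 4000, 4000, 4000, 2000, 2000]
```

The three quiet seconds buy exactly three seconds of bursting; after that the volume falls back to its Max IOPS.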

Regarding Max and Burst IOPS limits – they are just limits. It’s not guaranteed that the volume / LUN can achieve those numbers if your cluster is very busy. Those limits will be reached (when required by client / application) if the cluster has enough “unused” IOPS, as in the following example:

  • If your cluster has 400,000 IOPS of capacity but is only using 250,000, that leaves 150,000 to be consumed across the cluster, meaning that a single volume (if so configured) may theoretically achieve up to 150,000 IOPS

For volume QoS limits, it’s advisable to follow the user guide for the version of Element Software (formerly Element OS) that you are running on your nodes. Currently, those limits (Element Software v11.3) are as follows:

  • Min IOPS per volume: cannot exceed 15,000
  • Max IOPS per volume: cannot exceed 200,000
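
A small pre-flight check against these limits can catch a bad offering before it reaches the cluster. The sketch below is hypothetical: the two ceilings are the Element Software 11.3 values quoted above, while the Min <= Max <= Burst ordering rule is our assumption rather than something stated in this article.

```python
from dataclasses import dataclass
from typing import List

# Per-volume limits quoted above for Element Software 11.3. Other Element
# versions may differ, so check the user guide for the version you run.
MAX_MIN_IOPS = 15_000
MAX_MAX_IOPS = 200_000


@dataclass
class VolumeQos:
    min_iops: int
    max_iops: int
    burst_iops: int


def qos_problems(qos: VolumeQos) -> List[str]:
    """Return the reasons this QoS would be rejected (an empty list if none)."""
    problems: List[str] = []
    if qos.min_iops > MAX_MIN_IOPS:
        problems.append(f"Min IOPS {qos.min_iops} exceeds {MAX_MIN_IOPS}")
    if qos.max_iops > MAX_MAX_IOPS:
        problems.append(f"Max IOPS {qos.max_iops} exceeds {MAX_MAX_IOPS}")
    # Assumption: the three values must be ordered Min <= Max <= Burst.
    if not (qos.min_iops <= qos.max_iops <= qos.burst_iops):
        problems.append("expected Min IOPS <= Max IOPS <= Burst IOPS")
    return problems


print(qos_problems(VolumeQos(1000, 2000, 4000)))     # no problems
print(qos_problems(VolumeQos(20000, 30000, 30000)))  # Min IOPS too high
```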

For other limits, please consult the Element Software User Guide.

SolidFire Plugins for CloudStack

There are 2 plugins: “SolidFire” and “SolidFireShared”.

SolidFire 1:1 plugin

The “SolidFire” plugin (referred to here as “SolidFire 1:1”) provides a 1:1 mapping between a CloudStack volume and a SolidFire volume (LUN), and the QoS you want for that specific CloudStack volume is configured on the SolidFire volume (LUN).

For each CloudStack volume created, it will do the following:

  • For VMware, create a dedicated VMware Datastore for each CloudStack volume.
  • For XenServer, create a dedicated XenServer SR for each CloudStack volume.
  • For KVM, create a “dedicated” iSCSI session for each CloudStack volume on a KVM host, effectively passing the iSCSI LUN (SolidFire volume) through to a VM.

The main benefit of this plugin is that for each CloudStack volume you can set QoS as defined via Compute / Disk Offerings in the Storage QoS section. The plugin takes the “Min IOPS” and “Max IOPS” settings (Burst IOPS is preconfigured as a multiplier of Max IOPS) and sends those values to the SolidFire cluster’s API, so that they are set on the SolidFire volume / LUN. This way, CloudStack (via the plugin) manages the volumes on the SolidFire cluster, hence the name “Managed Storage”.

The downside of this plugin is that the number of Datastores (VMware) and SRs (XenServer) is limited to a relatively low value (native hypervisor limitations):

  • VMware 6.5 – maximum of 512 datastores per cluster (hard limit)
  • XenServer 6.x-8.0 – soft limit of 256 SRs (users have tested up to 500-600 SRs, but the time to mount new SRs becomes considerably higher with that many SRs as well as the time to reboot a host)
  • No particular limits for KVM

This means that for VMware and XenServer you cannot have more than ~500 volumes per cluster, but since volumes are stored on a datastore / SR, you can create VM snapshots. For KVM it’s not possible to create VM snapshots: the iSCSI LUN is passed through to the VM, so there are no QCOW2 files in play, and KVM VM snapshots are only possible with QCOW2 files (i.e. not with any RAW block storage).

SolidFireShared plugin

The “SolidFireShared” plugin provides a many:1 mapping, i.e. many CloudStack volumes on a single SolidFire volume. This offers an alternative way to organize CloudStack volumes on SolidFire-based Primary Storage and partially solves the scalability issues that exist when using the SolidFire 1:1 plugin (explained in the previous section). This plugin only supports VMware and XenServer.

Adding Primary Storage to CloudStack using the SolidFireShared plugin will result in the following:

  • For VMware, a new datastore being created immediately, formatted with VMFS5 and mounted on all ESXi hosts in the cluster.
  • For XenServer, a new SR being created immediately, using LVM (lvmoiscsi) and attached to all XenServers in a pool / cluster.
  • All volumes will be placed on this shared LUN (datastore/SR).

With this plugin you can have a single datastore / SR for many CloudStack volumes and thus the number of volumes can be greater than the ~500 volumes with the SolidFire 1:1 plugin. However, with this setup, QoS is defined per whole datastore / SR, not per single CloudStack volume.

The SolidFireShared plugin requires that the Primary Storage be added as cluster-wide, i.e. zone-wide Primary Storage is not supported with this plugin (nor would it make much sense due to the native hypervisor limits).

As you can guess, you could do this setup manually (without using the SolidFireShared plugin); however, the plugin automates these steps, making the process less error-prone.

If doing everything manually, the steps are as follows (the first 4 steps are done via the SolidFire UI or API):

  • create an Account (linked to your CloudStack installation).
  • create a list of allowed iSCSI initiators, i.e. all of the hosts in the specific cluster (get the initiator IQNs from your hypervisor hosts).
  • create a large enough SolidFire Volume with the desired QoS.
  • create an Access Group, adding all previously created Initiators and the Volume to it.
  • Add an iSCSI-based Datastore / SR to VMware / XenServer via vCenter / XenCenter.
  • Add new Primary Storage in CloudStack; for VMware use “VMFS” as a protocol and specify the previously created datastore name; for XenServer use PreSetup as a protocol and specify the previously created SR.
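
For the first four (SolidFire-side) steps, the Element API is a JSON-RPC interface served on the cluster’s MVIP (typically at https://<MVIP>/json-rpc/<version>, which you should confirm for your Element version). The sketch below only builds the request bodies and sends nothing. The method names (AddAccount, CreateInitiators, CreateVolume, CreateVolumeAccessGroup) and their parameters are as we recall them from the Element API reference and should be checked against it; the account name, IQN, volume name, IDs and access-group naming scheme are all hypothetical.

```python
import json
from typing import Any, Dict, List


def rpc(method: str, params: Dict[str, Any], request_id: int) -> Dict[str, Any]:
    """Build one JSON-RPC request body for the Element API."""
    return {"method": method, "params": params, "id": request_id}


def manual_setup_requests(account: str, host_iqns: List[str], volume_name: str,
                          size_bytes: int, min_iops: int, max_iops: int,
                          burst_iops: int, account_id: int,
                          volume_id: int) -> List[Dict[str, Any]]:
    """Payloads for the four SolidFire-side steps, in order.

    account_id and volume_id would really come from the AddAccount and
    CreateVolume responses; they are parameters here because nothing is sent.
    """
    return [
        # Step 1: the account that will own the volume.
        rpc("AddAccount", {"username": account}, 1),
        # Step 2: register every hypervisor host's IQN as an initiator.
        rpc("CreateInitiators", {"initiators": [{"name": iqn} for iqn in host_iqns]}, 2),
        # Step 3: one large volume with the QoS wanted for the whole datastore / SR.
        rpc("CreateVolume", {
            "name": volume_name,
            "accountID": account_id,
            "totalSize": size_bytes,
            "enable512e": True,  # assumption: 512-byte sector emulation enabled
            "qos": {"minIOPS": min_iops, "maxIOPS": max_iops, "burstIOPS": burst_iops},
        }, 3),
        # Step 4: an access group tying the initiators to the volume.
        rpc("CreateVolumeAccessGroup", {
            "name": f"{volume_name}-access",  # hypothetical naming scheme
            "initiators": host_iqns,
            "volumes": [volume_id],
        }, 4),
    ]


payloads = manual_setup_requests(
    "cloudstack", ["iqn.1998-01.com.vmware:esx01-0a1b2c3d"], "cs-shared-01",
    1024 ** 4, 15000, 100000, 100000, account_id=1, volume_id=10)
print(json.dumps(payloads[2], indent=2))
```

In practice each body would be POSTed in turn, with the account and volume IDs taken from the earlier responses.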

VMware setup

Before heading out to the CloudStack GUI and adding SolidFire / SolidFireShared-based Primary Storage, make sure that you:

  • Have an iSCSI Software adapter enabled on all ESXi hosts in the cluster.
  • Have done proper network binding of the iSCSI adapter to the correct vSwitch, so that your ESXi hosts will have an IP in the same VLAN as the SolidFire SVIP (Storage VIP).

Adding SolidFire 1:1-based Primary Storage

If adding zone-wide storage, set the hypervisor=Any parameter (this is required for all hypervisor types).

CloudMonkey command to add zone-wide Primary Storage:


create StoragePool scope=zone zoneid=af61811f-3ca6-4927-ab0d-5bb6d693e3e7 hypervisor=Any name=SF121zonewide provider=SolidFire managed=true capacityBytes=107374182400 capacityIops=10000 url="MVIP=10.10.10.10;SVIP=10.254.10.10;clusterAdminUsername=admin;
clusterAdminPassword=password;clusterDefaultMinIops=1000;
clusterDefaultMaxIops=2000;clusterDefaultBurstIopsPercentOfMaxIops=2" tags=SF121ZONE

(NOTE: due to a very long URL parameter value, we have broken the URL value into multiple lines for readability – otherwise it should be a single line with no spaces)
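
Because the url value must end up as a single line with no spaces, it can be less error-prone to assemble it programmatically. The hypothetical helper below joins key=value pairs with semicolons and rejects any pair containing a semicolon or whitespace; fed the settings from the zone-wide command above, it reproduces that command’s url value.

```python
from typing import Dict, Union

Value = Union[str, int, float]


def storage_pool_url(settings: Dict[str, Value]) -> str:
    """Join key=value pairs with ';' and no whitespace, as in the url= values above."""
    pairs = []
    for key, value in settings.items():
        pair = f"{key}={value}"
        # A ';' or whitespace inside a key or value would break the single-line string.
        if ";" in pair or any(ch.isspace() for ch in pair):
            raise ValueError(f"unsupported character in {pair!r}")
        pairs.append(pair)
    return ";".join(pairs)


url = storage_pool_url({
    "MVIP": "10.10.10.10",
    "SVIP": "10.254.10.10",
    "clusterAdminUsername": "admin",
    "clusterAdminPassword": "password",
    "clusterDefaultMinIops": 1000,
    "clusterDefaultMaxIops": 2000,
    "clusterDefaultBurstIopsPercentOfMaxIops": 2,
})
print(url)
```

For the cluster-wide VMware variant, adding a final "datacenter" entry to the dictionary produces the longer url value.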

For cluster-wide Primary Storage, syntax is slightly different:


create StoragePool scope=cluster zoneid=af61811f-3ca6-4927-ab0d-5bb6d693e3e7 podid=954065ed-a173-4c52-9f6f-062cd9b17ddb clusterid=72750371-a6ce-4d97-b567-1a9aefc416f8 name=SF121clusterwide provider=SolidFire managed=true capacityBytes=107374182400 capacityIops=10000 url="MVIP=10.10.10.10;SVIP=10.254.10.10;clusterAdminUsername=admin;
clusterAdminPassword=password;clusterDefaultMinIops=1000;
clusterDefaultMaxIops=2000;clusterDefaultBurstIopsPercentOfMaxIops=2;datacenter=Trillian" tags=SF121cluster

Most of the parameters are self-explanatory, but some do need an additional explanation:

  • The “capacityBytes” parameter is the logical / virtual size you want to deliver to CloudStack from the SolidFire cluster. The sum of the volumes, snapshots, and templates that reside on this Primary Storage cannot exceed “capacityBytes”. SolidFire performs compression and deduplication and uses thin provisioning, so the space actually consumed is usually much lower than the sum of these virtual sizes.
  • “capacityIops”, in a similar fashion, defines the total IOPS capacity that can be consumed on the SolidFire side, counted as the sum of the volumes’ Min IOPS (min_iops, as visible in the cloud.volumes table in the CloudStack DB). The sum of the Min IOPS of all volumes created in CloudStack cannot exceed this value.
  • “MVIP” and “SVIP” are the SolidFire cluster’s Management VIP and Storage VIP.
  • “clusterDefaultMinIops” and “clusterDefaultMaxIops” are the default values a CloudStack volume gets if no Min IOPS and Max IOPS values were specified in the Compute / Disk offering.
  • “clusterDefaultBurstIopsPercentOfMaxIops” defines Burst IOPS as a decimal multiplier of Max IOPS (e.g. with a Max IOPS of 2000 and a value of 2, Burst IOPS is 4000).
  • “datacenter” needs to point to the specific VMware datacenter; storage tags are optional.
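
The interplay between capacityBytes, capacityIops and the cluster defaults can be sketched as follows. This is a simplified model under stated assumptions, not the plugin’s code: it counts only volume sizes against capacityBytes (the plugin also counts snapshots and templates), applies the cluster defaults only when the offering specifies no QoS at all, and truncates the burst value to an integer. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SolidFirePool:
    capacity_bytes: int
    capacity_iops: int
    default_min_iops: int    # clusterDefaultMinIops
    default_max_iops: int    # clusterDefaultMaxIops
    burst_multiplier: float  # clusterDefaultBurstIopsPercentOfMaxIops


@dataclass
class PoolVolume:
    size_bytes: int
    min_iops: int


def effective_qos(pool: SolidFirePool,
                  offering: Optional[Tuple[int, int]]) -> Tuple[int, int, int]:
    """(min, max, burst) for a new volume; offering is (min, max) or None."""
    min_iops, max_iops = offering if offering is not None else (
        pool.default_min_iops, pool.default_max_iops)
    # Assumption: burst is Max IOPS times the multiplier, truncated to an int.
    return min_iops, max_iops, int(max_iops * pool.burst_multiplier)


def fits(pool: SolidFirePool, existing: List[PoolVolume], new: PoolVolume) -> bool:
    """Check a new volume against capacityBytes and capacityIops.

    Simplification: only volume sizes are counted; real pools also count
    snapshots and templates against capacityBytes.
    """
    used_bytes = sum(v.size_bytes for v in existing)
    used_iops = sum(v.min_iops for v in existing)
    return (used_bytes + new.size_bytes <= pool.capacity_bytes
            and used_iops + new.min_iops <= pool.capacity_iops)


# The values from the create StoragePool examples above.
pool = SolidFirePool(107374182400, 10000, 1000, 2000, 2)
print(effective_qos(pool, None))  # offering without QoS: cluster defaults apply
```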

We have shared the API calls in the above examples, but you can also use the GUI.

When creating Compute / Disk offerings, make sure to define “storage” as the type of QoS, as shown in the image below. You can also set Min and Max IOPS for the volume here; these values will be taken from the offering and passed to the SolidFire API (via the plugin), so that the desired QoS is set on the SolidFire volume / LUN.

NOTE: For both VMware and XenServer, you should set the “Hypervisor Snapshot Reserve” value (expressed as a percentage of the volume size). In the example above, 200% is set, so for a 100GB volume the reserve is 200GB and the datastore (SolidFire volume / LUN) will be 300GB. If this value were not set, the datastore would be created with the same size as the volume (100GB in this example), and taking VM snapshots would be impossible, since there would be no free space on the datastore. Since all SolidFire volumes are thinly provisioned, whether the datastore is 100GB or 1TB makes no difference to the actual space consumed on the SolidFire cluster, so make sure to take that into account.
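
The sizing rule in the note is simple arithmetic. As a sketch, assuming the plugin adds no rounding or metadata overhead on top of the reserve (which we have not verified):

```python
GB = 10 ** 9  # decimal gigabytes, as in the 100GB example


def datastore_size_bytes(volume_bytes: int, snapshot_reserve_percent: int) -> int:
    """Per-volume datastore / SR size: the volume plus its hypervisor snapshot reserve.

    Assumption: no rounding or filesystem overhead is added on top.
    """
    return volume_bytes + volume_bytes * snapshot_reserve_percent // 100


print(datastore_size_bytes(100 * GB, 200) // GB)  # prints 300
print(datastore_size_bytes(100 * GB, 0) // GB)    # prints 100
```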

Adding SolidFireShared-based Primary Storage

As previously mentioned, Primary Storage based on the SolidFireShared plugin can only be cluster-wide, so there are no variations regarding the scope parameter when it comes to the API call:


create StoragePool scope=cluster zoneid=af61811f-3ca6-4927-ab0d-5bb6d693e3e7 podid=954065ed-a173-4c52-9f6f-062cd9b17ddb clusterid=72750371-a6ce-4d97-b567-1a9aefc416f8 name=SFSHARED provider=SolidFireShared managed=false capacityBytes=107374182400 capacityIops=15000 url="MVIP=10.10.10.10;SVIP=10.254.10.10;clusterAdminUsername=admin;
clusterAdminPassword=password;minIops=15000;
maxIops=100000;burstIops=100000;datacenter=Trillian" tags=SFSHARED

Note the slightly different URL syntax than the one used with the SolidFire 1:1 plugin.

Some of the chosen parameters need an explanation:

  • “capacityBytes” is the size of the SolidFire volume / LUN, so make it a big number.
  • “capacityIops” needs to be the same value as the “minIops” (part of the “url” section), and as already mentioned, a single SolidFire volume cannot have more than 15,000 for its Min IOPS.
  • “maxIops” and “burstIops” may not exceed 100,000 IOPS, but you can later set these values up to the volume’s limits (200,000 IOPS currently) in the SolidFire UI.
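
These constraints can be checked up front with a hypothetical helper like the one below. The 15,000 and 100,000 ceilings are the values quoted in this article, and the equality check mirrors the rule that capacityIops must match minIops.

```python
from typing import List

ELEMENT_MAX_MIN_IOPS = 15_000         # per-volume Min IOPS ceiling (Element 11.3)
PLUGIN_MAX_MAX_BURST_IOPS = 100_000   # maxIops / burstIops ceiling at creation time


def shared_pool_problems(capacity_iops: int, min_iops: int,
                         max_iops: int, burst_iops: int) -> List[str]:
    """Return the reasons a SolidFireShared url would be rejected (empty if none)."""
    problems: List[str] = []
    if capacity_iops != min_iops:
        problems.append("capacityIops must equal minIops")
    if min_iops > ELEMENT_MAX_MIN_IOPS:
        problems.append(f"minIops {min_iops} exceeds {ELEMENT_MAX_MIN_IOPS}")
    for name, value in (("maxIops", max_iops), ("burstIops", burst_iops)):
        if value > PLUGIN_MAX_MAX_BURST_IOPS:
            problems.append(
                f"{name} {value} exceeds {PLUGIN_MAX_MAX_BURST_IOPS} at creation time")
    return problems


# The values used in the SFSHARED examples pass all checks.
print(shared_pool_problems(15000, 15000, 100000, 100000))
```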

When choosing the value for “capacityBytes” (which translates to the size of the datastore), make sure to consider any additional size needed for VM / volume snapshots.

Once you have added SolidFireShared-based Primary Storage, you’ll need to create Compute / Disk offerings as usual, but this time without defining Storage-level QoS in them (we no longer manage QoS on SolidFire beyond setting it initially when the Primary Storage is created). It’s also not necessary to define the “Hypervisor Snapshot Reserve” value, since this parameter is only consumed by the SolidFire 1:1 plugin when creating a datastore for each volume. These settings apply to both VMware and XenServer.

XenServer setup

Before trying to add SolidFire to CloudStack, make sure that you have configured your XenServers’ networks in such a way that they can access the SVIP of the SolidFire cluster. That usually means creating an additional network on the storage VLAN and creating an IP address on that network.

Once your XenServer hosts can communicate with the SVIP of the SolidFire cluster, you are ready to add a new SolidFire Primary Storage.

Adding SolidFire 1:1-based Primary Storage

As already stated in the VMware setup guide, make sure to set the hypervisor=Any parameter in your API call when creating zone-wide Primary Storage. The syntax is pretty much the same as for VMware.

CloudMonkey command to add zone-wide Primary Storage:


create StoragePool scope=zone zoneid=d2e2da70-204c-42b3-84d1-07917a2383a7 hypervisor=Any name=SF121zonewide provider=SolidFire managed=true capacityBytes=107374182400 capacityIops=10000 url="MVIP=10.10.10.10;SVIP=10.254.10.10;clusterAdminUsername=admin;
clusterAdminPassword=password;clusterDefaultMinIops=1000;
clusterDefaultMaxIops=2000;clusterDefaultBurstIopsPercentOfMaxIops=2" tags=SF121ZONE

For cluster-wide Primary Storage, the syntax is slightly different – the difference to the VMware setup is the absence of the “datacenter” parameter in the URL:


create StoragePool scope=cluster zoneid=d2e2da70-204c-42b3-84d1-07917a2383a7 podid=711b8d51-8f67-4b89-8e68-7d7a28a013b0 clusterid=b98afe80-9614-48b3-aba1-b79624086bb9 name=SF121clusterwide provider=SolidFire managed=true capacityBytes=107374182400 capacityIops=10000 url="MVIP=10.10.10.10;SVIP=10.254.10.10;clusterAdminUsername=admin;
clusterAdminPassword=password;clusterDefaultMinIops=1000;
clusterDefaultMaxIops=2000;clusterDefaultBurstIopsPercentOfMaxIops=2" tags=SF121cluster

If some of the parameters used in the API call are unclear, please check the VMware setup guide above, where you’ll find detailed explanations for each parameter.

For the Compute / Disk offering parameters that are needed specifically when using the SolidFire 1:1 plugin, please also see the corresponding section in the VMware setup guide – same “rules” apply here.

Adding SolidFireShared-based Primary Storage

Again, Primary Storage based on the SolidFireShared plugin can only be cluster-wide, so there are no variations when it comes to the scope parameter of the API call – same syntax as with VMware, we just skip the “datacenter” parameter in the URL:


create StoragePool scope=cluster zoneid=d2e2da70-204c-42b3-84d1-07917a2383a7 podid=711b8d51-8f67-4b89-8e68-7d7a28a013b0 clusterid=b98afe80-9614-48b3-aba1-b79624086bb9 name=SFSHARED provider=SolidFireShared managed=false capacityBytes=107374182400 capacityIops=15000 url="MVIP=10.10.10.10;SVIP=10.254.10.10;clusterAdminUsername=admin;
clusterAdminPassword=password;minIops=15000;
maxIops=100000;burstIops=100000" tags=SFSHARED

In regard to the explanation of important parameters as well as different notes on the Compute/Disk offerings, please see the VMware setup for the SolidFireShared plugin above, which explains those in detail.

KVM setup

KVM, being a pretty much “unmanaged” hypervisor, is a bit different in terms of what you can do with it, and it’s much easier to make low-level changes as required. In that sense, the SolidFire 1:1 plugin works perfectly well and thus there is no need for SolidFireShared plugin support – though you can always do the big-shared-iSCSI-LUN-with-clustered-(God-forbid)-file-system yourself if you really want to.

Adding SolidFire 1:1-based Primary Storage

Before trying to add SolidFire-based Primary Storage, make sure to do the following:

  • Attach the proper storage VLAN with an IP address to all KVM hosts, so that the SolidFire SVIP is reachable.
  • Install an iSCSI initiator on all KVM hosts with yum install iscsi-initiator-utils or apt-get install open-iscsi. This will create the following file: /etc/iscsi/initiatorname.iscsi, which contains the IQN of the host (that will later be added to the cloud.host table, “url” field – this happens with all hypervisors).
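
To collect the IQN from each host (for example, to cross-check it against the value CloudStack records), the file mentioned above can be parsed. A minimal sketch, assuming the usual open-iscsi format of a single InitiatorName=<iqn> line plus optional # comments; the function names and the sample IQN are hypothetical.

```python
def parse_initiator_name(text: str) -> str:
    """Return the IQN from the contents of an initiatorname.iscsi file."""
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "InitiatorName":
            return value.strip()
    raise ValueError("no InitiatorName= line found")


def read_host_iqn(path: str = "/etc/iscsi/initiatorname.iscsi") -> str:
    """Read and parse the initiator file on the local host."""
    with open(path, encoding="utf-8") as handle:
        return parse_initiator_name(handle.read())


sample = "## DO NOT EDIT OR REMOVE THIS FILE!\nInitiatorName=iqn.1994-05.com.redhat:kvm01a1b2c3\n"
print(parse_initiator_name(sample))  # prints iqn.1994-05.com.redhat:kvm01a1b2c3
```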

You can set up both zone-wide and cluster-wide Primary Storage, as in the case of VMware and XenServer. The parameter “hypervisor” should still be set to “Any” (though the plugin will not complain if you set “hypervisor=KVM”, but will still set it to “Any” internally in the database):

CloudMonkey command to create zone-wide Primary Storage:


create StoragePool scope=zone zoneid=06938de4-0a5b-46f9-bbe7-5a264f43d4eb hypervisor=Any name=SF121zonewide provider=SolidFire managed=true capacityBytes=107374182400 capacityIops=10000 url="MVIP=10.10.10.10;SVIP=10.254.10.10;clusterAdminUsername=admin;
clusterAdminPassword=password;clusterDefaultMinIops=1000;
clusterDefaultMaxIops=2000;clusterDefaultBurstIopsPercentOfMaxIops=2" tags=SF121ZONE

Again, the syntax for cluster-wide Primary Storage is slightly different – but otherwise identical to XenServer syntax:


create StoragePool scope=cluster zoneid=06938de4-0a5b-46f9-bbe7-5a264f43d4eb podid=62717320-3fc4-4c53-9345-c53eba516710 clusterid=79064c12-659a-4886-8c4d-5ee38c842a0f name=SF121clusterwide provider=SolidFire managed=true capacityBytes=107374182400 capacityIops=10000 url="MVIP=10.10.10.10;SVIP=10.254.10.10;clusterAdminUsername=admin;
clusterAdminPassword=password;clusterDefaultMinIops=1000;
clusterDefaultMaxIops=2000;clusterDefaultBurstIopsPercentOfMaxIops=2" tags=SF121cluster

If some of the parameters used in the API call are unclear, please check the VMware setup guide, where you’ll find detailed explanations for each important parameter.

For the Compute / Disk offering parameters, it’s still required to set Min and Max IOPS as the Storage Quality of Service parameters, but there is no need to define “Hypervisor Snapshot Reserve”: there is no datastore / SR with KVM and VM snapshots are not supported, so there is nothing to reserve space for.

This concludes this part of the SolidFire article series. In the next part, we will cover the different feature sets that each hypervisor offers when using SolidFire as Primary Storage, and examine how things work under the hood.

About the author

Andrija Panic is a Cloud Architect at ShapeBlue, the Cloud Specialists, and is a committer and PMC member of Apache CloudStack. Andrija spends most of his time designing and implementing IaaS solutions based on Apache CloudStack.
We would like to thank Mike Tutkowski, Senior Software Developer at SolidFire, who implemented the SolidFire plugin in CloudStack, for his review and help with this article.

 

ShapeBlue SA are pleased to announce the extension of their distribution partner agreement for NetApp in South Africa, building out a successful relationship that started in 2014.

‘ShapeBlue has built a strong partnership with NetApp in this region. Expanding our capabilities to represent the full NetApp portfolio presents a strategic opportunity for us and our partners,’ says Dan Crowe, Managing Director, ShapeBlue SA.

‘NetApp’s vision, depth of solutions and cloud-centric approach continues to differentiate them. We are seeing a fantastic response, in particular to the Cloud Infrastructure portfolio with HCI and the Cloud Data Services portfolio.’

ShapeBlue, as expert builders of clouds, bring unique insight to both service provider and integrator partners as they develop services and work with customers on transformation projects.

ShapeBlue believe a new generation of NetApp partners can accelerate strategic initiatives across sectors and harness the true value of data insights.

ShapeBlue will offer SA based partners access to the full NetApp range of solutions, professional services and sales and marketing collaborations.

ShapeBlue have recently expanded office premises in both Cape Town and Johannesburg, with worldwide software engineering now based here in SA. “We’re excited about our newly expanded partnership with NetApp and looking forward to the next step in our evolution,” concludes Crowe.

About ShapeBlue

ShapeBlue are the leading worldwide independent CloudStack integrator, with offices in London, Bangalore, Rio de Janeiro, Mountain View CA, Cape Town and Johannesburg.
Services include consulting, integration, training and infrastructure support.

Introduction

ShapeBlue have been working on a new feature for Apache CloudStack 4.11.1 that allows users to bypass secondary storage with KVM. The feature introduces a new way to use templates and ISOs, allowing administrators to use them without having them cached on secondary storage. With this approach, CloudStack administrators no longer have to worry about provisioning massive secondary storage: it is simply bypassed, so no templates sit there waiting to be used. The SSVM is bypassed as well, since the download task is carried out by the KVM agent itself rather than by the SSVM. This means administrators don’t need to set aside SSVM resources for these downloads and can use them for commercial purposes instead. The usual virtual machine deployment process stays the same.

Overview

This feature adds a new field to the vm_template table called ‘direct_download’. The field determines whether the template needs to be downloaded by the SSVM (when set to ‘0’) or directly by the host when deploying the VM (when set to ‘1’). CloudStack administrators can set this field through the UI or an API call, as shown in the following examples:

From the UI:

From CloudMonkey:

register template zoneid=3e80c1e6-0710-4018-9062-194d6b3bab97 ostypeid=6f232c75-5370-11e8-afb9-06354a01076f hypervisor=KVM url=http://dl.openvm.eu/cloudstack/macchinina/x86_64/macchinina-kvm.qcow2.bz2 format=QCOW2 displaytext=TestMachina name=TestMachina directdownload=true

The same feature applies to ISOs as well – they don’t need to be cached on secondary storage but can be directly downloaded by the host. CloudStack admins have this option available on the API call when registering ISOs and through the UI form as well.

Whenever a VM deployment is started, the template is downloaded to primary storage. The feature first checks whether the template/ISO has already been downloaded to the pool by looking it up in the template_spool_ref table: if there is an entry matching both the pool ID and the template ID, it won’t be downloaded again. The same applies if a running VM requires the template again (e.g. when it is reinstalled). Please note that, due to the direct download nature of this feature, ensuring that a template is the same across all primary storage pools is the responsibility of the CloudStack operator: CloudStack itself can’t detect whether the file behind a template download URL has changed.
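
The reuse check described above can be sketched with an in-memory set standing in for the template_spool_ref table. The helper and its download callback are hypothetical; the point is only that each (pool ID, template ID) pair is downloaded once per pool. Nothing here looks at the content behind the URL, which is why keeping templates consistent across pools falls to the operator.

```python
from typing import Callable, List, Set, Tuple

SpoolRefs = Set[Tuple[int, int]]  # stand-in for template_spool_ref rows: (pool_id, template_id)


def ensure_template_on_pool(pool_id: int, template_id: int, spool_refs: SpoolRefs,
                            download: Callable[[int, int], None]) -> bool:
    """Download a direct-download template to a pool unless it is already there.

    Returns True if a download was performed.
    """
    if (pool_id, template_id) in spool_refs:
        return False  # already on this pool: reuse it
    download(pool_id, template_id)
    spool_refs.add((pool_id, template_id))  # record only after a successful download
    return True


downloads: List[Tuple[int, int]] = []
refs: SpoolRefs = set()


def record_download(pool_id: int, template_id: int) -> None:
    """Hypothetical download callback that just logs what would be fetched."""
    downloads.append((pool_id, template_id))


print(ensure_template_on_pool(1, 201, refs, record_download))  # prints True
print(ensure_template_on_pool(1, 201, refs, record_download))  # prints False
print(ensure_template_on_pool(2, 201, refs, record_download))  # prints True
```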

Metalinks are also supported for this feature, giving administrators more flexibility in managing their templates, as they can set priorities and location preferences in the metalink file. Metalinks are effectively XML files that list URLs for downloading files. The duplicate download locations provide reliability in case one of them fails, and some clients also achieve faster downloads by fetching different chunks/segments of each file from multiple sources at the same time. Please see the following example:

As the example shows, CloudStack administrators can set a location preference and a priority, which are taken into account at VM deployment. The deployment logic itself introduces a retry mechanism for two kinds of failure: VM deployment failure and template download failure.
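
The following sketch shows a minimal Metalink 4 (RFC 5854) file with two mirrors, and one way of ordering its URLs. In RFC 5854 a lower priority value marks a more preferred URL. This article does not say exactly how CloudStack combines location and priority, so the ordering rule here (URLs matching a preferred location first, then by priority) is our assumption, and the mirror hostnames are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

NS = {"ml": "urn:ietf:params:xml:ns:metalink"}  # Metalink 4 namespace (RFC 5854)

SAMPLE = """<metalink xmlns="urn:ietf:params:xml:ns:metalink">
  <file name="macchinina-kvm.qcow2.bz2">
    <url location="nl" priority="2">http://mirror-nl.example.com/macchinina-kvm.qcow2.bz2</url>
    <url location="us" priority="1">http://mirror-us.example.com/macchinina-kvm.qcow2.bz2</url>
  </file>
</metalink>"""


def ordered_urls(metalink_xml: str, preferred_location: Optional[str] = None) -> List[str]:
    """URLs in the order we would try them (assumed rule: location match, then priority)."""
    root = ET.fromstring(metalink_xml)
    ranked = []
    for url in root.findall("ml:file/ml:url", NS):
        href = (url.text or "").strip()
        if not href:
            continue
        # Lower priority value = more preferred; our choice: unranked URLs go last.
        priority = int(url.get("priority", "999999"))
        local = preferred_location is not None and url.get("location") == preferred_location
        ranked.append((0 if local else 1, priority, href))
    ranked.sort()
    return [href for _, _, href in ranked]


print(ordered_urls(SAMPLE))        # us mirror first (priority 1)
print(ordered_urls(SAMPLE, "nl"))  # nl mirror first (location match)
```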

VM deployment retry logic: the deployment is initiated on a suitable host (which includes downloading the template itself). If the deployment fails for any reason, it is retried on another suitable host.

Template download retry logic: this is part of the VM deployment, in which the host tries to download the template/ISO directly. If the download fails for some reason (e.g. the URL is not available), the host iterates through the provided URLs in order of priority and location. Once the download has completed, the checksum is validated (if one was provided); if validation fails, the template is downloaded again, up to a total of three attempts. If all three attempts are unsuccessful, a deployment failure is returned and control goes back to the VM deployment logic.
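
The two retry loops can be sketched in Python as below. This is a simplified reading of the description above, not CloudStack’s implementation: we treat one pass over the ordered URL list as one attempt, use SHA-256 for the checksum (CloudStack may use a different algorithm), and model failures with hypothetical exception classes.

```python
import hashlib
from typing import Callable, List, Optional

MAX_DOWNLOAD_ATTEMPTS = 3


class DeploymentFailed(Exception):
    """Hypothetical: any failure that should move the VM to another host."""


class DownloadFailed(DeploymentFailed):
    """Hypothetical: the template could not be fetched and verified."""


def download_template(urls: List[str], fetch: Callable[[str], bytes],
                      expected_sha256: Optional[str] = None) -> bytes:
    """Try the ordered URLs; assumption: one pass over the list is one attempt."""
    for _attempt in range(MAX_DOWNLOAD_ATTEMPTS):
        for url in urls:
            try:
                data = fetch(url)
            except OSError:
                continue  # URL unavailable: fall through to the next one
            if expected_sha256 is None or hashlib.sha256(data).hexdigest() == expected_sha256:
                return data
            break  # checksum mismatch: this attempt failed, download again
    raise DownloadFailed(f"gave up after {MAX_DOWNLOAD_ATTEMPTS} attempts")


def deploy_vm(hosts: List[str], deploy_on: Callable[[str], None]) -> str:
    """Try each suitable host in turn; return the host that succeeded."""
    for host in hosts:
        try:
            deploy_on(host)  # includes the direct template download on that host
            return host
        except DeploymentFailed:
            continue  # retry the deployment on the next suitable host
    raise DeploymentFailed("no suitable host could deploy the VM")
```

Because DownloadFailed subclasses DeploymentFailed, exhausting the download attempts on one host simply hands control back to the host-level loop.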

Please see the following simplified picture of the deployment logic:

Since the download task has been delegated to the KVM agent instead of SSVM, this feature will be available only for KVM templates.

About the author

Boris Stoyanov is a Software Engineer in Test at ShapeBlue, The Cloud Specialists. Bobby spends his time testing features for the Apache CloudStack community and for our ShapeBlue clients.

Last year we had a project that required us to build out a KVM environment using shared storage. Most often that would be NFS all the way, and very occasionally Ceph. This time, however, the client already had a Fibre Channel over Ethernet (FCoE) SAN that had to be used, and the hosts were HP blades using shared converged adaptors in the chassis, just to add a bit more fun.

A small crowbar and a large hammer later, the LUNs from the SAN were being presented to the hosts. So far so good.  But…

Clustered File Systems

If you need a volume shared between two or more hosts, you can provision the disk to all of the machines and everything might appear to work. However, each host will maintain its own inode table and so will be unaware of changes the other hosts are making to the file system, and if writes ever hit the same areas of the disk at the same time, you will end up with data corruption. The key is that you need a way to track locks across multiple nodes. This is called a Distributed Lock Manager (DLM), and for this you need a clustered file system.

Options

There are dozens of clustered file systems out there, proprietary and open source.
For this project we needed a file system that:

  • Is supported on CentOS 6.7
  • Is open source
  • Supports multipath
  • Is easy to configure (not a complex group of distributed parallel file systems)
  • Supports concurrent file access and delivers the best possible performance
  • Has no management-node overhead, leaving more cluster drive space

So we opted for OCFS2 (Oracle Cluster File System 2).

Once you have the ‘knack’, installation isn’t that arduous, and it goes like this…

These steps should be repeated on each node.

1. Installing the OCFS file system binaries

In order to use OCFS2, we need to install the kernel modules and OCFS2-tools.

First we need to download and install the OCFS2 kernel modules for CentOS 6.  Oracle now bundles the OCFS2 kernel modules in its Unbreakable Kernel, but they also used to be shipped with CloudStack 3.x so we used those.

wget http://shapeblue.s3.amazonaws.com/ocfs2-kmod-1.5.0-1.el6.x86_64.rpm
rpm -i ocfs2-kmod-1.5.0-1.el6.x86_64.rpm 

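Next we copy the OCFS2 kernel modules into the module directory of the currently running CentOS 6.7 kernel. The modules were built for the older 2.6.32-71 kernel, which is why they are later loaded with modprobe -f.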

cp -Rpv /lib/modules/2.6.32-71.el6.x86_64/extra/ocfs2/ /lib/modules/2.6.32-573.3.1.el6.x86_64/extra/ocfs2

Next we regenerate the module dependency information so that the newly copied modules can be found:

depmod -a

Add the Oracle yum repo for el6 (CentOS 6.7) for the OCFS2-tools

cd /etc/yum.repos.d
wget http://public-yum.oracle.com/public-yum-ol6.repo

And add the PKI keys for the Oracle el6 YUM repo

cd /etc/pki/rpm-gpg/
wget http://public-yum.oracle.com/RPM-GPG-KEY-oracle-ol6
rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-oracle-ol6 

Now we can install the OCFS2 tools, which are used to administer the OCFS2 cluster.

yum install -y ocfs2-tools

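Finally, we add two lines to the o2cb init script, right after the line containing online "$1", so that the OCFS2 module is (force-)loaded and the fstab entries are mounted at boot:

sed -i "/online \"\$1\"/a\/sbin\/modprobe \-f ocfs2\nmount -a" /etc/init.d/o2cb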

2. Configure the OCFS2 Cluster.

OCFS2 cluster nodes are configured through a file (/etc/ocfs2/cluster.conf). This file has all the settings for the OCFS2 cluster. An example configuration file might look like this:

cd /etc/ocfs2/
vim cluster.conf

node:
	ip_port = 7777
	ip_address = 192.168.100.1
	number = 0
	name = host1.domain.com
	cluster = ocfs2

node:
	ip_port = 7777
	ip_address = 192.168.100.2
	number = 1
	name = host2.domain.com
	cluster = ocfs2

node:
	ip_port = 7777
	ip_address = 192.168.100.3
	number = 2
	name = host3.domain.com
	cluster = ocfs2

cluster:
	node_count = 3
	name = ocfs2
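
When there are more than a handful of nodes, it can be easier to generate cluster.conf than to edit it by hand. A minimal sketch, assuming (as far as we know) that stanza headers must start in the first column with their parameters indented by a tab, and that each node’s name should match that host’s hostname; nodes are numbered in list order, and render_cluster_conf is a hypothetical helper.

```python
from typing import List, Tuple


def render_cluster_conf(cluster: str, nodes: List[Tuple[str, str]], port: int = 7777) -> str:
    """Build /etc/ocfs2/cluster.conf text from (hostname, ip) pairs."""
    stanzas = []
    for number, (name, ip) in enumerate(nodes):
        stanzas.append(
            "node:\n"
            f"\tip_port = {port}\n"
            f"\tip_address = {ip}\n"
            f"\tnumber = {number}\n"
            f"\tname = {name}\n"
            f"\tcluster = {cluster}\n"
        )
    stanzas.append(f"cluster:\n\tnode_count = {len(nodes)}\n\tname = {cluster}\n")
    return "\n".join(stanzas)  # blank line between stanzas


print(render_cluster_conf("ocfs2", [
    ("host1.domain.com", "192.168.100.1"),
    ("host2.domain.com", "192.168.100.2"),
    ("host3.domain.com", "192.168.100.3"),
]))
```

Run with the three hosts above, this produces the same file as the hand-written example; the output would then be copied to /etc/ocfs2/cluster.conf on every node.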

Next, we run the o2cb init script from the /etc/init.d/ directory with the configure option to configure the OCFS2 cluster:

/etc/init.d/o2cb configure
Load O2CB driver on boot (y/n) [y]: y

Cluster stack backing O2CB [o2cb]: ENTER
Cluster to start on boot (Enter "none" to clear) [ocfs2]: ENTER
Specify heartbeat dead threshold (>=7) [31]: ENTER
Specify network idle timeout in ms (>=5000) [30000]: ENTER
Specify network keepalive delay in ms (>=1000) [2000]: ENTER
Specify network reconnect delay in ms (>=2000) [2000]: ENTER

Update the iptables rules to allow the OCFS2 Cluster port 7777 on all the nodes that we have installed:

iptables -I INPUT -p udp -m udp --dport 7777 -j ACCEPT
iptables -I INPUT -p tcp -m state --state NEW -m tcp --dport 7777 -j ACCEPT
iptables-save > /etc/sysconfig/iptables

Then restart the iptables service:

service iptables restart

3. Setting up Linux file system

First we create a directory where the OCFS2 system will be mounted.

mkdir -p /san/primary/

Next we format the shared LUN (/dev/sdd in this example) as OCFS2. This only needs to be run on ONE of the nodes in the cluster.

mkfs.ocfs2 -L OCFS2_label -T vmstore --fs-feature-level=max-compat -N 4 /dev/sdd

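The options work like this:
-L is the label of the OCFS2 volume
-T is the intended use, i.e. the type of data (vmstore for virtual machine images)
--fs-feature-level=max-compat makes OCFS2 compatible with older versions
-N is the number of node slots: the number of nodes plus one (4 for the three-node cluster above)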

4. Update the Linux FSTAB with the OCFS2 drive settings.

Next we add the following line to /etc/fstab on every node to mount the volume at every boot. The _netdev option delays the mount until networking is available.

/dev/sdd /san/primary ocfs2 _netdev,nointr 0 0
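An fstab entry has six whitespace-separated fields: device, mount point, file system type, options, dump and pass. It is easy to drop one, most often the ocfs2 type. Below is a minimal sketch of a check that rejects a line unless it has exactly six fields, the type ocfs2 and the _netdev option. The function name check_ocfs2_fstab_line is our own, hypothetical choice.

```shell
#!/usr/bin/env bash
# Sketch: validate a single /etc/fstab line intended for an OCFS2 volume.
check_ocfs2_fstab_line() {
  local dev mnt fstype opts dump pass extra
  # read splits on whitespace; anything beyond six fields lands in "extra".
  read -r dev mnt fstype opts dump pass extra <<<"$1"
  if [ -z "$pass" ] || [ -n "$extra" ]; then
    echo "expected 6 fields: device mountpoint type options dump pass" >&2
    return 1
  fi
  if [ "$fstype" != ocfs2 ]; then
    echo "file system type is '$fstype', expected ocfs2" >&2
    return 1
  fi
  # Wrap in commas so _netdev matches only as a whole option.
  case ",$opts," in
    *,_netdev,*) ;;
    *) echo "options '$opts' lack _netdev" >&2; return 1 ;;
  esac
}
```

On a node, you could run check_ocfs2_fstab_line "$(awk '$2 == "/san/primary"' /etc/fstab)" after editing the file. A line such as /dev/sdd /san/primary _netdev,nointr 0 0, which has no type field, is rejected.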

5. Mount the OCFS2 cluster.

Once /etc/fstab has been updated, we mount the volume on each node:

mount -a

This will give us a mount point on each node in this cluster of /san/primary. This mount point is backed by the same LUN in the SAN, but most importantly the filesystem is aware that there are multiple hosts connected to it and will lock files accordingly.
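After mount -a, it is worth confirming on each node that /san/primary really is an OCFS2 mount and not just the empty local directory created earlier. Below is a minimal sketch that reads /proc/mounts, or any file in the same format, for testing. The helper name is_ocfs2_mounted is our own, hypothetical choice.

```shell
#!/usr/bin/env bash
# Sketch: succeed if MOUNTPOINT appears in a mounts file with type ocfs2.
# Arguments: mount point, optional mounts file (default /proc/mounts).
is_ocfs2_mounted() {
  local mountpoint=${1%/} mounts=${2:-/proc/mounts}   # drop a trailing slash
  local dev mnt fstype rest
  # /proc/mounts fields: device, mount point, type, options, dump, pass.
  while read -r dev mnt fstype rest; do
    if [ "$mnt" = "$mountpoint" ] && [ "$fstype" = ocfs2 ]; then
      return 0
    fi
  done <"$mounts"
  return 1
}
```

For example, is_ocfs2_mounted /san/primary || echo "not mounted as OCFS2" could be run on every host before adding the storage to CloudStack.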

Each cluster of hosts would have a specific LUN (or LUNs) which it would connect to. It makes life a lot simpler if you are able to mask the LUNs on the SAN so that only the hosts which will connect to a specific LUN can see that LUN, as this helps to avoid any mix-ups.

Adding this storage into CloudStack

In order for the KVM hosts to utilise this storage in a CloudStack context, we must add the shared LUNs as primary storage in CloudStack. This is done by setting the storage protocol to ‘SharedMountPoint’ when adding the primary storage pools for these clusters. The mount point path should be specified as it will be seen locally by the KVM hosts; in this case, /san/primary.

Summary

In this article we looked at the requirement for a clustered file system when connecting KVM hosts to a SAN, and at how to configure OCFS2 on CentOS 6.7.

About The Authors

Glenn Wagner is a Senior Consultant / Cloud Architect at ShapeBlue, The Cloud Specialists. Glenn spends most of his time designing and implementing IaaS solutions based on Apache CloudStack.

Paul Angus is VP Technology & Cloud Architect at ShapeBlue. He has designed and implemented numerous CloudStack environments for customers across 4 continents, based on Apache CloudStack.
Some say that when not building clouds, Paul likes to create Ansible playbooks that build clouds, and that he’s actually read A Brief History of Time.

Paul Angus, Cloud Architect at ShapeBlue, takes an interesting look at how to separate CloudStack’s management traffic from its primary storage traffic.

I recently looked at physical networking in a CloudStack environment and alluded to the fact that, although you cannot separate primary storage traffic from management traffic within CloudStack, it is still possible to do so. In this article I will discuss why this is and how to do it.

In the beginning, there was primary storage

The first thing to understand is the process of provisioning primary storage. When you create a primary storage pool for any given cluster, the CloudStack management server tells each host’s hypervisor to mount the NFS share (or iSCSI LUN). The storage pool will be presented within the hypervisor as a datastore (VMware), a storage repository (XenServer/XCP) or a mount point (KVM). The important point is that it is the hypervisor itself that communicates with the primary storage; the CloudStack management server only communicates with the host hypervisor.

Now, all hypervisors communicate with the outside world via some kind of management interface: think VMkernel port on ESXi or ‘Management Interface’ on XenServer. As the CloudStack management server needs to communicate with the hypervisor in the host, this management interface must be on the CloudStack ‘management’ or ‘private’ network. There may be other interfaces configured on your host carrying guest and public traffic to/from VMs within the hosts, but the hypervisor itself doesn’t/can’t communicate over these interfaces.

Figure 1: Hypervisor communications

Separating Primary Storage traffic

For those from a pure virtualisation background, the concept of creating a specific interface for storage traffic will not be new; it has long been best practice for iSCSI traffic to have a dedicated switch fabric to avoid any latency or contention issues.

Sometimes in the cloud(Stack) world we forget that we are simply orchestrating processes that the hypervisors already carry out and that many ‘normal’ hypervisor configurations still apply.

The logical reasoning which explains how this splitting of traffic works is as follows:

1. If you want an additional interface over which the hypervisor can communicate (excluding teamed or bonded interfaces), you need to give it an IP address.

2. The mechanism to create an additional interface that the hypervisor can use is to create an additional management interface.

3. So that the hypervisor can differentiate between the management interfaces, they have to be in different (non-overlapping) subnets.

4. In order for the ‘primary storage’ management interface to communicate with the primary storage, the interfaces on the primary storage must be in the same CIDR as the ‘primary storage’ management interface.

5. Therefore the primary storage must be in a different subnet to the management network.
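The reasoning above boils down to simple subnet arithmetic: the hypervisor reaches a destination over the interface whose subnet contains it. The sketch below models that with hypothetical helper functions and example addresses (mgmt0 on 10.10.10.0/24, storage0 on 10.10.20.0/24), none of which come from the article. It is a simplification. A real host consults its routing table and uses the longest matching prefix, but with non-overlapping subnets, as step 3 requires, the first match and the longest match are the same interface.

```shell
#!/usr/bin/env bash
# Sketch: IPv4 subnet checks illustrating how storage traffic gets its own
# interface. Plain dotted-quad input only (no leading zeros, no IPv6).

# Dotted quad -> 32-bit integer.
ip2int() {
  local a b c d
  IFS=. read -r a b c d <<<"$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# in_cidr IP CIDR: success if IP lies inside CIDR (e.g. 10.10.20.0/24).
in_cidr() {
  local ip net bits mask
  ip=$(ip2int "$1")
  net=$(ip2int "${2%/*}")
  bits=${2#*/}
  mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( (ip & mask) == (net & mask) ))
}

# CIDR blocks are either nested or disjoint, so they overlap exactly
# when one block's address falls inside the other block.
cidrs_overlap() {
  in_cidr "${1%/*}" "$2" || in_cidr "${2%/*}" "$1"
}

# pick_interface DEST IFACE=CIDR...: print the first interface whose
# subnet contains DEST (the interface that reaches it directly).
pick_interface() {
  local dest=$1 entry; shift
  for entry in "$@"; do
    if in_cidr "$dest" "${entry#*=}"; then
      echo "${entry%%=*}"
      return 0
    fi
  done
  return 1
}

# check_storage_plan MGMT_CIDR STORAGE_CIDR ARRAY_IP: steps 3 to 5 above.
check_storage_plan() {
  local mgmt=$1 storage=$2 array_ip=$3
  if cidrs_overlap "$mgmt" "$storage"; then
    echo "storage subnet $storage overlaps management subnet $mgmt" >&2
    return 1
  fi
  if ! in_cidr "$array_ip" "$storage"; then
    echo "array address $array_ip is not in storage subnet $storage" >&2
    return 1
  fi
}
```

For example, pick_interface 10.10.20.100 mgmt0=10.10.10.0/24 storage0=10.10.20.0/24 prints storage0, and check_storage_plan 10.10.10.0/24 10.10.20.0/24 10.10.20.100 succeeds. Placing the array at 10.10.10.200 instead would fail the second check, and storage traffic would fall back to the management interface.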

Figure 2: Subnetting of Storage Traffic

Figure 3: Hypervisor Communications with Separated Storage Traffic

Other Primary Storage Types

If you are using PreSetup or SharedMountPoint to connect to IP-based storage, then the same principles apply: if the primary storage and the ‘primary storage interface’ are in a different subnet to the ‘management subnet’, the hypervisor will use the ‘primary storage interface’ to communicate with the primary storage.

Summary

This article has explained how primary storage traffic can be routed over separate network interfaces from all other traffic on the hosts. This is done by adding a management interface on the host for storage, and by allocating both that interface and the primary storage IP addresses in a different subnet to the CloudStack management subnet.

About the Author

Paul Angus is a Cloud Architect at ShapeBlue, The Cloud Specialists. He has designed numerous CloudStack environments for customers across 4 continents, based on Apache CloudStack, Citrix CloudPlatform and Citrix CloudPortal.

When not building clouds, Paul likes to create scripts that build clouds...

...and he very occasionally can be seen trying to hit a golf ball.