Recently we’ve seen a few clients being tripped up by the System VM upgrades during the CloudStack upgrades in multi-zone deployments.

The issue occurs when the System VM is pre-deployed individually to multiple zones, rather than being a single template deployed to multiple zones. This may be done in error or because the specific version of CloudStack/CloudPlatform which you are upgrading from does not allow you to provision the template as zone-wide.

The result is that only the system VMs in one zone recreate properly after the upgrade and the other zones do not recreate the SSVM or CPVM when they are destroyed.

The vm_template table will have entries like this for a given System VM template:

 SELECT id,name,type,cross_zones,state FROM cloud.vm_template WHERE name like '%systemvm-xenserver%' AND removed IS NULL; 

idnametypecross-zonesstate
634 systemvm-xenserver-4.5SYSTEM 0Active
635 systemvm-xenserver-4.5SYSTEM 0Active
636 systemvm-xenserver-4.5SYSTEM 0Active

And template_store_ref:

 SELECT template_store_ref.id,template_store_ref.store_id,template_store_ref.template_id,image_store.data_center_id,image_store.name FROM template_store_ref JOIN image_store ON template_store_ref.store_id=image_store.id WHERE template_store_ref.destroyed=0; 

idstore_idtemplate_iddata_center_idname
216341
326352
436363

The tables should actually show the same template ID in each store:

idnametypecross_zonesstate
634systemvm-xenserver-4.5SYSTEM1Active

and template_store_ref:

idstore_idtemplate_iddata_center_idname
21634 1
32634 2
43634 3

In the case of the first table, CloudStack will take the template with id of 636 as being THE region-wide System VM template, and will not be able to find it in zones 1 and 2. and therefore not be able to recreate System VMs in those zones.  The management server log will report that the zone is not ready as it can’t find the template.

The above example assumes that there are 3 zones, with 1 secondary storage pool in each zone, if there are multiple secondary storage pools, then only one in each zone will have initially received the template.

You could edit the path in the template_store_ref to point to the location where the template has been downloaded to on the secondary storage, however, this will leave the system in a non-standard state.

The safest way forward is to set one of the templates to cross_zones=1 and then coax CloudStack into creating an SSVM in each zone (one at a time), which will then download that cross-zone template.

Remediation

In this example I’ll use template 634 to be the ultimate region-wide template.

In order to download the cross_zone template we need to make one template the region-wide template at a time. So assuming that zone relating to secondary storage pool 3 upgraded properly, we first make template 635 the region-wide template. By changing the database to this:

UPDATE cloud.vm_template SET state='Inactive', cross_zones=1 WHERE id=634; UPDATE cloud.vm_template SET state='Active' WHERE id=635; UPDATE cloud.vm_template SET state='Inactive' WHERE id=636;

idnametypecross_zonesstate
634systemvm-xenserver-4.5SYSTEM 1Inactive
635systemvm-xenserver-4.5SYSTEM 0Active
636systemvm-xenserver-4.5SYSTEM 0Inactive

Zone 2 will now be able to recreate the CPVM and SSVM, while zone 3 CPVM and SSVM are already running so are unaffected.
Once the CPVM and SSVM in zone 2 are up, you can now make template 634 the region-wide system VM template, by editing the database like this.

UPDATE cloud.vm_template SET state='Active' WHERE id=634; UPDATE cloud.vm_template SET state='Inactive' WHERE id=635;

idnametypecross_zonestate
634systemvm-xenserver-4.5SYSTEM1Active
635systemvm-xenserver-4.5SYSTEM0Inactive
636systemvm-xenserver-4.5SYSTEM0Inactive

The CPVM and SSVM in zone 1 will now be able to start.

The SSVM in zones 2 and 3 will need their cloud service restarting in order to prompt the re-scanning of the templates table. This can be done by a restart or stopping then starting the cloud service on the SSVM – DO NOT DESTROY the SSVM in zones 2 or 3 as they don’t have the correct template yet. The SSVMs in zones 2 and 3 will now start downloading template id 634 to their secondary storage pools.

Once the template 634 has downloaded to zones 2 and 3 (it was already in zone 1), CloudStack will be able to recreate system VMs in any zones.

Summary

This article explains how to recover from the upgrade failure scenario when system VMs will not recreate in zones in a multi-zone deployment.

About The Author

Paul Angus is VP Technology & Cloud Architect at ShapeBlue, The Cloud Specialists. He has designed and implemented numerous CloudStack environments for customers across 4 continents, based on Apache Cloudstack ,Citrix Cloudplatform and Citrix Cloudportal.

When not building Clouds, Paul likes to create Ansible playbooks that build clouds