System VM and Virtual Router Zero Downtime Upgrade | CloudStack Feature First Look
Apache CloudStack has always been easier to upgrade than many of its competitors, but a common pain point is that whenever a new release is deployed, the operations team must organize maintenance windows to redeploy every customer’s virtual router (VR). Depending on the number of existing networks, planning and execution can be time-consuming, especially for mission-critical customer services, where the VR upgrade often has to be scheduled case by case. To a much lesser extent, upgrading system VMs also causes some downtime for secondary storage and console proxy services.
With this new feature, the VR and CloudStack System VMs no longer have to go through complete removal and redeployment, which involves shutting them down, releasing resources, copying the system VM template from secondary to primary storage, starting the VM and applying the final configuration.
System VM live patching
Underpinning the Zero Downtime Upgrade is another new feature: ‘System VM live patching’. This feature can be used independently of CloudStack upgrades and allows administrators and users to apply software updates to Virtual Routers and System VMs on the fly. Previously, all Apache CloudStack scripts managing the System VM were stored in an ISO image that was mounted during the first boot and then copied to the System VM OS. Now the update is applied dynamically, and as long as the base OS stays the same, users don’t need to recreate the System VM.
A System VM upgrade involves two entities: the System VM Template, based on the current Debian stable release, and the CloudStack package scripts.
A System VM Template should be updated when:
– The previous Template reaches end-of-life or,
– The latest available Template addresses essential security issues or,
– The Template’s built-in scripts need fixes.
Before CloudStack 4.17, installing a new release included updating the System VM Template, which forced the admin to rebuild all System VMs and VRs. However, in some releases the Debian version stayed the same as in the previous release and only the CloudStack package scripts were modified. Even then, the admin had to rebuild every System VM and VR, which is a good example of system VMs being recreated unnecessarily and causing longer downtime.
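The reasoning above boils down to a simple decision: a new Template still means redeployment, while a scripts-only change can be applied in place. The sketch below expresses that decision; `ReleaseChange` and `upgrade_action` are hypothetical names used only for illustration, and it assumes that from 4.17 a scripts-only change is handled by live patching rather than by recreating the System VMs.

```python
from dataclasses import dataclass


@dataclass
class ReleaseChange:
    """Hypothetical flags describing what changed between two releases."""
    template_end_of_life: bool = False      # previous Template reached EOL
    template_security_fixes: bool = False   # new Template fixes security issues
    template_script_fixes: bool = False     # Template's built-in scripts fixed
    package_scripts_changed: bool = False   # CloudStack package scripts changed


def upgrade_action(change: ReleaseChange) -> str:
    """Return 'redeploy', 'live-patch' or 'none' for a given change set."""
    # Any reason to ship a new System VM Template means the System VMs
    # and VRs still have to be recreated from that Template.
    if (change.template_end_of_life
            or change.template_security_fixes
            or change.template_script_fixes):
        return "redeploy"
    # Same base OS and only the CloudStack package scripts changed:
    # applying the new scripts in place (live patching) is sufficient.
    if change.package_scripts_changed:
        return "live-patch"
    return "none"
```

For example, `upgrade_action(ReleaseChange(package_scripts_changed=True))` returns `"live-patch"`, which is exactly the case that previously forced a full rebuild.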
System VM Zero Downtime Upgrade
The Zero Downtime Upgrade is very similar to the previous System VM upgrade process, but it does not require restarting the System VMs. The following steps occur when a System VM is live patched:
– The latest software packages (agent.zip and cloud-scripts.tgz) are copied to the system VM over SCP, along with the patch-sysvms.sh script, which initiates the patching.
– When the patch-sysvms.sh script runs, it performs the following steps:
o The existing packages and certificates are backed up;
o The System VM is patched with the latest packages;
o The services listed in /var/cache/cloud/enabled_svcs are restarted;
o The script checks that everything went well, then computes the checksum of the latest cloud-scripts.tgz and writes it to /var/cache/cloud/cloud-scripts-signature;
o If restarting the services fails, the previous package version is restored from the backup.
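The control flow of these steps (back up, patch, restart, record the checksum, roll back on failure) can be modeled in a few lines of Python. This is a hypothetical sketch, not the actual patch-sysvms.sh: the directory layout, the `restart` callback and the use of MD5 for the checksum are assumptions, and in the real script the service list comes from /var/cache/cloud/enabled_svcs, whose format is not modeled here.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def file_checksum(path: Path) -> str:
    # MD5 is an assumption; the article only says "checksum".
    return hashlib.md5(path.read_bytes()).hexdigest()


def live_patch(install_dir: Path, new_packages: Iterable[Path],
               services: Iterable[str], restart: Callable[[str], bool],
               signature_file: Path) -> bool:
    """Hypothetical model of patch-sysvms.sh; returns True on success."""
    backup_dir = Path(tempfile.mkdtemp(prefix="sysvm-backup-"))
    backup = backup_dir / "install"
    # 1. Back up the current files (packages, certificates, ...).
    shutil.copytree(install_dir, backup)
    try:
        # 2. Patch: overwrite the installed packages with the new ones.
        for package in new_packages:
            shutil.copy2(package, install_dir / package.name)
        # 3. Restart every enabled service; all() stops at the first failure.
        if not all(restart(service) for service in services):
            raise RuntimeError("service restart failed")
        # 4. Record the checksum of the cloud-scripts.tgz now installed.
        signature_file.write_text(file_checksum(install_dir / "cloud-scripts.tgz"))
        return True
    except Exception:
        # 5. Roll back: restore the previous files from the backup.
        shutil.rmtree(install_dir)
        shutil.copytree(backup, install_dir)
        return False
    finally:
        # The temporary backup is discarded in both cases.
        shutil.rmtree(backup_dir)
```

Passing `restart` in as a callback keeps the sketch testable without a real system VM; on an actual VM it would wrap something like `systemctl restart <service>`.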
A new API call, patchSystemVM, was introduced to live patch the Secondary Storage VM (SSVM) and Console Proxy VM (CPVM). To live patch VRs and Internal LBs, the restartNetwork API call has been extended with a new parameter, livePatch, which should be set to true together with cleanup=false and makeredundant=false. The network services are then restarted to ensure that all port-forwarding rules, load-balancing rules and public IPs are re-applied.
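Both calls go through the regular CloudStack API, so they can be issued as ordinary signed API requests. The sketch below builds signed query strings using CloudStack's request-signing scheme (sort the parameters, URL-encode and lowercase them, sign with HMAC-SHA1 using the secret key, then Base64- and URL-encode the result). The `id` parameter of patchSystemVM and the lowercase spelling `livepatch` for the new restartNetwork parameter are assumptions; check the API reference for your release. The helper function names are hypothetical, and the returned string would be appended to the management server's API endpoint (typically `http://<management-server>:8080/client/api?`).

```python
import base64
import hashlib
import hmac
import urllib.parse


def signing_string(params: dict[str, str]) -> str:
    """Sorted, URL-encoded, lowercased key=value pairs joined with '&'."""
    return "&".join(
        f"{key.lower()}="
        + urllib.parse.quote_plus(params[key]).lower().replace("+", "%20")
        for key in sorted(params, key=str.lower)
    )


def sign_request(params: dict[str, str], secret_key: str) -> str:
    """Return the query string with an HMAC-SHA1 signature appended."""
    digest = hmac.new(secret_key.encode(), signing_string(params).encode(),
                      hashlib.sha1).digest()
    signature = urllib.parse.quote_plus(base64.b64encode(digest).decode())
    return urllib.parse.urlencode(params) + "&signature=" + signature


def patch_system_vm_query(vm_id: str, api_key: str, secret_key: str) -> str:
    # Live patch an SSVM or CPVM (the "id" parameter name is an assumption).
    return sign_request({"command": "patchSystemVM", "id": vm_id,
                         "apiKey": api_key, "response": "json"}, secret_key)


def live_patch_network_query(network_id: str, api_key: str,
                             secret_key: str) -> str:
    # Live patch a network's VRs; the "livepatch" spelling is an assumption.
    return sign_request({"command": "restartNetwork", "id": network_id,
                         "livepatch": "true", "cleanup": "false",
                         "makeredundant": "false",
                         "apiKey": api_key, "response": "json"}, secret_key)
```

Keeping cleanup and makeredundant explicitly set to false in the request mirrors the combination described above, so the VRs are patched in place rather than recreated.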
The following services are restarted once a system VM is live patched:
| System VM | Services restarted |
|---|---|
| SSVM | cloud, apache2, portmap |
| VRs | haproxy, apache2, dnsmasq |
CloudStack users reported many issues related to systemvm.iso in previous releases, including:
– The systemvm.iso path is not consistent across hypervisors: for VMware it is uploaded to secondary storage, while for other hypervisors it is placed on the hosts or the management server.
– systemvm.iso is ejected (disconnected) from the system VM after boot, which causes problems picking up the latest systemvm.iso when the System VM is stopped and then started.
To address these issues and make live patching smooth, the dependency on systemvm.iso was removed, and agent.zip and cloud-scripts.tgz are now copied to the System VM filesystem during boot. The cloudstack-common package no longer includes systemvm.iso. The software packages (agent.zip and cloud-scripts.tgz) required for patching system VMs can be found in /usr/share/cloudstack-common/vms on the CloudStack management servers and on KVM hypervisor hosts. On XenServer hosts, they can be found in /opt/xensource/packages/resources/.
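Before live patching, it can be useful to confirm that both packages are present in the expected location. The sketch below encodes only the locations named above; the host-type labels and the helper names are hypothetical, and the `root` parameter exists only so the check can be pointed at a chroot or test directory.

```python
from pathlib import Path

# Locations given in the text; the dictionary keys are hypothetical labels.
PACKAGE_DIRS = {
    "management": Path("/usr/share/cloudstack-common/vms"),
    "kvm": Path("/usr/share/cloudstack-common/vms"),
    "xenserver": Path("/opt/xensource/packages/resources"),
}
PATCH_FILES = ("agent.zip", "cloud-scripts.tgz")


def patch_files(host_type: str) -> list[Path]:
    """Return the expected paths of the live-patch packages on a host."""
    try:
        base = PACKAGE_DIRS[host_type.lower()]
    except KeyError:
        raise ValueError(f"no known package location for {host_type!r}") from None
    return [base / name for name in PATCH_FILES]


def missing_patch_files(host_type: str, root: Path = Path("/")) -> list[Path]:
    """List the packages that do not exist under `root`."""
    return [path for path in patch_files(host_type)
            if not (root / path.relative_to("/")).exists()]
```

On a KVM host, an empty result from `missing_patch_files("kvm")` means both agent.zip and cloud-scripts.tgz are in place.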
In the UI, when a new live patch is available, both admins and regular users can update the package scripts by clicking the corresponding option.
– Root Admin
– Root admin and regular users
When is live patching supported?
Live patching is supported when upgrading from CloudStack 4.14 or later:
| Current ACS version | Upgrade version | Live patching support | Reason / comment |
|---|---|---|---|
| 4.13 and earlier | 4.17+ | No | OpenJDK version update |
| 4.14 | 4.17+ | Yes | Remote access VPN may have issues because the 4.14 template uses an older version of strongSwan |
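The table can be turned into a simple pre-upgrade check. This is a hypothetical helper: it assumes dotted numeric version strings such as 4.14.0.0, compares only the major and minor numbers, and treats releases after 4.14 as supported, following the statement that live patching works from 4.14 and above.

```python
def major_minor(version: str) -> tuple[int, int]:
    """'4.14.0.0' -> (4, 14); assumes dotted numeric versions."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def live_patch_support(current: str, target: str) -> tuple[bool, str]:
    """Return (supported, comment) for upgrading `current` to `target`."""
    # Live patching only exists from 4.17 onwards.
    if major_minor(target) < (4, 17):
        return False, "live patching requires CloudStack 4.17 or later"
    source = major_minor(current)
    if source <= (4, 13):
        return False, "OpenJDK version update"
    if source == (4, 14):
        return True, "remote access VPN may have issues (older strongSwan)"
    return True, ""
```

Comparing integer tuples rather than raw strings matters here: as strings, "4.9" would sort after "4.13".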
This makes upgrading CloudStack an easier decision for operators, since VR downtime for users of critical systems was the biggest impediment. In addition, CloudStack virtual routers can now be upgraded in place with zero downtime.
This feature will be available in Apache CloudStack 4.17.0 LTS and above.