Async Agent Command Reconciliation | CloudStack Feature First Look

Apache CloudStack 4.21 introduces Async Agent Command Reconciliation, a mechanism designed to improve the reliability and accuracy of long-running operations (such as Instance and Volume migrations) when interruptions occur involving the Management Server, Agent, or the network. The Feature tracks and reconciles key Commands—CopyCommand, MigrateCommand, and MigrateVolumeCommand—by utilizing Agent heartbeats and a new reconciliation workflow. This ensures that resource states remain consistent across CloudStack and KVM Hosts, even following crashes or restarts.

 

Feature Overview

Alongside the CloudStack Management Server, CloudStack provides a service called CloudStack Agent, which is installed on each KVM Host (for example RHEL, RockyLinux, AlmaLinux, Ubuntu and Debian Hosts). When CloudStack creates, updates or deletes resources on a KVM Host, the CloudStack Management Server sends an internal command to the CloudStack Agent on that Host. The Agent processes the command and returns an answer to the Management Server, which then processes the answer upon receipt. This standard workflow is described below:

[Figure: the standard Command/Answer workflow in CloudStack]

Problem description

The standard process relies heavily on stable communication and continuously running services. When the communication link is unstable or a service is not running, several issues can arise, especially for asynchronous jobs that take a long time. Apache CloudStack 4.21 introduces the new feature “Async Agent Command Reconciliation” to make this process more robust and, as a result, keep resource states accurate.

The following scenarios lead to inconsistencies:

  • Intermittent connection failure, which prevents the CloudStack Management Server from receiving the CloudStack Agent’s answer.
  • The Agent crashes or is restarted (e.g., by third parties) while some jobs are still being processed in the backend.
  • The CloudStack Management Server is restarted while some jobs are being processed on the Agent side.

 

As a consequence of these failures, the CloudStack Management Server cannot process the answer or update the state of the resources. This leads to inconsistent resource states across the CloudStack Management Server, the Agent, and the network and storage components.
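The effect of a lost answer can be illustrated with a minimal Python sketch. It is not CloudStack code: the `Volume` class, the `send_command` helper, and the state names used here are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Volume:
    """The Management Server's view of a Volume (hypothetical model)."""
    name: str
    state: str = "Ready"


def agent_execute(command: str) -> str:
    """Pretend the Agent ran the command on the KVM Host successfully."""
    return f"{command}: success"


def send_command(volume: Volume, command: str, link_up: bool) -> Optional[str]:
    """Standard workflow: mark the resource busy, send the command, await the answer."""
    volume.state = "Migrating"
    answer = agent_execute(command)   # the work really happens on the Host
    if not link_up:
        return None                   # the answer never reaches the Management Server
    volume.state = "Ready"            # answer processed, state updated
    return answer


ok = Volume("vol-1")
send_command(ok, "MigrateVolumeCommand", link_up=True)

stuck = Volume("vol-2")
send_command(stuck, "MigrateVolumeCommand", link_up=False)

print(ok.state)     # Ready
print(stuck.state)  # Migrating
```

In the second call the migration has finished on the Host, but the Management Server's record still shows the Volume as migrating. This is the kind of mismatch the Feature is meant to resolve.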
 

Introducing Terminology: Reconcile Command

Virtual machine Instances and Volumes are critical resources in cloud environments. Since the migration of Instances and Volumes may take a long time, maintaining accurate information is essential when the previously described events occur.

In Apache CloudStack 4.21, a new term, “Reconcile Command”, is introduced to improve the accuracy of Instance and Volume information. A Reconcile Command is an internal command whose outcome can be reconciled if an error interrupts its processing.

Currently, there are three Reconcile Commands:

Command | Description
CopyCommand | This command is used when copying a Template or Volume between Primary Storage and Secondary Storage. It is also used in some scenarios to migrate Volumes from one Primary Storage to another.
MigrateCommand | This command is used when migrating a running Instance (with or without Volumes) to another storage pool.
MigrateVolumeCommand | This command is used when migrating a Volume between storage pools in some scenarios.

 

Supported Operations and Storage Pools

The Feature has been tested for online Instance Migration and online/offline Volume Migration on the following storage pools:

  • Local to NFS storage
  • NFS to Local storage
  • Local to Local storage
  • NFS to NFS storage
  • PowerFlex to PowerFlex storage

 

Global settings

The feature is disabled by default. Several Global Settings are available to control the Feature:

Name of Global Configuration | Default Value | Description
reconcile.commands.enabled | false | Indicates whether the background task to reconcile the commands is enabled or not.
reconcile.commands.interval | 60 | Interval (in seconds) for the background task to reconcile the commands.
reconcile.commands.max.attempts | 30 | The maximum number of attempts to reconcile the commands.
reconcile.commands.workers | 100 | The number of worker threads to reconcile the commands.
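As an illustration, the settings listed above could be changed through the CloudStack `updateConfiguration` API, which takes a `name` and a `value`. The Python sketch below only builds the unsigned query strings; a real request would also need to be authenticated (for example with an API key and signature), which is omitted here. The `build_update_queries` helper and the chosen values are assumptions, not part of CloudStack.

```python
from urllib.parse import urlencode

# Values chosen for illustration; only the first one differs from the defaults.
RECONCILE_SETTINGS = {
    "reconcile.commands.enabled": "true",
    "reconcile.commands.interval": "60",
    "reconcile.commands.max.attempts": "30",
    "reconcile.commands.workers": "100",
}


def build_update_queries(settings: dict) -> list:
    """Return one unsigned updateConfiguration query string per setting."""
    queries = []
    for name, value in settings.items():
        params = {
            "command": "updateConfiguration",
            "name": name,
            "value": value,
            "response": "json",
        }
        queries.append(urlencode(params))
    return queries


queries = build_update_queries(RECONCILE_SETTINGS)
for query in queries:
    print(query)
```

The same change can be made from the Global Settings page in the UI; the sketch only shows which name/value pairs are involved.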

 

Solution/New Workflow

For Non-Reconcile Commands, the process remains unchanged. For Reconcile Commands, the process is integrated with the Agent heartbeat, which runs periodically at an interval set by the Global Setting ping.interval.

[Figure: the new Command/Answer workflow in CloudStack]

The main differences in the workflow are:

  1. The Management Server creates a record in the reconcile_commands table before sending the Reconcile Command to the Agent.
  2. The Agent updates the Command/answer in a JSON file while processing the command.
  3. The Agent syncs with the Management Server by sending a heartbeat (PingCommand) every 60 seconds by default (controlled by ping.interval).
  4. When the Management Server receives the heartbeat (PingCommand) from the Agent, it updates the Reconcile Command records in the database and sends the Commands back to the Agent in the PingAnswer.
  5. When the Agent receives the PingAnswer from the Management Server, it removes the JSON file if the state is COMPLETED or FAILED.
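The five steps above can be sketched as a small in-memory simulation in Python. This is a simplified model under stated assumptions, not the actual implementation: the class and method names, the file naming scheme, and the `STARTED` and `PROCESSING` state names are hypothetical. Only `COMPLETED`, `FAILED`, the `reconcile_commands` table, and the use of a JSON file come from the description above.

```python
import json
import tempfile
from pathlib import Path

TERMINAL_STATES = {"COMPLETED", "FAILED"}


class ManagementServer:
    def __init__(self):
        self.reconcile_commands = {}  # stands in for the reconcile_commands table

    def send(self, agent, request_id, command):
        # Step 1: record the command before sending it to the Agent.
        self.reconcile_commands[request_id] = {"command": command, "state": "STARTED"}
        agent.start(request_id, command)

    def on_ping(self, reported):
        # Step 4: update the records and return their states in the PingAnswer.
        for request_id, info in reported.items():
            self.reconcile_commands[request_id]["state"] = info["state"]
        return {rid: self.reconcile_commands[rid]["state"] for rid in reported}


class Agent:
    def __init__(self, work_dir):
        self.work_dir = Path(work_dir)

    def _file(self, request_id):
        return self.work_dir / f"{request_id}.json"

    def start(self, request_id, command):
        # Step 2: persist the command in a JSON file while it is processed.
        payload = {"command": command, "state": "PROCESSING"}
        self._file(request_id).write_text(json.dumps(payload))

    def finish(self, request_id, state):
        # Step 2 (continued): store the outcome in the same file.
        path = self._file(request_id)
        data = json.loads(path.read_text())
        data["state"] = state
        path.write_text(json.dumps(data))

    def heartbeat(self, server):
        # Step 3: report all tracked commands with the periodic ping.
        reported = {p.stem: json.loads(p.read_text()) for p in self.work_dir.glob("*.json")}
        ping_answer = server.on_ping(reported)
        # Step 5: remove files for commands that reached a terminal state.
        for request_id, state in ping_answer.items():
            if state in TERMINAL_STATES:
                self._file(request_id).unlink()


with tempfile.TemporaryDirectory() as tmp:
    server, agent = ManagementServer(), Agent(tmp)
    server.send(agent, "req-1", "MigrateVolumeCommand")
    agent.heartbeat(server)
    tracked_while_running = agent._file("req-1").exists()
    agent.finish("req-1", "COMPLETED")
    agent.heartbeat(server)
    removed_after_completion = not agent._file("req-1").exists()
```

Because the command state lives in a file on the Host and in a database record on the Management Server, either side can be restarted without losing track of a command that is still in flight.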

If the Agent crashes or is restarted, or the Management Server is restarted:

  • The Management Server reconciles the Commands by sending an internal ReconcileCommand to the Agents. This checks the state of the resources on the KVM Hosts or on storage.
  • Once the state is determined, the Management Server updates the information of the Instance or Volume.
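A rough Python sketch of such a reconciliation attempt, bounded by reconcile.commands.max.attempts, might look like the following. The `reconcile` function, the `probe` callback standing in for the internal ReconcileCommand, the `INTERRUPTED` state, and the `exhausted` flag are all assumptions made for illustration. What CloudStack actually does once the attempts run out is not described here.

```python
from typing import Callable, Optional

MAX_ATTEMPTS = 30  # mirrors the default of reconcile.commands.max.attempts
TERMINAL_STATES = ("COMPLETED", "FAILED")


def reconcile(record: dict,
              probe: Callable[[dict], Optional[str]],
              max_attempts: int = MAX_ATTEMPTS) -> dict:
    """Run one reconciliation attempt for an interrupted command.

    `probe` stands in for the internal ReconcileCommand: it returns the state
    found on the Host or storage, or None if it cannot be determined yet.
    """
    if record["state"] in TERMINAL_STATES:
        return record
    record["attempts"] += 1
    found = probe(record)
    if found is not None:
        record["state"] = found          # state determined, record updated
    elif record["attempts"] >= max_attempts:
        record["exhausted"] = True       # assumption: stop retrying and flag the record
    return record


# Usage: the state becomes known on the third attempt.
answers = iter([None, None, "COMPLETED"])
record = {"command": "MigrateCommand", "state": "INTERRUPTED", "attempts": 0}
while record["state"] not in TERMINAL_STATES and not record.get("exhausted"):
    reconcile(record, lambda r: next(answers))

print(record["state"], record["attempts"])
```

In a real deployment the attempts would be spaced out by the background task interval (reconcile.commands.interval) rather than run in a tight loop as in this sketch.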

 

Conclusion

With the new Feature “Async Agent Command Reconciliation” in Apache CloudStack 4.21, CloudStack Users will receive more accurate information about Instances and Volumes when unexpected events affect the CloudStack Management Server, the CloudStack Agent, or the connection between them.

 

References

https://github.com/apache/cloudstack/pull/10514

https://cwiki.apache.org/confluence/display/CLOUDSTACK/Async+Agent+Command+Reconciliation
