Software based agent LB for CloudStack

, ,

Introduction

Last year we implemented a new CA Framework on CloudStack 4.11 to make communications between CloudStack management servers it’s hypervisor agents more secure. As part of that work, we introduced the ability for CloudStack agents to connect to multiple management servers, avoiding the usage of an external load balancer.

We’ve now extended the CA Framework by implementing load balance sorting algorithms which are applied to the list of management servers before being sent to the indirect agents.  This allows the CloudStack management servers to balance the agent load between themselves, with no reliance on an external load balancer. This will  be available in CloudStack 4.11.1 The new functionality also  introduces the notion of a preferred management server for agents, and a background mechanism to check and eventually connect to the preferred management server (assumed to be the first on the list the agent receives).

Overview

The CloudStack administrator is responsible for setting the list of management servers to connect to and an algorithm (to sort the management servers list) from the CloudStack management server using global configurations.

Management server perspective

This feature uses (and introduces) these configurations:

  • ‘indirect.agent.lb.algorithm’: The algorithm to be applied to the list of management servers on ‘host’ configuration before being sent to the agents. Allowed algorithm values are:
    • ‘static’: Each agent receives the same list as provided on ‘host’ configuration. Therefore, no load balancing performed.
    • ’roundrobin’: The agents are evenly spread across management servers
    • ‘shuffle’: Randomly sorts the list before being sent to each agent.
  • ‘indirect.agent.lb.check.interval’: The interval in seconds after which agent should check and try to connect to its preferred host.

Any changes to these global configurations are dynamic and do not require restarting the management server.

There are three cases in which new lists are propagated to the agents:

  • Addition of a host
  • Connection or reconnection of an agent
  • A change on the ‘host’ or ‘indirect.agent.lb.algorithm’ configurations

Agents perspective

Agents receive the list of management servers, the algorithm and the check interval (if provided) and persist them on their agent.properties file as:

hosts=<list-of-comma-separated-management-servers>@<algorithm>

The first management server on the list is considered the preferred host. The check interval to check for the preferred host should be greater than 0, in which case it is persisted on agent.properties on the ‘host.lb.check.interval’ key. In case the interval is greater than 0 and the host which the agent is connected to is not the preferred host, the agent will attempt connection to the preferred host

When connection is established between an agent and a management server, the agent sends its list of management servers. The management server checks if the list the agent has is up to date, sending the updated list if it is outdated. This behaviour ensures that each agent should get the updated version of the list of management servers even after any failure.

Examples

Assuming a test environment consisting on:

  • 3 management servers: M1, M2 and M3
  • 4 KVM hosts: H1, H2, H3 and H4

The ‘host’ global configuration should be set to ‘M1,M2,M3’

If the CloudStack administrator wishes no load balancing between agents and management servers, it would set the ‘static’ algorithm as the ‘indirect.agent.lb.algorithm’ global configuration. Each agent receives the same list (M1,M2,M3), and will be connected to the same management server.

If the CloudStack administrator wishes to balance connections between agents and management servers, the ’roundrobin’ algorithm is recommended. In this case:

  • H1 receives the list (M1, M2, M3)
  • H2 receives the list (M2, M3, M1)
  • H3 receives the list (M3, M1, M2)
  • H4 receives the list (M1, M2, M3)

There is also a ‘shuffle’ algorithm, in which the list is randomized before being sent to any agent. With this algorithm, the CloudStack administrator has no control of the load balancing so it is not recommened production use at the moment.

Combined with the algorithm, the CloudStack administrator can also set the ‘indirect.agent.lb.check.interval’ global configuration to ‘X’. This ensures that each X seconds, every agent will check if the management server they are connected to is the same as the first element of their list (preferred host). If there is a mismatch, the agent will attempt connecting to the preferred host.

About the author

Nicolas Vazquez is a Senior Software Engineer at ShapeBlue, the Cloud Specialists, and is a committer in the Apache CloudStack project. Nicolas spends his time designing and implementing features in Apache CloudStack.