CloudStack usage service deep dive

Introduction

CloudStack usage is a complementary service which tracks end user consumption of CloudStack resources and summarises this in a separate database for reporting or billing. The usage database can be queried directly or through the CloudStack API, or it can be integrated into external billing or reporting systems.

For background information on the usage service please refer to the CloudStack documentation set:

In this blog post we will go a step further and deep dive into how the usage service works, how you can run usage reports from the database either directly or through the API, and also how to troubleshoot this.

Please note – in this blog post we will be discussing the underlying database structure for the CloudStack management and usage services. Whilst these have separate databases they do in some cases share table names – hence please pay attention to the databases referenced throughout – e.g. cloud.usage_event versus cloud_usage.usage_event, etc.

Configuration

Installation

As per the official CloudStack documentation the usage service is simply installed and started. In CentOS/RHEL this is done as follows:

# yum install cloudstack-usage
# chkconfig cloudstack-usage on
# service cloudstack-usage start

whilst on a Debian/Ubuntu server:

# apt-get install cloudstack-usage
# update-rc.d cloudstack-usage defaults
# service cloudstack-usage start

Once configured, the usage service will use the same MySQL connection details as the main CloudStack management service. These are automatically added when the management service is configured with the “cloudstack-setup-databases” script (refer to http://docs.cloudstack.apache.org/projects/cloudstack-installation/en/4.9/management-server/index.html). The usage service installation simply adds a symbolic link to the same db.properties file as is used by cloudstack-management:

 
# ls -l /etc/cloudstack/usage/
total 4
lrwxrwxrwx. 1 root root 40 Sep 8 08:18 db.properties -> /etc/cloudstack/management/db.properties
lrwxrwxrwx. 1 root root 30 Sep 8 08:18 key -> /etc/cloudstack/management/key
-rw-r--r--. 1 root root 2968 Jul 12 10:36 log4j-cloud.xml 

Please note that whilst the cloudstack-usage and cloudstack-management services share the same db.properties configuration file, this will still contain individual settings for each service:

 
# grep -i usage /etc/cloudstack/usage/db.properties
db.usage.maxActive=100
# usage database tuning parameters
db.usage.maxWait=10000
db.usage.maxIdle=30
db.usage.name=cloud_usage
db.usage.port=3306
# usage database settings
db.usage.failOverReadOnly=false
db.usage.host=(Usage DB host IP address)
db.usage.password=ENC(Encrypted password)
db.usage.initialTimeout=3600
db.usage.username=cloud
db.usage.autoReconnect=true
db.usage.url.params=
db.usage.driver=jdbc:mysql
#usage Database
db.usage.reconnectAtTxEnd=true
db.usage.queriesBeforeRetryMaster=5000
db.usage.slaves=localhost,localhost
db.usage.autoReconnectForPools=true
db.usage.secondsBeforeRetryMaster=3600

Note the above settings would need to be changed if:

  • the usage DB is installed on a different MySQL server than the main CloudStack database
  • the usage database is using a different set of login credentials

Also note that the passwords in the file above are encrypted using the method specified during the “cloudstack-setup-databases” script run – hence this also uses the referenced “key” file as shown in the above folder listing.
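As a minimal sketch of how the usage-specific settings can be picked out of the shared file, the following Python snippet parses db.properties style text and keeps only the db.usage.* keys. The function name and the sample values (including the 192.0.2.10 address) are made up for illustration; the encrypted password is left untouched as decrypting it is outside the scope of this example.

```python
def read_usage_db_settings(text):
    """Return the db.usage.* entries of db.properties style text as a dict."""
    settings = {}
    prefix = "db.usage."
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments such as "# usage database settings".
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key.startswith(prefix):
            settings[key[len(prefix):]] = value.strip()
    return settings


# Hypothetical sample content, not taken from a real installation.
sample = """\
# usage database settings
db.usage.host=192.0.2.10
db.usage.port=3306
db.usage.name=cloud_usage
db.usage.username=cloud
db.cloud.name=cloud
"""

usage = read_usage_db_settings(sample)
print(usage["host"], usage["port"], usage["name"])
```

On a real system the text would be read from /etc/cloudstack/usage/db.properties instead of the inline sample.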

Application settings

Once installed the usage service is configured with the following global settings in CloudStack:

  • enable.usage.server:
    • Switches usage service on/off
    • true|false
  • usage.aggregation.timezone:
    • Timezone used for usage aggregation.
    • Refer to http://docs.cloudstack.apache.org/en/latest/dev.html for formatting.
    • Defaults to “GMT”.
  • usage.execution.timezone:
    • Timezone for usage job execution.
    • Refer to http://docs.cloudstack.apache.org/en/latest/dev.html for formatting.
  • usage.sanity.check.interval:
    • Interval (in days) to check sanity of usage data.
  • usage.snapshot.virtualsize.select:
    • Set the value to true if snapshot usage needs to consider virtual size, else physical size is considered.
    • true|false – defaults to false.
  • usage.stats.job.aggregation.range:
    • The range of time for aggregating the user statistics specified in minutes (e.g. 1440 for daily, 60 for hourly. Default is 60 minutes).
    • Please note this setting would be changed in a chargeback situation where VM resources are charged on an hourly/daily/monthly basis.
  • usage.stats.job.exec.time:
    • The time at which the usage statistics aggregation job will run as an HH:MM time, e.g. 00:30 to run at 12:30am.
    • Default is 00:15.
    • Please note this time follows the setting in usage.execution.timezone above.

Please note – if any of these settings are updated then only the cloudstack-usage service needs to be restarted (i.e. there is no need to restart cloudstack-management).
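To illustrate how usage.stats.job.exec.time and usage.execution.timezone interact, the following Python sketch works out when the next job would start. It assumes a daily schedule (aggregation range of 1440 minutes) and is not the scheduler code used by the usage service; the function name and parameters are made up for this example.

```python
from datetime import datetime, timedelta, timezone


def next_usage_job_run(now, exec_time="00:15", tz=timezone.utc):
    """Return the next daily run time for an HH:MM exec time in timezone tz."""
    hour, minute = (int(part) for part in exec_time.split(":"))
    local_now = now.astimezone(tz)
    candidate = local_now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    # If today's slot has already passed, the job runs tomorrow.
    if candidate <= local_now:
        candidate += timedelta(days=1)
    return candidate


now = datetime(2017, 9, 8, 11, 0, tzinfo=timezone.utc)
print(next_usage_job_run(now))           # uses the 00:15 exec time
print(next_usage_job_run(now, "12:30"))  # a later slot on the same day
```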

Usage types

To track the resources utilised in CloudStack, every API call where a resource is created, destroyed, stopped, started, requested or released is tracked in the cloud.usage_event table. This table has entries for every event since the CloudStack instance was created, hence it may grow to become quite big.

During processing every event in this table is assigned a usage type. The usage types are listed in the CloudStack documentation http://docs.cloudstack.apache.org/projects/cloudstack-administration/en/4.9/usage.html#usage-types, or they can simply be queried using the CloudStack “listUsageTypes” API call:


# cloudmonkey list usagetypes
count = 19
usagetype:
+-------------+-----------------------------------------+
| usagetypeid | description                             |
+-------------+-----------------------------------------+
|  1          |  Running Vm Usage                       |
|  2          |  Allocated Vm Usage                     |
|  3          |  IP Address Usage                       |
|  4          |  Network Usage (Bytes Sent)             |
|  5          |  Network Usage (Bytes Received)         |
|  6          |  Volume Usage                           |
|  7          |  Template Usage                         |
|  8          |  ISO Usage                              |
|  9          |  Snapshot Usage                         |
| 10          |  Security Group Usage                   |
| 11          |  Load Balancer Usage                    |
| 12          |  Port Forwarding Usage                  |
| 13          |  Network Offering Usage                 |
| 14          |  VPN users usage                        |
| 21          |  VM Disk usage(I/O Read)                |
| 22          |  VM Disk usage(I/O Write)               |
| 23          |  VM Disk usage(Bytes Read)              |
| 24          |  VM Disk usage(Bytes Write)             |
| 25          |  VM Snapshot storage usage              |
+-------------+-----------------------------------------+

Please note these usage types are calculated depending on the nature of resource used, e.g.:

  • “Running VM usage” will simply count the hours a single VM instance is used.
  • “Volume usage” will however track both the size of each volume in addition to the time utilised.

Process flow

Overview

From a high level point of view the usage service processes data already generated by the CloudStack management service: it copies this to the cloud_usage database before processing and aggregating the data in the cloud_usage.cloud_usage table:

 

Details

Using a running VM instance as example the data process flow is as follows.

Usage_event table entries

CloudStack management writes all events to the cloud.usage_event table. This happens whether the cloudstack-usage service is running or not.

In this example we will track the VM with instance ID 17. The resource tracked – be it a VM, a volume, a port forwarding rule, etc. – is listed in the usage_event table as “resource_id”, which points to the main ID field in the vm_instance and volumes tables, etc.


SELECT 
   * 
FROM 
   cloud.usage_event 
WHERE
   type like '%VM%' and resource_id=17;

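+-----+------------+------------+---------------------+---------+-------------+---------------+-------------+-------------+------+---------------+-----------+--------------+
| id  | type       | account_id | created             | zone_id | resource_id | resource_name | offering_id | template_id | size | resource_type | processed | virtual_size |
+-----+------------+------------+---------------------+---------+-------------+---------------+-------------+-------------+------+---------------+-----------+--------------+
| 68  | VM.CREATE  | 6          | 2017-09-08 11:14:31 | 1       | 17          | bbannervm12   | 17          | 5           | NULL | XenServer     | 0         | NULL         |
| 70  | VM.START   | 6          | 2017-09-08 11:14:41 | 1       | 17          | bbannervm12   | 17          | 5           | NULL | XenServer     | 0         | NULL         |
| 123 | VM.STOP    | 6          | 2017-09-26 13:44:48 | 1       | 17          | bbannervm12   | 17          | 5           | NULL | XenServer     | 0         | NULL         |
| 125 | VM.DESTROY | 6          | 2017-09-26 13:45:00 | 1       | 17          | bbannervm12   | 17          | 5           | NULL | XenServer     | 0         | NULL         |
+-----+------------+------------+---------------------+---------+-------------+---------------+-------------+-------------+------+---------------+-----------+--------------+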

Please note: a lot of the resources will obviously still be in use – i.e. they will not have a destroy/release entry. In this case the usage service considers the end date to be open, i.e. all calculations are up until today.

Usage_event copy

When the usage job runs (at “usage.stats.job.exec.time”) it first copies all new entries since the last processing time from the cloud.usage_event table to the cloud_usage.usage_event table.

The only difference between the two tables is the “processed” column – in the cloud database this is always set to 0, whereas once the table entry has been processed in the cloud_usage database this field is updated to 1.

In comparison – the entries in the cloud database:


SELECT 
   * 
FROM
   cloud.usage_event 
WHERE
   id > 130;
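+-----+-------------------------+------------+---------------------+---------+-------------+------------------+-------------+-------------+------------+----------------+-----------+--------------+
| id  | type                    | account_id | created             | zone_id | resource_id | resource_name    | offering_id | template_id | size       | resource_type  | processed | virtual_size |
+-----+-------------------------+------------+---------------------+---------+-------------+------------------+-------------+-------------+------------+----------------+-----------+--------------+
| 131 | VOLUME.CREATE           | 6          | 2017-09-26 13:45:44 | 1       | 31          | bbannerdata3     | 6           | NULL        | 2147483648 | NULL           | 0         | NULL         |
| 132 | NET.IPASSIGN            | 6          | 2017-09-26 13:46:05 | 1       | 17          | 10.1.34.77       | NULL        | 0           | 0          | VirtualNetwork | 0         | NULL         |
| 133 | VM.STOP                 | 8          | 2017-09-28 10:31:44 | 1       | 23          | secretprojectvm1 | 17          | 5           | NULL       | XenServer      | 0         | NULL         |
| 134 | NETWORK.OFFERING.REMOVE | 8          | 2017-09-28 10:31:44 | 1       | 23          | 41               | 8           | NULL        | 0          | NULL           | 0         | NULL         |
+-----+-------------------------+------------+---------------------+---------+-------------+------------------+-------------+-------------+------------+----------------+-----------+--------------+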

Compared to the same entries in cloud_usage:


SELECT 
   * 
FROM 
   cloud_usage.usage_event
WHERE 
   id > 130;
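+-----+-------------------------+------------+---------------------+---------+-------------+------------------+-------------+-------------+------------+----------------+-----------+--------------+
| id  | type                    | account_id | created             | zone_id | resource_id | resource_name    | offering_id | template_id | size       | resource_type  | processed | virtual_size |
+-----+-------------------------+------------+---------------------+---------+-------------+------------------+-------------+-------------+------------+----------------+-----------+--------------+
| 131 | VOLUME.CREATE           | 6          | 2017-09-26 13:45:44 | 1       | 31          | bbannerdata3     | 6           | NULL        | 2147483648 | NULL           | 1         | NULL         |
| 132 | NET.IPASSIGN            | 6          | 2017-09-26 13:46:05 | 1       | 17          | 10.1.34.77       | NULL        | 0           | 0          | VirtualNetwork | 1         | NULL         |
| 133 | VM.STOP                 | 8          | 2017-09-28 10:31:44 | 1       | 23          | secretprojectvm1 | 17          | 5           | NULL       | XenServer      | 1         | NULL         |
| 134 | NETWORK.OFFERING.REMOVE | 8          | 2017-09-28 10:31:44 | 1       | 23          | 41               | 8           | NULL        | 0          | NULL           | 1         | NULL         |
+-----+-------------------------+------------+---------------------+---------+-------------+------------------+-------------+-------------+------------+----------------+-----------+--------------+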

Account copy

As part of this copy job the cloudstack-usage service will also make a copy of some of the columns in the cloud.account table so that ownership of resources can be easily established during processing.

Usage summary and helper tables

In the first usage aggregation step all usage data per account and per usage type is summarised in helper tables. Continuing the example above the CREATE+DESTROY events as well as the VM START+STOP events are summarised in the “usage_vm_instance” table:


SELECT
   *
FROM
   cloud_usage.usage_vm_instance 
WHERE
   vm_instance_id=17;

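+------------+---------+------------+----------------+-------------+---------------------+-------------+-----------------+---------------------+---------------------+-----------+-----------+--------+
| usage_type | zone_id | account_id | vm_instance_id | vm_name     | service_offering_id | template_id | hypervisor_type | start_date          | end_date            | cpu_speed | cpu_cores | memory |
+------------+---------+------------+----------------+-------------+---------------------+-------------+-----------------+---------------------+---------------------+-----------+-----------+--------+
| 1          | 1       | 6          | 17             | bbannervm12 | 17                  | 5           | XenServer       | 2017-09-08 11:14:41 | 2017-09-26 13:44:48 | NULL      | NULL      | NULL   |
| 2          | 1       | 6          | 17             | bbannervm12 | 17                  | 5           | XenServer       | 2017-09-08 11:14:31 | 2017-09-26 13:45:00 | NULL      | NULL      | NULL   |
+------------+---------+------------+----------------+-------------+---------------------+-------------+-----------------+---------------------+---------------------+-----------+-----------+--------+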

Note the helper table has now summarised the data with the usage type mentioned above – and the start/end dates are contained in the same database row.

Please note – if a resource is still in use then the end date simply isn’t populated, i.e. all calculations will work on a rolling end date of today.
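The pairing of start and stop events into single rows can be sketched in a few lines of Python. This is only an illustration of the idea using the events of VM 17, not the parser code used by the usage service; the function name is made up, and a still running VM is represented by an end date of None.

```python
from datetime import datetime


def running_periods(events):
    """Pair VM.START/VM.STOP events into (start, end) periods of use."""
    periods = []
    started = None
    for event_type, created in sorted(events, key=lambda event: event[1]):
        if event_type == "VM.START" and started is None:
            started = created
        elif event_type == "VM.STOP" and started is not None:
            periods.append((started, created))
            started = None
    # A START without a matching STOP means the VM is still running.
    if started is not None:
        periods.append((started, None))
    return periods


fmt = "%Y-%m-%d %H:%M:%S"
events = [
    ("VM.CREATE", datetime.strptime("2017-09-08 11:14:31", fmt)),
    ("VM.START", datetime.strptime("2017-09-08 11:14:41", fmt)),
    ("VM.STOP", datetime.strptime("2017-09-26 13:44:48", fmt)),
    ("VM.DESTROY", datetime.strptime("2017-09-26 13:45:00", fmt)),
]

periods = running_periods(events)
print(periods)
```

The same approach applies to CREATE/DESTROY events for the allocated VM usage type.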

If we now also compare the volume used by VM instance ID 17 we find this in the cloud_usage.usage_volume helper table:

SELECT
 usage_volume.*
FROM
 cloud_usage.usage_volume
LEFT JOIN
 cloud.volumes ON (usage_volume.id = volumes.id)
WHERE
 cloud.volumes.instance_id = 17;
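+----+---------+------------+-----------+------------------+-------------+-------------+---------------------+---------------------+
| id | zone_id | account_id | domain_id | disk_offering_id | template_id | size        | created             | deleted             |
+----+---------+------------+-----------+------------------+-------------+-------------+---------------------+---------------------+
| 18 | 1       | 6          | 2         | NULL             | 5           | 21474836480 | 2017-09-08 11:14:31 | 2017-09-26 13:45:00 |
+----+---------+------------+-----------+------------------+-------------+-------------+---------------------+---------------------+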

As the database selects above show, each helper table will contain only the information pertinent to that specific usage type: the cloud_usage.usage_vm_instance table contains information about VM service offering, template and hypervisor type, whereas the cloud_usage.usage_volume table contains information about disk offering ID, template ID and size.

If a usage type for a resource has been started/stopped or requested/released multiple times then each period of use will be listed in the helper tables:


SELECT 
   * 
FROM
   cloud_usage.usage_vm_instance
WHERE
   vm_instance_id=12;

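+------------+---------+------------+----------------+------------+---------------------+-------------+-----------------+---------------------+---------------------+-----------+-----------+--------+
| usage_type | zone_id | account_id | vm_instance_id | vm_name    | service_offering_id | template_id | hypervisor_type | start_date          | end_date            | cpu_speed | cpu_cores | memory |
+------------+---------+------------+----------------+------------+---------------------+-------------+-----------------+---------------------+---------------------+-----------+-----------+--------+
| 1          | 1       | 6          | 12             | bbannervm2 | 17                  | 5           | XenServer       | 2017-09-08 09:30:37 | 2017-09-08 09:30:49 | NULL      | NULL      | NULL   |
| 1          | 1       | 6          | 12             | bbannervm2 | 17                  | 5           | XenServer       | 2017-09-08 11:14:03 | NULL                | NULL      | NULL      | NULL   |
| 2          | 1       | 6          | 12             | bbannervm2 | 17                  | 5           | XenServer       | 2017-09-08 09:30:20 | NULL                | NULL      | NULL      | NULL   |
+------------+---------+------------+----------------+------------+---------------------+-------------+-----------------+---------------------+---------------------+-----------+-----------+--------+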

Usage data aggregation

Once all helper tables have been populated the usage service creates time aggregated database entries in the cloud_usage.cloud_usage table. In simple terms this process:

  1. Analyses all entries in the helper tables.
  2. Splits up this data based on “usage.stats.job.aggregation.range” to create individual usage timeblocks.
  3. Repeats this process for all accounts and for all resources.

So – looking at the VM with ID=17 analysed above:

  • This had a running start date of 2017-09-08 11:14:41, an end date of 2017-09-26 13:44:48.
  • The usage service is set up with usage.stats.job.aggregation.range=1440, i.e. 24 hours.
  • The usage service will now create entries in the cloud_usage.cloud_usage table for every full and partial 24 hour period this VM was running.

SELECT 
   *
FROM
   cloud_usage.cloud_usage
WHERE 
   usage_id=17 and usage_type=1;

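+------+---------+------------+-----------+--------------------------------------------------------------+---------------+------------+--------------------+----------------+-------------+-------------+-------------+----------+-----------+------+------------+---------------------+---------------------+--------------+-----------+-----------+--------+------------------+
| id   | zone_id | account_id | domain_id | description                                                  | usage_display | usage_type | raw_usage          | vm_instance_id | vm_name     | offering_id | template_id | usage_id | type      | size | network_id | start_date          | end_date            | virtual_size | cpu_speed | cpu_cores | memory | quota_calculated |
+------+---------+------------+-----------+--------------------------------------------------------------+---------------+------------+--------------------+----------------+-------------+-------------+-------------+----------+-----------+------+------------+---------------------+---------------------+--------------+-----------+-----------+--------+------------------+
| 64   | 1       | 6          | 2         | bbannervm12 running time (ServiceOffering: 17) (Template: 5) | 12.755278 Hrs | 1          | 12.755277633666992 | 17             | bbannervm12 | 17          | 5           | 17       | XenServer | NULL | NULL       | 2017-09-08 00:00:00 | 2017-09-08 23:59:59 | NULL         | NULL      | NULL      | NULL   | 0                |
| 146  | 1       | 6          | 2         | bbannervm12 running time (ServiceOffering: 17) (Template: 5) | 24 Hrs        | 1          | 24                 | 17             | bbannervm12 | 17          | 5           | 17       | XenServer | NULL | NULL       | 2017-09-09 00:00:00 | 2017-09-09 23:59:59 | NULL         | NULL      | NULL      | NULL   | 0                |
| 221  | 1       | 6          | 2         | bbannervm12 running time (ServiceOffering: 17) (Template: 5) | 24 Hrs        | 1          | 24                 | 17             | bbannervm12 | 17          | 5           | 17       | XenServer | NULL | NULL       | 2017-09-10 00:00:00 | 2017-09-10 23:59:59 | NULL         | NULL      | NULL      | NULL   | 0                |
.....................................................................
| 1271 | 1       | 6          | 2         | bbannervm12 running time (ServiceOffering: 17) (Template: 5) | 24 Hrs        | 1          | 24                 | 17             | bbannervm12 | 17          | 5           | 17       | XenServer | NULL | NULL       | 2017-09-24 00:00:00 | 2017-09-24 23:59:59 | NULL         | NULL      | NULL      | NULL   | 0                |
| 1346 | 1       | 6          | 2         | bbannervm12 running time (ServiceOffering: 17) (Template: 5) | 24 Hrs        | 1          | 24                 | 17             | bbannervm12 | 17          | 5           | 17       | XenServer | NULL | NULL       | 2017-09-25 00:00:00 | 2017-09-25 23:59:59 | NULL         | NULL      | NULL      | NULL   | 0                |
| 1427 | 1       | 6          | 2         | bbannervm12 running time (ServiceOffering: 17) (Template: 5) | 13.746667 Hrs | 1          | 13.74666690826416  | 17             | bbannervm12 | 17          | 5           | 17       | XenServer | NULL | NULL       | 2017-09-26 00:00:00 | 2017-09-26 23:59:59 | NULL         | NULL      | NULL      | NULL   | 0                |
+------+---------+------------+-----------+--------------------------------------------------------------+---------------+------------+--------------------+----------------+-------------+-------------+-------------+----------+-----------+------+------------+---------------------+---------------------+--------------+-----------+-----------+--------+------------------+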

Since all of these entries are split into specific dates it is now relatively straightforward to run a report to capture all resource usage for an account over a specific time period, e.g. if a monthly bill is required.
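The splitting of one period of use into aggregation blocks can be reproduced with a short Python sketch. This is an illustration of the arithmetic only, not the usage service implementation: the function name is made up, blocks are aligned to multiples of the aggregation range counted from the epoch (midnight for 1440 minutes), and block ends are treated as exclusive rather than the 23:59:59 shown in the table.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def split_usage(start, end, range_minutes=1440):
    """Split one period of use into (block_start, hours) per aggregation range."""
    step = timedelta(minutes=range_minutes)
    # Align the first block to a multiple of the range counted from the epoch.
    block_start = EPOCH + ((start - EPOCH) // step) * step
    blocks = []
    while block_start < end:
        block_end = block_start + step
        used = min(end, block_end) - max(start, block_start)
        blocks.append((block_start, used.total_seconds() / 3600))
        block_start = block_end
    return blocks


# Running period of VM 17 as listed in cloud_usage.usage_vm_instance.
blocks = split_usage(datetime(2017, 9, 8, 11, 14, 41),
                     datetime(2017, 9, 26, 13, 44, 48))
for block_start, hours in blocks[:2] + blocks[-1:]:
    print(block_start.date(), round(hours, 6))
```

The first and last blocks come out at 12.755278 and 13.746667 hours, matching the usage_display values in the table, with full 24 hour blocks in between.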

Querying usage data through the API

The usage records can also be queried through the API by using the “listUsageRecords” API call. This uses similar syntax to the above – but there are some differences:

  • The API call requires start and end dates; these are in a “yyyy-MM-dd HH:mm:ss” or simply a “yyyy-MM-dd” format.
  • The usage type is same as above, e.g. type=1 for running VMs.
  • Usage ID is however the UUID attached to the resource in question, e.g. in the following example VM ID 17 actually has UUID 4358f436-bc9b-4793-b1be-95fa9b074fd5 in the vm_instance table.
  • The API call can also be filtered for account/accountid/domain.

More information on the syntax can be found in http://cloudstack.apache.org/api/apidocs-4.9/apis/listUsageRecords.html .

The following API query will list the first three days’ worth of usage data listed in the table above:


# cloudmonkey list usagerecords type=1 startdate=2017-09-09 enddate=2017-09-10 usageid=4358f436-bc9b-4793-b1be-95fa9b074fd5
count = 3
usagerecord:
+-----------------------------+---------+--------------------------------------+-----------------------------+--------------------------------------------------------------+-------------+--------------------------------------+--------------------------------------+-----------+------------+--------------------------------------+----------+--------------------------------------+---------------+--------------------------------------+-----------+--------------------------------------+
| startdate                   | account | domainid                             | enddate                     | description                                                  | name        | virtualmachineid                     | offeringid                           | usagetype | domain     | zoneid                               | rawusage | templateid                           | usage         | usageid                              | type      | accountid                            |
+-----------------------------+---------+--------------------------------------+-----------------------------+--------------------------------------------------------------+-------------+--------------------------------------+--------------------------------------+-----------+------------+--------------------------------------+----------+--------------------------------------+---------------+--------------------------------------+-----------+--------------------------------------+
| 2017-09-08'T'00:00:00+00:00 | bbanner | f3501b29-01f7-44ce-a266-9e3f12c17394 | 2017-09-08'T'23:59:59+00:00 | bbannervm12 running time (ServiceOffering: 17) (Template: 5) | bbannervm12 | 4358f436-bc9b-4793-b1be-95fa9b074fd5 | 60d9aaf1-7ff7-472e-b29f-6768d0cb5702 | 1         | Subdomain1 | d4b9d32e-d779-48b8-814d-d7847d55a684 | 12.755278| 47dd8c98-946e-11e7-b419-0666ae010714 | 12.755278 Hrs | 4358f436-bc9b-4793-b1be-95fa9b074fd5 | XenServer | 8c2d592f-78e1-4e92-a910-1e4b865240cf |
| 2017-09-09'T'00:00:00+00:00 | bbanner | f3501b29-01f7-44ce-a266-9e3f12c17394 | 2017-09-09'T'23:59:59+00:00 | bbannervm12 running time (ServiceOffering: 17) (Template: 5) | bbannervm12 | 4358f436-bc9b-4793-b1be-95fa9b074fd5 | 60d9aaf1-7ff7-472e-b29f-6768d0cb5702 | 1         | Subdomain1 | d4b9d32e-d779-48b8-814d-d7847d55a684 | 24       | 47dd8c98-946e-11e7-b419-0666ae010714 | 24 Hrs        | 4358f436-bc9b-4793-b1be-95fa9b074fd5 | XenServer | 8c2d592f-78e1-4e92-a910-1e4b865240cf |
| 2017-09-10'T'00:00:00+00:00 | bbanner | f3501b29-01f7-44ce-a266-9e3f12c17394 | 2017-09-10'T'23:59:59+00:00 | bbannervm12 running time (ServiceOffering: 17) (Template: 5) | bbannervm12 | 4358f436-bc9b-4793-b1be-95fa9b074fd5 | 60d9aaf1-7ff7-472e-b29f-6768d0cb5702 | 1         | Subdomain1 | d4b9d32e-d779-48b8-814d-d7847d55a684 | 24       | 47dd8c98-946e-11e7-b419-0666ae010714 | 24 Hrs        | 4358f436-bc9b-4793-b1be-95fa9b074fd5 | XenServer | 8c2d592f-78e1-4e92-a910-1e4b865240cf |
+-----------------------------+---------+--------------------------------------+-----------------------------+--------------------------------------------------------------+-------------+--------------------------------------+--------------------------------------+-----------+------------+--------------------------------------+----------+--------------------------------------+---------------+--------------------------------------+-----------+--------------------------------------+
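For anyone calling the API without cloudmonkey, the request parameters can be assembled as in the following Python sketch. The helper name is made up for this example, and it only builds the unsigned query string: authentication (API key and signature) and the management server URL are deliberately left out and would need to be added for a real call.

```python
from datetime import date
from urllib.parse import urlencode


def usage_records_query(usage_type, start_date, end_date, usage_id=None):
    """Build the (unsigned) query string for a listUsageRecords call."""
    params = {
        "command": "listUsageRecords",
        "response": "json",
        "type": usage_type,
        # The API accepts dates in yyyy-MM-dd format.
        "startdate": start_date.strftime("%Y-%m-%d"),
        "enddate": end_date.strftime("%Y-%m-%d"),
    }
    if usage_id is not None:
        params["usageid"] = usage_id
    return urlencode(sorted(params.items()))


query = usage_records_query(
    usage_type=1,
    start_date=date(2017, 9, 8),
    end_date=date(2017, 9, 10),
    usage_id="4358f436-bc9b-4793-b1be-95fa9b074fd5",
)
print(query)
```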

Analysing and reporting on usage data

The usage data can be analysed in any reporting tool – from the various CloudStack billing platforms, to enterprise billing systems as well as simpler tools like Excel. Since the cloud_usage.cloud_usage data is fully aggregated into time utilised blocks, it is now just a question of summarising data based on usage type, accounts, service offerings, etc.

The following SQL queries are provided as examples only – in a real use case these will most likely need to be changed and refined to the specific reporting requirements.

Running VMs

To find usage data for all VMs running during the month of September we search for usage type=1 and group by vm_instance_id. For each VM instance we summarise how many hours the VM has been running – however in a real billing scenario this would most likely also be broken down into e.g. how many hours of VM usage have been utilised per VM service offering.


SELECT
   account_id,
   account_name,
   usage_type,
   offering_id,
   vm_instance_id,
   vm_name,
   SUM(raw_usage) as VMRunHours
FROM
   cloud_usage.cloud_usage
LEFT JOIN 
   cloud_usage.account on (cloud_usage.account_id = account.id)
WHERE 
   start_date LIKE '2017-09%' 
   AND usage_type = 1
GROUP BY 
   vm_instance_id
ORDER BY 
   account_id ASC, vm_instance_id ASC;
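+------------+-------------------------+------------+-------------+----------------+------------------+---------------------+
| account_id | account_name            | usage_type | offering_id | vm_instance_id | vm_name          | VMRunHours          |
+------------+-------------------------+------------+-------------+----------------+------------------+---------------------+
| 2          | admin                   | 1          | 1           | 3              | rootvm1          | 3.0205559730529785  |
| 2          | admin                   | 1          | 17          | 20             | rootvm2          | 539.7991666793823   |
| 4          | pparker                 | 1          | 17          | 5              | pparkervm1       | 542.5497226715088   |
| 4          | pparker                 | 1          | 17          | 14             | pparkervm5       | 0.26527804136276245 |
| 4          | pparker                 | 1          | 17          | 15             | pparkervm7       | 0.2247224897146225  |
| 4          | pparker                 | 1          | 17          | 16             | pparkervm16      | 540.774167060852    |
| 4          | pparker                 | 1          | 17          | 22             | ppvpcvm1000      | 539.7311105728149   |
| 5          | ckent                   | 1          | 17          | 7              | ckentvm1         | 5.246944904327393   |
| 5          | ckent                   | 1          | 17          | 9              | ckentvm2         | 435.4169445037842   |
| 5          | ckent                   | 1          | 17          | 18             | ckentvm23        | 0.8186113834381104  |
| 5          | ckent                   | 1          | 17          | 25             | ckentvm30        | 106.28194522857666  |
| 6          | bbanner                 | 1          | 17          | 10             | bbannervm1       | 1.7469446659088135  |
| 6          | bbanner                 | 1          | 17          | 12             | bbannervm2       | 540.7691669464111   |
| 6          | bbanner                 | 1          | 17          | 17             | bbannervm12      | 434.50194454193115  |
| 6          | bbanner                 | 1          | 17          | 26             | bbannervm30      | 106.24055576324463  |
| 8          | PrjAcct-SecretProject-1 | 1          | 17          | 23             | secretprojectvm1 | 477.4819440841675   |
+------------+-------------------------+------------+-------------+----------------+------------------+---------------------+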
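As a sketch of the per service offering breakdown mentioned above, the following Python snippet totals running hours per account and offering from rows shaped like the query output. The function name is made up and only a handful of rows from the table are used as sample input.

```python
from collections import defaultdict


def hours_per_offering(rows):
    """Sum VM running hours per (account_name, offering_id)."""
    totals = defaultdict(float)
    for account_name, offering_id, vm_name, hours in rows:
        totals[(account_name, offering_id)] += hours
    return dict(totals)


# A few rows taken from the report above.
rows = [
    ("admin", 1, "rootvm1", 3.0205559730529785),
    ("admin", 17, "rootvm2", 539.7991666793823),
    ("bbanner", 17, "bbannervm1", 1.7469446659088135),
    ("bbanner", 17, "bbannervm2", 540.7691669464111),
]

totals = hours_per_offering(rows)
for (account_name, offering_id), hours in sorted(totals.items()):
    print(account_name, offering_id, round(hours, 2))
```

Multiplying each total by an hourly rate for the offering would then give a simple charge per account.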

Network utilisation

The following will summarise network usage for sent (usage type=4) and received (usage type=5) traffic on a per account basis, again for the month of September.

For network utilisation the usage is simply summarised as total Bytes sent or received:


SELECT
   account_id,
   account_name,
   usage_type,
   network_id,
   SUM(raw_usage) as TotalBytes
FROM
   cloud_usage.cloud_usage
LEFT JOIN 
   cloud_usage.account on (cloud_usage.account_id = account.id)
WHERE 
   start_date LIKE '2017-09%' 
   AND usage_type in (4,5)
GROUP BY 
   account_id, usage_type
ORDER BY 
   account_id ASC;
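+------------+-------------------------+------------+------------+------------+
| account_id | account_name            | usage_type | network_id | TotalBytes |
+------------+-------------------------+------------+------------+------------+
| 2          | admin                   | 4          | 204        | 391320     |
| 2          | admin                   | 5          | 204        | 1744       |
| 4          | pparker                 | 4          | 200        | 164764260  |
| 4          | pparker                 | 5          | 200        | 163779643  |
| 5          | ckent                   | 4          | 206        | 391500     |
| 5          | ckent                   | 5          | 206        | 0          |
| 6          | bbanner                 | 4          | 207        | 776700     |
| 6          | bbanner                 | 5          | 207        | 0          |
| 8          | PrjAcct-SecretProject-1 | 4          | 211        | 343080     |
| 8          | PrjAcct-SecretProject-1 | 5          | 211        | 0          |
+------------+-------------------------+------------+------------+------------+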
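If a report needs sent and received traffic side by side per account, the rows can be pivoted as in this Python sketch. The function name is made up, and the sample rows are a subset of the output above.

```python
def traffic_per_account(rows):
    """Return {account_name: (bytes_sent, bytes_received)}."""
    totals = {}
    for account_name, usage_type, total_bytes in rows:
        sent, received = totals.get(account_name, (0, 0))
        if usage_type == 4:      # Network Usage (Bytes Sent)
            sent += total_bytes
        elif usage_type == 5:    # Network Usage (Bytes Received)
            received += total_bytes
        totals[account_name] = (sent, received)
    return totals


rows = [
    ("admin", 4, 391320),
    ("admin", 5, 1744),
    ("pparker", 4, 164764260),
    ("pparker", 5, 163779643),
]

traffic = traffic_per_account(rows)
for account_name, (sent, received) in traffic.items():
    print(account_name, sent, received, sent + received)
```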

Volume utilisation

For volume or general storage utilisation (applies to snapshots as well) the usage is calculated as storage hours – e.g. GbHours. In this example we again summarise for all volumes (usage type=6) on a per account and disk basis during the month of September. Please note in this case we have to do multiple joins (or nested WHERE statements) to look up volume IDs, VM name, etc.


SELECT
   cloud_usage.cloud_usage.account_id,
   cloud_usage.account.account_name,
   cloud_usage.cloud_usage.usage_type,
   cloud_usage.cloud_usage.usage_id,
   cloud.vm_instance.name as Instance_Name,
   cloud.volumes.name as Volume_Name,
   cloud_usage.cloud_usage.size/(1024*1024*1024) as DiskSizeGb,
   SUM(cloud_usage.cloud_usage.raw_usage) as TotalHours,
   sum(cloud_usage.cloud_usage.raw_usage*cloud_usage.cloud_usage.size/(1024*1024*1024)) as GbHours
FROM
   cloud_usage.cloud_usage
LEFT JOIN 
   cloud_usage.account on (cloud_usage.account_id = account.id)
LEFT JOIN
   cloud.volumes on (cloud_usage.usage_id = volumes.id)
LEFT JOIN 
   cloud.vm_instance on (cloud.volumes.instance_id = cloud.vm_instance.id)
WHERE 
   start_date LIKE '2017-09%' AND usage_type = 6
GROUP BY 
   usage_id
ORDER BY 
   account_id ASC, usage_id ASC;

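+------------+-------------------------+------------+----------+------------------+--------------------+------------+--------------------+--------------------+
| account_id | account_name            | usage_type | usage_id | Instance_Name    | Volume_Name        | DiskSizeGb | TotalHours         | GbHours            |
+------------+-------------------------+------------+----------+------------------+--------------------+------------+--------------------+--------------------+
| 2          | admin                   | 6          | 3        | rootvm1          | ROOT-3             | 20.0000    | 542.8836107254028  | 10857.672214508057 |
| 2          | admin                   | 6          | 23       | rootvm2          | ROOT-20            | 20.0000    | 539.8033332824707  | 10796.066665649414 |
| 4          | pparker                 | 6          | 5        | pparkervm1       | ROOT-5             | 20.0000    | 542.6494445800781  | 10852.988891601562 |
| 4          | pparker                 | 6          | 15       | pparkervm5       | ROOT-14            | 20.0000    | 541.0441675186157  | 10820.883350372314 |
| 4          | pparker                 | 6          | 16       | pparkervm7       | ROOT-15            | 20.0000    | 0.2291669398546219 | 4.583338797092438  |
| 4          | pparker                 | 6          | 17       | pparkervm16      | ROOT-16            | 20.0000    | 540.7772226333618  | 10815.544452667236 |
| 4          | pparker                 | 6          | 25       | ppvpcvm1000      | ROOT-22            | 20.0000    | 539.7355556488037  | 10794.711112976074 |
| 5          | ckent                   | 6          | 7        | ckentvm1         | ROOT-7             | 20.0000    | 436.3361120223999  | 8726.722240447998  |
| 5          | ckent                   | 6          | 9        | ckentvm2         | ROOT-9             | 20.0000    | 542.5586109161377  | 10851.172218322754 |
| 5          | ckent                   | 6          | 20       | ckentvm23        | ROOT-18            | 20.0000    | 434.36277770996094 | 8687.255554199219  |
| 5          | ckent                   | 6          | 22       | NULL             | ckentdata1         | 2.0000     | 540.651388168335   | 1081.30277633667   |
| 5          | ckent                   | 6          | 29       | ckentvm30        | ROOT-25            | 20.0000    | 106.28638935089111 | 2125.7277870178223 |
| 6          | bbanner                 | 6          | 10       | bbannervm1       | ROOT-10            | 20.0000    | 1.771389126777649  | 35.42778253555298  |
| 6          | bbanner                 | 6          | 12       | bbannervm2       | ROOT-12            | 20.0000    | 542.4944448471069  | 10849.888896942139 |
| 6          | bbanner                 | 6          | 13       | bbannervm2       | bbannerdatadisk1   | 2.0000     | 542.305832862854   | 1084.611665725708  |
| 6          | bbanner                 | 6          | 18       | bbannervm12      | ROOT-17            | 20.0000    | 434.5080556869507  | 8690.161113739014  |
| 6          | bbanner                 | 6          | 19       | bbannervm2       | bbannerdata2       | 5.0000     | 540.7536115646362  | 2703.768057823181  |
| 6          | bbanner                 | 6          | 30       | bbannervm30      | ROOT-26            | 20.0000    | 106.24472236633301 | 2124.89444732666   |
| 6          | bbanner                 | 6          | 31       | bbannervm30      | bbannerdata3       | 2.0000     | 106.23777770996094 | 212.47555541992188 |
| 8          | PrjAcct-SecretProject-1 | 6          | 26       | secretprojectvm1 | ROOT-23            | 20.0000    | 538.975832939148   | 10779.516658782959 |
| 8          | PrjAcct-SecretProject-1 | 6          | 28       | secretprojectvm1 | secretprojectdata1 | 2.0000     | 538.7525005340576  | 1077.5050010681152 |
+------------+-------------------------+------------+----------+------------------+--------------------+------------+--------------------+--------------------+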
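The GbHours column is simply the volume size in GB multiplied by the hours the volume existed. A small Python sketch of that arithmetic, using the bbannerdata3 volume from the output above (the function name is made up):

```python
GIB = 1024 ** 3  # the same divisor as used in the SQL query


def gb_hours(size_bytes, hours):
    """Storage usage expressed as size in GB multiplied by hours used."""
    return size_bytes / GIB * hours


# bbannerdata3: a 2147483648 byte volume used for roughly 106.24 hours.
data_disk = gb_hours(2147483648, 106.23777770996094)
# A 20 GB root disk kept for one full day.
root_disk_day = gb_hours(21474836480, 24)
print(data_disk, root_disk_day)
```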

 

IP addresses, port forwarding rules and VPN users

For other usage types where – similar to VM running hours – we simply report on the total hours utilised we again summarise the raw_usage, but since the description in cloud_usage.cloud_usage is clear enough we don’t need to go looking elsewhere for this information. In the following example we report on IP address usage (usage type=3), port forwarding rules (12) and VPN users (14):


SELECT
   cloud_usage.cloud_usage.account_id,
   cloud_usage.account.account_name,
   cloud_usage.cloud_usage.usage_type,
   cloud_usage.cloud_usage.usage_id,
   cloud_usage.cloud_usage.description,
   SUM(cloud_usage.cloud_usage.raw_usage) as TotalHours
FROM
   cloud_usage.cloud_usage
LEFT JOIN
   cloud_usage.account on (cloud_usage.account_id = account.id)
WHERE
   start_date LIKE '2017-09%' AND usage_type in (3,12,14)
GROUP BY 
   description
ORDER BY
   account_id ASC, usage_id ASC;

 

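+------------+-------------------------+------------+----------+--------------------------------------------+--------------------+
| account_id | account_name            | usage_type | usage_id | description                                | TotalHours         |
+------------+-------------------------+------------+----------+--------------------------------------------+--------------------+
| 2          | admin                   | 3          | 3        | IPAddress: 10.1.34.63                      | 542.8833332061768  |
| 4          | pparker                 | 3          | 4        | IPAddress: 10.1.34.64                      | 542.648889541626   |
| 4          | pparker                 | 3          | 13       | IPAddress: 10.1.34.73                      | 539.7686109542847  |
| 5          | ckent                   | 3          | 5        | IPAddress: 10.1.34.65                      | 542.6322221755981  |
| 5          | ckent                   | 3          | 6        | IPAddress: 10.1.34.66                      | 542.5547218322754  |
| 5          | ckent                   | 3          | 7        | IPAddress: 10.1.34.67                      | 542.5541667938232  |
| 5          | ckent                   | 3          | 10       | IPAddress: 10.1.34.70                      | 540.6561107635498  |
| 5          | ckent                   | 3          | 11       | IPAddress: 10.1.34.71                      | 540.2247219085693  |
| 5          | ckent                   | 3          | 12       | IPAddress: 10.1.34.72                      | 540.0552778244019  |
| 5          | ckent                   | 3          | 16       | IPAddress: 10.1.34.76                      | 106.27805614471436 |
| 6          | bbanner                 | 14         | 1        | VPN User: bbannervpn1, Id: 1 usage time    | 542.4766664505005  |
| 6          | bbanner                 | 14         | 2        | VPN User: brucesdogvpn1, Id: 2 usage time  | 1.7355557680130005 |
| 6          | bbanner                 | 14         | 3        | VPN User: bruceswifevpn1, Id: 3 usage time | 540.7405557632446  |
| 6          | bbanner                 | 14         | 4        | VPN User: stanleevpn1, Id: 4 usage time    | 540.7180547714233  |
| 6          | bbanner                 | 3          | 8        | IPAddress: 10.1.34.68                      | 542.529444694519   |
| 6          | bbanner                 | 12         | 9        | Port Forwarding Rule: 9 usage time         | 1.6469446420669556 |
| 6          | bbanner                 | 3          | 9        | IPAddress: 10.1.34.69                      | 542.4852771759033  |
| 6          | bbanner                 | 3          | 17       | IPAddress: 10.1.34.77                      | 106.2319450378418  |
| 8          | PrjAcct-SecretProject-1 | 3          | 14       | IPAddress: 10.1.34.74                      | 538.9755554199219  |
| 8          | PrjAcct-SecretProject-1 | 3          | 15       | IPAddress: 10.1.34.75                      | 538.7594442367554  |
+------------+-------------------------+------------+----------+--------------------------------------------+--------------------+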

Troubleshooting

Service management

As described earlier in this blog post the usage job will run at a time specified in the usage.stats.job.exec.time global setting.

Once the job has run it will update its own internal database with the run time and the start/end times processed:


SELECT * FROM cloud_usage.usage_job;

 

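+----+---------------------------+-------+----------+-----------+---------------+---------------+-----------+---------------------+---------------------+---------+---------------------+
| id | host                      | pid   | job_type | scheduled | start_millis  | end_millis    | exec_time | start_date          | end_date            | success | heartbeat           |
+----+---------------------------+-------+----------+-----------+---------------+---------------+-----------+---------------------+---------------------+---------+---------------------+
| 1  | acshostname/192.168.10.10 | 23589 | 0        | 0         | 1504828800000 | 1504915199999 | 2072      | 2017-09-08 00:00:00 | 2017-09-08 23:59:59 | 1       | 2017-09-09 00:14:53 |
| 2  | acshostname/192.168.10.10 | 23589 | 0        | 0         | 1504915200000 | 1505001599999 | 607       | 2017-09-09 00:00:00 | 2017-09-09 23:59:59 | 1       | 2017-09-10 00:14:53 |
| 3  | acshostname/192.168.10.10 | 23589 | 0        | 0         | 1505001600000 | 1505087999999 | 536       | 2017-09-10 00:00:00 | 2017-09-10 23:59:59 | 1       | 2017-09-11 00:14:53 |
| 4  | acshostname/192.168.10.10 | 23589 | 0        | 0         | 1505088000000 | 1505174399999 | 503       | 2017-09-11 00:00:00 | 2017-09-11 23:59:59 | 1       | 2017-09-12 00:14:53 |
| 5  | acshostname/192.168.10.10 | 23589 | 0        | 0         | 1505174400000 | 1505260799999 | 509       | 2017-09-12 00:00:00 | 2017-09-12 23:59:59 | 1       | 2017-09-13 00:14:53 |
+----+---------------------------+-------+----------+-----------+---------------+---------------+-----------+---------------------+---------------------+---------+---------------------+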

A couple of things to note on this list:

  • Start_millis and end_millis simply list the epoch timestamp in start_date and end_date. The epoch time is used by the usage service to determine cloud_usage.cloud_usage entries.
  • Exec_time will list how long the usage job ran for. This is useful in cases where the usage job processing time is longer than 24 hours – i.e. where usage job schedules may start overlapping.
  • The success field is set to 1 for success, 0 for failure.
  • Heartbeat lists when the job was run.
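The relationship between the epoch values and the date columns can be checked with a few lines of Python. This is a small sketch assuming the timestamps are milliseconds since the epoch in UTC, which matches the GMT aggregation timezone used in this example; the function name is made up.

```python
from datetime import datetime, timedelta, timezone

EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def millis_to_utc(millis):
    """Convert an epoch timestamp in milliseconds to a UTC datetime."""
    return EPOCH_UTC + timedelta(milliseconds=millis)


# start_millis and end_millis of the first job in the list above.
job_start = millis_to_utc(1504828800000)
job_end = millis_to_utc(1504915199999)
print(job_start.strftime("%Y-%m-%d %H:%M:%S"))
print(job_end.strftime("%Y-%m-%d %H:%M:%S"))
```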

When the cloudstack-usage service is restarted it will run checks against the usage_job table to determine:

  • If the last scheduled job was run. If this wasn’t done the job is run again, i.e. a service startup will run a single missed job.
  • Thereafter the usage job will run at its normal scheduled time.

Usage troubleshooting – general advice

Since this blog post covers topics around adding/updating/removing entries in the cloud and cloud_usage databases we always advise CloudStack users to take MySQL dumps of both databases before doing any work – whether this is done directly in MySQL or via the usage API calls.

Database inconsistencies

Under certain circumstances (e.g. if the cloudstack-management service crashes) the cloud.usage_event table may have inconsistent entries, e.g.:

  • STOP entries without a START entry, or DESTROY entries without a CREATE.
  • Double entries – i.e. a VM has two START entries.

The usage logs will show where these failures occur. The fix for these issues is to add/delete entries as required in the cloud.usage_event table, e.g. add a VM.START with date stamp if missing and so on.
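Before editing the table it can help to list the suspect events first. The following Python sketch flags duplicate VM.START entries and VM.STOP entries without a preceding VM.START. It is only an illustration working on rows already fetched from cloud.usage_event in id order; the function name and the sample event ids 71 and 124 are made up.

```python
def find_inconsistencies(events):
    """Flag duplicate VM.START and orphaned VM.STOP events.

    events: iterable of (event_id, event_type, resource_id) in id order.
    """
    problems = []
    running = {}
    for event_id, event_type, resource_id in events:
        if event_type == "VM.START":
            if running.get(resource_id):
                problems.append((event_id, "duplicate VM.START"))
            running[resource_id] = True
        elif event_type == "VM.STOP":
            if not running.get(resource_id):
                problems.append((event_id, "VM.STOP without VM.START"))
            running[resource_id] = False
    return problems


# Hypothetical events: ids 71 and 124 are invented inconsistent entries.
sample_events = [
    (70, "VM.START", 17),
    (71, "VM.START", 17),
    (123, "VM.STOP", 17),
    (124, "VM.STOP", 17),
]

problems = find_inconsistencies(sample_events)
print(problems)
```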

Usage service logs

The usage service writes all logs to /var/log/cloudstack/usage/usage.log. These logs are relatively verbose and will outline all actions performed during the usage job:


DEBUG [usage.parser.IPAddressUsageParser] (Usage-Job-1:null) (logid:) Parsing IP Address usage for account: 2
DEBUG [usage.parser.IPAddressUsageParser] (Usage-Job-1:null) (logid:) Total usage time 86400000ms
DEBUG [usage.parser.IPAddressUsageParser] (Usage-Job-1:null) (logid:) Creating IP usage record with id: 3, usage: 24, startDate: Tue Oct 10 00:00:00 UTC 2017, endDate: Tue Oct 10 23:59:59 UTC 2017, for account: 2
DEBUG [usage.parser.VPNUserUsageParser] (Usage-Job-1:null) (logid:) Parsing all VPN user usage events for account: 2
DEBUG [usage.parser.VPNUserUsageParser] (Usage-Job-1:null) (logid:) No VPN user usage events for this period
DEBUG [usage.parser.VMSnapshotUsageParser] (Usage-Job-1:null) (logid:) Parsing all VmSnapshot volume usage events for account: 2
DEBUG [usage.parser.VMSnapshotUsageParser] (Usage-Job-1:null) (logid:) No VM snapshot usage events for this period
DEBUG [usage.parser.VMInstanceUsageParser] (Usage-Job-1:null) (logid:) Parsing all VMInstance usage events for account: 3
DEBUG [usage.parser.NetworkUsageParser] (Usage-Job-1:null) (logid:) Parsing all Network usage events for account: 3
DEBUG [usage.parser.VmDiskUsageParser] (Usage-Job-1:null) (logid:) Parsing all Vm Disk usage events for account: 3
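When searching a large usage.log it can be handy to pull the key values out of the “Creating ... usage record” lines. The following Python sketch does this with a regular expression written against the IP address line shown above; other parsers may word their messages differently, so the pattern is an assumption rather than a documented log format.

```python
import re

# Pattern based on the sample IP address log line; an assumption, not a spec.
RECORD = re.compile(
    r"Creating (?P<kind>.+?) usage record with id: (?P<id>\d+), "
    r"usage: (?P<usage>[\d.]+),.*for account: (?P<account>\d+)"
)


def parse_usage_record(line):
    """Return kind, id, usage and account from a log line, or None."""
    match = RECORD.search(line)
    if match is None:
        return None
    return {
        "kind": match.group("kind"),
        "id": int(match.group("id")),
        "usage": float(match.group("usage")),
        "account": int(match.group("account")),
    }


line = (
    "DEBUG [usage.parser.IPAddressUsageParser] (Usage-Job-1:null) (logid:) "
    "Creating IP usage record with id: 3, usage: 24, "
    "startDate: Tue Oct 10 00:00:00 UTC 2017, "
    "endDate: Tue Oct 10 23:59:59 UTC 2017, for account: 2"
)

record = parse_usage_record(line)
print(record)
```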

Housekeeping of cloud_usage table

To carry out housekeeping of the cloud_usage.cloud_usage table the “removeRawUsageRecords” API call can be used to delete all usage entries older than a certain number of days. Note – since the cloud_usage table only contains completely parsed entries, deleting anything from this table will not lead to inconsistencies – rather it will just cut down on the number of usage records being reported on.

More information can be found in http://cloudstack.apache.org/api/apidocs-4.9/apis/removeRawUsageRecords.html.

The following example deletes all usage records older than 5 days:


# cloudmonkey removeRawUsageRecords interval=5
success = true

Regenerating usage data

The CloudStack API also has a call for regenerating usage records – generateUsageRecords. This can be utilised to rerun the usage job in case of job failure. More information can be found in the CloudStack documentation – http://cloudstack.apache.org/api/apidocs-4.9/apis/generateUsageRecords.html.

Please note the comment on the above documentation page:  “This will generate records only if there any records to be generated, i.e. if the scheduled usage job was not run or failed”. In other words this API call should not be made ad-hoc apart from in this specific situation.


# cloudmonkey generateUsageRecords startdate=2017-09-01 enddate=2017-09-30
success = true

Quota service

Anyone looking through the cloud_usage database will notice a number of quota_* tables. These are not directly linked to the usage service itself; rather, they are consumed by the Quota service. This service was created to monitor usage of CloudStack resources based on a per account credit limit and a per resource credit cost.

For more information on the Quota service please refer to the official CloudStack documentation / CloudStack wiki:

Conclusion

The CloudStack usage service can seem complicated for someone just getting started with it. We hope this blog post has managed to explain the background processes and how to get useful data out of the service.

We always value feedback – so if you have any comments or questions around this blog post please feel free to get in touch with the ShapeBlue team.

About The Author

Dag Sonstebo is a Cloud Architect at ShapeBlue, The Cloud Specialists. Dag spends his time designing, implementing and automating IaaS solutions based around Apache CloudStack.