Containing the Costs of Kubernetes: A Closer Look at the Metrics Agent Project

Apptio’s Metrics Agent collects information about a cluster’s workloads and sends it to IBM Cloudability to produce accurate, meaningful cost allocations.

At last month’s KubeCon Europe event in Amsterdam, over 10,000 attendees gathered to discuss Kubernetes (K8s), the widely used container orchestration platform that allows companies to deploy their applications quickly and at scale, often to managed services provided by the major cloud vendors. However, the ease of deploying containers and their inherently dynamic nature can make the associated cloud costs difficult to manage. This is where Container Cost Allocation, an IBM Cloudability feature, comes in. It gives customers critical information about their containerized cloud infrastructure, including the exact cost of each K8s cluster and the contribution of every underlying workload to that cost. To provide these insights, Cloudability relies on the Metrics Agent, which customers deploy on their K8s clusters. The Metrics Agent is an open source project maintained by Apptio. It collects information about a cluster’s workloads and sends it to Cloudability, which then produces accurate, business-meaningful cost allocations.

[Figure: Container cost allocation]

The Cloudability Metrics Agent is distributed as a public Docker image published on Docker Hub (https://hub.docker.com/r/cloudability/metrics-agent). This makes it easy for customers to deploy the Agent on their clusters and start collecting data about their K8s environment. Because the Agent is open source, customers can also see what is “under the hood” in our GitHub repository: https://github.com/cloudability/metrics-agent. The Metrics Agent also supports Helm, a K8s package manager, so customers who already use Helm to manage the workloads on their clusters have a quick way to deploy it. Instructions for deploying the Metrics Agent with Helm are in the README file in the GitHub repository.

The Metrics Agent architecture

[Figure: Metrics Agent V1 outline]

Overview

The Metrics Agent is provisioned to each K8s cluster that requires cost allocation. By default, the Metrics Agent is deployed to the ‘cloudability’ namespace.

The Metrics Agent collects two types of data:

  • K8s cluster resource objects. This includes deployments, pods, nodes, namespaces, services, jobs, cronjobs, replica sets, replication controllers, persistent volumes, persistent volume claims, daemon sets, etc.
  • Node summary metrics. These are utilization metrics for the node itself and for each container running on it, covering CPU, memory, data transfer, and volume usage.

The Metrics Agent stores the collected metrics on the local disk and periodically uploads them to Cloudability.

Data collection

The data collected by the Metrics Agent includes both container utilization metrics and K8s resource objects. To collect utilization metrics, the Agent communicates with each node’s kubelet and queries its /stats/summary endpoint, which returns JSON. The following command shows a request to this endpoint for a node named kind-worker, routed through the API server’s node proxy:

kubectl get --raw "/api/v1/nodes/kind-worker/proxy/stats/summary"

The Agent makes this request every three minutes by default (the interval is configurable). The data is then stored on the Agent’s filesystem and uploaded to Cloudability at 10-minute intervals.

K8s object data, including labels (which are particularly useful for cost allocation), is collected using informers from the k8s.io/client-go project. Each informer maintains an in-memory cache of one type of K8s object. It keeps that cache current by listing the objects once and then watching the API server for changes. The Agent periodically reads these caches and stores their contents on the filesystem before each upload interval. Because informers watch for changes instead of repeatedly listing every object, they reduce the number and size of requests made to the cluster’s API server, which improves performance. The following code snippet shows how to initialize informers to collect K8s resource objects.

import (
	"time"

	"k8s.io/client-go/informers"
)

// clientset is a configured *kubernetes.Clientset; resyncInterval is in hours.
factory := informers.NewSharedInformerFactory(clientset, time.Duration(resyncInterval)*time.Hour)

// Register one informer per tracked resource type.
replicationControllerInformer := factory.Core().V1().ReplicationControllers().Informer()
servicesInformer := factory.Core().V1().Services().Informer()
nodesInformer := factory.Core().V1().Nodes().Informer()
podsInformer := factory.Core().V1().Pods().Informer()
persistentVolumesInformer := factory.Core().V1().PersistentVolumes().Informer()
persistentVolumeClaimsInformer := factory.Core().V1().PersistentVolumeClaims().Informer()
namespacesInformer := factory.Core().V1().Namespaces().Informer()
replicasetsInformer := factory.Apps().V1().ReplicaSets().Informer()
daemonsetsInformer := factory.Apps().V1().DaemonSets().Informer()
deploymentsInformer := factory.Apps().V1().Deployments().Informer()
jobsInformer := factory.Batch().V1().Jobs().Informer()

// Start the informers (stopCh is closed on shutdown) and wait for each cache to fill.
factory.Start(stopCh)
factory.WaitForCacheSync(stopCh)

 

[Figure: Metrics Agent design outline]

Data upload

For the Agent to upload data to Cloudability, customers must configure their cluster’s network settings to allow its outbound requests.

To upload data to Cloudability, the Agent makes two outgoing HTTP requests:

The first request is an HTTP POST to https://metrics-collector.cloudability.com/metricsample to obtain a pre-signed S3 URL. The request is authenticated with the customer’s API key, and the response contains the customer-specific Apptio S3 location to upload to. Example request:

curl -H "token: <token>" \
-H "x-api-key: <api_key>" \
-H "x-cluster-uid: <cluster_uid>" \
-H "x-upload-file: <upload_file_name>" \
-H "Content-Type: application/json" \
-H "x-agent-version: 2.11.7" \
-H "User-Agent: cldy-client/2.11.7" \
-X POST https://metrics-collector.cloudability.com/metricsample

### SUCCESS RESPONSE ###
{"location": "https://cldy-cake-pipeline.s3.amazonaws.com/production/data/metrics-agent/XXXX/XXXX/XX/XX//XXXX--XXXXXXXX-XX-XX.tgz?AWSAccessKeyId=XXXXX&Signature=XXXXXXX&content-type=multipart%2Fform-data&content-md5=filehash&x-amz-security-token=XXXXXXXXXXXXXXXXXXXXXX&Expires=XXXX"}

The second request is an HTTP PUT of the data file to the S3 location returned by the first request.

curl -H "Content-Type: multipart/form-data" \
-H "Content-MD5: <base64_encoded_MD5_digest_of_file_contents>" \
-v --upload-file <file_location> "<S3_location_URL>"

### SUCCESS RESPONSE ###
.... HTTP 200 - OK ...

Configuration options for the Metrics Agent

The Metrics Agent is configured through environment variables, all of which are listed in the README. Some of the variables customers most commonly set when deploying the Agent are:

  • CLOUDABILITY_API_KEY: The customer’s provisioned API key that is used to obtain a valid pre-signed S3 URL
  • CLOUDABILITY_CLUSTER_NAME: The name of the cluster that is displayed in the Cloudability UI
  • CLOUDABILITY_OUTBOUND_PROXY: The URL of the outbound proxy the cluster uses; the Agent sends its requests through this proxy so they can reach Cloudability

Contributing to the Metrics Agent development

The Cloudability Metrics Agent is an essential component in helping customers manage and optimize their costs as they expand their K8s adoption. Apptio will continue to invest in new features that deliver more capabilities to our customers through this open source project.

The Metrics Agent is available on Docker Hub and GitHub. We welcome feature requests and direct contributions to its development.

Additional Resources