This is the first part of my Azure blog series. In these posts, I would like to share how I use Azure services with my home servers, and hopefully give you some practical examples of how you can integrate on-premises infrastructure with Azure.
In this post I will use Telegraf to monitor the CPU load and disk usage of my home server and send the collected data to Azure Metrics. I will also set up an alert that sends an email notification when a metric exceeds a threshold.
For this demonstration, I use my Orange Pi home server and run Telegraf in a container, but you can use any hardware with Docker and docker-compose installed.
I will assume in this tutorial that you already have:
- an Azure account
- the Azure CLI (az) installed on your development PC
- docker-compose installed on your server
Let’s start with creating a new resource group.
$ az login
...
$ az group create -l northeurope -g rg-custom-metrics-test
Location     Name
-----------  ----------------------
northeurope  rg-custom-metrics-test
First, we need an Azure resource that our metrics are reported against. I will create a new Application Insights resource for this purpose (later we will also use its Availability test service to monitor our servers).
$ az extension add -n application-insights
$ az monitor app-insights component create --app appis-custom-metrics-test --location westeurope -g rg-custom-metrics-test
We need to create a new service principal to give our Telegraf instance permission to publish metrics. A service principal is a security identity in Azure Active Directory, and we can create one with the following command.
$ az ad sp create-for-rbac -n sp-custom-metrics-test --role "Monitoring Metrics Publisher" -o yaml
appId: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
displayName: sp-custom-metrics-test
name: http://sp-custom-metrics-test
password: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXX
tenant: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
Save the output of this command. We will need it in the next step to configure the Telegraf output plugin.
Create a new docker-compose file on your server.
version: '3'
services:
  telegraf:
    image: telegraf:latest
    volumes:
      - ./telegraf/telegraf.conf:/etc/telegraf/telegraf.conf:ro
      - /:/rootfs:ro
    environment:
      HOST_SYS: /rootfs/sys
      HOST_MOUNT_PREFIX: /rootfs
      AZURE_TENANT_ID: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
      AZURE_CLIENT_ID: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
      AZURE_CLIENT_SECRET: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXX
    command:
      - "--test"
Here is a short explanation of what happens in this file: we mount the host file system into the container read-only and set the HOST_SYS variable, so Telegraf can read disk usage information from the host instead of the container. We also set HOST_MOUNT_PREFIX to cut the /rootfs part from the path when reporting metrics.
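To make the effect of HOST_MOUNT_PREFIX more concrete, here is a minimal Python sketch of the idea. It is only an illustration of the prefix trimming, not Telegraf's actual implementation, and the function name strip_mount_prefix is hypothetical.

```python
def strip_mount_prefix(path: str, prefix: str = "/rootfs") -> str:
    """Return the path as seen on the host, without the container mount prefix.

    Illustration only: this mirrors the idea behind HOST_MOUNT_PREFIX,
    it is not code taken from Telegraf.
    """
    if path == prefix:
        # The prefix itself corresponds to the host's root file system.
        return "/"
    if path.startswith(prefix + "/"):
        return path[len(prefix):]
    # Paths outside the prefix are left untouched.
    return path


if __name__ == "__main__":
    for mount_point in ["/rootfs/mnt/data", "/rootfs"]:
        print(mount_point, "->", strip_mount_prefix(mount_point))
```

With the two mount points used later in the Telegraf configuration, the container paths /rootfs/mnt/data and /rootfs map back to the host paths /mnt/data and /.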
Set the following environment variables from the output of the az ad sp create-for-rbac command:
- AZURE_CLIENT_ID: appId
- AZURE_TENANT_ID: tenant
- AZURE_CLIENT_SECRET: password
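If you prefer not to copy the values by hand, the mapping above can be sketched in a few lines of Python. This assumes you ran the command with -o json instead of -o yaml; the SAMPLE_OUTPUT string and the function name sp_output_to_env are placeholders for illustration.

```python
import json

# Hypothetical sample with the same fields as the command output shown earlier.
SAMPLE_OUTPUT = """
{
  "appId": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
  "displayName": "sp-custom-metrics-test",
  "password": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXX",
  "tenant": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
}
"""

# Service principal field -> environment variable set in the compose file.
FIELD_TO_ENV = {
    "appId": "AZURE_CLIENT_ID",
    "tenant": "AZURE_TENANT_ID",
    "password": "AZURE_CLIENT_SECRET",
}


def sp_output_to_env(sp_json: str) -> dict:
    """Map the JSON output of the service principal command to env variables."""
    sp = json.loads(sp_json)
    return {env_name: sp[field] for field, env_name in FIELD_TO_ENV.items()}


if __name__ == "__main__":
    # Print NAME=value lines that can be pasted into the compose file.
    for name, value in sp_output_to_env(SAMPLE_OUTPUT).items():
        print(f"{name}={value}")
```

Keep in mind that the password is a secret, so avoid committing the generated values to version control.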
Finally, we need a simple configuration file for Telegraf to report the system load and disk usage.
[agent]
  interval = "1m"
  hostname = "orangepi-v2" # Override container hostname

[[inputs.disk]]
  taginclude = ["device", "host"]
  fieldpass = ["used_percent"]
  mount_points = ["/rootfs/mnt/data", "/rootfs"]

[[inputs.system]]
  fieldpass = ["load*"]

[[outputs.azure_monitor]]
  namespace_prefix = "Telegraf/"
  region = "westeurope"
  resource_id = "/subscriptions/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX/resourceGroups/rg-davidjenei-com/providers/microsoft.insights/components/appis-davidjenei-com"
A few things to notice here: I wanted to report only a handful of metrics, so I used the fieldpass directive to select just the fields I need. You need to fill in your own resource details in the output config. Use this command to find your resource ID and region name:
az resource show -n appis-custom-metrics-test -g rg-custom-metrics-test --resource-type microsoft.insights/components
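The resource ID follows the pattern visible in the configuration above, so it can also be assembled from its parts. The sketch below is only a convenience under that assumption; the function name is hypothetical, and the output of az resource show remains the authoritative source.

```python
def app_insights_resource_id(subscription_id: str,
                             resource_group: str,
                             component_name: str) -> str:
    """Build an Application Insights resource ID from its parts.

    Follows the pattern used in the resource_id setting of the Telegraf config.
    """
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/microsoft.insights/components/{component_name}"
    )


if __name__ == "__main__":
    print(app_insights_resource_id(
        "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",  # placeholder subscription ID
        "rg-custom-metrics-test",
        "appis-custom-metrics-test",
    ))
```

For the resources created in this tutorial, the resource group and component name would be rg-custom-metrics-test and appis-custom-metrics-test.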
Now we can start Telegraf with docker-compose and check the reported metrics:
$ docker-compose up
2020-04-04T19:28:34Z I! Starting Telegraf 1.13.4
2020-04-04T19:28:34Z I! Using config file: /etc/telegraf/telegraf.conf
> disk,device=mmcblk0p2,host=orangepi-v2 used_percent=94.26396009711372 1586028514000000000
> disk,device=mmcblk0p3,host=orangepi-v2 used_percent=15.371661730642543 1586028514000000000
> system,host=orangepi-v2 load1=0.22,load15=0.02,load5=0.08 1586028514000000000
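Each line starting with > is a metric in line protocol: a measurement name with its tags, then the fields, then a nanosecond timestamp. As a rough sketch of how to read it, the Python snippet below parses the sample lines and lists disks above a usage limit. It assumes simple lines without escaped spaces or commas and with plain float field values; parse_line and disks_over are hypothetical helper names.

```python
SAMPLE_LINES = [
    "> disk,device=mmcblk0p2,host=orangepi-v2 used_percent=94.26396009711372 1586028514000000000",
    "> disk,device=mmcblk0p3,host=orangepi-v2 used_percent=15.371661730642543 1586028514000000000",
    "> system,host=orangepi-v2 load1=0.22,load15=0.02,load5=0.08 1586028514000000000",
]


def parse_line(line: str):
    """Split one simple metric line into measurement, tags, fields, timestamp.

    Assumption: no escaped characters and only float field values.
    """
    line = line.lstrip("> ").strip()
    head, fields_part, timestamp = line.split(" ")
    measurement, *tag_pairs = head.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    fields = {}
    for pair in fields_part.split(","):
        key, value = pair.split("=", 1)
        fields[key] = float(value)
    return measurement, tags, fields, int(timestamp)


def disks_over(lines, limit_percent: float):
    """Return the devices whose used_percent is above the given limit."""
    devices = []
    for line in lines:
        measurement, tags, fields, _ = parse_line(line)
        if measurement == "disk" and fields.get("used_percent", 0.0) > limit_percent:
            devices.append(tags["device"])
    return devices


if __name__ == "__main__":
    print(disks_over(SAMPLE_LINES, 90.0))
```

A quick check like this is handy for confirming that the fieldpass and taginclude settings kept only the tags and fields you expect.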
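Note that the --test flag in the compose file makes Telegraf gather the metrics once, print them, and exit without writing to any output. Once the printed metrics look right, remove the flag and start the container again so the data is actually sent. Then log in to the Azure portal and use the search bar at the top to navigate to Metrics. Select your subscription and find the Application Insights resource. After a few minutes, your Telegraf metric namespaces appear, and you can display your load metrics.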
Now that we have some data in Azure, we can create an alert. Open Alerts and select New Alert Rule.
The instructions are fairly straightforward from here: select your resource, then add a condition and select load1 from the dropdown list. Set the threshold to 1 for testing.
Finally, add an action group and create a new email alert.
We can test the alert by generating some artificial load. Here is a simple Dockerfile for building a container with the stress tool.
FROM ubuntu
RUN apt-get update && apt-get install -y stress
ENTRYPOINT ["/usr/bin/stress", "-v"]
Build the container and start a CPU hog.
docker build -t stress . && docker run --rm -it stress --cpu 2
After a delay of about one minute, you should receive your first email alert.
In the next tutorials, I will add new alert groups to trigger Azure Logic Apps and send a message to a Microsoft Teams channel.