One of the most important things to consider when building a service is how you will monitor it and gain insight into how it’s performing. This is a critical aspect of running any service at scale. Beyond telling you whether you’re experiencing specific issues at different points in time, monitoring is also an important tool for understanding how your service is growing and being adopted.

If you’re building your service in AWS, there are many options you can use for monitoring your application and infrastructure. This tutorial will touch on how to use CloudWatch to help you monitor your service, configure alerts, and decide what to do in certain circumstances. In addition to using CloudWatch, we will also use Terraform to describe all of the configurations through code.

As a starting point, the process typically involves 3 phases:

  • Creating a dashboard shell

  • Adding metrics provided from the different resources that make up your service

  • Configuring alarms upon the breach of certain thresholds

Note: This tutorial assumes you have a basic understanding of Terraform and Terraform modules. If that is not the case, we recommend you familiarize yourself with these topics first.

Sample Scenario

As a simple example, assume that you want to monitor the usage of a basic database server. A common metric you may want to track is how much available storage the server has. In this case, we want to create a dashboard that shows available storage over time and configure an alert if it drops below a certain threshold, say 3 GB.

Alarms in CloudWatch can be configured to notify an SNS target. With a basic target, you can simply send an email to an administrator indicating that the server is running out of storage. You may then elect to take manual action and provision more storage for the server. Similarly, you can target a more sophisticated workflow that, in addition to sending an email, automatically provisions additional storage capacity.

For the purpose of this post, we will break the process into 2 parts: first creating the alarms using Terraform, and then building the actual monitoring dashboard.

Part 1: Setting up Alarms

The first step in creating the alarm is deciding what we want to target. In our case, we will create an SNS topic that we can use to send an email when the alarm occurs. Topics can have subscriptions, such as an email address, so that whenever a message is published to the topic, it is emailed to that address. With the following Terraform snippet, we can create a simple SNS topic.

resource "aws_sns_topic" "notification-topic" {
  name         = "Alert-Notifications"
  display_name = "Alert Notifications"
}
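The topic above has no subscribers yet, and as the note that follows explains, this post leaves the email subscription as a manual step. For readers who want to script it anyway, here is a minimal sketch of the provisioner approach. It assumes the AWS CLI is installed and has credentials on the machine running Terraform, and that the null provider is available. The variable alert_email and the resource name email-subscription are hypothetical names introduced for illustration only.

```hcl
# Hypothetical variable: the address that should receive alert emails.
variable "alert_email" {
  type        = string
  description = "Email address to subscribe to the alert topic"
}

# Calls the AWS CLI to subscribe the address to the topic.
# Assumption: the AWS CLI is installed and authenticated locally.
resource "null_resource" "email-subscription" {
  # Re-run the command if the topic or the address changes.
  triggers = {
    topic_arn = aws_sns_topic.notification-topic.arn
    email     = var.alert_email
  }

  provisioner "local-exec" {
    command = "aws sns subscribe --topic-arn ${self.triggers.topic_arn} --protocol email --notification-endpoint ${self.triggers.email}"
  }
}
```

The recipient still has to confirm the subscription from the email that SNS sends, so this does not remove the manual step entirely.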

NOTE: At the time of writing, Terraform does not support creating a topic subscription of type email. To achieve this, you can use a Terraform provisioner that relies on the AWS CLI instead. The downside of this approach is that the AWS CLI becomes a dependency. For the purpose of this post, we will just create the topic without the subscription and leave the subscription as a manual step after the topic has been created. In my next post, I’ll provide an example of how to use a provisioner so you can automate the entire process.

Now that the topic is created, we can create the actual alarm. In this alarm, we will monitor free local storage from an RDS database. We can create the alarm with the following Terraform snippet.

resource "aws_cloudwatch_metric_alarm" "low-db-space-alarm" {
  alarm_name          = "Low Database Space Alert"
  alarm_description   = "This alarm is triggered when the database is running out of local storage (less than 3 GB remaining)."
  comparison_operator = "LessThanThreshold"
  evaluation_periods  = 3
  period              = 300
  # FreeLocalStorage is reported in bytes, so 3 GB is 3,000,000,000.
  threshold           = 3000000000
  statistic           = "Average"
  datapoints_to_alarm = 3
  treat_missing_data  = "missing"
  namespace           = "AWS/RDS"
  metric_name         = "FreeLocalStorage"

  dimensions = {
    DBInstanceIdentifier = var.db_instance_name
  }

  alarm_actions = [aws_sns_topic.notification-topic.arn]
}

Notice a few things from the snippet above:

  • The namespace, metric_name, and dimensions are used to configure the specifics of what each alert will look for. For a list of all available options, check out this article here.

  • This alert requires the database instance identifier as a dimension, which in turn is expected to be passed into this Terraform module as a variable called db_instance_name.

  • Last is alarm_actions. You specify what action to perform when the alert is triggered by providing an array of Amazon Resource Names (ARNs) corresponding to the expected targets. In our case, that is the single SNS topic we previously configured to send notification emails.

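Our alarm is now complete. It will be triggered if the average of each of 3 consecutive data points, each covering 300 seconds, is below our 3 GB threshold. Because CloudWatch reports FreeLocalStorage in bytes, the threshold value has to be expressed in bytes (3,000,000,000 for 3 GB), not megabytes.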
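To make the variable wiring concrete, here is a minimal sketch of how the module might declare db_instance_name and how a root configuration might pass it in. Only db_instance_name comes from the snippet above. The low_storage_threshold_gb variable, the local value, the module path, and the instance identifier are all hypothetical and shown purely for illustration.

```hcl
# --- Inside the monitoring module (for example, variables.tf) ---

variable "db_instance_name" {
  type        = string
  description = "Identifier of the RDS instance the alarm should watch"
}

# Hypothetical: lets callers express the threshold in GB.
variable "low_storage_threshold_gb" {
  type    = number
  default = 3
}

locals {
  # CloudWatch reports FreeLocalStorage in bytes, so convert GB to bytes.
  # With the default of 3, this evaluates to 3000000000.
  low_storage_threshold_bytes = var.low_storage_threshold_gb * 1000 * 1000 * 1000
}

# --- In the root configuration that calls the module ---

module "monitoring" {
  source           = "./modules/monitoring" # hypothetical path
  db_instance_name = "my-database-instance" # hypothetical identifier
}
```

With this in place, the alarm could set threshold = local.low_storage_threshold_bytes instead of a hard-coded number, which keeps the unit conversion in one visible place.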

Part 2: Configure Dashboard

Unfortunately, there is no easy way to describe your dashboard’s initial layout through Terraform, especially if you want a more sophisticated one. Your best bet is to build your first dashboard online using the AWS console, add the metrics that you need, and organize it in the most visually pleasing way.

After you’ve done this initial configuration, you can export the JSON definition of your dashboard and use it as the body of your Terraform definition for subsequent usages. Let’s see how to create the dashboard online and then export the definition so we can use it in Terraform.

  1. Open the AWS Console -> CloudWatch -> Dashboards

  2. Click on Create dashboard and provide a unique name when prompted.

  3. At this point, you have an empty dashboard shell and you can start adding metrics to it.

  4. For this exercise, let’s add 2 widgets for our metric: a line graph and a number.

  5. Next, you will see a list of all the available metrics being published by the different resources in your account.

  6. For our case, we will choose RDS -> DBClusterIdentifier -> FreeLocalStorage

  7. Click on Create widget

  8. The dashboard will refresh and you should see the new widget that we just created as part of the dashboard.

  9. We will repeat this process to add the new Number widget.

  10. Select Add widget -> Number -> Configure

  11. Again choose RDS -> DBClusterIdentifier -> FreeLocalStorage and Create widget

  12. Organize the dashboard however you like and click Save dashboard to save your changes. In my case, it looks like this:

  13. At this point, your dashboard has been created. You can add as many or as few widgets as you think would be useful.

  14. Now that we have our initial dashboard configured, let’s look at the JSON definition of the dashboard so we can use it in Terraform going forward. You can do so by clicking on Actions -> View/edit source

  15. Copy the source by clicking on Copy source and paste it into a new file called dashboard-body.json

We are now ready to create our Terraform module for our dashboard:

resource "aws_cloudwatch_dashboard" "my-dashboard" {
  dashboard_name = "My-Dashboard"
  dashboard_body = file("${path.module}/dashboard-body.json")
}

This snippet references a file called dashboard-body.json, from which it reads the dashboard layout. We could have embedded the JSON layout directly into the value of the dashboard_body parameter, but in my opinion, this is a bit cleaner and easier to read. The dashboard-body.json file simply contains the JSON definition we copied from the AWS console in the step above.
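For comparison, the inline alternative mentioned above could look roughly like the sketch below, which builds the body with Terraform’s jsonencode function instead of reading a file. It mirrors the two widgets from the console walkthrough (a line graph and a number for FreeLocalStorage). The widget positions, titles, the dashboard name, and the db_cluster_name and aws_region variables are assumptions for illustration. The JSON you export from your own console will likely differ.

```hcl
# Hypothetical variables used only by this sketch.
variable "db_cluster_name" {
  type = string
}

variable "aws_region" {
  type    = string
  default = "us-east-1"
}

resource "aws_cloudwatch_dashboard" "my-dashboard-inline" {
  dashboard_name = "My-Dashboard-Inline"

  # jsonencode turns this HCL object into the JSON string CloudWatch expects.
  dashboard_body = jsonencode({
    widgets = [
      {
        # Line graph of free local storage over time.
        type   = "metric"
        x      = 0
        y      = 0
        width  = 12
        height = 6
        properties = {
          title   = "Free local storage over time"
          view    = "timeSeries"
          region  = var.aws_region
          stat    = "Average"
          period  = 300
          metrics = [["AWS/RDS", "FreeLocalStorage", "DBClusterIdentifier", var.db_cluster_name]]
        }
      },
      {
        # Single number showing the latest value of the same metric.
        type   = "metric"
        x      = 12
        y      = 0
        width  = 6
        height = 6
        properties = {
          title   = "Free local storage now"
          view    = "singleValue"
          region  = var.aws_region
          stat    = "Average"
          period  = 300
          metrics = [["AWS/RDS", "FreeLocalStorage", "DBClusterIdentifier", var.db_cluster_name]]
        }
      }
    ]
  })
}
```

The trade-off is that the inline form lets you use variables for things like the cluster name, while the file-based form lets you paste the exported JSON unchanged.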

Wrap Up

As you can see, creating powerful dashboards and alerts in AWS is pretty straightforward with CloudWatch. Together with Terraform, you can automate the entire process so you can easily recreate your monitoring setup if needed.

Let us know what you think. We would love to know more about a service you have built and how you are monitoring it. Alternatively, if you are in the process of building a new digital asset and need help laying out the strategy and a roadmap for achieving it, let us help you and we can build it together!

Feel free to shoot me a note right here or ping me anytime through LinkedIn.