AWS Cloud Monitoring Tools

Amazon Web Services (AWS) offers a suite of tools for cloud monitoring. Amazon CloudWatch is a service that lets you monitor your AWS resources and the applications you run on AWS in real time. You can use CloudWatch to collect and track metrics, collect and monitor log files, set alarms, and automatically react to changes in your AWS resources, which gives you system-wide visibility into resource utilization, application performance, and operational health. CloudWatch is not the only tool available for monitoring AWS resources, but it is a good starting point.

AWS CloudWatch Metrics

You can create metric and composite alarms in Amazon CloudWatch. A metric alarm watches a single CloudWatch metric or the result of a math expression based on CloudWatch metrics. The alarm performs one or more actions based on the value of the metric or expression relative to a threshold over a number of time periods. An event is triggered when something happens to a resource or according to a schedule, whereas an alarm is triggered only when a threshold is reached. So the first difference between an alarm and an event is how they are triggered. A CloudWatch alarm watches a metric over a period you specify and goes into the ALARM state if the metric goes above or below a threshold you define.

Amazon CloudWatch Metrics
• CloudWatch provides metrics for every service in AWS
• Metric is a variable to monitor (CPUUtilization, Network)
• Metrics have timestamps
• Can create CloudWatch dashboards of metrics

• EC2 instances: CPU Utilization, Status Checks, Network (not RAM)
• Default metrics every 5 minutes
• Option for Detailed Monitoring ($$$): metrics every 1 minute
• EBS volumes: Disk Read/Writes
• S3 buckets: BucketSizeBytes, NumberOfObjects, AllRequests
• Billing: Total Estimated Charge (only in us-east-1)
• Service Limits: how much you’ve been using a service API
• Custom metrics: push your own metrics
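
Because RAM usage is not among the default EC2 metrics, it is a common candidate for a custom metric. The following is a minimal sketch of how such a data point could be pushed, assuming the boto3 SDK and its CloudWatch `put_metric_data` call. The namespace `MyApp`, the metric name `MemoryUsedPercent`, and the instance ID are hypothetical values chosen for illustration.

```python
from datetime import datetime, timezone


def build_custom_metric(namespace, name, value, unit="Count", dimensions=None):
    """Build the keyword arguments for a CloudWatch PutMetricData request."""
    datum = {
        "MetricName": name,
        "Timestamp": datetime.now(timezone.utc),  # every data point carries a timestamp
        "Value": float(value),
        "Unit": unit,
    }
    if dimensions:
        datum["Dimensions"] = [{"Name": k, "Value": v} for k, v in dimensions.items()]
    return {"Namespace": namespace, "MetricData": [datum]}


def publish(cloudwatch_client, params):
    """Send the data point. Expects a client such as boto3.client("cloudwatch")."""
    return cloudwatch_client.put_metric_data(**params)


# Hypothetical namespace, metric name, and instance ID (assumptions, not from AWS).
params = build_custom_metric(
    "MyApp",
    "MemoryUsedPercent",
    63.5,
    unit="Percent",
    dimensions={"InstanceId": "i-0123456789abcdef0"},
)
```

Building the request separately from sending it keeps the payload easy to inspect before any AWS call is made.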

Amazon CloudWatch Alarms

CloudWatch enables you to specify how to treat missing data points when evaluating an alarm. Defining this helps you configure your alarm so that it goes to the ALARM state only when appropriate for the type of data being monitored, and you can avoid false positives when missing data doesn’t indicate a problem. A CloudWatch alarm is always in one of three states: OK, ALARM, or INSUFFICIENT_DATA. A data point is the value of a metric for a given aggregation period; for example, if you use one minute as the aggregation period for a metric, there is one data point every minute.

• Alarms are used to trigger notifications for any metric
• Alarm actions:
• Auto Scaling: increase or decrease the EC2 instances “desired” count
• EC2 Actions: stop, terminate, reboot, or recover an EC2 instance
• SNS notifications: send a message to an SNS topic
• Various options (sampling, %, max, min, etc.)
• Can choose the period on which to evaluate an alarm
• Example: create a billing alarm on the CloudWatch Billing metric
• Alarm states: OK, INSUFFICIENT_DATA, ALARM

For example, an e-commerce company that wants Amazon EC2 Auto Scaling to add and remove EC2 instances based on CPU utilization would use an Amazon CloudWatch alarm to trigger the scaling.
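
A CPU-based alarm of this kind might be defined roughly as follows. This is a sketch assuming the boto3 SDK and its CloudWatch `put_metric_alarm` call. The alarm name, Auto Scaling group name, threshold, and the action ARN placeholder are all hypothetical.

```python
def build_cpu_alarm(alarm_name, asg_name, threshold_percent, action_arns):
    """Build the keyword arguments for a CloudWatch PutMetricAlarm request."""
    return {
        "AlarmName": alarm_name,
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        # Aggregate CPU across all instances in one Auto Scaling group.
        "Dimensions": [{"Name": "AutoScalingGroupName", "Value": asg_name}],
        "Statistic": "Average",
        "Period": 300,            # length of one evaluation period, in seconds
        "EvaluationPeriods": 2,   # consecutive periods compared to the threshold
        "Threshold": float(threshold_percent),
        "ComparisonOperator": "GreaterThanThreshold",
        # How missing data points are treated: "missing", "breaching",
        # "notBreaching", or "ignore".
        "TreatMissingData": "missing",
        "AlarmActions": list(action_arns),
    }


def create_alarm(cloudwatch_client, params):
    """Create or update the alarm. Expects boto3.client("cloudwatch")."""
    return cloudwatch_client.put_metric_alarm(**params)


# Hypothetical names; replace the placeholder with a real scaling policy ARN.
alarm = build_cpu_alarm("high-cpu", "web-asg", 70, ["<scaling-policy-arn>"])
```

With these assumed values, the alarm would enter the ALARM state after two consecutive five-minute periods of average CPU above 70 percent, and the listed action would then run.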

AWS CloudWatch Logs

CloudWatch Logs lets you monitor and troubleshoot your systems and applications using your existing system, application, and custom log files. With CloudWatch Logs, you can monitor your logs, in near real-time, for specific phrases, values, or patterns.

• CloudWatch Logs can collect logs from:
• Elastic Beanstalk: a collection of logs from the application
• ECS: collection from containers
• AWS Lambda: collection from function logs
• CloudTrail based on filter
• CloudWatch log agents: on EC2 machines or on-premises servers
• Route53: Log DNS queries
• Enables real-time monitoring of logs
• Adjustable CloudWatch Logs retention
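
Searching a log group for a specific phrase could look like the sketch below, which assumes the boto3 SDK and its CloudWatch Logs `filter_log_events` call. The log group name `/my-app/production` and the pattern `ERROR` are hypothetical, and only the first page of results is read.

```python
import time


def build_log_search(log_group, pattern, minutes_back=15, now_ms=None):
    """Build the keyword arguments for a CloudWatch Logs FilterLogEvents request."""
    # CloudWatch Logs expects timestamps in milliseconds since the Unix epoch.
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    return {
        "logGroupName": log_group,
        "filterPattern": pattern,
        "startTime": now_ms - minutes_back * 60 * 1000,
        "endTime": now_ms,
    }


def search_logs(logs_client, params):
    """Return matching log messages. Expects boto3.client("logs")."""
    response = logs_client.filter_log_events(**params)
    return [event["message"] for event in response.get("events", [])]


# Hypothetical log group and search phrase.
search = build_log_search("/my-app/production", "ERROR")
```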

CloudWatch Logs for EC2
• By default, no logs from your EC2 instance will go to CloudWatch
• You need to run a CloudWatch agent on EC2 to push the log files you want
• Make sure IAM permissions are correct
• The CloudWatch log agent can be set up on-premises too

Amazon EventBridge

Amazon EventBridge is a serverless event bus that lets you receive, filter, transform, route, and deliver events. You can use EventBridge when you want to publish messages to many subscribers and use the event data itself to match targets. It integrates with SaaS providers such as Shopify, Datadog, and PagerDuty. EventBridge delivers a stream of real-time data from your applications, software as a service (SaaS) applications, and AWS services to targets such as AWS Lambda functions, HTTP invocation endpoints using API destinations, or event buses in other AWS accounts. EventBridge was formerly called Amazon CloudWatch Events.

• Schedule: cron jobs (scheduled scripts)
• Example: every hour, trigger a script on a Lambda function
• Event Pattern: event rules to react to a service doing something
• Example: on an IAM root user sign-in event, notify an SNS topic with email notification
• Trigger Lambda functions, send SQS/SNS messages…

• Schema Registry: model event schema
• You can archive events (all/filter) sent to an event bus (indefinitely or set period)
• Ability to replay archived events
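
The two rule types above, schedule and event pattern, can be sketched as follows, assuming the boto3 SDK and its EventBridge `put_rule` call. The rule names are hypothetical, and `matches` is a deliberately simplified local matcher (exact values only) meant to illustrate how a pattern selects events; it is not EventBridge's full matching logic.

```python
import json

# Event pattern for console sign-ins by the root user, as recorded via CloudTrail.
ROOT_SIGN_IN_PATTERN = {
    "detail-type": ["AWS Console Sign In via CloudTrail"],
    "detail": {"userIdentity": {"type": ["Root"]}},
}


def matches(pattern, event):
    """Simplified matcher: nested dicts recurse, lists hold allowed exact values."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


def build_pattern_rule(name, pattern):
    """Keyword arguments for PutRule with an event pattern."""
    return {"Name": name, "EventPattern": json.dumps(pattern), "State": "ENABLED"}


def build_hourly_rule(name):
    """Keyword arguments for PutRule with an hourly schedule."""
    return {"Name": name, "ScheduleExpression": "rate(1 hour)", "State": "ENABLED"}


# Hypothetical rule names; pass either dict to boto3.client("events").put_rule(**rule).
sign_in_rule = build_pattern_rule("root-sign-in", ROOT_SIGN_IN_PATTERN)
hourly_rule = build_hourly_rule("hourly-script")
```

A rule only selects events; a target such as a Lambda function or SNS topic still has to be attached separately before anything is triggered.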

CloudWatch Events

Amazon CloudWatch Events delivers a near real-time stream of system events that describe changes in Amazon Web Services (AWS) resources. Using simple rules that you can quickly set up, you can match events and route them to one or more target functions or streams. CloudWatch Events becomes aware of operational changes as they occur. CloudWatch Events responds to these operational changes and takes corrective action as necessary by sending messages to respond to the environment, activating functions, making changes, and capturing state information.

AWS CloudTrail

AWS CloudTrail is an AWS service that helps you track your account’s activities and supports risk auditing, governance, and compliance of your AWS account. Actions taken by a user, role, or an AWS service are recorded as events in CloudTrail. The difference between CloudWatch and CloudTrail is that CloudWatch focuses on the activity of AWS services and resources, reporting on their health and performance. CloudTrail, on the other hand, is a log of the actions that have taken place inside your AWS environment.

• Provides governance, compliance, and audit for your AWS Account
• CloudTrail is enabled by default!
• Get a history of events / API calls made within your AWS Account by:
• Console
• SDK
• CLI
• AWS Services
• Can put logs from CloudTrail into CloudWatch Logs or S3
• A trail can be applied to All Regions (default) or a single Region.
• If a resource is deleted in AWS, investigate CloudTrail first!

When a user needs to determine whether an Amazon EC2 instance’s security groups were modified in the last month, use AWS CloudTrail to look up the change events.
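
A lookup like this could be sketched as follows, assuming the boto3 SDK and its CloudTrail `lookup_events` call, which accepts one lookup attribute per request. The list of event names is an illustrative subset of the EC2 API calls that change security group rules, and only the first page of each response is read.

```python
from datetime import datetime, timedelta, timezone

# EC2 API calls that change security group rules (illustrative subset).
SECURITY_GROUP_EVENTS = [
    "AuthorizeSecurityGroupIngress",
    "AuthorizeSecurityGroupEgress",
    "RevokeSecurityGroupIngress",
    "RevokeSecurityGroupEgress",
]


def build_lookup(event_name, days_back=30, now=None):
    """Build the keyword arguments for a CloudTrail LookupEvents request."""
    now = now or datetime.now(timezone.utc)
    return {
        "LookupAttributes": [
            {"AttributeKey": "EventName", "AttributeValue": event_name}
        ],
        "StartTime": now - timedelta(days=days_back),
        "EndTime": now,
    }


def find_security_group_changes(cloudtrail_client, days_back=30):
    """Collect matching events. Expects boto3.client("cloudtrail")."""
    found = []
    for name in SECURITY_GROUP_EVENTS:
        response = cloudtrail_client.lookup_events(**build_lookup(name, days_back))
        found.extend(response.get("Events", []))
    return found
```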

Amazon CloudWatch should be used to monitor Amazon EC2 instances for CPU and network utilization.

AWS X-Ray

AWS X-Ray is a service that helps developers analyze and debug distributed applications. Customers use X-Ray to monitor application traces, including the performance of calls to other downstream components or services, in either cloud-hosted applications or from their own computers during development. AWS X-Ray lets you see node and edge latency distributions directly from the service map. You can quickly isolate outliers, patterns, and trends, drill into traces, and use built-in keys and custom annotations to better understand performance issues impacting your application and end users.

• Debugging in Production, the good old way:
• Test locally
• Add log statements everywhere
• Re-deploy in production
• Log formats differ across applications, and log analysis is complex.
• Debugging: one giant monolith “easy,” distributed services “hard”
• No standard views of your entire architecture

AWS X-Ray advantages
• Troubleshooting performance (bottlenecks)
• Understand dependencies in a microservice architecture
• Pinpoint service issues
• Review request behavior
• Find errors and exceptions
• Are we meeting time SLA?
• Where am I throttled?
• Identify users that are impacted

AWS X-Ray can trace user requests as they move through a serverless application that includes an Amazon API Gateway API, an AWS Lambda function, and an Amazon DynamoDB database. It provides the capability to view end-to-end performance metrics and troubleshoot distributed applications.

Amazon CodeGuru

Amazon CodeGuru is a developer tool that provides intelligent recommendations to improve code quality and identify an application’s most expensive lines of code. CodeGuru Reviewer analyzes existing code bases in the repository, identifies hard-to-find bugs and critical issues with high accuracy, provides intelligent suggestions on how to remediate them, and creates a baseline for successive code reviews.

• An ML-powered service for automated code reviews and application performance recommendations
• Provides two functionalities
• CodeGuru Reviewer: automated code reviews for static code analysis (development)
• CodeGuru Profiler: visibility/recommendations about application performance during runtime (production)

• Identify critical issues, security vulnerabilities, and hard-to-find bugs
• Example: standard coding best practices, resource leaks, security detection, input validation
• Uses Machine Learning and automated reasoning
• Hard-learned lessons across millions of code reviews on 1000s of open-source and Amazon repositories
• Supports Java and Python
• Integrates with GitHub, Bitbucket, and AWS CodeCommit

AWS Service Health Dashboard

The AWS Health Dashboard is the single place to learn about the availability and operations of AWS services. You can view the overall status of AWS services, and you can sign in to view personalized communications about your particular AWS account or organization.

AWS Personal Health Dashboard

AWS Personal Health Dashboard provides a personalized view into the performance and availability of the AWS services you are using, as well as alerts automatically triggered by changes in the health of those services.

• AWS Personal Health Dashboard provides alerts and remediation guidance when AWS is experiencing events that may impact you.
• While the Service Health Dashboard displays the general status of AWS services, Personal Health Dashboard gives you a personalized view of the performance and availability of the AWS services underlying your AWS resources.
• The dashboard displays relevant and timely information to help you manage events in progress and provides proactive notification to help you plan for scheduled activities.