Container Map
Container Resource
Performance Dashboards
Log Groups
Log Insights
Alarms
Automatic Performance Dashboard
Monitoring EKS using CloudWatch Container Insights
Step-01: Introduction
What is CloudWatch?
What are CloudWatch Container Insights?
What is CloudWatch Agent and Fluentd?
CloudWatch: CloudWatch is a monitoring and observability service provided by Amazon Web Services (AWS). It collects and tracks various metrics, logs, and events from AWS resources and applications. CloudWatch allows you to gain insights into the performance, health, and availability of your AWS infrastructure and applications. It provides features like dashboards, alarms, logs, and automated actions to help you monitor, troubleshoot, and optimize your resources.
CloudWatch Container Insights: CloudWatch Container Insights is a feature of CloudWatch that provides monitoring and analysis capabilities specifically designed for containerized environments. It helps you understand the performance and resource utilization of your containerized applications running on services like Amazon Elastic Container Service (ECS), Amazon Elastic Kubernetes Service (EKS), and Kubernetes clusters.
CloudWatch Container Insights collects and displays key metrics, logs, and metadata related to containers, tasks, pods, and services. It offers pre-defined dashboards, automated alarms, and performance recommendations to help you monitor, troubleshoot, and optimize your containerized applications.
CloudWatch Agent: The CloudWatch Agent is a software component that runs on EC2 instances, on-premises servers, or virtual machines to collect and send system-level metrics, logs, and custom metrics to CloudWatch. It enables you to monitor system-level metrics, such as CPU usage, memory utilization, disk space, and network performance, as well as application-level metrics.
Fluentd: Fluentd is an open-source data collection agent that can be used to collect, transform, and forward logs and other data from various sources to different destinations. It provides a unified logging layer that supports a wide range of input sources and output destinations.
In the context of CloudWatch, Fluentd can be used as a log collector and forwarder to gather logs from various sources within your infrastructure and send them to CloudWatch Logs. It acts as an intermediary between log sources (such as containers, applications, or system logs) and CloudWatch Logs, enabling you to centralize and analyze logs in CloudWatch.
By deploying the CloudWatch Agent and using Fluentd, you can collect both system-level metrics and application logs, and send them to CloudWatch for monitoring, analysis, and troubleshooting purposes.
Step-02: Associate CloudWatch Policy to our EKS Worker Nodes Role
Go to Services -> EC2 -> Worker Node EC2 Instance -> IAM Role -> Click on that role
# Sample Role ARN
arn:aws:iam::180789647333:role/eksctl-eksdemo1-nodegroup-eksdemo-NodeInstanceRole-1FVWZ2H3TMQ2M
# Policy to be associated
Associate Policy: CloudWatchAgentServerPolicy
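The same association can be made from the AWS CLI instead of the console. The following is a minimal sketch, assuming the AWS CLI is installed and configured with IAM permissions. The helper names `role_name_from_arn` and `attach_cloudwatch_policy` are hypothetical, and the role ARN is the sample one shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: take the role name from the end of a role ARN.
role_name_from_arn() {
  printf '%s\n' "${1##*/}"
}

# Sample role ARN from this step; replace it with your worker node role.
ROLE_ARN="arn:aws:iam::180789647333:role/eksctl-eksdemo1-nodegroup-eksdemo-NodeInstanceRole-1FVWZ2H3TMQ2M"
ROLE_NAME="$(role_name_from_arn "$ROLE_ARN")"
# AWS managed policy named in this step.
POLICY_ARN="arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy"

# Hypothetical wrapper: attach the policy, then list attached policies to confirm.
attach_cloudwatch_policy() {
  aws iam attach-role-policy --role-name "$ROLE_NAME" --policy-arn "$POLICY_ARN"
  aws iam list-attached-role-policies --role-name "$ROLE_NAME"
}
```

Nothing is changed until `attach_cloudwatch_policy` is called, so the variables can be reviewed first.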
Step-03: Install Container Insights
Deploy CloudWatch Agent and Fluentd as DaemonSets
The command below:
Creates the Namespace amazon-cloudwatch.
Creates all the necessary security objects for both DaemonSets:
ServiceAccount
ClusterRole
ClusterRoleBinding
Deploys the CloudWatch Agent (responsible for sending the metrics to CloudWatch) as a DaemonSet.
Deploys Fluentd (responsible for sending the logs to CloudWatch) as a DaemonSet.
Deploys ConfigMap configurations for both DaemonSets.
# Template
curl -s https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/quickstart/cwagent-fluentd-quickstart.yaml | sed "s/{{cluster_name}}/<REPLACE_CLUSTER_NAME>/;s/{{region_name}}/<REPLACE-AWS_REGION>/" | kubectl apply -f -
# Replaced Cluster Name and Region
curl -s https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/quickstart/cwagent-fluentd-quickstart.yaml | sed "s/{{cluster_name}}/my-cluster1/;s/{{region_name}}/us-east-1/" | kubectl apply -f -
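The placeholder substitution used in these commands can be checked offline before anything reaches the cluster. The sketch below wraps the same sed expressions in a hypothetical helper, `render_manifest`, and runs it on a two-line sample rather than the real manifest.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fill the {{cluster_name}} and {{region_name}} placeholders
# in a manifest read from stdin, using the same sed expressions as the quickstart.
render_manifest() {
  local cluster="$1" region="$2"
  sed "s/{{cluster_name}}/${cluster}/;s/{{region_name}}/${region}/"
}

# Offline check on a made-up two-line sample (not the real manifest).
printf 'cluster: {{cluster_name}}\nregion: {{region_name}}\n' \
  | render_manifest my-cluster1 us-east-1
```

With the real template, the rendered output can be piped to `kubectl apply --dry-run=client -f -` to preview the objects before applying them.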
Verify
# List Daemonsets
kubectl -n amazon-cloudwatch get daemonsets
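The DaemonSet listing can also be checked from a script. The sketch below assumes the default table layout of `kubectl get daemonsets` (NAME, DESIRED, CURRENT, READY, ...). The helper name `all_daemonsets_ready` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl get daemonsets` table output on stdin and
# succeed only if at least one DaemonSet is listed and every one has
# READY (column 4) equal to DESIRED (column 2).
all_daemonsets_ready() {
  awk 'NR > 1 { seen = 1; if ($2 != $4) bad = 1 }
       END { exit (seen && !bad) ? 0 : 1 }'
}

# Usage (requires cluster access):
#   kubectl -n amazon-cloudwatch get daemonsets | all_daemonsets_ready && echo "all ready"
```

An empty listing is treated as a failure, which catches the case where the manifest was never applied.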
Step-04: Deploy Sample Nginx Application
# Deploy
kubectl apply -f kube-manifests
--------------------------------------------
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-nginx-deployment
  labels:
    app: sample-nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sample-nginx
  template:
    metadata:
      labels:
        app: sample-nginx
    spec:
      containers:
        - name: sample-nginx
          image: stacksimplify/kubenginx:1.0.0
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: "5m"
              memory: "5Mi"
            limits:
              cpu: "10m"
              memory: "10Mi"
---
apiVersion: v1
kind: Service
metadata:
  name: sample-nginx-service
  labels:
    app: sample-nginx
spec:
  selector:
    app: sample-nginx
  ports:
    - port: 80
      targetPort: 80
Step-05: Generate load on our Sample Nginx Application
# Generate Load (older kubectl releases that still accept --generator)
kubectl run --generator=run-pod/v1 apache-bench -i --tty --rm --image=httpd -- ab -n 500000 -c 1000 http://sample-nginx-service.default.svc.cluster.local/
# Generate Load (newer kubectl releases, where --generator is no longer accepted)
kubectl run apache-bench -i --tty --rm \
  --image=httpd -- ab -n 500000 -c 1000 \
  http://sample-nginx-service.default.svc.cluster.local/
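The target URL in the load command follows the in-cluster DNS pattern `<service>.<namespace>.svc.cluster.local`. The sketch below builds it from the Service name, assuming the default `cluster.local` cluster domain. The helper name `service_url` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the in-cluster URL for a Service.
# The namespace defaults to "default"; "cluster.local" is assumed as the cluster domain.
service_url() {
  local service="$1" namespace="${2:-default}"
  printf 'http://%s.%s.svc.cluster.local/\n' "$service" "$namespace"
}

TARGET_URL="$(service_url sample-nginx-service)"
echo "$TARGET_URL"

# Usage (requires cluster access):
#   kubectl run apache-bench -i --tty --rm --image=httpd -- ab -n 500000 -c 1000 "$TARGET_URL"
```

If the application is deployed to another namespace, pass it as the second argument.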
Step-06: Access CloudWatch Dashboard
Access CloudWatch Container Insights Dashboard
Step-07: CloudWatch Log Insights
View Container logs
View Container Performance Logs
Step-08: Container Insights - Log Insights in depth
Log Groups
Log Insights
Create Dashboard
Create Graph for Avg Node CPU Utilization
Dashboard Name: EKS-Performance
Widget Type: Bar
Log Group: /aws/containerinsights/eksdemo1/performance
STATS avg(node_cpu_utilization) as avg_node_cpu_utilization by NodeName
| SORT avg_node_cpu_utilization DESC
Container Restarts
Dashboard Name: EKS-Performance
Widget Type: Table
Log Group: /aws/containerinsights/eksdemo1/performance
STATS avg(number_of_container_restarts) as avg_number_of_container_restarts by PodName
| SORT avg_number_of_container_restarts DESC
Cluster Node Failures
Dashboard Name: EKS-Performance
Widget Type: Table
Log Group: /aws/containerinsights/eksdemo1/performance
stats avg(cluster_failed_node_count) as CountOfNodeFailures
| filter Type="Cluster"
| sort @timestamp desc
CPU Usage By Container
Dashboard Name: EKS-Performance
Widget Type: Bar
Log Group: /aws/containerinsights/eksdemo1/performance
stats pct(container_cpu_usage_total, 50) as CPUPercMedian by kubernetes.container_name
| filter Type="Container"
Pods Requested vs Pods Running
Dashboard Name: EKS-Performance
Widget Type: Bar
Log Group: /aws/containerinsights/eksdemo1/performance
fields @timestamp, @message
| sort @timestamp desc
| filter Type="Pod"
| stats min(pod_number_of_containers) as requested, min(pod_number_of_running_containers) as running, ceil(avg(pod_number_of_containers-pod_number_of_running_containers)) as pods_missing by kubernetes.pod_name
| sort pods_missing desc
Application log errors by container name
Dashboard Name: EKS-Performance
Widget Type: Bar
Log Group: /aws/containerinsights/eksdemo1/application
stats count() as countoferrors by kubernetes.container_name
| filter stream="stderr"
| sort countoferrors desc
Reference: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Container-Insights-view-metrics.html
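The queries in this step can also be run outside the console with `aws logs start-query` and `aws logs get-query-results`. The following is a rough sketch, assuming a configured AWS CLI and the `eksdemo1` cluster name used in the log groups above. The helper names, the one-hour window, and the fixed 5-second wait are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: Container Insights log group name for a cluster.
# kind is "performance" (default) or "application", as used in the widgets above.
log_group_for() {
  printf '/aws/containerinsights/%s/%s\n' "$1" "${2:-performance}"
}

# Hypothetical wrapper: run a Logs Insights query over the last hour of the
# performance log group. The 5-second wait is an arbitrary choice, and the
# query may still be running when the results are fetched.
run_insights_query() {
  local cluster="$1" query="$2" now query_id
  now="$(date +%s)"
  query_id="$(aws logs start-query \
    --log-group-name "$(log_group_for "$cluster")" \
    --start-time "$((now - 3600))" \
    --end-time "$now" \
    --query-string "$query" \
    --query 'queryId' --output text)"
  sleep 5
  aws logs get-query-results --query-id "$query_id"
}

# Usage (requires AWS access):
#   run_insights_query eksdemo1 'STATS avg(node_cpu_utilization) as avg_node_cpu_utilization by NodeName | SORT avg_node_cpu_utilization DESC'
```

If the returned status is still `Running`, call `aws logs get-query-results` again with the same query ID.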
Step-09: Container Insights - CloudWatch Alarms
Create Alarms - Node CPU Usage
Specify metric and conditions
Select Metric: Container Insights -> ClusterName -> node_cpu_utilization
Metric Name: eksdemo1_node_cpu_utilization
Threshold Value: 4
Important Note: Any CPU utilization above 4% triggers a notification email. Ideally the threshold would be 80% or 90%, but we use 4% here just to simulate load for testing.
Configure Actions
Create New Topic: eks-alerts
Email: dkalyanreddy@gmail.com
Click on Create Topic
Important Note: Complete the email subscription by confirming the message sent to your email address.
Add name and description
Name: EKS-Nodes-CPU-Alert
Description: EKS Nodes CPU alert notification
Click Next
Preview
Preview and Create Alarm
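The console steps above have a rough AWS CLI equivalent. The sketch below is one way to script them, assuming a configured AWS CLI. The function name `create_node_cpu_alarm` is hypothetical, and the `Average` statistic, 300-second period, and single evaluation period are assumptions, since the walkthrough does not state them.

```shell
#!/usr/bin/env bash
CLUSTER_NAME="eksdemo1"
ALARM_NAME="EKS-Nodes-CPU-Alert"
THRESHOLD=4   # 4% only for the load test; use 80 or 90 in practice

# Hypothetical wrapper: create the SNS topic, subscribe an email address,
# then create the alarm on the Container Insights node CPU metric.
create_node_cpu_alarm() {
  local email="$1" topic_arn
  topic_arn="$(aws sns create-topic --name eks-alerts --query 'TopicArn' --output text)"
  aws sns subscribe --topic-arn "$topic_arn" --protocol email --notification-endpoint "$email"
  aws cloudwatch put-metric-alarm \
    --alarm-name "$ALARM_NAME" \
    --alarm-description "EKS Nodes CPU alert notification" \
    --namespace ContainerInsights \
    --metric-name node_cpu_utilization \
    --dimensions "Name=ClusterName,Value=${CLUSTER_NAME}" \
    --statistic Average \
    --period 300 \
    --evaluation-periods 1 \
    --threshold "$THRESHOLD" \
    --comparison-operator GreaterThanThreshold \
    --alarm-actions "$topic_arn"
}

# Usage (requires AWS access):
#   create_node_cpu_alarm you@example.com
```

The email subscription still has to be confirmed from the inbox before notifications are delivered.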
Add Alarm to our custom Dashboard
Generate Load & Verify Alarm
# Generate Load (newer kubectl releases no longer accept --generator; drop that flag there)
kubectl run --generator=run-pod/v1 apache-bench -i --tty --rm --image=httpd -- ab -n 500000 -c 1000 http://sample-nginx-service.default.svc.cluster.local/
Step-10: Clean-Up Container Insights
# Template
curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/quickstart/cwagent-fluentd-quickstart.yaml | sed "s/{{cluster_name}}/cluster-name/;s/{{region_name}}/cluster-region/" | kubectl delete -f -
# Replace Cluster Name & Region Name
curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/quickstart/cwagent-fluentd-quickstart.yaml | sed "s/{{cluster_name}}/my-cluster1/;s/{{region_name}}/us-east-2/" | kubectl delete -f -
Step-11: Clean-Up Application
# Delete Apps
kubectl delete -f kube-manifests
References
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/deploy-container-insights-EKS.html
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/ContainerInsights-Prometheus-Setup.html
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Container-Insights-reference-performance-entries-EKS.html



