Friday, June 16, 2023

AWS CloudWatch Container Insights

 

Container Map

Container Resource

Performance Dashboards

Log Groups

Log Insights

Alarms 


Automatic Performance Dashboard


Monitoring EKS using CloudWatch Container Insights

Step-01: Introduction

    What is CloudWatch?
    What are CloudWatch Container Insights?
    What is CloudWatch Agent and Fluentd?

  1. CloudWatch: CloudWatch is a monitoring and observability service provided by Amazon Web Services (AWS). It collects and tracks various metrics, logs, and events from AWS resources and applications. CloudWatch allows you to gain insights into the performance, health, and availability of your AWS infrastructure and applications. It provides features like dashboards, alarms, logs, and automated actions to help you monitor, troubleshoot, and optimize your resources.

  2. CloudWatch Container Insights: CloudWatch Container Insights is a feature of CloudWatch that provides monitoring and analysis capabilities specifically designed for containerized environments. It helps you understand the performance and resource utilization of your containerized applications running on services like Amazon Elastic Container Service (ECS), Amazon Elastic Kubernetes Service (EKS), and Kubernetes clusters.

    CloudWatch Container Insights collects and displays key metrics, logs, and metadata related to containers, tasks, pods, and services. It provides automatic dashboards, and you can create CloudWatch alarms on the metrics it collects, to help you monitor, troubleshoot, and optimize your containerized applications.

  3. CloudWatch Agent: The CloudWatch Agent is a software component that runs on EC2 instances, on-premises servers, or virtual machines to collect and send system-level metrics, logs, and custom metrics to CloudWatch. It enables you to monitor system-level metrics, such as CPU usage, memory utilization, disk space, and network performance, as well as application-level metrics.

  4. Fluentd: Fluentd is an open-source data collection agent that can be used to collect, transform, and forward logs and other data from various sources to different destinations. It provides a unified logging layer that supports a wide range of input sources and output destinations.

    In the context of CloudWatch, Fluentd can be used as a log collector and forwarder to gather logs from various sources within your infrastructure and send them to CloudWatch Logs. It acts as an intermediary between log sources (such as containers, applications, or system logs) and CloudWatch Logs, enabling you to centralize and analyze logs in CloudWatch.

    By deploying the CloudWatch Agent and using Fluentd, you can collect both system-level metrics and application logs, and send them to CloudWatch for monitoring, analysis, and troubleshooting purposes.



Step-02: Associate CloudWatch Policy to our EKS Worker Nodes Role

    Go to Services -> EC2 -> Worker Node EC2 Instance -> IAM Role -> Click on that role

# Sample Role ARN
arn:aws:iam::180789647333:role/eksctl-eksdemo1-nodegroup-eksdemo-NodeInstanceRole-1FVWZ2H3TMQ2M

# Policy to be associated
Associate Policy: CloudWatchAgentServerPolicy
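
The same association can also be made from the command line instead of the console. The following is a minimal sketch, assuming the AWS CLI is installed and configured with IAM permissions; the helper function names are hypothetical and only the `aws iam attach-role-policy` call is part of the AWS CLI.

```shell
# Sketch: attach CloudWatchAgentServerPolicy to the worker node role.
# Helper names are illustrative, not part of any AWS tooling.

# Print the role name, i.e. the part of a role ARN after the last "/".
role_name_from_arn() {
  printf '%s\n' "${1##*/}"
}

# Attach the AWS managed policy to the role identified by the given ARN.
attach_cloudwatch_agent_policy() {
  aws iam attach-role-policy \
    --role-name "$(role_name_from_arn "$1")" \
    --policy-arn "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy"
}

# Usage (not executed here), with the sample role ARN from this step:
# attach_cloudwatch_agent_policy "arn:aws:iam::180789647333:role/eksctl-eksdemo1-nodegroup-eksdemo-NodeInstanceRole-1FVWZ2H3TMQ2M"
```

The role name is derived from the ARN because `attach-role-policy` expects a role name rather than a full ARN.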

Step-03: Install Container Insights
Deploy CloudWatch Agent and Fluentd as DaemonSets

    This command will:
        Create the Namespace amazon-cloudwatch.
        Create all the necessary security objects for both DaemonSets:
            ServiceAccount
            ClusterRole
            ClusterRoleBinding
        Deploy the CloudWatch Agent (responsible for sending the metrics to CloudWatch) as a DaemonSet.
        Deploy Fluentd (responsible for sending the logs to CloudWatch) as a DaemonSet.
        Deploy ConfigMap configurations for both DaemonSets.

# Template
curl -s https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/quickstart/cwagent-fluentd-quickstart.yaml | sed "s/{{cluster_name}}/<REPLACE_CLUSTER_NAME>/;s/{{region_name}}/<REPLACE-AWS_REGION>/" | kubectl apply -f -

# Replaced Cluster Name and Region
curl -s https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/quickstart/cwagent-fluentd-quickstart.yaml | sed "s/{{cluster_name}}/my-cluster1/;s/{{region_name}}/us-east-1/" | kubectl apply -f -
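
The sed expression in the commands above only swaps two placeholders in the manifest. As a rough sketch, the substitution can be wrapped in a small function so the rendered manifest can be saved and inspected before it is applied; the function name and the output file name are hypothetical.

```shell
# Sketch: substitute the two quickstart placeholders in a manifest read from stdin.
# The function name is illustrative; the sed expression matches the one used above.
render_quickstart_manifest() {
  cluster_name="$1"
  region_name="$2"
  sed "s/{{cluster_name}}/${cluster_name}/;s/{{region_name}}/${region_name}/"
}

# Usage (not executed here): render to a file, review it, then apply it.
# MANIFEST_URL is assumed to hold the quickstart URL shown above.
# curl -s "$MANIFEST_URL" | render_quickstart_manifest my-cluster1 us-east-1 > cwagent-fluentd-quickstart.rendered.yaml
# kubectl apply -f cwagent-fluentd-quickstart.rendered.yaml
```

Reviewing the rendered file first makes it easier to catch a wrong cluster name or region before anything is created in the cluster.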

Verify
# List Daemonsets
kubectl -n amazon-cloudwatch get daemonsets
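
Listing the DaemonSets shows that they exist, but not necessarily that their pods are ready on every node. A minimal sketch of waiting for both rollouts follows; the DaemonSet names cloudwatch-agent and fluentd-cloudwatch are assumptions about the quickstart manifest, so confirm them against the output of the command above, and the 120s timeout is an arbitrary choice.

```shell
# Sketch: wait until both Container Insights DaemonSets report a complete rollout.
# DaemonSet names are assumed; check them with "kubectl -n amazon-cloudwatch get daemonsets".
wait_for_container_insights() {
  for ds in cloudwatch-agent fluentd-cloudwatch; do
    kubectl -n amazon-cloudwatch rollout status "daemonset/${ds}" --timeout=120s || return 1
  done
}

# Usage (not executed here):
# wait_for_container_insights && echo "Container Insights DaemonSets are ready"
```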

Step-04: Deploy Sample Nginx Application
# Deploy
kubectl apply -f kube-manifests

 --------------------------------------------

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-nginx-deployment
  labels:
    app: sample-nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sample-nginx
  template:
    metadata:
      labels:
        app: sample-nginx
    spec:
      containers:
        - name: sample-nginx
          image: stacksimplify/kubenginx:1.0.0
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: "5m"
              memory: "5Mi"
            limits:
              cpu: "10m"
              memory: "10Mi"       
---
apiVersion: v1
kind: Service
metadata:
  name: sample-nginx-service
  labels:
    app: sample-nginx
spec:
  selector:
    app: sample-nginx
  ports:
  - port: 80
    targetPort: 80         

--------------------------------------------------------------------



Step-05: Generate load on our Sample Nginx Application
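# Generate Load (older kubectl versions that still accept the --generator flag)
kubectl run --generator=run-pod/v1 apache-bench -i --tty --rm --image=httpd -- ab -n 500000 -c 1000 http://sample-nginx-service.default.svc.cluster.local/

# Generate Load (newer kubectl versions, where kubectl run no longer accepts --generator)
kubectl run apache-bench -i --tty --rm --image=httpd -- ab -n 500000 -c 1000 http://sample-nginx-service.default.svc.cluster.local/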

Step-06: Access CloudWatch Dashboard

    Access CloudWatch Container Insights Dashboard

Step-07: CloudWatch Log Insights

    View Container logs
    View Container Performance Logs

Step-08: Container Insights - Log Insights in depth

    Log Groups
    Log Insights
    Create Dashboard

Create Graph for Avg Node CPU Utilization

    DashBoard Name: EKS-Performance
    Widget Type: Bar
    Log Group: /aws/containerinsights/eksdemo1/performance

STATS avg(node_cpu_utilization) as avg_node_cpu_utilization by NodeName
| SORT avg_node_cpu_utilization DESC

Container Restarts

    DashBoard Name: EKS-Performance
    Widget Type: Table
    Log Group: /aws/containerinsights/eksdemo1/performance

STATS avg(number_of_container_restarts) as avg_number_of_container_restarts by PodName
| SORT avg_number_of_container_restarts DESC

Cluster Node Failures

    DashBoard Name: EKS-Performance
    Widget Type: Table
    Log Group: /aws/containerinsights/eksdemo1/performance

stats avg(cluster_failed_node_count) as CountOfNodeFailures
| filter Type="Cluster"
| sort @timestamp desc


CPU Usage By Container

    DashBoard Name: EKS-Performance
    Widget Type: Bar
    Log Group: /aws/containerinsights/eksdemo1/performance

stats pct(container_cpu_usage_total, 50) as CPUPercMedian by kubernetes.container_name
| filter Type="Container"

Pods Requested vs Pods Running

    DashBoard Name: EKS-Performance
    Widget Type: Bar
    Log Group: /aws/containerinsights/eksdemo1/performance

fields @timestamp, @message
| sort @timestamp desc
| filter Type="Pod"
| stats min(pod_number_of_containers) as requested, min(pod_number_of_running_containers) as running, ceil(avg(pod_number_of_containers-pod_number_of_running_containers)) as pods_missing by kubernetes.pod_name
| sort pods_missing desc


Application log errors by container name

    DashBoard Name: EKS-Performance
    Widget Type: Bar
    Log Group: /aws/containerinsights/eksdemo1/application

stats count() as countoferrors by kubernetes.container_name
| filter stream="stderr"
| sort countoferrors desc

    Reference: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Container-Insights-view-metrics.html
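
The queries in this step can also be run outside the console. The sketch below, assuming the AWS CLI is configured for the cluster's region, starts the node CPU query over the last hour and fetches its results; the function names and the one-hour window are illustrative choices, and the log group name must match your cluster.

```shell
# Sketch: run the "Avg Node CPU Utilization" query with the AWS CLI.
# Function names and the one-hour window are illustrative.

# Start the query over the last hour and print its query ID.
start_node_cpu_query() {
  log_group="$1"
  end_time="$(date +%s)"
  start_time="$((end_time - 3600))"
  aws logs start-query \
    --log-group-name "$log_group" \
    --start-time "$start_time" \
    --end-time "$end_time" \
    --query-string 'STATS avg(node_cpu_utilization) as avg_node_cpu_utilization by NodeName | SORT avg_node_cpu_utilization DESC' \
    --query 'queryId' --output text
}

# Fetch the status and results for a query ID returned by the function above.
get_query_results() {
  aws logs get-query-results --query-id "$1"
}

# Usage (not executed here):
# query_id="$(start_node_cpu_query /aws/containerinsights/eksdemo1/performance)"
# get_query_results "$query_id"
```

Queries run asynchronously, so the results call may need to be repeated until the returned status is Complete.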

Step-09: Container Insights - CloudWatch Alarms
Create Alarms - Node CPU Usage

    Specify metric and conditions
        Select Metric: Container Insights -> ClusterName -> node_cpu_utilization
        Metric Name: eksdemo1_node_cpu_utilization
        Threshold Value: 4
        Important Note: With this threshold, any node CPU utilization above 4% sends a notification email. A realistic threshold would be 80% or 90%; 4% is used here only so that the load simulation triggers the alarm.
    Configure Actions
        Create New Topic: eks-alerts
        Email: dkalyanreddy@gmail.com
        Click on Create Topic
        Important Note: Confirm the email subscription using the link sent to your email address.
    Add name and description
        Name: EKS-Nodes-CPU-Alert
        Description: EKS Nodes CPU alert notification
        Click Next
    Preview
        Preview and Create Alarm
    Add Alarm to our custom Dashboard
    Generate Load & Verify Alarm

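# Generate Load
# On newer kubectl versions that no longer accept --generator, omit that flag (see the second command in Step-05)
kubectl run --generator=run-pod/v1 apache-bench -i --tty --rm --image=httpd -- ab -n 500000 -c 1000 http://sample-nginx-service.default.svc.cluster.local/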
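
The console steps for this alarm can be approximated with the AWS CLI. This is a sketch under stated assumptions rather than an exact equivalent: the function name is hypothetical, and the Average statistic, 300-second period, and single evaluation period are illustrative values that the steps above do not specify.

```shell
# Sketch: create the node CPU alarm from the CLI.
# Statistic, period, and evaluation periods are assumed values; adjust as needed.
create_node_cpu_alarm() {
  cluster_name="$1"
  topic_arn="$2"
  aws cloudwatch put-metric-alarm \
    --alarm-name "EKS-Nodes-CPU-Alert" \
    --alarm-description "EKS Nodes CPU alert notification" \
    --namespace "ContainerInsights" \
    --metric-name "node_cpu_utilization" \
    --dimensions "Name=ClusterName,Value=${cluster_name}" \
    --statistic Average \
    --period 300 \
    --evaluation-periods 1 \
    --threshold 4 \
    --comparison-operator GreaterThanThreshold \
    --alarm-actions "$topic_arn"
}

# Usage (not executed here): create the SNS topic, subscribe an email address, then create the alarm.
# topic_arn="$(aws sns create-topic --name eks-alerts --query TopicArn --output text)"
# aws sns subscribe --topic-arn "$topic_arn" --protocol email --notification-endpoint you@example.com
# create_node_cpu_alarm eksdemo1 "$topic_arn"
```

As with the console flow, the email subscription has to be confirmed before notifications are delivered.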


Step-10: Clean-Up Container Insights
# Template
curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/quickstart/cwagent-fluentd-quickstart.yaml | sed "s/{{cluster_name}}/cluster-name/;s/{{region_name}}/cluster-region/" | kubectl delete -f -

# Replace Cluster Name & Region Name (use the same values as in Step-03)
curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/quickstart/cwagent-fluentd-quickstart.yaml | sed "s/{{cluster_name}}/my-cluster1/;s/{{region_name}}/us-east-1/" | kubectl delete -f -

Step-11: Clean-Up Application

# Delete Apps
kubectl delete -f  kube-manifests


References
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/deploy-container-insights-EKS.html
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/ContainerInsights-Prometheus-Setup.html
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Container-Insights-reference-performance-entries-EKS.html
