Observability in EKS Anywhere

Monitoring, Logging, and Tracing for EKS Anywhere Clusters.

1 - Overview

Overview of observability in EKS Anywhere

Most Kubernetes-conformant observability tools can be used with EKS Anywhere. You can optionally use the EKS Connector to view your EKS Anywhere cluster resources in the Amazon EKS console; see the Connect to console page for details. EKS Anywhere includes the AWS Distro for OpenTelemetry (ADOT) and Prometheus for metrics and tracing as EKS Anywhere Curated Packages. You can use popular tooling such as Fluent Bit for logging, and you can track the progress of logging support for ADOT on the AWS Observability roadmap. For more information on EKS Anywhere Curated Packages, see the Package Management Overview.

AWS Integrations

AWS offers comprehensive monitoring, logging, alarming, and dashboard capabilities through services such as Amazon CloudWatch, Amazon Managed Prometheus (AMP), and Amazon Managed Grafana (AMG). With CloudWatch, you can take advantage of a highly scalable, AWS-native centralized logging and monitoring solution for EKS Anywhere clusters. With AMP and AMG, you can monitor your containerized applications on EKS Anywhere clusters at scale with the popular Prometheus and Grafana interfaces.

Resources

  1. Verify EKS Anywhere cluster status
  2. Use the EKS Connector to view EKS Anywhere clusters and resources in the EKS console
  3. Use Fluent Bit and Container Insights to send metrics and logs to CloudWatch
  4. Use ADOT to send metrics to AMP and AMG
  5. Expose metrics for EKS Anywhere components

2 - Verify EKS Anywhere cluster status

Verify the status of EKS Anywhere clusters

Check cluster nodes

To verify the expected number of cluster nodes are present and running, use the kubectl command to show that nodes are Ready.

Worker nodes are named using the cluster name followed by the worker node group name. In the example below, the cluster name is mgmt and the worker node group name is md-0. The other nodes shown in the response are control plane or etcd nodes.

kubectl get nodes
NAME                              STATUS   ROLES           AGE     VERSION
mgmt-clrt4                        Ready    control-plane   3d22h   v1.27.1-eks-61789d8
mgmt-md-0-5557f7c7bxsjkdg-l2kpt   Ready    <none>          3d22h   v1.27.1-eks-61789d8
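
On clusters with many nodes, scanning this table by eye gets tedious. The sketch below filters the same output for nodes whose STATUS is anything other than Ready. The function name not_ready_nodes is hypothetical, and the filter assumes the default column layout shown above (NAME first, STATUS second).

```shell
# Print the name and status of every node whose STATUS column is not exactly
# "Ready" (for example NotReady or Ready,SchedulingDisabled).
# Reads `kubectl get nodes --no-headers` output on stdin.
not_ready_nodes() {
  awk 'NF >= 2 && $2 != "Ready" { print $1, $2 }'
}

# Usage against a live cluster:
#   kubectl get nodes --no-headers | not_ready_nodes
```

Empty output means every node reported Ready at the time of the query.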

Check cluster machines

To verify that the expected number of cluster machines are present and running, use the kubectl command to show that the machines are Running.

The machine objects are named using the cluster name as a prefix and there should be one created for each node in your cluster. In the example below, the command was run against a management cluster with a single attached workload cluster. When the command is run against a management cluster, all machines for the management cluster and attached workload clusters are shown.

kubectl get machines -A
NAMESPACE     NAME                              CLUSTER   NODENAME                          PROVIDERID                                       PHASE     AGE     VERSION
eksa-system   mgmt-clrt4                        mgmt      mgmt-clrt4                        vsphere://421a801c-ac46-f47e-de1f-f070ef990c4d   Running   3d22h   v1.27.1-eks-1-27-4
eksa-system   mgmt-md-0-5557f7c7bxsjkdg-l2kpt   mgmt      mgmt-md-0-5557f7c7bxsjkdg-l2kpt   vsphere://421a4b9b-c457-fc4d-458a-d5092f981c5d   Running   3d22h   v1.27.1-eks-1-27-4
eksa-system   w01-7hzfh                         w01       w01-7hzfh                         vsphere://421a642b-f4ef-5764-47f9-5b56efcf8a4b   Running   15h     v1.27.1-eks-1-27-4
eksa-system   w01-etcd-z2ggk                    w01                                         vsphere://421ac003-3a1a-7dd9-ac83-bd0c75370cc4   Running   15h     
eksa-system   w01-md-0-799ffd7946x5gz8w-p94mt   w01       w01-md-0-799ffd7946x5gz8w-p94mt   vsphere://421a7b77-ca57-dc78-18bf-f361081a2c5e   Running   15h     v1.27.1-eks-1-27-4
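
Splitting the default table on whitespace is unreliable here, because rows such as the etcd machine leave the NODENAME and VERSION columns empty. As a rough sketch, you could request only the columns you need with custom-columns and summarize them per cluster. The function name machines_summary is hypothetical, and the field paths .spec.clusterName and .status.phase are assumed from the Cluster API Machine resource.

```shell
# Reads lines of "CLUSTER NAME PHASE" on stdin and prints, per cluster,
# how many machines are in the Running phase out of the total.
machines_summary() {
  awk 'NF >= 3 {
         total[$1]++
         if ($3 == "Running") running[$1]++
       }
       END {
         for (c in total) printf "%s %d/%d\n", c, running[c], total[c]
       }' | sort
}

# Usage against a management cluster:
#   kubectl get machines -A --no-headers \
#     -o custom-columns=CLUSTER:.spec.clusterName,NAME:.metadata.name,PHASE:.status.phase \
#     | machines_summary
```

A cluster whose two numbers differ has at least one machine that is not yet Running.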

Check cluster components

To verify cluster components are present and running, use the kubectl command to show that the system Pods are Running. The number of Pods may vary based on the infrastructure provider (vSphere, bare metal, Snow, Nutanix, CloudStack), and whether the cluster is a workload cluster or a management cluster.

kubectl get pods -A
NAMESPACE                           NAME                                                             READY   STATUS    RESTARTS      AGE
capi-kubeadm-bootstrap-system       capi-kubeadm-bootstrap-controller-manager-8665b88c65-v982t       1/1     Running   0             3d22h
capi-kubeadm-control-plane-system   capi-kubeadm-control-plane-controller-manager-67595c55d8-z7627   1/1     Running   0             3d22h
capi-system                         capi-controller-manager-88bdd56b4-wnk66                          1/1     Running   0             3d22h
capv-system                         capv-controller-manager-644d9864dc-hbrcz                         1/1     Running   1 (16h ago)   3d22h
cert-manager                        cert-manager-548579646f-4tgb2                                    1/1     Running   0             3d22h
cert-manager                        cert-manager-cainjector-cbb6df554-w5fjx                          1/1     Running   0             3d22h
cert-manager                        cert-manager-webhook-54f748c89b-qnfr2                            1/1     Running   0             3d22h
eksa-packages                       ecr-credential-provider-package-4c7mk                            1/1     Running   0             3d22h
eksa-packages                       ecr-credential-provider-package-nvlkb                            1/1     Running   0             3d22h
eksa-packages                       eks-anywhere-packages-784c6fc8b9-2t5nr                           1/1     Running   0             3d22h
eksa-system                         eksa-controller-manager-76f484bd5b-x6qld                         1/1     Running   0             3d22h
etcdadm-bootstrap-provider-system   etcdadm-bootstrap-provider-controller-manager-6bcdd4f5d7-wvqw8   1/1     Running   0             3d22h
etcdadm-controller-system           etcdadm-controller-controller-manager-6f96f5d594-kqnfw           1/1     Running   0             3d22h
kube-system                         cilium-lbqdt                                                     1/1     Running   0             3d22h
kube-system                         cilium-operator-55c4778776-jvrnh                                 1/1     Running   0             3d22h
kube-system                         cilium-operator-55c4778776-wjjrk                                 1/1     Running   0             3d22h
kube-system                         cilium-psqm2                                                     1/1     Running   0             3d22h
kube-system                         coredns-69797695c4-kdtjc                                         1/1     Running   0             3d22h
kube-system                         coredns-69797695c4-r25vv                                         1/1     Running   0             3d22h
kube-system                         etcd-mgmt-clrt4                                                  1/1     Running   0             3d22h
kube-system                         kube-apiserver-mgmt-clrt4                                        1/1     Running   0             3d22h
kube-system                         kube-controller-manager-mgmt-clrt4                               1/1     Running   0             3d22h
kube-system                         kube-proxy-588gj                                                 1/1     Running   0             3d22h
kube-system                         kube-proxy-hrksw                                                 1/1     Running   0             3d22h
kube-system                         kube-scheduler-mgmt-clrt4                                        1/1     Running   0             3d22h
kube-system                         kube-vip-mgmt-clrt4                                              1/1     Running   0             3d22h
kube-system                         vsphere-cloud-controller-manager-7vzjx                           1/1     Running   0             3d22h
kube-system                         vsphere-cloud-controller-manager-cqfs5                           1/1     Running   0             3d22h
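
With this many Pods, it can help to print only the ones that need attention. The following is a minimal sketch, assuming the default column order shown above (NAMESPACE, NAME, READY, STATUS); the function name unhealthy_pods is hypothetical.

```shell
# Reads `kubectl get pods -A --no-headers` output on stdin and prints
# "NAMESPACE/NAME READY STATUS" for Pods that are not Running or whose
# ready container count (x in x/y) differs from the total (y).
unhealthy_pods() {
  awk 'NF >= 4 {
         split($3, r, "/")
         if ($4 != "Running" || r[1] != r[2]) print $1 "/" $2, $3, $4
       }'
}

# Usage against a live cluster:
#   kubectl get pods -A --no-headers | unhealthy_pods
```

Note that Pods from finished Jobs (STATUS Completed) are also listed by this filter, so treat the output as a list to review rather than a list of failures.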

Check control plane components

You can verify the control plane is present and running by filtering Pods by the control-plane=controller-manager label.

kubectl get pod -A -l control-plane=controller-manager
NAMESPACE                           NAME                                                             READY   STATUS    RESTARTS      AGE
capi-kubeadm-bootstrap-system       capi-kubeadm-bootstrap-controller-manager-8665b88c65-v982t       1/1     Running   0             3d21h
capi-kubeadm-control-plane-system   capi-kubeadm-control-plane-controller-manager-67595c55d8-z7627   1/1     Running   0             3d21h
capi-system                         capi-controller-manager-88bdd56b4-wnk66                          1/1     Running   0             3d21h
capv-system                         capv-controller-manager-644d9864dc-hbrcz                         1/1     Running   1 (15h ago)   3d21h
eksa-packages                       eks-anywhere-packages-784c6fc8b9-2t5nr                           1/1     Running   0             3d21h
etcdadm-bootstrap-provider-system   etcdadm-bootstrap-provider-controller-manager-6bcdd4f5d7-wvqw8   1/1     Running   0             3d21h
etcdadm-controller-system           etcdadm-controller-controller-manager-6f96f5d594-kqnfw           1/1     Running   0             3d21h

Check workload clusters from management clusters

Set the CLUSTER_NAME and KUBECONFIG environment variables for the management cluster:

export CLUSTER_NAME=mgmt
export KUBECONFIG=${CLUSTER_NAME}/${CLUSTER_NAME}-eks-a-cluster.kubeconfig

Check control plane resources for all clusters

Use the command below to check the status of cluster control plane resources. This is useful to verify clusters with multiple control plane nodes after an upgrade. The status for the management cluster and all attached workload clusters is shown.

kubectl get kubeadmcontrolplanes.controlplane.cluster.x-k8s.io -n eksa-system
NAME   CLUSTER   INITIALIZED   API SERVER AVAILABLE   REPLICAS   READY   UPDATED   UNAVAILABLE   AGE     VERSION
mgmt   mgmt      true          true                   1          1       1                       3d22h   v1.27.1-eks-1-27-4
w01    w01       true          true                   1          1       1         0             16h     v1.27.1-eks-1-27-4
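
After an upgrade, the useful signal in this table is whether the READY and UPDATED counts have caught up with REPLICAS. The sketch below flags control planes where they have not. The function name lagging_control_planes is hypothetical, and the field paths in the usage comment (.spec.replicas, .status.readyReplicas, .status.updatedReplicas) are assumed from the Cluster API KubeadmControlPlane resource.

```shell
# Reads lines of "NAME DESIRED READY UPDATED" on stdin and prints control
# planes where READY or UPDATED differs from DESIRED. A "<none>" value
# (field not set) also counts as a mismatch.
lagging_control_planes() {
  awk 'NF >= 4 && ($3 != $2 || $4 != $2) {
         printf "%s desired=%s ready=%s updated=%s\n", $1, $2, $3, $4
       }'
}

# Usage against a management cluster:
#   kubectl get kubeadmcontrolplanes.controlplane.cluster.x-k8s.io -n eksa-system --no-headers \
#     -o custom-columns=NAME:.metadata.name,DESIRED:.spec.replicas,READY:.status.readyReplicas,UPDATED:.status.updatedReplicas \
#     | lagging_control_planes
```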

Use the command below to check the status of a cluster resource. This is useful to verify cluster health after any mutating cluster lifecycle operation. The status for the management cluster and all attached workload clusters is shown.

kubectl get clusters.cluster.x-k8s.io -A -o=custom-columns=NAME:.metadata.name,CONTROLPLANE-READY:.status.controlPlaneReady,INFRASTRUCTURE-READY:.status.infrastructureReady,MANAGED-EXTERNAL-ETCD-INITIALIZED:.status.managedExternalEtcdInitialized,MANAGED-EXTERNAL-ETCD-READY:.status.managedExternalEtcdReady
NAME   CONTROLPLANE-READY   INFRASTRUCTURE-READY   MANAGED-EXTERNAL-ETCD-INITIALIZED   MANAGED-EXTERNAL-ETCD-READY
mgmt   true                 true                   <none>                              <none>
w01    true                 true                   true                                true
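
To turn this table into a quick pass/fail check, you could filter it for clusters that are not fully ready. This is only a sketch: the function name clusters_not_ready is hypothetical, and it treats <none> in the two etcd columns as "not applicable", on the assumption that such a cluster does not use external etcd (as with mgmt in the example above).

```shell
# Reads the custom-columns output above, without its header, on stdin:
#   NAME CONTROLPLANE-READY INFRASTRUCTURE-READY ETCD-INITIALIZED ETCD-READY
# Prints the name of each cluster whose control plane or infrastructure
# column is not "true", or whose etcd columns are neither "true" nor "<none>".
clusters_not_ready() {
  awk 'NF >= 5 {
         bad = ($2 != "true" || $3 != "true")
         for (i = 4; i <= 5; i++) if ($i != "true" && $i != "<none>") bad = 1
         if (bad) print $1
       }'
}

# Usage: add --no-headers to the command above and pipe it in:
#   kubectl get clusters.cluster.x-k8s.io -A --no-headers -o=custom-columns=... | clusters_not_ready
```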

3 - Connect EKS Anywhere clusters to the EKS console

Connect an EKS Anywhere cluster to the EKS console

The EKS Connector lets you connect your EKS Anywhere cluster to the EKS console. The connected console displays the EKS Anywhere cluster, its configuration, workloads, and their status. EKS Connector is a software agent that runs on your EKS Anywhere cluster and registers the cluster with the EKS console.

Visit the EKS Connector documentation for details on how to configure and run the EKS Connector.

4 - Configure Fluent Bit for CloudWatch

Using Fluent Bit for logging with EKS Anywhere clusters and CloudWatch

Fluent Bit is an open source, multi-platform log processor and forwarder which allows you to collect data/logs from different sources, then unify and send them to multiple destinations. It’s fully compatible with Docker and Kubernetes environments. Due to its lightweight nature, using Fluent Bit as the log forwarder for EKS Anywhere clusters enables you to stream application logs into Amazon CloudWatch Logs efficiently and reliably.

You can additionally use CloudWatch Container Insights to collect, aggregate, and summarize metrics and logs from your containerized applications and microservices running on EKS Anywhere clusters. CloudWatch automatically collects metrics for many resources, such as CPU, memory, disk, and network. Container Insights also provides diagnostic information, such as container restart failures, to help you isolate issues and resolve them quickly. You can also set CloudWatch alarms on metrics that Container Insights collects.

On this page, we show how to set up Fluent Bit and Container Insights to send logs and metrics from your EKS Anywhere clusters to CloudWatch.

Prerequisites

  • An AWS Account (see AWS documentation to get started)
  • An EKS Anywhere cluster with IAM Roles for Service Accounts (IRSA) enabled: With IRSA, an IAM role can be associated with a Kubernetes service account. This service account can provide AWS permissions to the containers in any Pod that uses the service account, which enables the containers to securely communicate with AWS services. This removes the need to hardcode AWS security credentials as environment variables on your nodes. See the IRSA configuration page for details.

Before setting up Fluent Bit, first create an IAM Policy and Role to send logs to CloudWatch.

Step 1: Create IAM Policy

  1. Go to IAM Policy in the AWS console.

  2. Click on JSON as shown below:

    Observability Create Policy

  3. Enter the policy below in the IAM console, then click Create Policy:

        {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Sid": "EKSAnywhereLogging",
                    "Effect": "Allow",
                    "Action": "cloudwatch:*",
                    "Resource": "*"
                }
            ]
        }
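
If you prefer the command line to the console, the same policy can in principle be created with the AWS CLI. The sketch below writes the policy document from this step to a file and passes it to aws iam create-policy. The function names, the file path, and the default policy name EKSAnywhereLoggingPolicy are hypothetical, and the sketch assumes the AWS CLI is installed and configured with permission to create IAM policies.

```shell
# Write the policy document from this step to the path given as $1.
write_logging_policy() {
  cat > "$1" << 'EOF'
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "EKSAnywhereLogging",
            "Effect": "Allow",
            "Action": "cloudwatch:*",
            "Resource": "*"
        }
    ]
}
EOF
}

# Create the managed policy from the file in $1 and print its ARN.
# The default policy name is hypothetical; pass your own as $2.
create_logging_policy() {
  aws iam create-policy \
    --policy-name "${2:-EKSAnywhereLoggingPolicy}" \
    --policy-document "file://$1" \
    --query 'Policy.Arn' --output text
}

# Usage:
#   write_logging_policy /tmp/eksa-logging-policy.json
#   create_logging_policy /tmp/eksa-logging-policy.json
```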

Step 2: Create IAM Role

  1. Go to IAM Role in the AWS console.

  2. Follow the steps as shown below:

    Observability Role Creation

    In Identity Provider, enter the OIDC provider you created as a part of IRSA configuration.

    In Audience, select sts.amazonaws.com. Click on Next.

  3. Select the policy you created in Step 1: Create IAM Policy.

    Observability Select Permission

  4. Provide a Role name EKSAnywhereLogging and click Next.

  5. Copy the ARN as shown below and save it locally for the next step.

    Observability Copy ARN
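
Behind the console steps above, the role carries a trust policy that allows identities from your OIDC provider to assume it with the sts.amazonaws.com audience. As an illustration, the sketch below prints such a trust policy. The function name irsa_trust_policy and all example values are hypothetical, and the sketch assumes the standard aws partition.

```shell
# Print an IAM trust policy that lets Pods presenting a token from the given
# OIDC provider assume the role via sts:AssumeRoleWithWebIdentity.
#   $1: AWS account ID
#   $2: OIDC provider host and path, without the https:// prefix
irsa_trust_policy() {
  cat << EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::$1:oidc-provider/$2"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": { "$2:aud": "sts.amazonaws.com" }
      }
    }
  ]
}
EOF
}

# Usage (account ID, provider, and policy ARN are placeholders):
#   irsa_trust_policy 111122223333 oidc.example.com/eksa > trust.json
#   aws iam create-role --role-name EKSAnywhereLogging \
#     --assume-role-policy-document file://trust.json
#   aws iam attach-role-policy --role-name EKSAnywhereLogging \
#     --policy-arn "$POLICY_ARN"
```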

Step 3: Install Fluent Bit

  1. Create the amazon-cloudwatch namespace using this command:

    kubectl create namespace amazon-cloudwatch
    
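  2. Create the Service Accounts for cloudwatch-agent and fluent-bit in the amazon-cloudwatch namespace. This step uses the Role ARN you saved in Step 2. Replace $RoleARN with your actual value, or set RoleARN in your shell first; the heredoc is unquoted, so the shell expands the variable before kubectl receives the manifest.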

    cat << EOF | kubectl apply -f -
    # create cwagent service account and role binding
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: cloudwatch-agent
      namespace: amazon-cloudwatch
      annotations:
        # set this with value of OIDC_IAM_ROLE
        eks.amazonaws.com/role-arn: "$RoleARN"
        # optional: Defaults to "sts.amazonaws.com" if not set
        eks.amazonaws.com/audience: "sts.amazonaws.com"
        # optional: When set to "true", adds AWS_STS_REGIONAL_ENDPOINTS env var
        #   to containers
        eks.amazonaws.com/sts-regional-endpoints: "true"
        # optional: Defaults to 86400 for expirationSeconds if not set
        #   Note: This value can be overwritten if specified in the pod
        #         annotation as shown in the next step.
        eks.amazonaws.com/token-expiration: "86400"
    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: fluent-bit
      namespace: amazon-cloudwatch
      annotations:
        # set this with value of OIDC_IAM_ROLE
        eks.amazonaws.com/role-arn: "$RoleARN"
        # optional: Defaults to "sts.amazonaws.com" if not set
        eks.amazonaws.com/audience: "sts.amazonaws.com"
        # optional: When set to "true", adds AWS_STS_REGIONAL_ENDPOINTS env var
        #   to containers
        eks.amazonaws.com/sts-regional-endpoints: "true"
        # optional: Defaults to 86400 for expirationSeconds if not set
        #   Note: This value can be overwritten if specified in the pod
        #         annotation as shown in the next step.
        eks.amazonaws.com/token-expiration: "86400"
    EOF
    

    The above command creates two Service Accounts:

    serviceaccount/cloudwatch-agent created
    serviceaccount/fluent-bit created
    
  3. Now deploy Fluent Bit in your EKS Anywhere cluster to scrape and send logs to CloudWatch:

    kubectl apply -f "https://anywhere.eks.amazonaws.com/manifests/fluentbit.yaml"
    

    You should see output similar to the following:

    clusterrole.rbac.authorization.k8s.io/cloudwatch-agent-role created
    clusterrolebinding.rbac.authorization.k8s.io/cloudwatch-agent-role-binding created
    configmap/cwagentconfig created
    daemonset.apps/cloudwatch-agent created
    configmap/fluent-bit-cluster-info created
    clusterrole.rbac.authorization.k8s.io/fluent-bit-role created
    clusterrolebinding.rbac.authorization.k8s.io/fluent-bit-role-binding created
    configmap/fluent-bit-config created
    daemonset.apps/fluent-bit created
    
  4. You can verify the DaemonSets have been deployed with the following command:

    kubectl -n amazon-cloudwatch get daemonsets
    
  • If you are running the EKS Connector, you can also verify the status of the DaemonSets by logging in to the AWS console and navigating to Amazon EKS -> Cluster -> Resources -> DaemonSets.

    Observability Verify DaemonSet
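
The DaemonSet listing from the verification step can also be checked in a script. The sketch below prints any DaemonSet whose READY count is below its DESIRED count. The function name daemonsets_not_ready is hypothetical, and the filter assumes the default kubectl column order (NAME, DESIRED, CURRENT, READY, and so on).

```shell
# Reads `kubectl -n amazon-cloudwatch get daemonsets --no-headers` output on
# stdin and prints "NAME READY/DESIRED" for each DaemonSet that has fewer
# ready Pods than desired.
daemonsets_not_ready() {
  awk 'NF >= 4 && $4 + 0 < $2 + 0 { print $1, $4 "/" $2 }'
}

# Usage against a live cluster:
#   kubectl -n amazon-cloudwatch get daemonsets --no-headers | daemonsets_not_ready
```

Empty output means both cloudwatch-agent and fluent-bit had a ready Pod on every node they were scheduled to at the time of the query.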

Step 4: Deploy a test application

Deploy a simple test application to verify your setup is working properly.

Step 5: View cluster logs and metrics

CloudWatch Logs

  1. Open the CloudWatch console. The link opens the console and displays your currently available log groups.

  2. Choose the EKS Anywhere cluster name that you want to view logs for. The log group name format is /aws/containerinsights/my-EKS-Anywhere-cluster/cluster.

    Observability Container Insights

    The log group /aws/containerinsights/my-EKS-Anywhere-cluster/application contains logs sourced from /var/log/containers.

    The log group /aws/containerinsights/my-EKS-Anywhere-cluster/dataplane contains logs for kubelet.service, kubeproxy.service, and docker.service.

  3. To view the deployed test application logs, click the application log group, then click Search All.

    Observability Container Insights

  4. Type HTTP 1.1 200 in the search box and press enter. You should see logs as shown below:

    Observability Container Insights
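
The same search can be sketched from the command line. The helper below builds a log group name in the format described above, and the usage comment passes it to aws logs filter-log-events. The function name log_group_name is hypothetical, the cluster name is a placeholder, and the filter pattern is only an approximation of the console search.

```shell
# Build the Container Insights log group name for a cluster.
#   $1: cluster name
#   $2: log type, such as application or dataplane (defaults to application)
log_group_name() {
  printf '/aws/containerinsights/%s/%s\n' "$1" "${2:-application}"
}

# Usage with the AWS CLI (assumes configured credentials and region):
#   aws logs filter-log-events \
#     --log-group-name "$(log_group_name my-EKS-Anywhere-cluster)" \
#     --filter-pattern '"HTTP" "200"' --limit 20
```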

CloudWatch Container Insights

  1. Open the CloudWatch console. The link opens the Container Insights performance monitoring console and displays a dropdown to select your EKS clusters.

    Observability Container Insights

For more details on CloudWatch Logs, see What is Amazon CloudWatch Logs?

5 - Expose metrics for EKS Anywhere components

Expose metrics for EKS Anywhere components

Some Kubernetes system components, such as kube-controller-manager, kube-scheduler, kube-proxy, and etcd (stacked), expose metrics only on localhost by default. To expose metrics for these components so that monitoring systems such as Prometheus can scrape them, you can deploy a proxy as a DaemonSet on the host network of the nodes. The proxy Pods also need control plane tolerations so that they can be scheduled on the control plane nodes.

For etcd metrics, the steps outlined below apply only to stacked etcd setups. For unstacked (external) etcd, metrics are already exposed on the https://<etcd-machine-ip>:2379/metrics endpoint and can be scraped by Prometheus directly without deploying a proxy.

Configure Proxy

To configure a proxy for exposing metrics on an EKS Anywhere cluster, you can perform the following steps:

  1. Create a config map to store the proxy configuration.

    Below is an example ConfigMap if you use HAProxy as the proxy server.

    cat << EOF | kubectl apply -f -
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: metrics-proxy
    data:
      haproxy.cfg: |
        defaults
          mode http
          timeout connect 5000ms
          timeout client 5000ms
          timeout server 5000ms
          default-server maxconn 10
    
        frontend kube-proxy
          bind \${NODE_IP}:10249
          http-request deny if !{ path /metrics }
          default_backend kube-proxy
        backend kube-proxy
          server kube-proxy 127.0.0.1:10249 check
    
        frontend kube-controller-manager
          bind \${NODE_IP}:10257
          http-request deny if !{ path /metrics }
          default_backend kube-controller-manager
        backend kube-controller-manager
          server kube-controller-manager 127.0.0.1:10257 ssl verify none check
    
        frontend kube-scheduler
          bind \${NODE_IP}:10259
          http-request deny if !{ path /metrics }
          default_backend kube-scheduler
        backend kube-scheduler
          server kube-scheduler 127.0.0.1:10259 ssl verify none check
    
        frontend etcd
          bind \${NODE_IP}:2381
          http-request deny if !{ path /metrics }
          default_backend etcd
        backend etcd
          server etcd 127.0.0.1:2381 check
    EOF
    
  2. Create a daemonset for the proxy and mount the config map volume onto the proxy pods.

    Below is an example configuration for the HAProxy daemonset.

    cat << EOF | kubectl apply -f -
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: metrics-proxy
    spec:
      selector:
        matchLabels:
          app: metrics-proxy
      template:
        metadata:
          labels:
            app: metrics-proxy
        spec:
          tolerations:
          - key: node-role.kubernetes.io/control-plane
            operator: Exists
            effect: NoSchedule
          hostNetwork: true
          containers:
            - name: haproxy
              image: public.ecr.aws/eks-anywhere/kubernetes-sigs/kind/haproxy:v0.20.0-eks-a-54
              env:
                - name: NODE_IP
                  valueFrom:
                    fieldRef:
                      apiVersion: v1
                      fieldPath: status.hostIP
              ports:
                - name: kube-proxy
                  containerPort: 10249
                - name: kube-ctrl-mgr
                  containerPort: 10257
                - name: kube-scheduler
                  containerPort: 10259
                - name: etcd
                  containerPort: 2381
              volumeMounts:
                - mountPath: "/usr/local/etc/haproxy"
                  name: haproxy-config
          volumes:
            - configMap:
                name: metrics-proxy
              name: haproxy-config
    EOF
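
Each component in the example ConfigMap follows the same pattern: a frontend bound to the node IP that rejects every path except /metrics, and a backend that forwards to the same port on localhost, with "ssl verify none" added for the two components proxied over TLS. To make the pattern explicit, the sketch below prints one such stanza. The function name haproxy_stanza and its "tls" argument are hypothetical conveniences, not part of HAProxy or EKS Anywhere.

```shell
# Print a frontend/backend pair in the style of the example ConfigMap.
#   $1: component name
#   $2: port (the same port is used on the node IP and on localhost)
#   $3: "tls" if the component serves metrics over HTTPS on localhost
haproxy_stanza() {
  server_opts="check"
  if [ "${3:-}" = "tls" ]; then
    server_opts="ssl verify none check"
  fi
  printf 'frontend %s\n' "$1"
  # Single quotes keep ${NODE_IP} literal so HAProxy resolves it at startup.
  printf '  bind ${NODE_IP}:%s\n' "$2"
  printf '  http-request deny if !{ path /metrics }\n'
  printf '  default_backend %s\n' "$1"
  printf 'backend %s\n' "$1"
  printf '  server %s 127.0.0.1:%s %s\n' "$1" "$2" "$server_opts"
}

# Usage:
#   haproxy_stanza kube-scheduler 10259 tls
#   haproxy_stanza etcd 2381
```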
    

Configure Client Permissions

  1. Create a new cluster role for the client to access the metrics endpoint of the components.

    cat << EOF | kubectl apply -f -
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: metrics-reader
    rules:
      - nonResourceURLs:
          - "/metrics"
        verbs:
          - get
    EOF
    
  2. Create a new cluster role binding to bind the above cluster role to the client pod’s service account.

    cat << EOF | kubectl apply -f -
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: metrics-reader-binding
    subjects:
    - kind: ServiceAccount
      name: default
      namespace: default
    roleRef:
      kind: ClusterRole
      name: metrics-reader
      apiGroup: rbac.authorization.k8s.io
    EOF
    
  3. Verify that the metrics are exposed to the client pods by running the following commands:

    cat << EOF | kubectl apply -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: test-pod
    spec:
      tolerations:
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
      containers:
      - command:
        - /bin/sleep
        - infinity
        image: curlimages/curl:latest
        name: test-container
        env:
        - name: NODE_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP
    EOF
    
    kubectl exec -it test-pod -- sh
    export TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
    curl -H "Authorization: Bearer ${TOKEN}" "http://${NODE_IP}:10257/metrics"
    curl -H "Authorization: Bearer ${TOKEN}" "http://${NODE_IP}:10259/metrics"
    curl -H "Authorization: Bearer ${TOKEN}" "http://${NODE_IP}:10249/metrics"
    curl -H "Authorization: Bearer ${TOKEN}" "http://${NODE_IP}:2381/metrics"
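
The curl commands above print the full metrics payload, which can run to thousands of lines. As a lighter check, you could count the sample lines instead. The sketch below assumes the endpoints return the Prometheus text exposition format and that awk is available in the test container; the function name count_metric_samples is hypothetical.

```shell
# Count sample lines in Prometheus text exposition output read from stdin,
# skipping blank lines and "#" comment lines (HELP and TYPE metadata).
count_metric_samples() {
  awk 'NF > 0 && $0 !~ /^#/ { n++ } END { print n + 0 }'
}

# Usage from inside the test pod:
#   curl -s -H "Authorization: Bearer ${TOKEN}" \
#     "http://${NODE_IP}:10249/metrics" | count_metric_samples
```

A count of 0 suggests that the proxy denied the request or the component is not serving metrics on that port.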