Creating an EKS Cluster with Fargate nodes and Karpenter using Terraform

January 23, 2024


Link to the GitHub code repo here

Prerequisites:

  1. AWS Account: You must have an active AWS account. If you don't have one, you can sign up on the AWS website here
  2. IAM User or Role: Create an IAM (Identity and Access Management) user or role in your AWS account with the necessary permissions to create and manage EKS clusters. At a minimum, the user or role should have permissions to create EKS clusters, EC2 instances, VPCs, and related resources.
  3. AWS CLI: Install and configure the AWS Command Line Interface (CLI) on your local machine. You'll use the AWS CLI to interact with your AWS account and configure your AWS credentials. You can download it here
  4. Terraform Installed: Install Terraform on your local machine. You can download Terraform from the official Terraform website and follow the installation instructions for your operating system here

What is Karpenter?

Karpenter is an open-source project that provides automated node provisioning and scaling for Kubernetes. When using Fargate with Amazon EKS, traditional autoscaling groups for worker nodes do not apply, since Fargate is a serverless compute engine where you don't manage the underlying nodes. However, there are still scenarios where you need to scale the compute capacity of your EKS cluster beyond what your Fargate profiles cover.

In this setup, Karpenter itself runs on a Fargate profile and watches for pods that cannot be scheduled on the existing capacity. When it detects pending pods, it launches right-sized EC2 nodes to run them and removes those nodes again once they are no longer needed.

Terraform code

As always, we will use our favourite Terraform module from here

You can find the full Terraform code in our repo. We will build on the code from our previous article: as an example, we will run the cluster on Fargate nodes and configure Karpenter to add EC2 nodes when the cluster needs to scale.

providers.tf

Since we will use Helm to install Karpenter and kubectl to create the Karpenter node pool, we need to declare the following providers:

provider "aws" {
  region  = var.aws_region
  profile = var.aws_profile
}

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args = ["eks", "get-token", "--cluster-name", module.eks.cluster_name]
  }
}

provider "helm" {
  kubernetes {
    host                   = module.eks.cluster_endpoint
    cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

    exec {
      api_version = "client.authentication.k8s.io/v1beta1"
      command     = "aws"
      args = ["eks", "get-token", "--cluster-name", module.eks.cluster_name]
    }
  }
}

provider "kubectl" {
  apply_retry_count      = 5
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
  load_config_file       = false

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args = ["eks", "get-token", "--cluster-name", module.eks.cluster_name]
  }
}

data "aws_availability_zones" "available" {}
data "aws_ecrpublic_authorization_token" "token" {}

Please note that the AWS CLI needs to be installed locally, since the exec blocks above call aws eks get-token to obtain cluster credentials.
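
You can quickly confirm that the CLI is installed and that your credentials and profile work, for example:

aws --version
aws sts get-caller-identity --profile default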

versions.tf

Specifying provider versions is good practice: it prevents unintended changes caused by automatically upgrading to the latest provider version and keeps behavior consistent across deployments.

terraform {
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 4.57"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = ">= 2.10"
    }
    helm = {
      source  = "hashicorp/helm"
      version = ">= 2.7"
    }
    kubectl = {
      source  = "gavinbunney/kubectl"
      version = ">= 1.14"
    }
    null = {
      source  = "hashicorp/null"
      version = ">= 3.0"
    }
  }
}

variables.tf

We update our variables.tf and start using locals to store values that would otherwise be repeated in several places in the module, reducing redundancy:

variable "aws_profile" {
  description = "Set this variable if you use another profile besides the default awscli profile called 'default'."
  type        = string
  default     = "default"
}

variable "aws_region" {
  description = "Set this variable if you use another aws region."
  type        = string
  default     = "us-east-1"
}

locals {
  name            = "test"
  cluster_version = "1.27"
  region          = "us-east-1"

  vpc_cidr = "10.0.0.0/16"
  azs      = slice(data.aws_availability_zones.available.names, 0, 3)

  tags = {
    Example    = local.name
  }
}

main.tf

In main.tf we create the VPC and the EKS cluster, preparing everything we need to deploy Karpenter:

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = local.name
  cidr = local.vpc_cidr

  azs             = local.azs
  private_subnets = [for k, v in local.azs : cidrsubnet(local.vpc_cidr, 4, k)]
  public_subnets  = [for k, v in local.azs : cidrsubnet(local.vpc_cidr, 8, k + 48)]
  intra_subnets   = [for k, v in local.azs : cidrsubnet(local.vpc_cidr, 8, k + 52)]

  enable_nat_gateway = true
  single_nat_gateway = true

  public_subnet_tags = {
    "kubernetes.io/role/elb" = 1
  }

  private_subnet_tags = {
    "kubernetes.io/role/internal-elb" = 1
    "karpenter.sh/discovery" = local.name
  }

  tags = local.tags
}

module "eks" {
  source = "terraform-aws-modules/eks/aws"

  cluster_name                   = local.name
  cluster_version                = local.cluster_version
  cluster_endpoint_public_access = true

  cluster_addons = {
    kube-proxy = {}
    vpc-cni    = {}
    coredns = {
      configuration_values = jsonencode({
        computeType = "Fargate"
        resources = {
          limits = {
            cpu = "0.25"
            memory = "256M"
          }
          requests = {
            cpu = "0.25"
            memory = "256M"
          }
        }
      })
    }
  }

  vpc_id                   = module.vpc.vpc_id
  subnet_ids               = module.vpc.private_subnets
  control_plane_subnet_ids = module.vpc.intra_subnets

  create_cluster_security_group = false
  create_node_security_group    = false

  manage_aws_auth_configmap = true
  aws_auth_roles = [
    {
      rolearn  = module.karpenter.role_arn
      username = "system:node:{{EC2PrivateDNSName}}"
      groups = [
        "system:bootstrappers",
        "system:nodes",
      ]
    },
  ]

  fargate_profiles = {
    karpenter = {
      selectors = [
        { namespace = "karpenter" 
          labels = {
            "k8s-app" = "karpenter"
          }
        }
      ]
    }
    kube-system = {
      selectors = [
        { namespace = "kube-system" }
      ]
    }
  }

  tags = merge(local.tags, {
    "karpenter.sh/discovery" = local.name
  })
}

Let's discuss the main changes in the eks module and why we need them:

  • cluster_addons.coredns: Fargate adds 256 MB to each pod's memory reservation for the required Kubernetes components (kubelet, kube-proxy, and containerd), then rounds up to the compute configuration that most closely matches the sum of vCPU and memory requests so that pods always have the resources they need. We are targeting the smallest Fargate task size of 0.25 vCPU / 512 MB, so we set the CoreDNS request/limit to 256 MB (512 MB minus the 256 MB Fargate overhead) to fit within that task
  • manage_aws_auth_configmap: set to true because we need to add the Karpenter node IAM role to the aws-auth ConfigMap so that EC2 nodes launched by Karpenter can join the cluster
  • fargate_profiles: feel free to add more profiles if other workloads should run on Fargate (see the sketch below). We add a label selector to the karpenter profile so that only pods labeled k8s-app: karpenter are scheduled onto it
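
As a sketch, an extra profile for a hypothetical apps namespace could be added to the same fargate_profiles map inside the eks module (the namespace name is illustrative, adjust it to your workloads):

fargate_profiles = {
  # existing karpenter and kube-system profiles stay as they are
  apps = {
    selectors = [
      { namespace = "apps" } # hypothetical namespace for your own workloads
    ]
  }
}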

karpenter.tf

Here we define the resources that deploy and configure Karpenter in our cluster:

module "karpenter" {
  source = "terraform-aws-modules/eks/aws//modules/karpenter"

  cluster_name           = module.eks.cluster_name
  irsa_oidc_provider_arn = module.eks.oidc_provider_arn

  enable_karpenter_instance_profile_creation = true

  iam_role_additional_policies = {
    AmazonSSMManagedInstanceCore = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
  }

  tags = local.tags
}

resource "helm_release" "karpenter" {
  namespace        = "karpenter"
  create_namespace = true

  name                = "karpenter"
  repository          = "oci://public.ecr.aws/karpenter"
  repository_username = data.aws_ecrpublic_authorization_token.token.user_name
  repository_password = data.aws_ecrpublic_authorization_token.token.password
  chart               = "karpenter"
  version             = "v0.32.1"

  values = [
    <<-EOT
    settings:
      clusterName: ${module.eks.cluster_name}
      clusterEndpoint: ${module.eks.cluster_endpoint}
      interruptionQueueName: ${module.karpenter.queue_name}
    serviceAccount:
      annotations:
        eks.amazonaws.com/role-arn: ${module.karpenter.irsa_arn}
    controller:
        resources:
          requests:
            cpu: 1
            memory: 1Gi
          limits:
            cpu: 1
            memory: 1Gi
    podLabels:
        k8s-app: karpenter
    EOT
  ]
  depends_on = [
    module.eks
  ]
}

resource "kubectl_manifest" "karpenter_node_class" {
  yaml_body = <<-YAML
    apiVersion: karpenter.k8s.aws/v1beta1
    kind: EC2NodeClass
    metadata:
      name: default
    spec:
      amiFamily: AL2
      role: ${module.karpenter.role_name}
      subnetSelectorTerms:
        - tags:
            karpenter.sh/discovery: ${module.eks.cluster_name}
      securityGroupSelectorTerms:
        - tags:
            karpenter.sh/discovery: ${module.eks.cluster_name}
      tags:
        karpenter.sh/discovery: ${module.eks.cluster_name}
  YAML

  depends_on = [
    helm_release.karpenter
  ]
}

resource "kubectl_manifest" "karpenter_node_pool" {
  yaml_body = <<-YAML
    apiVersion: karpenter.sh/v1beta1
    kind: NodePool
    metadata:
      name: default
    spec:
      template:
        spec:
          nodeClassRef:
            name: default
          requirements:
            - key: "karpenter.k8s.aws/instance-category"
              operator: In
              values: ["c", "m", "r"]
            - key: "karpenter.k8s.aws/instance-cpu"
              operator: In
              values: ["4", "8", "16", "32"]
            - key: "karpenter.k8s.aws/instance-hypervisor"
              operator: In
              values: ["nitro"]
            - key: "karpenter.k8s.aws/instance-generation"
              operator: Gt
              values: ["2"]
      limits:
        cpu: 1000
      disruption:
        consolidationPolicy: WhenEmpty
        consolidateAfter: 30s
  YAML

  depends_on = [
    kubectl_manifest.karpenter_node_class
  ]
}

resource "kubectl_manifest" "nginx_deployment" {
  yaml_body = <<-YAML
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx
    spec:
      replicas: 0
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          labels:
            app: nginx
        spec:
          terminationGracePeriodSeconds: 0
          containers:
            - name: nginx
              image: nginx:latest
              resources:
                requests:
                  cpu: 1
  YAML

  depends_on = [
    helm_release.karpenter
  ]
}
  • module "karpenter": is used to create IAM instance profile whith additional policy for the Karpenter node IAM role
  • resource "helm_release" "karpenter": deploying Karpenter itself, depends on block delays this resource creation until module eks succeeded, otherwise you will get "deadline exceeded" error
  • resource "kubectl_manifest" "karpenter_node_class": here we define a node class for karpenter and specify subnets in which they should be proviosioned
  • resource "kubectl_manifest" "karpenter_node_pool": here is where we define requirements for instances karpenter can use in a default nodepool such as: karpenter.k8s.aws/instance-category: Specifies the allowed instance categories as "c" Instances (Compute-Optimized), "m" Instances (General Purpose), and "r" Instances (Memory-Optimized) karpenter.k8s.aws/instance-cpu: Specifies allowed CPU values as "4," "8," "16," and "32." karpenter.k8s.aws/instance-hypervisor: Specifies that the hypervisor type must be "nitro." karpenter.k8s.aws/instance-generation: Specifies that the instance generation must be greater than "2." limits: Sets a CPU limit of 1000 for the nodes. disruption: Configures how Karpenter should handle node disruptions.
  • resource "kubectl_manifest" "nginx_deployment": here we would create sample nginx deployment with 0 replicas.

Deployment

To initialize Terraform and download the required modules, run:

terraform init

You can also check which resources terraform is planning to create by running:

terraform plan

To provision resources run:

terraform apply

Testing

After Terraform has finished applying, you should see output similar to the following:

Apply complete! Resources: 78 added, 0 changed, 0 destroyed.

Outputs:

connect_to_eks = "aws eks --region <YOUR_REGION> update-kubeconfig --name <CLUSTER_NAME> --profile default"
endpoint = "<CLUSTER_ENDPOINT>"
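
These outputs are defined in the repo; a minimal sketch of an outputs.tf that produces them could look like this (the descriptions are assumptions):

output "connect_to_eks" {
  description = "Command that updates the local kubeconfig for this cluster"
  value       = "aws eks --region ${var.aws_region} update-kubeconfig --name ${module.eks.cluster_name} --profile ${var.aws_profile}"
}

output "endpoint" {
  description = "EKS cluster API endpoint"
  value       = module.eks.cluster_endpoint
}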

Execute the command from the connect_to_eks output to generate a kubeconfig file:

aws eks --region <YOUR_REGION> update-kubeconfig --name <CLUSTER_NAME> --profile default

Verify connectivity to the cluster with kubectl:

kubectl get no

You should see a list of nodes:

NAME                                  STATUS   ROLES    AGE     VERSION
fargate-ip-10-0-12-3.ec2.internal     Ready    <none>   4m10s   v1.27.7-eks-4f4795d
fargate-ip-10-0-14-219.ec2.internal   Ready    <none>   4m15s   v1.27.7-eks-4f4795d
fargate-ip-10-0-25-226.ec2.internal   Ready    <none>   5m46s   v1.27.7-eks-4f4795d
fargate-ip-10-0-28-159.ec2.internal   Ready    <none>   5m45s   v1.27.7-eks-4f4795d

As you can see, all of the nodes are Fargate nodes. Since we also created the sample nginx deployment, let's scale it up to see whether Karpenter provisions a new node:

kubectl scale deployment nginx --replicas 2

You should see the following output:

deployment.apps/nginx scaled
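
While the new pods are Pending, you can watch Karpenter bring up a node and the pods get scheduled onto it:

kubectl get pods -o wide -w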

After a while we can check nodes again:

kubectl get no

The output should be similar to this:

NAME                                  STATUS   ROLES    AGE     VERSION
fargate-ip-10-0-12-3.ec2.internal     Ready    <none>   4m10s   v1.27.7-eks-4f4795d
fargate-ip-10-0-14-219.ec2.internal   Ready    <none>   4m15s   v1.27.7-eks-4f4795d
fargate-ip-10-0-25-226.ec2.internal   Ready    <none>   5m46s   v1.27.7-eks-4f4795d
fargate-ip-10-0-28-159.ec2.internal   Ready    <none>   5m45s   v1.27.7-eks-4f4795d
ip-10-0-2-65.ec2.internal             Ready    <none>   29s     v1.27.9-eks-5e0fdde

As you can see, one more node has been added, but this time it is an EC2 node. Let's check the Karpenter logs:

kubectl logs -f -n karpenter -l app.kubernetes.io/name=karpenter -c controller

In the logs you should see that Karpenter provisioned the new node.
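
You can also inspect the labels of the new node (substitute the node name from your own output); nodes launched by Karpenter carry a karpenter.sh/nodepool label referencing the node pool that created them:

kubectl get node ip-10-0-2-65.ec2.internal --show-labels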

Clean up

Since our Terraform code does not manage the EC2 node provisioned by Karpenter, it cannot delete it, and as a result it also cannot delete the VPC and the other resources this instance depends on. We therefore need to delete the deployment first, so that Karpenter terminates the instance for us. Run:

kubectl delete deployment nginx
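
Once the pod is gone, Karpenter consolidates and terminates the now-empty EC2 node after the 30-second consolidateAfter window; you can confirm with:

kubectl get no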

Once the deployment is deleted and the EC2 node has been removed, we can run:

terraform destroy

You can find the full source code in our GitHub repo