With our production load already running on an autoscaling Kubernetes cluster using Amazon EKS, it feels like we are at the peak of technology. But the truth is that in many cases the production environment is far from being as big or as complex as the internal environment.

Many organizations run sophisticated testing and analytics environments that are not customer-facing, yet are much harder to implement and more expensive to maintain than the production environment.

When I approached the task of containerizing our internal environment, I realized that I needed to handle many more flows than before, while always keeping costs in mind and avoiding paying for resources we don’t use.

AWS Spot instances are a great place to start for that. With Spot instances you can get a set of EC2 instances on demand, booted with an image of your choice, at a fraction of the price of the same EC2 instances purchased the standard way.

In my previous post I demonstrated how to set up an auto-scaling EKS environment using a custom Docker image. Now we’re going to add Spot instances, and make everything run with Jenkins, to create a complete containerized auto-scaling CI/CD environment.


This article assumes that AWS is your cloud provider, and that you have access to the EKS and ECR management consoles as well as the AWS CLI.

I also assume that you have eksctl and kubectl installed and configured. For further information regarding those setups, check out my previous post:

Once again, I tried to make minimal assumptions regarding previous knowledge and explain what I could, but I do assume some familiarity with the concepts of Kubernetes and Docker.

Finally, please note that all the code snippets and referenced files in this article are available here:

Setting up the EKS cluster

The first thing we need to do is to set up an EKS cluster to host both our Jenkins installation and our worker nodes.

Our cluster will consist of two nodegroups:

  • A nodegroup that will include a single relatively small static node to hold the Jenkins installation as well as all the necessary Kubernetes services and persistent deployments.
  • A nodegroup of Spot instances that we will use for our worker nodes. This nodegroup can be scaled down to size 0 and grow on demand once Jenkins requests to execute a job.

In the gist, you will find an eksctl example configuration file named eks-test-jenkins.yaml. It can be used to set up an EKS cluster in your VPC with the two nodegroups described above.

Don’t forget to replace the placeholders with your cluster AWS region and subnet IDs.

Notice the labels we added to the Spot instances nodegroup (lifecycle: Ec2Spot, intent: apps). We will use those labels to identify this nodegroup as the one we want to execute our Jenkins builds on.
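For reference, the relevant part of such an eksctl configuration might look like this. This is a sketch only: the nodegroup names, instance types, and sizes are illustrative and not necessarily the exact contents of eks-test-jenkins.yaml, but the labels on the Spot nodegroup match the ones described above:

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: eks-test-jenkins
  region: YOUR_AWS_REGION

nodeGroups:
  - name: ng-test-jenkins           # single static node for Jenkins and cluster services
    instanceType: m5.large
    desiredCapacity: 1
  - name: ng-test-jenkins-spot      # Spot worker nodes, scaled from 0 on demand
    minSize: 0
    maxSize: 5
    desiredCapacity: 0
    instancesDistribution:
      instanceTypes: ["m5.large", "m4.large"]
      onDemandBaseCapacity: 0
      onDemandPercentageAboveBaseCapacity: 0   # 100% Spot
    labels:
      lifecycle: Ec2Spot
      intent: apps
```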

Save the configuration file and run the following command:

eksctl create cluster -f eks-test-jenkins.yaml

Once eksctl is done, run the following commands to check your cluster. Note that the cluster has two nodegroups configured, but only one node is actually up and running, which is exactly what we want at this point.

eksctl get nodegroups --cluster=eks-test-jenkins

kubectl get nodes

Setting up the Jenkins deployment

Once we have our cluster up and running, we first need to install Jenkins. Since Jenkins needs access to cluster resources in order to control our worker nodes, the easiest way is to run Jenkins on Kubernetes itself, meaning to install Jenkins on the static node of our EKS cluster.

Creating a persistent volume

Before creating a Jenkins deployment we need a persistent volume, an independent storage volume in the Kubernetes cluster, in order to make sure the configuration and data we store on Jenkins will not be lost in case we redeploy Jenkins or reboot a node.

We also need a persistent volume claim, which is a request for a persistent volume storage.
Additional information regarding persistent volumes can be found here:

Get the pv-volume.yaml example file from the gist. This example will create a 10GB persistent volume in the /mnt/data folder on the host node, with the ReadWriteOnce access mode, meaning it can only be mounted by a single node in the cluster.
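A persistent volume of this kind can be sketched as follows. The field values mirror the description above; the resource name is illustrative and the exact gist file may differ:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: jenkins-pv-volume
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce      # mountable read/write by a single node only
  hostPath:
    path: "/mnt/data"    # folder on the host node
```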

Run the following command to deploy the persistent volume:

kubectl apply -f pv-volume.yaml

Next, get the pv-claim.yaml example file from the gist. This example will create a 5GB storage claim, again with the same access mode.
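The matching claim can be sketched like this (again, the name is illustrative; the storageClassName and access mode must match the volume above for the claim to bind):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jenkins-pv-claim
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
```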

Run the following command to deploy the persistent volume claim:

kubectl apply -f pv-claim.yaml

Test your deployment using the following command:

kubectl get pvc

Creating a Jenkins container

In order to create a Jenkins deployment we first need a Jenkins container image. We can either use the default image and install it with helm, for example, or create our own image with pre-installed customizations such as plugins we know we need.

Get the Dockerfile-jenkins example Dockerfile from the gist. This Dockerfile will create a container based on the latest Jenkins version (at the time of writing) with a couple of necessary plugins preinstalled — the kubernetes plugin, and the ssh-slaves plugin that allows connecting to remote agents to execute builds.
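A minimal version of such a Dockerfile might look like the sketch below. Note that the plugin installation mechanism depends on the image version: jenkins/jenkins images of this era ship an install-plugins.sh helper, while newer images replace it with jenkins-plugin-cli.

```dockerfile
FROM jenkins/jenkins:lts

# Preinstall the plugins we know we need.
# On newer base images, replace this with: jenkins-plugin-cli --plugins kubernetes ssh-slaves
RUN /usr/local/bin/install-plugins.sh kubernetes ssh-slaves
```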

Run the following command to build your Docker image:

docker build -f Dockerfile-jenkins -t "YOUR_DOCKER_IMAGE_FILE_NAME" .

For additional information on publishing your image on Amazon ECR, check my previous post here:

Creating a Jenkins deployment

When creating the deployment, we need to make sure to attach it to the persistent volume claim we just created, so that Jenkins data and configuration are persistent and we do not get a clean copy each time our node restarts.

Get the jenkins-deployment.yaml file from the gist. This example will create a Jenkins deployment with our persistent volume claim mounted as the jenkins_home folder.

The deployment is based on a specified Jenkins Docker image (Don’t forget to replace the Docker image URL placeholder. You may also use jenkins/jenkins:lts for the default image).

Note that we need to make sure the volume can be accessed by the jenkins user. There are several options for that, but I found that the most common solution (using a Kubernetes SecurityContext and setting up fsGroup) fails to work on an EKS cluster. The best solution I could find is the one used in the above example — an init container, created as part of the deployment, that manually fixes the volume permissions to match the jenkins user and group IDs, which default to 1000.
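The init-container workaround looks roughly like the pod-spec fragment below. The container and volume names are illustrative, and the claim name assumes the persistent volume claim created earlier; the key point is the chown to the default jenkins UID/GID of 1000:

```yaml
spec:
  initContainers:
    - name: volume-permissions-fix
      image: busybox
      # Fix ownership so the jenkins user (uid/gid 1000) can write to the volume
      command: ["sh", "-c", "chown -R 1000:1000 /var/jenkins_home"]
      volumeMounts:
        - name: jenkins-home
          mountPath: /var/jenkins_home
  volumes:
    - name: jenkins-home
      persistentVolumeClaim:
        claimName: jenkins-pv-claim
```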

Run the following command to deploy Jenkins:

kubectl apply -f jenkins-deployment.yaml

Test your deployment using the following command:

kubectl get pods

Creating a Jenkins external service

Next, we need to expose our Jenkins deployment so it will have an external endpoint for easier access. Note that if your nodes are running inside a VPC and don’t accept incoming traffic from the internet, the external endpoint will only be accessible inside your VPC.

Run the following command:

kubectl expose deployment jenkins --type=LoadBalancer --name=jenkins-external

Test your service using the following command:

kubectl get services

If your service’s external IP is stuck on pending, or you are facing another issue here, check out my previous post for a more thorough explanation:

Connecting to Jenkins

Now that we have the external service ready, we can connect to Jenkins using the service address.

Use the following command to get the address:

kubectl get services | grep jenkins-external | awk '{print $4}'

Copy the URL and paste it in the browser. Don’t forget to add the port of your external service (the default port is 8080).

You should get the Jenkins welcome screen:

Setting up the Cluster Autoscaler

Now, we want to configure our EKS cluster so that the Spot instances nodegroup can autoscale up from 0 on demand.

Setting up an auto-scaling policy

We need to assign an auto-scaling policy to our static nodegroup, to grant it permissions to access the Spot instances auto-scaling group and modify its size.

First, we need to find the exact name of our static nodegroup stack. Run the following command:

aws cloudformation list-stacks

Check the StackName field and look for a stack with your cluster name and your nodegroup name. For example, if your cluster name is eks-test-jenkins and your nodegroup name is ng-test-jenkins, your stack name should be eksctl-eks-test-jenkins-nodegroup-ng-test-jenkins.

Next, run the following command to find the IAM role name for that stack:

aws cloudformation describe-stack-resources --stack-name YOUR_STACK_NAME | jq -r '.StackResources[] | select(.ResourceType=="AWS::IAM::Role") | .PhysicalResourceId'

Finally, get the k8s-asg-policy.json file from the gist, and run the following command to apply the IAM policy on that role:

aws iam put-role-policy --role-name YOUR_ROLE_NAME --policy-name ASG-Policy-For-Worker --policy-document file://./k8s-asg-policy.json

Deploying a Cluster Autoscaler

The Cluster Autoscaler is the component that will be in charge of automatically adjusting the size of our Spot instances nodegroup.

First, we need to get the name of the autoscaling group that was created for that nodegroup. Run the following command, replacing the placeholder with the name of your Spot instances nodegroup as defined in the eksctl config file (ng-test-jenkins-spot in my case):

aws autoscaling describe-auto-scaling-groups | grep AutoScalingGroupName | grep YOUR_SPOT_NODEGROUP | awk 'NR==1{print substr($2, 2, length($2) - 3)}'
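To see what the awk expression at the end does, here is the same trimming applied to a sample line of describe-auto-scaling-groups output (the group name here is made up):

```shell
# After grep, $2 is the quoted value plus a trailing comma, e.g. "my-spot-asg-name",
# substr($2, 2, length($2) - 3) strips the surrounding quotes and the comma.
echo '"AutoScalingGroupName": "my-spot-asg-name",' \
  | awk 'NR==1{print substr($2, 2, length($2) - 3)}'
# prints: my-spot-asg-name
```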

Get the cluster-autoscaler.yaml file from the gist, replace the YOUR_SPOT_ASG_NODEGROUP placeholder with the autoscaling group name you just collected, and replace YOUR_SPOT_ASG_REGION with the AWS region you are working in.

The cluster-autoscaler.yaml provided here is adapted to deploy Cluster Autoscaler on an EKS cluster using a single autoscaling group. You can find additional examples for different EKS configurations here:

Run the following command to deploy the Cluster Autoscaler on your cluster:

kubectl apply -f cluster-autoscaler.yaml

Creating a Jenkins agent Docker image

The last thing we need is a Docker image that we will use for the containers that execute our Jenkins builds.

You may use the standard jenkins/jnlp-slave:latest Docker image here, but it is simple to create a customized image that includes any other prerequisites that your Jenkins agent needs, such as configurations and packages.

Get the Dockerfile-jenkins-agent and the jenkins-slave files from the gist. This example creates a simple Ubuntu container with the Jenkins JNLP agent, that our Jenkins master will use for communication.
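A sketch of such an agent image is shown below. This is not the exact gist contents: the base image, package list, and jar location are illustrative, AGENT_JAR_URL is a placeholder for the Jenkins remoting release you want to pin, and jenkins-slave is assumed to be the JNLP startup wrapper script from the gist.

```dockerfile
FROM ubuntu:18.04

# Java is required to run the JNLP agent
RUN apt-get update && \
    apt-get install -y openjdk-8-jre-headless curl && \
    rm -rf /var/lib/apt/lists/*

# The Jenkins remoting (agent) jar; pass the release URL at build time
ARG AGENT_JAR_URL
ADD ${AGENT_JAR_URL} /usr/share/jenkins/slave.jar

# The JNLP startup wrapper script from the gist
COPY jenkins-slave /usr/local/bin/jenkins-slave
RUN chmod +x /usr/local/bin/jenkins-slave

ENTRYPOINT ["jenkins-slave"]
```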

Run the following command to build a Docker image for your Jenkins agent:

docker build -f Dockerfile-jenkins-agent -t "YOUR_DOCKER_IMAGE_FILE_NAME" .

Configuring Kubernetes on Jenkins

We have finished setting up our environment and preparing our Docker image, and now we have everything ready in order to start using Jenkins on our EKS cluster.

The last thing we need is to configure Jenkins to communicate with the Kubernetes cluster, in order for it to be able to start containers and execute builds.

Adding Kubernetes cluster permissions

Any application running in a container is able to communicate with a Kubernetes cluster using a cluster service account. The service account can be granted certain cluster privileges by being bound to cluster roles.

It is possible to set Jenkins to run in a separate cluster namespace, using a dedicated service account with unique privileges, but let’s leave the deep security discussion for a future post. For now the Jenkins deployment will use the default service account automatically, and we will create a permissive binding that gives all service accounts cluster administration privileges, which is more than enough for our purpose.

Run the following command:

kubectl create clusterrolebinding permissive-binding --clusterrole=cluster-admin --user=admin --user=kubelet --group=system:serviceaccounts

For additional information regarding service account permissions and cluster role-binding check the following page in Kubernetes documentation:

Getting the Kubernetes master URL

Another thing Jenkins needs is the URL of the Kubernetes master service, that will be used for Jenkins communication with the cluster.

Execute the following command:

kubectl cluster-info | grep master

Configuring Jenkins

General Jenkins Configuration

From the Jenkins welcome screen, click on Manage Jenkins in the left menu, and then click on Configure System.

Set the # of executors field to 0, and in the Usage field select Only build jobs with label expressions matching this node.

Those two settings will make sure that we will not execute builds on our master Jenkins node.

Kubernetes Cloud Configuration

In the Cloud section, click on Add a new cloud and choose Kubernetes.

In the Kubernetes URL box type the Kubernetes master URL we extracted earlier, and then click on the Test Connection button. You should get a Connection test successful message to indicate that Jenkins was successfully able to communicate with the Kubernetes cluster.

If you get an error message here, review the previous sections and make sure everything was configured smoothly.

In the Jenkins URL box type the URL of the Jenkins external service, or an internal VPC address you mapped to access Jenkins. Note that you can get the Jenkins VPC internal IP address by executing the following command:

kubectl describe service jenkins-external | grep Endpoints | awk 'NR==1{print $2}'

Kubernetes Pod Template Configuration

In the Pod Templates section, click on Add Pod Template.

Type a name, and in the Usage field select Use this node as much as possible.

Scroll down to the Node Selector field, and type the labels that we added when we configured the Spot instances nodegroup. This will make sure Kubernetes uses nodes from this nodegroup (Spot instances) to execute our builds.
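Assuming the labels from the eksctl configuration earlier, the Node Selector value would be the following, in the plugin’s comma-separated key=value format:

```
lifecycle=Ec2Spot,intent=apps
```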

Scroll back up, and click on the Add Container button, then choose Container Template.

In the Name field type jnlp (this is the type of Jenkins agent connection we are using), and in the Docker image field type the name of the Jenkins agent Docker image that you created earlier.

Clear the Command to run and the Arguments to pass to the command fields.

You may also click on the Advanced… button and update the Kubernetes pod execution settings such as CPU and memory requests:

Note that you can use these advanced execution settings to create several container types with different resource allocation (and even a completely different pod template) to maintain different flows with completely different requirements.
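Under the hood, these settings translate into standard Kubernetes resource requests and limits on the agent container. The values below are illustrative, not a recommendation — size them to your actual build workloads:

```yaml
resources:
  requests:
    cpu: "500m"       # half a core reserved for the build pod
    memory: "512Mi"
  limits:
    cpu: "1"
    memory: "1Gi"
```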

Click on the Save button when you’re done.

Creating a Jenkins test project

We are finally ready to test our Jenkins deployment to make sure that we can execute builds on-demand on Spot machines.

From the Jenkins welcome screen, click on New Item in the left menu, and in the next window type a name and select Freestyle project. Then click OK.

Next, go to the Build section, click on the Add build step button and select Execute shell.

In the command box type sleep 300. Then click Save.

This project we just created has only one build step, that executes the sleep 300 command. This means that the build will wait for 5 minutes, giving us enough time to see what happens.

Select Build Now to trigger the build. You’ll immediately notice that the build is pending, and waiting for an executor.

Go to the Jenkins main window and you’ll see that a pod was allocated for this build, but it is still offline.

Go to the terminal, and run the following commands:

kubectl get pods

kubectl get nodes

Note that a new pod has been created, but it is still in Pending state. Also note that a new node has been created (it might take a few seconds, since a Spot instance needs to be provisioned), and it is still in NotReady state.

Wait a minute, and check the Jenkins main window again. You’ll notice that at some point the build starts on the allocated pod.

Check the console, and notice that the build is currently busy with our sleep 300 command.

Go back to the terminal, and run the same commands.

kubectl get pods

kubectl get nodes

Note that now the pod is in Running state and the node is in Ready state.

Wait for the build to finish, and execute the commands again.

kubectl get pods

kubectl get nodes

Note that the pod has now disappeared, but the node is still there. If you execute another Jenkins build now, a new pod will instantly be created on this node.

The cluster autoscaler service that we deployed is waiting for the node to be idle for a few minutes before it scales down.

Wait a few minutes and run the command again.

kubectl get nodes

Note that the node has now disappeared.


Maintaining a CI/CD environment with many different flows and requirements is not an easy task, and moving to a containerized implementation is a welcome approach, but it is also a huge change that many organizations delay again and again.

I have demonstrated a basic use-case here to explain the concepts and the capabilities of such an implementation. I believe it will be relatively easy to adjust it to your specific needs, and add different projects, each running on a unique image with predefined resources.

Using Jenkins with AWS Spot instances directly is a fine solution, but even if we put aside the integration, which is far from perfect, it is hard and expensive to maintain when it comes to different flows with different requirements. I think it makes much more sense to simply add new pod templates instead.

To learn more about EKS and its capabilities, visit the AWS EKS documentation

You should also refer to the EKS online workshop provided by Amazon, which can help get you started and introduce you to all the EKS features.

Also, check out my previous post on EKS that might also be of help:


Omer Hanetz