CKS Simulator Kubernetes 1.31

https://killer.sh

 

Each question needs to be solved on a specific instance other than your main candidate@terminal. You'll need to connect to the correct instance via ssh; the command is provided before each question. To connect to a different instance you always need to first return to your main terminal by running the exit command, and from there you can connect to the next one.

In the real exam each question will be solved on a different instance, whereas in the simulator multiple questions will be solved on the same instances.

Use sudo -i to become root on any node if necessary.

 

 

Question 1 | Contexts

 

Solve this question on: ssh cks3477

 

You have access to multiple clusters from your main terminal through kubectl contexts. Write all context names into /opt/course/1/contexts on cks3477, one per line.

From the kubeconfig extract the certificate of user restricted@infra-prod and write it decoded to /opt/course/1/cert.

 

Answer:

Maybe the fastest way is just to run:
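```bash
k config get-contexts -o name > /opt/course/1/contexts
```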

Or using jsonpath:
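```bash
k config view -o jsonpath="{.contexts[*].name}" | tr " " "\n" > /opt/course/1/contexts
```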

The content could then look like:

For the certificate we could just run
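```bash
k config view --raw
```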

And copy it manually. Or we do:
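```bash
k config view --raw -o jsonpath='{.users[?(@.name=="restricted@infra-prod")].user.client-certificate-data}' | base64 -d > /opt/course/1/cert
```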

Or even:
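For example with jq, if it's installed:

```bash
k config view --raw -o json \
  | jq -r '.users[] | select(.name == "restricted@infra-prod") | .user["client-certificate-data"]' \
  | base64 -d > /opt/course/1/cert
```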

Completed.

 

 

Question 2 | Runtime Security with Falco

 

Solve this question on: ssh cks7262

 

Falco is installed on worker node cks7262-node1. Connect using ssh cks7262-node1 from cks7262. There is a file /etc/falco/rules.d/falco_custom.yaml with rules that help you to:

  1. Find a Pod running image httpd which modifies /etc/passwd.

    Scale the Deployment that controls that Pod down to 0.

  2. Find a Pod running image nginx which triggers rule Package management process launched.

    Change the rule log text after Package management process launched to only include: time,container-id,container-name,user-name

    Collect the logs for at least 20 seconds and save them under /opt/course/2/falco.log on cks7262.

    Scale the Deployment that controls that Pod down to 0.

 

ℹ️ Use sudo -i to become root which may be required for this question

 

Answer:

ℹ️ Other tools you might have to be familiar with are sysdig or tracee

 

Check out Falco files

First we can investigate Falco config a little:

Here we see the Falco rule file falco_custom.yaml mentioned in the question text. We can also see the Falco configuration in falco.yaml:

This means that Falco is checking these directories for rules. There is also falco_rules.local.yaml in which we can override existing default rules. This is a much cleaner solution for production. In the exam, choose whichever way is faster for you if nothing is specified in the task.

 

Step 1

We can run Falco and filter for certain output:

ℹ️ It can take a moment until Falco displays output; use falco -U/--unbuffered to speed this up
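For example, grepping the live output for parts of the expected rule text:

```bash
ssh cks7262-node1
sudo -i
falco -U | grep passwd   # or grep for httpd, depending on the custom rule's output text
```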

We can see a matching log. Next we can find the corresponding Pod and scale down the Deployment:

Using the Pod ID we can find out more information like the Namespace:
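For example, with the IDs taken from the Falco log line (placeholders to replace):

```bash
crictl ps --id CONTAINER_ID    # reveals the container's POD_ID
crictl pods --id POD_ID        # reveals the Pod name and Namespace
```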

Now we can scale down:

 

Step 1: Rule Investigation

If we have a look in file /etc/falco/rules.d/falco_custom.yaml then we see:

This is a list that overwrites the default list in falco_rules.yaml. It's used for example by macro: sensitive_files. To find the rule we could simply search for Sensitive file opened for reading by non-trusted program in falco_rules.yaml.

If we would like to trigger the rule with additional files/paths we could simply add these to list: sensitive_file_names.

 

 

Step 2

We run Falco and filter for certain output:

ℹ️ It can take a moment until Falco displays output; use falco -U/--unbuffered to speed this up

We can see a matching log. Next we can find the corresponding Pod:

Using the Pod ID we can find out more information like the Namespace:

We hold off on scaling down because this task requires a few more steps first.

 

Step 2: Update Rule

The task requires us to store logs for rule Package management process launched with data time,container-id,container-name,user-name. So we edit the rule in /etc/falco/rules.d/falco_custom.yaml:

We change the above rule to:
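A sketch of the adjusted rule; everything except the output line stays as it was, and the field names can be verified with falco --list:

```yaml
- rule: Launch Package Management Process in Container   # rule name as found in falco_custom.yaml
  desc: Package management process ran inside container
  condition: ...   # unchanged
  output: Package management process launched %evt.time,%container.id,%container.name,%user.name
  priority: ERROR  # unchanged
```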

For all available fields we can check https://falco.org/docs/rules/supported-fields, which should be allowed to open during the exam. We can also run for example falco --list | grep user to find available fields.

 

Step 2: Collect logs

Next we check the logs in our adjusted format:

If there are syntax or other errors in the falco_custom.yaml then Falco will display these and we would need to adjust.

Now we can collect for at least 20 seconds. Copy&paste the output into file /opt/course/2/falco.log on cks7262:
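```bash
falco -U -M 25   # -M stops Falco after the given number of seconds
```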

 

Step 2: Scale down Deployment

Now we can scale down using the information we got at the beginning of step (2):

You should be comfortable finding, creating and editing Falco rules.

 

 

Question 3 | Apiserver Security

 

Solve this question on: ssh cks7262

 

You received a list from the DevSecOps team which performed a security investigation of the cluster. The list states the following about the apiserver setup:

Change the apiserver setup so that:

 

ℹ️ Use sudo -i to become root which may be required for this question

 

Answer:

In order to modify the parameters for the apiserver, we first ssh into the controlplane node and check which parameters the apiserver process is running with:
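```bash
ssh cks7262
sudo -i
ps aux | grep kube-apiserver
```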

We may notice the following argument:

We can also check the Service and see it's of type NodePort:

The apiserver runs as a static Pod, so we can edit the manifest. But before we do this we also create a copy in case we mess things up:

We should remove the unsecure settings:

Wait for the apiserver container to restart:

Give the apiserver some time to start up again. Check the apiserver's Pod status and the process parameters:

The apiserver got restarted without the insecure settings. However, the Service kubernetes will still be of type NodePort:

We need to delete the Service for the changes to take effect:
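```bash
k delete svc kubernetes   # it gets recreated automatically with default settings
```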

After a few seconds:

This should satisfy the DevSecOps team.

 

 

Question 4 | Pod Security Standard

 

Solve this question on: ssh cks7262

 

There is a Deployment container-host-hacker in Namespace team-red which mounts /run/containerd as a hostPath volume on the Node where it's running. This means that the Pod can access various data about other containers running on the same Node.

To prevent this configure Namespace team-red to enforce the baseline Pod Security Standard. Once completed, delete the Pod of the Deployment mentioned above.

Check the ReplicaSet events and write the event/log lines containing the reason why the Pod isn't recreated into /opt/course/4/logs on cks7262.

 

Answer:

Making Namespaces use Pod Security Standards works via labels. We can simply edit it:

Now we configure the requested label:
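```bash
k label ns team-red pod-security.kubernetes.io/enforce=baseline
```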

This should already be enough for the default Pod Security Admission Controller to pick up on that change. Let's test it and delete the Pod to see whether it'll be recreated or fail. It should fail!

Usually the ReplicaSet of a Deployment would recreate the Pod if deleted, here we see this doesn't happen. Let's check why:

There we go! Finally we write the reason into the requested file so that scoring will be happy too!

Pod Security Standards can give a great base level of security! But if you find yourself wanting to adjust levels like baseline or restricted in more depth, this isn't possible, and 3rd-party solutions like OPA or Kyverno could be looked at.

 

 

Question 5 | CIS Benchmark

 

Solve this question on: ssh cks3477

 

You're asked to evaluate specific settings of the cluster against the CIS Benchmark recommendations. Use the tool kube-bench which is already installed on the nodes.

Connect to the worker node using ssh cks3477-node1 from cks3477.

On the controlplane node ensure (correct if necessary) that the CIS recommendations are set for:

  1. The --profiling argument of the kube-controller-manager

  2. The ownership of directory /var/lib/etcd

On the worker node ensure (correct if necessary) that the CIS recommendations are set for:

  1. The permissions of the kubelet configuration /var/lib/kubelet/config.yaml

  2. The --client-ca-file argument of the kubelet

 

ℹ️ Use sudo -i to become root which may be required for this question

 

Answer:
Step 1

First we ssh into the controlplane node and run kube-bench against the controlplane components:
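```bash
kube-bench run --targets=master
# or limit the output to a single check:
kube-bench run --targets=master --check 1.3.2
```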

We see some passes, fails and warnings. Let's check the required step (1) of the controller manager:

There we see 1.3.2 which suggests to set --profiling=false, we can check if it currently passes or fails:

To comply we edit the kube-controller-manager manifest and adjust the corresponding line:
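```bash
vim /etc/kubernetes/manifests/kube-controller-manager.yaml
```

The flag in the container command section should then read:

```yaml
    - --profiling=false
```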

We wait for the Pod to restart, then run kube-bench again to check if the problem was solved:

Problem solved and 1.3.2 is passing:

 

Step 2

Next step is to check the ownership of directory /var/lib/etcd, so we first have a look:

Looks like user root and group root. Also possible to check using:

But what does kube-bench have to say about this?

To comply we run the following:
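```bash
chown etcd:etcd /var/lib/etcd
```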

This looks better. We run kube-bench again and make sure test 1.1.12 is passing.

Done.

 

Step 3

To continue with step (3), we'll head to the worker node and ensure that the kubelet configuration file has the minimum necessary permissions as recommended:

Also here some passes, fails and warnings. We check the permission level of the kubelet config file:

777 is a highly permissive access level and not recommended by the kube-bench guidelines:

We obey and set the recommended permissions:
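```bash
chmod 600 /var/lib/kubelet/config.yaml
```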

And check if test 4.1.9 is passing:

 

Step 4

Finally for step (4), let's check whether --client-ca-file argument for the kubelet is set properly according to kube-bench recommendations:

This looks like 4.2.3 is passing.

To further investigate we run the following command to locate the kubelet config file, and open it:

The clientCAFile points to the location of the certificate, which is correct.

 

 

Question 6 | Verify Platform Binaries

 

Solve this question on: ssh cks3477

 

There are four Kubernetes server binaries located at /opt/course/6/binaries on cks3477. You're provided with the following verified sha512 values for these:

kube-apiserver f417c0555bc0167355589dd1afe23be9bf909bf98312b1025f12015d1b58a1c62c9908c0067a7764fa35efdac7016a9efa8711a44425dd6692906a7c283f032c

kube-controller-manager 60100cc725e91fe1a949e1b2d0474237844b5862556e25c2c655a33boa8225855ec5ee22fa4927e6c46a60d43a7c4403a27268f96fbb726307d1608b44f38a60

kube-proxy 52f9d8ad045f8eee1d689619ef8ceef2d86d50c75a6a332653240d7ba5b2a114aca056d9e513984ade24358c9662714973c1960c62a5cb37dd375631c8a614c6

kubelet 4be40f2440619e990897cf956c32800dc96c2c983bf64519854a3309fa5aa21827991559f9c44595098e27e6f2ee4d64a3fdec6baba8a177881f20e3ec61e26c

Delete those binaries that don't match the sha512 values above.

 

Answer:

We check the directory:

To generate the sha512 sum of a binary we do:
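```bash
cd /opt/course/6/binaries
sha512sum kube-apiserver
```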

Looking good, next:

Okay, next:

Also good, and finally:

Catch! Binary kubelet has a different hash!

 

But did we actually compare everything properly before? Let's have a closer look at kube-controller-manager again:

Edit to only have the provided hash and the generated one in one line each:

Looks right at a first glance, but if we do:

This shows they are different, by just one character actually.

We could also do a diff:

To complete the task we do:
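```bash
rm kube-controller-manager kubelet
```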

 

 

Question 7 | KubeletConfiguration

 

Solve this question on: ssh cks8930

 

You're asked to update the cluster's KubeletConfiguration. Implement the following changes in the Kubeadm way that ensures new Nodes added to the cluster will receive the changes too:

  1. Set containerLogMaxSize to 5Mi

    Set containerLogMaxFiles to 3

  2. Apply the changes for the Kubelet on cks8930

  3. Apply the changes for the Kubelet on cks8930-node1. Connect with ssh cks8930-node1 from cks8930

 

ℹ️ Use sudo -i to become root which may be required for this question

 

Answer:

 

Step 1: Update Kubelet-Config ConfigMap

A cluster created with Kubeadm will have a ConfigMap named kubelet-config in Namespace kube-system. This ConfigMap will be used if new Nodes are added to the cluster. There is information about that process in the docs.

Let's find that ConfigMap and perform the requested changes:
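For example (only the two new fields are shown; the rest of the existing KubeletConfiguration stays untouched):

```bash
k -n kube-system edit cm kubelet-config
```

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubelet-config
  namespace: kube-system
data:
  kubelet: |
    apiVersion: kubelet.config.k8s.io/v1beta1
    kind: KubeletConfiguration
    containerLogMaxSize: 5Mi    # add
    containerLogMaxFiles: 3     # add
    ...
```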

Above we can see that we simply added the two new fields to data.kubelet.

A new Node added to the cluster, both control plane and worker, would use this KubeletConfiguration containing the changes. That KubeletConfiguration from the ConfigMap will also be used during a kubeadm upgrade.

In the next steps we'll see that the Kubelet-Config of the control plane and worker node remain unchanged so far.

 

Step 2: Update Control Plane Kubelet-Config

To find the Kubelet-Config path we can check the Kubelet process:

Above we see it's specified via the argument --config=/var/lib/kubelet/config.yaml. We could also check the Kubeadm config for the Kubelet:

Above we see the argument --config being set. And we should see that our changes are still missing in that file:

We go ahead and download the latest Kubelet-Config, optionally with --dry-run at first:
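```bash
kubeadm upgrade node phase kubelet-config --dry-run   # preview first
kubeadm upgrade node phase kubelet-config             # writes /var/lib/kubelet/config.yaml
```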

Sweet! Now we just need to restart the Kubelet:
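```bash
systemctl restart kubelet
systemctl status kubelet
```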

 

(Optional) See the current Kubelet-Config of a Node

It is necessary to restart the Kubelet in order for updates in /var/lib/kubelet/config.yaml to take effect. We could verify this with (docs):

For Node cks8930-node1 the default values are still configured.

 

Step 3: Update Worker Node Kubelet-Config

We should see that the existing Kubelet-Config on the worker node is still unchanged:

So we go ahead and apply the updates:

And optionally for admins with trust issues (or the ones that might forget to restart the Kubelets):

Task completed.

 

 

Question 8 | CiliumNetworkPolicy

 

Solve this question on: ssh cks7262

 

In Namespace team-orange a Default-Allow strategy for all Namespace-internal traffic was chosen. There is an existing CiliumNetworkPolicy default-allow which assures this and which should not be altered. That policy also allows cluster internal DNS resolution.

Now it's time to deny and authenticate certain traffic. Create 3 CiliumNetworkPolicies in Namespace team-orange to implement the following requirements:

  1. Create a Layer 3 policy named p1 to:

    Deny outgoing traffic from Pods with label type=messenger to Pods behind Service database

  2. Create a Layer 4 policy named p2 to:

    Deny outgoing ICMP traffic from Deployment transmitter to Pods behind Service database

  3. Create a Layer 3 policy named p3 to:

    Enable Mutual Authentication for outgoing traffic from Pods with label type=database to Pods with label type=messenger

 

ℹ️ All Pods in the Namespace run plain Nginx images with open port 80. This allows simple connectivity tests like: k -n team-orange exec POD_NAME -- curl database

 

 

Answer:

A great way to inspect and learn writing NetworkPolicies and CiliumNetworkPolicies is the Network Policy Editor, but it's not an allowed resource during the exam.

 

Overview

First we have a look at existing resources in Namespace team-orange:

These are the existing Pods and the Service we should work with. We can see that the database Service points to the database-0 Pod. And this is the existing default-allow policy:

CiliumNetworkPolicies behave like vanilla NetworkPolicies: once one egress rule exists, all other egress is forbidden. This is also the case for egressDeny rules: once one egressDeny rule exists, all other egress is also forbidden, unless allowed by an egress rule. This is why a Default-Allow policy like this one is necessary in this scenario. The behaviour explained above for egress is also the case for ingress.

 

Policy 1

Without any changes we check the connection from a type=messenger Pod to the Service database:

This works because of the K8s DNS resolution of the database Service; we should see the same result when using the Service IP:

This works; we just used curl's --head option to only show the HTTP response code, which should be sufficient. And the same should work if we contact the database-0 Pod IP directly:

Connectivity works without restriction. Now we create a deny policy as requested:
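A sketch of p1, assuming the Pods behind Service database carry the label type=database (which requirement 3 suggests):

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: p1
  namespace: team-orange
spec:
  endpointSelector:
    matchLabels:
      type: messenger
  egressDeny:
  - toEndpoints:
    - matchLabels:
        type: database
```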

Let's test connection to the Service by name and IP:

Connection timing out. And we test connection to the database-0 Pod IP directly:

Also timing out. But do other connections still work? We try to contact a type=transmitter Pod:

Looks great.

 

Policy 2

Now we should prevent ICMP (Pings) from Deployment transmitter to Pods behind Service database. Before we do this we check that ICMP currently works:

Works. Now to restrict it:
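A sketch of p2, assuming the Pods of Deployment transmitter carry the label type=transmitter; ICMP type 8 / family IPv4 is the echo request used by ping:

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: p2
  namespace: team-orange
spec:
  endpointSelector:
    matchLabels:
      type: transmitter
  egressDeny:
  - icmps:
    - fields:
      - type: 8
        family: IPv4
    toEndpoints:
    - matchLabels:
        type: database
```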

Above we see that the ping command failed because we used -w 2 to set a timeout. Policy works! But do other connections still work as they should?

We try to connect to the database Service and database-0 Pod which should still work because it's not ICMP:

Just as expected. And we try to connect to and ping a type=messenger Pod:

Awesome!

 

Policy 3

Now to the final policy:
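A sketch of p3 using Cilium's Mutual Authentication setting:

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: p3
  namespace: team-orange
spec:
  endpointSelector:
    matchLabels:
      type: database
  egress:
  - toEndpoints:
    - matchLabels:
        type: messenger
    authentication:
      mode: required
```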

Cilium ftw!

 

 

Question 9 | AppArmor Profile

 

Solve this question on: ssh cks7262

 

Some containers need to run more secure and restricted. There is an existing AppArmor profile located at /opt/course/9/profile on cks7262 for this.

  1. Install the AppArmor profile on Node cks7262-node1.

    Connect using ssh cks7262-node1 from cks7262

  2. Add label security=apparmor to the Node

  3. Create a Deployment named apparmor in Namespace default with:

    • One replica of image nginx:1.27.1

    • NodeSelector for security=apparmor

    • Single container named c1 with the AppArmor profile enabled only for this container

    The Pod might not run properly with the profile enabled. Write the logs of the Pod into /opt/course/9/logs on cks7262 so another team can work on getting the application running.

 

ℹ️ Use sudo -i to become root which may be required for this question

 

Answer:

https://kubernetes.io/docs/tutorials/clusters/apparmor

 

Step 1

First we have a look at the provided profile:

Very simple profile named very-secure which denies all file writes. Next we copy it onto the Node:

And install it:
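```bash
apparmor_parser ./profile   # path of the profile copy on the node
```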

Verify it has been installed:

There we see among many others the very-secure one, which is the name of the profile specified in /opt/course/9/profile.

 

Step 2

We label the Node:
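```bash
k label node cks7262-node1 security=apparmor
```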

 

Step 3

Now we can go ahead and create the Deployment which uses the profile.
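A sketch using the appArmorProfile securityContext field available since Kubernetes 1.30:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: apparmor
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: apparmor
  template:
    metadata:
      labels:
        app: apparmor
    spec:
      nodeSelector:
        security: apparmor
      containers:
      - name: c1
        image: nginx:1.27.1
        securityContext:
          appArmorProfile:        # container-level, as required
            type: Localhost
            localhostProfile: very-secure
```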

What's the damage?

This looks alright: the Pod is running on cks7262-node1 because of the nodeSelector. The AppArmor profile simply denies all filesystem writes, but Nginx needs to write into some locations to run, hence the errors.

It looks like our profile is running but we can confirm this as well by inspecting the container directly on the worker node:

First we find the Pod by its name and get the pod-id. Next we use crictl ps -a to also show stopped containers. Then crictl inspect shows that the container is using our AppArmor profile. Note that you have to be fast between ps and inspect because K8s will restart the Pod periodically when it's in an error state.

To complete the task we write the logs into the required location:

Fixing the errors is the job of another team, lucky us.

 

 

Question 10 | Container Runtime Sandbox gVisor

 

Solve this question on: ssh cks7262

 

Team purple wants to run some of their workloads more secure. Worker node cks7262-node2 has containerd already configured to support the runsc/gvisor runtime.

Connect to the worker node using ssh cks7262-node2 from cks7262.

  1. Create a RuntimeClass named gvisor with handler runsc

  2. Create a Pod that uses the RuntimeClass. The Pod should be in Namespace team-purple, named gvisor-test and of image nginx:1.27.1. Ensure the Pod runs on cks7262-node2

  3. Write the output of the dmesg command of the successfully started Pod into /opt/course/10/gvisor-test-dmesg on cks7262

 

Answer:

We check the nodes and we can see that all are using containerd:

But, according to the question text, just one, cks7262-node2, has containerd configured to work with the runsc/gvisor runtime.

(Optionally) we can ssh into the worker node and check if containerd+runsc is configured:

 

Step 1

Now we best head to the k8s docs for RuntimeClasses https://kubernetes.io/docs/concepts/containers/runtime-class, steal an example and create the gvisor one:
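```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
```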

 

Step 2

And the required Pod:
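```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gvisor-test
  namespace: team-purple
spec:
  nodeName: cks7262-node2   # a nodeSelector would work as well
  runtimeClassName: gvisor
  containers:
  - name: gvisor-test
    image: nginx:1.27.1
```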

After creating the pod we should check if it's running and if it uses the gvisor sandbox:

Looking deluxe.

 

Step 3

And as required we finally write the dmesg output into the file on cks7262:
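```bash
k -n team-purple exec gvisor-test -- dmesg > /opt/course/10/gvisor-test-dmesg
```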

 

 

Question 11 | Secrets in ETCD

 

Solve this question on: ssh cks7262

 

There is an existing Secret called database-access in Namespace team-green.

  1. Read the complete Secret content directly from ETCD (using etcdctl) and store it into /opt/course/11/etcd-secret-content on cks7262

  2. Write the plain and decoded Secret's value of key "pass" into /opt/course/11/database-password on cks7262

 

ℹ️ Use sudo -i to become root which may be required for this question

 

Answer:

Let's try to get the Secret value directly from ETCD, which will work since it isn't encrypted.

First, we ssh into the controlplane node where ETCD is running in this setup and check if etcdctl is installed and list it's options:

Among others we see arguments to identify ourselves. The apiserver connects to ETCD, so we can run the following command to get the path of the necessary .crt and .key files:

The output is as follows:

With this information we query ETCD for the secret value:
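A sketch, assuming the default kubeadm certificate paths shown by the apiserver process:

```bash
ETCDCTL_API=3 etcdctl \
  --cert /etc/kubernetes/pki/apiserver-etcd-client.crt \
  --key /etc/kubernetes/pki/apiserver-etcd-client.key \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  get /registry/secrets/team-green/database-access
```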

ETCD in Kubernetes stores data under /registry/{type}/{namespace}/{name}. This is how we came to look for /registry/secrets/team-green/database-access. There is also an example on a page in the k8s documentation which you could access during the exam.

The task requires us to store the Secret content in a file. For this we can simply copy&paste the output into the requested location /opt/course/11/etcd-secret-content on cks7262.

We're also required to store the plain and "decrypted" database password. For this we can copy the base64-encoded value from the ETCD output and run on our terminal:

 

 

Question 12 | Hack Secrets

 

Solve this question on: ssh cks3477

 

You're asked to investigate a possible permission escape using the pre-defined context. The context authenticates as user restricted which has only limited permissions and shouldn't be able to read Secret values.

  1. Switch to the restricted context with:

  2. Try to find the password-key values of the Secrets secret1, secret2 and secret3 in Namespace restricted using context restricted@infra-prod

  3. Write the decoded plaintext values into files /opt/course/12/secret1, /opt/course/12/secret2 and /opt/course/12/secret3 on cks3477

  4. Switch back to the default context with:

 

 

Answer:

First we should explore the boundaries, we can try:

No permissions to view RBAC resources. So we try the obvious:

We're not allowed to get or list any Secrets.

 

Secret 1

What can we see though?

There are some Pods; let's check these out regarding Secret access:

This output provides us with enough information to do:

 

Secret 2

And for the second Secret:

 

Secret 3

None of the Pods seem to mount secret3 though. Can we create or edit existing Pods to mount secret3?

Doesn't look like it.

But the Pods seem to be able to access the Secrets, we can try to use a Pod's ServiceAccount to access the third Secret. We can actually see (like using k -n restricted get pod -o yaml | grep automountServiceAccountToken) that only Pod pod3-* has the ServiceAccount token mounted:

 

ℹ️ You should have knowledge about ServiceAccounts and how they work with Pods like described in the docs

 

We can see all necessary information to contact the apiserver manually (described in the docs):
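A sketch following that docs pattern, run from inside the pod3-* container (the Pod name is a placeholder):

```bash
k -n restricted exec -it pod3-xxx -- sh

# inside the container:
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
curl -k https://kubernetes.default/api/v1/namespaces/restricted/secrets \
  -H "Authorization: Bearer $TOKEN"
```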

Let's decode it and write it into the requested location:

This will give us:

We hacked all Secrets! It can be tricky to get RBAC right and secure.

 

ℹ️ One thing to consider is that giving the permission to "list" Secrets, will also allow the user to read the Secret values like using kubectl get secrets -o yaml even without the "get" permission set.

 

Finally we switch back to the original context:

 

 

Question 13 | Restrict access to Metadata Server

 

Solve this question on: ssh cks3477

 

There is a metadata service available at http://192.168.100.21:32000 on which Nodes can reach sensitive data, like cloud credentials for initialisation. By default, all Pods in the cluster also have access to this endpoint. The DevSecOps team has asked you to restrict access to this metadata server.

In Namespace metadata-access:

  1. Create a NetworkPolicy named metadata-deny which prevents egress to 192.168.100.21 for all Pods but still allows access to everything else

  2. Create a NetworkPolicy named metadata-allow which allows Pods having label role: metadata-accessor to access endpoint 192.168.100.21

There are existing Pods in the target Namespace with which you can test your policies, but don't change their labels.

 

 

Answer:

 

ℹ️ Using a NetworkPolicy with ipBlock+except like done in our solution might cause security issues because of too open permissions that can't be further restricted. A better solution might be using a CiliumNetworkPolicy. Check the end of our solution for more information about this.

 

A great way to inspect and learn writing NetworkPolicies is the Network Policy Editor, but it's not an allowed resource during the exam. Regarding Metadata Server security there was a famous hack at Shopify which was based on Node metadata revealing sensitive information.

 

Check metadata server

Check the Pods in the Namespace metadata-access and their labels:

There are three Pods in the Namespace and one of them has the label role=metadata-accessor.

Check access to the metadata server from the Pods:

All three are able to access the metadata server.

 

Step 1

To restrict the access, we create a NetworkPolicy to deny access to the specific IP.
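```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: metadata-deny
  namespace: metadata-access
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
        - 192.168.100.21/32
```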

 

ℹ️ You should know about general default-deny K8s NetworkPolicies.

 

Verify that access to the metadata server has been blocked:

But other endpoints are still reachable, like for example https://kubernetes.io:

Looking good.

 

Step 2

Now create another NetworkPolicy that allows access to the metadata server from Pods with label role=metadata-accessor.
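```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: metadata-allow
  namespace: metadata-access
spec:
  podSelector:
    matchLabels:
      role: metadata-accessor
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: 192.168.100.21/32
```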

Verify that required Pod has access to metadata endpoint and others do not:

It only works for the Pod having the label. With this we implemented the required security restrictions.

 

NetworkPolicy explanation

If a Pod doesn't have a matching NetworkPolicy then all traffic is allowed from and to it. Once a Pod has a matching NP then the contained rules are additive. This means that for Pods having label metadata-accessor the rules will be combined to:

We can see that the merged NP contains two separate rules with one condition each. We could read it as:

Hence it allows Pods with label metadata-accessor to access everything.

 

Security Implications of this solution

Using a NetworkPolicy with ipBlock+except like done in our solution might cause security issues because of too open permissions that can't be further restricted. With vanilla Kubernetes NetworkPolicies it's only possible to allow certain ingress/egress: once one egress rule exists, all other egress is forbidden, and the same goes for ingress.

Let's say we want to restrict the NetworkPolicy metadata-deny further, how would that be possible? We already specified one egress rule which allows outgoing traffic to ALL IPs using 0.0.0.0/0, except one. If we now add another rule, all we can do is to allow more stuff:

Above we added one additional egress rule to allow outgoing connection into a certain Namespace. If only that new rule would exist, then all other egress would be forbidden. But because both egress rules exist it could be read as:

So once we allow egress/ingress using a too open ipBlock, we can't further restrict traffic, which could be a big issue. A better solution might be for example using a CiliumNetworkPolicy which is able to define deny rules using egressDeny (docs: https://doc.crds.dev/github.com/cilium/cilium/cilium.io/CiliumNetworkPolicy/v2).

 

 

Question 14 | Syscall Activity

 

Solve this question on: ssh cks7262

 

There are Pods in Namespace team-yellow. A security investigation noticed that some processes running in these Pods are using the Syscall kill, which is forbidden by an internal policy of Team Yellow.

Find the offending Pod(s) and remove these by reducing the replicas of the parent Deployment to 0.

You can connect to the worker nodes using ssh cks7262-node1 and ssh cks7262-node2 from cks7262.

 

Answer:

Syscalls are used by processes running in Userspace to communicate with the Linux Kernel. There are many available syscalls: https://man7.org/linux/man-pages/man2/syscalls.2.html. It makes sense to restrict these for container processes and Docker/Containerd already restrict some by default, like the reboot Syscall. Restricting even more is possible for example using Seccomp or AppArmor.

 

Find processes of Pod

For this task we should simply find out which binary process executes a specific Syscall. Processes in containers are simply run on the same Linux operating system, but isolated. That's why we first check on which nodes the Pods are running:
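```bash
k -n team-yellow get pods -o wide
```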

All on cks7262-node1, hence we ssh into it and find the processes for the first Deployment collector1.

  1. Using crictl pods we first searched for the Pods of Deployment collector1, which has two replicas

  2. We then took one pod-id to find its containers using crictl ps

  3. And finally we used crictl inspect to find the process name, which is collector1-process.

    We can find the process PIDs (two because there are two Pods):

  4. Or we could check for the PID with crictl inspect:

We should only have to check one of the PIDs because it's the same kind of Pod, just a second replica of the Deployment.

 

Check Syscalls of collector1

Using the PIDs we can call strace to find Syscalls:
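```bash
strace -f -p PID   # -f follows child processes; PID is a placeholder for one found above
```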

First try and already a catch! We see it uses the forbidden Syscall by calling kill(666, SIGTERM).

 

Check Syscalls of collector2

Next let's check the Deployment collector2 processes:

Looks alright.

 

Check Syscalls of collector3

What about the collector3 Deployment:

Also nothing about the forbidden Syscall.

 

Scale down Deployment

So we finish the task:

And the world is a bit safer again.

 

 

Question 15 | Configure TLS on Ingress

 

Solve this question on: ssh cks7262

 

In Namespace team-pink there is an existing Nginx Ingress resource named secure which accepts the two paths /app and /api which point to different ClusterIP Services.

From your main terminal you can connect to it using for example:

Right now it uses a default generated TLS certificate by the Nginx Ingress Controller.

You're asked to instead use the key and certificate provided at /opt/course/15/tls.key and /opt/course/15/tls.crt. As it's a self-signed certificate you need to use curl -k when connecting to it.

 

Answer:

 

Investigate

We can get the IP address of the Ingress and we see it's the same one that secure-ingress.test points to:

Now, let's try to access the paths /app and /api via HTTP:

What about HTTPS?

HTTPS seems to be already working if we accept self-signed certificates using -k. But what kind of certificate is used by the server?

It seems to be "Kubernetes Ingress Controller Fake Certificate".

 

Implement own TLS certificate

First, let us generate a Secret using the provided key and certificate:
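For example, naming the Secret tls-secret (any name works as long as the Ingress references it):

```bash
k -n team-pink create secret tls tls-secret \
  --cert=/opt/course/15/tls.crt --key=/opt/course/15/tls.key
```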

Now, we configure the Ingress to make use of this Secret:
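The tls section added to the Ingress spec could look like:

```yaml
spec:
  tls:
  - hosts:
    - secure-ingress.test
    secretName: tls-secret
```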

After adding the changes we check the Ingress resource again:

It now actually lists port 443 for HTTPS. To verify:

We can see that the provided certificate is now being used by the Ingress for TLS termination. We still use curl -k because the provided certificate is self-signed.

 

 

Question 16 | Docker Image Attack Surface

 

Solve this question on: ssh cks7262

 

There is a Deployment image-verify in Namespace team-blue which runs image registry.killer.sh:5000/image-verify:v1. DevSecOps has asked you to improve this image by:

  1. Changing the base image to alpine:3.12

  2. Not installing curl

  3. Updating nginx to use the version constraint >=1.18.0

  4. Running the main process as user myuser

Do not add any new lines to the Dockerfile, just edit existing ones. The file is located at /opt/course/16/image/Dockerfile.

Tag your version as v2. You can build, tag and push using:

Make the Deployment use your updated image tag v2.

 

Answer:

We should have a look at the Docker Image at first:

Very simple Dockerfile which seems to execute a script run.sh:

So it only outputs the current date and credential information in a loop. We can see that output in the existing Deployment image-verify:

We see it's running as root.

Next we update the Dockerfile according to the requirements:
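The exact original lines aren't reproduced here, but the edited Dockerfile could look like this sketch; package names and the user-creation lines depend on the existing file:

```dockerfile
FROM alpine:3.12                                                  # 1) new base image
RUN apk add --update 'nginx>=1.18.0' && rm -rf /var/cache/apk/*   # 2) curl dropped, 3) nginx constraint
# ... existing lines creating myuser and copying run.sh stay unchanged ...
USER myuser                                                       # 4) was: USER root
ENTRYPOINT ["/bin/sh", "./run.sh"]
```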

Then we build the new image:

We can then test our changes by running the container locally:

Looking good, so we push:

And we update the Deployment to use the new image:

And afterwards we can verify our changes by looking at the Pod logs:

Also to verify our changes even further:

Another task solved.

 

 

Question 17 | Audit Log Policy

 

Solve this question on: ssh cks3477

 

Audit Logging has been enabled in the cluster with an Audit Policy located at /etc/kubernetes/audit/policy.yaml on cks3477.

  1. Change the configuration so that only one backup of the logs is stored.

  2. Alter the Policy in a way that it only stores logs:

    • From Secret resources, level Metadata

    • From "system:nodes" userGroups, level RequestResponse

    After you altered the Policy make sure to empty the log file so it only contains entries according to your changes, like using echo > /etc/kubernetes/audit/logs/audit.log.

 

 

ℹ️ You can use jq to render json more readable, like cat data.json | jq

 

ℹ️ Use sudo -i to become root which may be required for this question

 

 

Answer:

 

Step 1

First we check the apiserver configuration and change as requested:
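In /etc/kubernetes/manifests/kube-apiserver.yaml the relevant argument is --audit-log-maxbackup:

```yaml
    - --audit-log-maxbackup=1   # keep only one backup of the logs
```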

 

ℹ️ You should know how to enable Audit Logging completely yourself as described in the docs. Feel free to try this in another cluster in this environment.

 

Wait for the apiserver container to be restarted for example with:

 

Step 2

Now we look at the existing Policy:

We can see that this simple Policy logs everything on Metadata level. So we change it to the requirements:
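A sketch of the adjusted Policy; the final level: None rule prevents anything else from being logged:

```yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Metadata
  resources:
  - group: ""
    resources: ["secrets"]
- level: RequestResponse
  userGroups: ["system:nodes"]
- level: None
```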

After saving the changes we have to restart the apiserver:

That should be it.

 

Check the Audit Logs

Once the apiserver is running again we can check the new logs and scroll through some entries:

Above we logged a watch action by Kubelet for Secrets, level Metadata.

And in the one above we logged a get action by system:nodes for Nodes, level RequestResponse.

Because all JSON entries are each written on a single line in the file, we could also run some simple verifications of our Policy:

Looks like our job is done.

 

 

Question 18 | SBOM

 

Solve this question on: ssh cks8930

 

Your team received Software Bill Of Materials (SBOM) requests and you have been selected to generate some documents and scans:

  1. Using bom:

    Generate a SPDX-Json SBOM of image registry.k8s.io/kube-apiserver:v1.31.0

    Store it at /opt/course/18/sbom1.json on cks8930

  2. Using trivy:

    Generate a CycloneDX SBOM of image registry.k8s.io/kube-controller-manager:v1.31.0

    Store it at /opt/course/18/sbom2.json on cks8930

  3. Using trivy:

    Scan the existing SPDX-Json SBOM at /opt/course/18/sbom_check.json on cks8930 for known vulnerabilities. Save the result in Json format at /opt/course/18/sbom_check_result.json on cks8930

 

 

Answer:

SBOMs are like an ingredients list for food, just for software. So let's prepare something tasty!

 

Step 1: Create SBOM with Bom

The tool is https://github.com/kubernetes-sigs/bom.

We want to generate a new document and running bom generate should give us enough hints on how we can do this:

Now we can also specify the output at the required location:
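For example (--format json produces SPDX-Json):

```bash
bom generate --image registry.k8s.io/kube-apiserver:v1.31.0 --format json --output /opt/course/18/sbom1.json
```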

Using bom document it's for example possible to visualize SBOMs as well as query them for information, which could come in handy!

 

Step 2: Create SBOM with Trivy

Trivy the security scanner can also create and work with SBOMs. The usage is similar to scanning images for vulnerabilities, which would be:

Here we can specify an output file and format:
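```bash
trivy image --format cyclonedx --output /opt/course/18/sbom2.json registry.k8s.io/kube-controller-manager:v1.31.0
```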

 

Step 3: Scan SBOM with Trivy

With Trivy we can also scan SBOM documents instead of images directly. We do this with the provided file:

By default Trivy uses a human-readable format, but we can change it to Json:

Above we can see the ArtifactName used for the report. Finally we export it to the required location:
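```bash
trivy sbom --format json --output /opt/course/18/sbom_check_result.json /opt/course/18/sbom_check.json
```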

Done.

 

 

Question 19 | Immutable Root FileSystem

 

Solve this question on: ssh cks7262

 

The Deployment immutable-deployment in Namespace team-purple should run immutable; it's created from file /opt/course/19/immutable-deployment.yaml on cks7262. Even after a successful break-in, it shouldn't be possible for an attacker to modify the filesystem of the running container.

  1. Modify the Deployment in a way that no processes inside the container can modify the local filesystem, only /tmp directory should be writeable. Don't modify the Docker image.

  2. Save the updated YAML under /opt/course/19/immutable-deployment-new.yaml on cks7262 and update the running Deployment.

 

Answer:

Processes in containers can write to the local filesystem by default. This increases the attack surface when a non-malicious process gets hijacked. Preventing applications from writing to disk, or only allowing writes to certain directories, can mitigate the risk. If there is for example a bug in Nginx which allows an attacker to overwrite any file inside the container, then this only works if the Nginx process itself can write to the filesystem in the first place.

Making the root filesystem readonly can be done in the Docker image itself or in a Pod declaration.

Let us first check the Deployment immutable-deployment in Namespace team-purple:

The container has write access to the root filesystem, as there are no restrictions defined for the Pods or containers by an existing SecurityContext. And based on the task we're not allowed to alter the Docker image.

So we modify the YAML manifest to include the required changes:
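A sketch of the relevant additions; the container name is a placeholder and the rest of the spec stays as in the original file:

```yaml
spec:
  template:
    spec:
      containers:
      - name: busybox                    # placeholder: keep the original container name
        # ... existing image/command unchanged ...
        securityContext:
          readOnlyRootFilesystem: true   # new
        volumeMounts:                    # new
        - name: temp-vol
          mountPath: /tmp
      volumes:                           # new
      - name: temp-vol
        emptyDir: {}
```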

SecurityContexts can be set on Pod or container level, here the latter was asked. Enforcing readOnlyRootFilesystem: true will render the root filesystem readonly. We can then allow some directories to be writable by using an emptyDir volume.

Once the changes are made, let us update the Deployment:

We can verify if the required changes are propagated:

The Deployment has been updated so that the container's file system is read-only, and the updated YAML has been placed under the required location. Sweet!

 

 

Question 20 | Update Kubernetes

 

Solve this question on: ssh cks8930

 

The cluster is running Kubernetes 1.30.5, update it to 1.31.1.

Use apt package manager and kubeadm for this.

Use ssh cks8930-node1 from cks8930 to connect to the worker node.

 

ℹ️ Use sudo -i to become root which may be required for this question

 

Answer:

Let's have a look at the current versions:

We're logged in via ssh on the controlplane node.

 

Control Plane Components

First we should update the control plane components running on the controlplane node, so we drain it:
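```bash
k drain cks8930 --ignore-daemonsets
```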

Next we check versions:

We see above that kubeadm is already installed in the required version. Otherwise we would need to install it:

Check what kubeadm has available as an upgrade plan:

And we apply to the required version:
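```bash
kubeadm upgrade apply v1.31.1
```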

Next we can check if our required version was installed correctly:

 

Control Plane kubelet and kubectl

Now we have to upgrade kubelet and kubectl:
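For example (the exact Debian package revision may differ):

```bash
apt update
apt install kubelet=1.31.1-1.1 kubectl=1.31.1-1.1
systemctl daemon-reload
systemctl restart kubelet
```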

Done, only uncordon missing:

 

Data Plane

Our data plane consists of a single worker node, so let's update it. First we should drain it:

Next we ssh into it and upgrade kubeadm to the wanted version, or check if already done:

Now we follow what kubeadm told us in the last line and upgrade kubelet (and kubectl):

Looking good, what does the node status say?

Beautiful, let's make it schedulable again:

We're up to date.

 

 

Question 21 | Image Vulnerability Scanning

 

Solve this question on: ssh cks8930

 

The Vulnerability Scanner trivy is installed on your main terminal. Use it to scan the following images for known CVEs:

Write all images that don't contain the vulnerabilities CVE-2020-10878 or CVE-2020-1967 into /opt/course/21/good-images on cks8930.

 

Answer:

 

The tool trivy is very simple to use; it compares images against public vulnerability databases.

To solve the task we can run:
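For example, repeated for each image from the task list:

```bash
trivy image IMAGE_NAME | grep -E 'CVE-2020-10878|CVE-2020-1967'
```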

The only image without any of the two CVEs is docker.io/weaveworks/weave-kube:2.7.0, hence our answer will be:
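```bash
echo docker.io/weaveworks/weave-kube:2.7.0 > /opt/course/21/good-images
```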

 

 

Question 22 | Manual Static Security Analysis

 

Solve this question on: ssh cks8930

 

The Release Engineering Team has shared some YAML manifests and Dockerfiles with you to review. The files are located under /opt/course/22/files.

As a container security expert, you are asked to perform a manual static analysis and find out possible security issues with respect to unwanted credential exposure. Running processes as root is of no concern in this task.

Write the filenames which have issues into /opt/course/22/security-issues on cks8930.

 

ℹ️ In the Dockerfiles and YAML manifests, assume that the referred files, folders, secrets and volume mounts are present. Disregard syntax or logic errors.

 

Answer:

We check location /opt/course/22/files and list the files.

We have 3 Dockerfiles and 7 Kubernetes Resource YAML manifests. Next we should go over each to find security issues with the way credentials have been used.

 

ℹ️ You should be comfortable with Docker Best Practices and the Kubernetes Configuration Best Practices.

 

While navigating through the files we might notice:

 

Number 1

File Dockerfile-mysql might look innocent at first glance. It copies a file secret-token over, uses it and deletes it afterwards. But because of the way Docker works, every RUN, COPY and ADD command creates a new layer and every layer is persisted in the image.

This means even if the file secret-token gets deleted in layer Z, it's still included with the image in layers X and Y. In this case it would be better to use for example variables passed to Docker.

So we do:

 

Number 2

The file deployment-redis.yaml is fetching credentials from a Secret named mysecret and writes these into environment variables. So far so good, but the container's command then echoes these, which can be directly read by any user having access to the logs.

Credentials in logs is never a good idea, hence we do:

 

Number 3

In file statefulset-nginx.yaml, the password is directly exposed in the environment variable definition of the container.

This should better be injected via a Secret. So we do:
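Collecting this finding together with the two earlier ones:

```bash
echo Dockerfile-mysql       >> /opt/course/22/security-issues
echo deployment-redis.yaml  >> /opt/course/22/security-issues
echo statefulset-nginx.yaml >> /opt/course/22/security-issues
```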

 

 

Question 23 | ImagePolicyWebhook

 

Solve this question on: ssh cks4024

 

Team White created an ImagePolicyWebhook solution at /opt/course/23/webhook on cks4024 which needs to be enabled for the cluster. There is an existing and working webhook-backend Service in Namespace team-white which will be the ImagePolicyWebhook backend.

 

  1. Create an AdmissionConfiguration at /opt/course/23/webhook/admission-config.yaml which contains the following ImagePolicyWebhook configuration in the same file:

  2. Configure the apiserver to:

    • Mount /opt/course/23/webhook at /etc/kubernetes/webhook

    • Use the AdmissionConfiguration at path /etc/kubernetes/webhook/admission-config.yaml

    • Enable the ImagePolicyWebhook admission plugin

As result the ImagePolicyWebhook backend should prevent container images containing danger-danger from being used, any other image should still work.

 

ℹ️ Create a backup of /etc/kubernetes/manifests/kube-apiserver.yaml outside of /etc/kubernetes/manifests so you can revert back in case of issues

 

ℹ️ Use sudo -i to become root which may be required for this question

 

 

Answer:

 

The ImagePolicyWebhook is a Kubernetes Admission Controller which allows a backend to make admission decisions. According to the question that backend exists already and is working, let's have a short look:

The idea is to let the apiserver know it should contact that webhook-backend before any Pod is created and only if it receives a success-response the Pod will be created. We can see the Service IP is 10.111.10.111 and somehow we need to tell that to the apiserver.

Here we see a KubeConfig formatted file which the apiserver will use to contact the webhook-backend via specified URL server: https://10.111.10.111, which is the Service IP we noticed earlier. In addition we have a certificate at path certificate-authority: /etc/kubernetes/webhook/webhook-backend.crt which is used by the apiserver to communicate with the backend.

 

Step 1

We create the AdmissionConfiguration which contains the provided ImagePolicyWebhook config in the same file:
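A sketch; the kubeConfigFile path matches the mount configured in the next step, and the TTL/backoff numbers stand in for whatever values the task provided:

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
- name: ImagePolicyWebhook
  configuration:
    imagePolicy:
      kubeConfigFile: /etc/kubernetes/webhook/webhook.yaml
      allowTTL: 50          # placeholder values
      denyTTL: 50
      retryBackoff: 500
      defaultAllow: true    # on connection issues all Pods would be allowed
```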

This should already be the solution for that step. Note that it's also possible to specify a path inside the AdmissionConfiguration pointing to a different file containing the ImagePolicyWebhook configuration:

 

Step 2

We now register the AdmissionConfiguration with the apiserver. And before we do so we should probably create a backup so we can revert back easily:
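```bash
cp /etc/kubernetes/manifests/kube-apiserver.yaml ~/kube-apiserver.yaml.bak
```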

 

ℹ️ Create a backup always outside of /etc/kubernetes/manifests so the kubelet won't try to create the backup file as a static Pod
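The relevant changes in /etc/kubernetes/manifests/kube-apiserver.yaml could look like this (assuming NodeRestriction was already enabled):

```yaml
    - --admission-control-config-file=/etc/kubernetes/webhook/admission-config.yaml
    - --enable-admission-plugins=NodeRestriction,ImagePolicyWebhook
...
    volumeMounts:
    - mountPath: /etc/kubernetes/webhook
      name: webhook
      readOnly: true
...
  volumes:
  - hostPath:
      path: /opt/course/23/webhook
      type: DirectoryOrCreate
    name: webhook
```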

 

If there is no existing --enable-admission-plugins argument then we need to create it, otherwise we can expand it as done above.

We create a hostPath volume of /opt/course/23/webhook and mount it to /etc/kubernetes/webhook inside the apiserver container. This way we can then reference /etc/kubernetes/webhook/admission-config.yaml using the --admission-control-config-file argument. Also this means that the provided path /etc/kubernetes/webhook/webhook.yaml in /opt/course/23/webhook/admission-config.yaml will work.

After we saved the changes we need to wait for the apiserver container to be restarted; this can take a minute:

 

Errors

In case the apiserver doesn't restart, or gets restarted over and over again, we should check the error logs in /var/log/pods/ to investigate any misconfiguration.

If there are no logs available we could also check the kubelet logs in /var/log/syslog or journalctl -u kubelet.

If the apiserver comes back up and there are no errors but the webhook just doesn't work then it could be a connection issue. Because the ImagePolicyWebhook config has setting defaultAllow: true, a connection issue between apiserver and webhook-backend would allow all Pods. We should see information about this in the apiserver logs or kubectl get events -A.

 

Result

Now we can simply try to create a Pod with a forbidden image and one with a still allowed one:
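```bash
k run pod1 --image=something/danger-danger   # should be denied by the webhook backend
k run pod2 --image=nginx:alpine              # should work
```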

The webhook-backend used in this scenario also outputs some log messages every time it receives a request from the apiserver:

In this case we see that the webhook-backend received three requests for Pod admissions:

  1. registry.k8s.io/kube-apiserver:v1.30.1

  2. something/danger-danger

  3. nginx:alpine

Even before we created the two test Pods, the backend received a request to check the container image of the kube-apiserver itself. This is why misconfigurations can become quite dangerous for the whole cluster if even Kubernetes internal or CNI Pods are prevented from being created.