This is the multi-page printable view of this section. Click here to print.
Configure Pods and Containers
- 1: Assign Memory Resources to Containers and Pods
- 2: Assign CPU Resources to Containers and Pods
- 3: Configure GMSA for Windows Pods and containers
- 4: Resize CPU and Memory Resources assigned to Containers
- 5: Configure RunAsUserName for Windows pods and containers
- 6: Create a Windows HostProcess Pod
- 7: Configure Quality of Service for Pods
- 8: Assign Extended Resources to a Container
- 9: Configure a Pod to Use a Volume for Storage
- 10: Configure a Pod to Use a PersistentVolume for Storage
- 11: Configure a Pod to Use a Projected Volume for Storage
- 12: Configure a Security Context for a Pod or Container
- 13: Configure Service Accounts for Pods
- 14: Pull an Image from a Private Registry
- 15: Configure Liveness, Readiness and Startup Probes
- 16: Assign Pods to Nodes
- 17: Assign Pods to Nodes using Node Affinity
- 18: Configure Pod Initialization
- 19: Attach Handlers to Container Lifecycle Events
- 20: Configure a Pod to Use a ConfigMap
- 21: Share Process Namespace between Containers in a Pod
- 22: Use a User Namespace With a Pod
- 23: Create static Pods
- 24: Translate a Docker Compose File to Kubernetes Resources
- 25: Enforce Pod Security Standards by Configuring the Built-in Admission Controller
- 26: Enforce Pod Security Standards with Namespace Labels
- 27: Migrate from PodSecurityPolicy to the Built-In PodSecurity Admission Controller
1 - Assign Memory Resources to Containers and Pods
This page shows how to assign a memory request and a memory limit to a Container. A Container is guaranteed to have as much memory as it requests, but is not allowed to use more memory than its limit.
Before you begin
You need to have a Kubernetes cluster, and the kubectl command-line tool must be configured to communicate with your cluster. It is recommended to run this tutorial on a cluster with at least two nodes that are not acting as control plane hosts. If you do not already have a cluster, you can create one by using minikube or you can use one of these Kubernetes playgrounds:
To check the version, enterkubectl version
.
Each node in your cluster must have at least 300 MiB of memory.
A few of the steps on this page require you to run the metrics-server service in your cluster. If you have the metrics-server running, you can skip those steps.
If you are running Minikube, run the following command to enable the metrics-server:
minikube addons enable metrics-server
To see whether the metrics-server is running, or another provider of the resource metrics
API (metrics.k8s.io
), run the following command:
kubectl get apiservices
If the resource metrics API is available, the output includes a
reference to metrics.k8s.io
.
NAME
v1beta1.metrics.k8s.io
Create a namespace
Create a namespace so that the resources you create in this exercise are isolated from the rest of your cluster.
kubectl create namespace mem-example
Specify a memory request and a memory limit
To specify a memory request for a Container, include the resources:requests
field
in the Container's resource manifest. To specify a memory limit, include resources:limits
.
In this exercise, you create a Pod that has one Container. The Container has a memory request of 100 MiB and a memory limit of 200 MiB. Here's the configuration file for the Pod:
apiVersion: v1
kind: Pod
metadata:
name: memory-demo
namespace: mem-example
spec:
containers:
- name: memory-demo-ctr
image: polinux/stress
resources:
requests:
memory: "100Mi"
limits:
memory: "200Mi"
command: ["stress"]
args: ["--vm", "1", "--vm-bytes", "150M", "--vm-hang", "1"]
The args
section in the configuration file provides arguments for the Container when it starts.
The "--vm-bytes", "150M"
arguments tell the Container to attempt to allocate 150 MiB of memory.
Create the Pod:
kubectl apply -f https://k8s.io/examples/pods/resource/memory-request-limit.yaml --namespace=mem-example
Verify that the Pod Container is running:
kubectl get pod memory-demo --namespace=mem-example
View detailed information about the Pod:
kubectl get pod memory-demo --output=yaml --namespace=mem-example
The output shows that the one Container in the Pod has a memory request of 100 MiB and a memory limit of 200 MiB.
...
resources:
requests:
memory: 100Mi
limits:
memory: 200Mi
...
Run kubectl top
to fetch the metrics for the pod:
kubectl top pod memory-demo --namespace=mem-example
The output shows that the Pod is using about 162,900,000 bytes of memory, which is about 150 MiB. This is greater than the Pod's 100 MiB request, but within the Pod's 200 MiB limit.
NAME CPU(cores) MEMORY(bytes)
memory-demo <something> 162856960
Delete your Pod:
kubectl delete pod memory-demo --namespace=mem-example
Exceed a Container's memory limit
A Container can exceed its memory request if the Node has memory available. But a Container is not allowed to use more than its memory limit. If a Container allocates more memory than its limit, the Container becomes a candidate for termination. If the Container continues to consume memory beyond its limit, the Container is terminated. If a terminated Container can be restarted, the kubelet restarts it, as with any other type of runtime failure.
In this exercise, you create a Pod that attempts to allocate more memory than its limit. Here is the configuration file for a Pod that has one Container with a memory request of 50 MiB and a memory limit of 100 MiB:
apiVersion: v1
kind: Pod
metadata:
name: memory-demo-2
namespace: mem-example
spec:
containers:
- name: memory-demo-2-ctr
image: polinux/stress
resources:
requests:
memory: "50Mi"
limits:
memory: "100Mi"
command: ["stress"]
args: ["--vm", "1", "--vm-bytes", "250M", "--vm-hang", "1"]
In the args
section of the configuration file, you can see that the Container
will attempt to allocate 250 MiB of memory, which is well above the 100 MiB limit.
Create the Pod:
kubectl apply -f https://k8s.io/examples/pods/resource/memory-request-limit-2.yaml --namespace=mem-example
View detailed information about the Pod:
kubectl get pod memory-demo-2 --namespace=mem-example
At this point, the Container might be running or killed. Repeat the preceding command until the Container is killed:
NAME READY STATUS RESTARTS AGE
memory-demo-2 0/1 OOMKilled 1 24s
Get a more detailed view of the Container status:
kubectl get pod memory-demo-2 --output=yaml --namespace=mem-example
The output shows that the Container was killed because it is out of memory (OOM):
lastState:
terminated:
containerID: 65183c1877aaec2e8427bc95609cc52677a454b56fcb24340dbd22917c23b10f
exitCode: 137
finishedAt: 2017-06-20T20:52:19Z
reason: OOMKilled
startedAt: null
The Container in this exercise can be restarted, so the kubelet restarts it. Repeat this command several times to see that the Container is repeatedly killed and restarted:
kubectl get pod memory-demo-2 --namespace=mem-example
The output shows that the Container is killed, restarted, killed again, restarted again, and so on:
kubectl get pod memory-demo-2 --namespace=mem-example
NAME READY STATUS RESTARTS AGE
memory-demo-2 0/1 OOMKilled 1 37s
kubectl get pod memory-demo-2 --namespace=mem-example
NAME READY STATUS RESTARTS AGE
memory-demo-2 1/1 Running 2 40s
View detailed information about the Pod history:
kubectl describe pod memory-demo-2 --namespace=mem-example
The output shows that the Container starts and fails repeatedly:
... Normal Created Created container with id 66a3a20aa7980e61be4922780bf9d24d1a1d8b7395c09861225b0eba1b1f8511
... Warning BackOff Back-off restarting failed container
View detailed information about your cluster's Nodes:
kubectl describe nodes
The output includes a record of the Container being killed because of an out-of-memory condition:
Warning OOMKilling Memory cgroup out of memory: Kill process 4481 (stress) score 1994 or sacrifice child
Delete your Pod:
kubectl delete pod memory-demo-2 --namespace=mem-example
Specify a memory request that is too big for your Nodes
Memory requests and limits are associated with Containers, but it is useful to think of a Pod as having a memory request and limit. The memory request for the Pod is the sum of the memory requests for all the Containers in the Pod. Likewise, the memory limit for the Pod is the sum of the limits of all the Containers in the Pod.
Pod scheduling is based on requests. A Pod is scheduled to run on a Node only if the Node has enough available memory to satisfy the Pod's memory request.
In this exercise, you create a Pod that has a memory request so big that it exceeds the capacity of any Node in your cluster. Here is the configuration file for a Pod that has one Container with a request for 1000 GiB of memory, which likely exceeds the capacity of any Node in your cluster.
apiVersion: v1
kind: Pod
metadata:
name: memory-demo-3
namespace: mem-example
spec:
containers:
- name: memory-demo-3-ctr
image: polinux/stress
resources:
requests:
memory: "1000Gi"
limits:
memory: "1000Gi"
command: ["stress"]
args: ["--vm", "1", "--vm-bytes", "150M", "--vm-hang", "1"]
Create the Pod:
kubectl apply -f https://k8s.io/examples/pods/resource/memory-request-limit-3.yaml --namespace=mem-example
View the Pod status:
kubectl get pod memory-demo-3 --namespace=mem-example
The output shows that the Pod status is PENDING. That is, the Pod is not scheduled to run on any Node, and it will remain in the PENDING state indefinitely:
kubectl get pod memory-demo-3 --namespace=mem-example
NAME READY STATUS RESTARTS AGE
memory-demo-3 0/1 Pending 0 25s
View detailed information about the Pod, including events:
kubectl describe pod memory-demo-3 --namespace=mem-example
The output shows that the Container cannot be scheduled because of insufficient memory on the Nodes:
Events:
... Reason Message
------ -------
... FailedScheduling No nodes are available that match all of the following predicates:: Insufficient memory (3).
Memory units
The memory resource is measured in bytes. You can express memory as a plain integer or a fixed-point integer with one of these suffixes: E, P, T, G, M, K, Ei, Pi, Ti, Gi, Mi, Ki. For example, the following represent approximately the same value:
128974848, 129e6, 129M, 123Mi
Delete your Pod:
kubectl delete pod memory-demo-3 --namespace=mem-example
If you do not specify a memory limit
If you do not specify a memory limit for a Container, one of the following situations applies:
-
The Container has no upper bound on the amount of memory it uses. The Container could use all of the memory available on the Node where it is running which in turn could invoke the OOM Killer. Further, in case of an OOM Kill, a container with no resource limits will have a greater chance of being killed.
-
The Container is running in a namespace that has a default memory limit, and the Container is automatically assigned the default limit. Cluster administrators can use a LimitRange to specify a default value for the memory limit.
Motivation for memory requests and limits
By configuring memory requests and limits for the Containers that run in your cluster, you can make efficient use of the memory resources available on your cluster's Nodes. By keeping a Pod's memory request low, you give the Pod a good chance of being scheduled. By having a memory limit that is greater than the memory request, you accomplish two things:
- The Pod can have bursts of activity where it makes use of memory that happens to be available.
- The amount of memory a Pod can use during a burst is limited to some reasonable amount.
Clean up
Delete your namespace. This deletes all the Pods that you created for this task:
kubectl delete namespace mem-example
What's next
For app developers
For cluster administrators
2 - Assign CPU Resources to Containers and Pods
This page shows how to assign a CPU request and a CPU limit to a container. Containers cannot use more CPU than the configured limit. Provided the system has CPU time free, a container is guaranteed to be allocated as much CPU as it requests.
Before you begin
You need to have a Kubernetes cluster, and the kubectl command-line tool must be configured to communicate with your cluster. It is recommended to run this tutorial on a cluster with at least two nodes that are not acting as control plane hosts. If you do not already have a cluster, you can create one by using minikube or you can use one of these Kubernetes playgrounds:
To check the version, enterkubectl version
.
Your cluster must have at least 1 CPU available for use to run the task examples.
A few of the steps on this page require you to run the metrics-server service in your cluster. If you have the metrics-server running, you can skip those steps.
If you are running Minikube, run the following command to enable metrics-server:
minikube addons enable metrics-server
To see whether metrics-server (or another provider of the resource metrics
API, metrics.k8s.io
) is running, type the following command:
kubectl get apiservices
If the resource metrics API is available, the output will include a
reference to metrics.k8s.io
.
NAME
v1beta1.metrics.k8s.io
Create a namespace
Create a Namespace so that the resources you create in this exercise are isolated from the rest of your cluster.
kubectl create namespace cpu-example
Specify a CPU request and a CPU limit
To specify a CPU request for a container, include the resources:requests
field
in the Container resource manifest. To specify a CPU limit, include resources:limits
.
In this exercise, you create a Pod that has one container. The container has a request of 0.5 CPU and a limit of 1 CPU. Here is the configuration file for the Pod:
apiVersion: v1
kind: Pod
metadata:
name: cpu-demo
namespace: cpu-example
spec:
containers:
- name: cpu-demo-ctr
image: vish/stress
resources:
limits:
cpu: "1"
requests:
cpu: "0.5"
args:
- -cpus
- "2"
The args
section of the configuration file provides arguments for the container when it starts.
The -cpus "2"
argument tells the Container to attempt to use 2 CPUs.
Create the Pod:
kubectl apply -f https://k8s.io/examples/pods/resource/cpu-request-limit.yaml --namespace=cpu-example
Verify that the Pod is running:
kubectl get pod cpu-demo --namespace=cpu-example
View detailed information about the Pod:
kubectl get pod cpu-demo --output=yaml --namespace=cpu-example
The output shows that the one container in the Pod has a CPU request of 500 milliCPU and a CPU limit of 1 CPU.
resources:
limits:
cpu: "1"
requests:
cpu: 500m
Use kubectl top
to fetch the metrics for the Pod:
kubectl top pod cpu-demo --namespace=cpu-example
This example output shows that the Pod is using 974 milliCPU, which is slightly less than the limit of 1 CPU specified in the Pod configuration.
NAME CPU(cores) MEMORY(bytes)
cpu-demo 974m <something>
Recall that by setting -cpu "2"
, you configured the Container to attempt to use 2 CPUs, but the Container is only being allowed to use about 1 CPU. The container's CPU use is being throttled, because the container is attempting to use more CPU resources than its limit.
Note:
Another possible explanation for the CPU use being below 1.0 is that the Node might not have enough CPU resources available. Recall that the prerequisites for this exercise require your cluster to have at least 1 CPU available for use. If your Container runs on a Node that has only 1 CPU, the Container cannot use more than 1 CPU regardless of the CPU limit specified for the Container.CPU units
The CPU resource is measured in CPU units. One CPU, in Kubernetes, is equivalent to:
- 1 AWS vCPU
- 1 GCP Core
- 1 Azure vCore
- 1 Hyperthread on a bare-metal Intel processor with Hyperthreading
Fractional values are allowed. A Container that requests 0.5 CPU is guaranteed half as much CPU as a Container that requests 1 CPU. You can use the suffix m to mean milli. For example 100m CPU, 100 milliCPU, and 0.1 CPU are all the same. Precision finer than 1m is not allowed.
CPU is always requested as an absolute quantity, never as a relative quantity; 0.1 is the same amount of CPU on a single-core, dual-core, or 48-core machine.
Delete your Pod:
kubectl delete pod cpu-demo --namespace=cpu-example
Specify a CPU request that is too big for your Nodes
CPU requests and limits are associated with Containers, but it is useful to think of a Pod as having a CPU request and limit. The CPU request for a Pod is the sum of the CPU requests for all the Containers in the Pod. Likewise, the CPU limit for a Pod is the sum of the CPU limits for all the Containers in the Pod.
Pod scheduling is based on requests. A Pod is scheduled to run on a Node only if the Node has enough CPU resources available to satisfy the Pod CPU request.
In this exercise, you create a Pod that has a CPU request so big that it exceeds the capacity of any Node in your cluster. Here is the configuration file for a Pod that has one Container. The Container requests 100 CPU, which is likely to exceed the capacity of any Node in your cluster.
apiVersion: v1
kind: Pod
metadata:
name: cpu-demo-2
namespace: cpu-example
spec:
containers:
- name: cpu-demo-ctr-2
image: vish/stress
resources:
limits:
cpu: "100"
requests:
cpu: "100"
args:
- -cpus
- "2"
Create the Pod:
kubectl apply -f https://k8s.io/examples/pods/resource/cpu-request-limit-2.yaml --namespace=cpu-example
View the Pod status:
kubectl get pod cpu-demo-2 --namespace=cpu-example
The output shows that the Pod status is Pending. That is, the Pod has not been scheduled to run on any Node, and it will remain in the Pending state indefinitely:
NAME READY STATUS RESTARTS AGE
cpu-demo-2 0/1 Pending 0 7m
View detailed information about the Pod, including events:
kubectl describe pod cpu-demo-2 --namespace=cpu-example
The output shows that the Container cannot be scheduled because of insufficient CPU resources on the Nodes:
Events:
Reason Message
------ -------
FailedScheduling No nodes are available that match all of the following predicates:: Insufficient cpu (3).
Delete your Pod:
kubectl delete pod cpu-demo-2 --namespace=cpu-example
If you do not specify a CPU limit
If you do not specify a CPU limit for a Container, then one of these situations applies:
-
The Container has no upper bound on the CPU resources it can use. The Container could use all of the CPU resources available on the Node where it is running.
-
The Container is running in a namespace that has a default CPU limit, and the Container is automatically assigned the default limit. Cluster administrators can use a LimitRange to specify a default value for the CPU limit.
If you specify a CPU limit but do not specify a CPU request
If you specify a CPU limit for a Container but do not specify a CPU request, Kubernetes automatically assigns a CPU request that matches the limit. Similarly, if a Container specifies its own memory limit, but does not specify a memory request, Kubernetes automatically assigns a memory request that matches the limit.
Motivation for CPU requests and limits
By configuring the CPU requests and limits of the Containers that run in your cluster, you can make efficient use of the CPU resources available on your cluster Nodes. By keeping a Pod CPU request low, you give the Pod a good chance of being scheduled. By having a CPU limit that is greater than the CPU request, you accomplish two things:
- The Pod can have bursts of activity where it makes use of CPU resources that happen to be available.
- The amount of CPU resources a Pod can use during a burst is limited to some reasonable amount.
Clean up
Delete your namespace:
kubectl delete namespace cpu-example
What's next
For app developers
For cluster administrators
3 - Configure GMSA for Windows Pods and containers
Kubernetes v1.18 [stable]
This page shows how to configure Group Managed Service Accounts (GMSA) for Pods and containers that will run on Windows nodes. Group Managed Service Accounts are a specific type of Active Directory account that provides automatic password management, simplified service principal name (SPN) management, and the ability to delegate the management to other administrators across multiple servers.
In Kubernetes, GMSA credential specs are configured at a Kubernetes cluster-wide scope as Custom Resources. Windows Pods, as well as individual containers within a Pod, can be configured to use a GMSA for domain based functions (e.g. Kerberos authentication) when interacting with other Windows services.
Before you begin
You need to have a Kubernetes cluster and the kubectl
command-line tool must be
configured to communicate with your cluster. The cluster is expected to have Windows worker nodes.
This section covers a set of initial steps required once for each cluster:
Install the GMSACredentialSpec CRD
A CustomResourceDefinition(CRD)
for GMSA credential spec resources needs to be configured on the cluster to define
the custom resource type GMSACredentialSpec
. Download the GMSA CRD
YAML
and save it as gmsa-crd.yaml. Next, install the CRD with kubectl apply -f gmsa-crd.yaml
Install webhooks to validate GMSA users
Two webhooks need to be configured on the Kubernetes cluster to populate and validate GMSA credential spec references at the Pod or container level:
-
A mutating webhook that expands references to GMSAs (by name from a Pod specification) into the full credential spec in JSON form within the Pod spec.
-
A validating webhook ensures all references to GMSAs are authorized to be used by the Pod service account.
Installing the above webhooks and associated objects require the steps below:
-
Create a certificate key pair (that will be used to allow the webhook container to communicate to the cluster)
-
Install a secret with the certificate from above.
-
Create a deployment for the core webhook logic.
-
Create the validating and mutating webhook configurations referring to the deployment.
A script
can be used to deploy and configure the GMSA webhooks and associated objects
mentioned above. The script can be run with a --dry-run=server
option to
allow you to review the changes that would be made to your cluster.
The YAML template used by the script may also be used to deploy the webhooks and associated objects manually (with appropriate substitutions for the parameters)
Configure GMSAs and Windows nodes in Active Directory
Before Pods in Kubernetes can be configured to use GMSAs, the desired GMSAs need to be provisioned in Active Directory as described in the Windows GMSA documentation. Windows worker nodes (that are part of the Kubernetes cluster) need to be configured in Active Directory to access the secret credentials associated with the desired GMSA as described in the Windows GMSA documentation.
Create GMSA credential spec resources
With the GMSACredentialSpec CRD installed (as described earlier), custom resources containing GMSA credential specs can be configured. The GMSA credential spec does not contain secret or sensitive data. It is information that a container runtime can use to describe the desired GMSA of a container to Windows. GMSA credential specs can be generated in YAML format with a utility PowerShell script.
Following are the steps for generating a GMSA credential spec YAML manually in JSON format and then converting it:
-
Import the CredentialSpec module:
ipmo CredentialSpec.psm1
-
Create a credential spec in JSON format using
New-CredentialSpec
. To create a GMSA credential spec named WebApp1, invokeNew-CredentialSpec -Name WebApp1 -AccountName WebApp1 -Domain $(Get-ADDomain -Current LocalComputer)
-
Use
Get-CredentialSpec
to show the path of the JSON file. -
Convert the credspec file from JSON to YAML format and apply the necessary header fields
apiVersion
,kind
,metadata
andcredspec
to make it a GMSACredentialSpec custom resource that can be configured in Kubernetes.
The following YAML configuration describes a GMSA credential spec named gmsa-WebApp1
:
apiVersion: windows.k8s.io/v1
kind: GMSACredentialSpec
metadata:
name: gmsa-WebApp1 # This is an arbitrary name but it will be used as a reference
credspec:
ActiveDirectoryConfig:
GroupManagedServiceAccounts:
- Name: WebApp1 # Username of the GMSA account
Scope: CONTOSO # NETBIOS Domain Name
- Name: WebApp1 # Username of the GMSA account
Scope: contoso.com # DNS Domain Name
CmsPlugins:
- ActiveDirectory
DomainJoinConfig:
DnsName: contoso.com # DNS Domain Name
DnsTreeName: contoso.com # DNS Domain Name Root
Guid: 244818ae-87ac-4fcd-92ec-e79e5252348a # GUID of the Domain
MachineAccountName: WebApp1 # Username of the GMSA account
NetBiosName: CONTOSO # NETBIOS Domain Name
Sid: S-1-5-21-2126449477-2524075714-3094792973 # SID of the Domain
The above credential spec resource may be saved as gmsa-Webapp1-credspec.yaml
and applied to the cluster using: kubectl apply -f gmsa-Webapp1-credspec.yml
Configure cluster role to enable RBAC on specific GMSA credential specs
A cluster role needs to be defined for each GMSA credential spec resource. This
authorizes the use
verb on a specific GMSA resource by a subject which is typically
a service account. The following example shows a cluster role that authorizes usage
of the gmsa-WebApp1
credential spec from above. Save the file as gmsa-webapp1-role.yaml
and apply using kubectl apply -f gmsa-webapp1-role.yaml
# Create the Role to read the credspec
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: webapp1-role
rules:
- apiGroups: ["windows.k8s.io"]
resources: ["gmsacredentialspecs"]
verbs: ["use"]
resourceNames: ["gmsa-WebApp1"]
Assign role to service accounts to use specific GMSA credspecs
A service account (that Pods will be configured with) needs to be bound to the
cluster role create above. This authorizes the service account to use the desired
GMSA credential spec resource. The following shows the default service account
being bound to a cluster role webapp1-role
to use gmsa-WebApp1
credential spec resource created above.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: allow-default-svc-account-read-on-gmsa-WebApp1
namespace: default
subjects:
- kind: ServiceAccount
name: default
namespace: default
roleRef:
kind: ClusterRole
name: webapp1-role
apiGroup: rbac.authorization.k8s.io
Configure GMSA credential spec reference in Pod spec
The Pod spec field securityContext.windowsOptions.gmsaCredentialSpecName
is used to
specify references to desired GMSA credential spec custom resources in Pod specs.
This configures all containers in the Pod spec to use the specified GMSA. A sample
Pod spec with the annotation populated to refer to gmsa-WebApp1
:
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
run: with-creds
name: with-creds
namespace: default
spec:
replicas: 1
selector:
matchLabels:
run: with-creds
template:
metadata:
labels:
run: with-creds
spec:
securityContext:
windowsOptions:
gmsaCredentialSpecName: gmsa-webapp1
containers:
- image: mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2019
imagePullPolicy: Always
name: iis
nodeSelector:
kubernetes.io/os: windows
Individual containers in a Pod spec can also specify the desired GMSA credspec
using a per-container securityContext.windowsOptions.gmsaCredentialSpecName
field. For example:
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
run: with-creds
name: with-creds
namespace: default
spec:
replicas: 1
selector:
matchLabels:
run: with-creds
template:
metadata:
labels:
run: with-creds
spec:
containers:
- image: mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2019
imagePullPolicy: Always
name: iis
securityContext:
windowsOptions:
gmsaCredentialSpecName: gmsa-Webapp1
nodeSelector:
kubernetes.io/os: windows
As Pod specs with GMSA fields populated (as described above) are applied in a cluster, the following sequence of events take place:
-
The mutating webhook resolves and expands all references to GMSA credential spec resources to the contents of the GMSA credential spec.
-
The validating webhook ensures the service account associated with the Pod is authorized for the
use
verb on the specified GMSA credential spec. -
The container runtime configures each Windows container with the specified GMSA credential spec so that the container can assume the identity of the GMSA in Active Directory and access services in the domain using that identity.
Authenticating to network shares using hostname or FQDN
If you are experiencing issues connecting to SMB shares from Pods using hostname or FQDN, but are able to access the shares via their IPv4 address then make sure the following registry key is set on the Windows nodes.
reg add "HKLM\SYSTEM\CurrentControlSet\Services\hns\State" /v EnableCompartmentNamespace /t REG_DWORD /d 1
Running Pods will then need to be recreated to pick up the behavior changes. More information on how this registry key is used can be found here
Troubleshooting
If you are having difficulties getting GMSA to work in your environment, there are a few troubleshooting steps you can take.
First, make sure the credspec has been passed to the Pod. To do this you will need
to exec
into one of your Pods and check the output of the nltest.exe /parentdomain
command.
In the example below the Pod did not get the credspec correctly:
kubectl exec -it iis-auth-7776966999-n5nzr powershell.exe
nltest.exe /parentdomain
results in the following error:
Getting parent domain failed: Status = 1722 0x6ba RPC_S_SERVER_UNAVAILABLE
If your Pod did get the credspec correctly, then next check communication with the domain. First, from inside of your Pod, quickly do an nslookup to find the root of your domain.
This will tell us 3 things:
- The Pod can reach the DC
- The DC can reach the Pod
- DNS is working correctly.
If the DNS and communication test passes, next you will need to check if the Pod has
established secure channel communication with the domain. To do this, again,
exec
into your Pod and run the nltest.exe /query
command.
nltest.exe /query
Results in the following output:
I_NetLogonControl failed: Status = 1722 0x6ba RPC_S_SERVER_UNAVAILABLE
This tells us that for some reason, the Pod was unable to logon to the domain using the account specified in the credspec. You can try to repair the secure channel by running the following:
nltest /sc_reset:domain.example
If the command is successful you will see and output similar to this:
Flags: 30 HAS_IP HAS_TIMESERV
Trusted DC Name \\dc10.domain.example
Trusted DC Connection Status Status = 0 0x0 NERR_Success
The command completed successfully
If the above corrects the error, you can automate the step by adding the following lifecycle hook to your Pod spec. If it did not correct the error, you will need to examine your credspec again and confirm that it is correct and complete.
image: registry.domain.example/iis-auth:1809v1
lifecycle:
postStart:
exec:
command: ["powershell.exe","-command","do { Restart-Service -Name netlogon } while ( $($Result = (nltest.exe /query); if ($Result -like '*0x0 NERR_Success*') {return $true} else {return $false}) -eq $false)"]
imagePullPolicy: IfNotPresent
If you add the lifecycle
section show above to your Pod spec, the Pod will execute
the commands listed to restart the netlogon
service until the nltest.exe /query
command exits without error.
4 - Resize CPU and Memory Resources assigned to Containers
Kubernetes v1.27 [alpha]
This page assumes that you are familiar with Quality of Service for Kubernetes Pods.
This page shows how to resize CPU and memory resources assigned to containers
of a running pod without restarting the pod or its containers. A Kubernetes node
allocates resources for a pod based on its requests
, and restricts the pod's
resource usage based on the limits
specified in the pod's containers.
Changing the resource allocation for a running Pod requires the
InPlacePodVerticalScaling
feature gate
to be enabled. The alternative is to delete the Pod and let the
workload controller make a replacement Pod
that has a different resource requirement.
For in-place resize of pod resources:
- Container's resource
requests
andlimits
are mutable for CPU and memory resources. allocatedResources
field incontainerStatuses
of the Pod's status reflects the resources allocated to the pod's containers.resources
field incontainerStatuses
of the Pod's status reflects the actual resourcerequests
andlimits
that are configured on the running containers as reported by the container runtime.resize
field in the Pod's status shows the status of the last requested pending resize. It can have the following values:Proposed
: This value indicates an acknowledgement of the requested resize and that the request was validated and recorded.InProgress
: This value indicates that the node has accepted the resize request and is in the process of applying it to the pod's containers.Deferred
: This value means that the requested resize cannot be granted at this time, and the node will keep retrying. The resize may be granted when other pods leave and free up node resources.Infeasible
: is a signal that the node cannot accommodate the requested resize. This can happen if the requested resize exceeds the maximum resources the node can ever allocate for a pod.
Before you begin
You need to have a Kubernetes cluster, and the kubectl command-line tool must be configured to communicate with your cluster. It is recommended to run this tutorial on a cluster with at least two nodes that are not acting as control plane hosts. If you do not already have a cluster, you can create one by using minikube or you can use one of these Kubernetes playgrounds:
Your Kubernetes server must be at or later than version 1.27. To check the version, enterkubectl version
.
The InPlacePodVerticalScaling
feature gate must be enabled
for your control plane and for all nodes in your cluster.
Container Resize Policies
Resize policies allow for a more fine-grained control over how pod's containers are resized for CPU and memory resources. For example, the container's application may be able to handle CPU resources resized without being restarted, but resizing memory may require that the application hence the containers be restarted.
To enable this, the Container specification allows users to specify a resizePolicy
.
The following restart policies can be specified for resizing CPU and memory:
NotRequired
: Resize the container's resources while it is running.RestartContainer
: Restart the container and apply new resources upon restart.
If resizePolicy[*].restartPolicy
is not specified, it defaults to NotRequired
.
Note:
If the Pod'srestartPolicy
is Never
, container's resize restart policy must be
set to NotRequired
for all Containers in the Pod.Below example shows a Pod whose Container's CPU can be resized without restart, but resizing memory requires the container to be restarted.
apiVersion: v1
kind: Pod
metadata:
name: qos-demo-5
namespace: qos-example
spec:
containers:
- name: qos-demo-ctr-5
image: nginx
resizePolicy:
- resourceName: cpu
restartPolicy: NotRequired
- resourceName: memory
restartPolicy: RestartContainer
resources:
limits:
memory: "200Mi"
cpu: "700m"
requests:
memory: "200Mi"
cpu: "700m"
Note:
In the above example, if desired requests or limits for both CPU and memory have changed, the container will be restarted in order to resize its memory.Create a pod with resource requests and limits
You can create a Guaranteed or Burstable Quality of Service class pod by specifying requests and/or limits for a pod's containers.
Consider the following manifest for a Pod that has one Container.
apiVersion: v1
kind: Pod
metadata:
name: qos-demo-5
namespace: qos-example
spec:
containers:
- name: qos-demo-ctr-5
image: nginx
resources:
limits:
memory: "200Mi"
cpu: "700m"
requests:
memory: "200Mi"
cpu: "700m"
Create the pod in the qos-example
namespace:
kubectl create namespace qos-example
kubectl create -f https://k8s.io/examples/pods/qos/qos-pod-5.yaml
This pod is classified as a Guaranteed QoS class requesting 700m CPU and 200Mi memory.
View detailed information about the pod:
kubectl get pod qos-demo-5 --output=yaml --namespace=qos-example
Also notice that the values of resizePolicy[*].restartPolicy
defaulted to
NotRequired
, indicating that CPU and memory can be resized while container
is running.
spec:
containers:
...
resizePolicy:
- resourceName: cpu
restartPolicy: NotRequired
- resourceName: memory
restartPolicy: NotRequired
resources:
limits:
cpu: 700m
memory: 200Mi
requests:
cpu: 700m
memory: 200Mi
...
containerStatuses:
...
name: qos-demo-ctr-5
ready: true
...
allocatedResources:
cpu: 700m
memory: 200Mi
resources:
limits:
cpu: 700m
memory: 200Mi
requests:
cpu: 700m
memory: 200Mi
restartCount: 0
started: true
...
qosClass: Guaranteed
Updating the pod's resources
Let's say the CPU requirements have increased, and 0.8 CPU is now desired. This may be specified manually, or determined and programmatically applied by an entity such as VerticalPodAutoscaler (VPA).
Note:
While you can change a Pod's requests and limits to express new desired resources, you cannot change the QoS class in which the Pod was created.Now, patch the Pod's Container with CPU requests & limits both set to 800m
:
kubectl -n qos-example patch pod qos-demo-5 --patch '{"spec":{"containers":[{"name":"qos-demo-ctr-5", "resources":{"requests":{"cpu":"800m"}, "limits":{"cpu":"800m"}}}]}}'
Query the Pod's detailed information after the Pod has been patched.
kubectl get pod qos-demo-5 --output=yaml --namespace=qos-example
The Pod's spec below reflects the updated CPU requests and limits.
spec:
containers:
...
resources:
limits:
cpu: 800m
memory: 200Mi
requests:
cpu: 800m
memory: 200Mi
...
containerStatuses:
...
allocatedResources:
cpu: 800m
memory: 200Mi
resources:
limits:
cpu: 800m
memory: 200Mi
requests:
cpu: 800m
memory: 200Mi
restartCount: 0
started: true
Observe that the allocatedResources
values have been updated to reflect the new
desired CPU requests. This indicates that node was able to accommodate the
increased CPU resource needs.
In the Container's status, updated CPU resource values shows that new CPU
resources have been applied. The Container's restartCount
remains unchanged,
indicating that container's CPU resources were resized without restarting the container.
Clean up
Delete your namespace:
kubectl delete namespace qos-example
What's next
For application developers
For cluster administrators
5 - Configure RunAsUserName for Windows pods and containers
Kubernetes v1.18 [stable]
This page shows how to use the runAsUserName
setting for Pods and containers that will run on Windows nodes. This is roughly equivalent of the Linux-specific runAsUser
setting, allowing you to run applications in a container as a different username than the default.
Before you begin
You need to have a Kubernetes cluster and the kubectl command-line tool must be configured to communicate with your cluster. The cluster is expected to have Windows worker nodes where pods with containers running Windows workloads will get scheduled.
Set the Username for a Pod
To specify the username with which to execute the Pod's container processes, include the securityContext
field (PodSecurityContext) in the Pod specification, and within it, the windowsOptions
(WindowsSecurityContextOptions) field containing the runAsUserName
field.
The Windows security context options that you specify for a Pod apply to all Containers and init Containers in the Pod.
Here is a configuration file for a Windows Pod that has the runAsUserName
field set:
apiVersion: v1
kind: Pod
metadata:
name: run-as-username-pod-demo
spec:
securityContext:
windowsOptions:
runAsUserName: "ContainerUser"
containers:
- name: run-as-username-demo
image: mcr.microsoft.com/windows/servercore:ltsc2019
command: ["ping", "-t", "localhost"]
nodeSelector:
kubernetes.io/os: windows
Create the Pod:
kubectl apply -f https://k8s.io/examples/windows/run-as-username-pod.yaml
Verify that the Pod's Container is running:
kubectl get pod run-as-username-pod-demo
Get a shell to the running Container:
kubectl exec -it run-as-username-pod-demo -- powershell
Check that the shell is running user the correct username:
echo $env:USERNAME
The output should be:
ContainerUser
Set the Username for a Container
To specify the username with which to execute a Container's processes, include the securityContext
field (SecurityContext) in the Container manifest, and within it, the windowsOptions
(WindowsSecurityContextOptions) field containing the runAsUserName
field.
The Windows security context options that you specify for a Container apply only to that individual Container, and they override the settings made at the Pod level.
Here is the configuration file for a Pod that has one Container, and the runAsUserName
field is set at the Pod level and the Container level:
apiVersion: v1
kind: Pod
metadata:
name: run-as-username-container-demo
spec:
securityContext:
windowsOptions:
runAsUserName: "ContainerUser"
containers:
- name: run-as-username-demo
image: mcr.microsoft.com/windows/servercore:ltsc2019
command: ["ping", "-t", "localhost"]
securityContext:
windowsOptions:
runAsUserName: "ContainerAdministrator"
nodeSelector:
kubernetes.io/os: windows
Create the Pod:
kubectl apply -f https://k8s.io/examples/windows/run-as-username-container.yaml
Verify that the Pod's Container is running:
kubectl get pod run-as-username-container-demo
Get a shell to the running Container:
kubectl exec -it run-as-username-container-demo -- powershell
Check that the shell is running user the correct username (the one set at the Container level):
echo $env:USERNAME
The output should be:
ContainerAdministrator
Windows Username limitations
In order to use this feature, the value set in the runAsUserName
field must be a valid username. It must have the following format: DOMAIN\USER
, where DOMAIN\
is optional. Windows user names are case insensitive. Additionally, there are some restrictions regarding the DOMAIN
and USER
:
- The
runAsUserName
field cannot be empty, and it cannot contain control characters (ASCII values:0x00-0x1F
,0x7F
) - The
DOMAIN
must be either a NetBios name, or a DNS name, each with their own restrictions:- NetBios names: maximum 15 characters, cannot start with
.
(dot), and cannot contain the following characters:\ / : * ? " < > |
- DNS names: maximum 255 characters, contains only alphanumeric characters, dots, and dashes, and it cannot start or end with a
.
(dot) or-
(dash).
- NetBios names: maximum 15 characters, cannot start with
- The
USER
must have at most 20 characters, it cannot contain only dots or spaces, and it cannot contain the following characters:" / \ [ ] : ; | = , + * ? < > @
.
Examples of acceptable values for the runAsUserName
field: ContainerAdministrator
, ContainerUser
, NT AUTHORITY\NETWORK SERVICE
, NT AUTHORITY\LOCAL SERVICE
.
For more information about these limtations, check here and here.
What's next
6 - Create a Windows HostProcess Pod
Kubernetes v1.26 [stable]
Windows HostProcess containers enable you to run containerized workloads on a Windows host. These containers operate as normal processes but have access to the host network namespace, storage, and devices when given the appropriate user privileges. HostProcess containers can be used to deploy network plugins, storage configurations, device plugins, kube-proxy, and other components to Windows nodes without the need for dedicated proxies or the direct installation of host services.
Administrative tasks such as installation of security patches, event log collection, and more can be performed without requiring cluster operators to log onto each Windows node. HostProcess containers can run as any user that is available on the host or is in the domain of the host machine, allowing administrators to restrict resource access through user permissions. While neither filesystem or process isolation are supported, a new volume is created on the host upon starting the container to give it a clean and consolidated workspace. HostProcess containers can also be built on top of existing Windows base images and do not inherit the same compatibility requirements as Windows server containers, meaning that the version of the base images does not need to match that of the host. It is, however, recommended that you use the same base image version as your Windows Server container workloads to ensure you do not have any unused images taking up space on the node. HostProcess containers also support volume mounts within the container volume.
When should I use a Windows HostProcess container?
- When you need to perform tasks which require the networking namespace of the host. HostProcess containers have access to the host's network interfaces and IP addresses.
- You need access to resources on the host such as the filesystem, event logs, etc.
- Installation of specific device drivers or Windows services.
- Consolidation of administrative tasks and security policies. This reduces the degree of privileges needed by Windows nodes.
Before you begin
This task guide is specific to Kubernetes v1.30. If you are not running Kubernetes v1.30, check the documentation for that version of Kubernetes.
In Kubernetes 1.30, the HostProcess container feature is enabled by default. The kubelet will communicate with containerd directly by passing the hostprocess flag via CRI. You can use the latest version of containerd (v1.6+) to run HostProcess containers. How to install containerd.
Limitations
These limitations are relevant for Kubernetes v1.30:
- HostProcess containers require containerd 1.6 or higher container runtime and containerd 1.7 is recommended.
- HostProcess pods can only contain HostProcess containers. This is a current limitation of the Windows OS; non-privileged Windows containers cannot share a vNIC with the host IP namespace.
- HostProcess containers run as a process on the host and do not have any degree of isolation other than resource constraints imposed on the HostProcess user account. Neither filesystem or Hyper-V isolation are supported for HostProcess containers.
- Volume mounts are supported and are mounted under the container volume. See Volume Mounts
- A limited set of host user accounts are available for HostProcess containers by default. See Choosing a User Account.
- Resource limits (disk, memory, cpu count) are supported in the same fashion as processes on the host.
- Both Named pipe mounts and Unix domain sockets are not supported and should instead be accessed via their path on the host (e.g. \\.\pipe\*)
HostProcess Pod configuration requirements
Enabling a Windows HostProcess pod requires setting the right configurations in the pod security configuration. Of the policies defined in the Pod Security Standards HostProcess pods are disallowed by the baseline and restricted policies. It is therefore recommended that HostProcess pods run in alignment with the privileged profile.
When running under the privileged policy, here are the configurations which need to be set to enable the creation of a HostProcess pod:
Control | Policy |
---|---|
securityContext.windowsOptions.hostProcess |
Windows pods offer the ability to run HostProcess containers which enables privileged access to the Windows node. Allowed Values
|
hostNetwork |
Pods container HostProcess containers must use the host's network namespace. Allowed Values
|
securityContext.windowsOptions.runAsUserName |
Specification of which user the HostProcess container should run as is required for the pod spec. Allowed Values
|
runAsNonRoot |
Because HostProcess containers have privileged access to the host, the runAsNonRoot field cannot be set to true. Allowed Values
|
Example manifest (excerpt)
spec:
securityContext:
windowsOptions:
hostProcess: true
runAsUserName: "NT AUTHORITY\\Local service"
hostNetwork: true
containers:
- name: test
image: image1:latest
command:
- ping
- -t
- 127.0.0.1
nodeSelector:
"kubernetes.io/os": windows
Volume mounts
HostProcess containers support the ability to mount volumes within the container volume space. Volume mount behavior differs depending on the version of containerd runtime used by on the node.
Containerd v1.6
Applications running inside the container can access volume mounts directly via relative or
absolute paths. An environment variable $CONTAINER_SANDBOX_MOUNT_POINT
is set upon container
creation and provides the absolute host path to the container volume. Relative paths are based
upon the .spec.containers.volumeMounts.mountPath
configuration.
To access service account tokens (for example) the following path structures are supported within the container:
.\var\run\secrets\kubernetes.io\serviceaccount\
$CONTAINER_SANDBOX_MOUNT_POINT\var\run\secrets\kubernetes.io\serviceaccount\
Containerd v1.7 (and greater)
Applications running inside the container can access volume mounts directly via the volumeMount's
specified mountPath
(just like Linux and non-HostProcess Windows containers).
For backwards compatibility volumes can also be accessed via using the same relative paths configured by containerd v1.6.
As an example, to access service account tokens within the container you would use one of the following paths:
c:\var\run\secrets\kubernetes.io\serviceaccount
/var/run/secrets/kubernetes.io/serviceaccount/
$CONTAINER_SANDBOX_MOUNT_POINT\var\run\secrets\kubernetes.io\serviceaccount\
Resource limits
Resource limits (disk, memory, cpu count) are applied to the job and are job wide. For example, with a limit of 10MB set, the memory allocated for any HostProcess job object will be capped at 10MB. This is the same behavior as other Windows container types. These limits would be specified the same way they are currently for whatever orchestrator or runtime is being used. The only difference is in the disk resource usage calculation used for resource tracking due to the difference in how HostProcess containers are bootstrapped.
Choosing a user account
System accounts
By default, HostProcess containers support the ability to run as one of three supported Windows service accounts:
You should select an appropriate Windows service account for each HostProcess container, aiming to limit the degree of privileges so as to avoid accidental (or even malicious) damage to the host. The LocalSystem service account has the highest level of privilege of the three and should be used only if absolutely necessary. Where possible, use the LocalService service account as it is the least privileged of the three options.
Local accounts
If configured, HostProcess containers can also run as local user accounts which allows for node operators to give fine-grained access to workloads.
To run HostProcess containers as a local user; A local usergroup must first be created on the node
and the name of that local usergroup must be specified in the runAsUserName
field in the deployment.
Prior to initializing the HostProcess container, a new ephemeral local user account to be created and joined to the specified usergroup, from which the container is run.
This provides a number a benefits including eliminating the need to manage passwords for local user accounts.
An initial HostProcess container running as a service account can be used to
prepare the user groups for later HostProcess containers.
Note:
Running HostProcess containers as local user accounts requires containerd v1.7+Example:
-
Create a local user group on the node (this can be done in another HostProcess container).
net localgroup hpc-localgroup /add
-
Grant access to desired resources on the node to the local usergroup. This can be done with tools like icacls.
-
Set
runAsUserName
to the name of the local usergroup for the pod or individual containers.securityContext: windowsOptions: hostProcess: true runAsUserName: hpc-localgroup
-
Schedule the pod!
Base Image for HostProcess Containers
HostProcess containers can be built from any of the existing Windows Container base images.
Additionally a new base mage has been created just for HostProcess containers! For more information please check out the windows-host-process-containers-base-image github project.
Troubleshooting HostProcess containers
-
HostProcess containers fail to start with
failed to create user process token: failed to logon user: Access is denied.: unknown
Ensure containerd is running as
LocalSystem
orLocalService
service accounts. User accounts (even Administrator accounts) do not have permissions to create logon tokens for any of the supported user accounts.
7 - Configure Quality of Service for Pods
This page shows how to configure Pods so that they will be assigned particular Quality of Service (QoS) classes. Kubernetes uses QoS classes to make decisions about evicting Pods when Node resources are exceeded.
When Kubernetes creates a Pod it assigns one of these QoS classes to the Pod:
Before you begin
You need to have a Kubernetes cluster, and the kubectl command-line tool must be configured to communicate with your cluster. It is recommended to run this tutorial on a cluster with at least two nodes that are not acting as control plane hosts. If you do not already have a cluster, you can create one by using minikube or you can use one of these Kubernetes playgrounds:
You also need to be able to create and delete namespaces.
Create a namespace
Create a namespace so that the resources you create in this exercise are isolated from the rest of your cluster.
kubectl create namespace qos-example
Create a Pod that gets assigned a QoS class of Guaranteed
For a Pod to be given a QoS class of Guaranteed
:
- Every Container in the Pod must have a memory limit and a memory request.
- For every Container in the Pod, the memory limit must equal the memory request.
- Every Container in the Pod must have a CPU limit and a CPU request.
- For every Container in the Pod, the CPU limit must equal the CPU request.
These restrictions apply to init containers and app containers equally. Ephemeral containers cannot define resources so these restrictions do not apply.
Here is a manifest for a Pod that has one Container. The Container has a memory limit and a memory request, both equal to 200 MiB. The Container has a CPU limit and a CPU request, both equal to 700 milliCPU:
apiVersion: v1
kind: Pod
metadata:
name: qos-demo
namespace: qos-example
spec:
containers:
- name: qos-demo-ctr
image: nginx
resources:
limits:
memory: "200Mi"
cpu: "700m"
requests:
memory: "200Mi"
cpu: "700m"
Create the Pod:
kubectl apply -f https://k8s.io/examples/pods/qos/qos-pod.yaml --namespace=qos-example
View detailed information about the Pod:
kubectl get pod qos-demo --namespace=qos-example --output=yaml
The output shows that Kubernetes gave the Pod a QoS class of Guaranteed
. The output also
verifies that the Pod Container has a memory request that matches its memory limit, and it has
a CPU request that matches its CPU limit.
spec:
containers:
...
resources:
limits:
cpu: 700m
memory: 200Mi
requests:
cpu: 700m
memory: 200Mi
...
status:
qosClass: Guaranteed
Note:
If a Container specifies its own memory limit, but does not specify a memory request, Kubernetes automatically assigns a memory request that matches the limit. Similarly, if a Container specifies its own CPU limit, but does not specify a CPU request, Kubernetes automatically assigns a CPU request that matches the limit.Clean up
Delete your Pod:
kubectl delete pod qos-demo --namespace=qos-example
Create a Pod that gets assigned a QoS class of Burstable
A Pod is given a QoS class of Burstable
if:
- The Pod does not meet the criteria for QoS class
Guaranteed
. - At least one Container in the Pod has a memory or CPU request or limit.
Here is a manifest for a Pod that has one Container. The Container has a memory limit of 200 MiB and a memory request of 100 MiB.
apiVersion: v1
kind: Pod
metadata:
name: qos-demo-2
namespace: qos-example
spec:
containers:
- name: qos-demo-2-ctr
image: nginx
resources:
limits:
memory: "200Mi"
requests:
memory: "100Mi"
Create the Pod:
kubectl apply -f https://k8s.io/examples/pods/qos/qos-pod-2.yaml --namespace=qos-example
View detailed information about the Pod:
kubectl get pod qos-demo-2 --namespace=qos-example --output=yaml
The output shows that Kubernetes gave the Pod a QoS class of Burstable
:
spec:
containers:
- image: nginx
imagePullPolicy: Always
name: qos-demo-2-ctr
resources:
limits:
memory: 200Mi
requests:
memory: 100Mi
...
status:
qosClass: Burstable
Clean up
Delete your Pod:
kubectl delete pod qos-demo-2 --namespace=qos-example
Create a Pod that gets assigned a QoS class of BestEffort
For a Pod to be given a QoS class of BestEffort
, the Containers in the Pod must not
have any memory or CPU limits or requests.
Here is a manifest for a Pod that has one Container. The Container has no memory or CPU limits or requests:
apiVersion: v1
kind: Pod
metadata:
name: qos-demo-3
namespace: qos-example
spec:
containers:
- name: qos-demo-3-ctr
image: nginx
Create the Pod:
kubectl apply -f https://k8s.io/examples/pods/qos/qos-pod-3.yaml --namespace=qos-example
View detailed information about the Pod:
kubectl get pod qos-demo-3 --namespace=qos-example --output=yaml
The output shows that Kubernetes gave the Pod a QoS class of BestEffort
:
spec:
containers:
...
resources: {}
...
status:
qosClass: BestEffort
Clean up
Delete your Pod:
kubectl delete pod qos-demo-3 --namespace=qos-example
Create a Pod that has two Containers
Here is a manifest for a Pod that has two Containers. One container specifies a memory request of 200 MiB. The other Container does not specify any requests or limits.
apiVersion: v1
kind: Pod
metadata:
name: qos-demo-4
namespace: qos-example
spec:
containers:
- name: qos-demo-4-ctr-1
image: nginx
resources:
requests:
memory: "200Mi"
- name: qos-demo-4-ctr-2
image: redis
Notice that this Pod meets the criteria for QoS class Burstable
. That is, it does not meet the
criteria for QoS class Guaranteed
, and one of its Containers has a memory request.
Create the Pod:
kubectl apply -f https://k8s.io/examples/pods/qos/qos-pod-4.yaml --namespace=qos-example
View detailed information about the Pod:
kubectl get pod qos-demo-4 --namespace=qos-example --output=yaml
The output shows that Kubernetes gave the Pod a QoS class of Burstable
:
spec:
containers:
...
name: qos-demo-4-ctr-1
resources:
requests:
memory: 200Mi
...
name: qos-demo-4-ctr-2
resources: {}
...
status:
qosClass: Burstable
Retrieve the QoS class for a Pod
Rather than see all the fields, you can view just the field you need:
kubectl --namespace=qos-example get pod qos-demo-4 -o jsonpath='{ .status.qosClass}{"\n"}'
Burstable
Clean up
Delete your namespace:
kubectl delete namespace qos-example
What's next
For app developers
For cluster administrators
8 - Assign Extended Resources to a Container
Kubernetes v1.30 [stable]
This page shows how to assign extended resources to a Container.
Before you begin
You need to have a Kubernetes cluster, and the kubectl command-line tool must be configured to communicate with your cluster. It is recommended to run this tutorial on a cluster with at least two nodes that are not acting as control plane hosts. If you do not already have a cluster, you can create one by using minikube or you can use one of these Kubernetes playgrounds:
To check the version, enterkubectl version
.
Before you do this exercise, do the exercise in Advertise Extended Resources for a Node. That will configure one of your Nodes to advertise a dongle resource.
Assign an extended resource to a Pod
To request an extended resource, include the resources:requests
field in your
Container manifest. Extended resources are fully qualified with any domain outside of
*.kubernetes.io/
. Valid extended resource names have the form example.com/foo
where
example.com
is replaced with your organization's domain and foo
is a
descriptive resource name.
Here is the configuration file for a Pod that has one Container:
apiVersion: v1
kind: Pod
metadata:
name: extended-resource-demo
spec:
containers:
- name: extended-resource-demo-ctr
image: nginx
resources:
requests:
example.com/dongle: 3
limits:
example.com/dongle: 3
In the configuration file, you can see that the Container requests 3 dongles.
Create a Pod:
kubectl apply -f https://k8s.io/examples/pods/resource/extended-resource-pod.yaml
Verify that the Pod is running:
kubectl get pod extended-resource-demo
Describe the Pod:
kubectl describe pod extended-resource-demo
The output shows dongle requests:
Limits:
example.com/dongle: 3
Requests:
example.com/dongle: 3
Attempt to create a second Pod
Here is the configuration file for a Pod that has one Container. The Container requests two dongles.
apiVersion: v1
kind: Pod
metadata:
name: extended-resource-demo-2
spec:
containers:
- name: extended-resource-demo-2-ctr
image: nginx
resources:
requests:
example.com/dongle: 2
limits:
example.com/dongle: 2
Kubernetes will not be able to satisfy the request for two dongles, because the first Pod used three of the four available dongles.
Attempt to create a Pod:
kubectl apply -f https://k8s.io/examples/pods/resource/extended-resource-pod-2.yaml
Describe the Pod
kubectl describe pod extended-resource-demo-2
The output shows that the Pod cannot be scheduled, because there is no Node that has 2 dongles available:
Conditions:
Type Status
PodScheduled False
...
Events:
...
... Warning FailedScheduling pod (extended-resource-demo-2) failed to fit in any node
fit failure summary on nodes : Insufficient example.com/dongle (1)
View the Pod status:
kubectl get pod extended-resource-demo-2
The output shows that the Pod was created, but not scheduled to run on a Node. It has a status of Pending:
NAME READY STATUS RESTARTS AGE
extended-resource-demo-2 0/1 Pending 0 6m
Clean up
Delete the Pods that you created for this exercise:
kubectl delete pod extended-resource-demo
kubectl delete pod extended-resource-demo-2
What's next
For application developers
For cluster administrators
9 - Configure a Pod to Use a Volume for Storage
This page shows how to configure a Pod to use a Volume for storage.
A Container's file system lives only as long as the Container does. So when a Container terminates and restarts, filesystem changes are lost. For more consistent storage that is independent of the Container, you can use a Volume. This is especially important for stateful applications, such as key-value stores (such as Redis) and databases.
Before you begin
You need to have a Kubernetes cluster, and the kubectl command-line tool must be configured to communicate with your cluster. It is recommended to run this tutorial on a cluster with at least two nodes that are not acting as control plane hosts. If you do not already have a cluster, you can create one by using minikube or you can use one of these Kubernetes playgrounds:
To check the version, enterkubectl version
.
Configure a volume for a Pod
In this exercise, you create a Pod that runs one Container. This Pod has a Volume of type emptyDir that lasts for the life of the Pod, even if the Container terminates and restarts. Here is the configuration file for the Pod:
apiVersion: v1
kind: Pod
metadata:
name: redis
spec:
containers:
- name: redis
image: redis
volumeMounts:
- name: redis-storage
mountPath: /data/redis
volumes:
- name: redis-storage
emptyDir: {}
-
Create the Pod:
kubectl apply -f https://k8s.io/examples/pods/storage/redis.yaml
-
Verify that the Pod's Container is running, and then watch for changes to the Pod:
kubectl get pod redis --watch
The output looks like this:
NAME READY STATUS RESTARTS AGE redis 1/1 Running 0 13s
-
In another terminal, get a shell to the running Container:
kubectl exec -it redis -- /bin/bash
-
In your shell, go to
/data/redis
, and then create a file:root@redis:/data# cd /data/redis/ root@redis:/data/redis# echo Hello > test-file
-
In your shell, list the running processes:
root@redis:/data/redis# apt-get update root@redis:/data/redis# apt-get install procps root@redis:/data/redis# ps aux
The output is similar to this:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND redis 1 0.1 0.1 33308 3828 ? Ssl 00:46 0:00 redis-server *:6379 root 12 0.0 0.0 20228 3020 ? Ss 00:47 0:00 /bin/bash root 15 0.0 0.0 17500 2072 ? R+ 00:48 0:00 ps aux
-
In your shell, kill the Redis process:
root@redis:/data/redis# kill <pid>
where
<pid>
is the Redis process ID (PID). -
In your original terminal, watch for changes to the Redis Pod. Eventually, you will see something like this:
NAME READY STATUS RESTARTS AGE redis 1/1 Running 0 13s redis 0/1 Completed 0 6m redis 1/1 Running 1 6m
At this point, the Container has terminated and restarted. This is because the
Redis Pod has a
restartPolicy
of Always
.
-
Get a shell into the restarted Container:
kubectl exec -it redis -- /bin/bash
-
In your shell, go to
/data/redis
, and verify thattest-file
is still there.root@redis:/data/redis# cd /data/redis/ root@redis:/data/redis# ls test-file
-
Delete the Pod that you created for this exercise:
kubectl delete pod redis
What's next
-
See Volume.
-
See Pod.
-
In addition to the local disk storage provided by
emptyDir
, Kubernetes supports many different network-attached storage solutions, including PD on GCE and EBS on EC2, which are preferred for critical data and will handle details such as mounting and unmounting the devices on the nodes. See Volumes for more details.
10 - Configure a Pod to Use a PersistentVolume for Storage
This page shows you how to configure a Pod to use a PersistentVolumeClaim for storage. Here is a summary of the process:
-
You, as cluster administrator, create a PersistentVolume backed by physical storage. You do not associate the volume with any Pod.
-
You, now taking the role of a developer / cluster user, create a PersistentVolumeClaim that is automatically bound to a suitable PersistentVolume.
-
You create a Pod that uses the above PersistentVolumeClaim for storage.
Before you begin
-
You need to have a Kubernetes cluster that has only one Node, and the kubectl command-line tool must be configured to communicate with your cluster. If you do not already have a single-node cluster, you can create one by using Minikube.
-
Familiarize yourself with the material in Persistent Volumes.
Create an index.html file on your Node
Open a shell to the single Node in your cluster. How you open a shell depends
on how you set up your cluster. For example, if you are using Minikube, you
can open a shell to your Node by entering minikube ssh
.
In your shell on that Node, create a /mnt/data
directory:
# This assumes that your Node uses "sudo" to run commands
# as the superuser
sudo mkdir /mnt/data
In the /mnt/data
directory, create an index.html
file:
# This again assumes that your Node uses "sudo" to run commands
# as the superuser
sudo sh -c "echo 'Hello from Kubernetes storage' > /mnt/data/index.html"
Note:
If your Node uses a tool for superuser access other thansudo
, you can
usually make this work if you replace sudo
with the name of the other tool.Test that the index.html
file exists:
cat /mnt/data/index.html
The output should be:
Hello from Kubernetes storage
You can now close the shell to your Node.
Create a PersistentVolume
In this exercise, you create a hostPath PersistentVolume. Kubernetes supports hostPath for development and testing on a single-node cluster. A hostPath PersistentVolume uses a file or directory on the Node to emulate network-attached storage.
In a production cluster, you would not use hostPath. Instead a cluster administrator would provision a network resource like a Google Compute Engine persistent disk, an NFS share, or an Amazon Elastic Block Store volume. Cluster administrators can also use StorageClasses to set up dynamic provisioning.
Here is the configuration file for the hostPath PersistentVolume:
apiVersion: v1
kind: PersistentVolume
metadata:
name: task-pv-volume
labels:
type: local
spec:
storageClassName: manual
capacity:
storage: 10Gi
accessModes:
- ReadWriteOnce
hostPath:
path: "/mnt/data"
The configuration file specifies that the volume is at /mnt/data
on the
cluster's Node. The configuration also specifies a size of 10 gibibytes and
an access mode of ReadWriteOnce
, which means the volume can be mounted as
read-write by a single Node. It defines the StorageClass name
manual
for the PersistentVolume, which will be used to bind
PersistentVolumeClaim requests to this PersistentVolume.
Note:
This example uses theReadWriteOnce
access mode, for simplicity. For
production use, the Kubernetes project recommends using the ReadWriteOncePod
access mode instead.Create the PersistentVolume:
kubectl apply -f https://k8s.io/examples/pods/storage/pv-volume.yaml
View information about the PersistentVolume:
kubectl get pv task-pv-volume
The output shows that the PersistentVolume has a STATUS
of Available
. This
means it has not yet been bound to a PersistentVolumeClaim.
NAME CAPACITY ACCESSMODES RECLAIMPOLICY STATUS CLAIM STORAGECLASS REASON AGE
task-pv-volume 10Gi RWO Retain Available manual 4s
Create a PersistentVolumeClaim
The next step is to create a PersistentVolumeClaim. Pods use PersistentVolumeClaims to request physical storage. In this exercise, you create a PersistentVolumeClaim that requests a volume of at least three gibibytes that can provide read-write access for at most one Node at a time.
Here is the configuration file for the PersistentVolumeClaim:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: task-pv-claim
spec:
storageClassName: manual
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 3Gi
Create the PersistentVolumeClaim:
kubectl apply -f https://k8s.io/examples/pods/storage/pv-claim.yaml
After you create the PersistentVolumeClaim, the Kubernetes control plane looks for a PersistentVolume that satisfies the claim's requirements. If the control plane finds a suitable PersistentVolume with the same StorageClass, it binds the claim to the volume.
Look again at the PersistentVolume:
kubectl get pv task-pv-volume
Now the output shows a STATUS
of Bound
.
NAME CAPACITY ACCESSMODES RECLAIMPOLICY STATUS CLAIM STORAGECLASS REASON AGE
task-pv-volume 10Gi RWO Retain Bound default/task-pv-claim manual 2m
Look at the PersistentVolumeClaim:
kubectl get pvc task-pv-claim
The output shows that the PersistentVolumeClaim is bound to your PersistentVolume,
task-pv-volume
.
NAME STATUS VOLUME CAPACITY ACCESSMODES STORAGECLASS AGE
task-pv-claim Bound task-pv-volume 10Gi RWO manual 30s
Create a Pod
The next step is to create a Pod that uses your PersistentVolumeClaim as a volume.
Here is the configuration file for the Pod:
apiVersion: v1
kind: Pod
metadata:
name: task-pv-pod
spec:
volumes:
- name: task-pv-storage
persistentVolumeClaim:
claimName: task-pv-claim
containers:
- name: task-pv-container
image: nginx
ports:
- containerPort: 80
name: "http-server"
volumeMounts:
- mountPath: "/usr/share/nginx/html"
name: task-pv-storage
Notice that the Pod's configuration file specifies a PersistentVolumeClaim, but it does not specify a PersistentVolume. From the Pod's point of view, the claim is a volume.
Create the Pod:
kubectl apply -f https://k8s.io/examples/pods/storage/pv-pod.yaml
Verify that the container in the Pod is running;
kubectl get pod task-pv-pod
Get a shell to the container running in your Pod:
kubectl exec -it task-pv-pod -- /bin/bash
In your shell, verify that nginx is serving the index.html
file from the
hostPath volume:
# Be sure to run these 3 commands inside the root shell that comes from
# running "kubectl exec" in the previous step
apt update
apt install curl
curl http://localhost/
The output shows the text that you wrote to the index.html
file on the
hostPath volume:
Hello from Kubernetes storage
If you see that message, you have successfully configured a Pod to use storage from a PersistentVolumeClaim.
Clean up
Delete the Pod, the PersistentVolumeClaim and the PersistentVolume:
kubectl delete pod task-pv-pod
kubectl delete pvc task-pv-claim
kubectl delete pv task-pv-volume
If you don't already have a shell open to the Node in your cluster, open a new shell the same way that you did earlier.
In the shell on your Node, remove the file and directory that you created:
# This assumes that your Node uses "sudo" to run commands
# as the superuser
sudo rm /mnt/data/index.html
sudo rmdir /mnt/data
You can now close the shell to your Node.
Mounting the same persistentVolume in two places
apiVersion: v1
kind: Pod
metadata:
name: test
spec:
containers:
- name: test
image: nginx
volumeMounts:
# a mount for site-data
- name: config
mountPath: /usr/share/nginx/html
subPath: html
# another mount for nginx config
- name: config
mountPath: /etc/nginx/nginx.conf
subPath: nginx.conf
volumes:
- name: config
persistentVolumeClaim:
claimName: test-nfs-claim
You can perform 2 volume mounts on your nginx container:
/usr/share/nginx/html
for the static website/etc/nginx/nginx.conf
for the default config
Access control
Storage configured with a group ID (GID) allows writing only by Pods using the same GID. Mismatched or missing GIDs cause permission denied errors. To reduce the need for coordination with users, an administrator can annotate a PersistentVolume with a GID. Then the GID is automatically added to any Pod that uses the PersistentVolume.
Use the pv.beta.kubernetes.io/gid
annotation as follows:
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv1
annotations:
pv.beta.kubernetes.io/gid: "1234"
When a Pod consumes a PersistentVolume that has a GID annotation, the annotated GID is applied to all containers in the Pod in the same way that GIDs specified in the Pod's security context are. Every GID, whether it originates from a PersistentVolume annotation or the Pod's specification, is applied to the first process run in each container.
Note:
When a Pod consumes a PersistentVolume, the GIDs associated with the PersistentVolume are not present on the Pod resource itself.What's next
- Learn more about PersistentVolumes.
- Read the Persistent Storage design document.
Reference
11 - Configure a Pod to Use a Projected Volume for Storage
This page shows how to use a projected
Volume to mount
several existing volume sources into the same directory. Currently, secret
, configMap
, downwardAPI
,
and serviceAccountToken
volumes can be projected.
Note:
serviceAccountToken
is not a volume type.Before you begin
You need to have a Kubernetes cluster, and the kubectl command-line tool must be configured to communicate with your cluster. It is recommended to run this tutorial on a cluster with at least two nodes that are not acting as control plane hosts. If you do not already have a cluster, you can create one by using minikube or you can use one of these Kubernetes playgrounds:
To check the version, enterkubectl version
.
Configure a projected volume for a pod
In this exercise, you create username and password Secrets from local files. You then create a Pod that runs one container, using a projected
Volume to mount the Secrets into the same shared directory.
Here is the configuration file for the Pod:
apiVersion: v1
kind: Pod
metadata:
name: test-projected-volume
spec:
containers:
- name: test-projected-volume
image: busybox:1.28
args:
- sleep
- "86400"
volumeMounts:
- name: all-in-one
mountPath: "/projected-volume"
readOnly: true
volumes:
- name: all-in-one
projected:
sources:
- secret:
name: user
- secret:
name: pass
-
Create the Secrets:
# Create files containing the username and password: echo -n "admin" > ./username.txt echo -n "1f2d1e2e67df" > ./password.txt # Package these files into secrets: kubectl create secret generic user --from-file=./username.txt kubectl create secret generic pass --from-file=./password.txt
-
Create the Pod:
kubectl apply -f https://k8s.io/examples/pods/storage/projected.yaml
-
Verify that the Pod's container is running, and then watch for changes to the Pod:
kubectl get --watch pod test-projected-volume
The output looks like this:
NAME READY STATUS RESTARTS AGE test-projected-volume 1/1 Running 0 14s
-
In another terminal, get a shell to the running container:
kubectl exec -it test-projected-volume -- /bin/sh
-
In your shell, verify that the
projected-volume
directory contains your projected sources:ls /projected-volume/
Clean up
Delete the Pod and the Secrets:
kubectl delete pod test-projected-volume
kubectl delete secret user pass
What's next
- Learn more about
projected
volumes. - Read the all-in-one volume design document.
12 - Configure a Security Context for a Pod or Container
A security context defines privilege and access control settings for a Pod or Container. Security context settings include, but are not limited to:
-
Discretionary Access Control: Permission to access an object, like a file, is based on user ID (UID) and group ID (GID).
-
Security Enhanced Linux (SELinux): Objects are assigned security labels.
-
Running as privileged or unprivileged.
-
Linux Capabilities: Give a process some privileges, but not all the privileges of the root user.
-
AppArmor: Use program profiles to restrict the capabilities of individual programs.
-
Seccomp: Filter a process's system calls.
-
allowPrivilegeEscalation
: Controls whether a process can gain more privileges than its parent process. This bool directly controls whether theno_new_privs
flag gets set on the container process.allowPrivilegeEscalation
is always true when the container:- is run as privileged, or
- has
CAP_SYS_ADMIN
-
readOnlyRootFilesystem
: Mounts the container's root filesystem as read-only.
The above bullets are not a complete set of security context settings -- please see SecurityContext for a comprehensive list.
Before you begin
You need to have a Kubernetes cluster, and the kubectl command-line tool must be configured to communicate with your cluster. It is recommended to run this tutorial on a cluster with at least two nodes that are not acting as control plane hosts. If you do not already have a cluster, you can create one by using minikube or you can use one of these Kubernetes playgrounds:
To check the version, enterkubectl version
.
Set the security context for a Pod
To specify security settings for a Pod, include the securityContext
field
in the Pod specification. The securityContext
field is a
PodSecurityContext object.
The security settings that you specify for a Pod apply to all Containers in the Pod.
Here is a configuration file for a Pod that has a securityContext
and an emptyDir
volume:
apiVersion: v1
kind: Pod
metadata:
name: security-context-demo
spec:
securityContext:
runAsUser: 1000
runAsGroup: 3000
fsGroup: 2000
volumes:
- name: sec-ctx-vol
emptyDir: {}
containers:
- name: sec-ctx-demo
image: busybox:1.28
command: [ "sh", "-c", "sleep 1h" ]
volumeMounts:
- name: sec-ctx-vol
mountPath: /data/demo
securityContext:
allowPrivilegeEscalation: false
In the configuration file, the runAsUser
field specifies that for any Containers in
the Pod, all processes run with user ID 1000. The runAsGroup
field specifies the primary group ID of 3000 for
all processes within any containers of the Pod. If this field is omitted, the primary group ID of the containers
will be root(0). Any files created will also be owned by user 1000 and group 3000 when runAsGroup
is specified.
Since fsGroup
field is specified, all processes of the container are also part of the supplementary group ID 2000.
The owner for volume /data/demo
and any files created in that volume will be Group ID 2000.
Create the Pod:
kubectl apply -f https://k8s.io/examples/pods/security/security-context.yaml
Verify that the Pod's Container is running:
kubectl get pod security-context-demo
Get a shell to the running Container:
kubectl exec -it security-context-demo -- sh
In your shell, list the running processes:
ps
The output shows that the processes are running as user 1000, which is the value of runAsUser
:
PID USER TIME COMMAND
1 1000 0:00 sleep 1h
6 1000 0:00 sh
...
In your shell, navigate to /data
, and list the one directory:
cd /data
ls -l
The output shows that the /data/demo
directory has group ID 2000, which is
the value of fsGroup
.
drwxrwsrwx 2 root 2000 4096 Jun 6 20:08 demo
In your shell, navigate to /data/demo
, and create a file:
cd demo
echo hello > testfile
List the file in the /data/demo
directory:
ls -l
The output shows that testfile
has group ID 2000, which is the value of fsGroup
.
-rw-r--r-- 1 1000 2000 6 Jun 6 20:08 testfile
Run the following command:
id
The output is similar to this:
uid=1000 gid=3000 groups=2000
From the output, you can see that gid
is 3000 which is same as the runAsGroup
field.
If the runAsGroup
was omitted, the gid
would remain as 0 (root) and the process will
be able to interact with files that are owned by the root(0) group and groups that have
the required group permissions for the root (0) group.
Exit your shell:
exit
Configure volume permission and ownership change policy for Pods
Kubernetes v1.23 [stable]
By default, Kubernetes recursively changes ownership and permissions for the contents of each
volume to match the fsGroup
specified in a Pod's securityContext
when that volume is
mounted.
For large volumes, checking and changing ownership and permissions can take a lot of time,
slowing Pod startup. You can use the fsGroupChangePolicy
field inside a securityContext
to control the way that Kubernetes checks and manages ownership and permissions
for a volume.
fsGroupChangePolicy - fsGroupChangePolicy
defines behavior for changing ownership
and permission of the volume before being exposed inside a Pod.
This field only applies to volume types that support fsGroup
controlled ownership and permissions.
This field has two possible values:
- OnRootMismatch: Only change permissions and ownership if the permission and the ownership of root directory does not match with expected permissions of the volume. This could help shorten the time it takes to change ownership and permission of a volume.
- Always: Always change permission and ownership of the volume when volume is mounted.
For example:
securityContext:
runAsUser: 1000
runAsGroup: 3000
fsGroup: 2000
fsGroupChangePolicy: "OnRootMismatch"
Delegating volume permission and ownership change to CSI driver
Kubernetes v1.26 [stable]
If you deploy a Container Storage Interface (CSI)
driver which supports the VOLUME_MOUNT_GROUP
NodeServiceCapability
, the
process of setting file ownership and permissions based on the
fsGroup
specified in the securityContext
will be performed by the CSI driver
instead of Kubernetes. In this case, since Kubernetes doesn't perform any
ownership and permission change, fsGroupChangePolicy
does not take effect, and
as specified by CSI, the driver is expected to mount the volume with the
provided fsGroup
, resulting in a volume that is readable/writable by the
fsGroup
.
Set the security context for a Container
To specify security settings for a Container, include the securityContext
field
in the Container manifest. The securityContext
field is a
SecurityContext object.
Security settings that you specify for a Container apply only to
the individual Container, and they override settings made at the Pod level when
there is overlap. Container settings do not affect the Pod's Volumes.
Here is the configuration file for a Pod that has one Container. Both the Pod
and the Container have a securityContext
field:
apiVersion: v1
kind: Pod
metadata:
name: security-context-demo-2
spec:
securityContext:
runAsUser: 1000
containers:
- name: sec-ctx-demo-2
image: gcr.io/google-samples/hello-app:2.0
securityContext:
runAsUser: 2000
allowPrivilegeEscalation: false
Create the Pod:
kubectl apply -f https://k8s.io/examples/pods/security/security-context-2.yaml
Verify that the Pod's Container is running:
kubectl get pod security-context-demo-2
Get a shell into the running Container:
kubectl exec -it security-context-demo-2 -- sh
In your shell, list the running processes:
ps aux
The output shows that the processes are running as user 2000. This is the value
of runAsUser
specified for the Container. It overrides the value 1000 that is
specified for the Pod.
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
2000 1 0.0 0.0 4336 764 ? Ss 20:36 0:00 /bin/sh -c node server.js
2000 8 0.1 0.5 772124 22604 ? Sl 20:36 0:00 node server.js
...
Exit your shell:
exit
Set capabilities for a Container
With Linux capabilities,
you can grant certain privileges to a process without granting all the privileges
of the root user. To add or remove Linux capabilities for a Container, include the
capabilities
field in the securityContext
section of the Container manifest.
First, see what happens when you don't include a capabilities
field.
Here is configuration file that does not add or remove any Container capabilities:
apiVersion: v1
kind: Pod
metadata:
name: security-context-demo-3
spec:
containers:
- name: sec-ctx-3
image: gcr.io/google-samples/hello-app:2.0
Create the Pod:
kubectl apply -f https://k8s.io/examples/pods/security/security-context-3.yaml
Verify that the Pod's Container is running:
kubectl get pod security-context-demo-3
Get a shell into the running Container:
kubectl exec -it security-context-demo-3 -- sh
In your shell, list the running processes:
ps aux
The output shows the process IDs (PIDs) for the Container:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 4336 796 ? Ss 18:17 0:00 /bin/sh -c node server.js
root 5 0.1 0.5 772124 22700 ? Sl 18:17 0:00 node server.js
In your shell, view the status for process 1:
cd /proc/1
cat status
The output shows the capabilities bitmap for the process:
...
CapPrm: 00000000a80425fb
CapEff: 00000000a80425fb
...
Make a note of the capabilities bitmap, and then exit your shell:
exit
Next, run a Container that is the same as the preceding container, except that it has additional capabilities set.
Here is the configuration file for a Pod that runs one Container. The configuration
adds the CAP_NET_ADMIN
and CAP_SYS_TIME
capabilities:
apiVersion: v1
kind: Pod
metadata:
name: security-context-demo-4
spec:
containers:
- name: sec-ctx-4
image: gcr.io/google-samples/hello-app:2.0
securityContext:
capabilities:
add: ["NET_ADMIN", "SYS_TIME"]
Create the Pod:
kubectl apply -f https://k8s.io/examples/pods/security/security-context-4.yaml
Get a shell into the running Container:
kubectl exec -it security-context-demo-4 -- sh
In your shell, view the capabilities for process 1:
cd /proc/1
cat status
The output shows capabilities bitmap for the process:
...
CapPrm: 00000000aa0435fb
CapEff: 00000000aa0435fb
...
Compare the capabilities of the two Containers:
00000000a80425fb
00000000aa0435fb
In the capability bitmap of the first container, bits 12 and 25 are clear. In the second container,
bits 12 and 25 are set. Bit 12 is CAP_NET_ADMIN
, and bit 25 is CAP_SYS_TIME
.
See capability.h
for definitions of the capability constants.
Note:
Linux capability constants have the formCAP_XXX
.
But when you list capabilities in your container manifest, you must
omit the CAP_
portion of the constant.
For example, to add CAP_SYS_TIME
, include SYS_TIME
in your list of capabilities.Set the Seccomp Profile for a Container
To set the Seccomp profile for a Container, include the seccompProfile
field
in the securityContext
section of your Pod or Container manifest. The
seccompProfile
field is a
SeccompProfile object consisting of type
and localhostProfile
.
Valid options for type
include RuntimeDefault
, Unconfined
, and
Localhost
. localhostProfile
must only be set if type: Localhost
. It
indicates the path of the pre-configured profile on the node, relative to the
kubelet's configured Seccomp profile location (configured with the --root-dir
flag).
Here is an example that sets the Seccomp profile to the node's container runtime default profile:
...
securityContext:
seccompProfile:
type: RuntimeDefault
Here is an example that sets the Seccomp profile to a pre-configured file at
<kubelet-root-dir>/seccomp/my-profiles/profile-allow.json
:
...
securityContext:
seccompProfile:
type: Localhost
localhostProfile: my-profiles/profile-allow.json
Assign SELinux labels to a Container
To assign SELinux labels to a Container, include the seLinuxOptions
field in
the securityContext
section of your Pod or Container manifest. The
seLinuxOptions
field is an
SELinuxOptions
object. Here's an example that applies an SELinux level:
...
securityContext:
seLinuxOptions:
level: "s0:c123,c456"
Note:
To assign SELinux labels, the SELinux security module must be loaded on the host operating system.Efficient SELinux volume relabeling
Kubernetes v1.28 [beta]
Note:
Kubernetes v1.27 introduced an early limited form of this behavior that was only applicable
to volumes (and PersistentVolumeClaims) using the ReadWriteOncePod
access mode.
As an alpha feature, you can enable the SELinuxMount
feature gate to widen that
performance improvement to other kinds of PersistentVolumeClaims, as explained in detail
below.
By default, the container runtime recursively assigns SELinux label to all
files on all Pod volumes. To speed up this process, Kubernetes can change the
SELinux label of a volume instantly by using a mount option
-o context=<label>
.
To benefit from this speedup, all these conditions must be met:
- The feature gates
ReadWriteOncePod
andSELinuxMountReadWriteOncePod
must be enabled. - Pod must use PersistentVolumeClaim with applicable
accessModes
and feature gates:- Either the volume has
accessModes: ["ReadWriteOncePod"]
, and feature gateSELinuxMountReadWriteOncePod
is enabled. - Or the volume can use any other access modes and both feature gates
SELinuxMountReadWriteOncePod
andSELinuxMount
must be enabled.
- Either the volume has
- Pod (or all its Containers that use the PersistentVolumeClaim) must
have
seLinuxOptions
set. - The corresponding PersistentVolume must be either:
- A volume that uses the legacy in-tree
iscsi
,rbd
orfc
volume type. - Or a volume that uses a CSI driver.
The CSI driver must announce that it supports mounting with
-o context
by settingspec.seLinuxMount: true
in its CSIDriver instance.
- A volume that uses the legacy in-tree
For any other volume types, SELinux relabelling happens another way: the container runtime recursively changes the SELinux label for all inodes (files and directories) in the volume. The more files and directories in the volume, the longer that relabelling takes.
Managing access to the /proc
filesystem
Kubernetes v1.12 [alpha]
For runtimes that follow the OCI runtime specification, containers default to running in a mode where there are multiple paths that are both masked and read-only. The result of this is the container has these paths present inside the container's mount namespace, and they can function similarly to if the container was an isolated host, but the container process cannot write to them. The list of masked and read-only paths are as follows:
-
Masked Paths:
/proc/asound
/proc/acpi
/proc/kcore
/proc/keys
/proc/latency_stats
/proc/timer_list
/proc/timer_stats
/proc/sched_debug
/proc/scsi
/sys/firmware
-
Read-Only Paths:
/proc/bus
/proc/fs
/proc/irq
/proc/sys
/proc/sysrq-trigger
For some Pods, you might want to bypass that default masking of paths. The most common context for wanting this is if you are trying to run containers within a Kubernetes container (within a pod).
The securityContext
field procMount
allows a user to request a container's /proc
be Unmasked
, or be mounted as read-write by the container process. This also
applies to /sys/firmware
which is not in /proc
.
...
securityContext:
procMount: Unmasked
Note:
SettingprocMount
to Unmasked requires the spec.hostUsers
value in the pod
spec to be false
. In other words: a container that wishes to have an Unmasked
/proc
or unmasked /sys
must also be in a
user namespace.
Kubernetes v1.12 to v1.29 did not enforce that requirement.Discussion
The security context for a Pod applies to the Pod's Containers and also to
the Pod's Volumes when applicable. Specifically fsGroup
and seLinuxOptions
are
applied to Volumes as follows:
-
fsGroup
: Volumes that support ownership management are modified to be owned and writable by the GID specified infsGroup
. See the Ownership Management design document for more details. -
seLinuxOptions
: Volumes that support SELinux labeling are relabeled to be accessible by the label specified underseLinuxOptions
. Usually you only need to set thelevel
section. This sets the Multi-Category Security (MCS) label given to all Containers in the Pod as well as the Volumes.
Warning:
After you specify an MCS label for a Pod, all Pods with the same label can access the Volume. If you need inter-Pod protection, you must assign a unique MCS label to each Pod.Clean up
Delete the Pod:
kubectl delete pod security-context-demo
kubectl delete pod security-context-demo-2
kubectl delete pod security-context-demo-3
kubectl delete pod security-context-demo-4
What's next
- PodSecurityContext
- SecurityContext
- CRI Plugin Config Guide
- Security Contexts design document
- Ownership Management design document
- PodSecurity Admission
- AllowPrivilegeEscalation design document
- For more information about security mechanisms in Linux, see Overview of Linux Kernel Security Features (Note: Some information is out of date)
- Read about User Namespaces for Linux pods.
- Masked Paths in the OCI Runtime Specification
13 - Configure Service Accounts for Pods
Kubernetes offers two distinct ways for clients that run within your cluster, or that otherwise have a relationship to your cluster's control plane to authenticate to the API server.
A service account provides an identity for processes that run in a Pod, and maps to a ServiceAccount object. When you authenticate to the API server, you identify yourself as a particular user. Kubernetes recognises the concept of a user, however, Kubernetes itself does not have a User API.
This task guide is about ServiceAccounts, which do exist in the Kubernetes API. The guide shows you some ways to configure ServiceAccounts for Pods.
Before you begin
You need to have a Kubernetes cluster, and the kubectl command-line tool must be configured to communicate with your cluster. It is recommended to run this tutorial on a cluster with at least two nodes that are not acting as control plane hosts. If you do not already have a cluster, you can create one by using minikube or you can use one of these Kubernetes playgrounds:
Use the default service account to access the API server
When Pods contact the API server, Pods authenticate as a particular
ServiceAccount (for example, default
). There is always at least one
ServiceAccount in each namespace.
Every Kubernetes namespace contains at least one ServiceAccount: the default
ServiceAccount for that namespace, named default
.
If you do not specify a ServiceAccount when you create a Pod, Kubernetes
automatically assigns the ServiceAccount named default
in that namespace.
You can fetch the details for a Pod you have created. For example:
kubectl get pods/<podname> -o yaml
In the output, you see a field spec.serviceAccountName
.
Kubernetes automatically
sets that value if you don't specify it when you create a Pod.
An application running inside a Pod can access the Kubernetes API using automatically mounted service account credentials. See accessing the Cluster to learn more.
When a Pod authenticates as a ServiceAccount, its level of access depends on the authorization plugin and policy in use.
Opt out of API credential automounting
If you don't want the kubelet
to automatically mount a ServiceAccount's API credentials, you can opt out of
the default behavior.
You can opt out of automounting API credentials on /var/run/secrets/kubernetes.io/serviceaccount/token
for a service account by setting automountServiceAccountToken: false
on the ServiceAccount:
For example:
apiVersion: v1
kind: ServiceAccount
metadata:
name: build-robot
automountServiceAccountToken: false
...
You can also opt out of automounting API credentials for a particular Pod:
apiVersion: v1
kind: Pod
metadata:
name: my-pod
spec:
serviceAccountName: build-robot
automountServiceAccountToken: false
...
If both the ServiceAccount and the Pod's .spec
specify a value for
automountServiceAccountToken
, the Pod spec takes precedence.
Use more than one ServiceAccount
Every namespace has at least one ServiceAccount: the default ServiceAccount
resource, called default
. You can list all ServiceAccount resources in your
current namespace
with:
kubectl get serviceaccounts
The output is similar to this:
NAME SECRETS AGE
default 1 1d
You can create additional ServiceAccount objects like this:
kubectl apply -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
name: build-robot
EOF
The name of a ServiceAccount object must be a valid DNS subdomain name.
If you get a complete dump of the service account object, like this:
kubectl get serviceaccounts/build-robot -o yaml
The output is similar to this:
apiVersion: v1
kind: ServiceAccount
metadata:
creationTimestamp: 2019-06-16T00:12:34Z
name: build-robot
namespace: default
resourceVersion: "272500"
uid: 721ab723-13bc-11e5-aec2-42010af0021e
You can use authorization plugins to set permissions on service accounts.
To use a non-default service account, set the spec.serviceAccountName
field of a Pod to the name of the ServiceAccount you wish to use.
You can only set the serviceAccountName
field when creating a Pod, or in a
template for a new Pod. You cannot update the .spec.serviceAccountName
field
of a Pod that already exists.
Note:
The.spec.serviceAccount
field is a deprecated alias for .spec.serviceAccountName
.
If you want to remove the fields from a workload resource, set both fields to empty explicitly
on the pod template.Cleanup
If you tried creating build-robot
ServiceAccount from the example above,
you can clean it up by running:
kubectl delete serviceaccount/build-robot
Manually create an API token for a ServiceAccount
Suppose you have an existing service account named "build-robot" as mentioned earlier.
You can get a time-limited API token for that ServiceAccount using kubectl
:
kubectl create token build-robot
The output from that command is a token that you can use to authenticate as that
ServiceAccount. You can request a specific token duration using the --duration
command line argument to kubectl create token
(the actual duration of the issued
token might be shorter, or could even be longer).
When the ServiceAccountTokenNodeBinding
and ServiceAccountTokenNodeBindingValidation
features are enabled and the KUBECTL_NODE_BOUND_TOKENS
environment variable is set to true
,
it is possible to create a service account token that is directly bound to a Node
:
KUBECTL_NODE_BOUND_TOKENS=true kubectl create token build-robot --bound-object-kind Node --bound-object-name node-001 --bound-object-uid 123...456
The token will be valid until it expires or either the associated Node
or service account are deleted.
Note:
Versions of Kubernetes before v1.22 automatically created long term credentials for accessing the Kubernetes API. This older mechanism was based on creating token Secrets that could then be mounted into running Pods. In more recent versions, including Kubernetes v1.30, API credentials are obtained directly by using the TokenRequest API, and are mounted into Pods using a projected volume. The tokens obtained using this method have bounded lifetimes, and are automatically invalidated when the Pod they are mounted into is deleted.
You can still manually create a service account token Secret; for example, if you need a token that never expires. However, using the TokenRequest subresource to obtain a token to access the API is recommended instead.
Manually create a long-lived API token for a ServiceAccount
If you want to obtain an API token for a ServiceAccount, you create a new Secret
with a special annotation, kubernetes.io/service-account.name
.
kubectl apply -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
name: build-robot-secret
annotations:
kubernetes.io/service-account.name: build-robot
type: kubernetes.io/service-account-token
EOF
If you view the Secret using:
kubectl get secret/build-robot-secret -o yaml
you can see that the Secret now contains an API token for the "build-robot" ServiceAccount.
Because of the annotation you set, the control plane automatically generates a token for that ServiceAccounts, and stores them into the associated Secret. The control plane also cleans up tokens for deleted ServiceAccounts.
kubectl describe secrets/build-robot-secret
The output is similar to this:
Name: build-robot-secret
Namespace: default
Labels: <none>
Annotations: kubernetes.io/service-account.name: build-robot
kubernetes.io/service-account.uid: da68f9c6-9d26-11e7-b84e-002dc52800da
Type: kubernetes.io/service-account-token
Data
====
ca.crt: 1338 bytes
namespace: 7 bytes
token: ...
Note:
The content of token
is omitted here.
Take care not to display the contents of a kubernetes.io/service-account-token
Secret somewhere that your terminal / computer screen could be seen by an onlooker.
When you delete a ServiceAccount that has an associated Secret, the Kubernetes control plane automatically cleans up the long-lived token from that Secret.
Note:
If you view the ServiceAccount using:
kubectl get serviceaccount build-robot -o yaml
You can't see the build-robot-secret
Secret in the ServiceAccount API objects
.secrets
field
because that field is only populated with auto-generated Secrets.
Add ImagePullSecrets to a service account
First, create an imagePullSecret. Next, verify it has been created. For example:
-
Create an imagePullSecret, as described in Specifying ImagePullSecrets on a Pod.
kubectl create secret docker-registry myregistrykey --docker-server=<registry name> \ --docker-username=DUMMY_USERNAME --docker-password=DUMMY_DOCKER_PASSWORD \ --docker-email=DUMMY_DOCKER_EMAIL
-
Verify it has been created.
kubectl get secrets myregistrykey
The output is similar to this:
NAME TYPE DATA AGE myregistrykey kubernetes.io/.dockerconfigjson 1 1d
Add image pull secret to service account
Next, modify the default service account for the namespace to use this Secret as an imagePullSecret.
kubectl patch serviceaccount default -p '{"imagePullSecrets": [{"name": "myregistrykey"}]}'
You can achieve the same outcome by editing the object manually:
kubectl edit serviceaccount/default
The output of the sa.yaml
file is similar to this:
Your selected text editor will open with a configuration looking something like this:
apiVersion: v1
kind: ServiceAccount
metadata:
creationTimestamp: 2021-07-07T22:02:39Z
name: default
namespace: default
resourceVersion: "243024"
uid: 052fb0f4-3d50-11e5-b066-42010af0d7b6
Using your editor, delete the line with key resourceVersion
, add lines for
imagePullSecrets:
and save it. Leave the uid
value set the same as you found it.
After you made those changes, the edited ServiceAccount looks something like this:
apiVersion: v1
kind: ServiceAccount
metadata:
creationTimestamp: 2021-07-07T22:02:39Z
name: default
namespace: default
uid: 052fb0f4-3d50-11e5-b066-42010af0d7b6
imagePullSecrets:
- name: myregistrykey
Verify that imagePullSecrets are set for new Pods
Now, when a new Pod is created in the current namespace and using the default
ServiceAccount, the new Pod has its spec.imagePullSecrets
field set automatically:
kubectl run nginx --image=<registry name>/nginx --restart=Never
kubectl get pod nginx -o=jsonpath='{.spec.imagePullSecrets[0].name}{"\n"}'
The output is:
myregistrykey
ServiceAccount token volume projection
Kubernetes v1.20 [stable]
Note:
To enable and use token request projection, you must specify each of the following
command line arguments to kube-apiserver
:
--service-account-issuer
- defines the Identifier of the service account token issuer. You can specify the
--service-account-issuer
argument multiple times, this can be useful to enable a non-disruptive change of the issuer. When this flag is specified multiple times, the first is used to generate tokens and all are used to determine which issuers are accepted. You must be running Kubernetes v1.22 or later to be able to specify--service-account-issuer
multiple times. --service-account-key-file
- specifies the path to a file containing PEM-encoded X.509 private or public keys (RSA or ECDSA), used to verify ServiceAccount tokens. The specified file can contain multiple keys, and the flag can be specified multiple times with different files. If specified multiple times, tokens signed by any of the specified keys are considered valid by the Kubernetes API server.
--service-account-signing-key-file
- specifies the path to a file that contains the current private key of the service account token issuer. The issuer signs issued ID tokens with this private key.
--api-audiences
(can be omitted)- defines audiences for ServiceAccount tokens. The service account token authenticator
validates that tokens used against the API are bound to at least one of these audiences.
If
api-audiences
is specified multiple times, tokens for any of the specified audiences are considered valid by the Kubernetes API server. If you specify the--service-account-issuer
command line argument but you don't set--api-audiences
, the control plane defaults to a single element audience list that contains only the issuer URL.
The kubelet can also project a ServiceAccount token into a Pod. You can specify desired properties of the token, such as the audience and the validity duration. These properties are not configurable on the default ServiceAccount token. The token will also become invalid against the API when either the Pod or the ServiceAccount is deleted.
You can configure this behavior for the spec
of a Pod using a
projected volume type called
ServiceAccountToken
.
The token from this projected volume is a JSON Web Token (JWT). The JSON payload of this token follows a well defined schema - an example payload for a pod bound token:
{
"aud": [ # matches the requested audiences, or the API server's default audiences when none are explicitly requested
"https://kubernetes.default.svc"
],
"exp": 1731613413,
"iat": 1700077413,
"iss": "https://kubernetes.default.svc", # matches the first value passed to the --service-account-issuer flag
"jti": "ea28ed49-2e11-4280-9ec5-bc3d1d84661a", # ServiceAccountTokenJTI feature must be enabled for the claim to be present
"kubernetes.io": {
"namespace": "kube-system",
"node": { # ServiceAccountTokenPodNodeInfo feature must be enabled for the API server to add this node reference claim
"name": "127.0.0.1",
"uid": "58456cb0-dd00-45ed-b797-5578fdceaced"
},
"pod": {
"name": "coredns-69cbfb9798-jv9gn",
"uid": "778a530c-b3f4-47c0-9cd5-ab018fb64f33"
},
"serviceaccount": {
"name": "coredns",
"uid": "a087d5a0-e1dd-43ec-93ac-f13d89cd13af"
},
"warnafter": 1700081020
},
"nbf": 1700077413,
"sub": "system:serviceaccount:kube-system:coredns"
}
Launch a Pod using service account token projection
To provide a Pod with a token with an audience of vault
and a validity duration
of two hours, you could define a Pod manifest that is similar to:
apiVersion: v1
kind: Pod
metadata:
name: nginx
spec:
containers:
- image: nginx
name: nginx
volumeMounts:
- mountPath: /var/run/secrets/tokens
name: vault-token
serviceAccountName: build-robot
volumes:
- name: vault-token
projected:
sources:
- serviceAccountToken:
path: vault-token
expirationSeconds: 7200
audience: vault
Create the Pod:
kubectl create -f https://k8s.io/examples/pods/pod-projected-svc-token.yaml
The kubelet will: request and store the token on behalf of the Pod; make the token available to the Pod at a configurable file path; and refresh the token as it approaches expiration. The kubelet proactively requests rotation for the token if it is older than 80% of its total time-to-live (TTL), or if the token is older than 24 hours.
The application is responsible for reloading the token when it rotates. It's often good enough for the application to load the token on a schedule (for example: once every 5 minutes), without tracking the actual expiry time.
Service account issuer discovery
Kubernetes v1.21 [stable]
If you have enabled token projection for ServiceAccounts in your cluster, then you can also make use of the discovery feature. Kubernetes provides a way for clients to federate as an identity provider, so that one or more external systems can act as a relying party.
Note:
The issuer URL must comply with the
OIDC Discovery Spec. In
practice, this means it must use the https
scheme, and should serve an OpenID
provider configuration at {service-account-issuer}/.well-known/openid-configuration
.
If the URL does not comply, ServiceAccount issuer discovery endpoints are not registered or accessible.
When enabled, the Kubernetes API server publishes an OpenID Provider
Configuration document via HTTP. The configuration document is published at
/.well-known/openid-configuration
.
The OpenID Provider Configuration is sometimes referred to as the discovery document.
The Kubernetes API server publishes the related
JSON Web Key Set (JWKS), also via HTTP, at /openid/v1/jwks
.
Note:
The responses served at/.well-known/openid-configuration
and
/openid/v1/jwks
are designed to be OIDC compatible, but not strictly OIDC
compliant. Those documents contain only the parameters necessary to perform
validation of Kubernetes service account tokens.Clusters that use RBAC include a
default ClusterRole called system:service-account-issuer-discovery
.
A default ClusterRoleBinding assigns this role to the system:serviceaccounts
group,
which all ServiceAccounts implicitly belong to.
This allows pods running on the cluster to access the service account discovery document
via their mounted service account token. Administrators may, additionally, choose to
bind the role to system:authenticated
or system:unauthenticated
depending on their
security requirements and which external systems they intend to federate with.
The JWKS response contains public keys that a relying party can use to validate
the Kubernetes service account tokens. Relying parties first query for the
OpenID Provider Configuration, and use the jwks_uri
field in the response to
find the JWKS.
In many cases, Kubernetes API servers are not available on the public internet,
but public endpoints that serve cached responses from the API server can be made
available by users or by service providers. In these cases, it is possible to
override the jwks_uri
in the OpenID Provider Configuration so that it points
to the public endpoint, rather than the API server's address, by passing the
--service-account-jwks-uri
flag to the API server. Like the issuer URL, the
JWKS URI is required to use the https
scheme.
What's next
See also:
- Read the Cluster Admin Guide to Service Accounts
- Read about Authorization in Kubernetes
- Read about Secrets
- or learn to distribute credentials securely using Secrets
- but also bear in mind that using Secrets for authenticating as a ServiceAccount is deprecated. The recommended alternative is ServiceAccount token volume projection.
- Read about projected volumes.
- For background on OIDC discovery, read the ServiceAccount signing key retrieval Kubernetes Enhancement Proposal
- Read the OIDC Discovery Spec
14 - Pull an Image from a Private Registry
This page shows how to create a Pod that uses a Secret to pull an image from a private container image registry or repository. There are many private registries in use. This task uses Docker Hub as an example registry.
Before you begin
-
You need to have a Kubernetes cluster, and the kubectl command-line tool must be configured to communicate with your cluster. It is recommended to run this tutorial on a cluster with at least two nodes that are not acting as control plane hosts. If you do not already have a cluster, you can create one by using minikube or you can use one of these Kubernetes playgrounds:
-
To do this exercise, you need the
docker
command line tool, and a Docker ID for which you know the password. -
If you are using a different private container registry, you need the command line tool for that registry and any login information for the registry.
Log in to Docker Hub
On your laptop, you must authenticate with a registry in order to pull a private image.
Use the docker
tool to log in to Docker Hub. See the log in section of
Docker ID accounts for more information.
docker login
When prompted, enter your Docker ID, and then the credential you want to use (access token, or the password for your Docker ID).
The login process creates or updates a config.json
file that holds an authorization token.
Review how Kubernetes interprets this file.
View the config.json
file:
cat ~/.docker/config.json
The output contains a section similar to this:
{
"auths": {
"https://index.docker.io/v1/": {
"auth": "c3R...zE2"
}
}
}
Note:
If you use a Docker credentials store, you won't see thatauth
entry but a credsStore
entry with the name of the store as value.
In that case, you can create a secret directly.
See Create a Secret by providing credentials on the command line.Create a Secret based on existing credentials
A Kubernetes cluster uses the Secret of kubernetes.io/dockerconfigjson
type to authenticate with
a container registry to pull a private image.
If you already ran docker login
, you can copy
that credential into Kubernetes:
kubectl create secret generic regcred \
--from-file=.dockerconfigjson=<path/to/.docker/config.json> \
--type=kubernetes.io/dockerconfigjson
If you need more control (for example, to set a namespace or a label on the new secret) then you can customise the Secret before storing it. Be sure to:
- set the name of the data item to
.dockerconfigjson
- base64 encode the Docker configuration file and then paste that string, unbroken
as the value for field
data[".dockerconfigjson"]
- set
type
tokubernetes.io/dockerconfigjson
Example:
apiVersion: v1
kind: Secret
metadata:
name: myregistrykey
namespace: awesomeapps
data:
.dockerconfigjson: UmVhbGx5IHJlYWxseSByZWVlZWVlZWVlZWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWxsbGxsbGxsbGxsbGxsbGxsbGxsbGxsbGxsbGxsbGx5eXl5eXl5eXl5eXl5eXl5eXl5eSBsbGxsbGxsbGxsbGxsbG9vb29vb29vb29vb29vb29vb29vb29vb29vb25ubm5ubm5ubm5ubm5ubm5ubm5ubm5ubmdnZ2dnZ2dnZ2dnZ2dnZ2dnZ2cgYXV0aCBrZXlzCg==
type: kubernetes.io/dockerconfigjson
If you get the error message error: no objects passed to create
, it may mean the base64 encoded string is invalid.
If you get an error message like Secret "myregistrykey" is invalid: data[.dockerconfigjson]: invalid value ...
, it means
the base64 encoded string in the data was successfully decoded, but could not be parsed as a .docker/config.json
file.
Create a Secret by providing credentials on the command line
Create this Secret, naming it regcred
:
kubectl create secret docker-registry regcred --docker-server=<your-registry-server> --docker-username=<your-name> --docker-password=<your-pword> --docker-email=<your-email>
where:
<your-registry-server>
is your Private Docker Registry FQDN. Usehttps://index.docker.io/v1/
for DockerHub.<your-name>
is your Docker username.<your-pword>
is your Docker password.<your-email>
is your Docker email.
You have successfully set your Docker credentials in the cluster as a Secret called regcred
.
Note:
Typing secrets on the command line may store them in your shell history unprotected, and those secrets might also be visible to other users on your PC during the time thatkubectl
is running.Inspecting the Secret regcred
To understand the contents of the regcred
Secret you created, start by viewing the Secret in YAML format:
kubectl get secret regcred --output=yaml
The output is similar to this:
apiVersion: v1
kind: Secret
metadata:
...
name: regcred
...
data:
.dockerconfigjson: eyJodHRwczovL2luZGV4L ... J0QUl6RTIifX0=
type: kubernetes.io/dockerconfigjson
The value of the .dockerconfigjson
field is a base64 representation of your Docker credentials.
To understand what is in the .dockerconfigjson
field, convert the secret data to a
readable format:
kubectl get secret regcred --output="jsonpath={.data.\.dockerconfigjson}" | base64 --decode
The output is similar to this:
{"auths":{"your.private.registry.example.com":{"username":"janedoe","password":"xxxxxxxxxxx","email":"jdoe@example.com","auth":"c3R...zE2"}}}
To understand what is in the auth
field, convert the base64-encoded data to a readable format:
echo "c3R...zE2" | base64 --decode
The output, username and password concatenated with a :
, is similar to this:
janedoe:xxxxxxxxxxx
Notice that the Secret data contains the authorization token similar to your local ~/.docker/config.json
file.
You have successfully set your Docker credentials as a Secret called regcred
in the cluster.
Create a Pod that uses your Secret
Here is a manifest for an example Pod that needs access to your Docker credentials in regcred
:
apiVersion: v1
kind: Pod
metadata:
name: private-reg
spec:
containers:
- name: private-reg-container
image: <your-private-image>
imagePullSecrets:
- name: regcred
Download the above file onto your computer:
curl -L -o my-private-reg-pod.yaml https://k8s.io/examples/pods/private-reg-pod.yaml
In file my-private-reg-pod.yaml
, replace <your-private-image>
with the path to an image in a private registry such as:
your.private.registry.example.com/janedoe/jdoe-private:v1
To pull the image from the private registry, Kubernetes needs credentials.
The imagePullSecrets
field in the configuration file specifies that
Kubernetes should get the credentials from a Secret named regcred
.
Create a Pod that uses your Secret, and verify that the Pod is running:
kubectl apply -f my-private-reg-pod.yaml
kubectl get pod private-reg
Note:
To use image pull secrets for a Pod (or a Deployment, or other object that has a pod template that you are using), you need to make sure that the appropriate Secret does exist in the right namespace. The namespace to use is the same namespace where you defined the Pod.Also, in case the Pod fails to start with the status ImagePullBackOff
, view the Pod events:
kubectl describe pod private-reg
If you then see an event with the reason set to FailedToRetrieveImagePullSecret
,
Kubernetes can't find a Secret with name (regcred
, in this example).
If you specify that a Pod needs image pull credentials, the kubelet checks that it can
access that Secret before attempting to pull the image.
Make sure that the Secret you have specified exists, and that its name is spelled properly.
Events:
... Reason ... Message
------ -------
... FailedToRetrieveImagePullSecret ... Unable to retrieve some image pull secrets (<regcred>); attempting to pull the image may not succeed.
What's next
- Learn more about Secrets
- or read the API reference for Secret
- Learn more about using a private registry.
- Learn more about adding image pull secrets to a service account.
- See kubectl create secret docker-registry.
- See the
imagePullSecrets
field within the container definitions of a Pod
15 - Configure Liveness, Readiness and Startup Probes
This page shows how to configure liveness, readiness and startup probes for containers.
For more information about probes, see Liveness, Readiness and Startup Probes
The kubelet uses liveness probes to know when to restart a container. For example, liveness probes could catch a deadlock, where an application is running, but unable to make progress. Restarting a container in such a state can help to make the application more available despite bugs.
A common pattern for liveness probes is to use the same low-cost HTTP endpoint as for readiness probes, but with a higher failureThreshold. This ensures that the pod is observed as not-ready for some period of time before it is hard killed.
The kubelet uses readiness probes to know when a container is ready to start accepting traffic. A Pod is considered ready when all of its containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.
The kubelet uses startup probes to know when a container application has started. If such a probe is configured, liveness and readiness probes do not start until it succeeds, making sure those probes don't interfere with the application startup. This can be used to adopt liveness checks on slow starting containers, avoiding them getting killed by the kubelet before they are up and running.
Caution:
Liveness probes can be a powerful way to recover from application failures, but they should be used with caution. Liveness probes must be configured carefully to ensure that they truly indicate unrecoverable application failure, for example a deadlock.Note:
Incorrect implementation of liveness probes can lead to cascading failures. This results in restarting of container under high load; failed client requests as your application became less scalable; and increased workload on remaining pods due to some failed pods. Understand the difference between readiness and liveness probes and when to apply them for your app.Before you begin
You need to have a Kubernetes cluster, and the kubectl command-line tool must be configured to communicate with your cluster. It is recommended to run this tutorial on a cluster with at least two nodes that are not acting as control plane hosts. If you do not already have a cluster, you can create one by using minikube or you can use one of these Kubernetes playgrounds:
Define a liveness command
Many applications running for long periods of time eventually transition to broken states, and cannot recover except by being restarted. Kubernetes provides liveness probes to detect and remedy such situations.
In this exercise, you create a Pod that runs a container based on the
registry.k8s.io/busybox
image. Here is the configuration file for the Pod:
apiVersion: v1
kind: Pod
metadata:
labels:
test: liveness
name: liveness-exec
spec:
containers:
- name: liveness
image: registry.k8s.io/busybox
args:
- /bin/sh
- -c
- touch /tmp/healthy; sleep 30; rm -f /tmp/healthy; sleep 600
livenessProbe:
exec:
command:
- cat
- /tmp/healthy
initialDelaySeconds: 5
periodSeconds: 5
In the configuration file, you can see that the Pod has a single Container
.
The periodSeconds
field specifies that the kubelet should perform a liveness
probe every 5 seconds. The initialDelaySeconds
field tells the kubelet that it
should wait 5 seconds before performing the first probe. To perform a probe, the
kubelet executes the command cat /tmp/healthy
in the target container. If the
command succeeds, it returns 0, and the kubelet considers the container to be alive and
healthy. If the command returns a non-zero value, the kubelet kills the container
and restarts it.
When the container starts, it executes this command:
/bin/sh -c "touch /tmp/healthy; sleep 30; rm -f /tmp/healthy; sleep 600"
For the first 30 seconds of the container's life, there is a /tmp/healthy
file.
So during the first 30 seconds, the command cat /tmp/healthy
returns a success
code. After 30 seconds, cat /tmp/healthy
returns a failure code.
Create the Pod:
kubectl apply -f https://k8s.io/examples/pods/probe/exec-liveness.yaml
Within 30 seconds, view the Pod events:
kubectl describe pod liveness-exec
The output indicates that no liveness probes have failed yet:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 11s default-scheduler Successfully assigned default/liveness-exec to node01
Normal Pulling 9s kubelet, node01 Pulling image "registry.k8s.io/busybox"
Normal Pulled 7s kubelet, node01 Successfully pulled image "registry.k8s.io/busybox"
Normal Created 7s kubelet, node01 Created container liveness
Normal Started 7s kubelet, node01 Started container liveness
After 35 seconds, view the Pod events again:
kubectl describe pod liveness-exec
At the bottom of the output, there are messages indicating that the liveness probes have failed, and the failed containers have been killed and recreated.
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 57s default-scheduler Successfully assigned default/liveness-exec to node01
Normal Pulling 55s kubelet, node01 Pulling image "registry.k8s.io/busybox"
Normal Pulled 53s kubelet, node01 Successfully pulled image "registry.k8s.io/busybox"
Normal Created 53s kubelet, node01 Created container liveness
Normal Started 53s kubelet, node01 Started container liveness
Warning Unhealthy 10s (x3 over 20s) kubelet, node01 Liveness probe failed: cat: can't open '/tmp/healthy': No such file or directory
Normal Killing 10s kubelet, node01 Container liveness failed liveness probe, will be restarted
Wait another 30 seconds, and verify that the container has been restarted:
kubectl get pod liveness-exec
The output shows that RESTARTS
has been incremented. Note that the RESTARTS
counter
increments as soon as a failed container comes back to the running state:
NAME READY STATUS RESTARTS AGE
liveness-exec 1/1 Running 1 1m
Define a liveness HTTP request
Another kind of liveness probe uses an HTTP GET request. Here is the configuration
file for a Pod that runs a container based on the registry.k8s.io/e2e-test-images/agnhost
image.
apiVersion: v1
kind: Pod
metadata:
labels:
test: liveness
name: liveness-http
spec:
containers:
- name: liveness
image: registry.k8s.io/e2e-test-images/agnhost:2.40
args:
- liveness
livenessProbe:
httpGet:
path: /healthz
port: 8080
httpHeaders:
- name: Custom-Header
value: Awesome
initialDelaySeconds: 3
periodSeconds: 3
In the configuration file, you can see that the Pod has a single container.
The periodSeconds
field specifies that the kubelet should perform a liveness
probe every 3 seconds. The initialDelaySeconds
field tells the kubelet that it
should wait 3 seconds before performing the first probe. To perform a probe, the
kubelet sends an HTTP GET request to the server that is running in the container
and listening on port 8080. If the handler for the server's /healthz
path
returns a success code, the kubelet considers the container to be alive and
healthy. If the handler returns a failure code, the kubelet kills the container
and restarts it.
Any code greater than or equal to 200 and less than 400 indicates success. Any other code indicates failure.
You can see the source code for the server in server.go.
For the first 10 seconds that the container is alive, the /healthz
handler
returns a status of 200. After that, the handler returns a status of 500.
http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
duration := time.Now().Sub(started)
if duration.Seconds() > 10 {
w.WriteHeader(500)
w.Write([]byte(fmt.Sprintf("error: %v", duration.Seconds())))
} else {
w.WriteHeader(200)
w.Write([]byte("ok"))
}
})
The kubelet starts performing health checks 3 seconds after the container starts. So the first couple of health checks will succeed. But after 10 seconds, the health checks will fail, and the kubelet will kill and restart the container.
To try the HTTP liveness check, create a Pod:
kubectl apply -f https://k8s.io/examples/pods/probe/http-liveness.yaml
After 10 seconds, view Pod events to verify that liveness probes have failed and the container has been restarted:
kubectl describe pod liveness-http
In releases after v1.13, local HTTP proxy environment variable settings do not affect the HTTP liveness probe.
Define a TCP liveness probe
A third type of liveness probe uses a TCP socket. With this configuration, the kubelet will attempt to open a socket to your container on the specified port. If it can establish a connection, the container is considered healthy, if it can't it is considered a failure.
apiVersion: v1
kind: Pod
metadata:
name: goproxy
labels:
app: goproxy
spec:
containers:
- name: goproxy
image: registry.k8s.io/goproxy:0.1
ports:
- containerPort: 8080
readinessProbe:
tcpSocket:
port: 8080
initialDelaySeconds: 15
periodSeconds: 10
livenessProbe:
tcpSocket:
port: 8080
initialDelaySeconds: 15
periodSeconds: 10
As you can see, configuration for a TCP check is quite similar to an HTTP check.
This example uses both readiness and liveness probes. The kubelet will send the
first readiness probe 15 seconds after the container starts. This will attempt to
connect to the goproxy
container on port 8080. If the probe succeeds, the Pod
will be marked as ready. The kubelet will continue to run this check every 10
seconds.
In addition to the readiness probe, this configuration includes a liveness probe.
The kubelet will run the first liveness probe 15 seconds after the container
starts. Similar to the readiness probe, this will attempt to connect to the
goproxy
container on port 8080. If the liveness probe fails, the container
will be restarted.
To try the TCP liveness check, create a Pod:
kubectl apply -f https://k8s.io/examples/pods/probe/tcp-liveness-readiness.yaml
After 15 seconds, view Pod events to verify that liveness probes:
kubectl describe pod goproxy
Define a gRPC liveness probe
Kubernetes v1.27 [stable]
If your application implements the gRPC Health Checking Protocol, this example shows how to configure Kubernetes to use it for application liveness checks. Similarly you can configure readiness and startup probes.
Here is an example manifest:
apiVersion: v1
kind: Pod
metadata:
name: etcd-with-grpc
spec:
containers:
- name: etcd
image: registry.k8s.io/etcd:3.5.1-0
command: [ "/usr/local/bin/etcd", "--data-dir", "/var/lib/etcd", "--listen-client-urls", "http://0.0.0.0:2379", "--advertise-client-urls", "http://127.0.0.1:2379", "--log-level", "debug"]
ports:
- containerPort: 2379
livenessProbe:
grpc:
port: 2379
initialDelaySeconds: 10
To use a gRPC probe, port
must be configured. If you want to distinguish probes of different types
and probes for different features you can use the service
field.
You can set service
to the value liveness
and make your gRPC Health Checking endpoint
respond to this request differently than when you set service
set to readiness
.
This lets you use the same endpoint for different kinds of container health check
rather than listening on two different ports.
If you want to specify your own custom service name and also specify a probe type,
the Kubernetes project recommends that you use a name that concatenates
those. For example: myservice-liveness
(using -
as a separator).
Note:
Unlike HTTP or TCP probes, you cannot specify the health check port by name, and you cannot configure a custom hostname.Configuration problems (for example: incorrect port or service, unimplemented health checking protocol) are considered a probe failure, similar to HTTP and TCP probes.
To try the gRPC liveness check, create a Pod using the command below. In the example below, the etcd pod is configured to use gRPC liveness probe.
kubectl apply -f https://k8s.io/examples/pods/probe/grpc-liveness.yaml
After 15 seconds, view Pod events to verify that the liveness check has not failed:
kubectl describe pod etcd-with-grpc
When using a gRPC probe, there are some technical details to be aware of:
- The probes run against the pod IP address or its hostname. Be sure to configure your gRPC endpoint to listen on the Pod's IP address.
- The probes do not support any authentication parameters (like
-tls
). - There are no error codes for built-in probes. All errors are considered as probe failures.
- If
ExecProbeTimeout
feature gate is set tofalse
, grpc-health-probe does not respect thetimeoutSeconds
setting (which defaults to 1s), while built-in probe would fail on timeout.
Use a named port
You can use a named port
for HTTP and TCP probes. gRPC probes do not support named ports.
For example:
ports:
- name: liveness-port
containerPort: 8080
livenessProbe:
httpGet:
path: /healthz
port: liveness-port
Protect slow starting containers with startup probes
Sometimes, you have to deal with legacy applications that might require
an additional startup time on their first initialization.
In such cases, it can be tricky to set up liveness probe parameters without
compromising the fast response to deadlocks that motivated such a probe.
The trick is to set up a startup probe with the same command, HTTP or TCP
check, with a failureThreshold * periodSeconds
long enough to cover the
worst case startup time.
So, the previous example would become:
ports:
- name: liveness-port
containerPort: 8080
livenessProbe:
httpGet:
path: /healthz
port: liveness-port
failureThreshold: 1
periodSeconds: 10
startupProbe:
httpGet:
path: /healthz
port: liveness-port
failureThreshold: 30
periodSeconds: 10
Thanks to the startup probe, the application will have a maximum of 5 minutes
(30 * 10 = 300s) to finish its startup.
Once the startup probe has succeeded once, the liveness probe takes over to
provide a fast response to container deadlocks.
If the startup probe never succeeds, the container is killed after 300s and
subject to the pod's restartPolicy
.
Define readiness probes
Sometimes, applications are temporarily unable to serve traffic. For example, an application might need to load large data or configuration files during startup, or depend on external services after startup. In such cases, you don't want to kill the application, but you don't want to send it requests either. Kubernetes provides readiness probes to detect and mitigate these situations. A pod with containers reporting that they are not ready does not receive traffic through Kubernetes Services.
Note:
Readiness probes runs on the container during its whole lifecycle.Caution:
The readiness and liveness probes do not depend on each other to succeed. If you want to wait before executing a readiness probe, you should useinitialDelaySeconds
or a startupProbe
.Readiness probes are configured similarly to liveness probes. The only difference
is that you use the readinessProbe
field instead of the livenessProbe
field.
readinessProbe:
exec:
command:
- cat
- /tmp/healthy
initialDelaySeconds: 5
periodSeconds: 5
Configuration for HTTP and TCP readiness probes also remains identical to liveness probes.
Readiness and liveness probes can be used in parallel for the same container. Using both can ensure that traffic does not reach a container that is not ready for it, and that containers are restarted when they fail.
Configure Probes
Probes have a number of fields that you can use to more precisely control the behavior of startup, liveness and readiness checks:
initialDelaySeconds
: Number of seconds after the container has started before startup, liveness or readiness probes are initiated. If a startup probe is defined, liveness and readiness probe delays do not begin until the startup probe has succeeded. If the value ofperiodSeconds
is greater thaninitialDelaySeconds
then theinitialDelaySeconds
would be ignored. Defaults to 0 seconds. Minimum value is 0.periodSeconds
: How often (in seconds) to perform the probe. Default to 10 seconds. The minimum value is 1.timeoutSeconds
: Number of seconds after which the probe times out. Defaults to 1 second. Minimum value is 1.successThreshold
: Minimum consecutive successes for the probe to be considered successful after having failed. Defaults to 1. Must be 1 for liveness and startup Probes. Minimum value is 1.failureThreshold
: After a probe failsfailureThreshold
times in a row, Kubernetes considers that the overall check has failed: the container is not ready/healthy/live. Defaults to 3. Minimum value is 1. For the case of a startup or liveness probe, if at leastfailureThreshold
probes have failed, Kubernetes treats the container as unhealthy and triggers a restart for that specific container. The kubelet honors the setting ofterminationGracePeriodSeconds
for that container. For a failed readiness probe, the kubelet continues running the container that failed checks, and also continues to run more probes; because the check failed, the kubelet sets theReady
condition on the Pod tofalse
.terminationGracePeriodSeconds
: configure a grace period for the kubelet to wait between triggering a shut down of the failed container, and then forcing the container runtime to stop that container. The default is to inherit the Pod-level value forterminationGracePeriodSeconds
(30 seconds if not specified), and the minimum value is 1. See probe-levelterminationGracePeriodSeconds
for more detail.
Caution:
Incorrect implementation of readiness probes may result in an ever growing number of processes in the container, and resource starvation if this is left unchecked.HTTP probes
HTTP probes
have additional fields that can be set on httpGet
:
host
: Host name to connect to, defaults to the pod IP. You probably want to set "Host" inhttpHeaders
instead.scheme
: Scheme to use for connecting to the host (HTTP or HTTPS). Defaults to "HTTP".path
: Path to access on the HTTP server. Defaults to "/".httpHeaders
: Custom headers to set in the request. HTTP allows repeated headers.port
: Name or number of the port to access on the container. Number must be in the range 1 to 65535.
For an HTTP probe, the kubelet sends an HTTP request to the specified port and
path to perform the check. The kubelet sends the probe to the Pod's IP address,
unless the address is overridden by the optional host
field in httpGet
. If
scheme
field is set to HTTPS
, the kubelet sends an HTTPS request skipping the
certificate verification. In most scenarios, you do not want to set the host
field.
Here's one scenario where you would set it. Suppose the container listens on 127.0.0.1
and the Pod's hostNetwork
field is true. Then host
, under httpGet
, should be set
to 127.0.0.1. If your pod relies on virtual hosts, which is probably the more common
case, you should not use host
, but rather set the Host
header in httpHeaders
.
For an HTTP probe, the kubelet sends two request headers in addition to the mandatory Host
header:
User-Agent
: The default value iskube-probe/1.30
, where1.30
is the version of the kubelet.Accept
: The default value is*/*
.
You can override the default headers by defining httpHeaders
for the probe.
For example:
livenessProbe:
httpGet:
httpHeaders:
- name: Accept
value: application/json
startupProbe:
httpGet:
httpHeaders:
- name: User-Agent
value: MyUserAgent
You can also remove these two headers by defining them with an empty value.
livenessProbe:
httpGet:
httpHeaders:
- name: Accept
value: ""
startupProbe:
httpGet:
httpHeaders:
- name: User-Agent
value: ""
Note:
When the kubelet probes a Pod using HTTP, it only follows redirects if the redirect
is to the same host. If the kubelet receives 11 or more redirects during probing, the probe is considered successful
and a related Event is created:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 29m default-scheduler Successfully assigned default/httpbin-7b8bc9cb85-bjzwn to daocloud
Normal Pulling 29m kubelet Pulling image "docker.io/kennethreitz/httpbin"
Normal Pulled 24m kubelet Successfully pulled image "docker.io/kennethreitz/httpbin" in 5m12.402735213s
Normal Created 24m kubelet Created container httpbin
Normal Started 24m kubelet Started container httpbin
Warning ProbeWarning 4m11s (x1197 over 24m) kubelet Readiness probe warning: Probe terminated redirects
If the kubelet receives a redirect where the hostname is different from the request, the outcome of the probe is treated as successful and kubelet creates an event to report the redirect failure.
TCP probes
For a TCP probe, the kubelet makes the probe connection at the node, not in the Pod, which
means that you can not use a service name in the host
parameter since the kubelet is unable
to resolve it.
Probe-level terminationGracePeriodSeconds
Kubernetes v1.28 [stable]
In 1.25 and above, users can specify a probe-level terminationGracePeriodSeconds
as part of the probe specification. When both a pod- and probe-level
terminationGracePeriodSeconds
are set, the kubelet will use the probe-level value.
When setting the terminationGracePeriodSeconds
, please note the following:
-
The kubelet always honors the probe-level
terminationGracePeriodSeconds
field if it is present on a Pod. -
If you have existing Pods where the
terminationGracePeriodSeconds
field is set and you no longer wish to use per-probe termination grace periods, you must delete those existing Pods.
For example:
spec:
terminationGracePeriodSeconds: 3600 # pod-level
containers:
- name: test
image: ...
ports:
- name: liveness-port
containerPort: 8080
livenessProbe:
httpGet:
path: /healthz
port: liveness-port
failureThreshold: 1
periodSeconds: 60
# Override pod-level terminationGracePeriodSeconds #
terminationGracePeriodSeconds: 60
Probe-level terminationGracePeriodSeconds
cannot be set for readiness probes.
It will be rejected by the API server.
What's next
- Learn more about Container Probes.
You can also read the API references for:
- Pod, and specifically:
16 - Assign Pods to Nodes
This page shows how to assign a Kubernetes Pod to a particular node in a Kubernetes cluster.
Before you begin
You need to have a Kubernetes cluster, and the kubectl command-line tool must be configured to communicate with your cluster. It is recommended to run this tutorial on a cluster with at least two nodes that are not acting as control plane hosts. If you do not already have a cluster, you can create one by using minikube or you can use one of these Kubernetes playgrounds:
To check the version, enterkubectl version
.
Add a label to a node
-
List the nodes in your cluster, along with their labels:
kubectl get nodes --show-labels
The output is similar to this:
NAME STATUS ROLES AGE VERSION LABELS worker0 Ready <none> 1d v1.13.0 ...,kubernetes.io/hostname=worker0 worker1 Ready <none> 1d v1.13.0 ...,kubernetes.io/hostname=worker1 worker2 Ready <none> 1d v1.13.0 ...,kubernetes.io/hostname=worker2
-
Choose one of your nodes, and add a label to it:
kubectl label nodes <your-node-name> disktype=ssd
where
<your-node-name>
is the name of your chosen node. -
Verify that your chosen node has a
disktype=ssd
label:kubectl get nodes --show-labels
The output is similar to this:
NAME STATUS ROLES AGE VERSION LABELS worker0 Ready <none> 1d v1.13.0 ...,disktype=ssd,kubernetes.io/hostname=worker0 worker1 Ready <none> 1d v1.13.0 ...,kubernetes.io/hostname=worker1 worker2 Ready <none> 1d v1.13.0 ...,kubernetes.io/hostname=worker2
In the preceding output, you can see that the
worker0
node has adisktype=ssd
label.
Create a pod that gets scheduled to your chosen node
This pod configuration file describes a pod that has a node selector,
disktype: ssd
. This means that the pod will get scheduled on a node that has
a disktype=ssd
label.
apiVersion: v1
kind: Pod
metadata:
name: nginx
labels:
env: test
spec:
containers:
- name: nginx
image: nginx
imagePullPolicy: IfNotPresent
nodeSelector:
disktype: ssd
-
Use the configuration file to create a pod that will get scheduled on your chosen node:
kubectl apply -f https://k8s.io/examples/pods/pod-nginx.yaml
-
Verify that the pod is running on your chosen node:
kubectl get pods --output=wide
The output is similar to this:
NAME READY STATUS RESTARTS AGE IP NODE nginx 1/1 Running 0 13s 10.200.0.4 worker0
Create a pod that gets scheduled to specific node
You can also schedule a pod to one specific node via setting nodeName
.
apiVersion: v1
kind: Pod
metadata:
name: nginx
spec:
nodeName: foo-node # schedule pod to specific node
containers:
- name: nginx
image: nginx
imagePullPolicy: IfNotPresent
Use the configuration file to create a pod that will get scheduled on foo-node
only.
What's next
- Learn more about labels and selectors.
- Learn more about nodes.
17 - Assign Pods to Nodes using Node Affinity
This page shows how to assign a Kubernetes Pod to a particular node using Node Affinity in a Kubernetes cluster.
Before you begin
You need to have a Kubernetes cluster, and the kubectl command-line tool must be configured to communicate with your cluster. It is recommended to run this tutorial on a cluster with at least two nodes that are not acting as control plane hosts. If you do not already have a cluster, you can create one by using minikube or you can use one of these Kubernetes playgrounds:
Your Kubernetes server must be at or later than version v1.10. To check the version, enterkubectl version
.
Add a label to a node
-
List the nodes in your cluster, along with their labels:
kubectl get nodes --show-labels
The output is similar to this:
NAME STATUS ROLES AGE VERSION LABELS worker0 Ready <none> 1d v1.13.0 ...,kubernetes.io/hostname=worker0 worker1 Ready <none> 1d v1.13.0 ...,kubernetes.io/hostname=worker1 worker2 Ready <none> 1d v1.13.0 ...,kubernetes.io/hostname=worker2
-
Choose one of your nodes, and add a label to it:
kubectl label nodes <your-node-name> disktype=ssd
where
<your-node-name>
is the name of your chosen node. -
Verify that your chosen node has a
disktype=ssd
label:kubectl get nodes --show-labels
The output is similar to this:
NAME STATUS ROLES AGE VERSION LABELS worker0 Ready <none> 1d v1.13.0 ...,disktype=ssd,kubernetes.io/hostname=worker0 worker1 Ready <none> 1d v1.13.0 ...,kubernetes.io/hostname=worker1 worker2 Ready <none> 1d v1.13.0 ...,kubernetes.io/hostname=worker2
In the preceding output, you can see that the
worker0
node has adisktype=ssd
label.
Schedule a Pod using required node affinity
This manifest describes a Pod that has a requiredDuringSchedulingIgnoredDuringExecution
node affinity,disktype: ssd
.
This means that the pod will get scheduled only on a node that has a disktype=ssd
label.
apiVersion: v1
kind: Pod
metadata:
name: nginx
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: disktype
operator: In
values:
- ssd
containers:
- name: nginx
image: nginx
imagePullPolicy: IfNotPresent
-
Apply the manifest to create a Pod that is scheduled onto your chosen node:
kubectl apply -f https://k8s.io/examples/pods/pod-nginx-required-affinity.yaml
-
Verify that the pod is running on your chosen node:
kubectl get pods --output=wide
The output is similar to this:
NAME READY STATUS RESTARTS AGE IP NODE nginx 1/1 Running 0 13s 10.200.0.4 worker0
Schedule a Pod using preferred node affinity
This manifest describes a Pod that has a preferredDuringSchedulingIgnoredDuringExecution
node affinity,disktype: ssd
.
This means that the pod will prefer a node that has a disktype=ssd
label.
apiVersion: v1
kind: Pod
metadata:
name: nginx
spec:
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
preference:
matchExpressions:
- key: disktype
operator: In
values:
- ssd
containers:
- name: nginx
image: nginx
imagePullPolicy: IfNotPresent
-
Apply the manifest to create a Pod that is scheduled onto your chosen node:
kubectl apply -f https://k8s.io/examples/pods/pod-nginx-preferred-affinity.yaml
-
Verify that the pod is running on your chosen node:
kubectl get pods --output=wide
The output is similar to this:
NAME READY STATUS RESTARTS AGE IP NODE nginx 1/1 Running 0 13s 10.200.0.4 worker0
What's next
Learn more about Node Affinity.
18 - Configure Pod Initialization
This page shows how to use an Init Container to initialize a Pod before an application Container runs.
Before you begin
You need to have a Kubernetes cluster, and the kubectl command-line tool must be configured to communicate with your cluster. It is recommended to run this tutorial on a cluster with at least two nodes that are not acting as control plane hosts. If you do not already have a cluster, you can create one by using minikube or you can use one of these Kubernetes playgrounds:
To check the version, enterkubectl version
.
Create a Pod that has an Init Container
In this exercise you create a Pod that has one application Container and one Init Container. The init container runs to completion before the application container starts.
Here is the configuration file for the Pod:
apiVersion: v1
kind: Pod
metadata:
name: init-demo
spec:
containers:
- name: nginx
image: nginx
ports:
- containerPort: 80
volumeMounts:
- name: workdir
mountPath: /usr/share/nginx/html
# These containers are run during pod initialization
initContainers:
- name: install
image: busybox:1.28
command:
- wget
- "-O"
- "/work-dir/index.html"
- http://info.cern.ch
volumeMounts:
- name: workdir
mountPath: "/work-dir"
dnsPolicy: Default
volumes:
- name: workdir
emptyDir: {}
In the configuration file, you can see that the Pod has a Volume that the init container and the application container share.
The init container mounts the
shared Volume at /work-dir
, and the application container mounts the shared
Volume at /usr/share/nginx/html
. The init container runs the following command
and then terminates:
wget -O /work-dir/index.html http://info.cern.ch
Notice that the init container writes the index.html
file in the root directory
of the nginx server.
Create the Pod:
kubectl apply -f https://k8s.io/examples/pods/init-containers.yaml
Verify that the nginx container is running:
kubectl get pod init-demo
The output shows that the nginx container is running:
NAME READY STATUS RESTARTS AGE
init-demo 1/1 Running 0 1m
Get a shell into the nginx container running in the init-demo Pod:
kubectl exec -it init-demo -- /bin/bash
In your shell, send a GET request to the nginx server:
root@nginx:~# apt-get update
root@nginx:~# apt-get install curl
root@nginx:~# curl localhost
The output shows that nginx is serving the web page that was written by the init container:
<html><head></head><body><header>
<title>http://info.cern.ch</title>
</header>
<h1>http://info.cern.ch - home of the first website</h1>
...
<li><a href="http://info.cern.ch/hypertext/WWW/TheProject.html">Browse the first website</a></li>
...
What's next
- Learn more about communicating between Containers running in the same Pod.
- Learn more about Init Containers.
- Learn more about Volumes.
- Learn more about Debugging Init Containers
19 - Attach Handlers to Container Lifecycle Events
This page shows how to attach handlers to Container lifecycle events. Kubernetes supports the postStart and preStop events. Kubernetes sends the postStart event immediately after a Container is started, and it sends the preStop event immediately before the Container is terminated. A Container may specify one handler per event.
Before you begin
You need to have a Kubernetes cluster, and the kubectl command-line tool must be configured to communicate with your cluster. It is recommended to run this tutorial on a cluster with at least two nodes that are not acting as control plane hosts. If you do not already have a cluster, you can create one by using minikube or you can use one of these Kubernetes playgrounds:
To check the version, enterkubectl version
.
Define postStart and preStop handlers
In this exercise, you create a Pod that has one Container. The Container has handlers for the postStart and preStop events.
Here is the configuration file for the Pod:
apiVersion: v1
kind: Pod
metadata:
name: lifecycle-demo
spec:
containers:
- name: lifecycle-demo-container
image: nginx
lifecycle:
postStart:
exec:
command: ["/bin/sh", "-c", "echo Hello from the postStart handler > /usr/share/message"]
preStop:
exec:
command: ["/bin/sh","-c","nginx -s quit; while killall -0 nginx; do sleep 1; done"]
In the configuration file, you can see that the postStart command writes a message
file to the Container's /usr/share
directory. The preStop command shuts down
nginx gracefully. This is helpful if the Container is being terminated because of a failure.
Create the Pod:
kubectl apply -f https://k8s.io/examples/pods/lifecycle-events.yaml
Verify that the Container in the Pod is running:
kubectl get pod lifecycle-demo
Get a shell into the Container running in your Pod:
kubectl exec -it lifecycle-demo -- /bin/bash
In your shell, verify that the postStart
handler created the message
file:
root@lifecycle-demo:/# cat /usr/share/message
The output shows the text written by the postStart handler:
Hello from the postStart handler
Discussion
Kubernetes sends the postStart event immediately after the Container is created. There is no guarantee, however, that the postStart handler is called before the Container's entrypoint is called. The postStart handler runs asynchronously relative to the Container's code, but Kubernetes' management of the container blocks until the postStart handler completes. The Container's status is not set to RUNNING until the postStart handler completes.
Kubernetes sends the preStop event immediately before the Container is terminated. Kubernetes' management of the Container blocks until the preStop handler completes, unless the Pod's grace period expires. For more details, see Pod Lifecycle.
Note:
Kubernetes only sends the preStop event when a Pod or a container in the Pod is terminated. This means that the preStop hook is not invoked when the Pod is completed. About this limitation, please see Container hooks for the detail.What's next
- Learn more about Container lifecycle hooks.
- Learn more about the lifecycle of a Pod.
Reference
20 - Configure a Pod to Use a ConfigMap
Many applications rely on configuration which is used during either application initialization or runtime. Most times, there is a requirement to adjust values assigned to configuration parameters. ConfigMaps are a Kubernetes mechanism that let you inject configuration data into application pods.
The ConfigMap concept allow you to decouple configuration artifacts from image content to keep containerized applications portable. For example, you can download and run the same container image to spin up containers for the purposes of local development, system test, or running a live end-user workload.
This page provides a series of usage examples demonstrating how to create ConfigMaps and configure Pods using data stored in ConfigMaps.
Before you begin
You need to have a Kubernetes cluster, and the kubectl command-line tool must be configured to communicate with your cluster. It is recommended to run this tutorial on a cluster with at least two nodes that are not acting as control plane hosts. If you do not already have a cluster, you can create one by using minikube or you can use one of these Kubernetes playgrounds:
You need to have the wget
tool installed. If you have a different tool
such as curl
, and you do not have wget
, you will need to adapt the
step that downloads example data.
Create a ConfigMap
You can use either kubectl create configmap
or a ConfigMap generator in kustomization.yaml
to create a ConfigMap.
Create a ConfigMap using kubectl create configmap
Use the kubectl create configmap
command to create ConfigMaps from
directories, files,
or literal values:
kubectl create configmap <map-name> <data-source>
where <map-name> is the name you want to assign to the ConfigMap and <data-source> is the directory, file, or literal value to draw the data from. The name of a ConfigMap object must be a valid DNS subdomain name.
When you are creating a ConfigMap based on a file, the key in the <data-source> defaults to the basename of the file, and the value defaults to the file content.
You can use kubectl describe
or
kubectl get
to retrieve information
about a ConfigMap.
Create a ConfigMap from a directory
You can use kubectl create configmap
to create a ConfigMap from multiple files in the same
directory. When you are creating a ConfigMap based on a directory, kubectl identifies files
whose filename is a valid key in the directory and packages each of those files into the new
ConfigMap. Any directory entries except regular files are ignored (for example: subdirectories,
symlinks, devices, pipes, and more).
Note:
Each filename being used for ConfigMap creation must consist of only acceptable characters,
which are: letters (A
to Z
and a
to z
), digits (0
to 9
), '-', '_', or '.'.
If you use kubectl create configmap
with a directory where any of the file names contains
an unacceptable character, the kubectl
command may fail.
The kubectl
command does not print an error when it encounters an invalid filename.
Create the local directory:
mkdir -p configure-pod-container/configmap/
Now, download the sample configuration and create the ConfigMap:
# Download the sample files into `configure-pod-container/configmap/` directory
wget https://kubernetes.io/examples/configmap/game.properties -O configure-pod-container/configmap/game.properties
wget https://kubernetes.io/examples/configmap/ui.properties -O configure-pod-container/configmap/ui.properties
# Create the ConfigMap
kubectl create configmap game-config --from-file=configure-pod-container/configmap/
The above command packages each file, in this case, game.properties
and ui.properties
in the configure-pod-container/configmap/
directory into the game-config ConfigMap. You can
display details of the ConfigMap using the following command:
kubectl describe configmaps game-config
The output is similar to this:
Name: game-config
Namespace: default
Labels: <none>
Annotations: <none>
Data
====
game.properties:
----
enemies=aliens
lives=3
enemies.cheat=true
enemies.cheat.level=noGoodRotten
secret.code.passphrase=UUDDLRLRBABAS
secret.code.allowed=true
secret.code.lives=30
ui.properties:
----
color.good=purple
color.bad=yellow
allow.textmode=true
how.nice.to.look=fairlyNice
The game.properties
and ui.properties
files in the configure-pod-container/configmap/
directory are represented in the data
section of the ConfigMap.
kubectl get configmaps game-config -o yaml
The output is similar to this:
apiVersion: v1
kind: ConfigMap
metadata:
creationTimestamp: 2022-02-18T18:52:05Z
name: game-config
namespace: default
resourceVersion: "516"
uid: b4952dc3-d670-11e5-8cd0-68f728db1985
data:
game.properties: |
enemies=aliens
lives=3
enemies.cheat=true
enemies.cheat.level=noGoodRotten
secret.code.passphrase=UUDDLRLRBABAS
secret.code.allowed=true
secret.code.lives=30
ui.properties: |
color.good=purple
color.bad=yellow
allow.textmode=true
how.nice.to.look=fairlyNice
Create ConfigMaps from files
You can use kubectl create configmap
to create a ConfigMap from an individual file, or from
multiple files.
For example,
kubectl create configmap game-config-2 --from-file=configure-pod-container/configmap/game.properties
would produce the following ConfigMap:
kubectl describe configmaps game-config-2
where the output is similar to this:
Name: game-config-2
Namespace: default
Labels: <none>
Annotations: <none>
Data
====
game.properties:
----
enemies=aliens
lives=3
enemies.cheat=true
enemies.cheat.level=noGoodRotten
secret.code.passphrase=UUDDLRLRBABAS
secret.code.allowed=true
secret.code.lives=30
You can pass in the --from-file
argument multiple times to create a ConfigMap from multiple
data sources.
kubectl create configmap game-config-2 --from-file=configure-pod-container/configmap/game.properties --from-file=configure-pod-container/configmap/ui.properties
You can display details of the game-config-2
ConfigMap using the following command:
kubectl describe configmaps game-config-2
The output is similar to this:
Name: game-config-2
Namespace: default
Labels: <none>
Annotations: <none>
Data
====
game.properties:
----
enemies=aliens
lives=3
enemies.cheat=true
enemies.cheat.level=noGoodRotten
secret.code.passphrase=UUDDLRLRBABAS
secret.code.allowed=true
secret.code.lives=30
ui.properties:
----
color.good=purple
color.bad=yellow
allow.textmode=true
how.nice.to.look=fairlyNice
Use the option --from-env-file
to create a ConfigMap from an env-file, for example:
# Env-files contain a list of environment variables.
# These syntax rules apply:
# Each line in an env file has to be in VAR=VAL format.
# Lines beginning with # (i.e. comments) are ignored.
# Blank lines are ignored.
# There is no special handling of quotation marks (i.e. they will be part of the ConfigMap value)).
# Download the sample files into `configure-pod-container/configmap/` directory
wget https://kubernetes.io/examples/configmap/game-env-file.properties -O configure-pod-container/configmap/game-env-file.properties
wget https://kubernetes.io/examples/configmap/ui-env-file.properties -O configure-pod-container/configmap/ui-env-file.properties
# The env-file `game-env-file.properties` looks like below
cat configure-pod-container/configmap/game-env-file.properties
enemies=aliens
lives=3
allowed="true"
# This comment and the empty line above it are ignored
kubectl create configmap game-config-env-file \
--from-env-file=configure-pod-container/configmap/game-env-file.properties
would produce a ConfigMap. View the ConfigMap:
kubectl get configmap game-config-env-file -o yaml
the output is similar to:
apiVersion: v1
kind: ConfigMap
metadata:
creationTimestamp: 2019-12-27T18:36:28Z
name: game-config-env-file
namespace: default
resourceVersion: "809965"
uid: d9d1ca5b-eb34-11e7-887b-42010a8002b8
data:
allowed: '"true"'
enemies: aliens
lives: "3"
Starting with Kubernetes v1.23, kubectl
supports the --from-env-file
argument to be
specified multiple times to create a ConfigMap from multiple data sources.
kubectl create configmap config-multi-env-files \
--from-env-file=configure-pod-container/configmap/game-env-file.properties \
--from-env-file=configure-pod-container/configmap/ui-env-file.properties
would produce the following ConfigMap:
kubectl get configmap config-multi-env-files -o yaml
where the output is similar to this:
apiVersion: v1
kind: ConfigMap
metadata:
creationTimestamp: 2019-12-27T18:38:34Z
name: config-multi-env-files
namespace: default
resourceVersion: "810136"
uid: 252c4572-eb35-11e7-887b-42010a8002b8
data:
allowed: '"true"'
color: purple
enemies: aliens
how: fairlyNice
lives: "3"
textmode: "true"
Define the key to use when creating a ConfigMap from a file
You can define a key other than the file name to use in the data
section of your ConfigMap
when using the --from-file
argument:
kubectl create configmap game-config-3 --from-file=<my-key-name>=<path-to-file>
where <my-key-name>
is the key you want to use in the ConfigMap and <path-to-file>
is the
location of the data source file you want the key to represent.
For example:
kubectl create configmap game-config-3 --from-file=game-special-key=configure-pod-container/configmap/game.properties
would produce the following ConfigMap:
kubectl get configmaps game-config-3 -o yaml
where the output is similar to this:
apiVersion: v1
kind: ConfigMap
metadata:
creationTimestamp: 2022-02-18T18:54:22Z
name: game-config-3
namespace: default
resourceVersion: "530"
uid: 05f8da22-d671-11e5-8cd0-68f728db1985
data:
game-special-key: |
enemies=aliens
lives=3
enemies.cheat=true
enemies.cheat.level=noGoodRotten
secret.code.passphrase=UUDDLRLRBABAS
secret.code.allowed=true
secret.code.lives=30
Create ConfigMaps from literal values
You can use kubectl create configmap
with the --from-literal
argument to define a literal
value from the command line:
kubectl create configmap special-config --from-literal=special.how=very --from-literal=special.type=charm
You can pass in multiple key-value pairs. Each pair provided on the command line is represented
as a separate entry in the data
section of the ConfigMap.
kubectl get configmaps special-config -o yaml
The output is similar to this:
apiVersion: v1
kind: ConfigMap
metadata:
creationTimestamp: 2022-02-18T19:14:38Z
name: special-config
namespace: default
resourceVersion: "651"
uid: dadce046-d673-11e5-8cd0-68f728db1985
data:
special.how: very
special.type: charm
Create a ConfigMap from generator
You can also create a ConfigMap from generators and then apply it to create the object
in the cluster's API server.
You should specify the generators in a kustomization.yaml
file within a directory.
Generate ConfigMaps from files
For example, to generate a ConfigMap from files configure-pod-container/configmap/game.properties
# Create a kustomization.yaml file with ConfigMapGenerator
cat <<EOF >./kustomization.yaml
configMapGenerator:
- name: game-config-4
options:
labels:
game-config: config-4
files:
- configure-pod-container/configmap/game.properties
EOF
Apply the kustomization directory to create the ConfigMap object:
kubectl apply -k .
configmap/game-config-4-m9dm2f92bt created
You can check that the ConfigMap was created like this:
kubectl get configmap
NAME DATA AGE
game-config-4-m9dm2f92bt 1 37s
and also:
kubectl describe configmaps/game-config-4-m9dm2f92bt
Name: game-config-4-m9dm2f92bt
Namespace: default
Labels: game-config=config-4
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"v1","data":{"game.properties":"enemies=aliens\nlives=3\nenemies.cheat=true\nenemies.cheat.level=noGoodRotten\nsecret.code.p...
Data
====
game.properties:
----
enemies=aliens
lives=3
enemies.cheat=true
enemies.cheat.level=noGoodRotten
secret.code.passphrase=UUDDLRLRBABAS
secret.code.allowed=true
secret.code.lives=30
Events: <none>
Notice that the generated ConfigMap name has a suffix appended by hashing the contents. This ensures that a new ConfigMap is generated each time the content is modified.
Define the key to use when generating a ConfigMap from a file
You can define a key other than the file name to use in the ConfigMap generator.
For example, to generate a ConfigMap from files configure-pod-container/configmap/game.properties
with the key game-special-key
# Create a kustomization.yaml file with ConfigMapGenerator
cat <<EOF >./kustomization.yaml
configMapGenerator:
- name: game-config-5
options:
labels:
game-config: config-5
files:
- game-special-key=configure-pod-container/configmap/game.properties
EOF
Apply the kustomization directory to create the ConfigMap object.
kubectl apply -k .
configmap/game-config-5-m67dt67794 created
Generate ConfigMaps from literals
This example shows you how to create a ConfigMap
from two literal key/value pairs:
special.type=charm
and special.how=very
, using Kustomize and kubectl. To achieve
this, you can specify the ConfigMap
generator. Create (or replace)
kustomization.yaml
so that it has the following contents:
---
# kustomization.yaml contents for creating a ConfigMap from literals
configMapGenerator:
- name: special-config-2
literals:
- special.how=very
- special.type=charm
Apply the kustomization directory to create the ConfigMap object:
kubectl apply -k .
configmap/special-config-2-c92b5mmcf2 created
Interim cleanup
Before proceeding, clean up some of the ConfigMaps you made:
kubectl delete configmap special-config
kubectl delete configmap env-config
kubectl delete configmap -l 'game-config in (config-4,config-5)'
Now that you have learned to define ConfigMaps, you can move on to the next section, and learn how to use these objects with Pods.
Define container environment variables using ConfigMap data
Define a container environment variable with data from a single ConfigMap
-
Define an environment variable as a key-value pair in a ConfigMap:
kubectl create configmap special-config --from-literal=special.how=very
-
Assign the
special.how
value defined in the ConfigMap to theSPECIAL_LEVEL_KEY
environment variable in the Pod specification.apiVersion: v1 kind: Pod metadata: name: dapi-test-pod spec: containers: - name: test-container image: registry.k8s.io/busybox command: [ "/bin/sh", "-c", "env" ] env: # Define the environment variable - name: SPECIAL_LEVEL_KEY valueFrom: configMapKeyRef: # The ConfigMap containing the value you want to assign to SPECIAL_LEVEL_KEY name: special-config # Specify the key associated with the value key: special.how restartPolicy: Never
Create the Pod:
kubectl create -f https://kubernetes.io/examples/pods/pod-single-configmap-env-variable.yaml
Now, the Pod's output includes environment variable
SPECIAL_LEVEL_KEY=very
.
Define container environment variables with data from multiple ConfigMaps
As with the previous example, create the ConfigMaps first. Here is the manifest you will use:
apiVersion: v1
kind: ConfigMap
metadata:
name: special-config
namespace: default
data:
special.how: very
---
apiVersion: v1
kind: ConfigMap
metadata:
name: env-config
namespace: default
data:
log_level: INFO
-
Create the ConfigMap:
kubectl create -f https://kubernetes.io/examples/configmap/configmaps.yaml
-
Define the environment variables in the Pod specification.
apiVersion: v1 kind: Pod metadata: name: dapi-test-pod spec: containers: - name: test-container image: registry.k8s.io/busybox command: [ "/bin/sh", "-c", "env" ] env: - name: SPECIAL_LEVEL_KEY valueFrom: configMapKeyRef: name: special-config key: special.how - name: LOG_LEVEL valueFrom: configMapKeyRef: name: env-config key: log_level restartPolicy: Never
Create the Pod:
kubectl create -f https://kubernetes.io/examples/pods/pod-multiple-configmap-env-variable.yaml
Now, the Pod's output includes environment variables
SPECIAL_LEVEL_KEY=very
andLOG_LEVEL=INFO
.Once you're happy to move on, delete that Pod:
kubectl delete pod dapi-test-pod --now
Configure all key-value pairs in a ConfigMap as container environment variables
-
Create a ConfigMap containing multiple key-value pairs.
apiVersion: v1 kind: ConfigMap metadata: name: special-config namespace: default data: SPECIAL_LEVEL: very SPECIAL_TYPE: charm
Create the ConfigMap:
kubectl create -f https://kubernetes.io/examples/configmap/configmap-multikeys.yaml
-
Use
envFrom
to define all of the ConfigMap's data as container environment variables. The key from the ConfigMap becomes the environment variable name in the Pod.apiVersion: v1 kind: Pod metadata: name: dapi-test-pod spec: containers: - name: test-container image: registry.k8s.io/busybox command: [ "/bin/sh", "-c", "env" ] envFrom: - configMapRef: name: special-config restartPolicy: Never
Create the Pod:
kubectl create -f https://kubernetes.io/examples/pods/pod-configmap-envFrom.yaml
Now, the Pod's output includes environment variables
SPECIAL_LEVEL=very
andSPECIAL_TYPE=charm
.Once you're happy to move on, delete that Pod:
kubectl delete pod dapi-test-pod --now
Use ConfigMap-defined environment variables in Pod commands
You can use ConfigMap-defined environment variables in the command
and args
of a container
using the $(VAR_NAME)
Kubernetes substitution syntax.
For example, the following Pod manifest:
apiVersion: v1
kind: Pod
metadata:
name: dapi-test-pod
spec:
containers:
- name: test-container
image: registry.k8s.io/busybox
command: [ "/bin/echo", "$(SPECIAL_LEVEL_KEY) $(SPECIAL_TYPE_KEY)" ]
env:
- name: SPECIAL_LEVEL_KEY
valueFrom:
configMapKeyRef:
name: special-config
key: SPECIAL_LEVEL
- name: SPECIAL_TYPE_KEY
valueFrom:
configMapKeyRef:
name: special-config
key: SPECIAL_TYPE
restartPolicy: Never
Create that Pod, by running:
kubectl create -f https://kubernetes.io/examples/pods/pod-configmap-env-var-valueFrom.yaml
That pod produces the following output from the test-container
container:
kubectl logs dapi-test-pod
very charm
Once you're happy to move on, delete that Pod:
kubectl delete pod dapi-test-pod --now
Add ConfigMap data to a Volume
As explained in Create ConfigMaps from files, when you create
a ConfigMap using --from-file
, the filename becomes a key stored in the data
section of
the ConfigMap. The file contents become the key's value.
The examples in this section refer to a ConfigMap named special-config
:
apiVersion: v1
kind: ConfigMap
metadata:
name: special-config
namespace: default
data:
SPECIAL_LEVEL: very
SPECIAL_TYPE: charm
Create the ConfigMap:
kubectl create -f https://kubernetes.io/examples/configmap/configmap-multikeys.yaml
Populate a Volume with data stored in a ConfigMap
Add the ConfigMap name under the volumes
section of the Pod specification.
This adds the ConfigMap data to the directory specified as volumeMounts.mountPath
(in this
case, /etc/config
). The command
section lists directory files with names that match the
keys in ConfigMap.
apiVersion: v1
kind: Pod
metadata:
name: dapi-test-pod
spec:
containers:
- name: test-container
image: registry.k8s.io/busybox
command: [ "/bin/sh", "-c", "ls /etc/config/" ]
volumeMounts:
- name: config-volume
mountPath: /etc/config
volumes:
- name: config-volume
configMap:
# Provide the name of the ConfigMap containing the files you want
# to add to the container
name: special-config
restartPolicy: Never
Create the Pod:
kubectl create -f https://kubernetes.io/examples/pods/pod-configmap-volume.yaml
When the pod runs, the command ls /etc/config/
produces the output below:
SPECIAL_LEVEL
SPECIAL_TYPE
Text data is exposed as files using the UTF-8 character encoding. To use some other
character encoding, use binaryData
(see ConfigMap object for more details).
Note:
If there are any files in the/etc/config
directory of that container image, the volume
mount will make those files from the image inaccessible.Once you're happy to move on, delete that Pod:
kubectl delete pod dapi-test-pod --now
Add ConfigMap data to a specific path in the Volume
Use the path
field to specify the desired file path for specific ConfigMap items.
In this case, the SPECIAL_LEVEL
item will be mounted in the config-volume
volume at /etc/config/keys
.
apiVersion: v1
kind: Pod
metadata:
name: dapi-test-pod
spec:
containers:
- name: test-container
image: registry.k8s.io/busybox
command: [ "/bin/sh","-c","cat /etc/config/keys" ]
volumeMounts:
- name: config-volume
mountPath: /etc/config
volumes:
- name: config-volume
configMap:
name: special-config
items:
- key: SPECIAL_LEVEL
path: keys
restartPolicy: Never
Create the Pod:
kubectl create -f https://kubernetes.io/examples/pods/pod-configmap-volume-specific-key.yaml
When the pod runs, the command cat /etc/config/keys
produces the output below:
very
Caution:
Like before, all previous files in the/etc/config/
directory will be deleted.Delete that Pod:
kubectl delete pod dapi-test-pod --now
Project keys to specific paths and file permissions
You can project keys to specific paths. Refer to the corresponding section in the Secrets guide for the syntax.
You can set POSIX permissions for keys. Refer to the corresponding section in the Secrets guide for the syntax.
Optional references
A ConfigMap reference may be marked optional. If the ConfigMap is non-existent, the mounted volume will be empty. If the ConfigMap exists, but the referenced key is non-existent, the path will be absent beneath the mount point. See Optional ConfigMaps for more details.
Mounted ConfigMaps are updated automatically
When a mounted ConfigMap is updated, the projected content is eventually updated too. This applies in the case where an optionally referenced ConfigMap comes into existence after a pod has started.
Kubelet checks whether the mounted ConfigMap is fresh on every periodic sync. However, it uses its local TTL-based cache for getting the current value of the ConfigMap. As a result, the total delay from the moment when the ConfigMap is updated to the moment when new keys are projected to the pod can be as long as kubelet sync period (1 minute by default) + TTL of ConfigMaps cache (1 minute by default) in kubelet. You can trigger an immediate refresh by updating one of the pod's annotations.
Note:
A container using a ConfigMap as a subPath volume will not receive ConfigMap updates.Understanding ConfigMaps and Pods
The ConfigMap API resource stores configuration data as key-value pairs. The data can be consumed in pods or provide the configurations for system components such as controllers. ConfigMap is similar to Secrets, but provides a means of working with strings that don't contain sensitive information. Users and system components alike can store configuration data in ConfigMap.
Note:
ConfigMaps should reference properties files, not replace them. Think of the ConfigMap as representing something similar to the Linux/etc
directory and its contents. For example,
if you create a Kubernetes Volume from a ConfigMap, each
data item in the ConfigMap is represented by an individual file in the volume.The ConfigMap's data
field contains the configuration data. As shown in the example below,
this can be simple (like individual properties defined using --from-literal
) or complex
(like configuration files or JSON blobs defined using --from-file
).
apiVersion: v1
kind: ConfigMap
metadata:
creationTimestamp: 2016-02-18T19:14:38Z
name: example-config
namespace: default
data:
# example of a simple property defined using --from-literal
example.property.1: hello
example.property.2: world
# example of a complex property defined using --from-file
example.property.file: |-
property.1=value-1
property.2=value-2
property.3=value-3
When kubectl
creates a ConfigMap from inputs that are not ASCII or UTF-8, the tool puts
these into the binaryData
field of the ConfigMap, and not in data
. Both text and binary
data sources can be combined in one ConfigMap.
If you want to view the binaryData
keys (and their values) in a ConfigMap, you can run
kubectl get configmap -o jsonpath='{.binaryData}' <name>
.
Pods can load data from a ConfigMap that uses either data
or binaryData
.
Optional ConfigMaps
You can mark a reference to a ConfigMap as optional in a Pod specification. If the ConfigMap doesn't exist, the configuration for which it provides data in the Pod (for example: environment variable, mounted volume) will be empty. If the ConfigMap exists, but the referenced key is non-existent the data is also empty.
For example, the following Pod specification marks an environment variable from a ConfigMap as optional:
apiVersion: v1
kind: Pod
metadata:
name: dapi-test-pod
spec:
containers:
- name: test-container
image: gcr.io/google_containers/busybox
command: ["/bin/sh", "-c", "env"]
env:
- name: SPECIAL_LEVEL_KEY
valueFrom:
configMapKeyRef:
name: a-config
key: akey
optional: true # mark the variable as optional
restartPolicy: Never
If you run this pod, and there is no ConfigMap named a-config
, the output is empty.
If you run this pod, and there is a ConfigMap named a-config
but that ConfigMap doesn't have
a key named akey
, the output is also empty. If you do set a value for akey
in the a-config
ConfigMap, this pod prints that value and then terminates.
You can also mark the volumes and files provided by a ConfigMap as optional. Kubernetes always creates the mount paths for the volume, even if the referenced ConfigMap or key doesn't exist. For example, the following Pod specification marks a volume that references a ConfigMap as optional:
apiVersion: v1
kind: Pod
metadata:
name: dapi-test-pod
spec:
containers:
- name: test-container
image: gcr.io/google_containers/busybox
command: ["/bin/sh", "-c", "ls /etc/config"]
volumeMounts:
- name: config-volume
mountPath: /etc/config
volumes:
- name: config-volume
configMap:
name: no-config
optional: true # mark the source ConfigMap as optional
restartPolicy: Never
Restrictions
-
You must create the
ConfigMap
object before you reference it in a Pod specification. Alternatively, mark the ConfigMap reference asoptional
in the Pod spec (see Optional ConfigMaps). If you reference a ConfigMap that doesn't exist and you don't mark the reference asoptional
, the Pod won't start. Similarly, references to keys that don't exist in the ConfigMap will also prevent the Pod from starting, unless you mark the key references asoptional
. -
If you use
envFrom
to define environment variables from ConfigMaps, keys that are considered invalid will be skipped. The pod will be allowed to start, but the invalid names will be recorded in the event log (InvalidVariableNames
). The log message lists each skipped key. For example:kubectl get events
The output is similar to this:
LASTSEEN FIRSTSEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE 0s 0s 1 dapi-test-pod Pod Warning InvalidEnvironmentVariableNames {kubelet, 127.0.0.1} Keys [1badkey, 2alsobad] from the EnvFrom configMap default/myconfig were skipped since they are considered invalid environment variable names.
-
ConfigMaps reside in a specific Namespace. Pods can only refer to ConfigMaps that are in the same namespace as the Pod.
-
You can't use ConfigMaps for static pods, because the kubelet does not support this.
Cleaning up
Delete the ConfigMaps and Pods that you made:
kubectl delete configmaps/game-config configmaps/game-config-2 configmaps/game-config-3 \
configmaps/game-config-env-file
kubectl delete pod dapi-test-pod --now
# You might already have removed the next set
kubectl delete configmaps/special-config configmaps/env-config
kubectl delete configmap -l 'game-config in (config-4,config-5)'
If you created a directory configure-pod-container
and no longer need it, you should remove that too,
or move it into the trash can / deleted files location.
What's next
- Follow a real world example of Configuring Redis using a ConfigMap.
- Follow an example of Updating configuration via a ConfigMap.
21 - Share Process Namespace between Containers in a Pod
This page shows how to configure process namespace sharing for a pod. When process namespace sharing is enabled, processes in a container are visible to all other containers in the same pod.
You can use this feature to configure cooperating containers, such as a log handler sidecar container, or to troubleshoot container images that don't include debugging utilities like a shell.
Before you begin
You need to have a Kubernetes cluster, and the kubectl command-line tool must be configured to communicate with your cluster. It is recommended to run this tutorial on a cluster with at least two nodes that are not acting as control plane hosts. If you do not already have a cluster, you can create one by using minikube or you can use one of these Kubernetes playgrounds:
Configure a Pod
Process namespace sharing is enabled using the shareProcessNamespace
field of
.spec
for a Pod. For example:
-
Create the pod
nginx
on your cluster:kubectl apply -f https://k8s.io/examples/pods/share-process-namespace.yaml
-
Attach to the
shell
container and runps
:kubectl attach -it nginx -c shell
If you don't see a command prompt, try pressing enter. In the container shell:
# run this inside the "shell" container ps ax
The output is similar to this:
PID USER TIME COMMAND 1 root 0:00 /pause 8 root 0:00 nginx: master process nginx -g daemon off; 14 101 0:00 nginx: worker process 15 root 0:00 sh 21 root 0:00 ps ax
You can signal processes in other containers. For example, send SIGHUP
to
nginx
to restart the worker process. This requires the SYS_PTRACE
capability.
# run this inside the "shell" container
kill -HUP 8 # change "8" to match the PID of the nginx leader process, if necessary
ps ax
The output is similar to this:
PID USER TIME COMMAND
1 root 0:00 /pause
8 root 0:00 nginx: master process nginx -g daemon off;
15 root 0:00 sh
22 101 0:00 nginx: worker process
23 root 0:00 ps ax
It's even possible to access the file system of another container using the
/proc/$pid/root
link.
# run this inside the "shell" container
# change "8" to the PID of the Nginx process, if necessary
head /proc/8/root/etc/nginx/nginx.conf
The output is similar to this:
user nginx;
worker_processes 1;
error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;
events {
worker_connections 1024;
Understanding process namespace sharing
Pods share many resources so it makes sense they would also share a process namespace. Some containers may expect to be isolated from others, though, so it's important to understand the differences:
-
The container process no longer has PID 1. Some containers refuse to start without PID 1 (for example, containers using
systemd
) or run commands likekill -HUP 1
to signal the container process. In pods with a shared process namespace,kill -HUP 1
will signal the pod sandbox (/pause
in the above example). -
Processes are visible to other containers in the pod. This includes all information visible in
/proc
, such as passwords that were passed as arguments or environment variables. These are protected only by regular Unix permissions. -
Container filesystems are visible to other containers in the pod through the
/proc/$pid/root
link. This makes debugging easier, but it also means that filesystem secrets are protected only by filesystem permissions.
22 - Use a User Namespace With a Pod
Kubernetes v1.30 [beta]
This page shows how to configure a user namespace for pods. This allows you to isolate the user running inside the container from the one in the host.
A process running as root in a container can run as a different (non-root) user in the host; in other words, the process has full privileges for operations inside the user namespace, but is unprivileged for operations outside the namespace.
You can use this feature to reduce the damage a compromised container can do to the host or other pods in the same node. There are several security vulnerabilities rated either HIGH or CRITICAL that were not exploitable when user namespaces is active. It is expected user namespace will mitigate some future vulnerabilities too.
Without using a user namespace a container running as root, in the case of a container breakout, has root privileges on the node. And if some capability were granted to the container, the capabilities are valid on the host too. None of this is true when user namespaces are used.
Before you begin
You need to have a Kubernetes cluster, and the kubectl command-line tool must be configured to communicate with your cluster. It is recommended to run this tutorial on a cluster with at least two nodes that are not acting as control plane hosts. If you do not already have a cluster, you can create one by using minikube or you can use one of these Kubernetes playgrounds:
Your Kubernetes server must be at or later than version v1.25. To check the version, enterkubectl version
.
- The node OS needs to be Linux
- You need to exec commands in the host
- You need to be able to exec into pods
- You need to enable the
UserNamespacesSupport
feature gate
Note:
The feature gate to enable user namespaces was previously namedUserNamespacesStatelessPodsSupport
, when only stateless pods were supported.
Only Kubernetes v1.25 through to v1.27 recognise UserNamespacesStatelessPodsSupport
.The cluster that you're using must include at least one node that meets the requirements for using user namespaces with Pods.
If you have a mixture of nodes and only some of the nodes provide user namespace support for Pods, you also need to ensure that the user namespace Pods are scheduled to suitable nodes.
Run a Pod that uses a user namespace
A user namespace for a pod is enabled setting the hostUsers
field of .spec
to false
. For example:
apiVersion: v1
kind: Pod
metadata:
name: userns
spec:
hostUsers: false
containers:
- name: shell
command: ["sleep", "infinity"]
image: debian
-
Create the pod on your cluster:
kubectl apply -f https://k8s.io/examples/pods/user-namespaces-stateless.yaml
-
Attach to the container and run
readlink /proc/self/ns/user
:kubectl attach -it userns bash
Run this command:
readlink /proc/self/ns/user
The output is similar to:
user:[4026531837]
Also run:
cat /proc/self/uid_map
The output is similar to:
0 833617920 65536
Then, open a shell in the host and run the same commands.
The readlink
command shows the user namespace the process is running in. It
should be different when it is run on the host and inside the container.
The last number of the uid_map
file inside the container must be 65536, on the
host it must be a bigger number.
If you are running the kubelet inside a user namespace, you need to compare the output from running the command in the pod to the output of running in the host:
readlink /proc/$pid/ns/user
replacing $pid
with the kubelet PID.
23 - Create static Pods
Static Pods are managed directly by the kubelet daemon on a specific node, without the API server observing them. Unlike Pods that are managed by the control plane (for example, a Deployment); instead, the kubelet watches each static Pod (and restarts it if it fails).
Static Pods are always bound to one Kubelet on a specific node.
The kubelet automatically tries to create a mirror Pod on the Kubernetes API server for each static Pod. This means that the Pods running on a node are visible on the API server, but cannot be controlled from there. The Pod names will be suffixed with the node hostname with a leading hyphen.
Note:
If you are running clustered Kubernetes and are using static Pods to run a Pod on every node, you should probably be using a DaemonSet instead.Note:
Thespec
of a static Pod cannot refer to other API objects
(e.g., ServiceAccount,
ConfigMap,
Secret, etc).Note:
Static pods do not support ephemeral containers.Before you begin
You need to have a Kubernetes cluster, and the kubectl command-line tool must be configured to communicate with your cluster. It is recommended to run this tutorial on a cluster with at least two nodes that are not acting as control plane hosts. If you do not already have a cluster, you can create one by using minikube or you can use one of these Kubernetes playgrounds:
To check the version, enterkubectl version
.
This page assumes you're using CRI-O to run Pods, and that your nodes are running the Fedora operating system. Instructions for other distributions or Kubernetes installations may vary.
Create a static pod
You can configure a static Pod with either a file system hosted configuration file or a web hosted configuration file.
Filesystem-hosted static Pod manifest
Manifests are standard Pod definitions in JSON or YAML format in a specific directory.
Use the staticPodPath: <the directory>
field in the
kubelet configuration file,
which periodically scans the directory and creates/deletes static Pods as YAML/JSON files appear/disappear there.
Note that the kubelet will ignore files starting with dots when scanning the specified directory.
For example, this is how to start a simple web server as a static Pod:
-
Choose a node where you want to run the static Pod. In this example, it's
my-node1
.ssh my-node1
-
Choose a directory, say
/etc/kubernetes/manifests
and place a web server Pod definition there, for example/etc/kubernetes/manifests/static-web.yaml
:# Run this command on the node where kubelet is running mkdir -p /etc/kubernetes/manifests/ cat <<EOF >/etc/kubernetes/manifests/static-web.yaml apiVersion: v1 kind: Pod metadata: name: static-web labels: role: myrole spec: containers: - name: web image: nginx ports: - name: web containerPort: 80 protocol: TCP EOF
-
Configure the kubelet on that node to set a
staticPodPath
value in the kubelet configuration file.
See Set Kubelet Parameters Via A Configuration File for more information.An alternative and deprecated method is to configure the kubelet on that node to look for static Pod manifests locally, using a command line argument. To use the deprecated approach, start the kubelet with the
--pod-manifest-path=/etc/kubernetes/manifests/
argument. -
Restart the kubelet. On Fedora, you would run:
# Run this command on the node where the kubelet is running systemctl restart kubelet
Web-hosted static pod manifest
Kubelet periodically downloads a file specified by --manifest-url=<URL>
argument
and interprets it as a JSON/YAML file that contains Pod definitions.
Similar to how filesystem-hosted manifests work, the kubelet
refetches the manifest on a schedule. If there are changes to the list of static
Pods, the kubelet applies them.
To use this approach:
-
Create a YAML file and store it on a web server so that you can pass the URL of that file to the kubelet.
apiVersion: v1 kind: Pod metadata: name: static-web labels: role: myrole spec: containers: - name: web image: nginx ports: - name: web containerPort: 80 protocol: TCP
-
Configure the kubelet on your selected node to use this web manifest by running it with
--manifest-url=<manifest-url>
. On Fedora, edit/etc/kubernetes/kubelet
to include this line:KUBELET_ARGS="--cluster-dns=10.254.0.10 --cluster-domain=kube.local --manifest-url=<manifest-url>"
-
Restart the kubelet. On Fedora, you would run:
# Run this command on the node where the kubelet is running systemctl restart kubelet
Observe static pod behavior
When the kubelet starts, it automatically starts all defined static Pods. As you have defined a static Pod and restarted the kubelet, the new static Pod should already be running.
You can view running containers (including static Pods) by running (on the node):
# Run this command on the node where the kubelet is running
crictl ps
The output might be something like:
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID
129fd7d382018 docker.io/library/nginx@sha256:... 11 minutes ago Running web 0 34533c6729106
Note:
crictl
outputs the image URI and SHA-256 checksum. NAME
will look more like:
docker.io/library/nginx@sha256:0d17b565c37bcbd895e9d92315a05c1c3c9a29f762b011a10c54a66cd53c9b31
.You can see the mirror Pod on the API server:
kubectl get pods
NAME READY STATUS RESTARTS AGE
static-web-my-node1 1/1 Running 0 2m
Note:
Make sure the kubelet has permission to create the mirror Pod in the API server. If not, the creation request is rejected by the API server.Labels from the static Pod are propagated into the mirror Pod. You can use those labels as normal via selectors, etc.
If you try to use kubectl
to delete the mirror Pod from the API server,
the kubelet doesn't remove the static Pod:
kubectl delete pod static-web-my-node1
pod "static-web-my-node1" deleted
You can see that the Pod is still running:
kubectl get pods
NAME READY STATUS RESTARTS AGE
static-web-my-node1 1/1 Running 0 4s
Back on your node where the kubelet is running, you can try to stop the container manually. You'll see that, after a time, the kubelet will notice and will restart the Pod automatically:
# Run these commands on the node where the kubelet is running
crictl stop 129fd7d382018 # replace with the ID of your container
sleep 20
crictl ps
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID
89db4553e1eeb docker.io/library/nginx@sha256:... 19 seconds ago Running web 1 34533c6729106
Once you identify the right container, you can get the logs for that container with crictl
:
# Run these commands on the node where the container is running
crictl logs <container_id>
10.240.0.48 - - [16/Nov/2022:12:45:49 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.47.0" "-"
10.240.0.48 - - [16/Nov/2022:12:45:50 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.47.0" "-"
10.240.0.48 - - [16/Nove/2022:12:45:51 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.47.0" "-"
To find more about how to debug using crictl
, please visit
Debugging Kubernetes nodes with crictl.
Dynamic addition and removal of static pods
The running kubelet periodically scans the configured directory
(/etc/kubernetes/manifests
in our example) for changes and
adds/removes Pods as files appear/disappear in this directory.
# This assumes you are using filesystem-hosted static Pod configuration
# Run these commands on the node where the container is running
#
mv /etc/kubernetes/manifests/static-web.yaml /tmp
sleep 20
crictl ps
# You see that no nginx container is running
mv /tmp/static-web.yaml /etc/kubernetes/manifests/
sleep 20
crictl ps
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID
f427638871c35 docker.io/library/nginx@sha256:... 19 seconds ago Running web 1 34533c6729106
What's next
24 - Translate a Docker Compose File to Kubernetes Resources
What's Kompose? It's a conversion tool for all things compose (namely Docker Compose) to container orchestrators (Kubernetes or OpenShift).
More information can be found on the Kompose website at http://kompose.io.
Before you begin
You need to have a Kubernetes cluster, and the kubectl command-line tool must be configured to communicate with your cluster. It is recommended to run this tutorial on a cluster with at least two nodes that are not acting as control plane hosts. If you do not already have a cluster, you can create one by using minikube or you can use one of these Kubernetes playgrounds:
To check the version, enterkubectl version
.
Install Kompose
We have multiple ways to install Kompose. Our preferred method is downloading the binary from the latest GitHub release.
Kompose is released via GitHub on a three-week cycle, you can see all current releases on the GitHub release page.
# Linux
curl -L https://github.com/kubernetes/kompose/releases/download/v1.34.0/kompose-linux-amd64 -o kompose
# macOS
curl -L https://github.com/kubernetes/kompose/releases/download/v1.34.0/kompose-darwin-amd64 -o kompose
# Windows
curl -L https://github.com/kubernetes/kompose/releases/download/v1.34.0/kompose-windows-amd64.exe -o kompose.exe
chmod +x kompose
sudo mv ./kompose /usr/local/bin/kompose
Alternatively, you can download the tarball.
Installing using go get
pulls from the master branch with the latest development changes.
go get -u github.com/kubernetes/kompose
Kompose is in EPEL CentOS repository.
If you don't have EPEL repository already installed and enabled you can do it by running sudo yum install epel-release
.
If you have EPEL enabled in your system, you can install Kompose like any other package.
sudo yum -y install kompose
Kompose is in Fedora 24, 25 and 26 repositories. You can install it like any other package.
sudo dnf -y install kompose
On macOS you can install the latest release via Homebrew:
brew install kompose
Use Kompose
In a few steps, we'll take you from Docker Compose to Kubernetes. All
you need is an existing docker-compose.yml
file.
-
Go to the directory containing your
docker-compose.yml
file. If you don't have one, test using this one.services: redis-leader: container_name: redis-leader image: redis ports: - "6379" redis-replica: container_name: redis-replica image: redis ports: - "6379" command: redis-server --replicaof redis-leader 6379 --dir /tmp web: container_name: web image: quay.io/kompose/web ports: - "8080:8080" environment: - GET_HOSTS_FROM=dns labels: kompose.service.type: LoadBalancer
-
To convert the
docker-compose.yml
file to files that you can use withkubectl
, runkompose convert
and thenkubectl apply -f <output file>
.kompose convert
The output is similar to:
INFO Kubernetes file "redis-leader-service.yaml" created INFO Kubernetes file "redis-replica-service.yaml" created INFO Kubernetes file "web-tcp-service.yaml" created INFO Kubernetes file "redis-leader-deployment.yaml" created INFO Kubernetes file "redis-replica-deployment.yaml" created INFO Kubernetes file "web-deployment.yaml" created
kubectl apply -f web-tcp-service.yaml,redis-leader-service.yaml,redis-replica-service.yaml,web-deployment.yaml,redis-leader-deployment.yaml,redis-replica-deployment.yaml
The output is similar to:
deployment.apps/redis-leader created deployment.apps/redis-replica created deployment.apps/web created service/redis-leader created service/redis-replica created service/web-tcp created
Your deployments are running in Kubernetes.
-
Access your application.
If you're already using
minikube
for your development process:minikube service web-tcp
Otherwise, let's look up what IP your service is using!
kubectl describe svc web-tcp
Name: web-tcp Namespace: default Labels: io.kompose.service=web-tcp Annotations: kompose.cmd: kompose convert kompose.service.type: LoadBalancer kompose.version: 1.33.0 (3ce457399) Selector: io.kompose.service=web Type: LoadBalancer IP Family Policy: SingleStack IP Families: IPv4 IP: 10.102.30.3 IPs: 10.102.30.3 Port: 8080 8080/TCP TargetPort: 8080/TCP NodePort: 8080 31624/TCP Endpoints: 10.244.0.5:8080 Session Affinity: None External Traffic Policy: Cluster Events: <none>
If you're using a cloud provider, your IP will be listed next to
LoadBalancer Ingress
.curl http://192.0.2.89
-
Clean-up.
After you are finished testing out the example application deployment, simply run the following command in your shell to delete the resources used.
kubectl delete -f web-tcp-service.yaml,redis-leader-service.yaml,redis-replica-service.yaml,web-deployment.yaml,redis-leader-deployment.yaml,redis-replica-deployment.yaml
User Guide
- CLI
- Documentation
Kompose has support for two providers: OpenShift and Kubernetes.
You can choose a targeted provider using global option --provider
. If no provider is specified, Kubernetes is set by default.
kompose convert
Kompose supports conversion of V1, V2, and V3 Docker Compose files into Kubernetes and OpenShift objects.
Kubernetes kompose convert
example
kompose --file docker-voting.yml convert
WARN Unsupported key networks - ignoring
WARN Unsupported key build - ignoring
INFO Kubernetes file "worker-svc.yaml" created
INFO Kubernetes file "db-svc.yaml" created
INFO Kubernetes file "redis-svc.yaml" created
INFO Kubernetes file "result-svc.yaml" created
INFO Kubernetes file "vote-svc.yaml" created
INFO Kubernetes file "redis-deployment.yaml" created
INFO Kubernetes file "result-deployment.yaml" created
INFO Kubernetes file "vote-deployment.yaml" created
INFO Kubernetes file "worker-deployment.yaml" created
INFO Kubernetes file "db-deployment.yaml" created
ls
db-deployment.yaml docker-compose.yml docker-gitlab.yml redis-deployment.yaml result-deployment.yaml vote-deployment.yaml worker-deployment.yaml
db-svc.yaml docker-voting.yml redis-svc.yaml result-svc.yaml vote-svc.yaml worker-svc.yaml
You can also provide multiple docker-compose files at the same time:
kompose -f docker-compose.yml -f docker-guestbook.yml convert
INFO Kubernetes file "frontend-service.yaml" created
INFO Kubernetes file "mlbparks-service.yaml" created
INFO Kubernetes file "mongodb-service.yaml" created
INFO Kubernetes file "redis-master-service.yaml" created
INFO Kubernetes file "redis-slave-service.yaml" created
INFO Kubernetes file "frontend-deployment.yaml" created
INFO Kubernetes file "mlbparks-deployment.yaml" created
INFO Kubernetes file "mongodb-deployment.yaml" created
INFO Kubernetes file "mongodb-claim0-persistentvolumeclaim.yaml" created
INFO Kubernetes file "redis-master-deployment.yaml" created
INFO Kubernetes file "redis-slave-deployment.yaml" created
ls
mlbparks-deployment.yaml mongodb-service.yaml redis-slave-service.jsonmlbparks-service.yaml
frontend-deployment.yaml mongodb-claim0-persistentvolumeclaim.yaml redis-master-service.yaml
frontend-service.yaml mongodb-deployment.yaml redis-slave-deployment.yaml
redis-master-deployment.yaml
When multiple docker-compose files are provided the configuration is merged. Any configuration that is common will be overridden by subsequent file.
OpenShift kompose convert
example
kompose --provider openshift --file docker-voting.yml convert
WARN [worker] Service cannot be created because of missing port.
INFO OpenShift file "vote-service.yaml" created
INFO OpenShift file "db-service.yaml" created
INFO OpenShift file "redis-service.yaml" created
INFO OpenShift file "result-service.yaml" created
INFO OpenShift file "vote-deploymentconfig.yaml" created
INFO OpenShift file "vote-imagestream.yaml" created
INFO OpenShift file "worker-deploymentconfig.yaml" created
INFO OpenShift file "worker-imagestream.yaml" created
INFO OpenShift file "db-deploymentconfig.yaml" created
INFO OpenShift file "db-imagestream.yaml" created
INFO OpenShift file "redis-deploymentconfig.yaml" created
INFO OpenShift file "redis-imagestream.yaml" created
INFO OpenShift file "result-deploymentconfig.yaml" created
INFO OpenShift file "result-imagestream.yaml" created
It also supports creating buildconfig for build directive in a service. By default, it uses the remote repo for the current git branch as the source repo, and the current branch as the source branch for the build. You can specify a different source repo and branch using --build-repo
and --build-branch
options respectively.
kompose --provider openshift --file buildconfig/docker-compose.yml convert
WARN [foo] Service cannot be created because of missing port.
INFO OpenShift Buildconfig using git@github.com:rtnpro/kompose.git::master as source.
INFO OpenShift file "foo-deploymentconfig.yaml" created
INFO OpenShift file "foo-imagestream.yaml" created
INFO OpenShift file "foo-buildconfig.yaml" created
Note:
If you are manually pushing the OpenShift artifacts usingoc create -f
, you need to ensure that you push the imagestream artifact before the buildconfig artifact, to workaround this OpenShift issue: https://github.com/openshift/origin/issues/4518 .Alternative Conversions
The default kompose
transformation will generate Kubernetes Deployments and Services, in yaml format. You have alternative option to generate json with -j
. Also, you can alternatively generate Replication Controllers objects, Daemon Sets, or Helm charts.
kompose convert -j
INFO Kubernetes file "redis-svc.json" created
INFO Kubernetes file "web-svc.json" created
INFO Kubernetes file "redis-deployment.json" created
INFO Kubernetes file "web-deployment.json" created
The *-deployment.json
files contain the Deployment objects.
kompose convert --replication-controller
INFO Kubernetes file "redis-svc.yaml" created
INFO Kubernetes file "web-svc.yaml" created
INFO Kubernetes file "redis-replicationcontroller.yaml" created
INFO Kubernetes file "web-replicationcontroller.yaml" created
The *-replicationcontroller.yaml
files contain the Replication Controller objects. If you want to specify replicas (default is 1), use --replicas
flag: kompose convert --replication-controller --replicas 3
.
kompose convert --daemon-set
INFO Kubernetes file "redis-svc.yaml" created
INFO Kubernetes file "web-svc.yaml" created
INFO Kubernetes file "redis-daemonset.yaml" created
INFO Kubernetes file "web-daemonset.yaml" created
The *-daemonset.yaml
files contain the DaemonSet objects.
If you want to generate a Chart to be used with Helm run:
kompose convert -c
INFO Kubernetes file "web-svc.yaml" created
INFO Kubernetes file "redis-svc.yaml" created
INFO Kubernetes file "web-deployment.yaml" created
INFO Kubernetes file "redis-deployment.yaml" created
chart created in "./docker-compose/"
tree docker-compose/
docker-compose
├── Chart.yaml
├── README.md
└── templates
├── redis-deployment.yaml
├── redis-svc.yaml
├── web-deployment.yaml
└── web-svc.yaml
The chart structure is aimed at providing a skeleton for building your Helm charts.
Labels
kompose
supports Kompose-specific labels within the docker-compose.yml
file in order to explicitly define a service's behavior upon conversion.
-
kompose.service.type
defines the type of service to be created.For example:
version: "2" services: nginx: image: nginx dockerfile: foobar build: ./foobar cap_add: - ALL container_name: foobar labels: kompose.service.type: nodeport
-
kompose.service.expose
defines if the service needs to be made accessible from outside the cluster or not. If the value is set to "true", the provider sets the endpoint automatically, and for any other value, the value is set as the hostname. If multiple ports are defined in a service, the first one is chosen to be the exposed.- For the Kubernetes provider, an ingress resource is created and it is assumed that an ingress controller has already been configured.
- For the OpenShift provider, a route is created.
For example:
version: "2" services: web: image: tuna/docker-counter23 ports: - "5000:5000" links: - redis labels: kompose.service.expose: "counter.example.com" redis: image: redis:3.0 ports: - "6379"
The currently supported options are:
Key | Value |
---|---|
kompose.service.type | nodeport / clusterip / loadbalancer |
kompose.service.expose | true / hostname |
Note:
Thekompose.service.type
label should be defined with ports
only, otherwise kompose
will fail.Restart
If you want to create normal pods without controllers you can use restart
construct of docker-compose to define that. Follow table below to see what happens on the restart
value.
docker-compose restart |
object created | Pod restartPolicy |
---|---|---|
"" |
controller object | Always |
always |
controller object | Always |
on-failure |
Pod | OnFailure |
no |
Pod | Never |
Note:
The controller object could bedeployment
or replicationcontroller
.For example, the pival
service will become pod down here. This container calculated value of pi
.
version: '2'
services:
pival:
image: perl
command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
restart: "on-failure"
Warning about Deployment Configurations
If the Docker Compose file has a volume specified for a service, the Deployment (Kubernetes) or DeploymentConfig (OpenShift) strategy is changed to "Recreate" instead of "RollingUpdate" (default). This is done to avoid multiple instances of a service from accessing a volume at the same time.
If the Docker Compose file has service name with _
in it (for example, web_service
), then it will be replaced by -
and the service name will be renamed accordingly (for example, web-service
). Kompose does this because "Kubernetes" doesn't allow _
in object name.
Please note that changing service name might break some docker-compose
files.
Docker Compose Versions
Kompose supports Docker Compose versions: 1, 2 and 3. We have limited support on versions 2.1 and 3.2 due to their experimental nature.
A full list on compatibility between all three versions is listed in our conversion document including a list of all incompatible Docker Compose keys.
25 - Enforce Pod Security Standards by Configuring the Built-in Admission Controller
Kubernetes provides a built-in admission controller to enforce the Pod Security Standards. You can configure this admission controller to set cluster-wide defaults and exemptions.
Before you begin
Following an alpha release in Kubernetes v1.22, Pod Security Admission became available by default in Kubernetes v1.23, as a beta. From version 1.25 onwards, Pod Security Admission is generally available.
To check the version, enter kubectl version
.
If you are not running Kubernetes 1.30, you can switch to viewing this page in the documentation for the Kubernetes version that you are running.
Configure the Admission Controller
Note:
pod-security.admission.config.k8s.io/v1
configuration requires v1.25+.
For v1.23 and v1.24, use v1beta1.
For v1.22, use v1alpha1.apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
- name: PodSecurity
configuration:
apiVersion: pod-security.admission.config.k8s.io/v1 # see compatibility note
kind: PodSecurityConfiguration
# Defaults applied when a mode label is not set.
#
# Level label values must be one of:
# - "privileged" (default)
# - "baseline"
# - "restricted"
#
# Version label values must be one of:
# - "latest" (default)
# - specific version like "v1.30"
defaults:
enforce: "privileged"
enforce-version: "latest"
audit: "privileged"
audit-version: "latest"
warn: "privileged"
warn-version: "latest"
exemptions:
# Array of authenticated usernames to exempt.
usernames: []
# Array of runtime class names to exempt.
runtimeClasses: []
# Array of namespaces to exempt.
namespaces: []
Note:
The above manifest needs to be specified via the--admission-control-config-file
to kube-apiserver.26 - Enforce Pod Security Standards with Namespace Labels
Namespaces can be labeled to enforce the Pod Security Standards. The three policies privileged, baseline and restricted broadly cover the security spectrum and are implemented by the Pod Security admission controller.
Before you begin
Pod Security Admission was available by default in Kubernetes v1.23, as a beta. From version 1.25 onwards, Pod Security Admission is generally available.
To check the version, enter kubectl version
.
Requiring the baseline
Pod Security Standard with namespace labels
This manifest defines a Namespace my-baseline-namespace
that:
- Blocks any pods that don't satisfy the
baseline
policy requirements. - Generates a user-facing warning and adds an audit annotation to any created pod that does not
meet the
restricted
policy requirements. - Pins the versions of the
baseline
andrestricted
policies to v1.30.
apiVersion: v1
kind: Namespace
metadata:
name: my-baseline-namespace
labels:
pod-security.kubernetes.io/enforce: baseline
pod-security.kubernetes.io/enforce-version: v1.30
# We are setting these to our _desired_ `enforce` level.
pod-security.kubernetes.io/audit: restricted
pod-security.kubernetes.io/audit-version: v1.30
pod-security.kubernetes.io/warn: restricted
pod-security.kubernetes.io/warn-version: v1.30
Add labels to existing namespaces with kubectl label
Note:
When anenforce
policy (or version) label is added or changed, the admission plugin will test
each pod in the namespace against the new policy. Violations are returned to the user as warnings.It is helpful to apply the --dry-run
flag when initially evaluating security profile changes for
namespaces. The Pod Security Standard checks will still be run in dry run mode, giving you
information about how the new policy would treat existing pods, without actually updating a policy.
kubectl label --dry-run=server --overwrite ns --all \
pod-security.kubernetes.io/enforce=baseline
Applying to all namespaces
If you're just getting started with the Pod Security Standards, a suitable first step would be to
configure all namespaces with audit annotations for a stricter level such as baseline
:
kubectl label --overwrite ns --all \
pod-security.kubernetes.io/audit=baseline \
pod-security.kubernetes.io/warn=baseline
Note that this is not setting an enforce level, so that namespaces that haven't been explicitly evaluated can be distinguished. You can list namespaces without an explicitly set enforce level using this command:
kubectl get namespaces --selector='!pod-security.kubernetes.io/enforce'
Applying to a single namespace
You can update a specific namespace as well. This command adds the enforce=restricted
policy to my-existing-namespace
, pinning the restricted policy version to v1.30.
kubectl label --overwrite ns my-existing-namespace \
pod-security.kubernetes.io/enforce=restricted \
pod-security.kubernetes.io/enforce-version=v1.30
27 - Migrate from PodSecurityPolicy to the Built-In PodSecurity Admission Controller
This page describes the process of migrating from PodSecurityPolicies to the built-in PodSecurity
admission controller. This can be done effectively using a combination of dry-run and audit
and
warn
modes, although this becomes harder if mutating PSPs are used.
Before you begin
Your Kubernetes server must be at or later than version v1.22.
To check the version, enter kubectl version
.
If you are currently running a version of Kubernetes other than 1.30, you may want to switch to viewing this page in the documentation for the version of Kubernetes that you are actually running.
This page assumes you are already familiar with the basic Pod Security Admission concepts.
Overall approach
There are multiple strategies you can take for migrating from PodSecurityPolicy to Pod Security Admission. The following steps are one possible migration path, with a goal of minimizing both the risks of a production outage and of a security gap.
- Decide whether Pod Security Admission is the right fit for your use case.
- Review namespace permissions
- Simplify & standardize PodSecurityPolicies
- Update namespaces
- Identify an appropriate Pod Security level
- Verify the Pod Security level
- Enforce the Pod Security level
- Bypass PodSecurityPolicy
- Review namespace creation processes
- Disable PodSecurityPolicy
0. Decide whether Pod Security Admission is right for you
Pod Security Admission was designed to meet the most common security needs out of the box, and to provide a standard set of security levels across clusters. However, it is less flexible than PodSecurityPolicy. Notably, the following features are supported by PodSecurityPolicy but not Pod Security Admission:
- Setting default security constraints - Pod Security Admission is a non-mutating admission controller, meaning it won't modify pods before validating them. If you were relying on this aspect of PSP, you will need to either modify your workloads to meet the Pod Security constraints, or use a Mutating Admission Webhook to make those changes. See Simplify & Standardize PodSecurityPolicies below for more detail.
- Fine-grained control over policy definition - Pod Security Admission only supports 3 standard levels. If you require more control over specific constraints, then you will need to use a Validating Admission Webhook to enforce those policies.
- Sub-namespace policy granularity - PodSecurityPolicy lets you bind different policies to different Service Accounts or users, even within a single namespace. This approach has many pitfalls and is not recommended, but if you require this feature anyway you will need to use a 3rd party webhook instead. The exception to this is if you only need to completely exempt specific users or RuntimeClasses, in which case Pod Security Admission does expose some static configuration for exemptions.
Even if Pod Security Admission does not meet all of your needs it was designed to be complementary to other policy enforcement mechanisms, and can provide a useful fallback running alongside other admission webhooks.
1. Review namespace permissions
Pod Security Admission is controlled by labels on namespaces. This means that anyone who can update (or patch or create) a namespace can also modify the Pod Security level for that namespace, which could be used to bypass a more restrictive policy. Before proceeding, ensure that only trusted, privileged users have these namespace permissions. It is not recommended to grant these powerful permissions to users that shouldn't have elevated permissions, but if you must you will need to use an admission webhook to place additional restrictions on setting Pod Security labels on Namespace objects.
2. Simplify & standardize PodSecurityPolicies
In this section, you will reduce mutating PodSecurityPolicies and remove options that are outside
the scope of the Pod Security Standards. You should make the changes recommended here to an offline
copy of the original PodSecurityPolicy being modified. The cloned PSP should have a different
name that is alphabetically before the original (for example, prepend a 0
to it). Do not create the
new policies in Kubernetes yet - that will be covered in the Rollout the updated
policies section below.
2.a. Eliminate purely mutating fields
If a PodSecurityPolicy is mutating pods, then you could end up with pods that don't meet the Pod Security level requirements when you finally turn PodSecurityPolicy off. In order to avoid this, you should eliminate all PSP mutation prior to switching over. Unfortunately PSP does not cleanly separate mutating & validating fields, so this is not a straightforward migration.
You can start by eliminating the fields that are purely mutating, and don't have any bearing on the validating policy. These fields (also listed in the Mapping PodSecurityPolicies to Pod Security Standards reference) are:
.spec.defaultAllowPrivilegeEscalation
.spec.runtimeClass.defaultRuntimeClassName
.metadata.annotations['seccomp.security.alpha.kubernetes.io/defaultProfileName']
.metadata.annotations['apparmor.security.beta.kubernetes.io/defaultProfileName']
.spec.defaultAddCapabilities
- Although technically a mutating & validating field, these should be merged into.spec.allowedCapabilities
which performs the same validation without mutation.
Caution:
Removing these could result in workloads missing required configuration, and cause problems. See Rollout the updated policies below for advice on how to roll these changes out safely.2.b. Eliminate options not covered by the Pod Security Standards
There are several fields in PodSecurityPolicy that are not covered by the Pod Security Standards. If you must enforce these options, you will need to supplement Pod Security Admission with an admission webhook, which is outside the scope of this guide.
First, you can remove the purely validating fields that the Pod Security Standards do not cover. These fields (also listed in the Mapping PodSecurityPolicies to Pod Security Standards reference with "no opinion") are:
.spec.allowedHostPaths
.spec.allowedFlexVolumes
.spec.allowedCSIDrivers
.spec.forbiddenSysctls
.spec.runtimeClass
You can also remove the following fields, that are related to POSIX / UNIX group controls.
Caution:
If any of these use theMustRunAs
strategy they may be mutating! Removing these could result in
workloads not setting the required groups, and cause problems. See
Rollout the updated policies below for advice on how to roll these changes
out safely..spec.runAsGroup
.spec.supplementalGroups
.spec.fsGroup
The remaining mutating fields are required to properly support the Pod Security Standards, and will need to be handled on a case-by-case basis later:
.spec.requiredDropCapabilities
- Required to dropALL
for the Restricted profile..spec.seLinux
- (Only mutating with theMustRunAs
rule) required to enforce the SELinux requirements of the Baseline & Restricted profiles..spec.runAsUser
- (Non-mutating with theRunAsAny
rule) required to enforceRunAsNonRoot
for the Restricted profile..spec.allowPrivilegeEscalation
- (Only mutating if set tofalse
) required for the Restricted profile.
2.c. Rollout the updated PSPs
Next, you can rollout the updated policies to your cluster. You should proceed with caution, as removing the mutating options may result in workloads missing required configuration.
For each updated PodSecurityPolicy:
- Identify pods running under the original PSP. This can be done using the
kubernetes.io/psp
annotation. For example, using kubectl:PSP_NAME="original" # Set the name of the PSP you're checking for kubectl get pods --all-namespaces -o jsonpath="{range .items[?(@.metadata.annotations.kubernetes\.io\/psp=='$PSP_NAME')]}{.metadata.namespace} {.metadata.name}{'\n'}{end}"
- Compare these running pods against the original pod spec to determine whether PodSecurityPolicy
has modified the pod. For pods created by a workload resource
you can compare the pod with the PodTemplate in the controller resource. If any changes are
identified, the original Pod or PodTemplate should be updated with the desired configuration.
The fields to review are:
.metadata.annotations['container.apparmor.security.beta.kubernetes.io/*']
(replace * with each container name).spec.runtimeClassName
.spec.securityContext.fsGroup
.spec.securityContext.seccompProfile
.spec.securityContext.seLinuxOptions
.spec.securityContext.supplementalGroups
- On containers, under
.spec.containers[*]
and.spec.initContainers[*]
:.securityContext.allowPrivilegeEscalation
.securityContext.capabilities.add
.securityContext.capabilities.drop
.securityContext.readOnlyRootFilesystem
.securityContext.runAsGroup
.securityContext.runAsNonRoot
.securityContext.runAsUser
.securityContext.seccompProfile
.securityContext.seLinuxOptions
- Create the new PodSecurityPolicies. If any Roles or ClusterRoles are granting
use
on all PSPs this could cause the new PSPs to be used instead of their mutating counter-parts. - Update your authorization to grant access to the new PSPs. In RBAC this means updating any Roles
or ClusterRoles that grant the
use
permission on the original PSP to also grant it to the updated PSP. - Verify: after some soak time, rerun the command from step 1 to see if any pods are still using the original PSPs. Note that pods need to be recreated after the new policies have been rolled out before they can be fully verified.
- (optional) Once you have verified that the original PSPs are no longer in use, you can delete them.
3. Update Namespaces
The following steps will need to be performed on every namespace in the cluster. Commands referenced
in these steps use the $NAMESPACE
variable to refer to the namespace being updated.
3.a. Identify an appropriate Pod Security level
Start reviewing the Pod Security Standards and familiarizing yourself with the 3 different levels.
There are several ways to choose a Pod Security level for your namespace:
- By security requirements for the namespace - If you are familiar with the expected access level for the namespace, you can choose an appropriate level based on those requirements, similar to how one might approach this on a new cluster.
- By existing PodSecurityPolicies - Using the
Mapping PodSecurityPolicies to Pod Security Standards
reference you can map each
PSP to a Pod Security Standard level. If your PSPs aren't based on the Pod Security Standards, you
may need to decide between choosing a level that is at least as permissive as the PSP, and a
level that is at least as restrictive. You can see which PSPs are in use for pods in a given
namespace with this command:
kubectl get pods -n $NAMESPACE -o jsonpath="{.items[*].metadata.annotations.kubernetes\.io\/psp}" | tr " " "\n" | sort -u
- By existing pods - Using the strategies under Verify the Pod Security level, you can test out both the Baseline and Restricted levels to see whether they are sufficiently permissive for existing workloads, and chose the least-privileged valid level.
Caution:
Options 2 & 3 above are based on existing pods, and may miss workloads that aren't currently running, such as CronJobs, scale-to-zero workloads, or other workloads that haven't rolled out.3.b. Verify the Pod Security level
Once you have selected a Pod Security level for the namespace (or if you're trying several), it's a good idea to test it out first (you can skip this step if using the Privileged level). Pod Security includes several tools to help test and safely roll out profiles.
First, you can dry-run the policy, which will evaluate pods currently running in the namespace against the applied policy, without making the new policy take effect:
# $LEVEL is the level to dry-run, either "baseline" or "restricted".
kubectl label --dry-run=server --overwrite ns $NAMESPACE pod-security.kubernetes.io/enforce=$LEVEL
This command will return a warning for any existing pods that are not valid under the proposed level.
The second option is better for catching workloads that are not currently running: audit mode. When running under audit-mode (as opposed to enforcing), pods that violate the policy level are recorded in the audit logs, which can be reviewed later after some soak time, but are not forbidden. Warning mode works similarly, but returns the warning to the user immediately. You can set the audit level on a namespace with this command:
kubectl label --overwrite ns $NAMESPACE pod-security.kubernetes.io/audit=$LEVEL
If either of these approaches yield unexpected violations, you will need to either update the violating workloads to meet the policy requirements, or relax the namespace Pod Security level.
3.c. Enforce the Pod Security level
When you are satisfied that the chosen level can safely be enforced on the namespace, you can update the namespace to enforce the desired level:
kubectl label --overwrite ns $NAMESPACE pod-security.kubernetes.io/enforce=$LEVEL
3.d. Bypass PodSecurityPolicy
Finally, you can effectively bypass PodSecurityPolicy at the namespace level by binding the fully privileged PSP to all service accounts in the namespace.
# The following cluster-scoped commands are only needed once.
kubectl apply -f privileged-psp.yaml
kubectl create clusterrole privileged-psp --verb use --resource podsecuritypolicies.policy --resource-name privileged
# Per-namespace disable
kubectl create -n $NAMESPACE rolebinding disable-psp --clusterrole privileged-psp --group system:serviceaccounts:$NAMESPACE
Since the privileged PSP is non-mutating, and the PSP admission controller always prefers non-mutating PSPs, this will ensure that pods in this namespace are no longer being modified or restricted by PodSecurityPolicy.
The advantage to disabling PodSecurityPolicy on a per-namespace basis like this is if a problem arises you can easily roll the change back by deleting the RoleBinding. Just make sure the pre-existing PodSecurityPolicies are still in place!
# Undo PodSecurityPolicy disablement.
kubectl delete -n $NAMESPACE rolebinding disable-psp
4. Review namespace creation processes
Now that existing namespaces have been updated to enforce Pod Security Admission, you should ensure that your processes and/or policies for creating new namespaces are updated to ensure that an appropriate Pod Security profile is applied to new namespaces.
You can also statically configure the Pod Security admission controller to set a default enforce, audit, and/or warn level for unlabeled namespaces. See Configure the Admission Controller for more information.
5. Disable PodSecurityPolicy
Finally, you're ready to disable PodSecurityPolicy. To do so, you will need to modify the admission configuration of the API server: How do I turn off an admission controller?.
To verify that the PodSecurityPolicy admission controller is no longer enabled, you can manually run a test by impersonating a user without access to any PodSecurityPolicies (see the PodSecurityPolicy example), or by verifying in the API server logs. At startup, the API server outputs log lines listing the loaded admission controller plugins:
I0218 00:59:44.903329 13 plugins.go:158] Loaded 16 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,ExtendedResourceToleration,PersistentVolumeLabel,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
I0218 00:59:44.903350 13 plugins.go:161] Loaded 14 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,PodSecurity,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,DenyServiceExternalIPs,ValidatingAdmissionWebhook,ResourceQuota.
You should see PodSecurity
(in the validating admission controllers), and neither list should
contain PodSecurityPolicy
.
Once you are certain the PSP admission controller is disabled (and after sufficient soak time to be confident you won't need to roll back), you are free to delete your PodSecurityPolicies and any associated Roles, ClusterRoles, RoleBindings and ClusterRoleBindings (just make sure they don't grant any other unrelated permissions).