1 - Kubelet Checkpoint API
FEATURE STATE: Kubernetes v1.30 [beta]
Checkpointing a container is the functionality to create a stateful copy of a
running container. Once you have a stateful copy of a container, you could
move it to a different computer for debugging or similar purposes.
If you move the checkpointed container data to a computer that's able to restore
it, that restored container continues to run at exactly the same
point it was checkpointed. You can also inspect the saved data, provided that you
have suitable tools for doing so.
Creating a checkpoint of a container might have security implications. Typically
a checkpoint contains all memory pages of all processes in the checkpointed
container. This means that everything that used to be in memory is now available
on the local disk. This includes all private data and possibly keys used for
encryption. The underlying CRI implementations (the container runtime on that node)
should create the checkpoint archive so that it is only accessible by the root user. It
is still important to remember that if the checkpoint archive is transferred to another
system, all memory pages will be readable by the owner of the checkpoint archive.
Operations
post checkpoint the specified container
Tell the kubelet to checkpoint a specific container from the specified Pod.
Consult the Kubelet authentication/authorization reference
for more information about how access to the kubelet checkpoint interface is
controlled.
The kubelet will request a checkpoint from the underlying CRI implementation. In the
checkpoint request the kubelet will specify the name of the checkpoint archive as
checkpoint-<podFullName>-<containerName>-<timestamp>.tar and also request to store the
checkpoint archive in the checkpoints directory below its root directory (as defined
by --root-dir). This defaults to /var/lib/kubelet/checkpoints.
The checkpoint archive is in tar format, and could be listed using an implementation of
tar. The contents of the archive depend on the underlying CRI implementation (the
container runtime on that node).
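For example, a checkpoint archive under the default directory could be inspected with tar; the archive name below is a placeholder following the naming pattern above:

# List the contents of a checkpoint archive without extracting it;
# the file name is a placeholder following the pattern described above.
tar --list --file /var/lib/kubelet/checkpoints/checkpoint-<podFullName>-<containerName>-<timestamp>.tar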
HTTP Request
POST /checkpoint/{namespace}/{pod}/{container}
Parameters
- namespace (in path): string, required. Namespace.
- pod (in path): string, required. Pod.
- container (in path): string, required. Container.
- timeout (in query): integer. Timeout in seconds to wait until the checkpoint creation is finished. If zero or no timeout is specified, the default CRI timeout value will be used. Checkpoint creation time depends directly on the used memory of the container: the more memory a container uses, the more time is required to create the corresponding checkpoint.
Response
200: OK
401: Unauthorized
404: Not Found (if the ContainerCheckpoint feature gate is disabled)
404: Not Found (if the specified namespace, pod or container cannot be found)
500: Internal Server Error (if the CRI implementation encounters an error during checkpointing (see error message for further details))
500: Internal Server Error (if the CRI implementation does not implement the checkpoint CRI API (see error message for further details))
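As a minimal sketch, assuming the kubelet serves its API on the default port 10250 and you have client credentials authorized for the kubelet API (the certificate paths, pod, and container names below are all illustrative), a checkpoint request could look like this:

# Run on (or with network access to) the node; all paths and names are
# illustrative and depend on how your cluster issues credentials.
curl -X POST \
  --cacert /etc/kubernetes/pki/ca.crt \
  --cert admin-client.crt \
  --key admin-client.key \
  "https://localhost:10250/checkpoint/default/my-pod/my-container?timeout=60"

On success the kubelet responds with 200 and writes the archive into the checkpoints directory described above.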
3 - Node Labels Populated By The Kubelet
Kubernetes nodes come pre-populated
with a standard set of labels.
You can also set your own labels on nodes, either through the kubelet configuration or
using the Kubernetes API.
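For example, to set your own label through the Kubernetes API (the node name and the label key/value are illustrative):

# Add a custom label to a node; the key and value are illustrative.
kubectl label node <node-name> example.com/rack=rack-1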
Preset labels
The preset labels that Kubernetes sets on nodes are:

- kubernetes.io/arch
- kubernetes.io/hostname
- kubernetes.io/os
- node.kubernetes.io/instance-type
- topology.kubernetes.io/region
- topology.kubernetes.io/zone
Note:
The value of these labels is cloud provider specific and is not guaranteed to be reliable.
For example, the value of kubernetes.io/hostname may be the same as the node name in some
environments and a different value in other environments.
4 - Kubelet Configuration Directory Merging
When using the kubelet's --config-dir flag to specify a drop-in directory for
configuration, there is some specific behavior around how different data types are
merged.
Here are some examples of how different data types behave during configuration merging:
Structure Fields
There are two types of structure fields in a YAML structure: singular (or a
scalar type) and embedded (structures that contain scalar types).
The configuration merging process handles the overriding of singular and embedded struct fields to create a resulting kubelet configuration.
For instance, you may want a baseline kubelet configuration for all nodes, but you may
want to customize the address and authorization fields. This can be done as follows:
Main kubelet configuration file contents:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
port: 20250
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: "5m"
    cacheUnauthorizedTTL: "30s"
serializeImagePulls: false
address: "192.168.0.1"
Contents of a file in the --config-dir directory:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
authorization:
  mode: AlwaysAllow
  webhook:
    cacheAuthorizedTTL: "8m"
    cacheUnauthorizedTTL: "45s"
address: "192.168.0.8"
The resulting configuration will be as follows:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
port: 20250
serializeImagePulls: false
authorization:
  mode: AlwaysAllow
  webhook:
    cacheAuthorizedTTL: "8m"
    cacheUnauthorizedTTL: "45s"
address: "192.168.0.8"
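As an illustrative sketch (the file paths are placeholders), the kubelet would be started with both flags, so that the main configuration file is read first and the drop-in files are applied on top of it:

# Drop-in files in the directory are applied on top of the main
# configuration; the paths here are illustrative.
kubelet --config /etc/kubernetes/kubelet.yaml \
        --config-dir /etc/kubernetes/kubelet.conf.d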
Lists
You can override the slices/lists values of the kubelet configuration. However, the
entire list gets overridden during the merging process. For example, you can override
the clusterDNS list as follows:
Main kubelet configuration file contents:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
port: 20250
serializeImagePulls: false
clusterDNS:
  - "192.168.0.9"
  - "192.168.0.8"
Contents of a file in the --config-dir directory:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
clusterDNS:
  - "192.168.0.2"
  - "192.168.0.3"
  - "192.168.0.5"
The resulting configuration will be as follows:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
port: 20250
serializeImagePulls: false
clusterDNS:
  - "192.168.0.2"
  - "192.168.0.3"
  - "192.168.0.5"
Maps, including Nested Structures
Individual fields in maps, regardless of their value types (boolean, string, etc.), can
be selectively overridden. However, for map[string][]string, the entire list associated
with a specific field gets overridden. Let's understand this better with an example,
particularly on fields like featureGates and staticPodURLHeader:
Main kubelet configuration file contents:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
port: 20250
serializeImagePulls: false
featureGates:
  AllAlpha: false
  MemoryQoS: true
staticPodURLHeader:
  kubelet-api-support:
    - "Authorization: 234APSDFA"
    - "X-Custom-Header: 123"
  custom-static-pod:
    - "Authorization: 223EWRWER"
    - "X-Custom-Header: 456"
Contents of a file in the --config-dir directory:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  MemoryQoS: false
  KubeletTracing: true
  DynamicResourceAllocation: true
staticPodURLHeader:
  custom-static-pod:
    - "Authorization: 223EWRWER"
    - "X-Custom-Header: 345"
The resulting configuration will be as follows:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
port: 20250
serializeImagePulls: false
featureGates:
  AllAlpha: false
  MemoryQoS: false
  KubeletTracing: true
  DynamicResourceAllocation: true
staticPodURLHeader:
  kubelet-api-support:
    - "Authorization: 234APSDFA"
    - "X-Custom-Header: 123"
  custom-static-pod:
    - "Authorization: 223EWRWER"
    - "X-Custom-Header: 345"
5 - Kubelet Device Manager API Versions
This page provides details of version compatibility between the Kubernetes device plugin
API and different versions of Kubernetes itself.
Compatibility matrix
|                 | v1alpha1 | v1beta1 |
|-----------------|----------|---------|
| Kubernetes 1.21 | -        | ✓       |
| Kubernetes 1.22 | -        | ✓       |
| Kubernetes 1.23 | -        | ✓       |
| Kubernetes 1.24 | -        | ✓       |
| Kubernetes 1.25 | -        | ✓       |
| Kubernetes 1.26 | -        | ✓       |
Key:
- ✓: Exactly the same features / API objects in both the device plugin API and the Kubernetes version.
- +: The device plugin API has features or API objects that may not be present in the Kubernetes cluster, either because the device plugin API has added additional new API calls, or because the server has removed an old API call. However, everything they have in common (most other APIs) will work. Note that alpha APIs may vanish or change significantly between one minor release and the next.
- -: The Kubernetes cluster has features the device plugin API can't use, either because the server has added additional API calls, or because the device plugin API has removed an old API call. However, everything they share in common (most APIs) will work.
6 - Node Status
The status of a node in Kubernetes is a critical
aspect of managing a Kubernetes cluster. In this article, we'll cover the basics of
monitoring and maintaining node status to ensure a healthy and stable cluster.
Node status fields
A Node's status contains the following information:

- Addresses
- Conditions
- Capacity and Allocatable
- Info
You can use kubectl to view a Node's status and other details:
kubectl describe node <insert-node-name-here>
Each section of the output is described below.
Addresses
The usage of these fields varies depending on your cloud provider or bare metal configuration.
- HostName: The hostname as reported by the node's kernel. Can be overridden via the kubelet --hostname-override parameter.
- ExternalIP: Typically the IP address of the node that is externally routable (available from outside the cluster).
- InternalIP: Typically the IP address of the node that is routable only within the cluster.
Conditions
The conditions field describes the status of all Running nodes. Examples of conditions
include:
Node conditions, and a description of when each condition applies.

| Node Condition | Description |
|----------------|-------------|
| Ready | True if the node is healthy and ready to accept pods, False if the node is not healthy and is not accepting pods, and Unknown if the node controller has not heard from the node in the last node-monitor-grace-period (default is 40 seconds) |
| DiskPressure | True if pressure exists on the disk size, that is, if the disk capacity is low; otherwise False |
| MemoryPressure | True if pressure exists on the node memory, that is, if the node memory is low; otherwise False |
| PIDPressure | True if pressure exists on the processes, that is, if there are too many processes on the node; otherwise False |
| NetworkUnavailable | True if the network for the node is not correctly configured, otherwise False |
Note:
If you use command-line tools to print details of a cordoned Node, the Condition includes
SchedulingDisabled. SchedulingDisabled is not a Condition in the Kubernetes API; instead,
cordoned nodes are marked Unschedulable in their spec.
In the Kubernetes API, a node's condition is represented as part of the .status of the
Node resource. For example, the following JSON structure describes a healthy node:
"conditions": [
{
"type": "Ready",
"status": "True",
"reason": "KubeletReady",
"message": "kubelet is posting ready status",
"lastHeartbeatTime": "2019-06-05T18:38:35Z",
"lastTransitionTime": "2019-06-05T11:41:27Z"
}
]
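To retrieve just this part of the status for a node, you could use kubectl with a JSONPath expression; the node name is a placeholder:

# Print only the conditions array from the node's status.
kubectl get node <node-name> -o jsonpath='{.status.conditions}'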
When problems occur on nodes, the Kubernetes control plane automatically creates taints
that match the conditions affecting the node. An example of this is when the status of
the Ready condition remains Unknown or False for longer than the kube-controller-manager's
NodeMonitorGracePeriod, which defaults to 40 seconds. This will cause either a
node.kubernetes.io/unreachable taint, for an Unknown status, or a
node.kubernetes.io/not-ready taint, for a False status, to be added to the Node.
These taints affect pending pods as the scheduler takes the Node's taints into
consideration when assigning a pod to a Node. Existing pods scheduled to the node may be
evicted due to the application of NoExecute taints. Pods may also have tolerations that
let them schedule to and continue running on a Node even though it has a specific taint.
See Taint Based Evictions and Taint Nodes by Condition for more details.
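To check which taints (if any) are currently set on a node, you could query its spec; the node name is a placeholder:

# Print the taints recorded in the node's spec, if any.
kubectl get node <node-name> -o jsonpath='{.spec.taints}'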
Capacity and Allocatable
Describes the resources available on the node: CPU, memory, and the maximum
number of pods that can be scheduled onto the node.
The fields in the capacity block indicate the total amount of resources that a
Node has. The allocatable block indicates the amount of resources on a
Node that is available to be consumed by normal Pods.
You may read more about capacity and allocatable resources while learning how to reserve
compute resources on a Node.
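For a quick comparison of the two blocks, you could print them side by side with a JSONPath template; the node name is a placeholder:

# Print the total (capacity) and schedulable (allocatable) resources.
kubectl get node <node-name> \
  -o jsonpath='capacity: {.status.capacity}{"\n"}allocatable: {.status.allocatable}{"\n"}'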
Info
Describes general information about the node, such as kernel version, Kubernetes
version (kubelet and kube-proxy version), container runtime details, and which
operating system the node uses.
The kubelet gathers this information from the node and publishes it into
the Kubernetes API.
Heartbeats
Heartbeats, sent by Kubernetes nodes, help your cluster determine the availability of
each node, and take action when failures are detected.
For nodes there are two forms of heartbeats:

- updates to the .status of a Node
- Lease objects within the kube-node-lease namespace. Each Node has an associated Lease object.

Compared to updates to the .status of a Node, a Lease is a lightweight resource. Using
Leases for heartbeats reduces the performance impact of these updates for large clusters.
The kubelet is responsible for creating and updating the .status of Nodes, and for
updating their related Leases.

- The kubelet updates the node's .status either when there is a change in status or if there has been no update for a configured interval. The default interval for .status updates to Nodes is 5 minutes, which is much longer than the 40 second default timeout for unreachable nodes.
- The kubelet creates and then updates its Lease object every 10 seconds (the default update interval). Lease updates occur independently from updates to the Node's .status. If the Lease update fails, the kubelet retries, using exponential backoff that starts at 200 milliseconds and is capped at 7 seconds.
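To observe these heartbeats directly, you could inspect a node's Lease object; its spec.renewTime field records the most recent heartbeat (the node name is a placeholder):

# Each node has a Lease named after it in the kube-node-lease namespace.
kubectl -n kube-node-lease get lease <node-name> -o yaml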