Continuing our series of articles about troubleshooting, this time we are talking about Kubernetes, one of the platform tools I really enjoy working with every day.

Knowing how to troubleshoot efficiently is not about memorizing commands; it is about understanding how the control plane, scheduler, kubelet, and networking interact and, of course, knowing how to use kubectl as an inspection tool.

This guide focuses on real-world debugging using:

  • kubectl get
  • kubectl describe
  • kubectl logs
  • kubectl events
  • kubectl exec
  • kubectl top
  • kubectl port-forward
  • kubectl debug

All examples assume a namespace called prod.


1. kubectl get

Always begin by checking object state.

## always start by checking the pod state
kubectl get pods -n prod

## add `-o wide` to see node placement and pod IPs
kubectl get pods -n prod -o wide

## for deeper inspection, show the full yaml
kubectl get pod api-exchange-893edf -n prod -o yaml

Generally, what we are looking for is one of these things:

  • CrashLoopBackOff
  • ImagePullBackOff
  • Pending
  • Node placement
  • Restart count
  • Resource requests/limits
  • Environment variables
  • Mounted volumes
  • Node affinity and tolerations
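When a namespace has many pods, it helps to filter directly instead of scanning the whole list. Here is a sketch; the `app=api` label and the pod states are illustrative assumptions:

```shell
## list only pods that are not yet scheduled
kubectl get pods -n prod --field-selector=status.phase=Pending

## filter by label, e.g. only the api pods (label name is an assumption)
kubectl get pods -n prod -l app=api

## show only names and restart counts with custom columns
kubectl get pods -n prod -o custom-columns=NAME:.metadata.name,RESTARTS:.status.containerStatuses[0].restartCount
```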

2. kubectl describe

describe provides structured, event-driven details about a pod or any other resource; when you need those details, this is the subcommand to use. You will be evaluating sections like:

  • Events
  • Container state
  • Last termination reason
  • Probes status
  • Conditions

kubectl describe pod api-exchange-893edf -n prod

Here is a common example of a health check probe failure:

Liveness probe failed: HTTP probe failed with statuscode: 500
Back-off restarting failed container

This immediately indicates:

  • The container starts
  • The health endpoint fails
  • Kubernetes restarts it

This is a common situation, and it will generally drive you crazy if you do not tune the correct timeouts for the readiness and liveness probes. Now the focus shifts to logs.
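As a sketch of how such a tune could be applied without editing YAML by hand, a JSON patch can bump the probe timeout in place. The deployment name, container index, and the 5-second value are all assumptions for illustration:

```shell
## raise the liveness probe timeout on the first container of the deployment
kubectl patch deployment api-exchange -n prod --type='json' \
  -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/livenessProbe/timeoutSeconds", "value": 5}]'
```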

3. kubectl logs

As the subcommand name says, it is basically used to read the output logs from the application or service you are debugging. Here are the basic commands:

## to read the logs of a unique pod
kubectl logs api-exchange-893edf -n prod

## target a specific container in a multi-container pod:
kubectl logs api-exchange-893edf -c api-container -n prod

## read the previous crash log:
kubectl logs api-exchange-893edf -p -n prod

## follow the live log, like a `tail -f`:
kubectl logs -f api-exchange-893edf -n prod

These are the typical patterns to look for in the logs:

  • DB connection refused: networking or service issue
  • TLS handshake failure: certificate issue
  • OOMKilled: memory limit too low
  • Well-structured logs, which make all of the above easier to spot
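To narrow down the output, kubectl logs supports time and size filters, and the result can be piped through grep. The pod name and the search term come from the examples above:

```shell
## only the last 100 lines
kubectl logs api-exchange-893edf -n prod --tail=100

## only logs from the last 15 minutes
kubectl logs api-exchange-893edf -n prod --since=15m

## grep for one of the typical patterns
kubectl logs api-exchange-893edf -n prod | grep -i "connection refused"
```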

4. kubectl events

The events subcommand shows messages from the cluster itself, ordered by timestamp, along with the event that was triggered.

## to check events in the whole namespace `prod`
kubectl events -n prod

## if you want to sort by timestamp:
kubectl events -n prod --sort-by='.lastTimestamp'

Events reveal:

  • Scheduling failures
  • Insufficient CPU/memory
  • Volume mount errors
  • Image pull failures
  • Node pressure

Here is an example of an event:

0/3 nodes are available: 3 Insufficient memory.

This is not an app problem. It is a scheduling constraint problem. It basically shows that none of the nodes has enough free memory to schedule the application.
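When you hit this kind of event, it often helps to narrow the list to warnings and then check what the nodes actually have allocated. The node name here is the same hypothetical worker used later in this guide:

```shell
## show only warning events in the namespace
kubectl events -n prod --types=Warning

## check how much of the node's capacity is already allocated
kubectl describe node worker-us-08 | grep -A 8 "Allocated resources"
```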

5. kubectl exec

Sometimes reading events and logs and analyzing a container from the outside is not enough; you need to go inside it and do your stuff. That is why we have exec.

Use it to verify:

  • DNS resolution from CoreDNS or any other service
  • Service connectivity between pods
  • Environment variables
  • Mounted file systems, secrets, and file permissions

## start a /bin/sh shell inside the container
kubectl exec -it api-exchange-893edf -n prod -- /bin/sh

## from there you can use diagnostic tools to check real problems
curl http://another-service.prod.svc.cluster.local:8080
cat /etc/resolv.conf
nc database 5432

If exec fails because the container crashes too fast, use kubectl debug.

6. kubectl debug

Attach an ephemeral debug container. This is useful for:

  • Images that have no shell
  • Distroless containers
  • Minimal production images

kubectl debug -it api-exchange-893edf -n prod --image=busybox --target=api-container
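kubectl debug can also target a node instead of a pod, which drops you into a pod on that node with the host filesystem mounted under /host. The node name is hypothetical:

```shell
## open a debugging shell on a worker node
kubectl debug node/worker-us-08 -it --image=busybox
```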

7. kubectl top

One of the main day-to-day issues is the amount of resources used by the containers. To check it you will need to deploy the metrics-server; this subcommand then shows the memory and CPU used by pods and nodes.

kubectl top pods -n prod
kubectl top nodes
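When you are hunting for the heaviest consumers, the output can be sorted by cpu or memory, and broken down per container:

```shell
## sort pods by memory usage
kubectl top pods -n prod --sort-by=memory

## include individual container usage per pod
kubectl top pods -n prod --containers
```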

8. kubectl port-forward

And of course, sometimes you need to make network requests to a specific container to validate a few things. For that you can open a local port to a remote service and make your calls:

## open the service `api` pod locally
kubectl port-forward svc/api 8080:80 -n prod

## execute a curl test
curl http://localhost:8080/health
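Forwarding also works against pods and deployments, not only services; this is handy when you want to bypass the Service and hit one replica directly. The container port 8080 is an assumption:

```shell
## forward directly to a single pod
kubectl port-forward pod/api-exchange-893edf 8080:8080 -n prod

## or to whichever pod of a deployment kubectl picks
kubectl port-forward deploy/api 8080:8080 -n prod
```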

9. Service and Node Level Debugging

Check endpoints:

## check endpoints and look for label mismatches and incorrect selectors
kubectl get endpoints api -n prod

## check the service definition
kubectl describe svc api -n prod

## look at node status to find disk and memory pressure
kubectl get nodes
kubectl describe node worker-us-08
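A quick way to spot a label mismatch is to print the selector the service is using and compare it against the pod labels, as a sketch:

```shell
## print the selector the service is using
kubectl get svc api -n prod -o jsonpath='{.spec.selector}'

## list pods with their labels and compare
kubectl get pods -n prod --show-labels
```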

Troubleshooting Flow

  1. Is the pod running?
  2. If not, start with kubectl describe.
  3. If running but failing, check the logs with kubectl logs.
  4. If unstable, check the previous logs with kubectl logs -p.
  5. If networking, try kubectl port-forward or kubectl exec.
  6. If scheduling, check with kubectl events.
  7. If resource-related, memory and CPU will show up in kubectl top.
  8. If deeper isolation is required, use kubectl debug.
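The first steps of the flow above can be sketched as a small triage helper. Everything here is illustrative, not a definitive runbook; the namespace and pod name are passed in as arguments:

```shell
#!/bin/sh
## minimal triage sketch; usage: ./triage.sh <namespace> <pod>
NS="$1"
POD="$2"

## step 1: is the pod running?
PHASE=$(kubectl get pod "$POD" -n "$NS" -o jsonpath='{.status.phase}')
echo "phase: $PHASE"

if [ "$PHASE" != "Running" ]; then
  ## step 2: not running, describe it and read the Events section
  kubectl describe pod "$POD" -n "$NS"
else
  ## steps 3 and 4: running, check current and previous logs
  kubectl logs "$POD" -n "$NS" --tail=50
  kubectl logs "$POD" -n "$NS" -p --tail=50 2>/dev/null
fi
```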

Remember, troubleshooting always follows an order of execution, and it is a discipline you need to master if you want to be a good systems administrator and Kubernetes guru.