Pod Termination lifecycle in Kubernetes


The Beginning:

Hi there,

The goal of this article is to explain, in simple terms, what exactly happens to a pod when you run a deployment in K8s. If you see intermittent failures of your API or HTML page requests during deployments, then this article is for you :)

Prerequisites: Basic knowledge of K8s deployment and service spec, and a cup of coffee or mug of beer.

I started the deployment, now what happens?

After initiating the deployment, Kubernetes will perform the following main steps.

Let's write some simple JS code to understand this:

PS: This is not actual Kubernetes code, just high-level dummy code to illustrate the concepts.

// Let's assume there are 3 pods initially and we are terminating "pod1":
let podsAttachedToService = ["pod1", "pod2", "pod3"];

// Initially, all pods are in "running" status and serving requests:
const podStatus = {
  pod1: "running",
  pod2: "running",
  pod3: "running",
};

// Let's assume there is no preStop hook defined for any pod.
// A preStop hook is defined per container in the deployment spec, in one of these formats:
// 1. Exec (executes a specific command inside the container)
// 2. HTTP (sends an HTTP request to a specific endpoint)
const podPreStopHookDefinition = {
  pod1: async () => {},
  pod2: async () => {},
  pod3: async () => {},
};

// This function simply removes a pod from the K8s Service:
const removePodFromService = async (podArg) => {
  podsAttachedToService = podsAttachedToService.filter(
    (pod) => pod !== podArg
  );
  podStatus[podArg] = "terminating";
};

// This function runs the preStop hook of a pod:
const preStopHook = async (podArg) => {
  await podPreStopHookDefinition[podArg]();
};

const sendSigtermSignal = async (podArg) => {
  // Stands for executing "kill -s SIGTERM <nodePID>" inside the pod
};

const sendSigKillSignal = async (podArg) => {
  // Stands for executing "kill -s SIGKILL <nodePID>" inside the pod
};

const main = async (podToTerminate) => {
  // Step 1: Remove the pod from the K8s Service, so that no new requests come to it.
  await removePodFromService(podToTerminate);

  // Step 2: Execute the preStop hook of the pod (if defined, otherwise go to Step 3):
  await preStopHook(podToTerminate);

  // Step 3: Send the SIGTERM signal to the Node process:
  await sendSigtermSignal(podToTerminate);

  // Step 4: Send the SIGKILL signal to the Node process
  // (only needed if it is still running when the grace period is over):
  await sendSigKillSignal(podToTerminate);
};

// The MAIN function call to terminate, let's say, "pod1":
main("pod1");
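  • The following things happen in this order:

    1. STEP 1 - Pod is set to the “Terminating” State and removed from the endpoints list of all K8s Services. In a real cluster this removal propagates asynchronously, so the pod can still receive a few new requests for a short while after it starts terminating.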

    2. STEP 2 - preStop Hook is executed:

      1. The preStop Hook is a command or an HTTP request that, as the name suggests, runs before the stopping process begins - "preStop".

      2. If there is no preStop Hook, the next step (sending the SIGTERM signal to the node container) happens immediately.

    3. STEP 3 - SIGTERM signal is sent to the main process of each container in the pod:

      1. If your application doesn’t shut down gracefully on receiving a SIGTERM, you can use the preStop Hook from STEP 2 to trigger a graceful shutdown instead.

      2. This signal means that the container will soon be killed and you should start wrapping things up - like finishing the existing requests, closing DB connections, etc.

      3. For a nodejs container with no SIGTERM handler: on receiving the SIGTERM signal, the node process exits immediately!

        1. So the requests that the pod is serving at that moment are dropped instantly, and their clients get a 502 Bad Gateway response.

        2. Example:

          1. Sending a SIGTERM signal to a node server running on port 3000 (with process id = 78771) stops the server abruptly.

          2. From the nodejs official docs - "Sending SIGINT, SIGTERM, and SIGKILL will cause the unconditional termination of the target process, and afterwards, subprocess will report that the process was terminated by signal." That sentence describes the signal emulation on Windows. For other platforms, the same docs say that SIGTERM has a default handler that exits the process, unless you install your own listener.

    4. THE CATCH - Kubernetes waits for a grace period (terminationGracePeriodSeconds):

      1. Kubernetes starts counting the grace period when the pod is marked for termination, so the time spent in the preStop hook (STEP 2) and in the SIGTERM handler (STEP 3) both count against it.

      2. If the preStop hook handler (Step 2) or SIGTERM signal handler (Step 3) takes longer than the grace period, Kubernetes does not wait for it to finish and moves on to the next step!

        • Once the grace period is over, it does not matter which step is still executing: a SIGKILL is sent to the pod. (If the preStop hook is still running at that point, the kubelet grants only a small one-off extension of 2 seconds.)
      3. If your app finishes shutting down and exits before the grace period is over, Kubernetes does not wait for the rest of it and cleans up the pod immediately.

      4. The default value of terminationGracePeriodSeconds is 30 seconds.

        1. This can be changed in the pod template of the deployment spec.

        2. For example - if your pod takes 80 seconds to stop gracefully, increase terminationGracePeriodSeconds to more than 80 seconds.

    5. STEP 4 - SIGKILL signal is sent to the pod if it is still running, and the pod no longer exists.

      1. You cannot listen to the SIGKILL event in your node application.

      2. It will unconditionally terminate Node.js on all platforms.
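
To make the timing rules above concrete, here is a small sketch that models the termination timeline as plain arithmetic. It is only a toy model, not how Kubernetes is implemented: the function name `simulateTermination` and its parameters are made up for this illustration, all times are seconds since the pod was marked "Terminating", and the 2 second extension for a still-running preStop hook is ignored.

```javascript
// Toy model of the pod termination timeline (times in seconds).
// Assumption: the grace period starts at t = 0 and covers both the
// preStop hook (STEP 2) and the SIGTERM handling (STEP 3).
function simulateTermination({ preStopSeconds, shutdownSeconds, gracePeriodSeconds = 30 }) {
  const events = [{ at: 0, event: "Terminating, grace period starts" }];

  // STEP 2: the preStop hook runs first and eats into the grace period.
  if (preStopSeconds >= gracePeriodSeconds) {
    events.push({ at: gracePeriodSeconds, event: "SIGKILL (grace period used up by preStop)" });
    return { events, killed: true, endedAt: gracePeriodSeconds };
  }

  // STEP 3: SIGTERM is sent once the preStop hook has finished.
  events.push({ at: preStopSeconds, event: "SIGTERM" });
  const exitAt = preStopSeconds + shutdownSeconds;

  if (exitAt <= gracePeriodSeconds) {
    // The app exited on its own, so no SIGKILL is needed.
    events.push({ at: exitAt, event: "app exited gracefully" });
    return { events, killed: false, endedAt: exitAt };
  }

  // STEP 4: the grace period is over while the app is still shutting down.
  events.push({ at: gracePeriodSeconds, event: "SIGKILL" });
  return { events, killed: true, endedAt: gracePeriodSeconds };
}

// A pod that needs 80 seconds to stop is killed with the default 30 seconds...
console.log(simulateTermination({ preStopSeconds: 10, shutdownSeconds: 80 }).killed); // true
// ...but survives once the grace period is raised above 10 + 80 seconds.
console.log(
  simulateTermination({ preStopSeconds: 10, shutdownSeconds: 80, gracePeriodSeconds: 100 }).killed
); // false
```

The useful takeaway from the model is that the preStop time and the shutdown time add up, and their sum has to fit inside the grace period.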

By default, nothing handles STEP 2 and STEP 3 in a nodejs pod - there is no preStop hook and no SIGTERM handler. So the SIGTERM signal is sent to your pod right away, the node process exits instantly, and all the requests it was serving get a response of - 502 BAD GATEWAY.
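
You can try this default behaviour locally without a cluster. The sketch below is only an illustration: it starts two tiny Node scripts as child processes and sends each one a SIGTERM after a short delay. The helper name `runAndTerminate` and the 1500 ms delay are choices made for this example; it relies on `spawnSync` sending `killSignal` when its `timeout` expires.

```javascript
const { spawnSync } = require("node:child_process");

// Runs a small Node script in a child process and sends it SIGTERM once
// `timeoutMs` has passed (spawnSync sends `killSignal` on timeout).
function runAndTerminate(script, timeoutMs = 1500) {
  const result = spawnSync(process.execPath, ["-e", script], {
    timeout: timeoutMs,
    killSignal: "SIGTERM",
  });
  return { status: result.status, signal: result.signal };
}

// No SIGTERM handler: the process is terminated by the signal itself.
const withoutHandler = runAndTerminate("setInterval(() => {}, 1000);");

// With a handler: the process gets the chance to clean up and exit on its own.
const withHandler = runAndTerminate(
  "process.on('SIGTERM', () => process.exit(0)); setInterval(() => {}, 1000);"
);

console.log("without handler:", withoutHandler);
console.log("with handler:", withHandler);
```

On Linux or macOS, the first result should report that the child was ended by the signal (`signal: 'SIGTERM'`), while the second should report a normal exit (`status: 0`), because the handler took over from the default behaviour.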


How to fix intermittent 502 Bad Gateway?

Well, there are 2 ways to make your node server shut down gracefully:

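  1. Handle the SIGTERM signal in your node server and gracefully shut down the server:
const express = require('express');
const mongoose = require('mongoose');

const app = express();
const server = app.listen(3000, () => console.log('Example app listening on port 3000!'));

process.on('SIGTERM', () => {
  console.info('SIGTERM signal received.');
  console.log('Closing http server.');
  // Stop accepting new connections; the callback runs once the server has closed.
  server.close(() => {
    console.log('Http server closed.');
    // boolean means [force], see the mongoose docs.
    // close() returns a promise; recent mongoose versions no longer accept a callback here.
    mongoose.connection.close(false).then(() => {
      console.log('MongoDb connection closed.');
      process.exit(0);
    });
  });
});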
  2. Use the preStop Hook in your K8s deployment spec:

      containers:
        - name: nodejs
          image: something/nodejs:latest
          imagePullPolicy: IfNotPresent
          lifecycle:
            preStop:
              exec:
                command:
                  - "sleep"
                  - "30"
    
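    • This command makes the container sleep for 30 seconds in STEP 2. After the sleep, STEP 3 runs and sends the SIGTERM signal, which kills the node server. Note that with the default terminationGracePeriodSeconds of 30 seconds, the grace period is also over at this point, so no time is left for a SIGTERM handler. If you need both, pick a sleep shorter than the grace period or raise the grace period.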

    • This solution is best if:

      • Your node server is a simple proxy server to the actual backend and does not have any database connections.

      • For all your API calls, 30 seconds (which is the sleep time) is more than enough to finish all ongoing requests to the backend, so that when SIGTERM is sent to the pod - the pod has already served all requests and is simply waiting to be killed.

      • You do not want to touch the application code right now, and only want to change the K8s deployment spec.
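
Coming back to solution 1: here is a dependency-free variation that uses only Node's built-in `http` module and adds a safety timer, so the process exits on its own shortly before Kubernetes would send SIGKILL. Treat it as a sketch under assumptions, not a drop-in implementation: `installGracefulShutdown`, `proc` and `forceExitAfterMs` are names invented for this example, and 25000 ms is just a value that sits below the default 30 second grace period.

```javascript
const http = require("node:http");

// Registers a SIGTERM handler that closes `server` and then exits.
// `proc` defaults to the real `process`; it is a parameter only so that
// the behaviour can be exercised without killing the current process.
function installGracefulShutdown(server, { proc = process, forceExitAfterMs = 25000 } = {}) {
  proc.on("SIGTERM", () => {
    // Safety net: give up and exit before the grace period runs out.
    const timer = setTimeout(() => proc.exit(1), forceExitAfterMs);
    timer.unref(); // do not keep the event loop alive just for this timer

    // Stop accepting new connections; the callback runs once the server has closed.
    server.close(() => {
      clearTimeout(timer);
      proc.exit(0);
    });
  });
}

const server = http.createServer((req, res) => res.end("ok"));
installGracefulShutdown(server);
// server.listen(3000); // uncomment to actually start serving
```

Keeping `forceExitAfterMs` below terminationGracePeriodSeconds means the app decides how it exits, instead of being cut off by SIGKILL in the middle of cleanup.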


The conclusion:

The article's goal was to explain what happens to a pod during a deployment, but the concept applies any time a pod is scheduled to be terminated. The pod can be terminated for a variety of reasons like - liveness/startup probe failure, node failure, etc.

Also, I have assumed a nodeJS based pod in the article but the concepts apply to all languages.

I hope this article was interesting and explained the concepts well. In case something was missed or incorrectly explained, I will be happy to correct it.


Resources:

  1. https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/

  2. https://nodejs.org/docs/latest/api/process.html#signal-events

