Scheduling
==========
Scheduling, in Kubernetes, is the process responsible for placing a new
pod on the best node possible, based on several criteria.
.. Note::
   Please refer to the
   `Kubernetes documentation <https://kubernetes.io/docs/concepts/scheduling-eviction/>`_
   for more information on scheduling, including all the available
   policies. On this page we assume you are familiar with concepts like
   affinity, anti-affinity, node selectors, and so on.

You can control how the CloudNativePG cluster’s instances are
scheduled through the :ref:`AffinityConfiguration `
section in the definition of the cluster, which supports the following
(combined in the sketch after this list):

- pod affinity/anti-affinity
- node selectors
- tolerations
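
For orientation, here is a minimal sketch of a ``Cluster`` that combines
the three mechanisms. It reuses the ``node-role.kubernetes.io/postgres``
label and taint discussed later on this page; each field is explained in
the sections that follow.

.. code:: yaml

   apiVersion: postgresql.cnpg.io/v1
   kind: Cluster
   metadata:
     name: cluster-example
   spec:
     instances: 3
     storage:
       size: 1Gi
     affinity:
       # Pod anti-affinity: spread instances across different nodes
       enablePodAntiAffinity: true
       topologyKey: kubernetes.io/hostname
       # Node selector: run only on nodes carrying this label
       nodeSelector:
         node-role.kubernetes.io/postgres: ""
       # Tolerations: allow scheduling on nodes tainted for PostgreSQL
       tolerations:
       - key: node-role.kubernetes.io/postgres
         operator: Exists
         effect: NoSchedule
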
Pod Affinity and Anti-Affinity
------------------------------
Kubernetes provides mechanisms to control where pods are scheduled using
*affinity* and *anti-affinity* rules. These rules allow you to specify
whether a pod should be scheduled on particular nodes (*affinity*) or
avoided on specific nodes (*anti-affinity*) based on the workloads
already running there. This capability is technically referred to as
**inter-pod affinity/anti-affinity**.
By default, CloudNativePG configures cluster instances to preferably be
scheduled on different nodes, while ``pgBouncer`` instances might still
run on the same nodes.
For example, given the following ``Cluster`` specification:

.. code:: yaml

   apiVersion: postgresql.cnpg.io/v1
   kind: Cluster
   metadata:
     name: cluster-example
   spec:
     instances: 3
     imageName: ghcr.io/cloudnative-pg/postgresql:18.1-system-trixie
     affinity:
       enablePodAntiAffinity: true # Default value
       topologyKey: kubernetes.io/hostname # Default value
       podAntiAffinityType: preferred # Default value
     storage:
       size: 1Gi

The ``affinity`` configuration applied to the instance pods will be:

.. code:: yaml

   affinity:
     podAntiAffinity:
       preferredDuringSchedulingIgnoredDuringExecution:
       - podAffinityTerm:
           labelSelector:
             matchExpressions:
             - key: cnpg.io/cluster
               operator: In
               values:
               - cluster-example
             - key: cnpg.io/podRole
               operator: In
               values:
               - instance
           topologyKey: kubernetes.io/hostname
         weight: 100

With this setup, Kubernetes will *prefer* to schedule a 3-node
PostgreSQL cluster across three different nodes, assuming sufficient
resources are available.
Requiring Pod Anti-Affinity
^^^^^^^^^^^^^^^^^^^^^^^^^^^
You can modify the default behavior by adjusting the settings mentioned
above.
For example, setting ``podAntiAffinityType`` to ``required`` will
enforce ``requiredDuringSchedulingIgnoredDuringExecution`` instead of
``preferredDuringSchedulingIgnoredDuringExecution``.
However, be aware that this strict requirement may cause pods to remain
pending if resources are insufficient. This is particularly relevant when
using the `Cluster Autoscaler <https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler>`_
for automated horizontal scaling in a Kubernetes cluster.
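
A minimal sketch of the relevant ``affinity`` stanza, keeping the default
topology key but making anti-affinity mandatory:

.. code:: yaml

   affinity:
     enablePodAntiAffinity: true
     topologyKey: kubernetes.io/hostname
     # Instances will stay Pending unless they can land on distinct nodes
     podAntiAffinityType: required
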
.. Note::
   For more details, refer to the
   `Kubernetes documentation <https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/>`_.

Topology Considerations
^^^^^^^^^^^^^^^^^^^^^^^
In cloud environments, you might consider using
``topology.kubernetes.io/zone`` as the ``topologyKey`` to ensure pods
are distributed across different availability zones rather than just
nodes. For more options, see
`Well-Known Labels, Annotations, and Taints <https://kubernetes.io/docs/reference/labels-annotations-taints/>`_.
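
As a sketch, zone-level spreading only requires changing the topology key
with respect to the defaults shown earlier:

.. code:: yaml

   affinity:
     enablePodAntiAffinity: true
     # Spread instances across availability zones instead of nodes
     topologyKey: topology.kubernetes.io/zone
     podAntiAffinityType: preferred
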
Disabling Anti-Affinity Policies
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If needed, you can disable the operator-generated anti-affinity policies
by setting ``enablePodAntiAffinity`` to ``false``.
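
A minimal sketch of the corresponding ``affinity`` stanza:

.. code:: yaml

   affinity:
     # No operator-generated anti-affinity rules will be added to the pods
     enablePodAntiAffinity: false
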
Fine-Grained Control with Custom Rules
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
For scenarios requiring more precise control, you can specify custom pod
affinity or anti-affinity rules using the ``additionalPodAffinity`` and
``additionalPodAntiAffinity`` configuration attributes. These custom
rules will be added to those generated by the operator, if enabled, or
used directly if the operator-generated rules are disabled.
.. Note::
   When using ``additionalPodAntiAffinity`` or ``additionalPodAffinity``,
   you must provide the full ``podAntiAffinity`` or ``podAffinity``
   structure expected by the Pod specification.

The following YAML example demonstrates how to ensure that at most one
PostgreSQL instance runs on each worker node, regardless of which
PostgreSQL cluster it belongs to:

.. code:: yaml

   additionalPodAntiAffinity:
     requiredDuringSchedulingIgnoredDuringExecution:
     - labelSelector:
         matchExpressions:
         - key: postgresql
           operator: Exists
           values: []
       topologyKey: "kubernetes.io/hostname"
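
Conversely, ``additionalPodAffinity`` can be used to attract instances to
nodes that already run related workloads. The following sketch is
illustrative only: the ``app.kubernetes.io/name: my-app`` label is a
hypothetical selector for an application you may want to co-locate with
the PostgreSQL instances.

.. code:: yaml

   additionalPodAffinity:
     preferredDuringSchedulingIgnoredDuringExecution:
     - weight: 50
       podAffinityTerm:
         labelSelector:
           matchExpressions:
           - key: app.kubernetes.io/name
             operator: In
             values:
             - my-app  # hypothetical application label
         topologyKey: "kubernetes.io/hostname"
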
Node selection through ``nodeSelector``
---------------------------------------
Kubernetes allows you to use ``nodeSelector`` to provide a set of labels
(key-value pairs) that constrain the nodes on which a pod can run.
Specifically, a node must carry every indicated key-value pair as a label
for the pod to be scheduled and run on it.
Similarly, CloudNativePG lets you define a ``nodeSelector`` in
the ``affinity`` section, so that you can request a PostgreSQL cluster
to run only on nodes that have those labels, as sketched below.
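
A minimal sketch, using the same ``node-role.kubernetes.io/postgres``
label adopted later on this page:

.. code:: yaml

   affinity:
     # Run instances only on nodes carrying this label
     nodeSelector:
       node-role.kubernetes.io/postgres: ""
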
Tolerations
-----------
Kubernetes allows you to specify, through ``taints``, that a node should
repel all pods that do not explicitly tolerate those taints through
matching ``tolerations``.

By setting a proper set of ``tolerations`` for a workload that match a
specific node’s ``taints``, you allow the Kubernetes scheduler to take
the tainted node into consideration when deciding where to schedule the
workload. Tolerations can be configured for all the pods of a ``Cluster``
through the ``.spec.affinity.tolerations`` section, which accepts the
usual Kubernetes syntax for tolerations, as shown in the sketch below.
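
For example, a minimal sketch tolerating the
``node-role.kubernetes.io/postgres`` taint used later on this page:

.. code:: yaml

   affinity:
     tolerations:
     # Allow Cluster pods onto nodes tainted with
     # node-role.kubernetes.io/postgres:NoSchedule
     - key: node-role.kubernetes.io/postgres
       operator: Exists
       effect: NoSchedule
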
.. Note::
   More information on taints and tolerations can be found in the
   `Kubernetes documentation <https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/>`_.

Isolating PostgreSQL workloads
------------------------------
.. Note::
   Before proceeding, please ensure you have read the
   :ref:`Architecture ` section of the documentation.

While you can deploy PostgreSQL on Kubernetes in various ways, we
recommend following these essential principles for production
environments:

- **Exploit Availability Zones:** If possible, take advantage of
  availability zones (AZs) within the same Kubernetes cluster by
  distributing PostgreSQL instances across different AZs.
- **Dedicate Worker Nodes:** Allocate specific worker nodes for
  PostgreSQL workloads through the ``node-role.kubernetes.io/postgres``
  label and taint, as detailed in the
  :ref:`Reserving nodes for PostgreSQL workloads `
  section.
- **Avoid Node Overlap:** Ensure that no instances from the same
  PostgreSQL cluster are running on the same node.

As explained in greater detail in the previous sections, CloudNativePG
provides the flexibility to configure pod anti-affinity, node selectors,
and tolerations.
Below is a sample configuration to ensure that a PostgreSQL ``Cluster``
is deployed on ``postgres`` nodes, with its instances distributed across
different nodes:

.. code:: yaml

   #
   affinity:
     enablePodAntiAffinity: true
     topologyKey: kubernetes.io/hostname
     podAntiAffinityType: required
     nodeSelector:
       node-role.kubernetes.io/postgres: ""
     tolerations:
     - key: node-role.kubernetes.io/postgres
       operator: Exists
       effect: NoSchedule
   #

Despite its simplicity, this setup ensures optimal distribution and
isolation of PostgreSQL workloads, leading to enhanced performance and
reliability in your production environment.