docs/sections/guides/development_guides/oom_dev_container_orchestration.rst

.. This work is licensed under a Creative Commons Attribution 4.0
.. International License.
.. http://creativecommons.org/licenses/by/4.0
.. Copyright 2018-2020 Amdocs, Bell Canada, Orange, Samsung
.. Modification copyright (C) 2022 Nordix Foundation

.. Links
.. _Kubernetes: https://Kubernetes.io/
.. _AWS Elastic Block Store: https://aws.amazon.com/ebs/
.. _Azure File: https://docs.microsoft.com/en-us/azure/storage/files/storage-files-introduction
.. _GCE Persistent Disk: https://cloud.google.com/compute/docs/disks/
.. _Gluster FS: https://www.gluster.org/
.. _Kubernetes Storage Class: https://Kubernetes.io/docs/concepts/storage/storage-classes/
.. _Assigning Pods to Nodes: https://Kubernetes.io/docs/concepts/configuration/assign-pod-node/


.. _oom_dev_container_orch:

Kubernetes Container Orchestration
##################################

The ONAP components are managed by the Kubernetes_ container management system,
which maintains the desired state of the container system as described by one
or more deployment descriptors, similar in concept to OpenStack Heat
Orchestration Templates. The following sections describe the fundamental
objects managed by Kubernetes, the network these components use to communicate
with each other and with entities outside of ONAP, and the templates that
describe the configuration and desired state of the ONAP components.

**Namespaces**

ONAP is deployed into Kubernetes namespaces, which isolate groups of resources
within a single cluster. Within these namespaces, Kubernetes services provide
connectivity to the pods that host the Docker containers.
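
As a minimal sketch, a namespace for the ONAP objects can be declared with the
manifest below. The name ``onap`` is an assumption made for illustration. The
namespace actually used is chosen when ONAP is deployed.

.. code-block:: yaml

  # Minimal Namespace manifest; "onap" is an assumed example name.
  apiVersion: v1
  kind: Namespace
  metadata:
    name: onap

Services, pods, and persistent volume claims are then placed in this namespace
by setting ``metadata.namespace: onap`` in their own manifests.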

ONAP Components to Kubernetes Object Relationships
--------------------------------------------------
Kubernetes deployments consist of multiple objects:

- **nodes** - a worker machine, either physical or virtual, that hosts
  multiple containers managed by Kubernetes.
- **services** - an abstraction of a logical set of pods that provide a
  micro-service.
- **pods** - one or more (but typically one) container(s) that provide specific
  application functionality.
- **persistent volumes** - one or more permanent volumes that must be
  established to hold non-ephemeral configuration and state data.

The relationship between these objects is shown in the following figure:

.. .. uml::
..
..   @startuml
..   node PH {
..      component Service {
..         component Pod0
..         component Pod1
..      }
..   }
..
..   database PV
..   @enduml

.. figure:: ../../resources/images/k8s/kubernetes_objects.png

OOM uses these Kubernetes objects as described in the following sections.

Nodes
~~~~~
OOM works with both physical and virtual worker machines.

* Virtual Machine Deployments - If ONAP is to be deployed onto a set of virtual
  machines, the creation of the VMs is outside of the scope of OOM and could be
  done in many ways, such as

  * manually, for example by a user using the OpenStack Horizon dashboard or
    AWS EC2, or
  * automatically, for example with the use of an OpenStack Heat Orchestration
    Template which builds an ONAP stack, an Azure ARM template, or an AWS
    CloudFormation Template, or
  * orchestrated, for example with Cloudify creating the VMs from a TOSCA
    template and controlling their life cycle for the life of the ONAP
    deployment.

* Physical Machine Deployments - If ONAP is to be deployed onto physical
  machines, there are several options, but the recommendation is to use
  Rancher along with Helm to associate hosts with a Kubernetes cluster.

Pods
~~~~
Containers that share storage and networking can be grouped together into a
Kubernetes pod. All of the containers within a pod are co-located and
co-scheduled, so they operate as a single unit. In the ONAP Amsterdam release,
pods are mapped one-to-one to Docker containers, although this may change in
the future. As explained in the Services section below, the use of pods within
each ONAP component is abstracted from other ONAP components.
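
The one-to-one mapping between pods and containers can be sketched as a pod
manifest with a single container. The pod name, the ``app`` label, the
``onap`` namespace, and the ``nginx`` image below are stand-ins chosen for
illustration. They are not values taken from an ONAP chart.

.. code-block:: yaml

  # Single-container pod; all names and the image are illustrative stand-ins.
  apiVersion: v1
  kind: Pod
  metadata:
    name: example-component
    namespace: onap
    labels:
      app: example-component   # label a service can use to select this pod
  spec:
    containers:
    - name: example-component
      image: nginx:1.25
      ports:
      - containerPort: 80      # port the nginx stand-in listens on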

Services
~~~~~~~~
OOM uses the Kubernetes service abstraction to provide a consistent access
point for each of the ONAP components, independent of the pod or container
architecture of that component. For example, the SDNC component may introduce
OpenDaylight clustering at some point and change the number of pods in this
component to three or more, but this change will be isolated from the other
ONAP components by the service abstraction. A service can include a load
balancer on its ingress to distribute traffic between the pods and even react
to dynamic changes in the number of pods if they are part of a replica set.
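
A minimal sketch of such a service follows. Clients address the stable service
name and port, while traffic is spread across whichever pods currently carry
the matching label. Scaling a component from one pod to three therefore does
not change what other components see. The names, label, namespace, and port
numbers are assumptions for illustration.

.. code-block:: yaml

  # ClusterIP service selecting pods by label; all values are illustrative.
  apiVersion: v1
  kind: Service
  metadata:
    name: example-component
    namespace: onap
  spec:
    type: ClusterIP
    selector:
      app: example-component   # every pod with this label becomes an endpoint
    ports:
    - name: http
      port: 8080               # port other components connect to
      targetPort: 80           # container port the traffic is forwarded to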

Persistent Volumes
~~~~~~~~~~~~~~~~~~
To enable ONAP to be deployed into a wide variety of cloud infrastructures, a
flexible persistent storage architecture, built on Kubernetes persistent
volumes, provides the ability to define the physical storage in a central
location and have all ONAP components securely store their data.

When deploying ONAP into a public cloud, available storage services such as
`AWS Elastic Block Store`_, `Azure File`_, or `GCE Persistent Disk`_ are
options. Alternatively, when deploying into a private cloud, the storage
architecture might consist of Fibre Channel, `Gluster FS`_, or iSCSI. Many
other storage options exist; refer to the `Kubernetes Storage Class`_
documentation for a full list of the options. The storage architecture may vary
from deployment to deployment, but in all cases a reliable, redundant storage
system must be provided to ONAP, in which the state information of all ONAP
components will be securely stored. The Storage Class for a given deployment is
a single parameter listed in the ONAP ``values.yaml`` file and is therefore
easily customized. Operation of this storage system is outside the scope of OOM.

.. code-block:: yaml

  Insert values.yaml code block with storage block here

Once the storage class is selected and the physical storage is provided, the
ONAP deployment step creates a pool of persistent volumes within the given
physical storage that is used by all of the ONAP components. ONAP components
simply make a claim on these persistent volumes (PV), with a persistent volume
claim (PVC), to gain access to their storage.
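
As a sketch of that pool, a storage class and one of the persistent volumes
created in it could look like the following. The class name, volume name,
size, and NFS server are assumptions for illustration. A real deployment would
use the provisioner and volume type that match its own physical storage (for
example Ceph or a cloud disk service).

.. code-block:: yaml

  # Hypothetical storage class for statically created volumes
  # ("no-provisioner" means the volumes are created ahead of time).
  apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    name: onap-storage
  provisioner: kubernetes.io/no-provisioner
  ---
  # One member of the pre-created volume pool; server and path are placeholders.
  apiVersion: v1
  kind: PersistentVolume
  metadata:
    name: example-pv0
  spec:
    storageClassName: onap-storage
    capacity:
      storage: 2Gi
    accessModes:
    - ReadWriteOnce
    persistentVolumeReclaimPolicy: Retain   # keep the data when the claim is released
    nfs:
      server: nfs.example.com
      path: /exports/onap/pv0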

The following figure illustrates the relationships between the persistent
volume claims, the persistent volumes, the storage class, and the physical
storage.

.. graphviz::

   digraph PV {
      label = "Persistent Volume Claim to Physical Storage Mapping"
      {
         node [shape=cylinder]
         D0 [label="Drive0"]
         D1 [label="Drive1"]
         Dx [label="Drivex"]
      }
      {
         node [shape=Mrecord label="StorageClass:ceph"]
         sc
      }
      {
         node [shape=point]
         p0 p1 p2
         p3 p4 p5
      }
      subgraph clusterSDC {
         label="SDC"
         PVC0
         PVC1
      }
      subgraph clusterSDNC {
         label="SDNC"
         PVC2
      }
      subgraph clusterSO {
         label="SO"
         PVCn
      }
      PV0 -> sc
      PV1 -> sc
      PV2 -> sc
      PVn -> sc

      sc -> {D0 D1 Dx}
      PVC0 -> PV0
      PVC1 -> PV1
      PVC2 -> PV2
      PVCn -> PVn

      # force all of these nodes to the same line in the given order
      subgraph {
         rank = same; PV0;PV1;PV2;PVn;p0;p1;p2
         PV0->PV1->PV2->p0->p1->p2->PVn [style=invis]
      }

      subgraph {
         rank = same; D0;D1;Dx;p3;p4;p5
         D0->D1->p3->p4->p5->Dx [style=invis]
      }

   }

In order for an ONAP component to use a persistent volume, it must make a claim
against a specific persistent volume defined in the ONAP common charts. Note
that there is a one-to-one relationship between a PVC and a PV. The following is
an excerpt from a component chart that defines a PVC:

.. code-block:: yaml

  Insert PVC example here
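
The general shape of such a claim is sketched below. The claim name,
namespace, size, and storage class name are assumptions for illustration and
are not copied from an ONAP chart.

.. code-block:: yaml

  # Hypothetical claim: Kubernetes binds it to one available persistent volume
  # of the same storage class whose access modes and capacity satisfy it.
  apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    name: example-component-data
    namespace: onap
  spec:
    storageClassName: onap-storage
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 2Gi

A pod then mounts the claimed storage by listing it under ``spec.volumes`` with
``persistentVolumeClaim.claimName: example-component-data``.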

OOM Networking with Kubernetes
------------------------------

- DNS
- Ports - Flattening the containers also exposes port conflicts between the
  containers, which need to be resolved.


Pod Placement Rules
-------------------
OOM will use the rich set of Kubernetes node and pod affinity /
anti-affinity rules to minimize the chance of a single failure resulting in a
loss of ONAP service. Node affinity / anti-affinity is used to guide the
Kubernetes orchestrator in the placement of pods on nodes (physical or virtual
machines). For example:

- if a container uses Intel DPDK technology, the pod may state that it has
  affinity to an Intel processor based node, or
- geographical based node labels (such as the Kubernetes standard zone or
  region labels) may be used to ensure placement of a DCAE complex close to the
  VNFs generating high volumes of traffic, thus minimizing networking cost.
  Specifically, if nodes were pre-assigned the region labels East and West, the
  pod deployment spec to distribute pods to these nodes would be:

  .. code-block:: yaml

    # Helm fills in the region value from the location key in values.yaml.
    nodeSelector:
      topology.kubernetes.io/region: {{ .Values.location }}

  Here ``location: West`` is specified in the ``values.yaml`` file used to
  deploy one DCAE cluster and ``location: East`` is specified in a second
  ``values.yaml`` file (see OOM Configuration Management for more information
  about configuration files like the ``values.yaml`` file).
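
As an illustration, each of the two values files only needs to set the
``location`` key. The file names below are hypothetical.

.. code-block:: yaml

  # values-west.yaml (hypothetical file name) for the western DCAE cluster
  location: West
  ---
  # values-east.yaml (hypothetical file name) for the eastern DCAE cluster
  location: East

Installing the same chart twice, once with ``-f values-west.yaml`` and once
with ``-f values-east.yaml``, renders ``{{ .Values.location }}`` as ``West``
and ``East`` respectively. Each DCAE cluster is therefore restricted to nodes
whose region label has the matching value.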

Node affinity can also be used to achieve geographic redundancy if pods are
assigned to multiple failure domains. For more information refer to `Assigning
Pods to Nodes`_.
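
A sketch of this idea follows. Node affinity limits a two-replica deployment to
nodes labelled with the East or West region. A required pod anti-affinity rule
keeps the two replicas out of the same region, so the loss of one region
leaves one replica running. The deployment name, labels, region values, and
stand-in image are illustrative assumptions.

.. code-block:: yaml

  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: example-collector
  spec:
    replicas: 2
    selector:
      matchLabels:
        app: example-collector
    template:
      metadata:
        labels:
          app: example-collector
      spec:
        affinity:
          # Hard rule: only nodes labelled with the East or West region qualify.
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: topology.kubernetes.io/region
                  operator: In
                  values:
                  - East
                  - West
          # Hard rule: never place two replicas in the same region.
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: example-collector
              topologyKey: topology.kubernetes.io/region
        containers:
        - name: example-collector
          image: nginx:1.25

With this hard anti-affinity rule, a third replica would stay pending because
only two regions qualify. A preferred rule avoids that at the cost of weaker
separation.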

.. note::
   One could use Pod to Node assignment to totally constrain Kubernetes when
   doing initial container assignment to replicate the Amsterdam release
   OpenStack Heat based deployment. Should one wish to do this, each VM would
   need a unique node name which would be used to specify a node constraint
   for every component. These assignments could be specified in an environment
   specific values.yaml file. Constraining Kubernetes in this way is not
   recommended.

Kubernetes has a comprehensive system called Taints and Tolerations that can be
used to force the container orchestrator to repel pods from nodes based on
static events (an administrator assigning a taint to a node) or dynamic events
(such as a node becoming unreachable or running out of disk space). There are
no plans to use taints or tolerations in the ONAP Beijing release.

Pod affinity / anti-affinity is the concept of creating a spatial relationship
between pods when the Kubernetes orchestrator assigns them to nodes (both
initially and in operation), as explained in the Inter-pod affinity and
anti-affinity section of `Assigning Pods to Nodes`_. For example, one might
choose to co-locate all of the ONAP SDC containers on a single node as they are
not critical runtime components and co-location minimizes overhead. On the
other hand, one might choose to ensure that all of the containers in an ODL
cluster (SDNC and APPC) are placed on separate nodes such that a node failure
has minimal impact on the operation of the cluster. An example of pod
affinity / anti-affinity rules is shown below:

**Pod Affinity / Anti-Affinity**

.. code-block:: yaml

  apiVersion: v1
  kind: Pod
  metadata:
    name: with-pod-affinity
  spec:
    affinity:
      # Hard rule: schedule only into a zone that already runs a pod
      # labelled security=S1.
      podAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values:
              - S1
          topologyKey: topology.kubernetes.io/zone
      # Soft rule: prefer nodes that do not already run a pod labelled
      # security=S2.
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchExpressions:
              - key: security
                operator: In
                values:
                - S2
            topologyKey: kubernetes.io/hostname
    containers:
    - name: with-pod-affinity
      image: registry.k8s.io/pause:2.0

This example contains both podAffinity and podAntiAffinity rules. The first
rule is mandatory (requiredDuringSchedulingIgnoredDuringExecution), while the
second is a preference that the scheduler tries to meet subject to other
considerations (preferredDuringSchedulingIgnoredDuringExecution).

**Preemption**

Another feature that may assist in achieving a repeatable deployment in the
presence of faults that have reduced the capacity of the cloud is assigning
priorities to pods, such that mission critical components have the ability to
evict less critical components. Kubernetes provides this capability with Pod
Priority and Preemption. Prior to having more advanced production grade
features available, the ability to at least re-deploy ONAP (or a subset of it)
reliably provides a level of confidence that, should an outage occur, the
system can be brought back on-line predictably.
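
A minimal sketch of this mechanism is a priority class that critical
components reference by name. The class name, its value, and the pod shown are
illustrative assumptions, not settings from the OOM charts.

.. code-block:: yaml

  # Hypothetical priority class for mission critical components.
  apiVersion: scheduling.k8s.io/v1
  kind: PriorityClass
  metadata:
    name: onap-critical
  value: 1000000
  globalDefault: false
  description: "Mission critical ONAP components; may preempt lower priority pods."
  ---
  # A pod that opts into the class by name.
  apiVersion: v1
  kind: Pod
  metadata:
    name: example-critical-component
  spec:
    priorityClassName: onap-critical
    containers:
    - name: example-critical-component
      image: nginx:1.25

If such a pod cannot be scheduled because the cluster is short of capacity, the
scheduler may evict pods with a lower priority to make room for it. Pods that
name no priority class receive priority 0 unless some class sets
``globalDefault: true``.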

Health Checks
-------------

Monitoring of ONAP components is configured in the Consul agents through JSON
files stored in Gerrit under consul-agent-config. Here is an example from the
A&AI model loader (``aai-model-loader-health.json``):

.. code-block:: json

  {
    "service": {
      "name": "A&AI Model Loader",
      "checks": [
        {
          "id": "model-loader-process",
          "name": "Model Loader Presence",
          "script": "/consul/config/scripts/model-loader-script.sh",
          "interval": "15s",
          "timeout": "1s"
        }
      ]
    }
  }

Liveness Probes
---------------

Liveness probes can check that a port is available, that a built-in health
check is reporting good health, or that the Consul health check is positive.
For example, the following liveness probe, found in the SDNC DB deployment
specification, monitors the SDNC database:

.. code-block:: yaml

  # SDNC DB liveness probe: the kubelet runs "mysqladmin ping" in the container
  # and restarts the container if the command keeps failing.
  livenessProbe:
    exec:
      command: ["mysqladmin", "ping"]
    initialDelaySeconds: 30
    periodSeconds: 10
    timeoutSeconds: 5

The ``initialDelaySeconds`` field sets how long the kubelet waits after the
container starts before running the first liveness probe. ``periodSeconds``
sets how often the probe runs, and ``timeoutSeconds`` sets how long each probe
may take before it counts as failed. Note that containers are inherently
ephemeral, so the healing action destroys failed containers and any state
information within them. To avoid a loss of state, a persistent volume should
be used to store all data that needs to be persisted over the re-creation of a
container. Persistent volumes have been created for the database components of
each of the projects, and the same technique can be used for all persistent
state information.
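
Besides the ``exec`` form, Kubernetes probes can also open a TCP connection or
send an HTTP GET request. These correspond to the port check and built-in
health check cases listed at the start of this section. The container spec
fragment below is a sketch, and the port number and the ``/healthcheck`` path
are assumptions. It also adds a readiness probe. While a readiness probe
fails, the pod is removed from its service's endpoints instead of having its
container restarted.

.. code-block:: yaml

  # Liveness: passes if a TCP connection to port 8080 can be opened.
  livenessProbe:
    tcpSocket:
      port: 8080
    initialDelaySeconds: 30
    periodSeconds: 10
  # Readiness: passes if GET /healthcheck returns a status from 200 to 399.
  readinessProbe:
    httpGet:
      path: /healthcheck
      port: 8080
    periodSeconds: 10
    failureThreshold: 3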