docs/sections/guides/user_guides/oom_user_guide.rst

   1 .. This work is licensed under a Creative Commons Attribution 4.0
   2 .. International License.
   3 .. http://creativecommons.org/licenses/by/4.0
   4 .. Copyright (C) 2022 Nordix Foundation
   5
   6 .. Links
   7 .. _Curated applications for Kubernetes: https://github.com/kubernetes/charts
   8 .. _Services: https://kubernetes.io/docs/concepts/services-networking/service/
   9 .. _ReplicaSet: https://kubernetes.io/docs/concepts/workloads/controllers/replicaset/
  10 .. _StatefulSet: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/
  11 .. _Helm Documentation: https://docs.helm.sh/helm/
  12 .. _Helm: https://docs.helm.sh/
  13 .. _Kubernetes: https://Kubernetes.io/
  14 .. _Kubernetes LoadBalancer: https://kubernetes.io/docs/concepts/services-networking/service/#loadbalancer
  15 .. _user-guide-label:
  16
  17
  18 .. _oom_user_guide:
  19
  20
  21 OOM User Guide
  22 ##############
  23
  24 .. warning::
  25
  26     **THIS PAGE NEEDS TO BE REWRITTEN, AS SOME INFORMATION IS NO LONGER RELEVANT**
  27
  28 The ONAP Operations Manager (OOM) provides the ability to manage the entire
  29 life-cycle of an ONAP installation, from the initial deployment to final
  30 decommissioning. This guide provides instructions for users of ONAP to
  31 use the Kubernetes_/Helm_ system as a complete ONAP management system.
  32
  33 This guide provides many examples of Helm command line operations.  For a
  34 complete description of these commands please refer to the `Helm
  35 Documentation`_.
  36
  37 .. figure:: ../../resources/images/oom_logo/oomLogoV2-medium.png
  38    :align: right
  39
  40 The following sections describe the life-cycle operations:
  41
  42 - Deploy_ - with built-in component dependency management
  43 - Configure_ - unified configuration across all ONAP components
  44 - Monitor_ - real-time health monitoring feeding to a Consul UI and Kubernetes
  45 - Heal_ - failed ONAP containers are recreated automatically
  46 - Scale_ - cluster ONAP services to enable seamless scaling
  47 - Upgrade_ - change-out containers or configuration with little or no service impact
  48 - Delete_ - cleanup individual containers or entire deployments
  49
  50 .. figure:: ../../resources/images/oom_logo/oomLogoV2-Deploy.png
  51    :align: right
  52
  53 Deploy
  54 ======
  55
  56 The OOM team, with assistance from the ONAP project teams, has built a
  57 comprehensive set of Helm charts (YAML files very similar to TOSCA files) that
  58 describe the composition of each of the ONAP components and the relationships
  59 within and between components. Using this model, Helm is able to deploy all of
  60 ONAP with a few simple commands.
  61
  62 Please refer to the :ref:`oom_deploy_guide` for deployment prerequisites and options.
  63
  64 .. note::
  65   Refer to the :ref:`oom_customize_overrides` section on how to update overrides.yaml and values.yaml.
  66
  67 .. figure:: ../../resources/images/oom_logo/oomLogoV2-Configure.png
  68    :align: right
  69
  70 Configure
  71 =========
  72
  73 Each project within ONAP has its own configuration data, generally consisting
  74 of environment variables, configuration files, and database initial values.
  75 Many technologies are used across the projects, resulting in significant
  76 operational complexity and an inability to apply global parameters across the
  77 entire ONAP deployment. OOM solves this problem by introducing a common
  78 configuration technology, Helm charts, that provides a hierarchical
  79 configuration with the ability to override values with higher-level
  80 charts or command line options.
  81
  82 The structure of the configuration of ONAP is shown in the following diagram.
  83 Note that key/value pairs of a parent will always take precedence over those
  84 of a child. Also note that values set on the command line have the highest
  85 precedence of all.
  86
  87 .. graphviz::
  88
  89    digraph config {
  90       {
  91          node     [shape=folder]
  92          oValues  [label="values.yaml"]
  93          demo     [label="onap-demo.yaml"]
  94          prod     [label="onap-production.yaml"]
  95          oReq     [label="Chart.yaml"]
  96          soValues [label="values.yaml"]
  97          soReq    [label="Chart.yaml"]
  98          mdValues [label="values.yaml"]
  99       }
 100       {
 101          oResources  [label="resources"]
 102       }
 103       onap -> oResources
 104       onap -> oValues
 105       oResources -> environments
 106       oResources -> oReq
 107       oReq -> so
 108       environments -> demo
 109       environments -> prod
 110       so -> soValues
 111       so -> soReq
 112       so -> charts
 113       charts -> mariadb
 114       mariadb -> mdValues
 115
 116    }
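
     The precedence rule can be illustrated with a minimal sketch. The file
     names follow the diagram above, but the ``replicaCount`` values are
     assumptions chosen for illustration and are not taken from the ONAP charts:

     .. code-block:: yaml

       # so/values.yaml: default defined by the child chart (assumed value)
       replicaCount: 1
       ---
       # onap/values.yaml: the parent chart overrides the child's default by
       # nesting the key under the sub-chart's name (assumed value)
       so:
         replicaCount: 2

     With these two files Helm would render `so` with two replicas. Passing
     ``--set so.replicaCount=3`` on the command line would override both.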
 117
 118 The top level onap/values.yaml file contains the values required to be set
 119 before deploying ONAP.  Here are the contents of this file:
 120
 121 .. collapse:: Default ONAP values.yaml
 122
 123     .. include:: ../../../../kubernetes/onap/values.yaml
 124        :code: yaml
 125
 126 |
 127
 128
 129 One may wish to create a value file that is specific to a given deployment such
 130 that it can be differentiated from other deployments.  For example, an
 131 onap-development.yaml file may create a minimal environment for development
 132 while onap-production.yaml might describe a production deployment that operates
 133 independently of the developer version.
 134
 135 For example, if the production OpenStack instance was different from a
 136 developer's instance, the onap-production.yaml file may contain a different
 137 value for the vnfDeployment/openstack/oam_network_cidr key as shown below.
 138
 139 .. code-block:: yaml
 140
 141   nsPrefix: onap
 142   nodePortPrefix: 302
 143   apps: consul msb mso message-router sdnc vid robot portal policy appc aai
 144     sdc dcaegen2 log cli multicloud clamp vnfsdk aaf kube2msb
 145   dataRootDir: /dockerdata-nfs
 146
 147   # docker repositories
 148   repository:
 149     onap: nexus3.onap.org:10001
 150     oom: oomk8s
 151     aai: aaionap
 152     filebeat: docker.elastic.co
 153
 154   image:
 155     pullPolicy: Never
 156
 157   # vnf deployment environment
 158   vnfDeployment:
 159     openstack:
 160       ubuntu_14_image: "Ubuntu_14.04.5_LTS"
 161       public_net_id: "e8f51956-00dd-4425-af36-045716781ffc"
 162       oam_network_id: "d4769dfb-c9e4-4f72-b3d6-1d18f4ac4ee6"
 163       oam_subnet_id: "191f7580-acf6-4c2b-8ec0-ba7d99b3bc4e"
 164       oam_network_cidr: "192.168.30.0/24"
 165   <...>
 166
 167
 168 To deploy ONAP with this environment file, enter::
 169
 170   > helm deploy local/onap -n onap -f onap/resources/environments/onap-production.yaml --set global.masterPassword=password
 171
 172
 173 .. collapse:: ONAP demo environment file
 174
 175     .. include:: ../../resources/yaml/environments_onap_demo.yaml
 176        :code: yaml
 177
 178 |
 179
 180 When deploying all of ONAP, the dependencies section of the Chart.yaml file
 181 controls which ONAP components are included and in which version.
 182 Here is an excerpt of this file:
 183
 184 .. code-block:: yaml
 185
 186   dependencies:
 187   <...>
 188     - name: so
 189       version: ~11.0.0
 190       repository: '@local'
 191       condition: so.enabled
 192   <...>
 193
 194 The ~ operator in the `so` version value indicates that the latest "11.0.X"
 195 version of `so` shall be used, thus allowing the chart to pick up patch-level
 196 upgrades that don't impact the `so` API; hence, if version 11.0.1 is available,
 197 it will be installed in this case.
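
     Helm interprets the tilde as a version range. The excerpt is repeated below
     with the range spelled out in a comment, as a reading aid rather than a
     change to the chart:

     .. code-block:: yaml

       dependencies:
         - name: so
           # "~11.0.0" is shorthand for ">=11.0.0, <11.1.0": any 11.0.x patch
           # release is accepted, but 11.1.0 and later are not.
           version: ~11.0.0
           repository: '@local'
           condition: so.enabled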
 198
 199 The onap/resources/environments/dev.yaml file (see the excerpt below) enables
 200 fine-grained control over which components are included as part of this
 201 deployment. By changing the `so` line to `enabled: false`, the `so` component
 202 will not be deployed.  If this change is part of an upgrade, the existing `so`
 203 component will be shut down. Other `so` parameters and even `so` child values
 204 can be modified; for example, the `so` `liveness` probe could be disabled
 205 (which is not recommended, as this change would disable auto-healing of `so`).
 206
 207 .. code-block:: yaml
 208
 209   #################################################################
 210   # Global configuration overrides.
 211   #
 212   # These overrides will affect all helm charts (ie. applications)
 213   # that are listed below and are 'enabled'.
 214   #################################################################
 215   global:
 216   <...>
 217
 218   #################################################################
 219   # Enable/disable and configure helm charts (ie. applications)
 220   # to customize the ONAP deployment.
 221   #################################################################
 222   aaf:
 223     enabled: false
 224   <...>
 225   so: # Service Orchestrator
 226     enabled: true
 227
 228     replicaCount: 1
 229
 230     liveness:
 231       # necessary to disable liveness probe when setting breakpoints
 232       # in debugger so K8s doesn't restart unresponsive container
 233       enabled: true
 234
 235   <...>
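
     A deployment-specific override file only needs to list the keys that differ
     from the defaults. As a minimal sketch (the file name ``my-overrides.yaml``
     is hypothetical), the following would leave `so` out of the deployment while
     keeping every other default:

     .. code-block:: yaml

       # my-overrides.yaml (hypothetical file name)
       so:
         enabled: false   # evaluated by the "so.enabled" condition in Chart.yaml

     Such a file would be passed with ``-f`` in the same way as the environment
     file in the deploy command shown earlier.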
 236
 237
 238 .. figure:: ../../resources/images/oom_logo/oomLogoV2-Monitor.png
 239    :align: right
 240
 241 Monitor
 242 =======
 243
 244 All highly available systems include at least one facility to monitor the
 245 health of components within the system.  Such health monitors are often used as
 246 inputs to distributed coordination systems (such as etcd, Zookeeper, or Consul)
 247 and monitoring systems (such as Nagios or Zabbix). OOM provides two mechanisms
 248 to monitor the real-time health of an ONAP deployment:
 249
 250 - a Consul GUI, for use by a human operator or by downstream monitoring
 251   systems, and
 252 - a set of Kubernetes liveness probes, which feed into the Kubernetes manager
 253   and enable automatic healing of failed containers; these are described in
 254   the Heal_ section.
 255
 256 Within ONAP, Consul is the monitoring system of choice and is deployed by OOM in
 257 two parts:
 258
 259 - a three-way, centralized Consul server cluster is deployed as a highly
 260   available monitor of all of the ONAP components, and
 261 - a number of Consul agents.
 262
 263 The Consul server provides a user interface that allows a user to graphically
 264 view the current health status of all of the ONAP components for which agents
 265 have been created. A sample from the ONAP Integration labs follows:
 266
 267 .. figure:: ../../resources/images/consul/consulHealth.png
 268    :align: center
 269
 270 To see the real-time health of a deployment go to: ``http://<kubernetes IP>:30270/ui/``
 271 where a GUI much like the one shown above will be found.
 272
 273 .. note::
 274   If the Consul GUI is not accessible, you can use the
 275   `kubectl port-forward <https://kubernetes.io/docs/tasks/access-application-cluster/port-forward-access-application-cluster/>`_ method to access it.
 276
 277 .. figure:: ../../resources/images/oom_logo/oomLogoV2-Heal.png
 278    :align: right
 279
 280 Heal
 281 ====
 282
 283 The ONAP deployment is defined by Helm charts as mentioned earlier.  These Helm
 284 charts are also used to implement automatic recoverability of ONAP components
 285 when individual components fail. Once ONAP is deployed, a "liveness" probe
 286 starts checking the health of the components after a specified startup time.
 287
 288 Should a liveness probe indicate a failed container it will be terminated and a
 289 replacement will be started in its place - containers are ephemeral. Should the
 290 deployment specification indicate that there are one or more dependencies to
 291 this container or component (for example a dependency on a database) the
 292 dependency will be satisfied before the replacement container/component is
 293 started. This mechanism ensures that, after a failure, all of the ONAP
 294 components restart successfully.
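
     As a rough illustration of what such a probe looks like in a pod template,
     consider the fragment below. The container name, image, port, and timings
     are assumptions made for this sketch and do not come from a specific ONAP
     chart:

     .. code-block:: yaml

       # Fragment of a pod template; all names and values are assumed.
       containers:
         - name: example-component
           image: example/image:1.0.0
           livenessProbe:
             tcpSocket:
               port: 8080
             initialDelaySeconds: 120  # startup time before the first check
             periodSeconds: 10         # interval between subsequent checks
             failureThreshold: 3       # consecutive failures before a restart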
 295
 296 To test healing, the following command can be used to delete a pod::
 297
 298   > kubectl delete pod [pod name] -n [pod namespace]
 299
 300 One could then use the following command to monitor the pods and observe the
 301 pod being terminated and the service being automatically healed with the
 302 creation of a replacement pod::
 303
 304   > kubectl get pods --all-namespaces -o=wide
 305
 306 .. figure:: ../../resources/images/oom_logo/oomLogoV2-Scale.png
 307    :align: right
 308
 309 Scale
 310 =====
 311
 312 Many of the ONAP components are horizontally scalable, which allows them to
 313 adapt to expected offered load.  In the Beijing release, scaling is static:
 314 during deployment or upgrade a cluster size is defined, and this cluster size
 315 will be maintained even in the presence of faults. The parameter that controls
 316 the cluster size of a given component is found in the values.yaml file for that
 317 component.  Here is an excerpt that shows this parameter:
 318
 319 .. code-block:: yaml
 320
 321   # default number of instances
 322   replicaCount: 1
 323
 324 In order to change the size of a cluster, an operator could use a helm upgrade
 325 (described in detail in the next section) as follows::
 326
 327    > helm upgrade [RELEASE] [CHART] [flags]
 328
 329 The RELEASE argument can be obtained from the following command::
 330
 331    > helm list
 332
 333 For example::
 334
 335   > helm list
 336     NAME                    REVISION        UPDATED                         STATUS          CHART                   APP VERSION     NAMESPACE
 337     dev                     1               Wed Oct 14 13:49:52 2020        DEPLOYED        onap-11.0.0             Kohn          onap
 338     dev-cassandra           5               Thu Oct 15 14:45:34 2020        DEPLOYED        cassandra-11.0.0                         onap
 339     dev-contrib             1               Wed Oct 14 13:52:53 2020        DEPLOYED        contrib-11.0.0                           onap
 340     dev-mariadb-galera      1               Wed Oct 14 13:55:56 2020        DEPLOYED        mariadb-galera-11.0.0                    onap
 341
 342 Here the NAME column shows the release name. In our case we want to try the
 343 scale operation on cassandra, so the release name is dev-cassandra.
 344
 345 Now we need to obtain the chart name for cassandra. Use the following
 346 command to get the chart name::
 347
 348   > helm search cassandra
 349
 350 For example::
 351
 352   > helm search cassandra
 353     NAME                    CHART VERSION   APP VERSION     DESCRIPTION
 354     local/cassandra         11.0.0                          ONAP cassandra
 355     local/portal-cassandra  11.0.0                          Portal cassandra
 356     local/aaf-cass          11.0.0                          ONAP AAF cassandra
 357     local/sdc-cs            11.0.0                          ONAP Service Design and Creation Cassandra
 358
 359 Here the NAME column shows the chart name. As we want to try the scale
 360 operation on cassandra, the corresponding chart name is local/cassandra.
 361
 362
 363 Now that we have both of the command's arguments, we can perform the
 364 scale operation for cassandra as follows::
 365
 366   > helm upgrade dev-cassandra local/cassandra --set replicaCount=3
 367
 368 Using this command we can scale up or scale down the cassandra db instances.
 369
 370
 371 The ONAP components use Kubernetes provided facilities to build clustered,
 372 highly available systems including: Services_ with load-balancers, ReplicaSet_,
 373 and StatefulSet_.  Some of the open-source projects used by the ONAP components
 374 directly support clustered configurations, for example ODL and MariaDB Galera.
 375
 376 The Kubernetes Services_ abstraction provides a consistent access point for
 377 each of the ONAP components, independent of the pod or container architecture
 378 of that component.  For example, SDN-C uses OpenDaylight clustering with a
 379 default cluster size of three but uses a Kubernetes service to abstract this
 380 cluster from the other ONAP components. The number of pods in the cluster
 381 can therefore change, and this change is isolated from the other ONAP
 382 components by the load-balancer implemented in the ODL service
 383 abstraction.
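
     A minimal sketch of such a service definition is shown below. The service
     name, label, and port numbers are assumptions for illustration only; the
     actual SDN-C chart may differ:

     .. code-block:: yaml

       apiVersion: v1
       kind: Service
       metadata:
         name: sdnc              # assumed service name
       spec:
         selector:
           app: sdnc             # assumed pod label; matches every cluster member
         ports:
           - port: 8282          # assumed port exposed to other components
             targetPort: 8181    # assumed port the pods listen on

     Clients address the service name only, so pods matching the selector can be
     added or removed without any change on the client side.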
 384
 385 A ReplicaSet_ is a construct that is used to describe the desired state of the
 386 cluster.  For example 'replicas: 3' indicates to Kubernetes that a cluster of 3
 387 instances is the desired state.  Should one of the members of the cluster fail,
 388 a new member will be automatically started to replace it.
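
     For reference, a minimal ReplicaSet manifest expressing this desired state
     could look as follows. The names, labels, and image are placeholders assumed
     for this sketch:

     .. code-block:: yaml

       apiVersion: apps/v1
       kind: ReplicaSet
       metadata:
         name: example
       spec:
         replicas: 3               # desired number of cluster members
         selector:
           matchLabels:
             app: example
         template:
           metadata:
             labels:
               app: example        # must match the selector above
           spec:
             containers:
               - name: example
                 image: example/image:1.0.0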
 389
 390 Some of the ONAP components may need a more deterministic deployment, for
 391 example to enable intra-cluster communication. For these applications the
 392 component can be deployed as a Kubernetes StatefulSet_, which will maintain a
 393 persistent identifier for the pods and thus a stable network id for the pods.
 394 For example, the pod names might be web-0, web-1, ..., web-{N-1} for N 'web'
 395 pods, with corresponding DNS entries such that intra-service communication is
 396 simple even if the pods are physically distributed across multiple nodes. An
 397 example of how these capabilities can be used is described in the Running
 398 Consul on Kubernetes tutorial.
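
     The 'web' example can be sketched as a headless Service paired with a
     StatefulSet. The image name and port are assumptions; only the ``web``
     naming comes from the text above:

     .. code-block:: yaml

       apiVersion: v1
       kind: Service
       metadata:
         name: web
       spec:
         clusterIP: None           # headless: gives each pod its own DNS entry
         selector:
           app: web
         ports:
           - port: 80
       ---
       apiVersion: apps/v1
       kind: StatefulSet
       metadata:
         name: web
       spec:
         serviceName: web          # ties pod DNS names to the Service above
         replicas: 3               # pods are named web-0, web-1, web-2
         selector:
           matchLabels:
             app: web
         template:
           metadata:
             labels:
               app: web
           spec:
             containers:
               - name: web
                 image: example/web:1.0.0   # placeholder image
                 ports:
                   - containerPort: 80

     With this layout each pod would be reachable under a stable name of the
     form ``web-0.web.<namespace>.svc.cluster.local``.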
 399
 400 .. figure:: ../../resources/images/oom_logo/oomLogoV2-Upgrade.png
 401    :align: right
 402
 403 Upgrade
 404 =======
 405
 406 Helm has built-in capabilities to enable the upgrade of pods without causing a
 407 loss of the service being provided by that pod or pods (if configured as a
 408 cluster).  As described in the OOM Developer's Guide, ONAP components provide
 409 an abstracted 'service' end point with the pods or containers providing this
 410 service hidden from other ONAP components by a load balancer. This capability
 411 is used during upgrades to allow a pod with a new image to be added to the
 412 service before removing the pod with the old image. This 'make before break'
 413 capability ensures minimal downtime.
 414
 415 Prior to doing an upgrade, determine the status of the deployed charts::
 416
 417   > helm list
 418   NAME REVISION UPDATED                  STATUS    CHART     NAMESPACE
 419   so   1        Mon Feb 5 10:05:22 2020  DEPLOYED  so-11.0.0 onap
 420
 421 When upgrading a cluster, one parameter controls the minimum size of the cluster
 422 during the upgrade while another parameter controls the maximum number of pods
 423 in the cluster.  For example, SDNC configured as a 3-way ODL cluster might
 424 require that during the upgrade no fewer than 2 pods are available at all times
 425 to provide service while no more than 5 pods are ever deployed across the two
 426 versions at any one time to avoid depleting the cluster of resources. In this
 427 scenario, the SDNC cluster would start with 3 old pods; then Kubernetes may add
 428 a new pod (3 old, 1 new), delete one old (2 old, 1 new), add two new pods (2
 429 old, 3 new) and finally delete the 2 old pods (3 new).  During this sequence
 430 the constraints of the minimum of two pods and maximum of five would be
 431 maintained while providing service the whole time.
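
     Assuming the component is managed by a Kubernetes Deployment, the two
     limits in this example correspond to the rolling update settings sketched
     below. The values are chosen to match the scenario above and are not taken
     from the SDNC chart:

     .. code-block:: yaml

       # Fragment of a Deployment spec; values mirror the example above.
       spec:
         replicas: 3
         strategy:
           type: RollingUpdate
           rollingUpdate:
             maxUnavailable: 1   # at least 3 - 1 = 2 pods stay available
             maxSurge: 2         # at most 3 + 2 = 5 pods exist at any one time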
 432
 433 Initiation of an upgrade is triggered by changes in the Helm charts.  For
 434 example, if the image specified for one of the pods in the SDNC deployment
 435 specification were to change (i.e. point to a new Docker image in the nexus3
 436 repository - commonly through the change of a deployment variable), the
 437 sequence of events described in the previous paragraph would be initiated.
 438
 439 For example, to upgrade a container by changing configuration, specifically an
 440 environment value::
 441
 442   > helm upgrade so onap/so --version 11.0.1 --set enableDebug=true
 443
 444 Issuing this command will result in the appropriate container being stopped by
 445 Kubernetes and replaced with a new container with the new environment value.
 446
 447 To upgrade a component to a new version with a new configuration file enter::
 448
 449   > helm upgrade so onap/so --version 11.0.1 -f environments/demo.yaml
 450
 451 To fetch release history enter::
 452
 453   > helm history so
 454   REVISION UPDATED                  STATUS     CHART     DESCRIPTION
 455   1        Mon Jul 5 10:05:22 2022  SUPERSEDED so-11.0.0 Install complete
 456   2        Mon Jul 5 10:10:55 2022  DEPLOYED   so-11.0.1 Upgrade complete
 457
 458 Unfortunately, not all upgrades are successful.  In recognition of this the
 459 lineup of pods within an ONAP deployment is tagged such that an administrator
 460 may force the ONAP deployment back to the previously tagged configuration or to
 461 a specific configuration, say to jump back two steps if an incompatibility
 462 between two ONAP components is discovered after the two individual upgrades
 463 succeeded.
 464
 465 This rollback functionality gives the administrator confidence that in the
 466 unfortunate circumstance of a failed upgrade the system can be rapidly brought
 467 back to a known good state.  This process of rolling upgrades while under
 468 service is illustrated in this short YouTube video showing a Zero Downtime
 469 Upgrade of a web application while under a 10 million transaction per second
 470 load.
 471
 472 For example, to roll back to the previous system revision enter::
 473
 474   > helm rollback so 1
 475
 476   > helm history so
 477   REVISION UPDATED                  STATUS     CHART     DESCRIPTION
 478   1        Mon Jul 5 10:05:22 2022  SUPERSEDED so-11.0.0 Install complete
 479   2        Mon Jul 5 10:10:55 2022  SUPERSEDED so-11.0.1 Upgrade complete
 480   3        Mon Jul 5 10:14:32 2022  DEPLOYED   so-11.0.0 Rollback to 1
 481
 482 .. note::
 483
 484   The description field can be overridden to document actions taken or include
 485   tracking numbers.
 486
 487 Many of the ONAP components contain their own databases which are used to
 488 record configuration or state information.  The schemas of these databases may
 489 change from version to version in such a way that data stored within the
 490 database needs to be migrated between versions. If such a migration script is
 491 available it can be invoked during the upgrade (or rollback) by Container
 492 Lifecycle Hooks. Two such hooks are available, PostStart and PreStop, which
 493 containers can access by registering a handler against one or both. Note that
 494 it is the responsibility of the ONAP component owners to implement the hook
 495 handlers - which could be a shell script or a call to a specific container HTTP
 496 endpoint - following the guidelines listed on the Kubernetes site. Lifecycle
 497 hooks are not restricted to database migration or even upgrades but can be used
 498 anywhere specific operations need to be taken during lifecycle operations.
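
     A handler is registered in the container specification. The sketch below
     shows one exec handler and one HTTP handler; the script path, endpoint,
     port, and image are hypothetical and only illustrate the two handler types
     mentioned above:

     .. code-block:: yaml

       # Fragment of a pod template; all names and paths are hypothetical.
       containers:
         - name: example-component
           image: example/image:1.0.1
           lifecycle:
             postStart:
               exec:
                 # shell script handler, e.g. a schema migration
                 command: ["/bin/sh", "-c", "/opt/app/bin/migrate-db.sh"]
             preStop:
               httpGet:
                 # HTTP handler served by the container itself
                 path: /shutdown
                 port: 8080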
 499
 500 OOM uses the Helm Kubernetes package manager to deploy ONAP components. Each
 501 component is arranged in a packaging format called a chart: a collection of
 502 files that describe a set of Kubernetes resources. Helm allows for rolling
 503 upgrades of the deployed ONAP components. To upgrade a component Helm release
 504 you will need an updated Helm chart. The chart might have modified, deleted or
 505 added values, deployment yamls, and more.  To get the release name use::
 506
 507   > helm ls
 508
 509 To easily upgrade the release use::
 510
 511   > helm upgrade [RELEASE] [CHART]
 512
 513 To roll back to a previous release version use::
 514
 515   > helm rollback [flags] [RELEASE] [REVISION]
 516
 517 For example, to upgrade the onap-so helm release to the latest SO container
 518 release v1.1.2:
 519
 520 - Edit the so values.yaml file, which is part of the chart
 521 - Change "so: nexus3.onap.org:10001/openecomp/so:v1.1.1" to
 522   "so: nexus3.onap.org:10001/openecomp/so:v1.1.2"
 523 - From the chart location run::
 524
 525     > helm upgrade onap-so .
 526
 527 The previous so pod will be terminated and a new so pod with an updated so
 528 container will be created.
 529
 530 .. figure:: ../../resources/images/oom_logo/oomLogoV2-Delete.png
 531    :align: right
 532
 533 Delete
 534 ======
 535
 536 Existing deployments can be partially or fully removed once they are no longer
 537 needed.  To minimize errors it is recommended that before deleting components
 538 from a running deployment the operator perform a 'dry-run' to display exactly
 539 what will happen with a given command prior to actually deleting anything.
 540 For example::
 541
 542   > helm undeploy onap --dry-run
 543
 544 This will display the outcome of deleting the 'onap' release from the
 545 deployment.
 546 To completely delete a release and remove it from the internal store enter::
 547
 548   > helm undeploy onap
 549
 550 Once the undeploy is complete, delete the namespace as well
 551 using the following command::
 552
 553   >  kubectl delete namespace <name of namespace>
 554
 555 .. note::
 556    You need to provide the namespace name which you used during deployment;
 557    for example::
 558
 559      >  kubectl delete namespace onap
 560
 561 One can also remove individual components from a deployment by changing the
 562 ONAP configuration values.  For example, to remove `so` from a running
 563 deployment enter::
 564
 565   > helm undeploy onap-so
 566
 567 This will remove `so`, as the configuration indicates it's no longer part of the
 568 deployment. This might be useful if one wanted to replace just `so` by
 569 installing a custom version.