docs/oom_user_guide.rst

   1 .. This work is licensed under a Creative Commons Attribution 4.0 International License.
   2 .. http://creativecommons.org/licenses/by/4.0
   3 .. Copyright 2018 Amdocs, Bell Canada
   4
   5 .. Links
   6 .. _Curated applications for Kubernetes: https://github.com/kubernetes/charts
   7 .. _Services: https://kubernetes.io/docs/concepts/services-networking/service/
   8 .. _ReplicaSet: https://kubernetes.io/docs/concepts/workloads/controllers/replicaset/
   9 .. _StatefulSet: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/
  10 .. _Helm Documentation: https://docs.helm.sh/helm/
  11 .. _Helm: https://docs.helm.sh/
  12 .. _Kubernetes: https://Kubernetes.io/
  13
  14 .. _user-guide-label:
  15
  16 OOM User Guide
  17 ##############
  18
  19 The ONAP Operations Manager (OOM) provides the ability to manage the entire
  20 life-cycle of an ONAP installation, from the initial deployment to final
  21 decommissioning. This guide provides instructions for users of ONAP to
  22 use the Kubernetes_/Helm_ system as a complete ONAP management system.
  23
  24 This guide provides many examples of Helm command line operations.  For a
  25 complete description of these commands please refer to the `Helm
  26 Documentation`_.
  27
  28 .. figure:: oomLogoV2-medium.png
  29    :align: right
  30
  31 The following sections describe the life-cycle operations:
  32
  33 - Deploy_ - with built-in component dependency management
  34 - Configure_ - unified configuration across all ONAP components
  35 - Monitor_ - real-time health monitoring feeding to a Consul UI and Kubernetes
  36 - Heal_ - failed ONAP containers are recreated automatically
  37 - Scale_ - cluster ONAP services to enable seamless scaling
  38 - Upgrade_ - change-out containers or configuration with little or no service impact
  39 - Delete_ - cleanup individual containers or entire deployments
  40
  41 .. figure:: oomLogoV2-Deploy.png
  42    :align: right
  43
  44 Deploy
  45 ======
  46
  47 The OOM team, with assistance from the ONAP project teams, has built a
  48 comprehensive set of Helm charts, yaml files very similar to TOSCA files, that
  49 describe the composition of each of the ONAP components and the relationships
  50 within and between components. Using this model Helm is able to deploy all of
  51 ONAP with this simple command::
  52
  53   > helm install osn/onap
  54
  55 .. note::
  56   The osn repo is not currently available so creation of a local repository is
  57   required.
  58
  59 Helm is able to use charts served up from a repository and comes set up with a
  60 default CNCF-provided `Curated applications for Kubernetes`_ repository called
  61 stable, which should be removed to avoid confusion::
  62
  63   > helm repo remove stable
  64
  65 .. To setup the Open Source Networking Nexus repository for helm enter::
  66 ..  > helm repo add osn 'https://nexus3.onap.org:10001/helm/helm-repo-in-nexus/master/'
  67
  68 To prepare your system for an installation of ONAP, you'll need to::
  69
  70   > git clone http://gerrit.onap.org/r/oom
  71   > cd oom/kubernetes
  72
  73 Then build your local Helm repository::
  74
  75   > make all
  76
  77 To set up a local Helm server to serve up the ONAP charts::
  78
  79   > helm serve &
  80
  81 Note the port number that is listed and use it in the Helm repo add as follows::
  82
  83   > helm repo add local http://127.0.0.1:8879
  84
  85 To get a list of all of the available Helm chart repositories::
  86
  87   > helm repo list
  88   NAME   URL
  89   local  http://127.0.0.1:8879
  90
  91 The Helm search command reads through all of the repositories configured on the
  92 system, and looks for matches::
  93
  94   > helm search -l
  95   NAME                    VERSION    DESCRIPTION
  96   local/appc              2.0.0      Application Controller
  97   local/clamp             2.0.0      ONAP Clamp
  98   local/common            2.0.0      Common templates for inclusion in other charts
  99   local/onap              2.0.0      Open Network Automation Platform (ONAP)
 100   local/robot             2.0.0      A helm Chart for kubernetes-ONAP Robot
 101   local/so                2.0.0      ONAP Service Orchestrator
 102
 103 In any case, setup of the Helm repository is a one-time activity.
 104
 105 Once the repo is set up, installation of ONAP can be done with a single command::
 106
 107   > helm install local/onap --name development
 108
 109 This will install ONAP from a local repository in a 'development' Helm release.
 110 As described below, to override the default configuration values provided by
 111 OOM, an environment file can be provided on the command line as follows::
 112
 113   > helm install local/onap --name development -f onap-development.yaml
 114
 115 To get a summary of the status of all of the pods (containers) running in your
 116 deployment::
 117
 118   > kubectl get pods --all-namespaces -o=wide
 119
 120 .. note::
 121   The Kubernetes namespace concept allows for multiple instances of a component
 122   (such as all of ONAP) to co-exist with other components in the same
 123   Kubernetes cluster by isolating them entirely.  Namespaces share only the
 124   hosts that form the cluster thus providing isolation between production and
 125   development systems as an example.  The OOM deployment of ONAP in Beijing is
 126   now done within a single Kubernetes namespace, whereas in Amsterdam a namespace
 127   was created for each of the ONAP components.
 128
 129 .. note::
 130   The Helm `--name` option refers to a release name and not a Kubernetes namespace.
 131
 132
 133 To install a specific version of a single ONAP component (`so` in this example)
 134 with the given name enter::
 135
 136   > helm install local/so --version 2.0.1 -n so
 137
 138 To display details of a specific resource or group of resources, type::
 139
 140   > kubectl describe pod so-1071802958-6twbl
 141
 142 where `so-1071802958-6twbl` is the auto-generated identifier of the pod.
 143
 144 .. figure:: oomLogoV2-Configure.png
 145    :align: right
 146
 147 Configure
 148 =========
 149
 150 Each project within ONAP has its own configuration data generally consisting
 151 of: environment variables, configuration files, and database initial values.
 152 Many technologies are used across the projects resulting in significant
 153 operational complexity and an inability to apply global parameters across the
 154 entire ONAP deployment. OOM solves this problem by introducing a common
 155 configuration technology, Helm charts, that provides a hierarchical
 156 configuration with the ability to override values with higher-level
 157 charts or command line options.
 158
 159 The structure of the configuration of ONAP is shown in the following diagram.
 160 Note that key/value pairs of a parent will always take precedence over those
 161 of a child. Also note that values set on the command line have the highest
 162 precedence of all.
 163
 164 .. graphviz::
 165
 166    digraph config {
 167       {
 168          node     [shape=folder]
 169          oValues  [label="values.yaml"]
 170          demo     [label="onap-demo.yaml"]
 171          prod     [label="onap-production.yaml"]
 172          oReq     [label="requirements.yaml"]
 173          soValues [label="values.yaml"]
 174          soReq    [label="requirements.yaml"]
 175          mdValues [label="values.yaml"]
 176       }
 177       {
 178          oResources  [label="resources"]
 179       }
 180       onap -> oResources
 181       onap -> oValues
 182       oResources -> environments
 183       oResources -> oReq
 184       oReq -> so
 185       environments -> demo
 186       environments -> prod
 187       so -> soValues
 188       so -> soReq
 189       so -> charts
 190       charts -> mariadb
 191       mariadb -> mdValues
 192
 193    }
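
     As a rough illustration of this precedence rule, the sketch below shows a
     value set under the `so` key of the parent chart replacing the default
     declared in the `so` chart itself. The `replicaCount` key appears in the
     excerpts in this guide; the specific values are assumptions chosen for
     illustration only.

     .. code-block:: yaml

       # so/values.yaml (child chart default)
       replicaCount: 1
       ---
       # onap/values.yaml (parent chart; keys under 'so' are passed to the so chart)
       so:
         replicaCount: 2
       # Effective value seen by the so templates: replicaCount = 2.
       # A command line option such as '--set so.replicaCount=3' would in turn
       # take precedence over both files.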
 194
 195 The top level onap/values.yaml file contains the values required to be set
 196 before deploying ONAP.  Here are the contents of this file:
 197
 198 .. include:: onap_values.yaml
 199    :code: yaml
 200
 201 One may wish to create a value file that is specific to a given deployment such
 202 that it can be differentiated from other deployments.  For example, an
 203 onap-development.yaml file may create a minimal environment for development
 204 while onap-production.yaml might describe a production deployment that operates
 205 independently of the developer version.
 206
 207 For example, if the production OpenStack instance was different from a
 208 developer's instance, the onap-production.yaml file may contain a different
 209 value for the vnfDeployment/openstack/oam_network_cidr key as shown below.
 210
 211 .. code-block:: yaml
 212
 213   nsPrefix: onap
 214   nodePortPrefix: 302
 215   apps: consul msb mso message-router sdnc vid robot portal policy appc aai
 216     sdc dcaegen2 log cli multicloud clamp vnfsdk aaf kube2msb
 217   dataRootDir: /dockerdata-nfs
 218
 219   # docker repositories
 220   repository:
 221     onap: nexus3.onap.org:10001
 222     oom: oomk8s
 223     aai: aaionap
 224     filebeat: docker.elastic.co
 225
 226   image:
 227     pullPolicy: Never
 228
 229   # vnf deployment environment
 230   vnfDeployment:
 231     openstack:
 232       ubuntu_14_image: "Ubuntu_14.04.5_LTS"
 233       public_net_id: "e8f51956-00dd-4425-af36-045716781ffc"
 234       oam_network_id: "d4769dfb-c9e4-4f72-b3d6-1d18f4ac4ee6"
 235       oam_subnet_id: "191f7580-acf6-4c2b-8ec0-ba7d99b3bc4e"
 236       oam_network_cidr: "192.168.30.0/24"
 237   <...>
 238
 239
 240 To deploy ONAP with this environment file, enter::
 241
 242   > helm install local/onap -n beijing -f environments/onap-production.yaml
 243
 244 .. include:: environments_onap_demo.yaml
 245    :code: yaml
 246
 247 When deploying all of ONAP, a requirements.yaml file controls which ONAP
 248 components are included and at what version.  Here is an excerpt of this
 249 file:
 250
 251 .. code-block:: yaml
 252
 253   # Referencing a named repo called 'local'.
 254   # Can add this repo by running commands like:
 255   # > helm serve
 256   # > helm repo add local http://127.0.0.1:8879
 257   dependencies:
 258   <...>
 259     - name: so
 260       version: ~2.0.0
 261       repository: '@local'
 262       condition: so.enabled
 263   <...>
 264
 265 The ~ operator in the `so` version value indicates that the latest "2.0.X"
 266 version of `so` shall be used, thus allowing the chart to pick up patch-level
 267 upgrades that don't impact the so API; hence, version 2.0.1 will be installed
 268 in this case.
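
     For comparison, the sketch below annotates the range that the tilde
     operator selects and, as a comment, the broader caret operator. The caret
     line is an illustrative alternative only and is not taken from the OOM
     charts.

     .. code-block:: yaml

       dependencies:
         - name: so
           # tilde: patch-level updates only, i.e. >= 2.0.0 and < 2.1.0
           version: ~2.0.0
           repository: '@local'
           condition: so.enabled
           # Illustrative alternative (not used by OOM):
           # caret: minor and patch updates, i.e. >= 2.0.0 and < 3.0.0
           #   version: ^2.0.0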
 269
 270 The onap/resources/environments/onap-dev.yaml file (see the excerpt below)
 271 enables fine-grained control over which components are included as part of this
 272 deployment. By changing the `so` line to `enabled: false` the `so` component
 273 will not be deployed.  If this change is part of an upgrade the existing `so`
 274 component will be shut down. Other `so` parameters and even `so` child values
 275 can be modified, for example the `so`'s `liveness` probe could be disabled
 276 (which is not recommended as this change would disable auto-healing of `so`).
 277
 278 .. code-block:: yaml
 279
 280   #################################################################
 281   # Global configuration overrides.
 282   #
 283   # These overrides will affect all helm charts (ie. applications)
 284   # that are listed below and are 'enabled'.
 285   #################################################################
 286   global:
 287   <...>
 288
 289   #################################################################
 290   # Enable/disable and configure helm charts (ie. applications)
 291   # to customize the ONAP deployment.
 292   #################################################################
 293   aaf:
 294     enabled: false
 295   <...>
 296   so: # Service Orchestrator
 297     enabled: true
 298
 299     replicaCount: 1
 300
 301     liveness:
 302       # necessary to disable liveness probe when setting breakpoints
 303       # in debugger so K8s doesn't restart unresponsive container
 304       enabled: true
 305
 306   <...>
 307
 308 .. figure:: oomLogoV2-Monitor.png
 309    :align: right
 310
 311 Monitor
 312 =======
 313
 314 All highly available systems include at least one facility to monitor the
 315 health of components within the system.  Such health monitors are often used as
 316 inputs to distributed coordination systems (such as etcd, zookeeper, or consul)
 317 and monitoring systems (such as nagios or zabbix). OOM provides two mechanisms
 318 to monitor the real-time health of an ONAP deployment:
 319
 320 - a Consul GUI that presents health status to a human operator or to
 321   downstream monitoring systems, and
 322 - a set of Kubernetes liveness probes, which feed into the Kubernetes manager
 323   and enable automatic healing of failed containers, as described in the
 324   Heal section.
 325
 326 Within ONAP, Consul is the monitoring system of choice and is deployed by OOM in two parts:
 327
 328 - a three-way, centralized Consul server cluster is deployed as a highly
 329   available monitor of all of the ONAP components, and
 330 - a number of Consul agents.
 330 - a number of Consul agents.
 331
 332 The Consul server provides a user interface that allows a user to graphically
 333 view the current health status of all of the ONAP components for which agents
 334 have been created - a sample from the ONAP Integration labs follows:
 335
 336 .. figure:: consulHealth.png
 337    :align: center
 338
 339 To see the real-time health of a deployment go to: http://<kubernetes IP>:30270/ui/
 340 where a GUI much like the one shown above will be found.
 341
 342
 343 .. figure:: oomLogoV2-Heal.png
 344    :align: right
 345
 346 Heal
 347 ====
 348
 349 The ONAP deployment is defined by Helm charts as mentioned earlier.  These Helm
 350 charts are also used to implement automatic recoverability of ONAP components
 351 when individual components fail. Once ONAP is deployed, a "liveness" probe
 352 starts checking the health of the components after a specified startup time.
 353
 354 Should a liveness probe indicate a failed container it will be terminated and a
 355 replacement will be started in its place - containers are ephemeral. Should the
 356 deployment specification indicate that there are one or more dependencies to
 357 this container or component (for example a dependency on a database) the
 358 dependency will be satisfied before the replacement container/component is
 359 started. This mechanism ensures that, after a failure, all of the ONAP
 360 components restart successfully.
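
     A minimal sketch of what such a probe can look like in a pod template,
     assuming a simple TCP check. The container name, image, port, and timings
     are hypothetical and are not taken from the OOM charts.

     .. code-block:: yaml

       containers:
         - name: so                      # hypothetical container name
           image: example/so:1.0         # hypothetical image
           livenessProbe:
             tcpSocket:
               port: 8080                # hypothetical service port
             initialDelaySeconds: 120    # startup time allowed before the first check
             periodSeconds: 10           # interval between checks
             failureThreshold: 3         # consecutive failures before a restart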
 361
 362 To test healing, the following command can be used to delete a pod::
 363
 364   > kubectl delete pod [pod name] -n [pod namespace]
 365
 366 One could then use the following command to monitor the pods and observe the
 367 pod being terminated and the service being automatically healed with the
 368 creation of a replacement pod::
 369
 370   > kubectl get pods --all-namespaces -o=wide
 371
 372 .. figure:: oomLogoV2-Scale.png
 373    :align: right
 374
 375 Scale
 376 =====
 377
 378 Many of the ONAP components are horizontally scalable, which allows them to
 379 adapt to expected offered load.  During the Beijing release scaling is static;
 380 that is, during deployment or upgrade a cluster size is defined and this cluster
 381 will be maintained even in the presence of faults. The parameter that controls
 382 the cluster size of a given component is found in the values.yaml file for that
 383 component.  Here is an excerpt that shows this parameter:
 384
 385 .. code-block:: yaml
 386
 387   # default number of instances
 388   replicaCount: 1
 389
 390 In order to change the size of a cluster, an operator could use a helm upgrade
 391 (described in detail in the next section) as follows::
 392
 393   > helm upgrade --set replicaCount=3 onap/so/mariadb
 394
 395 The ONAP components use Kubernetes provided facilities to build clustered,
 396 highly available systems including: Services_ with load-balancers, ReplicaSet_,
 397 and StatefulSet_.  Some of the open-source projects used by the ONAP components
 398 directly support clustered configurations, for example ODL and MariaDB Galera.
 399
 400 The Kubernetes Services_ abstraction provides a consistent access point for
 401 each of the ONAP components, independent of the pod or container architecture
 402 of that component.  For example, SDN-C uses OpenDaylight clustering with a
 403 default cluster size of three but uses a Kubernetes service to abstract this
 404 cluster from the other ONAP components.  The cluster can therefore change
 405 size, and this change is isolated from the other ONAP components by the
 406 load-balancer implemented in the ODL service abstraction.
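
     A sketch of such a service definition follows. The name, label, and port
     numbers are assumptions made for illustration and do not come from the
     OOM charts.

     .. code-block:: yaml

       apiVersion: v1
       kind: Service
       metadata:
         name: sdnc                # stable name that other components connect to
       spec:
         selector:
           app: sdnc               # matches every pod of the cluster, whatever its size
         ports:
           - port: 8282            # port exposed by the service (hypothetical)
             targetPort: 8181      # port the pods listen on (hypothetical)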
 408
 409 A ReplicaSet_ is a construct that is used to describe the desired state of the
 410 cluster.  For example 'replicas: 3' indicates to Kubernetes that a cluster of 3
 411 instances is the desired state.  Should one of the members of the cluster fail,
 412 a new member will be automatically started to replace it.
 413
 414 Some of the ONAP components may need a more deterministic deployment; for
 415 example to enable intra-cluster communication. For these applications the
 416 component can be deployed as a Kubernetes StatefulSet_ which will maintain a
 417 persistent identifier for the pods and thus a stable network id for the pods.
 418 For example: the pod names might be web-0, web-1, ..., web-{N-1} for N 'web' pods
 419 with corresponding DNS entries such that intra service communication is simple
 420 even if the pods are physically distributed across multiple nodes. An example
 421 of how these capabilities can be used is described in the Running Consul on
 422 Kubernetes tutorial.
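
     The naming behaviour described above can be sketched with the generic
     'web' example. This is an illustrative manifest, not an ONAP chart: the
     image and port are placeholders, and a headless service is assumed in
     order to provide the per-pod DNS entries.

     .. code-block:: yaml

       apiVersion: v1
       kind: Service
       metadata:
         name: web
       spec:
         clusterIP: None           # headless service: each pod gets its own DNS entry
         selector:
           app: web
         ports:
           - port: 80
       ---
       apiVersion: apps/v1
       kind: StatefulSet
       metadata:
         name: web
       spec:
         serviceName: web          # ties the pod DNS names to the headless service
         replicas: 3               # pods are named web-0, web-1, web-2
         selector:
           matchLabels:
             app: web
         template:
           metadata:
             labels:
               app: web
           spec:
             containers:
               - name: web
                 image: nginx      # placeholder image
                 ports:
                   - containerPort: 80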
 423
 424 .. figure:: oomLogoV2-Upgrade.png
 425    :align: right
 426
 427 Upgrade
 428 =======
 429
 430 Helm has built-in capabilities to enable the upgrade of pods without causing a
 431 loss of the service being provided by that pod or pods (if configured as a
 432 cluster).  As described in the OOM Developer's Guide, ONAP components provide
 433 an abstracted 'service' end point with the pods or containers providing this
 434 service hidden from other ONAP components by a load balancer. This capability
 435 is used during upgrades to allow a pod with a new image to be added to the
 436 service before removing the pod with the old image. This 'make before break'
 437 capability ensures minimal downtime.
 438
 439 Prior to doing an upgrade, determine the status of the deployed charts::
 440
 441   > helm list
 442   NAME REVISION UPDATED                  STATUS    CHART     NAMESPACE
 443   so   1        Mon Feb 5 10:05:22 2018  DEPLOYED  so-2.0.1  default
 444
 445 When upgrading a cluster, one parameter controls the minimum size of the cluster
 446 during the upgrade while another parameter controls the maximum number of pods
 447 in the cluster.  For example, SDNC configured as a 3-way ODL cluster might
 448 require that during the upgrade no fewer than 2 pods are available at all times
 449 to provide service while no more than 5 pods are ever deployed across the two
 450 versions at any one time to avoid depleting the cluster of resources. In this
 451 scenario, the SDNC cluster would start with 3 old pods then Kubernetes may add
 452 a new pod (3 old, 1 new), delete one old (2 old, 1 new), add two new pods (2
 453 old, 3 new) and finally delete the 2 old pods (3 new).  During this sequence
 454 the constraints of the minimum of two pods and maximum of five would be
 455 maintained while providing service the whole time.
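
     The constraints in this scenario can be expressed as a rolling update
     strategy. The sketch below assumes the component is deployed as a
     Kubernetes Deployment; the name, labels, and image are hypothetical.

     .. code-block:: yaml

       apiVersion: apps/v1
       kind: Deployment
       metadata:
         name: sdnc                      # hypothetical name
       spec:
         replicas: 3
         strategy:
           type: RollingUpdate
           rollingUpdate:
             maxUnavailable: 1           # at least 3 - 1 = 2 pods stay available
             maxSurge: 2                 # at most 3 + 2 = 5 pods exist at once
         selector:
           matchLabels:
             app: sdnc
         template:
           metadata:
             labels:
               app: sdnc
           spec:
             containers:
               - name: sdnc
                 image: example/sdnc:new # changing the image triggers the rollout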
 456
 457 Initiation of an upgrade is triggered by changes in the Helm charts.  For
 458 example, if the image specified for one of the pods in the SDNC deployment
 459 specification were to change (i.e. point to a new Docker image in the nexus3
 460 repository - commonly through the change of a deployment variable), the
 461 sequence of events described in the previous paragraph would be initiated.
 462
 463 For example, to upgrade a container by changing configuration, specifically an
 464 environment value::
 465
 466   > helm upgrade beijing onap/so --version 2.0.1 --set enableDebug=true
 467
 468 Issuing this command will result in the appropriate container being stopped by
 469 Kubernetes and replaced with a new container with the new environment value.
 470
 471 To upgrade a component to a new version with a new configuration file enter::
 472
 473   > helm upgrade beijing onap/so --version 2.0.2 -f environments/demo.yaml
 474
 475 To fetch release history enter::
 476
 477   > helm history so
 478   REVISION UPDATED                  STATUS     CHART     DESCRIPTION
 479   1        Mon Feb 5 10:05:22 2018  SUPERSEDED so-2.0.1  Install complete
 480   2        Mon Feb 5 10:10:55 2018  DEPLOYED   so-2.0.2  Upgrade complete
 481
 482 Unfortunately, not all upgrades are successful.  In recognition of this the
 483 lineup of pods within an ONAP deployment is tagged such that an administrator
 484 may force the ONAP deployment back to the previously tagged configuration or to
 485 a specific configuration, say to jump back two steps if an incompatibility
 486 between two ONAP components is discovered after the two individual upgrades
 487 succeeded.
 488
 489 This rollback functionality gives the administrator confidence that in the
 490 unfortunate circumstance of a failed upgrade the system can be rapidly brought
 491 back to a known good state.  This process of rolling upgrades while under
 492 service is illustrated in this short YouTube video showing a Zero Downtime
 493 Upgrade of a web application while under a 10 million transaction per second
 494 load.
 495
 496 For example, to roll back to the previous system revision enter::
 497
 498   > helm rollback so 1
 499
 500   > helm history so
 501   REVISION UPDATED                  STATUS     CHART     DESCRIPTION
 502   1        Mon Feb 5 10:05:22 2018  SUPERSEDED so-2.0.1  Install complete
 503   2        Mon Feb 5 10:10:55 2018  SUPERSEDED so-2.0.2  Upgrade complete
 504   3        Mon Feb 5 10:14:32 2018  DEPLOYED   so-2.0.1  Rollback to 1
 505
 506 .. note::
 507
 508   The description field can be overridden to document actions taken or include
 509   tracking numbers.
 510
 511 Many of the ONAP components contain their own databases which are used to
 512 record configuration or state information.  The schemas of these databases may
 513 change from version to version in such a way that data stored within the
 514 database needs to be migrated between versions. If such a migration script is
 515 available it can be invoked during the upgrade (or rollback) by Container
 516 Lifecycle Hooks. Two such hooks are available, PostStart and PreStop, which
 517 containers can access by registering a handler against one or both. Note that
 518 it is the responsibility of the ONAP component owners to implement the hook
 519 handlers - which could be a shell script or a call to a specific container HTTP
 520 endpoint - following the guidelines listed on the Kubernetes site. Lifecycle
 521 hooks are not restricted to database migration or even upgrades but can be used
 522 anywhere specific operations need to be taken during lifecycle operations.
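
     A sketch of how the two hooks might be registered in a container
     specification follows. The container name, image, script path, and HTTP
     endpoint are all hypothetical; each component owner would supply real
     handlers.

     .. code-block:: yaml

       containers:
         - name: so-db                       # hypothetical container
           image: example/so-db:2.0.2        # hypothetical image
           lifecycle:
             postStart:
               exec:
                 # hypothetical migration script shipped in the image
                 command: ["/bin/sh", "-c", "/opt/scripts/migrate-schema.sh"]
             preStop:
               httpGet:
                 # hypothetical endpoint that lets the component shut down cleanly
                 path: /shutdown
                 port: 8080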
 523
 524 OOM uses the Helm K8S package manager to deploy ONAP components. Each component is
 525 arranged in a packaging format called a chart - a collection of files that
 526 describe a set of k8s resources. Helm allows for rolling upgrades of the ONAP
 527 component deployed. To upgrade a component Helm release you will need an
 528 updated Helm chart. The chart might have modified, deleted or added values,
 529 deployment yamls, and more.  To get the release name use::
 530
 531   > helm ls
 532
 533 To easily upgrade the release use::
 534
 535   > helm upgrade [RELEASE] [CHART]
 536
 537 To roll back to a previous release version use::
 538
 539   > helm rollback [flags] [RELEASE] [REVISION]
 540
 541 For example, to upgrade the onap-so helm release to the latest SO container
 542 release v1.1.2:
 543
 544 - Edit so values.yaml which is part of the chart
 545 - Change "so: nexus3.onap.org:10001/openecomp/so:v1.1.1" to
 546   "so: nexus3.onap.org:10001/openecomp/so:v1.1.2"
 547 - From the chart location run::
 548
 549   > helm upgrade onap-so .
 550
 551 The previous so pod will be terminated and a new so pod with an updated so
 552 container will be created.
 553
 554 .. figure:: oomLogoV2-Delete.png
 555    :align: right
 556
 557 Delete
 558 ======
 559
 560 Existing deployments can be partially or fully removed once they are no longer
 561 needed.  To minimize errors it is recommended that before deleting components
 562 from a running deployment the operator perform a 'dry-run' to display exactly
 563 what will happen with a given command prior to actually deleting anything.  For
 564 example::
 565
 566   > helm delete --dry-run beijing
 567
 568 will display the outcome of deleting the 'beijing' release from the deployment.
 569 To completely delete a release and remove it from the internal store enter::
 570
 571   > helm delete --purge beijing
 572
 573 One can also remove individual components from a deployment by changing the
 574 ONAP configuration values.  For example, to remove `so` from a running
 575 deployment enter::
 576
 577   > helm upgrade beijing osn/onap --set so.enabled=false
 578
 579 This will remove `so` as the configuration indicates it's no longer part of the
 580 deployment. This might be useful if one wanted to replace just `so` by
 581 installing a custom version.