.. _integration-s3p:

ONAP Maturity Testing Notes
---------------------------

Historically, the integration team executed specific stability and resilience
tests on each target release. For Frankfurt, a stability test was executed.
The Openlab, based on Frankfurt RC0 dockers, was also observed over a long
period to evaluate the overall stability.
Finally, the daily CI chain created at Frankfurt RC0 was also a valuable
indicator of the solution's stability.

No resilience or stress tests were executed due to a lack of resources
and the late availability of the release. The testing strategy shall be amended
in Guilin; several requirements have been created to improve the S3P testing
domain.

Stability
=========

ONAP stability was tested through a 72 hour test.
The intent of the 72 hour stability test is not to exhaustively test all
functions but to run a steady load against the system and look for issues, such
as memory leaks, that cannot be found in the short-duration install and
functional testing during the development cycle.

Integration Stability Testing verifies that the ONAP platform remains fully
functional after running for an extended amount of time.
This is done by repeatedly running tests against an ONAP instance for a period
of 72 hours.

::

  **The 72 hour stability run result was PASS**

The onboard and instantiate tests ran for over **115 hours** before environment
issues stopped the test. There were errors due to both tooling and environment
issues.

The overall memory utilization only grew about **2%** on the worker nodes
despite the environment issues. Interestingly, the Kubernetes orchestration node
memory grew more, which could mean that we are overdriving the APIs in some
fashion.

We did not limit other tenant activities in Windriver during this test run, and
we saw the impact of events such as the re-installation of SB00 in the tenant
and of general network latency, which made OpenStack slower to instantiate.
For future stability runs we should go back to the process of shutting down
non-critical tenants in the test environment to free up host resources for
the test run (or find other ways to prevent other testing from affecting the
stability run).

The control loop tests were **100% successful** and the cycle time for the loop
was fairly consistent despite the environment issues. Future control loop
stability tests should consider doing more policy-edit activities and running
more control loops if host resources are available. The 10 second VES telemetry
event is quite aggressive: it sends more load into the VES collector and TCA
engine during onset events than would be typical, so adding loops should take
that into account. The Jenkins jobs ran fairly well, although the instantiate
Demo vFWCL job took longer than usual, which should be factored into future
test planning.


Methodology
~~~~~~~~~~~

The Stability Test has two main components:

- Running the "ete stability72hr" Robot suite periodically. This test suite
  verifies that ONAP can instantiate vDNS, vFWCL, and vVG.
- Setting up the vFW Closed Loop to remain running, then checking periodically
  that the closed loop functionality is still working.
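
Both components share the same shape: a check is invoked at a fixed interval
until the test window closes, and each outcome is recorded. The following is a
minimal sketch of that pattern, not the actual ONAP tooling. The
``run_periodically`` helper, the hourly interval, and the injected clock are all
assumptions made for illustration.

```python
import time
from typing import Callable, List, Tuple


def run_periodically(
    check: Callable[[], bool],
    duration_s: float,
    interval_s: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[float, bool]]:
    """Call check() every interval_s seconds until duration_s has elapsed.

    Returns (elapsed_seconds, passed) pairs. A check that raises is
    recorded as a failure so that one bad run does not stop the campaign.
    """
    start = clock()
    results: List[Tuple[float, bool]] = []
    while True:
        elapsed = clock() - start
        if elapsed >= duration_s:
            break
        try:
            passed = bool(check())
        except Exception:
            passed = False
        results.append((elapsed, passed))
        sleep(interval_s)
    return results


if __name__ == "__main__":
    # Simulated clock so the example finishes instantly: a 72 hour window
    # with one run per hour (the hourly interval is an assumption).
    now = [0.0]

    def fake_sleep(seconds: float) -> None:
        now[0] += seconds

    runs = run_periodically(lambda: True, 72 * 3600, 3600,
                            clock=lambda: now[0], sleep=fake_sleep)
    print(len(runs), all(ok for _, ok in runs))  # 72 True
```

Injecting ``clock`` and ``sleep`` keeps the loop testable without waiting for
real time to pass; a real campaign would rely on the defaults.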

The integration-longevity tenant in the Intel/Windriver environment was used for
the 72 hour tests.

The onap-ci job "Project windriver-longevity-release-manual" was used for
the deployment, with the OOM branch set to frankfurt and the Integration branch
set to master. Integration master was used so that we could catch the latest
updates to the integration scripts and VNF heat templates.

The Jenkins job needs a couple of updates for each release:

- Set the integration branch to 'origin/master'.
- Modify the parameters to deploy.sh to specify "-i master" and "-o frankfurt"
  to get the integration master and oom frankfurt clones onto the NFS server.

The path for robot logs on dockerdata-nfs changed in Frankfurt, so
/dev-robot/ becomes /dev/robot

.. note::
   For the Frankfurt release, the stability test was executed on a
   Kubernetes infrastructure based on the El Alto recommendations. The
   Kubernetes version was 1.15.3 (Frankfurt recommendation: 1.15.11) and the
   Helm version was 2.14.2 (Frankfurt recommendation: 2.16.6). However, the
   ONAP dockers were updated to the Frankfurt RC2 candidate versions. The
   results are informative and can be compared with previous campaigns. The
   stability tests used the robot container image
   **1.6.1-STAGING-20200519T201214Z**. The robot container was patched to use
   GRA_API since VNF_API has been deprecated.

Shakedown consists of creating temporary tags (stability72hrvLB,
stability72hrvVG, stability72hrVFWCL) to make sure that each sub-test runs
successfully (including cleanup) in the environment before the Jenkins job is
started with the higher-level test suite tag stability72hr, which covers all
three test types.

Clean out the old build jobs using a Jenkins console script (Manage Jenkins):

::

  // Delete all builds of the stability job and restart numbering at 1.
  def jobName = "windriver-longevity-stability72hr"
  def job = Jenkins.instance.getItem(jobName)
  job.getBuilds().each { it.delete() }
  job.nextBuildNumber = 1
  job.save()


The appc.properties file was updated to apply the fix for DMaaP message
processing so that it calls http://localhost:8181 for the streams update.

Results: 100% PASS
~~~~~~~~~~~~~~~~~~

=================== ======== ========== ======== ========= =========
Test Case           Attempts Env Issues Failures Successes Pass Rate
=================== ======== ========== ======== ========= =========
Stability 72 hours  77       19         0        58        100%
vFW Closed Loop     60       0          0        100       100%
**Total**           137      19         0        158       **100%**
=================== ======== ========== ======== ========= =========
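
One way to read the pass rates is that attempts lost to environment issues are
left out of the calculation. The sketch below assumes that the pass rate is
successes divided by successes plus failures; this formula and the ``pass_rate``
helper are assumptions, not taken from the test tooling. It is checked here
against the "Stability 72 hours" row only.

```python
def pass_rate(successes: int, failures: int) -> float:
    """Percentage of conclusive runs that passed.

    Runs aborted by environment issues are assumed to be excluded,
    so only successes and failures enter the denominator.
    """
    conclusive = successes + failures
    if conclusive == 0:
        raise ValueError("no conclusive runs")
    return 100.0 * successes / conclusive


# "Stability 72 hours" row: 77 attempts, 19 environment issues,
# 0 failures, 58 successes.
attempts, env_issues, failures, successes = 77, 19, 0, 58
assert attempts - env_issues == successes + failures
print(f"{pass_rate(successes, failures):.0f}%")  # prints "100%"
```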

Detailed results can be found at
https://wiki.onap.org/display/DW/Frankfurt+Stability+Run+Notes

.. note::
   - Overall results were good. All of the test failures were due to
     issues with the unstable environment and tooling framework.
   - JIRAs were created for readiness/liveness probe issues found while
     testing under the unstable environment. Patches applied to oom and
     testsuite during the testing helped reduce test failures due to
     environment and tooling framework issues.
   - The vFW Closed Loop test was very stable and self-recovered from
     environment issues.

Resources overview
~~~~~~~~~~~~~~~~~~

============ ======================= =========== ========== ==========
Date         #1 CPU                  #1 RAM      CPU*       RAM**
============ ======================= =========== ========== ==========
May 20 18:45 dcae-tca-analytics:511m appc:2901Mi 1649       36092
May 21 12:33 dcae-tca-analytics:664m appc:2901Mi 1605       38221
May 22 09:35 dcae-tca-analytics:425m appc:2837Mi 1459       38488
May 23 11:01 cassandra-1:371m        appc:2849Mi 1829       39431
============ ======================= =========== ========== ==========

.. note::
   - Results are given by the command "kubectl -n onap top pods | sort -rn -k 3
     | head -20".
   - CPU*: sum of the top 20 CPU consumption values
   - RAM**: sum of the top 20 RAM consumption values
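
The two summed columns can be derived from the same command output. The sketch
below assumes the usual three-column ``kubectl top pods`` layout (pod name, CPU
in millicores such as ``511m``, memory in mebibytes such as ``2901Mi``) and
ranks CPU and RAM independently. The helper names are hypothetical and the
sample pods and values are made up for illustration.

```python
from typing import List, Tuple


def parse_top_line(line: str) -> Tuple[str, int, int]:
    """Parse 'name 511m 2901Mi' into (name, millicores, MiB)."""
    name, cpu, mem = line.split()
    if not cpu.endswith("m") or not mem.endswith("Mi"):
        raise ValueError(f"unexpected units in: {line!r}")
    return name, int(cpu[:-1]), int(mem[:-2])


def top_n_sums(output: str, n: int = 20) -> Tuple[int, int]:
    """Return (sum of the n largest CPU values, sum of the n largest RAM values)."""
    rows: List[Tuple[str, int, int]] = []
    for line in output.splitlines():
        if not line.strip() or line.startswith("NAME"):
            continue  # skip blank lines and the header row
        rows.append(parse_top_line(line))
    cpu = sorted((r[1] for r in rows), reverse=True)[:n]
    ram = sorted((r[2] for r in rows), reverse=True)[:n]
    return sum(cpu), sum(ram)


def growth_percent(first: int, last: int) -> float:
    """Relative growth between two samples, in percent."""
    return 100.0 * (last - first) / first


# Made-up sample output, not taken from the test run.
sample = """\
NAME        CPU(cores)   MEMORY(bytes)
pod-a       511m         2901Mi
pod-b       120m         1500Mi
pod-c       30m          200Mi
"""
print(top_n_sums(sample, n=2))  # (631, 4401)

# RAM** values from the first and last rows of the table above.
print(round(growth_percent(36092, 39431), 1))  # 9.3
```

The growth figure covers only the 20 largest pods, so it is not directly
comparable to node-level memory growth.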

CI results
==========

A daily Frankfurt CI chain was created after RC0.

The evolution of the full healthcheck test suite can be described as follows:

|image1|

The full healthcheck test suite verifies the status of each component. It is
composed of 47 tests. The success rate from the 9th to the 28th was never under
95%.
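
With 47 tests in the suite, a success rate of at least 95% corresponds to at
most two failing tests in a run. The arithmetic is sketched below, assuming
that every test counts equally toward the rate and that all 47 tests are
executed in each run.

```python
import math

TOTAL_TESTS = 47
THRESHOLD = 0.95

# Smallest number of passing tests that still reaches the threshold.
min_passes = math.ceil(TOTAL_TESTS * THRESHOLD)
max_failures = TOTAL_TESTS - min_passes

print(min_passes, max_failures)                  # 45 2
print(f"{100 * min_passes / TOTAL_TESTS:.1f}%")  # 95.7%
```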

Four test categories were defined:

- infrastructure healthcheck: tests of the ONAP Kubernetes cluster and Helm
  chart status
- healthcheck tests: verification of the components in the target deployment
  environment
- smoke tests: basic VM tests (including onboarding/distribution/instantiation)
  and automated use cases (pnf-registrate, hvves, 5gbulkpm)
- security tests

The security target (66% for Frankfurt) was reached after RC1. A regression
due to the automation of the hvves use case (which triggered the exposure of a
public HTTP port) was fixed on the 28th of May.

|image2|

Orange Openlab
==============

The Orange Openlab is a community lab targeting ONAP end users. It provides an
ONAP instance and cloud resources to discover ONAP.
A Frankfurt pre-RC0 version was installed at the beginning of May. The usual
gating test suite was run daily, in addition to the traffic generated by the lab
users. VM instantiation has been working well without any reinstallation
over the last **27** days.

Resilience
==========

The resilience test executed in El Alto was not performed in Frankfurt.

.. |image1| image:: files/s3p/daily_frankfurt1.png
      :width: 6.5in

.. |image2| image:: files/s3p/daily_frankfurt2.png
      :width: 6.5in