1 .. This work is licensed under a
2 Creative Commons Attribution 4.0 International License.
The Honolulu release stability has been evaluated by:
- The daily Honolulu CI/CD chain
- Stability tests
- Resiliency tests
The scope of these tests remains limited and does not provide a full set of
KPIs to determine the limits and the dimensioning of the ONAP solution.
22 As usual, a daily CI chain dedicated to the release is created after RC0.
23 A Honolulu chain has been created on the 6th of April 2021.
The daily results can be found on the `LF daily results web site
<https://logs.onap.org/onap-integration/daily/onap_daily_pod4_honolulu/2021-04/>`_.
28 Infrastructure Healthcheck Tests
29 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
These tests deal with the Kubernetes/Helm tests on the ONAP cluster.
The global expected success rate is **75%**.
The onap-k8s and onap-k8s-teardown tests, which provide a snapshot of the onap
namespace in Kubernetes, as well as the onap-helm tests, are expected to be PASS.
The nodeport_check_certs test is expected to fail. Even though tremendous
progress has been made in this area, some certificates (unmaintained, upstream
or integration robot pods) are still not correct due to bad certificate issuers
(invalid Root CA certificate) or extra-long validity periods. Most of the
certificates have been installed using cert-manager and will be easily
renewable.
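
As an illustration, the issuer and validity period flagged by this test can be
checked manually on any exposed NodePort (the node IP and port below are
placeholders)::

  echo | openssl s_client -connect <node_ip>:<node_port> 2>/dev/null \
    | openssl x509 -noout -issuer -dates
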
43 .. image:: files/s3p/honolulu_daily_infrastructure_healthcheck.png
These tests are the traditional robot healthcheck tests plus additional tests
dealing with a single component.
Some tests (basic_onboard, basic_cds) may fail episodically because the startup
of the SDC is sometimes not fully completed.

The same test is run as the first step of the smoke tests and is usually PASS.
The mechanism used to detect that all the components are fully operational
could be improved; timer-based solutions are not robust enough.
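
A more deterministic gate could rely on Kubernetes conditions rather than on
timers. A minimal sketch (deployments only, assuming kubectl points to the
ONAP cluster)::

  # wait until every deployment in the onap namespace is Available
  kubectl wait deployment --all -n onap --for=condition=Available --timeout=30m
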
59 The expectation is **100% OK**.
61 .. image:: files/s3p/honolulu_daily_healthcheck.png
67 These tests are end to end and automated use case tests.
See :ref:`the Integration Test page <integration-tests>` for details.
70 The expectation is **100% OK**.
72 .. figure:: files/s3p/honolulu_daily_smoke.png
An error has been detected on the SDNC, preventing basic_vm_macro from working.
See `SDNC-1529 <https://jira.onap.org/browse/SDNC-1529/>`_ for details.
We may also notice that SO timeouts occurred more frequently than in Guilin.
78 See `SO-3584 <https://jira.onap.org/browse/SO-3584>`_ for details.
These tests deal with security.
See :ref:`the Integration Test page <integration-tests>` for details.
The expectation is **66% OK**. The criterion is met.
It may even be higher, as two failing tests are almost passing:

- The unlimited pod test still fails because of a testing pod (DCAE-tca).
- The nonssl test is FAIL due to so and so-etsi-sol003-adapter, which were
  supposed to be managed with the ingress (not possible for this release) and
  got a waiver in Frankfurt. The pods cds-blueprints-processor-http and aws-web
96 .. figure:: files/s3p/honolulu_daily_security.png
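
For reference, the pods running without resource limits (the trigger of the
unlimited pod test) can be listed with a short jq query; a sketch, checking
main containers only::

  # list onap pods with at least one container declaring no resource limits
  kubectl get pods -n onap -o json | jq -r \
    '.items[] | select(any(.spec.containers[]; .resources.limits == null))
     | .metadata.name'
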
The goal of the resiliency testing was to evaluate the capability of the
Honolulu solution to survive a stop or restart of a Kubernetes controller or
worker node.
106 Controller node resiliency
107 ~~~~~~~~~~~~~~~~~~~~~~~~~~
109 By default the ONAP solution is installed with 3 controllers for high
110 availability. The test for controller resiliency can be described as follows:
112 - Run tests: check that they are PASS
113 - Stop a controller node: check that the node appears in NotReady state
114 - Run tests: check that they are PASS
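
As an illustration, on an OpenStack-hosted lab the stop/check sequence can be
scripted as follows (the VM name matches the node listings below and is an
assumption for your environment)::

  # stop one controller VM, then verify that the node turns NotReady
  openstack server stop control01-onap-honolulu
  kubectl get nodes | grep control01
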
Two tests were performed on the weekly Honolulu lab. No problem was observed on
controller shutdown; the tests were still PASS with a stopped controller node.

More details can be found in `TEST-309 <https://jira.onap.org/browse/TEST-309>`_.
121 Worker node resiliency
122 ~~~~~~~~~~~~~~~~~~~~~~
In the community weekly lab, the ONAP pods are distributed over 12 workers. The
goal of the test was to evaluate the behavior of the pods on a worker restart
(disaster scenario assuming that the node was moved accidentally from Ready to
NotReady state).

The initial conditions of such tests may differ from one run to another, as the
Kubernetes scheduler does not distribute the pods on the same workers from one
installation to the next.
132 The test procedure can be described as follows:
134 - Run tests: check that they are PASS (Healthcheck and basic_vm used)
- Check that all the workers are in Ready state::
138 NAME STATUS ROLES AGE VERSION
139 compute01-onap-honolulu Ready <none> 18h v1.19.9
140 compute02-onap-honolulu Ready <none> 18h v1.19.9
141 compute03-onap-honolulu Ready <none> 18h v1.19.9
142 compute04-onap-honolulu Ready <none> 18h v1.19.9
143 compute05-onap-honolulu Ready <none> 18h v1.19.9
144 compute06-onap-honolulu Ready <none> 18h v1.19.9
145 compute07-onap-honolulu Ready <none> 18h v1.19.9
146 compute08-onap-honolulu Ready <none> 18h v1.19.9
147 compute09-onap-honolulu Ready <none> 18h v1.19.9
148 compute10-onap-honolulu Ready <none> 18h v1.19.9
149 compute11-onap-honolulu Ready <none> 18h v1.19.9
150 compute12-onap-honolulu Ready <none> 18h v1.19.9
151 control01-onap-honolulu Ready master 18h v1.19.9
152 control02-onap-honolulu Ready master 18h v1.19.9
153 control03-onap-honolulu Ready master 18h v1.19.9
- Select a worker and list the impacted pods::
157 $ kubectl get pod -n onap --field-selector spec.nodeName=compute01-onap-honolulu
158 NAME READY STATUS RESTARTS AGE
159 onap-aaf-fs-7b6648db7f-shcn5 1/1 Running 1 22h
160 onap-aaf-oauth-5896545fb7-x6grg 1/1 Running 1 22h
161 onap-aaf-sms-quorumclient-2 1/1 Running 1 22h
162 onap-aai-modelloader-86d95c994b-87tsh 2/2 Running 2 22h
163 onap-aai-schema-service-75575cb488-7fxs4 2/2 Running 2 22h
164 onap-appc-cdt-58cb4766b6-vl78q 1/1 Running 1 22h
165 onap-appc-db-0 2/2 Running 4 22h
166 onap-appc-dgbuilder-5bb94d46bd-h2gbs 1/1 Running 1 22h
167 onap-awx-0 4/4 Running 4 22h
168 onap-cassandra-1 1/1 Running 1 22h
169 onap-cds-blueprints-processor-76f8b9b5c7-hb5bg 1/1 Running 1 22h
170 onap-dmaap-dr-db-1 2/2 Running 5 22h
171 onap-ejbca-6cbdb7d6dd-hmw6z 1/1 Running 1 22h
172 onap-kube2msb-858f46f95c-jws4m 1/1 Running 1 22h
173 onap-message-router-0 1/1 Running 1 22h
174 onap-message-router-kafka-0 1/1 Running 1 22h
175 onap-message-router-kafka-1 1/1 Running 1 22h
176 onap-message-router-kafka-2 1/1 Running 1 22h
177 onap-message-router-zookeeper-0 1/1 Running 1 22h
178 onap-multicloud-794c6dffc8-bfwr8 2/2 Running 2 22h
179 onap-multicloud-starlingx-58f6b86c55-mff89 3/3 Running 3 22h
180 onap-multicloud-vio-584d556876-87lxn 2/2 Running 2 22h
181 onap-music-cassandra-0 1/1 Running 1 22h
182 onap-netbox-nginx-8667d6675d-vszhb 1/1 Running 2 22h
183 onap-policy-api-6dbf8485d7-k7cpv 1/1 Running 1 22h
184 onap-policy-clamp-be-6d77597477-4mffk 1/1 Running 1 22h
185 onap-policy-pap-785bd79759-xxhvx 1/1 Running 1 22h
186 onap-policy-xacml-pdp-7d8fd58d59-d4m7g 1/1 Running 6 22h
187 onap-sdc-be-5f99c6c644-dcdz8 2/2 Running 2 22h
188 onap-sdc-fe-7577d58fb5-kwxpj 2/2 Running 2 22h
189 onap-sdc-wfd-fe-6997567759-gl9g6 2/2 Running 2 22h
190 onap-sdnc-dgbuilder-564d6475fd-xwwrz 1/1 Running 1 22h
191 onap-sdnrdb-master-0 1/1 Running 1 22h
192 onap-so-admin-cockpit-6c5b44694-h4d2n 1/1 Running 1 21h
193 onap-so-etsi-sol003-adapter-c9bf4464-pwn97 1/1 Running 1 21h
194 onap-so-sdc-controller-6899b98b8b-hfgvc 2/2 Running 2 21h
195 onap-vfc-mariadb-1 2/2 Running 4 21h
196 onap-vfc-nslcm-6c67677546-xcvl2 2/2 Running 2 21h
197 onap-vfc-vnflcm-78ff4d8778-sgtv6 2/2 Running 2 21h
198 onap-vfc-vnfres-6c96f9ff5b-swq5z 2/2 Running 2 21h
- Stop the worker (shut down the machine for bare metal, or the VM if your
  Kubernetes runs on top of an OpenStack solution)
- Wait for the completion of the pod eviction procedure (5 minutes); the
  rescheduling can be followed live as shown in the sketch after this procedure
- Check that the stopped worker appears in NotReady state::
205 NAME STATUS ROLES AGE VERSION
206 compute01-onap-honolulu NotReady <none> 18h v1.19.9
207 compute02-onap-honolulu Ready <none> 18h v1.19.9
208 compute03-onap-honolulu Ready <none> 18h v1.19.9
209 compute04-onap-honolulu Ready <none> 18h v1.19.9
210 compute05-onap-honolulu Ready <none> 18h v1.19.9
211 compute06-onap-honolulu Ready <none> 18h v1.19.9
212 compute07-onap-honolulu Ready <none> 18h v1.19.9
213 compute08-onap-honolulu Ready <none> 18h v1.19.9
214 compute09-onap-honolulu Ready <none> 18h v1.19.9
215 compute10-onap-honolulu Ready <none> 18h v1.19.9
216 compute11-onap-honolulu Ready <none> 18h v1.19.9
217 compute12-onap-honolulu Ready <none> 18h v1.19.9
218 control01-onap-honolulu Ready master 18h v1.19.9
219 control02-onap-honolulu Ready master 18h v1.19.9
220 control03-onap-honolulu Ready master 18h v1.19.9
222 - Run the tests: check that they are PASS
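
As referenced in the procedure above, the eviction and rescheduling can be
followed live; a sketch reusing compute01-onap-honolulu, the worker selected in
the example::

  # stop the worker (OpenStack-hosted example), then watch the impacted pods
  openstack server stop compute01-onap-honolulu
  kubectl get pods -n onap -o wide --watch | grep compute01-onap-honolulu
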
In these conditions, **the tests will never be PASS**: several components
remain stuck in Init state.
A procedure is required to ensure a clean restart.
List the non-running pods::
231 $ kubectl get pods -n onap --field-selector status.phase!=Running | grep -v Completed
232 NAME READY STATUS RESTARTS AGE
233 onap-appc-dgbuilder-5bb94d46bd-sxmmc 0/1 Init:3/4 15 156m
234 onap-cds-blueprints-processor-76f8b9b5c7-m7nmb 0/1 Init:1/3 0 156m
235 onap-portal-app-595bd6cd95-bkswr 0/2 Init:0/4 84 23h
236 onap-portal-db-config-6s75n 0/2 Error 0 23h
237 onap-portal-db-config-7trzx 0/2 Error 0 23h
238 onap-portal-db-config-jt2jl 0/2 Error 0 23h
239 onap-portal-db-config-mjr5q 0/2 Error 0 23h
240 onap-portal-db-config-qxvdt 0/2 Error 0 23h
241 onap-portal-db-config-z8c5n 0/2 Error 0 23h
242 onap-sdc-be-5f99c6c644-kplqx 0/2 Init:2/5 14 156
243 onap-vfc-nslcm-6c67677546-86mmj 0/2 Init:0/1 15 156m
244 onap-vfc-vnflcm-78ff4d8778-h968x 0/2 Init:0/1 15 156m
245 onap-vfc-vnfres-6c96f9ff5b-kt9rz 0/2 Init:0/1 15 156m
Some pods are not rescheduled (e.g. onap-awx-0 and onap-cassandra-1 above)
because they are part of a statefulset. List the statefulset objects::
250 $ kubectl get statefulsets.apps -n onap | grep -v "1/1" | grep -v "3/3"
252 onap-aaf-sms-quorumclient 2/3 24h
255 onap-cassandra 2/3 24h
256 onap-dmaap-dr-db 2/3 24h
257 onap-message-router 0/1 24h
258 onap-message-router-kafka 0/3 24h
259 onap-message-router-zookeeper 2/3 24h
260 onap-music-cassandra 2/3 24h
261 onap-sdnrdb-master 2/3 24h
262 onap-vfc-mariadb 2/3 24h
For the pods belonging to a statefulset, a forced deletion is required.
As an example, for the statefulset onap-sdnrdb-master, we must proceed as
follows::
268 $ kubectl get pods -n onap -o wide |grep onap-sdnrdb-master
269 onap-sdnrdb-master-0 1/1 Terminating 1 24h 10.42.3.92 node1
270 onap-sdnrdb-master-1 1/1 Running 1 24h 10.42.1.122 node2
271 onap-sdnrdb-master-2 1/1 Running 1 24h 10.42.2.134 node3
273 $ kubectl delete -n onap pod onap-sdnrdb-master-0 --force
274 warning: Immediate deletion does not wait for confirmation that the running
resource has been terminated. The resource may continue to run on the cluster
indefinitely.
277 pod "onap-sdnrdb-master-0" force deleted
279 $ kubectl get pods |grep onap-sdnrdb-master
280 onap-sdnrdb-master-0 0/1 PodInitializing 0 11s
281 onap-sdnrdb-master-1 1/1 Running 1 24h
282 onap-sdnrdb-master-2 1/1 Running 1 24h
284 $ kubectl get pods |grep onap-sdnrdb-master
285 onap-sdnrdb-master-0 1/1 Running 0 43s
286 onap-sdnrdb-master-1 1/1 Running 1 24h
287 onap-sdnrdb-master-2 1/1 Running 1 24h
Once all the statefulsets are properly restarted, the other components continue
their restart properly.
Once the restart of the pods is completed, the tests are PASS.
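
The per-pod force deletion shown above can be generalized; a possible sketch to
force-delete every onap pod stuck in Terminating state (to be used with care)::

  # STATUS is the 3rd column of kubectl get pods
  kubectl get pods -n onap --no-headers \
    | awk '$3 == "Terminating" {print $1}' \
    | xargs -r -I{} kubectl delete pod -n onap {} --force --grace-period=0
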
K8s node reboot/shutdown reveals some deficiencies in the ONAP components with
regard to their availability, as measured by the healthcheck results. Some pods
may still fail to initialize after a reboot/shutdown (pod rescheduled).
However, the cluster as a whole behaves as expected: pods are rescheduled after
a node shutdown (except the pods belonging to a statefulset, which need to be
deleted forcibly, a normal Kubernetes behavior).
On a rebooted node, provided its downtime does not exceed the eviction timeout,
the pods are restarted once the node is available again.
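
For reference, this eviction timeout is a kube-controller-manager setting
(--pod-eviction-timeout, 5 minutes by default). A sketch to check whether a
custom value is set, assuming a kubeadm-style deployment where the controller
manager runs as a labeled static pod::

  kubectl -n kube-system get pod -l component=kube-controller-manager \
    -o jsonpath='{.items[0].spec.containers[0].command}' \
    | tr ',' '\n' | grep pod-eviction-timeout
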
Please see the `Integration Resiliency page
<https://jira.onap.org/browse/TEST-308>`_ for details.
Three stability tests have been performed in Honolulu:

- SDC stability test
- Simple instantiation test (basic_vm)
- Parallel instantiation test

SDC stability test
~~~~~~~~~~~~~~~~~~
In this test, we consider the basic_onboard automated test and run 5
simultaneous onboarding procedures in parallel for 72h.
The basic_onboard test consists of the following steps:
326 - [SDC] VendorOnboardStep: Onboard vendor in SDC.
327 - [SDC] YamlTemplateVspOnboardStep: Onboard vsp described in YAML file in SDC.
328 - [SDC] YamlTemplateVfOnboardStep: Onboard vf described in YAML file in SDC.
- [SDC] YamlTemplateServiceOnboardStep: Onboard service described in YAML file
  in SDC.
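
A driver for the 5 parallel onboarding loops can be sketched as follows (the
run_tests entry point comes from the xtesting-based integration environment and
is an assumption about the local setup)::

  # launch 5 concurrent basic_onboard loops for 72 hours
  end=$(( $(date +%s) + 72*3600 ))
  for i in 1 2 3 4 5; do
    ( while [ "$(date +%s)" -lt "$end" ]; do run_tests -t basic_onboard; done ) &
  done
  wait
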
The test was initiated on the Honolulu weekly lab on the 19th of April 2021.

As already observed in the daily/weekly/gating chains, race conditions occurred
on some tests (https://jira.onap.org/browse/INT-1918).
The success rate is above 95% for the first 100 model uploads and remains above
80% until more than 500 models have been onboarded.
We may also notice that the test duration increases continuously over time. At
the beginning a test takes about 200s; 24h later the same test takes around
1000s.
Finally, after 36h, the SDC systematically answers with an HTTP 500 error code,
which explains the linear decrease of the success rate.
The following graphs provide a good view of the SDC stability test.
348 .. image:: files/s3p/honolulu_sdc_stability.png
SDC can support up to a few hundred model onboardings.
The onboarding duration increases linearly with the number of onboarded
models.
After a while, the SDC is no longer usable.
No major cluster resource issues have been detected during the test. The
memory consumption is, however, relatively high given the load.
359 .. image:: files/s3p/honolulu_sdc_stability_resources.png
363 Simple stability test
364 ~~~~~~~~~~~~~~~~~~~~~
This test consists in running the basic_vm test continuously for 72h.

We observe the cluster metrics as well as the evolution of the test duration.
370 The test basic_vm is described in :ref:`the Integration Test page <integration-tests>`.
The basic_vm test consists of the following steps:
374 - [SDC] VendorOnboardStep: Onboard vendor in SDC.
375 - [SDC] YamlTemplateVspOnboardStep: Onboard vsp described in YAML file in SDC.
376 - [SDC] YamlTemplateVfOnboardStep: Onboard vf described in YAML file in SDC.
- [SDC] YamlTemplateServiceOnboardStep: Onboard service described in YAML file
  in SDC.
379 - [AAI] RegisterCloudRegionStep: Register cloud region.
380 - [AAI] ComplexCreateStep: Create complex.
381 - [AAI] LinkCloudRegionToComplexStep: Connect cloud region with complex.
382 - [AAI] CustomerCreateStep: Create customer.
- [AAI] CustomerServiceSubscriptionCreateStep: Create customer's service
  subscription.
- [AAI] ConnectServiceSubToCloudRegionStep: Connect service subscription with
  cloud region.
387 - [SO] YamlTemplateServiceAlaCarteInstantiateStep: Instantiate service described
388 in YAML using SO a'la carte method.
389 - [SO] YamlTemplateVnfAlaCarteInstantiateStep: Instantiate vnf described in YAML
390 using SO a'la carte method.
391 - [SO] YamlTemplateVfModuleAlaCarteInstantiateStep: Instantiate VF module
392 described in YAML using SO a'la carte method.
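
A continuous run with duration tracking can be sketched as follows (same
assumption on the run_tests entry point as for the onboarding test)::

  # run basic_vm continuously for 72 hours and record each duration
  end=$(( $(date +%s) + 72*3600 ))
  while [ "$(date +%s)" -lt "$end" ]; do
    start=$(date +%s)
    run_tests -t basic_vm
    echo "$(date -Is) duration=$(( $(date +%s) - start ))s" >> basic_vm_durations.log
  done
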
394 The test has been initiated on the Honolulu weekly lab on the 26th of April 2021.
395 This test has been run after the test described in the next section.
A first error occurred after a few hours (mariadb-galera), then the system
automatically recovered for some hours before a full crash of the
mariadb-galera cluster::
402 debian@control01-onap-honolulu:~$ kubectl get pod -n onap |grep mariadb-galera
403 onap-mariadb-galera-0 1/2 CrashLoopBackOff 625 5d16h
404 onap-mariadb-galera-1 1/2 CrashLoopBackOff 1134 5d16h
405 onap-mariadb-galera-2 1/2 CrashLoopBackOff 407 5d16h
408 It was unfortunately not possible to collect the root cause (logs of the first
409 restart of onap-mariadb-galera-1).
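
For reference, the logs of a previously crashed container instance, when still
available, can be retrieved as follows (the container name is a placeholder, as
the galera pods run several containers)::

  kubectl logs -n onap onap-mariadb-galera-1 --previous -c <container>
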
Community members reported that they had already faced such issues and
suggested deploying a single MariaDB instance instead of using the MariaDB
galera cluster. Moreover, Honolulu introduced some changes to align with the
Camunda (SO) requirements for MariaDB galera.
During the limited valid window, the success rate was about 78% (85% for the
same test in Guilin).
The duration of the test remains very variable, as already reported in Guilin
(https://jira.onap.org/browse/SO-3419). The duration of the same test may vary
from 500s to 2500s, as illustrated in the following graph:
422 .. image:: files/s3p/honolulu_so_stability_1_duration.png
The changes in MariaDB galera seem to have introduced some issues leading to
more unexpected timeouts.
A troubleshooting campaign has been launched to evaluate possible evolutions in
this area.
430 Parallel instantiations stability test
431 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Still based on basic_vm, 5 instantiation attempts are performed simultaneously
on the ONAP solution for 48h.
436 The results can be described as follows:
438 .. image:: files/s3p/honolulu_so_stability_5.png
For this test, we had to restart the SDNC once. The last failures are due to
a certificate infrastructure issue and are independent of ONAP.
No major cluster resource issues have been detected in the cluster metrics.

The metrics of the ONAP cluster have been recorded over the full week of
testing.

.. csv-table:: CPU
   :file: ./files/csv/stability_cluster_metric_cpu.csv
   :widths: 20,20,20,20,20
459 .. image:: files/s3p/honolulu_weekly_cpu.png
462 .. image:: files/s3p/honolulu_weekly_memory.png
465 The Top Ten for CPU consumption is given in the table below:

.. csv-table:: CPU
   :file: ./files/csv/stability_top10_cpu.csv
   :widths: 20,15,15,20,15,15
CPU consumption is negligible and not a dimensioning factor. It shall be
reconsidered for use cases involving extensive computation (loops, optimization
algorithms).
476 The Top Ten for Memory consumption is given in the table below:
.. csv-table:: Memory
   :file: ./files/csv/stability_top10_memory.csv
   :widths: 20,15,15,20,15,15
Unsurprisingly, the Cassandra databases use most of the memory.
486 The Top Ten for Network consumption is given in the table below:
.. csv-table:: Network
   :file: ./files/csv/stability_top10_net.csv
   :widths: 10,15,15,15,15,15,15