-Resiliency tests
-----------------
-
-The goal of the resiliency testing was to evaluate the capability of the
-Istanbul solution to survive a stop or restart of a Kubernetes worker node.
-
-This test has been automated with the
-Litmus chaos framework (https://litmuschaos.io/) and runs in the CI as part of
-the weekly chains.
-
-Two additional tests based on Litmus chaos scenarios have been added but will be
-tuned in Jakarta:
-
-- node CPU hog (temporary increase of CPU load on one Kubernetes node)
-- node memory hog (temporary increase of memory usage on one Kubernetes node)
-
-The main test for Istanbul is the node drain, which corresponds to the resiliency
-scenario previously managed manually.
-
-The system under test is defined in OOM.
-The resources are described in the table below:
-
-.. code-block:: shell
-
- +-------------------------+-------+--------+--------+
- | Name                    | vCPUs | Memory | Disk   |
- +-------------------------+-------+--------+--------+
- | compute12-onap-istanbul | 16    | 24 GB  | 10 GB  |
- | compute11-onap-istanbul | 16    | 24 GB  | 10 GB  |
- | compute10-onap-istanbul | 16    | 24 GB  | 10 GB  |
- | compute09-onap-istanbul | 16    | 24 GB  | 10 GB  |
- | compute08-onap-istanbul | 16    | 24 GB  | 10 GB  |
- | compute07-onap-istanbul | 16    | 24 GB  | 10 GB  |
- | compute06-onap-istanbul | 16    | 24 GB  | 10 GB  |
- | compute05-onap-istanbul | 16    | 24 GB  | 10 GB  |
- | compute04-onap-istanbul | 16    | 24 GB  | 10 GB  |
- | compute03-onap-istanbul | 16    | 24 GB  | 10 GB  |
- | compute02-onap-istanbul | 16    | 24 GB  | 10 GB  |
- | compute01-onap-istanbul | 16    | 24 GB  | 10 GB  |
- | etcd03-onap-istanbul    | 4     | 6 GB   | 10 GB  |
- | etcd02-onap-istanbul    | 4     | 6 GB   | 10 GB  |
- | etcd01-onap-istanbul    | 4     | 6 GB   | 10 GB  |
- | control03-onap-istanbul | 4     | 6 GB   | 10 GB  |
- | control02-onap-istanbul | 4     | 6 GB   | 10 GB  |
- | control01-onap-istanbul | 4     | 6 GB   | 10 GB  |
- +-------------------------+-------+--------+--------+
-
-
-The test sequence can be defined as follows:
-
-- Cordon a compute node (prevent any new scheduling)
-- Launch the node drain chaos scenario: all the pods on the given compute node
-  are evicted
-
-Once all the pods have been evicted:
-
-- Uncordon the compute node
-- Replay a basic_vm test
-
-This test has been successfully executed.
-
-.. image:: files/s3p/istanbul_resiliency.png
- :align: center
-
-.. important::
-
- Please note that the chaos framework selects one compute node (the first one by
- default).
- The distribution of the pods is random: on our target architecture, about 15
- pods are scheduled on each node. The chaos therefore affects only a limited
- number of pods.
-
-For the Istanbul tests, the evicted pods (compute01) were:
-
-
-.. code-block:: shell
-
- NAME                                            READY   STATUS    RESTARTS   AGE
- onap-aaf-service-dbd8fc76b-vnmqv                1/1     Running   0          2d19h
- onap-aai-graphadmin-5799bfc5bb-psfvs            2/2     Running   0          2d19h
- onap-cassandra-1                                1/1     Running   0          2d19h
- onap-dcae-ves-collector-856fcb67bd-lb8sz        2/2     Running   0          2d19h
- onap-dcaemod-distributor-api-85df84df49-zj9zn   1/1     Running   0          2d19h
- onap-msb-consul-86975585d9-8nfs2                1/1     Running   0          2d19h
- onap-multicloud-pike-88bb965f4-v2qc8            2/2     Running   0          2d19h
- onap-netbox-nginx-5b9b57d885-hjv84              1/1     Running   0          2d19h
- onap-portal-app-66d9f54446-sjhld                2/2     Running   0          2d19h
- onap-sdnc-ueb-listener-5b6bb95c68-d24xr         1/1     Running   0          2d19h
- onap-sdnc-web-8f5c9fbcc-2l8sp                   1/1     Running   0          2d19h
- onap-so-779655cb6b-9tzq4                        2/2     Running   1          2d19h
- onap-so-oof-adapter-54b5b99788-x7rlk            2/2     Running   0          2d19h
-
-In the future, it would be worth defining a resiliency testing strategy that
-checks the eviction of all the critical components.
-