# Distributed Analytics Framework

| Component  | Version |
|------------|---------|
| Kubernetes | 1.12.3+ |
| Helm       | >=2.12.1 and <=2.13.1 |

```bash
git clone https://github.com/onap/demo.git
DA_WORKING_DIR=$PWD/demo/vnfs/DAaaS/deploy
```

## Install Istio Service Mesh

Istio is installed in two steps: first the Istio operator, then the Istio instance (configuration).

## Download the Istio Installation repo

```bash
cd $DA_WORKING_DIR/00-init
helm install --name=istio-operator --namespace=istio-system istio-operator
helm install istio-instance --name istio --namespace istio-system
```
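
A quick sanity check after these two installs (a convenience check, not part of the original steps) is to confirm the Istio pods come up:

```bash
# All istio-system pods should eventually report Running/Completed
kubectl get pods -n istio-system
```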

## Install Metallb to act as a Loadbalancer

```bash
cd $DA_WORKING_DIR/00-init

# NOTE: Update the IP address ranges before you install Metallb
helm install --name metallb -f values.yaml metallb
```
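
For orientation, the address-range edit usually boils down to a layer2 address pool in values.yaml. The fragment below is only a sketch in the style of the upstream metallb chart; the key names (`configInline`, `address-pools`) and the IP range are assumptions, so match them to the values.yaml shipped in 00-init:

```yaml
# Hypothetical metallb values.yaml fragment; adjust keys and the range to your chart and network.
configInline:
  address-pools:
  - name: default
    protocol: layer2
    addresses:
    - 10.0.0.240-10.0.0.250   # replace with an unused range in your network
```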

## Install Rook-Ceph for Persistent Storage

Note: This is unusual, but the Flex volume path can differ from the default value. values.yaml has the most common flexvolume path configured. In case of errors related to flexvolume, refer to https://rook.io/docs/rook/v0.9/flexvolume.html#configuring-the-flexvolume-path to find the appropriate flexvolume path and set it in values.yaml.
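
If you do need to override it, the change is typically a one-line values.yaml override. The snippet below is only a sketch: the `agent.flexVolumeDirPath` key is taken from the upstream rook-ceph v0.9 chart and the path is an example, so verify both against the chart in 00-init/rook-ceph:

```yaml
# Hypothetical override; confirm the key name and the correct path for your nodes.
agent:
  flexVolumeDirPath: /var/lib/kubelet/volumeplugins   # common on some Rancher/RKE clusters
```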

```bash
cd $DA_WORKING_DIR/00-init/rook-ceph
helm install -n rook . -f values.yaml --namespace=rook-ceph-system
```

Check the status of the pods in the rook-ceph-system and rook-ceph namespaces. Once all pods are Ready (Running or Completed), move on to the next section.

```
$ kubectl get pods -n rook-ceph-system
NAME                                 READY   STATUS    RESTARTS   AGE
rook-ceph-agent-9wszf                1/1     Running   0          121s
rook-ceph-agent-xnbt8                1/1     Running   0          121s
rook-ceph-operator-bc77d6d75-ltwww   1/1     Running   0          158s
rook-discover-bvj65                  1/1     Running   0          133s
rook-discover-nbfrp                  1/1     Running   0          133s
```

```
$ kubectl -n rook-ceph get pod
NAME                               READY   STATUS      RESTARTS   AGE
rook-ceph-mgr-a-d9dcf5748-5s9ft    1/1     Running     0          77s
rook-ceph-mon-a-7d8f675889-nw5pl   1/1     Running     0          105s
rook-ceph-mon-b-856fdd5cb9-5h2qk   1/1     Running     0          94s
rook-ceph-mon-c-57545897fc-j576h   1/1     Running     0          85s
rook-ceph-osd-0-7cbbbf749f-j8fsd   1/1     Running     0          25s
rook-ceph-osd-1-7f67f9646d-44p7v   1/1     Running     0          25s
rook-ceph-osd-2-6cd4b776ff-v4d68   1/1     Running     0          25s
rook-ceph-osd-prepare-vx2rz        0/2     Completed   0          60s
rook-ceph-tools-5bd5cdb949-j68kk   1/1     Running     0          53s
```

#### Troubleshooting Rook-Ceph installation

If rook was previously installed on your machine (successfully or not) and you are attempting a fresh installation of the rook operator, you may run into issues. The steps below help you clean up.

* First, check whether any rook CRDs already exist:

```bash
kubectl get crds | grep rook
```

If this returns results like:

```
otc@otconap7 /var/lib/rook $ kc get crds | grep rook
cephblockpools.ceph.rook.io         2019-07-19T18:19:05Z
cephclusters.ceph.rook.io           2019-07-19T18:19:05Z
cephfilesystems.ceph.rook.io        2019-07-19T18:19:05Z
cephobjectstores.ceph.rook.io       2019-07-19T18:19:05Z
cephobjectstoreusers.ceph.rook.io   2019-07-19T18:19:05Z
volumes.rook.io                     2019-07-19T18:19:05Z
```

then delete these pre-existing rook CRDs by generating a delete manifest with the commands below and applying it:

```bash
helm template -n rook . -f values.yaml > ~/delete.yaml
kubectl delete -f ~/delete.yaml
```

After this, delete the directory below on all the nodes.

```bash
sudo rm -rf /var/lib/rook/
```
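
If you have SSH access to the nodes, one way to run this on every node in a single pass is a small loop; the node names here are placeholders:

```bash
# Hypothetical helper: run the cleanup on each node over SSH (replace node1..node3 with your hostnames)
for node in node1 node2 node3; do
  ssh "$node" 'sudo rm -rf /var/lib/rook/'
done
```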

Then retry the rook installation:

```bash
helm install -n rook . -f values.yaml --namespace=rook-ceph-system
```

## Install Operator package

### Build docker images
#### collectd-operator

```bash
cd $DA_WORKING_DIR/../microservices

## Note: The image tag and repository in the collectd-operator helm charts need to match the IMAGE_NAME
IMAGE_NAME=dcr.cluster.local:32644/collectd-operator:latest
./build_image.sh collectd-operator $IMAGE_NAME
```

#### visualization-operator

```bash
cd $DA_WORKING_DIR/../microservices

## Note: The image tag and repository in the visualization-operator helm charts need to match the IMAGE_NAME
IMAGE_NAME=dcr.cluster.local:32644/visualization-operator:latest
./build_image.sh visualization-operator $IMAGE_NAME
```

### Install the Operator Package

```bash
cd $DA_WORKING_DIR/operator
helm install -n operator . -f values.yaml --namespace=operator
```

Check the status of the pods in the operator namespace, and confirm that the Prometheus operator pods are in the Ready state.

```
kubectl get pods -n operator
NAME                                                      READY   STATUS    RESTARTS
m3db-operator-0                                           1/1     Running   0
op-etcd-operator-etcd-backup-operator-6cdc577f7d-ltgsr    1/1     Running   0
op-etcd-operator-etcd-operator-79fd99f8b7-fdc7p           1/1     Running   0
op-etcd-operator-etcd-restore-operator-855f7478bf-r7qxp   1/1     Running   0
op-prometheus-operator-operator-5c9b87965b-wjtw5          1/1     Running   1
op-sparkoperator-6cb4db884c-75rcd                         1/1     Running   0
strimzi-cluster-operator-5bffdd7b85-rlrvj                 1/1     Running   0
```

#### Troubleshooting Operator installation

Sometimes deleting the previously installed Operator package fails to remove all operator pods. To troubleshoot this, follow these steps.

1. Make sure that all other deployments and helm releases are deleted (purged). The Operator package is a baseline package for the applications, so deleting it while the applications are still running can leave the cluster in an unwanted state.

2. Delete all the resources and CRDs associated with the operator package.

```bash
#NOTE: Use the same release name and namespace as in the installation of the operator package in the previous section
cd $DA_WORKING_DIR/operator
helm template -n operator . -f values.yaml --namespace=operator > ../delete_operator.yaml
kubectl delete -f ../delete_operator.yaml
```
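
A quick way to confirm the cleanup worked (a convenience check, not part of the original steps) is to look for leftover workloads in the operator namespace:

```bash
# Should come back empty (or "No resources found") once the operator package is fully removed
kubectl get all -n operator
```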

## Install Collection package

Note: Collectd.conf is available in the $DA_WORKING_DIR/collection/charts/collectd/resources/config directory. Any valid collectd.conf can be placed here.

Default (For custom collectd skip this section)

```bash
cd $DA_WORKING_DIR/collection
helm install -n cp . -f values.yaml --namespace=edge1
```

Custom Collectd
===============
1. Build the custom collectd image.
2. Set COLLECTD_IMAGE_NAME with the appropriate image_repository:tag.
3. Push the image to the docker registry: ```docker push ${COLLECTD_IMAGE_NAME}```
4. Edit values.yaml and change the image repository and tag using COLLECTD_IMAGE_NAME appropriately.
5. Place the collectd.conf in $DA_WORKING_DIR/collection/charts/collectd/resources
6. cd $DA_WORKING_DIR/collection
7. helm install -n cp . -f values.yaml --namespace=edge1 (see the consolidated sketch below)
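
Putting those steps together, a minimal end-to-end sketch of the custom-collectd flow might look like the following; the image name, registry, and collectd.conf filename are placeholders, so adjust them to your environment:

```bash
# Hypothetical consolidated sketch of steps 1-7 above
COLLECTD_IMAGE_NAME=dcr.cluster.local:32644/custom-collectd:latest

# Steps 1-3: build and push the custom image
docker build -t ${COLLECTD_IMAGE_NAME} .
docker push ${COLLECTD_IMAGE_NAME}

# Step 4: edit values.yaml so the image repository and tag match COLLECTD_IMAGE_NAME
# Step 5: drop in your collectd.conf
cp my-collectd.conf $DA_WORKING_DIR/collection/charts/collectd/resources/

# Steps 6-7: install the collection package
cd $DA_WORKING_DIR/collection
helm install -n cp . -f values.yaml --namespace=edge1
```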

#### Verify Collection package
* Check if all pods are up in the edge1 namespace
* Check the Prometheus UI using port-forwarding on port 9090 (the default for the prometheus service; see the port-forward example below)

```
$ kubectl get pods -n edge1
NAME                                    READY   STATUS    RESTARTS   AGE
cp-cadvisor-8rk2b                       1/1     Running   0          15s
cp-cadvisor-nsjr6                       1/1     Running   0          15s
cp-collectd-h5krd                       1/1     Running   0          23s
cp-collectd-jc9m2                       1/1     Running   0          23s
cp-prometheus-node-exporter-blc6p       1/1     Running   0          17s
cp-prometheus-node-exporter-qbvdx       1/1     Running   0          17s
prometheus-cp-prometheus-prometheus-0   4/4     Running   1          33s
```

```
$ kubectl get svc -n edge1
NAME                            TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)
cadvisor                        NodePort    10.43.53.122   <none>        80:30091/TCP
collectd                        ClusterIP   10.43.222.34   <none>        9103/TCP
cp13-prometheus-node-exporter   ClusterIP   10.43.17.242   <none>        9100/TCP
cp13-prometheus-prometheus      NodePort    10.43.26.155   <none>        9090:30090/TCP
prometheus-operated             ClusterIP   None           <none>        9090/TCP
```
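
To reach the Prometheus UI mentioned above, one option is a port-forward to the prometheus service; the service name below is taken from the sample output (a release named `cp` would produce `cp-prometheus-prometheus` instead), so substitute whatever name your release actually created:

```bash
# Forward local port 9090 to the Prometheus service in the edge1 namespace,
# then browse to http://localhost:9090
kubectl port-forward -n edge1 svc/cp13-prometheus-prometheus 9090:9090
```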

#### Configure Collectd Plugins
1. Using the sample [collectdglobal.yaml](microservices/collectd-operator/examples/collectd/collectdglobal.yaml), configure the CollectdGlobal CR.
2. If there are additional types.db files to update, copy them to the resources folder.
3. Create a ConfigMap to load the types.db files and update the CollectdGlobal CR with the name of the ConfigMap created.
4. Create and configure the required CollectdPlugin CRs. Use these samples as a reference: [cpu_collectdplugin_cr.yaml](microservices/collectd-operator/examples/collectd/cpu_collectdplugin_cr.yaml), [prometheus_collectdplugin_cr.yaml](microservices/collectd-operator/examples/collectd/prometheus_collectdplugin_cr.yaml).
5. Use the same namespace where the collection package was installed.
6. Assuming it is edge1, create the config resources that are applicable. Apply the following commands in the same order.

```bash
## 1. Creating the ConfigMap is optional and only required if additional types.db files need to be mounted.
## 2. Add/remove --from-file flags accordingly. Use the correct file names based on the context.
kubectl create configmap typesdb-configmap -n edge1 --from-file ./resources/[FILE_NAME1] --from-file ./resources/[FILE_NAME2]
kubectl create -n edge1 -f collectdglobal.yaml
kubectl create -n edge1 -f [PLUGIN_NAME1]_collectdplugin_cr.yaml
kubectl create -n edge1 -f [PLUGIN_NAME2]_collectdplugin_cr.yaml
kubectl create -n edge1 -f [PLUGIN_NAME3]_collectdplugin_cr.yaml
```

## Install visualization package

Default (For custom Grafana dashboards skip this section)

```bash
cd $DA_WORKING_DIR/visualization
helm install -n viz . -f values.yaml -f grafana-values.yaml
```

Custom Grafana dashboards
=========================
1. Place the custom dashboard definition in the folder $DA_WORKING_DIR/visualization/charts/grafana/dashboards.
   An example dashboard definition can be found at $DA_WORKING_DIR/visualization/charts/grafana/dashboards/dashboard1.json.
2. Create a configmap.yaml that imports the dashboard.json file created above as config, and copy that configmap.yaml to $DA_WORKING_DIR/visualization/charts/grafana/templates/ (a sketch is shown below).
   An example configmap can be found at $DA_WORKING_DIR/visualization/charts/grafana/templates/configmap-add-dashboard.yaml.
3. Add the custom dashboard configuration to values.yaml or an overriding values.yaml.
   Example configuration can be found in the "dashboardProviders" section of grafana-values.yaml.
4. cd $DA_WORKING_DIR/visualization
5. For a fresh install of the visualization package, do "helm install",
   e.g., helm install -n viz . -f values.yaml -f grafana-values.yaml
   If the custom dashboard is being added to an already running Grafana, do "helm upgrade",
   e.g., helm upgrade -n viz . -f values.yaml -f grafana-values.yaml -f ......
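
For step 2, a minimal sketch of what such a configmap.yaml template could look like is given below. It assumes the dashboard JSON lives in the chart's dashboards/ folder and is pulled in with Helm's .Files helper; the file name my-dashboard.json and the metadata are placeholders, so compare against the shipped configmap-add-dashboard.yaml before relying on it:

```yaml
# Hypothetical Helm template that packages dashboards/my-dashboard.json into a ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ .Release.Name }}-my-dashboard
  labels:
    app: grafana
data:
  my-dashboard.json: |-
{{ .Files.Get "dashboards/my-dashboard.json" | indent 4 }}
```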

#### Verify Visualization package
Check if the visualization pod is up:

```
$ kubectl get pods -n default
NAME                          READY   STATUS    RESTARTS   AGE
viz-grafana-78dcffd75-sxnjv   1/1     Running   0          52m
```

1. Get your 'admin' user password by running:

   ```bash
   kubectl get secret --namespace default viz-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
   ```

2. Get the Grafana URL to visit by running these commands in the same shell:

   ```bash
   export POD_NAME=$(kubectl get pods --namespace default -l "app=grafana,release=viz" -o jsonpath="{.items[0].metadata.name}")
   kubectl --namespace default port-forward $POD_NAME 3000
   ```

3. Visit http://localhost:3000 and log in with the username admin and the password from step 1.

#### Configure Grafana Datasources
Using the sample [prometheus_grafanadatasource_cr.yaml](microservices/visualization-operator/examples/grafana/prometheus_grafanadatasource_cr.yaml), configure the GrafanaDataSource CR and create it with the commands below:

```bash
kubectl create -f [DATASOURCE_NAME1]_grafanadatasource_cr.yaml
kubectl create -f [DATASOURCE_NAME2]_grafanadatasource_cr.yaml
```

## Install Minio Model repository
* Prerequisite: A dynamic storage provisioner needs to be enabled: either rook-ceph ($DA_WORKING_DIR/00-init) or an alternative provisioner.

```bash
cd $DA_WORKING_DIR/minio
```

Edit values.yaml to set the credentials used to access the minio UI:

```yaml
accessKey: "onapdaas"
secretKey: "onapsecretdaas"
```

```bash
helm install -n minio . -f values.yaml --namespace=edge1
```
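
To reach the minio UI afterwards, a port-forward along the lines below can be used; the service name `minio` is an assumption based on the release name above, so confirm it first with `kubectl get svc -n edge1`:

```bash
# Forward local port 9000 to the (assumed) minio service, then browse to
# http://localhost:9000 and log in with the accessKey/secretKey from values.yaml
kubectl port-forward -n edge1 svc/minio 9000:9000
```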

## Install Messaging platform

We currently support the Strimzi-based Kafka operator.
Navigate to the ```$DA_WORKING_DIR/messaging/charts/strimzi-kafka-operator``` directory.
Use the command below:

```bash
helm install . -f values.yaml --name sko --namespace=test
```

NOTE: Make changes in values.yaml if required.

Once the strimzi operator is ready, you should see a pod like:

```
strimzi-cluster-operator-5cf7648b8c-zgxv7   1/1   Running   0   53m
```

Once this is done, install the kafka package like any other helm chart.
Navigate to the ```$DA_WORKING_DIR/messaging``` directory and use the command:

```bash
helm install --name kafka-cluster charts/kafka/
```

Once this is done, you should have the following pods up and running:

```
kafka-cluster-entity-operator-b6557fc6c-hlnkm   3/3   Running   0   47m
kafka-cluster-kafka-0                           2/2   Running   0   48m
kafka-cluster-kafka-1                           2/2   Running   0   48m
kafka-cluster-kafka-2                           2/2   Running   0   48m
kafka-cluster-zookeeper-0                       2/2   Running   0   49m
kafka-cluster-zookeeper-1                       2/2   Running   0   49m
kafka-cluster-zookeeper-2                       2/2   Running   0   49m
```

You should see the following services when you do a ```kubectl get svc```:

```
kafka-cluster-kafka-bootstrap    ClusterIP   10.XX.YY.ZZ   <none>   9091/TCP,9092/TCP,9093/TCP   53m
kafka-cluster-kafka-brokers      ClusterIP   None          <none>   9091/TCP,9092/TCP,9093/TCP   53m
kafka-cluster-zookeeper-client   ClusterIP   10.XX.YY.ZZ   <none>   2181/TCP                     55m
kafka-cluster-zookeeper-nodes    ClusterIP   None          <none>   2181/TCP,2888/TCP,3888/TCP   55m
```

#### Testing messaging

You can test your kafka brokers by creating a simple producer and consumer.

Producer:
```bash
kubectl run kafka-producer -ti --image=strimzi/kafka:0.12.2-kafka-2.2.1 --rm=true --restart=Never -- bin/kafka-console-producer.sh --broker-list kafka-cluster-kafka-bootstrap:9092 --topic my-topic
```

Consumer:
```bash
kubectl run kafka-consumer -ti --image=strimzi/kafka:0.12.2-kafka-2.2.1 --rm=true --restart=Never -- bin/kafka-console-consumer.sh --bootstrap-server kafka-cluster-kafka-bootstrap:9092 --topic my-topic --from-beginning
```

Messages typed into the producer terminal should appear in the consumer terminal.

## Install Training Package

#### Install M3DB (Time series Data lake)

Prerequisites:
1. Kubernetes cluster with at least 3 nodes
2. Etcd operator, M3DB operator
3. Nodes labelled with zone and region (see the labelling commands below)

```bash
## Default region is us-west1; default zones are us-west1-a, us-west1-b, us-west1-c.
## If these are changed, the isolationGroups in training-core/charts/m3db/values.yaml need to be updated.
NODES=($(kubectl get nodes --output=jsonpath={.items..metadata.name}))

kubectl label node/${NODES[0]} failure-domain.beta.kubernetes.io/region=us-west1
kubectl label node/${NODES[1]} failure-domain.beta.kubernetes.io/region=us-west1
kubectl label node/${NODES[2]} failure-domain.beta.kubernetes.io/region=us-west1

kubectl label node/${NODES[0]} failure-domain.beta.kubernetes.io/zone=us-west1-a --overwrite=true
kubectl label node/${NODES[1]} failure-domain.beta.kubernetes.io/zone=us-west1-b --overwrite=true
kubectl label node/${NODES[2]} failure-domain.beta.kubernetes.io/zone=us-west1-c --overwrite=true
```
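
A quick way to confirm the labels landed where you expect (just a convenience check) is to list the region/zone labels per node:

```bash
# Show each node with its region/zone labels
kubectl get nodes -L failure-domain.beta.kubernetes.io/region -L failure-domain.beta.kubernetes.io/zone
```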

```bash
cd $DA_WORKING_DIR/training-core/charts/m3db
helm install -n m3db . -f values.yaml --namespace training
```

```
$ kubectl get pods -n training
NAME                   READY   STATUS    RESTARTS   AGE
m3db-cluster-rep0-0    1/1     Running   0          103s
m3db-cluster-rep1-0    1/1     Running   0          83s
m3db-cluster-rep2-0    1/1     Running   0          62s
m3db-etcd-sjhgl4xfgc   1/1     Running   0          83s
m3db-etcd-lfs96hngz6   1/1     Running   0          67s
m3db-etcd-rmgdkkx4bq   1/1     Running   0          51s
```

##### Configure remote write from Prometheus to M3DB

```bash
cd $DA_WORKING_DIR/day2_configs/prometheus/
```

```bash
cat << EOF > add_m3db_remote.yaml
spec:
  remoteWrite:
  - url: "http://m3coordinator-m3db.training.svc.cluster.local:7201/api/v1/prom/remote/write"
    writeRelabelConfigs:
      - targetLabel: metrics_storage
        replacement: m3db_remote
EOF
```

```bash
kubectl patch --namespace=edge1 prometheus cp-prometheus-prometheus -p "$(cat add_m3db_remote.yaml)" --type=merge
```

Verify in the Prometheus GUI that the m3db remote write is enabled.