# Distributed Analytics Framework


## Pre-requisites
| Required   | Version |
|------------|---------|
| Kubernetes | 1.12.3+ |
| Docker CE  | 18.09+  |
| Helm       | >=2.12.1 and <=2.13.1 |
## Download Framework
```bash
git clone https://github.com/onap/demo.git
DA_WORKING_DIR=$PWD/demo/vnfs/DAaaS/deploy
```

## Install Rook-Ceph for Persistent Storage
Note: In rare cases the Flex volume path can differ from the default value. values.yaml has the most common flexvolume path configured. In case of errors related to flexvolume, refer to https://rook.io/docs/rook/v0.9/flexvolume.html#configuring-the-flexvolume-path to find the appropriate flexvolume path and set it in values.yaml.
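If you need to check which flexvolume directory your kubelet is actually using, a quick probe like the one below can help; treat it only as a hint, since the flag may be absent when the kubelet uses its default path:
```bash
# Prints the --volume-plugin-dir flag if the kubelet was started with one.
ps -ef | grep kubelet | grep -o -- '--volume-plugin-dir=[^ ]*'
```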
```bash
cd $DA_WORKING_DIR/00-init/rook-ceph
helm install -n rook . -f values.yaml --namespace=rook-ceph-system
```
Check the status of the pods in the rook-ceph-system and rook-ceph namespaces. Once all pods are in the Ready state, move on to the next section.

```bash
$ kubectl get pods -n rook-ceph-system
NAME                                 READY   STATUS    RESTARTS   AGE
rook-ceph-agent-9wszf                1/1     Running   0          121s
rook-ceph-agent-xnbt8                1/1     Running   0          121s
rook-ceph-operator-bc77d6d75-ltwww   1/1     Running   0          158s
rook-discover-bvj65                  1/1     Running   0          133s
rook-discover-nbfrp                  1/1     Running   0          133s
```
```bash
$ kubectl -n rook-ceph get pod
NAME                                   READY   STATUS      RESTARTS   AGE
rook-ceph-mgr-a-d9dcf5748-5s9ft        1/1     Running     0          77s
rook-ceph-mon-a-7d8f675889-nw5pl       1/1     Running     0          105s
rook-ceph-mon-b-856fdd5cb9-5h2qk       1/1     Running     0          94s
rook-ceph-mon-c-57545897fc-j576h       1/1     Running     0          85s
rook-ceph-osd-0-7cbbbf749f-j8fsd       1/1     Running     0          25s
rook-ceph-osd-1-7f67f9646d-44p7v       1/1     Running     0          25s
rook-ceph-osd-2-6cd4b776ff-v4d68       1/1     Running     0          25s
rook-ceph-osd-prepare-vx2rz            0/2     Completed   0          60s
rook-ceph-tools-5bd5cdb949-j68kk       1/1     Running     0          53s
```

#### Troubleshooting Rook-Ceph installation

If rook was previously installed on your machine (successfully or not) and you are
attempting a fresh installation of the rook operator, you may run into issues.
The steps below help you clean up.

* First, check whether any rook CRDs already exist:
```
kubectl get crds | grep rook
```
If this returns results like:
```
otc@otconap7 /var/lib/rook $  kc get crds | grep rook
cephblockpools.ceph.rook.io         2019-07-19T18:19:05Z
cephclusters.ceph.rook.io           2019-07-19T18:19:05Z
cephfilesystems.ceph.rook.io        2019-07-19T18:19:05Z
cephobjectstores.ceph.rook.io       2019-07-19T18:19:05Z
cephobjectstoreusers.ceph.rook.io   2019-07-19T18:19:05Z
volumes.rook.io                     2019-07-19T18:19:05Z
```
then delete these pre-existing rook CRDs by generating a delete
manifest file with the commands below and applying it:
```
helm template -n rook . -f values.yaml > ~/delete.yaml
kubectl delete -f ~/delete.yaml
```

After this, delete the directory below on all nodes.
```
sudo rm -rf /var/lib/rook/
```
Now attempt the installation again:
```
helm install -n rook . -f values.yaml --namespace=rook-ceph-system
```

## Install Operator package
### Build docker images
#### collectd-operator
```bash
cd $DA_WORKING_DIR/../microservices/collectd-operator

## Note: The image tag and repository in the collectd-operator helm charts need to match the IMAGE_NAME
IMAGE_NAME=dcr.cluster.local:32644/collectd-operator:latest
./build/build_image.sh $IMAGE_NAME
```
### Install the Operator Package
```bash
cd $DA_WORKING_DIR/operator
helm install -n operator . -f values.yaml --namespace=operator
```
Check the status of the pods in the operator namespace and verify that the Prometheus operator pods are in the Ready state.
```bash
kubectl get pods -n operator
NAME                                                      READY   STATUS    RESTARTS
m3db-operator-0                                           1/1     Running   0
op-etcd-operator-etcd-backup-operator-6cdc577f7d-ltgsr    1/1     Running   0
op-etcd-operator-etcd-operator-79fd99f8b7-fdc7p           1/1     Running   0
op-etcd-operator-etcd-restore-operator-855f7478bf-r7qxp   1/1     Running   0
op-prometheus-operator-operator-5c9b87965b-wjtw5          1/1     Running   1
op-sparkoperator-6cb4db884c-75rcd                         1/1     Running   0
strimzi-cluster-operator-5bffdd7b85-rlrvj                 1/1     Running   0
```
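To double-check that the operators registered their CRDs, a grep like the one below can be used. The API groups shown are the usual ones for the prometheus, etcd, and m3db operators; the exact CRD list depends on the bundled versions.
```bash
kubectl get crds | grep -E 'monitoring.coreos.com|etcd.database.coreos.com|m3db'
```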
#### Troubleshooting Operator installation
Sometimes deleting a previously installed Operator package fails to remove all operator pods. To troubleshoot this, follow these steps.

1. Make sure that all other deployments and helm releases are deleted (purged); see the sketch after this list. The Operator package is a baseline package for the applications, so deleting it while the applications are still running may leave things in an unwanted state.

2. Delete all the resources and CRDs associated with the operator package.
```bash
#NOTE: Use the same release name and namespace as in installation of operator package in the previous section
cd $DA_WORKING_DIR/operator
helm template -n operator . -f values.yaml --namespace=operator > ../delete_operator.yaml
cd ../
kubectl delete -f delete_operator.yaml
```
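For step 1, a quick sketch using Helm 2 commands to see what is still installed and purge it; the release name below is a placeholder, use your own release names.
```bash
helm ls --all
helm delete --purge <release-name>   # e.g. the collection or minio release
```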
## Install Collection package
Note: collectd.conf is available in the $DA_WORKING_DIR/collection/charts/collectd/resources/config directory. Any valid collectd.conf can be placed here.
```bash
Default (For custom collectd skip this section)
=======
cd $DA_WORKING_DIR/collection
helm install -n cp . -f values.yaml --namespace=edge1

Custom Collectd
===============
1. Build the custom collectd image
2. Set COLLECTD_IMAGE_NAME with the appropriate image_repository:tag
3. Push the image to the docker registry using the command
4. docker push ${COLLECTD_IMAGE_NAME}
5. Edit values.yaml and change the image repository and tag
   using COLLECTD_IMAGE_NAME appropriately.
6. Place the collectd.conf in
   $DA_WORKING_DIR/collection/charts/collectd/resources
7. cd $DA_WORKING_DIR/collection
8. helm install -n cp . -f values.yaml --namespace=edge1
```
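For the build-and-push steps of the Custom Collectd flow above, a minimal sketch; the registry, tag, and Dockerfile location are placeholders to be adjusted for your environment.
```bash
# Placeholders: replace the registry/tag and the path to your custom collectd Dockerfile.
COLLECTD_IMAGE_NAME=<your_registry>/custom-collectd:latest
cd /path/to/your/custom-collectd
docker build -t ${COLLECTD_IMAGE_NAME} .
docker push ${COLLECTD_IMAGE_NAME}
```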

#### Verify Collection package
* Check that all pods are up in the edge1 namespace
* Check the Prometheus UI by port-forwarding port 9090 (the default for the Prometheus service); see the sketch after the output below
```
$ kubectl get pods -n edge1
NAME                                    READY   STATUS    RESTARTS   AGE
cp-cadvisor-8rk2b                       1/1     Running   0          15s
cp-cadvisor-nsjr6                       1/1     Running   0          15s
cp-collectd-h5krd                       1/1     Running   0          23s
cp-collectd-jc9m2                       1/1     Running   0          23s
cp-prometheus-node-exporter-blc6p       1/1     Running   0          17s
cp-prometheus-node-exporter-qbvdx       1/1     Running   0          17s
prometheus-cp-prometheus-prometheus-0   4/4     Running   1          33s

$ kubectl get svc -n edge1
NAME                            TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)
cadvisor                        NodePort    10.43.53.122   <none>        80:30091/TCP
collectd                        ClusterIP   10.43.222.34   <none>        9103/TCP
cp13-prometheus-node-exporter   ClusterIP   10.43.17.242   <none>        9100/TCP
cp13-prometheus-prometheus      NodePort    10.43.26.155   <none>        9090:30090/TCP
prometheus-operated             ClusterIP   None           <none>        9090/TCP
```
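To reach the Prometheus UI without the NodePort, a port-forward such as the one below works; the service name is taken from the `kubectl get svc` output above.
```bash
kubectl port-forward -n edge1 svc/prometheus-operated 9090:9090
# Then open http://localhost:9090 in a browser.
```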
#### Configure Collectd Plugins
1. Configure the CollectdGlobal CR using the sample [collectdglobal.yaml](microservices/collectd-operator/examples/collectd/collectdglobal.yaml).
2. If there are additional types.db files to update, copy them to the resources folder.
3. Create a ConfigMap to load the types.db files and update the CR with the name of the ConfigMap created.
4. Create and configure the required CollectdPlugin CRs. Use these samples as a reference: [cpu_collectdplugin_cr.yaml](microservices/collectd-operator/examples/collectd/cpu_collectdplugin_cr.yaml), [prometheus_collectdplugin_cr.yaml](microservices/collectd-operator/examples/collectd/prometheus_collectdplugin_cr.yaml).
5. Use the same namespace where the collection package was installed.
6. Assuming it is edge1, create the applicable config resources by applying the following commands in the same order.
```bash
# Note:
## 1. Creating the ConfigMap is optional and required only if additional types.db files need to be mounted.
## 2. Add/Remove --from-file accordingly. Use the correct file name based on the context.
kubectl create configmap typesdb-configmap --from-file ./resources/[FILE_NAME1] --from-file ./resources/[FILE_NAME2] -n edge1
kubectl create -f collectdglobal.yaml -n edge1
kubectl create -f [PLUGIN_NAME1]_collectdplugin_cr.yaml -n edge1
kubectl create -f [PLUGIN_NAME2]_collectdplugin_cr.yaml -n edge1
kubectl create -f [PLUGIN_NAME3]_collectdplugin_cr.yaml -n edge1
...
```
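To confirm the custom resources were created, a check like the one below can be used; it assumes the CRD plural names follow the usual lower-case convention (collectdglobals, collectdplugins).
```bash
# Assumes the standard lower-case plural resource names for the collectd-operator CRDs.
kubectl get collectdglobals,collectdplugins -n edge1
```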

## Install Minio Model repository
* Prerequisite: A dynamic storage provisioner needs to be enabled. Either rook-ceph ($DA_WORKING_DIR/00-init) or an alternative provisioner needs to be enabled.
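To confirm a dynamic provisioner is available, check that a StorageClass exists; the class name depends on the provisioner you installed.
```bash
kubectl get storageclass
```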
```bash
cd $DA_WORKING_DIR/minio

Edit the values.yaml to set the credentials to access the minio UI.
Default values are
accessKey: "onapdaas"
secretKey: "onapsecretdaas"

helm install -n minio . -f values.yaml --namespace=edge1
```
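To verify the install, a quick check that the minio pod is running and its persistent volume claim is bound; this assumes the resource names contain the release name (minio).
```bash
kubectl get pods,pvc -n edge1 | grep minio
```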

## Install Messaging platform

We currently support the strimzi-based kafka operator.
Navigate to the ```$DA_WORKING_DIR/messaging/charts/strimzi-kafka-operator``` directory.
Use the command below:
```
helm install . -f values.yaml  --name sko --namespace=test
```

NOTE: Make changes to values.yaml if required.

Once the strimzi operator is ready, you should see a pod like:

```
strimzi-cluster-operator-5cf7648b8c-zgxv7       1/1     Running   0          53m
```

Once this is done, install the kafka package like any other helm chart.
Navigate to the ```$DA_WORKING_DIR/messaging``` directory and use the command:
```
helm install --name kafka-cluster charts/kafka/
```

Once this is done, you should have the following pods up and running.

```
kafka-cluster-entity-operator-b6557fc6c-hlnkm   3/3     Running   0          47m
kafka-cluster-kafka-0                           2/2     Running   0          48m
kafka-cluster-kafka-1                           2/2     Running   0          48m
kafka-cluster-kafka-2                           2/2     Running   0          48m
kafka-cluster-zookeeper-0                       2/2     Running   0          49m
kafka-cluster-zookeeper-1                       2/2     Running   0          49m
kafka-cluster-zookeeper-2                       2/2     Running   0          49m
```

You should see the following services when you do a ```kubectl get svc```:

```
kafka-cluster-kafka-bootstrap    ClusterIP   10.XX.YY.ZZ    <none>        9091/TCP,9092/TCP,9093/TCP   53m
kafka-cluster-kafka-brokers      ClusterIP   None           <none>        9091/TCP,9092/TCP,9093/TCP   53m
kafka-cluster-zookeeper-client   ClusterIP   10.XX.YY.ZZ    <none>        2181/TCP                     55m
kafka-cluster-zookeeper-nodes    ClusterIP   None           <none>        2181/TCP,2888/TCP,3888/TCP   55m
```
#### Testing messaging

You can test your kafka brokers by creating a simple producer and consumer.

Producer:
```
kubectl run kafka-producer -ti --image=strimzi/kafka:0.12.2-kafka-2.2.1 --rm=true --restart=Never -- bin/kafka-console-producer.sh --broker-list kafka-cluster-kafka-bootstrap:9092 --topic my-topic
```
Consumer:
```
kubectl run kafka-consumer -ti --image=strimzi/kafka:0.12.2-kafka-2.2.1 --rm=true --restart=Never -- bin/kafka-console-consumer.sh --bootstrap-server kafka-cluster-kafka-bootstrap:9092 --topic my-topic --from-beginning
```

## Install Training Package

#### Install M3DB (Time series Data lake)
##### Pre-requisites
1.  Kubernetes cluster with at least 3 nodes
2.  Etcd operator, M3DB operator
3.  Nodes labelled with zone and region (see the commands below).

```bash
## Default region is us-west1. Default zone labels are us-west1-a, us-west1-b, us-west1-c.
## If this is changed, then isolationGroups in training-core/charts/m3db/values.yaml needs to be updated.
NODES=($(kubectl get nodes --output=jsonpath={.items..metadata.name}))

kubectl label node/${NODES[0]} failure-domain.beta.kubernetes.io/region=us-west1
kubectl label node/${NODES[1]} failure-domain.beta.kubernetes.io/region=us-west1
kubectl label node/${NODES[2]} failure-domain.beta.kubernetes.io/region=us-west1

kubectl label node/${NODES[0]} failure-domain.beta.kubernetes.io/zone=us-west1-a --overwrite=true
kubectl label node/${NODES[1]} failure-domain.beta.kubernetes.io/zone=us-west1-b --overwrite=true
kubectl label node/${NODES[2]} failure-domain.beta.kubernetes.io/zone=us-west1-c --overwrite=true
```
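To confirm the labels were applied:
```bash
kubectl get nodes -L failure-domain.beta.kubernetes.io/region -L failure-domain.beta.kubernetes.io/zone
```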
```bash
cd $DA_WORKING_DIR/training-core/charts/m3db
helm install -n m3db . -f values.yaml --namespace training
```
```
$ kubectl get pods -n training
NAME                   READY   STATUS    RESTARTS   AGE
m3db-cluster-rep0-0    1/1     Running   0          103s
m3db-cluster-rep1-0    1/1     Running   0          83s
m3db-cluster-rep2-0    1/1     Running   0          62s
m3db-etcd-sjhgl4xfgc   1/1     Running   0          83s
m3db-etcd-lfs96hngz6   1/1     Running   0          67s
m3db-etcd-rmgdkkx4bq   1/1     Running   0          51s
```

##### Configure remote write from Prometheus to M3DB
```bash
cd $DA_WORKING_DIR/day2_configs/prometheus/
```
```yaml
cat << EOF > add_m3db_remote.yaml
spec:
  remoteWrite:
  - url: "http://m3coordinator-m3db.training.svc.cluster.local:7201/api/v1/prom/remote/write"
    writeRelabelConfigs:
      - targetLabel: metrics_storage
        replacement: m3db_remote
EOF
```
```bash
kubectl patch --namespace=edge1 prometheus cp-prometheus-prometheus -p "$(cat add_m3db_remote.yaml)" --type=merge
```
Verify in the Prometheus GUI that the m3db remote write is enabled.
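Alternatively, the patch can be verified directly on the Prometheus custom resource; this should print the remoteWrite entry added above.
```bash
kubectl get prometheus cp-prometheus-prometheus -n edge1 -o jsonpath='{.spec.remoteWrite}'
```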