# Distributed Analytics Framework

## Pre-requisites
| Required   | Version |
|------------|---------|
| Kubernetes | 1.12.3+ |
| Docker CE  | 18.09+  |
| Helm       | >=2.12.1 and <=2.13.1 |
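A quick way to confirm the installed versions against the table above (a sketch; output formats differ slightly between client versions):
```bash
# Print the client/server versions of the prerequisites
kubectl version --short
docker version --format 'Docker server: {{.Server.Version}}'
helm version --short
```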
## Download Framework
```bash
git clone https://github.com/onap/demo.git
DA_WORKING_DIR=$PWD/demo/vnfs/DAaaS/deploy
```

## Install Rook-Ceph for Persistent Storage
Note: In rare cases the Flex volume path can differ from the default value. values.yaml is configured with the most common flexvolume path. If you run into flexvolume-related errors, refer to https://rook.io/docs/rook/v0.9/flexvolume.html#configuring-the-flexvolume-path to find the appropriate flexvolume path and set it in values.yaml.
```bash
cd $DA_WORKING_DIR/00-init/rook-ceph
helm install -n rook . -f values.yaml --namespace=rook-ceph-system
```
Check the status of the pods in the rook-ceph-system and rook-ceph namespaces. Once all pods are in the Ready state, move on to the next section.

```bash
$ kubectl get pods -n rook-ceph-system
NAME                                 READY   STATUS    RESTARTS   AGE
rook-ceph-agent-9wszf                1/1     Running   0          121s
rook-ceph-agent-xnbt8                1/1     Running   0          121s
rook-ceph-operator-bc77d6d75-ltwww   1/1     Running   0          158s
rook-discover-bvj65                  1/1     Running   0          133s
rook-discover-nbfrp                  1/1     Running   0          133s
```
```bash
$ kubectl -n rook-ceph get pod
NAME                                   READY   STATUS      RESTARTS   AGE
rook-ceph-mgr-a-d9dcf5748-5s9ft        1/1     Running     0          77s
rook-ceph-mon-a-7d8f675889-nw5pl       1/1     Running     0          105s
rook-ceph-mon-b-856fdd5cb9-5h2qk       1/1     Running     0          94s
rook-ceph-mon-c-57545897fc-j576h       1/1     Running     0          85s
rook-ceph-osd-0-7cbbbf749f-j8fsd       1/1     Running     0          25s
rook-ceph-osd-1-7f67f9646d-44p7v       1/1     Running     0          25s
rook-ceph-osd-2-6cd4b776ff-v4d68       1/1     Running     0          25s
rook-ceph-osd-prepare-vx2rz            0/2     Completed   0          60s
rook-ceph-tools-5bd5cdb949-j68kk       1/1     Running     0          53s
```
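Instead of polling manually, you can block until the pods report Ready (a sketch using standard kubectl; adjust the namespaces and timeout as needed):
```bash
# Block until the operator/agent/discover pods are Ready
kubectl wait --for=condition=Ready pod --all -n rook-ceph-system --timeout=300s

# The rook-ceph namespace also contains short-lived "osd-prepare" job pods that end up
# Completed rather than Ready, so a plain --all wait would stall there; watching is simpler:
kubectl get pods -n rook-ceph -w
```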

#### Troubleshooting Rook-Ceph installation

If rook was previously installed on your machine (successfully or not) and you are attempting a fresh installation of the rook operator, you may run into issues. The steps below walk through the cleanup.
* First, check whether any rook CRDs already exist:
```
kubectl get crds | grep rook
```
If this returns results like:
```
otc@otconap7 /var/lib/rook $  kc get crds | grep rook
cephblockpools.ceph.rook.io         2019-07-19T18:19:05Z
cephclusters.ceph.rook.io           2019-07-19T18:19:05Z
cephfilesystems.ceph.rook.io        2019-07-19T18:19:05Z
cephobjectstores.ceph.rook.io       2019-07-19T18:19:05Z
cephobjectstoreusers.ceph.rook.io   2019-07-19T18:19:05Z
volumes.rook.io                     2019-07-19T18:19:05Z
```
then delete these pre-existing rook CRDs by generating a delete manifest with the commands below and applying it:
```
helm template -n rook . -f values.yaml > ~/delete.yaml
kubectl delete -f ~/delete.yaml
```

After this, delete the directory below on all the nodes.
```
sudo rm -rf /var/lib/rook/
```
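If you have SSH access to the nodes, a small loop saves doing this by hand (a sketch; it assumes password-less SSH to each node's InternalIP and sudo rights on the nodes):
```bash
# Hypothetical helper: remove leftover rook state from every node over SSH
for ip in $(kubectl get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}'); do
  ssh "$ip" 'sudo rm -rf /var/lib/rook/'
done
```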
Now attempt the installation again:
```
helm install -n rook . -f values.yaml --namespace=rook-ceph-system
```

## Install Operator package
```bash
cd $DA_WORKING_DIR/operator
helm install -n operator . -f values.yaml --namespace=operator
```
Check the status of the pods in the operator namespace and confirm that the Prometheus operator pods are in the Ready state.
```bash
kubectl get pods -n operator
NAME                                                      READY   STATUS    RESTARTS
m3db-operator-0                                           1/1     Running   0
op-etcd-operator-etcd-backup-operator-6cdc577f7d-ltgsr    1/1     Running   0
op-etcd-operator-etcd-operator-79fd99f8b7-fdc7p           1/1     Running   0
op-etcd-operator-etcd-restore-operator-855f7478bf-r7qxp   1/1     Running   0
op-prometheus-operator-operator-5c9b87965b-wjtw5          1/1     Running   1
op-sparkoperator-6cb4db884c-75rcd                         1/1     Running   0
strimzi-cluster-operator-5bffdd7b85-rlrvj                 1/1     Running   0
```
#### Troubleshooting Operator installation
Sometimes deleting a previously installed Operator package fails to remove all operator pods. To troubleshoot this, follow these steps.

1. Make sure that all other deployments and helm releases are deleted (purged). The Operator package is a baseline package for the applications, so deleting it while the applications are still running may leave the cluster in an unwanted state.

2. Delete all the resources and CRDs associated with the operator package.
```bash
#NOTE: Use the same release name and namespace as in installation of operator package in the previous section
cd $DA_WORKING_DIR/operator
helm template -n operator . -f values.yaml --namespace=operator > ../delete_operator.yaml
cd ../
kubectl delete -f delete_operator.yaml
```
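The CRDs installed by the operators are cluster-scoped and may survive the step above; a sketch of how to spot and remove leftovers (double-check the list before deleting anything):
```bash
# List CRDs that the operator package typically brings in (exact names vary by version)
kubectl get crds | grep -E 'etcd|m3db|monitoring.coreos|sparkoperator|strimzi'

# Then delete a confirmed leftover explicitly, for example:
# kubectl delete crd m3dbclusters.operator.m3db.io
```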
## Install Collection package
Note: collectd.conf is available in the $DA_WORKING_DIR/collection/charts/collectd/resources/config directory. Any valid collectd.conf can be placed here.
```bash
Default (For custom collectd skip this section)
=======
cd $DA_WORKING_DIR/collection
helm install -n cp . -f values.yaml --namespace=edge1

Custom Collectd
===============
1. Build the custom collectd image
2. Set COLLECTD_IMAGE_NAME with the appropriate image_repository:tag
3. Push the image to the docker registry using the command
4. docker push ${COLLECTD_IMAGE_NAME}
5. Edit the values.yaml and change the image repository and tag using
   COLLECTD_IMAGE_NAME appropriately.
6. Place the collectd.conf in
   $DA_WORKING_DIR/collection/charts/collectd/resources/config

7. cd $DA_WORKING_DIR/collection
8. helm install -n cp . -f values.yaml --namespace=edge1
```
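As a concrete example of the custom-collectd flow above (the registry name and tag below are hypothetical placeholders):
```bash
# Hypothetical image name; replace with your own registry and tag
export COLLECTD_IMAGE_NAME=registry.example.com/custom-collectd:1.0

# Build and publish the custom collectd image from your Dockerfile
docker build -t ${COLLECTD_IMAGE_NAME} .
docker push ${COLLECTD_IMAGE_NAME}

# After updating the image repository/tag in values.yaml and placing collectd.conf in
# $DA_WORKING_DIR/collection/charts/collectd/resources/config, install the package:
cd $DA_WORKING_DIR/collection
helm install -n cp . -f values.yaml --namespace=edge1
```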

#### Verify Collection package
* Check if all pods are up in the edge1 namespace
* Check the Prometheus UI by port-forwarding port 9090 (the default port of the Prometheus service); see the port-forward sketch after the listings below
```
$ kubectl get pods -n edge1
NAME                                    READY   STATUS    RESTARTS   AGE
cp-cadvisor-8rk2b                       1/1     Running   0          15s
cp-cadvisor-nsjr6                       1/1     Running   0          15s
cp-collectd-h5krd                       1/1     Running   0          23s
cp-collectd-jc9m2                       1/1     Running   0          23s
cp-prometheus-node-exporter-blc6p       1/1     Running   0          17s
cp-prometheus-node-exporter-qbvdx       1/1     Running   0          17s
prometheus-cp-prometheus-prometheus-0   4/4     Running   1          33s

$ kubectl get svc -n edge1
NAME                            TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)
cadvisor                        NodePort    10.43.53.122   <none>        80:30091/TCP
collectd                        ClusterIP   10.43.222.34   <none>        9103/TCP
cp13-prometheus-node-exporter   ClusterIP   10.43.17.242   <none>        9100/TCP
cp13-prometheus-prometheus      NodePort    10.43.26.155   <none>        9090:30090/TCP
prometheus-operated             ClusterIP   None           <none>        9090/TCP
```
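A minimal way to reach the Prometheus UI from your workstation (a sketch; the service name follows the pattern `<release>-prometheus-prometheus`, shown as cp13-prometheus-prometheus in the listing above):
```bash
# Forward local port 9090 to the Prometheus service in edge1, then open http://localhost:9090
kubectl port-forward -n edge1 svc/cp-prometheus-prometheus 9090:9090
```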

## Install Minio Model repository
* Prerequisite: a dynamic storage provisioner needs to be enabled, either rook-ceph ($DA_WORKING_DIR/00-init) or an alternative provisioner.
```bash
cd $DA_WORKING_DIR/minio

Edit the values.yaml to set the credentials to access the minio UI.
Default values are
accessKey: "onapdaas"
secretKey: "onapsecretdaas"

helm install -n minio . -f values.yaml --namespace=edge1
```
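The Minio UI can be reached the same way (a sketch; it assumes the chart exposes a service named minio on its default port 9000; log in with the accessKey/secretKey set in values.yaml):
```bash
# Forward local port 9000 to the assumed minio service and open http://localhost:9000
kubectl port-forward -n edge1 svc/minio 9000:9000
```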

## Install Messaging platform

We currently support the Strimzi-based Kafka operator.
Navigate to the ```$DA_WORKING_DIR/messaging/charts/strimzi-kafka-operator``` directory.
Use the command below:
```
helm install . -f values.yaml  --name sko --namespace=test
```

NOTE: Make changes in the values.yaml if required.

Once the strimzi operator is ready, you should see a pod like:

```
strimzi-cluster-operator-5cf7648b8c-zgxv7       1/1     Running   0          53m
```

Once this is done, install the kafka package like any other helm chart.
Navigate to the ```$DA_WORKING_DIR/messaging``` directory and use the command:
```
helm install --name kafka-cluster charts/kafka/
```
Once this is done, you should have the following pods up and running.

```
kafka-cluster-entity-operator-b6557fc6c-hlnkm   3/3     Running   0          47m
kafka-cluster-kafka-0                           2/2     Running   0          48m
kafka-cluster-kafka-1                           2/2     Running   0          48m
kafka-cluster-kafka-2                           2/2     Running   0          48m
kafka-cluster-zookeeper-0                       2/2     Running   0          49m
kafka-cluster-zookeeper-1                       2/2     Running   0          49m
kafka-cluster-zookeeper-2                       2/2     Running   0          49m
```

You should have the following services when you do a ```kubectl get svc```

```
kafka-cluster-kafka-bootstrap    ClusterIP   10.XX.YY.ZZ   <none>        9091/TCP,9092/TCP,9093/TCP   53m
kafka-cluster-kafka-brokers      ClusterIP   None          <none>        9091/TCP,9092/TCP,9093/TCP   53m
kafka-cluster-zookeeper-client   ClusterIP   10.XX.YY.ZZ   <none>        2181/TCP                     55m
kafka-cluster-zookeeper-nodes    ClusterIP   None          <none>        2181/TCP,2888/TCP,3888/TCP   55m
```
#### Testing messaging

You can test your kafka brokers by creating a simple producer and consumer.

Producer:
```
kubectl run kafka-producer -ti --image=strimzi/kafka:0.12.2-kafka-2.2.1 --rm=true --restart=Never -- bin/kafka-console-producer.sh --broker-list kafka-cluster-kafka-bootstrap:9092 --topic my-topic
```
Consumer:
```
kubectl run kafka-consumer -ti --image=strimzi/kafka:0.12.2-kafka-2.2.1 --rm=true --restart=Never -- bin/kafka-console-consumer.sh --bootstrap-server kafka-cluster-kafka-bootstrap:9092 --topic my-topic --from-beginning
```
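The console clients above rely on automatic topic creation. If you prefer to declare the topic explicitly, Strimzi manages topics through a KafkaTopic custom resource; a minimal sketch, assuming the cluster name kafka-cluster and that you apply it in the namespace where the kafka chart was installed:
```bash
cat << EOF | kubectl apply -f -
apiVersion: kafka.strimzi.io/v1beta1
kind: KafkaTopic
metadata:
  name: my-topic
  labels:
    strimzi.io/cluster: kafka-cluster   # must match the Kafka cluster name
spec:
  partitions: 3
  replicas: 3
EOF
```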

## Install Training Package

#### Install M3DB (Time series Data lake)
##### Pre-requisites
1.  A Kubernetes cluster with at least 3 nodes
2.  Etcd operator, M3DB operator
3.  Nodes labelled with zone and region (see the commands below)

```bash
## Default region is us-west1, default zone labels are us-west1-a, us-west1-b, us-west1-c
## If this is changed, then isolationGroups in training-core/charts/m3db/values.yaml needs to be updated.
NODES=($(kubectl get nodes --output=jsonpath={.items..metadata.name}))

kubectl label node/${NODES[0]} failure-domain.beta.kubernetes.io/region=us-west1
kubectl label node/${NODES[1]} failure-domain.beta.kubernetes.io/region=us-west1
kubectl label node/${NODES[2]} failure-domain.beta.kubernetes.io/region=us-west1

kubectl label node/${NODES[0]} failure-domain.beta.kubernetes.io/zone=us-west1-a --overwrite=true
kubectl label node/${NODES[1]} failure-domain.beta.kubernetes.io/zone=us-west1-b --overwrite=true
kubectl label node/${NODES[2]} failure-domain.beta.kubernetes.io/zone=us-west1-c --overwrite=true
```
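You can confirm the labels landed on the intended nodes before installing:
```bash
# Show the region/zone labels as columns next to each node
kubectl get nodes -L failure-domain.beta.kubernetes.io/region -L failure-domain.beta.kubernetes.io/zone
```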
```bash
cd $DA_WORKING_DIR/training-core/charts/m3db
helm install -n m3db . -f values.yaml --namespace training
```
```
$ kubectl get pods -n training
NAME                   READY   STATUS    RESTARTS   AGE
m3db-cluster-rep0-0    1/1     Running   0          103s
m3db-cluster-rep1-0    1/1     Running   0          83s
m3db-cluster-rep2-0    1/1     Running   0          62s
m3db-etcd-sjhgl4xfgc   1/1     Running   0          83s
m3db-etcd-lfs96hngz6   1/1     Running   0          67s
m3db-etcd-rmgdkkx4bq   1/1     Running   0          51s
```
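Beyond pod status, you can ask the coordinator for the cluster placement (a sketch; it assumes the coordinator service is named m3coordinator-m3db, the same name used in the remote-write URL below, and listens on port 7201):
```bash
# Forward the coordinator port and fetch the placement; all three replicas should be listed
kubectl port-forward -n training svc/m3coordinator-m3db 7201:7201 &
curl -s http://localhost:7201/api/v1/services/m3db/placement
```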

##### Configure remote write from Prometheus to M3DB
```bash
cd $DA_WORKING_DIR/day2_configs/prometheus/
```
```bash
cat << EOF > add_m3db_remote.yaml
spec:
  remoteWrite:
  - url: "http://m3coordinator-m3db.training.svc.cluster.local:7201/api/v1/prom/remote/write"
    writeRelabelConfigs:
      - targetLabel: metrics_storage
        replacement: m3db_remote
EOF
```
```bash
kubectl patch --namespace=edge1 prometheus cp-prometheus-prometheus -p "$(cat add_m3db_remote.yaml)" --type=merge
```
Verify in the Prometheus GUI that the m3db remote write is enabled.
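From the command line, the loaded Prometheus configuration can also be inspected over its HTTP API (a sketch; it assumes the port-forward to the Prometheus service from the collection section is still active):
```bash
# The returned configuration should include a remote_write entry pointing at the m3coordinator URL
curl -s http://localhost:9090/api/v1/status/config | grep remote_write
```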