# Distributed Analytics Framework

## Pre-requisites
| Required | Version |
|------------|---------|
| Kubernetes | 1.12.3+ |
| Docker CE | 18.09+ |
| Helm       | >=2.12.1 and <=2.13.1 |

## Download Framework
```bash
git clone https://github.com/onap/demo.git
DA_WORKING_DIR=$PWD/demo/vnfs/DAaaS/deploy
```

## Install Rook-Ceph for Persistent Storage
Note: In rare cases the Flex volume path can differ from the default value. values.yaml has the most common flexvolume path configured; if you hit flexvolume-related errors, refer to https://rook.io/docs/rook/v0.9/flexvolume.html#configuring-the-flexvolume-path to find the appropriate flexvolume path and set it in values.yaml.
```bash
cd $DA_WORKING_DIR/00-init/rook-ceph
helm install -n rook . -f values.yaml --namespace=rook-ceph-system
```
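If you hit flexvolume errors, the path can also be overridden at install time; a minimal sketch, assuming the chart exposes the common `agent.flexVolumeDirPath` key (verify against your values.yaml):
```bash
# Key name assumed; confirm it exists in values.yaml before relying on it
helm install -n rook . -f values.yaml --namespace=rook-ceph-system \
  --set agent.flexVolumeDirPath=/var/lib/kubelet/volume-plugins
```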
Check the status of the pods in the rook-ceph-system and rook-ceph namespaces. Once all pods are in the Ready state, move on to the next section.

```bash
$ kubectl get pods -n rook-ceph-system
NAME                                 READY   STATUS    RESTARTS   AGE
rook-ceph-agent-9wszf                1/1     Running   0          121s
rook-ceph-agent-xnbt8                1/1     Running   0          121s
rook-ceph-operator-bc77d6d75-ltwww   1/1     Running   0          158s
rook-discover-bvj65                  1/1     Running   0          133s
rook-discover-nbfrp                  1/1     Running   0          133s
```
```bash
$ kubectl -n rook-ceph get pod
NAME                               READY   STATUS      RESTARTS   AGE
rook-ceph-mgr-a-d9dcf5748-5s9ft    1/1     Running     0          77s
rook-ceph-mon-a-7d8f675889-nw5pl   1/1     Running     0          105s
rook-ceph-mon-b-856fdd5cb9-5h2qk   1/1     Running     0          94s
rook-ceph-osd-1-7f67f9646d-44p7v   1/1     Running     0          25s
rook-ceph-osd-2-6cd4b776ff-v4d68   1/1     Running     0          25s
rook-ceph-osd-prepare-vx2rz        0/2     Completed   0          60s
rook-ceph-tools-5bd5cdb949-j68kk   1/1     Running     0          53s
```

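To follow the pods as they come up without re-running the commands, you can watch each namespace:
```bash
# Watch pod status until everything is Running/Completed (Ctrl-C to stop)
kubectl get pods -n rook-ceph-system -w
kubectl get pods -n rook-ceph -w
```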
#### Troubleshooting Rook-Ceph installation

If rook was previously installed on your machine (successfully or unsuccessfully)
and you are attempting a fresh installation of the rook operator, you may run into
issues. The steps below help you recover.

* First, check whether any rook CRDs already exist:
```bash
kubectl get crds | grep rook
```
If this returns results like:
```bash
otc@otconap7 /var/lib/rook $ kc get crds | grep rook
cephblockpools.ceph.rook.io         2019-07-19T18:19:05Z
cephclusters.ceph.rook.io           2019-07-19T18:19:05Z
cephfilesystems.ceph.rook.io        2019-07-19T18:19:05Z
cephobjectstores.ceph.rook.io       2019-07-19T18:19:05Z
cephobjectstoreusers.ceph.rook.io   2019-07-19T18:19:05Z
volumes.rook.io                     2019-07-19T18:19:05Z
```
then delete the pre-existing rook CRDs by generating a delete manifest with the
commands below and then deleting the resources it lists:
```bash
# Run from the rook chart directory, i.e. $DA_WORKING_DIR/00-init/rook-ceph
helm template -n rook . -f values.yaml > ~/delete.yaml
kubectl delete -f ~/delete.yaml
```

After this, delete the directory below on all the nodes.
```bash
sudo rm -rf /var/lib/rook/
```
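If you have shell access to the nodes, a small loop saves time (a sketch; node names are hypothetical, and passwordless ssh is assumed):
```bash
# Replace node1..node3 with the names from "kubectl get nodes"
for node in node1 node2 node3; do
  ssh "$node" "sudo rm -rf /var/lib/rook/"
done
```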
Now attempt the installation again:
```bash
helm install -n rook . -f values.yaml --namespace=rook-ceph-system
```

## Install Operator package
```bash
cd $DA_WORKING_DIR/operator
helm install -n operator . -f values.yaml --namespace=operator
```

```bash
kubectl get pods -n operator
NAME                                                      READY   STATUS    RESTARTS
m3db-operator-0                                           1/1     Running   0
op-etcd-operator-etcd-backup-operator-6cdc577f7d-ltgsr    1/1     Running   0
op-etcd-operator-etcd-operator-79fd99f8b7-fdc7p           1/1     Running   0
op-etcd-operator-etcd-restore-operator-855f7478bf-r7qxp   1/1     Running   0
op-prometheus-operator-operator-5c9b87965b-wjtw5          1/1     Running   1
op-sparkoperator-6cb4db884c-75rcd                         1/1     Running   0
strimzi-cluster-operator-5bffdd7b85-rlrvj                 1/1     Running   0
```
#### Troubleshooting Operator installation

Sometimes deleting a previously installed Operator package fails to remove all the
operator pods. To troubleshoot this, follow the steps below.

1. Make sure that all other deployments or helm releases are deleted (purged). The
   Operator package is a baseline package for the applications, so deleting it while
   the applications are still running may leave the cluster in an unwanted state.
2. Delete all the resources and CRDs associated with the operator package.
```bash
#NOTE: Use the same release name and namespace as in the installation of the operator package in the previous section
cd $DA_WORKING_DIR/operator
helm template -n operator . -f values.yaml --namespace=operator > ../delete_operator.yaml
cd ../
kubectl delete -f delete_operator.yaml
```
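Afterwards, a quick check confirms the cleanup before you reinstall:
```bash
# Both commands should come back empty once the cleanup is complete
kubectl get pods -n operator
helm ls --all operator
```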
## Install Collection package
Note: a default collectd.conf is available in the $DA_WORKING_DIR/collection/charts/collectd/resources/config directory. Any valid collectd.conf can be placed here.
```bash
Default (to use the default collectd, run only steps 6-7 below)

Custom Collectd
===============
1. Build the custom collectd image.
2. Set COLLECTD_IMAGE_NAME with the appropriate image_repository:tag.
3. Push the image to the docker registry:
   docker push ${COLLECTD_IMAGE_NAME}
4. Edit values.yaml and change the image repository and tag using
   COLLECTD_IMAGE_NAME appropriately.
5. Place the collectd.conf in
   $DA_WORKING_DIR/collection/charts/collectd/resources/config
6. cd $DA_WORKING_DIR/collection
7. helm install -n cp . -f values.yaml --namespace=edge1
```
#### Verify Collection package
* Check that all pods are up in the edge1 namespace.
* Check the prometheus UI by port-forwarding port 9090 (the default port of the prometheus service); see the sketch after the output below.
```bash
$ kubectl get pods -n edge1
NAME                                    READY   STATUS    RESTARTS   AGE
cp-cadvisor-8rk2b                       1/1     Running   0          15s
cp-cadvisor-nsjr6                       1/1     Running   0          15s
cp-collectd-h5krd                       1/1     Running   0          23s
cp-collectd-jc9m2                       1/1     Running   0          23s
cp-prometheus-node-exporter-blc6p       1/1     Running   0          17s
cp-prometheus-node-exporter-qbvdx       1/1     Running   0          17s
prometheus-cp-prometheus-prometheus-0   4/4     Running   1          33s

$ kubectl get svc -n edge1
NAME                            TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)
cadvisor                        NodePort    10.43.53.122   <none>        80:30091/TCP
collectd                        ClusterIP   10.43.222.34   <none>        9103/TCP
cp13-prometheus-node-exporter   ClusterIP   10.43.17.242   <none>        9100/TCP
cp13-prometheus-prometheus      NodePort    10.43.26.155   <none>        9090:30090/TCP
prometheus-operated             ClusterIP   None           <none>        9090/TCP
```
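For example, assuming the release name `cp` used above (adjust the service name to whatever `kubectl get svc -n edge1` shows, e.g. `cp13-prometheus-prometheus` in the listing above):
```bash
# Forward local port 9090 to the prometheus service, then open http://localhost:9090
kubectl port-forward -n edge1 svc/cp-prometheus-prometheus 9090:9090
```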

## Install Minio Model repository
* Prerequisite: a dynamic storage provisioner needs to be enabled, either rook-ceph ($DA_WORKING_DIR/00-init) or an alternative provisioner.
```bash
cd $DA_WORKING_DIR/minio

# Edit values.yaml to set the credentials used to access the minio UI.
# Default values are:
#   accessKey: "onapdaas"
#   secretKey: "onapsecretdaas"

helm install -n minio . -f values.yaml --namespace=edge1
```
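To reach the minio UI from your workstation (a sketch; the service name and port depend on the chart, verify with `kubectl get svc -n edge1`):
```bash
# Minio listens on 9000 by default; log in with the accessKey/secretKey from values.yaml
kubectl port-forward -n edge1 svc/minio 9000:9000
```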

## Install Messaging platform

We currently support the strimzi-based kafka operator.
Navigate to the `$DA_WORKING_DIR/messaging/charts/strimzi-kafka-operator` directory.
Use the command below:
```bash
helm install . -f values.yaml --name sko --namespace=test
```

NOTE: Make changes in values.yaml if required.

Once the strimzi operator is ready, you will see a pod like:

```bash
strimzi-cluster-operator-5cf7648b8c-zgxv7   1/1   Running   0   53m
```

Once this is done, install the kafka package like any other helm chart.
Navigate to the `$DA_WORKING_DIR/messaging` directory and use the command:
```bash
helm install --name kafka-cluster charts/kafka/
```

Once this is done, you should have the following pods up and running:

```bash
kafka-cluster-entity-operator-b6557fc6c-hlnkm   3/3   Running   0   47m
kafka-cluster-kafka-0                           2/2   Running   0   48m
kafka-cluster-kafka-1                           2/2   Running   0   48m
kafka-cluster-kafka-2                           2/2   Running   0   48m
kafka-cluster-zookeeper-0                       2/2   Running   0   49m
kafka-cluster-zookeeper-1                       2/2   Running   0   49m
kafka-cluster-zookeeper-2                       2/2   Running   0   49m
```

You should see the following services when you do a `kubectl get svc`:

```bash
kafka-cluster-kafka-bootstrap    ClusterIP   10.XX.YY.ZZ   <none>   9091/TCP,9092/TCP,9093/TCP   53m
kafka-cluster-kafka-brokers      ClusterIP   None          <none>   9091/TCP,9092/TCP,9093/TCP   53m
kafka-cluster-zookeeper-client   ClusterIP   10.XX.YY.ZZ   <none>   2181/TCP                     55m
kafka-cluster-zookeeper-nodes    ClusterIP   None          <none>   2181/TCP,2888/TCP,3888/TCP   55m
```
#### Testing messaging

You can test your kafka brokers by creating a simple producer and consumer.

Producer:
```bash
kubectl run kafka-producer -ti --image=strimzi/kafka:0.12.2-kafka-2.2.1 --rm=true --restart=Never -- bin/kafka-console-producer.sh --broker-list kafka-cluster-kafka-bootstrap:9092 --topic my-topic
```
Consumer:
```bash
kubectl run kafka-consumer -ti --image=strimzi/kafka:0.12.2-kafka-2.2.1 --rm=true --restart=Never -- bin/kafka-console-consumer.sh --bootstrap-server kafka-cluster-kafka-bootstrap:9092 --topic my-topic --from-beginning
```
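The console producer auto-creates `my-topic` on most setups; if you prefer to create it explicitly, strimzi's topic operator accepts a KafkaTopic resource (a sketch; apiVersion assumed for strimzi 0.12, verify against your release):
```bash
# Apply in the namespace where the kafka cluster runs
kubectl apply -f - <<EOF
apiVersion: kafka.strimzi.io/v1beta1
kind: KafkaTopic
metadata:
  name: my-topic
  labels:
    strimzi.io/cluster: kafka-cluster
spec:
  partitions: 3
  replicas: 3
EOF
```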

## Install Training Package

#### Install M3DB (Time series Data lake)
##### Pre-requisites
1. A kubernetes cluster with at least 3 nodes
2. Etcd operator, M3DB operator
3. Nodes labelled with zone and region (see the labeling commands below)

```bash
## Default region is us-west1; default zone labels are us-west1-a, us-west1-b, us-west1-c.
## If this is changed, isolationGroups in training-core/charts/m3db/values.yaml needs to be updated.
NODES=($(kubectl get nodes --output=jsonpath={.items..metadata.name}))

kubectl label node/${NODES[0]} failure-domain.beta.kubernetes.io/region=us-west1
kubectl label node/${NODES[1]} failure-domain.beta.kubernetes.io/region=us-west1
kubectl label node/${NODES[2]} failure-domain.beta.kubernetes.io/region=us-west1

kubectl label node/${NODES[0]} failure-domain.beta.kubernetes.io/zone=us-west1-a --overwrite=true
kubectl label node/${NODES[1]} failure-domain.beta.kubernetes.io/zone=us-west1-b --overwrite=true
kubectl label node/${NODES[2]} failure-domain.beta.kubernetes.io/zone=us-west1-c --overwrite=true
```
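You can confirm the labels before installing m3db:
```bash
# Each node should now report a region and a zone
kubectl get nodes -L failure-domain.beta.kubernetes.io/region -L failure-domain.beta.kubernetes.io/zone
```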
```bash
cd $DA_WORKING_DIR/training-core/charts/m3db
helm install -n m3db . -f values.yaml --namespace training
```
```bash
$ kubectl get pods -n training
NAME                   READY   STATUS    RESTARTS   AGE
m3db-cluster-rep0-0    1/1     Running   0          103s
m3db-cluster-rep1-0    1/1     Running   0          83s
m3db-cluster-rep2-0    1/1     Running   0          62s
m3db-etcd-sjhgl4xfgc   1/1     Running   0          83s
m3db-etcd-lfs96hngz6   1/1     Running   0          67s
m3db-etcd-rmgdkkx4bq   1/1     Running   0          51s
```
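As a quick sanity check, you can port-forward the coordinator and query its health endpoint (the service name is taken from the remote-write URL used below; verify with `kubectl get svc -n training`):
```bash
# Port-forward the m3 coordinator and hit its health endpoint
kubectl port-forward -n training svc/m3coordinator-m3db 7201:7201 &
curl http://localhost:7201/health
```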

##### Configure remote write from Prometheus to M3DB
```bash
cd $DA_WORKING_DIR/day2_configs/prometheus/
```
```bash
cat << EOF > add_m3db_remote.yaml
spec:
  remoteWrite:
  - url: "http://m3coordinator-m3db.training.svc.cluster.local:7201/api/v1/prom/remote/write"
    writeRelabelConfigs:
    - targetLabel: metrics_storage
      replacement: m3db_remote
EOF
```
```bash
kubectl patch --namespace=edge1 prometheus cp-prometheus-prometheus -p "$(cat add_m3db_remote.yaml)" --type=merge
```

Verify in the prometheus UI that the m3db remote write is enabled.
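You can also confirm the patch landed on the CR itself:
```bash
# Should print the remoteWrite block added by the patch above
kubectl get prometheus cp-prometheus-prometheus -n edge1 -o jsonpath='{.spec.remoteWrite}'
```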