# Distributed Analytics Framework

## Pre-requisites
| Required   | Version |
|------------|---------|
| Kubernetes | 1.12.3+ |
| Docker CE  | 18.09+  |
| Helm       | >=2.12.1 and <=2.13.1 |
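A quick way to confirm the installed versions against the table above (a sketch; output formats differ slightly between client versions):
```bash
# Print the client/server versions of the prerequisites
kubectl version --short
docker version --format 'Docker server: {{.Server.Version}}'
helm version --short
```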
## Download Framework
```bash
git clone https://github.com/onap/demo.git
DA_WORKING_DIR=$PWD/demo/vnfs/DAaaS/deploy
```

## Install Rook-Ceph for Persistent Storage
Note: In rare cases the Flex volume path can differ from the default value. values.yaml is configured with the most common flexvolume path. If you run into flexvolume-related errors, refer to https://rook.io/docs/rook/v0.9/flexvolume.html#configuring-the-flexvolume-path to find the appropriate flexvolume path and set it in values.yaml.
```bash
cd $DA_WORKING_DIR/00-init/rook-ceph
helm install -n rook . -f values.yaml --namespace=rook-ceph-system
```
Check the status of the pods in the rook-ceph-system and rook-ceph namespaces. Once all pods are in the Ready state, move on to the next section.

```bash
$ kubectl get pods -n rook-ceph-system
NAME                                 READY   STATUS    RESTARTS   AGE
rook-ceph-agent-9wszf                1/1     Running   0          121s
rook-ceph-agent-xnbt8                1/1     Running   0          121s
rook-ceph-operator-bc77d6d75-ltwww   1/1     Running   0          158s
rook-discover-bvj65                  1/1     Running   0          133s
rook-discover-nbfrp                  1/1     Running   0          133s
```
```bash
$ kubectl -n rook-ceph get pod
NAME                                   READY   STATUS      RESTARTS   AGE
rook-ceph-mgr-a-d9dcf5748-5s9ft        1/1     Running     0          77s
rook-ceph-mon-a-7d8f675889-nw5pl       1/1     Running     0          105s
rook-ceph-mon-b-856fdd5cb9-5h2qk       1/1     Running     0          94s
rook-ceph-mon-c-57545897fc-j576h       1/1     Running     0          85s
rook-ceph-osd-0-7cbbbf749f-j8fsd       1/1     Running     0          25s
rook-ceph-osd-1-7f67f9646d-44p7v       1/1     Running     0          25s
rook-ceph-osd-2-6cd4b776ff-v4d68       1/1     Running     0          25s
rook-ceph-osd-prepare-vx2rz            0/2     Completed   0          60s
rook-ceph-tools-5bd5cdb949-j68kk       1/1     Running     0          53s
```
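Instead of polling manually, you can block until the pods report Ready (a sketch using standard kubectl; adjust the namespaces and timeout as needed):
```bash
# Block until the operator/agent/discover pods are Ready
kubectl wait --for=condition=Ready pod --all -n rook-ceph-system --timeout=300s

# The rook-ceph namespace also contains short-lived "osd-prepare" job pods that end up
# Completed rather than Ready, so a plain --all wait would stall there; watching is simpler:
kubectl get pods -n rook-ceph -w
```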

#### Troubleshooting Rook-Ceph installation

If rook was previously installed on your machine (successfully or not) and you are attempting a fresh installation of the rook operator, you may run into issues. The steps below walk through the cleanup.
* First, check whether any rook CRDs already exist:
```
kubectl get crds | grep rook
```
If this returns results like:
```
otc@otconap7 /var/lib/rook $  kc get crds | grep rook
cephblockpools.ceph.rook.io         2019-07-19T18:19:05Z
cephclusters.ceph.rook.io           2019-07-19T18:19:05Z
cephfilesystems.ceph.rook.io        2019-07-19T18:19:05Z
cephobjectstores.ceph.rook.io       2019-07-19T18:19:05Z
cephobjectstoreusers.ceph.rook.io   2019-07-19T18:19:05Z
volumes.rook.io                     2019-07-19T18:19:05Z
```
then delete these pre-existing rook CRDs by generating a delete manifest with the commands below and applying it:
```
helm template -n rook . -f values.yaml > ~/delete.yaml
kubectl delete -f ~/delete.yaml
```

After this, delete the directory below on all the nodes.
```
sudo rm -rf /var/lib/rook/
```
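If you have SSH access to the nodes, a small loop saves doing this by hand (a sketch; it assumes password-less SSH to each node's InternalIP and sudo rights on the nodes):
```bash
# Hypothetical helper: remove leftover rook state from every node over SSH
for ip in $(kubectl get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}'); do
  ssh "$ip" 'sudo rm -rf /var/lib/rook/'
done
```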
Now attempt the installation again:
```
helm install -n rook . -f values.yaml --namespace=rook-ceph-system
```

## Install Operator package
```bash
cd $DA_WORKING_DIR/operator
helm install -n operator . -f values.yaml --namespace=operator
```
Check the status of the pods in the operator namespace and confirm that the Prometheus operator pods are in the Ready state.
```bash
kubectl get pods -n operator
NAME                                                      READY   STATUS    RESTARTS
m3db-operator-0                                           1/1     Running   0
op-etcd-operator-etcd-backup-operator-6cdc577f7d-ltgsr    1/1     Running   0
op-etcd-operator-etcd-operator-79fd99f8b7-fdc7p           1/1     Running   0
op-etcd-operator-etcd-restore-operator-855f7478bf-r7qxp   1/1     Running   0
op-prometheus-operator-operator-5c9b87965b-wjtw5          1/1     Running   1
op-sparkoperator-6cb4db884c-75rcd                         1/1     Running   0
strimzi-cluster-operator-5bffdd7b85-rlrvj                 1/1     Running   0
```
#### Troubleshooting Operator installation
Sometimes deleting a previously installed Operator package fails to remove all operator pods. To troubleshoot this, follow these steps.

1. Make sure that all other deployments and helm releases are deleted (purged). The Operator package is a baseline package for the applications, so deleting it while the applications are still running may leave the cluster in an unwanted state.

2. Delete all the resources and CRDs associated with the operator package.
```bash
#NOTE: Use the same release name and namespace as in installation of operator package in the previous section
cd $DA_WORKING_DIR/operator
helm template -n operator . -f values.yaml --namespace=operator > ../delete_operator.yaml
cd ../
kubectl delete -f delete_operator.yaml
```
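The CRDs installed by the operators are cluster-scoped and may survive the step above; a sketch of how to spot and remove leftovers (double-check the list before deleting anything):
```bash
# List CRDs that the operator package typically brings in (exact names vary by version)
kubectl get crds | grep -E 'etcd|m3db|monitoring.coreos|sparkoperator|strimzi'

# Then delete a confirmed leftover explicitly, for example:
# kubectl delete crd m3dbclusters.operator.m3db.io
```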
## Install Collection package
Note: collectd.conf is available in the $DA_WORKING_DIR/collection/charts/collectd/resources/config directory. Any valid collectd.conf can be placed here.
```bash
Default (For custom collectd skip this section)
=======
cd $DA_WORKING_DIR/collection
helm install -n cp . -f values.yaml --namespace=edge1

Custom Collectd
===============
1. Build the custom collectd image
2. Set COLLECTD_IMAGE_NAME with the appropriate image_repository:tag
3. Push the image to the docker registry using the command
4. docker push ${COLLECTD_IMAGE_NAME}
5. Edit the values.yaml and change the image repository and tag using
   COLLECTD_IMAGE_NAME appropriately.
6. Place the collectd.conf in
   $DA_WORKING_DIR/collection/charts/collectd/resources/config

7. cd $DA_WORKING_DIR/collection
8. helm install -n cp . -f values.yaml --namespace=edge1
```
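As a concrete example of the custom-collectd flow above (the registry name and tag below are hypothetical placeholders):
```bash
# Hypothetical image name; replace with your own registry and tag
export COLLECTD_IMAGE_NAME=registry.example.com/custom-collectd:1.0

# Build and publish the custom collectd image from your Dockerfile
docker build -t ${COLLECTD_IMAGE_NAME} .
docker push ${COLLECTD_IMAGE_NAME}

# After updating the image repository/tag in values.yaml and placing collectd.conf in
# $DA_WORKING_DIR/collection/charts/collectd/resources/config, install the package:
cd $DA_WORKING_DIR/collection
helm install -n cp . -f values.yaml --namespace=edge1
```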

#### Verify Collection package
* Check if all pods are up in the edge1 namespace
* Check the Prometheus UI by port-forwarding port 9090 (the default port of the Prometheus service); see the port-forward sketch after the listings below
```
$ kubectl get pods -n edge1
NAME                                    READY   STATUS    RESTARTS   AGE
cp-cadvisor-8rk2b                       1/1     Running   0          15s
cp-cadvisor-nsjr6                       1/1     Running   0          15s
cp-collectd-h5krd                       1/1     Running   0          23s
cp-collectd-jc9m2                       1/1     Running   0          23s
cp-prometheus-node-exporter-blc6p       1/1     Running   0          17s
cp-prometheus-node-exporter-qbvdx       1/1     Running   0          17s
prometheus-cp-prometheus-prometheus-0   4/4     Running   1          33s

$ kubectl get svc -n edge1
NAME                            TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)
cadvisor                        NodePort    10.43.53.122   <none>        80:30091/TCP
collectd                        ClusterIP   10.43.222.34   <none>        9103/TCP
cp13-prometheus-node-exporter   ClusterIP   10.43.17.242   <none>        9100/TCP
cp13-prometheus-prometheus      NodePort    10.43.26.155   <none>        9090:30090/TCP
prometheus-operated             ClusterIP   None           <none>        9090/TCP
```
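A minimal way to reach the Prometheus UI from your workstation (a sketch; the service name follows the pattern `<release>-prometheus-prometheus`, shown as cp13-prometheus-prometheus in the listing above):
```bash
# Forward local port 9090 to the Prometheus service in edge1, then open http://localhost:9090
kubectl port-forward -n edge1 svc/cp-prometheus-prometheus 9090:9090
```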

## Install Minio Model repository
* Prerequisite: a dynamic storage provisioner needs to be enabled, either rook-ceph ($DA_WORKING_DIR/00-init) or an alternative provisioner.
```bash
cd $DA_WORKING_DIR/minio

Edit the values.yaml to set the credentials to access the minio UI.
Default values are
accessKey: "onapdaas"
secretKey: "onapsecretdaas"

helm install -n minio . -f values.yaml --namespace=edge1
```
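The Minio UI can be reached the same way (a sketch; it assumes the chart exposes a service named minio on its default port 9000; log in with the accessKey/secretKey set in values.yaml):
```bash
# Forward local port 9000 to the assumed minio service and open http://localhost:9000
kubectl port-forward -n edge1 svc/minio 9000:9000
```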

## Install Messaging platform

We currently support the Strimzi-based Kafka operator.
Navigate to the ```$DA_WORKING_DIR/messaging/charts/strimzi-kafka-operator``` directory.
Use the command below:
```
helm install . -f values.yaml  --name sko --namespace=test
```

NOTE: Make changes in the values.yaml if required.

Once the strimzi operator is ready, you should see a pod like:

```
strimzi-cluster-operator-5cf7648b8c-zgxv7       1/1     Running   0          53m
```

Once this is done, install the kafka package like any other helm chart.
Navigate to the ```$DA_WORKING_DIR/messaging``` directory and use the command:
```
helm install --name kafka-cluster charts/kafka/
```
Once this is done, you should have the following pods up and running.

```
kafka-cluster-entity-operator-b6557fc6c-hlnkm   3/3     Running   0          47m
kafka-cluster-kafka-0                           2/2     Running   0          48m
kafka-cluster-kafka-1                           2/2     Running   0          48m
kafka-cluster-kafka-2                           2/2     Running   0          48m
kafka-cluster-zookeeper-0                       2/2     Running   0          49m
kafka-cluster-zookeeper-1                       2/2     Running   0          49m
kafka-cluster-zookeeper-2                       2/2     Running   0          49m
```

You should have the following services when you do a ```kubectl get svc```

```
kafka-cluster-kafka-bootstrap    ClusterIP   10.XX.YY.ZZ   <none>        9091/TCP,9092/TCP,9093/TCP   53m
kafka-cluster-kafka-brokers      ClusterIP   None          <none>        9091/TCP,9092/TCP,9093/TCP   53m
kafka-cluster-zookeeper-client   ClusterIP   10.XX.YY.ZZ   <none>        2181/TCP                     55m
kafka-cluster-zookeeper-nodes    ClusterIP   None          <none>        2181/TCP,2888/TCP,3888/TCP   55m
```
#### Testing messaging

You can test your kafka brokers by creating a simple producer and consumer.

Producer:
```
kubectl run kafka-producer -ti --image=strimzi/kafka:0.12.2-kafka-2.2.1 --rm=true --restart=Never -- bin/kafka-console-producer.sh --broker-list kafka-cluster-kafka-bootstrap:9092 --topic my-topic
```
Consumer:
```
kubectl run kafka-consumer -ti --image=strimzi/kafka:0.12.2-kafka-2.2.1 --rm=true --restart=Never -- bin/kafka-console-consumer.sh --bootstrap-server kafka-cluster-kafka-bootstrap:9092 --topic my-topic --from-beginning
```
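The console clients above rely on automatic topic creation. If you prefer to declare the topic explicitly, Strimzi manages topics through a KafkaTopic custom resource; a minimal sketch, assuming the cluster name kafka-cluster and that you apply it in the namespace where the kafka chart was installed:
```bash
cat << EOF | kubectl apply -f -
apiVersion: kafka.strimzi.io/v1beta1
kind: KafkaTopic
metadata:
  name: my-topic
  labels:
    strimzi.io/cluster: kafka-cluster   # must match the Kafka cluster name
spec:
  partitions: 3
  replicas: 3
EOF
```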

## Install Training Package

#### Install M3DB (Time series Data lake)
##### Pre-requisites
1.  A Kubernetes cluster with at least 3 nodes
2.  Etcd operator, M3DB operator
3.  Nodes labelled with zone and region (see the commands below)

```bash
## Default region is us-west1, default zone labels are us-west1-a, us-west1-b, us-west1-c
## If this is changed, then isolationGroups in training-core/charts/m3db/values.yaml needs to be updated.
NODES=($(kubectl get nodes --output=jsonpath={.items..metadata.name}))

kubectl label node/${NODES[0]} failure-domain.beta.kubernetes.io/region=us-west1
kubectl label node/${NODES[1]} failure-domain.beta.kubernetes.io/region=us-west1
kubectl label node/${NODES[2]} failure-domain.beta.kubernetes.io/region=us-west1

kubectl label node/${NODES[0]} failure-domain.beta.kubernetes.io/zone=us-west1-a --overwrite=true
kubectl label node/${NODES[1]} failure-domain.beta.kubernetes.io/zone=us-west1-b --overwrite=true
kubectl label node/${NODES[2]} failure-domain.beta.kubernetes.io/zone=us-west1-c --overwrite=true
```
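You can confirm the labels landed on the intended nodes before installing:
```bash
# Show the region/zone labels as columns next to each node
kubectl get nodes -L failure-domain.beta.kubernetes.io/region -L failure-domain.beta.kubernetes.io/zone
```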
```bash
cd $DA_WORKING_DIR/training-core/charts/m3db
helm install -n m3db . -f values.yaml --namespace training
```
```
$ kubectl get pods -n training
NAME                   READY   STATUS    RESTARTS   AGE
m3db-cluster-rep0-0    1/1     Running   0          103s
m3db-cluster-rep1-0    1/1     Running   0          83s
m3db-cluster-rep2-0    1/1     Running   0          62s
m3db-etcd-sjhgl4xfgc   1/1     Running   0          83s
m3db-etcd-lfs96hngz6   1/1     Running   0          67s
m3db-etcd-rmgdkkx4bq   1/1     Running   0          51s
```
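Beyond pod status, you can ask the coordinator for the cluster placement (a sketch; it assumes the coordinator service is named m3coordinator-m3db, the same name used in the remote-write URL below, and listens on port 7201):
```bash
# Forward the coordinator port and fetch the placement; all three replicas should be listed
kubectl port-forward -n training svc/m3coordinator-m3db 7201:7201 &
curl -s http://localhost:7201/api/v1/services/m3db/placement
```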

##### Configure remote write from Prometheus to M3DB
```bash
cd $DA_WORKING_DIR/day2_configs/prometheus/
```
```bash
cat << EOF > add_m3db_remote.yaml
spec:
  remoteWrite:
  - url: "http://m3coordinator-m3db.training.svc.cluster.local:7201/api/v1/prom/remote/write"
    writeRelabelConfigs:
      - targetLabel: metrics_storage
        replacement: m3db_remote
EOF
```
```bash
kubectl patch --namespace=edge1 prometheus cp-prometheus-prometheus -p "$(cat add_m3db_remote.yaml)" --type=merge
```
Verify in the Prometheus GUI that the m3db remote write is enabled.
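From the command line, the loaded Prometheus configuration can also be inspected over its HTTP API (a sketch; it assumes the port-forward to the Prometheus service from the collection section is still active):
```bash
# The returned configuration should include a remote_write entry pointing at the m3coordinator URL
curl -s http://localhost:9090/api/v1/status/config | grep remote_write
```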