# Distributed Analytics Framework

## Pre-requisites
| Required   | Version |
|------------|---------|
| Kubernetes | 1.12.3+ |
| Docker CE  | 18.09+  |
| Helm       | >=2.12.1 and <=2.13.1 |

## Download Framework
```bash
git clone https://github.com/onap/demo.git
DA_WORKING_DIR=$PWD/demo/vnfs/DAaaS/deploy
```
15
16 ## Install Rook-Ceph for Persistent Storage
17 Note: This is unusual but Flex volume path can be different than the default value. values.yaml has the most common flexvolume path configured. In case of errors related to flexvolume please refer to the https://rook.io/docs/rook/v0.9/flexvolume.html#configuring-the-flexvolume-path to find the appropriate flexvolume-path and set it in values.yaml
18 ```bash
19 cd $DA_WORKING_DIR/00-init/rook-ceph
20 helm install -n rook . -f values.yaml --namespace=rook-ceph-system
21 ```
Check the status of the pods in the rook namespaces. Once all pods are in the Ready state, move on to the next section.

```bash
$ kubectl get pods -n rook-ceph-system
NAME                                 READY   STATUS    RESTARTS   AGE
rook-ceph-agent-9wszf                1/1     Running   0          121s
rook-ceph-agent-xnbt8                1/1     Running   0          121s
rook-ceph-operator-bc77d6d75-ltwww   1/1     Running   0          158s
rook-discover-bvj65                  1/1     Running   0          133s
rook-discover-nbfrp                  1/1     Running   0          133s
```
```bash
$ kubectl -n rook-ceph get pod
NAME                                   READY   STATUS      RESTARTS   AGE
rook-ceph-mgr-a-d9dcf5748-5s9ft        1/1     Running     0          77s
rook-ceph-mon-a-7d8f675889-nw5pl       1/1     Running     0          105s
rook-ceph-mon-b-856fdd5cb9-5h2qk       1/1     Running     0          94s
rook-ceph-mon-c-57545897fc-j576h       1/1     Running     0          85s
rook-ceph-osd-0-7cbbbf749f-j8fsd       1/1     Running     0          25s
rook-ceph-osd-1-7f67f9646d-44p7v       1/1     Running     0          25s
rook-ceph-osd-2-6cd4b776ff-v4d68       1/1     Running     0          25s
rook-ceph-osd-prepare-vx2rz            0/2     Completed   0          60s
rook-ceph-tools-5bd5cdb949-j68kk       1/1     Running     0          53s
```
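Rather than polling `kubectl get pods` manually, you can block until every pod in the operator namespace reports Ready. A minimal sketch; the 300-second timeout is an arbitrary choice:

```shell
# Wait until all pods in rook-ceph-system are Ready (timeout value is an assumption)
kubectl wait --for=condition=Ready pod --all -n rook-ceph-system --timeout=300s
```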

#### Troubleshooting Rook-Ceph installation

If rook was previously installed on your machine, successfully or unsuccessfully,
and you are attempting a fresh installation of the rook operator, you may face some issues.

* First, check whether any rook CRDs already exist:
```
kubectl get crds | grep rook
```
If this returns results like:
```
otc@otconap7 /var/lib/rook $  kc get crds | grep rook
cephblockpools.ceph.rook.io         2019-07-19T18:19:05Z
cephclusters.ceph.rook.io           2019-07-19T18:19:05Z
cephfilesystems.ceph.rook.io        2019-07-19T18:19:05Z
cephobjectstores.ceph.rook.io       2019-07-19T18:19:05Z
cephobjectstoreusers.ceph.rook.io   2019-07-19T18:19:05Z
volumes.rook.io                     2019-07-19T18:19:05Z
```
then delete these pre-existing rook CRDs by generating a delete
manifest with the commands below and deleting it:
```
helm template -n rook . -f values.yaml > ~/delete.yaml
kubectl delete -f ~/delete.yaml
```

After this, delete the directory below on all the nodes.
```
sudo rm -rf /var/lib/rook/
```
Now attempt the installation again:
```
helm install -n rook . -f values.yaml --namespace=rook-ceph-system
```

## Install Operator package
### Build docker images
#### collectd-operator
```bash
cd $DA_WORKING_DIR/../microservices/collectd-operator

## Note: The image tag and repository in the collectd-operator helm charts need to match IMAGE_NAME
IMAGE_NAME=dcr.cluster.local:32644/collectd-operator:latest
./build/build_image.sh $IMAGE_NAME
```
### Install the Operator Package
```bash
cd $DA_WORKING_DIR/operator
helm install -n operator . -f values.yaml --namespace=operator
```
Check the status of the pods in the operator namespace. Verify that the Prometheus operator pods are in the Ready state.
```bash
kubectl get pods -n operator
NAME                                                      READY   STATUS    RESTARTS
m3db-operator-0                                           1/1     Running   0
op-etcd-operator-etcd-backup-operator-6cdc577f7d-ltgsr    1/1     Running   0
op-etcd-operator-etcd-operator-79fd99f8b7-fdc7p           1/1     Running   0
op-etcd-operator-etcd-restore-operator-855f7478bf-r7qxp   1/1     Running   0
op-prometheus-operator-operator-5c9b87965b-wjtw5          1/1     Running   1
op-sparkoperator-6cb4db884c-75rcd                         1/1     Running   0
strimzi-cluster-operator-5bffdd7b85-rlrvj                 1/1     Running   0
```
#### Troubleshooting Operator installation
Sometimes deleting a previously installed Operator package fails to remove all operator pods. To troubleshoot this, follow these steps.

1. Make sure that all other deployments and helm releases are deleted (purged). The Operator package is a baseline package for the applications, so trying to delete it while the applications are still running might leave the cluster in an unwanted state.

2. Delete all the resources and CRDs associated with the operator package.
```bash
#NOTE: Use the same release name and namespace as in the installation of the operator package in the previous section
cd $DA_WORKING_DIR/operator
helm template -n operator . -f values.yaml --namespace=operator > ../delete_operator.yaml
cd ../
kubectl delete -f delete_operator.yaml
```
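If CRDs from the operator package still linger after the delete, they can be removed directly. A hedged sketch — the grep pattern below is an assumption about which API groups the package installs; inspect `kubectl get crds` yourself before deleting anything:

```shell
# Delete CRDs whose API groups appear to belong to the operator package.
# The group list below is an assumption -- verify against `kubectl get crds` first.
kubectl get crds -o name | grep -E 'coreos.com|m3db.io|strimzi.io' | xargs -r kubectl delete
```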
## Install Collection package
Note: collectd.conf is available in the $DA_WORKING_DIR/collection/charts/collectd/resources/config directory. Any valid collectd.conf can be placed here.

Default (for custom collectd, skip this section):
```bash
cd $DA_WORKING_DIR/collection
helm install -n cp . -f values.yaml --namespace=edge1
```

Custom collectd:
1. Build the custom collectd image.
2. Set COLLECTD_IMAGE_NAME with the appropriate image_repository:tag.
3. Push the image to the docker registry: `docker push ${COLLECTD_IMAGE_NAME}`
4. Edit values.yaml and change the image repository and tag using
   COLLECTD_IMAGE_NAME appropriately.
5. Place the collectd.conf in
   $DA_WORKING_DIR/collection/charts/collectd/resources/config
6. Install:
```bash
cd $DA_WORKING_DIR/collection
helm install -n cp . -f values.yaml --namespace=edge1
```

#### Verify Collection package
* Check if all pods are up in the edge1 namespace
* Check the Prometheus UI by port-forwarding port 9090 (default for the prometheus service)
```
$ kubectl get pods -n edge1
NAME                                    READY   STATUS    RESTARTS   AGE
cp-cadvisor-8rk2b                       1/1     Running   0          15s
cp-cadvisor-nsjr6                       1/1     Running   0          15s
cp-collectd-h5krd                       1/1     Running   0          23s
cp-collectd-jc9m2                       1/1     Running   0          23s
cp-prometheus-node-exporter-blc6p       1/1     Running   0          17s
cp-prometheus-node-exporter-qbvdx       1/1     Running   0          17s
prometheus-cp-prometheus-prometheus-0   4/4     Running   1          33s

$ kubectl get svc -n edge1
NAME                            TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)
cadvisor                        NodePort    10.43.53.122   <none>        80:30091/TCP
collectd                        ClusterIP   10.43.222.34   <none>        9103/TCP
cp13-prometheus-node-exporter   ClusterIP   10.43.17.242   <none>        9100/TCP
cp13-prometheus-prometheus      NodePort    10.43.26.155   <none>        9090:30090/TCP
prometheus-operated             ClusterIP   None           <none>        9090/TCP
```
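To reach the Prometheus UI mentioned above, a port-forward along these lines should work; the service name here is taken from the `kubectl get svc` output above and will vary with your release name:

```shell
# Forward local port 9090 to the Prometheus service, then browse http://localhost:9090
kubectl port-forward -n edge1 svc/cp13-prometheus-prometheus 9090:9090
```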

## Install Minio Model repository
* Prerequisite: A dynamic storage provisioner needs to be enabled, either rook-ceph ($DA_WORKING_DIR/00-init) or an alternate provisioner.

Edit values.yaml to set the credentials used to access the minio UI. The default values are:
```
accessKey: "onapdaas"
secretKey: "onapsecretdaas"
```
```bash
cd $DA_WORKING_DIR/minio
helm install -n minio . -f values.yaml --namespace=edge1
```
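To reach the minio UI, a port-forward sketch; the service name `minio` is an assumption based on the release name, so verify it with `kubectl get svc -n edge1`:

```shell
# Forward local port 9000 to the minio service (name assumed), then
# browse http://localhost:9000 and log in with accessKey/secretKey
kubectl port-forward -n edge1 svc/minio 9000:9000
```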

## Install Messaging platform

We currently support the strimzi-based kafka operator.
Navigate to the ```$DA_WORKING_DIR/messaging/charts/strimzi-kafka-operator``` directory.
Use the below command:
```
helm install . -f values.yaml --name sko --namespace=test
```

NOTE: Make changes in the values.yaml if required.

Once the strimzi operator is ready, you should see a pod like:

```
strimzi-cluster-operator-5cf7648b8c-zgxv7       1/1     Running   0          53m
```

Once this is done, install the kafka package like any other helm chart.
Navigate to the ```$DA_WORKING_DIR/messaging``` directory and use the command:
```
helm install --name kafka-cluster charts/kafka/
```

Once this is done, you should have the following pods up and running.

```
kafka-cluster-entity-operator-b6557fc6c-hlnkm   3/3     Running   0          47m
kafka-cluster-kafka-0                           2/2     Running   0          48m
kafka-cluster-kafka-1                           2/2     Running   0          48m
kafka-cluster-kafka-2                           2/2     Running   0          48m
kafka-cluster-zookeeper-0                       2/2     Running   0          49m
kafka-cluster-zookeeper-1                       2/2     Running   0          49m
kafka-cluster-zookeeper-2                       2/2     Running   0          49m
```

You should see the following services when you do a ```kubectl get svc```:

```
kafka-cluster-kafka-bootstrap    ClusterIP   10.XX.YY.ZZ   <none>        9091/TCP,9092/TCP,9093/TCP   53m
kafka-cluster-kafka-brokers      ClusterIP   None          <none>        9091/TCP,9092/TCP,9093/TCP   53m
kafka-cluster-zookeeper-client   ClusterIP   10.XX.YY.ZZ   <none>        2181/TCP                     55m
kafka-cluster-zookeeper-nodes    ClusterIP   None          <none>        2181/TCP,2888/TCP,3888/TCP   55m
```
#### Testing messaging

You can test your kafka brokers by creating a simple producer and consumer.

Producer:
```
kubectl run kafka-producer -ti --image=strimzi/kafka:0.12.2-kafka-2.2.1 --rm=true --restart=Never -- bin/kafka-console-producer.sh --broker-list kafka-cluster-kafka-bootstrap:9092 --topic my-topic
```
Consumer:
```
kubectl run kafka-consumer -ti --image=strimzi/kafka:0.12.2-kafka-2.2.1 --rm=true --restart=Never -- bin/kafka-console-consumer.sh --bootstrap-server kafka-cluster-kafka-bootstrap:9092 --topic my-topic --from-beginning
```

## Install Training Package

#### Install M3DB (Time series Data lake)
##### Pre-requisites
1.  Kubernetes cluster with at least 3 nodes
2.  Etcd operator, M3DB operator
3.  Nodes labeled with zone and region

```bash
## Default region is us-west1; default zone labels are us-west1-a, us-west1-b, us-west1-c.
## If this is changed, then isolationGroups in training-core/charts/m3db/values.yaml needs to be updated.
NODES=($(kubectl get nodes --output=jsonpath={.items..metadata.name}))

kubectl label node/${NODES[0]} failure-domain.beta.kubernetes.io/region=us-west1
kubectl label node/${NODES[1]} failure-domain.beta.kubernetes.io/region=us-west1
kubectl label node/${NODES[2]} failure-domain.beta.kubernetes.io/region=us-west1

kubectl label node/${NODES[0]} failure-domain.beta.kubernetes.io/zone=us-west1-a --overwrite=true
kubectl label node/${NODES[1]} failure-domain.beta.kubernetes.io/zone=us-west1-b --overwrite=true
kubectl label node/${NODES[2]} failure-domain.beta.kubernetes.io/zone=us-west1-c --overwrite=true
```
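Before installing m3db, you can confirm the labels landed on the nodes:

```shell
# Show the region and zone labels applied above as extra columns
kubectl get nodes -L failure-domain.beta.kubernetes.io/region,failure-domain.beta.kubernetes.io/zone
```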
```bash
cd $DA_WORKING_DIR/training-core/charts/m3db
helm install -n m3db . -f values.yaml --namespace training
```
```
$ kubectl get pods -n training
NAME                   READY   STATUS    RESTARTS   AGE
m3db-cluster-rep0-0    1/1     Running   0          103s
m3db-cluster-rep1-0    1/1     Running   0          83s
m3db-cluster-rep2-0    1/1     Running   0          62s
m3db-etcd-sjhgl4xfgc   1/1     Running   0          83s
m3db-etcd-lfs96hngz6   1/1     Running   0          67s
m3db-etcd-rmgdkkx4bq   1/1     Running   0          51s
```

##### Configure remote write from Prometheus to M3DB
```bash
cd $DA_WORKING_DIR/day2_configs/prometheus/
```
```bash
cat << EOF > add_m3db_remote.yaml
spec:
  remoteWrite:
  - url: "http://m3coordinator-m3db.training.svc.cluster.local:7201/api/v1/prom/remote/write"
    writeRelabelConfigs:
      - targetLabel: metrics_storage
        replacement: m3db_remote
EOF
```
```bash
kubectl patch --namespace=edge1 prometheus cp-prometheus-prometheus -p "$(cat add_m3db_remote.yaml)" --type=merge
```
Verify in the Prometheus GUI that the m3db remote write is enabled.
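One way to confirm samples are reaching M3DB is to query the coordinator directly. A sketch, assuming the coordinator exposes its Prometheus-compatible query API on the service named in the remoteWrite URL above; `up` is just an example metric:

```shell
# Port-forward the m3coordinator service and issue a PromQL query against it
kubectl -n training port-forward svc/m3coordinator-m3db 7201:7201 &
PF_PID=$!
sleep 2
curl -s "http://localhost:7201/api/v1/query?query=up"
kill $PF_PID
```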