From: Rajamohan Raj
Date: Fri, 19 Jul 2019 22:32:58 +0000 (+0000)
Subject: Adding a few troubleshooting guidelines for HDFS.
X-Git-Tag: 1.5.0~46
X-Git-Url: https://gerrit.onap.org/r/gitweb?p=demo.git;a=commitdiff_plain;h=18909c10d00e4591f931475a1082edb209b66071

Adding a few troubleshooting guidelines for HDFS.

Issue-ID: ONAPARC-391
Change-Id: Ic4cd3376aa668b46da1890a09aaf3f461a04e254
Signed-off-by: Rajamohan Raj
---

diff --git a/vnfs/DAaaS/README.md b/vnfs/DAaaS/README.md
index 4b6fcf50..de701fd4 100644
--- a/vnfs/DAaaS/README.md
+++ b/vnfs/DAaaS/README.md
@@ -44,6 +44,42 @@ rook-ceph-osd-prepare-vx2rz   0/2   Completed   0   60s
 rook-ceph-tools-5bd5cdb949-j68kk   1/1   Running   0   53s
 ```
+#### Troubleshooting Rook-Ceph installation
+
+If rook was previously installed on your machine, successfully or not, and you
+are attempting a fresh installation of the rook operator, you may run into
+issues. The steps below help you recover.
+
+* First, check whether any rook CRDs already exist:
+```
+kubectl get crds | grep rook
+```
+If this returns results like:
+```
+otc@otconap7 /var/lib/rook $ kc get crds | grep rook
+cephblockpools.ceph.rook.io         2019-07-19T18:19:05Z
+cephclusters.ceph.rook.io           2019-07-19T18:19:05Z
+cephfilesystems.ceph.rook.io        2019-07-19T18:19:05Z
+cephobjectstores.ceph.rook.io       2019-07-19T18:19:05Z
+cephobjectstoreusers.ceph.rook.io   2019-07-19T18:19:05Z
+volumes.rook.io                     2019-07-19T18:19:05Z
+```
+then delete these pre-existing rook CRDs by generating a delete manifest with
+helm template and applying it:
+```
+helm template -n rook . -f values.yaml > ~/delete.yaml
+kubectl delete -f ~/delete.yaml
+```
+
+After this, delete the following directory on all nodes:
+```
+sudo rm -rf /var/lib/rook/
+```
+Now attempt the installation again:
+```
+helm install -n rook . -f values.yaml --namespace=rook-ceph-system
+```
+
 #### Install Operator package
 ```bash
 cd $DA_WORKING_DIR/operator
diff --git a/vnfs/DAaaS/deploy/training-core/charts/kubernetes-HDFS/README.md b/vnfs/DAaaS/deploy/training-core/charts/kubernetes-HDFS/README.md
index ca694a19..fcef4fa1 100644
--- a/vnfs/DAaaS/deploy/training-core/charts/kubernetes-HDFS/README.md
+++ b/vnfs/DAaaS/deploy/training-core/charts/kubernetes-HDFS/README.md
@@ -10,3 +10,58 @@ See [charts/README.md](charts/README.md) for how to run the charts.
 See [tests/README.md](tests/README.md) for how to run integration tests for
 HDFS on Kubernetes.
+
+
+# Troubleshooting
+
+If some pods are stuck in the Pending state, inspect them with the
+kubectl describe command. If describe shows:
+```
+  Type     Reason            Age                From               Message
+  ----     ------            ----               ----               -------
+  Warning  FailedScheduling  7s (x20 over 66s)  default-scheduler  pod has unbound immediate PersistentVolumeClaims (repeated 3 times)
+```
+
+then make sure the storage provisioner is up and running. In our case that is
+rook, so rook should be up and set as the default storage provisioner.
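+A quick way to verify this (a sketch, assuming kubectl is already configured
+against your cluster) is to list the storage classes and confirm that
+rook-ceph-block is flagged as the default:
+```
+kubectl get storageclass
+```
+The output should look like: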
+
+```
+NAME                        PROVISIONER          AGE
+rook-ceph-block (default)   ceph.rook.io/block   132m
+```
+
+Delete all the previously unbound PVCs, such as the ones below:
+```
+NAME                           STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
+data-hdfs1-zookeeper-0         Pending                                                     108m
+editdir-hdfs1-journalnode-0    Pending                                                     108m
+metadatadir-hdfs1-namenode-0   Pending                                                     108m
+```
+
+```
+kubectl delete pvc/data-hdfs1-zookeeper-0
+kubectl delete pvc/editdir-hdfs1-journalnode-0
+kubectl delete pvc/metadatadir-hdfs1-namenode-0
+```
+
+#### If the DataNode restarts with the error:
+```
+19/07/19 21:22:55 FATAL datanode.DataNode: Initialization failed for Block pool (Datanode Uuid unassigned) service to hdfs1-namenode-1.hdfs1-namenode.hdfs1.svc.cluster.local/XXX.YY.ZZ.KK:8020. Exiting.
+java.io.IOException: All specified directories are failed to load.
+        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:478)
+        at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1358)
+        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1323)
+        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:317)
+        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:223)
+        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:802)
+```
+
+* SOLUTION: Make sure that whatever host path you set for the DataNode is
+deleted and does not exist before you run the HDFS helm chart:
+```
+  - name: hdfs-data-0
+    hostPath:
+      path: /hdfs-data
+```
+If you are reinstalling HDFS, delete the host path /hdfs-data on every node
+before you proceed, or the above error will recur.
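+
+A cleanup loop along these lines can help (a sketch only: the NODES list and
+passwordless SSH access to each node are assumptions, not something this chart
+provides):
+```
+# Hypothetical list of the Kubernetes worker node hostnames
+NODES="node1 node2 node3"
+for node in $NODES; do
+    # Remove stale DataNode data left behind by a previous HDFS install
+    ssh "$node" "sudo rm -rf /hdfs-data"
+done
+```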