.. This work is licensed under a Creative Commons Attribution 4.0 International License.
.. http://creativecommons.org/licenses/by/4.0

.. _healthcheck_and_monitoring:

Healthcheck and Monitoring
==========================

Healthcheck
-----------
A small HTTP service for healthchecks runs inside the HV-VES Docker container. The healthcheck port can be configured
at deployment using the command line (for details see :ref:`deployment`).

This service exposes the endpoint **GET /health/ready**, which returns **HTTP 200 OK** when HV-VES is healthy
and ready for connections. Otherwise it returns **HTTP 503 Service Unavailable** with a short description of why HV-VES is unhealthy.
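
The readiness check can be probed from a script. The following is a minimal sketch, not part of HV-VES itself:
the function names ``interpret_health`` and ``check_ready`` are hypothetical, and port ``6060`` is only a placeholder
for whatever healthcheck port was configured at deployment.

.. code-block:: python

    import urllib.error
    import urllib.request
    from typing import Tuple


    def interpret_health(status: int, body: str) -> Tuple[bool, str]:
        """Map a /health/ready response to (ready, reason)."""
        if status == 200:
            return True, ""
        # A 503 response carries a short reason; other codes are unexpected.
        return False, body.strip() or f"unexpected HTTP status {status}"


    def check_ready(base_url: str, timeout: float = 2.0) -> Tuple[bool, str]:
        """Call GET /health/ready on the healthcheck service at base_url."""
        url = base_url.rstrip("/") + "/health/ready"
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                body = resp.read().decode("utf-8", "replace")
                return interpret_health(resp.status, body)
        except urllib.error.HTTPError as err:
            # urllib raises for non-2xx codes such as 503; the body is still readable.
            return interpret_health(err.code, err.read().decode("utf-8", "replace"))
        except (urllib.error.URLError, OSError) as err:
            return False, f"healthcheck unreachable: {err}"


    if __name__ == "__main__":
        # Placeholder address: use the host and port configured at deployment.
        print(check_ready("http://localhost:6060"))

Keeping the status interpretation separate from the HTTP call makes the decision logic easy to check without a running container.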


Monitoring
----------
The HV-VES collector gathers metrics data at runtime. To make them available, the HV-VES application exposes the endpoint **GET /monitoring/prometheus**,
which returns **HTTP 200 OK** with the metrics in the response body, in a format readable by the Prometheus service.
The Prometheus endpoint shares a port with the healthchecks.

Metrics provided by HV-VES:

+-----------------------------------------------+--------------+------------------------------------------------------------------------------------------+
|           Name of metric                      |     Unit     |              Description                                                                 |
+===============================================+==============+==========================================================================================+
| hvves_clients_rejected_cause_total            |  cause/piece | number of rejected clients grouped by cause                                              |
+-----------------------------------------------+--------------+------------------------------------------------------------------------------------------+
| hvves_clients_rejected_total                  |     piece    | total number of rejected clients                                                         |
+-----------------------------------------------+--------------+------------------------------------------------------------------------------------------+
| hvves_connections_active                      |     piece    | number of currently active connections                                                   |
+-----------------------------------------------+--------------+------------------------------------------------------------------------------------------+
| hvves_connections_total                       |     piece    | total number of connections                                                              |
+-----------------------------------------------+--------------+------------------------------------------------------------------------------------------+
| hvves_data_received_bytes_total               |     bytes    | total number of received bytes                                                           |
+-----------------------------------------------+--------------+------------------------------------------------------------------------------------------+
| hvves_disconnections_total                    |     piece    | total number of disconnections                                                           |
+-----------------------------------------------+--------------+------------------------------------------------------------------------------------------+
| hvves_messages_dropped_cause_total            |  cause/piece | number of dropped messages grouped by cause                                              |
+-----------------------------------------------+--------------+------------------------------------------------------------------------------------------+
| hvves_messages_dropped_total                  |     piece    | total number of dropped messages                                                         |
+-----------------------------------------------+--------------+-----------------------------------+------------------------------------------------------+
| hvves_messages_latency_seconds_bucket         |     seconds  | latency is the time between       |  cumulative counters for latency occurrences         |
+-----------------------------------------------+--------------+ message.header.lastEpochMicrosec  +------------------------------------------------------+
| hvves_messages_latency_seconds_count          |     piece    | and time when data has been sent  |  counter for number of latency occurrences           |
+-----------------------------------------------+--------------+ from HV-VES to Kafka              +------------------------------------------------------+
| hvves_messages_latency_seconds_max            |    seconds   |                                   |  maximal observed latency                            |
+-----------------------------------------------+--------------+                                   +------------------------------------------------------+
| hvves_messages_latency_seconds_sum            |    seconds   |                                   |  sum of latency parameter from each message          |
+-----------------------------------------------+--------------+-----------------------------------+------------------------------------------------------+
| hvves_messages_processing_time_seconds_bucket |    seconds   | processing time is time measured  |  cumulative counters for processing time occurrences |
+-----------------------------------------------+--------------+ between decoding of WTP message   +------------------------------------------------------+
| hvves_messages_processing_time_seconds_count  |     piece    | and time when data has been sent  |  counter for number of processing time occurrences   |
+-----------------------------------------------+--------------+ from HV-VES to Kafka              +------------------------------------------------------+
| hvves_messages_processing_time_seconds_max    |    seconds   |                                   |  maximal processing time                             |
+-----------------------------------------------+--------------+                                   +------------------------------------------------------+
| hvves_messages_processing_time_seconds_sum    |    seconds   |                                   |  sum of processing time from each message            |
+-----------------------------------------------+--------------+-----------------------------------+------------------------------------------------------+
| hvves_messages_received_payload_bytes_total   |     bytes    | total number of received payload bytes                                                   |
+-----------------------------------------------+--------------+------------------------------------------------------------------------------------------+
| hvves_messages_received_total                 |     piece    | total number of received messages                                                        |
+-----------------------------------------------+--------------+------------------------------------------------------------------------------------------+
| hvves_messages_sent_topic_total               |  topic/piece | number of sent messages grouped by topic                                                 |
+-----------------------------------------------+--------------+------------------------------------------------------------------------------------------+
| hvves_messages_sent_total                     |     piece    | number of sent messages                                                                  |
+-----------------------------------------------+--------------+------------------------------------------------------------------------------------------+
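
The ``_sum``, ``_count`` and ``_bucket`` series in the table combine in the usual way for Prometheus histograms:
the mean is ``_sum`` divided by ``_count``, and because bucket counters are cumulative, the number of observations
in a single bucket is the difference between neighbouring counters. The sketch below illustrates that arithmetic;
the helper names and all readings are made up for the example and are not HV-VES output.

.. code-block:: python

    from typing import Dict, List, Optional, Tuple


    def mean_seconds(total_sum: float, count: float) -> Optional[float]:
        """Average of a *_seconds metric: *_sum / *_count, or None without samples."""
        return total_sum / count if count > 0 else None


    def per_bucket_counts(cumulative: Dict[float, float]) -> List[Tuple[float, float]]:
        """Turn cumulative bucket counters (keyed by upper bound) into per-bucket counts."""
        result = []
        previous = 0.0
        for upper_bound in sorted(cumulative):
            result.append((upper_bound, cumulative[upper_bound] - previous))
            previous = cumulative[upper_bound]
        return result


    # Made-up readings, for illustration only.
    latency_sum = 1.5    # hvves_messages_latency_seconds_sum
    latency_count = 6.0  # hvves_messages_latency_seconds_count
    buckets = {0.1: 2.0, 0.5: 5.0, float("inf"): 6.0}  # *_bucket values by upper bound

    print(mean_seconds(latency_sum, latency_count))
    print(per_bucket_counts(buckets))

Since these counters accumulate from process start, a mean computed this way covers the whole lifetime of the collector; a mean over a time window needs the difference between two scrapes.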

JVM metrics:

- jvm_buffer_memory_used_bytes
- jvm_classes_unloaded_total
- jvm_gc_memory_promoted_bytes_total
- jvm_buffer_total_capacity_bytes
- jvm_threads_live
- jvm_classes_loaded
- jvm_gc_memory_allocated_bytes_total
- jvm_threads_daemon
- jvm_buffer_count
- jvm_gc_pause_seconds_count
- jvm_gc_pause_seconds_sum
- jvm_gc_pause_seconds_max
- jvm_gc_max_data_size_bytes
- jvm_memory_committed_bytes
- jvm_gc_live_data_size_bytes
- jvm_memory_max_bytes
- jvm_memory_used_bytes
- jvm_threads_peak

Sample response for **GET /monitoring/prometheus**:

.. literalinclude:: resources/metrics_sample_response.txt
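
Because the response body is plain text in the Prometheus exposition format, it can also be inspected without a
Prometheus server. The following is a minimal sketch of such a parser; ``parse_metrics`` is a hypothetical helper,
and the ``SAMPLE`` text is hand-written for the example (its values and the ``example_topic`` label are made up,
not taken from the sample response above).

.. code-block:: python

    from typing import Dict


    def parse_metrics(text: str) -> Dict[str, float]:
        """Parse Prometheus text exposition lines into {sample: value}."""
        samples: Dict[str, float] = {}
        for raw in text.splitlines():
            line = raw.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and HELP/TYPE comments
            if "}" in line:
                # Labelled sample: keep 'name{labels}' together as the key.
                head, _, rest = line.rpartition("}")
                key, fields = head + "}", rest.split()
            else:
                key, *fields = line.split()
            # fields[0] is the value; an optional timestamp may follow it.
            samples[key] = float(fields[0])
        return samples


    # Hand-written sample in the exposition format; values and labels are made up.
    SAMPLE = """\
    # TYPE hvves_connections_active gauge
    hvves_connections_active 3.0
    hvves_messages_sent_topic_total{topic="example_topic",} 12.0
    """

    metrics = parse_metrics(SAMPLE)
    print(metrics["hvves_connections_active"])

For anything beyond quick inspection, an established Prometheus client library is a safer choice than a hand-rolled parser.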