..
 This work is licensed under a Creative Commons Attribution 4.0
 International License.

:orphan:

===============================================
Parallelism improvement of Multi Cloud Services
===============================================


Problem Description
===================

Multi-Cloud currently runs Django using Django's built-in web server.
According to the Django documentation [Django_Document]_, this mode should not
be used in production: it has not gone through security audits or performance
tests and is intended for development only. In a test on a local computer,
this mode handled only ONE API request at a time, which cannot meet the
performance requirement.

.. [Django_Document] https://docs.djangoproject.com/en/dev/ref/django-admin/#runserver

Although security and scalability might improve as a side effect of
resolving the performance issue, this spec focuses only on how to improve
the parallelism (performance) of the current MultiCloud API framework.

Possible Solutions
==================

Solution 1
----------

Django is a mature framework and has its own way to improve parallelism.
Instead of running Django's built-in web server, the Django application can be
deployed on a dedicated web server. Django's primary deployment platform is
WSGI [django_deploy]_, the Python standard for web servers and applications.

.. [django_deploy] https://docs.djangoproject.com/en/2.0/howto/deployment/wsgi/

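
As a rough illustration of what WSGI means in practice, the sketch below shows
a bare WSGI application callable hosted by the standard library's ``wsgiref``
reference server. The ``application`` and ``serve`` names, the placeholder
response body and port 8000 are assumptions made for this example and are not
part of Multi-Cloud. A Django project exposes an equivalent callable that a
dedicated WSGI server can host.

.. code-block:: python

    from wsgiref.simple_server import make_server


    def application(environ, start_response):
        """Minimal WSGI callable.

        The server passes the request environ and a start_response
        function; the application returns an iterable of bytes.
        """
        body = b'{"vim_types": []}'  # placeholder payload, not real API output
        headers = [
            ("Content-Type", "application/json"),
            ("Content-Length", str(len(body))),
        ]
        start_response("200 OK", headers)
        return [body]


    def serve(host="127.0.0.1", port=8000):
        """Host the callable with the reference server (blocks until stopped)."""
        # Hypothetical address; any WSGI server could host the same callable.
        with make_server(host, port, application) as httpd:
            httpd.serve_forever()

Calling ``serve()`` starts the reference server. Because the callable only
depends on the WSGI contract, swapping in a production-grade WSGI server does
not require changing the application.
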

On the other hand, Django is very large, and it is a black box to anyone
without good knowledge of it. Adding features on top of Django may be
time-consuming. For example, the unit tests [unit_test]_ of Multi-Cloud cannot
use a regular Python test library because of Django; they have to be based on
Django's test framework. Likewise, to improve the parallelism of Multi-Cloud
services, we need to find out how Django implements it instead of using a
common method.

.. [unit_test] https://gerrit.onap.org/r/#/c/8909/

Besides, Django's code pattern is strongly oriented toward web pages, and the
best-known use cases of Django are web UIs. The current Multi-Cloud code puts
much of its logic in files named ``views.py``, although there is no view to
expose, which is confusing.

The benefit of this solution is that most of the current code needs no change.

Solution 2
----------

Given these shortcomings of Django, this solution proposes to use an
alternative framework. Eventlet [Eventlet]_ with Pecan [Pecan]_ would be the
ideal web framework in this case, because it is lightweight, lean and widely
used.

.. [Eventlet] http://eventlet.net/doc/modules/wsgi.html

.. [Pecan] https://pecan.readthedocs.io/en/latest/

For example, most OpenStack projects use this framework. It is thin enough to
leave flexibility for future architecture design.

However, it requires changing the existing code that exposes the API.


Performance Test Comparison
===========================

Test Environment
----------------

Apache Benchmark is used as the test tool. It is shipped with Ubuntu; if it is
not installed, run ``sudo apt install -y apache2-utils``.

Two virtual machines running Ubuntu 16.04 are used, both hosted on a
multi-core hardware server. One VM runs Apache Benchmark and has 1 CPU core
and 8G of memory. The other VM runs Multicloud and has 4 CPU cores and 6G of
memory.

Test Command
~~~~~~~~~~~~

``ab  -n <num of total requests> -c <concurrency level> http://<IP:port>/api/multicloud/v0/vim_types``

Test result
-----------

Note that the data may vary between test runs, but the overall result is
similar to the figures below.

100 requests, concurrency level 1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Command:  ``ab  -n 100 -c 1 http://<IP:port>/api/multicloud/v0/vim_types``

Result::

  Django runserver: takes 0.512 seconds in total, all requests succeed.
  Django+uwsgi: takes 0.671 seconds in total, all requests succeed.
  Pecan+eventlet: takes 0.149 seconds in total, all requests succeed.

10000 requests, concurrency level 100
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Command:  ``ab  -n 10000 -c 100 http://<IP:port>/api/multicloud/v0/vim_types``

Result::

  Django runserver: takes 85.326 seconds in total, all requests succeed.
  Django+uwsgi: takes 3.808 seconds in total, all requests succeed.
  Pecan+eventlet: takes 3.181 seconds in total, all requests succeed.

100000 requests, concurrency level 1000
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Command:  ``ab  -n 100000 -c 1000 http://<IP:port>/api/multicloud/v0/vim_types``

Result::

  Django runserver: Apache Benchmark quits with a timeout after completing
  an arbitrary portion of the requests.
  Django+uwsgi: takes 37.316 seconds in total, about 32% of requests fail.
  Some errors report that too many TCP sockets are open.
  Pecan+eventlet: takes 35.315 seconds in total, all requests succeed.

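
To compare the runs on a common scale, the totals above can be converted into
throughput (requests per second). The following sketch only re-computes
figures from the numbers reported in this section; the ``requests_per_second``
and ``format_report`` helper names are assumptions made for illustration. The
Django runserver run at concurrency level 1000 is omitted because it did not
complete.

.. code-block:: python

    def requests_per_second(total_requests, total_seconds):
        """Average throughput of one Apache Benchmark run."""
        return total_requests / total_seconds


    # (server, total requests, concurrency level, total seconds), as reported.
    RESULTS = [
        ("Django runserver", 100, 1, 0.512),
        ("Django+uwsgi", 100, 1, 0.671),
        ("Pecan+eventlet", 100, 1, 0.149),
        ("Django runserver", 10000, 100, 85.326),
        ("Django+uwsgi", 10000, 100, 3.808),
        ("Pecan+eventlet", 10000, 100, 3.181),
        ("Django+uwsgi", 100000, 1000, 37.316),  # about 32% of requests failed
        ("Pecan+eventlet", 100000, 1000, 35.315),
    ]


    def format_report(results=RESULTS):
        """Return one formatted line per run."""
        lines = []
        for server, total, concurrency, seconds in results:
            rate = requests_per_second(total, seconds)
            lines.append(
                "%-16s n=%-6d c=%-4d %8.1f req/s"
                % (server, total, concurrency, rate)
            )
        return lines

At concurrency level 100 this gives roughly 117 requests per second for Django
runserver, against roughly 2600 for Django+uwsgi and roughly 3100 for
Pecan+eventlet. The Django+uwsgi figure at concurrency level 1000 includes
failed requests, so it overstates the useful throughput.
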

Proposed Change
===============

Given the test results above, this spec proposes to use solution 2. Taking
the Elastic API exposure work item [jira_workitem]_ into account, Multi-Cloud
will provide a new way to expose its API. That is to say, the existing
API-exposing code needs to be rewritten in [jira_workitem]_ anyway, so the
disadvantage of solution 2 does not apply.

.. [jira_workitem] https://jira.onap.org/browse/MULTICLOUD-152

To define a clear scope for this spec, VoLTE is the use case that will be used
to test it. All functionality that VoLTE needs should be implemented in this
spec and [jira_workitem]_.

Backward compatibility
----------------------

This spec will NOT change the current API. It will NOT replace the current
API framework in R2, nor switch to the new API framework in R2. Instead, it
will provide a configuration option, named ``web_framework``, to make sure
that existing use cases and functionality are not broken. The default value of
the option will be ``django``, which still runs the current Django API
framework. An alternative value is ``pecan``, which runs the API framework
proposed in this spec. Users who do not care about the change will therefore
not be affected.

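
A minimal sketch of how the ``web_framework`` option could be read and used to
select the framework at startup. Only the option name and its two values come
from this spec; the INI format, the ``[DEFAULT]`` section, the function names
and the placeholder loaders are hypothetical and do not describe the actual
Multi-Cloud configuration code.

.. code-block:: python

    import configparser

    SUPPORTED_FRAMEWORKS = ("django", "pecan")


    def read_web_framework(config_text):
        """Return the configured framework, defaulting to 'django'."""
        parser = configparser.ConfigParser()
        parser.read_string(config_text)
        # Assumed location of the option: the [DEFAULT] section.
        value = parser.get("DEFAULT", "web_framework", fallback="django")
        value = value.strip().lower()
        if value not in SUPPORTED_FRAMEWORKS:
            raise ValueError("unsupported web_framework: %r" % value)
        return value


    def load_application(framework):
        """Hypothetical dispatcher; the loaders are stand-ins, not real code."""
        loaders = {
            "django": lambda: "current Django API framework",
            "pecan": lambda: "Pecan API framework proposed in this spec",
        }
        return loaders[framework]()

With no option set, ``read_web_framework("")`` returns ``django``, so existing
deployments keep the current behaviour unless they opt in to ``pecan``.
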

WSGI Server
-----------

Whichever API framework is used, a WSGI server needs to be provided. This
spec will use the Eventlet WSGI server. The API framework will run as an
application in the WSGI server.

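
The sketch below shows how an arbitrary WSGI application could be hosted by
the Eventlet WSGI server, following the basic pattern in the Eventlet
documentation [Eventlet]_. The ``app`` callable, the listen address and port
9001 are illustrative assumptions. ``eventlet`` is imported inside ``serve`` so
that the module still loads where the library is not installed.

.. code-block:: python

    def app(environ, start_response):
        """Stand-in for the real API application (Django or Pecan)."""
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"ok\r\n"]


    def serve(host="0.0.0.0", port=9001):
        """Run ``app`` under the Eventlet WSGI server (blocks until stopped)."""
        import eventlet
        from eventlet import wsgi

        listener = eventlet.listen((host, port))  # green listening socket
        wsgi.server(listener, app)

Because the server only needs a WSGI callable, the same ``serve`` function
could host either framework's application object.
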

Multi processes framework
-------------------------

This spec proposes to run the Multi-Cloud API server in multi-process mode.
Multiple processes provide parallel API handlers, so when multiple API
requests arrive at Multi-Cloud they can be handled simultaneously. In
addition, separate processes effectively isolate API requests from each
other, so that one API request will not affect another.

Managing multiple processes can be overwhelmingly difficult and sometimes
dangerous. A mature library can reduce the related work here, for example
oslo.service [oslo_service]_. Since oslo has been used by all OpenStack
projects for many releases and is actively maintained, it can be regarded as a
stable library.

.. [oslo_service] https://github.com/openstack/oslo.service

Number of processes
~~~~~~~~~~~~~~~~~~~

To best utilize a multi-core CPU, the number of processes will be set to the
number of CPU cores by default.

Shared socket file
~~~~~~~~~~~~~~~~~~

To make multiple processes work together behind a single port number, the
processes need to share a socket file. To achieve this, a bootstrap process
will be started first to initialize the socket file. The other processes are
then forked from this bootstrap process.

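
The bootstrap-and-fork pattern described above, together with the default
process count, can be sketched with the standard library alone. This is only
an illustration under stated assumptions: the function names, address and port
are hypothetical, the shared socket file is interpreted here as the listening
socket inherited across ``fork``, and ``os.fork`` is POSIX-only. The actual
implementation is expected to rely on a library such as oslo.service rather
than hand-written process management.

.. code-block:: python

    import os
    import socket


    def default_worker_count():
        """One worker per CPU core; fall back to 1 if the count is unknown."""
        return os.cpu_count() or 1


    def open_shared_socket(host="127.0.0.1", port=0, backlog=128):
        """Bootstrap step: create the listening socket once, before forking."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((host, port))  # port 0 lets the OS pick a free port
        sock.listen(backlog)
        return sock


    def handle_one(sock):
        """Accept one connection on the shared socket and answer it."""
        conn, _addr = sock.accept()
        with conn:
            conn.recv(65536)  # read and ignore the request in this sketch
            body = b"handled by pid %d\n" % os.getpid()
            conn.sendall(
                b"HTTP/1.0 200 OK\r\nContent-Length: %d\r\n\r\n%s"
                % (len(body), body)
            )


    def start_workers(sock, count):
        """Fork ``count`` children; each inherits ``sock`` and serves forever."""
        pids = []
        for _ in range(count):
            pid = os.fork()
            if pid == 0:  # child process
                try:
                    while True:
                        handle_one(sock)
                finally:
                    os._exit(0)  # never fall back into the parent's code path
            pids.append(pid)  # parent records the child PID
        return pids

A bootstrap process would call
``start_workers(open_shared_socket(), default_worker_count())`` and then wait
on the returned PIDs. All workers accept connections on the same port, and the
kernel distributes incoming connections among them.
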

Work Items
==========

#. Add WSGI server.
#. Run Pecan application in WSGI server.
#. Add multiple processes support.
#. Update deploy script to support new API framework.