Update the README.md documentation.

author gfraboni <gino.fraboni@amdocs.com>

Thu, 11 May 2017 15:36:07 +0000 (11:36 -0400)

committer gfraboni <gino.fraboni@amdocs.com>

Thu, 11 May 2017 15:49:25 +0000 (11:49 -0400)
diff --git a/BULK.md b/BULK.md

new file mode 100644 (file)

index 0000000..534bbd0
--- /dev/null
+++ b/BULK.md
@@ -0,0 +1,142 @@
+# Bulk Operations\r
+\r
+## Overview\r
+\r
+Bulk operations allow the client to bundle a number of actions into a single REST request.\r
+\r
+It is important to note that the individual operations bundled into a bulk request are considered by the Search Service to be completely independent operations.  This has a few consequences:\r
+\r
+* No guarantees are made with respect to the order in which the individual operations will be processed by the document store.\r
+\r
+* There is no implied transactionality between the operations.  The operations may succeed or fail independently of one another, and it is entirely possible to get back a result set indicating a mix of success and failure results for the individual operations.\r
+\r
+## Syntax\r
+\r
+The request payload of a bulk operation must be structured in the following manner (we will flesh out this pseudo-json with a concrete example further down):\r
+\r
+    [\r
+        { <operation> : { <meta-data>, <document>  } },\r
+                            .\r
+                            .\r
+        { <operation> : { <meta-data>, <document>  } }\r
+    ]\r
+    \r
+**Operation**\r
+The following table describes the operations which are supported as part of a _Bulk_ request:\r
+\r
+| Operation | Behaviour                                      | Expected Meta Data     | Expected Payload  |\r
+| --------- | ---------------------------------------------- | ---------------------- | ----------------- |\r
+| create    | Insert a new document into the document store. | document url           | document contents |   \r
+| update    | Update an existing document.                   | document url, etag     | document contents |\r
+| delete    | Remove a document from the document store.     | document url, etag     | none              |  \r
+           \r
+**Meta-Data**\r
+Depending on the operation being requested, certain additional meta-data is required for the _Search Data Service_ to be able to carry out the operation.  These are described in the _operations_ table above.\r
+\r
+**Document**\r
+For those operations which involve creating or updating a _Document_, the contents of the document must be provided.\r
+\r
+_Example - Simple Bulk Request including all supported operations:_\r
+\r
+Request Payload:\r
+\r
+       [\r
+         {\r
+           "create": {\r
+             "metaData": {\r
+               "url": "/indexes/my-index/documents/"\r
+             },\r
+             "document": {\r
+               "field1": "value1",\r
+               "field2": "value2"\r
+             }\r
+           }\r
+         },\r
+         {\r
+           "update": {\r
+             "metaData": {\r
+               "url": "/indexes/my-other-index/documents/3",\r
+               "etag": "5"\r
+             },\r
+             "document": {\r
+               "field1": "some-value"\r
+             }\r
+           }\r
+         },\r
+         {\r
+           "delete": {\r
+             "metaData": {\r
+               "url": "/indexes/my-index/documents/7"\r
+             }\r
+           }\r
+         }\r
+       ]\r
+\r
+Response Payload:\r
+\r
+       { \r
+           "total_operations": 3, \r
+           "total_success": 2, \r
+           "total_fails": 1, \r
+           "results": [\r
+               {\r
+                   "operation": "create", \r
+                   "url": "/services/search-data-service/v1/indexes/my-index/documents/1", \r
+                   "etag": "1", \r
+                   "status-code": "409", \r
+                   "status-message": "[default][1]: document already exists"\r
+               }, \r
+               {\r
+                   "operation": "update", \r
+                   "url": "/services/search-data-service/v1/indexes/my-other-index/documents/3", \r
+                   "etag": 6, \r
+                   "status-code": "200", "status-message": "OK"\r
+               }, \r
+               {\r
+                   "operation": "delete", \r
+                   "url": "/services/search-data-service/v1/indexes/my-index/documents/7", \r
+                   "status-code": "200", "status-message": "OK"\r
+               }\r
+           ]\r
+       }\r
+       \r
+## API\r
+\r
+**Submit A Bulk Operation**\r
+---\r
+**URL**\r
+\r
+    https://{host}:9509/services/search-data-service/v1/search/bulk/\r
+\r
+**Method** \r
+\r
+    POST\r
+\r
+**URL Params**\r
+\r
+    None\r
+\r
+**Request Header**\r
+\r
+    Accept          = application/json\r
+    X-TransactionId = Unique id set by client (for logging purposes)\r
+    X-FromAppId     = Application identifier (for logging purposes)\r
+    Content-Type    = application/json\r
+    \r
+**Request Payload**\r
+\r
+    Set of bulk operations to be executed (see Syntax Section) \r
+\r
+**Success Response**\r
+\r
+    Code:      207 (Multi-Status)\r
+    Header(s): None\r
+    Body:      JSON format result set which includes individual status codes for each requested operation.  \r
+    \r
+**Error Response**\r
+\r
+    400 - Bad Request\r
+    403 - Unauthorized\r
+    500 - Internal Error\r
+\r
+---\r
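The bulk request syntax documented in BULK.md above can be sketched in Python as follows. This is a minimal illustration that assumes only the payload and response shapes shown in that file. The helper names `bulk_op` and `failed_results` are hypothetical, and the etag passed for the delete is an invented value. Nothing here is taken from the service's own code.

```python
import json


def bulk_op(operation, url, document=None, etag=None):
    """Build one bulk entry of the form {<operation>: {"metaData": ..., "document": ...}}."""
    if operation not in ("create", "update", "delete"):
        raise ValueError("unsupported bulk operation: " + operation)
    meta = {"url": url}
    if etag is not None:
        meta["etag"] = etag
    body = {"metaData": meta}
    if document is not None:
        body["document"] = document
    return {operation: body}


def failed_results(response):
    """Return the per-operation results whose status code is not in the 2xx range."""
    return [r for r in response["results"]
            if not str(r["status-code"]).startswith("2")]


# Same three operations as the documented example (etag "2" is a made-up value).
payload = [
    bulk_op("create", "/indexes/my-index/documents/", {"field1": "value1"}),
    bulk_op("update", "/indexes/my-other-index/documents/3",
            {"field1": "some-value"}, etag="5"),
    bulk_op("delete", "/indexes/my-index/documents/7", etag="2"),
]
request_body = json.dumps(payload)
```

Because the operations are independent, a client would typically inspect every entry of `results` (for example with `failed_results`) rather than rely on the overall 207 status alone.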
diff --git a/CONCEPTS.md b/CONCEPTS.md

new file mode 100644 (file)

index 0000000..8dedf01
--- /dev/null
+++ b/CONCEPTS.md
@@ -0,0 +1,87 @@
+# High Level Concepts\r
+\r
+The _Search Data Service_ is built around a set of fundamental concepts and building blocks, which we will describe at a high level here.\r
+\r
+We will explore how to express these concepts in concrete terms when interacting with the _Search Data Service_ in the sections on API syntax.\r
+\r
+## Index\r
+An _Index_ can be thought of as a collection of _Documents_, and represents the largest granularity of data grouping in the store.\r
+\r
+Multiple _Indexes_ may be defined within the document store in order to segregate data that is different enough that we don't typically want to consider them within the same contexts.\r
+\r
+For details regarding how to manipulate document indexes, please refer to the [Index API](./INDEXES.md) page.\r
+\r
+## Document\r
+_Documents_ are the things we are putting into our Document Store (go figure!) and represent the smallest independent unit of data with which the document store is concerned.\r
+\r
+Documents are composed of _Fields_ which contain the documents' data.\r
+\r
+### Document Fields\r
+_Fields_ represent the individual bits of data which make up a document and can be thought of as describing the _structure_ of the document.\r
+_Fields_ have a number of attributes associated with them which instruct the _Search Service_ about how to treat the data which they contain.\r
+\r
+\r
+| Field           | Description                                                              | \r
+| :-------------- | :----------------------------------------------------------------------- |\r
+| name            | Identifier for the field                                                 | \r
+| searchable      | Indicates whether or not this field should be indexed.  Defaults to true |\r
+| search_analyzer | Which analyzer should be used for queries.  Defaults to TBD              |\r
+| index_analyzer  | Which analyzer will be used for indexing                                 |\r
+\r
+For details regarding how to manipulate documents, please refer to the [Document API](./DOCUMENTS.md) page.\r
+\r
+## Analysis\r
+Field analysis is the process of taking a _Document_'s fields and breaking them down into tokens that can be used for indexing and querying purposes.  How a document is analyzed has a direct impact on the sort of _Queries_ that we can expect to perform against that data.\r
+\r
+### Analyzers\r
+An _Analyzer_ can be envisioned as describing a pipeline of operations that a document is run through in the process of persisting it in the document store.  These can be broken down as follows:\r
+\r
+     +--------------------+\r
+     |    DATA STREAM     |\r
+     | (ie: the document) |\r
+     +--------------------+\r
+               |\r
+               V\r
+       \---------------/       \r
+        \  Character  / -------------- Apply character level transformations on the data stream.\r
+         \ Filter(s) /\r
+          \---------/\r
+               |\r
+               V\r
+       +----------------+              \r
+       | SANITIZED DATA |\r
+       |     STREAM     |\r
+       +----------------+\r
+               |\r
+               V\r
+        \-------------/      \r
+         \ Tokenizer / --------------- Break the data stream into tokens.\r
+          \---------/\r
+               |\r
+               V\r
+        +--------------+\r
+        | TOKEN STREAM |\r
+        +--------------+\r
+               |\r
+               V\r
+     \--------------------/      \r
+      \  Token Filter(s) / ----------- Apply transformations at the token level (add, remove, or transform tokens)\r
+       \----------------/\r
+               |\r
+               V\r
+      +-------------------+   \r
+      | TRANSFORMED TOKEN |\r
+      |      STREAM       |\r
+      +-------------------+\r
+\r
+**Character Filter(s)**\r
+The input stream may first be passed through one or more _Character Filters_ which perform transformations in order to clean up or normalize the stream contents (for example, stripping out or transforming HTML tags, converting all characters to lower case, etc.).\r
+\r
+**Tokenizer**\r
+The resulting data stream is then tokenized into individual terms.  The choice of tokenizer at this stage determines the rules under which the input stream is split (for example, break the text into terms whenever a whitespace character is encountered).\r
+\r
+**Token Filter(s)**\r
+Each term resulting from the previous stage in the pipeline is then run through any number of _Token Filters_ which may apply various transformations to the terms, such as filtering out certain terms, generating additional terms (for example, adding synonyms for a given term), etc.\r
+\r
+The set of tokens resulting from the analysis pipeline operations are ultimately used to create indexes in the document store that can later be used to retrieve the data in a variety of ways.\r
+\r
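The analysis pipeline described in CONCEPTS.md above (character filters, then a tokenizer, then token filters) can be sketched in Python as follows. This is an illustrative toy only: the function names and the stop-word list are assumptions made for the example, and the real analysis is performed by the back end document store, not by code like this.

```python
import html
import re


def strip_html(text):
    """Character filter: replace HTML tags with spaces and unescape entities."""
    return html.unescape(re.sub(r"<[^>]+>", " ", text))


def whitespace_tokenizer(text):
    """Tokenizer: split the data stream on runs of whitespace."""
    return text.split()


def lowercase(tokens):
    """Token filter: transform every token to lower case."""
    return [t.lower() for t in tokens]


def remove_stopwords(tokens, stopwords=frozenset({"the", "a", "of"})):
    """Token filter: remove tokens found in a (hypothetical) stop-word list."""
    return [t for t in tokens if t not in stopwords]


def analyze(text, char_filters, tokenizer, token_filters):
    """Run text through the three pipeline stages in order."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


tokens = analyze("<b>The</b> Hound of the Baskervilles",
                 [strip_html], whitespace_tokenizer,
                 [lowercase, remove_stopwords])
# tokens is ["hound", "baskervilles"]
```

The order of the token filters matters in this sketch: lower-casing runs first so that "The" is matched by the lower-case stop-word list.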
diff --git a/DOCUMENTS.md b/DOCUMENTS.md

new file mode 100644 (file)

index 0000000..ed07543
--- /dev/null
+++ b/DOCUMENTS.md
@@ -0,0 +1,267 @@
+# Documents\r
+\r
+## Overview\r
+_Documents_ represent the _things_ that we want to store in the _Search Data Service_ and are themselves, basically, a set of fields containing the data that we want to persist.\r
+\r
+\r
+## Syntax\r
+_Document_ contents are specified as a simple JSON object.  The structure of the _Document_ JSON should match the schema provided to the _Search Data Service_ when the _Index_ was created.\r
+\r
+For a discussion of how to specify the _Document Structure_, refer to [Index API](./INDEXES.md). \r
+\r
+**Example - Simple Document**\r
+\r
+    {\r
+       "FirstName": "Bob",\r
+       "LastName": "Smith",\r
+       "Age": 43\r
+    }\r
+\r
+**Example - Document With Nested Fields**\r
+\r
+    {\r
+        "FirstName": "Sherlock",\r
+        "LastName": "Holmes",\r
+        "Address": {\r
+               "Street": "222B Baker",\r
+               "City": "London",\r
+               "Country": "England"\r
+        }\r
+    }\r
+    \r
+## API\r
+\r
+### Create Document\r
+Persists a _Document_ in an _Index_ in the _Search Data Service_.\r
+\r
+Note that there are two variants of document creation: with and without supplying an id to associate with the document.\r
+\r
+**Create Document (No Id Specified)**\r
+\r
+If no _Id_ is provided by the client, then a unique identifier will be generated by the _Search Data Service_.\r
+\r
+---\r
+**URL**\r
+\r
+    https://{host}:9509/services/search-data-service/v1/search/indexes/{index}/documents/\r
+\r
+**Method** \r
+\r
+    POST\r
+\r
+**URL Params**\r
+\r
+    index - The name of the _Index_ to persist the _Document_ in.\r
+\r
+**Request Payload**\r
+\r
+    Document contents expressed as a JSON object. (see **Syntax**) \r
+\r
+**Success Response**\r
+\r
+    Code:      201\r
+    Header(s): ETag = ETag for the document instance that was just created.\r
+    Body:      URL identifying the document that was just created.  \r
+               Example:\r
+                     {"url": "indexes/myindex/documents/AVgGq2jz4aZeqcwCmlQY"}\r
+    \r
+**Error Response**\r
+\r
+    400 - Bad Request\r
+    403 - Unauthorized\r
+    500 - Internal Error\r
+\r
+---\r
+\r
+**Create Document (Client Supplied Id)**\r
+\r
+If the client supplies an identifier for its document then that is what will be used for the document id.\r
+\r
+_NOTE: If a document id is supplied then it is the responsibility of the client to ensure uniqueness._  \r
+\r
+---\r
+**URL**\r
+\r
+    https://{host}:9509/services/search-data-service/v1/search/indexes/{index}/documents/{id}\r
+\r
+**Method** \r
+\r
+    PUT\r
+\r
+**URL Params**\r
+\r
+    index - The name of the _Index_ to persist the Document in.\r
+    id    - The identifier to associate with this Document.\r
+\r
+**Request Payload**\r
+\r
+    Document contents expressed as a JSON object. (see **Syntax**) \r
+\r
+**Success Response**\r
+\r
+    Code:      201\r
+    Header(s): ETag = ETag for the document instance that was just created.\r
+    Body:      URL identifying the document that was just created.  \r
+               Example:\r
+                     {"url": "indexes/myindex/documents/AVgGq2jz4aZeqcwCmlQY"}\r
+    \r
+**Error Response**\r
+\r
+    400 - Bad Request\r
+    403 - Unauthorized\r
+    409 - Conflict -- Will occur if a document with that Id already exists\r
+    500 - Internal Error\r
+\r
+---\r
+\r
+\r
+### Retrieve Document\r
+Perform a straight look up of a particular _Document_ by specifying its unique identifier.\r
+\r
+---\r
+**URL**\r
+\r
+    https://{host}:9509/services/search-data-service/v1/search/indexes/{index}/documents/{id}\r
+\r
+**Method** \r
+\r
+    GET\r
+\r
+**URL Params**\r
+\r
+    index - The name of the _Index_ containing the Document.\r
+    id    - The identifier of the Document to retrieve.\r
+\r
+**Request Payload**\r
+\r
+    NONE \r
+\r
+**Success Response**\r
+\r
+    Code:      200\r
+    Header(s): ETag = ETag indicating the current version of the document.\r
+    Body:      Document contents expressed as a JSON object.  \r
+               Example:\r
+                     {\r
+                         "url": "indexes/myindex/documents/AVgGq2jz4aZeqcwCmlQY",\r
+                         "content": {\r
+                             "firstName": "Bob",\r
+                             "lastName": "Smith",\r
+                             "age": 43\r
+                         }    \r
+                     }\r
+    \r
+**Error Response**\r
+\r
+    400 - Bad Request\r
+    404 - Not Found\r
+    500 - Internal Error\r
+\r
+---\r
+\r
+### Update Document\r
+Replace the contents of a document which already exists in the _Search Data Service_.\r
+\r
+**Optimistic Locking On Update**\r
+\r
+The _Search Data Service_ employs an optimistic locking mechanism on _Document_ updates which works as follows:\r
+\r
+The ETag response header field is set in the response for each document create or update to indicate the most recent version of the document in the document store.\r
+\r
+When performing a _Document_ update, this value must be supplied in the _If-Match_ field in the request header.  Failure to supply this value, or failure to provide a value which matches the version in the document store will result in a request failure with a 412 (Precondition Failed) error.\r
+\r
+---\r
+**URL**\r
+\r
+    https://{host}:9509/services/search-data-service/v1/search/indexes/{index}/documents/{id}\r
+\r
+**Method** \r
+\r
+    PUT\r
+\r
+**URL Params**\r
+\r
+    index - The name of the _Index_ containing the Document.\r
+    id    - The identifier of the Document to update.\r
+\r
+**Request Header**\r
+\r
+    Accept          = application/json\r
+    X-TransactionId = Unique id set by client (for logging purposes)\r
+    X-FromAppId     = Application identifier (for logging purposes)\r
+    Content-Type    = application/json   \r
+    If-Match        = The ETag value for the document to be updated.\r
+\r
+**Request Payload**\r
+\r
+    Document contents expressed as a JSON object. (see Syntax Section) \r
+\r
+**Success Response**\r
+\r
+    Code:      200\r
+    Header(s): ETag = ETag indicating the current version of the document.\r
+    Body:      URL identifying the document that was just updated.  \r
+               Example:\r
+                     {"url": "indexes/myindex/documents/AVgGq2jz4aZeqcwCmlQY"}\r
+    \r
+**Error Response**\r
+\r
+    400 - Bad Request\r
+    403 - Unauthorized\r
+    404 - Not Found\r
+    412 - Precondition Failed -- Supplied ETag does not match the version in the document store.\r
+    500 - Internal Error\r
+\r
+---\r
+\r
+### Delete Document\r
+Remove an existing _Document_ from an _Index_ in the _Search Data Service_.  \r
+Note that, as with updates, the request must carry the current ETag of the _Document_ being removed.\r
+\r
+**Optimistic Locking On Delete**\r
+\r
+As for _Document_ updates, the ETag value must be supplied in the _If-Match_ field in the request header.  \r
+\r
+Failure to supply this value, or failure to provide a value which matches the version in the document store will result in a request failure with a 412 (Precondition Failed) error.\r
+\r
+---\r
+**URL**\r
+\r
+    https://{host}:9509/services/search-data-service/v1/search/indexes/{index}/documents/{id}\r
+\r
+**Method** \r
+\r
+    DELETE\r
+\r
+**URL Params**\r
+\r
+    index - The name of the _Index_ containing the Document.\r
+    id    - The identifier of the Document to delete.\r
+\r
+**Request Header**\r
+\r
+    Accept          = application/json\r
+    X-TransactionId = Unique id set by client (for logging purposes)\r
+    X-FromAppId     = Application identifier (for logging purposes)\r
+    Content-Type    = application/json\r
+    If-Match        = The ETag value for the document to be deleted.\r
+    \r
+**Request Payload**\r
+\r
+    NONE \r
+\r
+**Success Response**\r
+\r
+    Code:      200\r
+    Header(s): None.\r
+    Body:      None.  \r
+    \r
+**Error Response**\r
+\r
+    400 - Bad Request\r
+    403 - Unauthorized\r
+    404 - Not Found\r
+    412 - Precondition Failed -- Supplied ETag does not match the version in the document store.\r
+    500 - Internal Error\r
+\r
+---\r
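The optimistic locking rules described in DOCUMENTS.md above can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the service's implementation: the class and exception names are hypothetical, ETags are modelled as simple version counters, and the exceptions merely stand in for the documented 409 and 412 responses.

```python
class Conflict(Exception):
    """Stands in for an HTTP 409 (Conflict) response."""


class PreconditionFailed(Exception):
    """Stands in for an HTTP 412 (Precondition Failed) response."""


class InMemoryStore:
    """Toy document store that tracks one ETag per document id."""

    def __init__(self):
        self._docs = {}  # id -> (etag, contents)

    def create(self, doc_id, contents):
        if doc_id in self._docs:
            raise Conflict("document already exists")
        self._docs[doc_id] = ("1", contents)
        return "1"

    def _check(self, doc_id, if_match):
        # A missing (None) or stale If-Match value is rejected alike.
        etag, _ = self._docs[doc_id]
        if if_match != etag:
            raise PreconditionFailed("ETag does not match stored version")
        return etag

    def update(self, doc_id, contents, if_match=None):
        etag = self._check(doc_id, if_match)
        new_etag = str(int(etag) + 1)
        self._docs[doc_id] = (new_etag, contents)
        return new_etag

    def delete(self, doc_id, if_match=None):
        self._check(doc_id, if_match)
        del self._docs[doc_id]


store = InMemoryStore()
etag = store.create("42", {"FirstName": "Bob"})
etag2 = store.update("42", {"FirstName": "Robert"}, if_match=etag)
try:
    store.update("42", {"FirstName": "Rob"}, if_match=etag)  # stale ETag
    stale_rejected = False
except PreconditionFailed:
    stale_rejected = True
```

A client that receives the 412 equivalent would normally re-read the document to obtain the current ETag and then retry its change.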
diff --git a/INDEXES.md b/INDEXES.md

new file mode 100644 (file)

index 0000000..e28dd5e
--- /dev/null
+++ b/INDEXES.md
@@ -0,0 +1,166 @@
+# Document Indexes\r
+\r
+## Overview\r
+An index can be thought of as a collection of _Documents_, and represents the largest granularity of data grouping in the store.\r
+\r
+The first step in persisting documents via the _Search Data Service_ is to create the _Index_ into which we will put the documents.\r
+\r
+This is where we define the structure of the _Documents_ that we will be storing in our _Index_, including how we want the data in our documents to be analyzed and indexed so that they can be queried for in interesting and useful ways.\r
+\r
+## Syntax\r
+When we create an _Index_ we need to define the structure of the _Documents_ that we will be storing in it.  Specifically, we must enumerate the _Fields_ that make up the _Document_, the type of data we expect to be stored in each _Field_, and how we want that data to be indexed by the back end document store.\r
+\r
+We express this as a JSON structure, enumerating the _Fields_ in our document, where each _Field_ is expressed as a JSON object which conforms to the following schema:\r
+ \r
+    {\r
+       "name":            {"type": "string" },\r
+       "data-type":       {"type": "string" },\r
+       "format":          {"type": "string" },\r
+       "searchable":      {"type": "boolean"},\r
+       "search-analyzer": {"type": "string" },\r
+       "index-analyzer":  {"type": "string" }\r
+    }\r
+    \r
+Where,\r
+\r
+    name            = An arbitrary label to assign to the _Field_\r
+    data-type       = One of:  string, date, long, double, boolean, ip, or nested*\r
+    format          = For 'date' type fields, the date format string to use when persisting the field.\r
+    searchable      = true  - field will be indexed,\r
+                      false - field will not be indexed\r
+    search-analyzer = Default analyzer to use for queries if one is not specified as part of the query\r
+                      One of:  whitespace or ngram.\r
+    index-analyzer  = Analyzer to use for this field when indexing documents being persisted to the Index\r
+                      One of:  whitespace or ngram.\r
+                    \r
+\* **Nested** fields:\r
+If the _data-type_ is specified as _nested_, then this indicates that the contents of the field is itself a set of document fields.  In this case, the _Field_ definition should contain an additional entry named _sub-fields_, which is a JSON array containing the definitions of the sub-fields.  \r
+\r
+**Example - A simple document definition which includes a 'date' type field.**\r
+\r
+_Take note of the following:_\r
+* For our 'BirthDate' field, which is a date, we also specify the format string to use when storing the field's contents.\r
+\r
+    {\r
+        "fields": [\r
+               {"name": "FirstName", "data-type": "string"},\r
+               {"name": "LastName", "data-type": "string"},\r
+               {"name": "BirthDate", "data-type": "date", "format": "MMM d y HH:m:s"}\r
+        ]\r
+    }\r
+\r
+\r
+**Example - An example document definition containing nested sub-fields.**\r
+  \r
+_Take note of the following:_\r
+* It is perfectly valid for a nested field to itself contain nested fields\r
+* For the _Tracks.Title_ field, we are specifying that the _whitespace_ analyzer should be applied for both indexing and queries. \r
+\r
+    {\r
+        "fields": [\r
+               {"name": "Album", "data-type": "string"},\r
+               {"name": "Group", "data-type": "string"},\r
+               {"name": "Tracks", "data-type": "nested", "sub-fields": [\r
+                       {"name": "Title", "data-type": "string", "index-analyzer": "whitespace", "search-analyzer": "whitespace"},\r
+                       {"name": "Length", "data-type": "long"}\r
+               ]},\r
+               {"name": "BandMembers", "data-type": "nested", "sub-fields": [\r
+                       {"name": "FirstName", "data-type": "string"},\r
+                       {"name": "LastName", "data-type": "string"},\r
+                       {"name": "Address", "data-type": "nested", "sub-fields": [\r
+                               {"name": "Street", "data-type": "string"},\r
+                               {"name": "City", "data-type": "string"},\r
+                               {"name": "Country", "data-type": "string"}\r
+                       ]}\r
+               ]}\r
+        ]\r
+    }\r
+## API\r
+\r
+### Create Index\r
+Define a new _Index_ in the _Search Data Service_.\r
+\r
+---\r
+**URL**\r
+\r
+    https://{host}:9509/services/search-data-service/v1/search/indexes/{index}/\r
+\r
+**Method** \r
+\r
+    PUT\r
+\r
+**URL Params**\r
+\r
+    index - The name to assign to the document index we are creating.\r
+\r
+**Request Header**\r
+\r
+    Accept          = application/json\r
+    X-TransactionId = Unique id set by client (for logging purposes)\r
+    X-FromAppId     = Application identifier (for logging purposes)\r
+    Content-Type    = application/json\r
+    \r
+**Request Payload**\r
+\r
+    JSON format document structure for this index (see Syntax Section)\r
+\r
+**Success Response**\r
+\r
+    Code:      201\r
+    Header(s): None\r
+    Body:      JSON structure containing the URL for the created Index  \r
+               Example:\r
+                     {"url": "indexes/myindex"}\r
+    \r
+**Error Response**\r
+\r
+    400 - Bad Request\r
+    403 - Unauthorized\r
+    500 - Internal Error\r
+\r
+---\r
+\r
+\r
+### Delete Index\r
+Remove an existing _Index_ from the _Search Data Service_.  \r
+Note that this results in the removal of all _Documents_ that are stored in the _Index_ at the time that the DELETE operation occurs.\r
+\r
+---\r
+**URL**\r
+\r
+    https://{host}:9509/services/search-data-service/v1/search/indexes/{index}/\r
+\r
+**Method** \r
+\r
+    DELETE\r
+\r
+**URL Params**\r
+\r
+    index - The name of the document index to delete.\r
+\r
+**Request Header**\r
+\r
+    Accept          = application/json\r
+    X-TransactionId = Unique id set by client (for logging purposes)\r
+    X-FromAppId     = Application identifier (for logging purposes)\r
+    Content-Type    = application/json\r
+\r
+**Request Payload**\r
+\r
+    None\r
+\r
+**Success Response**\r
+\r
+    Code:      201\r
+    Header(s): None\r
+    Body:      JSON structure containing the URL for the created Index  \r
+               Example:\r
+                     {"url": "indexes/myindex"}\r
+    \r
+**Error Response**\r
+\r
+    400 - Bad Request\r
+    403 - Unauthorized\r
+    500 - Internal Error\r
+\r
+---\r
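The field schema described in INDEXES.md above lends itself to a client-side sanity check before a Create Index request is sent. The following Python sketch is one possible way to do that. The function names are hypothetical, and treating `name` and `data-type` as mandatory is an assumption drawn from the examples, not a documented rule of the service.

```python
DATA_TYPES = {"string", "date", "long", "double", "boolean", "ip", "nested"}
ANALYZERS = {"whitespace", "ngram"}


def validate_field(field, path=""):
    """Return a list of problems found in one field definition."""
    name = field.get("name")
    label = path + (name if isinstance(name, str) and name else "<unnamed>")
    errors = []
    if not isinstance(name, str) or not name:
        errors.append(label + ": 'name' must be a non-empty string")
    data_type = field.get("data-type")
    if data_type not in DATA_TYPES:
        errors.append("%s: unknown data-type %r" % (label, data_type))
    if "searchable" in field and not isinstance(field["searchable"], bool):
        errors.append(label + ": 'searchable' must be a boolean")
    for key in ("search-analyzer", "index-analyzer"):
        if key in field and field[key] not in ANALYZERS:
            errors.append("%s: '%s' must be whitespace or ngram" % (label, key))
    if data_type == "nested":
        sub_fields = field.get("sub-fields")
        if not isinstance(sub_fields, list):
            errors.append(label + ": nested field needs a 'sub-fields' array")
        else:
            # Nested fields may themselves contain nested fields.
            for sub in sub_fields:
                errors.extend(validate_field(sub, label + "."))
    return errors


def validate_index_definition(definition):
    """Validate every entry of the top-level 'fields' array."""
    errors = []
    for field in definition.get("fields", []):
        errors.extend(validate_field(field))
    return errors


definition = {
    "fields": [
        {"name": "Album", "data-type": "string"},
        {"name": "Tracks", "data-type": "nested", "sub-fields": [
            {"name": "Title", "data-type": "string",
             "index-analyzer": "whitespace", "search-analyzer": "whitespace"},
            {"name": "Length", "data-type": "long"},
        ]},
    ]
}
problems = validate_index_definition(definition)
```

Errors are reported with a dotted path (for example `Tracks.Title`) so that problems in deeply nested sub-fields can be located.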
diff --git a/README.md b/README.md

index 97831cc..f3f1c73 100644 (file)
--- a/README.md
+++ b/README.md
@@ -1,273 +1,161 @@
-# Search Engine Micro Service
+# Search Data Service Micro Service
  
-The _Search Engine_ micro service exposes APIs via REST which allow clients to interact with the search database back end without requiring direct knowledge of or interaction with the underlying technology.
- 
-## High Level Concepts
-This section establishes some of the terminology and concepts that relate to interacting with the _Search Engine_ service.
-A much more detailed examination of these concepts can be found on the  [Search Engine Design Share](http://d2athenaconf:8090/confluence/display/AAI/AAI-4633%3A+Search+DB+Abstraction%3A+Expose+REST+Interface) Confluence page.
+## Overview
+The _Search Data Service_ acts as an abstraction layer for clients which have a need to interact with data which is most suitably persisted in a searchable document store.  The purpose of imposing an abstraction service between the client and the document store itself is to decouple clients from any direct knowledge of any specific document store technology, allowing the underlying technology to be swapped out without a direct impact to any clients which interact with search data.
+
+Please refer to the following sub-sections for more detailed information:
+
+[High Level Concepts](./CONCEPTS.md) - Discussion of the high level concepts and building blocks of the _Search Data Service_.
+
+[Index API](./INDEXES.md) - Details regarding manipulating document indexes.
+
+[Document API](./DOCUMENTS.md) - Details regarding manipulating documents.
  
-### Documents
-_Documents_ are the _things_ that we want to put into our document store.  At its most basic, a _document_ is a collection of _fields_ which contain the data that we want to be able to store and query.
+[Search API](./SEARCH.md) - Details regarding querying the data set.
  
-_Fields_ are defined as having a name, a type, and optional parameters indicating whether or not the field is intended to be searchable, and if so, how it should be indexed.
+[Bulk API](./BULK.md) - Details regarding submitting bulk operation requests.
  
-### Indexes
-An _index_ is essentially a collection of _documents_.  It is the top-level container into which we will store our documents.  A single data store may have multiple _indexes_ for the purposes of segregating different types of data (most queries are performed across all documents within  *single* instance).
  
----
  ## Getting Started
  
  ### Building The Micro Service
  
-After checking out the project, execute the following Maven command from the project's top level directory:
+After cloning the project, execute the following Maven command from the project's top level directory to build the project:
  
      > mvn clean install
+
+Now, you can build your Docker image:
+
+    > docker build -t openecomp/search-data-service target 
      
-### Running The Micro Service Locally
-To run the microservice in your local environment, execute the following Maven command from the project's top level directory:
+### Deploying The Micro Service 
  
-    > mvn -P runAjsc
+Push the Docker image that you have built to your Docker repository and pull it down to the location that you will be running the search service from.
  
-### Running The Micro Service Within An Eclipse Environment
-It is often extremely useful to be able to run a micro service from within Eclipse in order to set breakpoints and perform general debugging activities.
+Note that the current version of the _Search Data Service_ uses _ElasticSearch_ as its document store back end.  You must therefore deploy an instance of ElasticSearch and make it accessible to the _Search Data Service_.
  
-For a good reference on how to launch any of the D2 micro services from within an Eclipse environment, refer to the following Confluence page: [Running An AJSC Container Within Eclipse](http://d2athenaconf:8090/confluence/pages/viewpage.action?pageId=1840887#DevelopingMicroserviceswithAT&T-RunninganAJSCContainerwithinEclipse)
+**Create the following directories on the host machine:**
  
----
+    /logs
+    /opt/app/search-data-service/appconfig
+    
+You will be mounting these as data volumes when you start the Docker container.
  
-## Public Interfaces
+**Populate these directories as follows:**
  
-### Echo Service
-The _Search Database Abstraction_ micro service supports the standard echo service to allow it to be 'pinged' to verify that the service is up and responding.
+##### Contents of /opt/app/search-data-service/appconfig
  
-The echo service is reachable via the following REST end point:
+The following files must be present in this directory on the host machine:
  
-    http://{host}:9509/services/search-data-service/v1/jaxrsExample/jaxrs-services/echo/{input}
+_analysis-config.json_
+Create this file with exactly the contents below:
  
-### Indexes
-The _Search Engine_ service supports simple creation and deletion of document indexes via the following REST API calls:
  
-##### Create Index
-    Method         : POST
-    URL            : https://<host>:9509/services/search-data-service/v1/search/indexes/<index>/
-    URL Params     : index - The name of the index to be created.
-    Request Payload:
-        A document structure expressed as json.
-        
-    Response Payload:
-        {"url": "< resource location of the index >"
+    [
+        {
+                "name": "whitespace_analyzer",
+                "description": "A standard whitespace analyzer.",
+                "behaviours": [
+                        "Tokenize the text using white space characters as delimeters.",
+                        "Convert all characters to lower case.",
+                        "Convert all alphanumeric and symbolic Unicode characters above the first 127 ASCII characters into their ASCII equivalents."
+                ],
+                "tokenizer": "whitespace",
+                "filters": [
+                        "lowercase",
+                        "asciifolding"
+                ]
+        },
+        {
+                "name": "ngram_analyzer",
+                "description": "An analyzer which performs ngram filtering on the data stream.",
+                "behaviours": [
+                        "Tokenize the text using white space characters as delimeters.",
+                        "Convert all characters to lower case.",
+                        "Convert all alphanumeric and symbolic Unicode characters above the first 127 ASCII characters into their ASCII equivalents.",
+                        "Apply ngram filtering using the following values for minimum and maximum size in codepoints of a single n-gram: minimum = 1, maximum = 2."
+                ],
+                "tokenizer": "whitespace",
+                "filters": [
+                        "lowercase",
+                        "asciifolding",
+                        "ngram_filter"
+                ]
+        }
+    ]
  
-##### Delete Index
-    Method         : DELETE
-    URL            : http://<host>:9509/services/search-data-service/v1/search/indexes/<index>/
-    URL Params     : index - The name of the index to be deleted.
-    Request Payload:
-        None    
-        
-   
-### Documents
- 
-##### Create Document Without Specifying a Document Identifier
-Documents can be created via a POST request with no document identifier specified.  In this case the document store will generate an identifier to associate with the document.
-
-    Method         : POST
-    URL            : https://<host>:9509/services/search-data-service/v1/search/indexes/<index>/documents/
-    URL Params     : index       - The name of the index to create the document in.
-    Request Payload:
-        Document contents expressed as a JSON object containing key/value pairs.
-        
-    Response Payload:
-        { "etag": "string", "url": "string" }
-        
-##### Create or Update Document With a Specified Document Identifier
-Documents can also be created via a PUT request which includes an identifier to associate with the document.  The put endpoint is actually used for both creates and updates, where this is distinguished as follows:
-* If the request header DOES NOT include a value in the If-Match field, then the request is assumed to be a document create.
-* If the request header DOES contain a value in the If-Match field, then the request is assumed to be a document update.
-
-    Method         : PUT
-    URL            : https://<host>:9509/services/search-data-service/v1/search/indexes/<index>/documents<document id>
-    URL Params     : index       - The name of the index to create or update the document in.
-                     document id - The identifier of the document to be created or updated.
-    Request Payload:
-        Document contents expressed as a JSON object containing key/value pairs.
-        
-    Response Payload:
-        { "etag": "string", "url": "string"}
-        
-##### Delete a Document
-
-    Method:        : DELETE
-    URL            : https://<host>:9509/services/search-data-service/v1/search/indexes/<index>/documents<document id>
-    URL Params     : index       - The name of the index to remove the document from.
-                     document id - the identifier of the document to be deleted.
-    Request Payload:
-        None.
-        
-##### Retrieve a Document
-
-    Method:        : GET
-    URL            : https://<host>:9509/services/search-data-service/v1/search/indexes/<index>/documents<document id>
-    URL Params     : index       - The name of the index to retrieve the document from.
-                     document id - the identifier of the document to be retrieved.
-    Request Payload:
-        None.
-        
-
-### Searching the Document Store
-Search statements are passed to the _Search Data Service_ as a JSON object which is structured as follows:
-
-_Filters_
-* A "filter" stanza defines a set of queries to be run in _non-scoring-mode_ to reduce the document set to a smaller subset to be searched.
-* The filter stanza is optional - omitting it implies that the query is _unfiltered_.
-* This stanza is represented as a JSON object with the following structure:
-
-    "filter": {
-                "all": [ { query }, { query },....{ query }],
-                "any": [ { query }, { query },....{ query }]
-    },
-
-Where: 
-* the _all_ list defines a set of queryies such that ALL queries in the list must be satisfied for the document to pass the filter.
-* the _any_ list defines a set of queryies such that ANY single query in the list must be satisfied for the document to pass the filter. 
-
-_Queries_
-The following types of query statements are supported by the _Search Data Service_:
-
-_Term Query_:
-
-A term query attempts to match the literal value of a field, with no advanced parsing or analysis of the query string.  This type of query is most appropriate for structured data like numbers, dates and enums, rather than full text fields.
-
-     // Find documents where the specified field contains the supplied value
-    "match": {
-        "field": "value"
-    }
-  
-    // Find documents where the specified field DOES NOT contain the supplied value
-    "not-match": {
-        "field": "value"
-    }
-    
-_Parsed Query_:
+_filter-config.json:_
  
-Parsed queries apply a query parser to the supplied query string in order to determine the exact query to apply to the specified field.
-The query string is parsed into a series of terms and operators, as described below:
+Create this file with exactly the contents below:
  
-Terms may be any of the following:
-* single words
-* exact phrases, as denoted by enclosing the phrase in open and close quotations.  Example: "this is my exact phrase"
-* regular expressions, as denoted by wrapping the expressing in forward slash ( / ) character.  Example: /joh?n(ath[oa]n)/
+    [
+        {
+                "name": "ngram_filter",
+                "description": "Custom NGram Filter.",
+                "configuration": " \"type\": \"nGram\", \"min_gram\": 1, \"max_gram\": 50, \"token_chars\": [ \"letter\", \"digit\", \"punctuation\", \"symbol\" ]"
+        }
+    ]
+    
+_elastic-search.properties_
  
-The supported operators are as follows:
-* AND - Both terms to the left or right of the operator MUST be present
-* OR  - Either the term to the left or right of the operator MUST be present
-* NOT - The term to the right of the operator MUST NOT be present.
+This file tells the _Search Data Service_ how to communicate with the ElasticSearch data store which it will use for its back end.
+The contents of this file will be determined by your ElasticSearch deployment:
  
-    "parsed-query": {
-        "field": "fieldname",
-        "query-string": "string"
-    }
+    es-cluster-name=<<name of your ElasticSearch cluster>>
+    es-ip-address=<<ip address of your ElasticSearch instance>>
+    es.http-port=9200
      
-_Range Query_:
-
- Range queries match fields whose term value falls within the specified numeric or date range.
- Supported bounds operators include:
- * gt  - Greater than
- * gte - Greater than or equal to
- * lt  - Less than
- * lte - Less than or equal to
- 
-     "range": {
-        "field": "fieldname",
-        "operator": "value",
-        "operator": "value"
-     }
-        
-##### Examples
-The following snippet illustrates a search statement describing a filtered query which uses examples of all of the supported query types:
+
+
+##### Contents of the /opt/app/search-data-service/appconfig/auth Directory
+
+The following files must be present in this directory on the host machine:
+
+_search\_policy.json_
+
+Create a policy file defining the roles and users that will be allowed to access the _Search Data Service_.  This is a JSON format file which will look something like the following example:
  
      {
-        "filter": {
-            "all": [{"range": {"field": "timestamp", "lte": "2016-12-01T00:00:00.558+03:00"}}],
-            "any": [ ]
-        },
-        
-        "queries": [
-            {"match": {"field": "name", "value": "Bob"}},
-            {"parsed-query": {"field": "street-name", "query-string": "Main OR First"}},
-            {"range": {"field": "street-number", "gt": 10, "lt": 50}}
+        "roles": [
+            {
+                "name": "admin",
+                "functions": [
+                    {
+                        "name": "search", "methods": [ { "name": "GET" },{ "name": "DELETE" }, { "name": "PUT" }, { "name": "POST" } ]
+                    }
+                ],
+                "users": [
+                    {
+                        "username": "CN=searchadmin, OU=My Organization Unit, O=, L=Sometown, ST=SomeProvince, C=CA"
+                    }    
+                ]
+            }
          ]
      }
  
-##### REST Endpoint
+_tomcat\_keystore_
  
-    Method:        : POST
-    URL            : https://<host>:9509/services/search-data-service/v1/search/indexes/<index>/query
-    URL Params     : index       - The name of the index to apply the query to.
+Create a keystore with this name containing the CA certificates that you want your instance of the _Search Data Service_ to accept for HTTPS traffic.
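Before starting the container, it can help to confirm that every file listed above is in place on the host. The following is a minimal sketch of such a check. The helper name `missing_config_files` is hypothetical, and the sketch assumes the directory layout that is mounted by the `docker run` command (an `appconfig` directory containing an `auth` subdirectory).

```python
from pathlib import Path

# File names come from the instructions above; the layout (an "auth"
# subdirectory under the mounted config directory) is an assumption.
REQUIRED_FILES = [
    "analysis-config.json",
    "filter-config.json",
    "elastic-search.properties",
    "auth/search_policy.json",
    "auth/tomcat_keystore",
]


def missing_config_files(config_root):
    """Return the required files that are absent under config_root, in listed order."""
    root = Path(config_root)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_config_files("/opt/app/search-data-service/appconfig")
    print("missing files:", missing if missing else "none")
```

An empty list means all of the expected files were found. The check does not validate the contents of any file.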
  
-    Request Payload:
-        {
-            "filter": {
-                "all": [ { query }, { query },....{ query }],
-                "any": [ { query }, { query },....{ query }]
-            },
-            
-            "queries": [
-                { query },
-                    .
-                    .
-                { query }
-            ]
-        }
+**Start the service:**
  
-### Bulk Operations
-Bulk operations allow the client to bundle a number of actions into a single REST request.
-It is important to note that individual operations bundled into a bulk request are considered by the _Search Service_ to be completely independent operations.  This has a few important consequences:
-* No guarantees are made with respect to the order in which the individual operations will be processed by the document store.
-* There is no implied transactionality between the operations.  Individual operations my succeed or fail independently of one another, and it is entirely possible for the client to receive back a result set indicating a mix of success and failure results for the individual operations.
-
-##### Submit Bulk Request
-    Method        : POST
-    URL           : http://<host>:9509/services/search-data-service/v1/search/bulk/
-    URL Params    : NONE
-    Request Payload:
-        A json structure containing all of the bundled actions to be performed.
-        It must correspond to the following format:
-            [
-                { "operation": {{<metaData>}, {<document>},
-                { "operation": {{<metaData>}, {<document>},
-                            .
-                            .
-                { "operation": {{<metaData>}, {<document>},
-            ]
-            
-        Where,
-            operation - Is one of:  "create", "update", or "delete"
-            
-            metaData  - A structure containing meta-data associated with the individual operation to be performed.  Valid fields include:
-                "url"   - The resource identifier of the document to be operated on.
-                "etag" - Identifies the version of the document to be acted on.  Required for "update" and "delete" operations.
-                
-            document - The document contents for "create" and "update" operations.
-            
-        Example Payload:
-        [
-            {"create": {"metaData": {"url": "/services/search-data-service/v1/indexes/the-index/documents/1"}, "document": {"f1": "v1", "f2": "v2"}}},
-            {"create": {"metaData": {"url": "/services/search-data-service/indexes/the-index/documents/2"}, "document": {"f1": "v1", "f2": "v2"}}},
-            {"update": {"metaData": {"url": "/services/search-data-service/v1/search/indexes/the-index/documents/8", "etag": "1"}, "document": {"f1": "v1a", "f2": "v2a"}}},
-            {"delete": {"metaData": {"url": "/services/search-data-service/v1/search/indexes/the-index/documents/99", "etag": "3"}}}
-        ]
-        
-    Response Payload:
-        The response body will contain an aggregation of the collective results as well as separate status codes for each of the operations in the request.
-        Example:
-        { 
-            "total_operations": 4, 
-            "total_success": 1, 
-            "total_fails": 3, 
-                       "results": [
-                               {"operation": "create", "url": "/services/search-data-service/v1/search/indexes/the-index/documents/1", "etag": "1", "status-code": "201", "status-message": "OK"}, 
-                               {"operation": "create", "url": "/services/search-data-service/v1/search/indexes/the-index/documents/2", "etag": "1", "status-code": "201", "status-message": "OK"}, 
-                               {"operation": "update", "url": "/services/search-data-service/v1/search/indexes/the-index/documents/8", "etag": "2", "status-code": "200", "status-message": "OK"}, 
-                               {"operation": "delete", "url": "/services/search-data-service/v1/search/indexes/the-index/documents/2", "status-code": "200", "status-message": "OK"}
-    ]
-}
-\ No newline at end of file
+You can now start the Docker container for the _Search Data Service_ in the following manner:
+
+    docker run -d \
+        -p 9509:9509 \
+        -e CONFIG_HOME=/opt/app/search-data-service/config/ \
+        -e KEY_STORE_PASSWORD={{obfuscated password}} \
+        -e KEY_MANAGER_PASSWORD=OBF:{{obfuscated password}} \
+        -v /logs:/opt/aai/logroot/AAI-SDB \
+        -v /opt/app/search-data-service/appconfig:/opt/app/search-data-service/config \
+        --name search-data-service \
+        {{your docker repo}}/search-data-service
+    
+Where,
+
+    {{your docker repo}} = The Docker repository you have published your Search Data Service image to.
+    {{obfuscated password}} = The password for your key store/key manager after running it through the Jetty obfuscation tool.
+
+ 
+ 
+ 
+\ No newline at end of file
diff --git a/SEARCH.md b/SEARCH.md

new file mode 100644 (file)

index 0000000..cc6d074
--- /dev/null
+++ b/SEARCH.md
@@ -0,0 +1,600 @@
+# Search Requests\r
+\r
+## Overview\r
+Ultimately, the point of storing documents in a document store is to be able to access the data embodied in those documents in a variety of interesting ways.  To put it another way, we want to be able to ask interesting questions of the data that has been persisted, not just get back a document that we put in.\r
+\r
+We do this by submitting _Search Requests_ to the _Search Service_.\r
+\r
+Conceptually, the structure of a _Search Request_ can be visualized as a pipeline consisting of three phases of operations that are applied to our document set in order to produce a query result set:\r
+\r
+    +---------------------+\r
+    | ENTIRE DOCUMENT SET | ----------------------------- We begin with the entire set of documents in our index.\r
+    +---------------------+\r
+               |\r
+               V\r
+        \-------------/          \r
+         \ Filter(s) / ---------------------------------- We optionally apply a set of filter criteria to the document \r
+          -----------                                     set, with the intent of reducing the overall set of documents \r
+               |                                          that we will be applying our query against (filtering is cheaper \r
+               |                                          than running the full queries)\r
+               V                   \r
+      +-----------------+        \r
+      | DOCUMENT SUBSET | ------------------------------- Our filter stage produces a (hopefully) smaller subset of documents \r
+      +-----------------+                                 than we started with.\r
+               |\r
+               V\r
+         \-----------/      \r
+          \ Queries / ----------------------------------- Now we execute our queries against the filtered document subset.\r
+           --------- \r
+               |\r
+               V\r
+     +------------------+          \r
+     | QUERY RESULT SET | ------------------------------- This produces a scored set of results. \r
+     +------------------+            \r
+               |                     \r
+               V                                \r
+      \----------------/             \r
+       \ Aggregations / --------------------------------- Optionally we may apply a set of aggregations against the query               \r
+        --------------                                    result set.\r
+               |                     \r
+               V                     \r
+    +---------------------+          \r
+    | AGGREGATION BUCKETS | ----------------------------- This produces a set of aggregation buckets based on the query\r
+    +---------------------+                               result set.\r
+\r
+\r
+\r
+### Filters\r
+_Filters_ are intended to answer the following question about the documents that they are applied to:  _Does this document match the specified criteria?_\r
+\r
+In other words, filter queries produce **yes/no** results.  A given document either **IS** or **IS NOT** in the set of documents that we want.  This can also be described as a _non-scored_ search and is typically a cheaper operation than performing a _scored_ search (more on those below).  This is why we often want to pre-filter our document set before applying more expensive query operations.\r
+\r
+### Queries\r
+_Queries_ are intended to answer the following question about the documents that they are applied to: _How well does this document match the specified criteria?_ \r
+\r
+In other words, the criteria which we include in our _Query_ are not intended to produce a set of _yes/no_ results, but a _scored_ result set.  This result set contains the documents that meet some combination of the criteria which we supply, each with a _score_ indicating _how well_ that document meets the overall requirement set.  The more criteria that a particular document meets, the higher its score.\r
+\r
+### Aggregations\r
+_Aggregations_ are intended to answer questions that summarize and analyze an overall data set (typically a _Query_ result set).\r
+\r
+_Aggregations_ produce result sets that group the data set into buckets, based on the _Aggregation_ criteria, and allow questions such as the following to be asked:\r
+\r
+* Of all of the people in my result set, how many are taller than 5' 10"?\r
+* How many different server vendors have hardware installed in my network?\r
+* What proportion of employees serviced by our IT department use Macs?  What proportion use Windows?\r
+\r
+## Syntax\r
+\r
+### The Filter Stanza\r
+\r
+If you intend for your _Search Request_ to include a _Filter_ stage, then you must define the _filter queries_ that you wish to be applied in the _Filter Stanza_ of your _Search Request_.  The following pseudo-json illustrates the structure of the _Filter Stanza_:\r
+\r
+    {\r
+        "filter": {\r
+            "all": [ {filter-query}, {filter-query}, ... {filter-query}],\r
+            "any": [ {filter-query}, {filter-query}, ... {filter-query}]\r
+        }\r
+    }\r
+\r
+As we can see, our _Filter Stanza_ consists of two optional groupings of _query_ statements: _any_ and _all_.  These groupings will produce the following behaviours:\r
+\r
+* _all_ - _Documents_ must match ALL of the _queries_ in this grouping in order to be included in the result set.\r
+* _any_ - _Documents_ must match a minimum of ONE of the _queries_ in this grouping in order to be included in the result set.\r
+\r
+The _filter-queries_ themselves are syntactically identical to the _Queries_ which we will define below, the only difference being that, because they are declared within the _Filter Stanza_, they will be applied as _unscored_ queries.\r
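As a rough illustration, a _Filter Stanza_ can be assembled programmatically before it is serialized to JSON. This is only a sketch: the helper names `term` and `filter_stanza` are hypothetical, and the term query uses the field/value form shown in the complete examples later in this document.

```python
import json


def term(field, value):
    """Build a term query in the field/value form (hypothetical helper)."""
    return {"match": {"field": field, "value": value}}


def filter_stanza(all_of=(), any_of=()):
    """Build a Filter Stanza; both groupings are optional, so empty ones are omitted."""
    stanza = {}
    if all_of:
        stanza["all"] = list(all_of)
    if any_of:
        stanza["any"] = list(any_of)
    return {"filter": stanza}


if __name__ == "__main__":
    request = filter_stanza(all_of=[term("LastName", "Smith")])
    print(json.dumps(request, indent=4))
```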
+\r
+### The Query Stanza\r
+\r
+The _Query Stanza_ is where we define _query statements_ which are intended to produce _scored results_, as opposed to _queries_ which are intended for filtering purposes.\r
+\r
+The _Query Stanza_ is expressed as a list of _query statements_, each of which is prefixed with a directive indicating how strongly the _query_ in question is to be applied.\r
+\r
+_Queries_ prefixed with the "must" directive represent queries which all documents must satisfy to be included in the result set, whereas _queries_ prefixed with the "may" directive represent queries which a document is not required to satisfy to be included in the result set, although documents which do satisfy them will score higher.\r
+\r
+The following pseudo-json illustrates the structure of the _Query Stanza_:\r
+ \r
+    {\r
+        "queries": [\r
+            { "must": { <query> } },\r
+            { "must": { <query> } },\r
+                 .\r
+                 .\r
+            { "must": { <query> } },\r
+            { "may" : { <query> } },\r
+                 .\r
+                 .\r
+            { "may" : { <query> } }\r
+        ]\r
+    }\r
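The "must" and "may" directives are thin wrappers around a query, so a _Query Stanza_ can be composed from small helpers. The sketch below assumes nothing beyond the structure shown above. The function names `must`, `may` and `query_stanza` are hypothetical and are not part of the service.

```python
import json


def must(query):
    """Wrap a query that every document in the result set has to satisfy."""
    return {"must": query}


def may(query):
    """Wrap an optional query that raises the score of documents matching it."""
    return {"may": query}


def query_stanza(*statements):
    """Collect the wrapped query statements into a Query Stanza."""
    return {"queries": list(statements)}


if __name__ == "__main__":
    stanza = query_stanza(
        must({"match": {"field": "LastName", "value": "Smith"}}),
        may({"match": {"field": "FirstName", "value": "Bob"}}),
    )
    print(json.dumps(stanza, indent=4))
```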
+\r
+**Nested Fields** - If the document to be queried contains nested fields, then these may be referenced by providing the fully qualified field name using dotted notation.  Example: _myFieldName.mySubFieldName_\r
+\r
+**Result-Set Restrictions** - In some cases, where the number of hits for a given set of queries is expected to be large, the client may want to restrict the size of the result set that is returned, as well as manipulate which subset of results it gets back.  This can be accomplished with the following optional fields in the Search Statement:\r
+* results-start - Index into the result set to the first document to be returned.\r
+* results-size - The maximum number of documents to be returned in the result set.\r
+\r
+Both of these fields are optional - leaving them out implies that the entire result set should be returned to the client.\r
+\r
+**IMPORTANT - Although these two fields may be used by the client to get back query results a chunk at a time, by resending the same query repeatedly and specifying a different 'results-start' value each time, be aware that this is NOT a transactional operation.  There is no guarantee that the underlying data will not change in between query calls.  This is NOT intended to be the equivalent of a mechanism such as the 'Scroll API' provided by ElasticSearch or a cursor in a traditional database.**\r
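The chunked retrieval described above might be prepared as in the following sketch. It assumes that 'results-start' and 'results-size' sit at the top level of the Search Statement, which the text implies but does not show. The helper names are hypothetical, and no request is actually sent.

```python
def paged_statement(statement, start, size):
    """Return a copy of a search statement restricted to one chunk of results."""
    page = dict(statement)  # shallow copy so the caller's statement is left untouched
    page["results-start"] = start
    page["results-size"] = size
    return page


def page_offsets(total_hits, page_size):
    """List the 'results-start' values needed to walk a result set of total_hits."""
    return list(range(0, total_hits, page_size))


if __name__ == "__main__":
    base = {"queries": [{"must": {"match": {"field": "LastName", "value": "Smith"}}}]}
    for offset in page_offsets(25, 10):
        print(paged_statement(base, offset, 10))
```

Because the operation is not transactional, documents may be added or removed between two of these requests, so consecutive chunks can overlap or skip entries.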
+\r
+We will discuss the specific query types supported below, and then provide some concrete examples.\r
+\r
+**Term Query**\r
+\r
+A _Term_ query attempts to match the literal value of a field, with no advanced parsing or analysis of the query string.\r
+\r
+There are two operations supported by the _Term_ query type:\r
+* match - The contents of the specified field must match the supplied value.\r
+* not-match - The contents of the specified field must NOT match the supplied value.\r
+\r
+_Example - Simple Match_\r
+\r
+    {\r
+        "match": { "field": "FirstName", "value": "Bob" }\r
+    }\r
+    \r
+_Example - Simple Not-Match_\r
+\r
+    {\r
+        "not-match": { "field": "LastName", "value": "Smith" }\r
+    }\r
+    \r
+Note that the term match is applied against the tokenized field contents.\r
+\r
+For example, if a field containing the contents "_the quick brown fox_" was analyzed with a white space tokenizer (analysis occurs on document creation or update), the following terms would be added to the inverted index for that field:\r
+\r
+    the\r
+    quick\r
+    brown\r
+    fox\r
+\r
+Meaning that the following term queries would all produce a match for our document:\r
+\r
+         {"must": {"match": {"field": "my-field-name", "value": "the"}}}\r
+         {"must": {"match": {"field": "my-field-name", "value": "quick"}}}\r
+         {"must": {"match": {"field": "my-field-name", "value": "brown"}}}\r
+         {"must": {"match": {"field": "my-field-name", "value": "fox"}}}\r
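The effect of tokenization on term matching can be imitated in a few lines. This is a simplified model only, not how the document store is implemented: it splits on white space and ignores any additional filters (such as lower-casing) that an analyzer may apply. The function names are hypothetical.

```python
def whitespace_tokens(text):
    """Split field contents on white space, as a white space tokenizer would."""
    return text.split()


def term_matches(field_contents, term):
    """A term query matches only if the term equals one of the indexed tokens."""
    return term in whitespace_tokens(field_contents)


if __name__ == "__main__":
    contents = "the quick brown fox"
    print(whitespace_tokens(contents))
    print(term_matches(contents, "quick"))        # a single token matches
    print(term_matches(contents, "quick brown"))  # a phrase is not a single token
```

In this model a term such as "quick brown" does not match, because no single token in the index has that value.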
+\r
+         \r
+**Multi Field Term Query**\r
+\r
+A variant of the _Term_ query described above is the _Multi Field Term Query_, which, as the name suggests, allows for a _Term_ query to be applied across multiple fields.  The syntax is the same as for a single field term query except that the fields are supplied as a space-delimited list, as in the example below:\r
+\r
+    {"must": {"match": {"field": "field1 field2 field3", "value": "blah"}}}\r
+\r
+The above query would produce a hit for any documents containing the value "blah" in any of "field1", "field2", or "field3".\r
+ \r
+Note that it is also valid to supply multiple values in the same manner, by supplying a space-delimited list in the "value" field.\r
+\r
+The default behaviour in this case is to produce a hit only if there is at least one occurrence of EVERY supplied value in any of the specified fields.\r
+\r
+For example, the following query:\r
+\r
+    {"must": {"match": {"field": "first_name last_name", "value": "Smith Will"}}}\r
+  \r
+Produces a match for document {"first_name": "Will", "last_name": "Smith"} but not {"first_name": "Will", "last_name": "Shakespeare"}\r
+ \r
+This default behaviour can be overridden by explicitly specifying a boolean operator for the query.  Valid operators are as follows:\r
+* and - At least one occurrence of every value must be present in one of the specified fields to produce a hit. (This is the default behaviour if no operator is specified).\r
+* or - An occurrence of any of the specified values in any of the specified fields will produce a hit.\r
+ \r
+Restating the previous example with the operator explicitly specified illustrates the difference between the two operations:\r
+\r
+_Example - Multi field term query with AND operator explicitly set:_\r
+\r
+    {"must": {"match": {"field": "first_name last_name", "value": "Smith Will", "operator": "and"}}}\r
+  \r
+Produces a match for document {"first_name": "Will", "last_name": "Smith"} but not {"first_name": "Will", "last_name": "Shakespeare"} -- Exactly as in our previous example since this is the default behaviour.\r
+  \r
+  \r
+_Example - Multi field term query with OR operator explicitly set:_\r
+\r
+    {"must": {"match": {"field": "first_name last_name", "value": "Smith Will", "operator": "or"}}}\r
+  \r
+Produces a match for both documents {"first_name": "Will", "last_name": "Smith"} and {"first_name": "Will", "last_name": "Shakespeare"}\r
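The difference between the two operators can be checked with a small simulation of the semantics described above. This sketch models the documented behaviour on plain dictionaries. It is not the service's implementation, and the function name `multi_field_match` is hypothetical.

```python
def multi_field_match(document, fields, values, operator="and"):
    """Simulate a multi field term query against a single document.

    fields and values are space-delimited lists, as in the query syntax.
    """
    tokens = set()
    for name in fields.split():
        # Gather the white space separated tokens of every listed field.
        tokens.update(str(document.get(name, "")).split())
    hits = [value in tokens for value in values.split()]
    return all(hits) if operator == "and" else any(hits)


if __name__ == "__main__":
    smith = {"first_name": "Will", "last_name": "Smith"}
    shakespeare = {"first_name": "Will", "last_name": "Shakespeare"}
    for doc in (smith, shakespeare):
        for op in ("and", "or"):
            print(doc["last_name"], op,
                  multi_field_match(doc, "first_name last_name", "Smith Will", op))
```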
+\r
+**Parsed Query**\r
+\r
+Parsed queries apply a query parser to the supplied query string in order to determine the exact query to apply to the specified field.\r
+The query string is parsed into a series of terms and operators, as described below:\r
+\r
+_Terms_\r
+\r
+Terms may be any of the following:\r
+* single words\r
+* exact phrases, as denoted by enclosing the phrase in open and close quotations.  Example: "this is my exact phrase"\r
+* regular expressions, as denoted by wrapping the expression in forward slash ( / ) characters.  Example: /joh?n(ath[oa]n)/\r
+\r
+Note that a series of terms with no explicit operators will be interpreted as a set of optional values.  For example:\r
+\r
+    quick brown fox  would match fields which contain quick OR brown OR fox.\r
+\r
+_Operators_\r
+\r
+Operators provide for more complexity in terms of the behaviour described by the query string.  The supported operators are as follows:\r
+\r
+| Operator | Description                                                              | Example        |\r
+| -------- | ------------------------------------------------------------------------ | -------------- |\r
+| +        | The term to the right of the operator must be present.                   | +quick         |\r
+| -        | The term to the right of the operator must not be present                | -brown         |\r
+| AND      | Both the terms to the left and right of the operator must be present     | brown AND fox  |\r
+| OR       | Either the term to the left or right of the operator must be present     | quick OR brown |\r
+| NOT      | The term to the right of the operator must not be present (similar to -) | NOT fox        |\r
+ \r
+The following pseudo-json illustrates the structure of a parsed query:\r
+\r
+    "parsed-query": {\r
+        "field": "fieldname",      // If this field is not present, then apply the query to ALL fields\r
+        "query-string": "string"\r
+    }\r
+\r
+\r
+**Range Query**\r
+\r
+_Range_ queries match fields whose term value falls within the specified numeric or date range.\r
+\r
+For fields containing date types, the default format for that field (as provided in the document schema when the index was created) will be used, unless the client overrides that with their own format string.\r
+\r
+A _Range_ query includes one or a combination of operations representing the upper and lower bounds of the range.  The following describes the supported bounds operations:\r
+\r
+| Operation | Description              |\r
+| --------- | ------------------------ |\r
+| gt        | Greater than             |\r
+| gte       | Greater than or equal to |\r
+| lt        | Less than                |\r
+| lte       | Less than or equal to    |\r
+ \r
+The following pseudo-json describes the structure of a range query.\r
+\r
+    "range": {\r
+        "field": "fieldname",\r
+        "<<operation>>": numeric-or-date-value,     // where <<operation>> is one of the operations from the table above.\r
+              .\r
+              .\r
+        "<<operation>>": numeric-or-date-value,     // where <<operation>> is one of the operations from the table above.\r
+        "format": "format-string",                  // For date ranges, this allows the client to override the format defined \r
+                                                    // for the field in the document schema.\r
+        "time-zone": "+1:00"                        // For date ranges, this allows the client to specify a time zone parameter \r
+                                                    // to be applied to the lower and upper bounds of the query.\r
+    }\r
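A _Range_ query can be built with a helper that rejects bounds other than the four operations in the table above. This is a sketch only. The function `range_query` and its keyword arguments `fmt` and `time_zone` are hypothetical names, and the service may well accept or reject inputs differently.

```python
import json

VALID_BOUNDS = {"gt", "gte", "lt", "lte"}


def range_query(field, fmt=None, time_zone=None, **bounds):
    """Build a range query from one or more of the gt/gte/lt/lte bounds."""
    unknown = set(bounds) - VALID_BOUNDS
    if unknown:
        raise ValueError("unsupported bounds: %s" % sorted(unknown))
    if not bounds:
        raise ValueError("a range query needs at least one bound")
    query = {"field": field}
    query.update(bounds)
    if fmt is not None:
        query["format"] = fmt          # overrides the schema's date format
    if time_zone is not None:
        query["time-zone"] = time_zone  # applied to both bounds
    return {"range": query}


if __name__ == "__main__":
    print(json.dumps(range_query("street-number", gt=10, lt=50)))
```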
+\r
+### The Aggregations Stanza\r
+\r
+**Group By Aggregation**\r
+\r
+_Group By_ aggregations create buckets based on the values in a specific field.  These are expressed in the following manner:\r
+\r
+_Example - Group by last name, excluding any buckets with fewer than 2 entries._\r
+\r
+    {\r
+        "name": "GroupByLastName",\r
+        "aggregation" : {\r
+            "group-by": {\r
+                "field": "LastName",\r
+                "min-threshold": 2\r
+            }\r
+        }\r
+    }\r
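To make the bucket behaviour concrete, the sketch below reproduces a _Group By_ aggregation with a minimum threshold over an in-memory list of documents. It illustrates the semantics only. The function name `group_by` is hypothetical, and the way the service actually computes and reports its buckets may differ.

```python
from collections import Counter


def group_by(documents, field, min_threshold=1):
    """Count documents per distinct field value, dropping buckets below min_threshold."""
    counts = Counter(doc[field] for doc in documents if field in doc)
    return {value: n for value, n in counts.items() if n >= min_threshold}


if __name__ == "__main__":
    people = [
        {"FirstName": "Will", "LastName": "Smith"},
        {"FirstName": "Bob", "LastName": "Smith"},
        {"FirstName": "Will", "LastName": "Shakespeare"},
    ]
    print(group_by(people, "LastName", min_threshold=2))
```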
+\r
+\r
+**Date Range Aggregation**\r
+\r
+_Date Range_ aggregations produce counts where date type fields fall in or out of a particular range.\r
+\r
+_Example - Return counts of the number of people born before Jan 1, 1972 and the number of people born after Jan 1, 1972_\r
+\r
+    {\r
+        "name": "AggregateByBirthdate",\r
+        "aggregation": {\r
+            "date-range": {\r
+                "field": "BirthDate",\r
+                "from": "01-01-1972 00:00:00",\r
+                "to": "01-01-1972 00:00:00"\r
+            }\r
+        }\r
+    }\r
+\r
+### Putting It All Together\r
+\r
+The following examples illustrate how to construct a complete search statement, putting all of the previous building blocks together, starting with the simplest possible search statement and building toward more complicated requests that combine filters, scored queries, and aggregations.\r
+\r
+_Example - Simple search statement with no filtering or aggregations_\r
+\r
+Search Statement:\r
+\r
+    {\r
+        "queries": [\r
+            {"must": {"match": {"field": "LastName", "value": "Smith"}}}   \r
+        ]\r
+    }\r
+\r
+Response Body:\r
+\r
+    {\r
+        "searchResult": {\r
+            "totalHits": 3,\r
+            "hits": [\r
+                {\r
+                    "score": 0.8,\r
+                    "document": {\r
+                        "url": "/indexes/people/documents/8",\r
+                        "etag": "3",\r
+                        "content": {\r
+                            "FirstName": "Will",\r
+                            "LastName": "Smith"\r
+                        }\r
+                    }\r
+                },\r
+                {\r
+                    "score": 0.8,\r
+                    "document": {\r
+                        "url": "/indexes/people/documents/2",\r
+                        "etag": "1",\r
+                        "content": {\r
+                            "FirstName": "Bob",\r
+                            "LastName": "Smith"\r
+                        }\r
+                    }\r
+                },\r
+                {\r
+                    "score": 0.8,\r
+                    "document": {\r
+                        "url": "/indexes/people/documents/10",\r
+                        "etag": "7",\r
+                        "content": {\r
+                            "FirstName": "Alan",\r
+                            "LastName": "Smith"\r
+                        }\r
+                    }\r
+                }\r
+            ]\r
+        }\r
+    }\r
+\r
+_Example - Simple search statement with multiple term queries, with no filtering or aggregations_\r
+\r
+Search Statement:\r
+\r
+    {\r
+        "queries": [\r
+            {"must": {"match": {"field": "LastName", "value": "Smith"}}},\r
+            {"may": {"match": {"field": "FirstName", "value": "Bob"}}},\r
+            {"must": {"not-match": {"field": "FirstName", "value": "Alan"}}}\r
+        ]\r
+    }\r
+\r
+Response Body:\r
+\r
+    {\r
+        "searchResult": {\r
+            "totalHits": 2,\r
+            "hits": [\r
+                {\r
+                    "score": 0.8,\r
+                    "document": {\r
+                        "url": "/indexes/people/documents/2",\r
+                        "etag": "1",\r
+                        "content": {\r
+                            "FirstName": "Bob",\r
+                            "LastName": "Smith"\r
+                        }\r
+                    }\r
+                },\r
+                {\r
+                    "score": 0.5,\r
+                    "document": {\r
+                        "url": "/indexes/people/documents/8",\r
+                        "etag": "3",\r
+                        "content": {\r
+                            "FirstName": "Will",\r
+                            "LastName": "Smith"\r
+                        }\r
+                    }\r
+                }\r
+            ]\r
+        }\r
+    }\r
+    \r
+_Example - Simple search statement with a filter stanza_\r
+\r
+Search Statement:\r
+\r
+    {\r
+        "filter": {\r
+            "all": [\r
+                { "must": {"not-match": {"field": "FirstName", "value": "Bob"}}}\r
+            ]\r
+        },\r
+        \r
+        "queries": [\r
+            {"must": {"match": {"field": "LastName", "value": "Smith"}}}\r
+        ]\r
+    }\r
+\r
+Response Body:\r
+\r
+    {\r
+        "searchResult": {\r
+            "totalHits": 2,\r
+            "hits": [\r
+                {\r
+                    "score": 0.8,\r
+                    "document": {\r
+                        "url": "/indexes/people/documents/8",\r
+                        "etag": "3",\r
+                        "content": {\r
+                            "FirstName": "Will",\r
+                            "LastName": "Smith"\r
+                        }\r
+                    }\r
+                },\r
+                {\r
+                    "score": 0.8,\r
+                    "document": {\r
+                        "url": "/indexes/people/documents/10",\r
+                        "etag": "7",\r
+                        "content": {\r
+                            "FirstName": "Alan",\r
+                            "LastName": "Smith"\r
+                        }\r
+                    }\r
+                }\r
+            ]\r
+        }\r
+    }\r
+\r
+_Example - Simple search statement with filter and aggregation stanzas_\r
+\r
+Assuming the following document set:\r
+\r
+    {"FirstName": "Will", "LastName": "Smith", "BirthDate": "1968-09-25T00:00:00", "Profession": "Actor"},\r
+    {"FirstName": "Jaden", "LastName": "Smith", "BirthDate": "1998-06-08T00:00:00", "Profession": "Actor"},\r
+    {"FirstName": "Alan", "LastName": "Smith", "BirthDate": "1956-05-21T00:00:00", "Profession": "Artist"},\r
+    {"FirstName": "Wilson", "LastName": "Fisk", "BirthDate": "1962-02-17T00:00:00", "Profession": "Crime Lord"},\r
+    {"FirstName": "Bob", "LastName": "Smith", "BirthDate": "1972-11-05T00:00:00", "Profession": "Plumber"},\r
+    {"FirstName": "Jane", "LastName": "Smith", "BirthDate": "1992-10-15T00:00:00", "Profession": "Accountant"},\r
+    {"FirstName": "John", "LastName": "Doe", "BirthDate": "1981-10-15T00:00:00", "Profession": "Janitor"}\r
+    \r
+Filter out all people born before Jan 1, 1960, then query the remaining set for people whose last name is 'Smith', with preference given to plumbers, and finally count the number of people in each profession in the resulting set:\r
+\r
+Search Statement:\r
+\r
+    {\r
+        "filter": {\r
+            "all": [\r
+                { "must": {"range": {"field": "BirthDate", "gte": "1960-01-01T00:00:00"}}}\r
+            ]\r
+        },\r
+        \r
+        "queries": [\r
+            {"must": {"match": {"field": "LastName", "value": "Smith"}}},\r
+            {"may": {"match": {"field": "Profession", "value": "Plumber"}}}\r
+        ],\r
+        \r
+        "aggregations": [\r
+            {\r
+                "name": "by_profession",\r
+                "aggregation": {\r
+                    "group-by": {\r
+                        "field": "Profession"\r
+                    }\r
+                }\r
+            }\r
+        ]\r
+    }\r
+\r
+Response Body:\r
+\r
+    {\r
+        "searchResult": {\r
+            "totalHits": 4,\r
+            "hits": [\r
+                {\r
+                    "score": 0.8,\r
+                    "document": {\r
+                        "url": "/indexes/people/documents/8",\r
+                        "etag": "3",\r
+                        "content": {\r
+                            "FirstName": "Bob",\r
+                            "LastName": "Smith",\r
+                            "Profession": "Plumber",\r
+                            "BirthDate": "1972-11-05T00:00:00"\r
+                        }\r
+                    }\r
+                },\r
+                {\r
+                    "score": 0.5,\r
+                    "document": {\r
+                        "url": "/indexes/people/documents/10",\r
+                        "etag": "7",\r
+                        "content": {\r
+                            "FirstName": "Will",\r
+                            "LastName": "Smith",\r
+                            "Profession": "Actor",\r
+                            "BirthDate": "1968-09-25T00:00:00"\r
+                        }\r
+                    }\r
+                },\r
+                {\r
+                    "score": 0.5,\r
+                    "document": {\r
+                        "url": "/indexes/people/documents/10",\r
+                        "etag": "7",\r
+                        "content": {\r
+                            "FirstName": "Jaden",\r
+                            "LastName": "Smith",\r
+                            "Profession": "Actor",\r
+                            "BirthDate": "1998-06-08T00:00:00"\r
+                        }\r
+                    }\r
+                },\r
+                {\r
+                    "score": 0.5,\r
+                    "document": {\r
+                        "url": "/indexes/people/documents/10",\r
+                        "etag": "7",\r
+                        "content": {\r
+                            "FirstName": "Jane",\r
+                            "LastName": "Smith",\r
+                            "Profession": "Accountant",\r
+                            "BirthDate": "1992-10-15T00:00:00"\r
+                        }\r
+                    }\r
+                }\r
+            ]\r
+        },\r
+        "aggregationResult": {\r
+            "aggregations": [\r
+                {\r
+                    "name": "by_profession",\r
+                    "buckets": [\r
+                        { "key": "Actor", "count": 2 },\r
+                        { "key": "Plumber", "count": 1 },\r
+                        { "key": "Accountant", "count": 1 }\r
+                    ]\r
+                }\r
+            ]\r
+        }\r
+    }\r
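+\r
+To make the order of evaluation in the example above concrete, the following Python sketch emulates the three stanzas locally against the same document set.  It is an approximation for illustration only: the "may" clause is modelled as a simple ranking preference rather than the service's actual scoring, and none of the variable names come from the Search Service.\r
+\r
```python
from collections import Counter

people = [
    {"FirstName": "Will", "LastName": "Smith", "BirthDate": "1968-09-25T00:00:00", "Profession": "Actor"},
    {"FirstName": "Jaden", "LastName": "Smith", "BirthDate": "1998-06-08T00:00:00", "Profession": "Actor"},
    {"FirstName": "Alan", "LastName": "Smith", "BirthDate": "1956-05-21T00:00:00", "Profession": "Artist"},
    {"FirstName": "Wilson", "LastName": "Fisk", "BirthDate": "1962-02-17T00:00:00", "Profession": "Crime Lord"},
    {"FirstName": "Bob", "LastName": "Smith", "BirthDate": "1972-11-05T00:00:00", "Profession": "Plumber"},
    {"FirstName": "Jane", "LastName": "Smith", "BirthDate": "1992-10-15T00:00:00", "Profession": "Accountant"},
    {"FirstName": "John", "LastName": "Doe", "BirthDate": "1981-10-15T00:00:00", "Profession": "Janitor"},
]

# Filter stanza: keep documents with a birth date on or after Jan 1, 1960.
# Timestamps in this fixed ISO-8601 layout sort correctly as plain strings.
filtered = [p for p in people if p["BirthDate"] >= "1960-01-01T00:00:00"]

# "must" query: the last name has to match.
hits = [p for p in filtered if p["LastName"] == "Smith"]

# "may" query: matching documents are ranked first (a stand-in for scoring).
# The sort is stable, so the remaining hits keep their original order.
hits.sort(key=lambda p: p["Profession"] != "Plumber")

# Group-by aggregation over the resulting hits.
by_profession = Counter(p["Profession"] for p in hits)

print(len(hits), hits[0]["FirstName"], dict(by_profession))
```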
+\r
+\r
+## API\r
+\r
+### Submit a Search Query\r
+\r
+---\r
+**URL**\r
+\r
+    https://{host}:9509/services/search-data-service/v1/search/indexes/{index}/query/\r
+\r
+**Method** \r
+\r
+    POST\r
+\r
+**URL Params**\r
+\r
+    index - The name of the _Index_ to apply the query against.\r
+\r
+**Request Header**\r
+\r
+    Accept          = application/json\r
+    X-TransactionId = Unique id set by client (for logging purposes)\r
+    X-FromAppId     = Application identifier (for logging purposes)\r
+    Content-Type    = application/json\r
+    \r
+**Request Payload**\r
+\r
+    Search statement expressed in JSON format (see **Syntax**) \r
+\r
+**Success Response**\r
+\r
+    Code:      200\r
+    Header(s): None\r
+    Body:      JSON format result set.  \r
+    \r
+**Error Response**\r
+\r
+    400 - Bad Request\r
+    403 - Forbidden (client is not authorized)\r
+    500 - Internal Error\r
+\r
+---\r
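+\r
+A minimal Python sketch of assembling this request with the standard library is shown below.  The host name, index name, and application id are placeholders, and the `build_search_request` helper is hypothetical; the sketch only builds the request object and does not send it, since the TLS and client authentication setup depends on the deployment.\r
+\r
```python
import json
import uuid
import urllib.request


def build_search_request(host, index, statement, app_id):
    """Build (but do not send) a POST request for a search statement."""
    url = (
        f"https://{host}:9509/services/search-data-service"
        f"/v1/search/indexes/{index}/query/"
    )
    headers = {
        "Accept": "application/json",
        "Content-Type": "application/json",
        # Unique id per request, used by the service for logging.
        "X-TransactionId": str(uuid.uuid4()),
        "X-FromAppId": app_id,
    }
    body = json.dumps(statement).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


statement = {
    "queries": [
        {"must": {"match": {"field": "LastName", "value": "Smith"}}}
    ]
}

# "localhost", "people" and "my-app" are placeholder values.
request = build_search_request("localhost", "people", statement, "my-app")
print(request.get_method(), request.full_url)
```
+\r
+The resulting object could then be passed to `urllib.request.urlopen` together with an `ssl.SSLContext` configured for the target environment.\r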