GetItems
Description
Retrieves (reads) attributes of multiple items in a table or in a data container's root directory, according to the specified criteria.
- You can't use both optimization methods (range scan and parallel scan) together in the same GetItems request.
- If you're looking for a specific item, use GetItem, which is faster than either of these GetItems optimized-scan methods because it searches for a specific object file on the relevant data slice. See Working with NoSQL Data.
- Range Scan
GetItems allows you to perform a range scan to retrieve items with a specific sharding-key value by setting the ShardingKey request parameter to the requested sharding-key value. You can also optionally restrict the query to a specific range of item sorting-key values by using the SortKeyRangeStart and/or SortKeyRangeEnd parameters. A range scan is more efficient than the default GetItems full table scan because of the way that the data is stored and accessed. For more information, see Working with NoSQL Data.
- Parallel Scan (Segmented Table Scan)
GetItems scans table items in search for the requested items. By default, the scan is executed sequentially. However, you can optionally scan only a specific portion (segment) of the table: you can set the request's TotalSegment parameter to the number of segments into which you wish to divide the table, and set the request's Segment parameter to the ID of the segment that you wish to scan in the current operation. To improve performance, you can implement a parallel table scan by dividing the scan among multiple application instances ("workers"), assigning each worker a different segment to scan. Note that such an implementation requires that the workers all send GetItems requests with the same scan criteria and total-segments count but with different scan segments.
[Table: a parallel multi-worker scan of a segmented table with GetItems]
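As a rough illustration of the multi-worker approach described above, the following Python sketch builds one request body per segment and runs the workers in a thread pool. The helper names (`build_segment_payloads`, `parallel_scan`), the `send_request` callable, and the choice of threads as workers are assumptions made for this sketch and are not part of the API. The sketch also assumes that each worker receives a complete (non-partial) response.

```python
from concurrent.futures import ThreadPoolExecutor


def build_segment_payloads(base_payload, total_segments):
    """Return one GetItems request body per segment (IDs 0..total_segments-1).

    Every body shares the same scan criteria and TotalSegment value and
    differs only in its Segment value.
    """
    return [
        {**base_payload, "TotalSegment": total_segments, "Segment": segment}
        for segment in range(total_segments)
    ]


def parallel_scan(send_request, base_payload, total_segments):
    """Scan all segments concurrently and merge the returned items.

    `send_request` is a hypothetical callable that posts one GetItems request
    body and returns the parsed JSON response as a dict.
    """
    payloads = build_segment_payloads(base_payload, total_segments)
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        responses = list(pool.map(send_request, payloads))
    items = []
    for response in responses:
        items.extend(response.get("Items", []))
    return items
```

In practice, `send_request` could wrap a `requests.post(url, json=payload, headers=headers).json()` call that uses the table's URL and the GetItems request headers.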
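Partial Response
A GetItems response might not include all the requested items, in which case the value of the LastItemIncluded response element is "FALSE".
To retrieve the remaining requested items, send a new identical GetItems request and set its Marker parameter to the value of the NextMarker element returned in the previous response.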
- The Limit request parameter defines the maximum number of items to return in the response object for the current API call. When issuing a GetItems request with a new marker, after receiving a partial response, consider recalculating the limit to subtract the items returned in the responses to the previous requests.
- A GetItems response might contain fewer items than specified in the Limit request parameter even if there are additional table items that match the request (i.e., the value of the LastItemIncluded response element is "FALSE"). In such cases, you need to issue a new GetItems request to retrieve the remaining items, as explained above.
- Requests that set the Marker parameter must perform a similar scan to that performed by the previous partial-response request, be it a parallel scan, a range scan, or a regular scan. For example, you cannot use the NextMarker response element returned for a previous range-scan request as the value of the Marker parameter of a parallel-scan request.
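The partial-response handling described in these notes can be sketched as a simple loop. This is a minimal illustration, not a definitive client: the `get_all_items` helper and the `send_request` callable are hypothetical names, and `send_request` is assumed to post one GetItems request body and return the parsed JSON response as a dict.

```python
def get_all_items(send_request, payload, limit=None):
    """Collect items across partial GetItems responses.

    `send_request` is a hypothetical callable that posts one request body and
    returns the parsed JSON response as a dict.
    """
    items = []
    request = dict(payload)  # Copy so that the caller's payload is not modified.
    while True:
        if limit is not None:
            # Ask only for the items that are still missing.
            request["Limit"] = limit - len(items)
        response = send_request(request)
        items.extend(response.get("Items", []))
        done = response.get("LastItemIncluded") == "TRUE"
        reached_limit = limit is not None and len(items) >= limit
        if done or reached_limit or "NextMarker" not in response:
            return items
        # Same scan criteria as before; only the marker changes.
        request["Marker"] = response["NextMarker"]
```

Because every follow-up request reuses the original request body and changes only the Marker (and, optionally, Limit) values, the scan type stays the same across requests, as the last note requires.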
Request
Request Header
POST /<container>/<resource> HTTP/1.1
Host: <web-APIs URL>
Content-Type: application/json
X-v3io-function: GetItems
X-v3io-session-key: <access key>
url = "http://<web-APIs URL>/<container>/<resource>"
headers = {
"Content-Type": "application/json",
"X-v3io-function": "GetItems",
"<Authorization OR X-v3io-session-key>": "<value>"
}
- To retrieve items from a specific table, set the relative table path within the configured container in the request URL or in the TableName JSON parameter, or split the path between the URL and the JSON parameter. See Data-Service Web-API General Structure.
- To retrieve items from the root directory of the configured container, omit the <resource> URL element (i.e., end the URL in the request header with <container>/) and either don't set the request's TableName JSON parameter or set it to "/".
Request Data
{
"TableName": "string",
"Limit": number,
"AttributesToGet": "string",
"FilterExpression": "string",
"ShardingKey": "string",
"SortKeyRangeStart": "string",
"SortKeyRangeEnd": "string",
"Segment": number,
"TotalSegment": number,
"Marker": "string"
}
payload = {
"TableName": "string",
"Limit": number,
"AttributesToGet": "string",
"FilterExpression": "string",
"ShardingKey": "string",
"SortKeyRangeStart": "string",
"SortKeyRangeEnd": "string",
"Segment": number,
"TotalSegment": number,
"Marker": "string"
}
- TableName
To retrieve items from a specific table (collection), set the relative table path within the configured container in this parameter or in the request URL, or split the path between the URL and the JSON parameter. See Data-Service Web-API General Structure.
To retrieve items from the root directory of the configured container, end the URL in the request header with <container>/ and either don't set the TableName JSON parameter or set it to "/".
- Type: String
- Requirement: Optional
- Limit
The maximum number of items to return within the response (i.e., the maximum number of elements in the response object's Items array).
- Type: Number
- Requirement: Optional
- AttributesToGet
The attributes to return for each item.
- Type: String
- Requirement: Optional
- Default Value: "*"
The attributes to return can be depicted in one of the following ways:
- A comma-separated list of attribute names. The attributes can be of any attribute type: user, system, or hidden.
Note: Currently, the delimiter commas cannot be surrounded by spaces.
- "*": retrieve the item's user attributes and __name system attribute, but not other system attributes or hidden attributes. This is the default value.
- "**": retrieve all item attributes (user, system, and hidden attributes).
For an overview of the different attribute types, see Attribute Types.
- FilterExpression
A filter expression that restricts the items to retrieve. Only items that match the filter criteria are returned. See filter expression.
- Type: String
- Requirement: Optional
- ShardingKey
The sharding-key value of the items to get by using a range scan. The sharding-key value is the part to the left of the leftmost period in a compound primary-key value (item name). You can optionally use the SortKeyRangeStart and/or SortKeyRangeEnd request parameters to restrict the search to a specific range of sorting keys (SortKeyRangeStart <= <sorting key> < SortKeyRangeEnd).
Note: To retrieve all items for an original sharding-key value that was recalculated during the ingestion (to achieve a more even workload distribution), you need to repeat the GetItems request for each of the sharding-key values that were used in the ingestion. If the ingestion was done by using the even-distribution option of the NoSQL Spark DataFrame, you need to repeat the request with ShardingKey values that range from <original sharding key>_1 to <original sharding key>_<n>, where <n> is the value of the v3io.kv.range-scan.hashing-bucket-num configuration property (default = 64); for example, johnd_1 .. johnd_64. For more information, see Recalculating Sharding-Key Values for Even Workload Distribution.
- Type: String
- Requirement: Optional; required when either the SortKeyRangeStart or SortKeyRangeEnd request parameter is set
- SortKeyRangeStart
The minimal sorting-key value of the items to get by using a range scan. The sorting-key value is the part to the right of the leftmost period in a compound primary-key value (item name). This parameter is applicable only together with the ShardingKey request parameter. The scan will return all items with the specified sharding-key value whose sorting-key values are greater than or equal to (>=) the value of the SortKeyRangeStart parameter and less than (<) the value of the SortKeyRangeEnd parameter (if set).
- Type: String
- Requirement: Optional
- SortKeyRangeEnd
The maximal sorting-key value of the items to get by using a range scan. The sorting-key value is the part to the right of the leftmost period in a compound primary-key value (item name). This parameter is applicable only together with the ShardingKey request parameter. The scan will return all items with the specified sharding-key value whose sorting-key values are greater than or equal to (>=) the value of the SortKeyRangeStart parameter (if set) and less than (<) the value of the SortKeyRangeEnd parameter.
- Type: String
- Requirement: Optional
- Segment
The ID of a specific table segment to scan, from 0 to one less than TotalSegment. See Parallel Scan.
- Type: Number
- Requirement: Required when TotalSegment is provided
- TotalSegment
The number of segments into which to divide the table scan — 1 to 1024. See Parallel Scan. The segments are assigned sequential IDs starting with 0.
- Type: Number
- Requirement: Required when Segment is provided
- Marker
An opaque identifier that was returned in the NextMarker element of a response to a previous GetItems request that did not return all the requested items. This marker identifies the location in the table from which to start searching for the remaining requested items. See Partial Response and the description of the NextMarker response element.
- Type: String
- Requirement: Optional
Response
Response Data
{
"LastItemIncluded": "string",
"NumItems": number,
"NextMarker": "string",
"Items": [
{
"string": {
"S": "string",
"N": "string",
"BOOL": Boolean,
"B": "blob"
}
}
]
}
- LastItemIncluded
"TRUE" if the scan completed successfully, meaning that the entire table was scanned for the requested items and all relevant items were returned (possibly in a previous response; see Partial Response); "FALSE" otherwise.
- Type: Boolean string, "TRUE" or "FALSE"
- NumItems
The number of items in the response's Items array.
- Type: Number
- NextMarker
An opaque identifier that marks the location in the table at which to start searching for remaining items in the next call to GetItems. See Partial Response and the description of the Marker request parameter. When the response contains all the requested items, NextMarker is not returned.
- Type: String
- Items
An array of items containing the requested attributes. The array contains information only for items that satisfy the conditions of the FilterExpression request parameter. Each returned item object includes only the attributes requested in the AttributesToGet parameter, provided the item has these attributes.
- Type: An array of item JSON objects that contain Attribute objects
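Because each returned attribute is wrapped in a type-keyed object (for example, {"N": "5.3"}), client code typically unwraps the values before using them. The following is a minimal sketch of one way to do that; the `decode_attribute` and `decode_items` helper names are hypothetical, and the choice to convert "N" values to Python int or float and to return "B" values unchanged is an assumption of this sketch.

```python
def decode_attribute(attribute):
    """Convert one Attribute object, such as {"N": "5.3"}, to a Python value."""
    if "S" in attribute:
        return attribute["S"]
    if "N" in attribute:
        text = attribute["N"]  # Numbers are returned as strings.
        try:
            return int(text)
        except ValueError:
            return float(text)
    if "BOOL" in attribute:
        return attribute["BOOL"]
    # Blob ("B") values are returned unchanged in this sketch.
    return attribute.get("B")


def decode_items(response):
    """Return the response's Items array as a list of plain dicts."""
    return [
        {name: decode_attribute(value) for name, value in item.items()}
        for item in response.get("Items", [])
    ]
```

Items that lack a requested attribute simply have no corresponding key in the decoded dict, which mirrors the response behavior described for the Items element.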
Examples
Example 1 — Basic Filter-Expression Scan
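Retrieve from a "MyDirectory/Cars" table in a "mycontainer" container the __name, km, state, and manufacturer attributes (where they exist) of all items whose km attribute value is greater than or equal to 10000 and whose lastService attribute value is less than 10000, up to a maximum of 1000 items: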
POST /mycontainer/MyDirectory/ HTTP/1.1
Host: https://default-tenant.app.mycluster.iguazio.com:8443
Content-Type: application/json
X-v3io-function: GetItems
X-v3io-session-key: e8bd4ca2-537b-4175-bf01-8c74963e90bf
{
"TableName": "Cars",
"Limit": 1000,
"AttributesToGet": "__name,km,state,manufacturer",
"FilterExpression": "(km >= 10000) AND (lastService < 10000)"
}
import requests
url = "https://default-tenant.app.mycluster.iguazio.com:8443/mycontainer/MyDirectory/"
headers = {
"Content-Type": "application/json",
"X-v3io-function": "GetItems",
"X-v3io-session-key": "e8bd4ca2-537b-4175-bf01-8c74963e90bf"
}
payload = {
"TableName": "Cars",
"Limit": 1000,
"AttributesToGet": "__name,km,state,manufacturer",
"FilterExpression": "(km >= 10000) AND (lastService < 10000)"
}
response = requests.post(url, json=payload, headers=headers)
print(response.text)
HTTP/1.1 200 OK
Content-Type: application/json
...
{
"LastItemIncluded": "TRUE",
"NumItems": 4,
"Items": [
{
"__name": {"S": "7348841"},
"km": {"N": "10000"},
"state": {"S": "OK"}
},
{
"__name": {"S": "6924123"},
"km": {"N": "15037"},
"state": {"S": "OUT_OF_SERVICE"},
"manufacturer": {"S": "Honda"}
},
{
"__name": {"S": "7222751"},
"km": {"N": "12503"}
},
{
"__name": {"S": "5119003"},
"km": {"N": "11200"},
"manufacturer": {"S": "Toyota"}
}
]
}
Example 2 — Range Scan
This example demonstrates two range-scan queries for a "mytaxis/rides" table in a "mycontainer" container. The table contains the following items:
+---------+--------+---------+--------+----------------+------------------+-------------------+
|driver_id| date|num_rides|total_km|total_passengers| avg_ride_km|avg_ride_passengers|
+---------+--------+---------+--------+----------------+------------------+-------------------+
| 1|20180601| 25| 125.0| 40| 5.0| 1.6|
| 1|20180602| 20| 106.0| 46| 5.3| 2.3|
| 1|20180701| 28| 106.4| 42|3.8000000000000003| 1.5|
| 16|20180601| 1| 224.2| 8| 224.2| 8.0|
| 16|20180602| 10| 244.0| 45| 24.4| 4.5|
| 16|20180701| 6| 193.2| 24|32.199999999999996| 4.0|
| 24|20180601| 8| 332.0| 18| 41.5| 2.25|
| 24|20180602| 5| 260.0| 11| 52.0| 2.2|
| 24|20180701| 7| 352.1| 21|50.300000000000004| 3.0|
+---------+--------+---------+--------+----------------+------------------+-------------------+
The first query scans for all attributes of the items whose sharding-key value is 1:
POST /mycontainer/mytaxis/rides/ HTTP/1.1
Host: https://default-tenant.app.mycluster.iguazio.com:8443
Content-Type: application/json
X-v3io-function: GetItems
X-v3io-session-key: e8bd4ca2-537b-4175-bf01-8c74963e90bf
{
"ShardingKey": "1",
"AttributesToGet": "*"
}
import requests
url = "https://default-tenant.app.mycluster.iguazio.com:8443/mycontainer/mytaxis/rides/"
headers = {
"Content-Type": "application/json",
"X-v3io-function": "GetItems",
"X-v3io-session-key": "e8bd4ca2-537b-4175-bf01-8c74963e90bf"
}
payload = {
"ShardingKey": "1",
"AttributesToGet": "*"
}
response = requests.post(url, json=payload, headers=headers)
print(response.text)
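The second query scans for the __name, driver_id, date, avg_ride_km, and avg_ride_passengers attributes of the items whose sharding-key value is 24 and whose sorting-key values are greater than or equal to 20180101 and less than 20180701: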
POST /mycontainer/mytaxis/rides/ HTTP/1.1
Host: https://default-tenant.app.mycluster.iguazio.com:8443
Content-Type: application/json
X-v3io-function: GetItems
X-v3io-session-key: e8bd4ca2-537b-4175-bf01-8c74963e90bf
{
"ShardingKey": "24",
"SortKeyRangeStart": "20180101",
"SortKeyRangeEnd": "20180701",
"AttributesToGet": "__name,driver_id,date,avg_ride_km,avg_ride_passengers"
}
import requests
url = "https://default-tenant.app.mycluster.iguazio.com:8443/mycontainer/mytaxis/rides/"
headers = {
"Content-Type": "application/json",
"X-v3io-function": "GetItems",
"X-v3io-session-key": "e8bd4ca2-537b-4175-bf01-8c74963e90bf"
}
payload = {
"ShardingKey": "24",
"SortKeyRangeStart": "20180101",
"SortKeyRangeEnd": "20180701",
"AttributesToGet": "__name,driver_id,date,avg_ride_km,avg_ride_passengers"
}
response = requests.post(url, json=payload, headers=headers)
print(response.text)
Response to the first query —
HTTP/1.1 200 OK
Content-Type: application/json
...
{
"LastItemIncluded": "TRUE",
"NumItems": 3,
"Items": [
{
"__name": {
"S": "1.20180601"
},
"avg_ride_km": {
"N": "5"
},
"total_passengers": {
"N": "40"
},
"driver_id": {
"N": "1"
},
"avg_ride_passengers": {
"N": "1.6"
},
"total_km": {
"N": "125"
},
"date": {
"S": "20180601"
},
"num_rides": {
"N": "25"
}
},
{
"__name": {
"S": "1.20180602"
},
"avg_ride_km": {
"N": "5.3"
},
"total_passengers": {
"N": "46"
},
"driver_id": {
"N": "1"
},
"avg_ride_passengers": {
"N": "2.3"
},
"total_km": {
"N": "106"
},
"date": {
"S": "20180602"
},
"num_rides": {
"N": "20"
}
},
{
"__name": {
"S": "1.20180701"
},
"avg_ride_km": {
"N": "3.8"
},
"total_passengers": {
"N": "42"
},
"driver_id": {
"N": "1"
},
"avg_ride_passengers": {
"N": "1.5"
},
"total_km": {
"N": "106.4"
},
"date": {
"S": "20180701"
},
"num_rides": {
"N": "28"
}
}
]
}
Response to the second query —
HTTP/1.1 200 OK
Content-Type: application/json
...
{
"LastItemIncluded": "TRUE",
"NumItems": 2,
"Items": [
{
"__name": {
"S": "24.20180601"
},
"driver_id": {
"N": "24"
},
"date": {
"S": "20180601"
},
"avg_ride_km": {
"N": "41.5"
},
"avg_ride_passengers": {
"N": "2.25"
}
},
{
"__name": {
"S": "24.20180602"
},
"driver_id": {
"N": "24"
},
"date": {
"S": "20180602"
},
"avg_ride_km": {
"N": "52"
},
"avg_ride_passengers": {
"N": "2.2"
}
}
]
}
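To see why the second query returns only two items, the range-scan selection rule can be reproduced locally on the item names from the example table. This is only an illustrative sketch and does not call the API: the helper names are hypothetical, and the sorting keys are compared as plain Python strings, which is an assumption about how the comparison is done.

```python
def split_item_name(name):
    """Split a compound item name at its leftmost period."""
    sharding_key, _, sorting_key = name.partition(".")
    return sharding_key, sorting_key


def in_range_scan(name, sharding_key, start=None, end=None):
    """Apply the rule: start <= sorting key < end, for one sharding key."""
    item_sharding_key, sorting_key = split_item_name(name)
    if item_sharding_key != sharding_key:
        return False
    if start is not None and sorting_key < start:
        return False
    if end is not None and sorting_key >= end:
        return False
    return True


# Item names (<driver_id>.<date>) for the nine rows of the example table.
names = [
    f"{driver}.{date}"
    for driver in ("1", "16", "24")
    for date in ("20180601", "20180602", "20180701")
]
matches = [n for n in names if in_range_scan(n, "24", "20180101", "20180701")]
print(matches)  # ['24.20180601', '24.20180602']
```

The item 24.20180701 is excluded because the SortKeyRangeEnd bound is exclusive.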