API Data Paths
Overview
The data containers and their contents are referenced differently depending on the programming interface. You need to know how to set the data paths for each interface, as outlined in this guide:
- RESTful Web and Management API Data Paths
- Frames API Data Paths
- Spark API Data Paths
- Trino Data Paths
- File-System Data Paths
Predefined Environment Variables
The platform's command-line services (Jupyter Notebook and the web shell) predefine the following environment variables to simplify access to the running-user directory of the predefined "users" container:
- V3IO_USERNAME: set to the username of the running user of the Jupyter Notebook service.
- V3IO_HOME: set to the running-user directory in the "users" container (users/<running user>).
- V3IO_HOME_URL: set to the fully qualified v3io path to the running-user directory (v3io://users/<running user>).
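The relationship between the three variables can be sketched in Python. The helper name running_user_env is hypothetical: it only reproduces the values described above for a given username and is not how the platform sets them. On the platform, read the real values with os.getenv().

```python
def running_user_env(username: str) -> dict:
    """Return the values the V3IO_* variables are described as holding.

    Hypothetical helper for illustration only; it mirrors the descriptions
    above and does not query the platform.
    """
    home = f"users/{username}"  # V3IO_HOME: directory within the "users" container
    return {
        "V3IO_USERNAME": username,
        "V3IO_HOME": home,
        "V3IO_HOME_URL": f"v3io://{home}",  # fully qualified v3io path
    }
```

For example, running_user_env("iguazio")["V3IO_HOME_URL"] evaluates to "v3io://users/iguazio".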
RESTful Web and Management API Data Paths
To refer to a data container or to a specific directory or file in a container from a RESTful web or cluster-management API request, specify the path as part of the request URL:
<API-endpoint URL>/<container name>[/<path to file or directory>]
For example, the following web-API request URL references the "projects" container:
https://default-tenant.app.mycluster.iguazio.com:8443/projects/
And this is a similar cluster-management API ("management API") request URL:
https://dashboard.default-tenant.app.mycluster.iguazio.com/projects/
The following web-API request URL references a "mytable" table in the "iguazio" running-user directory of the "users" container:
https://default-tenant.app.mycluster.iguazio.com:8443/users/iguazio/mytable
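The URL format above can be sketched as a small Python function. The name web_api_url is hypothetical; it only concatenates the parts of the documented format and does not send a request or validate the endpoint.

```python
def web_api_url(endpoint: str, container: str, path: str = "") -> str:
    """Build <API-endpoint URL>/<container name>[/<path to file or directory>]."""
    base = f"{endpoint.rstrip('/')}/{container.strip('/')}"
    if not path:
        # Container-only URL; ends with a slash, like the "projects" example.
        return base + "/"
    return base + "/" + path.lstrip("/")
```

For example, web_api_url("https://default-tenant.app.mycluster.iguazio.com:8443", "users", "iguazio/mytable") returns the "mytable" URL shown above.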
When using the platform's data-service web APIs, you can optionally set the relative file or directory path within the configured container in the request's JSON body.
For example, for a NoSQL Web API request, you can end the URL path in the previous example with the container name (users) and set the relative table path ("iguazio/mytable") in the request's JSON body.
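When a path is split between the URL and the JSON body, the container name goes in the URL and the remainder becomes the relative path. A minimal sketch of that split, using a hypothetical split_container_path helper (the name of the JSON body field is not shown here; see the web-API reference):

```python
from typing import Tuple


def split_container_path(full_path: str) -> Tuple[str, str]:
    """Split "<container>/<relative path>" into (container, relative path)."""
    container, _, relative = full_path.strip("/").partition("/")
    return container, relative
```

For the earlier "users/iguazio/mytable" example, this returns ("users", "iguazio/mytable").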
For full details and examples, see the data-service web-API reference documentation.
Frames API Data Paths
When using the V3IO Frames (Frames) Python API, you create a client object for a specific data container; the container name is specified in the container parameter of the client constructor API. For example:
import v3io_frames as v3f
# Create a client object for the "users" container:
client = v3f.Client("framesd:8081", container="users", token="e8bd4ca2-537b-4175-bf01-8c74963e90bf")
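To refer to a specific data collection, such as a NoSQL or TSDB table or a stream, specify the collection's path, relative to the client's container, in the table parameter of the relevant client method. For example: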
import os

# Read from a "mytable" table in the root directory of the `client` object's
# container:
df = client.read(backend="kv", table="mytable")
# Read from a "mytable" table in the running-user directory (`V3IO_USERNAME`)
# of the `client` object's container (typically for the "users" container):
tsdb_table = os.path.join(os.getenv("V3IO_USERNAME"), "mytable")
df = client.read(backend="tsdb", table=tsdb_table)
# Read from a "drivers" stream in a "my_streams" directory in the `client`
# object's container:
stream = "/my_streams/drivers"
df = client.read(backend="stream", table=stream, seek="earliest")
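The os.path.join call above uses the separator of the machine that runs the client code, which is "\" on Windows. If the client code might run on a non-POSIX host, posixpath.join always produces "/"-separated paths. A sketch with a hypothetical user_table_path helper that also falls back to V3IO_USERNAME:

```python
import os
import posixpath
from typing import Optional


def user_table_path(table: str, username: Optional[str] = None) -> str:
    """Return "<username>/<table>", relative to the Frames client's container.

    Falls back to the V3IO_USERNAME environment variable when no username is
    passed. posixpath.join always uses "/", regardless of the host OS.
    """
    user = username or os.getenv("V3IO_USERNAME")
    if not user:
        raise ValueError("no username given and V3IO_USERNAME is not set")
    return posixpath.join(user, table)
```

The result can be passed as the table argument, for example client.read(backend="tsdb", table=user_table_path("mytable")).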
For detailed information and examples, see the Frames API reference.
Spark API Data Paths
To refer to data in a data container from Spark API code, such as Spark DataFrames, specify the data path as a fully qualified v3io path of the following format, where <container name> is the name of the parent data container and <data path> is the relative path to the data within the specified container:
v3io://<container name>/<data path>
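Building such a path from its parts can be sketched as follows. The v3io_url function is a hypothetical helper; it keeps the slash after the container name, so that a container-only path still ends with a slash.

```python
def v3io_url(container: str, data_path: str = "") -> str:
    """Build a fully qualified v3io://<container name>/<data path> path."""
    return f"v3io://{container.strip('/')}/{data_path.lstrip('/')}"
```

For example, v3io_url("projects", "mydata/mytable/") returns "v3io://projects/mydata/mytable/", the path used in the examples that follow.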
When using a NoSQL DataFrame, you set the data source to "io.iguaz.v3io.spark.sql.kv".
For example, in Scala:
val nosql_source = "io.iguaz.v3io.spark.sql.kv"
// Read from a "mytable" NoSQL table in a "mydata" directory in the "projects" container:
var table_path = "v3io://projects/mydata/mytable/"
var readDF = spark.read.format(nosql_source).load(table_path)
// Read from a "mytable" table in the running-user directory of the "users" container.
// The table_path assignments demonstrate alternative methods for setting the same path
// for running-user "iguazio" (specified explicitly only in the first example):
table_path = "v3io://users/iguazio/mytable"
table_path = "v3io://users/" + System.getenv("V3IO_USERNAME") + "/mytable"
table_path = "v3io://" + System.getenv("V3IO_HOME") + "/mytable"
table_path = System.getenv("V3IO_HOME_URL") + "/mytable"
readDF = spark.read.format(nosql_source).load(table_path)
The same examples in Python (PySpark):
import os
nosql_source = "io.iguaz.v3io.spark.sql.kv"
# Read from a NoSQL table "mytable" in a "mydata" directory in the "projects" container:
table_path = "v3io://projects/mydata/mytable/"
df = spark.read.format(nosql_source).load(table_path)
# Read from a "mytable" table in the running-user directory of the "users" container.
# The table_path assignments demonstrate alternative methods for setting the same path
# for running-user "iguazio" (specified explicitly only in the first example):
table_path = "v3io://users/iguazio/mytable"
table_path = "v3io://users/" + os.getenv("V3IO_USERNAME") + "/mytable"
table_path = "v3io://" + os.getenv("V3IO_HOME") + "/mytable"
table_path = os.getenv("V3IO_HOME_URL") + "/mytable"
readDF = spark.read.format(nosql_source).load(table_path)
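To check that the four table_path assignments above really evaluate to the same path, the following sketch rebuilds them from a mapping of example variable values for the running user "iguazio", as described in the Predefined Environment Variables section. The helper name and the example mapping are hypothetical; on the platform the values come from the environment.

```python
from typing import Dict, List


def running_user_table_paths(env: Dict[str, str], table: str = "mytable") -> List[str]:
    """Rebuild the four alternative table paths from the Spark examples."""
    return [
        "v3io://users/iguazio/" + table,  # user name specified explicitly
        "v3io://users/" + env["V3IO_USERNAME"] + "/" + table,
        "v3io://" + env["V3IO_HOME"] + "/" + table,
        env["V3IO_HOME_URL"] + "/" + table,
    ]


example_env = {
    "V3IO_USERNAME": "iguazio",
    "V3IO_HOME": "users/iguazio",
    "V3IO_HOME_URL": "v3io://users/iguazio",
}
paths = running_user_table_paths(example_env)
print(set(paths))  # {'v3io://users/iguazio/mytable'}
```

Note that the first form matches the others only when the running user is "iguazio"; the three environment-variable forms work for any user.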
For detailed information and examples, see the Spark datasets reference — and especially the Data Paths overview and the Table Paths NoSQL DataFrame sections; and the Spark examples in the platform's tutorial Jupyter notebooks.
Trino Data Paths
To refer to a table in a data container from a Trino query, specify the table path using the following format, where <catalog> is the name of the Trino connector catalog (v3io for the Iguazio Trino connector), <container name> is the name of the table's parent data container (the Trino schema), and <table path> is the relative path to the table within the specified container:
[<catalog>.][<container name>.]<table path>
To specify a path to a nested table, use the following syntax:
[<catalog>.][<container name>.]"/path/to/table"
The catalog and container (schema) names are marked as optional ([]) because you can choose to configure default values for these parameters when starting the Trino CLI.
For example, the platform's trino wrapper sets v3io as the default catalog.
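The path syntax above, including the optional parts and the quoted form for nested tables, can be sketched as a Python helper. The name trino_table_ref is hypothetical, and writing a root-directory table as a plain name is a choice made for this sketch.

```python
from typing import Optional


def trino_table_ref(table_path: str, container: Optional[str] = None,
                    catalog: Optional[str] = None) -> str:
    """Build [<catalog>.][<container name>.]<table path> for a Trino query."""
    if catalog and not container:
        # Trino reads a two-part name as <schema>.<table>, so a catalog
        # cannot be given without the container (schema).
        raise ValueError("a catalog requires a container name")
    path = table_path.strip("/")
    # Nested tables use the quoted "/path/to/table" syntax; a table in the
    # container's root directory is written as a plain name.
    table = f'"/{path}"' if "/" in path else path
    return ".".join([part for part in (catalog, container) if part] + [table])
```

For example, "SELECT * FROM " + trino_table_ref("iguazio/mytable", "users", "v3io") produces the second query in the following examples.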
For example, following are Trino CLI queries that reference NoSQL tables in the platform's data containers:
-- Query a "mytable" table in the "projects" container:
SELECT * FROM v3io.projects.mytable;
-- Query a "mytable" table in the "iguazio" running-user directory of the "users" container:
SELECT * FROM v3io.users."/iguazio/mytable";
- When using the trino wrapper instead of the native Trino CLI, you can omit "v3io." from the path:
  SELECT * FROM projects.mytable;
  SELECT * FROM users."/iguazio/mytable";
- You can use a bash table-path variable and the Trino CLI's --execute option to replace the hardcoded running-user directory name in the second example ("iguazio") with the V3IO_USERNAME environment variable:
  trino_table_path="v3io.users.\"/$V3IO_USERNAME/mytable\""
  trino --execute "SELECT * FROM $trino_table_path"
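The same --execute call can be made from Python. This sketch builds the command as an argument list, so it is passed to the Trino CLI without a shell and the embedded double quotes need none of the backslash escaping used in the bash version. The function names are hypothetical, and run_select_all assumes that the trino command is on the PATH, as in the bash example.

```python
import subprocess
from typing import List


def trino_select_all_argv(username: str, table: str = "mytable") -> List[str]:
    """Return the argv list for a `trino --execute` SELECT on a running-user table."""
    table_ref = f'v3io.users."/{username}/{table}"'
    return ["trino", "--execute", f"SELECT * FROM {table_ref}"]


def run_select_all(username: str) -> None:
    """Run the query with the Trino CLI (assumes `trino` is on the PATH)."""
    # No shell is involved, so each list item reaches trino unchanged.
    subprocess.run(trino_select_all_argv(username), check=True)
```

On the platform, you might call run_select_all(os.environ["V3IO_USERNAME"]).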
Following is an example of an SQL query in a Python Jupyter Notebook, which uses Trino to query a "mytable" table in the running-user directory of the "users" container:
import os

trino_table_path = 'v3io.users."/' + os.getenv("V3IO_USERNAME") + '/mytable"'
print("SELECT * FROM " + trino_table_path)
%sql SELECT * FROM $trino_table_path
For detailed information and examples, see Using Trino, and especially the Table Paths overview and the similar Trino CLI guide that it references.
File-System Data Paths
Local File-System Data Paths
To refer to data in the platform from a local file-system command, use the predefined "v3io" data mount:
/v3io[/<container name>][/<path to file or directory>]
To refer to the running-user directory in the "users" container, you can choose to use the predefined "User" mount to this directory:
/User/[<path to file or directory in the users/<username> directory>]
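Because /User is a mount of the running-user directory, any /User path has an equivalent path under the /v3io mount. A sketch of that mapping with a hypothetical user_mount_to_v3io helper, which only rewrites the path string and does not touch the file system:

```python
import posixpath


def user_mount_to_v3io(path: str, username: str) -> str:
    """Rewrite a /User[/...] path as the equivalent /v3io/users/<username>[/...] path."""
    norm = posixpath.normpath(path)
    if norm != "/User" and not norm.startswith("/User/"):
        raise ValueError(f"not a path under the /User mount: {path}")
    rest = norm[len("/User"):]  # "" or "/<path to file or directory>"
    return f"/v3io/users/{username}{rest}"
```

For example, user_mount_to_v3io("/User/myfile.txt", "iguazio") returns "/v3io/users/iguazio/myfile.txt".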
For example:
# List all data-container directories
ls /v3io
# List the contents of the "projects" container
ls /v3io/projects/
# List the contents of the "mydata" directory in the "projects" container
ls -lF /v3io/projects/mydata/
# Copy a myfile.txt file from a "mydata" directory in the "projects" container
# to the running-user directory of the "users" container for user "iguazio".
# All of the following syntax variations evaluate to the same copy command:
cp /v3io/projects/mydata/myfile.txt /v3io/users/iguazio/
cp /v3io/projects/mydata/myfile.txt /v3io/users/$V3IO_USERNAME
cp /v3io/projects/mydata/myfile.txt /v3io/$V3IO_HOME
cp /v3io/projects/mydata/myfile.txt /User
Hadoop FS File-System Data Paths
To refer to a data container or its contents from a Hadoop FS command, specify the data path as a fully qualified v3io path of the following format:
v3io://<container name>/[<data path>]
For example:
# List the contents of the "projects" container
hadoop fs -ls v3io://projects/
# List the contents of the "mydata" directory in the "projects" container
hadoop fs -ls v3io://projects/mydata/
# Copy a myfile.txt file from a "mydata" directory in the "projects" container
# to the running-user directory of the "users" container for user "iguazio".
# All of the following syntax variations evaluate to the same copy command:
hadoop fs -cp v3io://projects/mydata/myfile.txt v3io://users/iguazio/
hadoop fs -cp v3io://projects/mydata/myfile.txt v3io://users/$V3IO_USERNAME
hadoop fs -cp v3io://projects/mydata/myfile.txt v3io://$V3IO_HOME
hadoop fs -cp v3io://projects/mydata/myfile.txt $V3IO_HOME_URL
Note that to list the contents of a container's root directory, you must end the path with a slash (/), as demonstrated in the examples.
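Because local /v3io mount paths and Hadoop FS v3io:// paths both start with the container name, converting one to the other is mechanical. A sketch with a hypothetical mount_path_to_v3io_url helper; it always keeps the slash after the container name, so a container-only path keeps the trailing slash that container roots need:

```python
import posixpath


def mount_path_to_v3io_url(path: str) -> str:
    """Convert /v3io/<container>[/<path>] to v3io://<container>/[<path>]."""
    norm = posixpath.normpath(path)
    if not norm.startswith("/v3io/"):
        raise ValueError(f"not a path under the /v3io mount: {path}")
    container, _, data_path = norm[len("/v3io/"):].partition("/")
    # The slash after the container name is always kept, so a container-only
    # path refers to the container's root directory.
    return f"v3io://{container}/{data_path}"
```

For example, mount_path_to_v3io_url("/v3io/projects/mydata/myfile.txt") returns "v3io://projects/mydata/myfile.txt", the source path of the copy examples.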