NoSQL Table Schema Reference
Overview
To support reading and writing NoSQL data using structured-data interfaces — such as Spark DataFrames, Trino, and Frames — the platform uses a schema file that defines the schema of the data structure.
When writing NoSQL data in the platform using a Spark or Frames DataFrame, the schema of the data table is automatically identified and saved, and then retrieved when using a structured-data interface to read data from the same table (unless you explicitly define the schema for the read operation).
However, to use a structured-data interface to read NoSQL data that was not written in this manner, you first need to define the table schema.
The schema is stored as a JSON file. You can create this schema file in one of the following ways:

- Spark — do one of the following as part of a NoSQL Spark DataFrame read operation (for more information, see Defining the Table Schema in the Spark NoSQL DataFrame reference):
  - Use the custom inferSchema option to infer the schema (recommended).
  - Define the schema programmatically.
    Note: Programmatically created table schemas don't support range-scan or even-distribution table queries.
- Trino — use the custom v3io.schema.infer Trino CLI command to generate a schema file. For more information, see Defining the NoSQL Table Schema in the Trino reference.
- Frames — use the infer_schema or infer command of the NoSQL backend's client method to generate a schema file.
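To illustrate what schema inference produces, the following sketch mimics the kind of per-attribute type detection that an inferSchema-style option performs, using plain Python on sample items. The mapping of Python types to schema type names is an assumption for illustration; the platform's actual inference logic is internal.

```python
def infer_field_type(value):
    """Map a sample Python value to a schema-file type name (illustrative)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, int):
        return "long"    # integer and short values are stored as "long"
    if isinstance(value, float):
        return "double"  # floating-point values are stored as "double"
    return "string"

def infer_fields(items):
    """Build a 'fields'-style list by scanning sample items (dicts)."""
    fields = {}
    for item in items:
        for name, value in item.items():
            inferred = infer_field_type(value)
            entry = fields.setdefault(
                name, {"name": name, "type": inferred, "nullable": False})
            if inferred == "null":
                entry["nullable"] = True      # a missing value marks the field nullable
            elif entry["type"] == "null":
                entry["type"] = inferred      # refine once a non-null value is seen

    return list(fields.values())

items = [{"id": "a1", "age": 42}, {"id": "a2", "age": None}]
print(infer_fields(items))
```

In this sketch, "age" is inferred as "long" and marked nullable because one sample item has no value for it, which matches the nullable semantics described below.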
The Item-Attributes Schema Object ('fields')
The NoSQL-table schema JSON file contains a 'fields' array of item-attribute (column) schema objects, each with the following properties:

- name
  The name of the attribute (column). For example, "id" or "age".
  Type: String
- type
  The attribute's data type (i.e., the type of the data that is stored in the column). The type can be one of the following string values — "boolean", "double", "long", "null", "string", or "timestamp". The platform implicitly converts integer and short values to long values ("long") and floating-point values to double-precision values ("double").
  Type: String in the schema file; Spark SQL data type when defining the schema programmatically using a Spark DataFrame
  Note (Spark DataFrame programmatic schema definition): When defining the table schema programmatically as part of a Spark DataFrame read operation, use the Spark SQL data types that match the supported schema-file attribute types (such as StringType for "string" or LongType for "long"). When writing the data to the NoSQL table, the platform translates the Spark data types into the relevant attribute data types and performs any necessary type conversions.
- nullable
  Indicates whether the attribute value is nullable. If true, the attribute value can be null.
  Type: Boolean
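The following is a minimal sketch of a 'fields' array as it might appear in the schema JSON file, built and serialized with Python's standard json module. The attribute names ("id", "age", "ts") are hypothetical, and the surrounding layout of a real schema file may include additional properties.

```python
import json

# Hypothetical item-attribute schema objects for a table.
fields = [
    {"name": "id", "type": "string", "nullable": False},
    {"name": "age", "type": "long", "nullable": True},   # integers map to "long"
    {"name": "ts", "type": "timestamp", "nullable": True},
]

# Validate each object against the supported schema-file attribute types.
SUPPORTED_TYPES = {"boolean", "double", "long", "null", "string", "timestamp"}
for field in fields:
    assert field["type"] in SUPPORTED_TYPES
    assert isinstance(field["nullable"], bool)

print(json.dumps({"fields": fields}, indent=2))
```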
The Item-Key Schema Objects ('key' and 'sortingKey')
The NoSQL-table schema JSON file also contains the following item-key schema objects, which identify the attributes that make up the primary key of the table items:

- key
  The name of the table's sharding-key attribute, which together with the sorting-key attribute (sortingKey), if defined, determines the primary-key values of the table items. For example, "id".
  Type: String
- sortingKey
  The name of the table's sorting-key attribute, if defined, which together with the sharding-key attribute (key) determines the primary-key values of the table items. For example, "date".
  Type: String
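Putting the pieces together, here is a sketch of a complete schema document that combines the 'fields' array with the 'key' and 'sortingKey' properties. The table layout ("id" as the sharding key, "date" as the sorting key) is hypothetical, and a schema file generated by the platform may contain additional properties.

```python
# A hypothetical schema document for a table whose item primary keys are
# determined by a sharding-key attribute plus an optional sorting-key attribute.
schema = {
    "fields": [
        {"name": "id", "type": "string", "nullable": False},
        {"name": "date", "type": "string", "nullable": False},
        {"name": "amount", "type": "double", "nullable": True},
    ],
    "key": "id",           # sharding-key attribute name
    "sortingKey": "date",  # optional sorting-key attribute name
}

def primary_key_attributes(schema):
    """Return the attribute names that determine an item's primary key."""
    keys = [schema["key"]]
    if schema.get("sortingKey"):
        keys.append(schema["sortingKey"])
    return keys

print(primary_key_attributes(schema))  # → ['id', 'date']
```

For a table without a sorting key, the "sortingKey" property is simply omitted and the sharding-key attribute alone determines each item's primary-key value.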
See Also
- Trino reference — Defining the NoSQL Table Schema
- Spark NoSQL DataFrame reference — Defining the Table Schema
- Object Names and Primary Keys
- Spark DataFrame Data Types