Column Types
DJ’s type system is based on Apache Spark SQL types. This ensures compatibility with Spark-based query engines and provides a familiar type system for users working with big data platforms.
Specifying Types
How you specify types depends on the node type:
Source nodes: You can manually specify column types as strings, or DJ can automatically infer types by connecting to the external table via table reflection.
# Example: Specifying column types for a source node
columns:
- name: user_id
type: bigint
- name: username
type: string
- name: metadata
type: map<string, string>
- name: tags
type: array<string>
Transform, dimension, and metric nodes: DJ automatically parses your SQL query and infers column types based on the expressions and upstream node types. You don’t need to specify types manually.
# Example: Transform node - types are inferred from the query
query: |
SELECT
user_id, -- inferred as bigint from source
UPPER(username) AS username, -- inferred as string
CAST(created_at AS date) AS order_date -- inferred as date from CAST
FROM source.events
Using CAST is a common way to explicitly control column types in your transforms.
Primitive Types
Numeric Types
| Type | Aliases | Description |
|---|---|---|
tinyint | 8-bit signed integer | |
smallint | 16-bit signed integer | |
int | integer | 32-bit signed integer |
bigint | long | 64-bit signed integer |
float | 32-bit floating point | |
double | 64-bit floating point | |
decimal(p, s) | Fixed-precision decimal (precision and scale up to 38) |
Example:
SELECT
CAST(quantity AS int),
CAST(price AS decimal(10, 2)),
CAST(large_value AS bigint)
FROM orders
Boolean Type
| Type | Aliases | Description |
|---|---|---|
boolean | bool | True or false values |
String Types
| Type | Aliases | Description |
|---|---|---|
string | str | Arbitrary-length UTF-8 character sequence |
varchar | varchar(n) | Variable-length character string, optionally with max length |
Example:
SELECT
CAST(name AS string),
CAST(code AS varchar(10))
FROM products
Date and Time Types
| Type | Aliases | Description |
|---|---|---|
date | Calendar date (year, month, day) without time or timezone | |
time | Time of day without date or timezone (microsecond precision) | |
timestamp | datetime | Date and time without timezone (microsecond precision) |
timestamptz | Date and time with timezone, stored as UTC |
Example:
SELECT
CAST(order_date AS date),
CAST(created_at AS timestamp),
CAST(event_time AS timestamptz)
FROM events
Binary Types
| Type | Aliases | Description |
|---|---|---|
binary | byte, bytes | Arbitrary-length byte array |
Other Types
| Type | Description |
|---|---|
uuid | Universally unique identifier |
null | Represents missing or unknown values |
Complex Types
DJ supports Spark’s complex types for nested and structured data.
Struct
A struct is a collection of named fields, each with its own type. Use struct<field_name type, ...> syntax.
-- Struct type definition
struct<name string, age int, active boolean>
-- Accessing struct fields
SELECT user_info.name, user_info.age FROM users
Array
An array is an ordered collection of elements of the same type. Use array<element_type> syntax.
-- Array type definition
array<string>
array<int>
array<struct<id int, value double>>
-- Working with arrays
SELECT tags[0], SIZE(tags) FROM items
Map
A map is a collection of key-value pairs. Use map<key_type, value_type> syntax.
-- Map type definition
map<string, int>
map<string, array<double>>
-- Accessing map values
SELECT properties['color'], properties['size'] FROM products
Interval Types
Interval types represent time durations and are useful for date/time arithmetic.
| Type | Description |
|---|---|
INTERVAL DAY TO SECOND | Duration in days, hours, minutes, seconds |
INTERVAL YEAR TO MONTH | Duration in years and months |
Example:
SELECT
order_date + INTERVAL 30 DAY,
subscription_start + INTERVAL 1 YEAR
FROM subscriptions
Type Compatibility
DJ performs type checking and will report errors when incompatible types are used. Types within the same category are generally compatible:
- Numeric types: Can be compared and used in arithmetic operations together. Smaller types are promoted to larger types (e.g.,
inttobigint). - String types:
stringandvarcharare compatible. - Date/time types: Can be compared within the same category.
Type Casting
Use CAST to convert between types:
SELECT
CAST(string_column AS int),
CAST(int_column AS double),
CAST(timestamp_column AS date),
CAST(number AS string)
FROM my_table