Column Types

DJ’s type system is based on Apache Spark SQL types. This ensures compatibility with Spark-based query engines and provides a familiar type system for users working with big data platforms.

Specifying Types

How you specify types depends on the node type:

Source nodes: You can manually specify column types as strings, or DJ can automatically infer types by connecting to the external table via table reflection.

# Example: Specifying column types for a source node
columns:
  - name: user_id
    type: bigint
  - name: username
    type: string
  - name: metadata
    type: map<string, string>
  - name: tags
    type: array<string>

Transform, dimension, and metric nodes: DJ automatically parses your SQL query and infers column types based on the expressions and upstream node types. You don’t need to specify types manually.

# Example: Transform node - types are inferred from the query
query: |
  SELECT
    user_id,                                -- inferred as bigint from source
    UPPER(username) AS username,            -- inferred as string
    CAST(created_at AS date) AS order_date  -- inferred as date from CAST
  FROM source.events  

Using CAST is a common way to explicitly control column types in your transforms.

Primitive Types

Numeric Types

TypeAliasesDescription
tinyint8-bit signed integer
smallint16-bit signed integer
intinteger32-bit signed integer
bigintlong64-bit signed integer
float32-bit floating point
double64-bit floating point
decimal(p, s)Fixed-precision decimal (precision and scale up to 38)

Example:

SELECT
    CAST(quantity AS int),
    CAST(price AS decimal(10, 2)),
    CAST(large_value AS bigint)
FROM orders

Boolean Type

TypeAliasesDescription
booleanboolTrue or false values

String Types

TypeAliasesDescription
stringstrArbitrary-length UTF-8 character sequence
varcharvarchar(n)Variable-length character string, optionally with max length

Example:

SELECT
    CAST(name AS string),
    CAST(code AS varchar(10))
FROM products

Date and Time Types

TypeAliasesDescription
dateCalendar date (year, month, day) without time or timezone
timeTime of day without date or timezone (microsecond precision)
timestampdatetimeDate and time without timezone (microsecond precision)
timestamptzDate and time with timezone, stored as UTC

Example:

SELECT
    CAST(order_date AS date),
    CAST(created_at AS timestamp),
    CAST(event_time AS timestamptz)
FROM events

Binary Types

TypeAliasesDescription
binarybyte, bytesArbitrary-length byte array

Other Types

TypeDescription
uuidUniversally unique identifier
nullRepresents missing or unknown values

Complex Types

DJ supports Spark’s complex types for nested and structured data.

Struct

A struct is a collection of named fields, each with its own type. Use struct<field_name type, ...> syntax.

-- Struct type definition
struct<name string, age int, active boolean>

-- Accessing struct fields
SELECT user_info.name, user_info.age FROM users

Array

An array is an ordered collection of elements of the same type. Use array<element_type> syntax.

-- Array type definition
array<string>
array<int>
array<struct<id int, value double>>

-- Working with arrays
SELECT tags[0], SIZE(tags) FROM items

Map

A map is a collection of key-value pairs. Use map<key_type, value_type> syntax.

-- Map type definition
map<string, int>
map<string, array<double>>

-- Accessing map values
SELECT properties['color'], properties['size'] FROM products

Interval Types

Interval types represent time durations and are useful for date/time arithmetic.

TypeDescription
INTERVAL DAY TO SECONDDuration in days, hours, minutes, seconds
INTERVAL YEAR TO MONTHDuration in years and months

Example:

SELECT
    order_date + INTERVAL 30 DAY,
    subscription_start + INTERVAL 1 YEAR
FROM subscriptions

Type Compatibility

DJ performs type checking and will report errors when incompatible types are used. Types within the same category are generally compatible:

  • Numeric types: Can be compared and used in arithmetic operations together. Smaller types are promoted to larger types (e.g., int to bigint).
  • String types: string and varchar are compatible.
  • Date/time types: Can be compared within the same category.

Type Casting

Use CAST to convert between types:

SELECT
    CAST(string_column AS int),
    CAST(int_column AS double),
    CAST(timestamp_column AS date),
    CAST(number AS string)
FROM my_table