Using the DJ CLI

The DataJunction CLI (dj) provides a command-line interface for managing your DataJunction deployment, nodes, and namespaces.

Installation

The CLI is installed automatically with the DataJunction Python client:

pip install datajunction

Configuration

Authentication

Set your DJ credentials using environment variables:

export DJ_URL="http://localhost:8000"
export DJ_USER="your-username"
export DJ_PWD="your-password"

Available Commands

Node Inspection

`dj describe <node-name>`

Display detailed information about a node including its type, description, query, columns, and metadata.

Usage:

# Describe a node in text format
dj describe default.num_repair_orders

# Output as JSON
dj describe default.num_repair_orders --format json

Options:

--format <text|json>: Output format (default: text)

Example output:

============================================================
Node: default.num_repair_orders
============================================================
Type:         metric
Description:  Number of repair orders
Status:       valid
Mode:         published

Query:
------------------------------------------------------------
SELECT COUNT(repair_order_id) FROM default.repair_orders
============================================================

`dj lineage <node-name>`

Show lineage (upstream and downstream dependencies) for a node.

Usage:

# Show both upstream and downstream dependencies
dj lineage default.num_repair_orders

# Show only upstream dependencies
dj lineage default.num_repair_orders --direction upstream

# Show only downstream dependencies
dj lineage default.num_repair_orders --direction downstream

# Output as JSON
dj lineage default.num_repair_orders --format json

Options:

--direction <upstream|downstream|both>: Direction of lineage to show (default: both)
--format <text|json>: Output format (default: text)

`dj dimensions <node-name>`

List all available dimensions for a node (particularly useful for metrics).

Usage:

# Show available dimensions for a metric
dj dimensions default.num_repair_orders

# Output as JSON
dj dimensions default.num_repair_orders --format json

Options:

--format <text|json>: Output format (default: text)

Example output:

============================================================
Available dimensions for: default.num_repair_orders
============================================================

  • default.hard_hat.city
    Type: string
    Node: default.hard_hat
    Path: default.repair_orders → default.hard_hat

  • default.hard_hat.state
    Type: string
    Node: default.hard_hat
    Path: default.repair_orders → default.hard_hat

  • default.dispatcher.company_name
    Type: string
    Node: default.dispatcher
    Path: default.repair_orders → default.dispatcher

Total: 15 dimensions

============================================================

Use case: When you want to query a metric with specific dimensions, use this command to see all available dimension attributes you can group by or filter on.

Listing Objects

`dj list <type>`

List various types of objects in DJ (namespaces, metrics, dimensions, cubes, sources, transforms, nodes).

Usage:

# List all metrics
dj list metrics

# List metrics in a specific namespace
dj list metrics --namespace default

# List all namespaces
dj list namespaces

# Output as JSON
dj list metrics --format json

Supported types:

namespaces: List all namespaces
metrics: List metric nodes
dimensions: List dimension nodes
cubes: List cube nodes
sources: List source nodes
transforms: List transform nodes
nodes: List all nodes

Options:

--namespace <name>: Filter by namespace (or prefix for namespaces)
--format <text|json>: Output format (default: text)

SQL Generation

`dj sql <node-name>`

Generate SQL for a node with optional dimensions and filters.

Usage:

# Generate SQL for a metric
dj sql default.num_repair_orders

# Generate SQL with dimensions
dj sql default.num_repair_orders --dimensions default.hard_hat.city,default.hard_hat.state

# Generate SQL with filters
dj sql default.num_repair_orders \
  --dimensions default.hard_hat.city \
  --filters "default.hard_hat.state = 'CA'"

Options:

--dimensions <dim1,dim2,...>: Comma-separated list of dimensions
--filters <filter1,filter2,...>: Comma-separated list of filters
--engine <name>: Engine name
--engine-version <version>: Engine version

Data Retrieval

`dj data`

Fetch actual query results for a node or metrics with optional dimensions and filters.

Usage:

# Fetch data for a single metric node
dj data default.num_repair_orders

# Fetch data for multiple metrics
dj data --metrics default.num_repair_orders default.avg_repair_price

# Fetch data with dimensions
dj data --metrics default.avg_repair_price --dimensions default.hard_hat.city

# Fetch data with filters
dj data --metrics default.avg_repair_price \
  --dimensions default.hard_hat.city \
  --filters "default.hard_hat.state = 'CA'"

# Limit the number of rows returned
dj data --metrics default.avg_repair_price --limit 100

# Output in different formats
dj data --metrics default.avg_repair_price --format json
dj data --metrics default.avg_repair_price --format csv

Options:

--metrics <metric1> <metric2> ...: One or more metrics to query
--dimensions <dim1> <dim2> ...: Dimensions to group by
--filters <filter1> <filter2> ...: Filter expressions
--limit <number>: Maximum rows to return (default: 1000)
--format <table|json|csv>: Output format (default: table)
--engine <name>: Engine name
--engine-version <version>: Engine version

Example output (table format):

╭──────────────────────────┬───────────────────╮
│ default.hard_hat.city    │ avg_repair_price  │
├──────────────────────────┼───────────────────┤
│ Jersey City              │ 54751.10          │
│ Billerica                │ 38277.67          │
│ Southgate                │ 33625.85          │
╰──────────────────────────┴───────────────────╯

3 row(s)

💡

Queries default to a limit of 1000 rows to prevent accidentally running unbounded queries on large tables. Use --limit to adjust this.

Query Planning

`dj plan`

Show the query execution plan for a set of metrics, including how metrics will be computed and which tables/cubes will be used.

Usage:

# Show execution plan for metrics
dj plan --metrics default.num_repair_orders default.avg_repair_price

# Show plan with dimensions
dj plan --metrics default.avg_repair_price --dimensions default.hard_hat.city

# Show plan with filters
dj plan --metrics default.avg_repair_price \
  --filters "default.hard_hat.state = 'CA'"

# Specify target dialect
dj plan --metrics default.avg_repair_price --dialect spark

# Output as JSON
dj plan --metrics default.avg_repair_price --format json

Options:

--metrics <metric1> <metric2> ...: Metrics to include in the plan
--dimensions <dim1> <dim2> ...: Dimensions to group by
--filters <filter1> <filter2> ...: Filter expressions
--dialect <spark|trino|druid>: Target SQL dialect
--format <text|json>: Output format (default: text)

Example output:

============================================================
Query Execution Plan
============================================================
Dialect: spark
Requested Dimensions: default.hard_hat.city

Grain Groups (1)
------------------------------------------------------------
Group 1: default.repair_order_details
  Grain: [default.hard_hat.city]
  Aggregability: full
  Metrics: default.avg_repair_price

  Components:
    • price_count: COUNT(price)
    • price_sum: SUM(price)

Metric Formulas (1)
------------------------------------------------------------
  • default.avg_repair_price = SUM(price_sum) / SUM(price_count)

============================================================

Use case: Use this command to understand how DJ will execute a query before running it. This is helpful for debugging performance issues or understanding which materialized cubes will be used.

Deployments

`dj push <directory>`

Push node YAML definitions from a local directory to the DJ server. Supports dry-run mode for impact analysis before deploying.

Usage:

# Push all YAML files in a directory
dj push ./my-nodes

# Override namespace in YAML files
dj push ./my-nodes --namespace my.custom.namespace

# Dry run to preview changes without applying them
dj push ./my-nodes --dryrun

# Dry run with JSON output for programmatic use
dj push ./my-nodes --dryrun --format json

Options:

--namespace <name>: Override the namespace specified in YAML files
--dryrun: Preview changes without applying them (shows impact analysis)
--format <text|json>: Output format for dry run (default: text)
--repo <url>: Git repository URL for deployment tracking
--branch <name>: Git branch name for deployment tracking
--commit <sha>: Git commit SHA for deployment tracking
--ci-system <name>: CI system name (e.g., ‘github_actions’, ‘jenkins’)
--ci-run-url <url>: URL to the CI run/build

Example:

dj push ./metrics --namespace production.analytics

Example dry run output:

Analyzing deployment from: ./my-nodes

Impact Analysis
============================================================

Changes to Apply:
------------------------------------------------------------
  CREATE  production.metrics.new_metric (metric)
  UPDATE  production.metrics.existing_metric (metric)

Downstream Impact:
------------------------------------------------------------
  production.metrics.existing_metric
    ↳ Downstream nodes affected: 3
      • production.dashboards.sales_cube
      • production.reports.daily_summary
      • production.alerts.threshold_metric

============================================================

💡

Always run dj push --dryrun first to preview changes before deploying to production namespaces.

`dj pull <namespace> <directory>`

Pull (export) nodes from a namespace to YAML files in a local directory.

Usage:

# Export all nodes from default namespace
dj pull default ./exported-nodes

# Export from a specific namespace
dj pull production.metrics ./metrics-backup

Example:

# Backup all nodes in production namespace
dj pull production ./backups/production-$(date +%Y%m%d)

Node Management

`dj delete-node <node-name>`

Delete (deactivate) or permanently remove a node.

Usage:

# Soft delete (deactivate) a node
dj delete-node default.old_metric

# Hard delete (permanently remove) a node
dj delete-node default.old_metric --hard

Options:

--hard: Permanently delete the node (use with extreme caution)

Example:

# Deactivate a deprecated metric
dj delete-node default.deprecated_metric

# Permanently remove a test node
dj delete-node staging.test_metric --hard

⚠️

Soft delete (deactivate) is reversible via the Python API using djbuilder.restore_node(). Hard delete is permanent and cannot be undone.

Namespace Management

`dj delete-namespace <namespace>`

Delete (deactivate) or permanently remove a namespace.

Usage:

# Soft delete (deactivate) a namespace
dj delete-namespace old_project

# Delete namespace and all contained nodes
dj delete-namespace old_project --cascade

# Hard delete (permanently remove)
dj delete-namespace old_project --hard --cascade

Options:

--cascade: Delete all nodes within the namespace
--hard: Permanently delete the namespace (use with extreme caution)

Example:

# Remove a test environment namespace
dj delete-namespace test.environment --cascade

# Permanently clean up an old project
dj delete-namespace archived.old_project --hard --cascade

⚠️

Using --cascade with --hard will permanently delete the namespace and all its nodes. This cannot be undone.

Troubleshooting

Authentication Issues

If you see authentication errors:

# Verify environment variables are set
echo $DJ_USER
echo $DJ_PWD

# Re-export with correct values
export DJ_USER="your-username"
export DJ_PWD="your-password"

Connection Issues

If CLI cannot connect to DJ server:

# Check server URL
echo $DJ_URL

# Verify server is running
curl $DJ_URL/health

# Update URL if needed
export DJ_URL="http://your-dj-server:8000"

Getting Help

# Show all available commands
dj --help

# Show help for specific command
dj push --help
dj delete-node --help

Using the DJ CLI

Installation #

Configuration #

Authentication #

Available Commands #

Node Inspection #

dj describe <node-name> #

dj lineage <node-name> #

dj dimensions <node-name> #

Listing Objects #

dj list <type> #

SQL Generation #

dj sql <node-name> #

Data Retrieval #

dj data #

Query Planning #

dj plan #

Deployments #

dj push <directory> #

dj pull <namespace> <directory> #

Node Management #

dj delete-node <node-name> #

Namespace Management #

dj delete-namespace <namespace> #

Troubleshooting #

Authentication Issues #

Connection Issues #

Getting Help #

Installation

Configuration

Authentication

Available Commands

Node Inspection

`dj describe <node-name>`

`dj lineage <node-name>`

`dj dimensions <node-name>`

Listing Objects

`dj list <type>`

SQL Generation

`dj sql <node-name>`

Data Retrieval

`dj data`

Query Planning

`dj plan`

Deployments

`dj push <directory>`

`dj pull <namespace> <directory>`

Node Management

`dj delete-node <node-name>`

Namespace Management

`dj delete-namespace <namespace>`

Troubleshooting

Authentication Issues

Connection Issues

Getting Help