Using the DJ CLI
The DataJunction CLI (dj) provides a command-line interface for managing your DataJunction deployment, nodes, and namespaces.
Installation
The CLI is installed automatically with the DataJunction Python client:
pip install datajunction
Configuration
Authentication
Set your DJ credentials using environment variables:
export DJ_URL="http://localhost:8000"
export DJ_USER="your-username"
export DJ_PWD="your-password"
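Since the commands below rely on these three variables, a small pre-flight check can catch a missing value before a command fails with a less obvious error. The following is a minimal sketch, not part of the CLI: the function name dj_check_env and the helper variable names are hypothetical.

```shell
# Hypothetical helper (not part of the dj CLI): report which of the
# DJ_URL, DJ_USER, and DJ_PWD variables are unset or empty.
dj_check_env() {
  dj_missing=0
  for dj_name in DJ_URL DJ_USER DJ_PWD; do
    # Look up the value of the variable whose name is stored in $dj_name.
    eval "dj_value=\${$dj_name:-}"
    if [ -z "$dj_value" ]; then
      echo "missing: $dj_name" >&2
      dj_missing=1
    fi
  done
  # Exit status 0 when all three are set, 1 otherwise.
  return "$dj_missing"
}
```

One way to use it is to chain it in front of a command, for example: dj_check_env && dj list namespaces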
Available Commands
Node Inspection
dj describe <node-name>
Display detailed information about a node including its type, description, query, columns, and metadata.
Usage:
# Describe a node in text format
dj describe default.num_repair_orders
# Output as JSON
dj describe default.num_repair_orders --format json
Options:
--format <text|json>: Output format (default: text)
Example output:
============================================================
Node: default.num_repair_orders
============================================================
Type: metric
Description: Number of repair orders
Status: valid
Mode: published
Query:
------------------------------------------------------------
SELECT COUNT(repair_order_id) FROM default.repair_orders
============================================================
dj lineage <node-name>
Show lineage (upstream and downstream dependencies) for a node.
Usage:
# Show both upstream and downstream dependencies
dj lineage default.num_repair_orders
# Show only upstream dependencies
dj lineage default.num_repair_orders --direction upstream
# Show only downstream dependencies
dj lineage default.num_repair_orders --direction downstream
# Output as JSON
dj lineage default.num_repair_orders --format json
Options:
--direction <upstream|downstream|both>: Direction of lineage to show (default: both)
--format <text|json>: Output format (default: text)
dj dimensions <node-name>
List all available dimensions for a node (particularly useful for metrics).
Usage:
# Show available dimensions for a metric
dj dimensions default.num_repair_orders
# Output as JSON
dj dimensions default.num_repair_orders --format json
Options:
--format <text|json>: Output format (default: text)
Example output:
============================================================
Available dimensions for: default.num_repair_orders
============================================================
• default.hard_hat.city
Type: string
Node: default.hard_hat
Path: default.repair_orders → default.hard_hat
• default.hard_hat.state
Type: string
Node: default.hard_hat
Path: default.repair_orders → default.hard_hat
• default.dispatcher.company_name
Type: string
Node: default.dispatcher
Path: default.repair_orders → default.dispatcher
Total: 15 dimensions
============================================================
Use case: When you want to query a metric with specific dimensions, use this command to see all available dimension attributes you can group by or filter on.
Listing Objects
dj list <type>
List various types of objects in DJ (namespaces, metrics, dimensions, cubes, sources, transforms, nodes).
Usage:
# List all metrics
dj list metrics
# List metrics in a specific namespace
dj list metrics --namespace default
# List all namespaces
dj list namespaces
# Output as JSON
dj list metrics --format json
Supported types:
namespaces: List all namespaces
metrics: List metric nodes
dimensions: List dimension nodes
cubes: List cube nodes
sources: List source nodes
transforms: List transform nodes
nodes: List all nodes
Options:
--namespace <name>: Filter by namespace (or prefix for namespaces)
--format <text|json>: Output format (default: text)
SQL Generation
dj sql <node-name>
Generate SQL for a node with optional dimensions and filters.
Usage:
# Generate SQL for a metric
dj sql default.num_repair_orders
# Generate SQL with dimensions
dj sql default.num_repair_orders --dimensions default.hard_hat.city,default.hard_hat.state
# Generate SQL with filters
dj sql default.num_repair_orders \
--dimensions default.hard_hat.city \
--filters "default.hard_hat.state = 'CA'"
Options:
--dimensions <dim1,dim2,...>: Comma-separated list of dimensions
--filters <filter1,filter2,...>: Comma-separated list of filters
--engine <name>: Engine name
--engine-version <version>: Engine version
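Note that dj sql takes --dimensions as a single comma-separated value. When a script holds the dimension names as separate arguments, they have to be joined first. The sketch below shows one way to do that; join_by_comma and dj_sql_for are hypothetical helper names, not CLI features.

```shell
# Hypothetical helper: join all arguments with commas,
# e.g. "a b c" becomes "a,b,c", the form dj sql expects for --dimensions.
join_by_comma() {
  # "$*" joins the arguments using the first character of IFS;
  # the subshell keeps the IFS change local.
  (IFS=,; printf '%s\n' "$*")
}

# Hypothetical wrapper: the first argument is the node,
# any remaining arguments are dimensions.
dj_sql_for() {
  dj_node=$1
  shift
  if [ "$#" -eq 0 ]; then
    dj sql "$dj_node"
  else
    dj sql "$dj_node" --dimensions "$(join_by_comma "$@")"
  fi
}
```

For example, dj_sql_for default.num_repair_orders default.hard_hat.city default.hard_hat.state runs the same command as the "Generate SQL with dimensions" example above.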
Data Retrieval
dj data
Fetch query results for a single node or a set of metrics, with optional dimensions and filters.
Usage:
# Fetch data for a single metric node
dj data default.num_repair_orders
# Fetch data for multiple metrics
dj data --metrics default.num_repair_orders default.avg_repair_price
# Fetch data with dimensions
dj data --metrics default.avg_repair_price --dimensions default.hard_hat.city
# Fetch data with filters
dj data --metrics default.avg_repair_price \
--dimensions default.hard_hat.city \
--filters "default.hard_hat.state = 'CA'"
# Limit the number of rows returned
dj data --metrics default.avg_repair_price --limit 100
# Output in different formats
dj data --metrics default.avg_repair_price --format json
dj data --metrics default.avg_repair_price --format csv
Options:
--metrics <metric1> <metric2> ...: One or more metrics to query
--dimensions <dim1> <dim2> ...: Dimensions to group by
--filters <filter1> <filter2> ...: Filter expressions
--limit <number>: Maximum rows to return (default: 1000)
--format <table|json|csv>: Output format (default: table)
--engine <name>: Engine name
--engine-version <version>: Engine version
Example output (table format):
╭──────────────────────────┬───────────────────╮
│ default.hard_hat.city │ avg_repair_price │
├──────────────────────────┼───────────────────┤
│ Jersey City │ 54751.10 │
│ Billerica │ 38277.67 │
│ Southgate │ 33625.85 │
╰──────────────────────────┴───────────────────╯
3 row(s)
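The csv format is convenient for handing results to other tools. As a sketch, the wrapper below writes the result to a file and keeps the file only if the command succeeds. The name export_metric_csv is hypothetical, and the wrapper assumes that dj data prints the CSV on standard output and exits with a non-zero status on failure; neither is stated in this document.

```shell
# Hypothetical wrapper: export_metric_csv <metric> <output-file>
# Assumes `dj data --format csv` prints CSV on stdout and returns
# a non-zero exit status on failure.
export_metric_csv() {
  dj_metric=$1
  dj_out=$2
  dj_tmp="$dj_out.tmp"
  if dj data --metrics "$dj_metric" --format csv > "$dj_tmp"; then
    mv "$dj_tmp" "$dj_out"   # keep the file only on success
  else
    rm -f "$dj_tmp"          # discard partial output
    return 1
  fi
}
```

Writing to a temporary file first means a failed query never leaves a truncated CSV at the final path.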
By default, at most 1000 rows are returned. Use --limit to adjust this.
Query Planning
dj plan
Show the query execution plan for a set of metrics, including how metrics will be computed and which tables/cubes will be used.
Usage:
# Show execution plan for metrics
dj plan --metrics default.num_repair_orders default.avg_repair_price
# Show plan with dimensions
dj plan --metrics default.avg_repair_price --dimensions default.hard_hat.city
# Show plan with filters
dj plan --metrics default.avg_repair_price \
--filters "default.hard_hat.state = 'CA'"
# Specify target dialect
dj plan --metrics default.avg_repair_price --dialect spark
# Output as JSON
dj plan --metrics default.avg_repair_price --format json
Options:
--metrics <metric1> <metric2> ...: Metrics to include in the plan
--dimensions <dim1> <dim2> ...: Dimensions to group by
--filters <filter1> <filter2> ...: Filter expressions
--dialect <spark|trino|druid>: Target SQL dialect
--format <text|json>: Output format (default: text)
Example output:
============================================================
Query Execution Plan
============================================================
Dialect: spark
Requested Dimensions: default.hard_hat.city
Grain Groups (1)
------------------------------------------------------------
Group 1: default.repair_order_details
Grain: [default.hard_hat.city]
Aggregability: full
Metrics: default.avg_repair_price
Components:
• price_count: COUNT(price)
• price_sum: SUM(price)
Metric Formulas (1)
------------------------------------------------------------
• default.avg_repair_price = SUM(price_sum) / SUM(price_count)
============================================================
Use case: Use this command to understand how DJ will execute a query before running it. This is helpful for debugging performance issues or understanding which materialized cubes will be used.
Deployments
dj push <directory>
Push node YAML definitions from a local directory to the DJ server. Supports dry-run mode for impact analysis before deploying.
Usage:
# Push all YAML files in a directory
dj push ./my-nodes
# Override namespace in YAML files
dj push ./my-nodes --namespace my.custom.namespace
# Dry run to preview changes without applying them
dj push ./my-nodes --dryrun
# Dry run with JSON output for programmatic use
dj push ./my-nodes --dryrun --format json
Options:
--namespace <name>: Override the namespace specified in YAML files
--dryrun: Preview changes without applying them (shows impact analysis)
--format <text|json>: Output format for dry run (default: text)
--repo <url>: Git repository URL for deployment tracking
--branch <name>: Git branch name for deployment tracking
--commit <sha>: Git commit SHA for deployment tracking
--ci-system <name>: CI system name (e.g., 'github_actions', 'jenkins')
--ci-run-url <url>: URL to the CI run/build
Example:
dj push ./metrics --namespace production.analytics
Example dry run output:
Analyzing deployment from: ./my-nodes
Impact Analysis
============================================================
Changes to Apply:
------------------------------------------------------------
CREATE production.metrics.new_metric (metric)
UPDATE production.metrics.existing_metric (metric)
Downstream Impact:
------------------------------------------------------------
production.metrics.existing_metric
↳ Downstream nodes affected: 3
• production.dashboards.sales_cube
• production.reports.daily_summary
• production.alerts.threshold_metric
============================================================
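The dry-run mode lends itself to a two-step workflow: preview the impact, then apply only after an explicit confirmation. The following is a sketch of such a wrapper. The name dj_safe_push is hypothetical, and the wrapper assumes that dj push --dryrun exits with a non-zero status when the analysis fails, which this document does not confirm.

```shell
# Hypothetical wrapper: preview a push, then apply it only after confirmation.
# Assumes `dj push --dryrun` exits non-zero when the analysis fails.
dj_safe_push() {
  dj_dir=$1
  # Step 1: show the impact analysis; stop here if it fails.
  dj push "$dj_dir" --dryrun || return 1
  # Step 2: ask before applying anything.
  printf 'Apply these changes? [y/N] '
  read -r dj_answer
  case "$dj_answer" in
    y|Y) dj push "$dj_dir" ;;
    *)   echo "Aborted; nothing was pushed."; return 1 ;;
  esac
}
```

Any answer other than y or Y aborts, so pressing Enter by accident does not deploy.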
Run dj push --dryrun first to preview changes before deploying to production namespaces.
dj pull <namespace> <directory>
Pull (export) nodes from a namespace to YAML files in a local directory.
Usage:
# Export all nodes from default namespace
dj pull default ./exported-nodes
# Export from a specific namespace
dj pull production.metrics ./metrics-backup
Example:
# Backup all nodes in production namespace
dj pull production ./backups/production-$(date +%Y%m%d)
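The same dated-directory pattern extends to several namespaces. The sketch below loops over a list of namespaces and stops at the first failure. The name backup_namespaces is hypothetical; the sketch creates each target directory up front because this document does not say whether dj pull creates it, and it assumes dj pull exits with a non-zero status on failure.

```shell
# Hypothetical helper: backup_namespaces <backup-root> <namespace>...
# Pulls each namespace into <backup-root>/<namespace>-YYYYMMDD.
backup_namespaces() {
  dj_root=$1
  shift
  dj_stamp=$(date +%Y%m%d)
  for dj_ns in "$@"; do
    dj_target="$dj_root/$dj_ns-$dj_stamp"
    mkdir -p "$dj_target"
    # Stop at the first namespace that fails to export.
    dj pull "$dj_ns" "$dj_target" || return 1
  done
}
```

For example: backup_namespaces ./backups production default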
Node Management
dj delete-node <node-name>
Delete (deactivate) or permanently remove a node.
Usage:
# Soft delete (deactivate) a node
dj delete-node default.old_metric
# Hard delete (permanently remove) a node
dj delete-node default.old_metric --hard
Options:
--hard: Permanently delete the node (use with extreme caution)
Example:
# Deactivate a deprecated metric
dj delete-node default.deprecated_metric
# Permanently remove a test node
dj delete-node staging.test_metric --hard
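Because --hard is meant to be used with extreme caution, scripts and shared shells may benefit from a guard that asks for the node name to be retyped before the command runs. This is only a sketch; dj_hard_delete_node is a hypothetical name and the prompt is not a feature of the CLI.

```shell
# Hypothetical wrapper: require the node name to be retyped
# before running a hard delete.
dj_hard_delete_node() {
  dj_node=$1
  printf 'Type the node name to permanently delete it: '
  read -r dj_typed
  if [ "$dj_typed" = "$dj_node" ]; then
    dj delete-node "$dj_node" --hard
  else
    echo "Name did not match; nothing was deleted." >&2
    return 1
  fi
}
```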
A soft-deleted (deactivated) node can be restored with djbuilder.restore_node(). Hard delete is permanent and cannot be undone.
Namespace Management
dj delete-namespace <namespace>
Delete (deactivate) or permanently remove a namespace.
Usage:
# Soft delete (deactivate) a namespace
dj delete-namespace old_project
# Delete namespace and all contained nodes
dj delete-namespace old_project --cascade
# Hard delete (permanently remove)
dj delete-namespace old_project --hard --cascade
Options:
--cascade: Delete all nodes within the namespace
--hard: Permanently delete the namespace (use with extreme caution)
Example:
# Remove a test environment namespace
dj delete-namespace test.environment --cascade
# Permanently clean up an old project
dj delete-namespace archived.old_project --hard --cascade
Using --cascade with --hard will permanently delete the namespace and all its nodes. This cannot be undone.
Troubleshooting
Authentication Issues
If you see authentication errors:
# Verify environment variables are set
echo $DJ_USER
echo $DJ_PWD
# Re-export with correct values
export DJ_USER="your-username"
export DJ_PWD="your-password"
Connection Issues
If the CLI cannot connect to the DJ server:
# Check server URL
echo $DJ_URL
# Verify server is running
curl "$DJ_URL/health"
# Update URL if needed
export DJ_URL="http://your-dj-server:8000"
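When the server is still starting (for example in CI or right after launching a local deployment), a single curl call may fail even though the URL is correct. The sketch below polls the same /health endpoint a few times before giving up. The name wait_for_dj and its default attempt count and delay are hypothetical choices, not CLI behavior.

```shell
# Hypothetical helper: wait_for_dj [attempts] [delay-seconds]
# Polls $DJ_URL/health until curl succeeds or the attempts run out.
wait_for_dj() {
  dj_attempts=${1:-10}
  dj_delay=${2:-3}
  dj_i=1
  while [ "$dj_i" -le "$dj_attempts" ]; do
    # -f: treat HTTP error responses as failures; -s: no progress output;
    # -S: still print error messages.
    if curl -fsS -o /dev/null "$DJ_URL/health"; then
      echo "DJ server is reachable at $DJ_URL"
      return 0
    fi
    echo "attempt $dj_i/$dj_attempts failed" >&2
    # Pause between attempts, but not after the last one.
    if [ "$dj_i" -lt "$dj_attempts" ]; then
      sleep "$dj_delay"
    fi
    dj_i=$((dj_i + 1))
  done
  return 1
}
```

For example, wait_for_dj 5 2 && dj list namespaces tries up to five times, two seconds apart, before running the command.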
Getting Help
# Show all available commands
dj --help
# Show help for a specific command
dj push --help
dj delete-node --help