Using DJ with AI Assistants

DataJunction integrates with AI assistants like Claude, letting you explore your semantic layer, generate SQL, and query metrics through natural conversation — without writing code or crafting API requests directly.

The integration has three components that work together:

MCP tools — give Claude the ability to query your live DJ instance: search nodes, generate SQL, inspect lineage, run queries, and more. Built on MCP (Model Context Protocol), an open standard for connecting AI assistants to external tools.
DJ skill — teaches Claude about DJ concepts: node types, YAML syntax, dimension links, cube partitions, and semantic modeling best practices. Shapes how Claude reasons about DJ, not just what it can call.
DJ subagent — a Claude Code agent with the DJ skill pre-loaded, so DJ expertise is automatically available in any Claude Code session without needing to invoke it manually.

Installation

Prerequisites

Python 3.10 or higher
Access to a running DataJunction server instance
Claude Code (CLI) or Claude Desktop

Install and set up

Install the DataJunction Python client with the MCP extra:

pip install datajunction[mcp]

Then run the setup command to configure Claude Code:

dj setup-claude

This installs all three components described above:

DJ skill — adds DataJunction knowledge to Claude Code under ~/.claude/skills/datajunction/
DJ subagent — creates ~/.claude/agents/dj.md so DJ expertise is always available
MCP server config — adds dj-mcp to ~/.claude.json pointing at your DJ instance

Restart Claude Code after running to pick up the changes.

Custom DJ server URL:

DJ_URL=https://dj.yourcompany.com dj setup-claude

Selective installation (if you only want some components):

dj setup-claude --no-mcp         # Skill + subagent only
dj setup-claude --no-skills      # MCP + subagent only
dj setup-claude --no-agents      # Skill + MCP only

Claude Desktop

dj setup-claude only configures Claude Code. For Claude Desktop, add the DJ MCP server manually to your config file:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "datajunction": {
      "command": "dj-mcp",
      "env": {
        "DJ_API_URL": "http://localhost:8000",
        "DJ_USERNAME": "admin",
        "DJ_PASSWORD": "admin"
      }
    }
  }
}

To authenticate with a JWT token instead of username/password, use DJ_API_TOKEN:

{
  "mcpServers": {
    "datajunction": {
      "command": "dj-mcp",
      "env": {
        "DJ_API_URL": "https://dj.yourcompany.com",
        "DJ_API_TOKEN": "your-jwt-token-here"
      }
    }
  }
}

Restart Claude Desktop after saving.

Available Tools

Once configured, the following tools are available to Claude:

list_namespaces List all available namespaces with node counts. Namespaces are the primary organizational structure in DataJunction (e.g., finance.metrics, growth.dimensions).

search_nodes Search for nodes (metrics, dimensions, cubes, sources, transforms). All filters are optional and combinable. When searching git-backed namespaces, automatically resolves to main branches (e.g., namespace="finance" → "finance.main").

Parameters:

query (optional): Fragment of node name to search for (e.g., revenue)
node_type (optional): Filter by type — metric, dimension, cube, source, transform
namespace (optional): Filter by namespace (highly recommended to narrow results)
tags (optional): Filter to nodes tagged with ALL of these tag names (e.g., ["revenue", "core"])
statuses (optional): Filter by validity — ["valid"] for healthy nodes, ["invalid"] to find broken ones
mode (optional): Filter by published (production) or draft (in-progress work on a branch)
owned_by (optional): Filter to nodes owned by this username or email
has_materialization (optional): If true, return only nodes with materializations configured (default: false)
limit (optional): Maximum results (default: 100, max: 1000)
prefer_main_branch (optional): Auto-resolve to .main branches (default: true)

get_node_details Get detailed information about a specific node including its SQL definition, metadata, tags, owners, and dependencies.

Parameters:

name (required): Full node name (e.g., finance.daily_revenue)

Lineage & Dependencies

get_node_lineage Explore upstream dependencies (what this node depends on) and downstream dependencies (what depends on this node). Useful for impact analysis and understanding data flow.

Parameters:

node_name (required): Full node name
direction (optional): upstream, downstream, or both (default: both)
max_depth (optional): Maximum traversal depth

get_node_dimensions List all dimensions available for a specific node, showing which dimensions can be used for grouping/filtering.

Parameters:

node_name (required): Full node name

Analysis & Querying

get_common Bidirectional semantic compatibility lookup. Provide exactly one of metrics or dimensions:

Pass metrics → returns the dimensions shared across all of those metrics (i.e., what can I slice these metrics by?)
Pass dimensions → returns the metrics that can be queried using all of those dimensions (i.e., what can I analyze by this dimension?)

Parameters:

metrics (optional): List of metric node names
dimensions (optional): List of dimension attribute names

get_query_plan Get the query execution plan for a set of metrics, showing how DataJunction decomposes them internally. The plan includes:

Grain groups — sets of metrics that share a common dimensional grain and can be computed in a single SQL query
Components — the atomic aggregations (e.g., SUM(amount), COUNT(*)) that feed into each metric
Metric formulas — the combiner expressions that reassemble components into final metric values

Use this to understand multi-metric query structure, debug unexpected results, or validate your semantic model design.

Parameters:

metrics (required): List of metric names to analyze
dimensions (optional): Dimensions to group by — affects grain group assignment
filters (optional): SQL filter conditions
dialect (optional): Target SQL dialect (e.g., spark, trino, postgres)
use_materialized (optional): Use materialized tables when available (default: true)
include_temporal_filters (optional): Include partition filters if metrics resolve to a cube with partitions (default: false)
lookback_window (optional): Lookback window for temporal filters when include_temporal_filters is true (e.g., 7 DAY, 1 WEEK)

build_metric_sql Generate executable SQL for querying metrics with specified dimensions and filters. Returns the SQL query, output columns, and dialect.

Parameters:

metrics (required): List of metric names
dimensions (optional): List of dimensions to group by
filters (optional): SQL filter conditions
orderby (optional): Columns to order by (use full node names, e.g., finance.revenue DESC)
limit (optional): Row limit
dialect (optional): Target SQL dialect

get_metric_data Execute a query and return actual data results. Only works with materialized cubes — refuses to run expensive ad-hoc queries.

Parameters:

metrics (required): List of metric names
dimensions (optional): List of dimensions to group by
filters (optional): SQL filter conditions
orderby (optional): Columns to order by
limit (optional): Row limit (recommended)

Usage Examples

Once configured, you can ask Claude questions like:

“What namespaces are available in DataJunction?”
“Show me all published revenue metrics in the finance namespace”
“Which metrics have a materialization configured?”
“Find all invalid nodes in the growth namespace”
“What dimensions do revenue and cost metrics have in common?”
“Which metrics can I slice by common.dimensions.date.dateint?”
“Show me the query plan for finance.revenue and finance.orders together”
“Generate SQL to query daily revenue grouped by region”
“What nodes depend on the users dimension?”
“Show me actual revenue data for the last 7 days by region”

Claude will automatically use the appropriate tools to answer your questions.

Git-Backed Namespaces

Many DataJunction deployments use git branches to separate development and production nodes. Namespaces follow a pattern like:

finance.main - Production metrics
finance.feature1 - Development/experimental metrics

When you search with namespace="finance", the MCP server automatically resolves to finance.main (if it exists) to ensure you get production-ready nodes. Set prefer_main_branch=False to search all branches.

Search results show git branch information: [git: company/finance-metrics @ main]

Testing the Installation

Test your setup in Claude:

Open Claude Desktop or start Claude Code
Start a new conversation
Ask: “What namespaces are available in DataJunction?”
Claude should use the list_namespaces tool to query your DJ server

If successful, you’ll see a list of namespaces with node counts.

Troubleshooting

MCP Server Not Found

If you get a “command not found” error:

Check installation: Run which dj-mcp to verify it’s in your PATH
Use full path: Specify the absolute path to dj-mcp in the Claude config
Virtual environment: If using a venv, use the full path to the venv’s bin directory

Authentication Errors

If you get authentication errors:

Verify credentials: Test them with curl:

curl -X POST http://localhost:8000/basic/login/ \
  -d "username=admin&password=admin" \
  -H "Content-Type: application/x-www-form-urlencoded"

Check API URL: Ensure DJ_API_URL points to your running DataJunction server
Check logs: Claude Code logs are in ~/.claude/debug/latest

Connection Refused

If the MCP server can’t connect to DataJunction:

Verify DJ is running: Check that your DataJunction server is accessible
Check URL: Ensure DJ_API_URL is correct (including http:// or https://)
Network access: Verify there are no firewall rules blocking the connection

GraphQL Errors

If you see GraphQL errors in the response:

Check DJ version: Ensure your DJ server is up to date
Verify schema: The MCP server expects the latest GraphQL schema
Check server logs: Look at DJ server logs for more details

Debugging

Enable debug logging by checking Claude Code’s debug logs:

tail -f ~/.claude/debug/latest

The MCP server also writes its own debug log to ~/.dj_mcp_debug.log.

Architecture

The MCP server is built on:

Server Core (datajunction/mcp/server.py): MCP protocol implementation using the official Python SDK
Tools (datajunction/mcp/tools.py): Business logic for each tool, communicating with DJ’s GraphQL API
Formatters (datajunction/mcp/formatters.py): Converts GraphQL responses to AI-friendly text
CLI (datajunction/mcp/cli.py): Command-line interface for starting the server

The server runs as a separate process from the DJ API server, communicating via stdin/stdout with Claude and via GraphQL with DataJunction.

Uninstalling

To remove the DataJunction MCP server:

pip uninstall datajunction

Then remove the datajunction entry from your Claude configuration file.

Support

Documentation: DataJunction Docs
GitHub Issues: Report issues
Source Code: GitHub Repository

Using DJ with AI Assistants

Installation #

Prerequisites #

Install and set up #

Claude Desktop #

Available Tools #

Discovery & Navigation #

Lineage & Dependencies #

Analysis & Querying #

Usage Examples #

Git-Backed Namespaces #

Testing the Installation #

Troubleshooting #

MCP Server Not Found #

Authentication Errors #

Connection Refused #

GraphQL Errors #

Debugging #

Architecture #

Uninstalling #

Support #

Installation

Prerequisites

Install and set up

Claude Desktop

Available Tools

Discovery & Navigation

Lineage & Dependencies

Analysis & Querying

Usage Examples

Git-Backed Namespaces

Testing the Installation

Troubleshooting

MCP Server Not Found

Authentication Errors

Connection Refused

GraphQL Errors

Debugging

Architecture

Uninstalling

Support