Using DJ with AI Assistants
DataJunction integrates with AI assistants like Claude, letting you explore your semantic layer, generate SQL, and query metrics through natural conversation — without writing code or crafting API requests directly.
The integration has three components that work together:
- MCP tools — give Claude the ability to query your live DJ instance: search nodes, generate SQL, inspect lineage, run queries, and more. Built on MCP (Model Context Protocol), an open standard for connecting AI assistants to external tools.
- DJ skill — teaches Claude about DJ concepts: node types, YAML syntax, dimension links, cube partitions, and semantic modeling best practices. Shapes how Claude reasons about DJ, not just what it can call.
- DJ subagent — a Claude Code agent with the DJ skill pre-loaded, so DJ expertise is automatically available in any Claude Code session without needing to invoke it manually.
Installation
Prerequisites
- Python 3.10 or higher
- Access to a running DataJunction server instance
- Claude Code (CLI) or Claude Desktop
Install and set up
Install the DataJunction Python client with the MCP extra. The quotes stop shells such as zsh from treating the brackets as a glob pattern:
pip install "datajunction[mcp]"
Then run the setup command to configure Claude Code:
dj setup-claude
This installs all three components described above:
- DJ skill: adds DataJunction knowledge to Claude Code under ~/.claude/skills/datajunction/
- DJ subagent: creates ~/.claude/agents/dj.md so DJ expertise is always available
- MCP server config: adds dj-mcp to ~/.claude.json, pointing at your DJ instance
Restart Claude Code after running to pick up the changes.
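To confirm that the setup command wrote everything, you can run a quick check like the one below. It is a minimal sketch. The three paths come from the list above, but the layout inside ~/.claude.json is an assumption: the code expects a top-level mcpServers object with an entry that is either named dj-mcp or runs the dj-mcp command. check_claude_setup is a hypothetical helper and is not part of the datajunction package.

```python
import json
from pathlib import Path


def check_claude_setup(home: Path) -> dict:
    """Report which `dj setup-claude` artifacts exist under `home`.

    Assumption: the MCP entry sits under a top-level "mcpServers" key in
    ~/.claude.json and is named "dj-mcp" or runs the dj-mcp command.
    """
    skill_dir = home / ".claude" / "skills" / "datajunction"
    agent_file = home / ".claude" / "agents" / "dj.md"
    config_file = home / ".claude.json"

    mcp_configured = False
    if config_file.is_file():
        try:
            data = json.loads(config_file.read_text())
        except json.JSONDecodeError:
            data = {}
        servers = data.get("mcpServers") if isinstance(data, dict) else None
        if not isinstance(servers, dict):
            servers = {}
        for name, cfg in servers.items():
            command = cfg.get("command", "") if isinstance(cfg, dict) else ""
            if name == "dj-mcp" or "dj-mcp" in str(command):
                mcp_configured = True

    return {
        "skill": skill_dir.is_dir(),
        "subagent": agent_file.is_file(),
        "mcp": mcp_configured,
    }


if __name__ == "__main__":
    # Read-only check against your real home directory.
    for component, ok in check_claude_setup(Path.home()).items():
        print(f"{component:<9}{'ok' if ok else 'missing'}")
```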
Custom DJ server URL:
DJ_URL=https://dj.yourcompany.com dj setup-claude
Selective installation (if you only want some components):
dj setup-claude --no-mcp # Skill + subagent only
dj setup-claude --no-skills # MCP + subagent only
dj setup-claude --no-agents # Skill + MCP only
Claude Desktop
dj setup-claude only configures Claude Code. For Claude Desktop, add the DJ MCP server manually to your config file:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
- Linux: ~/.config/Claude/claude_desktop_config.json
{
  "mcpServers": {
    "datajunction": {
      "command": "dj-mcp",
      "env": {
        "DJ_API_URL": "http://localhost:8000",
        "DJ_USERNAME": "admin",
        "DJ_PASSWORD": "admin"
      }
    }
  }
}
To authenticate with a JWT token instead of username/password, use DJ_API_TOKEN:
{
  "mcpServers": {
    "datajunction": {
      "command": "dj-mcp",
      "env": {
        "DJ_API_URL": "https://dj.yourcompany.com",
        "DJ_API_TOKEN": "your-jwt-token-here"
      }
    }
  }
}
Restart Claude Desktop after saving.
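If you would rather script the Claude Desktop change than edit the JSON by hand, the sketch below merges a DJ entry into an existing config and leaves any other MCP servers alone. The function names (desktop_config_path, dj_env, add_dj_server) are hypothetical. The file locations, the datajunction entry name, the dj-mcp command, and the environment variables are the ones shown above.

```python
from __future__ import annotations

import json
import os
import sys
from pathlib import Path


def desktop_config_path() -> Path:
    """Claude Desktop config location per OS, as listed above."""
    if sys.platform == "darwin":
        return Path.home() / "Library" / "Application Support" / "Claude" / "claude_desktop_config.json"
    if sys.platform == "win32":
        return Path(os.environ["APPDATA"]) / "Claude" / "claude_desktop_config.json"
    return Path.home() / ".config" / "Claude" / "claude_desktop_config.json"


def dj_env(api_url: str, token: str | None = None,
           username: str | None = None, password: str | None = None) -> dict[str, str]:
    """Build the "env" block: DJ_API_TOKEN if a JWT is given, else username/password."""
    env = {"DJ_API_URL": api_url}
    if token:
        env["DJ_API_TOKEN"] = token
    elif username and password:
        env["DJ_USERNAME"] = username
        env["DJ_PASSWORD"] = password
    else:
        raise ValueError("pass either token or both username and password")
    return env


def add_dj_server(config_path: Path, env: dict[str, str], name: str = "datajunction") -> dict:
    """Add or replace one entry under "mcpServers", leaving other servers untouched."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config.setdefault("mcpServers", {})[name] = {"command": "dj-mcp", "env": env}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

For example, add_dj_server(desktop_config_path(), dj_env("https://dj.yourcompany.com", token="your-jwt-token-here")) writes the token-based entry. Back up the file before you run it. If you point Claude at an absolute path to dj-mcp (see Troubleshooting), adjust the command value to match.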
Available Tools
Once configured, the following tools are available to Claude:
Discovery & Navigation
list_namespaces
List all available namespaces with node counts. Namespaces are the primary organizational structure in DataJunction (e.g., finance.metrics, growth.dimensions).
search_nodes
Search for nodes (metrics, dimensions, cubes, sources, transforms). All filters are optional and can be combined. When searching git-backed namespaces, the tool automatically resolves to the main branch (e.g., namespace="finance" → "finance.main").
Parameters:
- query (optional): Fragment of node name to search for (e.g., revenue)
- node_type (optional): Filter by type: metric, dimension, cube, source, or transform
- namespace (optional): Filter by namespace (highly recommended to narrow results)
- tags (optional): Filter to nodes tagged with ALL of these tag names (e.g., ["revenue", "core"])
- statuses (optional): Filter by validity: ["valid"] for healthy nodes, ["invalid"] to find broken ones
- mode (optional): Filter by published (production) or draft (in-progress work on a branch)
- owned_by (optional): Filter to nodes owned by this username or email
- has_materialization (optional): If true, return only nodes with materializations configured (default: false)
- limit (optional): Maximum results (default: 100, max: 1000)
- prefer_main_branch (optional): Auto-resolve to .main branches (default: true)
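Claude fills in these parameters on its own, but it can help to see what a call looks like on the wire. MCP tool invocations are JSON-RPC 2.0 tools/call requests, and their params carry the tool name plus an arguments object. Over the stdio transport, each message is sent as a single line of JSON. The sketch below builds such a request for search_nodes. The tool_call helper is hypothetical, and a real session has to complete MCP's initialize handshake before it can make any tool call.

```python
import json
from itertools import count

# Monotonic request IDs; a JSON-RPC response echoes the id of the request it answers.
_request_ids = count(1)


def tool_call(name: str, **arguments) -> str:
    """Serialize an MCP tools/call request as one newline-terminated JSON line."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message) + "\n"


# "Show me valid core revenue metrics in finance" might become:
line = tool_call(
    "search_nodes",
    query="revenue",
    node_type="metric",
    namespace="finance",
    tags=["core"],
    statuses=["valid"],
    limit=20,
)
print(line, end="")
```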
get_node_details
Get detailed information about a specific node including its SQL definition, metadata, tags, owners, and dependencies.
Parameters:
- name (required): Full node name (e.g., finance.daily_revenue)
Lineage & Dependencies
get_node_lineage
Explore upstream dependencies (what this node depends on) and downstream dependencies (what depends on this node). Useful for impact analysis and understanding data flow.
Parameters:
- node_name (required): Full node name
- direction (optional): upstream, downstream, or both (default: both)
- max_depth (optional): Maximum traversal depth
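To make direction and max_depth concrete, here is a breadth-first walk over a small in-memory graph. The node names and edges are invented. This is not how the MCP server computes lineage, since it asks the DJ server for that. The sketch only shows which nodes upstream, downstream, and a depth limit would select.

```python
from collections import deque

# Hypothetical graph: each node maps to the nodes it reads from (its parents).
PARENTS = {
    "finance.revenue": ["finance.orders"],                            # metric
    "finance.orders": ["finance.raw_orders", "finance.raw_refunds"],  # transform
    "finance.raw_orders": [],                                         # source
    "finance.raw_refunds": [],                                        # source
}


def invert(parents):
    """Build the child map so the same walk can go downstream."""
    children = {node: [] for node in parents}
    for node, node_parents in parents.items():
        for parent in node_parents:
            children.setdefault(parent, []).append(node)
    return children


def walk(start, edges, max_depth=None):
    """Breadth-first traversal returning {reachable node: hop distance}."""
    depths = {}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # at the depth limit: the node is recorded but not expanded
        for neighbor in edges.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                depths[neighbor] = depth + 1
                queue.append((neighbor, depth + 1))
    return depths


def lineage(node, parents, direction="both", max_depth=None):
    result = {}
    if direction in ("upstream", "both"):
        result["upstream"] = walk(node, parents, max_depth)
    if direction in ("downstream", "both"):
        result["downstream"] = walk(node, invert(parents), max_depth)
    return result


print(lineage("finance.orders", PARENTS))
```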
get_node_dimensions
List all dimensions available for a specific node, showing which dimensions can be used for grouping/filtering.
Parameters:
- node_name (required): Full node name
Analysis & Querying
get_common
Bidirectional semantic compatibility lookup. Provide exactly one of metrics or dimensions:
- Pass metrics → returns the dimensions shared across all of those metrics (i.e., what can I slice these metrics by?)
- Pass dimensions → returns the metrics that can be queried using all of those dimensions (i.e., what can I analyze by this dimension?)
Parameters:
- metrics (optional): List of metric node names
- dimensions (optional): List of dimension attribute names
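Both directions of get_common come down to set operations. The sketch below uses a hypothetical, hard-coded map from each metric to its available dimensions; in a real deployment, DJ derives this from dimension links. The shared dimensions for a group of metrics are the intersection of their dimension sets. A metric is compatible with a set of dimensions when its own dimension set contains every requested one.

```python
# Hypothetical metric -> available-dimension map.
METRIC_DIMENSIONS = {
    "finance.revenue": {"common.dimensions.date.dateint", "common.dimensions.region.name"},
    "finance.cost": {"common.dimensions.date.dateint", "common.dimensions.vendor.name"},
    "growth.signups": {"common.dimensions.date.dateint", "common.dimensions.region.name"},
}


def get_common(metrics=None, dimensions=None):
    """Mimic the lookup: exactly one of metrics or dimensions must be given."""
    if (metrics is None) == (dimensions is None):
        raise ValueError("provide exactly one of metrics or dimensions")
    if metrics is not None:
        if not metrics:
            raise ValueError("metrics must not be empty")
        # Dimensions shared by every metric: intersect their dimension sets.
        shared = set.intersection(*(METRIC_DIMENSIONS[m] for m in metrics))
        return sorted(shared)
    wanted = set(dimensions)
    # Metrics supporting every requested dimension: wanted must be a subset.
    return sorted(m for m, dims in METRIC_DIMENSIONS.items() if wanted <= dims)
```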
get_query_plan
Get the query execution plan for a set of metrics, showing how DataJunction decomposes them internally. The plan includes:
- Grain groups: sets of metrics that share a common dimensional grain and can be computed in a single SQL query
- Components: the atomic aggregations (e.g., SUM(amount), COUNT(*)) that feed into each metric
- Metric formulas: the combiner expressions that reassemble components into final metric values
Use this to understand multi-metric query structure, debug unexpected results, or validate your semantic model design.
Parameters:
- metrics (required): List of metric names to analyze
- dimensions (optional): Dimensions to group by; affects grain group assignment
- filters (optional): SQL filter conditions
- dialect (optional): Target SQL dialect (e.g., spark, trino, postgres)
- use_materialized (optional): Use materialized tables when available (default: true)
- include_temporal_filters (optional): Include partition filters if metrics resolve to a cube with partitions (default: false)
- lookback_window (optional): Lookback window for temporal filters when include_temporal_filters is true (e.g., 7 DAY, 1 WEEK)
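Metrics are split into components because aggregations like SUM and COUNT can be combined across partial results, whereas a finished ratio cannot. The sketch below shows the idea for a hypothetical avg_order_value metric. It illustrates component/combiner decomposition in general and does not reflect DJ's internal representation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class OrderComponents:
    """Atomic aggregations behind a hypothetical avg_order_value metric."""
    sum_amount: float  # SUM(amount)
    count_orders: int  # COUNT(*)

    def merge(self, other: OrderComponents) -> OrderComponents:
        # SUM and COUNT are additive, so partial results combine by addition.
        return OrderComponents(self.sum_amount + other.sum_amount,
                               self.count_orders + other.count_orders)


def components(amounts: list[float]) -> OrderComponents:
    return OrderComponents(sum(amounts), len(amounts))


def avg_order_value(c: OrderComponents) -> float:
    # Combiner expression: SUM(amount) / COUNT(*).
    return c.sum_amount / c.count_orders


# Partial results at a finer grain (per region), rolled up to all regions.
east = components([10.0, 20.0, 30.0])
west = components([100.0])

correct = avg_order_value(east.merge(west))                  # 160 / 4
naive = (avg_order_value(east) + avg_order_value(west)) / 2  # average of averages
print(correct, naive)
```

Merging components gives 40.0, while averaging the two regional averages gives 60.0. This is why the plan carries components and formulas separately instead of precomputed metric values.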
build_metric_sql
Generate executable SQL for querying metrics with specified dimensions and filters. Returns the SQL query, output columns, and dialect.
Parameters:
- metrics (required): List of metric names
- dimensions (optional): List of dimensions to group by
- filters (optional): SQL filter conditions
- orderby (optional): Columns to order by (use full node names, e.g., finance.revenue DESC)
- limit (optional): Row limit
- dialect (optional): Target SQL dialect
get_metric_data
Execute a query and return actual data results. Only works with materialized cubes — refuses to run expensive ad-hoc queries.
Parameters:
- metrics (required): List of metric names
- dimensions (optional): List of dimensions to group by
- filters (optional): SQL filter conditions
- orderby (optional): Columns to order by
- limit (optional): Row limit (recommended)
Usage Examples
Once configured, you can ask Claude questions like:
- “What namespaces are available in DataJunction?”
- “Show me all published revenue metrics in the finance namespace”
- “Which metrics have a materialization configured?”
- “Find all invalid nodes in the growth namespace”
- “What dimensions do revenue and cost metrics have in common?”
- “Which metrics can I slice by common.dimensions.date.dateint?”
- “Show me the query plan for finance.revenue and finance.orders together”
- “Generate SQL to query daily revenue grouped by region”
- “What nodes depend on the users dimension?”
- “Show me actual revenue data for the last 7 days by region”
Claude will automatically use the appropriate tools to answer your questions.
Git-Backed Namespaces
Many DataJunction deployments use git branches to separate development and production nodes. Namespaces follow a pattern like:
- finance.main: Production metrics
- finance.feature1: Development/experimental metrics
When you search with namespace="finance", the MCP server automatically resolves to finance.main (if it exists) so that you get production-ready nodes. Set prefer_main_branch to false to search all branches.
Search results show git branch information: [git: company/finance-metrics @ main]
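The resolution rule and the branch annotation are both simple enough to sketch. The two helpers below are hypothetical illustrations of the documented behavior and are not the MCP server's code. In particular, the regular expression assumes the annotation always has the [git: <repo> @ <branch>] shape shown above.

```python
import re

# Assumed annotation shape: "[git: <repo> @ <branch>]".
GIT_ANNOTATION = re.compile(r"\[git:\s*(?P<repo>[^\s@\]]+)\s*@\s*(?P<branch>[^\s\]]+)\s*\]")


def resolve_namespace(namespace, existing, prefer_main_branch=True):
    """Return '<namespace>.main' when preferred and present, else the namespace unchanged."""
    main = f"{namespace}.main"
    if prefer_main_branch and main in existing:
        return main
    return namespace


def parse_git_annotation(text):
    """Extract (repo, branch) from a '[git: repo @ branch]' tag, or None if absent."""
    match = GIT_ANNOTATION.search(text)
    return (match["repo"], match["branch"]) if match else None
```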
Testing the Installation
Test your setup in Claude:
- Open Claude Desktop or start Claude Code
- Start a new conversation
- Ask: “What namespaces are available in DataJunction?”
- Claude should use the list_namespaces tool to query your DJ server
If successful, you’ll see a list of namespaces with node counts.
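You can also check the MCP server without Claude by speaking the protocol to it directly. The sketch below starts dj-mcp as a subprocess, performs MCP's initialize handshake over stdin/stdout, and then asks for tools/list. It assumes dj-mcp is on your PATH and reads the same DJ_API_URL and credential environment variables used in the Claude Desktop config above. The protocol version string is one published MCP revision, and the server may negotiate a different one. The helper names are hypothetical. Because readline blocks with no timeout, a hung server will also hang the script.

```python
from __future__ import annotations

import json
import subprocess


def request(request_id, method, params=None):
    """Build a JSON-RPC 2.0 request."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return message


def notification(method):
    """Build a JSON-RPC 2.0 notification (no id, so no response is expected)."""
    return {"jsonrpc": "2.0", "method": method}


def send(proc, message):
    proc.stdin.write(json.dumps(message) + "\n")  # stdio transport: one JSON message per line
    proc.stdin.flush()


def receive(proc, request_id):
    """Read lines until the response to request_id arrives; skip notifications."""
    for line in iter(proc.stdout.readline, ""):
        if not line.strip():
            continue
        message = json.loads(line)
        if message.get("id") == request_id and "method" not in message:
            if "error" in message:
                raise RuntimeError(f"MCP error: {message['error']}")
            return message["result"]
    raise RuntimeError("dj-mcp closed its output before responding")


def list_tools(command: str = "dj-mcp") -> list[str]:
    """Start the server, complete the handshake, and return the names from tools/list."""
    proc = subprocess.Popen([command], stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)
    try:
        send(proc, request(1, "initialize", {
            "protocolVersion": "2024-11-05",
            "capabilities": {},
            "clientInfo": {"name": "dj-smoke-test", "version": "0.1"},
        }))
        receive(proc, 1)  # server capabilities; not needed for this check
        send(proc, notification("notifications/initialized"))
        send(proc, request(2, "tools/list"))
        return [tool["name"] for tool in receive(proc, 2)["tools"]]
    finally:
        proc.stdin.close()
        proc.terminate()
        proc.wait(timeout=5)
```

Call list_tools() from a shell where the DJ environment variables are exported. If the server starts correctly, the result should include names such as list_namespaces and search_nodes.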
Troubleshooting
MCP Server Not Found
If you get a “command not found” error:
- Check installation: Run which dj-mcp to verify it’s in your PATH
- Use full path: Specify the absolute path to dj-mcp in the Claude config
- Virtual environment: If using a venv, use the full path to dj-mcp in the venv’s bin directory
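To find the absolute path to put in the Claude config, you can use a small lookup like this one. It is a sketch. It checks PATH first and then the directory of the running Python interpreter, which is where a virtual environment places console scripts (bin/ on macOS and Linux, Scripts\ on Windows). find_dj_mcp is a hypothetical helper. Run it with the same Python environment you installed datajunction into.

```python
from __future__ import annotations

import shutil
import sys
from pathlib import Path


def find_dj_mcp() -> str | None:
    """Return an absolute path to dj-mcp, or None if it cannot be found."""
    found = shutil.which("dj-mcp")
    if found:
        return str(Path(found).absolute())
    # In a venv, console scripts sit next to the interpreter executable.
    scripts_dir = Path(sys.executable).parent
    for name in ("dj-mcp", "dj-mcp.exe"):
        candidate = scripts_dir / name
        if candidate.is_file():
            return str(candidate)
    return None


if __name__ == "__main__":
    print(find_dj_mcp() or "dj-mcp not found; is datajunction[mcp] installed in this environment?")
```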
Authentication Errors
If you get authentication errors:
- Verify credentials: Test them with curl:
  curl -X POST http://localhost:8000/basic/login/ \
    -d "username=admin&password=admin" \
    -H "Content-Type: application/x-www-form-urlencoded"
- Check API URL: Ensure DJ_API_URL points to your running DataJunction server
- Check logs: Claude Code logs are in ~/.claude/debug/latest
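If curl is not available, you can make the same login request with Python's standard library. This sketch mirrors the curl command above: the same /basic/login/ path and the same form-encoded username and password. login_request and check_login are hypothetical helpers. A 2xx status suggests the credentials were accepted, and a 401 or 403 points to a credentials problem. The exact codes your DJ server returns may differ.

```python
from __future__ import annotations

from urllib.error import HTTPError, URLError
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def login_request(api_url: str, username: str, password: str) -> Request:
    """Build the same form-encoded POST that the curl command sends."""
    body = urlencode({"username": username, "password": password}).encode()
    return Request(
        api_url.rstrip("/") + "/basic/login/",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def check_login(api_url: str, username: str, password: str, timeout: float = 10.0) -> int | str:
    """Return the HTTP status code, or a message if the server could not be reached."""
    try:
        with urlopen(login_request(api_url, username, password), timeout=timeout) as response:
            return response.status
    except HTTPError as exc:  # 4xx/5xx responses are raised as HTTPError
        return exc.code
    except URLError as exc:   # DNS failure, connection refused, and similar
        return f"could not reach {api_url}: {exc.reason}"
```

For example, check_login("http://localhost:8000", "admin", "admin") runs the same test as the curl command.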
Connection Refused
If the MCP server can’t connect to DataJunction:
- Verify DJ is running: Check that your DataJunction server is accessible
- Check URL: Ensure DJ_API_URL is correct (including http:// or https://)
- Network access: Verify there are no firewall rules blocking the connection
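The URL and network checks can be partly automated. The sketch below flags a DJ_API_URL that is missing an http:// or https:// scheme, a host, or a valid port. It then tries a plain TCP connection to that host and port. If the connection fails, the likely cause is a server that is down or a network block, not authentication. Both helpers are hypothetical, and a successful TCP connection does not prove that the DJ API itself is healthy.

```python
from __future__ import annotations

import os
import socket
from urllib.parse import urlsplit


def url_problems(url: str) -> list[str]:
    """List obvious mistakes in a DJ_API_URL value."""
    parts = urlsplit(url)
    problems = []
    if parts.scheme not in ("http", "https"):
        problems.append("missing http:// or https:// scheme")
    if not parts.hostname:
        problems.append("missing host")
    try:
        parts.port  # raises ValueError for a non-numeric or out-of-range port
    except ValueError:
        problems.append("invalid port")
    return problems


def can_connect(url: str, timeout: float = 3.0) -> bool:
    """Try a TCP connection to the URL's host and port (default 80 or 443)."""
    if url_problems(url):
        return False
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    url = os.environ.get("DJ_API_URL", "http://localhost:8000")
    print(url_problems(url) or ("reachable" if can_connect(url) else "no TCP connection"))
```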
GraphQL Errors
If you see GraphQL errors in the response:
- Check DJ version: Ensure your DJ server is up to date
- Verify schema: The MCP server expects the latest GraphQL schema
- Check server logs: Look at DJ server logs for more details
Debugging
To debug, follow Claude Code’s debug log:
tail -f ~/.claude/debug/latest
The MCP server also writes its own debug log to ~/.dj_mcp_debug.log.
Architecture
The MCP server is built on:
- Server core (datajunction/mcp/server.py): MCP protocol implementation using the official Python SDK
- Tools (datajunction/mcp/tools.py): Business logic for each tool, communicating with DJ’s GraphQL API
- Formatters (datajunction/mcp/formatters.py): Converts GraphQL responses to AI-friendly text
- CLI (datajunction/mcp/cli.py): Command-line interface for starting the server
The server runs as a separate process from the DJ API server, communicating via stdin/stdout with Claude and via GraphQL with DataJunction.
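To illustrate the formatter layer, the sketch below converts a GraphQL-style response into short text. The response shape (a findNodes list whose entries have name, type, and status fields) is a hypothetical example, not DJ's actual schema. The top-level data and errors keys follow the GraphQL specification's response format, and that errors key is where the failures described under GraphQL Errors would show up.

```python
def format_search_results(response: dict, field: str = "findNodes") -> str:
    """Render a GraphQL response as compact, model-friendly text."""
    errors = response.get("errors") or []
    if errors:
        # Per the GraphQL spec, each error object carries a human-readable "message".
        return "GraphQL error: " + "; ".join(e.get("message", "unknown error") for e in errors)
    nodes = (response.get("data") or {}).get(field) or []
    if not nodes:
        return "No nodes found."
    lines = [f"Found {len(nodes)} node(s):"]
    for node in nodes:
        kind = str(node.get("type", "?")).lower()
        status = str(node.get("status", "?")).lower()
        lines.append(f"- {node['name']} ({kind}, {status})")
    return "\n".join(lines)
```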
Uninstalling
To remove the DataJunction MCP server:
pip uninstall datajunction
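Then remove the DJ MCP server entry from your Claude configuration: the dj-mcp entry in ~/.claude.json for Claude Code, or the datajunction entry in claude_desktop_config.json for Claude Desktop. If you ran dj setup-claude, also delete ~/.claude/skills/datajunction/ and ~/.claude/agents/dj.md.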
Support
- Documentation: DataJunction Docs
- GitHub Issues: Report issues
- Source Code: GitHub Repository