Sankey Diagram Creator

by dkyazzentwatwa

data

Create interactive Sankey diagrams for flow visualization from CSV, DataFrame, or dict data. Supports node/link styling and HTML/PNG/SVG export.

Skill Details

Repository Files

3 files in this skill directory


name: sankey-diagram-creator description: Create interactive Sankey diagrams for flow visualization from CSV, DataFrame, or dict data. Supports node/link styling and HTML/PNG/SVG export.

Sankey Diagram Creator

Create interactive Sankey diagrams to visualize flows, transfers, and relationships between categories. Perfect for budget flows, energy transfers, user journeys, and data pipelines.

Quick Start

from scripts.sankey_creator import SankeyCreator

# From dictionary
sankey = SankeyCreator()
sankey.from_dict({
    'source': ['A', 'A', 'B', 'B'],
    'target': ['C', 'D', 'C', 'D'],
    'value': [10, 20, 15, 25]
})
sankey.generate().save_html("flow.html")

# From CSV
sankey = SankeyCreator()
sankey.from_csv("flows.csv", source="from", target="to", value="amount")
sankey.title("Budget Flow").generate().save_html("budget.html")

Features

  • Multiple Input Sources: Dict, DataFrame, or CSV files
  • Interactive Output: Hover tooltips with flow details
  • Node Customization: Colors, labels, positions
  • Link Styling: Colors, opacity, labels
  • Export Formats: HTML (interactive), PNG, SVG
  • Auto-Coloring: Automatic color assignment or custom schemes

API Reference

Initialization

sankey = SankeyCreator()

Data Input Methods

# From dictionary
sankey.from_dict({
    'source': ['A', 'A', 'B'],
    'target': ['X', 'Y', 'X'],
    'value': [10, 20, 15]
})

# From CSV file
sankey.from_csv(
    filepath="data.csv",
    source="source_col",
    target="target_col",
    value="value_col",
    label=None  # Optional: column for link labels
)

# From pandas DataFrame
import pandas as pd
df = pd.DataFrame({
    'from': ['A', 'B'],
    'to': ['C', 'C'],
    'amount': [100, 200]
})
sankey.from_dataframe(df, source="from", target="to", value="amount")

# Add individual flows
sankey.add_flow("Source", "Target", 50)
sankey.add_flow("Source", "Other", 30, label="30 units")

Styling Methods

All methods return self for chaining.

# Node colors
sankey.node_colors({
    'Income': '#2ecc71',
    'Expenses': '#e74c3c',
    'Savings': '#3498db'
})

# Use colormap for automatic colors
sankey.node_colors(colormap='Pastel1')

# Link colors (by source, target, or custom)
sankey.link_colors(by='source')  # Color by source node
sankey.link_colors(by='target')  # Color by target node
sankey.link_colors(opacity=0.6)  # Set link opacity

# Custom link colors
sankey.link_colors(colors={
    ('Income', 'Expenses'): '#ff6b6b',
    ('Income', 'Savings'): '#4ecdc4'
})

# Title
sankey.title("My Flow Diagram")
sankey.title("Budget Overview", font_size=20)

# Layout
sankey.layout(orientation='h')  # Horizontal (default)
sankey.layout(orientation='v')  # Vertical
sankey.layout(pad=20)  # Node padding

Generation and Export

# Generate the diagram
sankey.generate()

# Save as interactive HTML
sankey.save_html("diagram.html")
sankey.save_html("diagram.html", auto_open=True)  # Open in browser

# Save as static image
sankey.save_image("diagram.png")
sankey.save_image("diagram.svg", format='svg')
sankey.save_image("diagram.png", width=1200, height=800)

# Get Plotly figure object for customization
fig = sankey.get_figure()
fig.update_layout(...)

# Show in notebook/browser
sankey.show()

Data Format

CSV Format

source,target,value,label
Income,Housing,1500,Rent
Income,Food,600,Groceries
Income,Transport,400,Car
Income,Savings,500,Emergency Fund
Savings,Investments,300,Stocks

Dictionary Format

data = {
    'source': ['Income', 'Income', 'Income', 'Expenses'],
    'target': ['Rent', 'Food', 'Savings', 'Tax'],
    'value': [1500, 600, 500, 400],
    'label': ['Housing', 'Groceries', 'Emergency', 'Federal']  # Optional
}

DataFrame Format

import pandas as pd
df = pd.DataFrame({
    'from_node': ['A', 'A', 'B', 'C'],
    'to_node': ['B', 'C', 'D', 'D'],
    'flow_value': [100, 200, 150, 180]
})

Color Schemes

Available Colormaps

Name Description
Pastel1 Soft pastel colors
Pastel2 Muted pastels
Set1 Bold distinct colors
Set2 Muted distinct colors
Set3 Light distinct colors
Paired Paired color scheme
viridis Blue-green-yellow
plasma Purple-orange-yellow

Custom Colors

# Hex colors
sankey.node_colors({
    'Category A': '#1abc9c',
    'Category B': '#3498db',
    'Category C': '#9b59b6'
})

# Named colors
sankey.node_colors({
    'Revenue': 'green',
    'Costs': 'red',
    'Profit': 'blue'
})

CLI Usage

# Basic usage
python sankey_creator.py --input flows.csv \
    --source from --target to --value amount \
    --output flow.html

# With title and colors
python sankey_creator.py --input budget.csv \
    --source category --target subcategory --value amount \
    --title "Budget Breakdown" \
    --colormap Pastel1 \
    --output budget.html

# PNG output
python sankey_creator.py --input data.csv \
    --source src --target dst --value val \
    --output diagram.png \
    --width 1200 --height 800

CLI Arguments

Argument Description Default
--input Input CSV file Required
--source Source column name Required
--target Target column name Required
--value Value column name Required
--output Output file path sankey.html
--title Diagram title None
--colormap Color scheme Pastel1
--link-opacity Link opacity (0-1) 0.5
--orientation 'h' or 'v' h
--width Image width (PNG/SVG) 1000
--height Image height (PNG/SVG) 600

Examples

Budget Flow

sankey = SankeyCreator()
sankey.from_dict({
    'source': ['Salary', 'Salary', 'Salary', 'Salary', 'Savings', 'Investments'],
    'target': ['Housing', 'Food', 'Transport', 'Savings', 'Investments', 'Stocks'],
    'value': [2000, 800, 500, 1000, 700, 700]
})
sankey.title("Monthly Budget Flow")
sankey.node_colors({
    'Salary': '#27ae60',
    'Housing': '#e74c3c',
    'Food': '#f39c12',
    'Transport': '#3498db',
    'Savings': '#9b59b6',
    'Investments': '#1abc9c',
    'Stocks': '#34495e'
})
sankey.generate().save_html("budget_flow.html")

Energy Flow

sankey = SankeyCreator()
sankey.add_flow("Coal", "Electricity", 100)
sankey.add_flow("Gas", "Electricity", 80)
sankey.add_flow("Solar", "Electricity", 30)
sankey.add_flow("Electricity", "Industry", 120)
sankey.add_flow("Electricity", "Residential", 60)
sankey.add_flow("Electricity", "Commercial", 30)

sankey.title("Energy Flow")
sankey.node_colors(colormap='Set2')
sankey.link_colors(by='source', opacity=0.4)
sankey.generate().save_html("energy.html")

User Journey

sankey = SankeyCreator()
sankey.from_dict({
    'source': ['Landing', 'Landing', 'Product', 'Product', 'Cart', 'Cart'],
    'target': ['Product', 'Exit', 'Cart', 'Exit', 'Checkout', 'Exit'],
    'value': [1000, 200, 600, 200, 400, 100]
})
sankey.title("User Journey")
sankey.node_colors({
    'Landing': '#3498db',
    'Product': '#2ecc71',
    'Cart': '#f1c40f',
    'Checkout': '#27ae60',
    'Exit': '#e74c3c'
})
sankey.generate().save_html("journey.html")

Multi-Level Flow

# Three-level hierarchy
data = {
    'source': [
        # Level 1 -> Level 2
        'Revenue', 'Revenue', 'Revenue',
        # Level 2 -> Level 3
        'Products', 'Products', 'Services', 'Services'
    ],
    'target': [
        'Products', 'Services', 'Other',
        'Electronics', 'Software', 'Consulting', 'Support'
    ],
    'value': [500, 300, 50, 300, 200, 200, 100]
}

sankey = SankeyCreator()
sankey.from_dict(data)
sankey.title("Revenue Breakdown")
sankey.generate().save_html("revenue.html")

Dependencies

plotly>=5.15.0
pandas>=2.0.0
kaleido>=0.2.0

Limitations

  • Static image export (PNG/SVG) requires kaleido package
  • Very complex diagrams with many nodes may be hard to read
  • Node positions are auto-calculated (limited manual control)
  • Link colors can only be uniform or by source/target

Related Skills

Xlsx

Comprehensive spreadsheet creation, editing, and analysis with support for formulas, formatting, data analysis, and visualization. When Claude needs to work with spreadsheets (.xlsx, .xlsm, .csv, .tsv, etc) for: (1) Creating new spreadsheets with formulas and formatting, (2) Reading or analyzing data, (3) Modify existing spreadsheets while preserving formulas, (4) Data analysis and visualization in spreadsheets, or (5) Recalculating formulas

data

Clickhouse Io

ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.

datacli

Clickhouse Io

ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.

datacli

Analyzing Financial Statements

This skill calculates key financial ratios and metrics from financial statement data for investment analysis

data

Data Storytelling

Transform data into compelling narratives using visualization, context, and persuasive structure. Use when presenting analytics to stakeholders, creating data reports, or building executive presentations.

data

Kpi Dashboard Design

Design effective KPI dashboards with metrics selection, visualization best practices, and real-time monitoring patterns. Use when building business dashboards, selecting metrics, or designing data visualization layouts.

designdata

Dbt Transformation Patterns

Master dbt (data build tool) for analytics engineering with model organization, testing, documentation, and incremental strategies. Use when building data transformations, creating data models, or implementing analytics engineering best practices.

testingdocumenttool

Sql Optimization Patterns

Master SQL query optimization, indexing strategies, and EXPLAIN analysis to dramatically improve database performance and eliminate slow queries. Use when debugging slow queries, designing database schemas, or optimizing application performance.

designdata

Anndata

This skill should be used when working with annotated data matrices in Python, particularly for single-cell genomics analysis, managing experimental measurements with metadata, or handling large-scale biological datasets. Use when tasks involve AnnData objects, h5ad files, single-cell RNA-seq data, or integration with scanpy/scverse tools.

arttooldata

Xlsx

Spreadsheet toolkit (.xlsx/.csv). Create/edit with formulas/formatting, analyze data, visualization, recalculate formulas, for spreadsheet processing and analysis.

tooldata

Skill Information

Category:Data
Last Updated:12/16/2025