Data Aggregation

by dglowacki

data

Aggregate and merge data from multiple sources including App Store sales, GitHub commits, Skillz events, and more. Use when combining data for reports, dashboards, or analysis.

Skill Details

Repository Files

5 files in this skill directory


name: data-aggregation description: Aggregate and merge data from multiple sources including App Store sales, GitHub commits, Skillz events, and more. Use when combining data for reports, dashboards, or analysis.

Data Aggregation

Tools for aggregating, transforming, and merging data from multiple sources.

Quick Start

Aggregate App Store sales:

python scripts/aggregate_sales.py --input sales_reports/ --output aggregated.json

Aggregate GitHub commits:

python scripts/aggregate_commits.py --input commits.json --period week --output summary.json

Merge multiple sources:

python scripts/merge_sources.py --sources app_store.json github.json skillz.json --output combined.json

Aggregation Types

1. Time-Based Aggregation

Group data by time periods (day, week, month).

Example: Daily sales totals

from aggregate_sales import aggregate_by_time

# Input: List of sales records
sales = [
    {"date": "2026-01-14", "revenue": 123.45, "units": 5},
    {"date": "2026-01-14", "revenue": 67.89, "units": 3},
    {"date": "2026-01-15", "revenue": 234.56, "units": 8}
]

# Output: Aggregated by day
result = aggregate_by_time(sales, period='day')
# {
#     "2026-01-14": {"revenue": 191.34, "units": 8},
#     "2026-01-15": {"revenue": 234.56, "units": 8}
# }

2. Entity-Based Aggregation

Group data by entities (apps, users, repos, etc.).

Example: Per-app metrics

from aggregate_sales import aggregate_by_entity

sales = [
    {"app": "App A", "revenue": 100, "units": 5},
    {"app": "App A", "revenue": 50, "units": 2},
    {"app": "App B", "revenue": 200, "units": 10}
]

result = aggregate_by_entity(sales, entity_field='app')
# {
#     "App A": {"revenue": 150, "units": 7},
#     "App B": {"revenue": 200, "units": 10}
# }

3. Statistical Aggregation

Calculate statistics (sum, avg, min, max, percentiles).

Example: Commit statistics

from aggregate_commits import calculate_stats

commits = [
    {"author": "John", "lines": 125},
    {"author": "Jane", "lines": 87},
    {"author": "John", "lines": 43}
]

result = calculate_stats(commits, group_by='author', metric='lines')
# {
#     "John": {"sum": 168, "avg": 84, "min": 43, "max": 125, "count": 2},
#     "Jane": {"sum": 87, "avg": 87, "min": 87, "max": 87, "count": 1}
# }

Data Sources

App Store Sales

Input format (TSV from App Store Connect):

Provider	Provider Country	SKU	Developer	Title	Version	Product Type Identifier	Units	Developer Proceeds	Begin Date	End Date	Customer Currency	Country Code	Currency of Proceeds	Apple Identifier	Customer Price	Promo Code	Parent Identifier	Subscription	Period	Category	CMB	Device	Supported Platforms	Proceeds Reason	Preserved Pricing	Client

Aggregated output:

{
  "period": "2026-01-14",
  "apps": {
    "com.example.app": {
      "name": "My App",
      "downloads": 1234,
      "revenue": 567.89,
      "updates": 45,
      "countries": ["US", "CA", "UK"]
    }
  },
  "totals": {
    "total_downloads": 5678,
    "total_revenue": 2345.67,
    "total_apps": 5
  }
}

GitHub Commits

Input format (from GitHub API):

[
  {
    "sha": "abc123",
    "author": {"name": "John Doe", "email": "john@example.com"},
    "commit": {
      "message": "Add feature X",
      "author": {"date": "2026-01-14T10:30:00Z"}
    },
    "stats": {"additions": 125, "deletions": 45}
  }
]

Aggregated output:

{
  "period": "week",
  "date_range": "2026-01-07 to 2026-01-14",
  "summary": {
    "total_commits": 45,
    "total_contributors": 5,
    "total_lines": 2345,
    "total_files": 123
  },
  "by_author": {
    "John Doe": {
      "commits": 15,
      "lines_added": 1234,
      "lines_deleted": 456,
      "files_changed": 45
    }
  },
  "by_day": {
    "2026-01-14": {"commits": 8, "lines": 567}
  }
}

Skillz Events

Input format (from Skillz Developer Portal):

{
  "event_id": "888831",
  "name": "Winter Tournament",
  "status": "active",
  "start_date": "2026-01-10",
  "end_date": "2026-01-20",
  "prize_pool": 1000,
  "entries": 234
}

Aggregated output:

{
  "period": "active",
  "summary": {
    "total_events": 8,
    "total_prize_pool": 8000,
    "total_entries": 1234
  },
  "by_status": {
    "active": {"count": 5, "prize_pool": 5000},
    "completed": {"count": 3, "prize_pool": 3000}
  }
}

Aggregation Scripts

aggregate_sales.py

Aggregate App Store sales data.

Usage:

python scripts/aggregate_sales.py \
    --input sales_reports/ \
    --period week \
    --group-by app \
    --output aggregated.json

Arguments:

  • --input: Input directory or file (TSV/JSON)
  • --period: Time period (day, week, month)
  • --group-by: Grouping field (app, country, category)
  • --output: Output JSON file

aggregate_commits.py

Aggregate GitHub commit data.

Usage:

python scripts/aggregate_commits.py \
    --input commits.json \
    --period week \
    --metrics lines,files,commits \
    --output summary.json

Arguments:

  • --input: Input JSON file (commits array)
  • --period: Time period (day, week, month)
  • --metrics: Metrics to calculate (comma-separated)
  • --output: Output JSON file

aggregate_events.py

Aggregate Skillz event data.

Usage:

python scripts/aggregate_events.py \
    --input events/ \
    --status active,completed \
    --output summary.json

Arguments:

  • --input: Input directory with event JSON files
  • --status: Filter by status (comma-separated)
  • --output: Output JSON file

merge_sources.py

Merge data from multiple sources.

Usage:

python scripts/merge_sources.py \
    --sources app_store.json github.json skillz.json \
    --strategy combine \
    --output combined.json

Arguments:

  • --sources: Space-separated list of JSON files
  • --strategy: Merge strategy (combine, average, latest)
  • --output: Output JSON file

Merge strategies:

  • combine: Combine all data (keep all fields)
  • average: Average numeric fields
  • latest: Keep latest values (by timestamp)

Data Transformations

Filtering

from aggregate_sales import filter_data

sales = [...]

# Filter by country
us_sales = filter_data(sales, country='US')

# Filter by date range
recent_sales = filter_data(sales, start_date='2026-01-01', end_date='2026-01-14')

# Filter by value
high_revenue = filter_data(sales, min_revenue=100)

Grouping

from aggregate_commits import group_data

commits = [...]

# Group by author
by_author = group_data(commits, group_by='author')

# Group by repository
by_repo = group_data(commits, group_by='repository')

# Group by date
by_date = group_data(commits, group_by='date', period='day')

Sorting

from merge_sources import sort_data

data = [...]

# Sort by revenue (descending)
sorted_data = sort_data(data, field='revenue', reverse=True)

# Sort by date (ascending)
sorted_data = sort_data(data, field='date')

Integration with Agents

Reporting Agent

# Aggregate App Store sales
from aggregate_sales import aggregate_sales

sales_data = appstore_client.get_sales_report(days=7)
aggregated = aggregate_sales(sales_data, period='day', group_by='app')

# Use for report
html = render_template('appstore-metrics', aggregated)

Automation Agent

# Aggregate GitHub commits
from aggregate_commits import aggregate_commits

commits = github_client.get_commits(repo='owner/repo', days=7)
summary = aggregate_commits(commits, period='week')

# Create ClickUp task if high activity
if summary['total_commits'] > 50:
    clickup_client.create_task(
        title='High GitHub Activity',
        description=f"Total commits: {summary['total_commits']}"
    )

Examples

See examples/ directory for:

  • sample_sales_aggregation.json - App Store sales example
  • sample_commit_aggregation.json - GitHub commits example
  • sample_multi_source_merge.json - Multi-source merge example

Related Skills

Xlsx

Comprehensive spreadsheet creation, editing, and analysis with support for formulas, formatting, data analysis, and visualization. When Claude needs to work with spreadsheets (.xlsx, .xlsm, .csv, .tsv, etc) for: (1) Creating new spreadsheets with formulas and formatting, (2) Reading or analyzing data, (3) Modify existing spreadsheets while preserving formulas, (4) Data analysis and visualization in spreadsheets, or (5) Recalculating formulas

data

Clickhouse Io

ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.

datacli

Clickhouse Io

ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.

datacli

Analyzing Financial Statements

This skill calculates key financial ratios and metrics from financial statement data for investment analysis

data

Data Storytelling

Transform data into compelling narratives using visualization, context, and persuasive structure. Use when presenting analytics to stakeholders, creating data reports, or building executive presentations.

data

Kpi Dashboard Design

Design effective KPI dashboards with metrics selection, visualization best practices, and real-time monitoring patterns. Use when building business dashboards, selecting metrics, or designing data visualization layouts.

designdata

Dbt Transformation Patterns

Master dbt (data build tool) for analytics engineering with model organization, testing, documentation, and incremental strategies. Use when building data transformations, creating data models, or implementing analytics engineering best practices.

testingdocumenttool

Sql Optimization Patterns

Master SQL query optimization, indexing strategies, and EXPLAIN analysis to dramatically improve database performance and eliminate slow queries. Use when debugging slow queries, designing database schemas, or optimizing application performance.

designdata

Anndata

This skill should be used when working with annotated data matrices in Python, particularly for single-cell genomics analysis, managing experimental measurements with metadata, or handling large-scale biological datasets. Use when tasks involve AnnData objects, h5ad files, single-cell RNA-seq data, or integration with scanpy/scverse tools.

arttooldata

Xlsx

Spreadsheet toolkit (.xlsx/.csv). Create/edit with formulas/formatting, analyze data, visualization, recalculate formulas, for spreadsheet processing and analysis.

tooldata

Skill Information

Category:Data
Last Updated:1/14/2026