Xlsx

by benchflow-ai

data

Comprehensive spreadsheet creation, editing, and analysis with support for formulas, formatting, data analysis, and pivot tables. When Claude needs to work with spreadsheets (.xlsx files) for: (1) Creating new spreadsheets with data and formatting, (2) Reading or analyzing Excel data with pandas, (3) Creating pivot tables programmatically with openpyxl, (4) Building multi-sheet workbooks with source data and pivot table sheets, or (5) Any Excel file operations

Skill Details

Repository Files

1 file in this skill directory


name: xlsx description: "Comprehensive spreadsheet creation, editing, and analysis with support for formulas, formatting, data analysis, and pivot tables. When Claude needs to work with spreadsheets (.xlsx files) for: (1) Creating new spreadsheets with data and formatting, (2) Reading or analyzing Excel data with pandas, (3) Creating pivot tables programmatically with openpyxl, (4) Building multi-sheet workbooks with source data and pivot table sheets, or (5) Any Excel file operations"

XLSX Creation, Editing, and Analysis

Overview

This skill covers working with Excel files using Python libraries: openpyxl for Excel-specific features (formatting, formulas, pivot tables) and pandas for data analysis.

Reading Data

With pandas (recommended for analysis)

import pandas as pd

df = pd.read_excel('file.xlsx')  # First sheet
all_sheets = pd.read_excel('file.xlsx', sheet_name=None)  # All sheets as dict
df.head()      # Preview
df.describe()  # Statistics

With openpyxl (for cell-level access)

from openpyxl import load_workbook

wb = load_workbook('file.xlsx')
ws = wb.active
value = ws['A1'].value

# Read calculated values (not formulas)
wb = load_workbook('file.xlsx', data_only=True)

Creating Excel Files

from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment

wb = Workbook()
ws = wb.active
ws.title = "Data"

# Add data
ws['A1'] = 'Header'
ws.append(['Row', 'of', 'data'])

# Formatting
ws['A1'].font = Font(bold=True)
ws['A1'].fill = PatternFill('solid', start_color='2c3e50')

wb.save('output.xlsx')

Creating Pivot Tables

Pivot tables summarize data by grouping and aggregating. Use openpyxl's pivot table API.

CRITICAL: All pivot tables MUST use cacheId=0. Using any other cacheId (1, 2, etc.) will cause openpyxl to fail when reading the file back with KeyError. This is an openpyxl limitation.

Basic Pivot Table Structure

from openpyxl import Workbook
from openpyxl.pivot.table import TableDefinition, Location, PivotField, DataField, RowColField
from openpyxl.pivot.cache import CacheDefinition, CacheField, CacheSource, WorksheetSource, SharedItems

# 1. Create workbook with source data
wb = Workbook()
data_ws = wb.active
data_ws.title = "SourceData"

# Write your data (with headers in row 1)
data = [
    ["CategoryName", "ProductName", "Quantity", "Revenue"],
    ["Beverages", "Chai", 25, 450.00],
    ["Seafood", "Ikura", 12, 372.00],
    # ... more rows
]
for row in data:
    data_ws.append(row)

num_rows = len(data)

# 2. Create pivot table sheet
pivot_ws = wb.create_sheet("PivotAnalysis")

# 3. Define the cache (source data reference)
cache = CacheDefinition(
    cacheSource=CacheSource(
        type="worksheet",
        worksheetSource=WorksheetSource(
            ref=f"A1:D{num_rows}",  # Adjust to your data range
            sheet="SourceData"
        )
    ),
    cacheFields=[
        CacheField(name="CategoryName", sharedItems=SharedItems(count=8)),
        CacheField(name="ProductName", sharedItems=SharedItems(count=40)),
        CacheField(name="Quantity", sharedItems=SharedItems()),
        CacheField(name="Revenue", sharedItems=SharedItems()),
    ]
)

# 4. Create pivot table definition
pivot = TableDefinition(
    name="RevenueByCategory",
    cacheId=0,  # MUST be 0 for ALL pivot tables - any other value breaks openpyxl
    dataCaption="Values",
    location=Location(
        ref="A3:B10",  # Where pivot table will appear
        firstHeaderRow=1,
        firstDataRow=1,
        firstDataCol=1
    ),
)

# 5. Configure pivot fields (one for each source column)
# Fields are indexed 0, 1, 2, 3 matching cache field order

# Field 0: CategoryName - use as ROW
pivot.pivotFields.append(PivotField(axis="axisRow", showAll=False))
# Field 1: ProductName - not used (just include it)
pivot.pivotFields.append(PivotField(showAll=False))
# Field 2: Quantity - not used
pivot.pivotFields.append(PivotField(showAll=False))
# Field 3: Revenue - use as DATA (for aggregation)
pivot.pivotFields.append(PivotField(dataField=True, showAll=False))

# 6. Add row field reference (index of the field to use as rows)
pivot.rowFields.append(RowColField(x=0))  # CategoryName is field 0

# 7. Add data field with aggregation
pivot.dataFields.append(DataField(
    name="Total Revenue",
    fld=3,  # Revenue is field index 3
    subtotal="sum"  # Options: sum, count, average, max, min, product, stdDev, var
))

# 8. Attach cache and add to worksheet
pivot.cache = cache
pivot_ws._pivots.append(pivot)

wb.save('output_with_pivot.xlsx')

Common Pivot Table Configurations

Count by Category (Order Count)

# Row field: CategoryName (index 0)
# Data field: Count any column

pivot.rowFields.append(RowColField(x=0))  # CategoryName as rows
pivot.dataFields.append(DataField(
    name="Order Count",
    fld=1,  # Any field works for count
    subtotal="count"
))

Sum by Category (Total Revenue)

pivot.rowFields.append(RowColField(x=0))  # CategoryName as rows
pivot.dataFields.append(DataField(
    name="Total Revenue",
    fld=3,  # Revenue field
    subtotal="sum"
))

Two-Dimensional Pivot (Rows and Columns)

# CategoryName as rows, Quarter as columns, revenue as values
pivot.pivotFields[0] = PivotField(axis="axisRow", showAll=False)  # Category = row
pivot.pivotFields[1] = PivotField(axis="axisCol", showAll=False)  # Quarter = col
pivot.pivotFields[3] = PivotField(dataField=True, showAll=False)  # Revenue = data

pivot.rowFields.append(RowColField(x=0))  # Category
pivot.colFields.append(RowColField(x=1))  # Quarter
pivot.dataFields.append(DataField(name="Revenue", fld=3, subtotal="sum"))

Multiple Data Fields

# Show both count and sum
pivot.dataFields.append(DataField(name="Order Count", fld=2, subtotal="count"))
pivot.dataFields.append(DataField(name="Total Revenue", fld=3, subtotal="sum"))

Pivot Table Field Configuration Reference

axis value Meaning
"axisRow" Field appears as row labels
"axisCol" Field appears as column labels
"axisPage" Field is a filter/slicer
(none) Field not used for grouping
subtotal value Aggregation
"sum" Sum of values
"count" Count of items
"average" Average/mean
"max" Maximum value
"min" Minimum value
"product" Product of values
"stdDev" Standard deviation
"var" Variance

Important Notes

  1. CRITICAL - cacheId must be 0: ALL pivot tables must use cacheId=0. Using 1, 2, etc. will cause KeyError when reading the file
  2. Field indices must match: The order of cacheFields must match your source data columns
  3. Location ref: The ref in Location is approximate - Excel will adjust when opened
  4. Cache field count: For categorical fields, set count parameter to approximate number of unique values
  5. Multiple pivots: Each pivot table needs its own sheet for clarity
  6. Values populate on open: Pivot table values are calculated when the file is opened in Excel/LibreOffice, not when created

Working with Existing Excel Files

from openpyxl import load_workbook

wb = load_workbook('existing.xlsx')
ws = wb['SheetName']

# Modify
ws['A1'] = 'New Value'
ws.insert_rows(2)

# Add new sheet
new_ws = wb.create_sheet('Analysis')

wb.save('modified.xlsx')

Best Practices

  1. Use pandas for data manipulation, openpyxl for Excel features
  2. Match field indices carefully when creating pivot tables
  3. Test with small data first before scaling up
  4. Name your sheets clearly (e.g., "SourceData", "RevenueByCategory")
  5. Document your pivot table structure with comments in code

Related Skills

Xlsx

Comprehensive spreadsheet creation, editing, and analysis with support for formulas, formatting, data analysis, and visualization. When Claude needs to work with spreadsheets (.xlsx, .xlsm, .csv, .tsv, etc) for: (1) Creating new spreadsheets with formulas and formatting, (2) Reading or analyzing data, (3) Modify existing spreadsheets while preserving formulas, (4) Data analysis and visualization in spreadsheets, or (5) Recalculating formulas

data

Clickhouse Io

ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.

datacli

Clickhouse Io

ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.

datacli

Analyzing Financial Statements

This skill calculates key financial ratios and metrics from financial statement data for investment analysis

data

Data Storytelling

Transform data into compelling narratives using visualization, context, and persuasive structure. Use when presenting analytics to stakeholders, creating data reports, or building executive presentations.

data

Kpi Dashboard Design

Design effective KPI dashboards with metrics selection, visualization best practices, and real-time monitoring patterns. Use when building business dashboards, selecting metrics, or designing data visualization layouts.

designdata

Dbt Transformation Patterns

Master dbt (data build tool) for analytics engineering with model organization, testing, documentation, and incremental strategies. Use when building data transformations, creating data models, or implementing analytics engineering best practices.

testingdocumenttool

Sql Optimization Patterns

Master SQL query optimization, indexing strategies, and EXPLAIN analysis to dramatically improve database performance and eliminate slow queries. Use when debugging slow queries, designing database schemas, or optimizing application performance.

designdata

Anndata

This skill should be used when working with annotated data matrices in Python, particularly for single-cell genomics analysis, managing experimental measurements with metadata, or handling large-scale biological datasets. Use when tasks involve AnnData objects, h5ad files, single-cell RNA-seq data, or integration with scanpy/scverse tools.

arttooldata

Xlsx

Spreadsheet toolkit (.xlsx/.csv). Create/edit with formulas/formatting, analyze data, visualization, recalculate formulas, for spreadsheet processing and analysis.

tooldata

Skill Information

Category:Data
Last Updated:1/27/2026