name: xlsx description: "Comprehensive spreadsheet creation, editing, and analysis with support for formulas, formatting, data analysis, and pivot tables. When Claude needs to work with spreadsheets (.xlsx files) for: (1) Creating new spreadsheets with data and formatting, (2) Reading or analyzing Excel data with pandas, (3) Creating pivot tables programmatically with openpyxl, (4) Building multi-sheet workbooks with source data and pivot table sheets, or (5) Any Excel file operations"

XLSX Creation, Editing, and Analysis

Overview

This skill covers working with Excel files using Python libraries: openpyxl for Excel-specific features (formatting, formulas, pivot tables) and pandas for data analysis.

Reading Data

With pandas (recommended for analysis)

import pandas as pd

df = pd.read_excel('file.xlsx')  # First sheet
all_sheets = pd.read_excel('file.xlsx', sheet_name=None)  # All sheets as dict
df.head()      # Preview
df.describe()  # Statistics

With openpyxl (for cell-level access)

from openpyxl import load_workbook

wb = load_workbook('file.xlsx')
ws = wb.active
value = ws['A1'].value

# Read calculated values (not formulas)
wb = load_workbook('file.xlsx', data_only=True)

Creating Excel Files

from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment

wb = Workbook()
ws = wb.active
ws.title = "Data"

# Add data
ws['A1'] = 'Header'
ws.append(['Row', 'of', 'data'])

# Formatting
ws['A1'].font = Font(bold=True)
ws['A1'].fill = PatternFill('solid', start_color='2c3e50')

wb.save('output.xlsx')

Creating Pivot Tables

Pivot tables summarize data by grouping and aggregating. Use openpyxl's pivot table API.

CRITICAL: All pivot tables MUST use cacheId=0. Using any other cacheId (1, 2, etc.) will cause openpyxl to fail when reading the file back with KeyError. This is an openpyxl limitation.

Basic Pivot Table Structure

from openpyxl import Workbook
from openpyxl.pivot.table import TableDefinition, Location, PivotField, DataField, RowColField
from openpyxl.pivot.cache import CacheDefinition, CacheField, CacheSource, WorksheetSource, SharedItems

# 1. Create workbook with source data
wb = Workbook()
data_ws = wb.active
data_ws.title = "SourceData"

# Write your data (with headers in row 1)
data = [
    ["CategoryName", "ProductName", "Quantity", "Revenue"],
    ["Beverages", "Chai", 25, 450.00],
    ["Seafood", "Ikura", 12, 372.00],
    # ... more rows
]
for row in data:
    data_ws.append(row)

num_rows = len(data)

# 2. Create pivot table sheet
pivot_ws = wb.create_sheet("PivotAnalysis")

# 3. Define the cache (source data reference)
cache = CacheDefinition(
    cacheSource=CacheSource(
        type="worksheet",
        worksheetSource=WorksheetSource(
            ref=f"A1:D{num_rows}",  # Adjust to your data range
            sheet="SourceData"
        )
    ),
    cacheFields=[
        CacheField(name="CategoryName", sharedItems=SharedItems(count=8)),
        CacheField(name="ProductName", sharedItems=SharedItems(count=40)),
        CacheField(name="Quantity", sharedItems=SharedItems()),
        CacheField(name="Revenue", sharedItems=SharedItems()),
    ]
)

# 4. Create pivot table definition
pivot = TableDefinition(
    name="RevenueByCategory",
    cacheId=0,  # MUST be 0 for ALL pivot tables - any other value breaks openpyxl
    dataCaption="Values",
    location=Location(
        ref="A3:B10",  # Where pivot table will appear
        firstHeaderRow=1,
        firstDataRow=1,
        firstDataCol=1
    ),
)

# 5. Configure pivot fields (one for each source column)
# Fields are indexed 0, 1, 2, 3 matching cache field order

# Field 0: CategoryName - use as ROW
pivot.pivotFields.append(PivotField(axis="axisRow", showAll=False))
# Field 1: ProductName - not used (just include it)
pivot.pivotFields.append(PivotField(showAll=False))
# Field 2: Quantity - not used
pivot.pivotFields.append(PivotField(showAll=False))
# Field 3: Revenue - use as DATA (for aggregation)
pivot.pivotFields.append(PivotField(dataField=True, showAll=False))

# 6. Add row field reference (index of the field to use as rows)
pivot.rowFields.append(RowColField(x=0))  # CategoryName is field 0

# 7. Add data field with aggregation
pivot.dataFields.append(DataField(
    name="Total Revenue",
    fld=3,  # Revenue is field index 3
    subtotal="sum"  # Options: sum, count, average, max, min, product, stdDev, var
))

# 8. Attach cache and add to worksheet
pivot.cache = cache
pivot_ws._pivots.append(pivot)

wb.save('output_with_pivot.xlsx')

Common Pivot Table Configurations

Count by Category (Order Count)

# Row field: CategoryName (index 0)
# Data field: Count any column

pivot.rowFields.append(RowColField(x=0))  # CategoryName as rows
pivot.dataFields.append(DataField(
    name="Order Count",
    fld=1,  # Any field works for count
    subtotal="count"
))

Sum by Category (Total Revenue)

pivot.rowFields.append(RowColField(x=0))  # CategoryName as rows
pivot.dataFields.append(DataField(
    name="Total Revenue",
    fld=3,  # Revenue field
    subtotal="sum"
))

Two-Dimensional Pivot (Rows and Columns)

# CategoryName as rows, Quarter as columns, revenue as values
pivot.pivotFields[0] = PivotField(axis="axisRow", showAll=False)  # Category = row
pivot.pivotFields[1] = PivotField(axis="axisCol", showAll=False)  # Quarter = col
pivot.pivotFields[3] = PivotField(dataField=True, showAll=False)  # Revenue = data

pivot.rowFields.append(RowColField(x=0))  # Category
pivot.colFields.append(RowColField(x=1))  # Quarter
pivot.dataFields.append(DataField(name="Revenue", fld=3, subtotal="sum"))

Multiple Data Fields

# Show both count and sum
pivot.dataFields.append(DataField(name="Order Count", fld=2, subtotal="count"))
pivot.dataFields.append(DataField(name="Total Revenue", fld=3, subtotal="sum"))

Pivot Table Field Configuration Reference

axis value	Meaning
`"axisRow"`	Field appears as row labels
`"axisCol"`	Field appears as column labels
`"axisPage"`	Field is a filter/slicer
(none)	Field not used for grouping

subtotal value	Aggregation
`"sum"`	Sum of values
`"count"`	Count of items
`"average"`	Average/mean
`"max"`	Maximum value
`"min"`	Minimum value
`"product"`	Product of values
`"stdDev"`	Standard deviation
`"var"`	Variance

Important Notes

CRITICAL - cacheId must be 0: ALL pivot tables must use cacheId=0. Using 1, 2, etc. will cause KeyError when reading the file
Field indices must match: The order of cacheFields must match your source data columns
Location ref: The ref in Location is approximate - Excel will adjust when opened
Cache field count: For categorical fields, set count parameter to approximate number of unique values
Multiple pivots: Each pivot table needs its own sheet for clarity
Values populate on open: Pivot table values are calculated when the file is opened in Excel/LibreOffice, not when created

Working with Existing Excel Files

from openpyxl import load_workbook

wb = load_workbook('existing.xlsx')
ws = wb['SheetName']

# Modify
ws['A1'] = 'New Value'
ws.insert_rows(2)

# Add new sheet
new_ws = wb.create_sheet('Analysis')

wb.save('modified.xlsx')

Best Practices

Use pandas for data manipulation, openpyxl for Excel features
Match field indices carefully when creating pivot tables
Test with small data first before scaling up
Name your sheets clearly (e.g., "SourceData", "RevenueByCategory")
Document your pivot table structure with comments in code

Xlsx

Skill Details

Repository Files

XLSX Creation, Editing, and Analysis

Overview

Reading Data

With pandas (recommended for analysis)

With openpyxl (for cell-level access)

Creating Excel Files

Creating Pivot Tables

Basic Pivot Table Structure

Common Pivot Table Configurations

Count by Category (Order Count)

Sum by Category (Total Revenue)

Two-Dimensional Pivot (Rows and Columns)

Multiple Data Fields

Pivot Table Field Configuration Reference

Important Notes

Working with Existing Excel Files

Best Practices

Related Skills

Xlsx

Clickhouse Io

Clickhouse Io

Analyzing Financial Statements

Data Storytelling

Kpi Dashboard Design

Dbt Transformation Patterns

Sql Optimization Patterns

Anndata

Xlsx

Skill Information