# GeoPandas
by eyadsibai
```yaml
---
name: geopandas
description: Use when "GeoPandas", "geospatial", "GIS", "shapefile", "GeoJSON", or asking about "spatial analysis", "coordinate transformation", "spatial join", "choropleth map", "buffer analysis", "geographic data", "map visualization"
version: 1.0.0
---
```
# GeoPandas Geospatial Data Analysis

A Python library for geospatial vector data that extends pandas with spatial operations.
## When to Use

- Working with geographic/spatial data (shapefiles, GeoJSON, GeoPackage)
- Spatial analysis (buffers, intersections, spatial joins)
- Coordinate transformations and projections
- Creating choropleth maps
- Processing geographic boundaries: points, lines, polygons
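GeoDataFrames can also be built directly from plain tabular data, which is often the first step when the input is a CSV of coordinates. A minimal sketch (the city names and coordinates are made-up example values):

```python
import geopandas as gpd
import pandas as pd

# Plain tabular data with coordinate columns (example values)
df = pd.DataFrame({
    "city": ["Berlin", "Paris"],
    "lon": [13.405, 2.352],
    "lat": [52.520, 48.857],
})

# points_from_xy builds Point geometries from x (lon) and y (lat) columns
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df["lon"], df["lat"]),
    crs="EPSG:4326",
)
print(gdf.geom_type.unique())  # ['Point']
```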
## Quick Start

```python
import geopandas as gpd

# Read spatial data
gdf = gpd.read_file("data.geojson")

# Basic exploration
print(gdf.head())
print(gdf.crs)                 # Coordinate Reference System
print(gdf.geometry.geom_type)

# Simple plot
gdf.plot()

# Reproject to a different CRS
gdf_projected = gdf.to_crs("EPSG:3857")

# Calculate area (use a projected CRS)
gdf_projected["area"] = gdf_projected.geometry.area

# Save to file (format inferred from the extension)
gdf.to_file("output.gpkg")
```
## Reading/Writing Data

```python
# Read various formats (driver inferred from the file extension)
gdf = gpd.read_file("data.shp")      # Shapefile
gdf = gpd.read_file("data.geojson")  # GeoJSON
gdf = gpd.read_file("data.gpkg")     # GeoPackage

# Read with a spatial filter (faster for large files)
gdf = gpd.read_file("data.gpkg", bbox=(xmin, ymin, xmax, ymax))

# Write to file
gdf.to_file("output.gpkg")
gdf.to_file("output.geojson", driver="GeoJSON")

# PostGIS database
from sqlalchemy import create_engine
engine = create_engine("postgresql://user:pass@localhost/db")
gdf = gpd.read_postgis("SELECT * FROM table", con=engine, geom_col="geom")
```
## Coordinate Reference Systems

```python
# Check the CRS
print(gdf.crs)

# Set the CRS (when metadata is missing; does NOT transform coordinates)
gdf = gdf.set_crs("EPSG:4326")

# Reproject (transforms coordinates)
gdf_projected = gdf.to_crs("EPSG:3857")   # Web Mercator
gdf_projected = gdf.to_crs("EPSG:32633")  # UTM zone 33N

# Common CRS codes:
# EPSG:4326  - WGS84 (lat/lon)
# EPSG:3857  - Web Mercator
# EPSG:326XX - UTM zones (northern hemisphere)
```
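When you need a projected CRS but don't know the right UTM zone, recent GeoPandas versions (0.9+) can estimate it from the data's extent. A sketch, using two example points near Oslo (the coordinates are illustrative):

```python
import geopandas as gpd
from shapely.geometry import Point

# Two points in lon/lat (WGS84), roughly around Oslo
gdf = gpd.GeoDataFrame(
    geometry=[Point(10.75, 59.91), Point(10.80, 59.95)],
    crs="EPSG:4326",
)

# estimate_utm_crs() picks the UTM zone covering the data,
# a sensible projected CRS for local distance/area work
utm = gdf.estimate_utm_crs()
gdf_utm = gdf.to_crs(utm)
print(utm)  # for this area: WGS 84 / UTM zone 32N (EPSG:32632)
print(gdf_utm.geometry.iloc[0])  # coordinates now in metres
```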
## Geometric Operations

```python
# Buffer (expand/shrink geometries)
buffered = gdf.geometry.buffer(100)  # 100 units in the CRS's linear unit

# Centroid
centroids = gdf.geometry.centroid

# Simplify (reduce vertices)
simplified = gdf.geometry.simplify(tolerance=5, preserve_topology=True)

# Convex hull
hull = gdf.geometry.convex_hull

# Boundary
boundary = gdf.geometry.boundary

# Area and length (use a projected CRS!)
gdf["area"] = gdf.geometry.area
gdf["length"] = gdf.geometry.length
```
## Spatial Analysis

### Spatial Joins

```python
# Join based on a spatial relationship
joined = gpd.sjoin(gdf1, gdf2, predicate="intersects")
joined = gpd.sjoin(gdf1, gdf2, predicate="within")
joined = gpd.sjoin(gdf1, gdf2, predicate="contains")

# Nearest-neighbor join
nearest = gpd.sjoin_nearest(gdf1, gdf2, max_distance=1000)
```
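A minimal, self-contained sketch of a `within` join on toy data (one unit square, one point inside it, one outside):

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# One unit square
polys = gpd.GeoDataFrame(
    {"name": ["square"]},
    geometry=[Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])],
    crs="EPSG:3857",
)

# Two points: pid 1 inside the square, pid 2 outside
points = gpd.GeoDataFrame(
    {"pid": [1, 2]},
    geometry=[Point(0.5, 0.5), Point(5, 5)],
    crs="EPSG:3857",
)

# 'within' (inner join by default) keeps only points inside a polygon
joined = gpd.sjoin(points, polys, predicate="within")
print(joined[["pid", "name"]])  # one row: pid 1 / square
```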
### Overlay Operations

```python
# Intersection
intersection = gpd.overlay(gdf1, gdf2, how="intersection")

# Union
union = gpd.overlay(gdf1, gdf2, how="union")

# Difference
difference = gpd.overlay(gdf1, gdf2, how="difference")
```
### Dissolve (Aggregate by Attribute)

```python
# Merge geometries by attribute
dissolved = gdf.dissolve(by="region", aggfunc="sum")
```
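A toy example of what dissolve does, using two adjacent unit squares that share one `region` value:

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Two adjacent unit squares in the same region (example data)
gdf = gpd.GeoDataFrame(
    {"region": ["A", "A"], "pop": [10, 20]},
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(1, 0), (2, 0), (2, 1), (1, 1)]),
    ],
    crs="EPSG:3857",
)

# Geometries are unioned per group; numeric columns are aggregated
dissolved = gdf.dissolve(by="region", aggfunc="sum")
print(dissolved.loc["A", "pop"])        # 30
print(dissolved.geometry.iloc[0].area)  # 2.0 (the merged 2x1 rectangle)
```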
### Clip

```python
# Clip data to a boundary
clipped = gpd.clip(gdf, boundary_gdf)
```
## Visualization

```python
import matplotlib.pyplot as plt

# Basic plot
gdf.plot()

# Choropleth map
gdf.plot(column="population", cmap="YlOrRd", legend=True)

# Multi-layer map
fig, ax = plt.subplots(figsize=(10, 10))
gdf1.plot(ax=ax, color="blue", alpha=0.5)
gdf2.plot(ax=ax, color="red", alpha=0.5)
plt.savefig("map.png", dpi=300, bbox_inches="tight")

# Interactive map (requires folium)
gdf.explore(column="population", legend=True)
```
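A runnable end-to-end choropleth sketch on toy polygons (the `Agg` backend makes it work headlessly; the output filename is arbitrary):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripted rendering
import geopandas as gpd
from shapely.geometry import Polygon

# Two toy polygons with a value column to colour by
gdf = gpd.GeoDataFrame(
    {"population": [100, 400]},
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(1, 0), (2, 0), (2, 1), (1, 1)]),
    ],
    crs="EPSG:3857",
)

# plot() returns a matplotlib Axes; save via its figure
ax = gdf.plot(column="population", cmap="YlOrRd", legend=True)
ax.figure.savefig("choropleth.png", dpi=150, bbox_inches="tight")
```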
## Common Workflows

### Spatial Join and Aggregate

```python
# Join points to polygons
points_in_polygons = gpd.sjoin(points_gdf, polygons_gdf, predicate="within")

# Aggregate per polygon (index_right holds the matched polygon's index)
aggregated = points_in_polygons.groupby("index_right").agg(
    value_sum=("value", "sum"),
    point_count=("value", "size"),
)

# Merge back to the polygons
result = polygons_gdf.merge(aggregated, left_index=True, right_index=True)
```
### Buffer Analysis

```python
# Create buffers around points; project first, since buffer distances are in CRS units
gdf_projected = points_gdf.to_crs("EPSG:3857")
gdf_projected["buffer"] = gdf_projected.geometry.buffer(1000)  # ~1 km (Web Mercator distorts distances away from the equator; a local UTM CRS is more accurate)
gdf_projected = gdf_projected.set_geometry("buffer")

# Find features within the buffers (both layers must share a CRS)
within_buffer = gpd.sjoin(other_gdf.to_crs("EPSG:3857"), gdf_projected, predicate="within")
```
## Best Practices

- Always check the CRS before spatial operations
- Use a projected CRS for area/distance calculations
- Match CRS between datasets before spatial joins or overlays
- Validate geometries with `.is_valid` before operations
- Prefer the GeoPackage format over Shapefile (modern, fewer limitations)
- Use `.copy()` when modifying geometry to avoid side effects
- Filter during read with `bbox` for large files
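The validity advice above can be sketched concretely. `GeoSeries.make_valid()` requires GeoPandas 0.12+ (Shapely 2.x); the self-intersecting "bowtie" polygon below is a classic invalid geometry:

```python
import geopandas as gpd
from shapely.geometry import Polygon

# A "bowtie": its edges cross at (1, 1), so the polygon is invalid
bowtie = Polygon([(0, 0), (2, 2), (2, 0), (0, 2)])
gdf = gpd.GeoDataFrame(geometry=[bowtie], crs="EPSG:3857")
print(gdf.is_valid.iloc[0])  # False

# make_valid() repairs invalid geometries (here: splits into a MultiPolygon)
gdf["geometry"] = gdf.geometry.make_valid()
print(gdf.is_valid.iloc[0])  # True
```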
## vs Alternatives
| Tool | Best For |
|---|---|
| GeoPandas | Vector data analysis, spatial operations |
| Rasterio | Raster data (satellite imagery, DEMs) |
| Shapely | Low-level geometry operations |
| Folium | Interactive web maps |
