Everyrow Sdk
by futuresearch
Helps write Python code using the everyrow SDK for AI-powered data processing - transforming, deduping, merging, ranking, and screening dataframes with natural language instructions
Skill Details
Repository Files
1 file in this skill directory
name: everyrow-sdk description: Helps write Python code using the everyrow SDK for AI-powered data processing - transforming, deduping, merging, ranking, and screening dataframes with natural language instructions
everyrow SDK
The everyrow SDK provides intelligent data processing utilities powered by AI agents. Use this skill when writing Python code that needs to:
- Rank/score rows based on qualitative criteria
- Deduplicate data using semantic understanding
- Merge tables using AI-powered matching
- Screen/filter rows based on research-intensive criteria
- Run AI agents over dataframe rows
Installation
pip install everyrow
Configuration
Before writing any everyrow code, check if EVERYROW_API_KEY is set. If not, prompt the user:
everyrow requires an API key. Do you have one?
- If yes, paste it here
- If no, get one at https://everyrow.io/api-key and paste it back
Once the user provides the key, set it:
export EVERYROW_API_KEY=<their_key>
Results
All operations return a result object. The data is available as a pandas DataFrame in result.data:
result = await rank(...)
print(result.data.head()) # pandas DataFrame
Operations
For quick one-off operations, sessions are created automatically.
rank - Score and rank rows
Score rows based on criteria you can't put in a database field:
from everyrow.ops import rank
result = await rank(
task="Score by likelihood to need data integration solutions",
input=leads_dataframe,
field_name="integration_need_score",
)
print(result.data.head())
dedupe - Deduplicate data
Remove duplicates using AI-powered semantic matching. The AI understands that "AbbVie Inc", "Abbvie", and "AbbVie Pharmaceutical" are the same company:
from everyrow.ops import dedupe
result = await dedupe(
input=crm_data,
equivalence_relation="Two entries are duplicates if they represent the same legal entity",
)
print(result.data.head())
Results include equivalence_class_id (groups duplicates), equivalence_class_name (human-readable cluster name), and selected (the canonical record in each cluster).
merge - Merge tables with AI matching
Join two tables when the keys don't match exactly. The AI knows "Photoshop" belongs to "Adobe" and "Genentech" is a Roche subsidiary:
from everyrow.ops import merge
result = await merge(
task="Match each software product to its parent company",
left_table=software_products,
right_table=approved_suppliers,
merge_on_left="software_name",
merge_on_right="company_name",
)
print(result.data.head())
screen - Evaluate and filter rows
Filter rows based on criteria that require research:
from everyrow.ops import screen
from pydantic import BaseModel, Field
class ScreenResult(BaseModel):
passes: bool = Field(description="True if company meets the criteria")
result = await screen(
task="""
Find companies with >75% recurring revenue that would benefit from
Taiwan tensions - CHIPS Act beneficiaries, defense contractors,
cybersecurity firms. Exclude companies dependent on Taiwan manufacturing.
""",
input=sp500_companies,
response_model=ScreenResult,
)
print(result.data.head())
single_agent - Single input task
Run an AI agent on a single input:
from everyrow.ops import single_agent
result = await single_agent(
task="What is the capital of the given country?",
input={"country": "India"},
)
print(result.data.head())
agent_map - Batch processing
Run an AI agent across multiple rows:
from everyrow.ops import agent_map
from pandas import DataFrame
result = await agent_map(
task="What is the capital of the given country?",
input=DataFrame([{"country": "India"}, {"country": "USA"}]),
)
print(result.data.head())
Explicit Sessions
For multiple operations or when you need visibility into progress, use an explicit session:
from everyrow import create_session
async with create_session(name="My Session") as session:
print(f"View session at: {session.get_url()}")
# All operations here share the same session
Sessions are visible on the everyrow.io dashboard.
Async Operations
All operations have _async variants for background processing. These need an explicit session since the task persists beyond the function call:
from everyrow import create_session
from everyrow.ops import rank_async
async with create_session(name="Async Ranking") as session:
task = await rank_async(
session=session,
task="Score this organization",
input=dataframe,
field_name="score",
)
print(f"Task ID: {task.task_id}") # Print this! Useful if your script crashes.
# Continue with other work...
result = await task.await_result()
Tip: Print the task ID after submitting. If your script crashes, you can fetch the result later using fetch_task_data:
from everyrow import fetch_task_data
# Recover results from a crashed script
df = await fetch_task_data("12345678-1234-1234-1234-123456789abc")
Best Practices
Everyrow operations have associated costs. To avoid re-running them unnecessarily:
- Separate data processing from analysis: Save everyrow results to a file (CSV, Parquet, etc.), then do analysis in a separate script. This way, if analysis code has bugs, you don't re-trigger the everyrow step.
- Use intermediate checkpoints: For multi-step pipelines, consider saving results after each everyrow operation.
- You are able to chain multiple operations together without needing to download and re-upload intermediate results via the SDK. However for most control, implement each step as a dedicated job, possibly orchestrated by tools such as Apache Airflow or Prefect.
- Test with
preview=True: Operations likerank,screen, andmergesupportpreview=Trueto process only a few rows first.
Related Skills
Xlsx
Comprehensive spreadsheet creation, editing, and analysis with support for formulas, formatting, data analysis, and visualization. When Claude needs to work with spreadsheets (.xlsx, .xlsm, .csv, .tsv, etc) for: (1) Creating new spreadsheets with formulas and formatting, (2) Reading or analyzing data, (3) Modify existing spreadsheets while preserving formulas, (4) Data analysis and visualization in spreadsheets, or (5) Recalculating formulas
Clickhouse Io
ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.
Clickhouse Io
ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.
Analyzing Financial Statements
This skill calculates key financial ratios and metrics from financial statement data for investment analysis
Data Storytelling
Transform data into compelling narratives using visualization, context, and persuasive structure. Use when presenting analytics to stakeholders, creating data reports, or building executive presentations.
Kpi Dashboard Design
Design effective KPI dashboards with metrics selection, visualization best practices, and real-time monitoring patterns. Use when building business dashboards, selecting metrics, or designing data visualization layouts.
Dbt Transformation Patterns
Master dbt (data build tool) for analytics engineering with model organization, testing, documentation, and incremental strategies. Use when building data transformations, creating data models, or implementing analytics engineering best practices.
Sql Optimization Patterns
Master SQL query optimization, indexing strategies, and EXPLAIN analysis to dramatically improve database performance and eliminate slow queries. Use when debugging slow queries, designing database schemas, or optimizing application performance.
Anndata
This skill should be used when working with annotated data matrices in Python, particularly for single-cell genomics analysis, managing experimental measurements with metadata, or handling large-scale biological datasets. Use when tasks involve AnnData objects, h5ad files, single-cell RNA-seq data, or integration with scanpy/scverse tools.
Xlsx
Spreadsheet toolkit (.xlsx/.csv). Create/edit with formulas/formatting, analyze data, visualization, recalculate formulas, for spreadsheet processing and analysis.
