Bio TCR BCR Analysis Repertoire Visualization
by GPTomics
Create publication-quality visualizations of immune repertoire data including circos plots, clone tracking, diversity plots, and network graphs. Use when generating figures for repertoire comparisons, clonal dynamics, or V(D)J gene usage.
name: bio-tcr-bcr-analysis-repertoire-visualization
description: Create publication-quality visualizations of immune repertoire data including circos plots, clone tracking, diversity plots, and network graphs. Use when generating figures for repertoire comparisons, clonal dynamics, or V(D)J gene usage.
tool_type: mixed
primary_tool: VDJtools
Repertoire Visualization
Circos Plots (V-J Gene Usage)
VDJtools
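# Generate a V-J usage circos plot for one sample.
# PlotFancyVJUsage takes a single sample file and an output prefix
# rather than a metadata table (sample.txt and the prefix are placeholders).
vdjtools PlotFancyVJUsage \
    sample.txt \
    output_dir/sample
# Writes a PDF circos plot showing V-J pairing frequencies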
Python with pyCircos
import pandas as pd
import matplotlib.pyplot as plt
from pycircos import Gcircle

def plot_vj_circos(clone_df):
    '''Create circos plot of V-J usage'''
    # Count V-J pairs
    vj_counts = clone_df.groupby(['v_gene', 'j_gene']).size().reset_index(name='count')

    # Create circos
    circle = Gcircle()

    # Add arcs for each V and J gene
    v_genes = vj_counts['v_gene'].unique()
    j_genes = vj_counts['j_gene'].unique()

    # Add sectors and links
    # ... (complex setup)

    # Save through the underlying matplotlib figure so the file name is used as given
    circle.figure.savefig('vj_circos.pdf')
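Both the pyCircos and circlize approaches start from a table of V-J pair counts. The following is a minimal sketch of building that table with pandas, assuming the clonotype table has `v_gene` and `j_gene` columns as in the function above. The helper name `vj_pair_matrix` and the toy gene calls are hypothetical and only serve as an illustration.

```python
import pandas as pd

def vj_pair_matrix(clone_df, normalize=True):
    """Return a V x J matrix of pairing counts, or fractions if normalize=True."""
    # Rows are V genes, columns are J genes, cells are clonotype counts
    matrix = pd.crosstab(clone_df["v_gene"], clone_df["j_gene"])
    if normalize:
        # Divide by the grand total so all cells sum to 1
        matrix = matrix / matrix.to_numpy().sum()
    return matrix

# Toy clonotype table (hypothetical data)
toy = pd.DataFrame({
    "v_gene": ["TRBV5-1", "TRBV5-1", "TRBV7-2", "TRBV7-2"],
    "j_gene": ["TRBJ1-1", "TRBJ2-7", "TRBJ2-7", "TRBJ2-7"],
})
vj = vj_pair_matrix(toy)
print(vj)
```

Each cell of the resulting matrix corresponds to one link in the circos plot, with the cell value setting the link width.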
R with circlize
library(circlize)

plot_vj_circos <- function(clone_df) {
    # Prepare adjacency matrix
    vj_matrix <- table(clone_df$v_gene, clone_df$j_gene)

    # Create circos plot
    chordDiagram(
        vj_matrix,
        transparency = 0.5,
        annotationTrack = c("grid", "name")
    )
}
Clone Tracking Over Time
import pandas as pd
import matplotlib.pyplot as plt

def plot_clone_tracking(clones_by_time, top_n=10):
    '''Track top clones across timepoints'''
    # Get top clones by total frequency
    total_freq = clones_by_time.groupby('cdr3_aa')['frequency'].sum()
    top_clones = total_freq.nlargest(top_n).index

    fig, ax = plt.subplots(figsize=(10, 6))

    for clone in top_clones:
        clone_data = clones_by_time[clones_by_time['cdr3_aa'] == clone]
        # Sort so each line is drawn in chronological order
        clone_data = clone_data.sort_values('timepoint')
        ax.plot(clone_data['timepoint'], clone_data['frequency'],
                marker='o', label=clone[:20])

    ax.set_xlabel('Timepoint')
    ax.set_ylabel('Clone Frequency')
    ax.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
    plt.tight_layout()
    # bbox_inches='tight' keeps the outside legend from being cropped
    plt.savefig('clone_tracking.pdf', bbox_inches='tight')
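`plot_clone_tracking` draws a line only through the timepoints where a clone was observed, so a clone that drops out and later reappears looks continuous. One way to handle this is to fill in explicit zero frequencies before plotting. The sketch below assumes the long-format columns `cdr3_aa`, `timepoint`, and `frequency` used above; the helper name `complete_timepoints` and the toy values are hypothetical.

```python
import pandas as pd

def complete_timepoints(clones_by_time):
    """Add frequency 0 rows for clones that are absent at a timepoint."""
    # Wide table: one row per clone, one column per timepoint, gaps filled with 0
    wide = clones_by_time.pivot_table(
        index="cdr3_aa", columns="timepoint",
        values="frequency", aggfunc="sum", fill_value=0.0,
    )
    # Back to long format so it can be passed to the plotting function
    long = wide.reset_index().melt(
        id_vars="cdr3_aa", var_name="timepoint", value_name="frequency"
    )
    return long.sort_values(["cdr3_aa", "timepoint"]).reset_index(drop=True)

# Toy data (hypothetical): CASSB is missing at timepoint 1
toy = pd.DataFrame({
    "cdr3_aa":   ["CASSA", "CASSA", "CASSA", "CASSB", "CASSB"],
    "timepoint": [0, 1, 2, 0, 2],
    "frequency": [0.10, 0.20, 0.30, 0.05, 0.02],
})
full = complete_timepoints(toy)
print(full)
```

Whether a missing clone should be shown as zero or left as a gap depends on sequencing depth; zero-filling treats absence as true absence.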
Diversity Plots
import matplotlib.pyplot as plt
import seaborn as sns

def plot_diversity_comparison(diversity_df, metric='shannon'):
    '''Compare diversity between groups'''
    fig, ax = plt.subplots(figsize=(8, 6))

    # Box plot summarizes each condition
    sns.boxplot(
        data=diversity_df,
        x='condition',
        y=metric,
        ax=ax
    )
    # Overlay the individual samples as points
    sns.stripplot(
        data=diversity_df,
        x='condition',
        y=metric,
        color='black',
        alpha=0.5,
        ax=ax
    )

    ax.set_ylabel(f'{metric.capitalize()} Diversity')
    plt.tight_layout()
    plt.savefig('diversity_comparison.pdf')
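The plotting function expects a `diversity_df` with one row per sample, a `condition` column, and a column named after the metric. As a rough sketch of how such a table could be built, the example below computes the Shannon entropy, H = -sum(p_i * ln p_i), from per-clone read counts. The helper `shannon_diversity`, the sample names, and the counts are hypothetical; diversity values from a dedicated tool may differ in log base or normalization.

```python
import numpy as np
import pandas as pd

def shannon_diversity(counts):
    """Shannon entropy (natural log) of a vector of clone counts."""
    counts = np.asarray(counts, dtype=float)
    counts = counts[counts > 0]          # clones with zero count contribute nothing
    p = counts / counts.sum()            # clone frequencies
    return float(-(p * np.log(p)).sum())

# Toy per-sample clone counts (hypothetical data)
samples = {
    "S1": ("control", [5, 5, 5, 5]),
    "S2": ("control", [10, 6, 3, 1]),
    "S3": ("treated", [17, 1, 1, 1]),
}
diversity_df = pd.DataFrame(
    [
        {"sample": name, "condition": cond, "shannon": shannon_diversity(c)}
        for name, (cond, c) in samples.items()
    ]
)
print(diversity_df)
```

A table built this way can be passed directly as `diversity_df` with `metric='shannon'`.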
Overlap Heatmap
import matplotlib.pyplot as plt
import seaborn as sns

def plot_overlap_heatmap(overlap_matrix):
    '''Plot pairwise repertoire overlap'''
    fig, ax = plt.subplots(figsize=(10, 8))

    # Annotate each cell with the overlap value to two decimals
    sns.heatmap(
        overlap_matrix,
        annot=True,
        fmt='.2f',
        cmap='YlOrRd',
        ax=ax
    )

    ax.set_title('Repertoire Overlap (Jaccard Index)')
    plt.tight_layout()
    plt.savefig('overlap_heatmap.pdf')
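The heatmap title refers to the Jaccard index, which for two samples is the number of shared clonotypes divided by the number of clonotypes present in either. A minimal sketch of computing such an `overlap_matrix` is shown below, assuming clonotypes are identified by CDR3 amino acid sequence alone. The helper `jaccard_overlap_matrix` and the toy sequences are hypothetical.

```python
import pandas as pd

def jaccard_overlap_matrix(samples):
    """samples: dict mapping sample name -> iterable of CDR3 sequences."""
    sets = {name: set(seqs) for name, seqs in samples.items()}
    names = list(sets)
    matrix = pd.DataFrame(0.0, index=names, columns=names)
    for a in names:
        for b in names:
            union = sets[a] | sets[b]
            # Jaccard index: |A intersect B| / |A union B| (0 if both are empty)
            matrix.loc[a, b] = len(sets[a] & sets[b]) / len(union) if union else 0.0
    return matrix

# Toy repertoires (hypothetical data)
toy = {
    "S1": ["CASSA", "CASSB", "CASSC"],
    "S2": ["CASSB", "CASSC", "CASSD"],
    "S3": ["CASSE"],
}
overlap_matrix = jaccard_overlap_matrix(toy)
print(overlap_matrix)
```

Because the Jaccard index ignores clone sizes, a frequency-weighted overlap metric may be preferable when samples differ strongly in depth.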
Spectratype Plot
import matplotlib.pyplot as plt

def plot_spectratype(clone_df, group_col=None):
    '''Plot CDR3 length distribution'''
    fig, ax = plt.subplots(figsize=(10, 6))

    # assign() returns a copy, so the caller's DataFrame is not modified
    clone_df = clone_df.assign(cdr3_length=clone_df['cdr3_nt'].str.len())
    bins = range(20, 80, 3)  # 3-nt (one codon) bins

    if group_col:
        for group, data in clone_df.groupby(group_col):
            ax.hist(data['cdr3_length'], bins=bins,
                    alpha=0.5, label=group, density=True)
        ax.legend()
    else:
        # density=True keeps the y-axis consistent with the 'Density' label
        ax.hist(clone_df['cdr3_length'], bins=bins, density=True)

    ax.set_xlabel('CDR3 Length (nt)')
    ax.set_ylabel('Density')
    ax.set_title('CDR3 Length Distribution (Spectratype)')
    plt.savefig('spectratype.pdf')
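The histogram above counts every clonotype once, regardless of its size. If the table also carries a per-clone `frequency` column (an assumption here, since the spectratype function itself only uses `cdr3_nt`), a frequency-weighted length distribution can be sketched as follows. The helper name `weighted_length_distribution` and the toy sequences are hypothetical.

```python
import pandas as pd

def weighted_length_distribution(clone_df):
    """Fraction of total clone frequency at each CDR3 nucleotide length."""
    lengths = clone_df["cdr3_nt"].str.len()
    # Sum clone frequencies per length, then renormalize to 1
    weighted = clone_df["frequency"].groupby(lengths).sum()
    return weighted / weighted.sum()

# Toy data (hypothetical): two 36-nt clones and one 39-nt clone
toy = pd.DataFrame({
    "cdr3_nt":   ["TGT" * 12, "TGC" * 12, "TGT" * 13],
    "frequency": [0.50, 0.25, 0.25],
})
dist = weighted_length_distribution(toy)
print(dist)
```

The result can be drawn with `ax.bar(dist.index, dist.values)` to compare against the unweighted histogram.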
Clonotype Network
import matplotlib.pyplot as plt
import networkx as nx
from Levenshtein import ratio

def plot_clone_network(clone_df, similarity_threshold=0.8):
    '''Create network of similar clonotypes'''
    G = nx.Graph()

    # Add one node per unique CDR3, sized by its summed frequency
    clone_freq = clone_df.groupby('cdr3_aa')['frequency'].sum()
    clones = clone_freq.index.tolist()
    for clone, freq in clone_freq.items():
        G.add_node(clone, size=freq)

    # Add edges for similar clones
    # (all pairs are compared, so runtime grows quadratically with clone count)
    for i, c1 in enumerate(clones):
        for c2 in clones[i+1:]:
            sim = ratio(c1, c2)
            if sim >= similarity_threshold:
                G.add_edge(c1, c2, weight=sim)

    # Draw network
    fig, ax = plt.subplots(figsize=(12, 12))
    pos = nx.spring_layout(G, seed=0)  # fixed seed for a reproducible layout
    sizes = [G.nodes[n]['size'] * 1000 for n in G.nodes()]
    nx.draw(G, pos, node_size=sizes, with_labels=False, ax=ax)
    plt.savefig('clone_network.pdf')
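The function above links clones by Levenshtein similarity ratio across all pairs. A different and commonly used rule, shown here only as an alternative sketch and not as the method used above, links CDR3 sequences of equal length that differ at no more than one position (Hamming distance <= 1). Grouping by length first avoids comparing sequences that can never match. The helper `hamming_edges` and the toy sequences are hypothetical, and the example uses only the standard library.

```python
from collections import defaultdict
from itertools import combinations

def hamming_edges(cdr3_list, max_mismatches=1):
    """Return sorted (a, b) pairs of equal-length CDR3s within max_mismatches."""
    # Bucket unique sequences by length; Hamming distance needs equal lengths
    by_length = defaultdict(list)
    for seq in set(cdr3_list):
        by_length[len(seq)].append(seq)

    edges = []
    for seqs in by_length.values():
        for a, b in combinations(sorted(seqs), 2):
            mismatches = sum(x != y for x, y in zip(a, b))
            if mismatches <= max_mismatches:
                edges.append((a, b))
    return sorted(edges)

# Toy sequences (hypothetical data)
toy = ["CASSLGTEAFF", "CASSLGAEAFF", "CASSPGTEAFF", "CASRQETQYF"]
edges = hamming_edges(toy)
print(edges)
```

The returned pairs can be added to a graph with `G.add_edges_from(edges)` in place of the similarity loop.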
Related Skills
- vdjtools-analysis - Generate input data
- mixcr-analysis - Generate clonotype tables
- data-visualization - General plotting concepts
