Alignment Level QC
by BIsnake2001
Calculates technical mapping statistics for any aligned BAM file (ChIP or ATAC). It assesses the performance of the aligner itself by generating metrics on read depth, mapping quality, error rates, and read group data using samtools and Picard. Use this skill to check "how well the reads mapped" or to validate BAM formatting/sorting before further processing. Do NOT use this skill for biological signal validation (like checking for peaks or open chromatin) or for filtering/removing reads.
Skill Details
name: alignment-level-QC
description: Calculates technical mapping statistics for any aligned BAM file (ChIP or ATAC). It assesses the performance of the aligner itself by generating metrics on read depth, mapping quality, error rates, and read group data using samtools and Picard. Use this skill to check "how well the reads mapped" or to validate BAM formatting/sorting before further processing. Do NOT use this skill for biological signal validation (like checking for peaks or open chromatin) or for filtering/removing reads.
Alignment Quality Control for ChIP-seq/ATAC-seq
Overview
Perform comprehensive preliminary alignment-level quality control for ChIP-seq and ATAC-seq BAM files using samtools, Picard, and MultiQC.
Main steps include:
- Initialize the project directory.
- Refer to the Inputs & Outputs section to check inputs and build the output architecture. All output files should be located in ${proj_dir}, which is created in Step 0.
- Sort the BAM file and add read groups if they are missing.
- Run preliminary QC metrics
- Generate MultiQC report
When to use this skill
- Use this skill when you want to perform alignment-level quality control for ChIP-seq or ATAC-seq BAM files.
Inputs & Outputs
Inputs
.bam
Outputs
alignment_qc/
    ${sample}.bam # Original input
    ${sample}.sorted.bam # (Optional) Created if sorting was needed
    ${sample}.RG.bam # (Optional) Created if RG was needed
    ${sample}.RG.bam.bai # Index file
    qc_results/
        ${sample}.flagstat.txt
        ${sample}.stats.txt
        ${sample}.insertsize_metrics.txt
        ${sample}.dup_metrics.txt
        alignment_qc_report.html # Visual MultiQC report
        qc_summary.txt # Pass/Warn/Fail table
    temp/
        ${sample}.markdup.bam # Intermediate file (safe to delete later)
        ...
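As a rough illustration of the layout above, the following Python sketch creates the two subdirectories and lists the per-sample metric files. The function names `build_output_layout` and `expected_qc_files` are hypothetical helpers introduced here for illustration; the skill itself builds this structure through its own tools.

```python
from pathlib import Path


def build_output_layout(proj_dir: str) -> dict[str, Path]:
    """Create qc_results/ and temp/ under proj_dir (hypothetical helper)."""
    root = Path(proj_dir)
    layout = {
        "root": root,
        "qc_results": root / "qc_results",
        "temp": root / "temp",
    }
    for path in layout.values():
        # exist_ok makes the call safe to repeat on an existing project.
        path.mkdir(parents=True, exist_ok=True)
    return layout


def expected_qc_files(layout: dict[str, Path], sample: str) -> list[Path]:
    """List the per-sample metric files named in the Outputs tree."""
    suffixes = [
        "flagstat.txt",
        "stats.txt",
        "insertsize_metrics.txt",
        "dup_metrics.txt",
    ]
    return [layout["qc_results"] / f"{sample}.{suffix}" for suffix in suffixes]
```

Comparing `expected_qc_files` against what is actually on disk is one simple way to spot a sample whose metrics were not produced.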
Decision Tree
Step 0: Initialize Project
Call:
mcp__project-init-tools__project_init
with:
sample: all
task: alignment_qc
The tool will:
- Create the ${sample}_alignment_qc directory.
- Return the full path of the ${sample}_alignment_qc directory, which will be used as ${proj_dir}.
Step 1: Check and Fix BAMs
- Ensure all BAM files are coordinate-sorted, have Read Groups, and are indexed. This tool will skip files that are already correct and only create temporary files when fixes are needed.
Call:
- mcp__qc-tools__check_and_fix_bams
with:
bam_files: List of BAM files to process.
temp_dir: ${proj_dir}/temp
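The internals of `check_and_fix_bams` are not shown here, so the following is only a sketch of the kind of check it describes. It assumes the BAM header has already been obtained as text (for example from `samtools view -H`) and reports whether sorting or read groups appear to be missing. `inspect_bam_header` is a hypothetical name.

```python
def inspect_bam_header(header_text: str) -> dict[str, bool]:
    """Report whether a SAM/BAM header lacks coordinate sorting or read groups."""
    coordinate_sorted = False
    has_read_group = False
    for line in header_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "@HD":
            # The SO tag on the @HD line records the declared sort order.
            coordinate_sorted = "SO:coordinate" in fields[1:]
        elif fields[0] == "@RG":
            has_read_group = True
    return {
        "needs_sort": not coordinate_sorted,
        "needs_read_group": not has_read_group,
    }
```

Note that this only reads what the header declares. A file whose header says `SO:coordinate` but whose records are out of order would not be caught by this check.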
Step 2: Run Alignment QC Metrics
Call:
- mcp__qc-tools__run_bam_qc
with:
bam_files: List of BAM files to process.
qc_dir: ${proj_dir}/qc_results
temp_dir: ${proj_dir}/temp
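Which commands `run_bam_qc` executes is not stated; the output file names suggest `samtools flagstat`, `samtools stats`, Picard `MarkDuplicates`, and Picard `CollectInsertSizeMetrics`, and the sketch below assumes exactly that. It only builds argument lists and runs nothing. The `picard` wrapper command, the legacy `KEY=value` argument style, and the histogram file name are assumptions, and `build_qc_commands` is a hypothetical helper.

```python
from pathlib import Path


def build_qc_commands(
    bam: str, sample: str, qc_dir: str, temp_dir: str
) -> dict[str, list[str]]:
    """Build (but do not run) plausible per-sample QC command lines."""
    qc = Path(qc_dir)
    tmp = Path(temp_dir)
    markdup_bam = (tmp / f"{sample}.markdup.bam").as_posix()
    dup_metrics = (qc / f"{sample}.dup_metrics.txt").as_posix()
    insert_metrics = (qc / f"{sample}.insertsize_metrics.txt").as_posix()
    # Assumed name: Picard requires a histogram output, the source does not name it.
    insert_hist = (tmp / f"{sample}.insertsize_hist.pdf").as_posix()
    return {
        # samtools writes these reports to stdout; the caller redirects them
        # into ${sample}.flagstat.txt and ${sample}.stats.txt.
        "flagstat": ["samtools", "flagstat", bam],
        "stats": ["samtools", "stats", bam],
        "markdup": [
            "picard", "MarkDuplicates",
            f"I={bam}", f"O={markdup_bam}", f"M={dup_metrics}",
        ],
        "insertsize": [
            "picard", "CollectInsertSizeMetrics",
            f"I={bam}", f"O={insert_metrics}", f"H={insert_hist}",
        ],
    }
```

Each list could be passed to `subprocess.run(cmd, check=True)`, with `stdout` pointed at the matching report file for the two samtools commands.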
Step 3: Generate Summary Report
Call:
- mcp__qc-tools__generate_qc_report
with:
qc_dir: ${proj_dir}/qc_results
Quality Assessment
Key QC Metrics
- Total reads – overall sequencing depth
- Mapped reads (%) – alignment efficiency
- Properly paired (%) – valid pair fraction (paired-end)
- Duplicate rate (%) – PCR duplication estimate
- Mitochondrial reads (%) – mitochondrial contamination
- Insert size distribution – fragment length profile
All metrics are derived from samtools/Picard and summarized by MultiQC.
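To make the percentage metrics above concrete, here is a minimal sketch that derives three of them from the text of a `samtools flagstat` report. It assumes the usual `N + M label` line format; `parse_flagstat` and the returned key names are hypothetical, and MultiQC performs its own parsing in practice.

```python
import re


def parse_flagstat(text: str) -> dict[str, float]:
    """Derive mapped, properly paired, and duplicate percentages from flagstat text."""
    counts: dict[str, int] = {}
    for line in text.splitlines():
        match = re.match(r"(\d+) \+ (\d+) (.+)", line)
        if not match:
            continue
        # Drop any trailing "(...)" annotation to get a bare label.
        label = match.group(3).split(" (")[0].strip()
        counts[label] = int(match.group(1))  # QC-passed count only

    def pct(numerator: int, denominator: int) -> float:
        return 100.0 * numerator / denominator if denominator else 0.0

    total = counts.get("in total", 0)
    return {
        "total_reads": float(total),
        "mapped_pct": pct(counts.get("mapped", 0), total),
        "properly_paired_pct": pct(
            counts.get("properly paired", 0),
            counts.get("paired in sequencing", 0),
        ),
        # Only meaningful if duplicates were flagged before flagstat ran.
        "duplicate_pct": pct(counts.get("duplicates", 0), total),
    }
```

The duplicate percentage here reflects the duplicate flag in the BAM, so it is zero for a file that has not been through a duplicate-marking step.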
Quality Thresholds
| Category | Criteria | Interpretation |
|---|---|---|
| Pass | All metrics within recommended thresholds | Suitable for downstream analysis |
| Warn | One or more borderline metrics | Likely acceptable; review recommended |
| Fail | Critical metrics outside acceptable ranges | Re-sequencing or reprocessing suggested |
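The table can be read as a worst-of rule: a sample takes the most severe grade among its metrics. The sketch below illustrates that reading. The numeric cut-offs in `PLACEHOLDER_THRESHOLDS` are invented placeholders, not the values from references/qc_metrics.md, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    warn: float
    fail: float
    higher_is_better: bool = True


# Placeholder numbers for illustration only; substitute the real thresholds.
PLACEHOLDER_THRESHOLDS = {
    "mapped_pct": Threshold(warn=90.0, fail=70.0),
    "duplicate_pct": Threshold(warn=20.0, fail=50.0, higher_is_better=False),
}


def grade_metric(value: float, threshold: Threshold) -> str:
    """Grade one metric as PASS, WARN, or FAIL."""
    if threshold.higher_is_better:
        if value < threshold.fail:
            return "FAIL"
        return "WARN" if value < threshold.warn else "PASS"
    if value > threshold.fail:
        return "FAIL"
    return "WARN" if value > threshold.warn else "PASS"


def grade_sample(metrics: dict[str, float], thresholds: dict[str, Threshold]) -> str:
    """Return the most severe grade across all metrics that have a threshold."""
    grades = [
        grade_metric(metrics[name], threshold)
        for name, threshold in thresholds.items()
        if name in metrics
    ]
    for level in ("FAIL", "WARN"):
        if level in grades:
            return level
    return "PASS"
```

For example, `grade_sample({"mapped_pct": 85.0, "duplicate_pct": 10.0}, PLACEHOLDER_THRESHOLDS)` yields `"WARN"` under these placeholder cut-offs, because the mapping rate is borderline while the duplicate rate is within range.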
Report Generation
After MultiQC completes, generate a sample-wise summary (PASS/WARN/FAIL) per thresholds in references/qc_metrics.md and save it as:
qc_results/qc_summary.txt
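The exact layout of `qc_summary.txt` is not specified. One plausible form is a tab-delimited table with one row per sample, sketched below. The column set, the `NA` filler, and the function name `write_qc_summary` are assumptions.

```python
import csv
from pathlib import Path


def write_qc_summary(rows: dict[str, dict[str, str]], qc_dir: str) -> Path:
    """Write a tab-delimited per-sample summary to qc_dir/qc_summary.txt."""
    out_path = Path(qc_dir) / "qc_summary.txt"
    # Use the union of all keys so samples with missing metrics still align.
    columns = sorted({key for row in rows.values() for key in row})
    with out_path.open("w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["sample", *columns])
        for sample in sorted(rows):
            writer.writerow(
                [sample, *(rows[sample].get(col, "NA") for col in columns)]
            )
    return out_path
```

Sorting both samples and columns keeps the file stable between runs, which makes it easier to diff summaries after reprocessing.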
Resources
Use references/qc_metrics.md for:
- Metric definitions and recommended thresholds
- Troubleshooting guidance
- Readiness criteria for peak calling
- Pointers to ENCODE/nf-core QC standards
