Grafana Dashboards

by fgarofalo56

data

Build monitoring dashboards with Grafana. Covers panel types, queries, variables, alerting, provisioning, and data sources like Prometheus and InfluxDB. Use for infrastructure monitoring, observability, and metrics visualization.

Skill Details

Repository Files

1 file in this skill directory


name: grafana-dashboards description: Build monitoring dashboards with Grafana. Covers panel types, queries, variables, alerting, provisioning, and data sources like Prometheus and InfluxDB. Use for infrastructure monitoring, observability, and metrics visualization.

Grafana Dashboards

Build powerful monitoring and observability dashboards.

Instructions

  1. Start with key metrics - CPU, memory, latency, error rates
  2. Use consistent time ranges - All panels should sync
  3. Add context with variables - Filter by environment, service, host
  4. Set up alerts - Proactive monitoring, not reactive
  5. Use templates - Consistent dashboard styling

Dashboard Structure

Dashboard JSON

{
  "dashboard": {
    "id": null,
    "uid": "my-dashboard",
    "title": "Service Overview",
    "tags": ["production", "service-name"],
    "timezone": "browser",
    "schemaVersion": 39,
    "version": 1,
    "refresh": "30s",
    "time": {
      "from": "now-1h",
      "to": "now"
    },
    "templating": {
      "list": []
    },
    "panels": [],
    "annotations": {
      "list": []
    }
  }
}

Panel Types

// Time series (line chart)
{
  "type": "timeseries",
  "title": "Request Rate",
  "gridPos": { "x": 0, "y": 0, "w": 12, "h": 8 },
  "fieldConfig": {
    "defaults": {
      "unit": "reqps",
      "custom": {
        "lineWidth": 2,
        "fillOpacity": 10,
        "gradientMode": "opacity"
      }
    }
  },
  "targets": [
    {
      "expr": "rate(http_requests_total{job=\"$job\"}[5m])",
      "legendFormat": "{{method}} {{status}}"
    }
  ]
}

// Stat panel (single value)
{
  "type": "stat",
  "title": "Total Requests",
  "gridPos": { "x": 0, "y": 0, "w": 4, "h": 4 },
  "options": {
    "colorMode": "value",
    "graphMode": "area",
    "justifyMode": "auto",
    "orientation": "auto",
    "reduceOptions": {
      "calcs": ["lastNotNull"],
      "fields": "",
      "values": false
    }
  },
  "targets": [
    {
      "expr": "sum(http_requests_total{job=\"$job\"})",
      "legendFormat": ""
    }
  ]
}

// Gauge
{
  "type": "gauge",
  "title": "CPU Usage",
  "fieldConfig": {
    "defaults": {
      "unit": "percent",
      "min": 0,
      "max": 100,
      "thresholds": {
        "mode": "absolute",
        "steps": [
          { "color": "green", "value": null },
          { "color": "yellow", "value": 70 },
          { "color": "red", "value": 90 }
        ]
      }
    }
  }
}

// Table
{
  "type": "table",
  "title": "Top Endpoints",
  "transformations": [
    {
      "id": "sortBy",
      "options": {
        "fields": {},
        "sort": [{ "field": "Value", "desc": true }]
      }
    }
  ]
}

Prometheus Queries (PromQL)

Basic Queries

# Instant rate (requests per second)
rate(http_requests_total[5m])

# Sum by label
sum by (status_code) (rate(http_requests_total[5m]))

# Average latency
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))

# Error rate percentage
sum(rate(http_requests_total{status=~"5.."}[5m])) /
sum(rate(http_requests_total[5m])) * 100

# CPU usage percentage
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Memory usage
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) /
node_memory_MemTotal_bytes * 100

# Disk usage
(node_filesystem_size_bytes - node_filesystem_avail_bytes) /
node_filesystem_size_bytes * 100

Aggregation & Filtering

# Filter by label
http_requests_total{job="api", environment="production"}

# Regex match
http_requests_total{path=~"/api/v[0-9]+/.*"}

# Not equal
http_requests_total{status!="200"}

# Rate over time window
rate(metric[5m])   # 5 minute rate
irate(metric[5m])  # Instant rate (last 2 points)

# Aggregations
sum(metric)              # Total
avg(metric)              # Average
max(metric)              # Maximum
min(metric)              # Minimum
count(metric)            # Count of series
topk(5, metric)          # Top 5 series
bottomk(5, metric)       # Bottom 5 series

# Group by label
sum by (instance) (metric)
avg without (instance) (metric)

Variables (Templating)

{
  "templating": {
    "list": [
      {
        "name": "datasource",
        "type": "datasource",
        "query": "prometheus",
        "current": {},
        "hide": 0
      },
      {
        "name": "environment",
        "type": "query",
        "datasource": "${datasource}",
        "query": "label_values(up, environment)",
        "refresh": 1,
        "multi": false,
        "includeAll": true,
        "allValue": ".*"
      },
      {
        "name": "instance",
        "type": "query",
        "datasource": "${datasource}",
        "query": "label_values(up{environment=\"$environment\"}, instance)",
        "refresh": 2,
        "multi": true,
        "includeAll": true
      },
      {
        "name": "interval",
        "type": "interval",
        "options": [
          { "selected": false, "text": "1m", "value": "1m" },
          { "selected": true, "text": "5m", "value": "5m" },
          { "selected": false, "text": "15m", "value": "15m" }
        ]
      }
    ]
  }
}

Usage in queries:

rate(http_requests_total{environment=~"$environment", instance=~"$instance"}[$interval])

Alerting

Alert Rule

{
  "alert": "HighErrorRate",
  "expr": "sum(rate(http_requests_total{status=~\"5..\"}[5m])) / sum(rate(http_requests_total[5m])) > 0.05",
  "for": "5m",
  "labels": {
    "severity": "critical"
  },
  "annotations": {
    "summary": "High error rate detected",
    "description": "Error rate is {{ $value | humanizePercentage }} for the last 5 minutes"
  }
}

Grafana Alerting (v8+)

# provisioning/alerting/alerts.yaml
apigroups:
  - name: service-alerts
    folder: Alerts
    interval: 1m
    rules:
      - uid: high-error-rate
        title: High Error Rate
        condition: C
        data:
          - refId: A
            datasourceUid: prometheus
            model:
              expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) * 100
          - refId: C
            datasourceUid: __expr__
            model:
              type: threshold
              conditions:
                - evaluator:
                    type: gt
                    params: [5]
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: Error rate above 5%

Dashboard Provisioning

File Structure

grafana/
├── provisioning/
│   ├── dashboards/
│   │   └── dashboards.yaml
│   ├── datasources/
│   │   └── datasources.yaml
│   └── alerting/
│       └── alerts.yaml
└── dashboards/
    ├── overview.json
    └── service-details.json

Datasources Config

# provisioning/datasources/datasources.yaml
apidatasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: false

  - name: InfluxDB
    type: influxdb
    access: proxy
    url: http://influxdb:8086
    database: metrics
    user: admin
    secureJsonData:
      password: ${INFLUXDB_PASSWORD}

Dashboard Provider

# provisioning/dashboards/dashboards.yaml
apiproviders:
  - name: default
    orgId: 1
    folder: ''
    type: file
    disableDeletion: false
    editable: true
    options:
      path: /var/lib/grafana/dashboards

Common Dashboard Patterns

RED Method (Request, Error, Duration)

# Request Rate
sum(rate(http_requests_total[5m]))

# Error Rate
sum(rate(http_requests_total{status=~"5.."}[5m])) /
sum(rate(http_requests_total[5m]))

# Duration (95th percentile)
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))

USE Method (Utilization, Saturation, Errors)

# CPU Utilization
100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Memory Saturation
node_memory_SwapCached_bytes / node_memory_SwapTotal_bytes

# Network Errors
rate(node_network_receive_errs_total[5m])

Best Practices

  1. Use consistent colors - Red for errors, green for success
  2. Add descriptions - Panel descriptions explain what's shown
  3. Set meaningful thresholds - Color changes at important values
  4. Link related dashboards - Drill-down from overview to details
  5. Version control dashboards - Store JSON in git
  6. Use dashboard folders - Organize by team or service

When to Use

  • Infrastructure monitoring
  • Application performance monitoring
  • Business metrics dashboards
  • Real-time operational dashboards
  • SLA/SLO tracking

Notes

  • Grafana Cloud offers managed hosting
  • Use Terraform provider for IaC
  • Consider Grafana Loki for logs
  • Grafana Tempo for distributed tracing

Related Skills

Xlsx

Comprehensive spreadsheet creation, editing, and analysis with support for formulas, formatting, data analysis, and visualization. When Claude needs to work with spreadsheets (.xlsx, .xlsm, .csv, .tsv, etc) for: (1) Creating new spreadsheets with formulas and formatting, (2) Reading or analyzing data, (3) Modify existing spreadsheets while preserving formulas, (4) Data analysis and visualization in spreadsheets, or (5) Recalculating formulas

data

Clickhouse Io

ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.

datacli

Clickhouse Io

ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.

datacli

Analyzing Financial Statements

This skill calculates key financial ratios and metrics from financial statement data for investment analysis

data

Data Storytelling

Transform data into compelling narratives using visualization, context, and persuasive structure. Use when presenting analytics to stakeholders, creating data reports, or building executive presentations.

data

Kpi Dashboard Design

Design effective KPI dashboards with metrics selection, visualization best practices, and real-time monitoring patterns. Use when building business dashboards, selecting metrics, or designing data visualization layouts.

designdata

Dbt Transformation Patterns

Master dbt (data build tool) for analytics engineering with model organization, testing, documentation, and incremental strategies. Use when building data transformations, creating data models, or implementing analytics engineering best practices.

testingdocumenttool

Sql Optimization Patterns

Master SQL query optimization, indexing strategies, and EXPLAIN analysis to dramatically improve database performance and eliminate slow queries. Use when debugging slow queries, designing database schemas, or optimizing application performance.

designdata

Anndata

This skill should be used when working with annotated data matrices in Python, particularly for single-cell genomics analysis, managing experimental measurements with metadata, or handling large-scale biological datasets. Use when tasks involve AnnData objects, h5ad files, single-cell RNA-seq data, or integration with scanpy/scverse tools.

arttooldata

Xlsx

Spreadsheet toolkit (.xlsx/.csv). Create/edit with formulas/formatting, analyze data, visualization, recalculate formulas, for spreadsheet processing and analysis.

tooldata

Skill Information

Category:Data
Last Updated:1/23/2026