jae jae fetcher mcp

by jae-jae

MCP server for fetching web page content using the Playwright headless browser.

What is jae jae fetcher mcp

Fetcher MCP

Advantages

  • JavaScript Support: Unlike traditional web scrapers, Fetcher MCP uses Playwright to execute JavaScript, making it capable of handling dynamic web content and modern web applications.

  • Intelligent Content Extraction: Built-in Readability algorithm automatically extracts the main content from web pages, removing ads, navigation, and other non-essential elements.

  • Flexible Output Format: Supports both HTML and Markdown output formats, making it easy to integrate with various downstream applications.

  • Parallel Processing: The fetch_urls tool enables concurrent fetching of multiple URLs, significantly improving efficiency for batch operations.

  • Resource Optimization: Automatically blocks unnecessary resources (images, stylesheets, fonts, media) to reduce bandwidth usage and improve performance.

  • Robust Error Handling: Comprehensive error handling and logging ensure reliable operation even when dealing with problematic web pages.

  • Configurable Parameters: Fine-grained control over timeouts, content extraction, and output formatting to suit different use cases.

Quick Start

Run directly with npx:

npx -y fetcher-mcp

First-time setup: install the required browser by running the following command in your terminal:

npx playwright install chromium

HTTP and SSE Transport

Use the --transport=http parameter to start the Streamable HTTP endpoint and the SSE endpoint together:

npx -y fetcher-mcp --log --transport=http --host=0.0.0.0 --port=3000

After startup, the server provides the following endpoints:

  • /mcp - Streamable HTTP endpoint (modern MCP protocol)
  • /sse - SSE endpoint (legacy MCP protocol)

Clients can connect through either endpoint, depending on which transport they support.
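
As a rough illustration of how a client might derive the two endpoint URLs from the host and port given on the command line, here is a minimal TypeScript sketch. The `fetcherEndpoints` helper and its return shape are hypothetical and not part of fetcher-mcp; only the /mcp and /sse paths come from the list above. Note that 0.0.0.0 is a bind address, so a client would normally connect to localhost or the machine's own address instead.

```typescript
// Hypothetical helper, not part of fetcher-mcp: derives the endpoint URLs
// for a server started with --transport=http.
interface FetcherEndpoints {
  streamableHttp: string; // modern transport, served at /mcp
  sse: string;            // legacy transport, served at /sse
}

function fetcherEndpoints(host: string, port: number): FetcherEndpoints {
  const base = `http://${host}:${port}`;
  return { streamableHttp: `${base}/mcp`, sse: `${base}/sse` };
}

// Assumes the server from the command above is reachable on localhost.
const endpoints = fetcherEndpoints("localhost", 3000);
```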

Debug Mode

Run with the --debug option to show the browser window for debugging:

npx -y fetcher-mcp --debug

MCP Configuration

Configure this MCP server in Claude Desktop:

On macOS: ~/Library/Application Support/Claude/claude_desktop_config.json

On Windows: %APPDATA%/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "fetcher": {
      "command": "npx",
      "args": ["-y", "fetcher-mcp"]
    }
  }
}
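
If claude_desktop_config.json already lists other servers, the fetcher entry has to be merged in rather than pasted over them. A minimal sketch of that merge follows. The `addFetcher` function and the type names are illustrative assumptions, and reading or writing the file itself is left out.

```typescript
// Illustrative types; only the mcpServers/command/args shape comes from the
// configuration shown above.
interface McpServerEntry {
  command: string;
  args: string[];
}

interface ClaudeDesktopConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown; // any other settings are passed through untouched
}

// Hypothetical helper: returns a copy of the config with the fetcher entry
// added, keeping any servers that were already configured.
function addFetcher(config: ClaudeDesktopConfig): ClaudeDesktopConfig {
  return {
    ...config,
    mcpServers: {
      ...config.mcpServers,
      fetcher: { command: "npx", args: ["-y", "fetcher-mcp"] },
    },
  };
}

// Example input with one made-up server entry already present.
const existing: ClaudeDesktopConfig = {
  mcpServers: { other: { command: "node", args: ["other-server.js"] } },
};
const merged = addFetcher(existing);
```

The helper returns a new object instead of mutating its input, so the original configuration can still be written back unchanged if something goes wrong.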

Docker Deployment

Running with Docker

docker run -p 3000:3000 ghcr.io/jae-jae/fetcher-mcp:latest

Deploying with Docker Compose

Create a docker-compose.yml file:

version: "3.8"

services:
  fetcher-mcp:
    image: ghcr.io/jae-jae/fetcher-mcp:latest
    container_name: fetcher-mcp
    restart: unless-stopped
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=production
    # Using host network mode on Linux hosts can improve browser access efficiency
    # network_mode: "host"
    volumes:
      # For Playwright, may need to share certain system paths
      - /tmp:/tmp
    # Health check
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:3000"]
      interval: 30s
      timeout: 10s
      retries: 3

Then run:

docker-compose up -d

Features

  • fetch_url - Retrieve web page content from a specified URL

    • Uses the Playwright headless browser to execute JavaScript
    • Supports intelligent extraction of main content and conversion to Markdown
    • Supports the following parameters:
      • url: The URL of the web page to fetch (required parameter)
      • timeout: Page loading timeout in milliseconds, default is 30000 (30 seconds)
      • waitUntil: Specifies when navigation is considered complete, options: 'load', 'domcontentloaded', 'networkidle', 'commit', default is 'load'
      • extractContent: Whether to intelligently extract the main content, default is true
      • maxLength: Maximum length of returned content (in characters), default is no limit
      • returnHtml: Whether to return HTML content instead of Markdown, default is false
      • waitForNavigation: Whether to wait for additional navigation after initial page load (useful for sites with anti-bot verification), default is false
      • navigationTimeout: Maximum time to wait for additional navigation in milliseconds, default is 10000 (10 seconds)
      • disableMedia: Whether to block resources that are not needed for text extraction (images, stylesheets, fonts, media), default is true
      • debug: Whether to enable debug mode (showing browser window), overrides the --debug command line flag if specified
  • fetch_urls - Batch retrieve web page content from multiple URLs in parallel

    • Uses multi-tab parallel fetching for improved performance
    • Returns combined results with clear separation between webpages
    • Supports the following parameters:
      • urls: Array of URLs to fetch (required parameter)
      • Other parameters are the same as fetch_url
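
The parameter lists above can be written down as TypeScript types, which makes the defaults easier to see at a glance. This is a sketch derived only from the documented names and defaults. The type names and the `withDefaults` helper are assumptions, and the server's own schema may differ in detail.

```typescript
type WaitUntil = "load" | "domcontentloaded" | "networkidle" | "commit";

// Arguments for fetch_url as documented above. Only url is required.
interface FetchUrlArgs {
  url: string;
  timeout?: number;            // page load timeout in milliseconds
  waitUntil?: WaitUntil;
  extractContent?: boolean;
  maxLength?: number;          // characters; no limit when omitted
  returnHtml?: boolean;
  waitForNavigation?: boolean;
  navigationTimeout?: number;  // milliseconds
  disableMedia?: boolean;
  debug?: boolean;             // overrides the --debug flag when set
}

// fetch_urls takes an array of URLs plus the same optional parameters.
type FetchUrlsArgs = Omit<FetchUrlArgs, "url"> & { urls: string[] };

// Documented defaults. maxLength and debug are left out because the source
// gives them no fixed default value.
const FETCH_URL_DEFAULTS = {
  timeout: 30000,
  waitUntil: "load" as WaitUntil,
  extractContent: true,
  returnHtml: false,
  waitForNavigation: false,
  navigationTimeout: 10000,
  disableMedia: true,
};

// Hypothetical helper: shows the effective arguments after defaults apply.
function withDefaults(args: FetchUrlArgs) {
  return { ...FETCH_URL_DEFAULTS, ...args };
}

const resolved = withDefaults({ url: "https://example.com", returnHtml: true });
const batch: FetchUrlsArgs = {
  urls: ["https://example.com", "https://example.org"],
  timeout: 60000,
};
```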

Tips

Handling Special Website Scenarios

Dealing with Anti-Crawler Mechanisms

  • Wait for Complete Loading: For websites using CAPTCHA, redirects, or other verification mechanisms, include in your prompt:

    Please wait for the page to fully load
    

    This will use the waitForNavigation: true parameter.

  • Increase Timeout Duration: For websites that load slowly:

    Please set the page loading timeout to 60 seconds
    

    This adjusts both timeout and navigationTimeout parameters accordingly.
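
Under the hood, prompts like the two above end up as tool arguments in an MCP tools/call request. The sketch below builds such a request body for a slow page behind a verification step. The request envelope follows the MCP JSON-RPC format; the `buildFetchUrlCall` helper, the example URL, and the choice of 60000 ms for navigationTimeout are assumptions for illustration, since the source does not state which value ends up being used.

```typescript
// Shape of an MCP tools/call request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: wraps fetch_url arguments in a tools/call request.
function buildFetchUrlCall(
  id: number,
  args: Record<string, unknown>,
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "fetch_url", arguments: args },
  };
}

const slowSiteCall = buildFetchUrlCall(1, {
  url: "https://example.com/protected", // placeholder URL
  waitForNavigation: true,              // wait out redirects or verification
  timeout: 60000,                       // 60 seconds, as in the prompt above
  navigationTimeout: 60000,             // assumed to be raised to match
});
```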

Content Retrieval Adjustments

  • Preserve Original HTML Structure: When content extraction might fail:

    Please preserve the original HTML content
    

    Sets extractContent: false and returnHtml: true.

  • Fetch Complete Page Content: When extracted content is too limited:

    Please fetch the complete webpage content instead of just the main content
    

    Sets extractContent: false.

  • Return Content as HTML: When HTML format is needed instead of default Markdown:

    Please return the content in HTML format
    

    Sets returnHtml: true.
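
The three adjustments above differ only in two flags, which a small lookup table makes explicit. The preset names and the `contentArgs` helper are hypothetical; the flag values are the ones stated for each tip.

```typescript
type ContentPreset = "preserveHtml" | "fullPage" | "htmlOutput";

interface ContentFlags {
  extractContent?: boolean;
  returnHtml?: boolean;
}

// Flag combinations taken from the tips above; omitted flags keep their
// documented defaults (extractContent: true, returnHtml: false).
const CONTENT_PRESETS: Record<ContentPreset, ContentFlags> = {
  preserveHtml: { extractContent: false, returnHtml: true }, // whole page as HTML
  fullPage: { extractContent: false },                       // whole page, Markdown
  htmlOutput: { returnHtml: true },                          // main content as HTML
};

// Hypothetical helper: combines a URL with the flags for one preset.
function contentArgs(
  url: string,
  preset: ContentPreset,
): ContentFlags & { url: string } {
  return { url, ...CONTENT_PRESETS[preset] };
}

const fullPageArgs = contentArgs("https://example.com/article", "fullPage");
```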

Debugging and Authentication

Enabling Debug Mode

  • Dynamic Debug Activation: To display the browser window during a specific fetch operation:
    Please enable debug mode for this fetch operation
    
    This sets debug: true even if the server was started without the --debug flag.

Using Custom Cookies for Authentication

  • Manual Login: To log in using your own credentials:

    Please run in debug mode so I can manually log in to the website
    

    Sets debug: true or uses the --debug flag, keeping the browser window open for manual login.

  • Interacting with Debug Browser: When debug mode is enabled:

    1. The browser window remains open
    2. You can manually log into the website using your credentials
    3. After login is complete, content will be fetched with your authenticated session
  • Enable Debug for Specific Requests: Even if the server is already running, you can enable debug mode for a specific request:

    Please enable debug mode for this authentication step
    

    Sets debug: true for this specific request only, opening the browser window for manual login.

Development

Install Dependencies

npm install

Install Playwright Browser

Install the browsers needed for Playwright:

npm run install-browser

Build the Server

npm run build

Debugging

Use MCP Inspector for debugging:

npm run inspector

You can also enable visible browser mode for debugging:

node build/index.js --debug

Related Projects

  • g-search-mcp: A powerful MCP server for Google search that enables parallel searching with multiple keywords simultaneously. Perfect for batch search operations and data collection.

License

Licensed under the MIT License

Frequently Asked Questions

What is MCP?

MCP (Model Context Protocol) is an open protocol that standardizes how applications provide context to LLMs. Think of MCP like a USB-C port for AI applications, providing a standardized way to connect AI models to different data sources and tools.

What are MCP Servers?

MCP Servers are lightweight programs that expose specific capabilities through the standardized Model Context Protocol. They act as bridges between LLMs like Claude and various data sources or services, allowing secure access to files, databases, APIs, and other resources.

How do MCP Servers work?

MCP Servers follow a client-server architecture in which a host application (like Claude Desktop) connects to multiple servers. Each server exposes specific functionality through the protocol, which lets Claude access data and perform actions.

Are MCP Servers secure?

Yes, MCP Servers are designed with security in mind. They run locally with explicit configuration and permissions, require user approval for actions, and include built-in security features to prevent unauthorized access and ensure data privacy.
