Helps AI assistants access text content from bot-protected websites. MCP server that fetches HTML/markdown from sites with anti-automation measures using Scrapling.
What is scrapling-fetch-mcp
Scrapling Fetch MCP
An MCP server that helps AI assistants access text content from websites that implement bot detection, bridging the gap between what you can see in your browser and what the AI can access.
Intended Use
This tool is optimized for low-volume retrieval of documentation and reference materials (text/HTML only) from websites that implement bot detection. It has not been designed or tested for general-purpose site scraping or data harvesting.
Note: This project was developed in collaboration with Claude Sonnet 3.7, using LLM Context.
Installation
- Requirements:
  - Python 3.10+
  - uv package manager
- Install dependencies and the tool:
    uv tool install scrapling
    scrapling install
    uv tool install scrapling-fetch-mcp
Setup with Claude
Add this configuration to your Claude client's MCP server configuration:
{
"mcpServers": {
"Cyber-Chitta": {
"command": "uvx",
"args": ["scrapling-fetch-mcp"]
}
}
}
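If your client already has other servers configured, the entry above has to be merged into the existing "mcpServers" object rather than replacing it. The following is a minimal sketch of that merge in Python. The helper name add_mcp_server and the "other-server" entry are hypothetical, and where your client stores its configuration file is not covered here.

```python
import json


def add_mcp_server(config: dict, name: str, command: str, args: list) -> dict:
    """Return a copy of `config` with one entry added under "mcpServers".

    Hypothetical helper for illustration; it is not part of scrapling-fetch-mcp.
    """
    # Deep-copy via a JSON round trip so the caller's dict is left untouched.
    merged = json.loads(json.dumps(config))
    merged.setdefault("mcpServers", {})[name] = {"command": command, "args": args}
    return merged


# An existing configuration with an unrelated (made-up) server entry.
existing = {"mcpServers": {"other-server": {"command": "uvx", "args": ["other"]}}}

updated = add_mcp_server(existing, "Cyber-Chitta", "uvx", ["scrapling-fetch-mcp"])
print(json.dumps(updated, indent=2))
```

The server name ("Cyber-Chitta" here) is only the key your client uses to label the server; the command and args are what actually launch it.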
Available Tools
This package provides two distinct tools:
- s-fetch-page: Retrieves complete web pages with pagination support
- s-fetch-pattern: Extracts content matching regex patterns with surrounding context
Example Usage
Fetching a Complete Page
Human: Please fetch and summarize the documentation at https://example.com/docs
Claude: I'll help you with that. Let me fetch the documentation.
<mcp:function_calls>
<mcp:invoke name="s-fetch-page">
<mcp:parameter name="url">https://example.com/docs</mcp:parameter>
<mcp:parameter name="mode">basic</mcp:parameter>
</mcp:invoke>
</mcp:function_calls>
Based on the documentation I retrieved, here's a summary...
Extracting Specific Content with Pattern Matching
Human: Please find all mentions of "API keys" on the documentation page.
Claude: I'll search for that specific information.
<mcp:function_calls>
<mcp:invoke name="s-fetch-pattern">
<mcp:parameter name="url">https://example.com/docs</mcp:parameter>
<mcp:parameter name="mode">basic</mcp:parameter>
<mcp:parameter name="search_pattern">API\s+keys?</mcp:parameter>
<mcp:parameter name="context_chars">150</mcp:parameter>
</mcp:invoke>
</mcp:function_calls>
I found several mentions of API keys in the documentation:
...
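To make the search_pattern and context_chars parameters more concrete, here is a rough sketch of what regex matching with surrounding context can look like. It is an illustration written for this section, not the server's actual implementation: the function find_with_context, the result field names, and the sample text are all assumptions.

```python
import re


def find_with_context(text: str, search_pattern: str, context_chars: int) -> list:
    """Find regex matches and return each with `context_chars` of text on both sides.

    Illustrative only; the real tool's output format may differ.
    """
    results = []
    for match in re.finditer(search_pattern, text):
        # Clamp the context window to the bounds of the text.
        start = max(0, match.start() - context_chars)
        end = min(len(text), match.end() + context_chars)
        results.append(
            {
                "match": match.group(0),
                "position": match.start(),  # offset usable for a follow-up page fetch
                "context": text[start:end],
            }
        )
    return results


page = "Create API keys in the dashboard. Each API key can be revoked."
hits = find_with_context(page, r"API\s+keys?", 10)
for hit in hits:
    print(hit["position"], repr(hit["context"]))
```

The pattern API\s+keys? matches both "API keys" and "API key", which is why the example request above uses it instead of a literal string.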
Functionality Options
- Protection Levels:
  - basic: Fast retrieval (1-2 seconds) but lower success with heavily protected sites
  - stealth: Balanced protection (3-8 seconds) that works with most sites
  - max-stealth: Maximum protection (10+ seconds) for heavily protected sites
- Content Targeting Options:
  - s-fetch-page: Retrieve entire pages with pagination support (using start_index and max_length)
  - s-fetch-pattern: Extract specific content using regular expressions (with search_pattern and context_chars)
    - Results include position information for follow-up queries with s-fetch-page
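The pagination parameters can be thought of as a sliding window over the page text. The sketch below shows one way a client might walk a long document with start_index and max_length. The helper names page_slice and read_all, the next_index field, and the default of 5000 are assumptions made for illustration; they do not describe the server's real defaults or response format.

```python
from typing import Optional


def page_slice(content: str, start_index: int = 0, max_length: int = 5000) -> dict:
    """Return one window of `content` plus the index where the next window starts."""
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    chunk = content[start_index:start_index + max_length]
    next_index: Optional[int] = start_index + len(chunk)
    if next_index >= len(content):
        next_index = None  # nothing left to read
    return {"chunk": chunk, "next_index": next_index}


def read_all(content: str, max_length: int) -> list:
    """Collect every window by following next_index until it is None."""
    chunks = []
    start: Optional[int] = 0
    while start is not None:
        page = page_slice(content, start, max_length)
        chunks.append(page["chunk"])
        start = page["next_index"]
    return chunks


document = "abcdefghij"
pages = read_all(document, 4)
print(pages)
```

In the same spirit, a position reported by s-fetch-pattern could be passed as start_index to s-fetch-page to read the text around a match.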
Tips for Best Results
- Start with basic mode and only escalate to higher protection levels if needed
- For large documents, use the pagination parameters with s-fetch-page
- Use s-fetch-pattern when looking for specific information on large pages
- The AI will automatically adjust its approach based on the site's protection level
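The "start with basic, escalate only if needed" advice amounts to a simple fallback loop. The following sketch shows that idea with a stand-in fetch function. Everything here is hypothetical: fetch_with_escalation, the fetch callable's signature, and fake_fetch exist only for this example, and the real tools are invoked by the AI client through MCP rather than called from Python like this.

```python
from typing import Callable, List, Optional, Tuple

# Modes in order of increasing protection (and increasing latency).
MODES = ["basic", "stealth", "max-stealth"]


def fetch_with_escalation(
    url: str, fetch: Callable[[str, str], Optional[str]]
) -> Tuple[str, str]:
    """Try each mode in order and return (mode, content) for the first success."""
    for mode in MODES:
        content = fetch(url, mode)
        if content:
            return mode, content
    raise RuntimeError(f"all modes failed for {url}")


calls: List[str] = []


def fake_fetch(url: str, mode: str) -> Optional[str]:
    """Stand-in fetcher: pretends the site blocks basic mode but allows stealth."""
    calls.append(mode)
    return None if mode == "basic" else "<html>docs</html>"


result = fetch_with_escalation("https://example.com/docs", fake_fetch)
print(result, calls)
```

Because the slower modes are only tried after a faster one fails, well-behaved sites pay only the cost of basic mode.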
Limitations
- Designed only for text content: Specifically for documentation, articles, and reference materials
- Not designed for high-volume scraping or data harvesting
- May not work with sites requiring authentication
- Performance varies by site complexity
License
Apache 2
Frequently Asked Questions
What is MCP?
MCP (Model Context Protocol) is an open protocol that standardizes how applications provide context to LLMs. Think of MCP like a USB-C port for AI applications, providing a standardized way to connect AI models to different data sources and tools.
What are MCP Servers?
MCP Servers are lightweight programs that expose specific capabilities through the standardized Model Context Protocol. They act as bridges between LLMs like Claude and various data sources or services, allowing secure access to files, databases, APIs, and other resources.
How do MCP Servers work?
MCP Servers follow a client-server architecture in which a host application (like Claude Desktop) connects to multiple servers. Each server exposes specific functionality through the protocol, which lets Claude access data and perform actions in a uniform way.
Are MCP Servers secure?
Yes, MCP Servers are designed with security in mind. They run locally with explicit configuration and permissions, require user approval for actions, and include built-in security features to prevent unauthorized access and ensure data privacy.
Related MCP Servers
sylphlab pdf reader mcp
An MCP server built with Node.js/TypeScript that allows AI agents to securely read PDF files (local or URL) and extract text, metadata, or page counts. Uses pdf-parse.
aashari mcp server atlassian bitbucket
Node.js/TypeScript MCP server for Atlassian Bitbucket. Enables AI systems (LLMs) to interact with workspaces, repositories, and pull requests via tools (list, get, comment, search). Connects AI directly to version control workflows through the standard MCP interface.
aashari mcp server atlassian confluence
Node.js/TypeScript MCP server for Atlassian Confluence. Provides tools enabling AI systems (LLMs) to list/get spaces & pages (content formatted as Markdown) and search via CQL. Connects AI seamlessly to Confluence knowledge bases using the standard MCP interface.
weibaohui k8m
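A lightweight, cross-platform Mini Kubernetes AI Dashboard that supports large models, agents, and MCP (with configurable operation permissions). It integrates multi-cluster management, intelligent analysis, and real-time anomaly detection, supports multiple architectures, and can be deployed as a single file, helping with efficient cluster management and operations optimization.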
watchdealer pavel watchbase mcp server
MCP Server for structured and standardized querying of watch-related metadata such as brands, families, and reference details from WatchBase.com.
vgnshiyer apple books mcp
Apple Books MCP Server
ttiimmaacc cinema4d mcp
Cinema 4D plugin integrating Claude AI for prompt-driven 3D modeling, scene creation, and manipulation.
sv mcp paradex py
Connect AI agents to the Paradex trading platform. Retrieve market data, manage accounts, and execute trades seamlessly. Enhance your trading experience with automated tools and real-time insights.
SamllPigYanDong revit mcp
Revit MCP. A Model Context Protocol server for Revit integration, enabling seamless communication between Claude AI and Autodesk Revit.
Rayyan9477 linkedin mcp
A powerful Model Context Protocol server for LinkedIn interactions that enables AI assistants to search for jobs, generate resumes and cover letters, and manage job applications programmatically.