MCP server Browser Use
by JovaniPink
A FastAPI server that implements the Model Context Protocol (MCP) and provides browser automation via the browser-use library.
MCP server for browser-use.
Overview
This repository contains an MCP server for the browser-use library, a browser automation system that lets AI agents interact with web browsers through natural language. The server is built on Anthropic's Model Context Protocol (MCP) and exposes browser-use's capabilities to MCP clients such as Claude Desktop.
Features
- Browser Control
  - Automated browser interactions via natural language
  - Navigation, form filling, clicking, and scrolling capabilities
  - Tab management and screenshot functionality
  - Cookie and state management
- Agent System
  - Custom agent implementation in custom_agent.py
  - Vision-based element detection
  - Structured JSON responses for actions
  - Message history management and summarization
- Configuration
  - Environment-based configuration for API keys and settings
  - Chrome browser settings (debugging port, persistence)
  - Model provider selection and parameters
Dependencies
This project relies on the following Python packages:
Package | Version | Description |
---|---|---|
Pillow | >=10.1.0 | Python Imaging Library (PIL) fork that adds image processing capabilities to your Python interpreter. |
browser-use | ==0.1.19 | A powerful browser automation system that enables AI agents to interact with web browsers through natural language. The core library that powers this project's browser automation capabilities. |
fastapi | >=0.115.6 | Modern, fast (high-performance) web framework for building APIs with Python 3.7+ based on standard Python type hints. Used to create the server that exposes the agent's functionality. |
fastmcp | >=0.4.1 | A framework that wraps FastAPI for building MCP (Model Context Protocol) servers. |
instructor | >=1.7.2 | Library for structured output prompting and validation with OpenAI models. Enables extracting structured data from model responses. |
langchain | >=0.3.14 | Framework for developing applications with large language models (LLMs). Provides tools for chaining together different language model components and interacting with various APIs and data sources. |
langchain-google-genai | >=2.1.1 | LangChain integration for Google GenAI models, enabling the use of Google's generative AI capabilities within the LangChain framework. |
langchain-openai | >=0.2.14 | LangChain integrations with OpenAI's models. Enables using OpenAI models (like GPT-4) within the LangChain framework. Used in this project for interacting with OpenAI's language and vision models. |
langchain-ollama | >=0.2.2 | LangChain integration for Ollama, enabling local execution of LLMs. |
openai | >=1.59.5 | Official Python client library for the OpenAI API. Used to interact directly with OpenAI's models (if needed, in addition to LangChain). |
python-dotenv | >=1.0.1 | Reads key-value pairs from a .env file and sets them as environment variables. Simplifies local development and configuration management. |
pydantic | >=2.10.5 | Data validation and settings management using Python type annotations. Provides runtime enforcement of types and automatic model creation. Essential for defining structured data models in the agent. |
pyperclip | >=1.9.0 | Cross-platform Python module for copy and paste clipboard functions. |
uvicorn | >=0.22.0 | ASGI web server implementation for Python. Used to serve the FastAPI application. |
Components
Resources
The server implements a browser automation system with:
- Integration with browser-use library for advanced browser control
- Custom browser automation capabilities
- Agent-based interaction system with vision capabilities
- Persistent state management
- Customizable model settings
Requirements
- Operating system: Linux, macOS, or Windows (Docker and Microsoft WSL are untested)
- Python 3.11 or higher
- uv (fast Python package installer)
- Chrome/Chromium browser
- Claude Desktop
Quick Start
Claude Desktop
On macOS: ~/Library/Application\ Support/Claude/claude_desktop_config.json
On Windows: %APPDATA%/Claude/claude_desktop_config.json
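The two locations above can be resolved in code. The following is a minimal sketch, not part of this project: the function name `claude_config_path` is hypothetical, and it covers only the two platforms listed here.

```python
import os
import sys
from pathlib import Path
from typing import Mapping, Optional

CONFIG_NAME = "claude_desktop_config.json"


def claude_config_path(
    platform: str = sys.platform,
    env: Optional[Mapping[str, str]] = None,
) -> Path:
    """Return the Claude Desktop config path for macOS or Windows.

    Hypothetical helper: only the two locations documented above are handled.
    """
    env = os.environ if env is None else env
    if platform == "darwin":
        return Path.home() / "Library" / "Application Support" / "Claude" / CONFIG_NAME
    if platform.startswith("win"):
        # %APPDATA% is normally set on Windows; fail loudly if it is missing.
        appdata = env.get("APPDATA")
        if not appdata:
            raise RuntimeError("APPDATA is not set")
        return Path(appdata) / "Claude" / CONFIG_NAME
    raise NotImplementedError(f"no documented config path for platform {platform!r}")
```

Passing `platform` and `env` explicitly keeps the function easy to test on a machine that runs a different operating system.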
Installing via Smithery
To install Browser Use for Claude Desktop automatically via Smithery:
npx -y @smithery/cli install @JovaniPink/mcp-browser-use --client claude
To configure the server manually, add the following to claude_desktop_config.json. Remove the // comment lines before saving, because JSON does not allow comments.
{
  "mcpServers": {
    "mcp_server_browser_use": {
      "command": "uvx",
      "args": [
        "mcp-server-browser-use"
      ],
      "env": {
        "OPENAI_ENDPOINT": "https://api.openai.com/v1",
        "OPENAI_API_KEY": "",
        "ANTHROPIC_API_KEY": "",
        "GOOGLE_API_KEY": "",
        "AZURE_OPENAI_ENDPOINT": "",
        "AZURE_OPENAI_API_KEY": "",
        // "DEEPSEEK_ENDPOINT": "https://api.deepseek.com",
        // "DEEPSEEK_API_KEY": "",
        // Set to false to disable anonymized telemetry
        "ANONYMIZED_TELEMETRY": "false",
        // Chrome settings
        "CHROME_PATH": "",
        "CHROME_USER_DATA": "",
        "CHROME_DEBUGGING_PORT": "9222",
        "CHROME_DEBUGGING_HOST": "localhost",
        // Set to true to keep browser open between AI tasks
        "CHROME_PERSISTENT_SESSION": "false",
        // Model settings
        "MCP_MODEL_PROVIDER": "anthropic",
        "MCP_MODEL_NAME": "claude-3-5-sonnet-20241022",
        "MCP_TEMPERATURE": "0.3",
        "MCP_MAX_STEPS": "30",
        "MCP_USE_VISION": "true",
        "MCP_MAX_ACTIONS_PER_STEP": "5",
        "MCP_TOOL_CALL_IN_CONTENT": "true"
      }
    }
  }
}
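Because the snippet carries // comment lines, a strict JSON parser rejects it as written. The sketch below is one way to handle that with the standard library: it drops whole-line comments, parses the result, and merges the server entry into an existing configuration. The helper names `strip_comment_lines` and `merge_servers` are hypothetical, the sample is a shortened copy of the snippet, and inline comments after a value are not handled.

```python
import json

# Shortened copy of the configuration snippet, for illustration only.
SNIPPET = """
{
  "mcpServers": {
    "mcp_server_browser_use": {
      "command": "uvx",
      "args": ["mcp-server-browser-use"],
      "env": {
        "OPENAI_ENDPOINT": "https://api.openai.com/v1",
        // Set to false to disable anonymized telemetry
        "ANONYMIZED_TELEMETRY": "false",
        "MCP_MODEL_PROVIDER": "anthropic"
      }
    }
  }
}
"""


def strip_comment_lines(text: str) -> str:
    """Drop lines that consist only of a // comment."""
    kept = [line for line in text.splitlines() if not line.lstrip().startswith("//")]
    return "\n".join(kept)


def merge_servers(existing: dict, snippet: dict) -> dict:
    """Return a copy of `existing` with the snippet's servers added under mcpServers."""
    merged = dict(existing)
    servers = dict(merged.get("mcpServers", {}))
    servers.update(snippet.get("mcpServers", {}))
    merged["mcpServers"] = servers
    return merged


parsed = json.loads(strip_comment_lines(SNIPPET))
config = merge_servers({"mcpServers": {"other": {"command": "node"}}}, parsed)
print(sorted(config["mcpServers"]))  # ['mcp_server_browser_use', 'other']
```

Only lines that begin with // are removed, so URLs such as https://api.openai.com/v1 inside string values are left untouched.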
Environment Variables
Key environment variables:
# API Keys
ANTHROPIC_API_KEY=anthropic_key
# Chrome Configuration
# Optional: Path to Chrome executable
CHROME_PATH=/path/to/chrome
# Optional: Chrome user data directory
CHROME_USER_DATA=/path/to/user/data
# Default: 9222
CHROME_DEBUGGING_PORT=9222
# Default: localhost
CHROME_DEBUGGING_HOST=localhost
# Keep browser open between tasks
CHROME_PERSISTENT_SESSION=false
# Model Settings
# Options: anthropic, openai, azure, deepseek
MCP_MODEL_PROVIDER=anthropic
# Model name
MCP_MODEL_NAME=claude-3-5-sonnet-20241022
MCP_TEMPERATURE=0.3
MCP_MAX_STEPS=30
MCP_USE_VISION=true
MCP_MAX_ACTIONS_PER_STEP=5
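As an illustration of how these variables might be read, here is a minimal sketch that loads them into a typed settings object. It is not this project's implementation: the class `ServerSettings` and its field names are hypothetical, and the fallback values simply mirror the example values above, so the server's real defaults may differ. The `/json/version` path is the standard Chrome remote debugging endpoint.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings; anything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class ServerSettings:
    """Hypothetical typed view of the environment variables listed above."""

    chrome_debugging_host: str
    chrome_debugging_port: int
    chrome_persistent_session: bool
    model_provider: str
    model_name: str
    temperature: float
    max_steps: int
    use_vision: bool
    max_actions_per_step: int

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "ServerSettings":
        # Fallbacks mirror the example values in this README (an assumption).
        env = os.environ if env is None else env
        return cls(
            chrome_debugging_host=env.get("CHROME_DEBUGGING_HOST", "localhost"),
            chrome_debugging_port=int(env.get("CHROME_DEBUGGING_PORT", "9222")),
            chrome_persistent_session=_as_bool(env.get("CHROME_PERSISTENT_SESSION", "false")),
            model_provider=env.get("MCP_MODEL_PROVIDER", "anthropic"),
            model_name=env.get("MCP_MODEL_NAME", "claude-3-5-sonnet-20241022"),
            temperature=float(env.get("MCP_TEMPERATURE", "0.3")),
            max_steps=int(env.get("MCP_MAX_STEPS", "30")),
            use_vision=_as_bool(env.get("MCP_USE_VISION", "true")),
            max_actions_per_step=int(env.get("MCP_MAX_ACTIONS_PER_STEP", "5")),
        )

    @property
    def devtools_version_url(self) -> str:
        """URL of Chrome's remote debugging version endpoint."""
        return f"http://{self.chrome_debugging_host}:{self.chrome_debugging_port}/json/version"
```

Calling `ServerSettings.from_env()` with no argument reads `os.environ`; passing a plain dict makes the parsing easy to exercise in tests.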
Development
Setup
- Clone the repository:
git clone https://github.com/JovaniPink/mcp-browser-use.git
cd mcp-browser-use
- Create and activate virtual environment:
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
- Install dependencies:
uv sync
- Start the server:
uv run mcp-server-browser-use
Debugging
For debugging, use the MCP Inspector:
npx @modelcontextprotocol/inspector uv --directory /path/to/project run mcp-server-browser-use
The Inspector will display a URL for the debugging interface.
Browser Actions
The server supports various browser actions through natural language:
- Navigation: Go to URLs, back/forward, refresh
- Interaction: Click, type, scroll, hover
- Forms: Fill forms, submit, select options
- State: Get page content, take screenshots
- Tabs: Create, close, switch between tabs
- Vision: Find elements by visual appearance
- Cookies & Storage: Manage browser state
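To make the idea of structured actions concrete, the sketch below parses one step of model output into typed actions and enforces a per-step limit like MCP_MAX_ACTIONS_PER_STEP. Everything in it is an assumption for illustration: the JSON shape, the action names, and the helper `parse_step` are hypothetical and do not describe the schema this server or browser-use actually uses.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Hypothetical action names, loosely based on the categories listed above.
KNOWN_ACTIONS = {"navigate", "click", "type", "scroll", "hover", "screenshot", "switch_tab"}


@dataclass
class Action:
    name: str
    params: Dict[str, Any] = field(default_factory=dict)


def parse_step(raw: str, max_actions: int = 5) -> List[Action]:
    """Parse one JSON step and reject unknown or excess actions."""
    payload = json.loads(raw)
    items = payload.get("actions", [])
    if len(items) > max_actions:
        raise ValueError(f"step has {len(items)} actions, limit is {max_actions}")
    actions = []
    for item in items:
        name = item["name"]
        if name not in KNOWN_ACTIONS:
            raise ValueError(f"unknown action: {name}")
        actions.append(Action(name=name, params=item.get("params", {})))
    return actions


step = parse_step(
    '{"actions": ['
    '{"name": "navigate", "params": {"url": "https://example.com"}},'
    '{"name": "click", "params": {"index": 3}}'
    ']}'
)
print([action.name for action in step])  # ['navigate', 'click']
```

Validating the step before executing it means a malformed or oversized response fails early, before any browser command is sent.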
Security
Note that some Chrome settings are configured to allow the server to control the browser. This is a security risk, so use the server with caution. It is not intended for production environments.
Security Details: SECURITY.MD
Contributing
We welcome contributions to this project. Please follow these steps:
- Fork this repository.
- Create your feature branch:
git checkout -b my-new-feature
- Commit your changes:
git commit -m 'Add some feature'
- Push to the branch:
git push origin my-new-feature
- Submit a pull request.
For major changes, open an issue first to discuss what you would like to change. Please update tests as appropriate to reflect any changes made.