What is minbang930 Youtube Vision MCP?
YouTube Vision MCP Server (youtube-vision)
An MCP (Model Context Protocol) server that uses the Google Gemini Vision API to interact with YouTube videos. It lets users get descriptions, summaries, and answers to questions about a video, and extract its key moments.
Features
- Analyzes YouTube videos using the Gemini Vision API.
- Provides multiple tools for different interactions:
  - General description or Q&A (ask_about_youtube_video)
  - Summarization (summarize_youtube_video)
  - Key moment extraction (extract_key_moments)
- Lists available Gemini models that support generateContent.
- Configurable Gemini model via an environment variable.
- Communicates via stdio (standard input/output).
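MCP's stdio transport frames each JSON-RPC 2.0 message as a single line of JSON followed by a newline. The sketch below shows only that framing. It encodes one message, then reassembles messages from stdout chunks that may arrive split. The JsonRpcMessage type and the encodeMessage and LineDecoder names are illustrative and are not taken from this server's code.

```typescript
// Minimal JSON-RPC 2.0 message shape (illustrative; not this server's own types).
type JsonRpcMessage = {
  jsonrpc: "2.0";
  id?: number | string;
  method?: string;
  params?: unknown;
  result?: unknown;
  error?: { code: number; message: string; data?: unknown };
};

// Encodes one message as a single newline-terminated line.
// JSON.stringify without indentation escapes newlines inside strings,
// so the encoded message never contains a raw newline.
function encodeMessage(msg: JsonRpcMessage): string {
  return JSON.stringify(msg) + "\n";
}

// Accumulates incoming chunks and returns every complete line as a parsed message.
class LineDecoder {
  private buffer = "";

  push(chunk: string): JsonRpcMessage[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is an incomplete line (or "" if the chunk ended with "\n").
    this.buffer = lines.pop() ?? "";
    return lines
      .filter((line) => line.trim() !== "")
      .map((line) => JSON.parse(line) as JsonRpcMessage);
  }
}

// Example: a tools/list request as a client would write it to the server's stdin.
const request = encodeMessage({ jsonrpc: "2.0", id: 1, method: "tools/list" });
```

The decoder keeps any partial line in its buffer. A message split across two reads is therefore parsed only once its terminating newline arrives.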
Prerequisites
Before using this server, ensure you have the following:
- Node.js: Version 18 or higher recommended. You can download it from nodejs.org.
- Google Gemini API Key: Obtain your API key from Google AI Studio or Google Cloud Console.
Installation & Usage
There are three ways to use this server: install it via Smithery, run it with npx, or build it from source.
Installing via Smithery
To install youtube-vision-mcp for Claude Desktop automatically via Smithery:
npx -y @smithery/cli install @minbang930/youtube-vision-mcp --client claude
Option 1: Using npx (Recommended for quick use)
The easiest way to run this server is with npx, which downloads and runs the package without a permanent installation.
Configure it in your MCP client's settings file (for example, Claude Desktop or VS Code):
{
  "mcpServers": {
    "youtube-vision": {
      "command": "npx",
      "args": [
        "-y",
        "youtube-vision"
      ],
      "env": {
        "GEMINI_API_KEY": "YOUR_GEMINI_API_KEY",
        "GEMINI_MODEL_NAME": "gemini-2.0-flash"
      }
    }
  }
}
Replace "YOUR_GEMINI_API_KEY" with your actual Google Gemini API key.
Option 2: Manual Installation (from Source)
If you want to modify the code or run it directly from the source:
- Clone the repository:
  git clone https://github.com/minbang930/Youtube-Vision-MCP.git
  cd Youtube-Vision-MCP
- Install dependencies:
  npm install
- Build the project:
  npm run build
- Configure and run: Run the compiled code directly with node dist/index.js (make sure GEMINI_API_KEY is set as an environment variable). Alternatively, configure your MCP client to run it with the node command and the absolute path to dist/index.js, passing the API key via the env setting as shown in the npx example.
Configuration
The server uses the following environment variables:
- GEMINI_API_KEY (Required): Your Google Gemini API key.
- GEMINI_MODEL_NAME (Optional): The specific Gemini model to use (e.g., gemini-1.5-flash). Defaults to gemini-2.0-flash. Important: For production or commercial use, ensure you select a model version that is not marked as "Experimental" or "Preview".
Environment variables should be set in the env section of your MCP client's settings file (e.g., mcp_settings.json).
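The two variables combine a required key with an optional override that falls back to a default. The sketch below resolves them, assuming the server reads its configuration roughly this way. loadConfig is a hypothetical name, and the default mirrors the documented gemini-2.0-flash.

```typescript
interface ServerConfig {
  apiKey: string;
  modelName: string;
}

// Documented default when GEMINI_MODEL_NAME is unset.
const DEFAULT_MODEL = "gemini-2.0-flash";

// Hypothetical helper: resolves configuration from an environment map
// (pass process.env in a real Node.js process).
function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const apiKey = env.GEMINI_API_KEY?.trim();
  if (!apiKey) {
    // GEMINI_API_KEY is required; fail at startup rather than on the first request.
    throw new Error("GEMINI_API_KEY environment variable is not set");
  }
  // An unset or empty GEMINI_MODEL_NAME falls back to the default.
  const modelName = env.GEMINI_MODEL_NAME?.trim() || DEFAULT_MODEL;
  return { apiKey, modelName };
}
```

In a Node.js entry point this would be called as loadConfig(process.env). Taking the map as a parameter keeps the function easy to test.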
Available Tools
1. ask_about_youtube_video
Answers a question about the video or provides a general description if no question is asked.
- Input:
  - youtube_url (string, required): The URL of the YouTube video.
  - question (string, optional): The specific question to ask about the video. If omitted, a general description is generated.
- Output: Text containing the answer or description.
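An MCP client invokes this tool with a JSON-RPC tools/call request whose arguments use the input names listed above. The sketch below builds that request, assuming an already-initialized session. buildAskRequest is a hypothetical helper, and VIDEO_ID is a placeholder.

```typescript
interface AskArgs {
  youtube_url: string;
  question?: string;
}

// Hypothetical helper: builds a tools/call request for ask_about_youtube_video.
function buildAskRequest(id: number, youtubeUrl: string, question?: string) {
  const args: AskArgs = { youtube_url: youtubeUrl };
  // Leave "question" out entirely to get a general description of the video.
  if (question !== undefined && question.trim() !== "") {
    args.question = question;
  }
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call",
    params: { name: "ask_about_youtube_video", arguments: args },
  };
}

const describeReq = buildAskRequest(2, "https://www.youtube.com/watch?v=VIDEO_ID");
const askReq = buildAskRequest(3, "https://www.youtube.com/watch?v=VIDEO_ID", "What tools are used?");
```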
2. summarize_youtube_video
Generates a summary of a given YouTube video.
- Input:
  - youtube_url (string, required): The URL of the YouTube video.
  - summary_length (string, optional): Desired summary length ('short', 'medium', or 'long'). Defaults to 'medium'.
- Output: Text containing the video summary.
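summary_length accepts only three values, so a client can check it before sending a request. The sketch below uses a string-literal union type. The type and function names are illustrative, and rejecting other values on the client side is an assumption rather than documented server behavior.

```typescript
// The three documented values for summary_length.
const SUMMARY_LENGTHS = ["short", "medium", "long"] as const;
type SummaryLength = (typeof SUMMARY_LENGTHS)[number];

interface SummarizeArgs {
  youtube_url: string;
  summary_length: SummaryLength;
}

// Type guard: narrows an arbitrary string to one of the allowed lengths.
function isSummaryLength(value: string): value is SummaryLength {
  return (SUMMARY_LENGTHS as readonly string[]).includes(value);
}

// Hypothetical helper: applies the documented default ('medium')
// and rejects unknown lengths before the request is sent.
function summarizeArgs(youtubeUrl: string, length?: string): SummarizeArgs {
  const chosen = length ?? "medium";
  if (!isSummaryLength(chosen)) {
    throw new Error(`summary_length must be one of: ${SUMMARY_LENGTHS.join(", ")}`);
  }
  return { youtube_url: youtubeUrl, summary_length: chosen };
}
```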
3. extract_key_moments
Extracts key moments (timestamps and descriptions) from a given YouTube video.
- Input:
  - youtube_url (string, required): The URL of the YouTube video.
  - number_of_moments (integer, optional): Number of key moments to extract. Defaults to 3.
- Output: Text describing the key moments with timestamps.
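The timestamps in this tool's text output can be turned into links that open the video at each moment. The output format is not specified here, so the sketch below assumes timestamps appear as M:SS, MM:SS, or H:MM:SS. It also relies on YouTube's t query parameter (start time in seconds). timestampLinks and the other helpers are hypothetical.

```typescript
// Assumption: timestamps look like "M:SS", "MM:SS" or "H:MM:SS".
// The exact output format is not documented; adjust the pattern to what you receive.
const TIMESTAMP = /\b(?:(\d{1,2}):)?(\d{1,2}):(\d{2})\b/g;

// Converts captured hour/minute/second strings to a number of seconds.
function toSeconds(h: string | undefined, m: string, s: string): number {
  return Number(h ?? "0") * 3600 + Number(m) * 60 + Number(s);
}

// Appends YouTube's "t" start-time parameter (in seconds) to a video URL.
// Does not remove a "t" parameter that is already present.
function withStartTime(videoUrl: string, seconds: number): string {
  const separator = videoUrl.includes("?") ? "&" : "?";
  return `${videoUrl}${separator}t=${seconds}`;
}

interface MomentLink {
  label: string;
  url: string;
}

// Hypothetical helper: finds every timestamp in free text and returns a deep link for each.
function timestampLinks(text: string, videoUrl: string): MomentLink[] {
  const links: MomentLink[] = [];
  for (const match of text.matchAll(TIMESTAMP)) {
    const seconds = toSeconds(match[1], match[2] ?? "0", match[3] ?? "0");
    links.push({ label: match[0], url: withStartTime(videoUrl, seconds) });
  }
  return links;
}
```

For the text "0:45 intro, 2:10 demo, 1:02:03 wrap-up", the links start at 45, 130, and 3723 seconds.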
4. list_supported_models
Lists available Gemini models that support the generateContent method (fetched via the REST API).
- Input: None
- Output: Text listing the supported model names.
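The Gemini API's models.list REST method (for example, GET https://generativelanguage.googleapis.com/v1beta/models) returns model entries that carry a supportedGenerationMethods array. That array is the natural field for this kind of filtering. The sketch below shows only the filtering step, using a mocked response so it runs offline. The sample entries are illustrative, and the real endpoint paginates with nextPageToken, which the sketch ignores.

```typescript
// Subset of the fields models.list returns for each model.
interface GeminiModel {
  name: string; // e.g. "models/gemini-2.0-flash"
  supportedGenerationMethods?: string[];
}

interface ListModelsResponse {
  models?: GeminiModel[];
  nextPageToken?: string; // pagination is not handled in this sketch
}

// Keeps models that support generateContent and strips the "models/" prefix
// so each name has the same form as GEMINI_MODEL_NAME values like "gemini-2.0-flash".
function generateContentModels(response: ListModelsResponse): string[] {
  return (response.models ?? [])
    .filter((m) => m.supportedGenerationMethods?.includes("generateContent"))
    .map((m) => m.name.replace(/^models\//, ""));
}

// Mocked response. A live call would fetch the models endpoint with your API key
// and pass the parsed JSON body to generateContentModels.
const sample: ListModelsResponse = {
  models: [
    { name: "models/gemini-2.0-flash", supportedGenerationMethods: ["generateContent", "countTokens"] },
    { name: "models/text-embedding-004", supportedGenerationMethods: ["embedContent"] },
  ],
};
```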
Important Notes
- Model Selection for Production: When using this server for production or commercial purposes, please ensure the selected GEMINI_MODEL_NAME is a stable version suitable for production use. According to the Gemini API Terms of Service, models marked as "Experimental" or "Preview" are not permitted for production deployment.
- API Terms of Service: Usage of this server relies on the Google Gemini API. Users are responsible for reviewing and complying with the Google APIs Terms of Service and the Gemini API Additional Terms of Service. Note that data usage policies may differ between free and paid tiers of the Gemini API. Do not submit sensitive or confidential information when using free tiers.
- Content Responsibility: The accuracy and appropriateness of content generated via the Gemini API are not guaranteed. Use discretion before relying on or publishing generated content.
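A deployment script could flag obviously non-production model names before the server starts. The sketch below is only a name-based heuristic. It assumes experimental and preview models carry exp, experimental, or preview as a hyphen-separated token, as in gemini-2.0-flash-exp. A model's documentation, not its name, is the authoritative source on its release stage.

```typescript
// Heuristic only: matches "exp", "experimental" or "preview" as a whole
// hyphen-separated token, e.g. "gemini-2.0-flash-exp" or "gemini-2.5-pro-preview-03-25".
const NON_PRODUCTION_TOKEN = /(^|-)(exp|experimental|preview)(-|$)/i;

function looksNonProduction(modelName: string): boolean {
  // Accept names with or without the "models/" prefix used by the REST API.
  const bare = modelName.replace(/^models\//, "");
  return NON_PRODUCTION_TOKEN.test(bare);
}

// Hypothetical startup check: warns instead of failing, since the heuristic can be wrong.
function warnIfNonProduction(modelName: string): void {
  if (looksNonProduction(modelName)) {
    console.warn(
      `GEMINI_MODEL_NAME "${modelName}" looks like an Experimental/Preview model; ` +
        "check its release stage before production use.",
    );
  }
}
```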
License
This project is licensed under the MIT License. See the LICENSE file for details.
Frequently Asked Questions
What is MCP?
MCP (Model Context Protocol) is an open protocol that standardizes how applications provide context to LLMs. Think of MCP like a USB-C port for AI applications, providing a standardized way to connect AI models to different data sources and tools.
What are MCP Servers?
MCP Servers are lightweight programs that expose specific capabilities through the standardized Model Context Protocol. They act as bridges between LLMs like Claude and various data sources or services, allowing secure access to files, databases, APIs, and other resources.
How do MCP Servers work?
MCP Servers follow a client-server architecture in which a host application (like Claude Desktop) connects to multiple servers. Each server exposes its capabilities, such as tools, through the protocol, which lets Claude access data and perform actions without server-specific integration code.
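Concretely, a host and a stdio server such as this one exchange JSON-RPC 2.0 messages in a fixed order. The host sends an initialize request and an initialized notification, then moves on to discovery and tool calls. The sketch below lists the client side of that sequence as plain objects, with the server's responses omitted. The protocol version string and client name are example values; check the MCP specification for the current revision.

```typescript
type JsonRpcRequest = { jsonrpc: "2.0"; id: number; method: string; params?: object };
type JsonRpcNotification = { jsonrpc: "2.0"; method: string; params?: object };

// Client-side message sequence for one session (server responses omitted).
function sessionMessages(youtubeUrl: string): (JsonRpcRequest | JsonRpcNotification)[] {
  return [
    // 1. Handshake: agree on a protocol version and exchange capabilities.
    {
      jsonrpc: "2.0",
      id: 1,
      method: "initialize",
      params: {
        protocolVersion: "2024-11-05", // example revision
        capabilities: {},
        clientInfo: { name: "example-client", version: "0.1.0" },
      },
    },
    // 2. Notification (no id, no response): the client is ready.
    { jsonrpc: "2.0", method: "notifications/initialized" },
    // 3. Discover the server's tools, e.g. summarize_youtube_video.
    { jsonrpc: "2.0", id: 2, method: "tools/list" },
    // 4. Invoke one of them.
    {
      jsonrpc: "2.0",
      id: 3,
      method: "tools/call",
      params: { name: "summarize_youtube_video", arguments: { youtube_url: youtubeUrl } },
    },
  ];
}
```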
Are MCP Servers secure?
MCP Servers are designed with security in mind, though the actual protection depends on the host application and on each individual server. Local servers run only when explicitly configured in the client, and hosts such as Claude Desktop typically ask the user to approve actions before a tool is invoked.
Related MCP Servers
Brave Search MCP
Integrate Brave Search capabilities into Claude through MCP. Enables real-time web searches with privacy-focused results and comprehensive web coverage.
chrisdoc hevy mcp
sylphlab pdf reader mcp
An MCP server built with Node.js/TypeScript that allows AI agents to securely read PDF files (local or URL) and extract text, metadata, or page counts. Uses pdf-parse.
aashari mcp server atlassian bitbucket
Node.js/TypeScript MCP server for Atlassian Bitbucket. Enables AI systems (LLMs) to interact with workspaces, repositories, and pull requests via tools (list, get, comment, search). Connects AI directly to version control workflows through the standard MCP interface.
aashari mcp server atlassian confluence
Node.js/TypeScript MCP server for Atlassian Confluence. Provides tools enabling AI systems (LLMs) to list/get spaces & pages (content formatted as Markdown) and search via CQL. Connects AI seamlessly to Confluence knowledge bases using the standard MCP interface.
prisma prisma
Next-generation ORM for Node.js & TypeScript | PostgreSQL, MySQL, MariaDB, SQL Server, SQLite, MongoDB and CockroachDB
Zzzccs123 mcp sentry
mcp sentry for typescript sdk
zhuzhoulin dify mcp server
zhongmingyuan mcp my mac
zhixiaoqiang desktop image manager mcp
An MCP server for managing desktop images: viewing details, compressing, moving, and more (implemented entirely by Trae).