
Crawl-MCP: Unofficial MCP Server for crawl4ai

⚠️ Important: This is an unofficial MCP server implementation for the excellent crawl4ai library.
Not affiliated with the original crawl4ai project.

A comprehensive Model Context Protocol (MCP) server that wraps the powerful crawl4ai library with advanced AI capabilities. Extract and analyze content from any source: web pages, PDFs, Office documents, YouTube videos, and more. Features intelligent summarization to dramatically reduce token usage while preserving key information.

🌟 Key Features

  • 🔍 Google Search Integration: 7 optimized search genres using Google's official search operators
  • 🔍 Advanced Web Crawling: JavaScript support, deep site mapping, entity extraction
  • 🌐 Universal Content Extraction: Web pages, PDFs, Word docs, Excel, PowerPoint, ZIP archives
  • 🤖 AI-Powered Summarization: Smart token reduction (up to 88.5%) while preserving essential information
  • 🎬 YouTube Integration: Extract video transcripts and summaries without API keys
  • ⚡ Production Ready: 21 specialized tools with comprehensive error handling

🚀 Quick Start

Prerequisites (Required First)

Install system dependencies for Playwright:

Linux/macOS:

sudo bash scripts/prepare_for_uvx_playwright.sh

Windows (as Administrator):

scripts/prepare_for_uvx_playwright.ps1

Installation

UVX (Recommended - Easiest):

# After system preparation above - that's it!
uvx --from git+https://github.com/walksoda/crawl-mcp crawl-mcp

Claude Desktop Setup

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "crawl-mcp": {
      "transport": "stdio",
      "command": "uvx",
      "args": [
        "--from",
        "git+https://github.com/walksoda/crawl-mcp",
        "crawl-mcp"
      ],
      "env": {
        "CRAWL4AI_LANG": "en"
      }
    }
  }
}

For Japanese interface:

"env": {
  "CRAWL4AI_LANG": "ja"
}

📖 Documentation

  • Installation Guide: Complete installation instructions for all platforms
  • API Reference: Full tool documentation and usage examples
  • Configuration Examples: Platform-specific setup configurations
  • HTTP Integration: HTTP API access and integration methods
  • Advanced Usage: Power user techniques and workflows
  • Development Guide: Contributing and development setup

Language-Specific Documentation

🛠️ Tool Overview

Web Crawling

  • crawl_url - Single page crawling with JavaScript support
  • deep_crawl_site - Multi-page site mapping and exploration
  • crawl_url_with_fallback - Robust crawling with retry strategies
  • batch_crawl - Process multiple URLs simultaneously
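Like all MCP tools, these are invoked through the standard `tools/call` JSON-RPC request that an MCP client (such as Claude Desktop) sends over the stdio transport. Clients handle this automatically; the sketch below only illustrates the message shape, using the `crawl_url` tool name and the `wait_for_js` option mentioned in this README (the URL and request `id` are placeholder values):

```python
import json

# Build the JSON-RPC 2.0 request an MCP client sends to invoke a tool.
# "tools/call" is the standard MCP method; "crawl_url" and its
# "wait_for_js" argument come from this server's tool list.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "crawl_url",
        "arguments": {
            "url": "https://example.com",
            "wait_for_js": True,  # recommended for JavaScript-heavy sites
        },
    },
}

# The message is serialized before being written to the server's stdin.
payload = json.dumps(request)
print(payload)
```

The same envelope applies to every tool above; only `params.name` and `params.arguments` change.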

AI-Powered Analysis

  • intelligent_extract - Semantic content extraction with custom instructions
  • auto_summarize - LLM-based summarization for large content
  • extract_entities - Pattern-based entity extraction (emails, phones, URLs, etc.)

Media Processing

  • process_file - Convert PDFs, Office docs, ZIP archives to markdown
  • extract_youtube_transcript - Multi-language transcript extraction
  • batch_extract_youtube_transcripts - Process multiple videos

Search Integration

  • search_google - Genre-filtered Google search with metadata
  • search_and_crawl - Combined search and content extraction
  • batch_search_google - Multiple search queries with analysis

🎯 Common Use Cases

Content Research:

search_and_crawl → intelligent_extract → structured analysis

Documentation Mining:

deep_crawl_site → batch processing → comprehensive extraction

Media Analysis:

extract_youtube_transcript → auto_summarize → insight generation

Competitive Intelligence:

batch_crawl → extract_entities → comparative analysis

🚨 Quick Troubleshooting

Installation Issues:

  1. Run system diagnostics with the get_system_diagnostics tool
  2. Re-run the setup scripts with the proper privileges
  3. Try the development installation method

Performance Issues:

  • Use wait_for_js: true for JavaScript-heavy sites
  • Increase timeout for slow-loading pages
  • Enable auto_summarize for large content

Configuration Issues:

  • Check JSON syntax in claude_desktop_config.json
  • Verify file paths are absolute
  • Restart Claude Desktop after configuration changes
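A quick way to check the JSON syntax before restarting Claude Desktop is Python's built-in `json` module. A minimal sketch (the config file's location varies by platform, so the path here is a placeholder):

```python
import json
import sys

def check_config(path):
    """Report whether a claude_desktop_config.json file parses as valid JSON."""
    try:
        with open(path, encoding="utf-8") as f:
            config = json.load(f)
    except OSError as err:
        print(f"Cannot read {path}: {err}")
        return False
    except json.JSONDecodeError as err:
        print(f"JSON syntax error at line {err.lineno}, column {err.colno}: {err.msg}")
        return False
    # A config for this server should define servers under "mcpServers".
    if "mcpServers" not in config:
        print('Warning: valid JSON, but no "mcpServers" key found')
    return True

if __name__ == "__main__":
    check_config(sys.argv[1] if len(sys.argv) > 1 else "claude_desktop_config.json")
```

Run it against your config file; a syntax error report pinpoints the line to fix, which is usually a trailing comma or a missing quote.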

🏗️ Project Structure

  • Original Library: crawl4ai by unclecode
  • MCP Wrapper: This repository (walksoda)
  • Implementation: Unofficial third-party integration

📄 License

This project is an unofficial wrapper around the crawl4ai library. Please refer to the original crawl4ai license for the underlying functionality.

🤝 Contributing

See our Development Guide for contribution guidelines and development setup instructions.
