Deepcrawl

A 100% free, fully open-source Firecrawl alternative that runs on the edge, offers better link extraction for agents, and can be self-hosted on Cloudflare or Vercel.

Overview

Deepcrawl is an open-source, high-performance alternative to traditional web crawling platforms for extracting data from websites. It is designed for efficiently scraping public web pages and returns their content as cleaned Markdown, which is easier to process and analyze. Note that the tool is still in early development, so use it with caution in production environments.

Deepcrawl prioritizes flexibility and performance, targeting developers and data scientists who run high-frequency scraping workloads. By returning well-structured data in a convenient format, it aims to reduce context switching and lower the rate of hallucinations when the content is passed to LLMs.

Features

  • Open Source: Completely free to use, with source code open to community contributions.
  • High Performance: Optimized for high-frequency agent workloads, ensuring efficient extraction of large volumes of data from public web pages.
  • Cleaned Markdown Output: Converts extracted content into clean Markdown that downstream tools and LLMs can process easily.
  • Hierarchical Links Tree: Generates a structured links tree that helps users navigate and analyze the relationships between pages effectively.
  • Minimal Token Cost: Reduces the number of tokens needed to represent page content, which keeps LLM context windows small and lowers processing cost.
  • Comprehensive Dashboard: Ships as a full platform, including a Next.js dashboard, API workers, auth workers, and a database, giving users a complete toolkit for their web scraping needs.
  • Active Development: The project is under rapid development, so users can expect ongoing updates and enhancements based on community feedback.
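
The README does not describe how the hierarchical links tree is structured, so the following is only a minimal sketch, assuming the tree is built by grouping same-origin URLs by their path segments. The `LinkNode` type and the `buildLinksTree` function are hypothetical and do not reflect Deepcrawl's actual output format.

```typescript
// Hypothetical node shape: one node per path segment.
interface LinkNode {
  segment: string;
  url?: string; // set only when this exact path was discovered as a link
  children: LinkNode[];
}

// Hypothetical helper: groups links found on a site into a tree by path segment.
function buildLinksTree(rootUrl: string, links: string[]): LinkNode {
  const origin = new URL(rootUrl).origin;
  const root: LinkNode = { segment: "/", url: `${origin}/`, children: [] };

  for (const link of links) {
    let parsed: URL;
    try {
      // Resolve relative links (e.g. "/docs") against the root URL.
      parsed = new URL(link, rootUrl);
    } catch {
      continue; // skip malformed links
    }
    // Keep only same-origin links; external links are dropped in this sketch.
    if (parsed.origin !== origin) continue;

    const segments = parsed.pathname.split("/").filter((s) => s.length > 0);
    let node = root;
    for (const segment of segments) {
      let child = node.children.find((c) => c.segment === segment);
      if (!child) {
        child = { segment, children: [] };
        node.children.push(child);
      }
      node = child;
    }
    // Record the canonical URL without query string or fragment.
    node.url = parsed.origin + parsed.pathname;
  }
  return root;
}
```

Intermediate nodes such as `blog` in `/blog/post-1` appear in the tree even when no page at `/blog` was linked, which keeps the site's structure visible to an agent at a low token cost.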
Hono

Hono is an ultrafast web framework designed for edge computing environments. It's lightweight, supports multiple runtimes including Cloudflare Workers, Deno, and Bun, and provides a familiar Express-like API with excellent TypeScript support.
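
To illustrate the Express-like API, here is a minimal Hono app of the kind that can be deployed to Cloudflare Workers. The `/health` and `/echo` routes are hypothetical examples, not part of Deepcrawl's API.

```typescript
import { Hono } from "hono";

const app = new Hono();

// Hypothetical plain-text health check route.
app.get("/health", (c) => c.text("ok"));

// Hypothetical route that echoes the `url` query parameter back as JSON.
app.get("/echo", (c) => {
  const url = c.req.query("url") ?? "";
  return c.json({ url });
});

// On Cloudflare Workers, the default export's fetch handler serves requests.
export default app;
```

Hono's `app.request()` helper can exercise these routes in tests without starting a server.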

Next.js

Next.js is a React-based web framework that enables server-side rendering, static site generation, and other powerful features for building modern web applications.

Shadcn UI

A collection of beautifully designed, accessible, and customizable open-source components that you can copy and paste into your apps.

TypeScript

TypeScript is a typed superset of JavaScript that adds optional static typing, interfaces, and other features that help developers write more maintainable and scalable code. Its static type checker catches many errors at compile time, which makes large applications easier to build and maintain.

Zod

Zod is a TypeScript-first schema declaration and validation library. It lets you define schemas that validate data at runtime while inferring static TypeScript types from them, which makes it well suited to API validation, form validation, and type-safe data handling.
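
As an example of runtime validation with an inferred type, the sketch below validates a request body for a scrape endpoint. The `ScrapeRequestSchema` name and its `url` and `maxDepth` fields are hypothetical, chosen for illustration, and are not Deepcrawl's real request format.

```typescript
import { z } from "zod";

// Hypothetical schema: a required URL and an optional crawl depth (0 to 5, default 1).
const ScrapeRequestSchema = z.object({
  url: z.string().url(),
  maxDepth: z.number().int().min(0).max(5).default(1),
});

// Static type derived from the schema, so the two never drift apart.
type ScrapeRequest = z.infer<typeof ScrapeRequestSchema>;

// Returns the parsed request, or null if the input does not match the schema.
function parseScrapeRequest(input: unknown): ScrapeRequest | null {
  const result = ScrapeRequestSchema.safeParse(input);
  return result.success ? result.data : null;
}
```

`safeParse` returns a result object instead of throwing, which suits request handlers that want to answer invalid input with an error response rather than an exception.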