Nextcrawler

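Next Crawler is a web data collector built with mainstream technologies such as Playwright, Next.js, and Prisma. Once configured through its visual UI, it periodically crawls web page data by driving a browser with Playwright.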

Overview

Next Crawler collects web data through tasks that are configured in a visual UI rather than in code. It is built on Playwright, Next.js, and Prisma, and can scrape articles, images, or comments from websites on a regular schedule.

Repeating tasks and file downloads in several formats make it suitable for gathering data for research, analysis, or personal projects.

Features

  • User-Friendly Interface: Crawls are configured in a visual UI, so no code is required to set one up.
  • Smart Content Recognition: Built-in content detection based on the mozilla/readability library extracts the main text of a page and skips the surrounding clutter.
  • Flexible File Downloads: Files can be downloaded in multiple formats, such as PDF, MP3, and MP4.
  • Custom Multi-Field Parsing: Parsing rules can be configured for multiple fields, so each task extracts exactly the data it needs.
  • Scheduled Tasks: Cron-based scheduling runs a crawl automatically at regular intervals, keeping the collected data up to date.
  • Import/Export Functionality: Crawl configurations are stored as templates that can be imported and exported.
  • Proxy Support & Error Logging: Requests can be routed through a proxy, and detailed error logs help with troubleshooting.
  • Persistent Browser Context: A persistent browser session carries state over between runs, so a crawl behaves more like a real user's browser.
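
To make the template, multi-field parsing, and cron features above more concrete, here is a minimal TypeScript sketch of what an exportable crawl template could look like. Every name in it (`FieldRule`, `CrawlTemplate`, `looksLikeCron`, `exportTemplate`, `importTemplate`, and the example URL) is hypothetical and chosen for illustration; it is not Next Crawler's actual schema or API.

```typescript
// Hypothetical shapes for illustration only; not Next Crawler's real schema.
interface FieldRule {
  name: string;       // output field name, e.g. "title"
  selector: string;   // CSS selector that locates the value on the page
  attribute?: string; // optional attribute to read instead of the text content
}

interface CrawlTemplate {
  name: string;
  startUrl: string;
  cron: string;               // five-field cron expression, e.g. "0 */6 * * *"
  fields: FieldRule[];        // one rule per extracted field
  proxy?: string;             // optional proxy URL
  persistentContext: boolean; // reuse one browser profile between runs
}

// Loose sanity check: exactly five whitespace-separated fields built from
// digits and the characters * / , - only. It does not validate value ranges.
function looksLikeCron(expr: string): boolean {
  const parts = expr.trim().split(/\s+/);
  return parts.length === 5 && parts.every((p) => /^[\d*\/,\-]+$/.test(p));
}

// Export: serialize a template to a JSON string.
function exportTemplate(template: CrawlTemplate): string {
  return JSON.stringify(template, null, 2);
}

// Import: parse a JSON string and reject obviously malformed templates.
function importTemplate(json: string): CrawlTemplate {
  const t = JSON.parse(json) as CrawlTemplate;
  if (!t.name || !t.startUrl || !Array.isArray(t.fields)) {
    throw new Error("template is missing required properties");
  }
  if (typeof t.cron !== "string" || !looksLikeCron(t.cron)) {
    throw new Error("template has an invalid cron expression");
  }
  return t;
}

// Example: a template that extracts two fields every six hours.
const sample: CrawlTemplate = {
  name: "blog-articles",
  startUrl: "https://example.com/blog",
  cron: "0 */6 * * *",
  fields: [
    { name: "title", selector: "h1" },
    { name: "cover", selector: "article img", attribute: "src" },
  ],
  persistentContext: true,
};

const restored = importTemplate(exportTemplate(sample));
console.log(restored.fields.map((f) => f.name).join(", "));
```

Keeping the template as plain JSON is one simple way to make import and export symmetric; a real implementation would likely validate far more than this sketch does.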

Next.js

Next.js is a React-based web framework that enables server-side rendering, static site generation, and other powerful features for building modern web applications.

React

React is a widely used JavaScript library for building user interfaces and single-page applications. It follows a component-based architecture and uses a virtual DOM to update and render UI components efficiently.

Tailwind

Tailwind CSS is a utility-first CSS framework that provides pre-defined classes for building responsive and customizable user interfaces.

Shadcn UI

Shadcn UI is a collection of accessible, customizable, open-source components that you can copy and paste into your apps.

Prisma

Prisma is a server-side ORM library that helps developers read and write database data in an intuitive, efficient, and safe way.

Fullstack

A fullstack boilerplate provides a starter application that includes both a frontend and a backend. It typically includes a database, authentication, payments, user roles, and other backend services needed to build a full-featured SaaS product or web app.

ESLint

ESLint is a linter for JavaScript that analyzes code to detect and report potential problems and errors. It also enforces a consistent code style and best practices, which helps developers write cleaner, more maintainable code.