Recruit

screenshot of Recruit

recruit — a recruitment scraper plus data analysis. 1. Scraper: a distributed Scrapy crawler with MongoDB as the data store; the demo target site is 51job, and several thousand records have been crawled so far. 2. Data processing: pandas is used to clean and process the scraped data. 3. Data analysis: a Flask backend fetches the data from MongoDB; the front end uses Bootstrap 3, ECharts, and a D3 word cloud. If you like the project, please Star or Fork it. For a preview, see

Overview

The Recruit Recruitment Scraper and Data Analysis project is a system for extracting and analyzing job listing data from the web. Built on Scrapy, MongoDB, pandas, and Flask, it has already gathered several thousand job postings from 51job, making it a complete pipeline for anyone looking to mine web data for recruitment trends.
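Before analysis, scraped fields usually need normalizing; 51job, for instance, publishes salaries as free-form strings. A minimal stdlib sketch of the kind of cleaning step the pipeline might apply (the exact `1-1.5万/月` format and the `parse_salary` name are assumptions, not part of the project):

```python
import re

# Unit multipliers for 51job-style salary strings such as "1-1.5万/月"
# (the exact format is an assumption based on typical 51job listings).
UNITS = {"万": 10000, "千": 1000}

def parse_salary(text):
    """Parse a string like '1-1.5万/月' into a (low, high) monthly-CNY
    tuple, or return None for unrecognized formats such as '面议'."""
    m = re.match(r"^([\d.]+)-([\d.]+)([万千])/([月年])$", text or "")
    if not m:
        return None
    low, high, unit, period = m.groups()
    per_month = 1 if period == "月" else 1 / 12  # annual figures -> monthly
    scale = UNITS[unit] * per_month
    return (round(float(low) * scale), round(float(high) * scale))
```

Normalizing salaries into numeric ranges at crawl time keeps the later pandas and visualization stages simple.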

Features

  • Scrapy Distributed Crawler: Employs Scrapy to crawl websites in a distributed fashion, gathering job postings effectively at scale.
  • MongoDB for Storage: Utilizes MongoDB as a reliable database solution for storing massive amounts of structured data collected from various sites.
  • Data Cleaning with Pandas: Implements Pandas for thorough data cleaning and processing, ensuring that the data is accurate and meaningful.
  • Flask Backend: Integrates a Flask backend to facilitate seamless data retrieval from MongoDB, providing a robust framework for managing data interactions.
  • Front-End Visualization: Leverages Bootstrap, ECharts, and D3.js to create interactive visual representations like word clouds, making data analysis intuitive and visually appealing.
  • Dependency Management: Offers clear instructions for setting up required software versions, including a pinned PyMongo version to avoid compatibility issues.
  • User-Friendly Setup: Provides straightforward steps to initiate the scraper and visualize the data, making it accessible even for those new to data scraping and analysis.
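The pandas cleaning stage could look roughly like the sketch below; the field names and sample records are hypothetical stand-ins for the project's actual MongoDB schema:

```python
import pandas as pd

# Hypothetical sample of scraped job records (field names are assumptions).
records = [
    {"title": "Python开发", "company": "A", "city": "上海",
     "salary_low": 10000, "salary_high": 15000},
    {"title": "Python开发", "company": "A", "city": "上海",
     "salary_low": 10000, "salary_high": 15000},
    {"title": "数据分析师", "company": "B", "city": "北京",
     "salary_low": None, "salary_high": 12000},
]

df = pd.DataFrame(records)
df = df.drop_duplicates()              # scraped listings often repeat
df = df.dropna(subset=["salary_low"])  # discard rows missing a salary floor
df["salary_mid"] = (df["salary_low"] + df["salary_high"]) / 2
```

In the real project the DataFrame would be built from a PyMongo cursor rather than an in-memory list, but the cleaning operations are the same.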
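The Flask backend's role can be sketched as a small JSON API that the ECharts front end polls; here a static list stands in for the MongoDB collection, and the route name `/api/city_counts` is an assumption rather than the project's actual endpoint:

```python
from collections import Counter
from flask import Flask, jsonify

app = Flask(__name__)

# In the real project these records would come from MongoDB via PyMongo;
# a static list keeps the sketch self-contained.
JOBS = [
    {"city": "上海"}, {"city": "北京"}, {"city": "上海"},
]

@app.route("/api/city_counts")
def city_counts():
    """Aggregate postings per city, e.g. for an ECharts bar chart."""
    counts = Counter(job["city"] for job in JOBS)
    return jsonify(counts)
```

The front-end chart then only needs to fetch this endpoint and feed the JSON object into an ECharts series; the same pattern serves word-frequency data to the D3 word cloud.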