Guanchazhe_spider

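A crawler for the Guanchazhe (Observer News) website, built with Python + Flask + Echarts. It crawls the home page and the "more news" pages (Requests + etree + XPath), stores the news (MySQL), analyzes the text (Jieba), and visualizes the results (news word cloud, word-frequency statistics).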

Overview

The Observer News Crawler scrapes, stores, analyzes, and visualizes news content from the Observer News website. It was completed in four days as a school training assignment, by an author with no prior experience in web crawling.

The project combines web scraping, data storage, and text analysis to present news content. The author welcomes feedback, and the code may be a useful starting point for anyone exploring web scraping and visualization techniques.
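As a rough illustration of the scraping step, the sketch below parses a page with lxml's etree and an XPath query. It is not the project's own code: the function names `extract_links` and `fetch`, the default XPath expression, and the User-Agent header are assumptions made for this example, and the real site's markup will need its own XPath.

```python
from lxml import etree  # third-party: pip install lxml


def extract_links(html: str, xpath: str = "//a[@href]") -> list[dict[str, str]]:
    """Return the text and href of every element matched by `xpath`.

    The default XPath is a placeholder; a real crawler would target the
    specific containers used by the news site.
    """
    tree = etree.HTML(html)  # lenient HTML parser, tolerates unclosed tags
    items = []
    for node in tree.xpath(xpath):
        title = "".join(node.itertext()).strip()
        href = node.get("href")
        if title and href:
            items.append({"title": title, "url": href})
    return items


def fetch(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its decoded text."""
    # Imported here so the parsing half can be tried without Requests installed.
    import requests  # third-party: pip install requests

    resp = requests.get(url, timeout=timeout, headers={"User-Agent": "Mozilla/5.0"})
    resp.raise_for_status()
    resp.encoding = resp.apparent_encoding  # guess the charset from the body
    return resp.text


if __name__ == "__main__":
    # Offline demo on an inline snippet; replace with fetch(<page URL>) for real use.
    sample = (
        '<ul><li><a href="/a/1">First headline</a></li>'
        '<li><a href="/a/2"> Second </a></li>'
        "<li><a>no link</a></li></ul>"
    )
    for item in extract_links(sample):
        print(item)
```

Keeping the download and the parsing in separate functions lets the XPath be checked against saved HTML before any request is sent to the live site.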

Features

  • Web Scraping with Requests and XPath: Uses Requests to fetch pages and XPath (via etree) to parse the HTML of the home page and the "more news" pages.

  • Data Storage using MySQL: Stores the scraped news content in MySQL for later retrieval and analysis.

  • Text Analysis with Jieba: Uses Jieba for Chinese word segmentation so that news articles can be analyzed at the word level.

  • Visualization with Flask and Echarts: Uses Flask as the web framework and Echarts for interactive charts of word frequency.

  • Word Cloud Generation: Builds word clouds from the term frequencies in the scraped content, so the most frequent terms are easy to spot.

  • Rapid Development Timeline: Completed in four days as a school training assignment.

  • Open to Feedback: The author invites criticism and suggestions for improvement.
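The text-analysis and word-frequency features listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `word_frequencies` and `default_tokenizer` are hypothetical names, the rule of dropping tokens shorter than two characters is an assumption, and the `{"name", "value"}` output is simply the item shape that Echarts series commonly accept for their data.

```python
from collections import Counter
from typing import Callable, Iterable


def default_tokenizer(text: str) -> list[str]:
    """Segment Chinese text into words with Jieba."""
    # Imported here so other tokenizers can be used without Jieba installed.
    import jieba  # third-party: pip install jieba

    return jieba.lcut(text)


def word_frequencies(
    texts: Iterable[str],
    tokenizer: Callable[[str], list[str]] = default_tokenizer,
    stopwords: frozenset[str] = frozenset(),
    top_n: int = 50,
) -> list[dict]:
    """Count tokens across all texts and return the top_n as name/value items."""
    counts: Counter[str] = Counter()
    for text in texts:
        for token in tokenizer(text):
            token = token.strip()
            # Assumption: skip stopwords and one-character tokens, which are
            # mostly punctuation and particles in segmented Chinese text.
            if len(token) < 2 or token in stopwords:
                continue
            counts[token] += 1
    return [{"name": word, "value": n} for word, n in counts.most_common(top_n)]


if __name__ == "__main__":
    # Whitespace tokenizer keeps the demo runnable without Jieba.
    demo = ["economy trade economy", "trade talks economy of"]
    print(word_frequencies(demo, tokenizer=str.split, stopwords=frozenset({"of"})))
```

In a Flask app, a list like this could be returned from a route as JSON and passed to an Echarts chart as its series data; the tokenizer is a parameter so the counting logic can be exercised without Jieba installed.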