Real Time Twitter Sentiment Analysis

This repository contains a Big Data project: real-time Twitter sentiment analysis using Kafka, Spark Streaming, MongoDB, and a Django dashboard.

Overview

This project performs real-time sentiment analysis on tweets. Apache Kafka handles real-time data ingestion, Apache Spark handles processing, MongoDB handles storage, and Django serves an interactive dashboard. Tweets are classified into three sentiment categories (positive, negative, and neutral) using natural language processing.

This repository documents the architecture of the system and serves as a guide for anyone setting up a similar project. It includes both training and validation datasets, so the model can be built and then tested.

Features

  • Real-time Data Ingestion: Collects live tweets from Twitter using Kafka, ensuring that sentiment analysis reflects current trends.
  • Stream Processing: Utilizes Spark Streaming to process and analyze incoming data in real-time, allowing for immediate insights.
  • Sentiment Analysis: Employs natural language processing techniques to classify tweets into three sentiment categories: positive, negative, or neutral.
  • Data Storage: Uses MongoDB for efficient storage and persistence of processed sentiment analysis results.
  • Visualization: Features a real-time dashboard built with Django, showcasing sentiment trends and insights in a user-friendly manner.
  • Comprehensive Dataset: Includes 74,682 tweets for training and 998 for validation.
  • Repository Structure: Organized into distinct folders for the Django dashboard, Kafka provider, and PySpark model, facilitating easy navigation and understanding.
  • Getting Started Guide: Provides clear installation instructions for necessary tools like Docker, Python, Kafka, Spark, and MongoDB, making it accessible for developers.
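
The flow described in the features above (ingest, classify, store, summarize) can be sketched with standard-library stand-ins. This is a minimal illustration, not the repository's code: a `queue.Queue` stands in for the Kafka topic, a plain list stands in for the MongoDB collection, and a small word-list lookup stands in for the trained model. All function names, the word lists, and the document fields (`id`, `text`, `sentiment`) are assumptions made for this example.

```python
import json
from collections import Counter
from queue import Queue

# Hypothetical word lists; the real project would use a trained model instead.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "sad"}


def classify(text: str) -> str:
    """Label text as positive, negative, or neutral by counting word-list hits."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def produce(topic: Queue, tweets: list) -> None:
    """Serialize each tweet to JSON bytes, the way a message would be sent to a topic."""
    for tweet in tweets:
        topic.put(json.dumps(tweet).encode("utf-8"))


def consume(topic: Queue, collection: list) -> None:
    """Drain the queue, classify each tweet, and append a result document."""
    while not topic.empty():
        tweet = json.loads(topic.get().decode("utf-8"))
        collection.append(
            {
                "id": tweet["id"],
                "text": tweet["text"],
                "sentiment": classify(tweet["text"]),
            }
        )


def sentiment_counts(collection: list) -> Counter:
    """Aggregate stored documents into per-label counts for a dashboard view."""
    return Counter(doc["sentiment"] for doc in collection)


if __name__ == "__main__":
    topic, collection = Queue(), []
    produce(
        topic,
        [
            {"id": 1, "text": "I love this phone!"},
            {"id": 2, "text": "Terrible service, very sad."},
            {"id": 3, "text": "The meeting is at noon."},
        ],
    )
    consume(topic, collection)
    print(dict(sentiment_counts(collection)))
```

In a real deployment each stand-in would be replaced by its counterpart: a Kafka producer and consumer for the queue, a Spark Streaming job for `consume`, and a MongoDB collection for the list. Keeping the classification step as a plain function makes it easy to test separately from the streaming infrastructure.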
Django

Django is a high-level Python web framework that encourages rapid development and clean, pragmatic design. It follows the model-template-view (MTV) pattern, its own variant of model-view-controller (MVC), and provides an extensive set of built-in tools and conventions for building robust, scalable web applications.

Docker

Docker packages an application and its dependencies into containers, which streamlines development, testing, and deployment workflows. It supports automated builds and deployments and, together with container orchestration, helps keep a system scalable and available.