Word2vec Spam Filter


Using word vectors to classify spam messages

Overview

The word2vec spam filter was built during the 2017 Kik hackathon to classify spam messages while protecting user privacy. The client generates a "hash" of each incoming message and sends that hash to a server for classification. The server compares the hash against a bank of previously reported messages to identify spam, and accuracy improves incrementally as users report more messages.
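The hash-and-compare flow described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: it assumes the "hash" is the average of a message's word vectors and that matching uses cosine similarity, neither of which the description confirms. The `TOY_VECTORS` table, the function names, and the `threshold` value are all hypothetical, and a real deployment would load a trained word2vec model instead of a hand-written table.

```python
import math

# Hypothetical 3-dimensional word vectors, made up for illustration only.
TOY_VECTORS = {
    "free": [0.9, 0.1, 0.0],
    "prize": [0.8, 0.2, 0.1],
    "winner": [0.85, 0.15, 0.05],
    "lunch": [0.1, 0.9, 0.2],
    "tomorrow": [0.0, 0.8, 0.3],
}


def message_hash(message, vectors=TOY_VECTORS):
    """Average the vectors of all known words (assumed hashing scheme).

    Returns None when the message contains no known word.
    """
    known = [vectors[w] for w in message.lower().split() if w in vectors]
    if not known:
        return None
    dims = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dims)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_spam(message, spam_bank, threshold=0.95):
    """Flag the message if its hash is close enough to any banked hash."""
    h = message_hash(message)
    if h is None:
        return False
    return any(cosine_similarity(h, banked) >= threshold for banked in spam_bank)


# The bank holds hashes of previously reported messages, not their text.
spam_bank = [message_hash("free prize winner")]
print(is_spam("winner free prize", spam_bank))
print(is_spam("lunch tomorrow", spam_bank))
```

Because an averaged vector ignores word order, a reordered copy of a reported message lands on essentially the same hash, while an unrelated message falls well below the threshold.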

The classifier is built on word vectors and exposes configurable parameters for tuning its performance. A web client lets users test spam classification in real time, report messages, and check their classification status.

Features

  • Privacy Protection: The system classifies a hash generated from the message, so user privacy is maintained while spam status is determined.
  • Dynamic Spam Bank: A message is added to the central spam bank once it has been reported multiple times, so detection improves as reports accumulate.
  • Customizable Hyper-Parameters: Parameters such as confidence thresholds and vector sizes can be adjusted to tune the filter.
  • User-Friendly Web Client: A web interface with multiple view modes makes it easy to interact with the spam filter.
  • Interactive Testing: Users can enter a message to check whether it is classified as spam, or report it as spam to help train the filter.
  • Extensive Configurations: Configuration options control how different message content is handled, including non-English words and punctuation marks.
  • Real-Time Feedback Loop: Users can report a message as spam instantly, and each report contributes to refining the spam bank.
  • Quick Setup: A single makefile initializes the project and installs its dependencies, so developers can run and contribute to it quickly.
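The reporting and configuration features above could fit together roughly as in the sketch below. It is an assumption-laden illustration, not the project's implementation: `FilterConfig`, `SpamBank`, `normalize`, and every option name are hypothetical, and for brevity the bank is keyed on normalized text where a real system would presumably key on the message hash.

```python
import string
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class FilterConfig:
    """Hypothetical tuning knobs; the names are not taken from the project."""
    report_threshold: int = 3          # reports needed before banking a message
    strip_punctuation: bool = True     # trim punctuation around each word
    drop_non_ascii_words: bool = False # discard words with non-ASCII characters


def normalize(message, config):
    """Lowercase the message and apply the configured clean-up steps."""
    words = message.lower().split()
    if config.strip_punctuation:
        words = [w.strip(string.punctuation) for w in words]
    if config.drop_non_ascii_words:
        words = [w for w in words if w.isascii()]
    return " ".join(w for w in words if w)


@dataclass
class SpamBank:
    """Counts reports per message and banks it once the threshold is met."""
    config: FilterConfig = field(default_factory=FilterConfig)
    reports: Counter = field(default_factory=Counter)
    bank: set = field(default_factory=set)

    def report(self, message):
        """Record one report; return True if the message is now banked."""
        key = normalize(message, self.config)
        self.reports[key] += 1
        if self.reports[key] >= self.config.report_threshold:
            self.bank.add(key)
        return key in self.bank


bank = SpamBank(FilterConfig(report_threshold=2))
print(bank.report("FREE prize!!!"))  # first report: not yet banked
print(bank.report("free prize"))     # second report of the same normalized text
```

Requiring several reports before a message enters the bank keeps a single mistaken or malicious report from marking a legitimate message as spam.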
Flask

Flask is a lightweight and popular web framework for Python, known for its simplicity and flexibility. It is widely used to build web applications, providing a minimalistic approach to web development with features like routing, templates, and support for extensions.

React

React is a widely used JavaScript library for building user interfaces and single-page applications. It follows a component-based architecture and uses a virtual DOM to efficiently update and render UI components.

TypeScript

TypeScript is a superset of JavaScript, providing optional static typing, classes, interfaces, and other features that help developers write more maintainable and scalable code. TypeScript's static typing system can catch errors at compile time, making it easier to build and maintain large applications.

Webpack

Webpack is a popular open-source module bundler for JavaScript applications that bundles and optimizes the code and its dependencies for production-ready deployment. It can also be used to transform other types of assets such as CSS, images, and fonts.