Fake Job Posting Prediction

Fake Job Predictions using Topic Modeling and Classification

Overview:

This project aims to identify fake job postings in a given list of jobs. The dataset consists of 17,880 job postings, one per row, with features such as title, company profile, description, requirements, and benefits. It also includes a column indicating whether each posting is fraudulent or real.

Features:

  • Topic Modeling: Uses Latent Dirichlet Allocation (LDA) to determine the number of topics and to generate topic probabilities for each row.
  • Classification Models: Uses classification models to distinguish fake job postings from real ones with high accuracy.
  • Data Cleaning: Replaces null values with the string "missing", separates country, state, and city from the location column, drops non-English text entries, and cleans text columns by removing URLs, non-ASCII characters, punctuation, and extra whitespace.
  • Data Engineering: Redefines education bins, drops the salary column, and treats missing values as valid observations.
  • Exploratory Data Analysis (EDA): Explores the dataset to surface patterns and insights.
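
The data cleaning steps listed above could look roughly like the sketch below. It is illustrative only and not the project's actual code: the function names `clean_text` and `split_location`, the URL pattern, and the assumption that locations are comma-separated as "country, state, city" are all hypothetical. Inputs are assumed to be plain strings or None.

```python
import re
import string

# Hypothetical URL pattern; the project may use a different one.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Map every ASCII punctuation character to a space.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_text(text):
    """Clean one text field; None or blank input becomes "missing"."""
    if text is None or not str(text).strip():
        return "missing"
    text = URL_RE.sub(" ", str(text))                         # remove URLs
    text = text.encode("ascii", errors="ignore").decode("ascii")  # drop non-ASCII
    text = text.translate(PUNCT_TABLE)                        # remove punctuation
    return " ".join(text.split())                             # collapse whitespace


def split_location(location):
    """Split an assumed "country, state, city" string into a 3-tuple."""
    if not location or location == "missing":
        parts = []
    else:
        # Split at most twice so commas inside the city part are preserved.
        parts = [p.strip() for p in location.split(",", 2)]
    parts = [p or "missing" for p in parts]
    parts += ["missing"] * (3 - len(parts))                   # pad short inputs
    return tuple(parts)


print(clean_text("Visit https://example.com/jobs, apply   now!"))
print(split_location("US, NY, New York"))
```

Filling blanks with "missing" instead of dropping rows keeps absent fields available as a signal, in line with treating missing values as valid observations.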

Summary:

Overall, this project uses topic modeling and classification models to distinguish fake job postings from real ones. The dataset contains various features, including text and numeric fields, and is cleaned and preprocessed before analysis. The project also includes exploratory data analysis to gain insights into the data. The installation guide provides step-by-step instructions for setting up and running the project.
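
As a rough illustration of how topic probabilities can feed a classifier, here is a minimal sketch using scikit-learn. The toy texts, the labels, the choice of two topics, and the use of logistic regression are all assumptions made for the example; the project itself does not specify which classifiers it uses or how many topics it settles on.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy stand-ins for cleaned job descriptions (hypothetical data).
descriptions = [
    "software engineer python backend team benefits",
    "data analyst sql reporting dashboard team",
    "earn money fast work from home no experience",
    "quick cash wire transfer work from home",
    "nurse hospital patient care shift benefits",
    "easy money send bank details no interview",
]
# 1 = fraudulent, 0 = real (hypothetical labels).
labels = [0, 0, 1, 1, 0, 1]

# LDA works on word counts, so build a bag-of-words matrix first.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(descriptions)

# n_components is the number of topics; 2 is a placeholder here.
# In practice it would be chosen by comparing several values.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_probs = lda.fit_transform(counts)  # one topic distribution per row

# Use the per-row topic probabilities as features for a classifier.
clf = LogisticRegression(class_weight="balanced")
clf.fit(topic_probs, labels)
fraud_scores = clf.predict_proba(topic_probs)[:, 1]

print(topic_probs.shape)
print(fraud_scores.round(2))
```

On real data the classifier would be evaluated on a held-out split. If fraudulent postings are a small minority of the rows, precision and recall on the fraudulent class tend to be more informative than raw accuracy.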