Overview:
The project is a GUI application that simplifies data tasks such as data cleaning, data engineering, and machine learning without requiring any code. It is aimed primarily at students and researchers who need to perform preliminary analysis on their datasets.
Features:
- Data Imports in CSV, JSON, XLSX, and HTML formats: Users can import datasets from any of these file formats.
- Null Value Imputation: The application provides several options for handling null values: fill forward, fill backward, fill median, fill mean, and drop.
- Preprocessors: Users can apply label encoding, normalization, and standardization to preprocess their datasets.
- Horizontal and Vertical Joins of Datasets: Users can join datasets horizontally or vertically based on various criteria.
- Visualizations: Users can visualize their data using line charts, bar charts, pie charts, and histograms.
- View dataset as a table: Users can view their datasets in a tabular format for easy analysis.
- Regression Algorithms supported: Linear Regression, Support Vector Machine (SVM), Decision Trees, Random Forest, and K-Nearest Neighbors (KNN) algorithms are available for regression analysis.
- Classification Algorithms supported: Logistic Regression, Support Vector Machine (SVM), Decision Trees, Random Forest, and K-Nearest Neighbors (KNN) algorithms are available for classification analysis.
- Data Exports in CSV, JSON, XLSX, and HTML formats: Users can export their processed datasets to any of these file formats.
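The application itself requires no code, and nothing here describes how it implements these features internally. For readers who want to see what the null-handling and preprocessing options do to a column of values, here is a minimal, library-free Python sketch. All function names are hypothetical and chosen only for illustration; they are not part of the application.

```python
from statistics import mean, median, pstdev

# Hypothetical helpers illustrating the options named in the feature list.
# They are NOT the application's code, only a sketch of what each option means.

def fill_forward(values):
    """Replace each None with the most recent non-null value before it.
    Leading Nones stay None because there is nothing earlier to copy."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

def fill_backward(values):
    """Replace each None with the next non-null value after it."""
    return fill_forward(values[::-1])[::-1]

def fill_statistic(values, stat):
    """Replace each None with a statistic (e.g. mean or median) of the non-null values."""
    fill = stat([v for v in values if v is not None])
    return [fill if v is None else v for v in values]

def drop_nulls(values):
    """Remove null entries entirely."""
    return [v for v in values if v is not None]

def label_encode(values):
    """Map each distinct label to an integer code, in sorted label order."""
    codes = {label: i for i, label in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]

def normalize(values):
    """Min-max scaling to the range [0, 1]; assumes the values are not all equal."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

column = [20.0, None, 40.0, 60.0]   # toy data with one missing value
ffilled = fill_forward(column)
bfilled = fill_backward(column)
mean_filled = fill_statistic(column, mean)
median_filled = fill_statistic(column, median)
dropped = drop_nulls(column)
scaled = normalize(mean_filled)
zscores = standardize(mean_filled)
encoded = label_encode(["B", "A", "B"])
```

Which option is appropriate depends on the data: forward and backward fill suit ordered data such as time series, mean and median fill suit numeric columns, and dropping discards the affected entries altogether.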
Summary:
The project is a user-friendly GUI application that lets users perform data-related tasks without writing any code. It offers data imports in several formats, null value imputation, preprocessors, dataset joins, visualizations, support for regression and classification algorithms, and data exports. Installation involves setting up Docker and executing a script. The tool is particularly useful for students and researchers who want to quickly analyze their datasets and prototype models on them.