
Structurizer is a web application that helps you extract structured data from PDF files with Large Language Models!
Structurizer is an application for structuring data expressed in natural language. Developed by Lazar Pavicevic as a Bachelor's thesis project at HEIG-VD, it uses large language models (LLMs) to process and analyze documents, in particular invoices and receipts. As a proof of concept, it is designed to make it easier to extract and structure data from PDF documents, and it offers users both an automatic and a guided workflow.
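The extraction step can be sketched roughly as follows, assuming the application asks an LLM to return JSON for a set of user-defined fields. `FieldSpec`, `buildExtractionPrompt`, and `parseModelReply` are hypothetical names invented for this illustration, not Structurizer's code. The actual model call (provider, API, parameters) is deliberately left out.

```typescript
// Hypothetical description of one field the user wants extracted (not Structurizer's actual types).
interface FieldSpec {
  name: string;
  description: string;
  type: "string" | "number" | "date";
}

// Builds a prompt asking the model for one JSON object with the requested fields.
function buildExtractionPrompt(documentText: string, fields: FieldSpec[]): string {
  const fieldList = fields
    .map((f) => `- "${f.name}" (${f.type}): ${f.description}`)
    .join("\n");
  return [
    "Extract the following fields from the document below.",
    "Reply with a single JSON object and nothing else. Use null for any field you cannot find.",
    fieldList,
    "Document:",
    documentText,
  ].join("\n\n");
}

// Models sometimes wrap JSON in extra prose, so parse only the outermost {...} span.
function parseModelReply(reply: string, fields: FieldSpec[]): Record<string, unknown> {
  const start = reply.indexOf("{");
  const end = reply.lastIndexOf("}");
  if (start === -1 || end < start) {
    throw new Error("No JSON object found in model reply");
  }
  const parsed: unknown = JSON.parse(reply.slice(start, end + 1));
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("Model reply is not a JSON object");
  }
  const obj = parsed as Record<string, unknown>;
  // Keep only the requested fields. Absent ones become null so a reviewer can spot them.
  const result: Record<string, unknown> = {};
  for (const f of fields) {
    result[f.name] = f.name in obj ? obj[f.name] : null;
  }
  return result;
}
```

Keeping parsing separate from the model call makes the defensive part testable without network access. That part strips surrounding prose and defaults missing fields to null.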
The application exposes an API designed for smooth interaction and is built with modern tools such as TypeScript, React, and PostgreSQL. It aims to change the traditional approach to data management. It offers a range of practical features for anyone who wants to better organize and analyze sensitive data.
Document Upload: Lets users upload documents so that their data can be extracted and structured efficiently.
Guided and Automatic Structuring: Offers either a guided, manual structuring process through a pipeline or fully automatic structuring, which makes data easier to manage.
Human Data Verification: Adds a human verification step for the structured data, assisted by an LLM, to ensure the information is accurate.
Data Consultation and Statistics: Lets users browse the structured data and view statistics as charts, giving a clearer picture of the results.
Natural Language Question Answering: Lets users find specific information in the extracted data by asking questions in natural language.
Support for Multiple Object Storage: Stores PDF documents in S3-compatible object storage services such as Amazon S3 and Cloudflare R2, which leaves flexibility in the choice of hosting.
Development and Production Environment: Includes clear installation and setup instructions, and uses Docker to run a production environment locally.
Continuous Integration: Runs a CI process with protected pull requests. Only code that has passed the appropriate checks can be merged, which helps maintain the project's quality.
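The human verification step can be pictured as a set of consistency checks that decide which extracted fields a reviewer should look at. The sketch below is an illustration under stated assumptions, not Structurizer's verification logic, which also involves an LLM. The `ExtractedInvoice` shape and the `findIssues` function are hypothetical. The total check assumes the invoice total equals the sum of its line items, so it ignores taxes and discounts.

```typescript
// Hypothetical shape of an invoice after LLM extraction. null marks a field the model could not find.
interface LineItem {
  description: string;
  quantity: number;
  unitPrice: number;
}

interface ExtractedInvoice {
  vendor: string | null;
  date: string | null; // ISO 8601, e.g. "2024-03-15"
  total: number | null;
  items: LineItem[];
}

interface ReviewIssue {
  field: string;
  message: string;
}

// Compare amounts in integer cents to avoid floating-point drift (e.g. 0.1 + 0.2).
const toCents = (amount: number): number => Math.round(amount * 100);

// Returns the fields a human reviewer should check before the invoice is accepted.
function findIssues(invoice: ExtractedInvoice): ReviewIssue[] {
  const issues: ReviewIssue[] = [];
  if (invoice.vendor === null || invoice.vendor.trim() === "") {
    issues.push({ field: "vendor", message: "Vendor is missing" });
  }
  if (invoice.date === null || Number.isNaN(Date.parse(invoice.date))) {
    issues.push({ field: "date", message: "Date is missing or cannot be parsed" });
  }
  const total = invoice.total;
  if (total === null) {
    issues.push({ field: "total", message: "Total is missing" });
  } else if (invoice.items.length > 0) {
    // Assumption: total = sum of line items (no separate tax or discount lines).
    const itemsCents = invoice.items.reduce(
      (sum, item) => sum + Math.round(item.quantity * toCents(item.unitPrice)),
      0,
    );
    if (itemsCents !== toCents(total)) {
      issues.push({
        field: "total",
        message: `Line items sum to ${(itemsCents / 100).toFixed(2)}, but the total is ${total.toFixed(2)}`,
      });
    }
  }
  return issues;
}
```

An empty result means nothing needs attention. Otherwise, each issue points the reviewer at a specific field instead of the whole document.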

Next.js is a React-based web framework that enables server-side rendering, static site generation, and other powerful features for building modern web applications.
React is a widely used JavaScript library for building user interfaces and single-page applications. It follows a component-based architecture and uses a virtual DOM to efficiently update and render UI components.
TanStack is a collection of high-quality, framework-agnostic libraries including TanStack Query for data fetching, TanStack Router for routing, TanStack Table for tables, and more. These tools provide powerful, type-safe solutions for common web development challenges.
Tailwind CSS is a utility-first CSS framework that provides pre-defined classes for building responsive and customizable user interfaces.
cmdk is a fast, composable command menu component for React. It provides the foundation for building command palettes, search interfaces, and keyboard-navigable menus similar to those found in applications like VS Code, Linear, and Raycast.
shadcn/ui is a collection of beautifully designed, accessible, and customizable open-source components that you can copy and paste into your apps.
Prisma is a server-side ORM for Node.js and TypeScript. It generates a type-safe client from a schema file, letting developers read and write database data in an intuitive, efficient, and safe way.
A full-stack boilerplate is a starter application that includes both the frontend and the backend. It typically provides a database, authentication, payments, user roles, and other backend services needed to build a fully featured SaaS product or web app.
ESLint is a linter for JavaScript that analyzes code to detect and report on potential problems and errors, as well as enforce consistent code style and best practices, helping developers to write cleaner, more maintainable code.
Alpine.js is a lightweight JavaScript framework for adding dynamic, reactive behavior to web pages. Its declarative syntax, written directly in HTML attributes, offers a higher level of abstraction than vanilla JavaScript. It is often presented as a lighter, more declarative alternative to jQuery.
PostCSS is a popular open-source tool for transforming CSS with JavaScript plugins. Plugins handle tasks such as adding vendor prefixes for browser compatibility (Autoprefixer) or minifying stylesheets (cssnano), helping produce cleaner, smaller, and more maintainable CSS.
React Hook Form is a performant, flexible, and extensible form library for React with easy validation. It reduces re-renders and improves performance by using uncontrolled components and native HTML validation, making form handling simple and efficient.
Recharts is a powerful and easy-to-use React library for building customizable and interactive charts. Built on D3.js, it offers a wide range of pre-built chart types, such as line, bar, pie, and scatter charts, all of which can be composed with a declarative syntax.
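Recharts charts read their data from a `data` prop holding an array of plain objects, and each axis or series selects a field by `dataKey`. Chart statistics such as monthly spending therefore come down to building that array. The sketch below assumes a hypothetical `Receipt` record produced by extraction. `monthlyTotals` is an illustrative helper, not part of Recharts or Structurizer. The JSX in the trailing comment shows one way its output could feed a `BarChart`.

```typescript
// Hypothetical structured receipt, as it might come out of the extraction step.
interface Receipt {
  date: string; // ISO 8601 date, e.g. "2024-03-15"
  amount: number;
}

// One row per month: the flat, array-of-objects shape Recharts reads through `data` and `dataKey`.
interface MonthlyTotal {
  month: string; // "YYYY-MM"
  total: number;
}

function monthlyTotals(receipts: Receipt[]): MonthlyTotal[] {
  const byMonth = new Map<string, number>();
  for (const receipt of receipts) {
    const month = receipt.date.slice(0, 7); // "YYYY-MM" prefix of an ISO date
    byMonth.set(month, (byMonth.get(month) ?? 0) + receipt.amount);
  }
  // "YYYY-MM" keys sort chronologically as plain strings. Totals are rounded to cents.
  return Array.from(byMonth.entries())
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([month, total]) => ({ month, total: Math.round(total * 100) / 100 }));
}

// In a React component, the result could be passed to a chart such as:
// <BarChart width={600} height={300} data={monthlyTotals(receipts)}>
//   <XAxis dataKey="month" />
//   <YAxis />
//   <Bar dataKey="total" />
// </BarChart>
```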
TypeScript is a superset of JavaScript that adds optional static typing, interfaces, generics, and other features that help developers write more maintainable and scalable code. Its type checker catches many errors at compile time, making large applications easier to build and maintain.
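The sketch below makes the compile-time claim concrete by modeling a document's lifecycle as a discriminated union. `DocumentStatus` and its states are hypothetical, loosely based on the upload, extraction, and verification features listed earlier. The compiler narrows `status` in each `case`. The `assertNever` exhaustiveness check turns a forgotten case into a type error instead of a runtime bug.

```typescript
// Hypothetical lifecycle of an uploaded document, modeled as a discriminated union on `kind`.
type DocumentStatus =
  | { kind: "uploaded"; uploadedAt: Date }
  | { kind: "extracted"; fieldCount: number }
  | { kind: "verified"; reviewer: string }
  | { kind: "failed"; error: string };

// Only reachable if a case is missing. In that situation `value` is not `never`, and tsc reports an error.
function assertNever(value: never): never {
  throw new Error(`Unhandled status: ${JSON.stringify(value)}`);
}

function describeStatus(status: DocumentStatus): string {
  switch (status.kind) {
    case "uploaded":
      return `Uploaded at ${status.uploadedAt.toISOString()}`;
    case "extracted":
      // Narrowed: `status.fieldCount` is known to exist here.
      return `${status.fieldCount} fields extracted, awaiting review`;
    case "verified":
      return `Verified by ${status.reviewer}`;
    case "failed":
      return `Extraction failed: ${status.error}`;
    default:
      return assertNever(status);
  }
}

// This call would be rejected at compile time, because "extracted" requires `fieldCount`:
// describeStatus({ kind: "extracted" });
```

Suppose you add a variant such as `{ kind: "archived" }` to the union without a matching `case`. `tsc` then rejects the `assertNever(status)` call.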
Zod is a TypeScript-first schema declaration and validation library. It allows you to define schemas that can validate data at runtime while providing excellent TypeScript inference, making it perfect for API validation, form validation, and type-safe data handling.
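Zod's central idea is that one schema declaration serves both as a runtime validator and as the source of a static type. The dependency-free sketch below imitates that idea in a few lines to show the mechanism. `str`, `num`, `objectOf`, and `Infer` are hypothetical helpers written for this example, not Zod's API. Zod itself would use `z.object`, `z.string`, `z.number`, `parse` or `safeParse`, and `z.infer`, with much richer error reporting.

```typescript
// Hypothetical mini-validator (not Zod): a schema checks an unknown value and returns it typed, or throws.
type Schema<T> = (value: unknown, path: string) => T;

// Recovers the static type a schema produces, similar in spirit to Zod's z.infer.
type Infer<S> = S extends Schema<infer T> ? T : never;

const str: Schema<string> = (value, path) => {
  if (typeof value !== "string") throw new Error(`${path}: expected string`);
  return value;
};

const num: Schema<number> = (value, path) => {
  if (typeof value !== "number" || Number.isNaN(value)) throw new Error(`${path}: expected number`);
  return value;
};

// Builds an object schema from a shape of field schemas. Unknown keys are dropped from the output.
function objectOf<Shape extends Record<string, Schema<unknown>>>(
  shape: Shape,
): Schema<{ [K in keyof Shape]: Infer<Shape[K]> }> {
  return (value, path) => {
    if (typeof value !== "object" || value === null || Array.isArray(value)) {
      throw new Error(`${path}: expected object`);
    }
    const input = value as Record<string, unknown>;
    const fields: Record<string, Schema<unknown>> = shape;
    const output: Record<string, unknown> = {};
    for (const key of Object.keys(fields)) {
      const fieldSchema = fields[key];
      if (fieldSchema) output[key] = fieldSchema(input[key], `${path}.${key}`);
    }
    // The loop guarantees the shape. The compiler cannot verify that on its own, hence the cast.
    return output as unknown as { [K in keyof Shape]: Infer<Shape[K]> };
  };
}

// Usage: one declaration yields both a runtime check and a static type.
const InvoiceSchema = objectOf({ vendor: str, total: num });
type Invoice = Infer<typeof InvoiceSchema>; // { vendor: string; total: number }
```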
Zustand is a lightweight state management library for React with a simple, hook-based API. It makes it easy to create and manage global state. Components can select only the slices of state they need, so unrelated updates do not trigger re-renders. It is also quick to learn, which makes it a popular choice for developers of all skill levels.
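Zustand's model is a store that holds state alongside the functions that update it through `set`, and it notifies subscribers on every change. The sketch below reproduces that pattern without React or the library, to show the mechanism. `createMiniStore`, `SetState`, and the `DocsState` example are hypothetical stand-ins written for illustration. Zustand's real `create` returns a React hook. Its vanilla `createStore` (from `zustand/vanilla`) exposes `getState`, `setState`, and `subscribe` with a richer implementation.

```typescript
// Hypothetical, dependency-free imitation of a Zustand-style store (not the library's implementation).
type SetState<T> = (partial: Partial<T> | ((state: T) => Partial<T>)) => void;
type Listener<T> = (state: T, previous: T) => void;

interface MiniStore<T> {
  getState: () => T;
  setState: SetState<T>;
  subscribe: (listener: Listener<T>) => () => void;
}

function createMiniStore<T extends object>(
  initializer: (set: SetState<T>, get: () => T) => T,
): MiniStore<T> {
  let state: T;
  const listeners = new Set<Listener<T>>();

  const getState = (): T => state;

  const setState: SetState<T> = (partial) => {
    // Accept either a partial object or an updater function of the current state.
    const patch =
      typeof partial === "function"
        ? (partial as (current: T) => Partial<T>)(state)
        : (partial as Partial<T>);
    const previous = state;
    // Shallow-merge into a new object, so subscribers can detect changes by reference.
    state = { ...state, ...patch };
    listeners.forEach((listener) => listener(state, previous));
  };

  const subscribe = (listener: Listener<T>): (() => void) => {
    listeners.add(listener);
    return () => {
      listeners.delete(listener);
    };
  };

  // The initializer receives set/get so that actions stored in the state can update the store.
  state = initializer(setState, getState);
  return { getState, setState, subscribe };
}

// Usage: a hypothetical store tracking which document is selected in the UI.
interface DocsState {
  selectedId: string | null;
  select: (id: string) => void;
}

const docsStore = createMiniStore<DocsState>((set) => ({
  selectedId: null,
  select: (id) => set({ selectedId: id }),
}));
```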