
A Python module and REST API for automatic extraction of metadata from PDF files.
METEOR is a web service for extracting metadata from public reports. It accepts both PDFs and ALTO XML files, and is aimed at researchers, librarians, and anyone else who manages large volumes of publication data.
METEOR runs on Python 3.11. After setting up the environment and starting the Flask app, users can access the service locally.
Versatile Input Options: Supports metadata extraction from PDF files with a text layer and from directories containing ALTO XML files.
Key Metadata Fields: Attempts to identify ISBN, ISSN, title, publisher, publication year, language, authors, and publication type.
Integration with Norwegian Authority File: Publisher names can be cross-referenced with the Norwegian Authority File to improve the accuracy of the extracted data.
Database Building Capability: Users can build a database from the registry’s API by configuring environment variables and running a script.
PEP 8 Compliance Check: Includes a pre-commit PEP 8 compliance check that helps maintain code quality and adherence to Python coding standards.
Open Source Licenses: The project is licensed under the Apache License 2.0. Some of its dependencies are licensed under GPLv3, which users should keep in mind when reusing or modifying the code.
Local Development Support: The Flask application can be run locally in debug mode, which makes it easier to customize the service for specific requirements.
Active Community: As an open-source project, METEOR is developed collaboratively and is open to community contributions.
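
Example: The identifier fields named under Key Metadata Fields lend themselves to a small illustration. The following is a minimal sketch, not METEOR's actual implementation, of how ISBN-13 and ISSN candidates might be located in extracted text and then filtered by their standard check digits. The function names, the regular expressions, and the sample text are all assumptions made for this example.

```python
import re

# Hypothetical patterns for illustration; METEOR's own rules may differ.
# ISBN-13: prefix 978 or 979 followed by ten digits, optionally separated
# by single hyphens or spaces.
ISBN13_RE = re.compile(r"\b97[89](?:[- ]?\d){10}\b")
# ISSN: four digits, a hyphen, three digits, then a digit or X.
ISSN_RE = re.compile(r"\b\d{4}-\d{3}[\dXx]\b")


def is_valid_isbn13(candidate: str) -> bool:
    """Check the ISBN-13 check digit (weights alternate 1, 3, 1, 3, ...)."""
    digits = [int(c) for c in candidate if c.isdigit()]
    if len(digits) != 13:
        return False
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits))
    return total % 10 == 0


def is_valid_issn(candidate: str) -> bool:
    """Check the ISSN check digit (weights 8 down to 2, modulo 11)."""
    chars = candidate.replace("-", "").upper()
    if len(chars) != 8 or not chars[:7].isdigit():
        return False
    if not (chars[7].isdigit() or chars[7] == "X"):
        return False
    total = sum(int(c) * w for c, w in zip(chars[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return chars[7] == expected


def find_identifiers(text: str) -> dict:
    """Return ISBN-13 and ISSN candidates in text that pass their checksums."""
    isbns = [m.group(0) for m in ISBN13_RE.finditer(text)
             if is_valid_isbn13(m.group(0))]
    issns = [m.group(0) for m in ISSN_RE.finditer(text)
             if is_valid_issn(m.group(0))]
    return {"isbn": isbns, "issn": issns}


if __name__ == "__main__":
    sample = "ISBN 978-3-16-148410-0, ISSN 0378-5955, misprinted ISSN 1234-5678"
    print(find_identifiers(sample))
```

In the sample text, the third identifier matches the ISSN pattern but is dropped because its check digit does not match. Validating checksums in this way is one common method of reducing false positives when scanning free text for identifiers.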
