
High-performance, CommonMark-compliant HTML-to-Markdown converter, maintained by the Kreuzberg team. Kreuzberg is a fast, polyglot document intelligence engine with a Rust core that extracts structured data from 56+ document formats using streaming parsers and built-in OCR.
The HTML to Markdown library helps developers convert HTML content into Markdown efficiently. It supports Python 3.9+ and is a complete rewrite of the original markdownify library, with a fresh codebase, strict type safety, and a range of advanced features. It handles everything from simple text snippets to complex HTML documents.
The library stands out for its extensive support for HTML5 structures and for metadata extraction, which makes it suitable for both simple web pages and intricate documentation projects. An optional lxml parser improves performance, so larger documents can be converted faster without sacrificing accuracy.
Full HTML5 Support: Handles modern HTML5 elements, including tables, forms, and SVG, and converts them into accurate Markdown representations.
Enhanced Table Support: Handles merged cells (rowspan and colspan) so that table structure is preserved as closely as Markdown allows.
Type Safety: The codebase passes strict MyPy checks and ships comprehensive type hints, making calling code safer and clearer.
Metadata Extraction: Automatically extracts document metadata, such as the title and meta tags from the document head, so the output keeps that context.
Streaming Support: Processes large documents incrementally and reports progress through callbacks, keeping memory usage low.
Highlight Support: Converts <mark> elements into highlighted text, with multiple output styles to choose from.
Task List Support: Converts HTML checkboxes into GitHub-compatible task list syntax (- [ ] for open items, - [x] for completed ones), so tasks can be tracked directly in Markdown.
Flexible Configuration: Offers more than 20 configuration options for customizing conversion behavior.
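The task-list and highlight conversions described above can be sketched with nothing but Python's standard library. This is a hypothetical illustration, not html-to-markdown's API or implementation: the `TaskHighlightSketch` class and `convert` function are made-up names, the parser only understands `<li>`, checkbox `<input>`, and `<mark>`, and it assumes the `==text==` highlight syntax as its default style.

```python
from __future__ import annotations

from html.parser import HTMLParser


class TaskHighlightSketch(HTMLParser):
    """Hypothetical mini-converter, NOT html-to-markdown's API.

    Only <li>, checkbox <input>, and <mark> are handled; nested lists and
    every other tag are ignored to keep the illustration short.
    """

    def __init__(self, highlight_marker: str = "==") -> None:
        super().__init__()
        # Assumed default: the "==text==" highlight syntax.
        self.highlight_marker = highlight_marker
        self.lines: list[str] = []
        # Text fragments of the <li> being read, or None outside a list item.
        self._item: list[str] | None = None

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        attributes = dict(attrs)
        if tag == "li":
            self._item = []
        elif self._item is None:
            return  # ignore anything outside a list item
        elif tag == "input" and attributes.get("type") == "checkbox":
            # A bare `checked` attribute (value None) still marks the task done.
            self._item.append("[x] " if "checked" in attributes else "[ ] ")
        elif tag == "mark":
            self._item.append(self.highlight_marker)

    def handle_endtag(self, tag: str) -> None:
        if self._item is None:
            return
        if tag == "mark":
            self._item.append(self.highlight_marker)
        elif tag == "li":
            # Collapse runs of whitespace, roughly as a browser would.
            text = " ".join("".join(self._item).split())
            self.lines.append(f"- {text}")
            self._item = None

    def handle_data(self, data: str) -> None:
        if self._item is not None:
            self._item.append(data)


def convert(html: str, highlight_marker: str = "==") -> str:
    """Convert <li> items (checkboxes, <mark>) into Markdown list lines (sketch)."""
    parser = TaskHighlightSketch(highlight_marker)
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)


if __name__ == "__main__":
    sample = (
        "<ul>"
        "<li><input type='checkbox' checked> Write docs</li>"
        "<li><input type='checkbox'> Review <mark>PR 42</mark></li>"
        "</ul>"
    )
    print(convert(sample))
    # - [x] Write docs
    # - [ ] Review ==PR 42==
```

Passing `highlight_marker="**"` renders highlights as bold text instead, which mirrors the idea of offering several highlight output styles.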
Together, these features make the library a robust, customizable tool for developers who need to bridge HTML content and Markdown.
