Htmldate

Fast and robust date extraction from web pages, with Python or on the command line

Overview

Htmldate is a Python package that finds the original and updated publication dates of web pages. It accepts several kinds of input, lets users customize the output format, supports multiple languages, and runs on recent Python versions. To locate dates, it applies heuristics to both the HTML markup and the text of a page.
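The heuristic idea can be illustrated with a deliberately simplified sketch that uses only the standard library. It is not htmldate's implementation: the function `sketch_find_date`, the class `MetaDateCollector`, and the short lists of meta keys are hypothetical, and the real package inspects many more markup patterns and date formats. The sketch checks header `<meta>` tags first, preferring the published or the modified field depending on whether the original date is requested, and otherwise falls back to the first YYYY-MM-DD string in the page text.

```python
import re
from datetime import datetime
from html.parser import HTMLParser

# Hypothetical, shortened lists of meta keys (htmldate's own lists are longer).
PUBLISHED_KEYS = {"article:published_time", "datepublished", "dc.date"}
MODIFIED_KEYS = {"article:modified_time", "datemodified", "og:updated_time"}

# A YYYY-MM-DD sequence not embedded in a longer run of digits;
# it also matches the date part of ISO timestamps like "2017-09-01T08:00".
ISO_PATTERN = re.compile(r"(?<!\d)(\d{4})-(\d{2})-(\d{2})(?!\d)")


class MetaDateCollector(HTMLParser):
    """Collects candidate dates from <meta> tags and keeps the raw text."""

    def __init__(self):
        super().__init__()
        self.published = None
        self.modified = None
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        key = (attributes.get("property") or attributes.get("name") or "").lower()
        content = attributes.get("content")
        if not content:
            return
        # Keep the first value seen for each kind of date.
        if key in PUBLISHED_KEYS and self.published is None:
            self.published = content
        elif key in MODIFIED_KEYS and self.modified is None:
            self.modified = content

    def handle_data(self, data):
        self.text_parts.append(data)


def parse_iso_day(value):
    """Return a datetime for the first valid YYYY-MM-DD in value, else None."""
    match = ISO_PATTERN.search(value or "")
    if match is None:
        return None
    try:
        return datetime(*(int(part) for part in match.groups()))
    except ValueError:  # e.g. month 13 or day 40
        return None


def sketch_find_date(html, original=False, outputformat="%Y-%m-%d"):
    """Hypothetical helper: header metadata first, then a text fallback."""
    collector = MetaDateCollector()
    collector.feed(html)
    # Prefer the header field matching the requested date type.
    if original:
        candidates = [collector.published, collector.modified]
    else:
        candidates = [collector.modified, collector.published]
    for candidate in candidates:
        parsed = parse_iso_day(candidate)
        if parsed is not None:
            return parsed.strftime(outputformat)
    # Fallback: scan the visible text of the page.
    parsed = parse_iso_day(" ".join(collector.text_parts))
    return parsed.strftime(outputformat) if parsed is not None else None
```

Checking metadata before free text reflects a common trade-off: structured header fields tend to be more reliable than dates scattered through the body, such as comment timestamps.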

Features

  • Flexible Input: Accepts URLs, HTML files, or HTML trees as input, with support for batch processing.
  • Customizable Output: The date format is customizable and defaults to ISO 8601 (YYYY-MM-DD).
  • Detection of Dates: Identifies both original and updated dates on web pages.
  • Multilingual: Supports multiple languages for date identification.
  • Compatibility: Works with all recent versions of Python.