Nokogiri


HTML parser for PHP

Overview

Nokogiri is a PHP library for parsing HTML documents, including documents with invalid markup. It is built on libxml and is intended for extracting and manipulating data from web pages. Recent updates improved its handling of selectors and added new functionality while keeping compatibility with older PHP versions.

Nokogiri continues to work across a range of PHP environments, including PHP 7.3 and above, and its interface is small enough that HTML parsing stays straightforward for both new and experienced developers.

Features

  • Fast HTML Parsing: Built on LibXML, Nokogiri quickly parses HTML, even if it's malformed or contains errors.
  • Flexible Input: Accepts both raw HTML strings in UTF-8 encoding and DOMDocument objects to suit various workflows.
  • CSS Selector Support: Allows querying elements using CSS selectors, which are converted internally to XPath expressions.
  • Array Representation: The toArray() method provides an easily accessible array structure of the underlying DOM, including attributes and child elements.
  • DOM Conversion: Easily convert back to an HTML string using toXml() or obtain a DOMDocument with getDom().
  • Rich Selector Capabilities: Supports a wide range of selectors, including advanced ones such as :nth-child().
  • PHP Compatibility: Designed to work with PHP versions from 5.4 to 7.3+.
  • Updated Error Handling: The latest version throws exceptions for incorrect selectors, which makes debugging easier.
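
The mechanism behind several of these features (tolerant libxml parsing, CSS selectors translated to XPath, exceptions for bad selectors) can be sketched with PHP's bundled DOM extension alone. This is a minimal illustration and not Nokogiri's own API or implementation: the helper simpleCssToXpath() is hypothetical, handles only "tag", "tag.class", and "tag#id", and the sample markup is made up.

```php
<?php
// Sketch only: uses PHP's built-in DOM extension (libxml), not Nokogiri itself.

// Hypothetical helper: translate a tiny CSS subset to an XPath expression.
function simpleCssToXpath(string $selector): string
{
    if (!preg_match('/^([a-z][a-z0-9]*)(?:([.#])([\w-]+))?$/i', $selector, $m)) {
        // Mirrors the idea of failing loudly on an incorrect selector.
        throw new InvalidArgumentException("Unsupported selector: $selector");
    }
    $xpath = '//' . $m[1];
    if (isset($m[2])) {
        if ($m[2] === '#') {
            $xpath .= "[@id='{$m[3]}']";
        } else {
            // Match one class token inside a space-separated class attribute.
            $xpath .= "[contains(concat(' ', normalize-space(@class), ' '), ' {$m[3]} ')]";
        }
    }
    return $xpath;
}

// Malformed input: neither <p> element is closed.
$html = '<div><p class="big intro">First<p class="small">Second</div>';

// Collect libxml parse warnings internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument();
// The XML declaration prefix is a common way to make loadHTML() read the input as UTF-8.
$dom->loadHTML('<?xml encoding="UTF-8">' . $html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Run the translated selector against the repaired DOM tree.
$xpath = new DOMXPath($dom);
$nodes = $xpath->query(simpleCssToXpath('p.big'));
foreach ($nodes as $node) {
    echo $node->textContent, "\n";
}
```

A real selector engine covers far more syntax (combinators, attribute selectors, pseudo-classes such as :nth-child()), so the helper above should be read only as an outline of the CSS-to-XPath idea.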