Pup

Parsing HTML at the command line

Overview

Pup is a command-line tool for processing HTML. Inspired by the JSON processor jq, it parses and manipulates HTML content directly from the terminal. You can filter specific elements with CSS selectors, extract text or attribute values, and convert HTML to JSON. Because it handles complex CSS selectors, it is useful to developers and data analysts alike.

Pup aims for simplicity and speed. Installation is straightforward and the command structure is intuitive, so you can start exploring and transforming HTML quickly, whether you are cleaning up web data or retrieving one specific piece of content.

Features

  • CSS Selector Filtering: Easily filter HTML elements using CSS selectors, allowing you to pinpoint exactly what you need from the document.
  • Text Extraction: Use the text{} function to extract all text content from selected nodes and their children in a depth-first order.
  • Attribute Retrieval: The attr{attrkey} function enables you to print the values of specific attributes from selected nodes, which is great for gathering data efficiently.
  • JSON Conversion: Convert HTML output into a more consumable JSON format with the json{} function, improving interoperability with various applications.
  • Indentation Control: Customize the output format with the -i or --indent flag to control the indentation level of the formatted HTML or JSON results.
  • Pseudo Classes: Pup implements a variety of CSS pseudo-classes, allowing for more precise selections within the HTML structure.
  • Chained Selectors: Combine multiple selectors to refine your selection further, making it easier to drill down to the elements of interest.
  • Help Option: Run pup --help to see detailed usage instructions and the available options.
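
The features above can be sketched with a few commands. This is a minimal illustration, not a complete reference: it assumes pup is installed and on your PATH, and the sample HTML, its class name, and its link targets are hypothetical values made up for this example. Check pup --help for the exact options your version supports.

```shell
#!/bin/sh
# Hypothetical sample document; any HTML piped to pup on stdin works the same way.
html='<html><head><title>Example</title></head><body><div class="nav"><a href="/docs">Docs</a> <a href="/blog">Blog</a></div></body></html>'

if command -v pup >/dev/null 2>&1; then
  # CSS selector filtering: print every <a> element in the document.
  printf '%s' "$html" | pup 'a'

  # Chained selectors with text{}: text of the links inside div.nav.
  link_text=$(printf '%s' "$html" | pup 'div.nav a text{}')
  printf '%s\n' "$link_text"

  # Attribute retrieval: one href value per line.
  hrefs=$(printf '%s' "$html" | pup 'a attr{href}')
  printf '%s\n' "$hrefs"

  # Pseudo-class: select only the link whose text contains "Blog".
  printf '%s' "$html" | pup 'a:contains("Blog") attr{href}'

  # JSON conversion: emit the selected nodes as JSON.
  printf '%s' "$html" | pup 'a json{}'

  # Indentation control: format the output with a 4-space indent.
  printf '%s' "$html" | pup --indent 4 'head'
else
  echo "pup is not installed; skipping the examples" >&2
fi
```

Each command reads the HTML from standard input, so the same selectors can be used at the end of a pipeline, for example after curl -s with a URL of your choice.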