HtmlUnit


HtmlUnit is a "GUI-Less browser for Java programs".

Overview

HtmlUnit models HTML documents and provides an API for loading pages, filling out forms, and clicking links from Java code, without opening a browser window. It supports HTTP and HTTPS, cookies, form submission, and JavaScript, and is commonly used for testing and web scraping.
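
As a rough illustration of what "modeling an HTML document" looks like in practice, the minimal sketch below loads a page and reads its title and a link. It assumes the 3.x/4.x releases, which use the org.htmlunit package names (2.x releases used com.gargoylesoftware.htmlunit). The class name, the sample markup, and the localhost URL are hypothetical. The page is served from an in-memory MockWebConnection so the sketch needs no network access. In real use you would skip that step and pass a real URL to getPage.

```java
import java.io.IOException;

import org.htmlunit.MockWebConnection;
import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlAnchor;
import org.htmlunit.html.HtmlPage;

public class PageModelSketch {

    // Hypothetical sample markup, served from memory instead of a real site.
    static final String SAMPLE_HTML =
        "<html><head><title>Sample page</title></head>"
        + "<body><a href='docs.html'>Documentation</a></body></html>";

    public static String describeSamplePage() throws IOException {
        // Every request made by the client below receives SAMPLE_HTML as its response.
        MockWebConnection connection = new MockWebConnection();
        connection.setDefaultResponse(SAMPLE_HTML);

        // WebClient is the "browser"; try-with-resources closes it when done.
        try (WebClient webClient = new WebClient()) {
            webClient.setWebConnection(connection);

            // The URL is a placeholder; the mock connection answers for any address.
            HtmlPage page = webClient.getPage("http://localhost/sample");

            // The page object exposes the parsed document: title, links, forms, and so on.
            HtmlAnchor link = page.getAnchorByText("Documentation");
            return page.getTitleText() + " | " + link.getHrefAttribute();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(describeSamplePage());
    }
}
```

Returning plain strings from the page model keeps the sketch easy to check. A test suite would typically assert on these values directly.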

Features

  • Support for HTTP and HTTPS protocols
  • Ability to handle cookies
  • Support for HTTP request methods (POST, GET, HEAD, DELETE)
  • Customizable request headers
  • Wrapper classes for HTML pages that give easy access to the information they contain
  • Form submission and link clicking support
  • Proxy server support
  • Support for basic and NTLM authentication
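
Several of the features above (page wrappers, form submission, POST requests) can be sketched together in one short example. This is an illustration under assumptions, not a reference implementation. It uses the org.htmlunit package names of the 3.x/4.x releases. The class name, form markup, field names, and localhost URLs are all hypothetical. Responses come from an in-memory MockWebConnection rather than a real server.

```java
import java.io.IOException;
import java.net.URI;

import org.htmlunit.HttpMethod;
import org.htmlunit.MockWebConnection;
import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlForm;
import org.htmlunit.html.HtmlPage;
import org.htmlunit.html.HtmlSubmitInput;
import org.htmlunit.html.HtmlTextInput;

public class FormSubmissionSketch {

    // Hypothetical URLs and markup, used only for this sketch.
    static final String FORM_URL = "http://localhost/search";
    static final String RESULT_URL = "http://localhost/results";

    static final String FORM_HTML =
        "<html><head><title>Search</title></head><body>"
        + "<form name='search' method='post' action='/results'>"
        + "<input type='text' name='q'>"
        + "<input type='submit' name='go' value='Search'>"
        + "</form></body></html>";

    static final String RESULT_HTML =
        "<html><head><title>Results</title></head><body>ok</body></html>";

    public static String submitAndDescribe(String query) throws IOException {
        // Map each URL to a canned response so no network access is needed.
        MockWebConnection connection = new MockWebConnection();
        connection.setResponse(URI.create(FORM_URL).toURL(), FORM_HTML);
        connection.setResponse(URI.create(RESULT_URL).toURL(), RESULT_HTML);

        try (WebClient webClient = new WebClient()) {
            webClient.setWebConnection(connection);

            // Load the form page and look up the form and its fields by name.
            HtmlPage formPage = webClient.getPage(FORM_URL);
            HtmlForm form = formPage.getFormByName("search");
            HtmlTextInput input = form.getInputByName("q");
            HtmlSubmitInput submit = form.getInputByName("go");

            // Type into the field and click the button, as a user would.
            input.type(query);
            HtmlPage resultPage = submit.click();

            // The mock connection records the last request it received.
            HttpMethod method = connection.getLastWebRequest().getHttpMethod();
            return method + " -> " + resultPage.getTitleText();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(submitAndDescribe("htmlunit"));
    }
}
```

Against a real site the MockWebConnection lines would be dropped. Settings such as proxy or authentication would then be configured on the WebClient before the first getPage call.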

Maven:

Add the following to your pom.xml:

<dependency>
    <groupId>org.htmlunit</groupId>
    <artifactId>htmlunit</artifactId>
    <version>4.4.0</version>
</dependency>

Gradle:

Add the following to your build.gradle:

implementation 'org.htmlunit:htmlunit:4.4.0'

Summary

HtmlUnit lets Java programs interact with web pages without a graphical browser. It supports HTTP and HTTPS, cookies, form submission, and JavaScript, which makes it well suited to testing and web scraping. Customizable request headers, authentication handling, and proxy server support make it adaptable to a range of web automation projects.