Astro Provider Databricks

Orchestrate your Databricks notebooks in Airflow and execute them as Databricks Workflows

Overview

The Astronomer Databricks Provider integrates Databricks with Apache Airflow, letting users define Databricks workflows in Airflow and run them efficiently. Astronomer has since transitioned these features to the Apache Airflow repository, where they are part of the official Apache Airflow Databricks Provider. This repository is therefore deprecated and no longer actively maintained, but the functionality it introduced remains available to Airflow users upstream.

Running Databricks notebooks through this integration can reduce costs by up to 75%, which makes it an appealing option for anyone looking to optimize data processing tasks. Although Astronomer has shifted its focus to the upstream provider, this repository remains a useful reference for users exploring efficient Databricks workflows with Airflow.

Features

  • Cost Efficiency: Run notebooks as Databricks Workflows from Airflow to reduce compute costs by up to 75%.

  • Easy Migration: Moving from this deprecated provider to the official Apache Airflow Databricks Provider is straightforward and requires only a change to the import path in your code.

  • Community Support: Because the features now live in the open-source Apache Airflow repository, users can continue to contribute to and extend them there.

  • Rich Authoring Interface: Define complex workflows with code in Airflow DAGs, providing a robust alternative to the web-based UI of Databricks.

  • Open Accessibility: The provider is available to all Airflow users, including those who are not Astronomer customers.

  • Historical Significance: This repository serves as a historical record, allowing users to understand the evolution of Databricks integration within Airflow, even as it is no longer actively maintained.

  • Automated Workflows: Create and manage automated workflows that combine Airflow orchestration with Databricks execution.

This combination of features not only enhances productivity but also ensures that users have the tools they need to manage their data workflows effectively.
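
The import-path change described under Easy Migration can be sketched as a small rewrite helper. This is a minimal illustration, not part of either provider: `migrate_imports` and `UPSTREAM_MODULES` are hypothetical names, and the module paths in the mapping are assumptions about the official `apache-airflow-providers-databricks` package that should be checked against the version you have installed. The helper only handles single-line `from astro_databricks import ...` statements and leaves anything it does not recognize untouched.

```python
import re

# Assumed mapping from class name to its module in the official
# apache-airflow-providers-databricks package. Verify these paths
# against the provider version you have installed.
UPSTREAM_MODULES = {
    "DatabricksNotebookOperator": "airflow.providers.databricks.operators.databricks",
    "DatabricksWorkflowTaskGroup": "airflow.providers.databricks.operators.databricks_workflow",
}

# Matches single-line legacy imports such as:
#   from astro_databricks import DatabricksNotebookOperator, DatabricksWorkflowTaskGroup
LEGACY_IMPORT = re.compile(
    r"^from astro_databricks import (?P<names>[\w, ]+)$", re.MULTILINE
)


def migrate_imports(source: str) -> str:
    """Rewrite legacy astro_databricks imports to the assumed upstream paths."""

    def rewrite(match: re.Match) -> str:
        names = [n.strip() for n in match.group("names").split(",") if n.strip()]
        lines = []
        for name in names:
            module = UPSTREAM_MODULES.get(name)
            if module is None:
                # Unknown or aliased name: keep the legacy import unchanged
                # so nothing is silently dropped.
                lines.append(f"from astro_databricks import {name}")
            else:
                lines.append(f"from {module} import {name}")
        return "\n".join(lines)

    return LEGACY_IMPORT.sub(rewrite, source)


if __name__ == "__main__":
    legacy = (
        "from astro_databricks import "
        "DatabricksNotebookOperator, DatabricksWorkflowTaskGroup\n"
    )
    print(migrate_imports(legacy))
```

Running the helper over a DAG file's source text yields one upstream import per class. The rest of the DAG code is expected to stay the same, since the migration is described as an import-path change only.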