Usaddress

A Python library for parsing unstructured United States address strings into address components.

Overview

usaddress is a Python library that uses statistical NLP methods to parse unstructured United States address strings into address components. It makes educated guesses about which label each part of an address should receive, even in tricky cases where rule-based parsers typically fail. It cannot guarantee that every component is identified correctly, and it does not verify that an address is valid. It also does not normalize addresses, although a separate library built on top of usaddress provides that functionality.
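
As a rough illustration of what "parsing into components" looks like in practice, the sketch below assumes the package is installed (pip install usaddress) and relies on usaddress.parse, which returns a list of (token, label) pairs. The helper names merge_labeled_tokens and parse_address are hypothetical and not part of the library; the library's own usaddress.tag performs a similar merge and additionally raises an error when a label repeats in non-adjacent positions, which this simplified version does not check.

```python
from collections import OrderedDict

try:
    import usaddress  # third-party package; assumed installed via pip
except ImportError:
    usaddress = None


def merge_labeled_tokens(pairs):
    """Group (token, label) pairs by label, preserving first-seen order.

    Hypothetical helper: tokens that share a label are joined with a
    space, and stray commas/semicolons at the edges are stripped.
    """
    grouped = OrderedDict()
    for token, label in pairs:
        grouped.setdefault(label, []).append(token)
    return OrderedDict(
        (label, " ".join(tokens).strip(" ,;")) for label, tokens in grouped.items()
    )


def parse_address(address):
    """Label an address with usaddress.parse, then merge tokens per label."""
    if usaddress is None:
        raise RuntimeError("usaddress is not installed")
    return merge_labeled_tokens(usaddress.parse(address))
```

Calling parse_address("123 Main St. Suite 100 Chicago, IL") would return an ordered mapping from label names (such as AddressNumber or StreetName) to the matching text. Because the model is probabilistic, the labels are best guesses and should be spot-checked on real data.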

Features

  • Probabilistic Model: The library uses a probabilistic model (conditional random fields) to make educated guesses when assigning labels to the parts of an address.
  • Parserator API: A RESTful API built on top of usaddress, which can be used by programmers who do not use Python. It requires an API key, and the first 1,000 parses are free.
  • Parserator Google Sheets App: A Google Sheets app called "Parse and Split Addresses" that splits addresses into separate columns for street, city, state, ZIP code, and more.
  • Training and Testing: The library provides tools for training and improving the usaddress parser's model on labeled training data. It also includes a testing suite to ensure the proper functioning of the code.