Probablepeople

A Python library for parsing unstructured Western names into name components.

Overview:

probablepeople is a Python library that uses a probabilistic NLP model to parse unstructured romanized name or company strings into labeled components. It is modeled on usaddress, a Python library for parsing addresses, and a web interface and an API are available for developers who do not use Python. The parser makes educated guesses about which part of a string is which name or corporation component; it does not guarantee accuracy, and it does not check whether a given name or company is correct or valid. Because probablepeople learns from a body of training data, users can improve it by contributing examples of names or companies that it parses poorly.
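As a rough illustration of what "parsing into components" looks like in practice, the sketch below calls the library's parse() function, which returns a list of (token, label) pairs, and groups the tokens by label. The helper names group_by_label and parse_name are hypothetical and exist only for this example, and the labels you get back depend on the trained model shipped with the installed version.

```python
from collections import defaultdict


def group_by_label(pairs):
    """Collapse (token, label) pairs into {label: "tokens joined by spaces"}."""
    grouped = defaultdict(list)
    for token, label in pairs:
        grouped[label].append(token)
    return {label: " ".join(tokens) for label, tokens in grouped.items()}


def parse_name(raw):
    """Label each token of `raw` with probablepeople, then group by label."""
    import probablepeople as pp  # third-party: pip install probablepeople

    # pp.parse() returns a list of (token, label) tuples.
    return group_by_label(pp.parse(raw))


if __name__ == "__main__":
    try:
        print(parse_name('Mr George "Gob" Bluth II'))
        print(parse_name("Sitwell Housing Inc"))
    except ImportError:
        print("probablepeople is not installed; skipping the demo")
```

Grouping by label is a convenience for display only; it merges repeated labels, so keep the raw parse() output if token order matters.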

Features:

  • Parsing Unstructured Names and Companies: probablepeople splits an unstructured romanized name or company string into tokens and labels each token with the component it represents.
  • Probabilistic Model: The library uses a probabilistic model to make educated guesses when identifying name or corporation components, including tricky cases where rule-based parsers typically break down.
  • Training Data: probablepeople learns how to parse names and companies from a body of training data. Users can improve its accuracy by contributing examples of names or companies that the parser struggles with.
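One way to find the strings the parser struggles with is sketched below, using the library's tag() function. tag() returns a pair of (components, type) and raises probablepeople.RepeatedLabelError when the same label turns up in separate parts of a string. The collect_failures helper is hypothetical, and treating that error as the signal for a hard case is an assumption of this sketch; strings that tag without error can still be labeled wrongly and need a manual check.

```python
def collect_failures(names, tag, error_type):
    """Tag each string; return (tagged, failures).

    `tag` is any callable returning (components, kind), and `error_type`
    is the exception it raises when it cannot produce a clean result.
    Both are passed in so the helper can be tried without the library.
    """
    tagged, failures = {}, []
    for raw in names:
        try:
            components, kind = tag(raw)
        except error_type:
            failures.append(raw)  # candidate example to review or contribute
        else:
            tagged[raw] = (kind, dict(components))
    return tagged, failures


if __name__ == "__main__":
    try:
        import probablepeople as pp  # third-party: pip install probablepeople
    except ImportError:
        print("probablepeople is not installed; skipping the demo")
    else:
        samples = ['Mr George "Gob" Bluth II', "Sitwell Housing Inc"]
        tagged, failures = collect_failures(samples, pp.tag, pp.RepeatedLabelError)
        for raw, (kind, components) in tagged.items():
            print(kind, components)
        print("needs review:", failures)
```

Passing the tagger and the exception class as arguments keeps the helper independent of the library, so the same loop can be reused with a stub while testing.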