PyTriton

[Screenshot of PyTriton]

PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments.

Overview

PyTriton is a framework that simplifies the use of NVIDIA's Triton Inference Server within Python environments. It makes it easy to deploy machine learning models built with frameworks such as PyTorch, TensorFlow, and JAX. With features like model serving, performance optimization, and streaming, PyTriton provides a versatile solution for deploying and managing models.
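As a sketch of what deployment looks like, the snippet below binds a plain NumPy function to Triton using PyTriton's `Triton.bind` API. The `AddSub` model name, the `a`/`b` input names, and the batch size are illustrative choices, not part of PyTriton itself; running `serve()` requires `pip install nvidia-pytriton` and blocks while the server handles requests.

```python
import numpy as np


def add_sub(a, b):
    # Pure-NumPy inference logic; operates on batched arrays.
    return {"add": a + b, "sub": a - b}


def serve():
    # Sketch of binding add_sub to Triton via PyTriton; requires
    # `pip install nvidia-pytriton`, so imports are kept local here.
    from pytriton.decorators import batch
    from pytriton.model_config import ModelConfig, Tensor
    from pytriton.triton import Triton

    @batch
    def infer_fn(**inputs):
        # @batch passes named input arrays as keyword arguments.
        return add_sub(inputs["a"], inputs["b"])

    with Triton() as triton:
        triton.bind(
            model_name="AddSub",  # illustrative model name
            infer_func=infer_fn,
            inputs=[
                Tensor(name="a", dtype=np.float32, shape=(-1,)),
                Tensor(name="b", dtype=np.float32, shape=(-1,)),
            ],
            outputs=[
                Tensor(name="add", dtype=np.float32, shape=(-1,)),
                Tensor(name="sub", dtype=np.float32, shape=(-1,)),
            ],
            config=ModelConfig(max_batch_size=128),
        )
        triton.serve()  # blocks, exposing HTTP/gRPC endpoints until interrupted
```

Once bound, the function is reachable over Triton's standard HTTP/gRPC inference protocol without any additional serving code.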

Features

  • Native Python support: Create Python functions and expose them as HTTP/gRPC APIs.
  • Framework-agnostic: Run Python code with any framework, such as PyTorch, TensorFlow, or JAX.
  • Performance optimization: Utilize dynamic batching, response cache, model pipelining, clusters, performance tracing, and GPU/CPU inference.
  • Decorators: Use batching decorators for handling batching and pre-processing tasks.
  • Easy installation and setup: A simple, familiar Flask/FastAPI-style interface with quick installation.
  • Model clients: Access high-level model clients for HTTP/gRPC requests with configurable options.
  • Streaming (alpha): Stream partial responses from models by serving in a decoupled mode.

Summary