Hey Jetson


Deep-learning-based automatic speech recognition (ASR) with attention for the Nvidia Jetson.

Overview:

The Jetson-based automatic speech recognition (ASR) platform uses deep learning to recognize speech in real-time applications. Developed by Brice Walker, it is aimed primarily at giving therapists immediate feedback during sessions to support therapeutic interventions. Although the project grew out of mental health work, it also applies to other domains where cloud-based recognition is not feasible, such as mobile applications and robotics.

The platform is an applied data science project built with Keras and TensorFlow. Its model is a deep neural network that combines dilated convolutions, bidirectional recurrent layers, and an attention layer, and it is designed for use in real-world settings rather than only as a research prototype.
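One way such a network could be expressed in TensorFlow/Keras is sketched here, following the layer counts, CTC loss, and Adam optimizer listed under Features. It is a minimal sketch, not the project's code: the kernel size, dilation rates (1, 2, 4), layer widths, batch normalization, the use of `tf.keras.layers.Attention` as dot-product self-attention, the input size (161 spectrogram bins), the output size (29 symbols), and reserving index 0 for the CTC blank are all assumptions, and `build_model` and `train_step` are hypothetical names.

```python
import tensorflow as tf

layers = tf.keras.layers


def build_model(n_features=161, n_classes=29, filters=256, units=256,
                dilation_rates=(1, 2, 4), n_gru_layers=7):
    """Hypothetical sketch: 3 dilated Conv1D -> 7 BiGRU -> 1 attention -> per-frame logits.

    n_features (spectrogram bins) and n_classes (characters + CTC blank) are assumptions.
    """
    inputs = tf.keras.Input(shape=(None, n_features), name="features")  # (batch, frames, features)
    x = inputs
    # Dilated convolutions widen the temporal receptive field without downsampling;
    # "same" padding keeps the number of frames unchanged.
    for rate in dilation_rates:
        x = layers.Conv1D(filters, kernel_size=3, dilation_rate=rate,
                          padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)
    # Stacked bidirectional GRUs; return_sequences keeps one output per frame.
    for _ in range(n_gru_layers):
        x = layers.Bidirectional(layers.GRU(units, return_sequences=True))(x)
    # One dot-product self-attention layer (query and value are both the GRU outputs).
    x = layers.Attention()([x, x])
    # Unnormalized per-frame scores; tf.nn.ctc_loss applies log-softmax internally.
    logits = layers.Dense(n_classes)(x)
    return tf.keras.Model(inputs, logits)


def train_step(model, optimizer, features, labels, feature_lengths, label_lengths):
    """One optimizer update on the CTC loss.

    labels: int32 (batch, max_label_len), zero-padded; index 0 is assumed to be the blank.
    """
    with tf.GradientTape() as tape:
        logits = model(features, training=True)  # (batch, frames, n_classes)
        per_example = tf.nn.ctc_loss(
            labels=labels,
            logits=logits,
            label_length=label_lengths,
            logit_length=feature_lengths,
            logits_time_major=False,
            blank_index=0,
        )
        loss = tf.reduce_mean(per_example)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```

In use, `train_step` would be called repeatedly with `optimizer = tf.keras.optimizers.Adam()`. At inference time, the per-frame logits would still need CTC decoding (for example with `tf.nn.ctc_greedy_decoder`, which expects time-major input) to turn frames into characters.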

Features:

  • Advanced Neural Architecture: The model stacks 3 dilated convolutional layers and 7 bidirectional GRU layers, a design aimed at speech tasks.
  • Attention Mechanism: A single attention layer helps the model focus on the most relevant parts of the audio input, improving recognition accuracy.
  • Loss Function and Optimizer: Training minimizes a Connectionist Temporal Classification (CTC) loss using the Adam optimizer.
  • Training Performance: The model was trained for approximately 6.5 days on an Nvidia GTX 1070 GPU and reaches 78% cosine similarity with ground-truth transcriptions.
  • Real-Time Feedback: Designed for applications that require instant feedback, such as supporting mental health professionals during sessions.
  • REST API Integration: Includes a Flask web server that exposes the speech inference engine as a RESTful service.
  • Comprehensive Documentation: Step-by-step instructions cover downloading the project, preparing datasets, and deploying the web app.
  • Cross-Platform Potential: Tested on Ubuntu 18.04 LTS; the design leaves room for future deployment on other platforms.
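The reported accuracy is a cosine similarity with ground-truth transcriptions, but the vectorization behind that number is not specified. The following is a minimal sketch assuming bag-of-words count vectors; this is only one possible choice (character n-grams or embeddings would give different scores), and `cosine_similarity` and `mean_cosine_similarity` are hypothetical names.

```python
import math
from collections import Counter


def cosine_similarity(prediction: str, reference: str) -> float:
    """Cosine similarity between bag-of-words count vectors of two transcripts (assumed scheme)."""
    a = Counter(prediction.lower().split())
    b = Counter(reference.lower().split())
    # Dot product over the words the two transcripts share.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    # Product of the two vector norms, computed under a single square root.
    norm = math.sqrt(sum(c * c for c in a.values()) * sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def mean_cosine_similarity(pairs):
    """Average similarity over (prediction, reference) pairs, e.g. a held-out test set."""
    scores = [cosine_similarity(p, r) for p, r in pairs]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, `cosine_similarity("the cat sat", "the cat ran")` returns 2/3 (about 0.667): two shared words over norms of sqrt(3) each. Averaging over a test set gives a single percentage of the same form as the reported figure, though not necessarily computed the same way.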
Flask:

Flask is a lightweight, widely used Python web framework known for its simplicity and flexibility. It takes a minimalist approach to web development, providing routing, templating, and support for extensions. In this project it serves the speech inference engine over a REST API.
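A REST endpoint of the kind the Flask server provides can be sketched in a few lines. This is a hypothetical minimal example, not the project's server: the `/transcribe` route, sending raw audio bytes as the request body, the JSON response shape, and the `transcribe()` placeholder standing in for the inference engine are all assumptions.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def transcribe(audio_bytes: bytes) -> str:
    """Hypothetical placeholder for the speech inference engine."""
    # A real deployment would extract features and run the trained model here.
    return "placeholder transcription"


@app.route("/transcribe", methods=["POST"])
def transcribe_endpoint():
    # Raw request body, e.g. the bytes of a WAV file.
    audio = request.get_data()
    if not audio:
        return jsonify({"error": "empty request body"}), 400
    return jsonify({"transcription": transcribe(audio)})
```

If the file is saved as `app.py`, it could be started with `flask --app app run` and queried with `curl -X POST --data-binary @clip.wav -H "Content-Type: audio/wav" http://localhost:5000/transcribe`. Keeping inference behind a small HTTP route lets other devices or apps use the recognizer without linking against TensorFlow themselves.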