Whisper UI


A GUI for OpenAI Whisper, built with Tauri and SvelteKit

Overview:

Whiskey is a graphical user interface (GUI) for OpenAI's Whisper speech recognition system. It is built with Tauri and SvelteKit and uses C++ binaries to run Whisper. Whiskey transcribes audio or video files into written text and highlights the matching text in real time during playback. Transcriptions can be exported as .txt or .vtt files. This article covers Whiskey's key features, installation guide, and a summary of its capabilities.
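As an illustration of how text highlighting during playback can work, the sketch below looks up the transcript segment that contains the current playback time. It is not taken from Whiskey's source: the `Segment` shape and the `activeSegmentIndex` function are hypothetical names, and the code assumes segments are sorted by start time and do not overlap.

```typescript
// Hypothetical segment shape; Whiskey's actual data model may differ.
interface Segment {
  start: number; // segment start, in seconds
  end: number;   // segment end (exclusive), in seconds
  text: string;
}

// Binary search for the segment containing `time`.
// Returns the segment index, or -1 if no segment covers that time.
// Assumes `segments` is sorted by `start` and non-overlapping.
function activeSegmentIndex(segments: Segment[], time: number): number {
  let lo = 0;
  let hi = segments.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const seg = segments[mid];
    if (time < seg.start) {
      hi = mid - 1;
    } else if (time >= seg.end) {
      lo = mid + 1;
    } else {
      return mid;
    }
  }
  return -1;
}
```

In a UI, a function like this could be called from the media element's `timeupdate` event with `currentTime`, and the returned index used to style the active line.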

Features:

  • Transcribe audio or video files into written text: Whiskey allows users to convert audio or video files into written text using the Whisper speech recognition system.
  • Real-time text highlighting during playback: The GUI highlights the transcribed text in real time as the audio or video file plays.
  • Export transcriptions as .txt or .vtt files: Whiskey allows users to export the transcriptions as .txt or .vtt files, providing flexibility for further use or sharing.
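
To make the .vtt export concrete, here is a minimal sketch of serializing timed segments as WebVTT. It is an assumption about how such an export could be written, not Whiskey's actual code: the `Cue` shape and the `formatTimestamp` and `toVtt` functions are hypothetical names.

```typescript
// Hypothetical cue shape; times are in seconds.
interface Cue {
  start: number;
  end: number;
  text: string;
}

// Format seconds as a WebVTT timestamp: hh:mm:ss.ttt
function formatTimestamp(seconds: number): string {
  const pad = (n: number, width: number): string => {
    let out = String(n);
    while (out.length < width) out = "0" + out;
    return out;
  };
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(ms, 3)}`;
}

// Build a WebVTT document: a "WEBVTT" header followed by
// blank-line-separated cues, each with a timing line and its text.
function toVtt(cues: Cue[]): string {
  const blocks = cues.map(
    (c) => `${formatTimestamp(c.start)} --> ${formatTimestamp(c.end)}\n${c.text.trim()}`
  );
  return ["WEBVTT"].concat(blocks).join("\n\n") + "\n";
}
```

A plain .txt export could reuse the same cue list and join only the `text` fields with newlines.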

Planned features:

  • [x] Export files
  • [x] Rename files
  • [x] Save already opened files
  • [x] Upload file formats other than .wav
  • [x] Upload video
  • [x] Drag and drop
  • [x] Start audio playback from line
  • [ ] Record mic audio directly
  • [ ] Apple Silicon, Linux, and Windows binaries
  • [ ] Editable text
  • [ ] Show events and errors in the UI
  • [ ] Prediction accuracy

Summary:

Whiskey is a user-friendly GUI for OpenAI's Whisper speech recognition system. It is built with Tauri and SvelteKit and transcribes audio or video files into written text. The GUI highlights text in real time during playback and exports transcriptions as .txt or .vtt files. File renaming and drag and drop are already implemented, and planned features such as direct microphone recording, editable text, and binaries for Apple Silicon, Linux, and Windows aim to improve the user experience and broaden platform support. Overall, Whiskey is a capable tool for transcription tasks that is designed to be easy to install and use.

Svelte

Svelte is a modern front-end framework that compiles your code at build time, resulting in smaller and faster applications. It uses a reactive approach to update the DOM, allowing for high performance and a smoother user experience.

Vite

Vite is a build tool that aims to provide a faster and leaner development experience for modern web projects.

TypeScript

TypeScript is a superset of JavaScript, providing optional static typing, classes, interfaces, and other features that help developers write more maintainable and scalable code. TypeScript's static typing system can catch errors at compile-time, making it easier to build and maintain large applications.
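
As a small illustration of compile-time checking, the sketch below restricts an export format to the two file types mentioned earlier. The `ExportFormat` type and `exportFileName` function are hypothetical and used only for this example.

```typescript
// Hypothetical union type: only these two string literals are accepted.
type ExportFormat = "txt" | "vtt";

// Build an output file name from a base name and a supported format.
function exportFileName(baseName: string, format: ExportFormat): string {
  return `${baseName}.${format}`;
}

const fileName = exportFileName("interview", "vtt");

// The next line, if uncommented, is rejected by the compiler
// because "srt" is not a member of ExportFormat:
// exportFileName("interview", "srt");
```

The invalid call never reaches runtime, which is the kind of error static typing is meant to catch early.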