Feature: Add Audio Input for Generating Q&A #82
@IronJam11 could you provide some more details about the implementation? Also, please improve the PR message.
Resolves Issue #34: Audio Transcription Functionality
What does this PR do?
This PR addresses and resolves issues with the audio transcription functionality by implementing a robust Flask-based transcription service. It supports multiple audio/video formats, enhances security, and optimizes processing.
Key Changes Made
Technical Details
How to Test
Attached Video
Demonstration of functionality: [Link to Video](https://drive.google.com/file/d/1K0r7-J2cgbfU4AhBG87kxVZl3BzUGdUl/view?usp=sharing)
Let me know if you would like me to explain more on this @Roaster05
@IronJam11 I have 2 concerns; please try to find a workaround for them.
I will make the changes and add the necessary details by midnight.
I have increased the capacity to 50 MB. To install FFmpeg, you will need to add it to your PATH variables.
1. Windows
Manual Installation
Using Package Managers
2. Linux
Debian/Ubuntu
Fedora
Arch Linux
Snap (Universal for Linux):
3. macOS
Using Homebrew (Recommended)
Using MacPorts
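Whichever installation route is used, the service can only call FFmpeg if the executable is reachable through PATH. A minimal sketch for checking that from Python, assuming the binary is named `ffmpeg`; the helper name `find_ffmpeg` is hypothetical and not part of this PR:

```python
import shutil
from typing import Optional


def find_ffmpeg(executable: str = "ffmpeg") -> Optional[str]:
    """Return the full path of the FFmpeg binary, or None if it is not on PATH."""
    # shutil.which searches the directories listed in the PATH variable.
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("FFmpeg was not found on PATH; install it or update PATH first.")
    else:
        print(f"FFmpeg found at {path}")
```

Running a check like this at startup would let the application fail early with a clear message instead of failing on the first upload.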
@Roaster05 ???
Resolves Issue #34
What does this PR do?
Resolves audio transcription functionality issues by implementing a robust Flask-based transcription service that handles multiple audio/video formats.
Changes Made:
Implemented unified audio conversion pipeline using FFmpeg
Added support for multiple formats including MP3, WAV, OGG, M4A, MP4, AVI, MOV, MKV, WEBM, AAC
Enhanced error handling and logging throughout the application
Added file validation and security measures (secure filenames, size limits)
Implemented automatic cleanup of temporary files
Integrated Google Speech Recognition with optimized settings
Added proper CORS support for cross-origin requests
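The file validation mentioned above could look roughly like the following. This is a stdlib-only sketch, not the code in this PR: the extension set is copied from the format list above, the 16 MB limit comes from the Technical Details section, and the names `allowed_file` and `within_size_limit` are hypothetical. Filename sanitizing, which the PR delegates to Werkzeug, is not covered here.

```python
import os

# Extensions taken from the format list in this PR description.
ALLOWED_EXTENSIONS = {
    "mp3", "wav", "ogg", "m4a", "mp4", "avi", "mov", "mkv", "webm", "aac",
}
# 16 MB, the limit stated in the PR description.
MAX_FILE_SIZE = 16 * 1024 * 1024


def allowed_file(filename: str) -> bool:
    """Return True if the filename ends in one of the supported extensions."""
    _, ext = os.path.splitext(filename)
    return ext.lower().lstrip(".") in ALLOWED_EXTENSIONS


def within_size_limit(size_in_bytes: int, limit: int = MAX_FILE_SIZE) -> bool:
    """Return True for non-empty uploads that do not exceed the limit."""
    return 0 < size_in_bytes <= limit
```

For example, `allowed_file("talk.MP3")` is accepted because the comparison is case-insensitive, while `allowed_file("notes.txt")` is rejected.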
Technical Details:
Uses FFmpeg for audio/video processing
Leverages Google Speech Recognition API for transcription
Standardizes audio conversion to 16kHz mono WAV format
Implements file size limit of 16MB
Uses Werkzeug security features for filename handling
Includes comprehensive logging system
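To illustrate the 16 kHz mono WAV conversion step, here is a hedged sketch that calls the `ffmpeg` executable directly through `subprocess`. The PR installs the `ffmpeg-python` package, so the actual implementation may build the command differently; the function names below are hypothetical.

```python
import subprocess
from typing import List


def build_ffmpeg_command(src: str, dst: str) -> List[str]:
    """Build an FFmpeg command that converts src to 16 kHz mono WAV at dst."""
    return [
        "ffmpeg",
        "-y",            # overwrite the output file if it already exists
        "-i", src,       # input audio or video file
        "-vn",           # ignore any video stream
        "-ac", "1",      # one audio channel (mono)
        "-ar", "16000",  # 16 kHz sample rate
        "-f", "wav",     # WAV output format
        dst,
    ]


def convert_to_wav(src: str, dst: str) -> None:
    """Run FFmpeg; raises subprocess.CalledProcessError if conversion fails."""
    subprocess.run(build_ffmpeg_command(src, dst), check=True, capture_output=True)
```

Keeping the command construction separate from the call makes the arguments easy to log and to check without FFmpeg installed.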
How to Test:
Install required dependencies:
pip install flask flask-cors SpeechRecognition ffmpeg-python werkzeug
Install FFmpeg on your system
Run the Flask application
Send a POST request to /upload with any supported audio/video file
Verify that you receive a JSON response with the transcription
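The POST request in the steps above can be sent with a small stdlib-only client such as the sketch below. Several details are assumptions not stated in this PR: the form field name `file`, the address `http://localhost:5000`, and the helper names `build_multipart` and `upload`. The keys of the JSON response are also unspecified, so the parsed result is returned as-is.

```python
import json
import os
import urllib.request
import uuid
from typing import Tuple


def build_multipart(field: str, filename: str, payload: bytes) -> Tuple[bytes, str]:
    """Encode one file as multipart/form-data; returns (body, content type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def upload(path: str,
           url: str = "http://localhost:5000/upload",  # assumed host and port
           field: str = "file") -> dict:               # assumed form field name
    """POST the file at path to the upload endpoint and return the parsed JSON."""
    with open(path, "rb") as fh:
        body, content_type = build_multipart(field, os.path.basename(path), fh.read())
    request = urllib.request.Request(
        url, data=body, method="POST", headers={"Content-Type": content_type}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the Flask application running, calling `upload("sample.mp3")` should return the decoded JSON response if the assumed field name matches the one the server expects.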
Attached Video:
https://drive.google.com/file/d/1K0r7-J2cgbfU4AhBG87kxVZl3BzUGdUl/view?usp=sharing
Related Issue:
#34