WhisperX Server is a backend application for audio and video processing, covering transcription, text-to-speech synthesis, and voice conversion. It is designed to work together with the Banana Client project.

Features:
- Audio transcription using WhisperX
- Text-to-speech synthesis with multiple voices and backends
- Voice conversion using RVC (Retrieval-based Voice Conversion)
- YouTube video downloading and processing
- Subtitle generation
- Storyboard creation from videos
- API endpoints for integration with frontend applications
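Subtitle generation comes down to turning timed transcript segments into a subtitle format. Below is a minimal sketch, not the server's actual code, that assumes segments shaped like WhisperX output (dicts with `start` and `end` in seconds plus `text`) and renders them as SRT. The function names `format_srt_timestamp` and `segments_to_srt` are hypothetical.

```python
def format_srt_timestamp(seconds: float) -> str:
    """Convert a time in seconds to an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list) -> str:
    """Render segments with 'start', 'end' and 'text' keys as an SRT document."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_srt_timestamp(seg["start"])
        end = format_srt_timestamp(seg["end"])
        # Each SRT cue: index, time range, text, then a blank separator line.
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)
```

Writing the result to a `.srt` file is then a single `Path("out.srt").write_text(...)` call.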
Installation:

- Clone the repository with `git clone https://github.com/your-username/whisperx-server.git` and change into it with `cd whisperx-server`.
- Create and activate a virtual environment: run `python -m venv my-venv`, then `source my-venv/bin/activate` (on Windows, use `my-venv\Scripts\activate`).
- Install PyTorch following the instructions at https://pytorch.org/.
- Install the required dependencies with `pip install -r requirements.txt`. Note: some dependencies may need to be installed manually or may require additional setup.
- Set up environment variables: create a `.env` file in the project root containing:
  - `HF_TOKEN=your_huggingface_token`
  - `MODEL_DIRECTORY=path/to/model/directory`
  - `OUTPUT_DIRECTORY=path/to/output/directory`
  - `VOICES_DIRECTORY=path/to/voices/directory`
  - `API_TOKEN=your_api_token`
- Start the server with `python main.py`.
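How the server loads `.env` is not described here, so the following is only a standalone, stdlib-only sketch (not part of the project) for checking, before running `python main.py`, that your `.env` defines every variable listed above. It handles plain `KEY=VALUE` lines only; the helper names `parse_env_file` and `missing_keys` are hypothetical.

```python
# Variable names taken from the .env example in the installation steps.
REQUIRED_KEYS = (
    "HF_TOKEN",
    "MODEL_DIRECTORY",
    "OUTPUT_DIRECTORY",
    "VOICES_DIRECTORY",
    "API_TOKEN",
)


def parse_env_file(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional single or double quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict) -> list:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

For example, `missing_keys(parse_env_file(Path(".env").read_text()))` (with `from pathlib import Path`) returns an empty list when everything is set.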
Usage:

- The server is available at http://localhost:8127 (or at the port specified in your `config.ini` file).
- Use the API endpoints to interact with the server, for example:
  - Transcribe a YouTube video: `POST /api/transcribe/url`
  - Generate text-to-speech: `POST /api/text2speech`
  - Process audio with voice conversion: `POST /api/rvc`
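The request formats for these endpoints are not documented in this README. The sketch below only builds (and does not send) a POST request with Python's standard library, under explicit assumptions: the body is JSON, `API_TOKEN` is sent as a Bearer token, and the transcription payload uses a field named `url`. All three are hypothetical, so check the server's route definitions before relying on them.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8127"  # default address from the README


def build_request(path: str, payload: dict, api_token: str,
                  base_url: str = BASE_URL) -> urllib.request.Request:
    """Build, but do not send, a JSON POST request to a server endpoint.

    ASSUMPTION: JSON body and Bearer-token auth; the real server may differ.
    """
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
    )


# Hypothetical payload: the "url" field name is illustrative only.
request = build_request(
    "/api/transcribe/url",
    {"url": "https://www.youtube.com/watch?v=VIDEO_ID"},
    api_token="your_api_token",
)
# To send it (requires the server to be running):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

The same helper works for `/api/text2speech` and `/api/rvc` once you know their expected fields.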
Configuration:

- Network settings, model parameters, and other options can be configured in the `config.ini` file.
- Additional settings are available in the `settings.py` file.
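`config.ini` is a standard INI file, so Python's `configparser` can read it. The section and key names in this sketch (`[network]`, `port`) are hypothetical, since the README does not list them; open the project's `config.ini` to see the real ones. The fallback of 8127 matches the default port mentioned above.

```python
import configparser

# ASSUMPTION: illustrative section/key names, not the project's actual layout.
EXAMPLE_CONFIG = """
[network]
host = 0.0.0.0
port = 8127
"""


def read_port(config_text: str, default: int = 8127) -> int:
    """Return [network] port from INI text, falling back to the default."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # fallback is returned if the section or the option is missing.
    return parser.getint("network", "port", fallback=default)
```

For a file on disk, call `parser.read("config.ini")` instead of `read_string`.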
Roadmap:

- Implement TTS backends:
  - Tortoise
  - Azure
  - StyleTTS
  - xTTS2
- Implement chunk determination for improved transcription accuracy
- Add support for outputting video in different languages
- Implement unit tests and integration tests
- Refactor the project structure and codebase:
  - Organize modules and files into logical directories
  - Standardize naming conventions across the project
  - Improve code documentation and comments
  - Reduce code duplication and increase reusability
  - Optimize import statements and remove unused imports
  - Implement proper error handling and logging
  - Create a consistent API structure across endpoints
  - Update and maintain requirements.txt for easier dependency management
  - Improve configuration management (consider using a dedicated config management library)
- Enhance security measures, especially for API endpoints
- Optimize performance for large-scale audio and video processing
- Improve integration and documentation for use with the Banana Client project
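One possible way to structure the planned TTS backends (Tortoise, Azure, StyleTTS, xTTS2) is a small registry behind a shared interface, so endpoints can select a backend by name. This is a hypothetical design sketch, not the project's current code: `TTSBackend`, `register_backend`, `get_backend`, and the `DummyBackend` stand-in (which returns text bytes rather than audio) are all illustrative names.

```python
from typing import Callable, Dict, Protocol


class TTSBackend(Protocol):
    """Hypothetical interface each TTS backend would implement."""

    def synthesize(self, text: str, voice: str) -> bytes:
        ...


# Maps a backend name to a zero-argument factory (here, the class itself).
_BACKENDS: Dict[str, Callable[[], TTSBackend]] = {}


def register_backend(name: str):
    """Class decorator that records a backend class under a name."""
    def decorator(cls):
        _BACKENDS[name] = cls
        return cls
    return decorator


def get_backend(name: str) -> TTSBackend:
    """Instantiate the backend registered under `name`."""
    factory = _BACKENDS.get(name)
    if factory is None:
        known = ", ".join(sorted(_BACKENDS)) or "none"
        raise ValueError(f"Unknown TTS backend {name!r}; available: {known}")
    return factory()


@register_backend("dummy")
class DummyBackend:
    """Stand-in backend: returns the text as UTF-8 bytes, not real audio."""

    def synthesize(self, text: str, voice: str) -> bytes:
        return f"[{voice}] {text}".encode("utf-8")
```

A real backend would register itself the same way (for example `@register_backend("tortoise")`), keeping endpoint code independent of any single TTS library.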
Contributing:

- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a pull request
I'm still learning and improving my skills with this project. If you have questions or suggestions, or would like to contribute, please don't hesitate to reach out:
- GitHub Issues: For bug reports, feature requests, or general questions, please open an issue on this repository.
- Discussions: For broader conversations about the project, use the GitHub Discussions feature in this repository.
This project is licensed under the MIT License - see the LICENSE.md file for details.
Acknowledgements:

- WhisperX for improved transcription capabilities
- RVC (Retrieval-based Voice Conversion) for voice conversion
- yt-dlp for YouTube video downloading
- FastAPI for the API framework
- PyTorch for deep learning capabilities
This project is designed to work in conjunction with the Banana Client project. Make sure to set up and configure both projects for full functionality.