Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add article real-time-transcription-deepgram-api #101

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
# Handling Real-Time Transcription with Deepgram's API


Real-time transcription allows developers to display audio transcriptions instantaneously as speech is processed, enhancing the immediacy of user interactions. However, managing the sequence of transcriptions efficiently is vital for applications required to send complete transcripts post-speech events. In particular, the `is_final` flag provided by Deepgram's API indicates when a segment of speech is considered final, and developers can leverage this to optimize transcript handling.

### Managing Transcription Segments

When utilizing the Deepgram API for real-time transcription via a WebSocket connection, an understanding of how to handle interim and final results is crucial. Each data event can include:
- **Interim results:** Transcriptions that may change as more audio is processed.
- **Final results:** Transcriptions that Deepgram considers complete for a segment.

#### Buffer Strategy

Implementing a "buffer" variable can help manage displayed text and determine when to send complete transcripts to a backend service. This approach includes storing interim results and updating them upon receiving final results.

- **Initialize Buffer:** Initialize a buffer to store text between events.
- **Update Buffer:** Concatenate new interim results to the buffer.
- **Check Finalization:** When `is_final` is `true`, store the buffer content, reset the buffer, and prepare for the next segment out of silence.
- **Send Complete Transcripts:** After a set duration of silence (i.e., when no new `is_final` events occur within a specified time window), compile and send the complete transcription.

```mermaid
graph TD;
A[Start Transcription] --> B[Receive Interim Results];
B -->|Update Buffer| C[Check is_final];
C -->|No| B;
C -->|Yes| D[Store Final Results In Buffer];
D --> E[Check Silence Duration];
E -->|Timeout| F[Send Complete Transcription];
E -->|Timeout Not Met| A;
```

### Implementing Message Handling
To implement this logic, a WebSocket listener can handle message parsing and buffer management:
```javascript
// Example pseudocode for managing buffer
let buffer = ''; // Buffer to store transcriptions
let silenceTimeout;

websocket.onmessage = (event) => {
const data = JSON.parse(event.data);
if (data.is_final) {
buffer += data.transcript;
clearTimeout(silenceTimeout);
silenceTimeout = setTimeout(() => {
sendToBackend(buffer); // Function to handle sending the transcription
buffer = ''; // Reset buffer
}, SILENCE_PERIOD);
} else {
buffer += data.transcript; // Append interim results
}
};
```

### Further Resources
- [Deepgram Developer Documentation: Endpointing and Interim Results](https://developers.deepgram.com/docs/understand-endpointing-interim-results)
- [Deepgram JavaScript SDK](https://github.com/deepgram/deepgram-js-sdk)

### Conclusion
Implementing a buffer strategy enables efficient real-time transcription management with Deepgram's API, allowing applications to provide immediate transcription feedback while sending complete transcripts post-silence. If issues persist or the system behavior seems inconsistent, reach out to your Deepgram support representative (if you have one) or visit our community for assistance: [Deepgram Discord Community](https://discord.gg/deepgram).