
Pricing

This page introduces the billing policy for the Real-Time STT add-on provided by Agora.

Your billing details may differ if you have signed a contract with Agora.

Overview

Agora bills all projects under your Agora account on a monthly basis. Billing begins once you enable Real-Time STT.

Transcription fee

When Real-Time STT is enabled for a channel, it transcribes the audio of all active hosts in that channel. When Real-Time STT is enabled for specific hosts, it transcribes only the audio of those hosts and ignores the others. The Real-Time STT service uses algorithms that remove periods of silence and improve (that is, lower) the WER (word error rate) of the transcription. The Real-Time STT engine then transcribes the processed audio, and the length of this processed audio is the transcription duration. Agora charges for the transcription duration of all hosts, or of the specified hosts, in the channel.

The unit price is as follows:

Billing item | Usage (minutes per month) | Pricing (US$ per 1,000 minutes)
Transcription duration | Above 0 | 16.99
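As a rough illustration of how the unit price translates into a monthly charge, the sketch below multiplies the transcription duration by the list price. It is a minimal sketch under stated assumptions: `transcription_fee_usd` is a hypothetical helper (not part of any Agora SDK), it assumes the single "Above 0" tier applies to all usage, and it ignores contract pricing, discounts, and the free-of-charge minutes.

```python
from decimal import Decimal

# List price from the table: US$16.99 per 1,000 minutes of transcription duration.
# Decimal avoids binary floating-point rounding in currency arithmetic.
TRANSCRIPTION_PRICE_PER_1000_MIN = Decimal("16.99")


def transcription_fee_usd(transcription_minutes: int) -> Decimal:
    """Return the list-price transcription fee in US$ for one month.

    Hypothetical helper: assumes the single "Above 0" tier applies to all
    usage and ignores contract pricing, discounts, and free minutes.
    """
    if transcription_minutes < 0:
        raise ValueError("transcription_minutes must be non-negative")
    return Decimal(transcription_minutes) * TRANSCRIPTION_PRICE_PER_1000_MIN / 1000


# Usage: 10,000 minutes of transcription in a month.
print(transcription_fee_usd(10_000))
```

Using `Decimal` rather than `float` keeps amounts such as 16.99 exact; any rounding to cents on an actual invoice is not described on this page and is left out.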

Example

After you enable Real-Time STT:

  • Host A speaks for 2 minutes and remains silent for 8 minutes.
  • Host B speaks for 3 minutes and remains silent for 7 minutes.
  • Host C speaks for 3 minutes and remains silent for 7 minutes.
  • All hosts are silent for the first 2 minutes of the call.

In this case, the total transcription minutes are calculated as 2 (Host A) + 3 (Host B) + 3 (Host C) = 8 minutes. The silent periods of each host, including the time spent listening to others, are not counted towards the transcription duration.
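The arithmetic in the example above can be reproduced in a few lines. This sketch simplifies by modeling each host as a hypothetical `(speaking, silent)` pair of minutes; the actual service detects silence in the audio stream itself. Only speaking minutes count toward the transcription duration, and the fee shown is list price, before any discount or free allowance.

```python
from decimal import Decimal

PRICE_PER_1000_MIN = Decimal("16.99")  # transcription unit price from the table

# Per-host activity from the example: (speaking minutes, silent minutes).
# Silent periods, including time spent listening to others, are not counted.
hosts = {
    "A": (2, 8),
    "B": (3, 7),
    "C": (3, 7),
}


def transcription_minutes(activity: dict[str, tuple[int, int]]) -> int:
    """Sum the speaking minutes of every transcribed host, ignoring silence."""
    return sum(speaking for speaking, _silent in activity.values())


minutes = transcription_minutes(hosts)
fee = Decimal(minutes) * PRICE_PER_1000_MIN / 1000
print(minutes, fee)
```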

Note
  • WER measures the accuracy of an STT engine: the lower the WER, the better.
  • Real-Time STT does not incur additional RTC audio fees.
  • Enabling Real-Time STT for channels or hosts that are silent for long periods is not recommended. In the example, during the first 2 minutes, the Real-Time STT worker still processes the audio of all hosts to remove the silent portions. In this case, Agora charges for the first 2 minutes: STT engine standby time is billed at US$0.99 per 1,000 minutes, with the same discount applied as for RTC audio.

Language identification fee

Real-Time STT supports dynamic language detection when two or more languages are enabled for a channel or specific hosts. The LID (language identification) duration is the same as the transcription duration.

Billing item | Usage (minutes per month) | Pricing (US$ per 1,000 minutes)
Language identification duration | Above 0 | 5.00

Examples:

  • Suppose a channel exists for 10 minutes and has 3 active hosts, A, B, and C, all unmuted.
  • If LID for Spanish and Chinese is enabled for this channel from the start, the algorithm removes 8 minutes of silent audio for host A, 7 minutes for host B, and 7 minutes for host C. The transcription duration is therefore 2 + 3 + 3 = 8 minutes. The LID duration is also 8 minutes: 2 minutes for host A, 3 minutes for host B, and 3 minutes for host C.
  • If LID for Spanish and Chinese is enabled only for host A, the transcription duration and the LID duration are both 2 minutes.
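The two LID scenarios above can be checked with a short sketch. It assumes Real-Time STT and LID (with two or more languages) are enabled for the same set of hosts, so each host's LID duration equals its transcription duration. The names `SPEAKING_MINUTES`, `durations`, and `list_price_usd` are hypothetical, and the combined figure is list price only.

```python
from decimal import Decimal

TRANSCRIPTION_PRICE = Decimal("16.99")  # US$ per 1,000 transcription minutes
LID_PRICE = Decimal("5.00")             # US$ per 1,000 LID minutes

# Speaking minutes per host in the 10-minute channel (silence already removed).
SPEAKING_MINUTES = {"A": 2, "B": 3, "C": 3}


def durations(enabled_hosts: set[str]) -> tuple[int, int]:
    """Return (transcription minutes, LID minutes) for the given hosts.

    Assumes Real-Time STT and LID are enabled for the same hosts, so the
    LID duration equals the transcription duration.
    """
    transcription = sum(SPEAKING_MINUTES[host] for host in enabled_hosts)
    return transcription, transcription


def list_price_usd(transcription: int, lid: int) -> Decimal:
    """Combine both fees at list price, before discounts or free minutes."""
    return (transcription * TRANSCRIPTION_PRICE + lid * LID_PRICE) / 1000


# Scenario 1: LID enabled for the whole channel.
channel = durations({"A", "B", "C"})
# Scenario 2: LID enabled for host A only.
host_a = durations({"A"})
print(channel, list_price_usd(*channel))
print(host_a, list_price_usd(*host_a))
```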

Notes:

  • The Real-Time STT transcription duration does not change if you enable more than one language.
  • If only one language is set for a channel or a specified host, language detection does not start.
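The two notes above amount to a simple rule, sketched below. `billable_durations` is a hypothetical helper, and the language names are plain labels rather than Agora configuration values. Counting only distinct languages is an added assumption for robustness.

```python
def billable_durations(transcription_minutes: int, languages: list[str]) -> tuple[int, int]:
    """Return (transcription minutes, LID minutes) for one channel or host.

    - Language detection starts only when two or more distinct languages are
      set, so a single language yields 0 LID minutes.
    - Adding languages never changes the transcription duration.
    """
    lid_minutes = transcription_minutes if len(set(languages)) >= 2 else 0
    return transcription_minutes, lid_minutes


print(billable_durations(8, ["Spanish", "Chinese"]))
print(billable_durations(8, ["Spanish"]))
```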

Free-of-charge duration

Real-Time STT provides 300 minutes of free-of-charge duration for integration and testing purposes.

Contact sales@agora.io or your account executive (AE) to get a discount.
