ChatGPT speech-to-text is here: about 0.04 yuan a minute, but Chinese accuracy lags


ChatGPT, one of the hottest AI apps in the world, has begun commercializing after amassing 100 million users. Yesterday, OpenAI launched the ChatGPT API, which lets businesses pay $2 for roughly 750,000 words, a 90 percent cost reduction compared with its earlier GPT-3.5 models.

Another service that opened yesterday was a speech-to-text API based on OpenAI's Whisper large model. Whisper was first released last September with the large-v1 model, and an updated large-v2 model was open-sourced in December.

Now that it is commercialized, the Whisper API is also cheap at $0.006 per minute (roughly 0.04 yuan), which is expected to put considerable pressure on speech-recognition companies.
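
To put the per-minute price in perspective, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the pro-rata billing by the second and the exchange rate of 6.9 yuan per dollar are assumptions made for the example, not figures from the announcement, and the function names are hypothetical.

```python
PRICE_PER_MINUTE_USD = 0.006  # Whisper API price quoted above


def transcription_cost_usd(seconds):
    """Cost in USD for `seconds` of audio, assuming simple pro-rata billing."""
    return seconds / 60 * PRICE_PER_MINUTE_USD


def usd_to_cny(usd, rate=6.9):
    """Convert dollars to yuan; the default rate is an assumed, illustrative value."""
    return usd * rate


if __name__ == "__main__":
    for label, seconds in [("1 minute", 60), ("1 hour", 3600), ("10 hours", 36000)]:
        usd = transcription_cost_usd(seconds)
        print(f"{label}: ${usd:.3f} (about {usd_to_cny(usd):.2f} yuan)")
```

Under these assumptions, an hour of audio costs well under half a dollar, which is why the price is seen as a threat to existing vendors.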

The Whisper API supports both transcription and translation of audio files, and it handles dozens of languages, including English, Chinese, Arabic, Japanese, German and Spanish.
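
As a rough sketch of how the two modes might be called from Python, the helper below takes an already-constructed client from the official `openai` package (for example `client = OpenAI()`, with an API key in the environment). The helper name `transcribe` and its parameters are hypothetical, and the model identifier `whisper-1` is not mentioned in the article, so check it against the current API documentation.

```python
def transcribe(client, path, translate_to_english=False):
    """Send an audio file to the Whisper API and return the resulting text.

    `client` is assumed to be an `openai.OpenAI` instance. Transcription keeps
    the spoken language; translation returns English text.
    """
    if translate_to_english:
        endpoint = client.audio.translations
    else:
        endpoint = client.audio.transcriptions

    # The file is uploaded in binary mode; "whisper-1" is the assumed model name.
    with open(path, "rb") as audio_file:
        result = endpoint.create(model="whisper-1", file=audio_file)
    return result.text


# Example usage (requires the `openai` package and a valid API key):
#   from openai import OpenAI
#   client = OpenAI()
#   print(transcribe(client, "meeting.mp3"))
#   print(transcribe(client, "meeting.mp3", translate_to_english=True))
```

Passing the client in as an argument keeps the helper easy to try out with a stub object before spending money on real requests.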

The Whisper large-v2 model has an error rate of about 5% in Spanish, English, Italian, German and other languages, so its transcripts need only light correction.

As for Chinese, the error rate is 19.6% with the v1 model and falls to 14.7% with v2, which is only a modest improvement. That is still far higher than the error rate for English, Spanish and similar languages, so Chinese transcripts are more troublesome to use and need many corrections.
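
For readers who want to see what such an error rate measures, here is a minimal Python sketch. It assumes the metric is an edit-distance error rate (insertions, deletions and substitutions divided by the length of the reference), computed per character for Chinese. The article does not say exactly how the published figures were calculated, so the functions and the sample sentence are purely illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (two-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # reference item missing from the output
                curr[j - 1] + 1,         # extra item in the output
                prev[j - 1] + (r != h),  # substitution, or free if they match
            ))
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance divided by reference length; `ref` must be non-empty."""
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(old, new):
    """Fraction by which an error rate dropped going from `old` to `new`."""
    return (old - new) / old


if __name__ == "__main__":
    # One wrong character out of six in a made-up Chinese sentence.
    print(f"sample error rate: {error_rate('今天天气很好', '今天天汽很好'):.1%}")
    # The Chinese figures quoted above: v1 at 19.6%, v2 at 14.7%.
    print(f"v1 to v2 relative reduction: {relative_reduction(19.6, 14.7):.0%}")
    print(f"v2 Chinese vs. roughly 5% elsewhere: {14.7 / 5:.1f}x")
```

Seen this way, v2 removes about a quarter of the v1 errors in Chinese, yet still makes roughly three times as many mistakes as it does in the best-supported languages.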

As for why the gap exists, besides the characteristics of the Chinese language itself, it is likely related to the smaller amount of Chinese data used in training. After all, most content on the Internet is in languages other than Chinese.


