Text-to-speech System With Variable Frame Rate
Abstract

A neural text-to-speech (TTS) system is trained to generate key acoustic frames at variable rates while omitting other frames. Which frames are skipped depends on the acoustic features to be generated for the input text. The TTS system can then interpolate frames between the key frames at a target rate so that a vocoder can synthesize audio samples.
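
The interpolation step described in the abstract can be illustrated with a minimal sketch. It is not the patented method: the record does not say how frames are interpolated, so linear interpolation is assumed here, and the function name `interpolate_key_frames`, the `(time in ms, feature vector)` key-frame layout, and the `frame_period_ms` parameter are all hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

# Hypothetical layout: (time in ms, acoustic feature vector, e.g. mel bins).
KeyFrame = Tuple[float, Sequence[float]]


def interpolate_key_frames(
    key_frames: List[KeyFrame], frame_period_ms: float
) -> List[List[float]]:
    """Fill a fixed-rate frame grid from sparse key frames.

    Assumption: linear interpolation between neighbouring key frames.
    The source only states that frames are interpolated at a target rate.
    """
    if frame_period_ms <= 0:
        raise ValueError("frame_period_ms must be positive")
    if not key_frames:
        return []
    if len(key_frames) == 1:
        return [list(key_frames[0][1])]

    times = [t for t, _ in key_frames]
    if any(later <= earlier for earlier, later in zip(times, times[1:])):
        raise ValueError("key frame times must be strictly increasing")

    start, end = times[0], times[-1]
    # Number of grid points in [start, end]; the epsilon guards against
    # floating-point round-off when the span is an exact multiple.
    n_frames = int((end - start) / frame_period_ms + 1e-9) + 1

    frames: List[List[float]] = []
    for i in range(n_frames):
        t = start + i * frame_period_ms
        # Key frame at or before t, clamped so that lo + 1 is always valid.
        lo = min(bisect_right(times, t) - 1, len(times) - 2)
        hi = lo + 1
        w = (t - times[lo]) / (times[hi] - times[lo])
        frames.append(
            [(1.0 - w) * a + w * b
             for a, b in zip(key_frames[lo][1], key_frames[hi][1])]
        )
    return frames


# Three key frames at irregular spacing (0, 20 and 60 ms), resampled to a
# constant 10 ms frame period such as a vocoder might expect.
key_frames = [(0.0, [0.0, 1.0]), (20.0, [2.0, 1.0]), (60.0, [2.0, 5.0])]
dense = interpolate_key_frames(key_frames, frame_period_ms=10.0)
```

In this example the model would emit only three frames, and the remaining four frames on the 10 ms grid are reconstructed from them. A real system could use a different interpolation scheme or a learned one.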

Full Text

What is claimed is:

A neural TTS system is trained to generate key acoustic frames at variable rates while omitting other frames. The frame skipping depends on the acoustic features to be generated for the input text. The TTS system can interpolate frames between the key frames at a target rate for a vocoder to synthesize audio samples.

Timeline
Filed: 02/19/2026
Published: 06/25/2026
Granted: Not Available
IPC Codes (2)
G10L 13/047: Architecture of speech synthesisers
G10L 13/06: Elementary speech units used in speech synthesisers; Concatenation rules