Abstract
A neural TTS system is trained to generate key acoustic frames at variable rates while omitting other frames. The frame skipping depends on the acoustic features to be generated for the input text. The TTS system can interpolate frames between the key frames at a target rate for a vocoder to synthesize audio samples.
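The abstract does not say how key frames are chosen or how the omitted frames are filled in. The sketch below illustrates the general idea under two stated assumptions. First, a simple distance-threshold heuristic (the hypothetical `select_key_frames`) stands in for the skipping decisions the trained network would make. Second, linear interpolation (the hypothetical `interpolate_key_frames`, built on `numpy.interp`) rebuilds a uniform-rate frame sequence for the vocoder. Key-frame positions are assumed to be indices on the target-rate frame grid.

```python
import numpy as np


def select_key_frames(frames: np.ndarray, threshold: float) -> np.ndarray:
    """Hypothetical stand-in for learned frame skipping.

    Keeps the first and last frames, plus any frame whose feature vector
    differs from the most recently kept frame by more than `threshold`
    (Euclidean distance). Slowly changing regions therefore yield fewer
    key frames. Returns the sorted indices of the kept frames.
    """
    n = len(frames)
    if n == 1:
        return np.array([0])
    keep = [0]
    for i in range(1, n - 1):
        if np.linalg.norm(frames[i] - frames[keep[-1]]) > threshold:
            keep.append(i)
    keep.append(n - 1)
    return np.array(keep)


def interpolate_key_frames(key_positions, key_frames: np.ndarray,
                           num_frames: int) -> np.ndarray:
    """Linearly interpolate key frames onto a uniform target-rate grid.

    key_positions: increasing indices of the key frames on the target grid.
    key_frames:    array of shape (num_keys, num_features).
    Returns an array of shape (num_frames, num_features). Positions outside
    the first/last key are clamped to the nearest key's values, which is
    the documented behavior of np.interp.
    """
    key_positions = np.asarray(key_positions, dtype=float)
    target = np.arange(num_frames, dtype=float)
    # np.interp works on 1-D data, so interpolate each feature dimension separately.
    columns = [np.interp(target, key_positions, key_frames[:, d])
               for d in range(key_frames.shape[1])]
    return np.stack(columns, axis=1)


if __name__ == "__main__":
    # Toy "acoustic features": 20 frames of 2-D features that change linearly.
    frames = np.outer(np.arange(20, dtype=float), [1.0, 2.0])
    keys = select_key_frames(frames, threshold=5.0)
    recon = interpolate_key_frames(keys, frames[keys], num_frames=20)
    print(keys.tolist(), np.allclose(recon, frames))
```

For features that vary linearly in time, the interpolated sequence matches the original exactly even though only some frames are kept. A real system would more likely use learned timing and a richer interpolation scheme; the abstract leaves both open.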
Timeline
Filed: 02/19/2026
Published: 06/25/2026
Granted: Not Available
IPC Codes (2)
G10L 13/047: Architecture of speech synthesisers
G10L 13/06: Elementary speech units used in speech synthesisers; Concatenation rules