Google researchers are working on a way to translate speech directly into speech in another language without first converting it into text. Google's Translatotron can also preserve the speaker's voice.
The technique uses a neural network that takes spectrograms of the source speech and converts them into spectrograms of the same utterance in the target language. According to the researchers, Translatotron is the first end-to-end model that can translate speech directly into speech in another language.
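To make the term concrete: a spectrogram describes how much energy a signal has at each frequency over time. The following is a minimal sketch using NumPy, not Translatotron's actual feature extraction; the function name, frame length, hop size, and test tone are all assumptions chosen for illustration.

```python
import numpy as np

def magnitude_spectrogram(signal, frame_len=256, hop=128):
    """Return a (frames, frequency_bins) magnitude spectrogram.

    Illustrative short-time Fourier transform; parameters are assumptions.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Cut the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # One FFT per frame; keep only the magnitudes.
    return np.abs(np.fft.rfft(frames, axis=1))

sample_rate = 8000
t = np.arange(sample_rate) / sample_rate   # one second of samples
tone = np.sin(2 * np.pi * 1000 * t)        # a 1000 Hz test tone

spec = magnitude_spectrogram(tone)
peak_bin = int(spec.mean(axis=0).argmax())
peak_hz = peak_bin * sample_rate / 256     # bin index -> frequency in Hz
print(spec.shape, peak_hz)
```

Each row of `spec` is one short slice of time and each column is one frequency band. A model of the kind described here would take such a matrix as input and produce another one as output.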
It is already possible to translate spoken language and have it spoken aloud in another language, but the speech is first converted into text, which is then translated and converted back into speech. This is how Google Translate works.
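The three-stage approach can be sketched as a chain of functions. This is a toy illustration only: `Utterance`, `recognize`, `translate`, `synthesize`, and the two-word dictionary are hypothetical stand-ins, not part of any Google API. It shows that once speech has been reduced to text, nothing about the original voice remains for the final stage to use.

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    words: str   # what is said
    voice: str   # who says it (stand-in for voice characteristics)

# Hypothetical two-word dictionary, for illustration only.
TOY_DICTIONARY = {"hola": "hello", "mundo": "world"}

def recognize(utterance: Utterance) -> str:
    """Speech-to-text stub: only the words survive, the voice is dropped."""
    return utterance.words

def translate(text: str) -> str:
    """Text-to-text stub: word-by-word lookup in the toy dictionary."""
    return " ".join(TOY_DICTIONARY.get(word, word) for word in text.split())

def synthesize(text: str, voice: str = "default") -> Utterance:
    """Text-to-speech stub: speaks the text in a generic voice."""
    return Utterance(words=text, voice=voice)

def cascade_translate(utterance: Utterance) -> Utterance:
    """Speech -> text -> translated text -> speech."""
    return synthesize(translate(recognize(utterance)))

result = cascade_translate(Utterance("hola mundo", voice="alice"))
print(result)
```

In this sketch the output voice is always the synthesizer's default, because the intermediate text carries no information about the speaker.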
According to Google, translating speech directly, without an intermediate text step, also makes it possible to preserve the speaker's voice. An optional speaker encoder is used for this purpose, so that the characteristics of the speaker's voice carry over into the translated speech.
Whether and when Translatotron will be used in practice is not yet known. Examples of the new translation method are available on GitHub. The full research paper can be found on arXiv.