Speech Generation for Indigenous Language Education
Jan 1, 2024
Aidan Pine
Erica Cooper
David Guzmán
Eric Joanis
Anna Kazantseva
Ross Krekoski
Roland Kuhn
Samuel Larkin
Patrick Littell
Delaney Lothian
Akwiratékha’ Martin
Korin Richmond
Marc Tessier
Cassia Valentini-Botinhao
Dan Wells
Junichi Yamagishi
Abstract
As the quality of contemporary speech synthesis improves, so too does the interest from language communities in developing text-to-speech (TTS) systems for a variety of real-world applications. Much of the work on TTS has focused on high-resource languages, resulting in implicitly resource-intensive paths to building such systems. The goal of this paper is to provide signposts and points of reference for future low-resource speech synthesis efforts, with insights drawn from the Speech Generation for Indigenous Language Education (SGILE) project. Funded and coordinated by the National Research Council of Canada (NRC), this multi-year, multi-partner project has the goal of producing high-quality text-to-speech systems that support the teaching of Indigenous languages in a variety of educational contexts. We provide background information and motivation for the project, as well as details about our approach and project structure, including results from a multi-day requirements-gathering session. We discuss some of our key challenges, including building models with appropriate controls for educators, improving model data efficiency, and strategies for low-resource transfer learning and evaluation. Finally, we provide a detailed survey of existing speech synthesis software and introduce EveryVoice TTS, a toolkit designed specifically for low-resource speech synthesis.
Publication
Computer Speech & Language