Blind World Magazine

A Festival of speech synthesis for Linux.

June 21, 2005.

As information technology becomes more pervasive, the issue of communication between information-processing machines and people becomes increasingly important. Up to now such communication has been almost entirely by means of video screens. Speech, which is by far the most widely used and natural means of communication between people, is an obvious possible substitute. However, this deceptively simple means of exchanging information is, in fact, extremely complicated. The Festival Speech Synthesis System aims to make things a little easier for interface developers.

Speech synthesis -- automatic generation of human speech waveforms without directly using a human voice -- has been under development for decades. Speech synthesizers, often called text-to-speech (TTS) synthesizer systems, can be implemented in either software or hardware. The first commercial speech synthesis systems were mostly hardware-based, and their development process was time-consuming and expensive. Since computers have become more powerful, most synthesizers today are software-based. Software-based systems are easy to configure and update, and much less expensive than their hardware counterparts.

You can find a wide array of software tools for speech synthesis, ranging from commercial products to software for download over the Internet, with varying kinds of licensing. Some commercially available TTS systems include:

Apple PlainTalk

Acapela Speech Technologies

Rhetorical rVoice

Loquendo TTS

ScanSoft RealSpeak

Sakrament Text-to-Speech Engine

Nuance Vocalizer

AT&T Natural Voices

Recently, the speech research community has been turning toward open source software, as exemplified by toolkits such as the CSLU toolkit, the ISIP Automatic Speech Recognition toolkit, and the Edinburgh Speech Tools, all of which can help your computer find its voice.

There are many advantages to using open source software for research work. Frequently a researcher is faced with a tool that almost does the task at hand, but needs some tweaking. Having access to the source code allows the researcher, at least in theory, to make the needed modifications. But mere openness is not a guarantee of flexibility. In order for a tool to be flexible, it must have well-defined programming interfaces -- otherwise, extensions and modifications will be hard to develop and maintain -- and it must be interoperable with other tools.

The Festival Speech Synthesis System is one such tool. Festival grew out of the need for a unifying, flexible, and extensible tool for research and educational purposes at the Centre for Speech Technology Research (CSTR) at the University of Edinburgh.

Festival is a free, portable, extensible, language-independent run-time speech synthesis engine that runs on various platforms and has been under development since 1999. Primary authors of the C++ system include Alan W Black, Paul Taylor, and Richard Caley. Festival is part of the Festvox project, which aims to make the building of new synthetic voices more systematic and better documented, so that anyone can build a new voice.

Festival offers developers a basic framework for building speech synthesis systems, and includes various demo modules. It offers text-to-speech through a number of APIs: from the shell, through a Scheme command interpreter, as a C++ library, from Java, and even via an Emacs interface. Though Festival is multilingual (currently English, Welsh, and Spanish), support for English is the most advanced. The system uses the Edinburgh Speech Tools for its underlying architecture and has a Scheme-based (SIOD) command interpreter for control.
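As a rough illustration of the shell-level and Scheme-level entry points mentioned above, here is a minimal Python sketch. It is not part of Festival; the helper names (scheme_string, say_text_expr, speak) are hypothetical and exist only for this example. It assumes a festival executable is installed and on the PATH, that the --tts option reads text from standard input when no file is given, and that Festival's Scheme reader accepts backslash-escaped quotes inside strings.

```python
import shutil
import subprocess


def scheme_string(text: str) -> str:
    """Quote text as a Scheme string literal.

    Assumption: backslash escaping of backslashes and double quotes
    is enough for Festival's SIOD reader.
    """
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def say_text_expr(text: str) -> str:
    """Build a (SayText "...") expression for Festival's Scheme interpreter."""
    return f"(SayText {scheme_string(text)})"


def speak(text: str, festival_bin: str = "festival") -> bool:
    """Speak text at shell level by piping it to `festival --tts`.

    Returns False (and speaks nothing) when the executable is not found.
    """
    exe = shutil.which(festival_bin)
    if exe is None:
        return False
    # Text mode: the given text is sent on stdin and rendered as speech.
    subprocess.run([exe, "--tts"], input=text.encode("utf-8"), check=True)
    return True
```

Calling speak("Hello from Festival.") should produce audio on a machine with Festival and a working sound device. The string returned by say_text_expr("Hello") can be typed at the interactive festival prompt to get the same effect through the Scheme interface.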

The Festival Speech Synthesis System was designed to target three classes of speech synthesis users:

Speech synthesis researchers, who may use Festival for developing and testing new speech synthesis methods;

Speech application developers, who are developing language systems and wish to include synthesis output, such as different voices, specific phrasing, and dialog types; and

End-users, with systems that take text and generate speech, requiring little configuration from users.


Any further reproduction or distribution of this article in a format other than a specialized format, may be an infringement of copyright.
