more about the Talking Machine including a sound recording


Recording of the Talking Machine counting from 1 to 20.


picture of the vowel pipes
The vowel pipes of the Talking Machine
An acoustic speech synthesizer. The machine is arranged like an organ, with a pipe for each speech sound. Each pipe consists of a noise-maker - a reed or whistle - and a resonator formed like the inside of a human mouth. This resonator filters the noise into a speech sound.

The valves which supply the air to the pipes are driven by a computer. The machine has a vocabulary of a few hundred words in English. The visitor can enter sentences using these words and the machine will speak them, e.g. "How I want a cigarette" (delivered in a monotone, since all the pipes are tuned to the same pitch). It can also recite the alphabet and count to 100 in English, German and Japanese.

a photo of a guest operating the machine
The Talking Machine at the Centre for Art and Media Technologies (ZKM).

The Talking Machine is usually presented in art exhibitions, as here at the ZKM in Karlsruhe, Germany. We are looking over the shoulder of a visitor who is operating the machine (at the moment he is listening to the sounds of the individual pipes). The program also includes counting in English, Japanese and German, the futurist poem Canzone di Maggio by Giacomo Balla (1917) and a screen where visitors can type in their own short sentences using words from the pronunciation dictionary.
How the project started
In the 1980s I built a computer-driven organ. It was driven by a Sinclair ZX81 and the interface was designed by the composer Roland Pfrengle. While I was voicing the pipes - adjusting them so that they all spoke at the same speed and with the same timbre - I noticed that some pipes made hissing and whistling noises which were reminiscent of human speech sounds. I wondered if it might be possible to replace all my organ pipes with specialised voice pipes and whether this new machine could be taught to speak.

I had no idea how we produce speech sounds, except for some rather vague notions about vocal "cords". I first read The Mechanism of Human Speech together with a Description of his Speaking Machine by Wolfgang von Kempelen (1791). I should mention that my Talking Machine works on a principle - a separate pipe for each sound - which Kempelen rejected, but his classic work with its lively text and splendid engravings was a great inspiration for me. Among a number of other books, I also read Gunnar Fant's Acoustic Theory of Speech Production (1960). Although it makes drier reading than Kempelen, it provided valuable X-ray information about the shapes we make inside our mouths when we speak. Another important source was the pioneering The Vowel: its Nature and Structure (1942) by Chiba and Kajiyama, which provided the shapes of five primary vowel sounds. All that remained to do was to make models of these shapes, add artificial vocal cords and rewrite my music-playing software to deal with voice sounds. Or so I thought ... actually, it was not quite as easy as that and the machine took a couple of years to complete.


a full length photo of the Talking Machine


The Talking Machine (1989-1991)
32 pipes and air valves, wind chests, magazine bellows, blower, computer. 230 cm high.


The white box at the bottom is a sound-proof casing enclosing the blower. The flat white box above that is a magazine bellows which evens out the air pressure no matter how many pipes are being played. The air goes up the red hoses to the four wind chests which carry the pipes. The wind chests are transparent so the movement of the electromagnetic valves can be seen and this is further reinforced by LEDs attached to each valve. The black cables carry the signals from the computer with the user interface to the valves.


A line drawing of all 32 pipes

The pipes on the top row are the fricatives - the hissing sounds - basically, specialised whistles. The two rows below that are mainly vowel sounds but also include vowel-like sounds like W and Y. The lowest row holds the remaining consonants. The six pipes in the middle of this row have additional valves to change their sound while it is being spoken, and the three pipes on the left have "noses" added to their mouth resonators that enable them to speak the nasal sounds M, N and NG.


A photo of a voice pipe sawn in half

While I was making the machine I accumulated several reject pipes. This is one of them, now used for demonstration purposes. At the bottom is the computer-controlled electromagnetic valve which opens to let the air into the pipe. Above it, in the box, is a metal reed, the equivalent of the vocal cords. It vibrates, making a bassoon-like sound, and this sound is filtered by the resonator above it, the equivalent of the human mouth, which transforms it into a speech sound.


A line drawing comparing the "ee" pipe with a human mouth saying "ee"

To pronounce "ee" the tongue goes to the front of the mouth and forms a narrow channel just behind the teeth. The diagram shows the equivalent voice pipe.

The reed is made of brass and rests just clear of a leather rim on the spoon-like form of the "kelch". The airflow causes it to vibrate against the kelch. The reed is roughly tuned by a lead weight fixed to its tip. Fine tuning is achieved by means of a wire that can be adjusted to fix the length of reed that is free to vibrate. (These are adaptations of traditional organ-building techniques.)


A screenshot of the numbers data from the pronunciation dictionary: English numbers

A screenshot of the computer program that governs the order in which the pipes are played and the timing. This page is concerned with English numbers. Each letter represents a pipe; the numbers represent time units.

1,   O3 U3 2 n4

To speak the word "one", the pipe named O (which makes an oo sound as in blue) plays for 3 time units. Next, the pipe named U (which makes an uh sound - as in up) plays for 3 time units. Then there is a pause of 2 time units. Finally, the n pipe plays for 4 time units. Result: oo uh - n ... or "one".

Sometimes two or more pipes play at the same time, as in the fv1 at the end of "five".
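The notation is regular enough to be read mechanically. The following Python sketch is not the machine's own software; it is only an illustration of how an entry such as "1,   O3 U3 2 n4" might be interpreted. It assumes that pipe names are single letters and case-sensitive, that a bare number is a pause, and that a token with several letters (such as fv1) opens all of those pipes together. The names Step, parse_entry, parse_line and schedule are hypothetical.

```python
import re
from typing import List, NamedTuple, Tuple


class Step(NamedTuple):
    pipes: Tuple[str, ...]  # pipes sounding together; empty tuple = pause
    units: int              # duration in time units


# Zero or more pipe letters followed by a duration, e.g. "O3", "2", "fv1".
TOKEN = re.compile(r"([A-Za-z]*)(\d+)")


def parse_entry(entry: str) -> List[Step]:
    """Turn 'O3 U3 2 n4' into a list of timed steps."""
    steps = []
    for token in entry.split():
        match = TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"unrecognised token: {token!r}")
        letters, digits = match.groups()
        steps.append(Step(tuple(letters), int(digits)))
    return steps


def parse_line(line: str) -> Tuple[str, List[Step]]:
    """Split a dictionary line into its key and its timed steps."""
    key, _, entry = line.partition(",")
    return key.strip(), parse_entry(entry)


def schedule(steps: List[Step]) -> List[Tuple[int, int, str]]:
    """Return (open_time, close_time, pipe) for every valve opening."""
    events = []
    clock = 0
    for step in steps:
        for pipe in step.pipes:
            events.append((clock, clock + step.units, pipe))
        clock += step.units  # pauses advance the clock without opening a valve
    return events


key, steps = parse_line("1,   O3 U3 2 n4")
print(key, schedule(steps))
# prints: 1 [(0, 3, 'O'), (3, 6, 'U'), (8, 12, 'n')]
```

Under these assumptions the word "one" lasts 12 time units, with the valves closed between units 6 and 8.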

The same recording, as above, of the Talking Machine counting from 1 to 20.

a photo of the Talking Machine in a concert with the flautist Lesley Olson

The machine is usually shown in exhibitions but sometimes takes part in concerts. Here it is in a concert, with Lesley Olson playing SprachMusik by Roland Pfrengle (2008).

Here are the programme notes for a performance of How to learn to talk 2U by Tomomi Adachi (2014).


What they said:

Here is a review of my contributions to Sound Inventions by Bart Hopkin and Sudhu Tewari on the subject of Mechanical Speech Synthesis.

Abstract
This chapter provides a basic grounding in the mechanics of speech production, and then describes four important historical synthetic speech devices. It is devoted to the mechanical speech synthesizer that he himself created, called The Talking Machine. The chapter gives some brief workman-like notes on acoustic theory and on the human production of English speech sounds. Everyone speaks differently. There is such a wide variety of speech and we are highly skilled at recognizing the underlying patterns – but we also rely very heavily on context. Provided that the listener has some idea of what to expect, it is possible that even a comparatively simple device which can reproduce human speech sounds at the correct tempo, will be understood. The Talking Machine is sometimes exhibited in interactive mode. The public can use the keyboard to make it count and recite the alphabet and various poems and can also input sentences for it to speak.

Taylor & Francis Group: Sound Inventions (2021), Focal Press, ISBN 9781003003526
