T2V: A Text-to-Voice System
Richard M. Irons
May 12, 2003
91.548 Robotics I
Project II
The T2V (Text-to-Voice) system project is an attempt to supplement what is currently a limited number of mechanisms available for a software application to get a user's attention. Currently, most applications utilize beeps and dialog boxes working together to get the attention of users. These mechanisms are very often intrusive and annoying to the user's computing experience. Additionally, software very often has helpful messages to convey to the user, but these messages are sometimes displayed where they can be overlooked. They are also frequently displayed with some type of beep that can become annoying to a user.
Many applications must therefore strike a balance between conveying useful information to a user and not annoying the user when trying to get their attention. The goal of the T2V system is to provide a new mechanism for conveying information to a user while minimizing the negative impact of the intrusion into the user's computing experience.
Capturing a user's attention and conveying information is not a trivial task for a software application. The mechanisms used to accomplish this can be classified into four categories: the "active icon", "bell and dialog", "bell and message", and "message only". The "active icon" is an icon that appears when the software is invoked and is sometimes animated. It is an entity that the user can click on when needing help, but it can also independently offer suggestions. MS Word is an application that uses this approach with an animated paper clip.
The "bell and dialog" approach is another mechanism used by applications to get the user's attention. The application invokes a dialog and causes a beep to be emitted from the computer. This approach is commonly used to convey warnings or errors. It is intrusive, but it is also effective in getting the attention of the user. Most software applications employ this mechanism to varying degrees.
The third and fourth mechanisms are similar to one another since both involve the display of a message. An application could have a message window where all messages are sent. The application could choose to emit a beep when displaying a message, or just display a message without notifying the user that a message has been displayed. There are problems with both of these approaches.
First, if a message is displayed in the message window with a beep, the user must switch their attention to the message window and read the message. Additionally, if many messages are being displayed, this constant switching of attention to read a message and the constant beeps could become very annoying to the user.
Secondly, if the message is displayed without a beep, the user must constantly check the message window to see if there is a message applicable to their current activity. Otherwise, the user cannot be certain whether the most recent message in the message window is currently applicable. One software application that uses this approach is the CAD software Pro/Engineer.
These four categories all suffer from certain disadvantages. They are either intrusive and annoying ("active icon", "bell and dialog", and "bell and message" approaches) or they fail to adequately convey their message to the user ("message only" approach). The T2V project is an attempt at adding a new approach for software applications to convey messages to a user. This approach is not intended to completely replace the existing mechanisms. For example, there will still be situations where the "bell and dialog" approach is necessary.
The goal of the T2V project is to reduce the level of intrusion that many software applications have into users' computing experience. The T2V project has produced an initial prototype of a simple software program that conveys a message by a spoken human voice. This project demonstrates the technical feasibility of a software application using a human voice to convey a message to a user.
This paper discusses the technical aspects of creating the T2V prototype and continues with a discussion of how the research could be applied to more complex software applications. Whether the main premise of this research holds, namely that a spoken message would be less intrusive than any of the previously described approaches, remains to be seen. This could probably only be determined once the T2V system is actually implemented in a software application and tested with users.
The T2V system consists of a LogoChip running LogoChip Logo, a UML 305 development board, and a Winbond WTS701 Text-to-Voice chip. The LogoChip is the micro controller for the circuit while the WTS701 is a slave processing commands sent to it from the LogoChip. Communication between the LogoChip and the WTS701 is accomplished by a bi-directional SPI interface.
The development of the T2V system has been broken down into five stages. First, wire the entire T2V prototype circuit. Second, verify that the LogoChip and the WTS701 chip can successfully communicate with one another. Third, create a LogoChip Logo program that sends a message to the WTS701 chip to speak. Fourth, develop a C API that allows an application written in C to send text messages to the LogoChip controller. This API would provide controls for the speed at which the text is spoken, the pitch of the spoken text, and the ability to stop a sentence or word before it has been completed. A simple application could be designed that allows sentences to be entered and transmitted to the LogoChip to be spoken, with UI components for every feature that the API supports so that the API can be fully tested. Fifth, use the API within an existing software application where users' responses to the voice messages can be evaluated. Currently only Stages I, II, and III of the project have been completed.
Stage I
The first stage of this project was to wire the entire LogoChip/WTS701/UML development board circuit. The circuit is built around a LogoChip micro controller running a Logo program, with the Winbond chip acting as a slave of the LogoChip controller. The WTS701 is a 56 pin TSOP chip. Since this circuit was built on bread boards, a TSOP to DIP adapter was necessary to mount the WTS701 on a bread board. The WTS701 is a powerful chip that makes this entire project possible because of its extensive text-to-voice capabilities.
The LogoChip and the WTS701 required different power supplies. The LogoChip was able to use the 5 volt power supply from the UML development board. The WTS701 requires a power supply between 2.7 and 3.3 volts so the UML development board power supply could not be used. Instead, the WTS701 was powered by an adjustable power supply unit.
The WTS701 is not only a powerful chip, but also relatively straightforward to use. There is a clear and concise data sheet for the chip at Winbond's web site. The data sheet (70 pages) contained all of the information I needed to work with the WTS701. The WTS701 receives text via an SPI interface and converts the text to either analog or digital speech. The chip currently has English and Chinese versions. The English version is available in either a male or a female voice.
"Text-to-speech conversion is achieved by processing the incoming text into a phonetic representation that is then mapped to a corpus of naturally spoken word parts." The steps taken by the WTS701 to process text are:
Additionally, the WTS701 is programmable to recognize abbreviations and to speak different languages.
Although the WTS701 has 56 pins, only 25 of these pins have connections. These pins control the clock, SPI interfacing, auxiliary in/out, CODEC in/out, digital power/ground, and analog power/ground. Wiring this chip was tedious, but not too difficult. The most significant problem I had wiring the WTS701 was that I initially only wired the analog power/ground. I incorrectly assumed that since I was not using the digital portion of the chip, I did not have to power it. This costly mistake wasted much time and caused significant frustration. This mistake did have the benefit of forcing a reevaluation of the design of both the circuit and the LogoChip code. A number of minor errors were corrected during this process.
The WTS701 has a small set of commands that can be transmitted over the SPI interface. These commands fall into five categories. The first category is the status commands. These commands are used to access three read-only registers of the WTS701: read status register, read interrupt register, and read device version. The second category is the system commands. These are the power up, power down, reset, and idle commands, which change the state of the system. The third category is the synthesis commands. These commands affect the text-to-speech synthesis. They allow control for starting, pausing, resuming, and stopping the conversion process. There are also finish word and finish buffer commands that finish conversion at the end of the next word and buffer respectively. The volume of the speech and the speed of the conversion can also be controlled with synthesis commands.
The fourth category is the configuration command. These commands allow the reading and setting of configuration settings such as speech volume, audio options, CODEC options, speech pitch, and the speed of the clock. The fifth category is the customization command. These commands allow the user to customize the way in which the WTS701 responds to certain strings. This is the only category of commands that was not used for the T2V project.
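The command codes that the attached listing actually uses can be arranged by the categories described above. The following Python sketch is only an illustration: the opcode values are copied from the constants in the listing, but the assignment of each mnemonic to a category is an assumption based on the descriptions above, and the names WTS701_COMMANDS and category_of are hypothetical.

```python
# Opcode values are copied from the constants in the attached listing.
# Which category each mnemonic belongs to is an assumption drawn from
# the prose description, not from the WTS701 data sheet.
WTS701_COMMANDS = {
    "status": {"RDST": 0x04, "RVER": 0x12},
    "system": {"PWUP": 0x02, "PWDN": 0x40, "IDLE": 0x57},
    "synthesis": {"CONV": 0x81, "FIN": 0x4C},
    "configuration": {"SCLC": 0x14},
    "customization": {},  # not used by the T2V project
}


def category_of(mnemonic):
    """Return the category name for a command mnemonic (hypothetical helper)."""
    for category, commands in WTS701_COMMANDS.items():
        if mnemonic in commands:
            return category
    raise KeyError(mnemonic)


print(category_of("RVER"))
```

Keeping the codes in one table mirrors the constants convention used in the micro controller code and makes it easy to see which categories the prototype leaves untouched.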
A LogoChip running a Logo program was used as the circuit's micro controller. The LogoChip was wired so that the transmit and receive pins went to the UML305 development board. The LogoChip used the B and C pins to communicate with the WTS701. B pins were used to transmit signals to the WTS701 while C pins were used to receive signals from the WTS701.
Pin B0 went to the chip select pin of the WTS701. The chip select pin is set low unless more than one device is to share the same slave select signal. In the case of the T2V project, this pin was always low. The B1 pin is a global reset pin that resets the WTS701 to the initial power down state. The B2, B3, and B4 pins correspond to the serial clock input, slave select input, and the master out slave in (MOSI) input for the WTS701 respectively. These pins are all part of the SPI interface.
The LogoChip also utilized pins C0, C1, and C2 as input pins from the WTS701. C0 was connected to the ready/busy pin. This pin identified if the WTS701 was ready for more input via the SPI interface. If this pin is low, the WTS701 cannot accept more data and the LogoChip must pause SPI data transmission. The current T2V project micro controller code does not check this pin. Instead, it relies on delays after each command. It also sends a small amount of text to guarantee that the WTS701 will not be in the busy state when the micro controller performs SPI transmissions. The C1 and C2 pins connect to the interrupt and master in slave out (MISO) WTS701 SPI pins.
The interrupt pin notifies the micro controller that an interrupt condition has occurred. Possible interrupts are a conversion finish interrupt, input text buffer filled interrupt, and a word count interrupt. All of these interrupts can be enabled and disabled by the micro controller. These interrupts are not utilized by the current design of the T2V project. The need for this pin was avoided using the same techniques that were used to avoid using the ready/busy pin. Any further development of the T2V project would require that the ready/busy pin and the interrupt pins be utilized. These pins would work together to ensure that the LogoChip micro controller did not overflow the WTS701 input text buffer with data.
The C2 pin on the LogoChip received signals from the WTS701 MISO pin. This pin carried all serial data sent from the WTS701 back to the LogoChip. It was used in the T2V project primarily during the initial development of the circuit to verify that the WTS701 was actually running and successfully receiving and interpreting SPI transmissions from the LogoChip.
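The pin assignments described above can be summarized in a small host-side model. This Python sketch is an illustration only: the bit numbers come from the constants in the attached listing, while the Port class and its method names are hypothetical stand-ins that imitate the LogoChip setbit, clearbit, and testbit primitives.

```python
# Bit positions copied from the constants in the attached listing.
PORTB_PINS = {"CS": 0, "RESET": 1, "SCLK": 2, "SSB": 3, "MOSI": 4}  # outputs
PORTC_PINS = {"RB": 0, "INT": 1, "MISO": 2}                         # inputs


class Port:
    """Hypothetical 8-bit port register used only for this simulation."""

    def __init__(self):
        self.value = 0

    def setbit(self, bit):
        self.value |= 1 << bit

    def clearbit(self, bit):
        self.value &= ~(1 << bit) & 0xFF

    def testbit(self, bit):
        return (self.value >> bit) & 1


def clear_output_regs(portb):
    """Put the output pins in the idle state used by the listing."""
    portb.clearbit(PORTB_PINS["CS"])     # chip select held low
    portb.clearbit(PORTB_PINS["RESET"])
    portb.setbit(PORTB_PINS["SCLK"])     # clock idles high
    portb.setbit(PORTB_PINS["SSB"])      # slave select idles high
    portb.clearbit(PORTB_PINS["MOSI"])


portb = Port()
clear_output_regs(portb)
print(format(portb.value, "08b"))
```

In this model only the serial clock and slave select bits are set when the bus is idle, which matches the idle state established by clear_output_regs in the listing.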
All of the LogoChip B pins, LogoChip C pins, and WTS701 commands were defined as constants (refer to the attached source listing). Using constants is simple, but it produced real benefits: it significantly increased the readability of the code while reducing the chance of introducing coding errors into the micro controller code. Had this convention not been used, writing and debugging the micro controller code would have been more difficult. Continuing this convention would be essential to any future development of the T2V project.
Stage II
Stage II involved writing LogoChip Logo code for the micro controller to determine if the LogoChip and WTS701 were active. This process involved turning on the power supplies to the entire circuit, sending some simple commands from the LogoChip to the WTS701, and then observing if the WTS701 chip received the commands. Four commands were the absolute minimal number of commands necessary to complete this stage. These commands were the set clock command, power up command, read version command, and the power down command. The Stage II and Stage III LogoChip source code is attached.
All WTS701 commands begin with the same format. They consist of an 8 bit command code and then an 8 bit command data portion. Type one commands possess only the 8 bit command code and the 8 bit data. Type two commands are type one commands that also receive two bytes from the WTS701.
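The two-byte layout of a type one command can be illustrated with a short Python sketch. It assumes only what the listing shows, namely that the command code is shifted out before the command data and that each byte goes out most significant bit first. The function names type_one_frame and frame_bits are hypothetical.

```python
def type_one_frame(cmd, cmd_data):
    """Pack an 8 bit command code and 8 bit data into one 16 bit frame."""
    for value in (cmd, cmd_data):
        if not 0 <= value <= 0xFF:
            raise ValueError("each field must fit in 8 bits")
    return (cmd << 8) | cmd_data


def frame_bits(frame):
    """Return the frame as a string of 16 bits in transmission order."""
    return format(frame, "016b")


# RVER ($12) with NULL_DATA ($0), as used for the read version command.
frame = type_one_frame(0x12, 0x00)
print(hex(frame), frame_bits(frame))
```

For a type two command the same sixteen bits are sent first, and the two reply bytes are then clocked in from the WTS701.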
The init function configures all of the port B pins as outputs. The clear_output_regs function sets all of the B pins to the appropriate set or unset state. The critical pins are the chip select, serial clock, and slave select pins. The chip select pin must be cleared before issuing commands to the WTS701. The chip select pin is the only input pin that is not 5-volt tolerant, so it was essential that this pin be set properly, or else the WTS701 could be damaged. All the other input pins were 5-volt tolerant.
The serial clock and slave select pins also required initialization to a set state. Both of these pins are part of the SPI interface and must be initialized properly for LogoChip/WTS701 communication to work. The slave select pin indicates that an SPI transmission is in progress when lowered, and it must remain low throughout the transmission. The clock pin is used both when writing to the MOSI pin and when reading from the MISO pin. When writing to the MOSI pin, the clock pin indicates that the LogoChip has set the MOSI pin and that the WTS701 can now examine it. When reading from the MISO pin, the clock pin indicates that the LogoChip is now ready to examine the MISO pin and that the WTS701 should now set it.
Type one commands are processed by two functions. The function command_type_one is simply a wrapper that lowers the slave select pin, calls the low-level command_type_one_low function with the command and command data, and then raises the slave select pin again. Breaking the commands into wrappers and low-level routines is useful because complex commands can be constructed by using low-level functions as building blocks. This is what is done in the case of the type two and three commands. The function command_type_one_low takes the command operation code and the command data as arguments. This function simply processes both bytes and transmits them using the SPI protocol to the WTS701.
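The bit-by-bit write performed by command_type_one_low can be imitated on a host computer. The Python sketch below follows the same divide-and-remainder steps as the listing, but everything else is an assumption made for illustration: SimulatedSlave is a hypothetical stand-in for the WTS701, and the idea that the chip samples MOSI on the rising clock edge is inferred from the order of the clearbit and setbit calls, not taken from the data sheet.

```python
class SimulatedSlave:
    """Hypothetical stand-in for the WTS701 that records MOSI per clock."""

    def __init__(self):
        self.bits = []

    def clock_rising(self, mosi):
        self.bits.append(mosi)

    def received_bytes(self):
        whole = len(self.bits) // 8 * 8   # ignore any incomplete byte
        result = []
        for start in range(0, whole, 8):
            value = 0
            for bit in self.bits[start:start + 8]:
                value = (value << 1) | bit
            result.append(value)
        return result


def send_byte(slave, value):
    """Shift one byte out most significant bit first."""
    num = value
    divisor = 128
    for _ in range(8):
        cur = num // divisor      # current most significant bit (0 or 1)
        num = num % divisor       # keep the remaining low bits
        divisor //= 2
        mosi = 1 if cur == 1 else 0
        # SCLK goes low, then high; the simulated slave samples on the rise.
        slave.clock_rising(mosi)


def command_type_one(slave, cmd, cmd_data):
    """Send a command byte and a data byte within one slave select cycle."""
    # Slave select would be lowered here and raised after the second byte.
    send_byte(slave, cmd)
    send_byte(slave, cmd_data)


slave = SimulatedSlave()
command_type_one(slave, 0x14, 0x00)   # SCLC with DEFAULT_CLOCK from the listing
print([hex(b) for b in slave.received_bytes()])
```

Running the simulation before touching the hardware is one way to check that the shifting arithmetic produces the intended bit order.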
The command_type_two wrapper function is similar to the command_type_one wrapper except that it calls command_type_two_low after calling command_type_one_low. The function command_type_two_low simply receives two bytes of data from the WTS701. In the case of Stage II, this data was simply the version number of the WTS701 chip. Both of these bytes were stored in two global variables.
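The read half of a type two command can be sketched in the same way. This Python example mirrors the accumulation used by command_type_two_low, where the current divisor is added to the result whenever the sampled MISO bit is one. The bit stream stands in for the MISO pin, and the values 0x01 and 0x02 are arbitrary test data, not the real version bytes of the chip. The helper bits_of is hypothetical.

```python
def read_byte(miso_bits):
    """Assemble one byte from eight sampled bits, most significant first."""
    divisor = 128
    num = 0
    for _ in range(8):
        # SCLK would go low here, MISO is sampled, then SCLK goes high.
        cur_bit = next(miso_bits)
        if cur_bit == 1:
            num = divisor + num
        divisor //= 2
    return num


def command_type_two_low(miso_bits):
    """Read the two reply bytes that follow a type two command."""
    dataone = read_byte(miso_bits)
    datatwo = read_byte(miso_bits)
    return dataone, datatwo


def bits_of(value):
    """Hypothetical helper: list the 8 bits of value, MSB first."""
    return [(value >> shift) & 1 for shift in range(7, -1, -1)]


# Arbitrary test values, not real WTS701 version numbers.
example_stream = iter(bits_of(0x01) + bits_of(0x02))
versions = command_type_two_low(example_stream)
print(versions)
```

Returning the two bytes as a pair replaces the two global variables of the Logo version, which is a convenience of the host language and not a change to the protocol.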
The main function was very straightforward (refer to the function stage_two_main in the attached source listing). This design was intentional so that the absolute minimal number of commands could be used to determine if the T2V circuit was functioning properly. First, the set clock command was issued. This command must be issued prior to powering up the WTS701. The command takes as an argument the clock type that the WTS701 will use. The only clock that is currently supported is a 24.576 MHz clock, which is selected by transmitting 0x0 for the data portion of the command. Next, the power up command changes the state of the WTS701 from powered down to idle. The WTS701 is in the powered down state when it is first powered on.
The read version command gets the current version of the WTS701 chip. The first byte is the hardware version and the second byte is the software version. The Stage II test program finishes by powering down the chip and then calling the function word_to_bits for each of the two version bytes read from the WTS701.
Stage II was ultimately successful but there was one significant problem to overcome. When I first wired the T2V circuit, I assumed that I did not need to power on the digital portion of the chip. I made this assumption because I was only using the analog portion of the chip and also because the WTS701 made a clear distinction between the analog and digital functionality of the chip. After struggling with the problem of not getting a version number back from the WTS701, I tried powering up the entire chip. This solved the problem of getting the chip version number and the get version command retrieved the correct version of the chip.
The calls to word_to_bits were debugging aids that displayed the binary representation of the two version bytes, most significant bit first. A red LED corresponded to a zero and a green LED corresponded to a one.
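The LED readout can be reproduced off the board to see what pattern a given byte should produce. This Python sketch uses the same divide-and-remainder steps as the debugging routine in the listing and returns the colours in the order they would flash. The function name led_sequence and the sample value 0x05 are illustrative assumptions.

```python
def led_sequence(byte_value):
    """List the LED colours flashed for one byte, most significant bit first."""
    colours = []
    num = byte_value
    divisor = 128
    for _ in range(8):
        cur = num // divisor
        num %= divisor
        divisor //= 2
        # Green stands for a one and red for a zero, as on the prototype.
        colours.append("green" if cur == 1 else "red")
    return colours


print(led_sequence(0x05))
```

A byte of 0x05 would therefore show five red flashes followed by green, red, green.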
The success of Stage II verified that both the T2V circuit and the software SPI interface functioned properly. This was a significant accomplishment since there were numerous things with both the hardware and software that could have gone wrong.
Stage III
Stage III involved programming the micro controller to make the WTS701 speak some text. The main function for Stage III was the main function for Stage II with a couple of additional function calls (refer to the function stage_three_main in the attached source listing). After reading the version, a call to command_type_three is made to process a text string. This call is followed by calls to command_type_one for both the finish command and the idle command. These two type one commands are each preceded by a three-second wait to guarantee that all of the text is processed before the conversion is finished and the WTS701 is put into the idle state.
The type three command introduced in Stage III sends a variable number of bytes to the WTS701. The function command_type_three is hard coded to send the text "Hi Rick Irons". Each character of the string is transmitted as an 8 bit word. All text conversion transmissions are ended with an end of text character. Although the current version of command_type_three is hard coded for a specific string, it would not be difficult to alter the function to take any string. The function command_type_three takes the same form as the previous two command wrapper functions. It lowers the slave select pin and then breaks the entire command into parts that are handled by different low-level functions. The function command_type_three_low is identical to command_type_one_low except that command_type_one_low transmits two bytes while command_type_three_low transmits one byte.
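Generalizing the type three command to any string amounts to turning the string into a list of bytes and appending the end of text character. The Python sketch below shows one way this could look. The EOT value is taken from the listing, and the function name text_payload is hypothetical. Note that the MSG1 to MSG11 constants in the listing decode as ASCII to "hY rIk YRnZ", which appears to be a phonetic spelling of the spoken phrase; that interpretation is an assumption.

```python
EOT = 0x1A  # end of text marker, value copied from the listing


def text_payload(text):
    """Return the bytes to send after the conversion command, ending in EOT."""
    data = text.encode("ascii")   # raises UnicodeEncodeError for non-ASCII input
    return list(data) + [EOT]


# The byte values of MSG1 through MSG11 in the listing, followed by EOT.
listing_bytes = [0x68, 0x59, 0x20, 0x72, 0x49, 0x6B,
                 0x20, 0x59, 0x52, 0x6E, 0x5A, 0x1A]

payload = text_payload("hY rIk YRnZ")
print(payload == listing_bytes)
```

Each element of the payload would then be passed in turn to the single byte transmit routine while slave select is held low.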
Stage III was the last stage of the T2V project that was successfully completed. One of the problems that was encountered during Stage III was that wait statements needed to be inserted during the text conversion process. These pauses in program execution were needed to guarantee that the micro controller did not end the text conversion before the text conversion had actually been completed by the WTS701. The wait calls could be removed by changing the conversion process to wait until the WTS701 had finished converting text before entering the idle state. This could be accomplished by waiting for a conversion finished interrupt before switching to the idle state. The current Stage III implementation also does not guard against overflowing the text input buffer of the WTS701. This could be corrected by checking for an inbuffer overwrite interrupt before transmitting a character byte to the WTS701.
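Replacing the fixed waits with a check of a status pin could follow a simple polling pattern. The Python sketch below is a simulation under stated assumptions: read_pin is a hypothetical callable standing in for a testbit call on the ready/busy or interrupt pin, a reading of one is taken to mean "ready" as described for the ready/busy pin earlier, and the poll limit is an arbitrary safeguard against waiting forever.

```python
def wait_for_pin(read_pin, wanted, max_polls):
    """Poll read_pin until it returns wanted; return how many polls were wasted."""
    for polls in range(max_polls):
        if read_pin() == wanted:
            return polls
    raise TimeoutError("pin never reached the wanted level")


def send_when_ready(read_ready, send, payload, max_polls=1000):
    """Send each byte only after the simulated ready/busy pin reads ready."""
    for byte in payload:
        wait_for_pin(read_ready, 1, max_polls)
        send(byte)


# Simulated pin that reports busy three times and then ready.
readings = iter([0, 0, 0, 1])
wasted = wait_for_pin(lambda: next(readings), 1, 10)
print(wasted)

sent = []
send_when_ready(lambda: 1, sent.append, [0x68, 0x59, 0x1A])
print(sent)
```

The same loop could wait for a conversion finished indication before the idle command is issued, which would remove the need for the fixed three-second waits.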
Another problem that was encountered in Stage III was that the sound of the spoken text was not very loud. The speaker must be placed close to a person's ear in order to hear the voice. This problem should be simple to correct by amplifying the analog output.
Stages IV & V
The last two stages of the T2V project were not completed. Stage IV involves creating a C API that would allow C applications to utilize the T2V circuit. The API would have functions to set the speed at which text was spoken, functions to set the volume of the spoken text, and a function to control the pitch of the spoken text. The API would also have a function that would pass the string to be spoken to the T2V circuit. All communications between the C functions and the LogoChip micro controller would occur over the UML development board serial port.
Using these new APIs, a simple test program could be written to demonstrate the functionality of the API. The program could have a text input panel where text to be spoken could be entered. The program could also have UI controls for the volume, pitch, and the speed of the text conversion. This program could be used to test all of the features of the API and to experiment with the capabilities of the WTS701 chip.
Stage V of the project would involve modifying an existing application to speak any message window or help text. This should be a trivial task once the APIs from Stage IV are available. A location within an application's message display functions would need to be found to make a call to speak the current help or message string. Once this stage is completed, users should be allowed to use the application to determine what they thought of the spoken messages and spoken help. Only at this point in the project can any adequate determination of the intrusiveness of this new approach for enhancing a user's computing experience be made.
Conclusion
Although not all of the goals of this project were met, significant progress was made in determining the technical feasibility of this approach. It has been shown that the WTS701 chip can work well when controlled by a LogoChip micro controller. The main question that this project did not address is whether users would find a spoken message less intrusive than some of the current techniques used by software applications to convey information. This question can only be answered by actually integrating the T2V circuit into a software application and then getting feedback from users.
Source Code for Stages II and III
; Richard Irons May 12, 2003
; Project II
;
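; Scratch variables shared by the SPI routines. dataone and datatwo hold
; the two bytes returned by a type two command.
global [num_bits cur num divisor cur_bit pass_num dataone datatwo]
; Constant groups, in order: port registers, port B output bits,
; port C input bits, WTS701 command codes, command data, message bytes.
constants [
[portb 6][portb-ddr $86][portc 7][portc-ddr $87]
[CS 0][RESET 1][SCLK 2][SSB 3][MOSI 4]
[RB 0][INT 1][MISO 2]
[SCLC $14][PWUP $2][RDST $4][CONV $81][RVER $12][FIN $4C]
[IDLE $57][PWDN $40]
[NULL_DATA $0][DEFAULT_CLOCK $0]
[MSG1 $68][MSG2 $59][MSG3 $20][MSG4 $72][MSG5 $49][MSG6 $6b]
[MSG7 $20][MSG8 $59][MSG9 $52][MSG10 $6E][MSG11 $5A][EOT $1A]
]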
; =============================================================
to init
write portb-ddr 0
end
to click-on
setbit 5 portb
end
to click-off
clearbit 5 portb
end
to beep
repeat 100 [click-on delay 50 click-off delay 50]
end
to delay :n
repeat :n [no-op]
end
to red
setbit 6 portb
clearbit 7 portb
end
to redoff
clearbit 6 portb
end
to green
clearbit 6 portb
setbit 7 portb
end
to greenoff
clearbit 7 portb
end
; ---------------------------------------------------------------------------------------------------------
to clear_output_regs
clearbit CS portb ; chip select
clearbit RESET portb
setbit SCLK portb
setbit SSB portb ; SPI slave select
clearbit MOSI portb
clearbit 5 portb
end
;
; ---------------------------------------------------------------------------------------------------------
; SPI debugging function
to word_to_bits :word_val
setnum_bits 8
setnum :word_val
setdivisor 128
repeat num_bits
[
setcur num / divisor
setnum num % divisor
setdivisor divisor / 2
ifelse cur = 1
[
green
delay 10000
greenoff
delay 10000
]
[
red
delay 10000
redoff
delay 10000
]
]
end
;
; ---------------------------------------------------------------------------------------------------------
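; Shift out two bytes (command code, then command data) on MOSI, most
; significant bit first, pulsing SCLK low then high for each bit.
to command_type_one_low :cmd :cmd_data
  setnum :cmd
  repeat 2
  [
    setnum_bits 8
    setdivisor 128
    repeat num_bits
    [
      setcur num / divisor       ; current most significant bit
      setnum num % divisor       ; keep the remaining low bits
      setdivisor divisor / 2
      ifelse cur = 1
      [
        setbit MOSI portb
      ]
      [
        clearbit MOSI portb
      ]
      clearbit SCLK portb
      setbit SCLK portb
    ]
    setnum :cmd_data             ; second pass sends the data byte
  ]
end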
;
; ---------------------------------------------------------------------------------------------------------
;
to command_type_one :cmd :cmd_data
clearbit SSB portb ; SPI slave select
command_type_one_low :cmd :cmd_data
setbit SSB portb ; SPI slave select
end
;
; ---------------------------------------------------------------------------------------------------------
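; Clock in two bytes from MISO, most significant bit first, and store
; them in the globals dataone and datatwo.
to command_type_two_low
  setpass_num 0
  repeat 2
  [
    setpass_num (pass_num + 1)
    setnum_bits 8
    setdivisor 128
    setnum 0
    repeat num_bits
    [
      clearbit SCLK portb
      setcur_bit (testbit MISO portc)   ; sample MISO while SCLK is low
      setbit SCLK portb
      if cur_bit = 1
      [
        setnum (divisor + num)          ; add the weight of this bit
      ]
      setdivisor (divisor / 2)
    ]
    ifelse pass_num = 1
    [ setdataone num ]
    [ setdatatwo num ]
  ]
end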
;
; ---------------------------------------------------------------------------------------------------------
;
to command_type_two :cmd :cmd_data
clearbit SSB portb ; SPI slave select
command_type_one_low :cmd :cmd_data
command_type_two_low
setbit SSB portb ; SPI slave select
end
;
; ---------------------------------------------------------------------------------------------------------
;
to command_type_three_low :cmd_data
setnum :cmd_data
setnum_bits 8
setdivisor 128
repeat num_bits
[
setcur num / divisor
setnum num % divisor
setdivisor divisor / 2
ifelse cur = 1
[
setbit MOSI portb
]
[
clearbit MOSI portb
]
clearbit SCLK portb
setbit SCLK portb
]
end
;
; ---------------------------------------------------------------------------------------------------------
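; Send the command and its data, then the hard-coded message bytes
; MSG1 to MSG11 and the end of text marker, in one slave select cycle.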
to command_type_three :cmd :cmd_data
clearbit SSB portb ; SPI slave select
command_type_one_low :cmd :cmd_data
command_type_three_low MSG1
command_type_three_low MSG2
command_type_three_low MSG3
command_type_three_low MSG4
command_type_three_low MSG5
command_type_three_low MSG6
command_type_three_low MSG7
command_type_three_low MSG8
command_type_three_low MSG9
command_type_three_low MSG10
command_type_three_low MSG11
command_type_three_low EOT
setbit SSB portb ; SPI slave select
end
; ---------------------------------------------------------------------------------------------------------
; Main function called from command center for stage two
to stage_two_main
init
clear_output_regs
command_type_one SCLC DEFAULT_CLOCK
command_type_one PWUP NULL_DATA
command_type_two RVER NULL_DATA
command_type_one PWDN NULL_DATA
; Debugging information
word_to_bits dataone
word_to_bits datatwo
clearbit SCLK portb
end
; ---------------------------------------------------------------------------------------------------------
; Main function called from command center for stage three
to stage_three_main
init
clear_output_regs
command_type_one SCLC DEFAULT_CLOCK
command_type_one PWUP NULL_DATA
command_type_two RVER NULL_DATA
command_type_three CONV NULL_DATA
wait 30
command_type_one FIN NULL_DATA
wait 30
command_type_one IDLE NULL_DATA
command_type_one PWDN NULL_DATA
; Debugging information
word_to_bits dataone
word_to_bits datatwo
clearbit SCLK portb
end