Speech Technology
Man/machine interface design and technologies
have always interested me. The most intuitive method of user input and output,
though still very hard to do well, is to have the device speak and listen.
This is especially useful for hands-free operation and for users with disabilities.
What is obvious to a human being, however, takes quite complex mathematics and
powerful CPUs to realize (even a shadow of) in a computer.
Nowadays a normal desktop computer or smartphone is powerful enough to handle
continuous speech recognition in software without any special hardware (apart from
sound capabilities, of course).
Speech synthesis is much less demanding on the CPU than speech recognition.
The most typical application of speech synthesis nowadays is text-to-speech:
you feed the application text in a given language and it speaks that text,
possibly with added inflection based on marks placed in the text.
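To make the idea of "marks" in the text concrete, here is a minimal sketch in Python that turns lightly marked-up text into SSML, the W3C markup that many text-to-speech engines accept. The input conventions (`*word*` for emphasis, `...` for a pause) and the function name `to_ssml` are my own assumptions for illustration, not part of any product mentioned on this page. A complete SSML document would also carry version and namespace attributes on `<speak>`, which are left out here.

```python
import re
from xml.sax.saxutils import escape


def to_ssml(text: str) -> str:
    """Convert lightly marked-up text to an SSML string.

    Hypothetical input marks (assumptions for this sketch):
      *word*  -> spoken with emphasis
      ...     -> a half-second pause
    """
    # Escape XML special characters first so plain text cannot break the markup.
    body = escape(text)
    # Wrap *word* in an SSML emphasis element.
    body = re.sub(r"\*([^*]+)\*", r"<emphasis>\1</emphasis>", body)
    # Turn an ellipsis into an explicit pause.
    body = body.replace("...", '<break time="500ms"/>')
    return f"<speak>{body}</speak>"


if __name__ == "__main__":
    print(to_ssml("Turn *left* now... then stop"))
```

The resulting string would then be handed to whatever speech engine is in use; how that is done depends on the engine and is not shown here.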
On mobile phones,
speech recognition is the buzz at the moment, but having messages, news and
directions spoken automatically while, e.g., driving would no doubt be very
nice too. If you could also place calls by speaking names or numbers, and perform
other actions the same way, you would have a complete hands-free solution, and a
less risky one than trying to read or key in messages.
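As a rough illustration of calling by spoken name or number, the sketch below assumes a speech recognizer has already produced a text transcript and only shows the step after that: mapping the transcript to an action. The contact list, the phone numbers, the `call` keyword and the function name `interpret` are all made up for this example. Names are matched fuzzily with Python's standard `difflib`, since recognizers often get a name slightly wrong.

```python
import difflib

# Hypothetical contact list; names are stored in lower case.
CONTACTS = {"john smith": "555-0101", "anna jones": "555-0102"}


def interpret(transcript: str, contacts=CONTACTS):
    """Map recognized text to an (action, argument) pair."""
    words = transcript.lower().split()
    if not words or words[0] != "call":
        return ("unknown", transcript)
    target = " ".join(words[1:])
    # A spoken number: strip spaces and dashes and check for digits only.
    digits = target.replace(" ", "").replace("-", "")
    if digits.isdigit():
        return ("dial", digits)
    # A spoken name: pick the closest contact, tolerating small errors.
    match = difflib.get_close_matches(target, list(contacts), n=1, cutoff=0.6)
    if match:
        return ("dial", contacts[match[0]])
    return ("unknown", transcript)


if __name__ == "__main__":
    print(interpret("call jon smith"))
    print(interpret("call 555 0199"))
```

A real hands-free system would also confirm the match by voice before dialing; that step is omitted to keep the sketch short.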
Start with the comp.speech FAQ
for up-to-date information about algorithms, products, etc. The sites listed here are the
ones I've made use of myself.
General
News Groups
Research and Toolkits
Product Information
Microsoft
Microsoft now includes speech recognition in Office, and you can very
easily write your own speech-enabled applications, for both input and
output, in Visual Basic or a similar language. My own
applications Notify and Agent are very simple examples of text-to-speech
via MS Agent, which is extremely easy to use (see the code for Agent).