Speech Technology
In my full-time job I'm involved in specifying user interfaces for mobile
phones (among many other things), and man/machine interface technologies
have always interested me. The most intuitive method for user input/output,
though still very hard to achieve with quality, is to have the device speak
and listen. This is especially useful for handsfree use and for disabled
users. What comes naturally to a human being takes quite complex mathematics
and powerful CPUs to realize (even a shadow of) in a computer. Nowadays a
normal desktop computer is powerful enough to handle continuous speech
recognition in software without any special hardware (except sound
capabilities, of course). Mobile phones are just starting to get there
(Samsung has introduced such a phone).

Speech synthesis is much less demanding on the CPU than speech recognition.
The most typical application of speech synthesis nowadays is text-to-speech:
you feed the application text in a given language and it speaks the text,
possibly with added inflection and similar effects based on marks used in
the text. On mobile phones speech recognition is the buzz at the moment, but
having messages, news and directions spoken automatically while, e.g.,
driving would no doubt be very nice. If you could then also place calls by
speaking names or numbers, and perform other actions the same way, you would
have a complete handsfree solution.
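
To make the idea of marks in text-to-speech input a bit more concrete, here is a minimal sketch of how an application might separate the words to be spoken from the marks that steer inflection and pauses. The mark syntax (`\name\` or `\name=value\`), the mark names `pause` and `emph`, and the function `split_marks` are all invented for this illustration; real engines each define their own syntax, so treat this as an assumption rather than any particular product's format.

```python
import re
from typing import List, Tuple

# Hypothetical mark syntax, invented for this sketch: \name\ or \name=value\
MARK = re.compile(r"\\([A-Za-z]+)(?:=([^\\]*))?\\")


def split_marks(text: str) -> List[Tuple[str, str]]:
    """Split marked-up text into ("text", words) and ("mark", "name[=value]") items."""
    items: List[Tuple[str, str]] = []
    pos = 0
    for m in MARK.finditer(text):
        # Plain words between the previous mark and this one.
        plain = text[pos:m.start()].strip()
        if plain:
            items.append(("text", plain))
        name, value = m.group(1), m.group(2)
        items.append(("mark", name if value is None else f"{name}={value}"))
        pos = m.end()
    # Whatever follows the last mark.
    tail = text[pos:].strip()
    if tail:
        items.append(("text", tail))
    return items


if __name__ == "__main__":
    for kind, value in split_marks(r"Turn left \pause=500\ then \emph\ stop"):
        print(kind, value)
```

A synthesizer front end could walk the resulting list in order, speaking each `text` item and adjusting its state (pause length, emphasis and so on) whenever it meets a `mark` item.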
Check the comp.speech FAQ first
for up-to-date information about algorithms, products, etc. The sites listed here are the
ones I've made use of myself.
General
News Groups
Research and Toolkits
Product Information
Microsoft
Microsoft now includes speech recognition in Office, and you can very
easily write your own speech-enabled applications, for both input and
output, in Visual Basic or similar. My own
applications Notify and Agent are very simple examples of text-to-speech
via MS Agent, which is extremely easy to use (see the code for Agent).