Phonetic voice recognition

Hi. Does anyone know of a voice recognition program that produces its output as phonetic symbols (plus spaces)? The set of symbols is not important. English has about 44 distinct sounds, so there need to be about 44 distinct symbols, whether IPA, ITA, Unifon, etc. I run Windows Vista or XP on my machine.
Cheers
Rob Selby

Rob,

Unless there is information available from the speech recognition centers at MIT or Carnegie Mellon University, I seriously doubt that you will find any form of speech recognition that will produce output as phonemes or phonetic symbols. The current commercial/retail speech recognition products are designed to convert speech to text and do not have this capability at all.

Chuck Runquist
Former Dragon NaturallySpeaking SDK & Senior Technical Solutions PM for DNS

If computers get too powerful, we can organize them into a committee - that will do them in. - Bradley's Bromide

Thank you, Chuck and Lunis.
I use an old version of ViaVoice, which I consider to be very good. I am not a speech analysis expert, but it seems to me that the program first captures the sounds and converts them to phonemes, which it clearly does very well. It then tries to spell the words conventionally, and to do that it needs to guess what I mean. This is where it comes unstuck, and in my opinion it always will, though it will improve. Why not bypass the meaning step and just say: "Here is a string of sounds from a set of 44; record what you hear on disc and leave me to figure out what I meant when I come to read it"? I don't see why the providers find this so difficult. At the very least, I think I should have the choice of phonetic or conventional text.

Of course, some languages have an alphabet that is phonetic in the first place, for example Punjabi. About 40% of the people in the area where I live speak Punjabi, and I once attempted to learn it. The alphabet has 49 characters, including some that represent sounds occurring only in foreign words.

Just for background, I am a 55-year-old physiology student in the UK.
Cheers
Rob Selby
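Rob's point that the spelling step is where recognition "comes unstuck" can be illustrated with homophones. Several conventional spellings share one pronunciation, so a phonetic transcript needs no guess where a spelled one does. The sketch below is a toy illustration only. The lexicon entries and the ARPAbet-style phone labels are hypothetical, and real recognizers use a language model to choose among candidates rather than a simple lookup.

```python
# Toy pronunciation lexicon (hypothetical entries): each phone string maps to
# every conventional spelling that shares that pronunciation.
LEXICON = {
    "r ay t": ["right", "write", "rite"],
    "n ow": ["know", "no"],
    "dh eh r": ["there", "their"],
    "k ae t": ["cat"],
}


def spellings(phones):
    """Return every candidate spelling for one word's phone string."""
    return LEXICON.get(phones, [])


def needs_a_guess(words):
    """Return the phone strings whose spelling a recognizer would have to guess."""
    return [w for w in words if len(spellings(w)) > 1]


if __name__ == "__main__":
    heard = ["r ay t", "dh eh r", "k ae t"]
    # A phonetic transcript is just the phone strings themselves: no choice needed.
    print(" | ".join(heard))
    # A spelled transcript must pick one spelling for each ambiguous word.
    print(needs_a_guess(heard))
```

In this toy case, two of the three words would force the spelling step to guess, while the phonetic transcript is unaffected.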

Rob,

In response to your query, Chuck and Lunis have given you chapter and verse on the progress (or lack thereof) in developing recognizers that produce phonemes or phonetic symbols.

However, I note that you're using an old version of ViaVoice, which you consider to be very good. Why not upgrade to the latest version, 10.5? I think you would find that it goes from very good to excellent. You should have no difficulty importing your existing vocabulary (which includes your training sessions), so you could get going without the usual pain.

As far as I know, PC World has VV 10.5. The box still says VV 10, so check that it mentions support for Word 2003: if it does, it is VV 10.5; if it does not, it is old stock.

If you would rather buy it online, you can get it from Nuance UK.
Quentin

Hmmm...

Then I guess you guys would be surprised by the fact that you can do this in Vista, eh? :)

At least, you can programmatically.

Using SAPI 5.3 (in Vista) with the Microsoft 8.0 engine (also in Vista), you can call LoadDictation with the topic specified as "Pronunciation", and the recognition results you get back will be phoneme-based, with no language model restrictions applied.

Neat, huh?

Now, Rob, I bet you were hoping for an answer that didn't require programming, right? Or can you write some SAPI code to do this yourself?

--
Rob Chambers [MSFT]
http://blogs.msdn.com/robch/default.aspx
Windows Speech Recognition - We're Listening...

This posting is provided "AS IS" with no warranties, and confers no rights.

Thank you, Rob. I used to be a programmer back in the mainframe days (mainly PL/1), but I have been retraining in physiology since 2003. I never got into object-oriented programming, apart from a bit of Visual Basic. I am also in the middle of a couple of modules with exams coming up, so I don't have a lot of spare time. Could you expand on what is required, or point me to a website or two? I could also ask someone in the IT faculty on campus to enlighten me.
Cheers
Rob Selby

If you wanted to do this in C++, you'd probably want to start here: http://msdn2.microsoft.com/en-us/library/ms718914....

In C# or VB.NET, you'd want to start here: http://msdn2.microsoft.com/en-us/library/ms576565....

That will give you a bottom-up view of what you'd need to do. From a top-down perspective, you might want to look at the talkback sample in the SAPI 5.1 SDK. I just found this online sample, which is very similar to that one: http://www.914pcbots.com/community/components/com_...

I couldn't easily find a similar example for C# or VB.NET, but either would probably be the easier way to write the code (the entire thing would probably be 20-30 lines of C#).

The other thing you'd need to decide is whether you are trying to build something that works with all applications or just an experiment. Experiments are easy. Full-blown applications that work with all other apps are quite another story. For example, do you want this to be the way you enter text into Microsoft Word, IE, or Firefox? If so, you (or whoever does this work) probably have a lot of work ahead.

But ... it could be done with the Vista recognizer...

--
Rob Chambers [MSFT]
http://blogs.msdn.com/robch/default.aspx
Windows Speech Recognition - We're Listening...

This posting is provided "AS IS" with no warranties, and confers no rights.

Thanks again for your comment, Rob.
Basically, if I could speak into the microphone and get phonetic output in WordPad, I would be delighted. I could do the rest myself.

With a phonetic script, a keyboard in phonetic symbols and phonetic voice to text, I would be independent of traditional English spelling, which is what I am interested in.

Would a training session be required, and would the program ask for it in the usual way? After all, even with phonetics, everyone speaks slightly differently.
Cheers
Rob Selby

Well ... it wouldn't be WordPad, at least not easily. But you could do this by building a stand-alone application that has its own edit control.

You'd have a button for turning the microphone on and off, build the appropriate grammar, and hook up an event handler for the recognition event. When you get recognition events, you could convert the recognition result's phonemes into a printable form and insert them into the text box.

That whole operation would probably only be 20-100 lines of code (depending on how fancy you wanted to be with the phonetic output).
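The last step described above, turning the result's phonemes into a printable form, might look like the sketch below. It assumes, without confirmation from the thread, that the phonemes have already been obtained as space-separated ARPAbet-style labels with stress digits (for example "h eh 1 l ow"). The exact phone set and how a SAPI result exposes it should be checked against the SAPI documentation. The table uses the usual ARPAbet-to-IPA correspondences, and the function names are hypothetical.

```python
# Assumed ARPAbet-style phone labels (not confirmed to match the engine's
# phone set exactly) mapped to IPA symbols.
ARPABET_TO_IPA = {
    "aa": "ɑ", "ae": "æ", "ah": "ʌ", "ao": "ɔ", "aw": "aʊ", "ax": "ə",
    "ay": "aɪ", "b": "b", "ch": "tʃ", "d": "d", "dh": "ð", "eh": "ɛ",
    "er": "ɝ", "ey": "eɪ", "f": "f", "g": "ɡ", "h": "h", "ih": "ɪ",
    "iy": "i", "jh": "dʒ", "k": "k", "l": "l", "m": "m", "n": "n",
    "ng": "ŋ", "ow": "oʊ", "oy": "ɔɪ", "p": "p", "r": "ɹ", "s": "s",
    "sh": "ʃ", "t": "t", "th": "θ", "uh": "ʊ", "uw": "u", "v": "v",
    "w": "w", "y": "j", "z": "z", "zh": "ʒ",
}

# Labels that carry no sound of their own: stress digits and syllable breaks.
IGNORED = {"1", "2", "-"}


def word_to_ipa(phones: str) -> str:
    """Convert one word's space-separated phone labels to an IPA string."""
    out = []
    for label in phones.split():
        if label in IGNORED:
            continue
        # Unknown labels are kept in angle brackets rather than silently dropped.
        out.append(ARPABET_TO_IPA.get(label, f"<{label}>"))
    return "".join(out)


def phrase_to_ipa(words: list) -> str:
    """Join per-word transcriptions with spaces (phonetic symbols plus spaces)."""
    return " ".join(word_to_ipa(w) for w in words)


if __name__ == "__main__":
    print(phrase_to_ipa(["h eh 1 l ow", "w er 1 l d"]))  # prints "hɛloʊ wɝld"
```

Inserting the resulting string into the application's text box is then a single call in whichever UI toolkit is used. Another symbol set, such as ITA or Unifon, would need only a different table, provided a font can display it.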

And, yeah, training would improve the accuracy of the output, just as it does with normal dictation. And, just as with normal dictation, the phonetic output would not be a 100% accurate transcription of what you say. If we could get that to 100%, well ... dictation accuracy would be 100% too... :)
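Because the phonetic output will not be a perfect transcription, it may help to measure how close it gets, for example before and after a training session. One common measure is the phone error rate. This is the edit (Levenshtein) distance between a reference phone sequence and the recognized one, divided by the length of the reference. The sketch below assumes both sequences are available as space-separated phone labels, and the function names are hypothetical.

```python
def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance between two phone sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # deletion of r
                           cur[j - 1] + 1,            # insertion of h
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1]


def phone_error_rate(ref: str, hyp: str) -> float:
    """Edit distance divided by the number of reference phones."""
    ref_phones = ref.split()
    if not ref_phones:
        raise ValueError("reference transcription is empty")
    return edit_distance(ref_phones, hyp.split()) / len(ref_phones)


if __name__ == "__main__":
    # One substituted vowel out of four reference phones.
    print(phone_error_rate("h eh l ow", "h ah l ow"))  # prints 0.25
```

Averaging this rate over a fixed set of test sentences gives a rough way to see whether training is helping.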

Do you have anybody on your side who can look at this?

--
Rob Chambers [MSFT]
http://blogs.msdn.com/robch/default.aspx
Windows Speech Recognition - We're Listening...

This posting is provided "AS IS" with no warranties, and confers no rights.

Chuck is correct, but for the past two years Carnegie Mellon and possibly MIT have been working on what could be the next generation of speech recognition software, based on phonetics (interestingly, NaturallySpeaking typed "fanatics" rather than "phonetics" on the first try). Unfortunately, we cannot supply you with any additional information, but the way the current crop of speech recognition software works (NaturallySpeaking, ViaVoice, and even Vista) is somewhat dated. A phonetic approach is entirely different and has the advantage of being significantly faster because of its lower overhead, but it is not yet a reality and may never be.

When a better widget (speech engine) comes along, you won't have to ask, because all the speech recognition forums will be buzzing about it. ;)

Lunis Orcutt - Developer of KnowBrainer (DNS Command Software)
Now Providing FREE (1st 5 min.) Tech Support 615-884-4558
A Nuance Gold Certified Endorsed Vendor
