Developing a system that recognizes context

I recently sent a message on the old list about using multiple grammars on one utterance. I want to explain what I am doing so that my question, and the goals I am trying to accomplish, are clearer.

I am currently working to improve the voice recognition of a number of robotic systems. One idea is to swap grammars based on the topic being discussed, on the assumption that recognition accuracy will increase in noisy environments if the system knows the current topic. For this to work, I need to be able to determine when the conversation topic has changed. As humans, we usually find it easy to tell when someone has changed the topic of a conversation. I need to program this same ability into a robot.

Each robot has a set of grammars loaded, and each grammar represents a different topic. At any given time only two grammars are active: one that activates and deactivates grammars, and the main grammar that recognizes commands. I need to know whether I can take an utterance and first try to recognize it using grammar A. If it is not recognized, I would then use a "contextual algorithm" to determine whether the topic has changed. If it has, I would activate the appropriate grammar and perform the recognition again on the same utterance.
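To make the flow described above concrete, here is a minimal Python sketch. It is not ViaVoice code: the grammars, the `recognize` stand-in for the engine, and the word-overlap `guess_topic` heuristic are all hypothetical. The utterance is also represented as text for simplicity, whereas a real system would have to work from stored audio.

```python
from typing import Optional, Tuple

# Hypothetical grammars: each topic maps to the set of phrases it accepts.
GRAMMARS = {
    "navigation": {"move forward", "turn left", "turn right", "stop"},
    "manipulation": {"open gripper", "close gripper", "pick up the block"},
}


def recognize(utterance: str, topic: str) -> Optional[str]:
    """Stand-in for the speech engine: accept only phrases in the active grammar."""
    return utterance if utterance in GRAMMARS[topic] else None


def guess_topic(utterance: str) -> Optional[str]:
    """Toy "contextual algorithm": pick the topic sharing the most words with the utterance."""
    words = set(utterance.split())
    best_topic, best_overlap = None, 0
    for topic, phrases in GRAMMARS.items():
        vocab = {word for phrase in phrases for word in phrase.split()}
        overlap = len(words & vocab)
        if overlap > best_overlap:
            best_topic, best_overlap = topic, overlap
    return best_topic


def recognize_with_fallback(utterance: str, active_topic: str) -> Tuple[Optional[str], str]:
    """Try the active grammar first; on failure, guess a new topic and retry once."""
    result = recognize(utterance, active_topic)
    if result is not None:
        return result, active_topic
    new_topic = guess_topic(utterance)
    if new_topic is None or new_topic == active_topic:
        return None, active_topic
    result = recognize(utterance, new_topic)
    # Only switch topics if the second pass actually recognized something.
    return (result, new_topic) if result is not None else (None, active_topic)
```

For example, `recognize_with_fallback("open gripper", "navigation")` fails against the navigation grammar, guesses "manipulation", and succeeds on the second pass. The hard part in practice is `guess_topic`, since it needs some information about an utterance the active grammar has just rejected.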

I apologize in advance if this is not the correct forum for this question.

Thanks in advance,
Grad_Student

I was sent a message in

I was sent a message in reply suggesting that I save the utterance to a .wav file, activate the new grammar, and then send the saved .wav file through the recognition engine. However, I did not see anything in the ViaVoice SDK documentation indicating that it is possible to feed .wav files, or any other stored audio, through the speech engine again. Does anyone know if this is possible?

In ViaVoice SDK, there is a

In ViaVoice SDK, there is a SMAPI sample called Custom Audio Source. This is what you want: it lets you take a wave file and run recognition on it. Another idea is to launch two engines, each recognizing a different grammar, but that architecture is more complex.
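Independently of the SDK, the "save the utterance, then replay it" idea can be sketched in Python with the standard `wave` module. This is only an illustration under stated assumptions: `feed` is a hypothetical callback standing in for whatever audio-input hook the engine exposes, the audio is assumed to be 16-bit mono PCM, and no SMAPI calls are shown.

```python
import io
import wave


def save_utterance(pcm: bytes, rate: int = 16000) -> bytes:
    """Wrap raw 16-bit mono PCM samples in a WAV container (in memory)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as writer:
        writer.setnchannels(1)   # mono (assumption)
        writer.setsampwidth(2)   # 16-bit samples (assumption)
        writer.setframerate(rate)
        writer.writeframes(pcm)
    return buf.getvalue()


def replay_utterance(wav_bytes: bytes, feed, chunk_frames: int = 1024) -> int:
    """Push stored audio to `feed` chunk by chunk; return the number of frames sent."""
    total = 0
    with wave.open(io.BytesIO(wav_bytes), "rb") as reader:
        frame_size = reader.getsampwidth() * reader.getnchannels()
        while True:
            chunk = reader.readframes(chunk_frames)
            if not chunk:
                break
            feed(chunk)  # hypothetical engine hook
            total += len(chunk) // frame_size
    return total
```

Because the audio is kept, the same bytes can be replayed once per candidate grammar, which is what re-recognizing "the same utterance" would require.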

I well remember your

I well remember your question back there, and I'm afraid the answer is still the same: your stated aim can't be achieved with current SR development kits. Current SR products simply can't infer the application context from a set of utterances unless they are explicitly told what that context is. But there may be a workaround that sidesteps your assumed methodology.

The possible way around this contretemps is to make all of your grammars global, i.e., to have them all contained in the current grammar. That way, the SR product doesn't have to do what it can't do, i.e., infer the context from the content.

I suppose there is an inherent reason why this solution can't apply, but if you told us why, we might be able to suggest a different workaround. In other words, why is it that you can't put all the potentially applicable grammars into a single global grammar? I know, for example, that DNS Pro can contain thousands of scripts in a single grammar without significant loss of speed or accuracy. I would assume that VV has similar capabilities.
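As a rough illustration of the global-grammar idea, here is a small Python sketch. The topics and phrases are made up, and matching on exact text stands in for the real engine. Each phrase remembers which topic grammar it came from, so the topic falls out of the recognition result instead of having to be inferred beforehand.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical per-topic grammars, as in the original setup.
GRAMMARS = {
    "navigation": {"move forward", "turn left", "stop"},
    "manipulation": {"open gripper", "close gripper", "stop"},
}


def build_global_grammar(grammars: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Merge all topic grammars into one phrase -> topics lookup."""
    merged: Dict[str, Set[str]] = {}
    for topic, phrases in grammars.items():
        for phrase in phrases:
            merged.setdefault(phrase, set()).add(topic)
    return merged


def recognize_global(utterance: str, merged: Dict[str, Set[str]]) -> Optional[Tuple[str, List[str]]]:
    """Return the phrase and the topic(s) it belongs to, or None if it is not in any grammar."""
    topics = merged.get(utterance)
    if topics is None:
        return None
    return utterance, sorted(topics)
```

A phrase such as "stop" that appears in more than one topic comes back with several topics, which shows the trade-off: the merged grammar avoids topic detection but gives up the disambiguation that per-topic grammars were meant to provide.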

Bruce

If you think through this

If you think through this idea, I think you will see it can't work.

Exactly what would be put into the .wav file? A command to switch to a different grammar? But where would the information about making the switch come from? The SR product can't listen to a sample of utterances and infer "Ah by Jesus! That's the golf swing routine, for sure! Or is it the 'get a fixed rate home loan' program? No! I'm sure it's the golf swing business, and I'm gonna put that idea into a WAV file and pass it on!"

If the SR product could do that, it could simply issue a command to switch to the new grammar -- at least DNS scripting could do that IF it were told what grammar to switch to -- and I assume VV could do the same thing.

If this is a doctoral project, I wish you a great deal of good luck! Also, I wonder if this is the best forum for such questions. It's an interesting question, for sure, but I'm not sure we have the best pool of people to answer it. It seems to me there might be SR newsgroups that could give you more professional feedback.

Bruce
