Failure to recognize custom dictionary entries

I'm starting a new topic to extend my comments in "Does training actually work?" and "Training away from a misrecognized word?" because I have done some experiments which considerably changed my perspective on the problem. Again, I am using Dragon NaturallySpeaking Preferred version 9.0 under Windows XP SP1.

I realized that I have quite a few custom entries in my user profile, and Dragon is doing a poor job of recognizing many of them. As with the proper name I mentioned before, it frequently does not even include the correct word in its list of guesses when I ask it to correct.

I just went through my custom dictionary, collected over a dozen examples of potential problems, and tried to dictate them. Dragon did a good job with almost all of them, but it undercut itself by doing a good job with several that have consistently given it problems. Either something makes it perform much better in a test with isolated words and phrases than in real dictation, or it just decided to behave itself when I put it on the spot! In any case, I can't consider the results of this test to prove anything, one way or the other.

I will cite a few specific examples of misrecognitions of custom entries.

"Robert Half" is almost invariably transcribed as "Robert half," as if the custom dictionary entry did not exist (but it came out right in my test).

"TrippLite" is invariably transcribed as "Tripp light" or something similar. It does not appear in the list of guesses when I try to correct. Again, it is as if the custom dictionary entry did not exist.

"Microsoft Vista" is usually transcribed wrong, although I don't remember what Dragon thinks I said.

Dragon usually interprets "SP1" as "S&P one" or "SP one," although the custom entry is defined with an appropriate spoken version, "s. p. one".

Dragon often transcribes "Xena" or "Xena's" as "Sina['s]" or "seen as," despite repeated training. After it makes this error it often fails to include the correct word in a list of guesses.

In summary: in my experience Dragon has a consistent problem recognizing custom words, both when transcribing dictation and when building lists of guesses. The problem appears to be highly sensitive to context, at least to the extent that several words with a history of poor recognition in dictation were recognized consistently in one test when spoken in isolation. Training and defining spoken forms is of limited help, if any. Because the problem affects a substantial proportion of all custom words and phrases, not just a few, adding the misrecognitions as spoken forms in each case is not a practical solution.

I don't think we're likely to find a solution to this problem, but I'm interested in any insight that further discussion can provide.


I found DNS9 to be finicky

I found DNS9 to be finicky about learning custom words. More so than previous versions. This may have to do with the more advanced language modeling.

I now use the following routine, which works almost 100% of the time.

1) I begin by opening a fresh instance of DNS.
2) If the phrase in question is already in the vocab and isn't being recognized reliably, I delete the phrase.
There is no point in repeatedly trying to train a phrase that is not recognized.

3) I save speech files.
4) I close and reopen DNS (This probably isn't necessary but I do it anyway).
5) I look to see if part of the phrase is already in the vocab, for example if I want to add "Richard McAbee"- I look to see if McAbee is already present.
If so, I add Richard to the McAbee entry and click add.
I do not train at this point.

I am not sure of the logic of not training at this point, but I think that part of it is that you cannot train single words that are built into the dictionary.
Training a phrase that contains a built-in word can cause misrecognition.

6) I save speech files.
7) I close and reopen DNS (This probably isn't necessary but I do it anyway).
8) I try dictating the phrase. If it is recognized, done.
If it is not recognized, I try 1 or 2 more times and correct.
9) If the phrase is still not recognized, I then train it by typing the correct phrase, selecting it, and using "train that".

I am not sure of the logic of training at this step. Part of it may be that you only train if necessary (i.e. trying the phrase 1st and then training only if it is mis-recognized).

10) If there is a similar word that I am unlikely to use, I delete it from the Vocab.
For example, if I were to add "Dr. Chuang" to my vocab, I would delete the word Chun (unless I knew a Dr. Chun).
This is particularly helpful for proper names.

Note, it is not useful to randomly delete words from the vocab- the deleted word will just be replaced by another low frequency word from the backup dictionary.

I just tried the above routine for "TrippLite" and achieved 100% recognition after step 9.

"I ordered a TrippLite power conditioner".

The other option for a phrase such as Tripplite would be to use a written and spoken form e.g. Tripplite / trip light.
This also achieved 100% recognition.

jsachs177, I would be interested if you could try the above suggestions and report back.
If you do not have better success, I would suspect that you might have a hardware problem such as noisy microphone or soundcard.

Likewise, I would be interested in any comments from Chuck R.

Thanks!

To continue the experiment,

To continue the experiment, I opened DNS and added "Robert Half" to my vocab.

I then tried dictating "Robert Half" => 100% accuracy with no training.

1) I think it is best not to train unless necessary.
2) It does help to train phrases when necessary.
3) If you are having repeated difficulty it is better to delete the phrase and start again, rather than repeated re-training.

To further the experiment, I added “Robert Half” to my vocabulary and checked the pronunciation using GetWords:

No training ==> Robert Half \\robPthaf

Sloppy training ==> Robert Half \\rI \\ru

Careful training ==> Robert Half \\robPthaf

In this case, the native DNS pronunciation seems to often work best for me.
This suggests that it is best not to train unless needed.

Interesting to note that after I added and trained; then deleted "Robert Half"- DNS was unable to add it again, with the error message "unable to create pronunciation". After closing and restarting DNS, I was able to add it again.

It is possible to have multiple pronunciations for a given phrase e.g.

No training ==> Address Service Requested \\@dressVv6sr/kwest6d \\@dressVv6sr6kwest6d \\adressVv6sr/kwest6d \\adressVv6sr6kwest6d

After training ==> Address Service Requested \\@s@kwest6d \\es@kwest6d

This may be why it is better to sometimes delete the phrase and start again.
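As a minimal sketch (not part of GetWords itself), the listings above can be compared with a few lines of Python. It assumes each line has the layout shown in this thread: a written form followed by pronunciations that each start with a space and two backslashes. The real export format may differ, and `parse_entry` is a made-up helper name.

```python
# Assumed separator: a space followed by two backslashes, as shown in this thread.
SEP = " " + "\\" * 2

def parse_entry(line):
    """Split one listing line into (written form, list of pronunciations)."""
    written, *prons = line.strip().split(SEP)
    return written, prons

before = parse_entry(
    r"Address Service Requested \\@dressVv6sr/kwest6d \\@dressVv6sr6kwest6d"
    r" \\adressVv6sr/kwest6d \\adressVv6sr6kwest6d"
)
after = parse_entry(r"Address Service Requested \\@s@kwest6d \\es@kwest6d")

# Pronunciations present before training that are gone afterwards.
dropped = [p for p in before[1] if p not in after[1]]
print(before[0], "lost", len(dropped), "of", len(before[1]), "pronunciations")
```

Comparing the two lists this way makes it easy to see when training has replaced every native pronunciation rather than adding to them, which is the situation where deleting the phrase and starting again seems most likely to help.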

I suspect that one of the problems with repeated training is that your voice gets strained when you are frustrated, affecting the pronunciation.

Quote:

Very interesting results. I never dreamed of training "Robert Half" because logically, it should be futile. The fact that Dragon correctly recognized each part of the phrase proves (logically) that the problem is not a recognition problem and cannot be solved by training.

No, logically this makes sense. By adding the phrase "Robert Half" you are increasing the statistical probability of these two words being recognized as a phrase.

If you dictate "Robert Half" ==> Robert Half
but if you dictate "Robert" pause "half" ==> Robert half

When I see a new patient, I always add their name to the vocabulary as two entries:
First Name Last Name e.g. John Brown
Title Last Name e.g. Mr. Brown

This way I almost always get 100% accuracy when I dictate the patient name as either John Brown or Mr. Brown. If part of the phrase is already in the dictionary, it is important to add the words as a phrase.
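The two-entry habit is easy to script if the names already exist in a list. The sketch below is only an illustration: the function names are invented, and it assumes the resulting phrases can be pasted or imported into the vocabulary as plain text, one phrase per line.

```python
def name_entries(first, last, title):
    """Return the two phrases to add for one person: full name and title + surname."""
    return [f"{first} {last}", f"{title} {last}"]

def word_list(people):
    """Build a de-duplicated, order-preserving list of phrases, one per line."""
    seen = []
    for first, last, title in people:
        for phrase in name_entries(first, last, title):
            if phrase not in seen:
                seen.append(phrase)
    return "\n".join(seen)

print(word_list([("John", "Brown", "Mr."), ("Mary", "Brown", "Mrs.")]))
```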

The biggest problem that I have is when I mispronounce the name or use different pronunciations as I dictate.

Your comment looks very

Your comment looks very interesting but a lot of it doesn't make sense to me. I wonder if you can add some clarification.

It's not clear to me why you said that "logically this makes sense... you are increasing the statistical probability of these two words being recognized as a phrase." From my perspective, probability has nothing to do with it; DNS demonstrably recognized the words correctly, eliminating the mysterious probabilistic recognition engine as a possible culprit. It appears to me that Dragon simply failed to find the phrase in the dictionary, for reasons that I'm trying to understand. If it is doing something statistical when it looks up correctly recognized words and phrases, it is malfunctioning!

Your examples appear to include phonetic pronunciations, implying that there is a way to get these into the dictionary, or out of it, or both. I thought there was no way to do that; actually, I was told so in a recent thread on a similar topic. You mentioned "getwords" -- I downloaded a utility named getwords a few months ago because I hoped it would solve another problem, but all it let me do was export and import lists of words (with their "Spoken as" definitions, I think). Either it has been greatly enhanced since then, or you are not referring to the getwords utility at all, but to another utility with the same name! Can you tell me more?

Chuck is going to have to

Chuck is going to have to bail me out here but basically it has to do with how DNS and all SR engines use a statistical model to determine what was said.

There are two components to the model, The Acoustic Model and the Language model.

The language model contains statistical information that predicts which words are most likely to occur in the context of the user’s speech. So even if DNS recognizes a phrase, it may not return what was said. You can often see this in real time as the text in the result box changes as you speak. DNS is using the Language Model to statistically pick the most likely phrase based on what you said (Acoustic data).

When I was playing around before, I tried dictating "Robert Half" as a phrase and DNS returned Robert Half. But if I dictated "Robert" pause "Half" DNS had difficulty recognizing half, because it is a single word (and I enunciated poorly using a crappy mike).
OTOH, DNS had no problem with "I had half of a tuna fish sandwich".
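A toy sketch of the two-model idea described above, in Python. Nothing here comes from Dragon: the scores, the `lm_weight` parameter, and the function name are assumptions chosen only to show how a hypothesis that sounds identical can still lose on the language-model side.

```python
import math

def pick_hypothesis(acoustic, language, lm_weight=1.0):
    """Return the hypothesis with the highest combined log score.

    acoustic[h]: how well hypothesis h matches the audio (invented numbers).
    language[h]: how likely h is as text according to a language model.
    """
    def score(h):
        return math.log(acoustic[h]) + lm_weight * math.log(language[h])
    return max(acoustic, key=score)

# Both spellings sound the same, so the acoustic scores are equal.
acoustic = {"Robert Half": 0.5, "Robert half": 0.5}
# Assumed: the lowercase word sequence is more common in general text.
language = {"Robert Half": 0.001, "Robert half": 0.004}

print(pick_hypothesis(acoustic, language))
```

With equal acoustic scores the language model alone decides, so in this made-up case the lowercase form wins even though both words were "heard" correctly.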

There is a lot more information here: http://www.speechcomputing.com/node/923#comment-33...

I wish I understood it all better.

Quote:

You mentioned "getwords" -- I downloaded a utility named getwords a few months ago because I hoped it would solve another problem, but all it let me do was export and import lists of words (with their "Spoken as" definitions, I think). Either it has been greatly enhanced since then, or you are not referring to the getwords utility at all, but to another utility with the same name! Can you tell me more?

I think that you are using the same GetWords utility.
To include pronunciations, just check the "include pronunciations with word list" option.

This is a very useful utility. You can save your word list with pronunciations and import it into a new User using PutWords. (Note, you can only use GetWords => PutWords within the same version of DNS i.e. GetWords DNS9 => PutWords DNS9).

My understanding of the way

My understanding of the way Dragon operates is as follows (with thanks to Chuck Runquist – any misinterpretation is all my own):

As you dictate the Acoustic Model converts your speech into phonetic equivalents using the Vocabulary as its initial Language Model. Adding a custom phrase provides a "known" combination of phonemes. The Vocabulary has a fixed probability for single words (or rather their phonetic equivalents) based on the occurrence of the single words in the appropriate language (e.g. English). In the vocabulary this probability for the supplied vocabulary cannot be changed. If you add an unknown word that follows standard phonetics Dragon will supply the phonetics or you can provide your own "pseudo" phonetics as the spoken form. So far there is nothing mysterious (or heavy math) about the recognition process.
The question I would ask is what "weight" in the Vocabulary is given to a phrase containing two "known" words (or rather their phonetic equivalents) as opposed to a phrase containing the same two words that are given pseudo phonetics as the spoken form. If, that is, there is any difference in treatment between the two cases.
I had always assumed that a Vocabulary phrase was treated as a string of phonemes distinct from the individual word equivalents and that this increased the probability of the unique string being recognised (in the same way that complex medical words are recognised more accurately than shorter words). However in one of Chuck’s responses he states that adding a phrase to the Vocabulary adds the number of words contained in the phrase. This might suggest a different interpretation.

The (mysterious?) bigram model should kick in after the Vocabulary to resolve any homophones and near homophones. This certainly happens in the case of many "proper names" without the need to add them to the Vocabulary. "Tony Wright" is a good example where it is usually transcribed first as "Tony right" but after only one or two corrections it is accurately interpreted. Clearly Mr Wright has a presence in the Language Model whereas Mr Maple does not. The second question here is does the bi-gram Model probability override a short phrase in the Vocabulary?

Incidentally GetWords only displays the binary form of the phonetics provided by Dragon.

Graham

That makes pretty good sense

That makes pretty good sense (as does DSteiNeuro's response to my question).

It frustrates me that Dragon's vocabulary control is so crude. It uses its weighting algorithms and its digrams and trigrams to decide whether to obey or ignore my custom words in any particular case, but denies me control over them, and does not even explain them except through the good offices of Chuck.

Your observations on Dragon's treatment of proper names sound right (or Wright) to me, but they don't tell nearly the whole story. When Dragon gets in a certain mood it interprets words as proper names whenever it can, giving me constant errors like "One box was red and one was Black." And I have never figured out what logic it uses to identify organization names; errors like "Computer law Society" and "Chicago bar Association" are quite consistent until each proper name is added to the dictionary.

IT Speaking wrote: The

IT Speaking wrote:

The question I would ask is what "weight" in the Vocabulary is given to a phrase containing two "known" words (or rather their phonetic equivalents) as opposed to a phrase containing the same two words that are given pseudo phonetics as the spoken form.

I think you're coming quite close now, but I would like to state it a little differently.

The question is what statistics the phrase "Robert Half" has compared with the bigram probability of "Robert" and "half". Apparently the latter is quite high.
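For readers unfamiliar with the term, a bigram probability can be estimated from text by simple counting. The Python sketch below uses a tiny invented corpus and a made-up function name; it only illustrates the textbook estimate and says nothing about how Dragon actually stores or weights its statistics.

```python
from collections import Counter

def bigram_prob(tokens, w1, w2):
    """Estimate P(w2 | w1) as count(w1 w2) / count(w1) from a token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    if unigrams[w1] == 0:
        return 0.0
    return bigrams[(w1, w2)] / unigrams[w1]

# Invented sample text: "Robert" is followed by "half" twice and by "Half" once.
tokens = "Robert half expected it . Robert half hoped . Robert Half called".split()

p_lower = bigram_prob(tokens, "Robert", "half")
p_upper = bigram_prob(tokens, "Robert", "Half")
print(p_lower, p_upper)
```

If the counts behind the lowercase pair are larger, as assumed here, the capitalized phrase starts at a disadvantage until something raises its own statistics.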

My experience is, not that I have an explanation for this, that training the phrase "Robert Half" will improve its statistics. With many phrases this will be not necessary, but sometimes you hit upon a phrase that needs this treatment.

Before training, a phrase sometimes doesn't even show up in the spell window, but after training the recognition is immediate, or the phrase is in position two or three in the spell window and one or two corrections are enough.

So my interpretation would be that training also has influence on the statistics of a word or phrase.

(Still no knowledge of single word probabilities as opposed to bigram etc. probabilities. And note we are speaking here of a user added word, for which the single word probability is not fixed, as it should be for original vocab words.)

My five cents, Quintijn

PS I would leave GetWords aside and not build interpretations on that utility. It is not certain that the phonetics in version 9 are still the same as they were in version 3 or 5, when Joel Gould wrote these.

Here's another one that

Here's another one that happened this morning. I dictated "windows XP" and got just that -- "windows XP" instead of "Windows XP."

I looked at the dictation history and confirmed that Dragon had interpreted "Windows XP" as a single phrase, not two separate ones. I looked at the active dictionary and confirmed that "Windows XP" is a built-in entry, and "windows XP" is not an active entry at all. "windows" (without caps) and "XP" are both built-in entries, and, of course, there's no telling what the dictionary says about the probability of that combination.

I corrected the error. The next time I spoke the phrase, Dragon made the same error again. I corrected it again. The third time I used the phrase, Dragon got it right.

I wonder what this implies about the digram/trigram logic. Is there any plausible (legitimate) reason why Dragon should be expected to make this error?

I now can formulate the concern that led me to start this thread. It seems to me that if the dictionary contains a phrase, Dragon should give that phrase some substantial weight over a digram consisting of two almost-identical words, and it does not appear to be doing that.
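One way to picture the wrong, wrong, right sequence described above is a weight that grows with each correction until it overtakes a rival. The Python sketch below is pure speculation for illustration: the weights, the `boost` value, and the function name are all invented, and nothing is known here about how Dragon really adapts.

```python
def attempts_until_correct(phrase_weight, rival_weight, boost=1.0, limit=50):
    """Return the attempt number on which the phrase first wins, or None.

    Each failed attempt is followed by a correction that adds `boost`
    to the phrase's weight. All numbers are invented for illustration.
    """
    for attempt in range(1, limit + 1):
        if phrase_weight > rival_weight:
            return attempt
        phrase_weight += boost  # the user corrects the error
    return None

# A phrase starting at 1.0 against a rival at 2.5 wins on the third attempt.
print(attempts_until_correct(1.0, 2.5))
```

Under this picture, the complaint in the post amounts to saying that a dictionary phrase should start with a weight above its rival, so that no corrections are needed at all.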

Quote: Here's another one

Quote:

Here's another one that happened this morning. I dictated "windows XP" and got just that -- "windows XP" instead of "Windows XP."

I added Windows XP to my vocab (it was in the backup dictionary) and had 100% accuracy on the 1st try.

Your problem may be from several factors:

1) Pausing as you dictate. If you pause between windows and XP, you will get windows XP.
2) Dictating in context. This will allow the more advanced recognition models to determine the correct response.

3) Your user file. Carefully creating your user file from typical documents goes a long way. If you have several documents with Windows XP in context, accuracy for Windows XP will be higher. Since you have been having so many problems, you might want to consider trying a new user file.
4) Using the Acoustic and Language Model Optimizer.
5) Hardware, microphone and soundcard do make a significant difference.

This may sound like a point

This may sound like a point by point refutation of your comments, but it's not meant to be -- I'm trying to understand something that absolutely does not make sense to me.

Quote:

I added Windows XP to my vocab (it was in the backup dictionary) and had 100% accuracy on the 1st try.

In my dictionary it was active -- not added from the backup vocabulary. I'm puzzled that it would be in the backup dictionary for you, and not for me. Perhaps you are using a version of Dragon earlier than version 9? In any case that does not appear to have been the problem.

Quote:

1) Pausing as you dictate. If you pause between windows and XP, you will get windows XP.

But I did not pause. As I said, I checked the recognition history to confirm that Dragon heard a single phrase. I can't play back the actual words I said, but I was speaking normally, with no more than a normal conversational pause (if that should be called a pause) between the words.

I should note here that my Dragon seems to suffer from a disorder that I call time distortion: it behaves as if there were pauses in places where there absolutely are none. A typical example is "models," transcribed as "model is." This happens persistently, in many cases like "models" where it cannot be explained by slight irregularities in pronunciation or pacing, as it could with a word like "doses." This is part of the problem, but perhaps it is also a clue to the root cause of the problem.

Quote:

2) Dictating in context. This will allow the more advanced recognition models to determine the correct response.

I was dictating in context.

Quote:

3) Your user file. Carefully creating your user file from typical documents goes a long way. If you have several documents with Windows XP in context, accuracy for Windows XP will be higher.

I understand the concept of creating a user file from documents, but I have always had trouble grasping the concept of a typical document. I use Dragon to write about an open-ended set of areas: law, information technology auditing, gardening, cats, politics, several aspects of my personal life, photography, speech recognition... And the topics I discuss within those areas are so various that they really should be considered distinct areas. When I am writing about the significance of the Zenger trial in colonial America my vocabulary and word usage are quite different than when I am writing about how to deal with copyright infringement issues on the World Wide Web.

There are two practical difficulties (at least) with the idea of creating a user file from documents. One: that if I did assemble a set of typical documents, it would quickly cease to be typical. The other: that the speech patterns deduced from typical documents in one area would actually tend to work against correct recognition in every other area. I encounter this constantly with contractions, which are a natural part of my informal writing, and are largely inadmissible in formal writing.

I once observed that Dragon needs to recognize a hierarchy of user profiles tied to particular directories, so that when I work on a document in my intellectual property law subdirectory within my law directory, for example, Dragon uses a user profile defined for intellectual property law as an extension of another user directory defined for law. The idea did not go anywhere (even within this group), for reasons that were never clear to me.
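The directory-based hierarchy suggested above can be sketched in a few lines of Python. This is a hypothetical illustration only: the directory names, profile names, and `profile_chain` function are invented, and Dragon offers no such feature as far as this thread shows.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from directories to profile names.
PROFILES = {
    "/docs/law": "law",
    "/docs/law/ip": "intellectual-property-law",
}

def profile_chain(doc_path, profiles=PROFILES):
    """Return the profiles that apply to a document, most general first."""
    chain = []
    # parents runs from the nearest directory up to the root, so reverse it.
    for parent in list(PurePosixPath(doc_path).parents)[::-1]:
        name = profiles.get(str(parent))
        if name:
            chain.append(name)
    return chain

print(profile_chain("/docs/law/ip/brief.txt"))
```

A document in the intellectual property subdirectory would pick up the general law profile first and the more specific one second, while a document outside any mapped directory gets an empty chain and would fall back to a default profile.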

Quote:

Since you have been having so many problems, you might want to consider trying a new user file.

That's problematic because it's unclear to me what "a new user file" would be. I long ago discovered that saving the user file at shutdown makes speech recognition worse and worse until the file becomes unusable. Consequently I now save the user file only after adding or training words immediately on startup. If I must replace the user file to solve my problems in spite of that... okay, but then how can I maintain a user file at all?

Quote:

4) Using the Acoustic and Language Model Optimizer.

Opinions go both ways on whether that is helpful or harmful. I don't do it because I never found it particularly effective either way. I can try it again, but I think it will go better if someone who has found it useful could give me advice on how to use it effectively.

Quote:

5) Hardware, microphone and soundcard do make a significant difference.

My own experience has been different: that as long as I'm not using complete trash, everything works about the same. An external converter intended for the low end audiophile market performed just about the same as my motherboard's built-in soundcard. I tried a variety of inexpensive and midrange headsets sold for speech recognition applications, and they all performed about the same. Perhaps I don't know what to look for. Are there particular, nonobvious features that I should select for?

A couple of years ago I bought a Sennheiser MD431 microphone, which was supposed to give me the ultimate in transcription accuracy. I never got to try it because I could not find a usable adapter to connect it to the computer. I'm not sure where it is stored, but if I could find an adapter I could dig it out and give it a try.

Try the MD431 -- I got an

Try the MD431 -- I got an adapter at my local everything electrical shop, any shop of that type should be able to sell you one. You may have to create a new user.

Judy

I'm surprised that you found

I'm surprised that you found it that easy to buy. I inquired at my local everything-electrical shop (RadioShack, in the US) but only got shrugs. They sell lots of adapters, but they never heard of that one. I tried a couple of electronic appliance stores with the same results.

Eventually I bought something by mail that was recommended to me for this specific purpose, but it was a cable about 15 feet long, and it turned out to be unshielded, and the noise induced in it by the computer drowned out everything else. I searched the Web (desperately, in the end), but found nothing else.

Can you tell me what you searched for? Perhaps I just didn't know the magic words.

jsachs177 wrote: Eventually

jsachs177 wrote:

Eventually I bought something by mail that was recommended to me for this specific purpose, but it was a cable about 15 feet long, and it turned out to be unshielded, and the noise induced in it by the computer drowned out everything else. I searched the Web (desperately, in the end), but found nothing else.

Cables do not induce noise. They can act like an antenna and pickup noise if there is a strong nearby radio source or the cable is in touching proximity of a nearby offending electronic device. However, of the thousands of cables supplied this problem is negligible, perhaps one or two instances over the last 10 years. As I recall the problems were rectified by rerouting the cables around offending devices.

Typically, electronic noise from within the computer is rectified by using an Andrea USB sound pod. Although you have said you have one, it is not clear whether you have ever tried using it to bypass electronic noise within the computer.

Another issue is the microphone. If you bought it used are you sure it works properly? You called it an MD431. These have not been made in 12 years and can be 40 years old. They have been replaced by the MD431II.

Martin

I'm afraid I can't remember

I'm afraid I can't remember now what I asked the shop for -- if, that is, I didn't simply browse their shelves till I saw one that looked right, then asked them about it; that was ages ago. It was, though, a mike cable. (I had to add an adapter to the end, as their plug was too large for a standard sound card.) The shop sold Shure mikes, they may also have sold some Sennheisers; so they'd have known what I meant. It's the kind Martin sells, except that his don't (I think) require the additional adapter (which he also sells).

Incidentally, the mike is an MD 431 II

Judy

This is a follow-up to the

This is a follow-up to the preceding message, in the nature of a retraction.

A Retraction and an Apology

I said that I bought a 15 foot cable which was sold for this particular purpose, but it was unusable, and the vendor did not respond when I reported the problem. From his later post, it was pretty obvious that Martin (eMicrophones) was the vendor.

I have corresponded with Martin privately about this, and things now look very different to me.

First, I understand that Martin's lack of response was an oversight, not an effort to avoid dealing with the problem. Martin has confirmed what I believed before the incident occurred: he stands behind what he sells and makes every reasonable effort to ensure that his customers are satisfied.

Martin made some suggestions that led me to experiment with the cable again. I found that it works very well as long as I do not let it lie near the ballast at the base of my fluorescent desk lamp! At this remove I can't be sure, but I believe that's what happened when I tried it two years ago.

I did not intend to identify Martin as a problem vendor, but due to his own conscientiousness, I did so. I want to undo any damage I have done to his reputation.

What about This Hiss?

I'm using the Sennheiser mic with the 15 foot cable that I bought from Martin and an Andrea USB pod. The pod is producing a continuous hiss (even when the mic and cable are not attached).

Despite the hiss, the Audio Settings utility gives me quality numbers of 18 or 19. I think that recognition is somewhat better than it is with my Andrea NC-61 headset plugged in to the motherboard's sound card (with quality numbers typically 21-24).

Nevertheless, I wonder whether eliminating the hiss would improve things. If so, I wonder how I might do that.

jsachs177 wrote: A couple

jsachs177 wrote:

A couple of years ago I bought a Sennheiser MD431 microphone, which was supposed to give me the ultimate in transcription accuracy. I never got to try it because I could not find a usable adapter to connect it to the computer. I'm not sure where it is stored, but if I could find an adapter I could dig it out and give it a try.

Do you mean you could not get a suitable cable or USB adapter?

Available on our website are the 5 foot and 15 foot cable for the Sennheiser MD431II. These are designed for us specifically to be impedance matched and to have the correct sound card connector without an adapter which adds another layer of complexity.

Unless you have a known good sound card like the Sound Blaster Live or SoundBlaster Audigy, I suggest you use an Andrea USB Sound Pod. There is no point in using the best darn hand-held/desktop mounted microphone and not assure yourself of the best audio input.

I just checked our records and see you purchased a 15' cable over 2 years ago. Is that what you could not find?

--
Martin Markoe, eMicrophones, Inc.
The best microphones for Speech Recognition
Read, "Key Steps to High Speech Recognition Accuracy"

jsachs177 wrote: This may

jsachs177 wrote:

This may sound like a point by point refutation of your comments, but it's not meant to be -- I'm trying to understand something that absolutely does not make sense to me.

Yes, I think that this is a good exercise

DSteiNeuro wrote:

I added Windows XP to my vocab (it was in the backup dictionary) and had 100% accuracy on the 1st try.

jsachs177 wrote:

In my dictionary it was active -- not added from the backup vocabulary. I'm puzzled that it would be in the backup dictionary for you, and not for me. Perhaps you are using a version of Dragon earlier than version 9? In any case that does not appear to have been the problem.

Probably because I am using DNS9 Medical: there is less room in the active vocabulary because of all of the medical terms.

DSteiNeuro wrote:

1) Pausing as you dictate. If you pause between windows and XP, you will get windows XP.

jsachs177 wrote:

But I did not pause. As I said, I checked the recognition history to confirm that Dragon heard a single phrase. I can't play back the actual words I said, but I was speaking normally, with no more than a normal conversational pause (if that should be called a pause) between the words.

See the post from Chuck R. Even a very slight pause can make a difference. Dictating in smooth phrases seems to be very important.

DSteiNeuro wrote:

2) Dictating in context. This will allow the more advanced recognition models to determine the correct response.

jsachs177 wrote:

I was dictating in context.

Good, again a smooth phrase seems to be important.

DSteiNeuro wrote:

3) Your user file. Carefully creating your user file from typical documents goes a long way. If you have several documents with Windows XP in context, accuracy for Windows XP will be higher.

jsachs177 wrote:

I understand the concept of creating a user file from documents, but I have always had trouble grasping the concept of a typical document. I use Dragon to write about an open-ended set of areas: law, information technology auditing, gardening, cats, politics, several aspects of my personal life, photography, speech recognition... And the topics I discuss within those areas are so various that they really should be considered distinct areas. When I am writing about the significance of the Zenger trial in colonial America my vocabulary and word usage are quite different than when I am writing about how to deal with copyright infringement issues on the World Wide Web.

There are two practical difficulties (at least) with the idea of creating a user file from documents. One: that if I did assemble a set of typical documents, it would quickly cease to be typical. The other: that the speech patterns deduced from typical documents in one area would actually tend to work against correct recognition in every other area. I encounter this constantly with contractions, which are a natural part of my informal writing, and are largely inadmissible in formal writing.

It may be helpful to have several user files. For example, my medical user file works extremely well for my medical dictations. This is because I use similar phrases in all of my medical reports.

I use a separate user file for day to day dictation like e-mail.

jsachs177 wrote:

I once observed that Dragon needs to recognize a hierarchy of user profiles tied to particular directories, so that when I work on a document in my intellectual property law subdirectory within my law directory, for example, Dragon uses a user profile defined for intellectual property law as an extension of another user directory defined for law. The idea did not go anywhere (even within this group), for reasons that were never clear to me.

This is a clever idea, but I suspect difficult to implement. It might be possible with SDK code but probably easier to just dictate "open user" and switch user files as you go.
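To make the directory-hierarchy idea concrete, here is a minimal Python sketch. Everything in it is hypothetical: the `PROFILE_MAP` table, the profile names, and the `profile_chain` function are made up for illustration, and nothing here touches Dragon or its SDK. It only shows how a document's path could resolve to a chain of profiles, from the most general to the most specific.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from directories to profile names (not a Dragon feature).
PROFILE_MAP = {
    "docs/law": "law",
    "docs/law/ip": "intellectual-property-law",
    "docs/garden": "gardening",
}

def profile_chain(document_path, base_profile="general"):
    """Return profiles from most general to most specific for a document."""
    chain = [base_profile]
    parts = PurePosixPath(document_path).parent.parts
    # Walk from the top-level directory down to the document's own directory.
    for depth in range(1, len(parts) + 1):
        directory = "/".join(parts[:depth])
        if directory in PROFILE_MAP:
            chain.append(PROFILE_MAP[directory])
    return chain
```

A document under `docs/law/ip` would pick up the general profile, then the law profile, then the intellectual property profile. The hard part, of course, is not the lookup but getting the recognizer to layer profiles at all.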

DSteiNeuro wrote:

Since you have been having so many problems, you might want to consider trying a new user file.

jsachs177 wrote:

That's problematic because it's unclear to me what "a new user file" would be. I long ago discovered that saving the user file at shutdown makes speech recognition worse and worse until the file becomes unusable. Consequently I now save the user file only after adding or training words immediately on startup. If I must replace the user file to solve my problems in spite of that... okay, but then how can I maintain a user file at all?

This is a common misconception. Chuck will help me on this one.
It is extremely important to make corrections and save your user file. This may be even more important with DNS9 than prior versions.

Once a user file has matured and is achieving high recognition accuracy I make a backup copy and tend to save the user file less frequently.

DSteiNeuro wrote:

4) Using the Acoustic and Language Model Optimizer.

jsachs177 wrote:

Opinions go both ways on whether that is helpful or harmful. I don't do it because I never found it particularly effective either way. I can try it again, but I think it will go better if someone who has found it useful could give me advice on how to use it effectively.

Yes, but I think that the results with DNS9 have been very good. It is certainly worth backing up your user file and running the AO. If results are not satisfactory, you can simply restore the user file.

In another recent post, Chuck R. pointed out the importance of using the AO to maintain a user file. I have to trust his expertise on this, but again, if you back up first, you have nothing to lose.
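For anyone who prefers to script the "back up first" step, a plain folder copy is enough. The sketch below is only that: the function names are invented, and the location of your user file folder is an assumption you would have to fill in for your own installation.

```python
import shutil
import time
from pathlib import Path

def backup_profile(profile_dir, backup_root):
    """Copy a profile folder to a timestamped folder and return the copy's path."""
    profile_dir = Path(profile_dir)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"{profile_dir.name}-{stamp}"
    shutil.copytree(profile_dir, target)
    return target

def restore_profile(backup_dir, profile_dir):
    """Replace the profile folder with the contents of a backup."""
    profile_dir = Path(profile_dir)
    if profile_dir.exists():
        shutil.rmtree(profile_dir)
    shutil.copytree(backup_dir, profile_dir)
```

Call `backup_profile` before running the optimizer, and `restore_profile` with the returned path if the results are not satisfactory.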

My suggestion to create a new user file is based on the above suggestions.
BTW, I tried your "Control Panel" example. I added it to my vocab with no training and had consistent 100% recognition on the first try.

DSteiNeuro wrote:

5) Hardware, microphone and soundcard do make a significant difference.

jsachs177 wrote:

My own experience has been different: that as long as I'm not using complete trash, everything works about the same.

This simply does not hold up. Work with Marty. His experience will go a long way!

As for many end users, “Windows XP” came out correctly every time we dictated it for a few months, but then, for no reason whatsoever, NaturallySpeaking started typing out “windows XP”. We believe this is simply a bug in the software, and purchasing a better microphone or dictating differently will have no effect as long as you say Windows XP in a single phrase. It's amazing how many phrases will work perfectly for a few weeks to a few months and then stop working completely. Example: We've never been able to get NaturallySpeaking to type “Control Panel”. We always get the lowercase “control panel” no matter what we do.

As far as your “windows XP” problem is concerned, the simplest solution might be to locate “windows” in your Vocabulary Editor, click the Properties button, put a checkmark in Use alternate written form, and type “Windows” in the box. When you are done, you will see a blue * to the left of “windows”. On the downside, you will have to dictate “no-caps windows” when you want to say “you need storm windows for your house”, but if you're like us, you use this version of windows only 1% of the time and the other version of Windows (as in Windows 2000, Windows XP, etc.) 99% of the time.

There were other recommendations about creating a new user, but we think that would be overkill when dealing with a problem this small.

KnowBrainer Support Staff - Lunis Orcutt

Dictated with DNS 9, KnowBrainer and UniVoice

Well, your example of "control panel" illustrates exactly my remarks earlier. I repeat:

First I added Control Panel in the vocabulary editor. Trained once. It wasn't recognized, only in lower case. Three corrections didn't give results, like in your case. So problem reproduced.

Then I went back to the vocabulary editor, trained the word again (once). After that no problems anymore. Just Control Panel.

So apparently training a word also influences its ranking. And when the initial ranking is too low, or the bigram statistics of the two words (in this case control and panel) are too high, simple correction doesn't even start to work. Or, more probably, the phrase "Control Panel" is not even recognised as the phrase you are correcting to (only the single words "Control" and "Panel").

And the extra training does its work, to get the ranking of "Control Panel" sufficiently high.

Greetings, Quintijn

Quintijn wrote:

Well, your example of "control panel" illustrates exactly my remarks earlier. I repeat:

First I added Control Panel in the vocabulary editor. Trained once. It wasn't recognized, only in lower case. Three corrections didn't give results, like in your case. So problem reproduced.

Then I went back to the vocabulary editor, trained the word again (once). After that no problems anymore. Just Control Panel.

So apparently training a word also influences its ranking. And when the initial ranking is too low, or the bigram statistics of the two words (in this case control and panel) are too high, simple correction doesn't even start to work. Or, more probably, the phrase "Control Panel" is not even recognised as the phrase you are correcting to (only the single words "Control" and "Panel").

And the extra training does its work, to get the ranking of "Control Panel" sufficiently high.

Greetings, Quintijn

I need to correct you on a couple of points.

First, training standard words that are provided either in the active vocabulary or from the background vocabulary on the hard drive (i.e., no asterisks) does not increase the probability for those words. Those probabilities are fixed. The only thing that occurs when you train fixed words in the vocabulary (i.e., again those provided by DNS and which are not custom words) is that you modify your Acoustic Model, which may result in better recognition of fixed words.

Second, only custom words, in terms of their recognition and probability, can be trained to increase the probability that they will occur correctly when dictated by themselves. Otherwise, the correct representation of words is context based (Language Models). Also understand that while the vocabulary (i.e., active vocabulary) is a Language Model, it is not a Markov model. Only Markov models (bigram, trigram, quadgram) can be modified by training, and only by training phrases and words in context. The vocabulary is a monogram model and not a Markov model. Except for custom words, the vocabulary is fixed and never changes regardless of how much training you apply to an individual word. This is not new. This has always been the case with every version of DNS all the way back to version 3.52. Additionally, it will never change. If it did, your accuracy would go from whatever it currently is down to about 35% at best, and in most cases dictation would result in pure gibberish. This is just the way it works, as well as the way it has to work, so complaining about it is like trying to put out a house fire by blowing on it. It won't change and it can't be changed.

Third, the best place to train words is in the vocabulary editor. This has a greater impact on the Acoustic Model adaptations for fixed words, as well as a dramatic effect on custom words.

Lastly, when 2 words exist in the vocabulary independently of one another, such as "control" and "panel", and you speak them with even the slightest pause, they will be treated as individual words rather than words in context. This is what creates the capitalization problem when attempting to get them displayed (transcribed) as Control Panel. The best manner of handling this is to add Control Panel as a phrase to the vocabulary editor, capitalized as shown. Then, if you want them to display as lowercase, say each word with a pause in between. If you want them to be displayed in uppercase (title case), then be sure that you say the words together without pause (i.e., as you trained them).
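The distinction between a fixed monogram vocabulary and trainable context models can be illustrated with a toy Python sketch. To be clear, this is not how DNS is implemented; the `ToyLanguageModel` class and its methods are invented for illustration, and real recognizers use probabilities and acoustic scores rather than raw counts. It only shows why training a word by itself adds no context evidence, while training a phrase does.

```python
from collections import Counter

class ToyLanguageModel:
    """Toy model: fixed monogram counts plus trainable bigram counts."""

    def __init__(self, monogram_counts):
        # Monogram (single-word) counts are frozen at construction time.
        self._monograms = dict(monogram_counts)
        self.bigrams = Counter()

    def train_phrase(self, phrase):
        """Count adjacent word pairs; a single word adds no bigram evidence."""
        words = phrase.split()
        for left, right in zip(words, words[1:]):
            self.bigrams[(left, right)] += 1

    def monogram_count(self, word):
        return self._monograms.get(word, 0)

    def best_next(self, left, candidates):
        """Pick the candidate seen most often after `left`; ties keep the first."""
        return max(candidates, key=lambda w: self.bigrams[(left, w)])
```

With `model = ToyLanguageModel({"half": 500, "Half": 1})`, calling `model.train_phrase("Half")` changes nothing, and `model.best_next("Robert", ["half", "Half"])` still picks the lowercase form. After `model.train_phrase("I worked for Robert Half last year")`, the same call picks "Half", while the monogram counts are untouched.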

I've watched this thread with some interest. For the time being, because I don't have the time to write a detailed explanation, I'll just say that most of the assumptions are incorrect with regard to how DNS works under the hood (algorithms) and how to make it work properly. I will at some point post a complete explanation of how words are recognized, best practices for training, etc., etc.

The general rule of thumb is this.

1. Unless you are training custom words when performing corrections, don't train individual words. This doesn't work.

2. You cannot add words as custom words that exist in the DNS provided background dictionary. DNS will only replace them as fixed words (i.e., as explained above). The only way that you can do this is to add these words with spoken forms, and the spoken forms cannot be the same as the word itself. In other words, you cannot add "and" with a spoken form "and" because DNS will reject it and you will be notified as to why. In addition, if you're going to add standard words from the background vocabulary as custom words, be sure that you make the spoken form something that is unique enough that it will not be confused or construed as another word that exists in the active vocabulary, or a command. If you do, you will get intermittent, inconsistent, or inaccurate results.

3. If you want words to appear in contexts, do corrections and training of at least 3 word phrases, and preferably 9 word phrases. DNS will not allow you to correct phrases larger than 9 words (quadgram limitation), but the more words that you include in the correction, the more likely that the corrections and training will produce the desired results.

Chuck Runquist
Former DNS SDK & Senior Technical Solutions PM for DNS

"We are all victims of mythology in one way or another. We are the inheritors, and many times the propagators, of a desire to believe what we want to believe, regardless of whether or not it is true." -- J.V. Stewart

Hello Chuck,

There we are again.

I believe you did not correct me on any point, and you repeated much of what you have already said here several times.

We were talking about fixed words and custom words ("Windows XP" and "Control Panel"), and my experience (and that of others in the meantime) is that training, yes, in the vocabulary editor, improves their ranking.

The only thing you could possibly tell us in this discussion is how these single word (phrase) statistics compare to bigram etc. statistics, in other words how "Windows XP" behaves compared with "windows" and "XP" and how "Control Panel" behaves compared with "control" and "panel". And "Robert Half" compared with "Robert" and "half".

Quintijn

PS I do not remember if I tested the "Windows XP" example (mentioned above) myself. I tested the other two however, which were custom words.

Hello,

That's a nice exercise for a misty Sunday morning.

I tried to repeat your examples, and experienced problems only with Robert Half. But as you see, I succeeded in getting this word recognized correctly.

I entered "Robert Half" as a phrase in the vocabulary editor, but it didn't want to adapt (after 3, 4, 5 corrections). So I confirm your problem on this. My solution: go back to the vocabulary editor, select the word, and train it (once). After that, no problems. So sometimes you have to explicitly train a word that you added in the vocabulary editor (and in fact I train each word once when I add it in the vocabulary editor).

No problems at all with the other words. I already got "SP1\SP one" in my vocabulary (in "written form\spoken form" notation), but also added (for test purposes) "SP3\s. p. three". As long as I train words that I add in the vocabulary editor, and in some rare cases as with Robert Half repeat that step, I nearly always get correct results.
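For readers unfamiliar with the "written form\spoken form" notation, it simply pairs what should be typed with what is said. A small Python sketch of reading such entries follows; the `parse_entry` function is my own illustration, not anything NatSpeak provides, and it assumes an entry without a backslash is spoken exactly as written.

```python
def parse_entry(entry):
    """Split a written/spoken entry on the first backslash.

    Returns (written, spoken); spoken falls back to the written form
    when no spoken form is given.
    """
    written, _, spoken = entry.partition("\\")
    written = written.strip()
    spoken = spoken.strip() or written
    return written, spoken
```

So `parse_entry("SP3\\s. p. three")` gives the written form "SP3" and the spoken form "s. p. three", while `parse_entry("Robert Half")` gives "Robert Half" for both.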

No need to restart NatSpeak or anything.

The only thing I can think of, if you continue to have problems with some words, is that the general sound quality of your sound system or the quality of your speech profile is insufficient. My experience is that these things can make recognition of some words more difficult, and/or adaptation of some specific word (or phrase) nearly impossible.

If you want to try that: 1. Check your sound system (check audio) and 2. Do some general training (1 minute is sufficient).

Cheers, Quintijn (on behalf of Robert Half :-)

Very interesting results. I never dreamed of training "Robert Half" because logically, it should be futile. The fact that Dragon correctly recognized each part of the phrase proves (logically) that the problem is not a recognition problem and cannot be solved by training.

I trained it. Then I dictated it, and got the same wrong result as before. Then I dictated it again and got the right result -- a half dozen times in a row.

It doesn't make sense, but it is a positive result. I will try training when I have problems in the future, even in situations where it shouldn't help.

Regarding the microphone and soundcard, I don't think that's it. When I run "Check your audio settings" I consistently get quality numbers above 20.

For a while I tried using a low-end audiophile's USB converter instead of my computer's built-in sound board, but it made no detectable difference in accuracy.

I have been using the same microphone for the last couple of years, but when I was experimenting with different ones, I found that different microphones delivered no detectable difference in accuracy except for the ones that were complete garbage. Yes, I have read about the importance of using an excellent microphone, but my experience has not borne that out.
