Microsoft Vista OS Speech Recognition

Is anyone working with the latest release of Vista?
Speech recognition appears to be well integrated in the OS and, by Microsoft's description, the speech engine has undergone significant improvements. There are also a number of interesting features that should greatly facilitate developing speech recognition applications.
Vista (build 5231) is available for downloading from MSDN. We have the 32 and 64 bit versions of this build, but will probably delay trying it pending the delivery of a Dell 1800.
Robbie

Correct build number is 5308.
Robbie

I have been playing with the 32 bit version for a couple of days. It is, I think, still a little rough. As you might think, the accuracy is pretty good, but it still leaves much to be desired in overall utility. They have made some positive changes in the program, but there are a number of features that need to be improved. For example, the correction dialog is better than the previous version, but it is a little clunky if you want to do your editing quickly. You correct the word by selecting it, the program provides you a number of choices, and you can select one of them or say the word/phrase over again to get another selection list. If you find the right answer, you can select it, or you tell the program you want to spell it out. If it is a long phrase, you have a lot of spelling to do because you cannot select individual parts of the phrase to edit. Regular users of speech recognition will not feel that it adds much of anything to what they can already do. Itamar has probably done more work here and has a better handle on the weaknesses. On the other hand, it also seems that this version will be good enough to bring outsiders into using speech recognition because it will be easy enough to use. It should provide a usable alternative to Dragon that may spark competition.

I just got the 64 bit version installed tonight--so far the most difficult thing about these beta builds has been getting the program to install--and have had little time to work with it. Obviously, it works like the 32 bit version, but my first impression is that the 32 bit version was smoother out of the box and more accurate than the 64 bit version. But that may not be fair. It has been a long day and my voice may be tired. I will report more when I have had more time to develop the voice model.

Frank Abbott

Hi Frank,

Thank you for the information. We have repeatedly tried Microsoft Speech in the past and have always been disappointed with its recognition accuracy. Hopefully, there has been some improvement in the latest version.
Do you know anything about sound board compatibility with 64 bit Vista? I understand that there are problems with many drivers not working properly in the 64 bit environment. We are especially interested in the M-Audio Delta and the Lynx Two.

Robbie

I think the accuracy is pretty good right out of the box. I agree with Frank regarding the clunkiness of the correction dialog. That is a significant problem.

I have not really checked out sound boards in either my 32 bit or the 64 bit versions. I am a little gun shy on that one, since in the previous beta (5270) having more than one sound card installed gummed up the whole process and likely disabled everything to such an extent that you couldn't get anything back until you reinstalled the whole system. I pulled my sound cards (SB Live 24-bit, SB Live 5.1) and all the USB cards (Andrea, Telex), leaving only the onboard Realtek chip for this installation. It works fine, and I even get 31-35 ratings in the XP Pro DNS partition. I did try to hook up the USB card for a brief moment in Vista, but it caused the speech recognition to default to it, and I could not change that decision. When I unplugged the USB card, the system went back to the Realtek, and everything is OK. I am probably not going to try either of the older cards for a while. Consequently, my guess is that if you have one card in your system, it will be OK. If you have two or more, only install one of them and you should be OK. If you must have several cards and devices, my guess is that "you pay your money and you take your chances."

On a more positive note, I tried the 64 bit version this morning using MS Word 2003, and it worked much better than last night's brief trial. I suspect my problems were of my own making last evening.

By the way, for those trying MS in Vista, Itamar has posted some helpful information about additional commands on the Windows beta newsgroup. He may also have posted it to his Yahoo group; I have not looked there this morning. But it is worth reading.

Frank Abbott

I've been using the Andrea USB pod with Vista 5308. As Frank found, I can't use more than one sound input. I can't switch to my SB Live! card.

I loaded build 5308 and it is a hopeless mess. It does install, but there are innumerable problems. I have worked with many Microsoft betas and usually by Beta 2 the software is fairly usable. This isn't true with Vista. There are countless serious glitches and the OS is extremely unstable. The new user interface is poorly designed and, when it does work, is very awkward to use.
A very major problem is that the Windows XP/Windows 2003 device drivers are incompatible with Vista and MS currently only provides Vista support for a limited number of devices. There are instability problems with the drivers that they do support. Regarding sound converters, Andrea USB works, but Telex USB doesn't. No drivers are available for M-Audio Audiophile or for ASUS 800 series motherboard audio chipsets. There is no support for our network card. The video adapter works as a "generic" device, but is not properly recognized and set up as it is with XP/Windows 2003.
The Microsoft Speech interface must have been designed in kindergarten. The feature and training displays and the user interface are terrible, to put it mildly.
The only interesting thing is that when one is finally able to test some dictation, the recognition is remarkably accurate. There is thus hope for the future if Microsoft can ever straighten out the current operating system and program deficiencies.
Robbie

I don't disagree with anything you say (although this is my first beta testing of an OS, so can't compare). The recognition accuracy is great. If the macro facility is any good, and if the correction facility is improved, it will be very useful.

This is some additional information about Vista/Microsoft Speech that may be helpful.
As noted, Vista does require new device drivers. Few are currently available because many vendors do not want to distribute the drivers until there is a production release of a new operating system.
Creative has a list of the beta drivers that are currently available. They also list the sound cards for which Vista support may be available in the future. See: http://tinyurl.com/k2dkp

(edited by admin for length - note that neither the original nor the replacement link works!)

http://dmzweb4.europe.creative.com/SRVS/CGI-BIN/WE...
/,/?St=96,E=0000000000107381018,K=8700,Sxi=0,VARSET=ws:
http://asia.creative.com,case=14186

Beta video card drivers are available from nVidia and ATI. Note that these may currently only work with a specific Vista beta build.
The question was raised on another forum about the availability of a MS SAPI 5.3 SDK that would include the latest improvements to the MS Speech Engine. The reply from one MS employee is that there will not be a new SDK that contains application tools similar to those of an early version of SAPI 4 nor will SAPI 5.3 be released except as part of Vista. I think that there is some doubt about the accuracy of the latter comment since MS will undoubtedly make the latest speech engine improvements available for the MS Speech Server. In any case, there is currently no readily available source of SAPI 5.3 outside of Vista.
Microsoft's Virtual Server software, which runs on Microsoft Server 2003, would potentially be helpful in testing the beta releases of Vista. Unfortunately, only some of the recent beta releases of Vista run on Virtual Server; others are incompatible.
Finally, this is not an advertisement, but I discovered that Dell is currently offering their $3024 PowerEdge 1800 Server for $1599. This looks like a real bargain for those who would like to have a high performance Dual Xeon 64 bit system. See:
http://configure.us.dell.com/dellstore/config.aspx...

Robbie

Rob Chambers has a good site for information about Vista Speech including a listing of the current commands:
See: http://blogs.msdn.com/robch/default.aspx

Robbie

We have had some real fun trying to install Vista in a multi-boot setting and obtaining the bare minimum of required device drivers. At least in our hardware/software configuration, Vista has a bad install bug that results in storage of some of Vista's system variables in the partition of another operating system that was installed before Vista.
Creative's beta sound board driver which works with Vista build 5231 (CTP) fails to install with build 5308. There is a consistent error message that "The parameter is incorrect." "BrancadeNeve" writing in Creative's user forum has a nice fix for this problem.
We use SATA drives and another problem is lack of any Vista drivers that support SATA PCI controllers. We did discover that Siig's 4 Channel SATA PCI driver will work with Vista if it is installed at the first step of a clean Vista installation. It cannot be installed once Vista is operational.
Vista itself remains very slow and buggy. Vista Speech has innumerable deficiencies that have been tabulated in a previous message. Given these considerations, there is little reason to work with Vista or Vista Speech other than out of curiosity. We will continue to update Vista as the new betas become available, but it looks as if there will be a long delay before the software is useful in a production setting.

Robbie

I expected to be able to install a system which would not muck up my
computer.

After a lot of effort getting nowhere with installation which
indicated there was insufficient disk space, and since there was no
indication in the documentation as to what the requirements are, after
a visit to the specialist Microsoft Vista chat room, I discovered that
one needed a partition with not less than 10 GB of free disk space
for the OS, plus 3 GB free on the C drive for temporary and lead-in
folders and files.
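
For anyone who wants to check those figures before starting the installer, here is a minimal Python sketch. The 10 GB and 3 GB thresholds are the ones quoted above from the chat room, not from any official documentation. The function names and the drive letters in the example are hypothetical, and reading "GB" as 2**30 bytes is an assumption.

```python
import shutil

GIB = 1024 ** 3  # assumption: "GB" is read as 2**30 bytes

# Thresholds as reported in the post above (not official requirements).
OS_PARTITION_MIN_GB = 10
SYSTEM_DRIVE_MIN_GB = 3


def has_enough(free_bytes, required_gb):
    """Return True if free_bytes covers required_gb gigabytes."""
    return free_bytes >= required_gb * GIB


def check_path(path, required_gb):
    """Check the free space on the volume that contains `path`."""
    free = shutil.disk_usage(path).free
    return has_enough(free, required_gb)


if __name__ == "__main__":
    # Hypothetical layout: Vista goes on E:, temporary files land on C:.
    for path, need in (("E:\\", OS_PARTITION_MIN_GB), ("C:\\", SYSTEM_DRIVE_MIN_GB)):
        try:
            ok = check_path(path, need)
        except OSError:
            print(f"{path}: not available")
            continue
        print(f"{path}: needs {need} GB free, {'OK' if ok else 'too small'}")
```

This only measures free space; it says nothing about whether the installer will leave the other operating systems bootable.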

After re-partitioning to create the space, I proceeded to install, and
asked for a clean install retaining existing OS systems (I have XP,
2000 and 98). It duly installed, and I messed about to try and get the
interface to be as I like it (I was not at all happy with the standard
interface, particularly in Windows Explorer, and I also did not like
my inability to turn off the LAN since the disable button was greyed
out).

I then rebooted, and lo and behold the only operating system I had was
Vista.

I then tried to reconfigure the boot.ini, and all other boot files to
allow at least one other OS system to work, but that did not work. I
simply could not access the other OS systems.

As my intention was simply to install Vista, and to try it out over a
few weeks, but continue to work in my old OS systems, I was stymied.

The solution was simply to reformat the C drive (unfortunately I did
not have a ghost image, since rectified!). I first had to reinstall
Win 98 into its existing partition and, through it, reinstall Win 2000
on the C drive (I already had it on the D drive). I was then able to
get back into my D drive 2000, which is my main operating system.

My next consideration is whether to put back XP, or leave the second
2000 on C drive.

The only purpose of using XP before was to access 2000 each time I
installed new software and had to restore my backup of the
System32\Config\System file, which always seemed to become corrupted
after installing new software. (It is much simpler and quicker to do
this than trying to run repair from the installation CD.)

You all know the number of hours involved in doing what I did. What a
waste of time.

I e-mailed Jonathan Garcia, and he has put it to his tech department
to come up with an answer.

Vista beta is now very much on my back burner.

Quentin

1. Is it really cricket to repost material originally posted elsewhere without doing minimal revisions, e.g., having the courtesy to explain who the mysterious Jonathan Garcia is?

2. Don't you know that installing beta OS on your production system is asking for every bit of frustration you experienced? Consider yourself lucky you got away with only this amount of misfortune.

In the future, use a clean HD, unplug your existing drives, and install the beta OS in the confidence you can unplug it and return to your working setup. If you must see how the betaware can mess up your existing software, clone your old disk(s) on the new disk, unplug the old ones, then install the betaware on the cloned disk. You can still gripe about your misfortunes, but at least you won't have to live with the consequences of your folly.

People who volunteer to be unpaid MS slugs really do deserve everything they get.

Bruce

BruceCyr wrote:

1. ... to explain who the mysterious Jonathan Garcia is?

Jonathan Garcia is the Microsoft Program Manager for Windows Speech Recognition, and he is working on that aspect of Vista.

Quentin

BruceCyr wrote:

2. Don't you know that installing beta OS on your production system is asking for every bit of frustration you experienced? Consider yourself lucky you got away with only this amount of misfortune.

In the future, use a clean HD, unplug your existing drives, and install the beta OS in the confidence you can unplug it and return to your working setup. If you must see how the betaware can mess up your existing software, clone your old disk(s) on the new disk, unplug the old ones, then install the betaware on the cloned disk. You can still gripe about your misfortunes, but at least you won't have to live with the consequences of your folly.

Bruce,

The omission is in the documentation: there is no indication that anything other than a new clean partition is required for installation. The installer even goes so far as to allow the installation to be in addition to existing operating systems. There is no indication that there should be any problems with this type of installation.

Quentin

That comment belongs in the "Do you believe everything you read?" bin!

I think you, like Oedipus, are a victim of your own overweening hubris. You learned how to put OS layer after OS layer on a single box, laboring into the rosy-pink fingers of innumerable dawns dealing with every hitch out of your own hide and will power, until you deemed yourself OS-infallible! (Any estimate of the sheer time involved in that feat?)

But Vista will nail you, Dude, when you least expect it!

Bruce

PS: Please excuse my jocoserious application, after four and a half decades, of DWM learning to practical use.

It's good to hear from SR users as expert as all of you are about Vista SR's state of development, although it's ironic that most of the commentary seems oriented toward problems with installing and using Vista itself!

The current version of Office 2003 SR produces excellent recognition, so usability factors seem to be the main area where this product lags behind DNS and VV.

From discussions on http://groups.yahoo.com/group/ms-speech/, it seems that the development team is a bit taken aback by the feedback on substandard usability. There is little doubt MS has the resources and resolve to improve this area, so it behooves us to maintain the pressure while the development team is intact and fully funded. It equally behooves us not to abandon prematurely products that work today and set the mark for newcomers.

Bruce

Does Vista speech work with digital voice recorders or other handheld devices? If so, what file formats does it support? We currently work with DNS8 and are hoping to have more options with the Microsoft Vista speech engine.

Thanks -- Chris

I wouldn't say we're taken aback by the feedback... :) It's all part of software development. I rather appreciate the feedback, in fact, especially when it's constructive and gives specific suggestions for improvement.

I do think the software is significantly better than previous releases, and our usability study data proves it. Is it missing some features that competitors have (competitors that have had software in market for 20+ years)? Yes.

But, I believe we've closed the gap significantly. We're also listening carefully to the feedback that we're getting (including feedback on this site), and we're starting to make plans for our vNext release.

If you have feedback for us, you should definitely let your voice be heard. Send us some email, if you want, at lis...@microsoft.com. After all ... We're listening ... :)

--robch

Hi Rob and Richard,

It is great to have the participation of the Microsoft speech program developers in this forum. Hopefully, this will be of benefit to all who are interested in the development of speech recognition software.
Robbie

Sorry. Instead of "taken aback" would you buy "eye opening"?

Current SR users are a pretty direct, articulate group -- sort of a focus group weaned on steroids :) If you can satisfy them, I think you'll wind up with a nice product.

Bruce

sprague's picture

Rob Chambers is the Microsoft architect for the Windows Vista speech UI, so please look at his blog or send him email if you have any trouble getting it to work. We want to hear feedback about your experiences.

I use build 5308 myself on my Toshiba Tablet M200, and although this version of Vista can be a little flaky, I haven't had trouble with the SR. Earlier versions of Vista (esp. the Beta1) didn't have our best SR engine yet, but at this point you shouldn't have trouble with the core recognition. If you do, I definitely want to know about it so we can fix it.

btw, 32-bit and 64-bit should be identical.

Microsoft finally appears to have a decent speech recognition engine. It shouldn't be difficult to benefit from the experiences of the user interface designs of NaturallySpeaking and ViaVoice and to develop proper interfaces for Vista Speech. Also, Microsoft actually included a good user interface in one of the early versions of the SAPI 4 SDK.
A second issue is that a number of developers are requesting a SAPI 5.3 SDK that would be compatible with Vista and with the Windows XP/Server 2003 operating systems and would include speech tools and applications similar to those of the early SAPI 4 SDK.

Robbie

Richard Sprague is your guy if you want to convince anyone at Microsoft to get a new version of the SAPI 5.3 SDK out to use on XP or Win2k3.

--robch

This is a more detailed description of testing of Vista Speech.

Hardware

The test computer system is a Dell PowerEdge 1800 with dual Xeon 3.2 GHz central processing units, 2 GB of RAM and dual SATA hard drives. The original CD-ROM drive was replaced with a Sony DRU-820A DVD/CD R/W drive. The sound board is a Creative X-Fi. It was chosen because it is one of the very few sound converters for which a Vista compatible driver (beta) is currently available. The microphone is a Sennheiser MD 431 II and the preamplifier is a Grace model 101. Windows 2003 Server 64-bit is installed on one hard drive and Vista 64-bit on the other in a dual boot configuration.
The two operating systems installed without any apparent problems. Vista, fortunately, was able to support the basic computer hardware including the DVD and the hard drives. Vista is a very large operating system that will not fit on one CD-ROM. Installation of Vista is an extremely slow process.
The video adapter is an "onboard" Radeon 7000. Windows 2003 Server provides correct support for this video adapter. There is generic support with Vista that is not perfect, but is usable. The beta driver for the Creative X-Fi appears to be fairly stable. We did load Creative’s application software using XP2 compatibility mode. It is fairly usable, but some of the applications including the media reader tend to hang. Nero’s applications can be installed in XP2 compatibility and seemed to work fairly well with the Sony drive.
It is very important to note that Vista requires new device drivers. The few that are currently available are beta code.

Vista Speech Recognition

Microphone Set-Up

The set-up routine lets one correctly select a sound adapter. We tried this with the Creative X-Fi and an Andrea USB. Sound level is set manually using a tri-color volume display. Unfortunately, there is no frequency spectrum display of the kind that NaturallySpeaking provides. A display of this latter type is very useful for observing the distribution of signal to noise levels over the audio spectrum.

Speech Recognition Training

The training window displays either one sentence, or a fragment of a sentence, at a time. I believe that it is far better to display the text in paragraph form and to highlight the progression through the text as it is dictated. The sentence fragments are especially disconcerting because they break the pacing of the dictation. It is possible to pause the dictation, but there is no capability of backing up and re-dictating the text. The display provides no indication as to whether or not the dictation is acceptable. The program goes on to the next sentence regardless of what one says. You have no indication whatsoever as to how the dictation is progressing.
The dictation window lacks an audio volume display.
There is an opportunity to dictate additional text after the initial training session. Unlike other speech recognition programs, the user cannot select the nature of this text. For example, NaturallySpeaking and ViaVoice provide choices of texts from various books and also specific applications like business letters or technical reports.
On completion of the speech recognition training there is a very short delay before the program is ready to be used. This makes one wonder how much processing is carried out on the training dictation.

Dictation

Recognition accuracy appeared to be fairly good with very limited first use of the program. Unfortunately, this has not proven to be the case with more extensive dictation. The speech engine is more accurate than previous Microsoft versions, but is still far less accurate than NaturallySpeaking. I do not think that the current recognition accuracy is adequate for any production usage of the program.
The correction window pops up correctly on speech command and does display a reasonable list of alternatives. There is apparently no way to highlight the text manually and pop up the window with either a key command or a voice command like "correct that". Entering new words is a major problem. The word must be entered by voice spelling, which is an extremely awkward process; it cannot be entered by typing. There is no way to pop up a window that would permit one to train the new word.
You cannot display, add, delete, edit or train words in the vocabulary. There are no consumer level tools to work with specialized vocabularies.

Impression

The lack of adequate recognition accuracy and the inability to work with the vocabulary are, unfortunately, very major limitations of Vista speech.

robbiex wrote:

This is a more detailed description of testing of Vista Speech.

Hardware

The microphone is a Sennheiser MD 431 II and the preamplifier is a Grace model 101.

Why are you using a pre-amplifier? In our use and tests it is unnecessary. Try the microphone with just the Andrea USB pod. Our accuracy is superb.

--
Martin Markoe, eMicrophones, Inc.
The best microphones for Speech Recognition
See us at eMicrophones.com
Read, "Key Steps to High Speech Recognition Accuracy"

Hi Marty,

We have tried the Sennheiser with and without a preamplifier with both the Creative X-Fi and the Andrea USB. The signal level was low with the Andrea without the preamplifier. (This is a different experience compared with DNS and may have been due to the USB driver and Vista 64 audio processing.)
Audio is significantly cleaner with use of a preamplifier with both of the sound converters.
I don't know what to say about the Vista speech accuracy. Sometimes it is fairly good, but at other times it is very poor. The system is quite sensitive to audio amplitude which is no surprise.
We did do "second level" training. We did not train on a document base and, obviously, the recognition system doesn't have the benefit of a specialized vocabulary. The testing has been with general text, not medical reports. I did try some medical text and had to give up on this because of the marked problems with adding, training and editing new words. It has been possible to easily adapt other speech recognition programs to medical terminology without starting with a specialized vocabulary, but this is not practical with Vista speech and is a very major deficiency.
A number of Vista Speech's deficiencies, including the vocabulary problems, can possibly be resolved through direct use of SAPI 5.3. We haven't tried this and will probably wait until Vista itself is more stable.
Out of curiosity, I plan to try NaturallySpeaking on the 64 bit versions of Windows 2003 Server and Vista. I know that NaturallySpeaking doesn't use 64 bit code. Some 32 bit programs run properly on the 64 bit platforms and some don't. Perhaps someone else has already tried this and can report on their results. It is almost invariably necessary to execute the installers in XP2 compatibility mode because they otherwise fail due to an unrecognized OS.
Robbie

Robbie,

I've had much better results with MS Office 2003 SR than that, but I may not use it any time soon because Vista itself will greatly exceed my budget if this guy is right:

"Why Vista Will Suck"
http://www.desktoplinux.com/articles/AT8288296398....

He says if your PC wasn't built in 2006, you'll need a new, top of the line one to run the full-blown AeroGlass interface. Of course, you have to factor in the fact that he's a Linux guru.

The other big negative about Vista SR (per your analysis and those of others) appears to be its deficient user interface, although they may get that cleaned up before Vista debuts.

At the moment I'm hoping Nuance decides it can still make a profit on at least one more iteration of DNS.

And wouldn't it be neat if IBM comes swinging back with a top-notch ViaVoice refresh to reward Quentin Crivon's devotion!

Bruce

BruceCyr wrote:

And wouldn't it be neat if IBM comes swinging back with a top-notch ViaVoice refresh to reward Quentin Crivon's devotion!

Absolutely right :)

Hi Bruce,

Vista is a very large, unwieldy, and inefficient operating system. One understands that the inefficiency is due in part to the debug code that is in beta software and the lack of optimization of the programs. Nevertheless, the release version of the OS is still likely to require a fast computer with large sized RAM and disk storage. I see little reason to "update" to Vista except for the possibility of using Microsoft Speech.
The safest way to test Vista is on a stand-alone computer. Microsoft has always had a poor implementation of multi-boot, including a lack of any capability for resizing existing partitions. It would certainly be advisable to use third party software for multi-booting. We did install Windows 2003 Server 64 bit on one hard drive and Vista on a second one. MS boot can switch between these partitions. Interestingly, Server doesn't see the Vista drive. Vista sees both drives and, most unfortunately, writes registry and other system data onto the "C" drive, which belongs to Server, rather than onto "D", where the Vista OS is resident. Needless to say, this corrupts the Server entries on "C". This glitch is probably solvable with a registry or INI hack, but we don't know the details.
I did install NaturallySpeaking on Windows Server 64 bit and it runs fine on this OS.

Robbie

robbiex wrote:

Vista is a very large, unwieldy, and inefficient operating system. One understands that the inefficiency is due in part to the debug code that is in beta software and the lack of optimization of the programs. Nevertheless, the release version of the OS is still likely to require a fast computer with large sized RAM and disk storage.

It is my understanding, and this is certainly an educated understanding, that Vista is a total rewrite of the Windows operating system. The workgroup responsible was told to go back and start from the beginning. It is to be more UNIX-like and more secure, on orders from high above. Therefore, what you and I are seeing now is very rough around the edges. I certainly am not going to make a judgment until the product is finished.

--
Martin Markoe, eMicrophones, Inc.
The best microphones for Speech Recognition
See us at: http://www.eMicrophones.com/index.asp
Read, "Key Steps to High Speech Recognition Accuracy" at:
http://www.emicrophones.com/docDetails.asp?Documen...

I would like to make a correction. This probably belongs on the NaturallySpeaking forum, but the original comment was made here. It is true that NaturallySpeaking will run on the 64 bit version of Windows Server 2003. It seems to function properly with WordPad. Dictation into an application, however, results in a delay of many seconds in the appearance of each segment of the recognized text. Also, the class IDs of some of the SDK modules are not properly set up, and calls to these routines will not work.
These problems could probably be easily resolved by the vendor, but I doubt that there is much incentive to do this given the current limited usage of 64 bit Windows operating systems.
Robbie

robbiex wrote:

This is a more detailed description of testing of Vista Speech.

Hardware

Microphone Set-Up

The set-up routine lets one correctly select a sound adapter. We tried this with the Creative X-Fi and an Andrea USB. Sound level is set manually using a tri-color volume display. Unfortunately, there is no frequency spectrum display of the kind that NaturallySpeaking provides. A display of this latter type is very useful for observing the distribution of signal to noise levels over the audio spectrum.

****************************************************************************************************
Will Vista recognize and work with a wireless Bluetooth microphone?

Charles

This is an update on Vista Speech.
We have been working with the SDK to see if some of the user interface problems can be resolved. It is difficult due to the erratic behavior of multiple sections of the software including Vista itself, the Vista SDK, SAPI 5.3, WinFX and other required components. Documentation for 5.3 is almost non-existent.
It has finally been possible to run Microsoft's sample program DictPad on version 8 of the recognition engine. I haven't checked to see which vocabulary is being used by DictPad, but recognition accuracy using simple medical terms was outstanding. The following is a first time dictation that was error free:

Quote
The dictation system is working fairly well at this time.

Problem number one: hypertension.

The patient is feeling well generally. Moderate physical exertion is well tolerated. She has had no recent chest tightness, increased dyspnea, or any known arrhythmia.
Unquote

It is interesting that "dyspnea" and "arrhythmia" were correctly recognized. These words are not in the main vocabulary, but had been previously added to the user vocabulary.
It looks like Vista Speech will be an excellent product pending resolution of what I believe to be severe deficiencies in the correction window and the ability to work with the user and main vocabularies.

Robbie

robbiex wrote: The following

robbiex wrote:

The following is a first time dictation that was error free:

Quote
The dictation system is working fairly well at this time.

Problem number one: hypertension.

The patient is feeling well generally. Moderate physical exertion is well tolerated. She has had no recent chest tightness, increased dyspnea, or any known arrhythmia.
Unquote

First time dictation using ViaVoice 10.5 Pro USB :-

"Problem number one hypertension.

The patient is feeling well generally. Moderate physical exertion is well tolerated. She has no recent chest tightness, increased dyspnoea or any known as arrhythmia."

Dyspnea appears to be incorrectly spelled due to my inability to pronounce the word dyspnea, unless in fact the ViaVoice spelling is correct and the Vista version is not!
Quentin

Hi Quentin, Both are

Hi Quentin,

Both are correct.

Quote
Definition of Dyspnoea

Dyspnoea: Difficult or labored breathing; shortness of breath.

Dyspnoea is a sign of serious disease of the airway, lungs, or heart. The onset of dyspnoea should not be ignored but is reason to seek medical attention.

The word dyspnoea comes from the Greek "dys-", difficulty + "pnoia", breathing = difficulty breathing.

Dyspnoea is the British spelling. The American is dyspnea.
Unquote

Robbie

Itamar

Itamar Even-Zohar
ita...@even-zohar.com

Hello all,

I have uploaded a file to the Files area of the ms-speech Yahoo group.

File : /MISSING AND MALFUNCTIONING FEATURES IN VISTA SPEECH.doc
Uploaded by : itamarez
Description : A SURVEY OF OPINIONS

You can access this file at the URL:
http://groups.yahoo.com/group/ms-speech/files/MISSING%20AND%20MALFUNCTIONING%20FEATURE%20IN%20VISTA%20SPEECH.doc

Please have a look and vote! At least we may save some of the basics that have been eliminated from Vista.

Regards,

Itamar Even-Zohar

Hi Itamar, Can you post the

Hi Itamar,

Can you post the survey here? I get an endless loop when I try to access the Yahoo file.
Thank you.
Robbie

Itamar

Itamar Even-Zohar
ita...@even-zohar.com

Robbie,

I have put it on my Website, as it is clearest when in an MS-Word Table. Please go to

http://www.tau.ac.il/~itamarez/sr/vote_vista.doc

It is NOT in the public domain at this stage.

If you haven't already, please have a look at 'Good and Bad News in Vista SR', too. Your comments would certainly be useful. The address of my Website is

http://sr.even-zohar.com, or http://speech.even-zohar.com

Regards,

Itamar

Itamar Would you update your

Itamar

Would you update the ViaVoice entry in the comparative table in your general survey?

Office 2003 is supported in ViaVoice 10.5.

Quentin


Hi Itamar, Thank you very

Hi Itamar,

Thank you very much for the links.
What follows is a fairly complete listing of the major features that we believe are important.

Robbie

Speech Recognition Features

Audio Input Window

1. Ability to select sound adapter.
2. Ability to select input type, that is: Microphone, line in, or digital.
3. Options for setting audio level automatically and manually.
4. Frequency spectrum display to provide relative indication of signal/noise amplitudes.
5. Audio volume level display including an indication of the optimum signal amplitude limits.

Speech Recognition Training Window

1. Display of one to several paragraphs at a time.
2. Graying out of text as successive words are recognized. An alternative approach is to highlight the last successfully recognized word.
3. Audio amplitude indicator.
4. Progress through text indicator.
5. Ability to pause, back-up, repeat and skip text.
6. List of choices for additional training after the introductory training. The choices should include selections from general and specialized texts; for example, business letters or medical reports.

Dictation

1. Full-capability dictation into any standard Microsoft textbox control that is displayed by any application program.
2. Control key selection of command or dictation modes.
3. Ability to pop-up the correction window by speaking "correct" followed by the phrase to be corrected or by highlighting it and commanding "correct that", meaning correct the highlighted text.
4. "Scratch that" meaning to delete the most recently dictated phrase.
5. Various commands for navigating through text.
6. Microphone on/off by control key press or voice commands.
7. Text to speech playback of all or selected text.
8. Control key press for selection of post dictation spelling and grammar checks.
9. Option for vocabulary switching.

Correction Window

1. Display of single word or phrase alternatives.
2. Selection of alternate by key press or voice command.
3. Entry of new word or phrase by voice spelling or typing.
4. Voice training of new or mis-recognized word or phrase.
5. Option to cancel or exit window.

Vocabulary

1. Entries may be single words or phrases.
2. Provision for specifying both actual spelling and "sound like" representation of word or phrase.
3. Ability to display, search, sort, edit, add, delete and train any word or phrase.
4. User option for adding specialized vocabularies.
5. Option for identifying primary and secondary vocabularies.
6. Availability of specialized vocabularies; for example, legal or medical.

Utilities

1. Backup and restore of user (training, options, etc.) and vocabulary files.
2. Add, delete, edit and execute user developed macros.
3. SDK for Visual Basic and Visual C program access to all speech program APIs including audio, speech engine, dictation, vocabulary and control functions.

Recognition Options

1. User and specialized vocabulary specific options.
2. User selected, context sensitive control of abbreviations and number formatting.
3. User selected speed versus accuracy selection.

More frustrating problems

More frustrating problems with Vista Speech.

There is no apparent way to edit or train the words in the main vocabulary.
Disabling a word in the main vocabulary doesn't work - at least for me.
The vocabulary doesn't support phrases.
The vocabulary doesn't support entries that are "spelled as" one form but "spoken as" another.
Dictation into an application doesn't work with all standard Microsoft textbox controls.
There is no apparent way to process information from a document base after the first program start-up.

Robbie


Hi Robbie, [I'm not sure if

Hi Robbie,

[I'm not sure if you posted this on the Vista Beta newsgroups or not ... If you didn't, it'd be great if you did also post this type of information there so the engineers on the team can see your input]

You can add a pronunciation for any word, by using the Speech Dictionary. It's true that you can't "train" a word, but you shouldn't typically need to "train" the word with the MS SR engine. That said, a 3rd party tool could be written using the SAPI APIs to do just this. Hopefully we'll be able to document this really well, and even provide a sample in the future.

Robbie, could you please log a bug on connect for the fact that the disabling of a word doesn't work for you. Please include what the word is, and the steps you took to disable it, and the steps you took to see that it wasn't disabled. Once we get that information, we'll have somebody take a look.

I don't understand what you mean when you say "The vocabulary doesn't support phrases". Could you elaborate?

Also, if you could log bugs on the specific fields you encountered that dictation wasn't supported in, that'd be great.

I also don't understand what you mean when you say to "process information from a document base" ... If you mean have the recognizer take a look at things you've written, that feature does exist. It's a check box on one of the first few screens. We'll then look on your computer for documents that you wrote, and analyze them so we'll better recognize what you say in the future.

Thanks for the feedback...

--
Rob Chambers [MSFT]
http://blogs.msdn.com/robch/default.aspx
Architect - Windows Speech Recognition - We're Listening...

This posting is provided "AS IS" with no warranties, and confers no
rights.

Rob Chambers wrote: I also

Rob Chambers wrote:

I also don't understand what you mean when you say to "process information from a document base" ... If you mean have the recognizer take a look at things you've written, that feature does exist. It's a check box on one of the first few screens. We'll then look on your computer for documents that you wrote, and analyze them so we'll better recognize what you say in the future.

Basically both DNS and ViaVoice presently analyse documents in a user's database, and add any new words found which can then be trained if the pronunciation is not already known.

The impression from your last sentence "We'll then look..." is that Microsoft will look into the computer for documents to analyse. Does this mean that it is done online by Microsoft? If so, this is not satisfactory. The entire analysis should be possible within the computer concerned, as is presently the case with current SR programmes. This is particularly relevant in the case of lawyers or doctors, where confidential documents would be analysed. These cannot be disclosed to third parties, such as Microsoft.

Quentin

I would also add that I have

I would also add that I have read on the various groups and sites that about half the existing software will be rendered useless in Vista.

Surely it is not beyond the wit of the developers to ensure that existing programmes (used in XP and 2000) can be used in Vista. In particular, in relation to SR, it should be possible to transfer the existing trained users into Vista SR. If they cannot, this is going to act as a disincentive to upgrade to Vista.

I also hear that Vista is not going to be commercially available until January 2007. This is causing not only the manufacturers, but also the retailers (and Microsoft shareholders), great concern, since, in particular, the Christmas trade for new computers will be destroyed. Nobody is going to buy a computer with an operating system which is going to be patently out of date within weeks.

Quentin


I'm not saying that will

I'm not saying that we will actually transfer the information back to Microsoft. I am saying however that we will analyze the information on the local client computer. This is an optional feature that you can turn on or off during the first time user experience.

--
Rob Chambers [MSFT]
http://blogs.msdn.com/robch/default.aspx
Architect - Windows Speech Recognition - We're Listening...

This posting is provided "AS IS" with no warranties, and confers no
rights.

Hi Rob, Thank you again for

Hi Rob,

Thank you again for the feedback. Your comments and assistance are greatly appreciated.
It is possible to voice train a new word from the correction window, but you cannot voice train a word that already exists in the main vocabulary. The assumption in the latter case may be that there is an automatic correction. It has, however, been my experience that the new words that are voice trained are almost always subsequently correctly recognized. Mis-recognized main vocabulary words frequently continue to be mis-recognized and the user has no option for voice training these words.
You can display, add, delete, edit and voice train words in the user's vocabulary, but there is no known access to the main vocabulary. The latter is essential for many reasons. You need to be able to voice train mis-recognized words and it is helpful to be able to delete words that are rarely encountered, but which are confused with frequently used words. An example is that "breweries" is confused with the medical term "bruits". We don't have much contact with breweries in our setting and the best solution is to remove this word from the active vocabulary. We have had no success in disabling a word in the main vocabulary.
It has been my understanding that the vocabulary cannot handle phrases. For example, we might have a Dr. Neill Smith whose "Neill" becomes "Neal". Voice training "Neill Smith" as a single entity, that is, a fixed phrase, corrects this problem. Phrases are also essential for handling commands, abbreviations, entries that "sound like" one thing but are "spelled as" another, and other similar circumstances.
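
To make the idea concrete, the following is a rough Python sketch of a phrase table applied to text after recognition. It is only a stand-in: a real phrase vocabulary works inside the recognizer, and this does nothing of the sort. The table `PHRASES` and the function `apply_phrases` are hypothetical names, and the entries are just the examples mentioned above.

```python
import re

# Hypothetical phrase table: recognized ("spoken as") form -> written ("spelled as") form.
# Keys must be lowercase because matching is case-insensitive.
PHRASES = {
    "neal smith": "Neill Smith",
    "breweries": "bruits",
}


def apply_phrases(text, phrases):
    """Replace each recognized form with its written form in a single pass."""
    if not phrases:
        return text
    # Longest entries first, so a multi-word phrase wins over a shorter overlap.
    spoken_forms = sorted(phrases, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(s) for s in spoken_forms) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: phrases[m.group(0).lower()], text)


if __name__ == "__main__":
    print(apply_phrases("Referred by Dr. Neal Smith. No carotid breweries.", PHRASES))
```

Note that "Neal" on its own is left alone; only the whole phrase "Neal Smith" is rewritten, which is the point of treating it as a single entity.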
Document processing usually means that the user can select a directory that contains the documents or one, or a range of, documents can be selected from within a directory. There are always issues of document formats, for example, doc, rtf, or txt; and limits on the size of an individual file which may contain a single long document or multiple documents. Document processing typically includes finding all the new words that are not in the user or main vocabularies, giving the user the choice of including or omitting the new words and of training the new words. Finally, the document text should be used to update the speech engine's statistical model. Vista Speech does give one the opportunity to do some kind of screening of the documents in "My Documents" on initial start-up of the program. You cannot, however, select other directories and checking the document choice after the initial setup doesn't seem to do anything. No details are available concerning the operation of the document screening.
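
As an illustration of the first step described above (finding the new words), here is a minimal Python sketch. It says nothing about what Vista Speech actually does. The function name `find_new_words`, the 1 MB size limit and the restriction to plain .txt files are assumptions; doc and rtf files would need a converter that is not shown.

```python
import re
from collections import Counter
from pathlib import Path

# A word is a letter followed by letters, apostrophes or hyphens (a simplification).
WORD_RE = re.compile(r"[A-Za-z][A-Za-z'-]*")


def find_new_words(directory, known_words, suffixes=(".txt",), max_bytes=1_000_000):
    """Count words in the directory's documents that are not in `known_words`."""
    known = {w.lower() for w in known_words}
    counts = Counter()
    for path in sorted(Path(directory).rglob("*")):
        # Skip directories and unsupported formats.
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        # Skip files over the (assumed) size limit.
        if path.stat().st_size > max_bytes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for word in WORD_RE.findall(text):
            word = word.lower()
            if word not in known:
                counts[word] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: list the most frequent unknown words in the current directory.
    for word, count in find_new_words(".", {"the", "patient"}).most_common(10):
        print(word, count)
```

The resulting list is what the user would be shown to accept, reject or train; updating the statistical model from the text is a separate step that this sketch does not attempt.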

Robbie

MSDN Magazine has an

MSDN Magazine has an interesting article on Vista Speech.
See:
http://msdn.microsoft.com/windowsvista/default.asp...
