Using Dragon With Open Office
Submitted by Andy W on Sun, 06/29/2008 - 01:14.
Before I go download and set up open office, I was wondering how well it works with Dragon? I appreciate any comments or suggestions regarding this. Thanks in advance.
Andy



Don't Even Think about It
Before installing OpenOffice consider this. Do you enjoy watching HD 1080p Blu-ray discs or 1080i broadcast HDTV? If so, how would you feel about going back to 480i standard broadcast color after having experienced HDTV? This is what you'll be doing when you install OpenOffice. The office suite is really impressive but unfortunately it's not fully NaturallySpeaking compatible. You will be able to dictate but you won't be able to enjoy Select-&-Say, and once you've been spoiled by Select-&-Say it is very hard to go back. Picture dictating a sentence that ends in a period, being forced to move the cursor manually to the middle of the sentence to add a missing word, seeing that word appear 2 spaces to the right of your cursor (instead of a single space) and seeing the beginning of the next word appear with a capital letter instead of the appropriate lowercase letter. No more selecting phrases, no more automatic spacing (you'll see extra spaces and words jammed up together), etc. If this sounds like fun then by all means download OpenOffice.
You will find the rest of our answer on the KnowBrainer forums at http://knowbrainer.com/PubForum/index.cfm?page=viewForumTopic&topicId=49...
Lunis Orcutt - Developer of KnowBrainer &
Host of the http://www.TheSpeechRecognitionStore.com
A Nuance Gold Certified Endorsed Dragon NaturallySpeaking Vendor/Trainer
ALWAYS Ask If Your Speech Recognition Vendor Is Nuance Certified
Andy,
Just to simplify what Lunis has said (KnowBrainer) because his comments are a little esoteric, the bottom line is DNS does not work and is not compatible with Open Office.
Chuck Runquist
Former Dragon NaturallySpeaking SDK & Senior Technical Solutions PM for DNS
"Life's Rule #1: Once you pull the pin, Mr. Grenade is no longer your friend." (Variant of Murphy's Law - Edward A. Murphy, Jr)
slight modification on what others have said
I do occasionally use Dragon with open office (which is to say, I always use Dragon, I sometimes use open office). I prefer using open office for myriad reasons, however, as others have said, Dragon and open office are not compatible. You can manipulate the menus with keystroke commands, but select and say doesn't work (e.g., you can't say "select [words that you want to have selected]"). It doesn't hurt to have it installed on your computer, but it is unlikely to be your first choice word processor. Alas.
jadelennox wrote: I do
I do occasionally use Dragon with open office (which is to say, I always use Dragon, I sometimes use open office). I prefer using open office for myriad reasons, however, as others have said, Dragon and open office are not compatible. You can manipulate the menus with keystroke commands, but select and say doesn't work (e.g., you can't say "select [words that you want to have selected]"). It doesn't hurt to have it installed on your computer, but it is unlikely to be your first choice word processor. Alas.
As Open Office is Open Source, perhaps someone "in the know" could ask the developers if they could make it "Select and Say" compatible on the Windows version. I wonder if the problem might be in Java and not OO?
I don't know what's involved or I'd ask them!
Show Me the Money
Like most companies, Nuance is driven by profits. If you can convince them that there is a profit in spending development funding on any given project, they will do so, but they are already aware of OpenOffice and currently see no reason to justify the R&D expense. For the most part, Nuance only supports a handful of mainstream programs and doesn't even support ACT or QuickBooks. We would think the demand for these types of programs would be higher than for OpenOffice. However, most of these programs can utilize standard Microsoft rules, and their respective manufacturers can make their own products Select-&-Say enabled. We have a tendency to blame Nuance when in fact it's the other manufacturers who refuse to play ball with the gorilla, and this is one more reason why Microsoft wins: they actually work with the Nuance developers. The REAL solution is for the other manufacturers to get with it. Frankly, we're surprised that WordPerfect is supported because they don't exactly play ball either.
Lunis Orcutt - Developer of KnowBrainer &
Host of the http://www.TheSpeechRecognitionStore.com
A Nuance Gold Certified Endorsed Dragon NaturallySpeaking Vendor/Trainer
ALWAYS Ask If Your Speech Recognition Vendor Is Nuance Certified
I don't HAVE Microsoft Word
Lunis,
OpenOffice is not some new marginal gadget that people suddenly install in addition to their MS Office suite. Sure, if someone has already paid lots of money for both MS Office and DNS then changing that running system would be a deliberate regression. But today there are loads of price-sensitive people out there who have been socialised with the free OpenOffice suite, particularly academics who need styles, footnotes and the like for their work, which are way better implemented in OOo than in MSO. And only now do they consider or encounter the beauties of speech computing. How can DNS/Nuance be so uneconomical as to tell them either to buy MSO in addition to DNS (which also means discarding/converting all of their previously produced text files) or to use a text editor such as WordPad? If there's support for WordPerfect, the same should be possible and necessary for the growing community of OOo users. In my circle of acquaintances OOo IS standard, and I refrain from suggesting DNS to any of my friends because they wouldn't be able to use it to its full extent.
Platon
At least in principle I would agree with you. However, we live in a Windows world at this point. If software developers create applications that do not use standard Windows text editors, then there is nothing that Nuance can do to link with those applications.
Everyone has to remember that the interface between an application and DNS relies on SAPI (Microsoft's Speech Application Program Interface). Nuance has no control over what SAPI is capable of doing. If an application doesn't, or can't, interface with SAPI through standard text edit formats, then there is nothing that Nuance can do to make it work with regard to Select-and-Say. Depending upon the text edit format that an application uses, DNS tries to employ SAPI to communicate in the best possible way. Unfortunately, that means that DNS is limited to what SAPI can or cannot do as far as communicating between DNS and the application in question.
So, the bottom line is don't blame it on Nuance because it is not Nuance's problem. You have to blame that on the way that Open Office is designed because they choose not to make their application SAPI compliant, and until they do you will have a problem using Open Office.
OpenOffice has to comply with the requirements for SAPI, and until they do, that is a Microsoft/OpenOffice conflict that has little or nothing to do with DNS. If OpenOffice doesn't conform to SAPI requirements, then no amount of effort on the part of Nuance is going to make it work. SAPI is as SAPI does and only Microsoft or OpenOffice can fix that.
Chuck Runquist
Former Dragon NaturallySpeaking SDK & Senior Technical Solutions PM for DNS
"Facts do not cease to exist because they are ignored." -- Aldous Huxley
Platon There are people who
Platon
There are people who use NaturallySpeaking successfully with Open Office.
NB: my following comments are based on users' experience/s with earlier versions of OO and DNS.
The usual problem is that spoken corrections become misplaced ('cursor jump'). I worked with some users on this problem -- the posts are on this Forum, somewhere -- one of them found the workaround, to use 'Spell That' at the correction substitution phase. I might have to resort to that or see if it works with NotaBene, as I can't afford Word right now. I'd urge people to at least try that.
Of course what Chuck etc. have said is correct insofar as it concentrates on explaining the difficulty of making DNS 'compatible' with everything! but it perhaps overemphasises the difficulty of working with some 'non-standard' programmes. Having said that, increasingly, it seems, despite the massive number of people who use OO, there's an expectation we'll all use Word and indeed that we all possess Word. Alas.
Judy
oops
"but it perhaps overemphasises the difficulty of working with some 'non-standard' programmes"
I meant, perhaps overemphasises the difficulties that DNS end-users who use it with OO encounter.
Judy
hopes up
Judy, you're really getting my hopes up now. I have been messing around with that dictation window for a while, but it's such a waste of time because I actually want to use footnotes and database integration in OOo. I have searched this forum for a while now with no success. Neither Judy nor openoffice nor "spell that" nor "cursor jump" found me a helpful result. What should I search for?
Thanks
platon
platon I can't find the
platon
I can't find the posts here either (and can't go through them all right now). I did find the post below, which may help.
Also, what using Spell That with Open Office does is eliminate the problem that if you use Correct That, corrected text will be misplaced.
http://www.speechcomputing.com/node/538
Judy
spell that in German
dear Judy,
I've fiddled around a bit with different commands now but I don't seem to be able to find a way to avoid the misplacement of the cursor/corrected text. To complicate things I am using the German version, so I can only guess that "Correct That" would be "Korrigier Das" and "Spell That" is the command "Buchstabier Das". Neither of these commands makes a difference in the whole misplacement issue. What confuses me even more is that the misplacement occurs only after a while of working with OOo and DNS. For some time "Correct That" will actually highlight the correct string of words. Then suddenly the selection will be one character off the actually dictated string and consequently replace not all of the mistyped string but a bit of another piece of text. Furthermore, when I use the arrows to go to the next/previously dictated string, sometimes the highlighted area falls back into place and actually marks the entire string. Even restarting OOo does not change this behavior, but it might fix itself randomly when selecting strings, getting a misplaced highlight and then cancelling the selection.
If there really is a solution to this problem I'd happily put up with other mishaps like the occasional crashing of OOo when correcting something or DNS only remembering the last few words I dictated.
As for the other suggestions: "Enable that" is a more versatile version of DragonPad, but it still won't be able to do footnotes and such things, so I would still have to work in two separate procedures: dictating first, then amending footnotes and quotes.
And thanks to Bruce, but I can't use MSO 97 because I'm not just word processing. In OOo I have a text file, a database and a bibliography, all of them working together beautifully, and there would be no way of integrating them into MS Word. Still: thanks to Quentin also for offering help with VV.
I would be very thankful if anyone could point me towards the workarounds that have been developed to use DNS INSIDE OOo.
Platon
This may help understand the problem
This thread gives more information about the problems faced with Open Office and any speech recognition program.
http://www.speechcomputing.com/node/1487
The way I see it, if a large body of professionals who use Open Office contact Nuance asking for DNS to work in Open Office, there's a larger chance of it happening.
It wouldn't hurt to have the developers of Open Office look toward detecting and using SAPI if it's available as well.
I think I have all that right...
First, MS Office 97 is a
First, MS Office 97 is a complete suite of applications which include all of those that you are using in Open Office, but of course there is scant chance you'll be able to transfer your current information to those applications. Given your current investment, you seem to have no feasible option except to continue on your current course.
Second, the thread that Skip cites plus the link to Peter Maddern's blog entry basically explain the limits of DNS functionality within Open Office. It also explains the circumstances under which Select and Say/Correct functionality is lost. Basically it will work within the most recently uttered text. When you try to go back beyond the most recent utterance, it's pretty much a crapshoot.
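The "most recent utterance only" behaviour can be pictured with a small toy model. This is only a sketch under stated assumptions: the class names `BlindDictationTarget` and `ToyRecognizer` are made up for illustration, and nothing here reflects how DNS is actually implemented. It assumes a window that accepts typed text but cannot be read back, so the recognizer can only locate phrases in what it remembers typing last.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class BlindDictationTarget:
    """Hypothetical text window: accepts typed text, cannot be read back."""
    text: str = ""

    def type_text(self, s: str) -> None:
        self.text += s


@dataclass
class ToyRecognizer:
    """Hypothetical recognizer that remembers only its latest utterance."""
    target: BlindDictationTarget
    last_utterance: str = ""
    last_start: int = 0

    def dictate(self, utterance: str) -> None:
        # Remember where this utterance begins, then type it blindly.
        self.last_start = len(self.target.text)
        self.last_utterance = utterance
        self.target.type_text(utterance)

    def select(self, phrase: str) -> Optional[Tuple[int, int]]:
        """Return (start, end) offsets, or None if the phrase is older
        than the latest utterance and so unknown to the recognizer."""
        idx = self.last_utterance.find(phrase)
        if idx < 0:
            return None
        start = self.last_start + idx
        return (start, start + len(phrase))


window = BlindDictationTarget()
recognizer = ToyRecognizer(window)
recognizer.dictate("The quick brown fox. ")
recognizer.dictate("It jumps over the dog.")
recent = recognizer.select("over the")    # inside the latest utterance
older = recognizer.select("brown fox")    # typed earlier, no longer tracked
```

In this model the second `select` fails even though the words are plainly in the window, which is the kind of limit described above.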
So far as workarounds are concerned, unless some new information shows up here, you may be on your own. If you can convince yourself to forget what you know about DNS' Select and Say/Correct functionality, it should be a breeze.
FWIW, learning how to install and use Vocola and/or UniMacro could help bridge some of the gaps.
Bruce
Platon I'm sorry you're
Platon
I'm sorry you're still having the problem. Correct That certainly would be Korrigier Das. I thought Buchstabier had to do with using a military (e.g.) alphabet, but my German is really rusty!
That the problem begins a little way into a document would normally suggest a formatting problem but here, I imagine, it simply is to do with OO and non-standard windows.
The posts here about the issue probably will not help you, so I will try to get in touch with one of the people who, after reading them, used the Spell That technique. (They posted about it on the KnowBrainer forum, as I recall, but I think the posts were lost.) You could also try resetting the correct and select options, to see whether that helps.
Judy
Note that the work-around
Note that the work-around Judy mentioned involves just one part of a multi-part problem when you try to work in Open Office with DNS. There is no practical way to make DNS work in OO the way it usually works in M$ Office.
Therefore, I regret to suggest, the more productive use of your time might be to figure out how to beg, borrow or steal a copy of M$ Office.
The only practical alternative may be to pick up copies of Office 97 and ViaVoice, both of which should be cheap. We have a strong VV advocate, Quentin Crivon, who would probably be a gracious and competent cicerone were you to opt for that trek.
HTH,
Bruce
Dragged in by the short and curlys (if only I had some left!)
If you wish to go the VV route, please contact me and I will give such assistance as I can.
I did try using VV in Open Office some time ago, but the procedure is direct dictation, which means natural commands are not enabled. Because of this I went back to my favourite Word 97.
Quentin
Because it is open source...
As Open Office is Open Source, anyone who is technically capable can do it themselves. That is how many things get accomplished in the open source community: rather than ask the developers to do this or that, people make their own enhancements and contribute them to the community. If the change is well conceived and executed, it will often get incorporated into the code base.
I haven't worked with the DNS API, so I don't know whether a Java application would have trouble calling it. If it's implemented by a DLL, that wouldn't be too hard.
OpenOffice and DNS addendum
DNS doesn't have an API. The only API relative to speech recognition is SAPI (Microsoft's Speech Application Program Interface). In addition, MSAA (Microsoft Active Accessibility) is what gives SR its access to menus, commands, etc.
Unfortunately, both SAPI and MSAA are Microsoft's. The only thing that DNS does is hook MSAA via the Options dialog.
Therefore, if an application's text window does not support the standard text editor formats supported by SAPI, then all DNS can do is to apply the most commonly used text edit format. Nonetheless, you bring up a good point. If OO employs Java, DNS does not support Java and never has. I tried to get this implemented in the SDK for DNS 6. However, because of the problems surrounding Chapter 11 and the time constraints on getting the SDK out, Lernout & Hauspie R&D decided to can this and the Grammar Studio, which would have allowed users to create their own Natural Language Commands, or would have at least allowed developers to do so.
If Java is the underlying programming language (and I don't use OO, so I can't say), I would suspect that this is the problem. That would limit what you can and can't do in OO. In short, this would very likely limit the extent to which you can select-and-correct, go back to and correct phrases/words (misrecognitions), as well as what happens after pausing your dictation or moving back and forth within the document. To some extent, there is an explanation of support for nonstandard Windows in the DNS online help. However, keep in mind that what is addressed in the help is subject to the type of text edit window an application uses. For example, in some applications that are not Select-and-Say enabled, you can't even dictate. In others, you can dictate, but you can't correct, or you can't select text, you don't get capitalization after punctuation, etc., etc. ad infinitum. Each application that is technically "nonstandard" has its own set of issues relative to DNS, and the online help explanations only give you a basic overview. The information contained therein is not absolute gospel for all nonstandard Windows.
As I said in the previous post, OpenOffice folks have to get together with Microsoft if they want to fix this problem. Aside from the Java issue, Nuance has no control over this.
Chuck Runquist
Former Dragon NaturallySpeaking SDK & Senior Technical Solutions PM for DNS
If computers get too powerful, we can organize them into a committee - that will do them in. - Bradley's Bromide
Possibilities and problems
Nuance's web site says that their SDK lets you "Develop with any language that supports Active X and COM, including C++, C# and Visual Basic." If that's not just moonshine, it can be made to work. There are ActiveX bridges for Java.
Depending on the type of license that Open Office uses, though, interfacing to proprietary software could be problematic. That might be a bigger barrier than the technology.
Perhaps it would be easier to implement an interface to Microsoft's own speech recognition feature. That might light a fire under Nuance, too.
Thanks For Your Help
Thanks for all the comments. They make it that much easier to not download open office. I like using Dragon too much to be cut off from using half of its functions. Thank you for saving me the time.
Andy
For the record
I tried Dragon with Jarte and AbiWord Portable, and found that they weren't compatible, sorry to say. It's especially perplexing with Jarte, which supposedly uses WordPad somehow.
On my wanderings I found this...
Peter Maddern of Speech Empowered Computing in the UK has written a review of Dragon NaturallySpeaking and how it works (and doesn't) with the Open Office Writer software.
You can read it here:
http://speechempoweredcomputing.co.uk/Newsletter/?p=142
Thought that may help those of you looking into this.
There are more and more
There are more and more Java-only (or otherwise platform-independent) programs these days, because it is easier for the developers to support both Macintosh and Windows versions of a Java-only program. It's becoming very difficult for me that NaturallySpeaking doesn't work with Java; there's only so many times I can tell my boss "I can't use that product because it's not accessible". E-mail clients, XML editors, content management systems...
If there actually were a way to make NaturallySpeaking work with Java-only or other platform-independent tools written to standard APIs, that would be worth so much more to me than slight recognition improvements.
jadelennox wrote:There are
There are more and more Java-only (or otherwise platform-independent) programs these days, because it is easier for the developers to support both Macintosh and Windows versions of a Java-only program. It's becoming very difficult for me that NaturallySpeaking doesn't work with Java; there's only so many times I can tell my boss "I can't use that product because it's not accessible". E-mail clients, XML editors, content management systems...
If there actually were a way to make NaturallySpeaking work with Java-only or other platform-independent tools written to standard APIs, that would be worth so much more to me than slight recognition improvements.
Apparently I am not communicating clearly.
DNS doesn't care what application you're dictating into or what operating system you're using. DNS is a recognizer, period. By itself all it does is listen to your speech, convert it to a format that the recognizer can interpret, analyze it against your Acoustic Model, analyze it against your Language Models, and spit out the results. That's all it does.
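As a rough illustration of the "analyze against the Acoustic Model, analyze against the Language Models, spit out the results" step, here is a toy scoring sketch. It is not DNS code: the function name `pick_best`, the candidate phrases and every score are invented for the example. It only shows the general idea of combining an acoustic score with a language-model score and keeping the best candidate.

```python
from typing import Dict, Iterable


def pick_best(
    hypotheses: Iterable[str],
    acoustic_logprob: Dict[str, float],
    language_logprob: Dict[str, float],
    lm_weight: float = 1.0,
) -> str:
    """Return the hypothesis with the highest combined log score.

    acoustic_logprob: how well each candidate matches the audio (made up here).
    language_logprob: how plausible each candidate is as text (made up here).
    """
    return max(
        hypotheses,
        key=lambda h: acoustic_logprob[h] + lm_weight * language_logprob[h],
    )


# Invented numbers: the second candidate fits the audio slightly better,
# but the first is far more plausible as a phrase.
candidates = ["recognize speech", "wreck a nice beach"]
acoustic = {"recognize speech": -10.0, "wreck a nice beach": -9.5}
language = {"recognize speech": -4.0, "wreck a nice beach": -9.0}

best = pick_best(candidates, acoustic, language)
```

Note that nothing in this step knows or cares which application will receive the text, which is the point being made above.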
DNS does not communicate directly with any application. That is done through SAPI. If you go to the folder with the speech files (Windows\speech), select Vdict.dll or Vcmd.dll, right-click and select Properties from the pop-up menu, you can see who owns SAPI. Lo and behold, you'll find that it happens to be Microsoft.
All communications between applications and DNS are controlled by SAPI. SAPI communicates with DNS to tell DNS what words are currently visible on the screen in a document. SAPI takes the results of DNS and pastes it into your document or the text window into which you're dictating. Microsoft maintains the standard text edit formats via SAPI that it can use to keep track of words (visible words on the screen) and is responsible for communicating the location of text in a window back to DNS. DNS hands off the results of recognition to SAPI and SAPI writes that information into your document.
Therefore, there are three problems with nonstandard Windows. Please keep in mind that this is all intended tongue-in-cheek, so please don't take offense.
1. SAPI SAPI SAPI SAPI SAPI SAPI SAPI SAPI SAPI SAPI SAPI SAPI SAPI... ad infinitum.
2. SAPI is not equal to Nuance, SAPI is not equal to DNS, SAPI is equal to Microsoft. Write that 1000 times on the blackboard.
3. Nonstandard Windows are those that do not conform to SAPI supported text editing formats. If SAPI won't support it, DNS won't support it. But it's not DNS's problem. DNS relies on SAPI. Therefore, DNS relies on Microsoft the author of SAPI. If an application text window is not SAPI compliant, it is a nonstandard window.
It's time to stop blaming DNS. It's not DNS's problem. If you want to scream bloody murder at somebody, scream at Microsoft, or scream at the application developers for not making their text window SAPI compliant. It's their problem. The only thing that Nuance, or ScanSoft and Lernout & Hauspie before them, could ever do with regard to nonstandard Windows is to try to find the best text editing format with which to communicate with an application text window, but SAPI is still the interface that does the actual communication in both directions. If none of the formats supported by SAPI work in those Windows, or work fully (i.e., Select-and-Say), then there's nothing that Nuance can do to fix the problem. Lernout & Hauspie and subsequently ScanSoft and Nuance have tried the best they can to deal with the problem in the only way possible. That is, via the dictation box.
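To picture the split between standard and nonstandard windows, here is a deliberately simplified model. None of it is real SAPI or DNS code: `StandardEditWindow`, `NonstandardWindow` and `deliver` are hypothetical names, and the only assumption modelled is the one described in this post, that a window either reports its text back through the interface or it does not, with the dictation box as the fallback.

```python
from typing import List, Optional


class StandardEditWindow:
    """Toy window that reports its text back, like a standard edit control."""

    def __init__(self) -> None:
        self.text = ""

    def read_text(self) -> Optional[str]:
        return self.text

    def insert(self, s: str) -> None:
        self.text += s


class NonstandardWindow(StandardEditWindow):
    """Toy window that accepts typed text but reports nothing back."""

    def read_text(self) -> Optional[str]:
        return None


def deliver(window: StandardEditWindow, phrases: List[str]) -> str:
    """Send dictated phrases to a window and report which route was used."""
    if window.read_text() is not None:
        # The window is readable, so each phrase can go straight in and
        # still be found again later for selection and correction.
        for phrase in phrases:
            window.insert(phrase)
        return "direct"
    # The window is unreadable: compose (and correct) everything in a
    # separate dictation box first, then transfer the finished text once.
    dictation_box = "".join(phrases)
    window.insert(dictation_box)
    return "dictation box"
```

Both routes leave the same text in the window; what differs is whether anything can be read back afterwards in order to select or correct it in place.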
Chuck Runquist
Former Dragon NaturallySpeaking SDK & Senior Technical Solutions PM for DNS with Lernout & Hauspie
"We are all victims of mythology in one way or another. We are the inheritors, and many times the propagators, of a desire to believe what we want to believe, regardless of whether or not it is true." -- J.V. Stewart
Chuck Runquist
Apparently I am not communicating clearly.
I have the same problem. I've always thought I spoke and wrote English. But anymore I'm not sure what language I write and speak.
Welcome to the club.
Forgive me, Chuck, but I
Forgive me, Chuck, but I think you missed my point. I didn't "blame" naturally speaking for anything, and I certainly didn't "scream bloody murder", I stated a desire for an added feature. I am a disabled hands-free programmer who would like to be able to use applications in the growing pool of platform-independent tools without directing them through a series of mousegrid commands. I'm not sure how "If there actually were a way to make NaturallySpeaking work with Java-only or other platform-independent tools written to standard APIs, that would be worth so much more to me than slight recognition improvements" counts as "blaming DNS". It reads to me as "an end-user of the $900 Professional product through every version of its existence expresses a desire for new features in a discussion about the tool's limitations". My statement of desired features is not attacking the former or current developers; it is a statement of desired features.
Nonstandard Windows are those that do not conform to SAPI supported text editing formats. If SAPI won't support it, DNS won't support it. But it's not DNS's problem. DNS relies on SAPI. Therefore, DNS relies on Microsoft the author of SAPI. If an application text window is not SAPI compliant, it is a nonstandard window.
The technical explanation you give here is useful, and I thank you for it. However, I disagree with you that it's not DNS's problem. Or rather, it is my problem, as a DNS user, and therefore it would be DNS's problem as a software product if I had any available alternatives that could serve my needs as a hands-free programmer. As I don't, I suppose it's true that it's not Nuance's problem.
However, the fact that, given the growing popularity of Java-based tools, Nuance's take is still apparently to work only with SAPI with no planning for future architectural changes, well, defining the end-user applications as the source of the problem there is semantics, isn't it? NaturallySpeaking works only with SAPI by design -- which is a legitimate design choice, as much as it bites me in the ass on a daily basis, and as much as I wish the product would do some development with the Java Speech API. But defining platform-independent applications as nonstandard (when in fact they are just conforming to a set of platform-independent standards, instead of Microsoft-specific standards), or saying we should scream bloody murder at them, that's not right either.
What's going on here is that the developers of naturally speaking made a design choice in the past which has caused ripples in a changing technology environment (more platform-independent tools, less Windows ubiquity, etc.). Until there is a competitor to NaturallySpeaking with actual teeth, nobody who owns the product will have any vested interest in fixing that underlying architectural limitation, and those of us who are stuck with dictation will have to live with that design choice. Nobody is evil, nobody needs to get screamed at, nobody needs to get blamed.
But it's everybody's problem. Mine, DNS's, Nuance's.
jadelennox wrote: Until
Until there is a competitor to NaturallySpeaking with actual teeth, nobody who owns the product will have any vested interest in fixing that underlying architectural limitation, and those of us who are stuck with dictation will have to live with that design choice. Nobody is evil, nobody needs to get screamed at, nobody needs to get blamed.
But it's everybody's problem. Mine, DNS's, Nuance's.
Ah! There's the catch 22! There is a competitor with the best dentures money can buy, viz., M$ itself! So many competing software makers avoid M$' apparatuses for the obvious reason that they're afraid the world's biggest OS and software company will steal their thunder to get bigger by manipulating the rules (i.e., the OS) to make their versions of the apps work better. As a user I get hammered too because so many of the apps I work with are JAVA, or at least non-M$, based. For now, Nuance gets away with coat-tailing M$ because it has a superior product. Whether it would try to develop a JAVA-compatible base is moot, but note that instead of entering the MAC market it simply licensed its products. I don't see a likely resolution of the impasse any time soon.
I appreciate Chuck's earnest efforts and enjoy even more your nimble, spot-on rejoinders! This has been a lively, informative topic on a critical SR issue.
Bruce
Cheering for the gorilla
On one hand, I have observed that it usually takes Microsoft three tries to get a new technology right. I'm not sure whether their current product should be considered the first try or the second, but I will await the third with interest.
In this case I would cheer for the 800 pound gorilla. I generally hate to see a big monopoly get bigger, but Nuance's indifference to its customers' needs out-Microsofts Microsoft.
On the other hand, I have read predictions that the next generation of computing technology will use the Web as its "operating system," i.e., the hardware and OS you run won't make a darn bit of difference to the applications you use and how you use them. So far I don't see it happening much. If it does come about, though, it will throw a major monkey wrench into Nuance's product strategy, as well as Microsoft's.
The first generation of Web
The first generation of Web 2.0 is certainly throwing a wrench into hands-free computing. I don't know how many times I've had to explain to appalled looking friends and colleagues that no, I will not share documents with them on Google Docs, because I can't. It's possible to write Web 2.0 applications that are accessible to NaturallySpeaking, but many application developers don't bother, and the more complex the application, the more difficult it is.
As much as I gripe about the extra features and integrations I wish NaturallySpeaking had, its tight integration with the operating system and applications is almost certainly what's made it cost effective, and NaturallySpeaking is the only reason that I manage to be employable. I have no idea what's going to happen to us hands-free types in Web 3.0, 4.0, &c.
With the imminent change of
With the imminent change of administration, now is the time to begin thinking strategically.
What might help? One solution would be an extension of the ADA to require all software sold in interstate commerce to include a set of programming standards that would make it easier for assistive software to control the primary software.
Such a specification should focus on low-level standards. The makers of, say, SR and vision-assistive softwares should specify what kinds of low-level controls they need to hook into all softwares. The software makers would then be responsible for providing those hooks regardless of their programming basis, be it JAVA, M$, etc.
That would enable the assistive software makers to write basically to one, standard interface.
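To make the "one standard interface" idea a bit more concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the TextObject interface, its three methods, and the PlainEditor class are invented for illustration and do not correspond to any existing accessibility API. The point is only that the assistive side (insert_word) is written once against the interface, while each application supplies its own implementation.

```python
from abc import ABC, abstractmethod
from typing import Tuple


class TextObject(ABC):
    """Hypothetical low-level hooks every application would be required to expose."""

    @abstractmethod
    def get_text(self) -> str: ...

    @abstractmethod
    def get_selection(self) -> Tuple[int, int]: ...

    @abstractmethod
    def replace_range(self, start: int, end: int, text: str) -> None: ...


class PlainEditor(TextObject):
    """Stand-in for any application (Java, Microsoft, etc.) implementing the hooks."""

    def __init__(self, text: str = "") -> None:
        self._text = text
        self._selection = (len(text), len(text))

    def set_selection(self, start: int, end: int) -> None:
        self._selection = (start, end)

    def get_text(self) -> str:
        return self._text

    def get_selection(self) -> Tuple[int, int]:
        return self._selection

    def replace_range(self, start: int, end: int, text: str) -> None:
        self._text = self._text[:start] + text + self._text[end:]
        # Leave the cursor just after the inserted text.
        self._selection = (start + len(text), start + len(text))


def insert_word(target: TextObject, word: str) -> None:
    """Assistive-software side: written once, works with any TextObject."""
    start, end = target.get_selection()
    target.replace_range(start, end, word)
```

With something like this in place, an SR product would never need to know what toolkit the application was built with; it would only ever call the three hooks.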
Perhaps the law should be "recommended" for a period of three to five years to permit mutual adjustments, and then "required" with genuine penalties like easy legal redress on behalf of anyone who suffers from an inoperative or inadequate interface module. Or perhaps there could be a schedule of "tariffs" on softwares that don't meet the requirements, like a fixed percentage of the purchase price -- the maker could recapture some of these tariffs the more quickly and more completely he remedies the defects.
I offer this as a proposal for consideration and discussion. Obviously it might need adjustments, and it's necessarily vague about the nature of the "interface modules", but it should be possible for programming experts to flesh them out in a practical manner.
Obviously we need to get our act together in order to promulgate this approach to other interested parties.
BTW, I don't necessarily expect software makers to be wholly hostile to this approach -- if the approach is drafted reasonably, implemented gradually, and enforced mandatorily so that no one software maker bears special costs that would hurt him competitively, these people might actually welcome an approach that will help to increase sales.
Bruce
Some odd observations
I know your qualifications, so I do not question your explanation, but I'm puzzled about how to reconcile it with some of the things I've observed as a Dragon user.
The bottom line is: Dragon behaves differently with different applications, in numerous and sometimes gross ways.
I'll give three examples.
First, it recognizes text a lot better in Word than in any other application I have used it with. I can't quantify this, but it's unmistakable. There's a whole level of stupidity to the "speakos" in other applications that just doesn't occur in Word.
Second, its ability to distinguish commands from text is application-dependent. Specifically, "new paragraph" is often interpreted as text in Eudora, but nowhere else. This is not a matter of words being spoken without a pause, so that they are mixed with the words before and after, or being spoken too slowly, so that they are interpreted separately. I've looked at the recognition history, and it proves that.
Third, with Firefox 2 (I haven't tried 3 yet), Dragon sometimes loses the ability to recognize speech completely. Sometimes anything I say produces "<???>" in the text balloon, and nothing on the page. If I switch to another application and say the same thing, no problem. Then, if I switch back to Firefox and say it, no problem. This happens in Firefox perhaps 10% of the time, and I have never, ever had it happen elsewhere.
Can you give me any insight into what is happening here? I simply can't reconcile this behavior with the mode of operation you've described.
How do Dragon NaturallySpeaking and SAPI work together
It occurred to me that I left out a point in the previous post. Therefore, to clear up a point, here is how DNS works with applications and SAPI.
1. You dictate something.
2. Through a series of algorithms your speech is converted from analog to digital, from digital input to a voice pattern, and from the voice pattern to a phonetic equivalent utilizing the Acoustic Model and the vocabulary (i.e., the underlying IPA -- International Phonetic Alphabet for the specific language vocabulary being used).
3. If you watch the results box very carefully you will see how the transition progresses from the Acoustic Model representation of your dictation (speech) through the comparison to the Language Models for context. In other words, the first representation is the Acoustic Model interpretation of your dictation (speech) based on your enunciation of words derived from either the speaker independent Acoustic Model or your general training (initial training).
4. Once the recognizer has gone through the entire process and shows in the "BestMatch III" top selection match, natspeak.exe, which is DNS's linking loader and not DNS itself because DNS is MREC.dll (the recognizer), sends the end result to SAPI, which then attempts to write the text to the application window. All the other surrounding DLLs and ActiveX controls/COM programs are simply auxiliary functions which natspeak.exe uses to process user requests. This is why there is a delay between the time that you pause and when the text is displayed in the application window.
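For readers who find step 3 abstract, the following toy sketch in Python illustrates the general idea of an acoustic-only first guess being revised once context is taken into account. It is not how DNS or MREC.dll actually works: the candidate words, the scores, the bigram table, and the function names are all made up for illustration.

```python
from itertools import product

# Made-up acoustic candidates: for each spoken chunk, (text, acoustic score).
ACOUSTIC = [
    [("recognize", 0.6), ("wreck a nice", 0.4)],
    [("beach", 0.55), ("speech", 0.45)],
]

# Made-up context scores for adjacent pairs (a toy "language model").
BIGRAM = {
    ("recognize", "speech"): 0.9,
    ("wreck a nice", "beach"): 0.6,
}


def acoustic_best(hyps):
    """First pass: pick the top acoustic candidate for each chunk, ignoring context."""
    return [max(cands, key=lambda pair: pair[1])[0] for cands in hyps]


def rescore(hyps, bigram, default=0.1):
    """Second pass: pick the word sequence with the best combined score."""
    best_words, best_score = None, -1.0
    for combo in product(*hyps):
        words = [text for text, _ in combo]
        score = 1.0
        for _, acoustic_score in combo:
            score *= acoustic_score
        # Multiply in the context score of each adjacent pair.
        for left, right in zip(words, words[1:]):
            score *= bigram.get((left, right), default)
        if score > best_score:
            best_words, best_score = words, score
    return best_words
```

Here acoustic_best(ACOUSTIC) gives "recognize beach", while rescore(ACOUSTIC, BIGRAM) settles on "recognize speech", which is roughly the kind of change you see in the results box as the context is applied.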
Keep in mind that no application performs any operation that is controlled by Windows. When you type on the keyboard, Microsoft Word does not insert the keystrokes directly into your document. The keystrokes are passed to (Oh my God!) COMMAND.COM -- YES MSDOS. COMMAND.COM contains all of the communications protocols required to convert keystrokes into characters. This is called the console, or con for short. The results are then passed back to Windows which displays the results in your document. This is the way it has been done since the beginning of DOS. So, when you talk about applications interfacing with other applications, they only do that through Windows, of which COMMAND.COM and MS-DOS.sys are still a part.
Also, note that the SAPI speech files, even though they are located in their own separate folder, are basically system files. In SAPI 5.3, which Microsoft uses in its WSR in Windows Vista, the SAPI files are all written to the Windows system folder(s) (i.e., system32, etc.). The reason that SAPI 4 is separated out in its own set of speech files and folder is that SAPI 4 was originally a system component of Windows 98/98 SE. Microsoft installed SAPI 4.0 when you installed Windows 98/98 SE. At that point in time, Microsoft was not involved in creating its own speech recognition application. SAPI was also not installed in Windows 2000, which came out after Windows 98 SE. Nevertheless, when Windows XP was first introduced, it was introduced in conjunction with Microsoft Office XP, which was Microsoft's inroad into the speech recognition arena, and also why DNS users have problems with ctfmon.exe. Ctfmon.exe is installed automatically every time a user installs a version of Microsoft Office from Office XP up through Office 2007. This occurs regardless of whether or not the user installs the "Alternate Input" option in Microsoft Office.
Nevertheless, the above four basic functions are independent of the operating system, as well as application independent.
As far as communicating with an application, when you load Dragon NaturallySpeaking and activate a user, and then open an application, natspeak.exe, through its various functions as a linking loader, sends a query to SAPI and essentially asks SAPI if there is a text window (document window or corresponding text field) and if so what format it is using. If SAPI in turn says, "Hey I got it. I recognize it as being one of my supported text edit formats.", then DNS applies the appropriate format for dictating into the document or text field (for the sake of common jargon, we'll call it a text object), and identifies it as Select-and-Say enabled. However, if SAPI says <???>, or in essence "I have no idea", then DNS applies the most logical and most commonly used text edit format. However, DNS identifies such text objects as nonstandard windows. And, if that text object is capable of certain types of SAPI communications, then some or all of the features that are normally enabled with Select-and-Say may or may not work. Note also that Select-and-Say is not simply allowing DNS to communicate with an application via SAPI; Select-and-Say also enables Natural Language Commands in a particular application. This is the case with dgnword.dll and dgnoutlook.dll. That is, these two particular DLLs are not simply for the purpose of communicating text back and forth via SAPI; they are also and primarily used for enabling Natural Language Commands for those applications that DNS supports using what are called constrained grammars (i.e., only applicable to that application vs. global commands). DNS does this through a number of low-level SAPI calls, some of which are part of dnsapi (SDK low-level SAPI interface functions) and some of which are controlled directly through SAPI itself. Regardless, the primary SAPI DLLs that are used to communicate back and forth between DNS and an application text object are, as noted in a previous post, Vdict.dll and Vcmd.dll.
Vdict.dll handles dictation issues, while Vcmd.dll handles the communication and execution of commands (MSAA -- Microsoft Active Accessibility, or as DNS notes it, Active Accessibility, as well as scripting commands).
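The decision described above can be sketched in a few lines of Python. This is only an illustration of the logic as explained in this post, not actual DNS or SAPI code: the format names, the SUPPORTED_FORMATS set, the DEFAULT_FORMAT value, and the classify_text_object function are all hypothetical.

```python
# Hypothetical format names; the real list of supported text edit formats is not given here.
SUPPORTED_FORMATS = {"edit", "richedit"}
DEFAULT_FORMAT = "edit"  # assumed "most commonly used" fallback


def classify_text_object(reported_format):
    """Decide how to treat a text object based on the answer to the format query.

    reported_format is the format name that came back, or None when the
    answer was in essence "I have no idea".
    """
    if reported_format in SUPPORTED_FORMATS:
        return {
            "format": reported_format,
            "nonstandard": False,
            "select_and_say": "enabled",
        }
    # Unknown text object: fall back to the default format and flag the
    # window as nonstandard. Select-and-Say features may or may not work.
    return {
        "format": DEFAULT_FORMAT,
        "nonstandard": True,
        "select_and_say": "unreliable",
    }
```

So a recognized format comes back as Select-and-Say enabled, while classify_text_object(None) or an unrecognized name such as a Java text component falls into the nonstandard window branch.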
The point here is that DNS does not connect directly with an application text object except via SAPI. Delete the SAPI files and you'll find out real quick that DNS crashes with the error message that the speech files are missing. I'm sure that everyone has seen either a post on this problem or experienced it themselves wherein the user has to download and reinstall the spchapi.exe. This also occurs if the wrong version of the speech files are installed. Nonetheless, no SAPI no DNS.
Lastly, what capabilities are or are not accessible and/or functional with regard to dictation into nonstandard windows is determined exclusively by whether or not DNS's calls to SAPI and/or SAPI's responses to them are available and, for lack of a better term, intelligible. For example, you try to do correction in a nonstandard window and the text that is selected, or the correction that is returned, is not in the right place. You go to dictate into a particular text object, and you get nothing even though the results box displays your text. Or, you get <???> and no matter what you do you cannot get the text displayed in the text object window. Also, pausing and then continuing to dictate results in spacing problems between the last utterance and the new utterance, coupled with no capitalization after terminal punctuation when you move from one point in a document to another to insert text, etc., and so on ad infinitum.
This is not a technical explanation as much as it is an attempt to make the process understood as simply as possible. So, you programmers out there, don't grill me as to whether this occurs or that occurs, or whether it does or doesn't make sense. The explanation is an explanation of the process, not the methodology or the technical details. If I've overlooked something in an attempt to keep it as simple as possible, I apologize. Nevertheless, this is basically how it works and where the necessary "fixes" must be made in order to make a nonstandard window work properly with DNS. On the other hand, yes, it may be possible for the DNS programmers to figure out a way around this problem. The chief Belgian engineer for Lernout & Hauspie was adamant that he could make L&H VoiceExpress work with Java, but he's long gone. Nonetheless, possible or otherwise, it won't be easy and it won't be soon, if at all.
Chuck Runquist
Former Dragon NaturallySpeaking SDK & Senior Technical Solutions PM for DNS
"Facts do not cease to exist because they are ignored." -- Aldous Huxley