How to Set Up A System to Transcribe Comments By a Small Number of Speakers
Submitted by Joeblake1 on Tue, 08/30/2005 - 01:23.
Here's how I've "remotely trained" my DNS to a speaker who doesn't even know they're doing it (ie training the Dragon).
Firstly, some more caveats, or the same ones in different words. I mentioned that it could be "fiddly" because it depends on which version of DNS you're using. Whilst I have V6 and V7, and the V7 upgrade, I reverted to V5 because it was faster and easier. Secondly, there are three parts: the first, creating the training document, is reasonably straightforward; the second is probably the most difficult; the third is easy, but initially time-consuming.
(As a court reporter - or more accurately I suppose an audio transcription WPO - I work in an industry which seems to be trying to eliminate people from the chain and go entirely computer, but in trying to "train" a dragon "remotely" I have seen that it will be a long-term project which won't succeed until there has been a big improvement in the creation of artificial intelligence.)
Trying to find which file to create and/or amend is very much a "suck it and see" process, so (a) be prepared to take time and (b) back up any file you think you are going to amend. In fact I'd suggest you copy the whole directory onto a floppy or similar. They aren't too big.
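If you'd rather script the backup than copy things by hand, here is a minimal Python sketch. The function name and the folder paths in the usage note are my own placeholders, not anything DNS provides; point it at whichever folder you are about to edit.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_training_dir(training_dir, backup_root):
    """Copy a whole folder into a new, timestamped backup folder.

    Both arguments are placeholders for paths on your own machine.
    """
    src = Path(training_dir)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"{src.name}-backup-{stamp}"
    # copytree refuses to overwrite an existing folder, so an older
    # backup can never be clobbered by accident.
    shutil.copytree(src, dest)
    return dest
```

For example, `backup_training_dir(r"C:\path\to\Training\Enx", r"D:\dns-backups")` (both paths hypothetical) leaves the originals untouched and returns the location of the copy.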
Anyway, let's take a deep breath.
Step one: Creation of the initial training document.
When you create a user you have to train DNS by reading a prepared piece of text, usually something like an extract from "2001: A Space Odyssey" or "How to talk to your computer". I have to confess I no longer have a working version of 6 or 7 on any of my computers, so I can't remember where the files are hidden. However, in V5 they are in the folder Dragon/NaturallySpeaking/Training/Enx. (NOTE: I have a suspicion that the name of the "Enx" directory may be linked to the language chosen when initially setting up DNS, but I can't be sure of that.) You should see a series of files with names like "data1.bin", and some others labelled "enroll1.bin", "enroll2.bin" etc.
When you are asked to initially train a user you will be offered a short list of about 4 or 5 pieces to choose from. Make a note of which pieces you are being offered (or at the very least the number of pieces). The files marked with a ".bin" extension are essentially ASCII files. Look at each "enroll" file and see which one matches the initial list of documents offered.
Some of the other "enroll" files contain more files which can be used for advanced training.
For this exercise I'll assume it's "enroll4.bin". It may vary on your system. When you open this file you should see something like this:
Quote:
Kennedy's Inaugural Address (Medium Reading: Historical Speech)
data34.bin
Dave Barry in Cyberspace (Medium Reading: Humor)
data2.bin
Dogbert's Top Secret Management Handbook (Harder Reading: Humor)
data9.bin
3001: The Final Odyssey (Harder Reading: Science Fiction)
data0.bin
2001: A Space Odyssey, Chapter 16 (Harder Reading: Science Fiction)
data1.bin
Stories Written by Children (Reading for Children)
data35.bin
Note that each title ends with a hard return, followed by the "data*.bin" number.
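To check which "enroll" file you are looking at without opening each one in an editor, something like the following Python sketch can list the title/file pairs. It assumes the layout is exactly what the extract shows (a title line followed by a "data*.bin" line, repeated); if your file has anything else in it, the pairs will come out misaligned. The function name is my own.

```python
def parse_enroll_listing(text):
    """Return (title, data_file) pairs from the text of an enroll file.

    Assumes alternating lines: a title, then its data*.bin file name.
    Blank lines are ignored.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # Even-numbered entries are titles, odd-numbered entries are file names.
    return list(zip(lines[0::2], lines[1::2]))


if __name__ == "__main__":
    sample = (
        "Dave Barry in Cyberspace (Medium Reading: Humor)\n"
        "data2.bin\n"
        "Stories Written by Children (Reading for Children)\n"
        "data35.bin\n"
    )
    for title, data_file in parse_enroll_listing(sample):
        print(f"{data_file}: {title}")
```

To run it against a real file, read the file's text first (for instance with `open(path, encoding="latin-1").read()`) and pass that in.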
NOTE: Again, back up the file before making any changes.
Obviously, the best way would be to print out one of the training documents, give it to your client and ask him/her to read it out loud in a clear voice, but at his/her natural pace and intonation, pausing between paragraphs. If you have a choice in this matter, make sure you use the best possible equipment: mic, tape recorder (NOT the office handheld dictation machine), or, if you're using digital recording, the highest sampling rate your program offers (see the comment below about what the DNS "transcribe" function requires to work properly). In other words, get the best possible recording.
Alternatively, it is possible to take an existing recording of a speaker (eg Mr John Brown), preferably one where they are doing something regular and clear, such as reading a book or dictating a letter, transcribe the text manually, correct it and save it as an ASCII file, but with the next "data**.bin" number in the sequence. However, BEFORE saving this file, break it up into small paragraphs separated by two hard returns, such as you'll see in the extract.
*****
"The Great Glass Elevator was a thousand feet up and cruising nicely. The sky was a brilliant blue. Everybody on board was wildly excited at the thought of going to live in the famous Chocolate Factory.
Grandpa Joe was singing. Charlie was jumping up and down. Mr. and Mrs. Bucket were smiling for the first time in years, and the three old ones in the bed were grinning at one another with pink toothless gums.
"What in the world keeps this thing up in the air?" croaked Grandma Josephine.
"Sky hooks," said Mr. Wonka.
"You amaze me," said Grandma Josephine.
******
Each time you do two hard returns, that determines how much text will appear on the screen when you are training the program.
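If your transcript comes out of a word processor with ragged line breaks, a small Python sketch like this can tidy it so that every paragraph is separated by exactly two hard returns before you save it. I'm assuming Windows-style (CR+LF) hard returns and plain ASCII output, since that is what the existing files appear to be; the function names are mine.

```python
import re


def format_training_text(raw_text, newline="\r\n"):
    """Separate paragraphs with exactly two hard returns.

    Any run of blank lines in the input counts as a paragraph break;
    line breaks inside a paragraph are folded into single spaces.
    """
    text = raw_text.replace("\r\n", "\n").replace("\r", "\n")
    paragraphs = [" ".join(p.split()) for p in re.split(r"\n\s*\n", text)]
    paragraphs = [p for p in paragraphs if p]
    return (newline * 2).join(paragraphs)


def save_training_text(text, path):
    """Write the text as ASCII; non-ASCII characters become '?'."""
    # newline="" stops Python from translating the hard returns again.
    with open(path, "w", encoding="ascii", errors="replace", newline="") as f:
        f.write(text)
```

Keeping each paragraph short matters more than the exact tool: each paragraph is one screenful during training, so it is also the unit you will have to keep in synch with the recording.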
Having saved the transcript as an ASCII file with a .bin extension (eg data36.bin), open the "enroll4.bin" file and scroll down until you find the appropriate place to insert the title you wish to give this training extract, followed by the next "data.bin" number. So if the highest data.bin number is 35, as in the extract above, just type
Quote:
2001: A Space Odyssey, Chapter 16 (Harder Reading: Science Fiction)
data1.bin
Stories Written by Children (Reading for Children)
data35.bin
Remote Training for Mr John Brown
data36.bin
Save this file as an ASCII file, but KEEP the ".bin" extension.
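The same edit can be sketched in Python if you'd rather not risk an editor changing the file type on you. This is only an illustration under two assumptions of mine: that the new entry can simply go at the end of the list, and that the file uses the same kind of hard return throughout. It makes a ".bak" copy first, works out the next free "data" number, and appends the title and file name. The function names are hypothetical.

```python
import re
import shutil
from pathlib import Path


def next_data_name(listing_text):
    """Return the next unused data file name, eg 'data36.bin' after data35.bin."""
    numbers = [
        int(n)
        for n in re.findall(
            r"^data(\d+)\.bin\s*$", listing_text, re.IGNORECASE | re.MULTILINE
        )
    ]
    if not numbers:
        raise ValueError("no data*.bin entries found in the listing")
    return f"data{max(numbers) + 1}.bin"


def add_training_entry(enroll_path, title):
    """Append a title and the next data file name to an enroll file."""
    path = Path(enroll_path)
    # Keep an untouched copy next to the original before changing anything.
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    raw = path.read_bytes().decode("latin-1")
    newline = "\r\n" if "\r\n" in raw else "\n"
    data_name = next_data_name(raw)
    if not raw.endswith(("\r", "\n")):
        raw += newline
    raw += f"{title}{newline}{data_name}{newline}"
    path.write_bytes(raw.encode("latin-1"))
    return data_name
```

The returned name (eg "data36.bin") is the name your transcript file needs to be saved under in the same folder.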
You can then open your DNS and create a new user. When you come to the bit you are asked to select the piece for reading, you should see your new training file for Mr John Brown.
What you should see on your screen is the first paragraph of your transcript from this recording.
THIS is when the problems can start, because you'll have to work out a way of playing your recording into the Dragon without using the "transcribe" function. Because there are so many different ways of doing this (not all guaranteed successful), I can't nominate any preferred one.
(1) You may try putting your microphone in front of one of the speakers of your play back system, but this is when the previous comments in other posts about loss of fidelity come in to play.
(Good luck.)
(2) I sometimes have to copy a cassette-based oral history interview into my computer as a digital recording because my clients would like it burned onto a CD, so I've purchased a cable with which I can take a direct feed from the headphone output on my hi fi tape player (not the transcribing machine, because they generally give poorer quality playback) to the line-in socket on my computer. Or better still, if you have it, a USB pod. I have an Andrea which gives excellent results. I'm sorry I can't do more than just outline the solution, because its success depends on so many variables, the type of sockets on the tape player being but one. (The hi fi tape player has the "big" headphone socket, so you need to buy an adaptor for the cable.)
Having initialised a new user for DNS, and selected the appropriate reading piece, what you have to do then is just play back the recording to keep "in synch" with the paragraphs on the screen.
(I should point out that you can use the same method to train up a user with proclivity towards a particular topic (eg if you were doing a lot of medical transcription, you could create a training document with lots of medical terminology) and use your own voice to train a "medical" user.)
Step 3.
(NOTE: In order for DNS to transcribe from an existing .wav file, the file has to have the following settings: 11.025 kHz, 16 bit, mono. This is a setting that you have to make when creating the sound file, so I'll not go too deeply into that, as it varies depending on HOW you create the sound file.)
If you haven't used DNS, there is a function called "transcribe" which you can use to have an existing sound file transcribed automatically. The steps are fairly self-explanatory, and in the manual. (NOTE: Something in my mind tells me that in installing DNS there was a choice somewhere I had to make to enable the "transcribe" function. Perhaps someone could jog my memory on this. It's been a year since I last installed DNS on a computer.)
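Before feeding a file to "transcribe" it can save time to confirm the file really has those settings. Here is a minimal sketch using Python's standard `wave` module; it only inspects an uncompressed .wav file and does not convert anything. The function name and the wording of the messages are my own.

```python
import wave

# Settings quoted above: 11.025 kHz, 16 bit, mono.
REQUIRED_RATE_HZ = 11025
REQUIRED_BITS = 16
REQUIRED_CHANNELS = 1


def check_wav_for_transcribe(path):
    """Return a list of problems; an empty list means the settings match."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        bits = wav.getsampwidth() * 8  # sample width is reported in bytes
        channels = wav.getnchannels()

    problems = []
    if rate != REQUIRED_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz, expected {REQUIRED_RATE_HZ} Hz")
    if bits != REQUIRED_BITS:
        problems.append(f"sample size is {bits} bit, expected {REQUIRED_BITS} bit")
    if channels != REQUIRED_CHANNELS:
        problems.append(f"file has {channels} channels, expected mono")
    return problems
```

If the list comes back non-empty, re-export the recording from whatever program created it rather than trying to patch the file.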
Remember of course that this is a BASICALLY trained user in action here. Once you get down to actually transcribing a recording, remember to do all your "Dragon" speech file corrections in the same session. If you close the document before making any corrections, you will wipe out the sound files DNS used and you won't be able to increase the accuracy.
I'd suggest if you are transcribing a third party using the method described, do it in short bursts. Play the sound file, transcribe it, "rewind", correct. That way you increase the accuracy of the DNS for the next burst.
NOTE: BIG LETTERS. I should put this first (or rather you should do this first) but since I'm just describing the method I've used, I'll leave it till last and say practice all this sort of stuff with your own voice first. It's fiddly, but once you've mastered the transcription function, it is easy to dictate a document straight through (I usually dictate straight into the computer), plug the sound file into DNS "transcribe", go and make a cup of coffee or do something else, and then come back to the completed document, do my normal corrections, and it's finished.
***********
Having said all that (phew), I'd have to say that I find it much easier, much more accurate and generally much faster to "shadow speak", ie I listen to the tape, speak into DNS repeating what's just been said, insert any punctuation and formatting "on the fly", make any "DNS corrections" (ie speech file corrections) as soon as they appear, then go back when it's all finished and do a 100% sound check. I've found that doing it that way (a) the DNS accuracy is way up, probably 99% or better (and I must say the errors are almost entirely my fault - sloppy speaking) and (b) if the recording is good enough, eg a lawyer making a final submission in summing up their case, I can set the playback speed at faster than normal, so I am actually transcribing in better than real time (albeit in short bursts, because I still have to stop and draw breath, make corrections, have a drink etc). Unlike a tape recording or even a digital recording, when I speed up my speech rate there's no distortion, so the Dragon can handle it with ease.
Anyway, I hope that hasn't caused you to run away screaming. If anything causes a problem please let me know and I'll do my best to close the can of worms I've opened.
Have fun
Joe
Originally posted here: http://www.speechcomputing.com/node/311?from=20&co...
- BruceCyr's blog



I wonder if anyone has heard anything about this adaptation of Dragon, or seen it in action. I think it's trying to do the same thing as your scheme Joe/Bruce (?), but of course your solution is probably a lot cheaper.
http://www.tmcnet.com/channels/speech-recognition/speech-recognition-articles/voice-perfect-naturallyspeaking-scansoft.htm
Andy,
I can't say I've ever heard of it, but it certainly would seem to be a practical way of approaching the problem, particularly giving each speaker their own headset. I suppose if I were doing it, I might have a separate computer for each person. I've often wished the courts could have a system of each speaker with their own headset mic. Would make my job a lot easier.
Joe