Thank you for your reply. Unfortunately, Microsoft's solution for speech recognition is not a useful one. Its recognition accuracy is significantly lower than NaturallySpeaking's, it doesn't have the basic tools that we have through natlink, and disabled users generally see it as an also-ran product. The Cortana solution is best described as not even in the same universe as what a disabled person needs for an accessibility interface. We've been launching applications by speech since 1985 with word spotting command and control systems. It is sad, but that kind of thinking has kept handicap accessibility in general, and speech recognition specifically, in the stone age compared to modern user interfaces.
The tl;dr form of a speech friendly interface is one where the speech interface can interact with and analyze the contents of an application in order to create and modify grammars based on the context. Scraping a GUI is a losing proposition because by the time information has been decimated for presentation, you have lost essential information necessary to produce a complete and useful grammar.
Speech user interfaces are wide and shallow with poor discoverability. Graphical user interfaces are narrow but deep with moderate discoverability. Just as you would not want to model a GUI on a speech user interface, you do not want to model a speech interface on a GUI.
In summary, a basic accessibility friendly interface for a user is not what you do with text to speech or speech recognition or even a graphical display. It's the tool that uses an API which presents the information necessary to create an appropriate interface. It's the tools that let you automate functions provided by an application in order to minimize the number of steps needed to accomplish a task. The notation used to automate also affects the accessibility of the system, because if the language is not something that can be easily used from the accessibility interface, then the user becomes dependent on somebody else to create the interface for them, which is a horrible thing to do to someone.
Take a look at this speech interface I created here. I apologize for the rough cut nature of the demonstration but it was an on-the-fly thing I captured to demonstrate where I was to a few people in the programming by speech community.
http://www.youtube.com/watch?v=nIz53GDNtYo
It is a very rough start at what I'm working on for programming by speech. Everything is speakable relatively efficiently and easily. By identifying markers in the result of the workflow, one can then add a little bit of workflow to transform it from the speech friendly form to the code friendly form. You may not understand this, but try to write C sharp code by speech for a day without a keyboard or mouse on your system and you will see what I mean.
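To make the speech friendly versus code friendly distinction concrete, here is a minimal sketch in Python. The function names and the choice of snake_case as the code form are assumptions made for illustration; this is not the code from the video.

```python
import re


def to_code_form(spoken):
    """Turn a spoken phrase into a snake_case identifier (assumed code form)."""
    return "_".join(spoken.lower().split())


def to_speech_form(name):
    """Turn a snake_case or camelCase identifier into speakable words."""
    # Put a space at each lower-to-upper boundary, e.g. "fooBar" -> "foo Bar".
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    return " ".join(spaced.replace("_", " ").lower().split())


def toggle_form(text):
    """Flip between the two forms: phrases become identifiers and vice versa."""
    if " " in text.strip():
        return to_code_form(text)
    return to_speech_form(text)


print(toggle_form("customer account total"))
print(toggle_form("customer_account_total"))
```

The point is that the spoken form contains nothing but ordinary words, so it can be dictated and corrected like prose, and the punctuation only appears when the text is flipped to its code form.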
Interestingly, I have been coming to the conclusion that the best UI for a speech interface is a very simple grammar structure within a text region that supports the grammar with hinting and nothing more. All the buttons and checkboxes and widgets and flashy they-make-my-hands-hurt-worse screen junk actually get in the way of building a speech friendly application interface.
An example of this is the speech friendly templating system I'm experimenting with. Right now my solution counts on features of Emacs, and since Emacs itself is becoming less and less important to the function of a speech driven programming environment, I have been looking for alternative environments that could work with NaturallySpeaking Select-and-Say.
So if you would like to help disabled people, don't try to build the user interface for them. Instead, give us, or more specifically me, the ability to do a few simple things. First, make the edit control you use for this code writer application work with NaturallySpeaking. Let the Select-and-Say operation work as it was intended. You're almost there: I can dictate directly but not correct. Next, provide the ability to manipulate the text in the buffer through a relatively simple API, not something overly complicated like Microsoft's API.
For example, toggle name counts on the ability to select a region by name (i.e. class, method, statement, line, expression, region). It counts on the ability to mark where the cursor is, extract the region, pass it through a filter, and then replace the contents of the region with the output of the filter. At the same time, the filter will extract information to improve the grammar for subsequent work.
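As a rough illustration of how small that API could be, here is a sketch in Python. Everything in it is hypothetical: TextBuffer, region_by_name, filter_region and rename_filter are names invented for this example, not part of any real editor or of NaturallySpeaking, and only the "line" region is implemented because the others would need a real parser.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TextBuffer:
    """Hypothetical editor buffer, invented for this sketch."""
    text: str
    cursor: int = 0
    marks: dict = field(default_factory=dict)

    def mark_cursor(self, name="saved"):
        """Remember the current cursor position under a name."""
        self.marks[name] = self.cursor

    def region_by_name(self, kind):
        """Return (start, end) offsets of the named region around the cursor."""
        if kind != "line":
            # class, method, statement, expression would need a parser.
            raise NotImplementedError(kind)
        start = self.text.rfind("\n", 0, self.cursor) + 1
        end = self.text.find("\n", self.cursor)
        if end == -1:
            end = len(self.text)
        return start, end

    def filter_region(self, kind, text_filter):
        """Extract a region, run it through a filter, and replace it.

        The filter returns (new_text, vocabulary); the vocabulary is handed
        back so the caller can feed it into the speech grammar.
        """
        start, end = self.region_by_name(kind)
        new_text, vocabulary = text_filter(self.text[start:end])
        self.text = self.text[:start] + new_text + self.text[end:]
        self.cursor = min(self.cursor, start + len(new_text))
        return vocabulary


def rename_filter(old, new):
    """Build a filter that renames one identifier and reports identifiers seen."""
    def apply(region):
        replaced = re.sub(rf"\b{re.escape(old)}\b", new, region)
        vocabulary = sorted(set(re.findall(r"[A-Za-z_]\w*", replaced)))
        return replaced, vocabulary
    return apply


buf = TextBuffer("x = 1\ntotal = x + tax\nprint(total)", cursor=8)
buf.mark_cursor()
vocab = buf.filter_region("line", rename_filter("total", "grand_total"))
print(buf.text)
print(vocab)
```

Only the line under the cursor is rewritten, and the identifiers the filter saw come back as a list that could be added to the grammar for the next utterance. Nothing here navigates by character or by word, which is the point of the next paragraph.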
The requirements for supporting toggle name are a great example of how a speech user interface has almost no overlap with what's in a keyboard driven interface. It is clumsy and awkward to navigate by character, by word, by paragraph. In fact, it's almost completely unnecessary. However, at least in many applications other than writing, the ability to identify something by an abstract name and operate on it is far more valuable for speech and completely unnecessary for keyboard.
Thank you for reading this far.
--- eric
[1] Python because it's the only language I can drive with speech recognition without damaging my voice. Most other languages have way too many extra punctuation characters that take a lot to say to get a little result. My programming by speech model should eliminate much of that problem.