Practical Tips for Developers, and How Mozilla Does It
This document is for people working to support MSAA in an application in order to make it accessible with 3rd party assistive technologies, as well as for hackers wishing to be involved in Mozilla's MSAA support specifically.
You may also wish to read Gecko Info for Windows Accessibility Vendors, a primer for vendors of 3rd party accessibility software, on how MSAA clients can utilize Gecko's MSAA support.
3. MSAA's Quirks and Workarounds
Hacky caret tracking not working
Event window confusion
Confusion with system-generated events
No unique child ID for object in window
Not all MSAA features utilized by 3rd party vendors
Missing functionality in MSAA
Dueling text equivalents
Issues with Links
Differing client implementations
Undocumented Window Class Usage
1. Intro: What is MSAA?
MSAA is the Microsoft Active Accessibility API, used on Windows operating systems to support assistive technologies for users with disabilities.
Third party assistive technologies, such as screen readers, screen magnifiers and voice input software, want to track what's happening inside Mozilla. They need to know about focus changes and other events, and they need to know what objects are contained in the current document or dialog box. Using this information, a screen reader will speak out loud important changes to the document or UI, and allow the user to track where they navigate. The screen reader user can navigate the web page using screen reader commands or browser commands, and the two pieces of software must remain in sync. Some screen readers can even show information on a refreshable braille display. Screen magnifiers will zoom to the focus, keeping it on the screen at all times, or even allow the user to enter a special low vision document reading mode, with a variety of features such as ticker mode where text is streamed on a single line. Finally, voice dictation software needs to know what's in the current document or UI in order to implement "say what you see" kinds of features.
On Microsoft Windows, these kinds of assistive technology acquire this necessary information via a combination of hacks, MSAA and proprietary DOMs. MSAA is supposed to be the "right way" for accessibility aids to get information, but sometimes the hacks are more effective. For example, screen readers look for screen draws of a vertical blinking line to determine the location of the caret. Without doing this, screen readers would not be able to let the user know where the caret has moved to in most programs, because so many applications do not use the system caret (Gecko does not). This is so commonly done that no one even bothers to support the MSAA caret; after all, the hack is a general solution that works with pretty much all applications.
MSAA provides information in several different ways:
- A COM interface (IAccessible) that allows applications to expose the tree of data nodes that make up each window in the user interface currently being interacted with.
- Custom interface extensions, obtained via QueryInterface and QueryService. These can provide assistive technology with contextual information specific to your object model. For example, Gecko supports ISimpleDOMNode to provide information about the DOM node for an accessible object.
- A set of system messages that convey accessibility-related events such as focus changes, changes to document content and state changes in UI objects like checkboxes.
To really learn about MSAA, you need to download the entire MSAA SDK. Without downloading the SDK, you won't get the extremely useful tools, which help a great deal in the learning process. The Accessible Event Watcher shows what accessible events are being generated by a given piece of software. The Accessible Explorer and Inspect Object tools show the tree of data nodes the Accessible object is exposing through COM, and what the screen boundaries of each object are. In addition, MSDN has improved their MSAA documentation.
2. Deciding Which MSAA Features to Support
MSAA Methods - Cheat Sheet for Developers
- get_accParent: Get the parent of an IAccessible. [important]
- get_accChildCount: Get the number of children of an IAccessible. [important]
- get_accChild: Get the child of an IAccessible. [important]
- get_accName: Get the "name" of the IAccessible, for example the name of a button, checkbox or menu item. [important]
- get_accValue: Get the "value" of the IAccessible, for example a number in a slider, a URL for a link, the text a user entered in a field. [important]
- get_accDescription: Get a long description of the current IAccessible. This is not really too useful.
- get_accRole: Get an enumerated value representing what this IAccessible is used for, for example whether it is a link, static text, editable text, a checkbox, or a table cell. [important]
- get_accState: a 32 bit field representing possible on/off states, such as focused, focusable, selected, selectable, visible, protected (for passwords), checked, etc. [important]
- get_accHelp: Get context sensitive help for the IAccessible.
- get_accHelpTopic: We don't use this, it's only if the Windows help system is used.
- get_accKeyboardShortcut: What is the keyboard shortcut for this IAccessible (underlined alt+combo mnemonic)
- get_accFocus: Which child is focused? [important]
- get_accSelection: Which children of this item are selected?
- get_accDefaultAction: Get a description or name of the default action for this component, such as "jump" for links.
- accSelect: Select the item associated with this IAccessible. [important]
- accLocation: Get the x,y coordinates, and the height and width of this IAccessible node. [important]
- accNavigate: Navigate to the first/last child, previous/next sibling, up, down, left or right from this IAccessible. [important, but no need to implement up/down/left/right]
- accHitTest: Find out what IAccessible exists at a specific coordinate.
- accDoDefaultAction: Perform the action described by get_accDefaultAction.
- put_accName: Change the name.
- put_accValue: Change the value.
The IAccessible interface is used in a tree of IAccessibles, each one representing a data node, similar to a DOM.
The methods supported in IAccessible are listed above; a minimal implementation would contain those marked "[important]".
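To make the shape of that tree concrete, here is a minimal sketch in portable C++ rather than real COM. The AccessibleNode class, its method names and the state bit values are all hypothetical stand-ins chosen for illustration; a real implementation would derive from IAccessible and use the constants from the MSAA SDK headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Illustrative state bits only (assumption); real values come from the MSAA SDK.
const std::uint32_t kStateFocused   = 1u << 0;
const std::uint32_t kStateFocusable = 1u << 1;
const std::uint32_t kStateChecked   = 1u << 2;

// Hypothetical stand-in for one node in a tree of IAccessible objects.
class AccessibleNode {
public:
    AccessibleNode(std::string name, std::string role, std::uint32_t state)
        : name_(std::move(name)), role_(std::move(role)), state_(state) {}

    // Appends a child and returns a non-owning pointer to it.
    AccessibleNode* AddChild(std::string name, std::string role,
                             std::uint32_t state) {
        children_.push_back(std::unique_ptr<AccessibleNode>(
            new AccessibleNode(std::move(name), std::move(role), state)));
        children_.back()->parent_ = this;
        return children_.back().get();
    }

    // Counterparts of get_accParent and get_accChildCount.
    AccessibleNode* Parent() const { return parent_; }
    std::size_t ChildCount() const { return children_.size(); }

    // Counterpart of get_accChild: indexes start at 1, out of range yields null.
    AccessibleNode* Child(std::size_t index) const {
        if (index == 0 || index > children_.size()) return nullptr;
        return children_[index - 1].get();
    }

    // Counterparts of get_accName, get_accRole and get_accState.
    const std::string& Name() const { return name_; }
    const std::string& Role() const { return role_; }
    std::uint32_t State() const { return state_; }

    // Counterpart of get_accFocus, limited to direct children for brevity.
    AccessibleNode* FocusedChild() const {
        for (const auto& child : children_) {
            if (child->state_ & kStateFocused) return child.get();
        }
        return nullptr;
    }

private:
    std::string name_;
    std::string role_;
    std::uint32_t state_;
    AccessibleNode* parent_ = nullptr;
    std::vector<std::unique_ptr<AccessibleNode>> children_;
};
```

A client would start from the root node, ask for the child count, and then walk children and parents from there, testing individual bits of the state field as it goes.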
MSAA Events Cheat Sheet
For information on what each event does, see the MSDN Event Constants page.
Check with your assistive technology partners to find out what events you need to support. There's a very good chance they won't ask for more than the events marked [important]:
EVENT_SYSTEM_SCROLLINGEND [possibly important, talk to AT vendor]
EVENT_OBJECT_CREATE [don't implement, watching system generated versions of this event causes assistive technology crashes]
EVENT_OBJECT_DESTROY [don't implement, watching system generated versions of this event causes assistive technology crashes]
EVENT_OBJECT_REORDER [important for mutating docs in future, but not yet]
EVENT_OBJECT_STATECHANGE [important for checkboxes and radio buttons]
MSAA States Cheat Sheet
For information on what each state does, see the MSDN State Constants page.
Check with your assistive technology partners to find out what states you need to support. There's a very good chance they won't ask for more than the states marked [important]:
STATE_UNAVAILABLE [important]
STATE_OFFSCREEN [important]
MSAA Roles Cheat Sheet
For information on what each role does, see the MSDN Role Constants page.
Check with your assistive technology partners to find out what roles you need to support. There's a very good chance they won't ask for more than the roles marked [important]:
There is no need to support the objects marked [inserted by system]. Windows will add those objects to your hierarchy for you.
ROLE_TITLEBAR [inserted by system]
ROLE_MENUBAR [important if you don't use native menus]
ROLE_WINDOW [inserted by system]
ROLE_LIST [important]
MSAA Object Identifiers Cheat Sheet
For information on what each object identifier does, see the MSDN Object Identifiers Constants page.
OBJID_NATIVEOM [important? might be useful for supporting custom interfaces, need to research]
3. Dealing with the Quirks of MSAA
MSAA has a well-deserved reputation for quirkiness. It is not "plug and play", and will take a lot of testing and refinement before your solution works with any product. Here are some of its quirks and some solutions/workarounds:
Problem: Many of MSAA's crashes occur because more than one process is refcounting the same objects, and because pointers are being shared between processes. When your application closes, different signals are typically broadcast. For example, the application window closes and the window is blurred. It is impossible to know if and when the 3rd party assistive technology will use one of these signals to release the objects of yours that it is refcounting. This can lead to crashes where it releases something at the wrong time, when some of your DLLs are unloaded but not others, and a destructor is called in an unloaded DLL.
Solution: Create a "shutdown" method for each internal accessible object, to remove any references to other internal objects before any of your DLLs are unloaded. In order to do this effectively, you will have to keep track of every accessible object that you create. The shutdown method for an accessibility object should be called whenever the document or UI object it refers to goes away. The easiest way to do that is to keep a pointer to an accessible in each internal UI object. If that pointer is non-null, then there is an accessible object for it. Whenever the UI object is destroyed, shut down its accessible object as well. In Gecko/Mozilla we are not allowed to keep this extra pointer for each accessible object, so when accessibility is turned on we use a hash table to cache these objects. Such a cache must be kept in perfect sync with the tree of UI and document objects, which is difficult. Therefore, unless 4 bytes extra on each object is critical in your application, just keep the extra pointer around instead of using a hash table.
Also, don't implement EVENT_OBJECT_CREATE or EVENT_OBJECT_DESTROY. Vendors have found that watching these events causes crashes.
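The "extra pointer plus shutdown method" approach can be sketched as follows. This is a simplified illustration in portable C++, not Gecko's code: UiObject, Accessible and their members are hypothetical names, and the hand-rolled reference count merely stands in for COM's AddRef/Release.

```cpp
#include <cassert>
#include <string>
#include <utility>

class UiObject;

// Hypothetical accessible wrapper. An outside client may still hold a
// reference after the UI object is gone, so Shutdown() severs the link.
class Accessible {
public:
    explicit Accessible(UiObject* owner) : owner_(owner) {}
    void AddRef() { ++refs_; }
    void Release() { if (--refs_ == 0) delete this; }
    void Shutdown() { owner_ = nullptr; }
    bool IsDefunct() const { return owner_ == nullptr; }
    // Returns false instead of touching a dead UI object.
    bool GetName(std::string* out) const;

private:
    ~Accessible() = default;  // heap-only: destroyed through Release()
    UiObject* owner_;
    int refs_ = 1;            // the creating UI object holds the first reference
};

// Hypothetical UI object that keeps the one extra pointer to its accessible.
class UiObject {
public:
    explicit UiObject(std::string label) : label_(std::move(label)) {}
    UiObject(const UiObject&) = delete;
    UiObject& operator=(const UiObject&) = delete;

    ~UiObject() {
        if (accessible_) {
            accessible_->Shutdown();  // drop references before we disappear
            accessible_->Release();   // give up our own reference
        }
    }

    // Creates the accessible lazily; non-null means one already exists.
    Accessible* GetAccessible() {
        if (!accessible_) accessible_ = new Accessible(this);
        return accessible_;
    }

    const std::string& label() const { return label_; }

private:
    std::string label_;
    Accessible* accessible_ = nullptr;
};

inline bool Accessible::GetName(std::string* out) const {
    if (IsDefunct()) return false;
    *out = owner_->label();
    return true;
}
```

After the UI object is destroyed, a client that still holds the accessible gets a failure from GetName() rather than a crash, and the memory is freed only when the client finally calls Release().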
Hacky caret tracking not working
Problem: Assistive technologies do not use the MSAA caret. They follow screen draws, looking for a vertical blinking line. Unfortunately, some products can get confused by the vertical lines on other objects, such as list boxes, even though those lines are not blinking. The assistive technology may not see your caret at all.
Solution: Make sure there is a configuration file for each assistive technology specific to your application. Read the manual or help, find the keystroke or commands for training the caret, and save this information in the configuration file. Don't support the MSAA caret; none of the vendors use it.
Event window confusion
Solution: This may be because you are reporting the events in a different window from the one that currently has system focus. The assistive technology may be asking GetGUIThreadInfo for its hwndFocus, and throwing away MSAA events that are not in the currently focused window. Even if you are visibly showing window focus on the correct window, you must also tell the operating system to focus this window before any other accessibility events get fired in it.
Confusion with system-generated events
Solution: When an object is about to get focused in a different window, make sure you focus that window before you fire your own focus events for objects inside it. Test using Accessible Event Watcher in the MSAA SDK, and use the settings panel to watch subsets of accessibility events. Count on the assistive technology to make sense out of the jumble of extra system-generated events; it's not your problem.
No unique child ID for event target in window
Solution: In Gecko/Mozilla, we did not want to store an extra 32 bit unique ID value on every object. Instead, we hand back a 32 bit value derived from the UI object's pointer, which is unique. We ensure that the value we hand back is always negative. When the get_accChild call comes back, we check our hash table cache for that window to see if there's an accessible object still associated with that unique value. This means the client must use AccessibleObjectFromEvent immediately, because there is a danger that the object will go away, and a different object will be created with the same pointer value. That case seems extremely remote, because information from events is generally retrieved right after the event.
If you're not using a hash table to keep track of unique ID's, store the child ID's and objects for the last 50 or so events in a circular array. In practice, this is enough to keep AccessibleObjectFromEvent() happy.
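The circular array idea can be sketched like this. It is an illustration only: RecentEventTargets and the placeholder Accessible struct are hypothetical, and reference counting of the stored objects is left out for brevity (real code would need to keep them alive while they sit in the array).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Accessible { int tag; };  // placeholder for a real accessible object

// Remembers the targets of the most recent events so that a later
// lookup by child ID can still find them.
class RecentEventTargets {
public:
    static const std::size_t kCapacity = 50;

    // Records an event target, overwriting the oldest entry when full.
    void Remember(std::int32_t childId, Accessible* target) {
        slots_[next_] = Slot{childId, target};
        next_ = (next_ + 1) % kCapacity;
    }

    // Searches from newest to oldest; returns null if the ID has aged out.
    Accessible* Find(std::int32_t childId) const {
        for (std::size_t i = 0; i < kCapacity; ++i) {
            const Slot& slot = slots_[(next_ + kCapacity - 1 - i) % kCapacity];
            if (slot.target && slot.childId == childId) return slot.target;
        }
        return nullptr;
    }

private:
    struct Slot {
        std::int32_t childId;
        Accessible* target;
    };
    std::array<Slot, kCapacity> slots_{};  // zero-initialized: all targets null
    std::size_t next_ = 0;
};
```

Remember() would be called just before each event is fired, and Find() from the get_accChild path that services AccessibleObjectFromEvent().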
Not all MSAA features utilized by 3rd party vendors
Solution: Use this document to see what is generally considered important by assistive technology manufacturers. Contact the top vendors early and often as you plan and implement your architecture, to see what's important to them. Implement only what's needed -- supporting everything would take too long for zero results.
Missing functionality in MSAA
- No way of signifying that a document has finished loading. Fire EVENT_OBJECT_STATECHANGE for a window/client/pane object when it starts to load a new document. Use STATE_BUSY to indicate that a new document is being loaded. When the loading has finished, fire another EVENT_OBJECT_STATECHANGE event and clear the STATE_BUSY flag.
- No method to get clipped/unclipped bounds of a piece of text within a text object. This is needed by screen magnifiers. No scrollTo method, also needed by screen magnifiers. Implement a custom interface for text objects, and support it through QueryInterface or QueryService if it's being implemented on a different object than IAccessible is. Support a scrollTo method which takes a text index, and a getClippedBounds and getUnclippedBounds method which takes a start and end index. Publish your custom interface.
- No way for assistive technology to know when scrolling has stopped. Fire the EVENT_SYSTEM_SCROLLINGEND event to indicate when scrolling has ended (try not to fire too many of these, wait until scrolling has truly stopped). There is no need to support EVENT_SYSTEM_SCROLLINGSTART, it is not used by assistive technology.
- No support for document formatting or "DOM" as requested by some vendors: support a custom interface that gives them the formatting information they are requesting.
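The document-loading workaround in the first item above can be sketched as follows. DocumentPane, FiredEvent and the kStateBusy value are assumptions made for this illustration; in a real application the event would go out through NotifyWinEvent rather than into a vector.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Illustrative bit for the busy state (assumption); use the SDK constant in real code.
const std::uint32_t kStateBusy = 0x800;

// One recorded event: its name and the object's state after the change.
struct FiredEvent {
    std::string eventName;
    std::uint32_t stateAfter;
};

// Hypothetical window/client/pane object that hosts a document.
class DocumentPane {
public:
    explicit DocumentPane(std::vector<FiredEvent>* sink) : sink_(sink) {}

    void BeginLoad() { SetBusy(true); }   // a new document starts loading
    void EndLoad() { SetBusy(false); }    // loading has finished

    std::uint32_t State() const { return state_; }

private:
    // Flips the busy bit and fires a state change only if the state really changed.
    void SetBusy(bool busy) {
        std::uint32_t updated =
            busy ? (state_ | kStateBusy) : (state_ & ~kStateBusy);
        if (updated == state_) return;
        state_ = updated;
        sink_->push_back(FiredEvent{"EVENT_OBJECT_STATECHANGE", state_});
    }

    std::vector<FiredEvent>* sink_;
    std::uint32_t state_ = 0;
};
```

An assistive technology that sees the state change can query the object's state: busy set means a load is in progress, busy cleared means the document is ready to be read.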
Dueling text equivalents
Solution: Be as consistent with Internet Explorer as possible. Use accessible name for most text equivalents, and accessible value for URLs. Don't use accessible description unless you really do have a long description for the object you need to expose -- most assistive technology makes little use of it. Use ROLE_STATICTEXT for labels specific to dialog and UI controls, and always use ROLE_TEXT for document text even if the text is not editable (in that case use ROLE_TEXT with STATE_READONLY).
Issues with Links
Solution: Make sure the ROLE_LINK object and its child ROLE_TEXT objects all have STATE_LINKED set. For multi-line links with a line break in the middle, make sure there is no whitespace at the beginning or end of any of the accessible names, and make sure there is a \r\n where the line breaks occur in the accessible name for the ROLE_LINK. For an example of how to do this properly, see Internet Explorer or Gecko. Again, if it's not done exactly this way, some links will not be read.
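As a rough illustration of the naming rule for multi-line links, the sketch below builds the accessible name for a ROLE_LINK from its rendered lines. TrimWhitespace and BuildLinkName are hypothetical helpers, and the input is assumed to already be split at the visual line breaks.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Removes leading and trailing spaces, tabs and line breaks.
std::string TrimWhitespace(const std::string& text) {
    const char* kSpace = " \t\r\n";
    std::size_t first = text.find_first_not_of(kSpace);
    if (first == std::string::npos) return std::string();
    std::size_t last = text.find_last_not_of(kSpace);
    return text.substr(first, last - first + 1);
}

// Joins the trimmed lines of a link with "\r\n" where the line breaks occur.
std::string BuildLinkName(const std::vector<std::string>& renderedLines) {
    std::string name;
    for (const std::string& line : renderedLines) {
        std::string trimmed = TrimWhitespace(line);
        if (trimmed.empty()) continue;  // skip lines that are only whitespace
        if (!name.empty()) name += "\r\n";
        name += trimmed;
    }
    return name;
}
```

The same trimming would apply to the names of the child ROLE_TEXT objects, one per rendered line.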
MSAA Implementation is Not Performant
Solution: Try not to calculate the same things more than once or create the same objects more than once. For example, create and cache an object's children when you look for them in get_accChildCount(), so that you can just hand them back when asked for using get_accChild() or accNavigate(). Support IEnumVARIANT so that the MSAA client can ask for a number of children in one call. In custom interfaces, create methods that hand back a lot of data in one call, rather than requiring a large number of calls. Fewer calls are much better because COM marshaling is slow.
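The child caching described above might look roughly like this. CachingAccessible is a hypothetical class; the vector of strings stands in for whatever expensive tree walk produces the real child objects, and WalkCount() exists only so the caching can be observed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical accessible that builds its child list once and reuses it.
class CachingAccessible {
public:
    explicit CachingAccessible(std::vector<std::string> source)
        : source_(std::move(source)) {}

    // Counterpart of get_accChildCount: the first call builds the cache.
    std::size_t ChildCount() {
        EnsureChildren();
        return children_.size();
    }

    // Counterpart of get_accChild (indexes start at 1): served from the cache.
    const std::string* Child(std::size_t index) {
        EnsureChildren();
        if (index == 0 || index > children_.size()) return nullptr;
        return &children_[index - 1];
    }

    // Must be called when the underlying UI or document subtree changes.
    void Invalidate() {
        cached_ = false;
        children_.clear();
    }

    int WalkCount() const { return walks_; }

private:
    void EnsureChildren() {
        if (cached_) return;
        ++walks_;
        children_ = source_;  // stands in for an expensive tree walk
        cached_ = true;
    }

    std::vector<std::string> source_;
    std::vector<std::string> children_;
    bool cached_ = false;
    int walks_ = 0;
};
```

The cost of the cache is that it has to be invalidated whenever the children change, which is the same synchronization problem noted earlier for the hash table of accessible objects.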
Differing client implementations
Solution: We don't know of any outright conflicts in the differing uses of MSAA (yet). However, be on guard. If a vendor asks you to do something different from the spec, it's better to check with the other vendors before moving forward. Check to see what applications from Microsoft do in a similar situation.
Undocumented Window Class Usage
Solution: Contact each vendor and let them know what window classes you will be using MSAA for. If possible, use a different window class name for documents/content than you use for UI/dialogs. Or, do what Mozilla does - expose a control ID (GWL_ID) of 1 for content, and 0 for UI. Consistent window class names are important for the assistive technology vendors, so that they can determine what code to run for a given window. Don't change window class names after you have shipped a version.
Solution: Try to reach out in a friendly manner to the assistive technology company. Be as easy to work with as you possibly can -- this includes being extremely responsive to their bug reports with new test builds, as well as being very communicative about what you have changed and when. Do as much work as you possibly can without their help. See if your organization can offer something they can't get for themselves. Be patient, and set your expectations to a reasonable level. Realize that it's about both pride and revenue for these companies, and that they need to sell a lot of copies of their software to make up for the work they put in to support your app. Remember that no matter how small they are, you need them more than they need you, unless your application's accessibility is being demanded by end-users.
4. Example: How Gecko and Mozilla Implement MSAA
The accessible module is also where support for Sun's ATK accessibility API for Linux and UNIX is implemented. For documentation specific to the Mozilla ATK effort, supported by Sun Microsystems, see the Mozilla accessibility on Unix page.
Creation of IAccessible Objects
The first thing that happens when an assistive technology wants to watch our application is that it calls the Windows API function AccessibleObjectFromWindow(). This usually happens right after a window gets focused.
When the WIN32 API function AccessibleObjectFromWindow() is called, Windows sends the window in question a WM_GETOBJECT message requesting an IAccessible for your root object in the window. In our case, this event is received in mozilla/widget/src/windows/nsWindow.cpp. We send back an IAccessible pointer which can be used by the client to get information about this root object. The assistive technology will use that root IAccessible to traverse the rest of the object tree, by navigating to children and then siblings, etc. Every navigation function such as accNavigate(), get_accChild() and get_accParent() returns an IAccessible pointer.
To create the root IAccessible for a window the first time the WM_GETOBJECT message comes in, nsWindow.cpp first generates an internal event called NS_GETACCESSIBLE, which is handled in PresShell::HandleEventInternal() via the creation of an nsDocAccessibleWrap for an inner window or nsRootAccessibleWrap for a top level window. These classes implement both nsIAccessible, our cross platform API, as well as IAccessible, which is specific to Windows/MSAA/COM. The cross-platform nsDocAccessible and nsRootAccessible classes they inherit from are then told to start listening for DOM, page load and scroll events. These events cause MSAA-specific events, such as EVENT_OBJECT_FOCUS or EVENT_OBJECT_STATECHANGE, to fire on UI and document objects within the applicable window. We'll explain more about events later in this section.
Until the WM_GETOBJECT message is processed, the Gecko accessibility service is not used, and thus the accessibility.dll is not loaded, so there is almost zero overhead for accessibility API support in Mozilla or Gecko in the general case. Once the accessibility service is created, however, Gecko loads code to create an object on demand for every UI or document object that should support IAccessible. The created objects are cached in a hash table, and shut down when they're no longer needed. They may still exist in memory in a nonfunctional state until the assistive technology completely releases them. See the section on accessible roles to see what kinds of objects Gecko supports IAccessible for.
The Accessible Tree vs. the DOM Tree
After the root or doc accessible for a window has been created and handed back to the MSAA client, it is used to traverse the rest of the IAccessible tree using accNavigate, get_accChild and get_accParent. Any IAccessible will support those methods. We also support IEnumVARIANT::Next() which allows for fast marshaling of all of an object's children to a client via COM. In other words, the assistive technology can say "give me all 20 children of this object into this array". That's much faster than 20 separate calls, one for each child.
In Mozilla, the client has another choice for tree navigation -- it can utilize data stored in the DOM via Mozilla's custom ISimpleDOMNode COM interface. Any IAccessible can be used to QueryInterface to an ISimpleDOMNode, and vice versa for a round trip. However, one might QI ISimpleDOMNode to IAccessible only to find it is null, which means that particular node in question is not exposed in the IAccessible tree. See the following diagram for examples of nodes that do not support IAccessible.
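The batching idea behind IEnumVARIANT::Next() can be illustrated without COM. In this sketch ChildEnumerator is a hypothetical class that hands back plain integers instead of VARIANTs, and a bool return stands in for the S_OK / S_FALSE distinction.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Simplified enumerator: one call can return many children.
class ChildEnumerator {
public:
    explicit ChildEnumerator(std::vector<int> childIds)
        : childIds_(std::move(childIds)) {}

    // Copies up to `requested` items into `out` and reports how many were
    // copied. Returns true only when the full requested count was delivered.
    bool Next(std::size_t requested, int* out, std::size_t* fetched) {
        std::size_t remaining = childIds_.size() - position_;
        std::size_t count = std::min(requested, remaining);
        for (std::size_t i = 0; i < count; ++i) {
            out[i] = childIds_[position_ + i];
        }
        position_ += count;
        if (fetched) *fetched = count;
        return count == requested;
    }

    // Starts over from the first child.
    void Reset() { position_ = 0; }

private:
    std::vector<int> childIds_;
    std::size_t position_ = 0;
};
```

With 20 children, a single Next(20, ...) call replaces 20 separate round trips, which matters when every call has to be marshaled across a process boundary.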
MSAA tree vs. DOM tree - what's the relationship?
The MSAA tree and the DOM tree are parallel structures, although the MSAA tree is a subset of the DOM tree.
QueryInterface() can be used to switch between the interfaces used in the two trees (IAccessible and ISimpleDOMNode). If there is no MSAA node for a DOM node, pAccessible->QueryInterface(IID_IAccessible) will return null.
A Variety of Implementations for IAccessible
There are two main kinds of classes in Mozilla's accessibility class hierarchy, platform-specific and cross-platform. All of the platform-specific classes have the word "Wrap" appended to them. The Wrap classes contain implementations and interfaces specific to MSAA or ATK. These platform-specific classes inherit from cross-platform classes, where most of the implementation is done. For example, nsAccessibleWrap inherits from nsAccessible. Every accessible object in the MSAA tree has an implementation derived from nsAccessible, which exposes accessibility information through nsIAccessible, in a generic cross-platform manner.
This default implementation for nsIAccessible knows how to use nsAccessibleTreeWalker to walk Mozilla's content DOM and frame tree, exposing only the objects that are needed for accessibility. The nsAccessibleTreeWalker class knows what it needs to expose by asking each DOM node's primary frame (a Gecko formatting object) for an nsIAccessible, using the nsIFrame::GetAccessible() method. If nsAccessibleTreeWalker gets an nsIAccessible back, then the DOM node is considered to be an accessible object. The nsIAccessible that is returned is either a new one or one reused from the accessibility cache, and it is the correct type of accessibility object to expose that DOM node through the cross-platform nsIAccessible and MSAA-specific IAccessible interfaces.
Every accessibility object created must be cached, and must inherit from nsAccessibleWrap so that it supports a base implementation of nsIAccessible and IAccessible. Apart from that, it is free to override IAccessible or nsIAccessible methods. In this way each class is tailored to the specific abilities and properties of the HTML or XUL/UI objects it applies to, and can support MSAA, ATK and hopefully any future accessibility APIs we need to support. For example, nsHTMLButtonAccessible overrides nsIAccessible::GetAccRole to expose ROLE_BUTTON, which IAccessible::get_accRole then reports.
A more complicated set of nsIAccessible methods which can be overridden are GetAccFirstChild/GetAccLastChild/GetAccChildCount, which allow objects to define their own descendant subtrees. The default behavior for nsIAccessible::getAccFirstChild is to instantiate a nsDOMTreeWalker, and ask it for the first child. However, nsImageAccessible overrides getAccFirstChild, returning the first area of an image map if there is one, otherwise nsnull. This is necessary because the image map areas can be in a completely different area of the DOM from the image they apply to.
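The image map case can be sketched with a small class hierarchy. This is not Gecko's code: DomNode, Accessible and ImageAccessible are simplified, hypothetical stand-ins, and the default tree walk is reduced to "first DOM child".

```cpp
#include <cassert>
#include <string>
#include <vector>

// Minimal stand-in for a DOM node.
struct DomNode {
    std::string tag;
    std::vector<const DomNode*> children;
};

// Default behavior: the first accessible child is the first DOM child.
class Accessible {
public:
    explicit Accessible(const DomNode* node) : node_(node) {}
    virtual ~Accessible() = default;

    virtual const DomNode* FirstChild() const {
        return node_->children.empty() ? nullptr : node_->children.front();
    }

protected:
    const DomNode* node_;
};

// Images ignore their own DOM children and expose the areas of the
// associated map instead, wherever that map lives in the document.
class ImageAccessible : public Accessible {
public:
    ImageAccessible(const DomNode* image, const DomNode* map)
        : Accessible(image), map_(map) {}

    const DomNode* FirstChild() const override {
        if (!map_ || map_->children.empty()) return nullptr;
        return map_->children.front();
    }

private:
    const DomNode* map_;  // null when the image has no image map
};
```

Because the override is virtual, a client walking the tree through the base interface sees the map areas as children of the image without knowing where they sit in the DOM.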
Generating MSAA Events
First, keep in mind that most MSAA events aren't utilized by accessibility aids. Therefore we implement only the handful that matter. See the Events cheat sheet above for the list of events we implement. By far the most important one is EVENT_OBJECT_FOCUS.
When a potential accessibility-related event occurs within Mozilla, it is typically listened for by nsDocAccessible or nsRootAccessible. The event listeners on these classes call FireToolkitEvent(), which is implemented for every accessible. Eventually, the event ends up at nsDocAccessibleWrap::FireToolkitEvent() which calls NotifyWinEvent from the Win32 API. NotifyWinEvent is passed arguments for the window the event occurred in, and the ID of the child within that window. Accessibility aids use the Win32 call SetWinEventHook() to register as a listener for these events. Creating a unique child ID for every object within a window can be difficult; see the problem and solution for no unique child ID for object in window.
The assistive technology chooses which events it is interested in learning more about by calling the Win32 function AccessibleObjectFromEvent, which returns the IAccessible for the node corresponding to the child number that had been indicated from NotifyWinEvent(). This ends up asking nsDocAccessibleWrap::get_accChild() for a child IAccessible which matches the child ID we indicated through NotifyWinEvent().
In Mozilla, we use the DOM node pointer in the accessible object as a basis for its child ID, which is also used as a hash key into our cache. We also negate the 32 bit value so that it is always <0, telling us that they're really looking for the IAccessible for an event, not child number x. During the callback, we look up the original accessible node in the nsDocAccessible's cache and return it.
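The negated-pointer scheme can be sketched as follows. This is a simplified illustration under the assumption that pointer values fit in 31 bits, as the text above assumes for a 32 bit value; EventChildId and DocAccessibleCache are hypothetical names, and plain integers stand in for real pointers so the example stays portable.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <unordered_map>

struct Accessible { const char* name; };  // placeholder for a real accessible

// Turns a 32-bit pointer value into a negative child ID.
// Returns 0 when the value cannot be represented (assumption of this sketch).
std::int32_t EventChildId(std::uint32_t pointerValue) {
    const std::uint32_t kMax =
        static_cast<std::uint32_t>(std::numeric_limits<std::int32_t>::max());
    if (pointerValue == 0 || pointerValue > kMax) return 0;
    return -static_cast<std::int32_t>(pointerValue);
}

// Per-document cache keyed by the same value that is sent with the event.
class DocAccessibleCache {
public:
    // Registers an accessible and returns the child ID to fire events with.
    std::int32_t Add(std::uint32_t pointerValue, Accessible* accessible) {
        std::int32_t id = EventChildId(pointerValue);
        if (id < 0) byId_[id] = accessible;
        return id;
    }

    // Called when the underlying object goes away.
    void Remove(std::uint32_t pointerValue) {
        byId_.erase(EventChildId(pointerValue));
    }

    // Negative IDs are event lookups; ordinary child indexes (>= 0) are
    // not handled in this sketch.
    Accessible* GetChild(std::int32_t childId) const {
        if (childId >= 0) return nullptr;
        auto it = byId_.find(childId);
        return it == byId_.end() ? nullptr : it->second;
    }

private:
    std::unordered_map<std::int32_t, Accessible*> byId_;
};
```

The sign of the ID is what lets one get_accChild() entry point serve both purposes: a negative value means "the object an event was fired for", while a positive value would mean "child number x".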