Our Technology
The WordSign project was launched in 2008 to develop a suite of tools to reproduce sign language using photo-real 3D animation. Our objective is to make it much easier for translation teams to produce sophisticated sign language video content.
The first phase of the WordSign development project covers three technology areas:
Capture
When 3D animation is used to produce sign language videos, one of the most critical factors is the reproduction of smooth, natural motions. The most accurate approach is to record the movements of an expert signer on video, and then to use software algorithms to track the recorded motions and translate them for direct application to 3D characters. In our case the characters are as human-realistic as possible. In the production of animated movies and more sophisticated video games this approach is called ‘motion capture’. For WordSign we are developing a motion capture system that is simple, low-cost, and designed specifically for capturing sign language communication.
The WordSign motion capture system must identify accurate 3D positional information for the signer’s hands, arms, head, and upper body. Because we need 3D rather than only 2D information, we use a stereo camera.

An appropriate stereo camera consists of a pair of synchronized ‘global shutter’ cameras, mounted on a rigid support and separated by a fixed distance. These cameras are a standard commercial product, often used for ‘machine vision’ applications, such as inspection or robotics.
The stereo camera is mounted on a conventional tripod that is about two meters in front of the signer being recorded.
Ordinary incandescent lights are used to illuminate the signer. During recording, the stereo image data is stored on the hard disk of a PC; it is then batch-processed to derive 3D depth-map information for every pixel. Depth is calculated automatically from the disparity between the two views, that is, the small shift in each point’s apparent position that results from the fixed spacing between the two cameras.
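The relationship between camera spacing and depth can be sketched with the standard pinhole stereo model, in which depth equals focal length times camera spacing divided by the pixel shift of a point between the two images. This is only an illustration under that textbook model, assuming rectified images; the function name and the focal length and spacing values are made up for the example and are not WordSign’s actual calibration or code.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth in meters of one pixel in a rectified stereo pair (pinhole model).

    disparity_px    -- horizontal shift of the point between left and right image
    focal_length_px -- camera focal length expressed in pixels
    baseline_m      -- fixed distance between the two cameras, in meters
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_length_px * baseline_m / disparity_px


# Illustrative values only (assumptions, not taken from the WordSign system).
FOCAL_LENGTH_PX = 1000.0
BASELINE_M = 0.10

# A point on the signer that shifts 50 pixels between the two views.
signer_depth_m = depth_from_disparity(50.0, FOCAL_LENGTH_PX, BASELINE_M)

# A nearer point shifts more, so a larger disparity gives a smaller depth.
hand_depth_m = depth_from_disparity(60.0, FOCAL_LENGTH_PX, BASELINE_M)
```

With these example numbers the 50-pixel shift corresponds to a depth of roughly two meters. Because depth is inversely proportional to the shift, a rigid mount and a fixed, known spacing matter: any change in the spacing would scale every depth value.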
For improved 3D tracking accuracy, the signer is asked to wear a patterned shirt and simple paper wrist bands during recording; the wrist bands can be printed on an ordinary black-and-white inkjet or laser printer. After recording, the WordSign motion capture software automatically tracks the positions of the relevant body parts, frame by frame. The batch-mode tracking software is designed to minimize the manual cleanup and intervention required afterwards. A motion capture editing environment provides capabilities for viewing and cleaning up the final motion-captured result before it is imported into the animation production editor. Different segments of 3D motion capture data can be mapped to the ‘skeletons’ of different 3D characters or avatars, allowing a single expert signer to record multiple parts of a story in a single session.
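The idea of mapping different segments of one recording to different character skeletons can be sketched as below. This is a simplified illustration, not the WordSign data format: the class names, joint names, and coordinates are all hypothetical, and a real pipeline would also handle joint rotations and differences in body proportions.

```python
from dataclasses import dataclass, field


@dataclass
class MocapSegment:
    """A named slice of a recording: one dict of joint -> (x, y, z) per frame."""
    name: str
    frames: list


@dataclass
class Character:
    """A 3D character identified by the joint names of its skeleton."""
    name: str
    joints: frozenset
    animation: list = field(default_factory=list)


def apply_segment(segment, character):
    """Copy tracked joints onto the character; return joints it does not have."""
    skipped = set()
    for frame in segment.frames:
        pose = {}
        for joint, position in frame.items():
            if joint in character.joints:
                pose[joint] = position
            else:
                skipped.add(joint)
        character.animation.append(pose)
    return skipped


# Hypothetical joint names shared by every character.
SKELETON = frozenset({"head", "left_wrist", "right_wrist"})

# One recording session (three frames), split into two story parts.
session = [
    {"head": (0.0, 1.6, 0.0), "right_wrist": (0.3, 1.2, 0.4)},
    {"head": (0.0, 1.6, 0.0), "right_wrist": (0.3, 1.3, 0.4)},
    {"head": (0.1, 1.6, 0.0), "left_wrist": (-0.3, 1.1, 0.4)},
]
narrator_part = MocapSegment("narrator_part", session[:2])
child_part = MocapSegment("child_part", session[2:])

narrator = Character("narrator", SKELETON)
child = Character("child", SKELETON)
apply_segment(narrator_part, narrator)
apply_segment(child_part, child)
```

Keying the data by joint name rather than by character is what lets one signer’s recording drive any character whose skeleton uses the same names.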

Production Editing
The WordSign production editing tools are used to assemble and manipulate the different components of an animated video production. These tools are being implemented as a special plug-in module for the industry-standard Maya software. Building on top of Maya allows us to take advantage of the robustness and sophistication of a toolset that is widely used in commercial animation.
WordSign production editing is built around a ‘multi-track’ model, in which motion-captured sign language information can be assigned to multiple 3D animated characters at different points in time. This supports multi-character stories, including those that involve back-and-forth dialogs between characters. Supporting scenery such as buildings, trees, wind, clouds, and water can be imported into the production to establish visual context for the story.
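A multi-track model of this kind can be sketched as one track per character, each holding clips placed at different times. The sketch below is plain Python for illustration only; it does not use the Maya API, and the `Clip`, `add_clip`, and `active_clips` names, the character names, and the timings are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    """A motion capture segment placed on a track at a given time (seconds)."""
    segment: str
    start: float
    duration: float

    @property
    def end(self):
        return self.start + self.duration


def add_clip(tracks, character, clip):
    """Place a clip on a character's track, rejecting overlaps on that track."""
    track = tracks.setdefault(character, [])
    for other in track:
        if clip.start < other.end and other.start < clip.end:
            raise ValueError(f"{clip.segment} overlaps {other.segment} on {character}")
    track.append(clip)
    track.sort(key=lambda c: c.start)


def active_clips(tracks, time):
    """Return {character: segment} for every character signing at `time`."""
    return {
        character: clip.segment
        for character, track in tracks.items()
        for clip in track
        if clip.start <= time < clip.end
    }


# A short back-and-forth dialog between two hypothetical characters.
tracks = {}
add_clip(tracks, "narrator", Clip("question", start=0.0, duration=4.0))
add_clip(tracks, "child", Clip("answer", start=4.0, duration=3.0))
add_clip(tracks, "narrator", Clip("reply", start=7.0, duration=2.0))

signing_at_5s = active_clips(tracks, 5.0)
```

Keeping each character on its own track means a dialog is just a matter of where clips sit in time, and overlap checks only need to look within a single track.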
An important goal of the production editing environment is to eliminate frame-by-frame hand editing of detailed motions through software-assisted editing techniques. These include controls or ‘handles’ that manipulate pre-determined combinations of joints or muscles, such as those needed for facial expression editing. In addition, the production editing step uses pre-determined camera angles and lighting conditions that are built into the scenery, keeping editing as simple as possible while still producing professional-grade results.
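One way to picture such a ‘handle’ is as a single slider whose value is spread over a fixed set of joints according to pre-determined weights. The sketch below is a generic illustration in plain Python, not the WordSign plug-in or the Maya API; the handle name, joint names, and weights are hypothetical.

```python
def apply_handle(pose, handle_weights, value):
    """Return a new pose with one handle applied.

    pose           -- dict of joint name -> rotation in degrees
    handle_weights -- dict of joint name -> degrees added at full handle value
    value          -- handle setting, clamped to the range 0.0 .. 1.0
    """
    value = max(0.0, min(1.0, value))
    new_pose = dict(pose)  # leave the input pose untouched
    for joint, weight in handle_weights.items():
        new_pose[joint] = new_pose.get(joint, 0.0) + weight * value
    return new_pose


# Hypothetical 'raise eyebrows' handle driving four joints at once.
BROW_RAISE = {
    "brow_inner_left": 10.0,
    "brow_inner_right": 10.0,
    "brow_outer_left": 6.0,
    "brow_outer_right": 6.0,
}

neutral_face = {joint: 0.0 for joint in BROW_RAISE}
half_raised = apply_handle(neutral_face, BROW_RAISE, 0.5)
```

The editor adjusts one value per expression instead of keying each joint separately in every frame, which is where the reduction in hand editing comes from.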

3D Animation Assets
We have invested considerable effort in developing human-realistic 3D characters that support the requirements of sign language video production. Several of our team members, who are expert-level animators, have identified, tested, and implemented techniques that yield characters whose muscles, joints, skin, and clothing move naturally and look ‘real’, while keeping production rendering time and manual editing to a minimum. All of the characters, independent of their size, age, or sex, are built using a consistent approach, so that the motion capture data and editing tools can be used for any character.
Scenery is 3-dimensional to allow for correct interactions between the foreground and background parts of scenery and the characters that are inserted into a scene. Scenery is built with camera position controls and lighting mechanisms as an integral part of each scene. The WordSign animation and graphics design team is building a library of characters and scenes in support of translation teams that will use the WordSign toolset.
See the Examples page for samples of scenery and characters that are being developed.
 