Designing Spatial UI: Part 5

Understanding Input Methods

Siddarth Kengadaran
4 min read · Oct 3, 2020


This is part of a series on Designing Spatial UI:

Part 1: Understanding Spatial Sizing

Part 2: Understanding Movement

Part 3: Diegesis theory

Part 4: Understanding Color

Immersion in XR depends on how natural the interaction with the extended world feels. Both GUI (graphical user interface) and NUI (natural user interface) based input methods are used in XR. Standard input methods include voice, gaze, gesture, and controllers.

Gaze

Gaze is a fundamental input method in XR. It is closely related to the way we interact with computers by moving a cursor with a mouse; in XR, a gaze pointer is used to mimic that functionality.

Gaze can be divided into two types: head gaze and eye gaze.

In eye gaze, the movement of the eyes is measured and processed to understand where a person is looking, what they are looking at, and how long their gaze rests on a particular spot. Using eye tracking alone can feel unnatural and fatiguing, as it demands consciously moving the eyes to control the cursor. It is advisable to use eye gaze alone only in exceptional cases.

Alternatively, in head gaze, the movement of the cursor is based on the orientation of the user’s head. The cursor is placed at the center of the field of view (FoV) and moves in the direction the user is currently facing, giving users the experience of controlling the system with their gaze direction. To make a selection, users ‘look’ at an object, though in practice they turn their heads toward it.
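
To make the head-gaze mechanic concrete, here is a minimal sketch (engine-agnostic, in TypeScript) of how a cursor can be placed a fixed distance along the head’s forward direction. The vector and pose types are assumptions for illustration, not any particular SDK’s API.

```typescript
// Minimal head-gaze cursor sketch (hypothetical, engine-agnostic types).
type Vec3 = { x: number; y: number; z: number };
type Quat = { x: number; y: number; z: number; w: number }; // assumed unit-length

interface HeadPose {
  position: Vec3;    // head position in world space
  orientation: Quat; // head orientation in world space
}

// Rotate the canonical forward vector (0, 0, -1) by the head orientation.
function forwardFrom(q: Quat): Vec3 {
  return {
    x: -2 * (q.x * q.z + q.w * q.y),
    y: -2 * (q.y * q.z - q.w * q.x),
    z: -(1 - 2 * (q.x * q.x + q.y * q.y)),
  };
}

// Place the gaze cursor a fixed distance along the head's forward ray,
// i.e. at the center of the field of view.
function gazeCursorPosition(head: HeadPose, distance = 2.0): Vec3 {
  const f = forwardFrom(head.orientation);
  return {
    x: head.position.x + f.x * distance,
    y: head.position.y + f.y * distance,
    z: head.position.z + f.z * distance,
  };
}
```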

Interactions like focus are indicated with visual or auditory feedback. Dwell time is used to select an object with gaze: when a user looks continuously at an item for a predetermined time, the object is selected. A longer dwell time decreases the number of unintended selections, but it also reduces the responsiveness of the UI. It is said that a dwell time of more than three-quarters of a second is unnatural, as it is hard for the eyes to fixate on one spot for that long.
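
As a rough illustration of the dwell mechanic, the sketch below accumulates focus time on whatever object the gaze cursor is currently hovering and fires a selection once a threshold is reached. The class and callback names are assumptions; the ~750 ms default mirrors the three-quarter-second guidance above.

```typescript
// Minimal dwell-time selection sketch (hypothetical names, engine-agnostic).
class DwellSelector {
  private focusedId: string | null = null;
  private focusedMs = 0;

  constructor(
    private dwellThresholdMs = 750, // ~3/4 s, per the guidance above
    private onSelect: (id: string) => void = () => {},
  ) {}

  // Call once per frame with the currently gazed-at object (or null)
  // and the elapsed frame time in milliseconds.
  update(hoveredId: string | null, deltaMs: number): void {
    if (hoveredId !== this.focusedId) {
      // Gaze moved to a new target (or left the old one): restart the timer.
      this.focusedId = hoveredId;
      this.focusedMs = 0;
      return;
    }
    if (this.focusedId === null) return;

    this.focusedMs += deltaMs;
    if (this.focusedMs >= this.dwellThresholdMs) {
      this.onSelect(this.focusedId);
      this.focusedMs = 0; // avoid repeated selections while the gaze stays put
    }
  }
}
```

In a real application, update() would be driven by the render loop, with hoveredId coming from whatever the gaze ray currently hits.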

Though gaze is a primary input method in XR, it is always better to pair it with other input methods like voice, controller, or gesture for a better experience.

Gesture

Gesture is one of the more intuitive ways of interacting in XR. Gestures are recognized by tracking the user’s hands and fingers.

Though gestures are more natural and let us interact hands-free, as the saying goes, everything comes with a price: using hand gestures for a long time causes physical fatigue (a.k.a. gorilla arm).

Extended use of gestural interfaces without the ability of the user to rest their arm is referred to as “gorilla arm”.

Since tracking is done with cameras and sensors, users have to keep their hands within line of sight; if the cameras or sensors cannot see the hands, tracking fails. Advances in technology have brought the failure rate down, but there are also use cases where the user’s hands are simply busy, so it is always advisable to provide a fallback for hand tracking, such as gaze or voice, as an interaction technique.
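
One way to express this fallback is a small input-mode resolver that prefers hand tracking when the hands are visible and degrades to gaze or voice when they are not. The state fields below are illustrative assumptions, not a specific runtime’s API.

```typescript
// Hypothetical snapshot of what the runtime can currently track.
interface TrackingState {
  handsVisible: boolean;   // are the hands inside the sensors' line of sight?
  gazeAvailable: boolean;  // is head or eye gaze being tracked?
  voiceAvailable: boolean; // is the microphone usable in this environment?
}

type InputMode = "hands" | "gaze" | "voice" | "none";

// Prefer gestures, but fall back gracefully when hand tracking is lost.
function resolveInputMode(state: TrackingState): InputMode {
  if (state.handsVisible) return "hands";
  if (state.gazeAvailable) return "gaze";
  if (state.voiceAvailable) return "voice";
  return "none";
}
```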

Voice

Voice has been evolving as an interaction technique not just in the XR space but across many ambient computing devices, from computers to watches, as it is a more natural way of issuing commands.

We can use voice commands in XR in two ways: in combination with other interactions like gaze, or in isolation, where the command does not require any additional action such as targeting. For example, “go to start” in HoloLens.
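
These two styles can be sketched as a small command table: some commands act on their own (like “go to start”), while others only make sense when combined with the current gaze target. The registry and handler names here are illustrative assumptions, not any platform’s voice API.

```typescript
// Hypothetical voice-command dispatcher distinguishing stand-alone commands
// from commands that act on the current gaze target.
type GlobalHandler = () => void;
type TargetedHandler = (targetId: string) => void;

class VoiceCommands {
  private global = new Map<string, GlobalHandler>();
  private targeted = new Map<string, TargetedHandler>();

  registerGlobal(phrase: string, handler: GlobalHandler): void {
    this.global.set(phrase.toLowerCase(), handler);
  }

  registerTargeted(phrase: string, handler: TargetedHandler): void {
    this.targeted.set(phrase.toLowerCase(), handler);
  }

  // `gazeTargetId` is whatever the gaze cursor is currently focused on, if anything.
  handle(phrase: string, gazeTargetId: string | null): boolean {
    const key = phrase.toLowerCase();
    const globalHandler = this.global.get(key);
    if (globalHandler) {
      globalHandler(); // e.g. "go to start" needs no target
      return true;
    }
    const targetedHandler = this.targeted.get(key);
    if (targetedHandler && gazeTargetId) {
      targetedHandler(gazeTargetId); // e.g. "open this" acts on the gazed-at object
      return true;
    }
    return false; // unrecognized command, or a targeted command with nothing in focus
  }
}
```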

Like other interaction techniques, voice commands also have their limitations; sometimes a voice system hears a command incorrectly or fails to hear it altogether. Voice commands are also tricky in noisy environments, or in environments where we cannot speak.

Controllers

Controllers come in different shapes and feature sets depending on the use case, and they fall under either 3DoF (rotation tracking only) or 6DoF (rotation and position tracking). They also include gloves. One advantage of using controllers is precision, and the way these devices let us draw on our awareness of the position and movement of our own body, combined with other senses like haptics. But controllers don’t let us perform hands-free interactions.
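
The 3DoF/6DoF distinction comes down to what a controller can report: orientation only, or orientation plus position. A minimal way to model it (the type names are assumptions for illustration):

```typescript
// 3DoF vs 6DoF, modeled as hypothetical pose types.
type Vec3 = { x: number; y: number; z: number };
type Quat = { x: number; y: number; z: number; w: number };

// A 3DoF controller reports orientation only (which way it points).
interface ControllerPose3DoF {
  orientation: Quat;
}

// A 6DoF controller additionally reports where it is in the tracked space.
interface ControllerPose6DoF extends ControllerPose3DoF {
  position: Vec3;
}
```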

Brain-Computer Interface

Not yet mainstream, but companies like Neuralink, Emotiv, and Neurable are building interfaces so we can use brain signals as an input mechanism. Though this sounds like magic, it will be the future, even if it has its own limitations.

Multimodal experiences are best in XR. We should let users use the best input method seamlessly by blending gaze, gesture, voice, or any of the available mechanisms, so the interaction is natural and immersive.
