Science Behind AR: How Neuroscience-based

AR Can Improve Workplace Performance

What Works Better: 2D or 3D Instructions?



This blog post is one of our special edition longreads. It chronicles a recent study we conducted with Accenture and presented at this year's International Symposium on Mixed and Augmented Reality (ISMAR). We examined whether 3D instructions delivered via AR were more effective than 2D instructions on paper or screens. Read on to discover the full results and surprising findings.


Key Takeaways from the Meta-Accenture Labs Study

→ The use of both 3D and motion cues in AR instructions helps people assemble things faster

→ Participants indicated that they prefer AR instructions over 2D instructions


Imagine coming home with a new Ikea floor lamp. After you've unwrapped the package and gathered and sorted the necessary parts and tools, you crack open the instruction manual to figure out how the lamp should be assembled. Thirty minutes go by and you're on step 9 when you stop and do a quadruple-take because the diagram is beyond mind-boggling:


(Dudero floor lamp image courtesy of Ikea)


And then you keep going back and forth between the instructions and what you're doing, which wastes time and makes you lose focus. With augmented reality (AR), you cut that step out, reducing the load on your working memory so you can complete the task without interruption.


Assembling furniture shouldn't be this difficult, but it usually is because the process comes with subpar paper instruction manuals. And for those of you who think it's not the instruction manuals that are inadequate, but rather people and their inability to follow directions, we beg to differ, with both a short and a long answer:


The Short Answer

2D instructions – whether textual or graphical, on a screen or on paper – are hard to understand because they're flat representations of complex actions and objects in a 3D environment, a.k.a. the real world. This is why simulations (and at times closely supervised, guided real-world scenarios) are integral to training for highly complex procedural jobs like surgery and manufacturing assembly. Put another way, imagine learning CPR from just text and pictures on paper or a screen: you'll get a general idea of where and when to put your hands on someone's chest and breathe air into their mouth, but you won't know exactly how to do so until you've practiced the process on a CPR dummy.


The Long Answer

From a neuroscience perspective, instructions on paper and screens are cognitively limiting, i.e. difficult to understand, for the following reasons:


      1. They're perceptually inefficient

Oversimplified, flat 2D images of 3D objects and actions make it harder for our brains to process and understand the actions that need to be taken.


      2. They overload working memory

Like the RAM in computers, working memory is where our brains temporarily store and process information. If our working memories are overloaded, then our performance on complex tasks will be undermined.


      3. They're difficult to recall

Unless you have an eidetic memory (often mistakenly called "photographic memory"), remembering every step of the overall process, along with its corresponding diagram, is next to impossible.


Given how 2D instructions place these cognitive limits on us, you might be wondering what can help overcome those limits and in turn provide a better way to deliver instructions.


Immersive AR: A Better Alternative to 2D Instructions



The current body of research on the efficacy of AR (and, more broadly, of 3D visualizations) seems to show that, depending on the task, static 3D instructions can help people execute tasks more efficiently than 2D instructions can. At a high level, static 3D visuals introduce the perceptual cue of stereo 3D, which makes it easier for our brains to process and understand the combination of visual and textual information and the actions we need to complete as part of a task. From a neuroscience standpoint, the human brain (specifically the Middle Temporal area of the cortex) has evolved to be finely attuned to both stereo and motion cues, which suggests why people might find 3D representations of complex, multi-step tasks and actions easier to understand than 2D representations.


Given this understanding of how our brains work, and given the lack of studies that have systematically examined the perceptual cues our brains use to rapidly process procedural tasks, we decided to partner with Accenture Labs on a pilot study examining the use of perceptual cues in AR. More specifically, we wanted to measure the effect an additional perceptual cue (motion) would have on the time it takes to complete a procedural task. We operated under the hypothesis that integrating both stereo and motion perceptual cues could further reduce the limitations of 2D instructions, ultimately enabling people to complete a procedural task more quickly.


How We Conducted the Study

At this year's Bay to Breakers pre-race expo, held as part of the colorful annual footrace in San Francisco, California, we and researchers from Accenture Labs set up the procedural task of assembling a physical Lego lighthouse set. The model (image below) was chosen because it was complex enough to require instructions to assemble completely (several steps called for bricks of various sizes and shapes).


(Product image courtesy of The Lego Group)


We defined three conditions based on the different types of instructions participants were to receive:


      1. 2D Paper

Copies of each page in the original paper instruction manual were transferred onto a PowerPoint slide and displayed on a computer monitor. We transferred the paper instructions to digital form in order to track participants' time to complete each step. Participants used a keyboard to navigate through the instructions on the screen.


      2. Holographic Static 3D (Stereo Cue)

Showed the visual instructions from the same viewpoints as their corresponding 2D paper counterparts. The Lego pieces were made in 3ds Max, a 3D modeling and animation software, and ported into Unity as holographic instructions that mimicked the 2D paper instructions. Participants used the Meta 2 to view the Static 3D Instructions.


      3. Holographic Dynamic 3D (Stereo & Motion Cues)

Derived from the same Unity prefabs as the ones used in the Static 3D Instructions, with participants also using the Meta 2 to view the instructions. Showed the same features as the 2D paper counterparts in addition to allowing participants to see the model from all points of view (models were rotated 360° for three seconds at the beginning of each step).


(Instruction conditions for 2D paper and both Static and Dynamic 3D instructions.)


We recruited 77 volunteer participants, each of whom was randomly assigned to one of the three instruction conditions, keeping the groups as evenly sized as possible.
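For readers curious about the mechanics, balanced random assignment like the above can be sketched in a few lines of Python. This is purely illustrative, not the study's actual code; the condition names and the `seed` parameter are our own choices here.

```python
import random

# Condition names as described in the post (illustrative sketch only).
CONDITIONS = ["2D Paper", "Static 3D", "Dynamic 3D"]

def assign_conditions(n_participants, conditions, seed=None):
    """Assign each participant to a condition, keeping group sizes
    within one of each other, then shuffle so assignment is random."""
    rng = random.Random(seed)
    # Cycle through the conditions to balance group sizes...
    assignments = [conditions[i % len(conditions)] for i in range(n_participants)]
    # ...then shuffle so who gets which condition is random.
    rng.shuffle(assignments)
    return assignments

assignments = assign_conditions(77, CONDITIONS, seed=42)
```

Because 77 doesn't divide evenly by three, this scheme yields groups of 26, 26, and 25 — as close to even as possible.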


The Results Weren't What We Entirely Expected

Given the time constraints, we were not able to measure participants' accuracy in completing the task. Instead, we measured the time to complete (TTC) each step, which was feasible because every participant finished building the lighthouse model from start to finish. TTC became our metric for evaluating how quickly participants completed each step.
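To make the metric concrete, per-condition TTC can be summarized as the mean of each condition's per-step times. A minimal sketch follows; the record format and the timing numbers are made up for illustration and are not the study's data.

```python
from collections import defaultdict
from statistics import mean

def mean_ttc_by_condition(records):
    """records: iterable of (condition, step, ttc_seconds) tuples.
    Returns a dict mapping each condition to its mean TTC."""
    by_condition = defaultdict(list)
    for condition, _step, ttc in records:
        by_condition[condition].append(ttc)
    return {cond: mean(ttcs) for cond, ttcs in by_condition.items()}

# Hypothetical per-step timings, chosen only to show the computation:
records = [
    ("Dynamic 3D", 1, 20.5), ("Dynamic 3D", 2, 18.0),
    ("2D Paper", 1, 26.0), ("2D Paper", 2, 24.5),
    ("Static 3D", 1, 31.0), ("Static 3D", 2, 29.5),
]
summary = mean_ttc_by_condition(records)
```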


Comparing the three instruction conditions, we found that Dynamic 3D Instructions enabled participants to complete each step more quickly; participants using Static 3D Instructions and 2D Paper Instructions were much slower in comparison. This confirmed our hypothesis that the use of both stereo and motion perceptual cues in AR instructions speeds up assembly time. Interestingly enough, participants using Static 3D Instructions were the slowest of the three conditions. This especially surprised us because, based on past studies conducted in 2003 and 2013, we had expected people using any kind of 3D instructions to complete the Lego building task more quickly than those using 2D Paper Instructions.


After the lighthouse models were built, we surveyed each participant to gauge their perceptions of speed, instruction helpfulness, and instruction effectiveness. We were surprised to find no statistically significant differences in perceived speed, helpfulness, or effectiveness across the instruction types. These data indicated that participants weren't consciously trying to speed up or slow down when using 3D or 2D instructions. Most interesting of all, when we asked participants whether they thought AR instructions were more effective than 2D instructions, they overwhelmingly (86%) preferred AR instructions over paper instructions:



We recap the study's results and findings in the video below if you'd like to learn more:


Do These Findings Mean We Should All Use AR Instructions?

This pilot study has shown some promising findings, but it is by no means conclusive. We believe that additional systematic studies of individual perceptual cues are needed, especially with regard to informing AR user interface (UI) design. Our results strongly suggest that AR interface design requires careful consideration: we can't just dump applications into 3D and assume users will find them easy to use, especially given our study's finding that people performed worse with static 3D instructions than with either paper or dynamic 3D instructions.
