What’s Missing From Zoom Reminds Us What It Means to Be Human

Over the last month billions of people have been unwilling participants in the largest unintentional social experiment ever run – testing whether video conferencing can replace face-to-face communication.

While we’ve discovered that in many cases it can, more importantly we’ve discovered that, regardless of bandwidth and video resolution, these apps are missing the cues humans use when they communicate. While we might be spending the same amount of time in meetings, we’re finding we’re less productive, social interactions are less satisfying and distance learning is less effective. And we’re frustrated that we don’t know why.

Here’s why video conferencing apps don’t capture the complexity of human interaction.


All of us sheltering at home have used video conferencing apps for virtual business meetings, virtual coffees with friends, family meetings, online classes, etc. And while the technology allows us to conduct business, see friends and transfer information one-on-one and one-to-many from our homes, there’s something missing. It’s just not the same as connecting live at the conference room table, the classroom or local coffee shop. And it seems more exhausting. Why?

What’s missing?
It turns out that today’s video conferencing technology doesn’t emulate how people interact with others in person. Every one of these video applications has ignored a half-century of research on how people communicate.

Meeting Location
In the physical world the space and context give you cues and reinforcement. Are you meeting in the 47th-floor boardroom with a great view? Are you surrounded by other animated conversations in a coffee shop or sitting with other classmates in a lecture hall? With people working from home you can’t tell where the meeting is or how important the location or setting is. In a video conference all the contextual cues are homogenized. You look the same whether you are playing poker or making a sales call, in a suit or without pants. (And with video conferences people are seeing your private space. Now you need to check if there’s anything embarrassing lying around. Or your kids are screaming and interrupting meetings. It’s fatiguing trying to keep business and home life separate.)

In the real world you don’t just teleport into a meeting. Video conferencing misses the transitions as you enter a building, find the room and sit down. The same transitions are missing when you leave a video conference. There is no in and out. The conference is just over.

Physical Contact
Most business and social gatherings start with physical contact – a handshake or a hug. There’s something about that first physical interaction that communicates trust and connection through touch. In business meetings there’s also the formal ritual of exchanging business cards. These are all preambles that establish a connection for the meeting that follows.

Meeting Space Context
In person we visually take in much more information than just looking at someone’s face. If we’re in a business meeting, we’ll scan the room, rapidly changing our gaze. We can see what’s on desks or hanging on the walls, what’s in bookshelves or in cubicles. If we’re in a conference or classroom, we’ll see who we’re sitting next to, notice what they’re wearing, carrying, reading, etc. We can see relationships between people and notice deference, hierarchy, side glances and other subtle cues. And we use all of this to build a context and make assumptions (often unconsciously) about personalities, positions, social status and hierarchy.

Looking in a Mirror While Having a Meeting
Before meeting in person, you may do a quick check of your appearance, but you definitely don’t hold up a mirror in the middle of a meeting, constantly checking how you look. Yet with the focus on us as much as on the attendees, most video apps seem designed to make us self-conscious and distract us from watching who’s speaking.

Non-Verbal Cues
Most importantly, researchers have known for at least fifty years that at least half of how we communicate is through non-verbal cues. In conversation we watch others’ hands, follow their gestures, focus on their facial expressions and their tone of voice. We make eye contact and notice whether they do. And we are constantly following their body language (posture, body orientation, how they stand or sit, etc.).

In a group meeting it’s not only a matter of following the speaker’s cues. It’s often the side glances, eye rolls and shrugs between our peers and other participants that offer direction and nuance to the tenor of a meeting. On a computer screen, all that cross-person interaction is lost.

The sum of these non-verbal cues is the (again often unconscious) background of every conversation.

But video conferencing apps just offer a fixed gaze from one camera. Everyone is relegated to a flat, two-dimensional square on the screen. It’s the equivalent of having your head in a vise, having been wheeled into a meeting wearing blinders while tied to a chair.

Are Olfactory Cues Another Missing Piece?
There’s one more set of communication cues we may be missing over video. Scientists have discovered that in animals, including mammals and primates, communication travels not only through words, gestures, body language and facial expressions but also through smells, via the exchange of chemicals and hormones called pheromones. These are not odors that consciously register, but they are nevertheless picked up by the olfactory system. Pheromones send signals to the brain about sexual status, danger and social organization. It’s hypothesized that odors and pheromones control some of our social behaviors and regulate hormone levels. Could these olfactory cues be one additional piece of what we’re missing when we try to communicate over video? If so, emulating these cues digitally will be a real challenge.

Why Zoom and Video Teleconferencing Are Exhausting
If you’ve spent any extended period using video for a social or business meeting during the pandemic, you’ve likely found it exhausting. Or if you’re using video for learning, you may realize it’s affecting your learning by reducing your ability to process and retain information.

We’re exhausted because of the extra cognitive processing (fancy word for having to consciously do extra thinking) to fill in the missing 50% of the conversation that we’d normally get from non-verbal and olfactory cues. It’s the accumulation of all these missing signals that’s causing mental fatigue.

Turning Winners Into Losers
And there’s one more thing that makes video apps taxing. While they save a lot of time for initial meetings and screening prospects, salespeople are discovering that closing complex deals via video is difficult. Even factoring out the economy, the reason is that in person, great salespeople know how to “read” a meeting. For example, they can tell when someone who was nodding yes to a deal actually meant “no way.” Or they can pick up the “tell me more” signal when someone leans forward. On Zoom all those cues are gone. As a result, deals that should be easy to close will take longer, and those that are hard won’t happen. You’re investing the same or more time getting the meetings, but you’re frustrated that little or no forward progress occurs. It’s a productivity killer for sales.

In social situations a feel for body language may help us sense that a friend who’s smiling and saying everything is fine is actually having a hard time in their personal life. The absence of these physical cues, along with the loss of physical contact, may lead to a greater distance between us and our family and friends. Video can bridge the distance but lacks the empathy a hug communicates.

An Opportunity for Innovators to Take Video Conferencing to the Next Level
This billion-person science experiment replacing face-to-face communication with digital has convinced me of a few things:

  1. The current generation of video conferencing applications ignores how humans communicate
    • They don’t help us capture the non-verbal communication cues – touch, gestures, postures, glances, odors, etc.
    • They haven’t done their homework in understanding how important each of these cues is and how they interact with each other. (What is the rank order of the importance of each cue?)
    • Nor do they know which of these cues is important in different settings. For example, what are the right cues to signal empathy in social settings, sincerity, trustworthiness and rapport in business settings or attention and understanding in education?
  2. There’s a real opportunity for a next generation of video conference applications to fill these holes. These new products will begin to address issues such as: How do you shake hands? Exchange business cards? Pick up on the environment around the speaker? Notice the non-verbal cues?
  3. There are already startups offering emotion detection and analytics software that measure speech patterns and facial cues to infer feelings and attention levels. Currently none of these tools are integrated into broadly used video conferencing apps. And none of them are yet context sensitive to particular meeting types. Perhaps an augmented reality overlay of non-verbal cues for business users would be a powerful first step.

Lessons Learned

  • Today’s video conferencing applications are a one-note technical solution to the complexity of human interaction
    • Without the missing non-verbal cues, business is less productive, social interactions are less satisfying and distance learning is less effective
  • There’s an opportunity for someone to build the next generation of video conference applications that can recognize key cues in the appropriate context
    • This time with psychologists and cognitive researchers leading the team