In my last post, I noted that, if you want to truly claim you’re solving a problem, you need to first understand the pain being felt by your ‘users’, and build something tailor-made to resolve that pain. Coronavirus has brought many new pains into sharp relief for all of society, but today, I’d like to focus on one specific challenge – how we can fulfill our fundamental need to socialize and interact with other people, when physical interaction is all but forbidden. My goal in writing this post isn’t to propose solutions, though in cases where I have suggestions I’ll certainly share them; rather, I want to take this moment to simply emphasize the discrepancy between the virtual and the physical world.
Some of you might be surprised by this. After all, I’m the CEO of a company that is deeply invested in more and more social activity moving online. And indeed, I believe that the scalability and customization possible on online platforms will eventually make online interaction surpass physical socialization. What’s more, I think it would be an incredibly valuable service to society if Modulate and many others could bring this to fruition: not only would truly rich online interaction insulate us from pain in times like this, it would also remove geographic and physical boundaries, enabling more people to find those in the world who most deeply share their passions, excitements, and personalities. My vision is for online socialization to win out over physical interaction because it is better.
But it’s not entirely better yet. That’s the simple but important truth of the matter. And there are two ways we can try to rectify that.
The first method is Modulate’s approach: leveraging the unique aspects of online interaction to offer new possibilities which would have been impossible in the physical world. In our case, the new option is the ability to change one’s self-expression through choosing the perfect voice.
But it is equally important to understand the ways in which virtual interactions fail to live up to their physical counterparts. Those gaps are most relevant to us right now, as our once-typical physical interactions (work conversations, hanging out with friends, happy hours, etc.) are thrust onto online platforms. So, as part of my overall mission to make virtual interactions the best they can be, I thought those gaps were well worth exploring.
Enough caveats. Let’s dive in.
Flexibility of Motion
Cameras point in a single direction unless they are moved. This probably sounds obvious, but it has a profound impact on our virtual interactions. After all, in physical interactions, the camera – that is, the eyes of those you’re socializing with – can move freely! This means you can drift from one place to another, shift your sitting position, or quickly duck over to the kitchen to grab a snack or drink, all without breaking up the conversation.
Unfortunately, this isn’t possible with virtual calls today, so you’re forced to stay seated in a relatively fixed position. What’s worse, since the camera can’t meet you halfway the way other people can, you have to expend constant effort maintaining “eye contact”. The constant sitting isn’t great for you physically, but worse still, staring at the camera is mentally draining in a way that would never register as an issue in a physical interaction. For this reason, I’ve found it’s sometimes easier to socialize with friends using audio alone, which frees everyone from the strain of monitoring their cameras at all times. Ideally, though, we’d find a way to free ourselves from our seats more fully, and regain the flexibility and casual feel of in-person interaction even when chatting online.
Distance and Grouping
When among a group of people, it’s easy, to the point of being automatic, to tune out the “crowd” and focus on a conversation with a smaller subset of those in attendance. Part of this comes from our brains homing in on the facial expressions and vocal frequencies of the people we’re speaking with; but that process is significantly helped by the fact that we are usually physically close to those we wish to converse with, so their voices are simply louder than the background. At a 50-person party, there might be 10 conversations happening in parallel, with participants able to focus only on their immediate discussion; yet each individual also gets the overall sensation of being within the crowd, and can take in ambient information such as a particularly exciting announcement or a transition to a new group activity.
Online, things are typically more binary: either you’re in the conversation, or you’re out. Many games are working to improve this using ideas like spatial audio (which makes each sound seem to come from the direction of its source) and proximity voice chat (which applies spatial audio to an open voice channel, so you automatically hear those approaching you). Unfortunately, these tools are still largely in their infancy, and many games don’t support significant voice chat capabilities at all. Video conferencing tools, in contrast, are typically designed for business meetings with a single focus, and so rarely offer any way to represent proximity between speakers. (A few tools have “breakout sessions”, where a single group call can be split into smaller sub-calls, but this is really the equivalent of each conversation at a party happening in a separate, isolated room.)
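To make the contrast between an all-or-nothing call and proximity voice chat concrete, here is a minimal sketch of how a game might derive each speaker’s volume and stereo placement from positions in a shared space. The inverse-distance rolloff, the `ref_distance` and `max_distance` values, the 2D `Vec2` type, and the constant-power panning law are all illustrative assumptions, not how any particular game (or Modulate) actually implements it.

```python
import math
from dataclasses import dataclass


@dataclass
class Vec2:
    x: float
    y: float


def binary_gain(in_same_call: bool) -> float:
    """Conventional call: you hear someone at full volume or not at all."""
    return 1.0 if in_same_call else 0.0


def proximity_gain(listener: Vec2, speaker: Vec2,
                   ref_distance: float = 1.0, max_distance: float = 20.0) -> float:
    """Assumed inverse-distance model: full volume within ref_distance,
    then ref_distance / d, and silence beyond max_distance."""
    d = math.hypot(speaker.x - listener.x, speaker.y - listener.y)
    if d > max_distance:
        return 0.0
    return ref_distance / max(d, ref_distance)


def stereo_pan(listener: Vec2, facing_rad: float, speaker: Vec2) -> tuple[float, float]:
    """Constant-power pan based on the speaker's angle relative to where the
    listener is facing. Returns (left, right) gains whose squares sum to 1."""
    angle = math.atan2(speaker.y - listener.y, speaker.x - listener.x) - facing_rad
    # sin(angle) is +1 when the speaker is directly to the listener's left.
    pan = -math.sin(angle)                  # -1 = hard left, +1 = hard right
    theta = (pan + 1.0) * math.pi / 4.0     # map [-1, 1] onto [0, pi/2]
    return math.cos(theta), math.sin(theta)


if __name__ == "__main__":
    me = Vec2(0, 0)
    party = {"near friend": Vec2(2, 0), "across the room": Vec2(10, 0), "next room": Vec2(30, 0)}
    for name, pos in party.items():
        print(name, round(proximity_gain(me, pos), 2))
```

Running it prints a gain of 0.5 for the nearby friend, 0.1 for someone across the room, and 0.0 for someone out of range, so nearby voices dominate while the rest of the room fades into the background rather than vanishing at a hard boundary, which is the party-like behavior a breakout room cannot reproduce.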
This is particularly sad for networking. Few platforms, physical or virtual, can claim to have helped more like-minded folks meet than MMORPGs like World of Warcraft, but the people meeting there have typically been committed to the platform first and happened to encounter each other while playing. Without making these games more casual and accessible to a wider range of users, and/or adding proximity voice chat so people can interact without already knowing each other, I suspect it will be difficult to make online platforms work as places where meeting new people is the primary goal. That’s a shame, since the internet could put so many more interesting people within our reach if we solved this problem!
Self-Expression
In the physical world, people express their identity and individuality through physical attributes: their clothing and accessories, the way they carry themselves, and even how close they stand to others within a larger group. These characteristics don’t translate very well into the virtual world. There’s no “proximity” on a video call, as just discussed above. You can vary how you carry yourself on a video call a bit (lying down vs. sitting up, for instance), but there’s far less variation in what a video headshot captures (more on this below). And even clothing doesn’t work the same way: there have been stories about Walmart seeing more sales of tops, but not bottoms, because only tops are visible on a video call.
Now, the point about clothes might sound like a superficial difference; choosing a top still means you’re defining your style, doesn’t it? But even setting aside that this is only half the picture, it elides another important consideration: who we are varies with our circumstances. When someone uses Modulate’s voice skins to sound like a monster, wizard, or princess, they’re not just changing how they sound; they feel more like that character, and that influences their behavior in a variety of subtle ways. By the same token, even if those pajama bottoms aren’t visible, they inevitably shape your impression of yourself and your circumstances, and will change the nuances of the interaction in ways you can’t predict or explain.
On the flip side, while virtual environments sacrifice some of these physical characteristics, they replace them with a wealth of new customizable attributes. Consider Zoom’s “virtual backgrounds”, which everyone has taken to with such fervor: they are a new way of expressing ourselves in the age of digital interaction, one that didn’t exist in the physical world. Similarly, we can use visual avatars (and yes, voice skins too) to customize ourselves in other novel ways.
Latency
At Modulate, we often discuss the fact that when your own voice is looped back into your ear, you have only about 30ms before you register an echo. But that threshold applies to your own speech; experiments on human reactions to audio stimuli from others show reaction times of 100 to 200ms. This means that if it takes more than ~100ms for your speech to reach the ears of the person you’re speaking with, they could notice the delay, which can throw off the conversation, say, by leading them to start speaking before they realize you’ve also begun to talk.
Strictly speaking, a signal moving at the speed of light in a vacuum could reach the far side of the Earth, about 20,000km away, in roughly 67ms, comfortably under 100ms. In reality, though, we need to deal with things like compressing the audio for transmission (and decoding it on the other end), sending it through fiber-optic cables (in which light travels at only about two-thirds of its vacuum speed, along routes that are rarely straight), and building in redundancies to handle data loss. As such, 100ms or more of latency is not at all uncommon in voice chats today. This doesn’t fundamentally prevent you from engaging with others online, but it does make the interaction more awkward, since it becomes harder to read each other’s reactions and to judge the right moment to chime in. Because of this, I find it crucial, especially in emotionally complex conversations such as a 1:1, to slow down intentionally and make sure I fully understand the other person’s reaction. But when your aim is simply to socialize casually, this is one more thing to keep in mind and spend mental energy managing, making the interaction feel less pleasant overall than it could be.
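The light-speed arithmetic in this section can be checked with a quick calculation. No two points on Earth are more than half the circumference apart (about 20,000km), so the sketch below uses that as the worst-case distance, computes the one-way travel time in a vacuum and in fiber, and then adds a few processing stages to see how quickly a ~100ms budget is used up. The fiber velocity factor of 0.67 is a common approximation; the per-stage millisecond figures in `HYPOTHETICAL_STAGES_MS` are illustrative placeholders, not measurements of any real voice chat system.

```python
EARTH_CIRCUMFERENCE_KM = 40_075       # equatorial circumference
SPEED_OF_LIGHT_KM_S = 299_792.458     # speed of light in a vacuum
FIBER_VELOCITY_FACTOR = 0.67          # approximate fraction of c in glass fiber
CONVERSATION_THRESHOLD_MS = 100       # delay a listener may start to notice

# Worst case: two people at antipodal points, half the circumference apart.
ANTIPODAL_KM = EARTH_CIRCUMFERENCE_KM / 2

# Hypothetical processing delays in milliseconds, for illustration only.
HYPOTHETICAL_STAGES_MS = {
    "capture + encode": 20.0,
    "jitter buffer (loss/reordering protection)": 40.0,
    "decode + playback": 10.0,
}


def propagation_ms(distance_km: float, velocity_factor: float = 1.0) -> float:
    """One-way travel time in ms for a signal moving at velocity_factor * c."""
    return distance_km / (SPEED_OF_LIGHT_KM_S * velocity_factor) * 1000.0


def one_way_budget_ms(distance_km: float) -> dict[str, float]:
    """Fiber propagation plus the hypothetical processing stages."""
    budget = {"fiber propagation": propagation_ms(distance_km, FIBER_VELOCITY_FACTOR)}
    budget.update(HYPOTHETICAL_STAGES_MS)
    return budget


if __name__ == "__main__":
    print(f"vacuum, antipodal: {propagation_ms(ANTIPODAL_KM):.0f} ms")
    print(f"fiber, antipodal:  {propagation_ms(ANTIPODAL_KM, FIBER_VELOCITY_FACTOR):.0f} ms")
    total = sum(one_way_budget_ms(ANTIPODAL_KM).values())
    print(f"total one-way: {total:.0f} ms (threshold {CONVERSATION_THRESHOLD_MS} ms)")
```

With these numbers, light in a vacuum needs about 67ms to reach the antipode, fiber alone pushes that to roughly 100ms, and even modest (assumed) encoding and buffering stages bring the one-way total to around 170ms. For comparison, a full lap of the equator at vacuum light speed would take about 134ms.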
Danger of Distraction
One last difference I’ll mention is well known to anyone who works remotely: if you’re staring at your computer anyway, it’s oh-so-easy to open another tab and start browsing. Ideally, paying attention would be simple, but the reality is that we sometimes find ourselves in conversations that are important but not especially gripping. In these cases, being in person helps: you can put your phone and computer aside, knowing the other person would immediately notice if you became distracted. In virtual calls, it’s much harder to impose this discipline on yourself.
To be clear, I’m not suggesting that others should invasively monitor what you’re doing; rather, many people (myself included) use the fact that others are watching as a forcing function to help overcome our innate distractibility. Sadly, virtual calls force me to battle that weakness entirely on my own, which further drains my mental energy. One tactic I’ve found surprisingly effective is simply sitting farther back from the computer during a video chat; ideally, though, virtual calls would be as immersive and focused as the conversations we have in the physical world.
Of course, the examples above skip over the most obvious difference: physical and virtual interaction permit different activities. For the time being, it’s hard to play tennis, attend a dance party, or cook with your friends online; and while one can play online games with others in the same physical room, it’s often preferable to play remotely for clarity of communication. But I wonder whether this difference is as fundamental as the others. After all, there are a wealth of activities available in each world; it’s just that the options differ. Just as gamers have developed an entirely separate vocabulary for things like e-sports, perhaps this is simply a matter of us becoming more familiar with the activities the online world has to offer, and developing the same level of comfort we have with physical activities, rather than today’s virtual activities being fundamentally “worse” in some way. This is an incredibly difficult question to answer; but, at least anecdotally, I’ve been fascinated to hear friends and family express amazement at the fun they’ve had trying new activities online now that they’ve been pushed to do so.
To be clear, I would much rather we had skipped the coronavirus altogether. But if I may take a silver lining from this time spent self-quarantining, it’s that online socialization has not only rapidly shed its stigma but also drawn in new demographics, and I think that will have extremely positive effects on the inclusivity, variety, and overall design of virtual platforms moving forward.