A few notable events in industry this week – let’s jump right in!
In the midst of significantly increased usage due to COVID-19 lockdowns, Discord has rebranded its primary platform to emphasize that it is not just for gamers, but meant to enable online conversation for a much broader audience. I’d like to think I get partial credit for predicting this – my blog post from last year specifically called out that Discord wasn’t taking the steps you’d expect to reinforce their ownership of in-game voice chat if it were a primary aspect of the platform. (Though I confess the blog post was written too narrowly, given my own focus on gaming chat, rather than looking at all of Discord more comprehensively.)
This also suggests a more significant turn in Discord’s business model – they seem to be increasingly focused on reinforcing their identity as a social network, rather than as a specific platform. They still monetize heavily through subscriptions, but I wouldn’t be surprised to hear them talk about opening that network up further and monetizing more directly based on those relationships as well.
HBO recently released a documentary, Welcome to Chechnya, in which LGBTQ Chechens discuss the often brutal mistreatment they’ve faced from society and, more directly, from the government. Typically, such sources might provide their stories offscreen to be retold by an actor or narrator, but HBO wanted to retain as much of the speakers’ emotion and authenticity as possible while still protecting them from being targeted for speaking out. So they used “deepfake” technology to replace the faces of those speaking with the faces of real people who volunteered their likenesses for this purpose.
This kind of face-swapping technology has generally gotten negative coverage due to (reasonable!) concerns about one’s face being appropriated without consent, especially for ethically concerning uses such as pornography. But this documentary showcases an important alternative angle on deepfake technology – for many people, their physical identity puts them in danger or fails to represent them appropriately. In those cases, having the option to swap out one’s face (and voice) can be an incredibly important tool for freedom.
Finally, MIT has withdrawn one of its widely used image datasets after discovering it was teaching models offensive associations. It appears that the source of the issue is that researchers started with a list of words, then pulled down images from the internet that matched those words. Unfortunately, that list contained many slurs and other offensive terms. On its own, the word list draws no relationships, and included these slurs for the sake of completeness. But the images pulled from the internet were now training machine learning systems across the globe to associate these slurs with images of innocent, ordinary individuals – images that had been labeled with these slurs by only a small number of malicious internet users. As a result, MIT is pulling the images entirely and apologizing for the oversight.
I wish the moral here could be as simple as “review your datasets,” and certainly that’s an important part of it. But I think it’s telling that this stemmed not from collecting fresh data directly, but from expanding an initial dataset – which itself wasn’t nearly as problematic. To me, this underscores just how difficult a problem data management truly is, and reinforces the need for everyone working in machine learning to think carefully about the degree to which their data – and thus their machine learning models – will reflect the world as it is, versus the world we all aspire to.