Google is defending its practice of letting human employees, most of whom appear to be contract workers located around the globe, listen to audio recordings of conversations between users and its Google Assistant software. The response comes after revelations from Belgian public broadcaster VRT NWS detailed how contract workers in the country sometimes listen to sensitive audio accidentally captured by Google Assistant.
In a blog post published today, Google says it takes precautions to protect users’ identities and that it has “a number of protections in place to prevent” so-called false accepts, which occur when Google Assistant activates on a device like a Google Home speaker without the user having intentionally spoken the wake word.
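To illustrate the mechanics at play, here is a hypothetical sketch in Python, not Google’s actual code: wake word detection typically boils down to a confidence score checked against a threshold, and a false accept is simply background audio that scores above it.

```python
# Hypothetical sketch of threshold-based wake word gating, not Google's
# actual implementation. The detector scores incoming audio, and any
# score above the threshold activates the device; a "false accept" is
# background audio that happens to clear the bar.

WAKE_WORD_THRESHOLD = 0.85  # assumed value; real systems tune this per device

def should_activate(confidence_score: float) -> bool:
    """Return True when the detector's score clears the activation threshold."""
    return confidence_score >= WAKE_WORD_THRESHOLD

# A phrase that merely sounds like the wake word can still score high enough
# to trigger recording, which is the failure mode VRT NWS described.
print(should_activate(0.91))  # True: activation, possibly a false accept
print(should_activate(0.40))  # False: background speech is ignored
```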
The company also says it has human workers review these conversations to help Google’s software operate in multiple languages. “This is a critical part of the process of building speech technology, and is necessary to creating products like the Google Assistant,” writes David Monsees, a product manager on the Google Search team who authored the blog post.
“We just learned that one of these language reviewers has violated our data security policies by leaking confidential Dutch audio data,” Monsees adds, referencing snippets of audio the Belgian contract worker shared with VRT NWS. “Our Security and Privacy Response teams have been activated on this issue, are investigating, and we will take action. We are conducting a full review of our safeguards in this space to prevent misconduct like this from happening again.”
Additionally, Google claims just 0.2 percent of all audio snippets are reviewed by language experts. “Audio snippets are not associated with user accounts as part of the review process, and reviewers are directed not to transcribe background conversations or other noises, and only to transcribe snippets that are directed to Google,” Monsees adds.
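As a rough illustration of what that review pipeline could look like (the field names and structure here are assumptions, not anything Google has published), a sampler might randomly pull around 0.2 percent of snippets and strip the account linkage before a reviewer ever sees them:

```python
import random

# Illustrative only: one way a pipeline could sample roughly 0.2 percent
# of audio snippets for human review and drop the account linkage first,
# per the process Monsees describes. Field names are hypothetical.

REVIEW_RATE = 0.002  # the "0.2 percent" figure from Google's blog post

def select_for_review(snippets: list[dict]) -> list[dict]:
    """Randomly sample snippets and strip account identifiers before review."""
    reviewed = []
    for snippet in snippets:
        if random.random() < REVIEW_RATE:
            reviewed.append({"audio": snippet["audio"], "account_id": None})
    return reviewed
```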
Google goes on to say it gives users a wide variety of tools to review the audio stored by Google Assistant devices, including the ability to delete those audio snippets manually and set up auto-delete timers. “We’re always working to improve how we explain our settings and privacy practices to people, and will be reviewing opportunities to further clarify how data is used to improve speech technology,” Monsees concludes.
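A minimal sketch of how such an auto-delete timer might behave, assuming each recording carries a timestamp (the 90-day window below mirrors the three-month auto-delete option; the code itself is an invented illustration):

```python
from datetime import datetime, timedelta, timezone

# Assumes each recording is a dict with a timezone-aware "recorded_at"
# timestamp. Everything here is illustrative, not Google's implementation.

RETENTION = timedelta(days=90)

def purge_expired(recordings: list[dict], now: datetime | None = None) -> list[dict]:
    """Keep only recordings younger than the retention window."""
    now = now or datetime.now(timezone.utc)
    return [r for r in recordings if now - r["recorded_at"] < RETENTION]
```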
What the blog post doesn’t address is the overall number of requests that workers around the world are reviewing for general natural language improvements, not just to make sure translations are accurate.
It’s widely understood in the artificial intelligence industry that human annotators are required to help make sense of raw AI training data. Companies like Amazon and Google employ these workers and give them access to audio recordings and text transcripts of some conversations between users and smart home devices. That way, humans can review the exchanges, properly annotate the data, and log any errors so software platforms like Google Assistant and Amazon Alexa can improve over time.
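For a sense of what that annotation work produces, here is a hypothetical record structure (the field names are invented for illustration): the reviewer supplies a corrected transcript and flags whether the model misheard, and those corrections feed back into training.

```python
from dataclasses import dataclass

# Hypothetical shape of what a human reviewer produces; the field names
# are invented for illustration. The reviewer corrects the machine's
# transcript and flags errors so the speech model can be retrained.

@dataclass
class AnnotationRecord:
    snippet_id: str           # opaque ID, not tied to a user account
    machine_transcript: str   # what the speech model thought it heard
    human_transcript: str     # what the reviewer says was actually said
    misrecognized: bool       # True when the model got it wrong

record = AnnotationRecord(
    snippet_id="abc123",
    machine_transcript="turn off the lights in the pitchen",
    human_transcript="turn off the lights in the kitchen",
    misrecognized=True,
)
```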
But neither Amazon nor Google has ever been fully transparent about this, and it’s led to a number of controversies over the years that have only intensified in the last few months. Ever since Bloomberg reported in April on Amazon’s extensive use of human contract workers to train Alexa, big tech companies in the smart home sector have been forced to own up to how these products and AI platforms are developed, maintained, and improved over time.
Often, the answer to those questions is small armies of human employees listening to recorded conversations and reading transcripts as they input data for the underlying machine learning algorithms to digest. Yet there’s no mention of that on the Google Home privacy policy page. There are also GDPR implications for European users when this level of data collection happens without proper communication and consent on the user’s end.
If you want this data deleted, you have to jump through quite a few hoops. And in the case of Amazon and Alexa, some of that data is stored indefinitely even after a user decides to delete the audio, the company revealed just last week. Google’s privacy controls appear to be more robust than Amazon’s — Google lets you turn off audio data storage completely. But both companies are now contending with a broader public awakening to how AI software is being beta tested and tinkered with in real time, all while it powers devices in our bedrooms, kitchens, and living rooms.
In this case, we have a Belgian news organization that says it identified roughly 150 Google Assistant recordings (about 15 percent of the 1,000 snippets provided by a contract worker) that were accidentally captured, with no wake word uttered. That the employee in question was able to get this data so easily, violating user privacy and Google’s apparent safeguards, is disconcerting. Even more troubling is how the worker says he was able to piece together sensitive happenings inside users’ homes, like a potential threat of physical violence captured by a false accept in which the worker heard a female voice that sounded as if it were in distress.
It’s clear that owning a Google Home or a similar Assistant device, and allowing it to listen to your sensitive daily conversations and verbalized internet requests, involves at least some type of privacy compromise. Using any Google product does, because the company makes money by collecting that data, storing it, and selling targeted ads against it. But these findings contradict Google’s claims that it’s doing seemingly everything it can to protect its users’ privacy, and that its software isn’t listening unless the wake word is uttered. Clearly, someone is in fact listening, somewhere else in the world. And sometimes they’re not supposed to be.