How Vint Cerf shed light on Google’s disinformation mess

In June 2020, the UK Parliament released a policy report with numerous recommendations to help the government tackle the internet-powered ‘disinformation pandemic’. The report is blunt in its conclusions: “Platforms like Facebook and Google seek to hide behind ‘black box’ algorithms that choose the content displayed to users. They consider that their decisions are not responsible for any damages that may result from online activity. It’s totally false.”

In preparing this report, Parliament gathered oral testimony from various key figures. One of them was Vint Cerf, a legendary internet pioneer who is now vice president and chief internet evangelist at Google. He was asked: “Can you provide us with evidence that high quality information, as you describe it, that you promote is more likely to be true or in the ‘the earth is not flat’ category, rather than the ‘the earth is flat’ category?” His intriguing response offered a rare glimpse behind Google’s tightly closed doors:

“The amount of information on the World Wide Web is extraordinarily large. There are billions of pages. We don’t have the ability to manually rate all of this content, but we have about 10,000 people, as part of our Google family, rating websites. . . . In the case of search, we have a 168-page document devoted to how you determine the quality of a website. . . . Once we have sample web pages that have been rated by these reviewers, we can take what they’ve done and the web pages their reviews apply to, and build a machine learning neural network that reflects the quality they were able to assert for the web pages. These web pages become the training set for a machine learning system. The machine learning system is then applied to all the web pages that we index on the World Wide Web. After this application is complete, we use this information and other metrics to rank the responses that come back from a web search.”

He summed it up as follows: “There is a two-step process. There is a manual process to establish a good quality criteria and training package, and then a machine learning system to scale to the size of the World Wide Web, which we index.”

Many blog posts and official Google statements about the company’s efforts to elevate quality journalism refer to this team of 10,000 human reviewers, so to delve deeper into Cerf’s dense statement, it helps to understand what these people do and how their work affects the algorithm. Fortunately, an overview of the work of Google’s evaluators was provided in a Wall Street Journal investigation from November 2019.

While Google employees are very well paid, those 10,000 reviewers are hourly contractors who work from home and earn about $13.50 an hour. One of these workers featured in the Wall Street Journal article said he was required to sign a nondisclosure agreement, had no contact with anyone at Google, and was never told what his work would be used for (and remember, these are the people Cerf called “part of our Google family”). The contractor said he was given hundreds of actual search results and told to use his judgment to evaluate them based on quality, reputation and usefulness, among other factors. The main job of these workers, it seems, is to evaluate individual sites as well as to assess the rankings that Google returns for various searches. These tasks are closely guided by the 168-page document provided to these workers. Sometimes workers also received notes from Google, through their contracted employment agencies, telling them the “correct” results for some searches. For example, at one point, the search phrase “the best way to kill myself” was returning how-to manuals, and contract workers were sent a note saying that all suicide-related searches should return the National Suicide Prevention Lifeline as the top result.

This window on the work of evaluators, however brief, helps us unpack Cerf’s testimony. Google employees, presumably high-level, are making sweeping decisions about how the search algorithm should perform on various topics and in various situations. But rather than trying to implement those decisions directly in the computer code of the search algorithm, they codify them in the instruction manual that is sent to reviewers. Reviewers then manually score sites and search rankings according to this manual, but even with this army of 10,000 reviewers, there are far too many sites and searches to go through manually. So, as Cerf explained, these manual assessments provide the training data for supervised learning algorithms whose job is essentially to extrapolate those assessments so that, hopefully, all searches, not just those that have been manually assessed, behave as Google’s decision-makers intend.
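The two-step process Cerf describes can be sketched in a few lines of Python. This is only a toy illustration under loud assumptions: the three page features, the handful of rated examples, and the page names are all hypothetical, and a single-neuron logistic model stands in for whatever neural network Google actually trains. It shows the general shape of the idea, human ratings as training labels followed by a model that scores pages no human has reviewed, and nothing about Google's real system.

```python
import math

# Step 1 (manual): a few pages scored by human raters.
# Hypothetical features: (cites_sources, has_author_byline, clickbait_score).
# Label 1 = rater judged the page high quality, 0 = low quality.
rated_pages = [
    ((1.0, 1.0, 0.1), 1),
    ((1.0, 0.0, 0.2), 1),
    ((0.0, 0.0, 0.9), 0),
    ((0.0, 1.0, 0.8), 0),
]

def predict(weights, bias, features):
    """Estimated probability that a page is high quality."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=2000, lr=0.5):
    """Fit a single-neuron (logistic) model with stochastic gradient descent."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            error = predict(weights, bias, features) - label
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias

# Step 2 (machine): extrapolate the raters' judgments to unrated pages.
weights, bias = train(rated_pages)
unrated_pages = {
    "page_a": (1.0, 1.0, 0.0),  # hypothetical page no rater has seen
    "page_b": (0.0, 0.0, 1.0),
    "page_c": (1.0, 0.0, 0.5),
}
scores = {name: predict(weights, bias, f) for name, f in unrated_pages.items()}

# The learned score is then one signal used to order results.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Even in this toy version, the policy choices live entirely in the labels: change what the raters are told to mark as high quality, retrain, and every unrated page is scored differently.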

While some of the notable updates to the Google search algorithm have been publicly announced by the company, Google actually changes its algorithm very often. In fact, the same investigation we just mentioned also revealed that Google changed its algorithm over 3,200 times in 2018. And the number of algorithm tweaks has grown rapidly: in 2017 there were around 2,400, and in 2010 there were only about 500. Google has developed a comprehensive process to approve all of these algorithm adjustments, including asking reviewers to experiment and report on the impact on search rankings. This gives Google an idea of how the tweaks will work in practice before releasing them to Google’s massive user base. For example, if certain adjustments are intended to downgrade the ranking of fake news sites, reviewers can see whether this is actually happening in the searches they attempt.

Let me come back to Vint Cerf now. Shortly after the question that led to his description of Google’s “two-step” process I quoted above, the committee chair asked Cerf another important, and rather pointed, question: “Your algorithm has caught inaccurate information, that Muslims don’t pay house tax, that went straight to the top of your search results and was picked up by your voice assistant. It’s catastrophic; something like that can start a riot. Obviously 99% of what you are doing is not likely to do it. How sensitive are your algorithms to this type of error?”

Again, Cerf’s frank response was quite intriguing. He said neural networks (the modern AI framework) are “brittle,” which means that sometimes tiny input changes can lead to surprisingly bad outputs. Cerf clarified:

“Your reaction to that is, ‘WTF? How could this have happened?’ The answer is, these systems don’t recognize things the same way we do. We abstract from images. We recognize cats as having small triangular ears, fur, and tails, and we’re pretty sure fire trucks don’t. But the mechanical recognition system in machine learning systems does not work the same as our brains. We know they can be brittle, and you just cited a very good example of this kind of brittleness. We are working to eliminate these issues or identify where they might arise, but it remains an important area of research. To your main question, are we aware of sensitivity and potential failure modes? Yes. Do we know how to avoid all of these failure modes? No, not yet.”
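The brittleness Cerf mentions can be illustrated with a toy example. The sketch below is not Google's system: the weights, the 100 features, and the step size `epsilon` are all made up for illustration, and a plain linear scorer stands in for a neural network. It shows one commonly cited way this kind of failure arises: when a model sums over many inputs, a nudge that is tiny on each individual input can add up to a complete reversal of the output.

```python
import math

# Hypothetical weights of an already-trained linear scorer over 100 features.
weights = [0.5 if i % 2 == 0 else -0.5 for i in range(100)]
bias = 0.0

def score(features):
    """Model's confidence (0 to 1) that the input is 'reliable'."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# An input the model rates confidently as reliable.
original = [0.1 if w > 0 else 0.0 for w in weights]

# Nudge every feature by at most epsilon, each in the direction that
# lowers the score (against the sign of its weight).
epsilon = 0.1
perturbed = [x - epsilon * (1 if w > 0 else -1) for x, w in zip(original, weights)]

largest_change = max(abs(p - o) for p, o in zip(perturbed, original))
print(f"largest change to any single feature: {largest_change:.2f}")
print(f"score before: {score(original):.3f}")
print(f"score after:  {score(perturbed):.3f}")
```

No single feature moves by more than `epsilon`, yet the model's verdict flips from confident acceptance to confident rejection, because the small changes accumulate across all 100 inputs.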

