We used publicly available, anonymized, and aggregated national-level data from Google’s Symptom Search Dataset (SSD), which reports, with well-documented privacy protections31, the relative frequency of Internet searches for 420 signs, symptoms, and health conditions. For comparison, we used data from (1) the Centers for Disease Control and Prevention’s (CDC) National Syndromic Surveillance Program (NSSP), which tracks emergency department (ED) visits for various health conditions from facilities across 48 US states6, and (2) the US Census Bureau’s Household Pulse Survey (HPS), which assesses the social and economic impact of the pandemic7. The key features of these datasets are summarized in Table 1.
SSD is publicly available30 and provides daily and weekly time series of the relative volume of searches in the United States, in English or Spanish, for common symptoms and conditions. The data are available at national, state, and county levels in the US and five other English-speaking countries. Search queries relating to each symptom are aggregated, anonymized using differential privacy32, and then normalized by the total search volume in that region, as detailed elsewhere31.
SSD was created by leveraging Google’s web search tools that map queries onto Knowledge Graph33,34 entities by continuously learning the associations between words in user queries and the entities described in web pages viewed following those queries. The 420 symptoms and conditions included in SSD represent the most frequently searched entities (by query volume). Each entity (symptom or condition) is associated with tens or hundreds of thousands of individual queries issued by Google users on desktop computers or mobile devices. Quotation marks and capitalization in queries are ignored and spelling mistakes are autocorrected. Sample queries included [lexapro], [depression test], or [signs of depression] for depression; [trazodone], [agoraphobia] or [panic attack] for anxiety; and [I want to die], [how to die] and [I want to kill myself] for suicidal ideation.
For the current study, we focused on SSD search queries related to anxiety, depression, and suicidal ideation from January 1, 2018 through December 31, 2020. We chose these entities a priori because they represent common conditions that are frequently searched for, and because of their high relevance to population mental health. We also considered searches related to motion sickness as a putative negative control in a subset of our analyses.
We compared national-level, weekly data on Internet searches as measured by SSD to national-level data on ED visits as reported by the NSSP. The NSSP is a collaboration led by the CDC to collect, analyze, and share electronic health data from approximately 3500 emergency departments, urgent and ambulatory care centers, inpatient healthcare settings, and laboratories (hereafter collectively referred to as ED facilities) across 48 states (excluding Hawaii and Wyoming) and Washington DC6. These facilities account for approximately 70% of all US ED facilities. The data used in this analysis were previously used by Holland et al. (2021)20 and are reused in the current study with permission from the authors.
We focused on two variables reported by Holland et al. (2021)20: (1) national counts of weekly ED visits for mental health conditions associated with natural or human-originated disasters, such as stress, anxiety, symptoms consistent with acute stress disorder or posttraumatic stress disorder, and panic, and (2) national counts of weekly suicide attempts. The dataset included weekly ED visit counts from December 30, 2018 to October 10, 2020.
We additionally compared Internet search data to data from the HPS. The HPS is a national survey designed to measure the impacts of the COVID-19 pandemic on the economic, physical, and mental health of American households7. Phase 1 of the survey took place between April 23, 2020 and July 21, 2020, Phase 2 took place from August 19, 2020 to October 26, 2020, and Phase 3 took place between October 28, 2020 and March 29, 2021. Although the survey is still ongoing, in the current analysis we used HPS data from these three phases35.
Questions regarding symptoms of anxiety and depression were administered in all phases of the survey, while questions regarding mental health care were included in Phases 2 and 3. Symptoms of anxiety and depression were assessed with four items adapted from the two-item Patient Health Questionnaire (PHQ-2) and the two-item Generalized Anxiety Disorder (GAD-2) questionnaires. For each question, responses covered the last 7 days and were coded as follows: not at all = 0, several days = 1, more than half the days = 2, and nearly every day = 3. Scores for anxiety and depression were obtained by summing responses across the two questions for each construct. The percentage of respondents scoring 3 or above on these summed scores is used in analyses of survey results. Items indexing mental health care assessed the percentage of adults who reported, for the past 4 weeks, taking prescription medication, receiving counseling or therapy from a mental health professional, or needing counseling or therapy from a mental health professional but not receiving it (i.e., unmet needs).
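The scoring rule described above can be illustrated with a minimal Python sketch. The original analyses were conducted in R; the function names, the example respondents, and the unweighted percentage are assumptions made here for illustration only, and published HPS estimates may apply survey weights that this sketch ignores.

```python
# Response coding as described in the text (0-3 per item).
RESPONSE_CODES = {
    "not at all": 0,
    "several days": 1,
    "more than half the days": 2,
    "nearly every day": 3,
}

CUTOFF = 3  # a summed score of 3 or above counts toward the reported percentage


def construct_score(item_a, item_b):
    """Sum the two coded items for one construct (possible range 0-6)."""
    return RESPONSE_CODES[item_a] + RESPONSE_CODES[item_b]


def percent_at_or_above_cutoff(responses):
    """Unweighted percentage of respondents whose summed score is >= CUTOFF.

    `responses` is a list of (item_a, item_b) answer pairs for one construct.
    Ignoring survey weights is a simplification of this sketch.
    """
    if not responses:
        raise ValueError("no responses supplied")
    hits = sum(1 for a, b in responses if construct_score(a, b) >= CUTOFF)
    return 100.0 * hits / len(responses)


# Hypothetical respondents, for illustration only.
example = [
    ("not at all", "several days"),               # score 1
    ("several days", "more than half the days"),  # score 3
    ("nearly every day", "nearly every day"),     # score 6
    ("not at all", "not at all"),                 # score 0
]
print(percent_at_or_above_cutoff(example))  # 2 of the 4 respondents reach the cutoff
```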
We first used graphical approaches and descriptive statistics to identify temporal patterns in Internet searches related to anxiety, depression, and suicidal ideation. We then fit a generalized linear model with a log link function to quantify the changes in relative search volumes associated with the weeks of the Thanksgiving and Christmas holidays and the onset of the COVID-19 pandemic (defined as the first 4 weeks of March 2020), adjusting for calendar year and season.
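One way to write such a model is sketched below. The text does not specify the distribution family or the exact coding of the covariates, so the separate holiday indicators, the categorical year and season terms, and all symbols are assumptions made for illustration.

```latex
\log \mathbb{E}[Y_t] = \beta_0
  + \beta_1\,\mathrm{Thanksgiving}_t
  + \beta_2\,\mathrm{Christmas}_t
  + \beta_3\,\mathrm{Pandemic}_t
  + \sum_{k} \gamma_k\,\mathrm{Year}_{k,t}
  + \sum_{s} \delta_s\,\mathrm{Season}_{s,t}
```

Here $Y_t$ is the relative search volume in week $t$, and each indicator equals 1 in the corresponding weeks and 0 otherwise. Under a log link, $\exp(\beta_j)$ is the multiplicative change in expected search volume associated with indicator $j$, so $100\,(\exp(\beta_j) - 1)$ gives the corresponding percent change.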
Second, we quantified the change in search volumes associated with the pandemic by calculating the percent change in search frequency for each topic versus the same week 1 year earlier for the period from January 1, 2020 through December 31, 2020. We similarly estimated the change in rates of ED visits for mental health symptoms and suicide attempts from the NSSP.
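The year-over-year comparison can be sketched as follows. This is a Python illustration rather than the R code used in the study; matching "the same week 1 year earlier" by a 52-week offset, the function name, and the example values are all assumptions.

```python
from datetime import date, timedelta


def percent_change_vs_prior_year(weekly):
    """Percent change versus the observation 52 weeks earlier.

    `weekly` maps each week's start date to a value (for example, relative
    search volume or an ED visit count). Weeks without a nonzero baseline
    52 weeks earlier are skipped.
    """
    changes = {}
    for week_start, value in weekly.items():
        prior = weekly.get(week_start - timedelta(weeks=52))
        if prior is None or prior == 0:
            continue  # no comparable baseline week
        changes[week_start] = 100.0 * (value - prior) / prior
    return changes


# Hypothetical values, for illustration only.
series = {
    date(2019, 3, 17): 80.0,
    date(2019, 3, 24): 100.0,
    date(2020, 3, 15): 100.0,
    date(2020, 3, 22): 90.0,
}
for week, change in sorted(percent_change_vs_prior_year(series).items()):
    print(week, round(change, 1))
```

Only the two 2020 weeks appear in the result, because the 2019 weeks have no baseline in this toy series.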
Third, we computed pairwise Pearson correlation coefficients between contemporaneous measures derived from SSD, NSSP, and HPS. Results were not materially different when using Spearman rather than Pearson correlation coefficients. We additionally used scatter plots to visualize the relationship between specific pairs of markers in more depth. In sensitivity analyses, we considered the possibility of a 1- or 2-week lag between changes in search volumes and changes in rates of ED visits for mental health conditions or suicide attempts. Specifically, we used a generalized linear model with a log link function to quantify the relative change in ED visits associated with searches the same week, the previous week, and 2 weeks earlier. We fit separate models for each search concept. All analyses were conducted using R (version 4.0.2). The code to replicate these analyses is publicly available via GitHub at https://github.com/anthonysun95/Google_SSD_and_Mental_Health.
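For readers who want a compact picture of the correlation step and of how weekly series are aligned under a lag, a minimal Python sketch follows. It is not the authors' R code. The lag sensitivity analysis in the study used a generalized linear model, whereas this sketch only shows the simpler idea of pairing ED visits in week t with searches in week t minus the lag. Function names and data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def lagged_pearson(searches, visits, lag):
    """Correlate visits in week t with searches in week t - lag (lag >= 0)."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(searches, visits)
    # Drop the last `lag` search weeks and the first `lag` visit weeks
    # so that the remaining positions line up.
    return pearson(searches[:-lag], visits[lag:])


# Hypothetical weekly series in which visits follow searches by one week.
searches = [1, 3, 2, 5, 4, 6]
visits = [0, 1, 3, 2, 5, 4]
for lag in (0, 1, 2):
    print(lag, round(lagged_pearson(searches, visits, lag), 3))
```

In this toy series the correlation is highest at a lag of 1 week, because the visit series was built as a one-week-delayed copy of the search series.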