January 19, 2009 By Chad Vander Veen
For much of its history, Google has been a widely admired company that could seemingly do no wrong. But in recent years, some observers have cast a suspicious eye at the search giant. From censoring content in China to accusations of invading user privacy at the behest of the U.S. government, the company with the motto "Don't be evil" has lost some of its luster.
If its image has been tarnished, much of the blame stems from Google's ability to intimately track users' Web browsing habits. Though Google is by far the most popular site for searching the Web, users are growing more uncomfortable with the notion that they may be under the lens of Google's microscope.
But what if Google could use its considerable power for good? The company will tell you that's what it's always done. If you want proof, look no further than Flu Trends, a remarkably simple service Google devised to help the nation's health officials get an upper hand during flu season.
If advertisers can determine your shopping trends based on Web searches, health officials should be able to monitor health trends the same way. That's the underlying, albeit simplified, rationale behind "Detecting influenza epidemics using search engine query data," a paper that appeared in the November 2008 issue of Nature. The authors - Jeremy Ginsberg, Matthew H. Mohebbi, Rajan S. Patel, Mark S. Smolinski and Larry Brilliant of Google and Lynnette Brammer of the Centers for Disease Control and Prevention (CDC) - analyzed years of search terms and concluded they could develop a model to quickly identify influenza outbreaks.
"By processing hundreds of billions of individual searches from five years of Google Web search logs, our system generates more comprehensive models for use in influenza surveillance, with regional and state-level estimates of ILI (influenza-like illness) activity in the United States," they wrote.
The authors gathered historical logs of Google search queries from 2003 to 2008. From that data they developed a formula to track how often each of the 50 million most common U.S. search queries occurred during that time. The formula was then refined to narrow the tracking to ILI-related searches. The resulting search trends were compared to data gathered by the CDC across its nine public health regions. The CDC's influenza-surveillance data is gathered by 1,500 doctors who report to the CDC on 16 million annual physician visits concerning ILI - a process that can take several weeks. It turned out that the researchers' Web query analysis produced trends similar to those discovered by the CDC.
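To make the comparison step concrete, here is a minimal sketch of how candidate queries might be scored against CDC figures. It is an illustration only, not the authors' formula: the weekly numbers are invented, the query names are placeholders, and the helper names pearson and rank_queries are hypothetical. It assumes each query is summarized as a weekly share of all searches and is ranked by how closely that share tracks the CDC's weekly ILI percentage.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length weekly series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no trend information
    return cov / sqrt(var_x * var_y)


def rank_queries(query_shares, ili_percent):
    """Sort queries from most to least correlated with the ILI series."""
    scores = {q: pearson(series, ili_percent) for q, series in query_shares.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented example data: six weeks of CDC-style ILI percentages and,
# for each query, its share of all searches in the same weeks.
ili_percent = [1.0, 1.5, 2.5, 4.0, 3.0, 1.8]
query_shares = {
    "flu symptoms": [0.10, 0.14, 0.26, 0.41, 0.29, 0.17],
    "basketball scores": [0.30, 0.28, 0.31, 0.29, 0.30, 0.31],
    "sunscreen": [0.40, 0.33, 0.20, 0.10, 0.18, 0.30],
}

ranked = rank_queries(query_shares, ili_percent)
for query, score in ranked:
    print(f"{query}: {score:+.2f}")
```

In this toy data the flu-related query rises and falls with the ILI series and ranks first, while the unrelated queries do not. A real system would also have to guard against queries that merely share flu's seasonal timing.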
"Google Web search queries can be used to estimate ILI percentages accurately in each of the nine public health regions of the United States," according to the authors. "Because search queries can be processed quickly, the resulting ILI estimates were consistently one to two weeks ahead of CDC ILI surveillance reports. The early detection provided by this approach may become an important line of defense against future influenza epidemics in the United States, and perhaps eventually in international settings."
The authors are quick to note, however, that their model is not intended to replace the sort of on-the-ground surveillance conducted by the CDC. Instead, Google Flu Trends is designed to help public health officials spot an outbreak in its earliest stages. "This system is not designed to be a replacement for traditional surveillance networks or supplant the need for laboratory-based diagnoses and surveillance. Notable increases in ILI-related search activity may indicate a need for public health inquiry to identify the pathogen or pathogens involved. Demographic data, often provided by traditional surveillance, cannot be obtained using search queries," the authors wrote.
"In the event that a deadly strain of influenza emerges, accurate and early detection of ILI percentages may enable