The study by academics at Warwick Business School, UK, found that while internet users in US states with higher birth rates search for more information about pregnancy, those in states with lower birth rates look up more information about cats.
However, analysis of the relationship between Google search data and the number of infant deaths per 1,000 births showed that internet users in US states with higher infant mortality rates search for more information on credit, loans and sexually transmitted diseases.
Using the Google Correlate service, the researchers, Adrian Letchford, Tobias Preis and Suzy Moat of Warwick Business School’s Data Science Lab, sifted through millions of potential correlations with birth rates and infant mortality rates.
However, the researchers highlight that such analyses require careful statistical precautions to avoid identification of spurious correlations – a danger when handling large datasets.
Their paper, titled “Quantifying the search behaviour of different demographics using Google Correlate” and published in PLOS ONE, introduces a method to address this problem.
“To carry out an analysis like this, our method has to take into account that Google users search for a huge number of different phrases,” said Dr Letchford, Research Fellow in Data Science. “This means that we would expect to find some phrases with strong correlations by chance.
“We compared large amounts of random data for US states with Google search data for those states. Once we know how strong the correlations between random data and Google search data tend to be, we can work out whether the correlations we see between socioeconomic data and Google search data are likely to be spurious or not.”
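The idea Dr Letchford describes can be illustrated with a minimal sketch. It is not the procedure from the paper: the function names, the Gaussian random data, the 95% cut-off and the synthetic "search" series below are all assumptions made for illustration. The sketch correlates many random per-state series with a pool of search series, records the strongest correlation found each time, and uses that distribution as a benchmark for what chance alone can produce.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def strongest_correlation(target, search_series):
    """Largest absolute correlation between target and any search series."""
    return max(abs(pearson(target, s)) for s in search_series)


def null_threshold(search_series, n_states, n_trials=100, quantile=0.95, seed=0):
    """Correlation strength that random per-state data rarely exceeds.

    Hypothetical helper: each trial draws random values for every state,
    finds the best-matching search series, and keeps that correlation.
    The chosen quantile of those values is returned as the benchmark.
    """
    rng = random.Random(seed)
    strongest = sorted(
        strongest_correlation([rng.gauss(0, 1) for _ in range(n_states)], search_series)
        for _ in range(n_trials)
    )
    return strongest[min(int(quantile * n_trials), n_trials - 1)]


# Synthetic stand-ins only: no real Google or CDC data is used here.
rng = random.Random(42)
n_states = 50
birth_rate = [rng.gauss(0, 1) for _ in range(n_states)]
search_series = [[rng.gauss(0, 1) for _ in range(n_states)] for _ in range(200)]
# A made-up phrase whose search volume closely tracks the birth rate.
planted = [b + rng.gauss(0, 0.1) for b in birth_rate]

threshold = null_threshold(search_series, n_states)
planted_r = abs(pearson(birth_rate, planted))
print(f"chance-level threshold: {threshold:.2f}")
print(f"planted phrase correlation: {planted_r:.2f}, above threshold: {planted_r > threshold}")
```

Taking the strongest correlation in each trial, rather than an average, mirrors the point made in the quote: with a huge number of phrases to choose from, the best match for pure noise can already look impressive, so a real finding has to beat that benchmark.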
The study analysed data from the Centers for Disease Control and Prevention on the number of births per 1,000 people in each US state and the number of infant deaths per 1,000 births in each state.
“Using our approach, we find that people in states with higher birth rates search more frequently for phrases like 'pregnancy workout', 'pregnancy calendar' and 'baby constipation',” said Dr Letchford.
Suzy Moat and Tobias Preis lead the MOOC “Big Data: Measuring and Predicting Human Behaviour” on the FutureLearn platform. The current course started in March and can be joined for free at https://www.futurelearn.com/courses/big-data.
Suzy Moat is an Associate Professor of Behavioural Science at Warwick Business School, where she co-directs the Data Science Lab. Her research investigates whether data on our usage of the Internet, from sources such as Google, Wikipedia and Flickr, can help us measure and even predict human behaviour in the real world.
Tobias Preis is an Associate Professor of Behavioural Science and Finance at Warwick Business School. His recent research has aimed to carry out large scale experiments on complex social and economic systems by exploiting the volumes of data being generated by our interactions with technology.
Warwick Business School, located in central England, is the largest department of the University of Warwick and the UK's fastest rising business school according to the Financial Times.