Big Data epidemiology: Turning a trend into useful preventive medicine

Session organisers: Maya Wardeh and Marie McIntyre, Institute of Infection and Global Health, University of Liverpool, UK

The phrases ‘Big Data’ and ‘Big Data Epidemiology’ are increasingly discussed in infectious disease research, but what does this mean? Could Big Data be useful within your work? If so, what are the best ways to integrate such approaches in your research topic?

In this workshop we aim to provide an introduction to Big Data approaches in the clinical sciences and infectious disease surveillance: what Big Data means, how it is done, and what its limitations are.

We will focus on the four Vs of Big Data, discussing examples of technologies and applications. The discussion will highlight how you can bridge gaps in your knowledge so that you can manipulate and aggregate data from various sources.

Learning outcomes

Big Data approaches are about bringing together and analysing multiple data sources, including clinical health records, disease surveillance data, climate data or phone use data. By structuring and preparing data and project ideas properly, the ‘big’ research questions can be investigated. The aim of this workshop is to provide an overview of what Big Data Epidemiology is, and to help participants answer the following questions:

  • What is your perception of big data; what do you think it is?
  • How can it be applied in your work?
  • What are the four Vs of big data? How does each affect your work?
  • What are the limitations to using big data?


No prior knowledge of the subject is needed; the workshop is aimed at anyone with an interest in Big Data epidemiology, regardless of their experience.

Content and structure

This workshop will be an interactive introduction to big data, big data approaches and best practices in epidemiology, including:

  • Big data in global health.
  • Big data in disease surveillance.
  • Big data analytics.
  • Sources of big data, including health records (and their ethical constraints), animal movement datasets, social media, scientific literature (and supporting materials), genetic sequences, and climate, satellite and geographical datasets.
  • An exercise in which participants classify what is and what is not big data, and discuss research opportunities.
  • Best practices for dealing with big data: acquiring, cleaning, storing and compressing.
  • Demonstrations of big data research in epidemiology and global health: modelling outbreaks, mapping diseases, and predicting outbreaks and disease emergence.
  • The limitations of big data, and why ‘big’ is not always better.


The session organisers will bring recent examples of Big Data-driven research in epidemiology and global health, and of Big Data health and surveillance paradigms and applications. Examples of data sources and a guide on how to create Big Data from various sources will be provided. Examples of tools and scripts that can be used to acquire big data will be discussed.

Maximum number of participants

30 people


Dr. Maya Wardeh

Maya is a computer and data scientist and has worked in both the human and animal fields, in academic and industrial environments. Her research topics include: complex network analysis and modelling, particularly networks of shared pathogens amongst host species; detection of transmission routes of pathogens; and data and text mining of publicly available sources as well as human and veterinary health records. She has developed various Big Data tools and solutions in the areas of infectious diseases, human health, veterinary surveillance and neglected tropical diseases. She has taught various computer science topics including programming, algorithms, spatial and complex databases, and artificial intelligence.

Dr. K. Marie McIntyre

Marie is an epidemiologist and has worked in both the human and animal fields, in academic and applied agency environments. Her research topics include: eHealth methodologies to aid the detection, identification and collation of information on infectious diseases; examining drivers of human and animal disease and the characteristics for emergence and further transmission; modelling the spatial distribution of pathogens/diseases; developing methodologies for prioritisation of the impact of diseases/pathogens; and studying the epidemiology of certain diseases, e.g. zoonoses, emerging infections, gastrointestinal diseases in humans, scrapie, atypical scrapie and Bluetongue virus. She has taught statistics and epidemiology to students and professionals using various methods for longer than she’d like to admit!