DATASETS

This page provides a collection of datasets obtained through the SocioPatterns sensing platform.

Contacts in a workplace

Release date: Jun 23, 2016

This data set contains the temporal network of contacts between individuals measured in an office building in France, from June 24 to July 3, 2013. This network was described and analyzed in the publication “Data on face-to-face contacts in an office building suggest a low-cost vaccination strategy based on community linkers” by M. Génois et al., published in Network Science 3, 326 (2015).
The data set comprises two files. The first one contains a tab-separated list representing the active contacts during 20-second intervals of the data collection. Each line has the form “t i j”, where i and j are the anonymous IDs of the persons in contact, and the interval during which this contact was active is [ t – 20s, t ] (t is expressed in seconds since the time origin taken as 0:00 on June 24, 2013).
The second file contains a list of the form “i Di” where i is the anonymous ID of an individual and Di the name of his/her department in the workplace.
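
As a rough illustration of how the “t i j” contact list can be processed, the following Python sketch sums the time each pair spent in contact. The function names and the sample lines are invented for this example and are not part of the data set; it assumes only what is stated above, namely that each line marks one active 20-second interval.

```python
from collections import defaultdict
import io

INTERVAL = 20  # seconds covered by each line, per the description above


def read_contacts(handle):
    """Yield (t, i, j) tuples from a 't i j' contact list."""
    for line in handle:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        t, i, j = fields[:3]
        yield int(t), i, j


def cumulative_durations(contacts):
    """Return {(i, j): total seconds in contact}, ignoring pair order."""
    totals = defaultdict(int)
    for _t, i, j in contacts:
        totals[tuple(sorted((i, j)))] += INTERVAL
    return dict(totals)


# Invented sample lines, standing in for the real file.
sample = io.StringIO("20\t1\t2\n40\t1\t2\n40\t2\t3\n")
durations = cumulative_durations(read_contacts(sample))
```

With a real file, an open file handle can be passed in place of the in-memory sample.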

Kenyan households contact network

Release date: Jun 20, 2016

This dataset contains the full list of contacts measured between members of 5 households in rural Kenya between April 24 and May 12, 2012.

Results from the analysis of this dataset have been published in EPJ Data Science, 5(1), 1-21 (2016).

Each file in the downloadable package contains a comma-separated list representing each measured contact between any two household members (member 1 and member 2) over three days of experiment. The first file stores the contacts recorded between members of the same household, the second file stores the contacts between members of different households.

Each line has the form: “h1, m1, h2, m2, age1, age2, sex1, sex2, duration, day, hour”, where:

  • h1 is the household of member 1; h1=[L, F, E, B, H]
  • m1 is the anonymous ID number of member 1;
  • h2 is the household of member 2; h2=[L, F, E, B, H]
  • m2 is the anonymous ID number of member 2;
  • age1 is the age of member 1; age1 = [0, 1, 2, 3, 4]
  • age2 is the age of member 2; age2 = [0, 1, 2, 3, 4]
  • sex1 is the gender of member 1; sex1 = [F, M]
  • sex2 is the gender of member 2; sex2 = [F, M]
  • duration is the duration of the contact event in seconds;
  • day is the day of experiment; day = [1, 2, 3]
  • hour is the time of day of the contact event; hour = [7 – 20]

A more detailed description of the variables is available in the variable dictionary file.
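
The line format above maps naturally onto named fields. The sketch below is one possible way to read it in Python; it assumes the files have no header row (not confirmed here), and the sample rows and function name are invented for illustration.

```python
import csv
import io

# Field order as given in the line format above.
FIELDS = ["h1", "m1", "h2", "m2", "age1", "age2",
          "sex1", "sex2", "duration", "day", "hour"]


def read_household_contacts(handle):
    """Yield one dict per contact; assumes no header row in the file."""
    reader = csv.DictReader(handle, fieldnames=FIELDS, skipinitialspace=True)
    for row in reader:
        # Only the clearly numeric fields are converted; IDs and
        # categories stay as strings.
        for key in ("duration", "day", "hour"):
            row[key] = int(row[key])
        yield row


# Invented sample rows, not taken from the data set.
sample = io.StringIO(
    "L,1,L,2,3,1,F,M,40,1,8\n"
    "L,1,F,5,3,2,F,F,20,1,9\n"
    "E,3,E,4,0,4,M,F,60,2,14\n"
)
rows = list(read_household_contacts(sample))
within = sum(r["duration"] for r in rows if r["h1"] == r["h2"])
between = sum(r["duration"] for r in rows if r["h1"] != r["h2"])
```

Here `within` and `between` are the total contact durations inside and across households for the sample rows.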

 

Primary school temporal network data

Release date: Sep 30, 2015

This data set contains the temporal network of contacts between the children and teachers used in the study published in BMC Infectious Diseases 2014, 14:695. The file contains a tab-separated list representing the active contacts during 20-second intervals of the data collection. Each line has the form “t i j Ci Cj”, where i and j are the anonymous IDs of the persons in contact, Ci and Cj are their classes, and the interval during which this contact was active is [ t – 20s, t ]. If multiple contacts are active in a given interval, you will see multiple lines starting with the same value of t. Time is measured in seconds.
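
One common use of the “t i j Ci Cj” format is to count how many 20-second contact intervals were recorded between each pair of classes. The following is a minimal sketch under that reading; the class labels and sample lines are made up, and the function name is hypothetical.

```python
from collections import Counter
import io


def class_contact_counts(handle):
    """Count active 20-second intervals per unordered pair of classes."""
    counts = Counter()
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        _t, _i, _j, ci, cj = fields
        counts[tuple(sorted((ci, cj)))] += 1
    return counts


# Invented sample lines with made-up class labels.
sample = io.StringIO(
    "20\t1\t2\t1A\t1A\n"
    "20\t3\t4\t1A\t2B\n"
    "40\t4\t3\t2B\t1A\n"
)
counts = class_contact_counts(sample)
```

Multiplying each count by 20 gives the cumulative contact time in seconds for that pair of classes.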

High school contact and friendship networks

Release date: Jul 15, 2015

These data sets correspond to the contacts and friendship relations between students in a high school in Marseilles, France, in December 2013, as measured through several techniques.

-The first data set gives the contacts of the students of nine classes during 5 days in Dec. 2013, as measured by the SocioPatterns infrastructure. The file contains a tab-separated list representing the active contacts during 20-second intervals of the data collection. Each line has the form “t i j Ci Cj”, where i and j are the anonymous IDs of the persons in contact, Ci and Cj are their classes, and the interval during which this contact was active is [ t – 20s, t ]. If multiple contacts are active in a given interval, you will see multiple lines starting with the same value of t. Time is measured in seconds.

-The second data set corresponds to the directed network of contacts between students as reported in contact diaries collected at the end of the fourth day of the data collection. Each line has the form “i j w”, meaning that student i reported contacts with student j of aggregate durations of (i) at most 5 min if w = 1, (ii) between 5 and 15 min if w = 2, (iii) between 15 min and 1 h if w = 3, (iv) more than 1 h if w = 4.
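
The weight w in the diary file encodes a duration range rather than a measured value. A small lookup table makes that explicit; the sketch below is illustrative only, with invented names and an invented sample line, and it assumes the fields are separated by whitespace.

```python
# Aggregate duration ranges in seconds for each weight, as listed above.
# None marks the open-ended upper bound of the last category.
DIARY_BINS = {
    1: (0, 300),       # at most 5 min
    2: (300, 900),     # between 5 and 15 min
    3: (900, 3600),    # between 15 min and 1 h
    4: (3600, None),   # more than 1 h
}


def parse_diary_line(line):
    """Return (i, j, (min_seconds, max_seconds)) for an 'i j w' line."""
    i, j, w = line.split()
    return i, j, DIARY_BINS[int(w)]


# Invented sample line: student 3 reports 5 to 15 min with student 28.
reporter, reported, bounds = parse_diary_line("3 28 2")
```

Because the network is directed, a line “i j w” does not imply that a matching line “j i w” exists.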

-The third data set corresponds to the directed network of reported friendships. Each line has the form “i j”, meaning that student i reported a friendship with student j.

-The fourth data set corresponds to the list of pairs of students for which the presence or absence of a Facebook friendship is known. Each line has the form “i j w”, where w=1 means that students i and j are linked on Facebook, while w=0 means that they are not.

-Finally the metadata file contains a tab-separated list in which each line of the form “i Ci Gi” gives class Ci and gender Gi of the person having ID i.

High school dynamic contact networks

Release date: Aug 24, 2014

These datasets contain the temporal network of contacts between students in a high school in Marseilles, France. The first dataset gives the contacts of the students of three classes during 4 days in Dec. 2011, and the second corresponds to the contacts of the students of 5 classes during 7 days (from a Monday to the Tuesday of the following week) in Nov. 2012.

Each Contact list file contains a tab-separated list representing the active contacts during 20-second intervals of the data collection. Each line has the form “t i j Ci Cj”, where i and j are the anonymous IDs of the persons in contact, Ci and Cj are their classes, and the interval during which this contact was active is [ t – 20s, t ]. If multiple contacts are active in a given interval, you will see multiple lines starting with the same value of t. Time is measured in seconds.
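
Since each line covers the interval [ t – 20s, t ], consecutive lines for the same pair can be merged into longer contact events. The sketch below shows one way to do this; treating back-to-back intervals as a single event is an assumption of this example rather than something the data set prescribes, and the sample records are invented.

```python
from collections import defaultdict

INTERVAL = 20  # seconds covered by each line


def contact_events(records):
    """Merge back-to-back intervals into (start, end) events per pair.

    records: iterable of (t, i, j) with t the end of an active interval.
    """
    times = defaultdict(set)
    for t, i, j in records:
        times[tuple(sorted((i, j)))].add(t)

    events = {}
    for pair, stamps in times.items():
        merged = []
        for t in sorted(stamps):
            if merged and merged[-1][1] == t - INTERVAL:
                merged[-1][1] = t  # extends the previous event
            else:
                merged.append([t - INTERVAL, t])  # starts a new event
        events[pair] = [tuple(event) for event in merged]
    return events


# Invented records: (t, i, j); class columns are omitted here.
records = [(20, "a", "b"), (40, "a", "b"), (100, "b", "a"), (40, "a", "c")]
events = contact_events(records)
```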

Each metadata file contains a tab-separated list in which each line of the form “i Ci Gi” gives class Ci and gender Gi of the person having ID i.

Hospital ward dynamic contact network

Release date: Sep 14, 2013

This dataset contains the temporal network of contacts among patients, between patients and health-care workers (HCWs), and among HCWs in a hospital ward in Lyon, France, from Monday, December 6, 2010 at 1:00 pm to Friday, December 10, 2010 at 2:00 pm. The study included 46 HCWs and 29 patients.

The file contains a tab-separated list representing the active contacts during 20-second intervals of the data collection. Each line has the form “t i j Si Sj”, where i and j are the anonymous IDs of the persons in contact, Si and Sj are their statuses (NUR=paramedical staff, i.e. nurses and nurses’ aides; PAT=Patient; MED=Medical doctor; ADM=administrative staff), and the interval during which this contact was active is [ t – 20s, t ]. If multiple contacts are active in a given interval, you will see multiple lines starting with the same value of t. Time is measured in seconds.

 

Infectious SocioPatterns dynamic contact networks

Release date: Nov 28, 2011

This dataset contains the daily dynamic contact networks collected during the Infectious SocioPatterns event that took place at the Science Gallery in Dublin, Ireland, during the artscience exhibition INFECTIOUS: STAY AWAY. Each file in the downloadable package contains a tab-separated list representing the active contacts during 20-second intervals of one day of data collection. Each line has the form “t i j”, where i and j are the anonymous IDs of the persons in contact, and the interval during which this contact was active is [ t – 20s, t ]. If multiple contacts are active in a given interval, you will see multiple lines starting with the same value of t. Time is measured in seconds and expressed in UNIX ctime.
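
Because lines sharing the same value of t describe the same interval, a file can be read as a sequence of network snapshots. The following sketch groups lines by t and converts the UNIX timestamp to a UTC datetime. It assumes that lines with equal t are adjacent in the file; the timestamps and IDs in the sample are invented.

```python
from datetime import datetime, timezone
from itertools import groupby
import io


def snapshots(handle):
    """Yield (t, set of contact pairs) for each 20-second interval.

    Assumes lines with the same t are adjacent in the file.
    """
    rows = (line.split() for line in handle if line.strip())
    parsed = ((int(t), i, j) for t, i, j in rows)
    for t, group in groupby(parsed, key=lambda row: row[0]):
        yield t, {frozenset((i, j)) for _, i, j in group}


# Invented sample lines; timestamps are illustrative only.
sample = io.StringIO(
    "1241000020\t1\t2\n"
    "1241000020\t2\t3\n"
    "1241000040\t1\t2\n"
)
result = list(snapshots(sample))

# UNIX time is defined relative to UTC; convert to local time if needed.
first_time = datetime.fromtimestamp(result[0][0], tz=timezone.utc)
```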

This dataset is the dynamic counterpart of the daily cumulated contact networks available here.

Hypertext 2009 dynamic contact network

Release date: Oct 28, 2011

This dataset was collected during the ACM Hypertext 2009 conference, where the SocioPatterns project deployed the Live Social Semantics application. Conference attendees volunteered to wear radio badges that monitored their face-to-face proximity. The dataset published here represents the dynamical network of face-to-face proximity of ~110 conference attendees over about 2.5 days. No personal data are released here, and no metadata collected by the Live Social Semantics application are exposed. We provide two data files, described below.

Contact List. This is a tab-separated list representing the active contacts during 20-second intervals of the data collection. Each line has the form “t i j”, where i and j are the anonymous IDs of the persons in contact, and the interval during which this contact was active is [ t – 20s, t ]. If multiple contacts are active in a given interval, you will see multiple lines starting with the same value of t. Time is measured in seconds since 8am on Jun 29th 2009 (UNIX ctime 1246255200).
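
The time origin quoted above allows relative times to be turned into absolute ones. A minimal Python sketch, with a hypothetical helper name, adds the offset to the stated UNIX value; note that 1246255200 corresponds to 06:00 UTC on June 29, 2009.

```python
from datetime import datetime, timezone

TIME_ORIGIN = 1246255200  # UNIX time of the origin stated above


def to_datetime(t):
    """Convert t (seconds since the time origin) to a UTC datetime."""
    return datetime.fromtimestamp(TIME_ORIGIN + t, tz=timezone.utc)


origin = to_datetime(0)
one_hour_in = to_datetime(3600)
```

The result is expressed in UTC; converting to the local time of the conference venue is left to the reader.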

Contact Intervals. This file is in JSON format and contains a dictionary. Each key is a person ID and the corresponding value is a dictionary of neighbors of that person in the contact network. This dictionary of neighbors has person IDs as keys and, for each key, the value gives the list of time intervals during which the corresponding contact was active. Time is measured as above.
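
The nested dictionary described above can be traversed with the standard json module. The sketch below computes the total contact time per pair. It assumes each interval is stored as a [start, end] pair of times in seconds, which is not specified here and should be checked against the actual file; the sample document and function name are invented.

```python
import json


def total_contact_time(intervals_json):
    """Return {(i, j): total seconds in contact} from the JSON text.

    Assumes each interval is a [start, end] pair (an assumption of
    this sketch, not a documented property of the file).
    """
    data = json.loads(intervals_json)
    totals = {}
    for person, neighbors in data.items():
        for neighbor, intervals in neighbors.items():
            pair = tuple(sorted((person, neighbor)))
            duration = sum(end - start for start, end in intervals)
            # A pair may be listed under both persons; keep one value
            # rather than adding the two together.
            totals[pair] = max(totals.get(pair, 0), duration)
    return totals


# Invented sample document with one pair listed in both directions.
sample = '{"1": {"2": [[0, 40], [100, 120]]}, "2": {"1": [[0, 40], [100, 120]]}}'
totals = total_contact_time(sample)
```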

 

Primary school – cumulative networks

Release date: Aug 27, 2011

Figure: Annotated cumulative network of first day.

This dataset is part of our study of contact networks in a primary school, as reported in the paper High-Resolution Measurements of Face-to-Face Contact Patterns in a Primary School. The dataset comprises two weighted networks of face-to-face proximity between students and teachers. For each day of the study, a daily contact network is provided: nodes are individuals and edges represent face-to-face interactions. Nodes have two attributes: classname that indicates the school class and grade of the corresponding individual, and gender. Teachers are all assigned to the “Teachers” class. Edges between A and B have two weights associated with them: duration, which is the cumulative time spent by A and B in face-to-face proximity, over one day, measured in seconds (multiples of 20 seconds); and count, which is the number of times the A-B contact was established during the school day. The networks are provided as two GEXF files, one per day of the study, which can be loaded directly into Gephi. These GEXF files contain the same data provided in the supplementary information of the paper above.

 

Infectious SocioPatterns

Release date: Mar 31, 2011

Figure: Graph visualization of two daily cumulative networks.

This first dataset contains the daily cumulated networks represented in the Infectious SocioPatterns visualization. The downloadable package contains one GML (Graph Modelling Language) file for each of the sixty-nine covered days. The nodes represent visitors of the Science Gallery while the edges represent close-range face-to-face proximity between the concerned persons. The weight associated with each edge is the number of 20-second intervals during which close-range face-to-face proximity was detected. Note that the same node IDs are used in successive days for simplicity, but they naturally correspond to different visitors as each visitor was present only on one day.

For more details on the data collection and processing please see our paper What’s in a crowd? Analysis of face-to-face behavioral networks.

 

NEWS

New paper in Nature Communications

We have published a new paper in Nature Communications. In this paper, we consider the issue of how to correctly inform numerical models of the propagation of infectious diseases when only partial information on the contacts of individuals is available, due to population sampling.
Indeed, the coverage of the population in many measures of detailed contact networks is incomplete, and this yields a systematic underestimation of epidemic risk if the data is used without precaution. Here, we introduce a method to compensate for this systematic bias and obtain accurate evaluations of epidemic risk using incomplete data. To this aim, we have developed an algorithm that effectively fills in the gaps of the empirical data with a realistic picture of the missing contact network. Although the obtained surrogate contacts are different from the actual missing contacts, using them in the simulation of an influenza-like process gives an accurate estimation of what would have been obtained using complete data. It is therefore possible to have a good estimation of the epidemic risk, even if a substantial fraction of the contacts are missing from the empirical data.


New paper and new data available!

We have just published a new paper in PLoS ONE. In this paper, we present a detailed comparison between various types of data describing contacts and relationships between students in a high school: data collected from wearable sensors, data from contact diaries and data from surveys in which students were asked to name their friends.

We release all the corresponding data both in the Supplementary Information of the paper and in the SocioPatterns page dedicated to data.


SocioPatterns at a full-scale emergency response exercise

On May 27th, the SocioPatterns platform was deployed to track and analyze the interactions of people, objects and spaces during a full-scale exercise organized by the CRIMEDIM Research Center in Emergency and Disaster Medicine in collaboration with the Italian Army and a number of other partners, including the ISI Foundation. The exercise involved almost 500 people, a ROLE2+ military field hospital, 2 primary care centers, 8 ambulances and a coordination center. The tracking system featured fully-distributed recording of the interactions between people, ambulances, hospital rooms and equipment, with real-time monitoring of the hospital workflow and live views in the operations center.

 

SUPPORTED BY

ISI Foundation, CNRS