DATASETS

This page provides a collection of datasets obtained through the SocioPatterns sensing platform.

Infectious SocioPatterns dynamic contact networks

Release date: Nov 28, 2011

This dataset contains the daily dynamic contact networks collected during the Infectious SocioPatterns event that took place at the Science Gallery in Dublin, Ireland, during the artscience exhibition INFECTIOUS: STAY AWAY. Each file in the downloadable package contains a tab-separated list representing the active contacts during 20-second intervals of one day of data collection. Each line has the form “t i j”, where i and j are the anonymous IDs of the persons in contact, and the interval during which this contact was active is [ t - 20s, t ]. If multiple contacts are active in a given interval, there are multiple lines starting with the same value of t. Time is measured in seconds and expressed in UNIX ctime.
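
As an illustration of the line format described above, here is a minimal Python sketch that parses “t i j” lines and groups the contacts by their 20-second interval. The function names, the sample timestamps, and the choice to treat IDs as integers are assumptions made for the example; they are not part of the dataset documentation.

```python
from collections import defaultdict
import io


def read_contacts(lines):
    """Yield (t, i, j) tuples from tab-separated 't i j' lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        t, i, j = line.split("\t")
        # Assumption: timestamps and anonymous IDs are integers.
        yield int(t), int(i), int(j)


def group_by_interval(contacts, step=20):
    """Map each interval (t - step, t) to the pairs active during it."""
    snapshots = defaultdict(list)
    for t, i, j in contacts:
        snapshots[(t - step, t)].append((i, j))
    return dict(snapshots)


# Made-up sample lines; a real file would be opened with open(path).
sample = io.StringIO(
    "1240913019\t1\t2\n"
    "1240913019\t3\t4\n"
    "1240913039\t1\t2\n"
)
snapshots = group_by_interval(read_contacts(sample))
for (start, end), pairs in sorted(snapshots.items()):
    print(start, end, pairs)
```

Lines that share the same value of t end up in the same interval, which matches the description of simultaneous contacts given above.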

This dataset is the dynamic counterpart of the daily cumulated contact networks available here.

Hypertext 2009 dynamic contact network

Release date: Oct 28, 2011

This dataset was collected during the ACM Hypertext 2009 conference, where the SocioPatterns project deployed the Live Social Semantics application. Conference attendees volunteered to wear radio badges that monitored their face-to-face proximity. The dataset published here represents the dynamical network of face-to-face proximity of ~110 conference attendees over about 2.5 days. No personal data are released here, and no metadata collected by the Live Social Semantics application are exposed. We provide two data files, described below.

Contact List. This is a tab-separated list representing the active contacts during 20-second intervals of the data collection. Each line has the form “t i j”, where i and j are the anonymous IDs of the persons in contact, and the interval during which this contact was active is [ t - 20s, t ]. If multiple contacts are active in a given interval, there are multiple lines starting with the same value of t. Time is measured in seconds since 8am on Jun 29th 2009 (UNIX ctime 1246255200).
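
Because times in this file are relative, it can help to convert them back to calendar time. The sketch below adds the reference value 1246255200 to a relative timestamp. The UTC+2 offset is an assumption: it is the offset at which 1246255200 reads as 8am on Jun 29th 2009, and the function names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

T0 = 1246255200  # reference UNIX time given in the dataset description
# Assumption: "8am" is local time at UTC+2, which is the offset at which
# T0 corresponds to 08:00 on 2009-06-29.
LOCAL_TZ = timezone(timedelta(hours=2))


def to_datetime(t_rel):
    """Convert seconds since the reference time to an aware datetime."""
    return datetime.fromtimestamp(T0 + t_rel, tz=LOCAL_TZ)


def contact_interval(t_rel, step=20):
    """Return the (start, end) datetimes of the interval [t - step, t]."""
    return to_datetime(t_rel - step), to_datetime(t_rel)


start, end = contact_interval(3600)
print(start.isoformat(), end.isoformat())
```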

Contact Intervals. This file is in JSON format and contains a dictionary. Each key is a person ID and the corresponding value is a dictionary of neighbors of that person in the contact network. This dictionary of neighbors has person IDs as keys and, for each key, the value gives the list of time intervals during which the corresponding contact was active. Time is measured as above.
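
The nested dictionary structure can be traversed as in the following Python sketch, which sums the contact time of each pair. Two points are assumptions, since the description above does not specify them: each interval is taken to be a [start, end] pair of relative times in seconds, and a pair may or may not be listed under both of its members, so each unordered pair is counted only once. The sample JSON string is made up.

```python
import json


def total_contact_time(contact_intervals):
    """Return {(id_a, id_b): total seconds in contact} per unordered pair."""
    totals = {}
    for person, neighbors in contact_intervals.items():
        for neighbor, intervals in neighbors.items():
            pair = tuple(sorted((person, neighbor)))
            if pair in totals:
                continue  # already counted from the other person's entry
            # Assumption: each interval is a [start, end] pair in seconds.
            totals[pair] = sum(end - start for start, end in intervals)
    return totals


# Made-up sample; a real file would be read with json.load(open(path)).
sample_json = """
{"1": {"2": [[20, 60], [100, 120]]},
 "2": {"1": [[20, 60], [100, 120]]},
 "3": {"1": [[0, 20]]}}
"""
totals = total_contact_time(json.loads(sample_json))
print(totals)
```

Note that JSON object keys are always strings, so person IDs arrive as strings and need converting if integers are wanted.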


Primary school – cumulative networks

Release date: Aug 27, 2011

[Figure: Annotated cumulative network of first day.]

This dataset is part of our study of contact networks in a primary school, as reported in the paper High-Resolution Measurements of Face-to-Face Contact Patterns in a Primary School. The dataset comprises two weighted networks of face-to-face proximity between students and teachers. For each day of the study, a daily contact network is provided: nodes are individuals and edges represent face-to-face interactions. Nodes have an attribute classname that indicates the school class and grade of the corresponding individual. Teachers are all assigned to the “Teachers” class. Edges between A and B have two weights associated with them: duration, which is the cumulative time spent by A and B in face-to-face proximity, over one day, measured in seconds (multiples of 20 seconds); and count, which is the number of times the A-B contact was established during the school day. The networks are provided as two GEXF files, one per day of the study, which can be loaded directly into Gephi. These GEXF files contain the same data provided in the supplementary information of the paper above.
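
To make the two edge weights concrete, the Python sketch below derives duration and count from a list of (t, i, j) contacts in the 20-second format used by the dynamic datasets on this page. It is an illustration rather than the procedure used to build the GEXF files: it assumes that a contact is "established" each time a pair comes into contact after at least one 20-second interval without contact, and the sample data are made up.

```python
from collections import defaultdict


def cumulative_weights(contacts, step=20):
    """Aggregate (t, i, j) contacts into per-pair duration and count."""
    times = defaultdict(set)
    for t, i, j in contacts:
        # Treat the pair as undirected: (i, j) and (j, i) are the same edge.
        times[(min(i, j), max(i, j))].add(t)

    weights = {}
    for pair, stamps in times.items():
        ordered = sorted(stamps)
        # Assumption: a gap larger than one step starts a new contact event.
        gaps = sum(1 for a, b in zip(ordered, ordered[1:]) if b - a > step)
        weights[pair] = {
            "duration": step * len(ordered),  # seconds, multiple of 20
            "count": 1 + gaps,                # number of contact events
        }
    return weights


# Made-up sample: pair (1, 2) meets for 60 s, separates, then meets again.
contacts = [(20, 1, 2), (40, 1, 2), (60, 2, 1), (200, 1, 2), (40, 1, 3)]
weights = cumulative_weights(contacts)
print(weights)
```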

Infectious SocioPatterns

Release date: Mar 31, 2011

[Figure: Graph visualization of two daily cumulative networks.]

This first dataset contains the daily cumulated networks represented in the Infectious SocioPatterns visualization. The downloadable package contains one GML (Graph Modelling Language) file for each of the sixty-nine covered days. The nodes represent visitors of the Science Gallery, while the edges represent close-range face-to-face proximity between the persons concerned. The weight associated with each edge is the number of 20-second intervals during which close-range face-to-face proximity was detected. Note that the same node IDs are reused on successive days for simplicity, but they correspond to different visitors, as each visitor was present on only one day.

For more details on the data collection and processing please see our paper What’s in a crowd? Analysis of face-to-face behavioral networks.

NEWS

New paper in BMC Infectious Diseases

We have just published a new paper in BMC Infectious Diseases. We use SocioPatterns data collected in a hospital ward to ask which representations of contact data work best to inform models of disease spread. We show that the commonly used contact matrix representation fails to reproduce the size of the epidemic obtained using the high-resolution contact data and also fails to identify the most at-risk classes. We introduce a contact matrix of probability distributions that takes into account the heterogeneity of contact durations between (and within) classes of individuals, and we show that, in the case study presented, this representation yields a good approximation of the epidemic spreading properties obtained by using the high-resolution data.

New research using SocioPatterns data

A new manuscript describing research done using SocioPatterns data collected in two jointly organized conferences is available here.

SocioPatterns featured in Scientific American Graphic Science column

Scientific American used our data to display contact networks in a hospital ward in the November 2012 Graphic Science column. The interactive visualization can be seen here.


SUPPORTED BY

ISI Foundation, CNRS