Examining Data Processing Work as Part of the Scientific Data Lifecycle: Comparing Practices Across Four Scientific Research Groups


Data processing is the work scientists must undertake to make data useful for analysis, and it is a key component of twenty-first-century scientific research. The analysis of scientific data is contingent upon the successful collection or production of data, followed by its processing. This qualitative study of four data-intensive research groups investigates scientists' data processing work and describes and analyzes three distinct but intertwined practices: cleaning data products, selecting a subset of a data product or assembling a new data product from multiple sources, and transforming data products into a common format. These practices are necessary for researchers to transform an initial data product into one that is ready for scientific analysis. This research finds that data processing work requires a high level of scientific and technical competence, and that it does not merely set up analyses but often shapes, and is shaped by, iterations of research designs and the research questions themselves.

In iConference 2015 Proceedings, iSchools.