Data science is being hailed as the latest frontier in evidence-informed policy making. It’s the shiny new crayon for nearly every level of government from local councils to national policymakers. There is a near universal embrace of data’s potential to improve our day-to-day lives. This has led to calls for more open data, massive reviews on data sharing and the introduction of initiatives like the European Union Open Data Portal. Taglines like ‘opening up government’ from data.gov.uk are widespread. But alongside optimism for the potential social good of data science, there are genuine concerns about the privacy of personal data, transparency of data usage, and the democratic accountability of public agencies collecting, storing and using data. Over the past few months I have had the opportunity to shadow the organizers of the UK Government’s Public Dialogue on Data Ethics. The dialogue will include four sets of workshops designed to assess public attitudes on the ethical line between privacy and useful data science in policy.
What is data science?
Data science is the combination and application of data in new ways. This data is generally ‘big data’. It ranges from public information like Tweets and Facebook likes to personal data like credit card purchases and A&E records. Data is all around us: nearly every activity that can be recorded is recorded. With 7 billion people on earth, each probably generating at least a dozen pieces of data a day, the resulting data sets are unimaginably large and can be leveraged to inform policy. One innovative example is the combination of GP and mortality records with Facebook “likes” to monitor community health in data-poor regions of Florida (Gittelman et al., 2015). This, however, is a relatively simple example of the potential of ‘big data’.
Data science is pushing the boundaries of the public-technology interface. Technologies like automated decision making and machine learning could remove the human actor from the evaluation of policy options. For example, these technologies could combine data on the frequency of cold & flu tablet purchases, GP visits related to cold symptoms, the pattern of sick days and Facebook comments about feeling ill to create a live purchasing system for flu jabs. This could result in more accurate predictions of flu trends. Data science has the potential to reconfigure human interactions, both social and technological. Although the machines aren’t taking over yet, data science, heavily pushed by corporations and policy analysts, will likely make some existing technologies and the jobs that go with them obsolete, just as it will create new ones. At its core, data science is about taking the data we already have and using it to understand our world better. For government, that means creating more useful and potentially more successful policy solutions.
So what’s this dialogue about?
At the moment data science is used to create more complex, faster and potentially more accurate evidence for policy (see https://www.gov.uk/guidance/open-policy-making-toolkit-data-science). While no one is going to support having slower and less accurate data, a larger public and democratic conversation about data in policy is warranted. Initiatives like Care.Data resulted in substantial public controversy over government data usage (Triggle, 2014). Data is not just numbers and statistics held in a supercomputer; it is a reflection and representation of the public. Therefore it should be subject to public consultation and democratic accountability, and compliant with human rights and data protection laws. Most academics, corporations and government bodies are now attempting to work out what this means in practice. It includes new data ethics, reworked privacy laws, new forms of consent and better public engagement. The first and last of these are the focus of the Public Dialogue on Data Ethics.
The dialogue will involve four two-day workshops with groups of demographically diverse participants across England. The key objective is to evaluate how the public forms opinions about data projects. Alongside the workshops, an online survey will use conjoint analysis to piece together the underlying factors that contribute to the public’s views on data-driven policy. For example, is a project that combines health and consumer supermarket data acceptable if it targets people for health promotion initiatives? What if the same project evaluates eligibility for social services? Officials from bodies ranging from the Office for National Statistics to the Ministry of Justice have contributed hypothetical case studies on data usage. It is hoped that these will stimulate discussion on data in the context of real-world policy. The ultimate goal is to use the participants’ views to shape guidelines on data ethics for policymakers, as well as to provide insights into the current state of public knowledge on data-driven policy.
The dialogue and survey are a form of upstream public consultation and knowledge translation that has typified discourse on biotechnology and nanotechnology in the past decade. A key goal is to anticipate where forms of public controversy could emerge. Alongside this idea of data science acceptability, there are elements of deliberative democratic processes in the workshop. The plan to redevelop the current data ethics guidelines potentially frames the dialogue process as substantive. Under a substantive model, public engagement is used to improve governance and technology rather than just to increase public acceptance (Rowe & Frewer, 2005). For a truly substantive process, there needs to be the intention for change to result from the consultation. The public’s opinions, or at least the opinions of those involved in the workshop, must be considered in the ethical guidelines. Whether the dialogue can balance the difficult goal of being both deliberative and consultative remains to be seen.
Data science may be the poster child for government innovation but its use must be publicly legitimate – if for no other reason than the public both creates the data and is the proposed beneficiary of its use. The question isn’t how do we get more people to agree to data science but rather how can we improve data science by engaging our fellow citizens in it?
This blog post is the first of three parts. Look back here for a link to part 2 when available.
More about the dialogue:
Rowe, G., & Frewer, L. J. (2005). A Typology of Public Engagement Mechanisms. Science, Technology, & Human Values, 30(2), 251–290. http://doi.org/10.1177/0162243904271724
Gittelman, S., Lange, V., Gotway Crawford, C. A., Okoro, C. A., Lieb, E., Dhingra, S. S., & Trimarchi, E. (2015). A New Source of Data for Public Health Surveillance: Facebook Likes. Journal of Medical Internet Research, 17(4), 1–12. http://doi.org/10.2196/jmir.3970
Triggle, N. (2014). Care.data: How did it go so wrong? BBC News. Retrieved from www.bbc.co.uk/news/health-26259101
Emily Rempel is an interdisciplinary PhD student in the Department of Psychology and the Institute for Policy Research who is exploring the role of public engagement and data science in UK policy making. She is working with the Cabinet Office’s Government Digital Service on the Public Dialogue on Data Science Ethics.