How Can Public Authorities Use Big Data to Serve the General Interest?

The author thanks Edgar Gastón Jacobs, Law Projects Coordinator at SKEMA Brazil, and Claude Revel, Director of SKEMA PUBLIKA, for their input.

A Vital Need to Gather More Information

Among the range of capabilities that a State needs to function effectively and shape development outcomes, information is a key input. Without reliable information on the location and activities of economic agents over its territory, a State cannot easily exert its coercive, fiscal, and administrative functions. Data have even been described as the lifeblood of decision making.

Many countries have become aware of the vital need to gather more information, but they still suffer from “statistical anaemia”. One way of understanding this is to ask the following question: what is the average time elapsed between observations of a given household in nationally representative expenditure or wealth surveys? According to a recent study, a given household is observed considerably less than once every 10 years in the United States and once every 1,000 years in Africa! The same study suggests an alternative and much higher-frequency source of data: satellite imagery. Tapping new, technology-led sources of information means entering the realm of big data.
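
To make the notion of a revisit interval concrete, the short sketch below computes it under a simplifying assumption that is ours, not the study's: every survey round draws households uniformly at random from the whole population. The function name and the figures are hypothetical and chosen only for illustration.

```python
def revisit_interval_years(total_households: int,
                           households_surveyed_per_year: float) -> float:
    """Average number of years between two observations of the same household.

    Assumption (illustrative only): households are drawn uniformly at random,
    so a household is observed with probability surveyed / total each year.
    """
    if households_surveyed_per_year <= 0:
        raise ValueError("households_surveyed_per_year must be positive")
    return total_households / households_surveyed_per_year


# Hypothetical country: 5 million households, 20,000 surveyed per year.
interval = revisit_interval_years(5_000_000, 20_000)
print(interval)  # 250.0
```

Under this assumption, doubling the yearly sample only halves the interval, which is why survey-based coverage improves slowly and expensively.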

What Exactly Are Big Data?

The International Monetary Fund 5Vs

In a comprehensive 2017 survey, the IMF clarifies the concept of big data by breaking it down into five Vs: high-Volume, high-Velocity, high-Variety, but also concerns about Veracity and Volatility. In other words, big data are massive amounts of data, available at a high frequency and covering a large range of topics, but which may not precisely identify the latent topic of interest and may be highly influenced by technological changes and uses. The IMF also provides a classification of big data, which can be generated by social networks, businesses, public agencies, or sensors (such as weather sensors or GPS).

*Figure 1: Big data classification.* Source.

The World Bank Typology

Another way of thinking about big data is to adopt the typology suggested by the World Bank in its 2021 World Development Report, Data for Better Lives. Data may be traditional or new, and generated for public or commercial purposes (Figure 2). New data can be understood as deriving from new tools but also from uses that deviate from the initial purpose of data collection, e.g. sentiment analysis of Twitter data.

*Figure 2: Typology of big data.* Source.

Worries about how big data could be a political and economic instrument of dominance ought to be taken seriously. Big data could lead to surveillance capitalism, and possibly to a new form of totalitarianism. As argued by Acemoglu and Robinson, there is a narrow corridor between a strong State and an empowered society, where the State is powerful enough to foster economic development but remains under the control of society.

Data which Deserve Special Care

Some data, notably relating to health, deserve special care. Creating a risk-based taxonomy from what was originally treated as “sensitive data” in healthcare could be an answer. From that perspective, the collection and treatment of big data (e.g. biometric data) in the fields of medicine and public health need to respect individuals’ autonomy, achieve equity, and protect privacy. How do we ensure that participants in a research project exploiting big data have given their consent? How do we correct for the potential biases that arise when algorithms are trained on non-representative sources of data? How hard is it to recover the identity of participants from supposedly de-identified (anonymised) big data?
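
The last question can be illustrated with a simple check known as k-anonymity: a dataset is k-anonymous if every combination of indirectly identifying attributes (quasi-identifiers) is shared by at least k records. The sketch below is a minimal illustration; the field names and records are invented, and a real assessment would need to consider many more attributes and linkage with external data.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Return k, the size of the smallest group of records that share
    the same values for all the given quasi-identifiers."""
    if not records:
        raise ValueError("records must not be empty")
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return min(groups.values())


# Invented, supposedly de-identified health records (no names, no IDs).
records = [
    {"age_band": "30-39", "postcode_prefix": "750", "diagnosis": "A"},
    {"age_band": "30-39", "postcode_prefix": "750", "diagnosis": "B"},
    {"age_band": "40-49", "postcode_prefix": "690", "diagnosis": "A"},
]

k = smallest_group_size(records, ["age_band", "postcode_prefix"])
print(k)  # 1
```

A value of k equal to 1 means that at least one person is unique on age band and postcode prefix alone, and could therefore be re-identified by anyone who knows those two facts about them.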

States are clearly concerned about protecting the neural data of their citizens, collected and processed by neural devices or digital tools. Today, such data can be used in marketing activities, political campaigns, health profiling, and even in legal proceedings. In Brazil, a bill includes this issue in the Data Protection Law; in Chile, an amendment to the Constitution protects ‘neurorights’; and the Council of Europe has published a detailed report on the issue.

States to Use Big Data for Economic and Social Development: Examples

Big Data and Covid-19: An Answer in Times of Crisis

The Covid-19 pandemic required the rapid implementation of economic and health measures. More precisely, answers were required to the following questions: what was happening to economic activity, and how was mobility reacting to communication campaigns and lockdown policies? Governments did not have the luxury of waiting for official statistics or conducting population surveys. However, big data helped to provide new and complementary indicators. Using Google Trends data and machine learning techniques, the OECD has developed a weekly tracker of economic activity, allowing the nowcasting of the state of the economy; one could also have used data on payment card transactions to obtain a granular perspective on daily spending by industry, and textual analysis to assess uncertainty. Meanwhile, the geolocation associated with mobile networks and platform data made it possible to detect the mobility patterns of the population in real time (see also).

*Figure 3: Weekly tracker of economic activity in France*. Source.

Lastly, satellite data highlighted that high pollution is, unfortunately, an indicator of economic activity. For example, levels of nitrogen dioxide in the troposphere substantially increased in China once lockdown measures were loosened.

A ‘View from Above’ to Provide Services to the People Most in Need

As stressed in the introduction, many countries do not have the financial and administrative capacities to conduct frequent, rigorous population surveys. Put differently, how do you help and provide public services to the people most in need if you do not know where they are or how their living conditions are evolving? One solution is to take the view from above, in combination with traditional data. In a nutshell, algorithms (e.g. convolutional neural networks) can be trained to ‘recognise’ the features of population or poverty (e.g. type of buildings and structures) in daytime and night-time satellite imagery. Maps have thus been created, at 30 metre resolution, which indicate where the population is located as well as its demographic distribution (e.g. age or gender). This information is crucial to plan the allocation of resources related to health, education, and infrastructure. Turning to living conditions, this study (see also) shows how a detailed map of poverty in Nigeria can be generated (Figure 4).

*Figure 4: Detailed poverty map of Nigeria using satellite data and AI.* Source.

Obviously, satellite data allow for a very wide range of applications, provided the unit of interest is visible to the sensors. For example, they allow the tracking of deforestation, which contributes to carbon emissions, biodiversity losses, and the emergence of infectious diseases.

In many respects, the use of satellite data and artificial intelligence in research epitomises how big data draw on and complement public initiatives. Satellite data are freely available online (e.g. the NASA/USGS Landsat Program), algorithms are trained using the (also freely available) Demographic and Health Surveys, and code and results are often publicly available (e.g. on mapping poverty in Africa). Furthermore, satellite data have many desirable features (e.g. consistency, reliability, transparency, longevity), making Earth observation a tool in the service of sustainable development.

Big Data to Improve State Accountability

Finally, big data can improve State accountability. The official growth and inflation statistics of some countries sometimes seem ‘too good to be true’. A telling example is the case of Argentina’s official inflation rate, which seemed disconnected from practical experience. The Billion Prices Project ‘scraped’ (collected) online prices from the websites of retailers and demonstrated that inflation was underestimated by 10–20 percentage points. The manipulation of the official price index ended with the election of a new government in 2015, but the Billion Prices Project demonstrates that civil society can gain access to the tools required to verify the credibility of government statistics. As the authors of the study emphasise, their methodology can be used in all countries and may better reflect the evolution of prices in a fast-changing environment, dominated by technological shocks (e.g. new and improved products) but also exogenous ones (e.g. a war affecting global supply chains).
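
To give a sense of how collected online prices can be turned into an inflation estimate, the sketch below computes a Jevons index, i.e. the geometric mean of price changes for products observed in two periods. It is an illustration under our own assumptions, not a reproduction of the project's methodology; the product names and prices are invented.

```python
import math


def jevons_index(previous_prices, current_prices):
    """Geometric mean of price relatives (current / previous) over the
    products that are observed in both periods."""
    common = previous_prices.keys() & current_prices.keys()
    if not common:
        raise ValueError("no product is observed in both periods")
    log_sum = sum(
        math.log(current_prices[p] / previous_prices[p]) for p in common
    )
    return math.exp(log_sum / len(common))


# Invented prices collected from a retailer's website in two periods.
previous = {"milk": 100.0, "bread": 50.0, "rice": 80.0}
current = {"milk": 110.0, "bread": 55.0, "pasta": 30.0}

index = jevons_index(previous, current)
inflation_pct = (index - 1) * 100  # roughly 10 percent in this example
print(round(inflation_pct, 2))
```

Only products present in both periods enter the calculation, so items that disappear (rice) or appear (pasta) are ignored here; handling such product turnover properly is one of the harder parts of any real price index.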

Another way of thinking about big data as a disciplining device of the State is through access to websites dedicated to public transparency, as in Brazil. Thanks to this online information, citizens can audit the financial activities of their government in real time.

Can We Trust Governments Using Big Data?

Foster Well-being or Harm Society?

Big data offer huge opportunities but also some challenges and risks (Figure 5). Beyond the need for new tools and new skills to analyse these high-Volume, high-Velocity, high-Variety data, concerns remain about their Veracity and Volatility. As highlighted by the IMF or INSEE, some of the big data varieties are a by-product of commercial activities. There are no guarantees that the methodology, the quality, the (often non-representative) sample or public access will remain stable over time. For example, what happens to the consistency of the time series generated, in terms of content or uses, if a social platform tweaks its algorithm to emphasise some specific topics or attract a new segment of the population?

*Figure 5: Opportunities, challenges and risks of big data.* Source.

More broadly, the World Bank provides a conceptual framework which captures the fact that big data are a dual technology (Figure 6): they can foster well-being but also harm society.

*Figure 6: Potential benefits and drawbacks of big data.* Source.

Noam Chomsky underlines that technology is basically neutral. It’s kind of like a hammer. The hammer doesn’t care whether you use it to build a house, or whether a torturer uses it to crush somebody’s skull. Likewise, a government can use data to make its citizens better off or to develop a surveillance society dominated by algorithmic governmentality. States and international organisations (e.g. OECD) often advocate strong ethical guidelines governing the joint use of big data and AI, covering privacy, algorithmic transparency and bias, access to and storage of data, and the regulation of platforms.

Given current trends, such an ethical framework is sorely needed. In her book, Weapons of Math Destruction, Cathy O’Neil reminds us that governments, businesses, and politicians increasingly use big data and algorithms to observe, predict, and sometimes influence our individual behaviour in all aspects of life (going to school, getting a job, getting a loan, getting healthcare, political choices…), while the AI Global Surveillance Index highlights that the world increasingly looks like a mix between Jeremy Bentham’s panopticon and Philip K. Dick’s Minority Report (Figure 7). Indeed, the deployment of sensors, the use of facial recognition systems, and data-driven smart policing tools have become common in many countries.

However, beyond obvious privacy issues, which need to be considered in conjunction with efficacy and cost-saving issues, there is the risk that algorithms, notably in the realms of police and justice, may be perceived as more trustworthy, and therefore more aligned with the public good, because they are not tainted by human subjectivity. This would be a mistake. It has been widely documented that facial recognition systems (e.g. to identify potential suspects) and risk assessment instruments (e.g. to predict recidivism) are still very imperfect instruments that often perpetuate and exacerbate existing biases, because algorithms learn from, and repeat, historical patterns. It is still not clear how to reconcile algorithmic justice and human rights. At best, AI ought to augment human intelligence, not replace it. More broadly, the use of big data needs to be regulated to ensure, as argued by the European Commission, that governments use big data solely in the public interest.
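
One common way to make such biases visible is to compare error rates across groups. The sketch below computes, for each group, the share of people who did not reoffend but were nevertheless flagged as high risk (the false positive rate). The data are invented and the metric is only one of several competing fairness measures; it is offered as an illustration, not as the method used in the studies mentioned above.

```python
from collections import defaultdict


def false_positive_rate_by_group(cases):
    """cases: iterable of (group, predicted_high_risk, reoffended) tuples.

    Returns, for each group, the fraction of people who did not reoffend
    but were still predicted to be high risk.
    """
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for group, predicted_high_risk, reoffended in cases:
        if not reoffended:
            negatives[group] += 1
            if predicted_high_risk:
                false_positives[group] += 1
    return {g: false_positives[g] / n for g, n in negatives.items()}


# Invented outcomes of a hypothetical risk assessment instrument.
cases = [
    ("A", True, False), ("A", False, False),
    ("A", False, False), ("A", False, False), ("A", True, True),
    ("B", True, False), ("B", True, False),
    ("B", False, False), ("B", False, False), ("B", True, True),
]

rates = false_positive_rate_by_group(cases)
print(rates)  # {'A': 0.25, 'B': 0.5}
```

In this invented example, people in group B who did not reoffend are twice as likely as those in group A to be wrongly flagged, even though the instrument never uses a human judgement.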

*Figure 7: Adoption of surveillance and predictive technologies.* Source.

There is also an inherent paradox when personal data are published for transparency purposes in the public sector. Information about citizens, notably those occupying a public office, may be made available to fight corruption and avoid conflicts of interest. However, the amount of data made available ought to follow strict proportionality principles in order to respect individual rights.

A Social Contract for Data

Leveraging big data for good thus requires, as argued by the World Bank, a social contract for data which aligns the value of data, trust, and equity (Figure 8). To achieve this new social contract, value needs to be extracted from big data and trust needs to be established.

*Figure 8: Defining a new social contract in data-intensive societies.* Source.

The IMF stresses that countries will need to strengthen their national statistical agencies to make ‘big data speak’, by recruiting multidisciplinary teams, developing new statistical frameworks, and creating strategic partnerships with private providers. Some developing countries may also benefit from capacity development and technical assistance funded by official foreign aid. The OECD, in the context of its multidisciplinary project on New Sources of Growth: Knowledge-Based Capital, also points out that the evidence base on the transformative role of big data ought to be broadened and improved, in order to generate policies which maximise the benefit-to-risk ratio.

Trust will be earned if governments can tangibly demonstrate that big data are used for the common good. In this regard, the case of Estonia deserves close analysis. This country has invested massively in user-centric e-government, with online access to many public services, joint public-private use of electronic identification, and the ability of citizens to monitor which public authorities access their data and for which purpose. Estonia also faced a dramatic cyberattack in 2007, which briefly paralysed the country. The rising occurrence of cyber operations sponsored by countries or criminals is a reminder that data sources can be breached, information can be manipulated, and the rights of citizens can be violated. Hence, big data will serve the public interest if a State ensures that its citizens are protected from internal and external threats through tighter domestic regulations and better cybersecurity protection. Keeping these caveats in mind, big data can help countries to fulfil their sustainable development goals, provided that these policies are subject to rigorous principles that are still to be defined. As acknowledged in the UN resolution Transforming Our World: the 2030 Agenda for Sustainable Development, then, and only then, will the proper use of big data facilitate the measurement, monitoring, and reporting of progress towards the relevant targets, and therefore be a force for good.

The public use of big data only makes sense with political and civil society supervision. It is also crucial to remember some lessons from past experiences with the public use of data. For example, the historian Gregory Daddis recounts how, during the US-Vietnam war, the data being collected became an end in themselves. States can then work together, notably through their involvement in multilateral and intergovernmental organisations (e.g. UN, World Bank, IMF, OECD), to develop common international principles governing big data in order to proactively steer its access and applications towards the purpose of inclusive growth. SKEMA PUBLIKA possesses the required international and impartial mindset as well as the necessary multidisciplinary competences to assist States and produce operational recommendations.