Nowhere near enough methodological work has

Posted: Sun Feb 09, 2025 5:20 am
by asimd23
Most big data sources do not result from any explicit design considerations (for research) but are reflections of what has already been measured. Consequently they will often over- and under-represent different groups within the population in different and complex ways. For example, the very young and the very elderly do not and will not engage with the digital economy in the same way as young adults, and may therefore leave very incomplete traces in big datasets. All ‘big data’ can suffer from some form of bias, which has the potential to affect results.
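To make the point about over- and under-representation concrete, here is a minimal sketch of how uneven coverage can shift a simple average. Every number and name in it (the age groups, population shares, average spend and coverage rates) is a made-up assumption for illustration only, not a figure from any real dataset.

```python
# Illustrative sketch only: all figures below are hypothetical assumptions.

# Share of each age group in the real population (sums to 1).
population_share = {"under_16": 0.20, "16_to_64": 0.60, "65_plus": 0.20}

# True average weekly spend per person in each group (hypothetical units).
true_mean = {"under_16": 5.0, "16_to_64": 50.0, "65_plus": 20.0}

# Assumed chance that a person in each group leaves a trace in the big dataset.
coverage = {"under_16": 0.10, "16_to_64": 0.90, "65_plus": 0.30}


def population_mean(share, mean):
    """Average over the whole population, each group weighted by its true share."""
    return sum(share[g] * mean[g] for g in share)


def observed_mean(share, mean, cover):
    """Average over only those people who appear in the dataset."""
    weights = {g: share[g] * cover[g] for g in share}
    total = sum(weights.values())
    return sum(weights[g] * mean[g] for g in share) / total


if __name__ == "__main__":
    print("population mean:", round(population_mean(population_share, true_mean), 2))
    print("observed mean:  ", round(observed_mean(population_share, true_mean, coverage), 2))
```

With these assumed figures the dataset average comes out at about 45.6 against a true population average of 35.0, simply because the well-covered group dominates the records. In real data the coverage rates are usually unknown, which is exactly the difficulty.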

Big data may offer exciting insights on topics which may previously have been hard to measure, but also enormous opportunities to misrepresent and misunderstand important social and economic questions. Nowhere near enough methodological work has yet been done to bridge this gap. The nature of some of these data, and of the allied techniques used to analyse them, is such that validation may be extraordinarily hard to perform and research integrity hard to assure. Two statisticians may be able to take the same raw survey and end up with the same result; two data scientists doing the same with big data will almost certainly not. The golden thread of integrity from data to policy (evidence-based policy) has the potential to be broken. Again, not enough work has been done to bridge the gap between the two.

A further risk is that ‘traditional’ sources of socio-economic data become less reliable and/or less accessible. The consultation for the 2021 census proposes exploring administrative data (really no more than another term for big data) as a supplement to the census itself. The exploration of this topic is warmly welcomed, but these sources should not be seen as a magic bullet for reducing costs. Obviously data collection costs might fall, but data analysis costs will rise, and the richness of the available data will be diluted.