Citing big data
Posted: Sun Feb 09, 2025 5:31 am
Data sharing obliges researchers who plan to re-use existing data to provide “full and appropriate acknowledgement, via citation”, as outlined by the ESRC Research Data Policy (2010). The Research Councils UK Common Principles on Data Policy states: “all users of research data should acknowledge the sources of their data and abide by the terms and conditions under which they are accessed.”
Proper citation is also important because it helps maximise the impact of research. My colleague, Dr Victoria Moody, UK Data Service Communications and Impact Director, when announcing the launch of the UK Data Service’s #CiteTheData campaign, said: “The citation of research data (and metadata) can support the understanding and promotion of research impact through the tracking of the use of data in research and on into policy and product development, influencing decisions about public and commercial spending and service provision.”
So, a researcher needs to cite the data – sounds simple. However, big data, being larger and/or more complex than traditional datasets, present challenges. For example, data sourced from Twitter may be attributed to a particular Twitter account that does not include the author’s real name. To cite a Tweet correctly, the citation must include both the Twitter account name and the author’s real name (the Modern Language Association provides guidelines on how to cite a Tweet, while the American Psychological Association blog also provides recommendations and examples for citing social media sources). One also needs to identify the URL of that Tweet – Twitter outlines how to find the URL of an individual Tweet so it can be linked to directly (many people are unaware that every Tweet has its own URL, which shows the exact time and date that the Tweet was posted and the number of times that it has been favourited and re-tweeted).
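As a rough illustration of the elements listed above (real name, account name, Tweet text, posting time and URL), here is a minimal Python sketch. The `TweetSource` class, the `format_tweet_citation` function and the output layout are all hypothetical and invented for this example; the layout is not the official MLA or APA format, so check the relevant style guide before using it in a publication.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TweetSource:
    """Hypothetical container for the elements a Tweet citation needs."""
    real_name: str        # author's real name; empty string if unknown
    account_name: str     # Twitter account name, with or without "@"
    text: str             # full text of the Tweet
    posted_at: datetime   # date and time shown on the Tweet's own URL
    url: str              # URL of the individual Tweet


def format_tweet_citation(tweet: TweetSource) -> str:
    """Build a citation string in an illustrative, non-standard layout."""
    handle = "@" + tweet.account_name.lstrip("@")
    # Give both the real name and the account name when both are known.
    author = f"{tweet.real_name} ({handle})" if tweet.real_name else handle
    posted = tweet.posted_at.strftime("%Y-%m-%d %H:%M")
    return f'{author}. "{tweet.text}" Twitter, {posted}, {tweet.url}'


# Made-up example values, not a real Tweet.
example = TweetSource(
    real_name="Jane Doe",
    account_name="example_account",
    text="Our new dataset is now available.",
    posted_at=datetime(2015, 3, 1, 9, 30),
    url="https://twitter.com/example_account/status/1",
)
print(format_tweet_citation(example))
```

Keeping the account name and the real name as separate fields means a citation can still be produced when the real name cannot be established, while making the gap visible.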