Typically, survey data include various types of ‘missing’ codes, such as ‘don’t know’, ‘not applicable’ or ‘not asked’. With big data, however, the reason why particular data are missing may not be clear: there may be no accompanying list of codes to define the reasons. Exploratory data analysis techniques can locate the missing responses, but it may take further investigation to establish what type of ‘missing’ those responses are. Perhaps the category is ‘not applicable’, or perhaps the data are simply not included in the file. The overall analysis can be flawed if missing values are not properly defined, or if the dataset is incomplete. In the social sciences, methodologists are currently exploring the best ways to deal with such deficiencies in data. As data publishers, it is our responsibility to indicate the quality of data and to point out any known deficiencies or obvious errors.
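The exploratory step described above can be sketched with pandas. The numeric missing-value codes and the small dataset here are hypothetical, standing in for whatever codebook (if any) accompanies a real file; the point is to tabulate *why* values are missing before recoding them, rather than silently discarding the distinction between ‘don’t know’, ‘not applicable’ and genuinely absent data.

```python
import pandas as pd
import numpy as np

# Hypothetical codebook: negative sentinel values for types of 'missing'.
MISSING_CODES = {-9: "don't know", -8: "not applicable", -7: "not asked"}

# Toy survey extract; the bare NaN in 'income' has no documented reason,
# mimicking a big-data file with no accompanying list of codes.
df = pd.DataFrame({
    "age": [34, -9, 51, -8, 29],
    "income": [42000, 38000, -7, 55000, np.nan],
})

# Tabulate the reasons for missingness per variable before recoding.
for col in df.columns:
    reasons = df[col].map(MISSING_CODES).value_counts()
    print(col, reasons.to_dict())

# Only then recode the sentinel values to NaN for analysis,
# keeping the audit above as documentation of data quality.
clean = df.replace(list(MISSING_CODES), np.nan)
print(clean.isna().sum().to_dict())
```

Note that the undocumented NaN in `income` never appears in the reason counts: it is exactly the kind of unexplained gap the surrounding text warns about, and it should be flagged when the data are published.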
Sometimes there can be a big gap between the data and a researcher’s assertions. Transparency of production ensures that the methodology can be replicated, which can back up a researcher’s findings and provides a form of quality control and validity. To enable replication, it is important that the data are accompanied by a clear description of how they were collected, stored, cleaned and analysed. Our guide on depositing shareable survey data provides clear step-by-step instructions covering the planning of fieldwork (such as consent protocols for survey respondents), fieldwork itself (such as ensuring appropriate anonymisation and consent forms), negotiation (identifying access and licensing requirements) and depositing (with the owner signing licence agreements).