Data Management for Research
The Pitfalls of Spreadsheets in Data Management
Good data management should save you time, be efficient and ensure high-quality data. It should also help prevent data catastrophes, including data loss, non-compliance with legislation, and breaches of privacy, confidentiality and research integrity. Spreadsheets (whether Excel, Google Sheets, or another tool) are generally poor tools for data management: they leave researchers vulnerable to these catastrophes and do little to support high-quality data collection.
• Data quality (DQ) is generally lacking: Entering data into spreadsheets is easy; entering errors is easier. The user interface places almost no constraints on what can be typed into a cell, which makes it very difficult not to introduce errors into a data collection.
• Resolving DQ issues is difficult: DQ checking is typically relegated to manual spot-checking, conditional formatting, sorting, filtering, and find-and-replace. The result is an established pattern of data errors in spreadsheet-driven research.
• Capturing data complexity is challenging: Spreadsheets do not handle dates consistently. Text fields, time-dependent data, and other complex data formats are also not well handled.
• Conditional or skip logic is difficult to implement and is often applied inconsistently.
• Versioning and collaboration are complex and prone to error: With multiple users, collaborating through spreadsheets frequently involves saving and re-saving versions, emailing files, and adding dates and initials to file names. This is almost guaranteed to introduce error and confusion into your data set.
• Project governance and access: Best practice means providing access to project data based on the principle of least privilege (POLP) and using role-based permissions to do so.
• Role-specific data access is not possible using spreadsheets, which makes it very difficult to enforce robust governance and meet ethics approval commitments.
• Security and data sharing: Password protecting an Excel spreadsheet, which is often the only security measure a research project uses, is unreliable, and the protection can often be broken. Of more concern is the sharing of data via spreadsheets in a non-secure, unencrypted way, such as through email.
• Backup and storage: Working with multiple copies and versions of spreadsheets often means researchers save files to non-secure hard drives that are not backed up, or to portable devices. This leaves the researcher prone to loss of data through accident or misadventure, and at risk of breaching privacy and confidentiality provisions.
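To illustrate the kind of automated checking that the points above describe as missing from spreadsheets, the following is a minimal Python sketch. It is an illustration only, not a description of how any particular tool works. The column names (participant_id, visit_date, age, smoker, cigarettes_per_day), the sample rows and the rules are all hypothetical. It checks a date field, a numeric range and one skip-logic rule on data exported as CSV.

```python
import csv
import io
from datetime import datetime

# Hypothetical export; the column names, rows and rules are illustrative assumptions.
SAMPLE_CSV = """participant_id,visit_date,age,smoker,cigarettes_per_day
P001,2023-03-14,54,yes,10
P002,14/03/2023,47,no,
P003,2023-03-15,430,no,5
P004,2023-03-16,61,yes,
"""


def is_year_month_day(value):
    """Return True if the value parses as a year-month-day date."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def check_row(row):
    """Return a list of problems found in one record."""
    problems = []
    if not is_year_month_day(row["visit_date"]):
        problems.append("visit_date is not a valid year-month-day date")
    if not row["age"].isdigit() or not 0 <= int(row["age"]) <= 120:
        problems.append("age is missing or outside 0-120")
    # Skip logic: the follow-up question applies only to smokers.
    smokes = row["smoker"] == "yes"
    has_count = row["cigarettes_per_day"] != ""
    if smokes and not has_count:
        problems.append("cigarettes_per_day is required when smoker is yes")
    if not smokes and has_count:
        problems.append("cigarettes_per_day must be blank when smoker is not yes")
    return problems


def validate(csv_text):
    """Map each participant_id with problems to its list of problems."""
    report = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        problems = check_row(row)
        if problems:
            report[row["participant_id"]] = problems
    return report


if __name__ == "__main__":
    for participant_id, problems in validate(SAMPLE_CSV).items():
        for problem in problems:
            print(f"{participant_id}: {problem}")
```

A check like this only finds errors after they have been entered. Tools built for data management typically apply comparable rules at the point of data entry, so the error is never saved in the first place.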
Avoiding the Headaches of Spreadsheets
What we use for data management is often determined by our budgets, what is available and our personal familiarity with a tool. In a busy clinical research setting, this often means researchers use Excel or another spreadsheet tool. It is available, free and familiar to many.
Although spreadsheets have their place, including for creating good graphs and tables, other tools offer far less stressful, scalable data management. The SLHD provides one such tool in the REDCap data management software. It is provided free of charge to researchers and offers a secure, encrypted database option with regular backups, which are stored within the SLHD IM&TD environment. It has a wide variety of features and functions to support high-quality data collection and reduce the potential for the data catastrophes described above.
More information can be found on the SLHD REDCap home page: https://redcap.sswahs.nsw.gov.au