Archiving options for research data
The following table provides University of Sydney managed options for long-term preservation of research data.
The University also keeps a list of research data platforms that can be used for storing your data for active projects.
Archiving eNotebooks
For guidance on how to archive data from an eNotebook, see Archiving completed eNotebooks.
Organise your files
Give your files meaningful names and organise your content in a logical file structure. This is useful for active files that you are currently working with, and it is essential before publishing or archiving your content.
A well-organised folder structure can save you time and help other users to discover, understand and use your digital content.
Key considerations
- Keep any raw data in a separate folder from working data.
- Store any consent forms separately for ethics and privacy reasons.
- Nest your folders in the direction that best suits how you plan to use them, e.g. Location > Method > Date or Method > Date > Location.
- Don’t create too many empty folders ahead of time.
- Don’t manually change or delete file extensions (e.g. .docx, .pdf, .csv).
- Avoid using special characters (e.g. \ / : * ? " < > |), apart from hyphens and underscores, in file names.
- Don’t make file names too long.
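As a rough illustration, the file name considerations above can be checked with a short script. This is a minimal sketch, not a University tool: the function name `check_file_name` and the 60-character limit are assumptions made for the example, since the guidance does not set a specific maximum length.

```python
from pathlib import PurePath

# Characters the guidance above says to avoid in file names.
FORBIDDEN = set('\\/:*?"<>|')

# Hypothetical limit for illustration only; the guidance gives no number.
MAX_NAME_LENGTH = 60


def check_file_name(name: str) -> list[str]:
    """Return a list of problems found in a single file name (not a full path)."""
    problems = []
    bad = sorted(ch for ch in set(name) if ch in FORBIDDEN)
    if bad:
        problems.append("special characters: " + " ".join(bad))
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"longer than {MAX_NAME_LENGTH} characters")
    if not PurePath(name).suffix:
        problems.append("missing file extension")
    return problems
```

An empty list means the name passed all three checks. You could run the function over every file in a project folder before archiving and review any names it reports.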
File naming
Select an appropriate naming convention for your files as early as possible and follow it consistently throughout your research or work.
Make sure your naming convention is documented in a README file before archiving or publishing your content.
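One way to follow a convention consistently is to generate file names rather than type them by hand. The sketch below assumes a hypothetical convention of project, description, date and version (`project_description_YYYYMMDD_vNN.ext`); this pattern and the function name `build_file_name` are illustrative only, and you should substitute the convention you have documented in your README.

```python
import re
from datetime import date


def build_file_name(project: str, description: str, day: date,
                    version: int, extension: str) -> str:
    """Assemble a name that follows the hypothetical convention described above."""

    def slug(text: str) -> str:
        # Lower-case the text and replace any run of characters other than
        # letters and digits with a single hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    # Underscores separate the parts; the date is written as YYYYMMDD so that
    # names sort chronologically; the version is zero-padded to two digits.
    return (f"{slug(project)}_{slug(description)}_{day:%Y%m%d}"
            f"_v{version:02d}.{extension.lstrip('.')}")
```

For example, `build_file_name("ReefSurvey", "Water temp", date(2024, 3, 1), 2, ".csv")` returns `reefsurvey_water-temp_20240301_v02.csv`.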
File formats
Save your work in an appropriate file format to ensure long-term accessibility. The file formats you use when working with your data may not be appropriate for archiving or publishing purposes.
You should think about capturing data or converting files into formats that are:
- widely used within your discipline
- publicly documented, i.e. the complete file specification is publicly available
- open and non-proprietary
- endorsed by standards agencies such as the International Organization for Standardization (ISO)
- self-documenting, i.e. the file itself can include useful metadata
- unencrypted
- uncompressed, or compressed using a lossless method.
Recommended long-term formats
The following table provides general suggestions for file formats that enable long-term preservation of digital content.
| Format category | Preservation format options |
| --- | --- |
| Archive | |
| Audio | PCM Wave Format (.wav), minimum 16-bit/44.1 kHz |
| | Broadcast Wave Format (.bwf), minimum 24-bit/48 kHz |
| Images | Tagged Image File Format (.tif, .tiff) |
| Tabular datasets | Comma Separated Values (.csv) |
| | Microsoft Excel (.xlsx) |
| Text | Plain Text (UTF-8) (.txt) |
| | PDF/A (.pdf) |
| | PDF/A-3 (.pdf) |
| Video | Audio Video Interleave (.avi), uncompressed |
| | MPEG-4 (.mp4), codec: ProRes, H.264, AVC; audio: stereo AAC |
| | MOV (.mov), codec: ProRes, H.264, AVC; audio: stereo AAC |
| | JPEG2000 OP1a MXF (.mxf) |
| | FFV1 Matroska (.mkv) |
For more specific recommendations, please contact the Library Digital Collections team. We can provide advice on:
- which file formats to use for long-term preservation, access, and sharing with collaborators
- when and how to convert to these formats.
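Before archiving, it can help to list which files already use an extension from the table above and which may need converting. The following is a minimal sketch under stated assumptions: the function name `preservation_category` is made up for the example, and an extension match is only a first pass. It cannot confirm the bit depth, codec, or PDF/A conformance that the table also asks for.

```python
from pathlib import Path
from typing import Optional

# Extensions copied from the table above, keyed by the table's own categories.
PRESERVATION_EXTENSIONS = {
    "Audio": {".wav", ".bwf"},
    "Images": {".tif", ".tiff"},
    "Tabular datasets": {".csv", ".xlsx"},
    "Text": {".txt", ".pdf"},
    "Video": {".avi", ".mp4", ".mov", ".mxf", ".mkv"},
}


def preservation_category(file_name: str) -> Optional[str]:
    """Return the table category for a file's extension, or None if it is not listed."""
    suffix = Path(file_name).suffix.lower()
    for category, extensions in PRESERVATION_EXTENSIONS.items():
        if suffix in extensions:
            return category
    return None
```

A result of `None` means the extension is not in the general table, which is a prompt to seek more specific advice rather than proof that the format is unsuitable.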
Metadata
Metadata provides vital contextual information that will help others find, understand and reuse your content. Metadata may include:
- title
- creator(s)
- date produced/collected
- location
- abstract
- subject
- method
- process
- quality
- format
- rights
- ownership.
Consider using a metadata standard (also called a metadata schema), which is a defined set of fields that can either be general or discipline specific. Using a metadata standard will not only provide a rich description of your content, but also increase the likelihood of people finding it.
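To make this concrete, here is a minimal sketch of a metadata record that uses some of the fields listed above and saves them as JSON. The class name `DatasetMetadata`, the choice of required fields, and the JSON layout are assumptions for illustration. They are not a published metadata standard, so map the fields to the standard used in your discipline.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DatasetMetadata:
    # Treated as required in this sketch (an assumption, not a rule).
    title: str
    creators: list[str]
    date_collected: str  # ISO 8601 date written as text, e.g. "2024-03-01"
    # Optional descriptive fields drawn from the list above.
    location: str = ""
    abstract: str = ""
    subject: list[str] = field(default_factory=list)
    method: str = ""
    format: str = ""
    rights: str = ""
    ownership: str = ""


def to_json(record: DatasetMetadata) -> str:
    """Serialise the record as readable UTF-8 JSON text."""
    return json.dumps(asdict(record), indent=2, ensure_ascii=False)
```

The resulting text can be saved next to the data it describes, for example as a `metadata.json` file in the same folder.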
Creating documentation
Once you’ve decided what metadata you need to collect and keep, you should record this information and store it with the digital content.
Some storage systems, like the University’s eNotebook, provide mechanisms for you to do this when you save your content.
In other storage systems, like the Research Data Store, you may have to record your metadata manually in a README document or using a version control table (.docx template).
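Where you need to record this information manually, a plain-text README can be generated from a few named sections. This is a minimal sketch: the function name `render_readme` and the section headings in the usage example are assumptions, not a University template.

```python
def render_readme(title: str, sections: dict[str, str]) -> str:
    """Build plain-text README content with an underlined title and section headings."""
    lines = [title, "=" * len(title), ""]
    for heading, body in sections.items():
        # Each section is a heading, an underline of the same length, the body text,
        # and a blank line.
        lines += [heading, "-" * len(heading), body.strip(), ""]
    return "\n".join(lines)
```

For example, `render_readme("Reef survey data", {"File naming convention": "project_description_YYYYMMDD_vNN.ext"})` produces text that you could save as `README.txt` in the top-level folder of your dataset.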
Retention periods
Some digital content will have a legally mandated retention period. This is the minimum amount of time that you need to keep the content after its creation or the completion of a project.
Retention periods vary depending on the content and the context in which it was created, and you must keep your data for at least as long as the legal retention requirement.
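Once you know the retention period that applies, the earliest date on which content could be considered for disposal is simple date arithmetic. The sketch below assumes the period is a whole number of years counted from project completion. The function name and the five-year figure in the usage note are hypothetical, because the actual period depends on your content and context.

```python
from datetime import date


def earliest_disposal_date(project_completed: date, retention_years: int) -> date:
    """Return the first date on which the minimum retention period has elapsed."""
    target_year = project_completed.year + retention_years
    try:
        return project_completed.replace(year=target_year)
    except ValueError:
        # The project ended on 29 February and the target year is not a leap year:
        # roll forward to 1 March so the full period is still met.
        return date(target_year, 3, 1)
```

For example, with a hypothetical five-year period, `earliest_disposal_date(date(2021, 6, 30), 5)` returns `date(2026, 6, 30)`. Treat the result as a minimum, not a deadline for deletion.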