Archiving digital content

Archiving options for research data 

The following table lists the options managed by the University of Sydney for long-term preservation of research data.

| Archiving option | Size limit | Access | Accepts sensitive data? | How to deposit data |
| --- | --- | --- | --- | --- |
| Sydney eScholarship Repository | 5 GB | Open | No | How to submit |
| Research Data Store | As arranged | Nominated team members | Yes – if encrypted | Research Dashboard (DashR) |
| Records Online | 50 GB | Nominated team members | Yes | ICT Self Service Portal |

The University also keeps a list of research data platforms that can be used for storing your data for active projects. 

Archiving eNotebooks 

For guidance on how to archive data from an eNotebook, see Archiving completed eNotebooks.

Organise your files 

Give your files meaningful names and organise your content in a logical file structure. This is useful for active files that you are currently working with, and it is essential before publishing or archiving your content. 

A well-organised folder structure can save you time and help other users to discover, understand and use your digital content. 

Key considerations 

  • Keep any raw data in a separate folder from working data.
  • Store any consent forms separately for ethics and privacy reasons. 
  • Nest your folders in the direction that best suits how you plan to use them, e.g. Location > Method > Date or Method > Date > Location.
  • Don’t create too many empty folders ahead of time.
  • Don’t manually change or delete file extensions (e.g. .docx, .pdf, .csv).
  • Avoid using special characters (e.g. \ / : * ? " < > |), apart from hyphens and underscores, in file names.
  • Don’t make file names too long. 
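
Some of the considerations above can be checked automatically before you archive a folder. The following is a minimal sketch, not a University tool: the function name `check_file_name` and the 64-character limit are assumptions made for illustration (the guidance only says names should not be too long), and only the special characters listed above are tested.

```python
# Characters listed in the guidance above as ones to avoid in file names.
FORBIDDEN = set('\\/:*?"<>|')

# Hypothetical limit: the guidance says "not too long" without giving a number.
MAX_NAME_LENGTH = 64


def check_file_name(name: str, max_length: int = MAX_NAME_LENGTH) -> list[str]:
    """Return a list of problems found in a single file name (not a full path)."""
    problems = []
    bad = sorted(ch for ch in set(name) if ch in FORBIDDEN)
    if bad:
        problems.append("contains special characters: " + " ".join(bad))
    if len(name) > max_length:
        problems.append(f"longer than {max_length} characters")
    # A missing extension may mean it was deleted by hand.
    if "." not in name.strip("."):
        problems.append("has no file extension")
    return problems


print(check_file_name("site-a_survey_2024-03-01.csv"))  # → []
print(check_file_name("draft_v2?.docx"))  # → ['contains special characters: ?']
```

An empty list means the name passed these particular checks. It does not mean the name is meaningful, which still needs a human eye.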

File naming 

Select an appropriate naming convention for your files as early as possible and follow it consistently throughout your research or work.  

Make sure your naming convention is documented in a README file before archiving or publishing your content. 
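
One way to follow a convention consistently is to generate names rather than type them. The sketch below assumes a hypothetical convention of project, description, ISO date and two-digit version, joined by underscores. The function `build_file_name` and the convention itself are illustrative only, so substitute whatever you have documented in your README.

```python
import re
from datetime import date


def build_file_name(project: str, description: str, created: date,
                    version: int, extension: str) -> str:
    """Build a name of the form project_description_YYYY-MM-DD_vNN.ext."""

    def slug(text: str) -> str:
        # Lower-case the text and replace any run of characters other than
        # letters and digits with a single hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    parts = [slug(project), slug(description),
             created.isoformat(), f"v{version:02d}"]
    return "_".join(parts) + "." + extension.lstrip(".").lower()


print(build_file_name("Reef Survey", "Site A transects", date(2024, 3, 1), 2, ".CSV"))
# → reef-survey_site-a-transects_2024-03-01_v02.csv
```

Putting the date in ISO order (year, month, day) means the files sort chronologically when listed by name.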

File formats 

Save your work in an appropriate file format to ensure long-term accessibility. The file formats you use when working with your data may not be appropriate for archiving or publishing purposes.  

You should think about capturing data or converting files into formats that are: 

  • widely used within your discipline 
  • publicly documented, i.e. the complete file specification is publicly available 
  • open and non-proprietary 
  • endorsed by standards agencies such as the International Organization for Standardization (ISO) 
  • self-documenting, i.e. the file itself can include useful metadata 
  • unencrypted 
  • uncompressed or that use lossless compression.

Recommended long-term formats 

The following table provides general suggestions for file formats that enable long-term preservation of digital content.  

| Format category | Preservation format options |
| --- | --- |
| Archive | ZIP File Format (.zip) |
| Audio | PCM Wave Format (.wav), minimum 16-bit/44.1 kHz |
| | Broadcast Wave Format (.bwf), minimum 24-bit/48 kHz |
| Images | Tagged Image File Format (.tif, .tiff) |
| Tabular datasets | Comma Separated Values (.csv) |
| | Microsoft Excel (.xlsx) |
| Text | Plain Text (UTF-8) (.txt) |
| | PDF/A (.pdf) |
| | PDF/A-3 (.pdf) |
| Video | Audio Video Interleave (.avi), uncompressed |
| | MPEG-4 (.mp4), codec: ProRes, H.264, AVC, audio: stereo AAC |
| | MOV (.mov), codec: ProRes, H.264, AVC, audio: stereo AAC |
| | JPEG2000 OP1a MXF (.mxf) |
| | FFV1 Matroska (.mkv) |
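
To find files that may need converting before you archive them, you could compare file extensions against the table above. This is a rough sketch under stated assumptions: the function name `files_to_review` is hypothetical, and an extension alone cannot confirm details such as PDF/A conformance, bit depth or video codec, so treat the result as a list of files to look at rather than a verdict.

```python
from pathlib import Path

# Extensions taken from the table above; adjust them to the advice you receive.
PRESERVATION_EXTENSIONS = {
    ".zip", ".wav", ".bwf", ".tif", ".tiff", ".csv", ".xlsx",
    ".txt", ".pdf", ".avi", ".mp4", ".mov", ".mxf", ".mkv",
}


def files_to_review(folder: str) -> list[Path]:
    """List files under `folder` whose extension is not in the table."""
    return sorted(
        path for path in Path(folder).rglob("*")
        if path.is_file() and path.suffix.lower() not in PRESERVATION_EXTENSIONS
    )
```

Calling `files_to_review("my_project")` on a hypothetical project folder returns the paths to check. The comparison ignores letter case, so `photo.TIF` is treated the same as `photo.tif`.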

For more specific recommendations, please contact the Library Digital Collections team. We can provide advice on:  

  • which file formats to use for long-term preservation, access, and sharing with collaborators 
  • when and how to convert to these formats. 

Metadata 

Metadata provides vital contextual information that will help others find, understand and reuse your content. Metadata may include:  

  • title 
  • creator(s) 
  • date produced/collected  
  • location 
  • abstract 
  • subject 
  • method 
  • process 
  • quality 
  • format 
  • rights 
  • ownership. 

Consider using a metadata standard (also called a metadata schema), which is a defined set of fields that can either be general or discipline specific. Using a metadata standard will not only provide a rich description of your content, but also increase the likelihood of people finding it. 

Creating documentation 

Once you’ve decided what metadata you need to collect and keep, you should record this information and store it with the digital content.  

Some storage systems, like the University’s eNotebook, provide mechanisms for you to do this when you save your content.  

In other storage systems, like the Research Data Store, you may have to record your metadata manually in a README document or using a version control table (.docx template). 
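
If you record metadata manually, a small script can keep the README consistent across folders. The sketch below is one possible approach, not a University template: the function `write_readme` and the `Field: value` layout are assumptions, and the field names you pass in would come from the metadata you decided to keep.

```python
from pathlib import Path


def write_readme(folder: str, metadata: dict[str, str]) -> Path:
    """Write README.txt in `folder`, one 'Field: value' line per metadata entry."""
    lines = ["README", ""]
    for field, value in metadata.items():
        lines.append(f"{field}: {value}")
    readme = Path(folder) / "README.txt"
    # UTF-8 plain text is one of the preservation formats suggested for text.
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme
```

For example, `write_readme("my_project", {"Title": "Example dataset", "Creator(s)": "A. Researcher"})` would create `my_project/README.txt` with those two fields. All values here are placeholders.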

Retention periods 

Some digital content will have a legally mandated retention period. This is the minimum amount of time that you need to keep the content after its creation or the completion of a project.  

Retention periods vary depending on the content and the context in which it was created, and you must keep your data for at least as long as the legal retention requirement.  
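
Once you know the retention period that applies to your content, the earliest date it stops applying is simple date arithmetic. The sketch below assumes the period is expressed in whole years from a start date (creation or project completion). The function name `keep_until` is hypothetical and the five-year figure is a placeholder, not a statement of any legal requirement.

```python
from datetime import date


def keep_until(start: date, retention_years: int) -> date:
    """Return the date `retention_years` after `start`."""
    try:
        return start.replace(year=start.year + retention_years)
    except ValueError:
        # `start` is 29 February and the target year is not a leap year,
        # so roll forward to 1 March rather than shortening the period.
        return date(start.year + retention_years, 3, 1)


print(keep_until(date(2024, 6, 30), 5))  # → 2029-06-30
```

This gives a minimum only. Content may need to be kept longer than the mandated period.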

Contact

For help with publishing data, contact Sydney eScholarship support. For help with digital preservation, contact the Digital Collections team.