File formats, University of Sydney Library

File formats

Choose the best file formats for analysis, preservation and publishing

When preserving and publishing data it’s essential that all datasets are saved in an appropriate file format to ensure long-term accessibility of data. An appropriate and trustworthy data repository or archival storage should be selected to enhance the findability of your data, and to ensure it is stored safely and securely. For further assistance in selecting file formats, a data archive or publishing your research data, please contact the Research Data team.

The file formats you use when working with your data may not be appropriate for archiving or publishing purposes. You should think about capturing data or converting files into formats that are:

  • widely used within your discipline
  • publicly documented, ie the complete file specification is publicly available
  • open and non-proprietary
  • endorsed by standards agencies such as the International Organization for Standardization (ISO)
  • self-documenting, ie the file itself can include useful metadata
  • unencrypted
  • uncompressed, or compressed using a lossless method

For example:

  • Quantitative research data
    • While data is being collected and analysed, you might need it in a number of different formats: an Excel spreadsheet, a database, or a file format native to the data analysis software you are using (for example SPSS, SAS, R or MATLAB).
    • Once the data’s been collected and the analysis performed, it’s best to save the data as a comma separated values (.csv) file for long-term storage, rather than leaving it in the native format of the program that was used to collect or analyse it. Most data software packages provide options for saving data as a comma separated values file. This format is portable across different computing and software platforms, so, if you publish your data, other researchers who want to use it in their own computing environments with their own software will find it easier to do so.

  • Image files
    • The uncompressed TIFF (Tagged Image File Format) is the best choice for long-term preservation of image files. Most image creation software packages provide options for saving images as TIFF files. You should save your image files in this format right from the outset, so that you capture the highest possible quality master image files.
    • While working with your images, you may need to manipulate, share, or embed them in other documents. For these sorts of purposes it may be useful to compress your image files into JPEG format so that they’re smaller and easier to send over the internet or embed in analysis project files.
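As a minimal sketch of the quantitative-data advice above, the following Python example writes tabular data to a UTF-8 encoded .csv file using only the standard library. The function names `export_to_csv` and `read_csv` are hypothetical helpers chosen for illustration, not part of any tool mentioned here, and the sketch assumes your data is already available as rows of simple values.

```python
import csv
from pathlib import Path


def export_to_csv(rows, header, destination):
    """Write a header and data rows to a UTF-8 encoded CSV file.

    `rows`, `header` and `destination` are illustrative parameter names.
    Returns the path that was written.
    """
    destination = Path(destination)
    # newline="" lets the csv module manage line endings itself.
    with destination.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return destination


def read_csv(source):
    """Read a CSV file back as a list of rows, for a round-trip check."""
    with Path(source).open(newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle))
```

Reading the exported file back and comparing it with the source data is a simple way to confirm that nothing was lost in the conversion. Note that CSV stores every value as text, so types such as dates and numbers may need to be documented separately.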

The following table provides general suggestions for suitable file format choices for long-term preservation and for working data. For more specific recommendations, including when and how your data should be converted into these formats, please contact the Research Data team.

  • Archive
    • Preservation format(s): ZIP File Format (.zip)
  • Audio
    • Preservation format(s): Broadcast Wave Format (.wav)
    • Working data format: MPEG-1 Audio Layer 3 (.mp3)
  • Images
    • Preservation format(s): Tagged Image File Format (.tif, .tiff)
    • Working data format: JPEG (.jpeg, .jpg)
  • Tabular datasets
    • Preservation format(s): Comma Separated Values (.csv); Microsoft Excel (.xlsx)
  • Text
    • Preservation format(s): Plain Text (UTF-8) (.txt); Portable Document Format (.pdf)
  • Video
    • Preservation format(s): Motion JPEG 2000 (.mj2); MPEG-4 (.mp4)
    • Working data format: MPEG-4 (compressed) (.mp4)
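
The suggestions above could be encoded as a simple lookup, for example to flag files in a project folder that are not in a listed format. The following Python sketch is illustrative only: the dictionary names and the `roles_for` function are hypothetical, and the check looks only at the file extension, so it cannot tell a compressed .mp4 from a preservation-quality one.

```python
from pathlib import Path

# Extensions taken from the format suggestions above, mapped to data type.
PRESERVATION_EXTENSIONS = {
    ".zip": "Archive",
    ".wav": "Audio",
    ".tif": "Images",
    ".tiff": "Images",
    ".csv": "Tabular datasets",
    ".xlsx": "Tabular datasets",
    ".txt": "Text",
    ".pdf": "Text",
    ".mj2": "Video",
    ".mp4": "Video",
}

WORKING_EXTENSIONS = {
    ".mp3": "Audio",
    ".jpeg": "Images",
    ".jpg": "Images",
    ".mp4": "Video",  # listed as "MPEG-4 (compressed)"
}


def roles_for(filename):
    """Return the set of roles ("preservation", "working") for a file name.

    An empty set means the extension is not in either list.
    """
    extension = Path(filename).suffix.lower()
    roles = set()
    if extension in PRESERVATION_EXTENSIONS:
        roles.add("preservation")
    if extension in WORKING_EXTENSIONS:
        roles.add("working")
    return roles
```

A file whose extension returns an empty set, or only "working", is a candidate for conversion before archiving or publishing.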

Further information