
File formats

Choose the best file formats for analysis, preservation and sharing

When preserving data, it’s essential to save every dataset in an appropriate file format so that the data remains accessible in the long term. You should also select an appropriate and trustworthy archive to enhance the findability of your data and to ensure it is stored safely and securely. For further assistance in selecting file formats or a data archive, please contact the Research Data team.

The file formats you use when working with your data may not be appropriate for archiving purposes. You should think about capturing data in, or converting files into, formats that are:

  • widely used within your discipline
  • open and non-proprietary
  • publicly documented or self-documenting
  • endorsed by standards agencies

You should also consider choosing file formats that are:

  • unencrypted
  • uncompressed, or compressed using lossless compression
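
To illustrate what "lossless" means in practice, here is a minimal sketch, assuming gzip as the compression method and using only the Python standard library. It compresses some bytes, decompresses them again, and compares checksums to confirm that nothing was lost. The sample data and the function names are hypothetical and are not part of any Research Data team tooling.

```python
import gzip
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 checksum of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def is_lossless_round_trip(original: bytes) -> bool:
    """Compress and decompress `original` with gzip, then compare checksums."""
    compressed = gzip.compress(original)
    restored = gzip.decompress(compressed)
    return sha256_hex(restored) == sha256_hex(original)


# Hypothetical sample data standing in for a real dataset.
sample = b"site,temperature_c\nA,21.5\nB,19.8\n" * 100
print(is_lossless_round_trip(sample))
```

A lossy format such as JPEG would not pass an equivalent check, because the decoded image differs from the original pixel data.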

For example:

  • Quantitative research data
    • While data is being collected and analysed, you might need it in a number of different formats: an Excel spreadsheet, a database, or a format native to the data analysis software you are using, such as SPSS, SAS, R or MATLAB.
    • Once the data has been collected and the analysis performed, it’s best to save the data as a comma-separated values (.csv) file for long-term storage, rather than leaving it in the native format of the program that was used to collect or analyse it. Most data software packages provide options for saving data as a comma- or tab-separated values file.

  • Image files
    • While working with image files, you may need to manipulate, share, or embed them in other documents. For these sorts of purposes it may be useful to save your files in proprietary formats used by image editing software such as Adobe Photoshop, or to compress your images into JPEG format so that they’re smaller and easier to send over the internet or embed in presentations.
    • Uncompressed TIFF (Tagged Image File Format) is the best choice for long-term preservation of image files, rather than any of the proprietary or compressed formats you might need to use while you’re working with the images. Most image creation and editing software packages provide options for saving images as TIFF files.
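
The quantitative data example above can be sketched in code. The following is a minimal illustration, assuming the working data sits in a SQLite database (one possible "database" among many) and using only the Python standard library. The table name, column names and the `export_table_to_csv` helper are hypothetical.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def export_table_to_csv(conn: sqlite3.Connection, table: str, csv_path: Path) -> int:
    """Write every row of `table` to `csv_path` with a header row; return the row count."""
    # Table names cannot be bound as SQL parameters, so quote the identifier instead.
    quoted = '"' + table.replace('"', '""') + '"'
    cursor = conn.execute(f"SELECT * FROM {quoted}")
    headers = [column[0] for column in cursor.description]

    rows_written = 0
    # newline="" lets the csv module control line endings; UTF-8 keeps text portable.
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(headers)
        for row in cursor:
            writer.writerow(row)
            rows_written += 1
    return rows_written


# Hypothetical working data: an in-memory database with one small table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (site TEXT, temperature_c REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)", [("A", 21.5), ("B", 19.8)])

with tempfile.TemporaryDirectory() as workdir:
    out_path = Path(workdir) / "readings.csv"
    count = export_table_to_csv(conn, "readings", out_path)
    print(count, "rows exported")
    print(out_path.read_text(encoding="utf-8"))
```

Missing (NULL) values are written as empty fields in this sketch, so it is worth recording in your documentation how missing values are represented in the archived file.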

The following table provides general suggestions for acceptable file formats for long-term preservation. For more specific recommendations, including when and how your data should be converted into these formats, please contact the Research Data team.

  • Archive
    Acceptable Formats (*preferred)
    GNU ZIP File Format (.gz)
    Tape Archive File Format (.tar)
    ZIP File Format (.zip)
  • Audio
    Acceptable Formats (*preferred)
    Audio Interchange File Format (.aiff)
    *Free Lossless Audio Codec (.flac)
    *Waveform Audio File Format (.wav)
  • Computer Aided Design (CAD)
    Acceptable Formats (*preferred)
    Design Web Format (.dwf)
    Drawing Exchange Format (.dxf)
    Drawing Files (.dwg, .dws, .dwt)
    Extensible 3D (.x3d)
    Standard for the Exchange of Product Model Data (.step, .stp)
  • Email
    Acceptable Formats (*preferred)
    *Email (Electronic Mail Format) (.eml)
    *MBOX Email Format (.mbox)
    Microsoft Outlook Item (.msg)
    *Microsoft Outlook Personal Folders File (.pst)
  • Geospatial (see also CAD and Dataset categories)
    Acceptable Formats (*preferred)
    ESRI Shapefile (.shp, .shx, .dbf)
    Geospatial Tagged Image File Format (.tif, .tiff, .gtiff)
    Keyhole Markup Language (.kml)
  • Moving Images
    Acceptable Formats (*preferred)
    Motion JPEG 2000 (.mj2)
    MPEG-4 (.mp4)
  • Presentations
    Acceptable Formats (*preferred)
    Microsoft PowerPoint (.pptx)
    *OpenDocument Presentation (.odp)
  • Still Images
    Acceptable Formats (*preferred)
    Portable Network Graphics (.png)
    *Tagged Image File Format (.tif, .tiff)
  • Tabular Datasets
    Acceptable Formats (*preferred)
    *Comma Separated Values (.csv)
    eXtensible Markup Language (.xml)
    Microsoft Excel (.xlsx)
    OpenDocument Spreadsheet (.ods)
    Tab Delimited Values (.tab, .tsv, .txt)
  • Text
    Acceptable Formats (*preferred)
    eXtensible Markup Language (.xml)
    Microsoft Word (.docx)
    OpenDocument Text (.odt)
    Plain Text (ASCII, UTF-8, or UTF-16) (.txt)
    Portable Document Format (.pdf)
    Rich Text Format (.rtf)
  • Website
    Acceptable Formats (*preferred)
    eXtensible HyperText Markup Language (.xhtml)
    MIME HTML (.mhtml)
    Web ARChive File Format (.warc)
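
As a rough illustration of how the table might be used, the sketch below checks file names against a small, abridged transcription of it. Only the Audio, Presentations, Still Images, Tabular Datasets and Text categories are included, and the set names and the `classify` function are hypothetical. A file extension is only a hint about a file's real format, so treat the result as a first pass rather than a verdict.

```python
from pathlib import PurePath

# Abridged transcription of the table above (not the full list).
PREFERRED = {".flac", ".wav", ".odp", ".tif", ".tiff", ".csv"}
ACCEPTABLE = {
    ".aiff", ".pptx", ".png",
    ".xml", ".xlsx", ".ods", ".tab", ".tsv", ".txt",
    ".docx", ".odt", ".pdf", ".rtf",
}


def classify(filename: str) -> str:
    """Return 'preferred', 'acceptable' or 'not listed' based on the file extension."""
    extension = PurePath(filename).suffix.lower()
    if extension in PREFERRED:
        return "preferred"
    if extension in ACCEPTABLE:
        return "acceptable"
    return "not listed"


# Hypothetical file names from a project folder.
for name in ["survey.CSV", "notes.docx", "photo.jpg"]:
    print(name, "->", classify(name))
```

Files reported as "not listed" are candidates for conversion, or for a conversation with the Research Data team.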

Further information