Universiti Sains Malaysia Library 

Managing Research Data

Research Data Management Division USM Library


File Formats

Decide which file formats to use for your research data when you start planning a research project, because the format determines how the data can be used, analysed, stored, and reused in the future.

This table contains guidance on file formats recommended and accepted by the UK Data Service.

 
Tabular data with extensive metadata (variable labels, code labels, and defined missing values)
Recommended formats:
  • SPSS portable format (.por)
  • delimited text and command (‘setup’) file (SPSS, Stata, SAS, etc.)
  • structured text or mark-up file of metadata information, e.g. DDI XML file
Acceptable formats:
  • proprietary formats of statistical packages: SPSS (.sav), Stata (.dta), MS Access (.mdb/.accdb)

Tabular data with minimal metadata (column headings, variable names)
Recommended formats:
  • comma-separated values (.csv)
  • tab-delimited file (.tab)
  • delimited text with SQL data definition statements
Acceptable formats:
  • delimited text (.txt) with characters not present in data used as delimiters
  • widely-used formats: MS Excel (.xls/.xlsx), MS Access (.mdb/.accdb), dBase (.dbf), OpenDocument Spreadsheet (.ods)

Geospatial data (vector and raster data)
Recommended formats:
  • ESRI Shapefile (.shp, .shx, .dbf, .prj, .sbx, .sbn optional)
  • geo-referenced TIFF (.tif, .tfw)
  • CAD data (.dwg)
  • tabular GIS attribute data
  • Geography Markup Language (.gml)
Acceptable formats:
  • ESRI Geodatabase format (.mdb)
  • MapInfo Interchange Format (.mif) for vector data
  • Keyhole Mark-up Language (.kml)
  • Adobe Illustrator (.ai), CAD data (.dxf or .svg)
  • binary formats of GIS and CAD packages

Textual data
Recommended formats:
  • Rich Text Format (.rtf)
  • plain text, ASCII (.txt)
  • eXtensible Mark-up Language (.xml) text according to an appropriate Document Type Definition (DTD) or schema
Acceptable formats:
  • Hypertext Mark-up Language (.html)
  • widely-used formats: MS Word (.doc/.docx)
  • some software-specific formats: NUD*IST, NVivo and ATLAS.ti

Image data
Recommended formats:
  • TIFF 6.0 uncompressed (.tif)
Acceptable formats:
  • JPEG (.jpeg, .jpg, .jp2) if original created in this format
  • GIF (.gif)
  • TIFF other versions (.tif, .tiff)
  • RAW image format (.raw)
  • Photoshop files (.psd)
  • BMP (.bmp)
  • PNG (.png)
  • Adobe Portable Document Format (PDF/A, PDF) (.pdf)

Audio data
Recommended formats:
  • Free Lossless Audio Codec (FLAC) (.flac)
Acceptable formats:
  • MPEG-1 Audio Layer 3 (.mp3) if original created in this format
  • Audio Interchange File Format (.aif)
  • Waveform Audio Format (.wav)

Video data
Recommended formats:
  • MPEG-4 (.mp4)
  • OGG video (.ogv, .ogg)
  • motion JPEG 2000 (.mj2)
Acceptable formats:
  • AVCHD video (.avchd)

Documentation and scripts
Recommended formats:
  • Rich Text Format (.rtf)
  • PDF/UA, PDF/A or PDF (.pdf)
  • XHTML or HTML (.xhtml, .htm)
  • OpenDocument Text (.odt)
Acceptable formats:
  • plain text (.txt)
  • widely-used formats: MS Word (.doc/.docx), MS Excel (.xls/.xlsx)
  • XML marked-up text (.xml) according to an appropriate DTD or schema, e.g. XHTML 1.0
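
As one way to apply the table above, the following Python sketch checks a file name against a small, partial mapping of the listed extensions. The names `FORMAT_STATUS` and `classify_format` are hypothetical and introduced only for illustration; they are not part of any UK Data Service tool. Extensions whose status depends on the type of data (for example .txt, .pdf, .xml, .mdb and .dbf) are deliberately left out. An extension alone also cannot confirm the conditions noted in the table, such as whether a TIFF is uncompressed version 6.0 or whether an MP3 was originally created in that format, so treat the result as a first check only.

```python
from pathlib import Path

# Partial, illustrative mapping taken from the table above.
# Assumption: only extensions with the same status for every data type
# in the table are included; this is not an official list.
FORMAT_STATUS = {
    ".por": "recommended",
    ".csv": "recommended",
    ".tab": "recommended",
    ".gml": "recommended",
    ".rtf": "recommended",
    ".flac": "recommended",
    ".mp4": "recommended",
    ".odt": "recommended",
    ".sav": "acceptable",
    ".dta": "acceptable",
    ".xls": "acceptable",
    ".xlsx": "acceptable",
    ".kml": "acceptable",
    ".doc": "acceptable",
    ".docx": "acceptable",
    ".png": "acceptable",
    ".wav": "acceptable",
    ".avchd": "acceptable",
}


def classify_format(filename: str) -> str:
    """Return 'recommended', 'acceptable', or 'unlisted' for a file name.

    'unlisted' only means the extension is missing from this partial
    mapping; it does not mean the format is unsuitable.
    """
    suffix = Path(filename).suffix.lower()  # compare case-insensitively
    return FORMAT_STATUS.get(suffix, "unlisted")


if __name__ == "__main__":
    for name in ["survey.csv", "survey.sav", "interview.WAV", "notes.pages"]:
        print(f"{name}: {classify_format(name)}")
```

A check like this could be run over a project folder before deposit to flag files that may need converting to a recommended format.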

Disclaimer

USM Library will not be responsible for any loss or damage caused by the use of any information obtained from this website.

Contact Us

Hamzah Sendut Library, Universiti Sains Malaysia, 11800, Penang, Malaysia

Tel : +604 - 653 3720 
Fax : +604 - 654 2508


Universiti Sains Malaysia Library © 2023
All Rights Reserved

  • Last Modified: Thursday 23 October 2025.