
Data Management: Formatting & Organization

Information about how to organize, describe, preserve and share your research data

Organizing Your Files

How you organize your data files depends on the requirements of your research or project.

Use consistent methods when naming, describing (metadata), and storing your data files.

Prior to data collection, consider the following:

  • File directory structures
  • Naming conventions
  • Data file formats
  • Metadata 

File Directory Structures

The top-level folder should include a title, unique ID, and year (YYYY).

Subfolders should use a concise, documented naming convention that could include experiment runs, dataset versions, or group personnel.
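
As a minimal sketch of the layout described above, the following Python builds a top-level folder name from a title, unique ID, and four-digit year, then creates documented subfolders beneath it. The function name, the subfolder names (run_01, dataset_v1), and the sample project values are illustrative assumptions, not part of this guide.

```python
from pathlib import Path
import tempfile


def make_project_tree(root, title, unique_id, year, subfolders):
    """Create <root>/<Title>_<ID>_<YYYY>/ containing the given subfolders."""
    if not (1000 <= int(year) <= 9999):
        raise ValueError("year must be four digits (YYYY)")
    # Join the naming elements with underscores; spaces in the title become underscores.
    top_name = "_".join([title.strip().replace(" ", "_"), str(unique_id), str(year)])
    top = Path(root) / top_name
    top.mkdir(parents=True, exist_ok=True)
    for name in subfolders:
        (top / name).mkdir(exist_ok=True)
    return top


if __name__ == "__main__":
    # Hypothetical project values, created in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        top = make_project_tree(tmp, "Soil Survey", "P0042", 2023,
                                ["run_01", "run_02", "dataset_v1"])
        print(top.name)
        print(sorted(p.name for p in top.iterdir()))
```

Whatever scheme you choose, record it in a short text file in the top-level folder so that collaborators can interpret the subfolder names.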

Naming Conventions

Ensure that your data files are organized and labeled / named correctly so that they are easily identifiable.

  • The top-level folder should include: Project Name, Unique ID, and YYYY
  • Each subfolder should have a documented naming convention
  • Be consistent with the naming schema
  • Use short, meaningful file names
  • Use version control for updated / modified files
    • Ex. Spec_Data_20220930 (use YYYYMMDD format in file names)
  • Avoid special characters. Use underscores to connect naming elements
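
The conventions above can be sketched as a small helper that joins naming elements with underscores, drops special characters, and appends a YYYYMMDD date stamp. The function name and its parameters are assumptions made for illustration; adapt the rules to whatever convention your project documents.

```python
import re
from datetime import date


def build_file_name(elements, when, extension=""):
    """Join naming elements and a YYYYMMDD date stamp with underscores."""
    cleaned = []
    for element in elements:
        # Replace each run of characters other than letters and digits with one underscore.
        part = re.sub(r"[^A-Za-z0-9]+", "_", element).strip("_")
        if part:
            cleaned.append(part)
    cleaned.append(when.strftime("%Y%m%d"))
    return "_".join(cleaned) + extension


if __name__ == "__main__":
    print(build_file_name(["Spec", "Data"], date(2022, 9, 30)))  # Spec_Data_20220930
```

Because the date is written as YYYYMMDD, sorting file names alphabetically also sorts the versions chronologically.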

File Formats

To maximize the ability to share, preserve and re-use datasets / digital files, carefully consider the format you use for digital files.

Researchers should plan for both hardware and software obsolescence and choose file formats that ensure long-term operability and access.
Formats more likely to be accessible in the future are:

  • Non-proprietary
  • Open, documented standard
  • Common usage by research community
  • Standard representation (ASCII, Unicode)
  • Unencrypted
  • Uncompressed

Consider migrating your data into a format with the above characteristics, in addition to keeping a copy in the original software format.
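
One simple case of such a migration is re-encoding a plain text file from ISO 8859-1 to UTF-8 while leaving the original untouched. The sketch below assumes the source encoding is already known; the function name and the _utf8 suffix are hypothetical choices for illustration.

```python
from pathlib import Path
import tempfile


def migrate_to_utf8(source, source_encoding="iso-8859-1"):
    """Write a UTF-8 copy next to `source`; the original file is not modified."""
    source = Path(source)
    target = source.with_name(source.stem + "_utf8" + source.suffix)
    # Decode and encode the raw bytes so that line endings are preserved exactly.
    text = source.read_bytes().decode(source_encoding)
    target.write_bytes(text.encode("utf-8"))
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "notes.txt"
        original.write_bytes("café".encode("iso-8859-1"))
        migrated = migrate_to_utf8(original)
        print(migrated.name)           # notes_utf8.txt
        print(original.exists())       # True: the original copy is kept
```

Migrations between richer formats (for example, spreadsheets to delimited text) follow the same pattern: write the new copy alongside the original and check the result before relying on it.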

Preferred File Formats:

Format | Highest Confidence | Medium Confidence | Lowest Confidence
Text

Plain text -- US-ASCII, UTF-8, UTF-16 with BOM (.txt)

SGML with included DTD (.sgm, .sgml)

XML with included schema (.xml)

PDF/A-1 --  ISO 19005-1 (.pdf)

Plain text -- ISO 8859-x (.txt)

Rich Text Format 1.x (.rtf)

Cascading Style Sheets (.css)

HTML (.html, .htm)

LaTeX with referenced files (.latex)

OpenDocument Text (.odt, .sxw)

MS Word 2007+ (OOXML) (.docx)

PDF with fonts embedded (.pdf)

Microsoft Word (.doc)

WordPerfect (.wpd)

all others
Spreadsheet or Database

Comma- or tab-separated Values (.csv, .tsv, .txt)

Delimited text

SIARD: Software Independent Archiving of Relational Databases (.siard)

dBASE (.dbf)

OpenDocument Spreadsheet (.ods)

MS Excel 2007+ (OOXML) (.xlsx)

Excel (.xls)

all others
Digitized Books, Maps, Paper etc.

JPEG2000 -- lossless (.jp2)

TIFF -- uncompressed (.tiff)

PDF/A-1 --  ISO 19005-1 (.pdf)

n/a

All others
Graphics and Images

TIFF -- uncompressed or CCITT 4 compressed (.tiff)

JPEG2000 -- lossless compression (.jp2)

PNG (.png) -- 24-bit true color

TIFF -- compressed (.tiff)

JPEG (.jpg)

JPEG2000 -- lossy compression (.jp2)

GIF (.gif)

Digital Negative DNG (.dng)

BMP (.bmp)

PNG (.png) -- 8-bit indexed

Photoshop (.psd)

Encapsulated PostScript (.eps)

MrSID (.sid)

RAW files

All others

Digital Audio 

BWAV LPCM (.bwav, .wav)

24-bit, 96kHz

AIFF -- PCM (.aif, .aiff), LPCM codec

WAV -- PCM (.wav)

LPCM codec

SUN audio -- uncompressed (.au, .snd)

Standard MIDI (.mid)

Free Lossless Audio Codec (.flac)

Apple Lossless Audio Codec (ALAC) (.m4a)

MP3 (.mp3)

Advanced Audio Coding (.mp4)

Digital Video

FFV1/Matroska (.mkv)

AVI -- uncompressed (.avi)

QuickTime -- uncompressed, motion JPEG (.mov)

MXF -- uncompressed (.mxf)

MPEG-4 (.mp4) H.264

MPEG-1, MPEG-2 (.mp1, .mp2)

Ogg Theora (.ogv, .ogg)

ProRes  (.mov)

Motion JPEG 2000 (.jp2)

Windows Media Video (.wmv)

RealVideo (.rm, .rv)

all others
Presentation

PDF/A-1 -- ISO 19005-1 (.pdf)

OpenDocument Presentation (.odp)

MS PowerPoint 2007+ (OOXML) (.pptx)

PowerPoint (.ppt)

all others
Containers

Zip -- no compression

.tar

Zip -- compressed

All others
Quantitative and Statistical Data

Comma- or tab-separated Values (.csv, .tsv, .txt)

Structured text or markup file containing metadata information:

Data Documentation Initiative (.ddi), XML (.xml), JSON (.json)

SIARD: Software Independent Archiving of Relational Databases (.siard)

HDF5 (.hdf)

SPSS (.sav, .sps, .spv, .spo)

SAS (.sas, .sas7bdat)

R (.R)

HDF4 (.hdf)

Excel (.xls)

Other proprietary formats
Email

MBOX

EML

MSG

PST
All others

 

Table reused courtesy of University of Washington Libraries: Preferred File Formats—UW Libraries. (n.d.). Retrieved March 6, 2023, from https://www.lib.washington.edu/preservation/preservation_services/digitization-and-digital-preservation/preferred-file-formats
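
The Containers row lists Zip with no compression ahead of compressed Zip, in line with the "Uncompressed" characteristic noted earlier. As a hedged illustration, Python's standard zipfile module can write such a container with its ZIP_STORED method; the function name and the sample file name below are assumptions.

```python
import tempfile
import zipfile
from pathlib import Path


def store_in_zip(archive_path, files):
    """Bundle files into a Zip container using ZIP_STORED (no compression)."""
    with zipfile.ZipFile(archive_path, mode="w", compression=zipfile.ZIP_STORED) as archive:
        for file in files:
            file = Path(file)
            # Store each file under its bare name rather than its full path.
            archive.write(file, arcname=file.name)
    return Path(archive_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "Spec_Data_20220930.csv"
        data.write_text("sample,value\nA,1\n", encoding="utf-8")
        archive = store_in_zip(Path(tmp) / "Spec_Data_20220930.zip", [data])
        with zipfile.ZipFile(archive) as check:
            for info in check.infolist():
                # For stored entries the compressed and original sizes match.
                print(info.filename, info.compress_size == info.file_size)
```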

South Dakota State University; Brookings, SD 57007; 1-800-952-3541