Dataset Preparation

Regardless of how you arranged your data when you collected them, you must make sure they follow the format required by the Ecoplexity data analysis tools. Species composition datasets are structured slightly differently from standard datasets.

Standard Dataset Structure

  • All of the data you would like graphed or analyzed together should be in a single column.
  • Each column in the dataset should contain a separate variable (e.g., humidity, site code, soil temperature).
  • The first row should contain names for the variables.
  • All additional rows should contain data entries or NA (blank cells will be substituted with NA).
  • There should not be any empty rows or columns in the dataset.
  • For variables that are numbers, there should be no extraneous characters (e.g., <1).
  • If there are groups within a variable, a separate factor variable should be used to designate groups (see example below).
  • Groups can be designated with numbers or words.
  • Text used in variable names and grouping variables should not have spaces between words (e.g., use Soil.temp.celc rather than Soil temp celc).
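
Several of the rules above can be checked before a file is loaded into the tools. The following is a minimal Python sketch and is not part of the Ecoplexity tools. It assumes the dataset has been exported as CSV text; the function name check_standard_dataset, the column names other than Soil.temp.celc, and the sample values are hypothetical.

```python
import csv
import io


def check_standard_dataset(csv_text, numeric_columns):
    """Return a list of problems found in a CSV export of a standard dataset.

    `numeric_columns` names the variables that must hold numbers or NA.
    (Hypothetical helper; not part of the Ecoplexity tools.)
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["dataset is empty"]
    header, data = rows[0], rows[1:]
    problems = []

    # Variable names must be present and must not contain spaces.
    for name in header:
        if name.strip() == "":
            problems.append("blank variable name in first row")
        elif " " in name.strip():
            problems.append(f"variable name contains a space: {name!r}")

    for line_no, row in enumerate(data, start=2):
        # A row whose cells are all blank counts as an empty row.
        if all(cell.strip() == "" for cell in row):
            problems.append(f"row {line_no} is empty")
            continue
        for name, cell in zip(header, row):
            value = cell.strip()
            # Blank cells and NA are both treated as missing values.
            if name in numeric_columns and value not in ("", "NA"):
                try:
                    float(value)
                except ValueError:
                    problems.append(
                        f"row {line_no}, {name}: {value!r} is not a number"
                    )
    return problems


# Made-up sample data: the "<1" entry breaks the numbers-only rule.
sample = """Site,Visit,Soil.temp.celc
A,1,12.5
B,1,<1
C,2,NA
"""
print(check_standard_dataset(sample, numeric_columns={"Soil.temp.celc"}))
```

Row numbers in the messages count the header as row 1, so they match what a spreadsheet would show. The sketch does not cover every rule (for example, it does not look for empty columns).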

Example Standard Dataset

To the right is a dataset that could be created in the Excel template. It describes soil temperatures at six sites, measured during each of three months.

To prepare the dataset for a test combining the three groups (e.g., a comparison of means with date as the factor) or to make a single graph containing all of the data (e.g., a bar graph), place all of the temperature data in a single column, and place the values that designate the groups in a separate column (as shown at right).

In this case, the data for the main variable would be in column 1 and the grouping data would be in column 2. The numbers 1 and 2 would be entered in the corresponding fields in the tool.


Above: Stacked data, with a factor variable (Visit) to designate groups.

Left: Unstacked data, with data from each group next to each other.
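
If your data are currently unstacked, the conversion to the stacked layout can be sketched as follows. This is an illustration in Python and not part of the Ecoplexity tools: the function name stack_columns is hypothetical, and the temperature values are made up for the example rather than taken from the figures.

```python
def stack_columns(unstacked, value_name, group_name):
    """Turn {group: [values, ...]} into stacked rows with a grouping column.

    The first returned row holds the variable names; every later row holds
    one measurement (column 1) and the group it belongs to (column 2).
    """
    rows = [(value_name, group_name)]
    for group, values in unstacked.items():
        for value in values:
            rows.append((value, group))
    return rows


# Illustrative values only; they are not taken from the original figure.
unstacked = {
    "1": [10.2, 11.0],
    "2": [14.8, 15.1],
    "3": [18.3, 17.9],
}
stacked = stack_columns(unstacked, "Soil.temp.celc", "Visit")
for row in stacked:
    print(*row, sep="\t")
```

In the result, the measurements sit in column 1 and the Visit factor in column 2, which matches the column numbers described above.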

Species Composition Dataset Structure

Datasets containing counts of species (or another level of taxa) at one or more sites differ slightly in their structure from standard datasets. The differences are as follows:

  • The top-left cell should be left blank (rather than saying "Species" or something similar).
  • The first row should contain the names of sites, starting in the second column.
  • The first column should contain the names of species, starting in the second row.
  • Every cell within the matrix should contain a value of zero or greater (missing values are not allowed).
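
The differences listed above can also be checked automatically. The following Python sketch is an illustration only and is not part of the Ecoplexity tools. It assumes the matrix has been exported as CSV text; the function name check_species_matrix, the site names, and the taxon names and counts in the sample are hypothetical.

```python
import csv
import io


def check_species_matrix(csv_text):
    """Return a list of problems found in a species composition matrix.

    (Hypothetical helper; not part of the Ecoplexity tools.)
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows or not rows[0]:
        return ["dataset is empty or has no header row"]
    header, data = rows[0], rows[1:]
    problems = []

    # The top-left cell must be blank; site names start in column 2.
    if header[0].strip() != "":
        problems.append("top-left cell should be blank")
    sites = header[1:]

    for line_no, row in enumerate(data, start=2):
        if not row or row[0].strip() == "":
            problems.append(f"row {line_no} has no species name")
            continue
        species, counts = row[0], row[1:]
        if len(counts) != len(sites):
            problems.append(f"{species}: expected {len(sites)} counts")
        for site, cell in zip(sites, counts):
            try:
                count = float(cell)
            except ValueError:
                # Blank cells and NA end up here: missing values are not allowed.
                problems.append(f"{species} at {site}: {cell!r} is not a number")
                continue
            if not count >= 0:
                problems.append(f"{species} at {site}: must be zero or greater")
    return problems


# Made-up sample matrix: the NA entry is not allowed in this format.
sample_matrix = """,Site.A,Site.B
Carabidae,3,0
Formicidae,12,NA
"""
print(check_species_matrix(sample_matrix))
```

Unlike the standard dataset format, NA is reported as a problem here, because every cell in the matrix must hold a count.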

Example Species Composition Dataset

The following is a dataset that could be created in Excel. These specific data describe arthropod species collected in pitfall traps at several sites.