Data analysis is about extracting meaningful information from your dataset using an appropriate tool. Displaying the data in graphical form can be a simple way to do this. However, ecological systems often contain a great deal of stochasticity (randomness, or noise), so ecologists often rely on tools that specialize in separating meaning from noise: statistics!
The Ecoplexity website contains various tools for exploring, analyzing, and graphing your datasets. A good way to begin is to generate summary statistics and basic plots for each dataset. This lets you identify potential trends and anomalies in the data and determine whether parametric or non-parametric statistics will be most appropriate. Only after that should you perform the analyses themselves.
Data generally fall into one of three types:
1) Count data (e.g., number of individuals)
2) Categorical data (e.g., forest vs. edge vs. meadow habitat) and
3) Continuous data (e.g., inches of rain).
Note that you may want to treat data as a different type than they would be considered in another circumstance. For example, dates might be categorical for the purposes of comparing the mean temperature of a stream during different times of the year but continuous for determining if there is a correlation between day of the year and mean stream temperature.
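To make the distinction concrete, here is a minimal Python sketch that treats the same dates both ways. The stream temperatures, month labels, and the helper `pearson_r` are made up for illustration; they are not taken from the Ecoplexity tools or from any real dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (day of year, month label, mean stream temperature in deg C)
records = [
    (15, "Jan", 3.0), (20, "Jan", 4.0),
    (105, "Apr", 8.0), (110, "Apr", 9.0),
    (196, "Jul", 15.0), (201, "Jul", 16.0),
]

# Date treated as CATEGORICAL: group the temperatures by month and compare means.
by_month = defaultdict(list)
for _, month, temp in records:
    by_month[month].append(temp)
monthly_means = {month: mean(temps) for month, temps in by_month.items()}


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Date treated as CONTINUOUS: correlate day of year with temperature.
days = [day for day, _, _ in records]
temps = [temp for _, _, temp in records]
r = pearson_r(days, temps)

print(monthly_means)
print(round(r, 3))
```

The same six measurements answer two different questions: whether months differ in mean temperature, and whether temperature changes steadily with day of the year.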
Before running an analysis, it helps to:
1) Examine summary statistics and
2) Generate graphs of the distribution of each variable.
Summary statistics (such as the mean and range) are especially useful for getting an initial sense of how samples from two groups (levels of a categorical variable) compare. They can also help you identify erroneous values in your dataset. Because many statistical tests assume that data are "normally distributed" (follow the bell curve), graphs such as histograms, box plots, and Q-Q (quantile-quantile) plots are very useful for deciding which test to perform.
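As a rough illustration of this step, the Python sketch below computes summary statistics for two groups and bins one group into a simple histogram. The soil temperature values and the function names `summarize` and `histogram_counts` are hypothetical, and the value 151.0 is a deliberately planted data-entry error. This is not how the Ecoplexity tools are implemented.

```python
from statistics import mean, median, stdev


def summarize(values):
    """Return basic summary statistics for one group of measurements."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "sd": stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
    }


def histogram_counts(values, bin_width):
    """Count values per bin; each key is the lower edge of a bin of width bin_width."""
    counts = {}
    for v in values:
        lower_edge = (v // bin_width) * bin_width
        counts[lower_edge] = counts.get(lower_edge, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical soil temperatures (deg C) in two habitats.
forest = [12.1, 11.8, 12.5, 12.0, 11.9]
meadow = [15.2, 14.8, 15.5, 151.0, 15.1]  # 151.0 is a planted typo (15.1 mistyped)

for name, group in (("forest", forest), ("meadow", meadow)):
    print(name, summarize(group))

print(histogram_counts(meadow, 5.0))
```

In the meadow group the maximum and range are far larger than in the forest group, and the histogram shows one value sitting alone in a distant bin. Both are hints that a value should be checked against the field notes before any test is run.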
Regardless of how you arranged your data when you collected them, you must make sure they follow the appropriate format for use with the Ecoplexity data analysis tools. Species composition datasets are slightly different from standard datasets.
If your data include:
Use the format for Standard Datasets
Use the format for Species Composition Datasets
To the right is a dataset that could be created in the Excel template. It describes soil temperatures at six sites, measured during each of three months.
To prepare the dataset for a test combining the three groups (e.g., a comparison of means with date as the factor) or to make a single graph containing all of the data (e.g., a bar graph), place all of the temperature data in a single column, and place the values that identify each measurement's group in a separate column (as shown at right).
In this case, the main variable is in column 1 and the grouping variable is in column 2, so you would enter the numbers 1 and 2 in the corresponding fields of the tool.
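If your data were recorded with one column per group, the rearrangement described above can be sketched in Python as follows. The month names, temperature values, and the function `stack_columns` are hypothetical and only illustrate the layout. The example uses three sites rather than six to keep it short.

```python
def stack_columns(wide_rows, group_names):
    """Convert rows with one value per group into (value, group) rows.

    Column 1 of the result holds the main variable and column 2 holds
    the grouping variable.
    """
    long_rows = []
    for row in wide_rows:
        for group, value in zip(group_names, row):
            long_rows.append((value, group))
    return long_rows


# Hypothetical "wide" layout: one row per site, one column per month.
months = ["June", "July", "August"]
wide = [
    [14.2, 16.8, 17.5],  # site 1
    [13.9, 16.1, 17.0],  # site 2
    [15.0, 17.3, 18.2],  # site 3
]

long_format = stack_columns(wide, months)
for value, group in long_format:
    print(f"{value}\t{group}")
```

Each measurement now sits on its own row, with its group label beside it, so one column can be given to a tool as the main variable and the other as the grouping variable.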
Datasets containing counts of species (or another level of taxa) at one or more sites differ slightly in their structure from standard datasets. The differences are as follows:
The following is a dataset that could be created in Excel. These specific data describe arthropod species collected in pitfall traps at several sites.
There are three tools for exploring data:
By now you may have a good idea about what patterns may be present in your dataset, but is that pattern different from what a random selection of values could give you?
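One generic way to ask this question is a permutation test: shuffle the group labels many times and see how often chance alone produces a difference as large as the one you observed. The Python sketch below illustrates the idea under stated assumptions. The data and the function `permutation_test` are hypothetical, and this is not necessarily the method the Ecoplexity tools use.

```python
import random


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns the observed absolute difference in means and an estimated
    p-value: the share of random relabelings with a difference at least
    as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)

    def abs_mean_diff(values):
        a, b = values[:n_a], values[n_a:]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = abs_mean_diff(pooled)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign values to the two groups
        # Small tolerance guards against floating-point rounding.
        if abs_mean_diff(pooled) >= observed - 1e-12:
            at_least_as_extreme += 1

    # Add 1 to both counts so the estimated p-value is never exactly zero.
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical soil temperatures (deg C) in two habitats.
forest = [12.1, 11.8, 12.5, 12.0, 11.9]
meadow = [15.2, 14.8, 15.5, 15.0, 15.1]

observed_diff, p = permutation_test(forest, meadow)
print(observed_diff, p)
```

A small p-value means that random relabeling rarely produces a difference as large as the observed one, which suggests the pattern is unlikely to be noise alone.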
Occasionally, people want to know what computational methods are used in data analysis packages. The Analysis Tools here on the Ecoplexity website use a statistical programming language called R. Community analyses additionally use an R package called vegan. You can learn more about both at the R-Project website.