Statistical Data Structures

Overview

The storage of observed data in Katmandoo is based on the elementary statistical notions of the experimental unit and the sampling (or observational) unit, and on the distinction between the two. Experimental units are identified by user-defined design factors, and observational units carry an additional identifier for each level of sub-sampling. Measured variables are recorded as traits; a trait, together with a time stamp and an operator, forms a variate. This mechanism admits repeated measures data, where the same trait is recorded on an observational unit on more than one occasion.

Phenotypic data storage

The experimental unit is defined as the smallest division of the experimental material such that any two units may receive different treatments. A typical example is a plot in a field experiment, where the applied treatment might be a particular genotype or management practice. Experimental units are identified by a unique combination of design factors, typically related to the spatial or temporal characteristics of the experiment. For example, field Column and field Row uniquely identify a plot in a rectangularly arranged field trial, while day and timeOfDay might identify units in a laboratory experiment. In Katmandoo, treatment factors such as genotype are considered properties of the experimental unit and are not included in the unique key formed by the design factors. However, the virtual column concept gives users the flexibility to set up factors in whatever way suits their needs.
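
As a minimal sketch of this keying rule, the Python below models an experiment whose units are indexed by the tuple of design factor values, with genotype held only as a property of the unit. The names `Experiment`, `ExperimentalUnit` and `add_unit` are hypothetical and do not describe Katmandoo's internal schema.

```python
from dataclasses import dataclass, field


@dataclass
class ExperimentalUnit:
    # Hypothetical record: design factor values plus treatment properties.
    design: dict
    treatments: dict = field(default_factory=dict)


class Experiment:
    def __init__(self, design_factors):
        # Names of the user-defined design factors, e.g. ("Column", "Row").
        self.design_factors = tuple(design_factors)
        self.units = {}

    def key(self, design):
        # The unique key is the ordered combination of design factor values only.
        return tuple(design[f] for f in self.design_factors)

    def add_unit(self, design, **treatments):
        k = self.key(design)
        if k in self.units:
            raise ValueError(f"duplicate experimental unit {k}")
        # Treatment factors (e.g. genotype) are stored, but are not part of the key.
        self.units[k] = ExperimentalUnit(dict(design), dict(treatments))
        return k


trial = Experiment(["Column", "Row"])
trial.add_unit({"Column": 1, "Row": 1}, genotype="G1")
trial.add_unit({"Column": 1, "Row": 2}, genotype="G2")
```

Because genotype is not part of the key, a second unit at Column 1, Row 1 is rejected even if it carries a different genotype.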

The sampling unit is the unit on which individual observations are made and data recorded. It may correspond to the experimental unit, for example where an entire plot is harvested to estimate grain yield, but it may also correspond to some portion of the experimental unit, such as multiple dry matter quadrats (samples) taken from the same plot at some point during the experiment. In the latter case, the plot is the experimental unit and a quadrat is the sampling unit; additional instances of the sampling unit can be invoked to handle any number of levels of sub-sampling.
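
One way to picture the sampling-unit key, assuming it is simply the experimental unit key extended by one index per level of sub-sampling, is the hypothetical helper `sampling_keys` below. It is an illustration, not Katmandoo's storage format.

```python
from itertools import product


def sampling_keys(unit_key, levels):
    """Enumerate hypothetical sampling-unit keys for one experimental unit.

    unit_key: tuple of design factor values, e.g. (Column, Row).
    levels:   number of sub-samples at each level of sub-sampling, e.g.
              [3] for three quadrats per plot, or [3, 2] for two
              sub-samples taken from each of three quadrats.
    """
    # Each sampling unit is the experimental unit key plus one 1-based
    # index per level; with no levels, product() yields a single empty
    # tuple, so the sampling unit is the experimental unit itself.
    return [unit_key + idx for idx in product(*(range(1, n + 1) for n in levels))]


# Whole-plot harvest: the sampling unit is the experimental unit.
print(sampling_keys((4, 7), []))   # [(4, 7)]
# Three dry matter quadrats from the plot at Column 4, Row 7.
print(sampling_keys((4, 7), [3]))  # [(4, 7, 1), (4, 7, 2), (4, 7, 3)]
```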

Experimental design

Experimental design is the process of assigning genotypes (or treatment combinations) to the experimental units, with due consideration of the sources of variation that may affect the experiment. For example, in field experiments where trial operations such as planting and harvesting occur in rows and columns, it would be prudent to make the comparison among treatments as robust as possible against natural or induced variation in these directions. Classical designs used in these situations include randomised complete block designs, incomplete block designs such as square and rectangular lattices, and contemporary row-column designs. More recent developments include designs for correlated data, such as those generated by the DiGGer package.
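
For readers unfamiliar with the simplest of the classical designs listed above, the sketch below generates a randomised complete block design: each genotype appears once in every block, in an independently randomised order within each block. It is an illustration only; it does not reproduce DiGGer or any spatial optimisation, and the function name `rcbd` and the (block, plot, genotype) layout are assumptions.

```python
import random


def rcbd(genotypes, n_blocks, seed=None):
    """Randomised complete block design.

    Every genotype appears exactly once per block; the order within each
    block is shuffled independently. Returns (block, plot, genotype) tuples.
    """
    rng = random.Random(seed)  # seeded generator for a reproducible layout
    layout = []
    for block in range(1, n_blocks + 1):
        order = list(genotypes)
        rng.shuffle(order)  # independent randomisation within this block
        for plot, g in enumerate(order, start=1):
            layout.append((block, plot, g))
    return layout


for row in rcbd(["G1", "G2", "G3", "G4"], n_blocks=2, seed=1):
    print(row)
```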

Design is an evolving area of research; consequently, Katmandoo does not include any "hard" links or embedded code to generate experimental designs. For convenience, a "soft" link to the DiGGer package is included so that a class of designs typically used in field experimentation can be generated from the Katmandoo user interface. The link is described as "soft" because DiGGer can be upgraded or augmented with little or no change to the Katmandoo application.

Designs can be imported into Katmandoo in one of two ways:
  1. through the interface to the DiGGer package noted above, or
  2. as part of the process of importing data.
The latter is flexible and means that designs can be generated from any convenient or specialised source. An important point to remember is that experimental units in Katmandoo are identified by one or more user-definable design factors established through the virtual column mechanism. These factors would typically locate an experimental unit in time or space, for example column+row or range+plot for field experiments (depending on local convention), and batch+operator+time-of-day for a laboratory experiment. As these factors are user-defined, alternatives such as unit or block+plot could be used, but they carry little spatial or temporal information. The implication is that these columns must exist in the import file and be identified as design factors.
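
A minimal sketch of the check this implies, assuming a CSV import file: the hypothetical function `check_design_columns` confirms that the named design factor columns are present and that their combined values identify each row uniquely. It is not part of Katmandoo's import process.

```python
import csv
import io


def check_design_columns(csv_text, design_factors):
    """Hypothetical pre-import check on a CSV file's design factor columns.

    Raises ValueError if a design factor column is missing or if two rows
    share the same combination of design factor values. Returns the
    number of distinct experimental units found.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [f for f in design_factors if f not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing design factor columns: {missing}")
    seen = set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        key = tuple(row[f] for f in design_factors)
        if key in seen:
            raise ValueError(f"duplicate design key {key} at line {line_no}")
        seen.add(key)
    return len(seen)


data = "Column,Row,Genotype,yield\n1,1,G1,3.2\n1,2,G2,2.9\n"
print(check_design_columns(data, ["Column", "Row"]))  # 2
```

Genotype is deliberately left out of the key, so two genotypes recorded at the same Column and Row are reported as a duplicate unit.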

Traits, variates and repeated measures

A trait in Katmandoo is a numeric or alphanumeric characteristic of the sampling unit that is recorded and stored against the appropriate design and treatment factors for an experiment. For example, traits in varietal testing regimes might include plant responses such as grain yield, plant height, disease resistance and seed colour. An instance number is added to the trait column to form a unique identifier, and the union of trait, instance number, date recorded and operator is called a variate in Katmandoo. A variate captures a single instance of a trait and is used in the hierarchy of primary keys that identify an individual measurement or an analysed summary (for example, a treatment mean) of an experiment. A trait, on the other hand, is used to identify information sourced from more than one variate, such as a meta-analysis over experiments or a repeated measures analysis of a single experiment.
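
The relationship between traits and variates can be sketched as follows, assuming each new recording occasion of a trait simply receives the next instance number. `Variate` and `VariateRegistry` are hypothetical names, and the dates and operator initials are illustrative values only.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Variate:
    # Hypothetical key: trait + instance number + date recorded + operator.
    trait: str
    instance: int
    recorded: date
    operator: str


class VariateRegistry:
    def __init__(self):
        self._count = {}  # trait -> number of instances created so far

    def new_variate(self, trait, recorded, operator):
        # Each recording occasion of a trait gets the next instance number.
        n = self._count.get(trait, 0) + 1
        self._count[trait] = n
        return Variate(trait, n, recorded, operator)


reg = VariateRegistry()
h1 = reg.new_variate("plant height", date(2024, 10, 1), "AB")
h2 = reg.new_variate("plant height", date(2024, 11, 1), "CD")
y1 = reg.new_variate("grain yield", date(2024, 12, 15), "AB")

# Repeated measures: select all variates that share a trait.
heights = [v for v in (h1, h2, y1) if v.trait == "plant height"]
print([v.instance for v in heights])  # [1, 2]
```

Selecting by variate pins down one measurement occasion, while selecting by trait gathers every occasion, which is the grouping a repeated measures analysis needs.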

Analysed data

Katmandoo stores summary information from statistical analyses at both the treatment and experiment level, including:

Analysis of single experiments

Analysed summaries from individual experiments would not typically be used as a reporting mechanism, but would more commonly form the input data for a meta-analysis over a series of experiments. In this context, Katmandoo stores:

Analysis of series of experiments

Either data at the sampling unit level or analysed summaries for one or more traits from multiple experiments can be exported for further analysis. Katmandoo assigns a unique identifier to the exported data set to facilitate subsequent importing of results. The analysis of a series of experiments in Katmandoo is called a Multi-Environment Trial (MET) analysis and assumes that genotypes are the treatments of interest.
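
The role of the export identifier can be sketched under simple assumptions: each export is tagged with a random UUID, and results are accepted on re-import only if they quote a known identifier. `export_dataset`, `import_results` and the record layout are hypothetical; the format of Katmandoo's actual identifier is not described here.

```python
import uuid


def export_dataset(records):
    """Hypothetical export: tag the data set with a unique identifier so
    that results computed elsewhere can be matched on re-import."""
    return {"export_id": str(uuid.uuid4()), "records": list(records)}


def import_results(exports, export_id, results):
    # Results are accepted only for an export that is already known.
    if export_id not in exports:
        raise KeyError(f"unknown export id {export_id}")
    exports[export_id]["results"] = results


exports = {}
ex = export_dataset([("Trial A", "G1", 3.1), ("Trial B", "G1", 2.7)])
exports[ex["export_id"]] = ex
# ... the data set is analysed outside the application ...
import_results(exports, ex["export_id"], {"G1": 2.9})
```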

Typically, the purpose of analysing a series of experiments is to obtain an overall estimate of genotype performance and to evaluate the nature of any genotype by experiment (or environment) interaction (GxE). Experiments may be grouped into geographical regions, ecological zones or any user-definable grouping, and genotype predictions stored accordingly. The overall estimate of genotype performance in Katmandoo is the Estimated Genetic Value (EGV), essentially the best linear unbiased prediction (BLUP) obtained by treating genetic effects as random. There is provision for storing an image of the biplot that can be derived from particular variance models for GxE.
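
To show what a BLUP assuming random genetic effects means in the simplest possible case, the sketch below computes BLUPs in a one-way random model for a single experiment, treating the overall mean and the variance components as known. A real MET analysis fits far richer models (multiple environments, GxE variance structures, variance components estimated by REML), so this only illustrates the shrinkage idea; the function name and inputs are assumptions.

```python
def blup_genotype_values(means, n_reps, mu, var_g, var_e):
    """BLUP of genotype values in the one-way random model
    y_ij = mu + g_i + e_ij, with g_i ~ N(0, var_g) and e_ij ~ N(0, var_e),
    where mu, var_g and var_e are treated as known.

    means:  genotype -> observed mean over its plots
    n_reps: genotype -> number of plots behind that mean
    Returns genotype -> mu + shrunken genotype effect.
    """
    out = {}
    for g, ybar in means.items():
        # Shrinkage factor: the reliability of this genotype's mean.
        h = var_g / (var_g + var_e / n_reps[g])
        out[g] = mu + h * (ybar - mu)
    return out


egv = blup_genotype_values(
    means={"G1": 4.0, "G2": 2.0},
    n_reps={"G1": 3, "G2": 1},
    mu=3.0, var_g=1.0, var_e=3.0,
)
print(egv)  # {'G1': 3.5, 'G2': 2.75}
```

G2's mean comes from a single plot, so it is pulled further toward the overall mean than G1's, which is based on three plots.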

Genotype effects may also be classified according to the levels of some explanatory variate used in the analysis and subsequent prediction. For example, if an index of experiment "mean value" was included in the analysis, subsequent predictions might be made of genotype performance at a range of values of this index. The virtual column feature of Katmandoo allows EGVs to be stored against such a classifying factor.
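
As an illustration of predicting at a range of index values, the sketch below uses Finlay-Wilkinson regression, a well-known technique swapped in here for concreteness (the text above does not say which model is used). Each genotype's yield is regressed on the environment mean by ordinary least squares rather than a mixed model, and predictions are stored keyed by (genotype, index level), mirroring a classifying factor set up through a virtual column. The function names and data are hypothetical.

```python
from statistics import mean


def fw_fit(yields):
    """Finlay-Wilkinson style fit: regress each genotype's yield on the
    environment mean (the index) by ordinary least squares.

    yields: environment -> {genotype: yield}; every genotype must appear
    in every environment, and environment means must not all be equal.
    Returns genotype -> (intercept, slope).
    """
    envs = list(yields)
    # Index for each environment: the mean yield over all genotypes.
    x = [mean(yields[e].values()) for e in envs]
    xbar = mean(x)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    fits = {}
    for g in yields[envs[0]]:
        y = [yields[e][g] for e in envs]
        ybar = mean(y)
        sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
        slope = sxy / sxx
        fits[g] = (ybar - slope * xbar, slope)
    return fits


def predict_by_index(fits, levels):
    # Store predictions against a classifying factor: key = (genotype, index level).
    return {(g, lvl): a + b * lvl for g, (a, b) in fits.items() for lvl in levels}


trials = {
    "E1": {"G1": 1, "G2": 3},
    "E2": {"G1": 5, "G2": 3},
    "E3": {"G1": 9, "G2": 3},
}
fits = fw_fit(trials)
egv_by_index = predict_by_index(fits, [2, 8])
print(fits)          # {'G1': (-3.0, 2.0), 'G2': (3.0, 0.0)}
print(egv_by_index)  # {('G1', 2): 1.0, ('G1', 8): 13.0, ('G2', 2): 3.0, ('G2', 8): 3.0}
```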
