Statistical Data Structures
Overview
The storage of observed data in Katmandoo is based on the elementary statistical notions of the experimental unit and the sampling (or observational) unit, and the distinction between the two. Experimental units are identified by user-defined design factors, and observational units have additional identifiers for each level of sub-sampling.
Measured variables are recorded as traits. A trait, together with a time stamp and operator, forms a variate: a mechanism that admits repeated measures data, where the same trait is recorded on an observational unit on more than one occasion.
Phenotypic data storage
The experimental unit is defined as the smallest division of the experimental material such that any two units may receive different treatments. A typical example is a plot in a field experiment, where the applied treatment might be a particular genotype or management practice. Experimental units are identified by a unique combination of design factors, typically related to the spatial or temporal characteristics of the experiment. For example, field Column and field Row uniquely identify a plot in a rectangularly arranged field trial, while day and timeOfDay might identify units in a laboratory experiment. In Katmandoo, treatment factors such as genotype are considered a property of the experimental unit and are not included in the unique key formed by the design factors. However, the concept of a virtual column gives users the flexibility to set up factors in a way appropriate to their needs.
The sampling unit is the unit on which individual observations are made and data recorded. It may coincide with the experimental unit, for example where an entire plot is harvested to estimate grain yield. It may also correspond to some portion of the experimental unit, such as multiple dry matter quadrats (samples) taken from the same plot at some point during the experiment. In the latter case, the plot is the experimental unit and a quadrat is the sampling unit; additional instances of the sampling unit can be invoked to handle any number of levels of sub-sampling.
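The distinction between the two kinds of unit can be sketched as a small data structure. This is an illustration only, not Katmandoo's actual schema: the class names, the `design_key` and `sample_path` fields, and the example values are all hypothetical. The sketch keys each experimental unit by its design factors, keeps treatment factors as a property outside that key, and gives each sampling unit one extra identifier per level of sub-sampling.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SamplingUnit:
    """One observed unit; sample_path holds one identifier per sub-sampling level."""
    design_key: tuple        # key of the parent experimental unit, e.g. (column, row)
    sample_path: tuple = ()  # () means the whole experimental unit was observed


@dataclass
class ExperimentalUnit:
    """A plot (or similar) identified by its design factors only."""
    design_key: tuple                                   # e.g. (column, row)
    treatments: dict = field(default_factory=dict)      # a property, not part of the key
    sampling_units: list = field(default_factory=list)

    def add_sampling_unit(self, *sample_path):
        unit = SamplingUnit(self.design_key, tuple(sample_path))
        if unit in self.sampling_units:
            raise ValueError(f"duplicate sampling unit {unit}")
        self.sampling_units.append(unit)
        return unit


# Hypothetical plot at column 3, row 7, sown to a hypothetical genotype "G12".
plot = ExperimentalUnit(design_key=(3, 7), treatments={"genotype": "G12"})
whole_plot = plot.add_sampling_unit()      # entire plot harvested for grain yield
quadrat_1 = plot.add_sampling_unit(1)      # first dry matter quadrat
quadrat_2 = plot.add_sampling_unit(2)      # second dry matter quadrat
sub_sample = plot.add_sampling_unit(2, 1)  # a further sub-sample within quadrat 2
```

An empty `sample_path` represents the case where the sampling unit and the experimental unit coincide; each extra element represents one more level of sub-sampling.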
Experimental design
Experimental design is the process of assigning
genotypes (or treatment combinations) to the experimental units, with due
consideration of the sources of variation that may impact on the experiment. For
example, in field experiments where trial operations such as planting and harvesting
occur in rows and columns, it would be prudent to make the comparison among treatments
as robust as possible against natural or induced variation in these directions.
Classical designs used in these situations include randomised complete blocks, incomplete
block designs such as square and rectangular lattices and contemporary row-column
designs. More recent developments include designs for correlated data, such as those generated by the DiGGer package.
Because design is an evolving area of research, Katmandoo does not include any "hard" links or embedded code to generate experimental designs. For convenience, a "soft" link to the DiGGer package is included so that a class of designs typically used in field experimentation can be generated from the Katmandoo user interface. The link is described as "soft" because DiGGer can be upgraded or augmented with little or no change to the Katmandoo application.
Designs can be imported into Katmandoo in one of two ways:
- through the interface to the DiGGer package noted above, or
- as part of the process of importing data.
The latter is flexible and means that designs can be generated from any convenient or specialised source. An important concept to remember is that experimental units in Katmandoo are identified by one or more user-definable design factors established through the virtual column mechanism. These factors would typically locate an experimental unit in time or space, for example column+row or range+plot for field experiments (depending on local convention), and batch+operator+time-of-day for a laboratory experiment. As these are user defined, alternatives such as unit or block+plot could be used, but they carry little spatial or temporal information. The implication is that these columns must exist in the import file and be identified as design factors.
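A check of this requirement before importing might look like the following sketch. It is not part of Katmandoo; the function name `check_design_factors`, the file layout and the column names are assumptions made for illustration, and the file is assumed to hold one row per experimental unit. The function confirms that the nominated design factor columns exist and reports any key combination that fails to identify a single unit.

```python
import csv
import io
from collections import Counter


def check_design_factors(rows, design_factors):
    """Return the design-factor keys that occur more than once in rows."""
    missing = [f for f in design_factors if rows and f not in rows[0]]
    if missing:
        raise KeyError(f"design factor columns missing from import file: {missing}")
    counts = Counter(tuple(row[f] for f in design_factors) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Hypothetical import file: a 2 x 2 field layout with two genotypes.
import_file = io.StringIO(
    "column,row,genotype\n"
    "1,1,G1\n"
    "1,2,G2\n"
    "2,1,G2\n"
    "2,2,G1\n"
)
rows = list(csv.DictReader(import_file))

# column+row identifies every plot, so no duplicate keys are reported.
duplicates = check_design_factors(rows, ["column", "row"])

# column alone does not identify a plot: each column value occurs twice.
column_only = check_design_factors(rows, ["column"])
```

Note that genotype is deliberately left out of the key, in line with the treatment factors being a property of the experimental unit rather than part of its identifier.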
Traits, variates and repeated measures
A trait in Katmandoo is a numeric or alpha-numeric characteristic of the sampling unit that is recorded and stored against the appropriate design and treatment factors for an experiment. For example, traits in varietal testing regimes might include plant responses such as grain yield, plant height, disease resistance and seed colour. An instance number is added to the trait column to form a unique identifier which, when coupled with the date recorded and the operator, provides:
- an audit trail
- a means by which to capture multiple measurements on a sampling unit over time
(or space)
The union of trait, instance number, date and operator in Katmandoo is called a variate. A variate captures a single instance of a trait and is used in the hierarchy of primary keys that identify an individual measurement or an analysed summary (for example, a treatment mean) of an experiment. A trait, on the other hand, is used to identify information sourced from more than one variate, such as a meta-analysis over experiments or the repeated measures analysis of a single experiment.
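The relationship between a trait and its variates can be illustrated with a short sketch. The `Variate` class, its field names, the dates, the operator name and the measured values are all hypothetical and do not reflect Katmandoo's internal tables. The sketch shows two variates of the same trait recorded on one sampling unit, and how selecting by trait gathers the repeated measures.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Variate:
    """One recorded instance of a trait: trait + instance number + date + operator."""
    trait: str
    instance: int
    recorded_on: date
    operator: str


# Two hypothetical recordings of the same trait on different dates.
height_1 = Variate("plant_height", 1, date(2020, 9, 1), "operator_a")
height_2 = Variate("plant_height", 2, date(2020, 10, 1), "operator_a")

# Each measurement is keyed by (variate, sampling unit); the unit here is
# a hypothetical plot at column 3, row 7.
measurements = {
    (height_1, (3, 7)): 41.0,
    (height_2, (3, 7)): 78.5,
}


def repeated_measures(measurements, trait, unit):
    """Collect every variate of one trait on one unit, ordered by instance."""
    series = [
        (variate.instance, variate.recorded_on, value)
        for (variate, key), value in measurements.items()
        if variate.trait == trait and key == unit
    ]
    return sorted(series)


series = repeated_measures(measurements, "plant_height", (3, 7))
```

The variate identifies a single measurement occasion, while the trait name is what ties the occasions together for a repeated measures analysis.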
Analysed data
Katmandoo stores summary information from statistical
analyses at both the treatment and experiment level, including:
- estimates of treatment performance, their precision, treatment weights that may be appropriate when pooling information across experiments, and estimated components of variance from any random effects (including residual error) fitted in the analysis.
- the statistical method and model, experiment (general) mean, error degrees of freedom
and average standard error of difference between treatments.
Analysis of single experiments
Analysed summaries from individual experiments would not typically be used as a reporting mechanism, but more commonly would form the input data for a meta-analysis over a series of experiments. In this context, Katmandoo stores:
- estimates of treatment means
- standard errors of treatment means
- number of non-missing replications
- a user derived weight appropriate to each treatment mean for use in a combined
analysis across trials
- any components of variance from design or extraneous factors
- the overall experiment mean
- residual degrees of freedom
- coefficient of variation
- the average pairwise standard error of difference
- a text version of the statistical model fitted
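To make the stored quantities concrete, the sketch below derives them from plot-level data for the simplest possible case. It is an illustration under stated assumptions, not Katmandoo's method: Katmandoo stores summaries produced by whatever analysis the user has run. The function name, the completely randomised design, the inverse-variance weight and the example yields are all assumptions.

```python
from itertools import combinations
from math import sqrt


def summarise_experiment(plots):
    """Summarise one experiment, assuming a completely randomised design."""
    # Drop missing observations (None) before counting replications.
    data = {t: [y for y in ys if y is not None] for t, ys in plots.items()}
    n_total = sum(len(ys) for ys in data.values())
    means = {t: sum(ys) / len(ys) for t, ys in data.items()}

    residual_ss = sum((y - means[t]) ** 2 for t, ys in data.items() for y in ys)
    residual_df = n_total - len(data)
    mse = residual_ss / residual_df  # residual variance component

    experiment_mean = sum(sum(ys) for ys in data.values()) / n_total
    seds = [
        sqrt(mse * (1 / len(data[a]) + 1 / len(data[b])))
        for a, b in combinations(data, 2)
    ]
    treatments = {
        t: {
            "mean": means[t],
            "se": sqrt(mse / len(ys)),
            "reps": len(ys),              # non-missing replications
            "weight": len(ys) / mse,      # assumed choice: 1 / se**2
        }
        for t, ys in data.items()
    }
    experiment = {
        "mean": experiment_mean,
        "residual_df": residual_df,
        "residual_variance": mse,
        "cv_percent": 100 * sqrt(mse) / experiment_mean,
        "average_sed": sum(seds) / len(seds),
        "model": "yield ~ genotype",      # text version of the fitted model
    }
    return treatments, experiment


# Hypothetical plot yields; None marks a missing observation.
treatments, experiment = summarise_experiment(
    {"G1": [4.0, 5.0, 6.0], "G2": [7.0, 8.0, 9.0, None]}
)
```

A real analysis would usually account for blocking or spatial structure, in which case the standard errors, variance components and weights come from the fitted model rather than from these closed-form expressions.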
Analysis of series of experiments
Either data at the sampling unit level or analysed summaries for one or more traits from multiple experiments can be exported for further analysis. Katmandoo assigns a unique identifier to the exported data set to facilitate subsequent importing of results. The analysis of a series of experiments in Katmandoo is called a Multi Environment Trial analysis, or MET, and assumes that Genotypes are the treatments of interest.
Typically, the purpose of the analysis of a series of experiments is to obtain an overall estimate of genotype performance and to evaluate the nature of any genotype by experiment (or environment) interactions (GxE).
Experiments may be grouped into geographical regions, ecological zones or any user-definable grouping, and genotype predictions stored accordingly. The overall estimate of genotype performance in Katmandoo is the Estimated Genetic Value (EGV), essentially the best linear unbiased prediction (BLUP) assuming random genetic effects. There is provision for storing an image of the biplot that can be derived from particular variance models for GxE.
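The effect of treating genetic effects as random can be seen in a deliberately simplified sketch. It is not how Katmandoo or any MET package computes EGVs: it assumes balanced data (every genotype present in every experiment), no GxE structure, and variance components that are already known, and the function name and numbers are hypothetical. Under those assumptions the BLUP shrinks each genotype mean towards the overall mean by the factor σg² / (σg² + σe² / n).

```python
def shrunken_genotype_values(genotype_means, n_experiments, var_genotype, var_error):
    """BLUP-style predictions for a balanced one-way random effects model."""
    overall = sum(genotype_means.values()) / len(genotype_means)
    # Shrinkage factor: the share of variation in a genotype mean that is genetic.
    shrinkage = var_genotype / (var_genotype + var_error / n_experiments)
    return {
        genotype: overall + shrinkage * (mean - overall)
        for genotype, mean in genotype_means.items()
    }


# Hypothetical genotype means over 4 experiments, with assumed variance components.
egv = shrunken_genotype_values(
    {"G1": 4.0, "G2": 6.0, "G3": 8.0},
    n_experiments=4,
    var_genotype=1.0,
    var_error=4.0,
)
```

With these numbers the shrinkage factor is 0.5, so each genotype's deviation from the overall mean of 6.0 is halved. In practice the variance components are estimated and the predictions come from mixed model software, with the results then imported for storage.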
Genotype effects may also be classified according to the levels of some explanatory variate used in the analysis and subsequent prediction. For example, if an index of experiment "mean value" was included in the analysis, subsequent predictions might be made of genotype performance at a range of values of this index. The virtual column feature of Katmandoo allows EGVs to be stored against such a classifying factor.