Katmandoo is a biosciences data management system initially
created for use in plant breeding and evaluation programs where the aim is selection
of superior genotypes in terms of a range of traits (e.g. grain yield, quality characteristics
and disease resistance). It aims to provide a single tool for managing phenotypic
and genotypic data at both the individual-experiment and multiple-experiment levels. The system
has much wider application, however, being appropriate for data from other programs
of designed experimental research within the biosciences.
The Katmandoo system consists of two separate, fundamental components: a relational database
to store the information and a software application to access it. The Katmandoo application
is being developed as a suite of modules:
- trial data management (completed)
- genealogy (pedigree) management (beta)
- tools to capture trait scores on devices running Windows Mobile (beta)
- crossing tool (v2.5)
- molecular marker subsystem (v2.5)
- tools to integrate phenotypic and genomic data (v3.0)
- seed inventory subsystem (v3.5)
Not all modules need be present; which ones are installed depends on the level of functionality required.
Katmandoo is not an analysis system; statistical
analysis and experimental design are areas of active research, as is the specialist
software necessary to implement them. A key feature of Katmandoo
is the ability to import, store and export all information necessary to allow
a rigorous statistical analysis for a wide range of designed experiments. Its
accommodation of multi-environment trial data makes Katmandoo well suited to the
increasingly common inter-organizational and national-scale programs in both plant
and animal research.
The flexibility of Katmandoo to meet emerging biological
and statistical needs is a key property of the underlying data model. Phenotypic
data are stored with reference to the distinction between the experimental unit
(the unit to which a treatment is assigned) and the observational unit (that is,
the unit on which a measurement is made). The generality
this allows is most easily illustrated by example; consider the following, related
to crop improvement programs:
- Plant variety field trial: In such trials,
the treatments (varieties) are randomised to field plots using some sort of experimental
design. Katmandoo stores data associated
with both the experimental unit (field plot) and the observational unit; when one
observation is taken per plot, the two are notionally the same. The information
held at the plot (experimental unit) level includes the variety assigned to the plot and
design information such as replicate number, field column, field row and incomplete
block. The measurements themselves are stored at the observational unit level and may include response
variables such as grain yield and covariates such as insect damage. If several samples
are taken per plot, such as dry matter quadrats, these constitute multiple observational
units from the same experimental unit.
- Plant variety laboratory trial: In such
trials, grain from field plots is sampled then processed in a laboratory to generate
the trait of interest (e.g. milling yield for wheat). These are known as
multi-phase trials. The milling yield example has two phases, namely the
field trial (phase 1) and the laboratory trial (phase 2). Replication and randomisation
are employed in both phases of the design. Phase 1 data are stored as in the previous
example but for phase 2 data the field plot essentially becomes a "treatment" in
the laboratory and the experimental units correspond to grain samples taken from
the plots. The design variables associated
with the experimental unit would include field replicate, field column, field row,
lab replicate, day of milling, time of day and mill operator.
Katmandoo stores these as attributes (virtual
columns) of the experimental unit, while the data, in this case milling
yield, are kept at the observational unit level. The measurement of some quality
traits involves more than two phases (e.g. dough extension traits require a field,
milling and extension phase); Katmandoo stores all
the necessary design factors (that is, from all phases) and the data (including
any subsampling) at the appropriate levels.
- Series of plant variety trials: Varietal
selection is usually based on information from a series of trials, also known as
multi-environment trials (METs) in which
similar sets of varieties are tested in a range of geographic locations and possibly
over several seasons. For such a series Katmandoo stores
the raw data, design and treatment information for each trial, as above; summarised
and meta information from each experiment, to facilitate a joint analysis across
experiments at the trial mean level if necessary; and the results of any such MET
analysis, whether based on trial summaries or raw plot data.
- Known genetic relationships: Another goal
of experiments such as those above is to select varieties for use as parents in
the breeding program. Statistical analyses can be extended to incorporate pedigree
information about the varieties from the genealogy stored in Katmandoo,
which can be exported together with the phenotypic data for analysis.
- Identification of Quantitative Trait Loci (QTL):
Katmandoo uses special data structures to store
molecular information on individuals. The marker, phenotypic and observed pedigree
data are integrated and available for export.
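The distinction between experimental and observational units in the examples above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Katmandoo's actual schema or API: the class names, the attribute names, the trial and variety labels and all numeric values are hypothetical, and the design factors are held in a plain dictionary standing in for the "virtual columns" described above.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class ObservationalUnit:
    """The unit on which a measurement is made (a plot, a quadrat, a grain sample)."""
    label: str
    measurements: dict = field(default_factory=dict)  # trait name -> value


@dataclass
class ExperimentalUnit:
    """The unit to which a treatment is randomised (e.g. a field plot)."""
    trial: str
    treatment: str                                    # e.g. the variety assigned
    design: dict = field(default_factory=dict)        # design factors, free-form
    observations: list = field(default_factory=list)  # ObservationalUnit objects


def trial_means(units, trait):
    """Mean of `trait` per (trial, treatment) over all observational units."""
    values = defaultdict(list)
    for unit in units:
        for obs in unit.observations:
            if trait in obs.measurements:
                values[(unit.trial, unit.treatment)].append(obs.measurements[trait])
    return {key: mean(vals) for key, vals in values.items()}


# One observation per plot: experimental and observational unit coincide.
plot1 = ExperimentalUnit(
    trial="SiteA-2008", treatment="VarietyX",
    design={"replicate": 1, "row": 1, "column": 1},
    observations=[ObservationalUnit("plot", {"grain_yield": 3.2})],
)

# Several quadrats per plot: many observational units, one experimental unit.
plot2 = ExperimentalUnit(
    trial="SiteA-2008", treatment="VarietyY",
    design={"replicate": 1, "row": 1, "column": 2},
    observations=[
        ObservationalUnit("quadrat 1", {"dry_matter": 410.0}),
        ObservationalUnit("quadrat 2", {"dry_matter": 390.0}),
    ],
)

# Phase 2 (laboratory) unit: design factors from both phases sit side by side.
sample = ExperimentalUnit(
    trial="SiteA-2008-milling", treatment="VarietyX",
    design={"field_replicate": 1, "field_row": 1, "field_column": 1,
            "lab_replicate": 2, "milling_day": 3, "mill_operator": "A"},
    observations=[ObservationalUnit("grain sample", {"milling_yield": 74.5})],
)

print(trial_means([plot1, plot2, sample], "dry_matter"))
```

Keeping the design factors as an open-ended mapping, rather than as fixed fields, is one way to let a multi-phase trial add factors such as lab replicate or mill operator without changing the structure used for a simple field trial. The `trial_means` helper hints at the kind of per-trial summary that a joint analysis across experiments might start from.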
Katmandoo has application beyond crop improvement
programs. Data from any type of field trial may be stored, including data from factorial
experiments (where the treatments comprise combinations of levels of several factors
rather than levels of a single factor). Other examples include glass-house experiments
and animal experiments, including experiments where repeated measurements and multiple
traits are recorded on experimental units.
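As a small illustration of what a factorial treatment structure looks like as data, the sketch below enumerates every combination of levels for a set of factors. The factor names and levels are hypothetical and chosen only for the example; nothing here reflects how Katmandoo itself represents treatments.

```python
from itertools import product

# Hypothetical factors and their levels, for illustration only.
factors = {
    "variety": ["VarietyX", "VarietyY"],
    "nitrogen": ["low", "high"],
    "irrigation": ["rainfed", "irrigated"],
}

# Each treatment is one combination of levels, taking one level per factor.
treatments = [dict(zip(factors, levels)) for levels in product(*factors.values())]

print(len(treatments))  # 2 * 2 * 2 = 8 combinations
print(treatments[0])
```

In a single-factor trial the treatment list would simply be the levels of that one factor (for instance, the varieties); in a factorial experiment each experimental unit is assigned one of the combinations generated here.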
The database component is built on a normalised data model implemented in SQL Server
2000, SQL Server 2005 or SQL Server 2005 Express Edition, and the Katmandoo
application is developed in C# using the Microsoft .NET Framework 2.0.
Minimum recommended system configuration:
- Microsoft Windows XP or Windows 2000 SP4 or later
- .NET Framework 2.0 SP1 redistributable package*
- Microsoft Data Access Components (MDAC) 2.8 or later*
- Microsoft SQL Server 2005 Express Edition* or SQL Server 2000 or SQL
Server 2005 or MySQL 5.1 (or later)
- Microsoft Excel and Microsoft Word 97 or later
* Provided by the installation package
The Katmandoo database and application
binary are distributed free of charge subject to the conditions in the Katmandoo license.
We would like to thank Chris Lisle¹, Damian Collins¹, David Luckett¹, Neil Coombes¹,
Peter Martin¹, Gregory Lott⁴, David Jordan², Colin Cavanagh³, Marcus Newberry³ and our
colleagues at NSW DPI and QLD Primary Industries and Fisheries for their support during
the development of Katmandoo and KollecTrait.
¹ NSW Department of Primary Industries
² QLD Primary Industries and Fisheries
³ CSIRO Plant Industries, Canberra
⁴ Molecular Plant Breeding CRC, University of Adelaide