BioCatNet Data Model

BioCatNet aims to provide an infrastructure for family-specific protein databases such as TEED, the thiamine diphosphate-dependent enzyme engineering database. BioCatNet follows a relational database approach, which means that information is stored across several linked tables rather than in a single list of entries. It uses the Firebird 2.5 relational database management system.

The data structure of BioCatNet is based on the previously developed DWARF system (Fischer, M., Thai, Q. K., Grieb, M., & Pleiss, J. (2006). DWARF – a data warehouse system for analyzing protein families. BMC Bioinformatics, 7:495). Global protein sequence similarity defines the degree of relatedness within hierarchical groups of proteins. BioCatNet extends the DWARF system mainly with more detailed taxonomic information, user management and biocatalytic data.

The following figure shows a very simplified scheme of BioCatNet's data table structure centered around the SEQUENCES table:

[Figure: simplified scheme of BioCatNet's data structure]


Each sequence is stored as one entry per position, which allows relevant residues to be annotated individually. Sequences are organized hierarchically at the levels of proteins, homologous families and superfamilies, as in the DWARF system. In addition, BioCatNet can assign homologous families and superfamilies to groups of homologous families or superfamilies.
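The position-wise storage described above can be illustrated with a minimal sketch. It uses an in-memory SQLite database as a stand-in for Firebird, and all table names, column names and the example sequence are hypothetical; the actual BioCatNet schema is not reproduced here.

```python
import sqlite3

# In-memory SQLite stands in for Firebird; table and column names are
# hypothetical and chosen only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sequences (
    sequence_id INTEGER PRIMARY KEY,
    accession   TEXT NOT NULL
);
CREATE TABLE positions (
    sequence_id INTEGER NOT NULL REFERENCES sequences(sequence_id),
    position    INTEGER NOT NULL,   -- 1-based index within the sequence
    residue     TEXT NOT NULL,      -- one-letter amino acid code
    annotation  TEXT,               -- optional note on this residue
    PRIMARY KEY (sequence_id, position)
);
""")


def store_sequence(db, sequence_id, accession, residues):
    """Insert a sequence as one row per residue position."""
    db.execute("INSERT INTO sequences VALUES (?, ?)", (sequence_id, accession))
    db.executemany(
        "INSERT INTO positions (sequence_id, position, residue) VALUES (?, ?, ?)",
        [(sequence_id, i, aa) for i, aa in enumerate(residues, start=1)],
    )


def annotate(db, sequence_id, position, text):
    """Attach an annotation to a single residue."""
    db.execute(
        "UPDATE positions SET annotation = ? WHERE sequence_id = ? AND position = ?",
        (text, sequence_id, position),
    )


def rebuild_sequence(db, sequence_id):
    """Reassemble the full sequence string from its position rows."""
    rows = db.execute(
        "SELECT residue FROM positions WHERE sequence_id = ? ORDER BY position",
        (sequence_id,),
    )
    return "".join(residue for (residue,) in rows)


def annotated_positions(db, sequence_id):
    """Return (position, residue, annotation) for every annotated residue."""
    return db.execute(
        "SELECT position, residue, annotation FROM positions "
        "WHERE sequence_id = ? AND annotation IS NOT NULL ORDER BY position",
        (sequence_id,),
    ).fetchall()


# Made-up example data.
store_sequence(conn, 1, "EXAMPLE_0001", "MKTAYE")
annotate(conn, 1, 6, "hypothetical catalytic residue")
```

Storing one row per position costs more rows than a single sequence string, but it lets an annotation refer to a residue by key instead of by an offset into a text field.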

Sequences are assigned to proteins by their global sequence similarity (e.g. using a threshold of 98%). Homologous families are likewise formed by global sequence similarity; the threshold varies with the respective enzyme family (e.g. 40 to 60%).
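Threshold-based grouping of this kind can be sketched as follows. This is an illustration under simplifying assumptions, not BioCatNet's actual procedure: the sequences are assumed to be already globally aligned to equal length, similarity is approximated by percent identity, and a simple greedy scheme compares each sequence only with the first member of each group. The function names and example sequences are hypothetical.

```python
def percent_identity(a, b):
    """Percent identity of two pre-aligned, equal-length sequences.

    Columns in which both sequences carry a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


def group_by_threshold(aligned, threshold):
    """Greedy grouping by a percent-identity threshold.

    Each sequence joins the first group whose representative (its first
    member) is at least `threshold` percent identical; otherwise it
    starts a new group.
    """
    groups = []
    for name, seq in aligned.items():
        for group in groups:
            if percent_identity(aligned[group[0]], seq) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Made-up aligned sequences for illustration.
aligned = {
    "a": "MKTAYEMKTA",
    "b": "MKTAYEMKTA",
    "c": "MKTAYEMKTV",
    "d": "QQQQQEMKTA",
}

protein_groups = group_by_threshold(aligned, 98)  # strict, protein-level threshold
family_groups = group_by_threshold(aligned, 60)   # looser, family-level threshold
```

With the strict threshold only the identical sequences "a" and "b" share a group, while the looser threshold also pulls in "c". The greedy result depends on input order, which is one reason a production system would likely use a more careful clustering method.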

A single sequence entry can be linked to one or more source organisms. In a similar way, a single sequence entry can be linked to several structures or homology models.
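In a relational schema, such one-to-many and many-to-many links are usually expressed with separate link tables. The sketch below shows the idea with an in-memory SQLite database standing in for Firebird; all table names, column names and data values are hypothetical and do not reflect the actual BioCatNet schema.

```python
import sqlite3

# SQLite (in memory) is a stand-in for Firebird; every table and column
# name below is hypothetical and chosen only for illustration.
link_db = sqlite3.connect(":memory:")
link_db.executescript("""
CREATE TABLE sequences (sequence_id INTEGER PRIMARY KEY, accession TEXT NOT NULL);
CREATE TABLE organisms (organism_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE structures (
    structure_id INTEGER PRIMARY KEY,
    label        TEXT NOT NULL,
    kind         TEXT NOT NULL CHECK (kind IN ('crystal', 'model'))
);
-- Link tables: one row per (sequence, organism) or (sequence, structure) pair.
CREATE TABLE sequence_organisms (
    sequence_id INTEGER NOT NULL REFERENCES sequences(sequence_id),
    organism_id INTEGER NOT NULL REFERENCES organisms(organism_id),
    PRIMARY KEY (sequence_id, organism_id)
);
CREATE TABLE sequence_structures (
    sequence_id  INTEGER NOT NULL REFERENCES sequences(sequence_id),
    structure_id INTEGER NOT NULL REFERENCES structures(structure_id),
    PRIMARY KEY (sequence_id, structure_id)
);
""")

# Made-up example rows: one sequence, two organisms, one crystal
# structure and one homology model.
link_db.execute("INSERT INTO sequences VALUES (1, 'EXAMPLE_0001')")
link_db.executemany("INSERT INTO organisms VALUES (?, ?)",
                    [(1, "Organism A"), (2, "Organism B")])
link_db.executemany("INSERT INTO structures VALUES (?, ?, ?)",
                    [(1, "structure-1", "crystal"), (2, "model-1", "model")])
link_db.executemany("INSERT INTO sequence_organisms VALUES (?, ?)", [(1, 1), (1, 2)])
link_db.executemany("INSERT INTO sequence_structures VALUES (?, ?)", [(1, 1), (1, 2)])


def organisms_for(db, sequence_id):
    """Names of all organisms linked to a sequence, sorted by name."""
    rows = db.execute(
        "SELECT o.name FROM organisms o "
        "JOIN sequence_organisms so ON so.organism_id = o.organism_id "
        "WHERE so.sequence_id = ? ORDER BY o.name",
        (sequence_id,),
    )
    return [name for (name,) in rows]


def structures_for(db, sequence_id):
    """(label, kind) of all structures or models linked to a sequence."""
    return db.execute(
        "SELECT s.label, s.kind FROM structures s "
        "JOIN sequence_structures ss ON ss.structure_id = s.structure_id "
        "WHERE ss.sequence_id = ? ORDER BY s.label",
        (sequence_id,),
    ).fetchall()
```

Keeping the links in their own tables means a new organism or structure can be attached to a sequence without altering the sequence entry itself.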


Structural information from protein crystal structures is parsed from the Protein Data Bank (PDB) into BioCatNet. An additional routine can compute homology models if suitable templates are available.


BioCatNet connects sequence entries to original data from experimenters. For each experiment, the investigated reaction, the reaction conditions, the added enzyme(s) and substrate(s), and the applied reaction buffer are specified. Experimental results can comprise time courses of substrate depletion or product formation. If no concentrations are measured, parameters such as conversion or enantiomeric excess (ee) can be specified instead. To request additional parameters, please contact the BioCatNet administrator.
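For orientation, conversion and enantiomeric excess can be computed from concentrations with the commonly used textbook definitions sketched below. The source does not state which exact definitions BioCatNet expects, so the formulas, function names and example values here are assumptions for illustration only.

```python
def conversion_percent(initial_substrate, remaining_substrate):
    """Conversion in percent: share of the initial substrate consumed."""
    if initial_substrate <= 0:
        raise ValueError("initial substrate concentration must be positive")
    return 100.0 * (initial_substrate - remaining_substrate) / initial_substrate


def enantiomeric_excess_percent(enantiomer_1, enantiomer_2):
    """Enantiomeric excess in percent: |c1 - c2| / (c1 + c2) * 100."""
    total = enantiomer_1 + enantiomer_2
    if total <= 0:
        raise ValueError("at least one enantiomer concentration must be positive")
    return 100.0 * abs(enantiomer_1 - enantiomer_2) / total


def conversion_from_time_course(time_course):
    """Conversion at the last time point of a substrate depletion curve.

    `time_course` is a list of (time, substrate_concentration) pairs;
    the earliest time point is taken as the initial concentration.
    """
    ordered = sorted(time_course)
    return conversion_percent(ordered[0][1], ordered[-1][1])


# Made-up example values (arbitrary but consistent concentration units).
example_course = [(0, 10.0), (30, 6.0), (60, 2.5)]
example_conversion = conversion_from_time_course(example_course)
example_ee = enantiomeric_excess_percent(9.0, 1.0)
```

Both quantities are ratios, so they are independent of the concentration unit as long as the same unit is used throughout.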


A sequence in BioCatNet can be linked to its source organism. The taxonomic lineage of the organism is derived from NCBI Taxonomy.