The Decathlon Data Browser is an interface for exploring the correlation structure of a suite of behavioral measures collected on individual inbred and outbred fruit flies and their gene expression profiles. This tool serves as a companion resource to the original paper and dataset.
It can be difficult to develop an intuition for high-dimensional data, even more so when trying to understand the interaction of multiple high-dimensional datasets. The Data Browser is designed to facilitate understanding of three types of questions:
Several options for selecting behavioral measures are included to help subset the data by relevant characteristics. Create a behavior selection in any of the following ways:
If no specific behaviors are selected, all behaviors are considered selected by default. This default state can be restored at any time by clicking the "Clear All Selections" button above the correlation matrix.
A priori behavioral groups are sets of categorically similar behaviors that are likely correlated because they measure similar underlying phenomena. For example, average movement speed, total number of movement bouts, and average length of movement bouts are all measures of an animal's activity level and therefore belong to the "Activity" a priori group. We attempt to reduce the dimensionality of each group with principal component analysis (PCA), yielding a smaller number of significant principal components (PCs), each of which is a linear combination of the behaviors in that a priori group. More information on this process can be found in the original paper.
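As a rough illustration of the dimensionality reduction described above, the following sketch runs PCA on one group of behaviors using NumPy. The data are synthetic, the function name `group_pca` is hypothetical, and the number of components is fixed by hand; the criterion the original paper uses to decide which PCs are significant is not reproduced here.

```python
import numpy as np

def group_pca(scores, n_components):
    """PCA for one a priori group (illustrative helper, not the Browser's code).

    scores: array of shape (n_flies, n_behaviors).
    Returns (loadings, projections, explained_variance_ratio).
    """
    scores = np.asarray(scores, dtype=float)
    # z-score each behavior so measures on different scales are comparable
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0, ddof=1)
    # SVD of the standardized matrix; the rows of vt are the principal axes
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    loadings = vt[:n_components].T      # (n_behaviors, n_components)
    projections = z @ loadings          # (n_flies, n_components)
    explained = (s ** 2) / np.sum(s ** 2)
    return loadings, projections, explained[:n_components]

# Synthetic "Activity"-like group: three noisy readouts of one latent trait
rng = np.random.default_rng(0)
activity = rng.normal(size=(200, 1))
group = activity + 0.3 * rng.normal(size=(200, 3))

loadings, pcs, explained = group_pca(group, n_components=2)
```

Each column of `loadings` holds the linear weights of the group's behaviors on one PC, which is the quantity a loadings plot would display.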
The Metric Loadings tab can be used to visualize the linear weights (i.e., loadings) of each behavioral measure on each PC within the group. Behavior selections can be partitioned by features of interest in their PC loading plots by clicking a single behavior or a range of behaviors within the plot.
Details of each behavior, such as the assay it was collected from, a description of the behavior, and how the behavioral score is calculated, can be found under the Behavior Summary tab. The plots shown here are histograms of the raw behavioral data for individual flies, split by experimental batch. A summary of the most recently selected behavioral measure is shown by default.
The Search Genes by Behavior tab can be used to explore lists of genes correlated with each behavior or to generate selections of gene IDs for a search on the "Search by Gene ID" tab. This option can be used to obtain a list of genes from a selection of behaviors of interest. Genes can be selected manually (left-click for a single gene, Shift-click for a range) or automatically by selecting all genes below a p-value threshold. The threshold used to determine significance can be adjusted manually via the "p-value cutoff" textbox (p=0.05 by default). All genes within the selection can be added to the Search by Gene ID input window by clicking the "Append Selected Genes" button. Selected genes can be cleared by clicking the "Clear" button.
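The threshold and append operations described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the placeholder gene IDs, and the p-values are hypothetical, and skipping duplicates on append is an assumption rather than documented Browser behavior.

```python
def select_genes(pvalues, cutoff=0.05):
    """Return gene IDs with p-value below `cutoff`, most significant first.

    pvalues: dict mapping gene ID -> p-value for one behavior (hypothetical layout).
    """
    hits = [(p, gene) for gene, p in pvalues.items() if p < cutoff]
    return [gene for p, gene in sorted(hits)]

def append_selected(search_list, selected):
    """Append selected genes to a search list, skipping IDs already present.

    De-duplication is an assumption made for this sketch.
    """
    seen = set(search_list)
    out = list(search_list)
    for gene in selected:
        if gene not in seen:
            out.append(gene)
            seen.add(gene)
    return out

# Placeholder gene IDs and p-values, for illustration only
behavior_pvalues = {"gene_a": 0.001, "gene_b": 0.20, "gene_c": 0.03, "gene_d": 0.05}

selected = select_genes(behavior_pvalues)               # default cutoff of 0.05
search_list = append_selected(["gene_c"], selected)
```

Because the selection is described as genes below the threshold, the sketch uses a strict comparison, so a gene with a p-value exactly equal to the cutoff is not selected.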
The scope of the threshold, append, and clear operations can be adjusted to apply to:
The Search by Gene ID tab can be used to explore the relationship between behavioral measures and fly gene expression profiles. To perform a search, construct a list of gene identifiers (gene symbol, FlyBase gene number, or KEGG gene ID) separated by either a newline or a space, and click "Submit". Gene IDs will be automatically matched and displayed in the "Valid Matching Identifiers" window. Gene IDs with no match will be filtered out and removed from the list. Gene lists can be constructed in a few different ways:
Information about gene sets associated with KEGG pathways that were found to be significantly enriched in one or more behaviors can be found under the Gene pathways tab. Summary statistics for each pathway across all behaviors, such as the bootstrap minimum p-value and the average number of behaviors for which the pathway was significantly enriched, were computed by bootstrapping the behavioral data upstream of the gene-behavior modeling step. These bootstrapped models were used to construct a list of significantly predictive genes for each behavior and, from there, a list of significantly enriched KEGG pathways for each behavior. These summary values therefore represent the average, across all bootstrap replicates, of the minimum p-value across behaviors and of the total number of behavior hits for a given pathway.
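The two summary values can be illustrated with a minimal sketch for a single pathway. It assumes the enrichment p-values are already available as a replicates-by-behaviors array and that a behavior counts as a hit when its p-value falls below 0.05; the array contents, the function name `summarize_pathway`, and the `alpha` threshold are all illustrative assumptions rather than details taken from the original analysis.

```python
import numpy as np

def summarize_pathway(pvals, alpha=0.05):
    """Summarize one pathway across bootstrap replicates (illustrative only).

    pvals: array of shape (n_replicates, n_behaviors) holding the pathway's
           enrichment p-value for each behavior in each bootstrap replicate.
    Returns (mean of per-replicate minimum p-value,
             mean number of behaviors with p < alpha).
    """
    pvals = np.asarray(pvals, dtype=float)
    min_p = pvals.min(axis=1)               # minimum across behaviors, per replicate
    n_hits = (pvals < alpha).sum(axis=1)    # behavior hits, per replicate
    return min_p.mean(), n_hits.mean()

# Made-up p-values: 4 bootstrap replicates x 3 behaviors
pvals = np.array([
    [0.01, 0.20, 0.04],
    [0.03, 0.50, 0.30],
    [0.10, 0.02, 0.60],
    [0.20, 0.40, 0.08],
])

mean_min_p, mean_hits = summarize_pathway(pvals)
```

Both quantities are computed within each replicate first and only then averaged, which matches the order of operations described above.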