Decathlon Data Browser: Documentation

Overview

The Decathlon Data Browser is an interface for exploring the correlation structure of a suite of behavioral measures collected on individual inbred and outbred fruit flies and their gene expression profiles. This tool serves as a companion resource to the original paper and dataset.

High-dimensional data can be difficult to build intuition for, even more so when trying to understand how multiple high-dimensional datasets interact. The Data Browser is designed to help answer three types of questions:

  1. How are behavioral measures related to one another?
  2. How is individual gene expression related to behavioral measures?
  3. Which molecular pathways are predictive of behavioral measures?

Getting started

  1. Select a dataset - The decathlon behavioral data is split by genetic background (inbred or outbred) and behavioral measure dimensionality (full or distilled).
  2. Explore behavior correlations - Click on pairwise correlations in the correlation matrix to see scatter plots of standardized behavioral metrics or hover to see details of the correlation such as names of the corresponding behaviors, sample size, and correlation coefficient.
  3. Create a behavioral selection - In addition to visualizing behavior correlations, the correlation matrix serves as an organizing feature for selecting subsets of behaviors to explore further. Use the correlation matrix or behavior group selection menus to subset behavioral metrics. See Selecting Behaviors for details.
  4. Explore behavioral dimensionality - View principal components (PCs) of Behavior Groups from the correlation matrix and the loadings of each behavior within the group from the Behavioral Loadings tab. From here you can also select behaviors by features in principal components (e.g. select all behaviors that are positively weighted for the second PC of the Activity Behavior Group).
  5. View behavior details - Browse behavior and assay descriptions, details, and calculations from the Behavior Summary tab. View histograms of the data split by experimental batch.
  6. Find genes correlated to behaviors - Browse lists of genes correlated to each behavior and construct gene queries by selecting genes significantly correlated to one or more behaviors.
  7. Search and view gene details - Search genes in the Search by Gene ID tab by inputting a list manually or building a list via the Select Genes by Behavior or KEGG Pathways tabs. Browse summary data for each gene, such as the total number and rank-ordered p-values of significantly correlated behaviors. Optionally generate new behavior selections of correlated behaviors or view KEGG pathways linked to each gene that were found to be significantly enriched in behavioral measures.
  8. Explore functional biological pathways associated with behaviors - Browse KEGG pathways significantly enriched in lists of genes correlated to decathlon behaviors. Browse behaviors linked to KEGG pathways and build search queries from genes significantly linked to both behavioral metrics and enriched KEGG pathways.
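
As a rough illustration of what each cell of the correlation matrix in step 2 summarizes, the sketch below computes a Pearson correlation coefficient and a sample size for one pair of behavioral metrics, using only the flies that have both measurements. The behavior names, the values, and the choice of pairwise-complete observations are assumptions made for illustration; the original paper describes the procedure actually used.

```python
import math

def zscore(values):
    """Standardize a list of numbers to mean 0 and (population) SD 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def pairwise_correlation(x, y):
    """Pearson r and sample size over flies with both measurements present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    zx = zscore([a for a, _ in pairs])
    zy = zscore([b for _, b in pairs])
    # With population z-scores, r is the mean of the products.
    r = sum(a * b for a, b in zip(zx, zy)) / n
    return r, n

# Hypothetical measurements for five flies; None marks a missing value.
speed = [1.0, 2.0, 3.0, 4.0, None]
bout_count = [2.0, 4.0, 6.0, 8.0, 10.0]
r, n = pairwise_correlation(speed, bout_count)
print(f"r = {r:.2f}, n = {n}")
```

The fifth fly is dropped because its speed is missing, so the sample size reported for this pair is smaller than the number of flies in the dataset.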

Selecting Behaviors

Several options for selecting behavioral measures have been included to help subset the data by relevant characteristics. Create a behavior selection in any of the following ways:

  1. Select behaviors based on correlation values directly from the matrix.
  2. Select behaviors by categorically similar "Behavior Groups" or by the "Assays" they were collected from.
  3. Select behaviors based on their loading contributions to Behavior Group principal components.

If no specific behaviors are selected, all behaviors are treated as selected. This default state can be restored at any time by clicking the "Clear All Selections" button above the correlation matrix.

Behavior Loadings

A priori behavioral groups are sets of categorically similar behaviors that are likely correlated because they measure similar underlying phenomena. For example, average movement speed, total number of movement bouts, and average length of movement bouts are all measures of an animal's activity level and therefore belong to the "Activity" a priori group. We use principal component analysis to reduce each group to a smaller number of significant principal components (PCs), each of which is a linear combination of the behaviors in that a priori group. More information on this process can be found in the original paper.

The Metric Loadings tab can be used to visualize the linear weights (i.e. loadings) of each behavioral measure on each PC within the group. Behavior selections can be partitioned by interesting features within their PC loading plots by clicking on single behaviors or ranges of behaviors within the plot.
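
To make the role of the loadings concrete, the sketch below treats a PC score as a weighted sum of standardized behaviors and selects behaviors by the sign or range of their loading, as in the "positively weighted for the second PC" example. The loading values, behavior names, and function names are hypothetical and are not taken from the dataset.

```python
# Hypothetical loadings of "Activity" behaviors on two PCs (not real values).
loadings = {
    "mean speed":       {"PC1": 0.60, "PC2": -0.30},
    "bout count":       {"PC1": 0.55, "PC2": 0.45},
    "mean bout length": {"PC1": 0.58, "PC2": 0.20},
}

def pc_score(z_values, pc):
    """PC score for one fly: loading-weighted sum of its standardized behaviors."""
    return sum(loadings[name][pc] * z for name, z in z_values.items())

def select_by_loading(pc, low=0.0, high=float("inf")):
    """Names of behaviors whose loading on `pc` lies in the interval (low, high]."""
    return [name for name, w in loadings.items() if low < w[pc] <= high]

# Standardized (z-scored) values for one hypothetical fly.
fly = {"mean speed": 1.0, "bout count": -0.5, "mean bout length": 0.0}
print(round(pc_score(fly, "PC1"), 3))
print(select_by_loading("PC2"))  # behaviors positively weighted on PC2
```

With the default bounds, `select_by_loading` keeps only behaviors with a positive loading; passing other bounds mimics selecting a range of behaviors in the loading plot.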

Behavior Summary

Details of each behavior, such as the assay it was collected from, a description of the behavior, and the calculation of the behavioral score, can be found under the Behavior Summary tab. The plots shown here are histograms of the raw behavioral data for individual flies, split by experimental batch. A summary of the most recently selected behavioral measure is shown by default.

Select Genes by Behavior

The Select Genes by Behavior tab can be used to explore lists of genes correlated to each behavior or to generate selections of gene IDs to search on the "Search by Gene ID" tab. This option can be used to obtain a list of genes from a selection of behaviors of interest. Genes can be selected manually (left-click for a single gene, Shift-click for a range) or automatically by selecting genes below a p-value threshold. The p-value threshold used to determine significance can be adjusted manually via the "p-value cutoff" textbox (p=0.05 by default). All genes within the selection can be added to the Search by Gene ID input window by clicking the "Append Selected Genes" button. Selected genes can be cleared by clicking the "Clear" button.
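
The threshold and append operations can be pictured with the following sketch. The gene names and p-values are placeholders, and the strict `p < cutoff` comparison, the ordering by smallest p-value, and the skipping of duplicates on append are all assumptions rather than documented behavior of the browser.

```python
# Hypothetical gene-behavior p-values; gene symbols are placeholders.
p_values = {
    "mean speed": {"geneA": 0.010, "geneB": 0.200, "geneC": 0.049},
    "bout count": {"geneA": 0.030, "geneD": 0.001, "geneB": 0.700},
}

def genes_below_cutoff(behaviors, cutoff=0.05):
    """Genes with p < cutoff for at least one behavior, ordered by smallest p."""
    best = {}
    for behavior in behaviors:
        for gene, p in p_values[behavior].items():
            if p < cutoff:
                best[gene] = min(p, best.get(gene, 1.0))
    return sorted(best, key=best.get)

def append_selected(query, selected):
    """Append selected genes to the search query, skipping duplicates."""
    return query + [g for g in selected if g not in query]

query = ["geneC"]  # gene already present in the search input
query = append_selected(query, genes_below_cutoff(["mean speed", "bout count"]))
print(query)
```

Lowering `cutoff` shrinks the selection; for example, a cutoff of 0.02 applied to "mean speed" alone leaves only `geneA`.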

The scope of the threshold, append, and clear operations can be adjusted to apply to:

Search by Gene ID

The Search by Gene ID tab can be used to explore the relationship between behavioral measures and fly gene expression profiles. To perform a search, construct a list of gene identifiers (gene symbol, FlyBase gene number, or KEGG gene ID) separated by newlines or spaces and click "Submit". Gene IDs will be automatically matched and displayed in the "Valid Matching Identifiers" window. Gene IDs with no match will be filtered and removed from the list. Gene lists can be constructed in a few different ways: by entering identifiers manually, or by building a list from the Select Genes by Behavior or KEGG Pathways tabs.
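
A minimal sketch of the matching step, assuming a lookup table that maps every accepted identifier to one canonical gene symbol. The table contents, the identifier strings, and the removal of duplicate matches are hypothetical; the browser's own matching rules may differ.

```python
# Hypothetical lookup table: any accepted identifier -> canonical gene symbol.
# These identifiers are placeholders, not real FlyBase or KEGG records.
KNOWN_IDS = {
    "geneA": "geneA", "FBgn0000001": "geneA", "Dmel_CG0001": "geneA",
    "geneB": "geneB", "FBgn0000002": "geneB", "Dmel_CG0002": "geneB",
}

def match_gene_ids(raw_text):
    """Split input on whitespace (spaces or newlines) and keep recognized IDs."""
    matched = []
    for token in raw_text.split():
        symbol = KNOWN_IDS.get(token)
        # Unrecognized tokens are dropped; repeated genes are kept once.
        if symbol is not None and symbol not in matched:
            matched.append(symbol)
    return matched

print(match_gene_ids("geneA FBgn0000002\nnot_a_gene Dmel_CG0001"))
```

Here `not_a_gene` has no match and is filtered out, while `Dmel_CG0001` resolves to a gene that is already in the list.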

KEGG Pathways

Information about gene sets associated with KEGG pathways that were found to be significantly enriched in one or more behaviors can be found under the KEGG Pathways tab. Summary statistics for each pathway across all behaviors, such as the bootstrap minimum p-value and the average number of behaviors for which the pathway was significantly enriched, were computed by bootstrapping the behavioral data upstream of the gene-behavior modeling step. These bootstrapped models were used to construct a list of significantly predictive genes for each behavior and, from there, a list of significantly enriched KEGG pathways for each behavior. The summary values therefore represent the average, across all bootstrap replicates, of the minimum p-value across behaviors and of the total number of behavior hits for a given pathway.
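
The two summary values can be illustrated with a small sketch that averages, over bootstrap replicates, the minimum enrichment p-value across behaviors and the number of behaviors in which a pathway is significant. The p-values, behavior names, and the 0.05 cutoff are invented for illustration, and the resampling and enrichment steps themselves are not shown.

```python
# Hypothetical enrichment p-values for one pathway:
# one dict per bootstrap replicate, mapping behavior -> p-value.
replicates = [
    {"mean speed": 0.01, "bout count": 0.20, "turn bias": 0.04},
    {"mean speed": 0.03, "bout count": 0.02, "turn bias": 0.60},
    {"mean speed": 0.50, "bout count": 0.08, "turn bias": 0.30},
]

def summarize_pathway(replicates, cutoff=0.05):
    """Average across replicates of the minimum p-value and the number of hits."""
    min_ps = [min(rep.values()) for rep in replicates]
    hits = [sum(p < cutoff for p in rep.values()) for rep in replicates]
    n = len(replicates)
    return sum(min_ps) / n, sum(hits) / n

mean_min_p, mean_hits = summarize_pathway(replicates)
print(f"mean min p = {mean_min_p:.3f}, mean hits = {mean_hits:.2f}")
```

Because both values are averages over replicates, the number of behavior hits for a pathway need not be a whole number.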