Decathlon Data Browser: Documentation
The Decathlon Data Browser is an interface for exploring the correlation structure of a suite of
behavioral measures collected on individual inbred and outbred fruit flies and their gene expression profiles.
This tool serves as a companion resource to the original paper and dataset.
High-dimensional data can be difficult to develop an intuition for, even more so when trying to understand the
interaction of multiple high-dimensional datasets. The Data Browser is designed to help answer
three types of questions:
- How are behavioral measures related to one another?
- How is individual gene expression related to behavioral measures?
- Which molecular pathways are predictive of behavioral measures?
- Select a dataset - The decathlon behavioral data is split by genetic background (inbred or outbred) and behavioral measure
dimensionality (full or distilled).
- Explore behavior correlations - Click on pairwise correlations in the correlation matrix to see scatter plots of standardized behavioral
metrics, or hover to see details of the correlation, such as the names of the corresponding behaviors, sample size, and correlation coefficient.
- Create a behavioral selection - In addition to visualizing behavior correlations, the correlation matrix serves as an organizing feature for
selecting subsets of behaviors to explore further. Use the correlation matrix or behavior group selection menus to
subset behavioral metrics. See Selecting Behaviors for details.
- Explore behavioral dimensionality - View principal components (PCs) of Behavior Groups from the correlation
matrix and the loadings of each behavior within the group from the Behavioral Loadings tab. From here you can also select
behaviors by features in principal components (e.g. select all behaviors that are positively weighted for the
second PC of the Activity Behavior Group).
- View behavior details - Browse behavior and assay descriptions, details, and calculations from the Behavior Summary tab. View histograms of the data
split by experimental batch.
- Find genes correlated to behaviors - Browse lists of genes correlated to each behavior and construct gene queries by selecting
genes significantly correlated to one or more behaviors.
- Search and view gene details - Search genes in the Search by Gene ID tab by inputting a list manually or building a list via
the Select Genes by Behavior or KEGG Pathways tabs. Browse summary data for each gene, such as the total number
and rank-ordered p-values of significantly correlated behaviors. Optionally generate new behavior selections of correlated
behaviors or view KEGG pathways linked to each gene that were found to be significantly enriched in behavioral measures.
- Explore functional biological pathways associated with behaviors - Browse KEGG pathways significantly enriched in lists
of genes correlated to decathlon behaviors. Browse behaviors linked to KEGG Pathways and build search queries from genes significantly
linked to both behavioral metrics and enriched KEGG pathways.
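The hover details mentioned above (behavior names, sample size, correlation coefficient) can be illustrated with a small sketch. This is not the browser's own code. It assumes a Pearson correlation computed only over flies for which both measures are present, and the function names and toy values are hypothetical.

```python
import math

def zscore(values):
    """Standardize present values to zero mean and unit variance; None stays None."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    sd = math.sqrt(sum((v - mean) ** 2 for v in present) / (len(present) - 1))
    return [None if v is None else (v - mean) / sd for v in values]

def pairwise_correlation(x, y):
    """Return (r, n): Pearson r and sample size, using flies with both measures."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy), n

# Hypothetical measurements for five flies; None marks a missing value.
speed = [1.0, 2.0, 3.0, 4.0, None]
bouts = [2.0, 4.1, 5.9, 8.0, 3.0]

r, n = pairwise_correlation(zscore(speed), zscore(bouts))
```

Standardizing each measure changes the axes of the scatter plot but not the correlation coefficient, so `r` is the same whether it is computed on raw or standardized values.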
Several options for selecting behavioral measures have been included to help subset the data by relevant characteristics.
Create a behavior selection in any of the following ways:
- Select behaviors based on correlation values directly from the matrix.
  - Left-click + drag to add a range of rows and columns to the selection.
  - Hold Ctrl + left-click + drag to remove a range of rows and columns from the selection.
  - Keep in mind that each value in the matrix is representative of pairwise correlation of the row and column
    behavioral measures, meaning that selecting measures this way will add/remove measures in both the row and column range.
- Select behaviors by categorically similar "Behavior Groups" or by the "Assays" they were collected from.
  - Click a behavior group or assay to display the associated behaviors in the behavior window.
  - Left-click to add/remove individual measures or Shift-click to
    add/remove a range of behaviors from the behavior window.
  - Use the buttons above the behavior selection window to quickly select or deselect all behaviors within a group.
- Select behaviors based on their loading contributions to Behavior Group principal components.
  - Use the dropdown menu from the "Behavior Loadings" tab to switch display of a priori behavioral groups.
  - Left-click on the loading bar plots to add/remove individual measures or Shift-click to
    add/remove a range of behaviors.
  - Use the buttons below the dropdown menu to quickly select or deselect all behaviors within the
    active a priori behavioral group.
All behaviors are considered selected by default if no specific behaviors are selected.
This default state can be restored at any time by clicking the "Clear All Selections" button above the correlation matrix.
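The matrix selection rules above can be sketched as simple set operations. This is an illustrative model only, not the browser's implementation. It assumes that rows and columns list the same behaviors in the same order; the function names and behavior names are hypothetical.

```python
def _range_members(behaviors, row_range, col_range):
    """Behaviors covered by an inclusive row index range and column index range."""
    (r0, r1), (c0, c1) = row_range, col_range
    return set(behaviors[r0:r1 + 1]) | set(behaviors[c0:c1 + 1])

def add_range(selected, behaviors, row_range, col_range):
    """Left-click + drag: add behaviors from both the row and column range."""
    return selected | _range_members(behaviors, row_range, col_range)

def remove_range(selected, behaviors, row_range, col_range):
    """Ctrl + left-click + drag: remove behaviors from both ranges."""
    return selected - _range_members(behaviors, row_range, col_range)

def effective_selection(selected, behaviors):
    """An empty selection is treated as 'all behaviors selected' (the default state)."""
    if not selected:
        return list(behaviors)
    return [b for b in behaviors if b in selected]

# Hypothetical behavior names, in matrix order.
behaviors = ["speed", "bout_count", "bout_length", "turn_bias", "phototaxis"]

# Dragging over rows 0-1 and column 3 selects three behaviors.
selection = add_range(set(), behaviors, (0, 1), (3, 3))
```

Because each cell belongs to one row behavior and one column behavior, a single drag can select behaviors that are far apart in the matrix ordering.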
A priori behavioral groups are composed of sets of categorically similar behaviors that are likely correlated because they measure
similar underlying phenomena. For example, behaviors such as average movement speed, total number of movement bouts,
and average length of movement bouts are all measures of an animal's activity level and therefore belong to the
"Activity" a priori group. We use principal component analysis to reduce each group to a smaller number of
significant principal components (PCs), each of which is a linear combination of the behaviors in that a priori group. More information on
this process can be found in the original paper.
The Metric Loadings tab can be used to visualize the linear weights (i.e. loadings) of each behavioral measure on each
PC within the group. Behavior selections can be partitioned by interesting features within their PC loading plots
by clicking on single behaviors or ranges of behaviors within the plot.
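As a rough illustration of what loadings mean and how a loading-based selection works, the sketch below computes a PC score as a weighted sum of standardized behaviors and picks behaviors by the sign of their loading. The loading values, behavior names, and function names are hypothetical and are not taken from the decathlon dataset.

```python
# Hypothetical loadings for one a priori group: behavior -> [weight on PC1, weight on PC2]
loadings = {
    "mean_speed": [0.61, 0.12],
    "bout_count": [0.55, -0.48],
    "bout_length": [0.57, 0.35],
}

def pc_score(standardized, loadings, pc):
    """Score of one fly on a PC: linear combination of its standardized behaviors."""
    return sum(loadings[name][pc - 1] * z for name, z in standardized.items())

def select_by_loading(loadings, pc, sign=1, min_abs=0.0):
    """Behaviors whose loading on the given PC has the requested sign
    and a magnitude above min_abs."""
    return [name for name, w in loadings.items() if sign * w[pc - 1] > min_abs]

# Behaviors positively weighted on the second PC of this hypothetical group.
positive_on_pc2 = select_by_loading(loadings, pc=2)

# PC1 score for one hypothetical fly (values are z-scores).
fly = {"mean_speed": 1.0, "bout_count": 0.0, "bout_length": -1.0}
score = pc_score(fly, loadings, pc=1)
```

Passing `sign=-1` selects the negatively weighted behaviors instead, which mirrors splitting a loading bar plot into its positive and negative halves.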
Details of behaviors such as which assay the behavior was collected from, a description of the behavior, and the calculation
of the behavioral score can be found under the Behavior Summary tab. The plots shown here are histograms of the
raw individual-fly behavioral data, split by experimental batch. A summary of the most recently selected behavioral measure is shown by default.
The Search Genes by Behavior tab can be used to explore lists of genes correlated to each behavior or to generate
selections of gene IDs to perform a search on the "Search by Gene ID" tab. This option can be used to obtain a
list of genes from a selection of behaviors of interest. Genes can be selected manually (Left-click for single,
Shift-click for range) or automatically by selecting genes below a p-value threshold. The
p-value threshold used to determine significance can be
adjusted manually via the "p-value cutoff" textbox (p=0.05 by default). All genes within the selection can
be added to the Search by Gene ID input window by clicking the "Append Selected Genes" button. Selected genes can
be cleared by clicking the "Clear" button.
The scope of the threshold, append, and clear operations can be adjusted to apply to:
- this behavior - operations apply only to the behavior currently selected from the dropdown menu
- selected behaviors - operations apply only to the behaviors currently selected in the correlation matrix
- all behaviors - operations apply to all behavioral measures in the currently selected batch (change batches under the Search by Gene ID tab)
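The threshold and scope options above can be sketched as follows. This is a simplified model, not the browser's code. It assumes that a gene is selected when its p-value is strictly below the cutoff for at least one behavior in scope; the data layout, gene names, and function names are hypothetical.

```python
def resolve_scope(scope, current, selected, all_behaviors):
    """Map a scope option to the list of behaviors an operation applies to."""
    if scope == "this behavior":
        return [current]
    if scope == "selected behaviors":
        return list(selected)
    if scope == "all behaviors":
        return list(all_behaviors)
    raise ValueError(f"unknown scope: {scope!r}")

def genes_below_cutoff(pvalues, behaviors, cutoff=0.05):
    """Sorted genes with p < cutoff for at least one of the given behaviors.

    pvalues maps behavior -> {gene: p-value}.
    """
    hits = set()
    for behavior in behaviors:
        for gene, p in pvalues.get(behavior, {}).items():
            if p < cutoff:
                hits.add(gene)
    return sorted(hits)

# Hypothetical gene-behavior p-values.
pvalues = {
    "mean_speed": {"geneA": 0.01, "geneB": 0.20},
    "turn_bias": {"geneB": 0.03, "geneC": 0.05},
}

scope = resolve_scope("all behaviors", "mean_speed", ["turn_bias"], pvalues.keys())
selected_genes = genes_below_cutoff(pvalues, scope)
```

Widening the scope from one behavior to all behaviors can only add genes to the result, since a gene needs to pass the cutoff for just one behavior in scope.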
The Search by Gene ID tab can be used to explore the relationship between behavioral measures and fly gene expression profiles.
To perform a search, construct a list of gene identifiers (gene symbol, FlyBase gene number, or KEGG gene ID) separated by
a newline or space and click "Submit". Gene IDs will be automatically matched and displayed in the
"Valid Matching Identifiers" window. Gene IDs with no match will be filtered and removed from the list.
Gene lists can be constructed in a few different ways:
- A pre-existing list of candidate genes can be entered manually in the Gene-ID search box or imported by file. This option can be
used to obtain a list or selection of behaviors significantly correlated with genes of interest.
- Genes significantly correlated with behaviors in the current behavioral selection can be appended
to the search list from the Search Genes by Behavior tab.
- Gene lists associated with KEGG pathways found to be significantly
enriched in the decathlon gene-behavior models can be constructed from the Enriched KEGG Pathways tab. This option
can be used to identify genes or behaviors linked to molecular pathways.
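The identifier matching step of a search (splitting the input on newlines or spaces, matching each token against known identifiers, and dropping tokens with no match) might look roughly like the sketch below. The record layout and all identifiers are placeholders invented for illustration, and the exact, case-sensitive matching and de-duplication are assumptions rather than documented behavior.

```python
# Placeholder gene records; these identifiers are made up for illustration.
RECORDS = [
    {"symbol": "geneA", "flybase": "FBgn9999901", "kegg": "Dmel_CG99901"},
    {"symbol": "geneB", "flybase": "FBgn9999902", "kegg": "Dmel_CG99902"},
]

def parse_query(text):
    """Split a query on any whitespace (newlines or spaces)."""
    return text.split()

def match_identifiers(tokens, records):
    """Return matched records in input order; unmatched tokens are dropped.

    A token may be a symbol, FlyBase number, or KEGG ID. Each gene is
    returned at most once even if it is named several ways.
    """
    index = {}
    for rec in records:
        for key in ("symbol", "flybase", "kegg"):
            index[rec[key]] = rec
    matched, seen = [], set()
    for token in tokens:
        rec = index.get(token)
        if rec is not None and rec["flybase"] not in seen:
            seen.add(rec["flybase"])
            matched.append(rec)
    return matched

query = "geneA\nFBgn9999902 not_a_gene Dmel_CG99901"
valid = match_identifiers(parse_query(query), RECORDS)
```

Here `not_a_gene` is filtered out, and `Dmel_CG99901` is ignored because it names a gene that `geneA` already matched.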
Information about gene sets associated with KEGG pathways that were found to be significantly enriched in one or more
behaviors can be found under the Gene pathways tab. Summary statistics for each pathway across all behaviors,
such as the bootstrap minimum p-value and the average number of behaviors for which the pathway was significantly enriched,
were computed by bootstrapping the behavioral data upstream of the gene-behavior modeling step. These bootstrapped models were used
to construct a list of significantly predictive genes for each behavior and, from there, a list of significantly enriched KEGG pathways
for each behavior. These summary values therefore represent the average, across all bootstrap replicates,
of the minimum p-value across behaviors and of the total number of behavior hits for a given pathway.
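The two summary values can be sketched as below. This only illustrates the averaging step, not the bootstrapping or enrichment analysis itself. It assumes each replicate provides an enrichment p-value for the pathway in every behavior and uses a significance cutoff of 0.05; the data layout, names, and numbers are hypothetical.

```python
from statistics import mean

def pathway_summary(replicates, pathway, alpha=0.05):
    """Average, over bootstrap replicates, of (a) the minimum enrichment p-value
    across behaviors and (b) the number of behaviors with p < alpha.

    replicates is a list of {behavior: {pathway: p-value}} dictionaries.
    """
    min_ps, hit_counts = [], []
    for rep in replicates:
        ps = [by_pathway[pathway] for by_pathway in rep.values()]
        min_ps.append(min(ps))
        hit_counts.append(sum(p < alpha for p in ps))
    return mean(min_ps), mean(hit_counts)

# Two hypothetical bootstrap replicates for one pathway and two behaviors.
replicates = [
    {"mean_speed": {"pathway_1": 0.01}, "turn_bias": {"pathway_1": 0.20}},
    {"mean_speed": {"pathway_1": 0.03}, "turn_bias": {"pathway_1": 0.04}},
]

avg_min_p, avg_hits = pathway_summary(replicates, "pathway_1")
```

In this toy case the minimum p-values are 0.01 and 0.03 and the hit counts are 1 and 2, so the summary is the mean of each pair.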