Exercise: Day 2 - Introduction to GitHub, Correlation, and Basic Clustering
Get the data
We will be using the same expression data as for the Day 1 exercises. Instructions for loading that dataset are copied below:
Download the following file from https://tcga-data.nci.nih.gov/docs/publications/gbm_exp/:
unifiedScaledFiltered.txt
- Filtered, unified gene expression estimates for 202 samples and 1740 genes
Load the data into R
Either download the file first and provide its local path:
# replace the path below with the local path to unifiedScaledFiltered.txt
my_data <- read.table("../data/unifiedScaledFiltered.txt",sep="\t",header=TRUE)
...or read the file directly from the remote location:
#install.packages("curl") #run if necessary (curl is a CRAN package)
library(curl)
my_data <- read.table(curl("https://tcga-data.nci.nih.gov/docs/publications/gbm_exp/unifiedScaledFiltered.txt"),sep="\t",header=TRUE)
Correlation
A. Calculate the pairwise correlations between genes. What are the dimensions of the output matrix? What would it be if we correlated samples instead of genes?
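A minimal sketch of the shape question follows. It uses a small simulated matrix `expr` (hypothetical, 50 genes × 30 samples) as a stand-in for `my_data`, and it assumes genes are stored as rows. `cor()` correlates the columns of its input, so the matrix is transposed before correlating genes. If your copy of `my_data` has genes as columns, drop the `t()`. The sketch also assumes all columns are numeric.

```r
# Hypothetical stand-in for my_data: rows are genes, columns are samples.
set.seed(1)
expr <- matrix(rnorm(50 * 30), nrow = 50,
               dimnames = list(paste0("gene", 1:50), paste0("sample", 1:30)))

# cor() correlates COLUMNS, so transpose to correlate genes with each other.
gene_cor <- cor(t(expr))
dim(gene_cor)     # 50 50 (genes x genes)

# Correlating samples instead uses the columns directly.
sample_cor <- cor(expr)
dim(sample_cor)   # 30 30 (samples x samples)
```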
B. What is the range of values in this matrix? What are the highest and lowest values you would expect and why?
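One way to inspect the range is sketched below. It again uses a hypothetical simulated `expr` matrix with genes as rows. Pearson coefficients lie between -1 and 1. The diagonal holds each gene's correlation with itself, which is 1, so the sketch also reports the range of the off-diagonal values separately.

```r
# Hypothetical stand-in for my_data: rows are genes, columns are samples.
set.seed(1)
expr <- matrix(rnorm(50 * 30), nrow = 50,
               dimnames = list(paste0("gene", 1:50), paste0("sample", 1:30)))
gene_cor <- cor(t(expr))

# Overall range, including the diagonal.
range(gene_cor)

# The diagonal (each gene vs. itself) equals 1 up to floating-point rounding.
all.equal(unname(diag(gene_cor)), rep(1, nrow(gene_cor)))

# Range of the off-diagonal values only (upper triangle, diagonal excluded).
off_diag <- gene_cor[upper.tri(gene_cor)]
range(off_diag)
```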
C. Find the pair of genes that are most correlated. Do you need to operate over the entire matrix to do this?
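A sketch for locating the top pair is given below, assuming genes are rows of the hypothetical simulated `expr`. Because the correlation matrix is symmetric and its diagonal is trivially 1, it searches only the upper triangle. "Most correlated" is read here as the largest positive coefficient, which is an assumption. Using `abs()` would instead find the strongest correlation of either sign.

```r
# Hypothetical stand-in for my_data: rows are genes, columns are samples.
set.seed(1)
expr <- matrix(rnorm(50 * 30), nrow = 50,
               dimnames = list(paste0("gene", 1:50), paste0("sample", 1:30)))
gene_cor <- cor(t(expr))

# Symmetric matrix with 1s on the diagonal: mask everything except
# the strict upper triangle.
upper <- gene_cor
upper[!upper.tri(upper)] <- NA

# Row/column position of the largest remaining coefficient.
# (To find the strongest correlation of either sign, use abs(upper) in both places.)
best <- which(upper == max(upper, na.rm = TRUE), arr.ind = TRUE)[1, ]
pair <- c(rownames(gene_cor)[best["row"]], colnames(gene_cor)[best["col"]])
pair
gene_cor[pair[1], pair[2]]
```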
D. By default, the R function cor() uses the Pearson method for calculating correlation coefficients. If we change to using a rank-based method like Spearman’s rho, do we find a different pair of genes that are the most highly correlated?
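The comparison can be sketched by changing only the `method` argument of `cor()`. The example uses the hypothetical simulated `expr` matrix with genes as rows. `top_pair()` is a hypothetical helper written for this sketch, not a base R function. Like the search above, it reports the largest positive off-diagonal coefficient.

```r
# Hypothetical stand-in for my_data: rows are genes, columns are samples.
set.seed(1)
expr <- matrix(rnorm(50 * 30), nrow = 50,
               dimnames = list(paste0("gene", 1:50), paste0("sample", 1:30)))

# Hypothetical helper: names of the most positively correlated gene pair.
top_pair <- function(cor_mat) {
  cor_mat[!upper.tri(cor_mat)] <- NA   # search the strict upper triangle only
  idx <- which(cor_mat == max(cor_mat, na.rm = TRUE), arr.ind = TRUE)[1, ]
  c(rownames(cor_mat)[idx["row"]], colnames(cor_mat)[idx["col"]])
}

pearson_pair  <- top_pair(cor(t(expr), method = "pearson"))
spearman_pair <- top_pair(cor(t(expr), method = "spearman"))
pearson_pair
spearman_pair
identical(pearson_pair, spearman_pair)   # TRUE if both methods agree
```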
E. Plot a correlation matrix of the top 20 most correlated genes (hint: consider using the corrplot library).
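The exercise leaves "top 20 most correlated genes" open. The sketch below takes one possible reading, which is an assumption: rank each gene by its strongest absolute correlation with any other gene, then keep the top 20. It uses the hypothetical simulated `expr` matrix with genes as rows. The plot is drawn only if the `corrplot` package is installed (`install.packages("corrplot")`).

```r
# Hypothetical stand-in for my_data: rows are genes, columns are samples.
set.seed(1)
expr <- matrix(rnorm(50 * 30), nrow = 50,
               dimnames = list(paste0("gene", 1:50), paste0("sample", 1:30)))
gene_cor <- cor(t(expr))

# Assumed definition: score each gene by its strongest |r| with any OTHER gene.
off <- abs(gene_cor)
diag(off) <- NA
strongest <- apply(off, 1, max, na.rm = TRUE)
top20 <- names(sort(strongest, decreasing = TRUE))[1:20]

# Submatrix of correlations among the selected genes.
top_cor <- gene_cor[top20, top20]

# Draw the matrix with corrplot, if available.
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(top_cor, method = "color")
}
```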
Clustering
A. Cluster the top 20 most correlated genes using complete-linkage hierarchical clustering and visualize the resulting tree.
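A sketch with base R's `hclust()` is shown below. Two assumptions apply. First, the first 20 genes of a hypothetical simulated `expr` matrix stand in for the top-20 selection, so substitute the 20-gene correlation matrix you chose in part E. Second, correlation is converted to a distance as `1 - r`, so strongly positively correlated genes end up close together. `1 - abs(r)` is a common alternative if anti-correlated genes should also group.

```r
# Hypothetical stand-in for my_data: rows are genes, columns are samples.
set.seed(1)
expr <- matrix(rnorm(50 * 30), nrow = 50,
               dimnames = list(paste0("gene", 1:50), paste0("sample", 1:30)))

# Placeholder for the top-20 correlation matrix: simply the first 20 genes.
top_cor <- cor(t(expr))[1:20, 1:20]

# Distance = 1 - correlation (0 for perfectly correlated genes, up to 2).
gene_dist <- as.dist(1 - top_cor)

# Complete linkage: the distance between two clusters is the LARGEST
# pairwise distance between their members.
hc <- hclust(gene_dist, method = "complete")
plot(hc, main = "Complete linkage", xlab = "", sub = "")
```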
B. Try different clustering (linkage) methods and observe how they change the way correlated genes are grouped together.
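The comparison below runs several of the linkage methods that `hclust()` accepts on the same `1 - r` distances. It uses the same hypothetical placeholder (the first 20 simulated genes) in place of your top-20 matrix. Beyond comparing the trees by eye, it reports each tree's cophenetic correlation, which measures how faithfully the tree's merge heights preserve the original distances. That metric is an addition of this sketch, not part of the exercise.

```r
# Hypothetical stand-in for my_data: rows are genes, columns are samples.
set.seed(1)
expr <- matrix(rnorm(50 * 30), nrow = 50,
               dimnames = list(paste0("gene", 1:50), paste0("sample", 1:30)))
top_cor <- cor(t(expr))[1:20, 1:20]   # placeholder for the top-20 matrix
gene_dist <- as.dist(1 - top_cor)

# Build one tree per linkage method.
methods <- c("complete", "average", "single", "ward.D2")
trees <- lapply(methods, function(m) hclust(gene_dist, method = m))
names(trees) <- methods

# Draw the trees in a 2 x 2 grid to compare how genes are grouped.
op <- par(mfrow = c(2, 2))
for (m in methods) plot(trees[[m]], main = m, xlab = "", sub = "")
par(op)

# Cophenetic correlation: agreement between tree heights and original distances.
coph_cor <- sapply(trees, function(tr) cor(cophenetic(tr), gene_dist))
coph_cor
```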
C. If we cluster these genes with complete-linkage hierarchical clustering and cut the tree into 4 groups, how many genes are in the largest group? In the smallest?
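`cutree()` cuts an `hclust` tree into a requested number of groups and returns each gene's group label. The sketch below counts the group sizes with `table()`. As before, the first 20 genes of a hypothetical simulated `expr` matrix stand in for your top-20 selection, and `1 - r` is the assumed distance.

```r
# Hypothetical stand-in for my_data: rows are genes, columns are samples.
set.seed(1)
expr <- matrix(rnorm(50 * 30), nrow = 50,
               dimnames = list(paste0("gene", 1:50), paste0("sample", 1:30)))
top_cor <- cor(t(expr))[1:20, 1:20]   # placeholder for the top-20 matrix
hc <- hclust(as.dist(1 - top_cor), method = "complete")

# Named vector: gene -> group label (1 to 4).
groups <- cutree(hc, k = 4)

# Number of genes in each group.
group_sizes <- table(groups)
group_sizes
max(group_sizes)   # size of the largest group
min(group_sizes)   # size of the smallest group
```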