Calculating Similarities: Introduction

Now we need to make sense of these data. Recall that our original question was how tumor formation affects gene expression patterns.

A common analysis method starts by calculating similarities between the expression patterns of individual genes.

There are many ways of calculating similarities. One popular method is the Pearson correlation coefficient, a measure of how closely two variables (in this case, the expression levels of two genes) rise and fall together across samples.
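For reference, the standard textbook definition of the Pearson correlation coefficient can be written as follows. The notation is an assumption introduced here and does not come from the original text: x_i and y_i stand for the expression values of two genes in sample i, n is the number of samples, and the barred symbols are each gene's mean across samples.

```latex
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The value of r ranges from −1 to +1. A value near +1 means the two genes tend to be induced and repressed in the same samples, a value near −1 means they tend to move in opposite directions, and a value near 0 means there is little linear relationship between them.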

Log10 expression ratios of genes A, B, C, and D in five samples. Green = repressed; black = 1:1 (unchanged); red = induced.
Sample       Gene A     Gene B     Gene C     Gene D
Sample 1      0.602      0         −0.481      0
Sample 2      0.301     −0.0969     0          0.114
Sample 3      0.544      0.301     −0.602      0.477
Sample 4      0.176     −0.301     −0.602      0
Sample 5     −0.0969     0          0.0792    −0.0969
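
As a minimal sketch of how such similarities might be computed, the following Python snippet calculates the Pearson correlation coefficient for every pair of genes in the table above. The names `expression`, `pearson`, and `similarity` are hypothetical and chosen only for illustration; the values are copied from the table, and nothing here is meant to represent the specific software used for the original analysis.

```python
import math

# Log10 expression ratios copied from the table above.
# Each list holds one gene's values for Samples 1 through 5.
expression = {
    "A": [0.602, 0.301, 0.544, 0.176, -0.0969],
    "B": [0.0, -0.0969, 0.301, -0.301, 0.0],
    "C": [-0.481, 0.0, -0.602, -0.602, 0.0792],
    "D": [0.0, 0.114, 0.477, 0.0, -0.0969],
}


def pearson(x, y):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Deviations of each value from its gene's mean.
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return numerator / denominator


# Pairwise similarity for every ordered pair of genes.
genes = sorted(expression)
similarity = {
    (g, h): pearson(expression[g], expression[h]) for g in genes for h in genes
}

# Print the similarity matrix, one row per gene.
for g in genes:
    print(g, "  ".join(f"{similarity[(g, h)]:+.2f}" for h in genes))
```

Each gene correlates perfectly with itself, so the diagonal of the printed matrix is +1.00. With these values, gene A comes out positively correlated with genes B and D and negatively correlated with gene C.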