Calculating Similarities: Introduction
Now we need to make sense of these data. Recall that our original question was how tumor formation affects gene expression patterns.
A common analysis method starts by calculating similarities between the expression patterns of individual genes.
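There are many ways to calculate similarity. One popular measure is the Pearson correlation coefficient, which describes how closely two variables (here, the expression levels of two genes) rise and fall together. It ranges from +1 (the two move up and down in perfect step) through 0 (no linear relationship) to −1 (one rises exactly as the other falls).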
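For reference, this is the standard definition of the Pearson coefficient for two genes measured in the same $n$ samples. Here $x_i$ and $y_i$ are the expression levels of the two genes in sample $i$, and $\bar{x}$ and $\bar{y}$ are their means across all samples:

$$
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

The numerator is large and positive when both genes sit above (or both below) their means in the same samples. The denominator rescales the result so that it always falls between −1 and +1.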
| | Gene A | Gene B | Gene C | Gene D |
|---|---|---|---|---|
| Sample 1 | 0.602 | 0 | −0.481 | 0 | 
| Sample 2 | 0.301 | −0.0969 | 0 | 0.114 | 
| Sample 3 | 0.544 | 0.301 | −0.602 | 0.477 | 
| Sample 4 | 0.176 | −0.301 | −0.602 | 0 | 
| Sample 5 | −0.0969 | 0 | 0.0792 | −0.0969 |
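The sketch below applies the Pearson formula to the table above. It assumes that each column is one gene and each row is one sample, so each gene's values are correlated across the five samples. The helper names `pearson` and `correlation_matrix` are hypothetical, written here for illustration in plain Python with no third-party libraries.

```python
import math
from itertools import combinations

# Expression values from the table: one list per gene,
# entries ordered Sample 1 through Sample 5.
expression = {
    "Gene A": [0.602, 0.301, 0.544, 0.176, -0.0969],
    "Gene B": [0.0, -0.0969, 0.301, -0.301, 0.0],
    "Gene C": [-0.481, 0.0, -0.602, -0.602, 0.0792],
    "Gene D": [0.0, 0.114, 0.477, 0.0, -0.0969],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Deviation of each value from its own mean across samples
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    # Numerator: sum of co-deviations; denominator: product of spreads
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if den == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return num / den


def correlation_matrix(data):
    """Symmetric dict-of-dicts of pairwise Pearson correlations."""
    genes = list(data)
    # A (non-constant) gene always correlates perfectly with itself
    matrix = {g: {g: 1.0} for g in genes}
    for g1, g2 in combinations(genes, 2):
        r = pearson(data[g1], data[g2])
        matrix[g1][g2] = r
        matrix[g2][g1] = r
    return matrix


if __name__ == "__main__":
    corr = correlation_matrix(expression)
    for g1, g2 in combinations(expression, 2):
        print(f"{g1} vs {g2}: r = {corr[g1][g2]:+.3f}")
```

The matrix is symmetric because $r_{xy} = r_{yx}$. Each pair therefore only needs to be computed once. With many genes, a vectorized library routine would normally replace the explicit loops.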