Hierarchical Clustering

By casual inspection, we could surmise that:

  • Gene C’s behavior is opposite that of Genes A, B, and D.
  • Gene B and Gene D have the most similar behaviors.

The similarity scores for Genes A, B, C, and D

         Gene A   Gene B   Gene C   Gene D
Gene A    1        0.450   −0.633    0.597
Gene B    0.450    1       −0.107    0.729
Gene C   −0.633   −0.107    1       −0.454
Gene D    0.597    0.729   −0.454    1

To analyze thousands of genes, scientists use a technique called hierarchical clustering.

Hierarchical clustering works by finding the two most similar genes and joining them into a cluster. Genes B and D are the most similar, with a score of 0.729, so they are joined to form the cluster [BD].
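
The joining step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a full clustering implementation: the dictionary layout and the names `similarity` and `most_similar_pair` are assumptions made for this example, and the B-C and C-D scores are taken as negative, consistent with Gene C behaving opposite to the other genes.

```python
# Pairwise similarity scores for Genes A, B, C, and D.
# Each pair is stored once because the scores are symmetric.
# Assumption: B-C and C-D are negative, consistent with Gene C
# behaving opposite to Genes A, B, and D.
similarity = {
    ("A", "B"): 0.450,
    ("A", "C"): -0.633,
    ("A", "D"): 0.597,
    ("B", "C"): -0.107,
    ("B", "D"): 0.729,
    ("C", "D"): -0.454,
}


def most_similar_pair(scores):
    """Return the pair of genes with the highest similarity score."""
    return max(scores, key=scores.get)


pair = most_similar_pair(similarity)
cluster_label = "[" + "".join(pair) + "]"
print(pair, similarity[pair], cluster_label)
```

Run on the scores in the table, the search selects Genes B and D, which matches the first cluster [BD].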