Hierarchical clustering

Author

Cox Lab

Published

June 27, 2024

1 General

  • Type: Matrix Analysis
  • Heading: Clustering/PCA
  • Source code: not public.

2 Brief description

This activity performs hierarchical clustering of rows and/or columns and produces a heat map representation of the clustered matrix. Clustering can be performed with a choice of distances and linkages. The activity can also be used simply to display data as a heat map without any clustering, by unchecking both “Row tree” and “Column tree”.

3 Parameters

3.1 Row tree

If checked, rows are clustered and a tree (dendrogram) is generated (default: checked).

3.1.1 Distance

The distance measure used for clustering (default: Euclidean). It can be selected from a predefined list:

  • Euclidean
  • L1
  • Maximum
  • Lp
  • Pearson correlation
  • Spearman correlation
  • Cosine
  • Canberra
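
The distances in this list have common textbook definitions. The Python sketch below gives one such definition for each; it is an illustration under assumptions, not the Perseus source (which is not public). In particular, the correlation- and cosine-based entries are assumed to be turned into distances as 1 minus the similarity, the Lp exponent `p` is an assumed free parameter, and all function names are hypothetical.

```python
import math

# Hypothetical helpers; standard definitions, assumed (not confirmed) to match Perseus.

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def l1(x, y):
    # Manhattan / city-block distance
    return sum(abs(a - b) for a, b in zip(x, y))

def maximum(x, y):
    # Chebyshev distance: largest coordinate-wise difference
    return max(abs(a - b) for a, b in zip(x, y))

def lp(x, y, p=3.0):
    # Minkowski distance; p = 1 gives L1, p = 2 gives Euclidean
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def pearson_distance(x, y):
    # 1 - Pearson correlation: 0 for perfectly correlated profiles
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)

def _ranks(v):
    # Ranks starting at 1; tied values share their average rank
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def spearman_distance(x, y):
    # 1 - Spearman correlation, i.e. the Pearson distance of the ranks
    return pearson_distance(_ranks(x), _ranks(y))

def cosine_distance(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (nx * ny)

def canberra(x, y):
    # Terms where both values are 0 are skipped to avoid 0/0
    return sum(abs(a - b) / (abs(a) + abs(b))
               for a, b in zip(x, y) if abs(a) + abs(b) > 0)
```

For example, `euclidean([0, 0], [3, 4])` is 5.0 and `l1` on the same pair is 7; the correlation distances are 0 for profiles that rise together, even when their absolute values differ.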

3.1.2 Linkage

The linkage method, which determines how the distance between two clusters is derived from the distances between their members (default: Average). It can be selected from a predefined list:

  • Average
  • Complete
  • Single
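
Single, complete and average linkage differ only in how the distance between two clusters is computed from the pairwise distances of their members: the minimum, the maximum or the mean. The naive agglomerative loop below is a sketch of that idea (hypothetical function names, Euclidean distance assumed, quadratic memory and roughly cubic time), not the Perseus implementation.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cluster_distance(c1, c2, d, linkage):
    # d is a precomputed matrix of pairwise item distances
    pair = [d[i][j] for i in c1 for j in c2]
    if linkage == "single":
        return min(pair)
    if linkage == "complete":
        return max(pair)
    if linkage == "average":
        return sum(pair) / len(pair)
    raise ValueError(f"unknown linkage: {linkage}")

def agglomerate(items, linkage="average", dist=euclidean):
    n = len(items)
    d = [[dist(items[i], items[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    merges = []  # (left members, right members, merge height)
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                h = cluster_distance(clusters[a], clusters[b], d, linkage)
                if best is None or h < best[0]:
                    best = (h, a, b)
        h, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), h))
        merged = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a stays valid
        clusters[a] = merged
    return merges
```

On the one-dimensional points [0], [1], [5], [6], [20], the last merge joins [20] to the rest at height 14 with single linkage, 20 with complete linkage and 17 with average linkage, which shows how strongly the linkage choice shapes the upper part of the tree.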

3.1.3 Constraint

A constraint from the input data that the clustering should preserve (default: None). It can be selected from a predefined list:

  • None
  • Preserve order
  • Preserve order (periodic)
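
This documentation does not spell out how the order constraints work. One plausible reading, assumed here and not confirmed by the source, is that only clusters that are neighbours in the input order may be merged, so the dendrogram leaves keep the input order; in the periodic variant the first and last clusters also count as neighbours. A sketch of that interpretation, with average linkage, Euclidean distance and hypothetical function names:

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def average_linkage(c1, c2, rows):
    # Mean of all pairwise distances between members of the two clusters
    return sum(euclidean(rows[i], rows[j]) for i in c1 for j in c2) / (len(c1) * len(c2))

def constrained_agglomerate(rows, periodic=False):
    # Each cluster is a run of input indices; the cluster list stays in input order.
    clusters = [[i] for i in range(len(rows))]
    merges = []
    while len(clusters) > 1:
        # Only neighbouring clusters may merge.
        pairs = [(k, k + 1) for k in range(len(clusters) - 1)]
        if periodic and len(clusters) > 2:
            # Periodic case: the last and first clusters are also neighbours.
            pairs.append((len(clusters) - 1, 0))
        a, b = min(pairs, key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], rows))
        merges.append((clusters[a], clusters[b], average_linkage(clusters[a], clusters[b], rows)))
        merged = clusters[a] + clusters[b]
        if b == a + 1:
            clusters[a:b + 1] = [merged]
        else:
            # Wrap-around merge: last + first becomes the new first cluster
            clusters = [merged] + clusters[1:-1]
    return merges, clusters[0]  # merge history and final leaf order
```

With rows [[0], [5], [10], [1]], the non-periodic variant returns the leaf order [0, 1, 2, 3], while the periodic variant first merges the last and first rows and returns the rotation [3, 0, 1, 2].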

3.1.4 Preprocess with k-means

Specifies whether the data should be pre-processed with k-means before clustering and generating the heat map (default: checked).

3.1.5 Number of clusters

This parameter is only relevant if “Preprocess with k-means” is checked. It defines the number of clusters created by the k-means algorithm (default: 300).
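
Pre-processing with k-means reduces the number of objects the hierarchical clustering has to handle. A common way to do this, assumed here rather than taken from the Perseus source, is to group the rows into the requested number of k-means clusters and build the tree on the cluster centroids, with each row placed at its centroid's position. The sketch uses plain Lloyd's algorithm with random initialization; the initialization, the iteration limit, the handling of data with fewer rows than clusters and all function names are assumptions.

```python
import random

def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def assign(rows, centroids):
    # Index of the nearest centroid for every row
    return [min(range(len(centroids)), key=lambda c: sq_dist(r, centroids[c])) for r in rows]

def kmeans(rows, k, iters=100, seed=0):
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]  # k distinct rows as start
    for _ in range(iters):
        labels = assign(rows, centroids)
        new = []
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            # An empty cluster keeps its previous centroid
            new.append([sum(v) / len(members) for v in zip(*members)] if members else centroids[c])
        if new == centroids:  # converged
            break
        centroids = new
    return centroids, assign(rows, centroids)

def preprocess_with_kmeans(rows, n_clusters=300):
    # Hypothetical wrapper: nothing to reduce when rows <= requested clusters
    if len(rows) <= n_clusters:
        return [list(r) for r in rows], list(range(len(rows)))
    return kmeans(rows, n_clusters)
```

The returned centroids would then be clustered hierarchically in place of the original rows, and the labels map each original row to its centroid. Because the tree is built on at most `n_clusters` objects, the cost of the hierarchical step no longer grows with the number of rows.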

3.2 Column tree

If checked, columns are clustered and a tree (dendrogram) is generated (default: checked).
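
Column clustering applies the same procedure to the columns, which amounts to comparing column vectors instead of row vectors, i.e. clustering the transposed matrix. The minimal sketch below (hypothetical names, Euclidean distance assumed) only shows how the pairwise distances change when columns rather than rows are compared.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pairwise(vectors, dist=euclidean):
    # Full symmetric distance matrix between the given vectors
    return [[dist(u, v) for v in vectors] for u in vectors]

def row_distances(matrix, dist=euclidean):
    return pairwise(matrix, dist)

def column_distances(matrix, dist=euclidean):
    # zip(*matrix) transposes, so each column becomes one vector
    columns = [list(col) for col in zip(*matrix)]
    return pairwise(columns, dist)
```

For the matrix [[1, 2], [3, 4]] the two columns are sqrt(2) apart, whereas the two rows are sqrt(8) apart.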

3.2.1 Distance

The distance measure used for clustering (default: Euclidean). It can be selected from a predefined list:

  • Euclidean
  • L1
  • Maximum
  • Lp
  • Pearson correlation
  • Spearman correlation

3.2.2 Linkage

The linkage method, which determines how the distance between two clusters is derived from the distances between their members (default: Average). It can be selected from a predefined list:

  • Average
  • Complete
  • Single

3.2.3 Constraint

A constraint from the input data that the clustering should preserve (default: None). It can be selected from a predefined list:

  • None
  • Preserve order
  • Preserve order (periodic)
  • Preserve grouping

3.2.4 Preprocess with k-means

Specifies whether the data should be pre-processed with k-means before clustering and generating the heat map (default: checked).

3.2.5 Number of clusters

This parameter is only relevant if “Preprocess with k-means” is checked. It defines the number of clusters created by the k-means algorithm (default: 300).

3.3 Which columns to use

List of all expression/numerical columns in the data set (default: all numerical columns, of which the expression columns are selected for clustering; see parameter “Use for clustering”).

3.4 Use for clustering

The expression/numerical columns used for clustering (default: all expression columns).

3.5 Display in heat map but do not use for clustering

The expression/numerical columns that are displayed in the output heat map but not used for clustering (default: empty).
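
Separating “Use for clustering” from the display-only columns means that row distances are computed from the clustering columns alone, while the heat map shows both sets. A minimal sketch of that separation, assuming the data is a dictionary of named columns and using hypothetical helper names:

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def row_distances(table, use_for_clustering):
    # table: column name -> list of values, one per row.
    # Only the columns named in use_for_clustering enter the distances.
    n = len(next(iter(table.values())))
    rows = [[table[c][i] for c in use_for_clustering] for i in range(n)]
    return [[euclidean(rows[i], rows[j]) for j in range(n)] for i in range(n)]
```

A column such as `Extra` that differs strongly between two rows therefore has no effect on their distance, and hence on the tree, unless it is moved into the clustering set.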

4 Parameter window