Hierarchical clustering
1 General
- Type: Matrix Analysis
- Heading: Clustering/PCA
- Source code: not public.
2 Brief description
This activity performs hierarchical clustering of rows and/or columns and produces a heat map of the clustered matrix. Clustering can be performed with a choice of distances and linkages. The activity can also be used simply to display the data as a heat map without any clustering, by unchecking both “Row tree” and “Column tree”.
3 Parameters
3.1 Row tree
If checked, rows are clustered and a tree (dendrogram) is generated (default: checked).
3.1.1 Distance
The distance measure used for clustering (default: Euclidean). It can be selected from a predefined list:
- Euclidean
- L1
- Maximum
- Lp
- Pearson correlation
- Spearman correlation
- Cosine
- Canberra
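The row distances listed above can be sketched in plain Python as follows. This is a minimal illustration, not the activity's own implementation: the correlation-based and cosine measures are turned into distances as 1 minus the similarity, which is a common convention but an assumption here, and the Lp exponent `p` is passed explicitly because its default value is not stated in this document. All function names are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def l1(a, b):
    # Manhattan / city-block distance
    return sum(abs(x - y) for x, y in zip(a, b))


def maximum(a, b):
    # Chebyshev distance: largest coordinate-wise difference
    return max(abs(x - y) for x, y in zip(a, b))


def lp(a, b, p):
    # Minkowski distance; p = 1 gives L1, p = 2 gives Euclidean
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def _pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)  # raises ZeroDivisionError for constant vectors


def pearson_distance(a, b):
    return 1.0 - _pearson(a, b)  # assumed convention: 1 - r


def _ranks(v):
    # 1-based ranks; tied values receive the average of their ranks
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman_distance(a, b):
    # Spearman correlation is Pearson correlation of the ranks
    return 1.0 - _pearson(_ranks(a), _ranks(b))


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)  # assumed convention: 1 - cosine similarity


def canberra(a, b):
    # coordinates where both values are 0 contribute nothing (0/0 is skipped)
    return sum(abs(x - y) / (abs(x) + abs(y))
               for x, y in zip(a, b) if abs(x) + abs(y) > 0)
```

The correlation distances compare the shape of two profiles rather than their magnitude, so `pearson_distance([1, 2, 3], [2, 4, 6])` is 0 even though the Euclidean distance between the same rows is not. Rows with zero variance have no defined correlation, and the sketch raises an error for them.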
3.1.2 Linkage
The linkage method, which determines how the distance between two clusters is computed (default: Average). It can be selected from a predefined list:
- Average
- Complete
- Single
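The three linkage rules differ only in how the distance between two clusters is derived from the pairwise distances of their members: the mean (Average), the largest (Complete) or the smallest (Single). A naive agglomerative sketch, assuming a precomputed symmetric distance matrix `d`; the function names are hypothetical:

```python
def cluster_distance(ci, cj, d, linkage):
    """Distance between clusters ci and cj (lists of item indices)."""
    pair = [d[a][b] for a in ci for b in cj]
    if linkage == "average":
        return sum(pair) / len(pair)
    if linkage == "complete":
        return max(pair)
    if linkage == "single":
        return min(pair)
    raise ValueError(f"unknown linkage: {linkage}")


def agglomerate(d, linkage="average"):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [[i] for i in range(len(d))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dist = cluster_distance(clusters[i], clusters[j], d, linkage)
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        dist, i, j = best
        merges.append((clusters[i], clusters[j], dist))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges
```

For four points at positions 0, 1, 5 and 6 all three linkages first join {0, 1} and {5, 6}, but the height of the final merge differs: 4 for Single, 5 for Average and 6 for Complete. This loop is roughly cubic in the number of items, which suggests one practical reason to reduce many rows to a few hundred representatives before building a tree.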
3.1.3 Constraint
The constraint on the input data that the clustering should preserve (default: None). It can be selected from a predefined list:
- None
- Preserve order
- Preserve order (periodic)
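The constraint names suggest that “Preserve order” only allows clusters that are neighbours in the input order to be merged, so the leaves of the tree keep their original order, and that “Preserve order (periodic)” additionally treats the last and the first item as neighbours (useful for, e.g., time points on a cycle). This reading is an assumption, not documented behaviour. A sketch under that assumption, using average linkage and a hypothetical function name:

```python
def _average(ci, cj, d):
    return sum(d[a][b] for a in ci for b in cj) / (len(ci) * len(cj))


def agglomerate_ordered(d, periodic=False):
    """Average-linkage agglomeration that only merges neighbouring clusters."""
    clusters = [[i] for i in range(len(d))]  # always kept in input order
    merges = []
    while len(clusters) > 1:
        n = len(clusters)
        candidates = [(k, k + 1) for k in range(n - 1)]
        if periodic and n > 2:
            candidates.append((n - 1, 0))  # last and first cluster are neighbours
        dist, i, j = min((_average(clusters[i], clusters[j], d), i, j)
                         for i, j in candidates)
        merges.append((clusters[i], clusters[j], dist))
        if j == i + 1:
            clusters = clusters[:i] + [clusters[i] + clusters[j]] + clusters[j + 1:]
        else:  # wrap-around merge of the last and the first cluster
            clusters = [clusters[i] + clusters[j]] + clusters[1:i]
    return merges
```

For items at positions 0, 10, 1, 11 an unconstrained clustering would first join items 0 and 2 (distance 1), whereas the order-preserving version can only join neighbours and starts with items 1 and 2 (distance 9).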
3.1.4 Preprocess with k-means
Specifies whether the data should be preprocessed with k-means before clustering and generating the heat map (default: checked).
3.1.5 Number of clusters
Only relevant if “Preprocess with k-means” is checked. Defines the number of clusters created by the k-means algorithm (default: 300).
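One way to read the k-means option is that the rows are first summarized by k centroids, and the hierarchical tree is then built over those centroids rather than over every single row, which keeps the tree small for large matrices. The sketch below shows only the k-means step (Lloyd's algorithm with randomly chosen initial centroids). The function name, the fixed `seed`, the iteration cap and capping k at the number of rows are assumptions, not documented behaviour of this activity.

```python
import random


def _sq(a, b):
    # squared Euclidean distance
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, iterations=100, seed=0):
    """Lloyd's k-means; returns (centroids, assignment of each row)."""
    k = min(k, len(rows))  # assumption: never ask for more clusters than rows
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    assignment = None
    for _ in range(iterations):
        # assignment step: each row goes to its nearest centroid
        new = [min(range(k), key=lambda c: _sq(r, centroids[c])) for r in rows]
        if new == assignment:
            break  # converged
        assignment = new
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [r for r, a in zip(rows, assignment) if a == c]
            if members:  # keep the old centroid if a cluster became empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment
```

With the default of 300 clusters, a matrix of several thousand rows would be reduced to at most 300 centroids, and a hierarchical clustering routine would then only need to handle those 300 profiles.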
3.2 Column tree
If checked, columns are clustered and a tree (dendrogram) is generated (default: checked).
3.2.1 Distance
The distance measure used for clustering (default: Euclidean). It can be selected from a predefined list:
- Euclidean
- L1
- Maximum
- Lp
- Pearson correlation
- Spearman correlation
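Column clustering uses the same kind of distances, computed between column vectors instead of row vectors; equivalently, the column tree is a row tree built on the transposed matrix. A minimal sketch with hypothetical helper names, using Euclidean distance as the example:

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def transpose(matrix):
    """Turn a list of rows into a list of columns."""
    return [list(col) for col in zip(*matrix)]


def column_distance_matrix(matrix, dist=euclidean):
    """Pairwise distances between the columns of a row-major matrix."""
    cols = transpose(matrix)
    return [[dist(a, b) for b in cols] for a in cols]
```

For the matrix with rows [0, 3] and [0, 4], the two columns are [0, 0] and [3, 4], so their Euclidean distance is 5.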
3.2.2 Linkage
The linkage method, which determines how the distance between two clusters is computed (default: Average). It can be selected from a predefined list:
- Average
- Complete
- Single
3.2.3 Constraint
The constraint on the input data that the clustering should preserve (default: None). It can be selected from a predefined list:
- None
- Preserve order
- Preserve order (periodic)
- Preserve grouping
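“Preserve grouping” is available only for columns. A plausible reading, which is an assumption and not documented here, is that columns belonging to the same group (for example replicates, assumed to be defined by a categorical annotation of the columns) must form their own subtree: merges within a group are done first, and groups are joined only afterwards. A sketch under that assumption, with average linkage and hypothetical names:

```python
def _average(ci, cj, d):
    return sum(d[a][b] for a in ci for b in cj) / (len(ci) * len(cj))


def _label(cluster, groups):
    """Group label shared by all members, or None if the cluster mixes groups."""
    labels = {groups[m] for m in cluster}
    return labels.pop() if len(labels) == 1 else None


def agglomerate_grouped(d, groups):
    """Average linkage in which each group is merged into one subtree first."""
    clusters = [[i] for i in range(len(d))]
    merges = []
    while len(clusters) > 1:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        within = [(i, j) for i, j in pairs
                  if _label(clusters[i], groups) is not None
                  and _label(clusters[i], groups) == _label(clusters[j], groups)]
        candidates = within or pairs  # cross-group merges only when none remain
        dist, i, j = min((_average(clusters[i], clusters[j], d), i, j)
                         for i, j in candidates)
        merges.append((clusters[i], clusters[j], dist))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges
```

With columns at positions 0, 1, 2, 3 and groups A, B, A, B, the unconstrained tree would start by joining neighbouring columns of different groups, while this version first joins columns 0 and 2 (group A) and columns 1 and 3 (group B).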
3.2.4 Preprocess with k-means
Specifies whether the data should be preprocessed with k-means before clustering and generating the heat map (default: checked).
3.2.5 Number of clusters
Only relevant if “Preprocess with k-means” is checked. Defines the number of clusters created by the k-means algorithm (default: 300).
3.3 Which columns to use
List of all expression/numerical columns in the data set (default: all numerical columns). By default, the expression columns are assigned to “Use for clustering”.
3.4 Use for clustering
Selected expression/numerical columns that are used for clustering (default: all expression columns).
3.5 Display in heat map but do not use for clustering
Selected expression/numerical columns that are displayed in the output heat map but not used for clustering (default: empty).
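Splitting the columns between the last two parameters means that display-only columns have no influence on the tree: row distances are computed from the “Use for clustering” columns alone, and the remaining columns are presumably just drawn alongside them in the heat map. A sketch with hypothetical names (`row_distances` and the example column labels) using Euclidean distance:

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def row_distances(matrix, column_names, use_for_clustering):
    """Pairwise row distances computed only from the selected columns."""
    idx = [column_names.index(name) for name in use_for_clustering]
    sub = [[row[k] for k in idx] for row in matrix]
    return [[euclidean(a, b) for b in sub] for a in sub]
```

Changing the values in a display-only column, such as `extra` in `row_distances([[0, 0, 100], [3, 4, -50]], ["s1", "s2", "extra"], ["s1", "s2"])`, leaves every distance, and therefore the tree, unchanged.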