WPGMA Phylogenetic Clustering
Build phylogenetic trees step by step. Add species with gene sequences, view the k-mer distance matrix, then watch the WPGMA algorithm merge the closest clusters into a dendrogram.
How it works
1. Add species — each species has an ID (a short name) and a gene sequence made of nucleotides (A, C, G, T).
2. Distance table — distances between species are computed using k-mer analysis: the gene is split into overlapping subsequences of length k, and species are compared by how many k-mers they share.
3. WPGMA clustering — the algorithm repeatedly merges the two closest clusters, averaging their distances to all other clusters. Click Next Step to watch it happen, or Run All to jump to the final tree.
4. Dendrogram — the resulting phylogenetic tree shows how species are related. Closer branches = more similar genes.
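The averaging in step 3 can be written out explicitly. In WPGMA, when clusters $i$ and $j$ merge, the distance from the new cluster to any other cluster $k$ is the plain mean of the two old distances, regardless of how many species each cluster contains:

$$
d_{(i \cup j),\,k} = \frac{d_{i,k} + d_{j,k}}{2}
$$

As a worked example using the five-species distance table below: A and D are the closest pair (22.22), so they merge first, and the new cluster's distances become

$$
d_{(AD),B} = \frac{54.55 + 66.67}{2} = 60.61, \qquad
d_{(AD),C} = \frac{76.92 + 76.92}{2} = 76.92, \qquad
d_{(AD),E} = \frac{40.00 + 40.00}{2} = 40.00
$$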
Species
5 species loaded

| ID | Gene Sequence |
|---|---|
| A | AACTGCATGC |
| B | AACTGCTTGC |
| C | GGTACCATGC |
| D | CATGCAACTG |
| E | TTGCAACTGC |
Distance Table
| | A | B | C | D | E |
|---|---|---|---|---|---|
| A | — | 54.55 | 76.92 | 22.22 | 40.00 |
| B | 54.55 | — | 93.33 | 66.67 | 40.00 |
| C | 76.92 | 93.33 | — | 76.92 | 93.33 |
| D | 22.22 | 66.67 | 76.92 | — | 40.00 |
| E | 40.00 | 40.00 | 93.33 | 40.00 | — |
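
The page does not state the value of k or the exact distance formula. One reading that reproduces every entry in the table above is k = 3 with a distance of 100 × (1 − shared / total), where shared and total are the sizes of the intersection and union of the two species' k-mer multisets. The sketch below implements that reading; the formula, the choice of k, and the names `kmer_counts` and `kmer_distance` are assumptions made for illustration, not the project's actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Count every overlapping k-mer of `gene`, keeping duplicates (a multiset).
std::map<std::string, int> kmer_counts(const std::string& gene, std::size_t k) {
    assert(k > 0);
    std::map<std::string, int> counts;
    for (std::size_t i = 0; i + k <= gene.size(); ++i) {
        ++counts[gene.substr(i, k)];
    }
    return counts;
}

// Assumed metric: 100 * (1 - |intersection| / |union|) over k-mer multisets.
double kmer_distance(const std::string& a, const std::string& b, std::size_t k) {
    const std::map<std::string, int> ca = kmer_counts(a, k);
    const std::map<std::string, int> cb = kmer_counts(b, k);
    int shared = 0;  // size of the multiset intersection
    int total = 0;   // size of the multiset union
    for (const auto& entry : ca) {
        const auto it = cb.find(entry.first);
        const int other = (it == cb.end()) ? 0 : it->second;
        shared += std::min(entry.second, other);
        total += std::max(entry.second, other);
    }
    for (const auto& entry : cb) {
        // k-mers that occur only in b still belong to the union.
        if (ca.find(entry.first) == ca.end()) total += entry.second;
    }
    // If neither gene is long enough to contain a k-mer, return 0 (arbitrary choice).
    if (total == 0) return 0.0;
    return 100.0 * (1.0 - static_cast<double>(shared) / total);
}
```

For A and D, the two genes share 7 of 9 distinct-by-count 3-mers (TGC appears twice in A but only once in D), so `kmer_distance("AACTGCATGC", "CATGCAACTG", 3)` evaluates to 100 × (1 − 7/9), which rounds to the 22.22 shown in the table.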
WPGMA Clustering
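The merge loop can be sketched in a few lines of C++. This is a minimal illustration under stated assumptions, not the project's implementation: the `Merge` struct and `wpgma` function are hypothetical names, the input is assumed to be a full symmetric matrix, and ties are broken by taking the first minimal pair in row-major order. The example data contains a tie at 40.00 after the first merge, so a different tie-breaking rule can produce a different tree.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One merge event: the two cluster labels joined and the distance between them.
struct Merge {
    std::string left;
    std::string right;
    double distance;
};

// WPGMA over a symmetric distance matrix `d` (d[i][j] == d[j][i]).
// Returns the merges in the order they happen.
std::vector<Merge> wpgma(std::vector<std::string> labels,
                         std::vector<std::vector<double>> d) {
    assert(labels.size() == d.size());
    std::vector<Merge> merges;
    while (labels.size() > 1) {
        // Find the closest pair (i < j); strict '<' keeps the first minimum on ties.
        std::size_t bi = 0, bj = 1;
        for (std::size_t i = 0; i < labels.size(); ++i) {
            for (std::size_t j = i + 1; j < labels.size(); ++j) {
                if (d[i][j] < d[bi][bj]) { bi = i; bj = j; }
            }
        }
        merges.push_back({labels[bi], labels[bj], d[bi][bj]});

        // The merged cluster reuses slot bi. Its distance to every other
        // cluster is the unweighted mean of the two old distances.
        for (std::size_t m = 0; m < labels.size(); ++m) {
            if (m == bi || m == bj) continue;
            const double avg = (d[bi][m] + d[bj][m]) / 2.0;
            d[bi][m] = avg;
            d[m][bi] = avg;
        }
        labels[bi] = "(" + labels[bi] + "," + labels[bj] + ")";

        // Remove slot bj from the labels and from both matrix dimensions.
        const std::ptrdiff_t off = static_cast<std::ptrdiff_t>(bj);
        labels.erase(labels.begin() + off);
        d.erase(d.begin() + off);
        for (auto& row : d) row.erase(row.begin() + off);
    }
    return merges;
}
```

Called with the labels A to E and the distance table above (zeros on the diagonal), the first merge this sketch reports is A with D at 22.22. Each `Merge` corresponds to one internal node of the dendrogram, so the returned list is enough to draw the tree.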
About this project
Originally a C++17 project for the PRO2 course at FIB-UPC. The algorithm computes pairwise distances between species using k-mer frequency analysis, then iteratively merges the closest pair of clusters using the Weighted Pair Group Method with Arithmetic Mean (WPGMA) to build a phylogenetic tree.