Tree-penalized Path Length (tpPL) criterion for data seriation

From this page, you can see comparisons of the results of using different seriation criteria on synthetic datasets. Three criteria are used: The 2022 article that describes the tpPL ordering method is available online.

There are 44 sythetic datasets, 40 of them two dimensional, and one RNA dataset. The synthetic datasets and the code that generated them is available on GitHub. The 3D structures forming the RNA dataset come from the RNA motif group HL_33239.14. See the link to view the 3D coordinates of the instances.

Four hierarchical clustering linkages are used for tpPL and OLO; these are average, complete, single, and Ward. tpPL orderings are shown for a variety of tree penalty strengths; as the penalty strength increases from 0, the paths transition from TSP to tpPL to OLO paths, and the path lengths get longer. For each dataset, each clustering linkage, and each tree penalty strength, the best known ordering is shown, as well as a re-ordered heat map of all-against-all distances.

Frontier graphs for all linkages are posted.

The sections below have links to instructive examples of the different ordering methods making imperfect orderings. These are not meant to be exhaustive, but rather to make it easy to find the prime examples. Examples are given for OLO average and OLO complete, and comments are made concerning OLO Ward when appropriate. OLO single consistently gives long, inefficient paths, so no examples are chosen to illustrate that. Datasets with elongated clusters generally have those clusters split by tpPL and OLO with average, complete, and Ward linkages; only one example of that (line with points off the line) will be shown below.

Examples where TSP backtracks

Examples where TSP splits clusters

Examples where OLO average makes it appear there is a sharp cluster boundary

Examples where OLO average has a particularly inefficient path

Examples where OLO complete splits clusters

Examples where OLO ward has a particularly inefficient path

Examples where tpPL average does poorly

Examples where tpPL complete does poorly

Examples where tpPL single does poorly

Examples where tpPL Ward does poorly

Copyright 2022 Denis A. Aliyev and Craig L. Zirbel.