Lab Visit: INTERPRETING THE POSTERIOR SAMPLE

April Wright, Southeastern Louisiana University

September 10, 2020

Talk slides, etc.

https://graemetlloyd.github.io/ProjectWhalehead/

Where did this manuscript come from?

Sansom et al. 2018

Comparison between Bayesian and parsimony trees using measures of stratigraphic congruence

Metric | Meaning | Range
Stratigraphic Consistency Index (SCI) | Proportion of nodes for which the oldest descendent of that node is younger than the oldest descendent of that node’s ancestor | 0 to 1, with one being perfectly consistent
Minimum Implied Gap (MIG) | The sum of the branch lengths excluding tip durations | Positive numbers in millions of years
Relative Completeness Index (RCI) | MIG score proportional to the summed length of tip durations | All real numbers
Manhattan Stratigraphic Measure (MSM*) | MIG for the maximally stratigraphically consistent possible tree divided by the actual MIG | 0 to 1, with one being the most consistent
Gap Excess Ratio (GER) | MIG minus the best possible stratigraphic fit, scaled by the contrast between the best and worst fit values | 0 to 1, with one being the most consistent
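
Three of the metrics above are simple functions of MIG, so a small sketch may help. It assumes the commonly published forms (MSM* = Gmin / MIG, GER = 1 - (MIG - Gmin) / (Gmax - Gmin), RCI = 100 × (1 - MIG / summed range lengths)); the function names and the input values are hypothetical and are not taken from Sansom et al. 2018 or from strap.

```python
def msm_star(mig, g_min):
    """MSM*: MIG of the best-fitting possible tree divided by the observed MIG."""
    return g_min / mig


def gap_excess_ratio(mig, g_min, g_max):
    """GER: 1 minus the excess gap, scaled by the best-to-worst fit range."""
    return 1.0 - (mig - g_min) / (g_max - g_min)


def relative_completeness_index(mig, summed_ranges):
    """RCI as a percentage: 100 * (1 - MIG / summed observed range lengths)."""
    return 100.0 * (1.0 - mig / summed_ranges)


# Hypothetical values, all in millions of years.
mig = 30.0            # observed minimum implied gap
g_min = 20.0          # MIG of the best possible stratigraphic fit
g_max = 70.0          # MIG of the worst possible stratigraphic fit
summed_ranges = 60.0  # summed tip durations

scores = {
    "MSM*": msm_star(mig, g_min),
    "GER": gap_excess_ratio(mig, g_min, g_max),
    "RCI": relative_completeness_index(mig, summed_ranges),
}
print(scores)
```

Because RCI is not bounded below, a tree whose implied gaps exceed the summed tip durations gets a negative score, which is why its range differs from the 0 to 1 metrics.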

This was a neat idea

  • Stratigraphic congruence has been used to choose between competing hypotheses of topology

Compared the set of most parsimonious trees to 500 trees from the Bayesian posterior sample

Bayesian Phylogenetics

We’ve been stuck in this modality of Bayesian vs. parsimony for a long time.

It is not exciting.

Bayesian morphological phylogenetics with the Mk model performs well enough

Bayesian phylogenetics estimates combinations of the tree topology + parameters that are plausible given a phylogenetic model and data

  • Solutions are visited in proportion to their posterior probability
  • A good tree will be visited many times
  • The distribution of solutions is information
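
To make the "visited in proportion" idea concrete, here is a minimal toy sketch. It is not a phylogenetic analysis: the three topologies and their unnormalized posterior weights are made up, and the sampler is a bare Metropolis algorithm with a uniform proposal, assumed here only for illustration.

```python
import random
from collections import Counter


def sample_topologies(weights, n_steps, seed=1):
    """Metropolis sampler over a fixed set of topologies.

    weights maps each topology to an unnormalized posterior weight.
    Returns the fraction of steps spent on each topology.
    """
    rng = random.Random(seed)
    trees = list(weights)
    current = trees[0]
    visits = Counter()
    for _ in range(n_steps):
        proposal = rng.choice(trees)  # symmetric (uniform) proposal
        accept_prob = min(1.0, weights[proposal] / weights[current])
        if rng.random() < accept_prob:
            current = proposal
        visits[current] += 1
    return {tree: visits[tree] / n_steps for tree in trees}


# Hypothetical unnormalized posterior weights (they sum to 10).
toy_posterior = {"((A,B),C)": 6.0, "((A,C),B)": 3.0, "((B,C),A)": 1.0}
freqs = sample_topologies(toy_posterior, n_steps=50_000)
for tree, freq in freqs.items():
    print(tree, round(freq, 3))
```

The visit frequencies should settle near 0.6, 0.3 and 0.1. The best tree is visited most often, but the other two are not noise: their frequencies are part of the answer.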

A forest or a tree

“We’ve replaced a maximum likelihood tree with the median or a distribution – is that better?” ~ Joëlle Barido-Sottani

Treeset visualizations

We wanted to show practitioners, who may or may not be very tech-savvy, how to easily make more sense of the posterior sample

Graphics that display trees in 2-D space based on their proximity to one another
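
"Proximity" needs a distance between trees. The sketch below assumes Robinson-Foulds distance on rooted trees (the number of clades found in one tree but not the other) and builds the pairwise matrix for three made-up trees. The nested-tuple tree encoding and the function names are hypothetical conveniences, not part of any package mentioned here.

```python
def clades(tree):
    """Return the non-trivial clades of a nested-tuple tree as frozensets of tips."""
    found = set()

    def tips(node):
        if isinstance(node, str):  # a tip label
            return frozenset([node])
        members = frozenset().union(*(tips(child) for child in node))
        found.add(members)
        return members

    found.discard(tips(tree))  # the root clade is shared by every tree
    return found


def rf_matrix(trees):
    """Pairwise rooted Robinson-Foulds distances (size of the clade symmetric difference)."""
    clade_sets = [clades(tree) for tree in trees]
    return [[len(a ^ b) for b in clade_sets] for a in clade_sets]


# Three hypothetical trees on the same five taxa.
toy_trees = [
    ((("A", "B"), "C"), ("D", "E")),
    ((("A", "C"), "B"), ("D", "E")),
    ((("A", "D"), "E"), ("B", "C")),
]
distances = rf_matrix(toy_trees)
for row in distances:
    print(row)
```

The first two trees differ by one rearrangement and sit close together; the third shares no clades with either. Passing a matrix like this to a multidimensional scaling routine is one way to get the 2-D coordinates that a treespace plot displays.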

Methods

  • Estimated parsimony trees (TNT, Goloboff and Catalano 2016) and Bayesian trees (RevBayes, Höhna et al. 2016) for 127 published paleontological matrices

  • Calculated stratigraphic congruence for all equally parsimonious trees and the Bayesian posterior sample in the R package strap (Bell and Lloyd 2015)

  • Modified the RWTY package (Warren, Geneva and Lanfear 2017) to color points in the treespace by MIG score

  • Also calculated some basic summary tables across the datasets using tidyverse
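
The coloring step can be sketched independently of RWTY. The snippet below assumes each tree already has 2-D treespace coordinates and a MIG score, and maps the scores linearly onto a two-color gradient. The function name, the coordinates, the scores and the gradient endpoints are all hypothetical; this is not the modified RWTY code.

```python
def mig_to_hex(scores, low=(68, 1, 84), high=(253, 231, 37)):
    """Map scores linearly onto a gradient from the low RGB color to the high one."""
    smallest, largest = min(scores), max(scores)
    span = (largest - smallest) or 1.0  # avoid dividing by zero if all scores match
    colors = []
    for score in scores:
        t = (score - smallest) / span
        rgb = [round(a + t * (b - a)) for a, b in zip(low, high)]
        colors.append("#{:02x}{:02x}{:02x}".format(*rgb))
    return colors


# Hypothetical treespace coordinates and MIG scores (millions of years).
points = [(0.0, 0.1), (0.4, -0.2), (1.3, 0.9)]
mig_scores = [120.0, 150.0, 180.0]

colored_points = list(zip(points, mig_scores, mig_to_hex(mig_scores)))
for point, score, color in colored_points:
    print(point, score, color)
```

Any plotting library that accepts per-point hex colors could then draw the treespace with the lowest-MIG trees in one color and the highest-MIG trees in the other.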

Results

  • In 63% of datasets, the parsimony trees had the higher average stratigraphic congruence
  • In 94% of datasets, the highest stratigraphic congruence value was observed in the Bayesian posterior sample
  • In 99% of datasets, the lowest stratigraphic congruence value was also found in the Bayesian posterior sample
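
These three tallies are not contradictory: a wider distribution can hold both extremes while having the lower mean. A minimal sketch of how such per-dataset tallies could be computed is below. The two datasets and their scores are invented (a metric where higher means more congruent is assumed), and this is not the tidyverse code used for the actual summary tables.

```python
from statistics import mean


def summarize(datasets):
    """Percent of datasets in which each condition holds."""
    n = len(datasets)
    parsimony_higher_mean = sum(
        mean(d["parsimony"]) > mean(d["bayesian"]) for d in datasets.values()
    )
    best_in_bayesian = sum(
        max(d["bayesian"]) >= max(d["parsimony"]) for d in datasets.values()
    )
    worst_in_bayesian = sum(
        min(d["bayesian"]) <= min(d["parsimony"]) for d in datasets.values()
    )
    return {
        "parsimony_higher_mean": 100.0 * parsimony_higher_mean / n,
        "best_in_bayesian": 100.0 * best_in_bayesian / n,
        "worst_in_bayesian": 100.0 * worst_in_bayesian / n,
    }


# Hypothetical per-tree congruence scores for two made-up datasets.
toy_datasets = {
    "dataset_1": {"parsimony": [0.70, 0.72, 0.71], "bayesian": [0.40, 0.65, 0.80, 0.60]},
    "dataset_2": {"parsimony": [0.50, 0.52], "bayesian": [0.45, 0.60, 0.58, 0.57]},
}
summary = summarize(toy_datasets)
print(summary)
```

In the toy data the Bayesian sample contains both the best and the worst tree in every dataset, yet parsimony wins on the mean in half of them.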

Example dataset: Yates (2003)

Example dataset: Demar (2013)

Summary

Bayesian methods estimate a sample of solutions

Unlike other methods, the distribution itself may be important

Looking at the full distribution of trees can give us information that a single solution may not

We shouldn’t be asking “Is Bayes or parsimony better?”

We should instead be looking for ways to comfortably visualize variation in large datasets

Treespace visualizations provide an easy and intuitive way to do this

Thank You!

A sincere thank you to Sebastian for inviting me!

To Dan Warren and Rob Lanfear for the endlessly hackable RWTY software

And to my co-author and partner in crime, Dr. Graeme Lloyd