annotate test-data/README @ 0:66ebc4b19d6c draft default tip

"planemo upload for repository https://github.com/smirarab/ASTRAL commit 0f93f327c49e93d6af057973d68ba772ba5715dc-dirty"
author padge
date Wed, 13 Apr 2022 15:03:31 +0000

**Song et al.:**
We include here sample input files based on the [Song et al.](http://www.pnas.org/content/109/37/14942.short) dataset of 37 mammalian species and 442 genes.
We have removed 23 problematic genes (21 mislabeled genes and 2 genes we classified as outliers) and
re-estimated the gene trees using RAxML on the alignments that the authors of that paper kindly provided to us.
We have also included 200 bootstrap replicates of the gene trees we estimated on the Song et al. dataset.

**Simulation:**
We have included simulations based on the Song et al. dataset with increased rates of incomplete lineage sorting (ILS).

**Primates:**
We have also created a reduced version of the Song et al. dataset with 9 primates, tree shrews, and 4 other mammalian taxa. This dataset is provided for testing the exact version of ASTRAL.

**14-taxon simulation:**
Simulated using SimPhy with extreme levels of ILS.

**100-taxon simulations:**
A simulated dataset with 100 taxa and 2500 bootstrap replicate gene trees is also provided for testing large datasets.
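
Before using any of these files, it can help to check how many trees and taxa a file actually contains. The sketch below is only an illustration: it assumes each file holds one Newick tree per line with unquoted taxon labels, which this README does not state, and `summarize_newick_lines` is a hypothetical helper, not part of ASTRAL.

```python
import re


def summarize_newick_lines(lines):
    """Return (number_of_trees, sorted taxon labels) for an iterable of Newick strings.

    Assumption: one tree per non-empty line, unquoted leaf labels.
    """
    taxa = set()
    n_trees = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        n_trees += 1
        # A leaf label directly follows "(" or "," and ends at ":", ",", ")" or ";".
        # Internal support values follow ")" and are therefore not captured.
        taxa.update(re.findall(r"[(,]\s*([^(),:;\s]+)", line))
    return n_trees, sorted(taxa)


if __name__ == "__main__":
    # Toy input, not taken from the test data.
    example = ["((A:0.1,B:0.2)0.9:0.3,C:0.4);", "((A,C),B);", ""]
    n_trees, taxa = summarize_newick_lines(example)
    print(n_trees, taxa)
```

To run it on a real file, pass an open file handle, for example `summarize_newick_lines(open(path))`, where `path` is whichever file in this directory you want to inspect.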