What it does
PopPUNK calculates core and accessory distances between input assemblies using variable-length k-mers. A model is then fitted to all of these distances to determine genetic clusters for all inputs.
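As a rough illustration of how two distances can come out of k-mer comparisons at several lengths, the sketch below assumes the k-mer match probability (approximated by the Jaccard similarity J) follows J = (1 - a)(1 - c)^k, where a is the accessory distance and c the core distance. The function name `fit_core_accessory` and the input values are hypothetical; this is a simplified stand-in, not PopPUNK's own code.

```python
import math

def fit_core_accessory(kmer_lengths, jaccard):
    """Least-squares line through (k, log J), assuming
    log J = log(1 - a) + k * log(1 - c).
    Returns (core, accessory). Hypothetical helper for illustration only."""
    xs = list(kmer_lengths)
    ys = [math.log(j) for j in jaccard]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # estimates log(1 - c)
    intercept = mean_y - slope * mean_x    # estimates log(1 - a)
    core = 1.0 - math.exp(slope)
    accessory = 1.0 - math.exp(intercept)
    return core, accessory

# Synthetic example: generate Jaccard values from known distances,
# then check that the fit recovers them.
true_core, true_accessory = 0.01, 0.2
ks = [13, 17, 21, 25, 29]
js = [(1 - true_accessory) * (1 - true_core) ** k for k in ks]
core, accessory = fit_core_accessory(ks, js)
print(f"core={core:.4f} accessory={accessory:.4f}")
```

The slope of the fitted line carries the core distance and the intercept carries the accessory distance, which is why more than one k-mer length is needed.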
The most important thing to check, in the cluster/model plot output, is that the component (blob) closest to the origin has been correctly identified. If it has not, you may wish to try another model. Some broad advice:
- DBSCAN is a good default, but may leave some points unclassified (shown in black). If there are a large number of these, consider another model.
- GMM works well when the components are well separated and K is chosen appropriately (consider increasing K to match the number of components visible in the plot).
- The refine mode should be added for recombining species. These can be recognised in the output plots when the coloured components overlap, or when there is a blur of points rather than discrete blobs.
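The two checks above (which component sits closest to the origin, and how many points were left unclassified) can also be done numerically if the distances and cluster labels are available as plain lists. The sketch below is a minimal illustration under that assumption; `summarise_fit`, the `-1` label for unclassified points, and the example values are all hypothetical and are not part of PopPUNK's output format.

```python
import math
from collections import defaultdict

UNCLASSIFIED = -1  # assumed label for points the model could not assign

def summarise_fit(distances, labels):
    """Return (label of the component closest to the origin,
    fraction of unclassified points).
    distances: list of (core, accessory) pairs; labels: one label per pair."""
    groups = defaultdict(list)
    for point, label in zip(distances, labels):
        groups[label].append(point)
    n_unclassified = len(groups.pop(UNCLASSIFIED, []))
    # Centroid (mean core, mean accessory) of every classified component
    centroids = {
        label: (
            sum(p[0] for p in pts) / len(pts),
            sum(p[1] for p in pts) / len(pts),
        )
        for label, pts in groups.items()
    }
    closest = None
    if centroids:
        closest = min(centroids, key=lambda lab: math.hypot(*centroids[lab]))
    return closest, n_unclassified / len(labels)

# Made-up example: two components and one unclassified point
distances = [(0.001, 0.02), (0.002, 0.03), (0.02, 0.30), (0.021, 0.31), (0.01, 0.15)]
labels = [0, 0, 1, 1, -1]
closest, frac_unclassified = summarise_fit(distances, labels)
print(closest, frac_unclassified)
```

A high unclassified fraction would point towards trying a different model, but the plot itself remains the primary check.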