Exclude sequences by alignment
This method aligns feature sequences to a set of reference sequences and
separates them into hits and misses according to the specified
perc_identity, perc_query_aligned, and evalue thresholds. It can be used
as a positive filter, e.g., to extract only feature sequences that align
to a certain clade of bacteria; or as a negative filter, e.g., to
identify sequences that align to contaminant or human DNA sequences and
should be excluded from subsequent analyses. Note that filtering is
performed based on the perc_identity, perc_query_aligned, and evalue
thresholds (the latter only if method=='blast' and an evalue is set). Set
perc_identity==0 and/or perc_query_aligned==0 to disable these filtering
thresholds as necessary.
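The threshold logic described above can be sketched in plain Python. This is a minimal illustration, not the plugin's implementation: the function name passes_thresholds and its arguments identity, query_aligned, and hit_evalue are hypothetical, and the sketch assumes that perc_query_aligned, like perc_identity, is expressed as a fraction in [0.0, 1.0].

```python
from typing import Optional


def passes_thresholds(
    identity: float,
    query_aligned: float,
    hit_evalue: float,
    *,
    perc_identity: float,
    perc_query_aligned: float,
    evalue: Optional[float] = None,
) -> bool:
    """Return True if a single alignment counts as a hit (hypothetical helper).

    identity and query_aligned are assumed to be fractions in [0.0, 1.0].
    """
    # Reject the match if identity to the reference is below the threshold.
    if identity < perc_identity:
        return False
    # Reject the match if too little of the query aligned to the reference.
    if query_aligned < perc_query_aligned:
        return False
    # The E-value check applies only when a threshold has been set.
    if evalue is not None and hit_evalue > evalue:
        return False
    return True
```

With both thresholds set to 0 and evalue left unset, every alignment passes, which corresponds to disabling the filters as described above.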
Parameters
- query_sequences : FeatureData[Sequence]
  - Sequences to test for exclusion
- reference_sequences : FeatureData[Sequence]
  - Reference sequences to align feature sequences against
- method : Str % Choices('blast', 'vsearch', 'blastn-short'), optional
  - Alignment method to use for matching feature sequences against
    reference sequences
- perc_identity : Float % Range(0.0, 1.0, inclusive_end=True), optional
  - Reject match if percent identity to reference is lower than this
    value. Must be in range [0.0, 1.0].
- evalue : Float, optional
  - BLAST expectation (E) value threshold for saving hits. Reject match
    if E value is higher than threshold. This threshold is disabled by
    default.
- perc_query_aligned : Float, optional
  - Percent of query sequence that must align to reference in order to be
    accepted as a hit.
Returns
- sequence_hits : FeatureData[Sequence]
  - Subset of feature sequences that align to reference sequences
- sequence_misses : FeatureData[Sequence]
  - Subset of feature sequences that do not align to reference sequences
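
The relationship between the two outputs can be sketched as a simple partition of the query sequences. This is an illustrative sketch only: the helper split_hits_and_misses, the dict-of-strings representation, and the hit_ids set are assumptions made for the example and are not part of the plugin.

```python
from typing import Dict, Set, Tuple


def split_hits_and_misses(
    query_seqs: Dict[str, str], hit_ids: Set[str]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Partition query sequences by whether their ID is in hit_ids.

    query_seqs maps feature ID to sequence; hit_ids holds the IDs of
    sequences that aligned to the reference within the thresholds.
    """
    hits = {fid: seq for fid, seq in query_seqs.items() if fid in hit_ids}
    misses = {fid: seq for fid, seq in query_seqs.items() if fid not in hit_ids}
    return hits, misses


# Example with made-up feature IDs and sequences.
seqs = {"f1": "ACGT", "f2": "TTGA", "f3": "GGCC"}
hits, misses = split_hits_and_misses(seqs, {"f1", "f3"})
```

For a positive filter, the hits would be carried forward; for a negative filter (e.g., contaminant removal), the misses would be carried forward instead.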