Repository 'filter_assemblies'
hg clone https://toolshed.g2.bx.psu.edu/repos/abims-sbr/filter_assemblies

Changeset 1:a83562c0719f (2025-02-03)
Previous changeset 0:7a813e633d1c (2019-02-01) Next changeset 2:000dbfafe31d (2025-06-30)
Commit message:
planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 68979144b9949c27bcc3340a9e8375de1391526c
modified:
filter_assembly.xml
macros.xml
scripts/S01_script_to_choose.py
test-data/trinity_and_velvet_up.output
test-data/trinity_out/AcAcaud_trinity.fasta
test-data/trinity_out/AmAmphi_trinity.fasta
test-data/trinity_out/ApApomp_trinity.fasta
test-data/trinity_out/PfPfiji_trinity.fasta
test-data/trinity_up.output
test-data/velvet_out/AcAc_transcriptome_25591.fasta
test-data/velvet_out/ApAp_transcriptome_35099.fasta
test-data/velvet_out/PgPg_transcriptome_90109.fasta
test-data/velvet_up.output
removed:
scripts/S02a_remove_redondancy_from_velvet_oases.py
scripts/S02b_format_fasta_name_trinity.py
scripts/S03_choose_one_variants_per_locus_trinity.py
scripts/S04_find_orf.py
scripts/S05_filter.py
diff -r 7a813e633d1c -r a83562c0719f filter_assembly.xml
--- a/filter_assembly.xml Fri Feb 01 10:22:32 2019 -0500
+++ b/filter_assembly.xml Mon Feb 03 14:37:31 2025 +0000
@@ -1,4 +1,4 @@
-<tool name="Filter assemblies" id="filter_assemblies" version="2.0.3">
+<tool name="Filter assemblies" id="filter_assemblies" version="2.0.4">
 
     <description>
         Filter the outputs of Velvet or Trinity assemblies
@@ -9,8 +9,7 @@
     </macros>
 
     <requirements>
-        <expand macro="python_required" />
-        <requirement type="package" version="0.0.14">fastx_toolkit</requirement>
+        <expand macro="python3_required" />
         <requirement type="package" version="10.2011">cap3</requirement>
     </requirements>
 
@@ -23,19 +22,13 @@
         #end for
         #set $infiles = $infiles[:-1]
 
-        ln -s '$__tool_directory__/scripts/S02a_remove_redondancy_from_velvet_oases.py' . &&
-        ln -s '$__tool_directory__/scripts/S02b_format_fasta_name_trinity.py' . &&
-        ln -s '$__tool_directory__/scripts/S03_choose_one_variants_per_locus_trinity.py' . &&
-        ln -s '$__tool_directory__/scripts/S04_find_orf.py' . &&
-        ln -s '$__tool_directory__/scripts/S05_filter.py' . &&
-
         python '$__tool_directory__/scripts/S01_script_to_choose.py'
 
         '$infiles'
         $length_seq_max
         $percent_identity
         $overlap_length
-        > ${log}
+        > '${log}'
     ]]>
     </command>
 
@@ -106,13 +99,13 @@
 
 **Description**
 
-This tool reformats Velvet Oases or Trinity assemblies for the AdaptSearch galaxy suite and selects only one variant per gene according to its length and quality check.
+This tool runs the CAP3 software on assembly FASTA data, merge singlets and contigs and then reformat headers to allow any assembly tools.
 
 ---------
 
 **Input format**
 
-(1) Sequences are in the sequential format:
+Sequences are in the FASTA format:
 
 | >seqname1
 | AAAGAGAGACCACATGTCAGTAGC -on one or several lines -
@@ -121,18 +114,6 @@
 | etc ...
 |
 
-2) The file name should begin with a two letter abbreviation of the species name (for isntance, 'Ap' if the species is Alvinella pompejana).
-
-**For Velvet Oases assemblies input**
-            
-    The headers must be as follow : *>Locus_i_Transcript_i/j_Confidence_x.xxx_Length_N* where i is the locus number, j the transcript variant among all versions of the transcript, x.xxx the confidence value and N the length.
-
-**For Trinity assemblies inputs**   
-            
-    The headers must be as follow : *>cj_gj_ij Len=j path=[j:0-j]* where all the j are integers (locus number, transcript variant, length, position...)
-
-**The tool handles the case if input files come from both assemblers (there is no need for input files to be exclusively from one or another assembler).**
-
 ---------
 
 **Parameters**
@@ -150,11 +131,9 @@
 **Steps**:
     
 The tool:
-    1) Modifies the sequence name to add the species abbreviation using the 2 first letters of the name of the transcriptome file : note that each species abbreviation must be unique
-    2) Selects one allelic sequence from each transcript (c or locus) using the length of the sequence and its level of confidence
-    3) Selects the best ORF from the sequence between two stop codons
-    4) Performs a CAP3 from the full set of ORFs to minimize redundancy
-    5) Retrieves the initial transcript sequences from the remaining set of proceeded ORF sequences
+    1) Performs a CAP3 from the full set of ORFs to minimize redundancy
+    2) Merges singlets and contigs identified by CAP3
+    3) Reformats headers of the FASTA records by adding a specified prefix (defined from the original filename) and ensures that sequences are on a single line
 
 **Outputs**
 
@@ -172,6 +151,11 @@
 Changelog
 ---------
 
+
+**Version 2.2 - 07/10/2024**
+
+    - Input files can be from any assembly tools
+
 **Version 2.1 - 15/01/2018**
 
     - Input files can be a mix from files coming either from Trinity or Velvet Oases assemblers
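The revised help text says the tool ensures that sequences are on a single line. The old script delegated this to FASTX-Toolkit's `fasta_formatter -w 0`, and this changeset drops the `fastx_toolkit` requirement. The new `S01_script_to_choose.py` does the same job in its own `fasta_formatter()` function.

Below is a minimal sketch of that unwrapping step. It works on a string rather than on files. `unwrap_fasta` is a hypothetical name, and unlike the script it also drops blank lines.

```python
def unwrap_fasta(fasta_text: str) -> str:
    """Rewrite FASTA text so each sequence sits on one line after its header
    (hypothetical helper, not the tool's own function)."""
    out_lines = []
    seq_parts = []
    for raw in fasta_text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines (the script does not do this)
        if line.startswith(">"):
            # Flush the previous record's sequence before the new header
            if seq_parts:
                out_lines.append("".join(seq_parts))
                seq_parts = []
            out_lines.append(line)
        else:
            seq_parts.append(line)
    if seq_parts:
        out_lines.append("".join(seq_parts))
    return "\n".join(out_lines) + "\n" if out_lines else ""
```

Header lines pass through unchanged and only sequence lines are joined. Headers are rewritten in a separate, later step.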
diff -r 7a813e633d1c -r a83562c0719f macros.xml
--- a/macros.xml Fri Feb 01 10:22:32 2019 -0500
+++ b/macros.xml Mon Feb 03 14:37:31 2025 +0000
@@ -1,9 +1,13 @@
 <macros>
 
  <xml name="python_required">
- <requirement type="package" version="2.7">python</requirement>
+ <requirement type="package" version="3.10">python</requirement>
  </xml>
 
+ <xml name="python3_required">
+                        <requirement type="package" version="1.79">biopython</requirement>
+        </xml>
+
     <token name="@HELP_AUTHORS@">
 .. class:: infomark
 
diff -r 7a813e633d1c -r a83562c0719f scripts/S01_script_to_choose.py
--- a/scripts/S01_script_to_choose.py Fri Feb 01 10:22:32 2019 -0500
+++ b/scripts/S01_script_to_choose.py Mon Feb 03 14:37:31 2025 +0000
@@ -1,54 +1,157 @@
 #!/usr/bin/env python
-#coding: utf-8
+import os
+import subprocess
+import sys
 
-## AUTHOR: Eric Fontanillas
-## LAST VERSION: 10.2017 by Victor Mataigne
+from Bio import SeqIO
+
 
-import glob, sys, string, os
-    
-def nameFormatting(name, script_path, prefix):
-    f = open(name, "r")
-    f1 = f.readline() # Only need to check first line to know the assembler which has been used
-    f.close()
-    name_find_orf_input = ""
+def fasta_formatter(input_file, output_file):
+    """
+    Reformats the input FASTA file to ensure that sequences
+    are on a single line.
+    """
+    os.makedirs(os.path.dirname(output_file), exist_ok=True)
+    with open(input_file, 'r') as infile, open(output_file, 'w') as outfile:
+        sequence = ''
+        header = ''
+        for line in infile:
+            if line.startswith('>'):
+                if sequence:
+                    outfile.write(sequence + '\n')
+                header = line.strip()
+                outfile.write(header + '\n')
+                sequence = ''
+            else:
+                sequence += line.strip()
+        if sequence:
+            outfile.write(sequence + '\n')
+
 
-    if f1.startswith(">Locus"):
-        name_remove_redondancy = "02_%s" %name
-        os.system("python S02a_remove_redondancy_from_velvet_oases.py %s %s" %(name, name_remove_redondancy))
-        name_find_orf_input = "%s%s" %(prefix, name)
-        os.system("sed -e 's/Locus_/%s/g' -e 's/_Confidence_/_/g' -e 's/_Transcript_/_/g' -e 's/_Length_/_/g' %s > %s" % (prefix, name_remove_redondancy, name_find_orf_input))
-    elif f1.startswith(">c"):        
-        #Format the name of the sequences with good name
-        name_format_fasta = "03%s" %name
-        os.system("python S02b_format_fasta_name_trinity.py %s %s %s" %(name, name_format_fasta, prefix))
-        #Apply first script to avoid reductant sequences
-        name_find_orf_input = "04%s" %name
-        os.system("python S03_choose_one_variants_per_locus_trinity.py %s %s" %(name_format_fasta, name_find_orf_input))
+def reformat_headers(input_file, output_file, prefix):
+    """
+    Reformats the headers of the FASTA records by adding a specified prefix
+    and ensures that sequences are on a single line.
+    """
+    with open(input_file, 'r') as infile, open(output_file, 'w') as outfile:
+        sequence = ''
+        for line in infile:
+            if line.startswith('>'):
+                if sequence:
+                    outfile.write(sequence + '\n')
+                # Process header line
+                original_id = line[1:].strip()
+                header_parts = original_id.split('/')
+                numeric_part = header_parts[0].replace('ou', '')
+                rest = '/'.join(header_parts[1:]) \
+                    if len(header_parts) > 1 else ""
+                if rest:
+                    new_header = ">{}/{}".format(prefix +
+                                                 str(numeric_part), rest)
+                else:
+                    new_header = ">{}".format(prefix + str(numeric_part))
+                outfile.write(new_header + '\n')
+                sequence = ''
+            else:
+                sequence += line.strip()
+        if sequence:
+            outfile.write(sequence + '\n')
 
-    return name_find_orf_input
+
+def rename_fasta_headers(input_fasta, output_fasta):
+    # Extract the base name of the file (without .fasta extension)
+    base_name_dir = input_fasta.split('.')[0]
+    base_name = base_name_dir.split('/')[1]
+    # The first two letters of the file name
+    prefix = base_name[3:5]
+    # List to store new sequences
+    modified_sequences = []
+
+    # Read the file and edit the headers
+    for index, record in enumerate(SeqIO.parse(input_fasta, "fasta"), start=1):
+        seq_length = len(record.seq)
+        new_header = ">{}{}_1/1_1.000_{}".format(prefix, index, seq_length)
+        record.id = new_header[1:]  # [1:] to remove
...
      sys.exit(1)
+
+    output_dir = "outputs"  # Define the output directory
+    os.makedirs(output_dir, exist_ok=True)
     percent_identity = sys.argv[3]
     overlap_length = sys.argv[4]
 
-    for name in str.split(sys.argv[1], ","):         
-        prefix=name[0:2]
-        name_fasta_formatter = "01%s" %name
-        os.system("cat '%s' | fasta_formatter -w 0 -o '%s'" % (name, name_fasta_formatter))
-        name_find_orf_input = nameFormatting(name_fasta_formatter, script_path, prefix)
-        #Pierre guillaume find_orf script for keeping the longuest ORF
-        name_find_orf = "05%s"% name
-        os.system("python S04_find_orf.py %s %s" %(name_find_orf_input, name_find_orf))
-        #Apply cap3
-        os.system("cap3 %s -p %s -o %s"%(name_find_orf, percent_identity, overlap_length))
-        #Il faudrait faire un merge des singlets et contigs! TODO
-        os.system("zcat -f < '%s.cap.singlets' | fasta_formatter -w 0 -o '%s'" % (name_find_orf, prefix))
-        #Apply pgbrun script filter script TODO length parameter
-        name_filter = "%s%s"%(prefix, name)
-        os.system("python S05_filter.py %s %s outputs/%s" %(prefix, length_seq_max, name_filter))
+    for name in sys.argv[1].split(","):
+        if not os.path.isfile(name):
+            print("Error: Input file {} does not exist.".format(name))
+            continue
+
+        # Apply CAP3
+        # Get the base file name
+        file_name = os.path.basename(name)
+        # Define the output file path in the output directory
+        output_file_path = os.path.join(output_dir, file_name)
+        # Create a symbolic link for the input file in the output directory
+        symlink_path = os.path.join(output_dir, file_name)
+        if not os.path.exists(symlink_path):
+            os.symlink(os.path.abspath(name), symlink_path)
+
+        # Print and run the CAP3 command
+        print(
+            "cap3 {} -p {} -o {}".format(output_file_path,
+                                         percent_identity, overlap_length)
+        )
+        subprocess.run([
+            "cap3", output_file_path, "-p", percent_identity,
+            "-o", overlap_length], check=True)
+
+        # Format file to have sequence in one line
+        name_fasta_formatter = os.path.join(
+            output_dir, "02_{}".format(os.path.basename(name)))
+        fasta_formatter(
+            "{}.cap.singlets".format(output_file_path), name_fasta_formatter)
+
+        # Merge singlets and contigs
+        merged_file = os.path.join(output_dir,
+                                   "03_{}_merged.fasta".format(file_name))
+        # Define paths for CAP3 output files
+        cap_singlets_file = os.path.join(output_dir,
+                                         "{}.cap.singlets".format(file_name))
+        cap_contigs_file = os.path.join(output_dir,
+                                        "{}.cap.contigs".format(file_name))
+        print("{} and {}".format(cap_singlets_file, cap_contigs_file))
+
+        with open(merged_file, 'w') as outfile:
+            # Write the contents of the contigs file first
+            if os.path.exists(cap_contigs_file):
+                with open(cap_contigs_file, 'r') as contigs:
+                    outfile.write(contigs.read())
+            # Append the contents of the singlets file
+            if os.path.exists(cap_singlets_file):
+                with open(cap_singlets_file, 'r') as singlets:
+                    outfile.write(singlets.read())
+
+        # Reformat headers
+        name_fasta_final = os.path.join(
+            output_dir, "04_{}".format(os.path.basename(name)))
+        rename_fasta_headers(merged_file, name_fasta_final)
+
+        # Format final file to have sequence in one line
+        prefix = file_name[:2]
+        tmp = prefix + os.path.basename(name)
+        name_final_file = os.path.join(output_dir, tmp)
+        fasta_formatter(name_fasta_final, name_final_file)
+
 
 if __name__ == "__main__":
     main()
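The old code left the merge as a TODO: `Il faudrait faire un merge des singlets et contigs!`, i.e. "singlets and contigs should be merged". The new code writes `<input>.cap.contigs` first, then `<input>.cap.singlets`, into one file, and skips whichever file is missing.

Below is a minimal sketch of the same ordering. `merge_cap3_outputs` is a hypothetical helper. Its newline guard is an addition: the script's `outfile.write(contigs.read())` has no such guard.

```python
import os


def merge_cap3_outputs(cap3_input: str, merged_path: str) -> int:
    """Write <cap3_input>.cap.contigs, then <cap3_input>.cap.singlets, into
    merged_path (hypothetical helper). Missing files are skipped.
    Returns the number of FASTA records written."""
    n_records = 0
    with open(merged_path, "w") as out:
        # Contigs first, singlets second, as in the new S01 script
        for suffix in (".cap.contigs", ".cap.singlets"):
            part = cap3_input + suffix
            if not os.path.exists(part):
                continue
            with open(part) as handle:
                for line in handle:
                    if line.startswith(">"):
                        n_records += 1
                    # Guard: a file without a final newline must not glue
                    # its last line onto the next file's first header
                    out.write(line if line.endswith("\n") else line + "\n")
    return n_records
```

Counting `>` lines gives a quick check that every contig and singlet made it into the merged file before headers are renumbered.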
diff -r 7a813e633d1c -r a83562c0719f scripts/S02a_remove_redondancy_from_velvet_oases.py
--- a/scripts/S02a_remove_redondancy_from_velvet_oases.py Fri Feb 01 10:22:32 2019 -0500
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,122 +0,0 @@
-#!/usr/bin/env python
-## AUTHOR: Eric Fontanillas
-## LAST VERSION: 06.12.2011
-
-## DESCRIPTION: Remove redondant transcripts (i.e. transcript from the same locus) from Oases output on the basis of two recursive criterias (see in DEF1):  
-            ## 1. [CRITERIA 1] Keep in priority seq with BEST "confidence_oases_criteria" present in the fasta name
-            ## 2. [CRITERIA 2] Second choice (if same coverage) : choose the longuest sequence (once any "N" have been removed => effective_length = length - N number
-## => criticize of this approach: the transcripts may come from a same locus but may be not redundant (non-overlapping) ==> SEE "DEF2" for an alternative
-
-###################
-###### DEF 1 ######
-###################
-def dico_filtering_redundancy(path_in):
-    f_in = open(path_in, "r")
-    bash = {}
-    bash_unredundant = {}
-    file_read = f_in.read()    
-    S1 = file_read.split(">")
-    k = 0
-
-    ## 1 ## Extract each transcript and group them in same locus if they share the same "short_fasta_name"
-    for element in S1:
-        if element != "":            
-            S2 = element.split("\n")
-            fasta_name = S2[0]
-            fasta_seq = S2[1:-1] # that line was unindented
-            fasta_seq = "".join(fasta_seq) # that line was unindented            
-            L = fasta_name.split("_")
-            short_fasta_name = L[0] + L[1]
-
-            ## Used later for [CRITERIA 1] (see below)
-            confidence_oases_criteria = L[-3]
-            countN = fasta_seq.count("N")
-            length = len(fasta_seq)
-            effective_length = length - countN
-
-            if short_fasta_name not in list(bash.keys()): 
-                bash[short_fasta_name] = [[fasta_name, fasta_seq, confidence_oases_criteria, effective_length]]
-            else:
-                bash[short_fasta_name].append([fasta_name, fasta_seq, confidence_oases_criteria, effective_length])
-        k = k+1
-    f_in.close()
-
-    for key in list(bash.keys()):
-        ## 2 ## IF ONE TRANSCRIPT PER LOCUS:
-        ## In this case => we record directly
-        if len(bash[key]) == 1:
-            entry = bash[key][0]
-            name = entry[0]
-            seq = entry[1]
-            bash_unredundant[name] = seq
-
-        ## 3 ## IF MORE THAN ONE TRANSCRIPTS PER LOCUS:
-        ## In this case:
-        ## 1. [CRITERIA 1] Keep in priority seq with BEST "confidence_oases_criteria" present in the fasta name
-        ## 2. [CRITERIA 2] Second choice (if same coverage) : choose the longuest sequence (once any "N" have been removed => effective_length = length - N numb
-        elif len(bash[key]) > 1:   ### means there are more than 1 seq            
-            MAX_CONFIDENCE = {}
-            MAX_LENGTH = {}
-            for entry in bash[key]:    ## KEY = short fasta name    || VALUE = list of list, e.g. :  [[fasta_name1, fasta_seq1],[fasta_name2, fasta_seq2][fasta_name3, fasta_seq3]]
-                name = entry[0]
-                seq = entry[1]
-                effective_length = entry[3]
-                confidence_oases_criteria = entry[2]
-
-                ## Bash for [CRITERIA 2]
-                MAX_LENGTH[effective_length] = entry
-
-                ## Bash for [CRITERIA 1]
-                # confidence_oases_criteria = string.atof(confidence_oases_criteria)
-                confidence_oases_criteria = float(confidence_oases_criteria)
-                if confidence_oases_criteria not in list(MAX_CONFIDENCE.keys()):
-                    MAX_CONFIDENCE[confidence_oases_criteria] = entry
-                else:    ## IF SEVERAL SEQUENCES WITH THE SAME CONFIDENCE INTERVAL => RECORD ONLY THE LONGUEST ONE [CRITERIA 2]
-                    current_seq_length = effective_length
-                    yet_recorded_seq_length = MAX_CONFIDENCE[confidence_oases_criteria][3]
-                    if current_seq_length > yet_recorded_seq_length:
-                        MAX_CONFIDENCE[confidence_oases_criteria] = entry   ## Replace the previous recorded entry with the same confidence interval but lower length
-
-            ## Sort keys() for MAX_CONFIDENCE bash 
-            KC = list(MAX_CONFIDENCE.keys())
-            KC.sort()
-
-            ## Select the best entry
-            MAX_CONFIDENCE_KEY = KC[-1]  ## [CRITERIA 1]
-            BEST_ENTRY = MAX_CONFIDENCE[MAX_CONFIDENCE_KEY]            
-
-            BEST_fasta_name = BEST_ENTRY[0]
-            BEST_seq = BEST_ENTRY[1]
-            bash_unredundant[BEST_fasta_name] = BEST_seq
-
-    return bash_unredundant
-#~#~#~#~#~#~#~#~#~#
-
-###################
-### RUN RUN RUN ###
-###################
-import string, os, sys, re
-
-path_IN = sys.argv[1]
-path_OUT = sys.argv[2]
-file_OUT = open(path_OUT, "w")
-dico = dico_filtering_redundancy(path_IN)    ### DEF1 ###
-KB = list(dico.keys())
-
-## Sort the fasta_name depending their number XX : ApXX
-BASH_KB = {}
-for name in KB:
-    L = name.split("_")
-    nb = int(L[1])
-    BASH_KB[nb] = name
-NEW_KB = []    
-KKB = list(BASH_KB.keys())
-KKB.sort()
-
-for nb in KKB:
-    fasta_name = BASH_KB[nb]
-    seq = dico[fasta_name]
-    file_OUT.write(">%s\n" %fasta_name)
-    file_OUT.write("%s\n" %seq)
-
-file_OUT.close()
\ No newline at end of file
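The removed `S02a_remove_redondancy_from_velvet_oases.py` kept one Velvet Oases transcript per locus. It chose the transcript with the highest `Confidence` value. Among equal confidences, it chose the one with the longest effective length (length minus the number of `N`).

Below is a compact sketch of that rule over `(header, sequence)` pairs. `pick_oases_variant` is a hypothetical name, and it does not reproduce the script's final sort by locus number.

```python
def pick_oases_variant(records):
    """records: iterable of (header, seq) with Oases-style headers such as
    'Locus_1_Transcript_2/3_Confidence_0.750_Length_812' (no leading '>').
    Returns {header: seq} with one transcript per locus (hypothetical helper)."""
    best = {}
    for header, seq in records:
        fields = header.split("_")
        locus = fields[0] + fields[1]        # e.g. 'Locus1', as in the script
        confidence = float(fields[-3])        # value after 'Confidence'
        effective_length = len(seq) - seq.count("N")
        rank = (confidence, effective_length)
        # Strict '>' keeps the first-seen transcript on a full tie,
        # matching the script's 'current > recorded' comparison
        if locus not in best or rank > best[locus][0]:
            best[locus] = (rank, header, seq)
    return {header: seq for _, header, seq in best.values()}
```

The `(confidence, effective_length)` tuple lets Python's tuple ordering express "confidence first, length as tie-breaker" in a single comparison.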
diff -r 7a813e633d1c -r a83562c0719f scripts/S02b_format_fasta_name_trinity.py
--- a/scripts/S02b_format_fasta_name_trinity.py Fri Feb 01 10:22:32 2019 -0500
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,66 +0,0 @@
-#!/usr/bin/env python
-## AUTHOR: Eric Fontanillas
-## LAST VERSION: 06.12.2011
-## DESCRIPTION: format fasta name in TRINITY output
-
-from os import listdir
-import re
-
-###################
-###### DEF 1 ######
-###################
-def dico_format_fasta_name(path_in, SUFFIX):
-    f_in = open(path_in, "r")
-    bash = {}
-    file_read = f_in.read()
-    S1 = file_read.split(">")
-    k = 0
-
-    for element in S1:
-        if element != "":
-            S2 = element.split("\n")
-            fasta_name = S2[0]
-            fasta_seq = S2[1]
-            L = fasta_name.split("_")
-            match=re.search('(\D+)(\d+)', L[0])
-            short_fasta_name= SUFFIX + match.group(2) + "_" + L[1] + "_" + L[2]
-            bash[short_fasta_name] = fasta_seq
-
-    return bash
-#~#~#~#~#~#~#~#~#~#
-
-###################
-### RUN RUN RUN ###
-###################
-import string, os, sys, re
-
-path_IN = sys.argv[1]
-path_OUT = sys.argv[2]
-suffix= sys.argv[3]
-file_OUT = open(path_OUT, "w")
-#Extract suffix info
-
-dico = dico_format_fasta_name(path_IN, suffix)   ### DEF1 ###
-
-print((len(list(dico.keys()))))
-
-KB = list(dico.keys())
-
-## Sort the fasta_name depending their number XX : ApXX
-BASH_KB = {}
-for name in KB:    
-    L = name.split("_")
-    nb = L[0][2:]    
-    nb = int(nb)    
-    BASH_KB[nb] = name
-
-KKB = list(BASH_KB.keys())
-KKB.sort()
-
-for nb in KKB:
-    fasta_name = BASH_KB[nb]
-    seq = dico[fasta_name]
-    file_OUT.write(">%s\n" %fasta_name)
-    file_OUT.write("%s\n" %seq)
-
-file_OUT.close()
diff -r 7a813e633d1c -r a83562c0719f scripts/S03_choose_one_variants_per_locus_trinity.py
--- a/scripts/S03_choose_one_variants_per_locus_trinity.py Fri Feb 01 10:22:32 2019 -0500
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,111 +0,0 @@
-#!/usr/bin/env python
-## AUTHOR: Eric Fontanillas
-## LAST VERSION: 06.12.2011
-
-## DESCRIPTION: Remove redondant transcripts (i.e. transcript from the same locus) from TRINITY on the basis of 1 criteria: 
-            ## 1. [CRITERIA 1] choose the longuest sequence (once any "N" have been removed => effective_length = length - N number
-
-
-
-###################
-###### DEF 1 ######
-###################
-def dico_filtering_redundancy(path_in):
-    f_in = open(path_in, "r")
-    bash = {}
-    bash_unredundant = {}
-    file_read = f_in.read()    
-    S1 = file_read.split(">")
-    k = 0
-
-    ## 1 ## Extract each transcript and group them in same locus if they share the same "short_fasta_name"
-    for element in S1:
-        if element != "":            
-            S2 = element.split("\n")
-            fasta_name = S2[0]
-            fasta_seq = S2[1]
-            
-            L = fasta_name.split("_")
-            short_fasta_name = L[0] + L[1] ## 1.1. ## Extract short fasta name
-            
-            ## Used later for [CRITERIA 1] (see below)
-
-            countN = fasta_seq.count("N")
-            length = len(fasta_seq)
-            effective_length = length - countN
-
-            if short_fasta_name not in list(bash.keys()):
-                bash[short_fasta_name] = [[fasta_name, fasta_seq, effective_length]]
-            else:
-                bash[short_fasta_name].append([fasta_name, fasta_seq, effective_length])
-        k = k+1
-        if k%1000 == 0:
-            print (k)
-    f_in.close()
-
-    for key in list(bash.keys()):
-        ## 2 ## IF ONE TRANSCRIPT PER LOCUS:
-        ## In this case => we record directly
-        if len(bash[key]) == 1:
-            entry = bash[key][0]
-            name = entry[0]
-            seq = entry[1]
-            bash_unredundant[name] = seq
-
-        ## 3 ## IF MORE THAN ONE TRANSCRIPTS PER LOCUS:
-        ## In this case:
-        ## [CRITERIA 1]: Choose the longuest sequence (once any "N" have been removed => effective_length = length - N numb
-        elif len(bash[key]) > 1:   ### means there are more than 1 seq
-            MAX_LENGTH = {}
-            for entry in bash[key]:    ## KEY = short fasta name    || VALUE = list of list, e.g. :  [[fasta_name1, fasta_seq1],[fasta_name2, fasta_seq2][fasta_name3, fasta_seq3]]
-                name = entry[0]
-                seq = entry[1]
-                effective_length = entry[2]
-
-                ## Bash for [CRITERIA 1]
-                MAX_LENGTH[effective_length] = entry
-
-            ## Sort keys() for MAX_LENGTH bash 
-            KC = list(MAX_LENGTH.keys())
-            KC.sort()
-
-            ## Select the best entry
-            MAX_LENGTH_KEY = KC[-1]  ## [CRITERIA 1]
-            BEST_ENTRY = MAX_LENGTH[MAX_LENGTH_KEY]
-
-            BEST_fasta_name = BEST_ENTRY[0]
-            BEST_seq = BEST_ENTRY[1]
-            bash_unredundant[BEST_fasta_name] = BEST_seq
-
-    return bash_unredundant
-#~#~#~#~#~#~#~#~#~#
-
-###################
-### RUN RUN RUN ###
-###################
-import string, os, sys, re
-
-path_IN = sys.argv[1]
-path_OUT = sys.argv[2]
-file_OUT = open(path_OUT, "w")
-dico = dico_filtering_redundancy(path_IN)    ### DEF1 ###
-KB = list(dico.keys())
-
-## Sort the fasta_name depending their number XX : ApXX
-BASH_KB = {}
-for name in KB:
-    L = name.split("_")
-    nb = L[0][2:]
-    nb = int(nb)
-    BASH_KB[nb] = name
-
-KKB = list(BASH_KB.keys())
-KKB.sort()
-
-for nb in KKB:
-    fasta_name = BASH_KB[nb]
-    seq = dico[fasta_name]
-    file_OUT.write(">%s\n" %fasta_name)
-    file_OUT.write("%s\n" %seq)
-
-file_OUT.close()
\ No newline at end of file
diff -r 7a813e633d1c -r a83562c0719f scripts/S04_find_orf.py
--- a/scripts/S04_find_orf.py Fri Feb 01 10:22:32 2019 -0500
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,64 +0,0 @@
-#!/usr/bin/env python
-#keeps the longest ORF found in the 6 possible ORF alltogether
-#python find_ORF.py file output
-
-def find_orf(entry):
-    orf={}
-    orf_length={}
-    stop=['TAA','TAG','TGA']
-    for i in range(0,3):
-        pos=i
-        orf[i]=[0]
-        while pos<len(entry):
-            if entry[pos:pos+3] in stop:
-                orf[i].append(pos-1)
-                orf[i].append(pos+3)
-            pos+=3
-        orf[i].append(len(entry)-1)
-        orf_length[i]=[]
-        for u in range(1,len(orf[i])):
-            orf_length[i].append(orf[i][u]-orf[i][u-1]+1)
-        orf[i]=[orf[i][orf_length[i].index(max(orf_length[i]))],orf[i][orf_length[i].index(max(orf_length[i]))+1]]
-    orf_max={0:max(orf_length[0]),1:max(orf_length[1]),2:max(orf_length[2])}
-    orf=orf[max(list(orf_max.keys()), key=(lambda k: orf_max[k]))]
-    if orf[0]==0:
-        orf[0]=orf[0]+max(list(orf_max.keys()), key=(lambda k: orf_max[k]))
-    return orf
-
-
-def reverse_seq(entry):
-    nt={'A':'T','T':'A','G':'C','C':'G', 'N':'N'}
-    seqlist=[]
-    for i in range(len(entry)-1,-1,-1):
-        seqlist.append(nt[entry[i]])
-    seq=''.join(seqlist)
-    return seq
-
-# RUN
-
-import string, os, sys, re, itertools
-
-path_IN = sys.argv[1]
-file_OUT = open(sys.argv[2], "w")
-inc=1
-threshold=0 #minimal length of the ORF
-
-with open (path_IN, "r") as f_in:
-    for ignored, line in itertools.izip_longest(*[f_in]*2):    
-        name=">"+path_IN[:2]+str(inc)+"_1/1_1.000_"
-        high_plus=find_orf(line[:-1])
-        reverse=reverse_seq(line[:-1])
-        high_minus=find_orf(reverse)
-        if high_plus[1]-high_plus[0]>threshold or high_minus[1]-high_minus[0]>threshold:
-            inc+=1
-            if high_plus[1]-high_plus[0]>high_minus[1]-high_minus[0]:
-                file_OUT.write("%s" %name)
-                file_OUT.write(str(high_plus[1]-high_plus[0]+1)+"\n")
-                file_OUT.write("%s" %line[high_plus[0]:high_plus[1]+1])
-                file_OUT.write("\n")
-            else:
-                file_OUT.write("%s" %name)
-                file_OUT.write(str(high_minus[1]-high_minus[0]+1)+"\n")
-                file_OUT.write("%s" %reverse[high_minus[0]:high_minus[1]+1])
-                file_OUT.write("\n")
-file_OUT.close()  
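The removed `S04_find_orf.py` kept, for each sequence, the longest stretch between stop codons (`TAA`, `TAG`, `TGA`). It searched the three forward frames and the three frames of the reverse complement built by `reverse_seq`.

The simplified sketch below follows the same idea but is not a line-for-line port:
- it trims stretches to whole codons;
- it applies no length threshold (the script's `threshold` was 0);
- it returns the subsequence rather than coordinates.

The function names are hypothetical. The script also iterated with `itertools.izip_longest`, which exists only in Python 2; Python 3 calls it `itertools.zip_longest`.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse (upper-case ACGTN expected)."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_stop_free_stretch(seq: str) -> str:
    """Longest codon-aligned stretch with no stop codon, over the three
    forward frames of seq (hypothetical helper)."""
    best = ""
    for frame in range(3):
        start = frame
        for pos in range(frame, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                # Stretch ends just before the stop codon
                if pos - start > len(best):
                    best = seq[start:pos]
                start = pos + 3  # next stretch begins after the stop
        # Final stretch up to the sequence end, trimmed to whole codons
        end = start + (len(seq) - start) // 3 * 3
        if end - start > len(best):
            best = seq[start:end]
    return best


def best_orf(seq: str) -> str:
    """Pick the longer of the best forward and reverse-complement stretches,
    i.e. search all six frames (hypothetical helper)."""
    forward = longest_stop_free_stretch(seq)
    reverse = longest_stop_free_stretch(reverse_complement(seq))
    return forward if len(forward) >= len(reverse) else reverse
```

`str.maketrans` plus `[::-1]` replaces the character-by-character loop of `reverse_seq`. Like that function, it expects upper-case input.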
diff -r 7a813e633d1c -r a83562c0719f scripts/S05_filter.py
--- a/scripts/S05_filter.py Fri Feb 01 10:22:32 2019 -0500
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,21 +0,0 @@
-#!/usr/bin/env python
-#filters the sequences depending on their length after cap3, makes the sequences names compatible with the phylogeny workflow
-#python filter.py file length_threshold_nucleotides output
-
-import string, os, sys, re, itertools
-
-path_IN = sys.argv[1]
-threshold = int(sys.argv[2]) #minimum number of nucleotides for one sequence
-file_OUT = open(sys.argv[3], "w")
-inc = 1
-with open(path_IN, "r") as f_in:
-    for ignored, sequence in itertools.izip_longest(*[f_in]*2):
-        name=">"+path_IN[:2]+str(inc)+"_1/1_1.000_"
-        if len(sequence)-1>threshold-1:
-            inc+=1
-            file_OUT.write("%s" %name)
-            file_OUT.write(str(len(sequence)-1)+"\n")
-            file_OUT.write("%s" %sequence)
-file_OUT.close()
-
-#filtre eventuel sur les petits transcrits
\ No newline at end of file
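The removed `S05_filter.py` kept only sequences of at least `threshold` nucleotides; its test `len(sequence)-1 > threshold-1` subtracts the trailing newline. It numbered the surviving sequences consecutively as `<prefix><n>_1/1_1.000_<length>`. This is the same header shape that `rename_fasta_headers` builds in the new script.

Below is a minimal sketch on already-unwrapped sequences. `filter_and_rename` is a hypothetical name.

```python
def filter_and_rename(sequences, prefix, min_length):
    """Keep sequences with at least min_length nucleotides and name them
    <prefix><n>_1/1_1.000_<length> (hypothetical helper)."""
    renamed = []
    for seq in sequences:
        if len(seq) >= min_length:
            n = len(renamed) + 1  # numbering counts kept sequences only
            renamed.append((f"{prefix}{n}_1/1_1.000_{len(seq)}", seq))
    return renamed
```

Numbering only the kept sequences keeps identifiers gap-free, as the script's `inc` counter did.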
diff -r 7a813e633d1c -r a83562c0719f test-data/trinity_and_velvet_up.output
--- a/test-data/trinity_and_velvet_up.output Fri Feb 01 10:22:32 2019 -0500
+++ b/test-data/trinity_and_velvet_up.output Mon Feb 03 14:37:31 2025 +0000
@@ -1,4 +1,3 @@
-20
 Number of segment pairs = 380; number of pairwise comparisons = 3
 '+' means given segment; '-' means reverse complement
 
@@ -6,15 +5,13 @@
 
 
 DETAILED DISPLAY OF CONTIGS
-21
-Number of segment pairs = 380; number of pairwise comparisons = 3
+Number of segment pairs = 420; number of pairwise comparisons = 4
 '+' means given segment; '-' means reverse complement
 
 Overlaps            Containments  No. of Constraints Supporting Overlap
 
 
 DETAILED DISPLAY OF CONTIGS
-20
 Number of segment pairs = 380; number of pairwise comparisons = 3
 '+' means given segment; '-' means reverse complement
 
@@ -22,32 +19,45 @@
 
 
 DETAILED DISPLAY OF CONTIGS
-22
-Number of segment pairs = 342; number of pairwise comparisons = 2
+Number of segment pairs = 462; number of pairwise comparisons = 4
 '+' means given segment; '-' means reverse complement
 
 Overlaps            Containments  No. of Constraints Supporting Overlap
 
 
 DETAILED DISPLAY OF CONTIGS
-Number of segment pairs = 4032; number of pairwise comparisons = 0
+Number of segment pairs = 39402; number of pairwise comparisons = 402
+'+' means given segment; '-' means reverse complement
+
+Overlaps            Containments  No. of Constraints Supporting Overlap
+
+
+DETAILED DISPLAY OF CONTIGS
+Number of segment pairs = 39402; number of pairwise comparisons = 343
 '+' means given segment; '-' means reverse complement
 
 Overlaps            Containments  No. of Constraints Supporting Overlap
 
 
 DETAILED DISPLAY OF CONTIGS
-Number of segment pairs = 4160; number of pairwise comparisons = 0
+Number of segment pairs = 39402; number of pairwise comparisons = 352
 '+' means given segment; '-' means reverse complement
 
 Overlaps            Containments  No. of Constraints Supporting Overlap
 
 
 DETAILED DISPLAY OF CONTIGS
-Number of segment pairs = 4422; number of pairwise comparisons = 1
-'+' means given segment; '-' means reverse complement
-
-Overlaps            Containments  No. of Constraints Supporting Overlap
-
-
-DETAILED DISPLAY OF CONTIGS
+cap3 outputs/Pfiji_trinity.fasta -p 100 -o 60
+outputs/Pfiji_trinity.fasta.cap.singlets and outputs/Pfiji_trinity.fasta.cap.contigs
+cap3 outputs/Apomp_trinity.fasta -p 100 -o 60
+outputs/Apomp_trinity.fasta.cap.singlets and outputs/Apomp_trinity.fasta.cap.contigs
+cap3 outputs/Amphi_trinity.fasta -p 100 -o 60
+outputs/Amphi_trinity.fasta.cap.singlets and outputs/Amphi_trinity.fasta.cap.contigs
+cap3 outputs/Acaud_trinity.fasta -p 100 -o 60
+outputs/Acaud_trinity.fasta.cap.singlets and outputs/Acaud_trinity.fasta.cap.contigs
+cap3 outputs/Pg_transcriptome_90109.fasta -p 100 -o 60
+outputs/Pg_transcriptome_90109.fasta.cap.singlets and outputs/Pg_transcriptome_90109.fasta.cap.contigs
+cap3 outputs/Ap_transcriptome_35099.fasta -p 100 -o 60
+outputs/Ap_transcriptome_35099.fasta.cap.singlets and outputs/Ap_transcriptome_35099.fasta.cap.contigs
+cap3 outputs/Ac_transcriptome_25591.fasta -p 100 -o 60
+outputs/Ac_transcriptome_25591.fasta.cap.singlets and outputs/Ac_transcriptome_25591.fasta.cap.contigs
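The log lines above record one CAP3 call per input FASTA, each followed by the two result files. Below is a minimal Python sketch of how such a call could be assembled. The function names are hypothetical and are not taken from S01_script_to_choose.py. Only the flags and output suffixes visible in the log are used. In CAP3, `-p` is the overlap percent-identity cutoff and `-o` is the overlap-length cutoff.

```python
import subprocess

# Cutoffs as they appear in the logged commands; treated as constants here (assumption).
PERCENT_IDENTITY = 100  # CAP3 -p: overlap percent identity cutoff
OVERLAP_LENGTH = 60     # CAP3 -o: overlap length cutoff (bases)


def cap3_command(fasta):
    """Build the argv list for one CAP3 run on a FASTA file."""
    return ["cap3", fasta, "-p", str(PERCENT_IDENTITY), "-o", str(OVERLAP_LENGTH)]


def cap3_outputs(fasta):
    """Paths of the singlet and contig files CAP3 writes next to its input."""
    return fasta + ".cap.singlets", fasta + ".cap.contigs"


def run_cap3(fasta):
    """Run CAP3 (assumed to be on PATH) and return its stdout plus the output paths."""
    result = subprocess.run(cap3_command(fasta), capture_output=True, text=True, check=True)
    return result.stdout, cap3_outputs(fasta)


if __name__ == "__main__":
    # Prints two lines shaped like the log entries above, without running CAP3.
    fasta = "outputs/Pfiji_trinity.fasta"
    print(" ".join(cap3_command(fasta)))
    print(" and ".join(cap3_outputs(fasta)))
```

Keeping the command builder separate from `run_cap3` lets the argument list be checked without CAP3 installed.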
diff -r 7a813e633d1c -r a83562c0719f test-data/trinity_out/AcAcaud_trinity.fasta
--- a/test-data/trinity_out/AcAcaud_trinity.fasta Fri Feb 01 10:22:32 2019 -0500
+++ b/test-data/trinity_out/AcAcaud_trinity.fasta Mon Feb 03 14:37:31 2025 +0000
@@ -1,38 +1,44 @@
->Ac1_1/1_1.000_151
-TCGCTCTCCTCCGCCTTTTCTCTAAGCTTAAAAATTATGAAGAGTCTGCACCAGAGACAACTCCTTAGCACATCGACCGACCAGCTGGCCATAAAATGTCTTATTTATACCTTTGTCATGAGCCTTGATAATCCTTTGTGCCAGTGGTGGC
+>Ac1_1/1_1.000_160
+GCCAGTGAACTCATTAGGCTCTGTCTGCGCCGATATGATAAGAACGAGTTTGACAGCGATGATGACAGCATGAACAGCGAGCTCGCCTACGATGATGACTCTGAAATGCCTGATGACCTAATTGACCGTTTGGAGATGTGCGACTTTTATGAAATGGAAG
 >Ac2_1/1_1.000_160
+GCCACCACTGGCACAAAGGATTATCAAGGCTCATGACAAAGGTATAAATAAGACATTTTATGGCCAGCTGGTCGGTCGATGTGCTAAGGAGTTGTCTCTGGTGCAGACTCTTCATAATTTTTAAGCTTAGAGAAAAGGCGGAGGAGAGCGATTATAGCGA
+>Ac3_1/1_1.000_160
 ATGTTAGTAAAAGAGATTAAAGAGTACCGAGAGATAAAAGAGAAGGCTAGAACCTATCTATGTTATATTATAAGTAGTAACCTATCTTATGGTTCAAGCATAAATGAGGAGACTCTTCAAGAGAGTATGGAGATGTTAAAGAGGGCAATCCCAAAGAGTG
->Ac3_1/1_1.000_160
+>Ac4_1/1_1.000_160
 ATCTGTAATGTCGTTTACCACACACTGGACACTGATATTTCCGCTCGCCAGTGTGTGGTAAACGATATTACAGATCAGATGTGCTGGCAATCCATATCAGTACACACAGCAGTGAAAAAAATCATAAATGTGACATCTGTGGCAAGGCTTTCTCAAATGC
->Ac4_1/1_1.000_160
-AAACAATGCAATCCTCTACCATTGCCAAGATATGAAGAACAAGTAAATGGCACATCAACAACAATGATAATAATAAGTGCTAATAACAATAAGAGTAATACAATTACCACAATATCTGAGAACAAGGGGCTTAAGCATAGCTATCATTATTTGGGAGGGG
 >Ac5_1/1_1.000_160
-GCACCGGGATGCGGATTTGCTGACGATATGGCAAAAGCATTGTCAGCGTGCGGAACCTGTTTATGTCACACCACTGGCATCTTCCTGGCCGTCGCAGCCTTCGTTCTGACGGCACTCGGTATTGTCTGCGTCACGCGATCAGCTGACCCGAGCCTTTGGT
+AAACAATGCAATCCTCTACCATTGCCAAGATATGAAGAACAAGTAAATGGCACATCAACAACAATGATAATAATAAGTGCTAATAACAATAAGAGTAATACAATTACCACAATATCTGAGAACAAGGGGCTTAAGCATAGCTATCATTATTTGGGAGGGG
 >Ac6_1/1_1.000_160
+GCACCGGGATGCGGATTTGCTGACGATATGGCAAAAGCATTGTCAGCGTGCGGAACCTGTTTATGTCACACCACTGGCATCTTCCTGGCCGTCGCAGCCTTCGTTCTGACGGCACTCGGTATTGTCTGCGTCACGCGATCAGCTGACCCGAGCCTTTGGT
+>Ac7_1/1_1.000_160
 CAGCCTACCACTGAGAAGAGATACTTCAACATGTCTTACTGGGGTAGAAGTGGTGGTCGTACAGCGGGTGGTAATGCAGGACGTGGTCGTGGCGGCGGCAGCGGCAGTGGCAGTAGTCAAAGTGGTGGTGGCAGCTTTCTACAGGAACGTATCAAAGAGA
->Ac7_1/1_1.000_160
+>Ac8_1/1_1.000_160
 GCACCTAGAATTACCCGAAGTTGCTTGGCAATAGCGACACCTAACGGTCGCCATGATATTTGCAGGAAGAAGGCATGTGGTACCATTGGGAACCGTCAAGCGTTTCCTCAGCCCTGTGGCAGCTGCCCGTCTGCGCCCGTGTTTGACCTTGAGCACCAAG
->Ac8_1/1_1.000_160
+>Ac9_1/1_1.000_160
 ATCAAAGAAGAGCAACATCGAGCTACTGGCACTGGCAATGGAATCCTAATTATAGCAGAAACAAGCACTGGTTGCCTGTTGTCTGGGTCAGCAATTGGTAGTAGAGGTGTTCCTGCTGAAGAAGTTGGGGTCAAAGCAGGACAGATGCTTTTGGATAACT
->Ac9_1/1_1.000_160
-GCCATTCGTCTTAGGAGAAGTTTGTCGTCAGGAAAGATACATGAGGCCTGGATTCTTTCTGACACCGACTCGACGATGTCATTACCTTGTCCACCTGGAACCAACCCCTCATCGACTTCAGCGGATCCATATCTGGTGATCACCAGAAAAACGAACACTA
 >Ac10_1/1_1.000_160
-GCCTGGGTATTATTTACCACAGTAACCTTTCATCAGTTTGTGGTGAAAGTACGTGACGTTATGCATTGGCAAGATTGGACATTTTGGTTCGCCCTGTTTTGTACGCATAATAATGTATGTAGTTGTATTTTCCAAAATAATTGTTATATTAGCTATCCAA
+GCCATTCGTCTTAGGAGAAGTTTGTCGTCAGGAAAGATACATGAGGCCTGGATTCTTTCTGACACCGACTCGACGATGTCATTACCTTGTCCACCTGGAACCAACCCCTCATCGACTTCAGCGGATCCATATCTGGTGATCACCAGAAAAACGAACACTA
 >Ac11_1/1_1.000_160
-ACAATTACACAGGTATCAACAAATGTTCACTGCACCTGTCAGTTCCACAAACATAAAGATTACACACATGTACACATCTTTACAAAATATTTACAATTTTGTATTCTTAATTCTATCCACTTGGCTCTGGAAGGCCTTCAGCCATCAGATGATGTGTTTA
+GCCTGGGTATTATTTACCACAGTAACCTTTCATCAGTTTGTGGTGAAAGTACGTGACGTTATGCATTGGCAAGATTGGACATTTTGGTTCGCCCTGTTTTGTACGCATAATAATGTATGTAGTTGTATTTTCCAAAATAATTGTTATATTAGCTATCCAA
 >Ac12_1/1_1.000_160
+CTGGGCATGGTGGCTACCAAAACGGAGTATAGAACACTGTGTGACTTTTATGTTGATAATAGAAAATATATTCTCTATATAGACGAAGACTGCAGGGTGTCTAGAATATCGCCTAATAAAATTAGAATAGGGAGTCTGCAGTTGATCCGGAAACTACCAG
+>Ac13_1/1_1.000_160
+ACAATTACACAGGTATCAACAAATGTTCACTGCACCTGTCAGTTCCACAAACATAAAGATTACACACATGTACACATCTTTACAAAATATTTACAATTTTGTATTCTTAATTCTATCCACTTGGCTCTGGAAGGCCTTCAGCCATCAGATGATGTGTTTA
+>Ac14_1/1_1.000_160
 CAATCCAGCACTAGCAGGAGTGTTGGCCGGAAGGTTGATGATATTTTTCAGTCAAAGAATCTGCATGCTCCAGATGATCGCCTATCAGACAAGGATAACCGTGACAAGTCCAAGAACCCTTTACTTAACAATGAGATGACTCCTCAGTCATTTTCTCGAG
->Ac13_1/1_1.000_141
-GATTACATGCAAAACATAATAGAAATGTTTGTCCCAAGGTCTTACCAGTTTATAGTTTTACATTCGTGTCTTGAAATAAGAAAATGCCTTTATGAGAGTGTATTATTACTCAGTAGATGGAAATTAGCTTACCGGGGGATA
->Ac14_1/1_1.000_160
+>Ac15_1/1_1.000_160
+GATTACATGCAAAACATAATAGAAATGTTTGTCCCAAGGTCTTACCAGTTTATAGTTTTACATTCGTGTCTTGAAATAAGAAAATGCCTTTATGAGAGTGTATTATTACTCAGTAGATGGAAATTAGCTTACCGGGGGATATAATTTAGGCCGGAAACCC
+>Ac16_1/1_1.000_160
+TGTTTGTCCCAAGGTCTTACCAGTTTATAGTTTTACATTCGTGTCTTGAAATTCAGTAGATGGAAATTAGTGCTTATAGTGGGTTTGGGCAATCGATTTTTTTTTTTTTTTTTTTAAAAAAAAAGGCGAGGCCGAGAGAAGATTCCTAGCGAACAGCCTA
+>Ac17_1/1_1.000_160
 CCTGTTGTGACTCGTTCCCTGACGTCGTGCACGCAAGCGCACGCGCGTGCGCGCCGGGTTAGGCACACATACGCGGCACAGGTGCGCAGTATTAGACAGACGCAGACGCAGGCGTCCAGACACGCCAGCCAGCACGGTTACAATGTCCATATCACAATGA
->Ac15_1/1_1.000_147
-CTGAATGTCAACCAGTCACTGACCATCAGCTACATGTCTCTAATGGTCACTAGCATGAAACATGAAATGCCTGCTTATAGTGGGTCTGTAACTGGTAGGATACTGATTACATGTGGAGGCTTATTAAAGGGGTATCCTATTATTTTT
->Ac16_1/1_1.000_160
+>Ac18_1/1_1.000_160
+CTGAATGTCAACCAGTCACTGACCATCAGCTACATGTCTCTAATGGTCACTAGCATGAAACATGAAATGCCTGCTTATAGTGGGTCTGTAACTGGTAGGATACTGATTACATGTGGAGGCTTATTAAAGGGGTATCCTATTATTTTTTAAAACCCCCCCC
+>Ac19_1/1_1.000_160
 CTATGTTGGCTACTGCTAAGGATGTGCTACTTGCCTGATGTAAACAATTCCCAGAATGAATATAAACCAATCATAAGGAGAACTATGGAACCATCCTTAAATGTATTAATCTTATTTAAAATTATGTGCACATCTTGTTTGGCAGAAGGTACATTAAAGC
->Ac17_1/1_1.000_160
+>Ac20_1/1_1.000_160
 ATCTGTAATGTCGTTTACCACACACTGGACACTGATATTTCCGCTCGCCAGTGTGTGGTAAACGATATTACAGATCAGATGTGCTGGCAATCCATATCAGTACACACAGCAGTGAAAAAAATCATAAATGTGACATCTGTGGCAAGGCTTTCTCAAATGC
->Ac18_1/1_1.000_160
+>Ac21_1/1_1.000_160
 ATGTTAGTAAAAGAGATTAAAGAGTACCGAGAGATAAAAGAGAAGGCTAGAACCTATCTATGTTATATTATAAGTAGTAACCTATCTTATGGTTCAAGCATAAATGAGGAGACTCTTCAAGAGAGTATGGAGATGTTAAAGAGGGCAATCCCAAAGAGTG
->Ac19_1/1_1.000_160
+>Ac22_1/1_1.000_160
 GCCAGTGAACTCATTAGGCTCTGTCTGCGCCGATATGATAAGAACGAGTTTGACAGCGATGGTTATATTATGAACAGCGAGCTCGCCTACGATGATGACTCTGAAATGCCTGATGACCTAATTGACCGTTTAGAAGCTGGAAATATTACAAGCTTTGTGC
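In this file the records are renumbered, and the old short records are extended to 160 bases with their headers updated to match. For example, `>Ac13_1/1_1.000_141` becomes `>Ac15_1/1_1.000_160` once 19 bases are appended. The trailing header field therefore appears to be the sequence length. The headers look like a shortened Velvet/Oases layout (`<name>_<transcript>/<transcripts in locus>_<confidence>_<length>`), but that reading is an assumption. The sketch below checks that the length field agrees with the sequence. `HEADER`, `read_fasta` and `length_mismatches` are hypothetical helpers, not part of the tool.

```python
import re

# Assumed header layout, inferred from the test files, e.g. >Ac13_1/1_1.000_141
HEADER = re.compile(
    r"^>(?P<name>[^_\s]+)_(?P<transcript>\d+)/(?P<total>\d+)"
    r"_(?P<confidence>\d+\.\d+)_(?P<length>\d+)$"
)


def read_fasta(lines):
    """Yield (header, sequence) pairs; a sequence may span several lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def length_mismatches(lines):
    """Return headers that do not parse or whose length field differs from len(sequence)."""
    bad = []
    for header, seq in read_fasta(lines):
        match = HEADER.match(header)
        if match is None or int(match.group("length")) != len(seq):
            bad.append(header)
    return bad


if __name__ == "__main__":
    demo = [">Ac1_1/1_1.000_4", "ACGT", ">Ac2_1/1_1.000_5", "ACG", "T"]
    print(length_mismatches(demo))  # the second record claims 5 bases but has 4
```

The same pattern also matches the Velvet test headers such as `>Ac197_1/1_1.000_794`.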
diff -r 7a813e633d1c -r a83562c0719f test-data/trinity_out/AmAmphi_trinity.fasta
--- a/test-data/trinity_out/AmAmphi_trinity.fasta Fri Feb 01 10:22:32 2019 -0500
+++ b/test-data/trinity_out/AmAmphi_trinity.fasta Mon Feb 03 14:37:31 2025 +0000
@@ -4,8 +4,8 @@
 CAGCCTACCACTGAGAAGAGATACTTCAACATGTCTTACTGGGGTAGAAGTGGTGGTCGTACAGCGGGTGGTAATGCAGGACGTGGTCGTGGCGGCGGCAGCGGCAGTGGCAGTAGTCAAAGTGGTGGTGGCAGCTTTCTACAGGAACGTATCAAAGAGA
 >Am3_1/1_1.000_160
 GCACCTAGAATTACCCGAAGTTGCTTGGCAATAGCGACACCTAACGGTCGCCATGATATTTGCAGGAAGAAGGCATGTGGTACCATTGGGAACCGTCAAGCGTTTCCTCAGCCCTGTGGCAGCTGCCCGTCTGCGCCCGTGTTTGACCTTGAGCACCAAG
->Am4_1/1_1.000_147
-ACAGTTCTAAATAGCATTCGCCAAATGATATTGAAAGCATATTTTATGAATAGTGGTTCACAGATGAAAGATCATTATTGGGAACCCGTTCCAGCTTTTGTAGATCATTTTGTTCTTGCTATAGATCATCGACCCAGAATACAAGTT
+>Am4_1/1_1.000_160
+ACAGTTCTAAATAGCATTCGCCAAATGATATTGAAAGCATATTTTATGAATAGTGGTTCACAGATGAAAGATCATTATTGGGAACCCGTTCCAGCTTTTGTAGATCATTTTGTTCTTGCTATAGATCATCGACCCAGAATACAAGTTTAGCACACAAGGA
 >Am5_1/1_1.000_160
 TACTGCTGTCGAAGTGATGGCGCTTGGAACAGTGTGATAACCTTGCCTACAAATAGGCCATTCTATTTGCTCAGATACACAAGTCAATGTCAGCTGGTGAAAGGAATGAAAGTCAGAAGGGAAGTATTCTACTGGGATAATGAAGATATTAACAATATTG
 >Am6_1/1_1.000_160
diff -r 7a813e633d1c -r a83562c0719f test-data/trinity_out/ApApomp_trinity.fasta
--- a/test-data/trinity_out/ApApomp_trinity.fasta Fri Feb 01 10:22:32 2019 -0500
+++ b/test-data/trinity_out/ApApomp_trinity.fasta Mon Feb 03 14:37:31 2025 +0000
@@ -4,37 +4,39 @@
 ATACTCAGGCACACAGCATTTGTCGTACTAGGCGAGAGAGAGAGAGGAACGACTAATTGCAACCACGATTACGTTACATTTGTTTACAAACCAAACGTACTGGCGTCGAAGATAATTAAGAGGAAGCTGACTGAATGCGATTGGCGTTGGTCTACGGGTT
 >Ap3_1/1_1.000_160
 GCCATGCAGTACACTGGACTTCTGTTATTCTGTTTGTTTGCCTTGACGGCAGCCAAACCCGCGGAAGACCTTCAAATGCTCATCCGAGCCCTGCTCCATGAAATAGAAGAGGAAGGTGAACTCCAAGAGCGAGGCATTGGCGCCGTGAAGTATGGTGGAA
->Ap4_1/1_1.000_135
-CGTTTAACCAGGCCCTGCTACCCTCCAATCTCGTCCAATCGGTCTCTACGCATCCACTCAATAATTATTGACATATTACAATTGATTCGGATTAAAAAAATGGCGCTAGGCTTAAAACACAGACAGTTCGCTAGC
+>Ap4_1/1_1.000_160
+CGGCCGCGGCGCGTCGTTCTCAGCCAAGCTGACTTCGACTTGAGCCGTCCATTCGCTTATTTACACGACGACTGCTCGACCCTTTACGACTTAGTCACACTTCCGTTTAACCAGGCCCTGCTACCCTCCAATCTCGTCCAATCGGTCTCTACGCATCCGA
 >Ap5_1/1_1.000_160
+CCGTTTAACCAGGCCCTGCTACCCTCCAATCTCGTCCAATCGGTCTCTACGCATCCACTCAATAATTATTGACATATTACAATTGATTCGGATTAAAAAAATGGCGCTAGGCTTAAAACACAGACAGTTCGCTAGCTGATTAGGCTCTTTTTAAGGCGAA
+>Ap6_1/1_1.000_160
 AATCTACTGACAGATACCTGGAACGAGATGCAGGTCAAGTGGTCGTGTTGTGGTGTGGATGGCTACTCCGACTGGACGCAAGCTGAAGGTCTGGCCACGGGTCACTACGTGCCGCAGTCCTGCTGTCAGAACACGATGAGTACAAGCTGCACGTCACAGA
->Ap6_1/1_1.000_160
+>Ap7_1/1_1.000_160
 TGGCGAAATGTAGTGGTCATTGATGGATTTTATTGCAATCAGTGTTACATATTACAAGCATTTCTTAATAAACAAAAAGTTGCACGAGATATTTTTTACTTAAAGGTTTTATGGGATGAACACAGTCAATTATATTCATGTAAAAGGCCTTATCCGAGAA
->Ap7_1/1_1.000_160
+>Ap8_1/1_1.000_160
 TTCGTAATGAATCTTTTTGACTGGTATTCCGCAGGATACTCAATAATTATTGTCGCATTCTTCGAAGTTATCGCCATTTCTTGGATATACGGTCTCCAACGGTTCAAGAAGGACATTCAGATGATGGTTGGCAAGGGGCGATGGATCAATGCTAGTTTCT
->Ap8_1/1_1.000_160
+>Ap9_1/1_1.000_160
 GCGAAAACTGGTTTTAACACAAATAATTGTTACAGTACCAGGTTTCGGAACACGTTTGCATATAACCAGCGAGAGTGGTGCTCAGTTCTGTTATGTATGACAGTCCTTCTCCTCAACATGCAACGGAAGCGAGCACTTCCATCATCACATTTGTCAATAA
->Ap9_1/1_1.000_160
-TGTCTTTACTTCTATCCTTCTCATCATGTTTTACATCATTTTTATTGCTGCCTCTCTTCTCAGCCCTTTCCACACTTTCATGTTTATCTTTTGATTTTTCAACTTCAACTCCATCTTCATCATCATTCTCATGCATTAATTCTTCTATTTCTTCTTCCAA
 >Ap10_1/1_1.000_160
-GCAGTGGTGGGAAGTTGTTCACCCTGGCTTGGTGTCCCATGTTTCTCTGTAATTCCTGTTCCTTTCTCTGTAGTTCCTCAGCCTTCCTCTCCAGTTCTTCCTGACGTCTCTTCAGGTCATCTGTGGCAGCCTGGGCCGTGGTCTTGGCGGCTGAGTATGG
+TTGGAAGAAGAAATAGAAGAATTAATGCATGAGAATGATGATGAAGATGGAGTTGAAGTTGAAAAATCAAAAGATAAACATGAAAGTGTGGAAAGGGCTGAGAAGAGAGGCAGCAATAAAAATGATGTAAAACATGATGAGAAGGATAGAAGTAAAGACA
 >Ap11_1/1_1.000_160
-ACGACAGAGGTCCTCTGCTTGATGAATATGGTTACACCAGAGGATTTGGAAGATGAAGAGGAATATGAAGAAATTTTGGAGGATGTCAAAGAAGAGTGCAGCAAATATGGTTATGTGAAGAGTATAGAGATCCCACGGCCCATTAAGGGTGTGGAAGTGC
+CCATACTCAGCCGCCAAGACCACGGCCCAGGCTGCCACAGATGACCTGAAGAGACGTCAGGAAGAACTGGAGAGGAAGGCTGAGGAACTACAGAGAAAGGAACAGGAATTACAGAGAAACATGGGACACCAAGCCAGGGTGAACAACTTCCCACCACTGC
 >Ap12_1/1_1.000_160
-TGTCTTTACTTCTATCCTTCTCATCATGTTTTACATCATTTTTATTGCTGCCTCTCTTCTCAGCCCTTTCCACACTTTCATGTTTATCTTTTGATTTTTCAACTTCAACTCCATCTTCATCATCATTCTCATGCATTAATTCTTCTATTTCTTCTTCCAA
+ACGACAGAGGTCCTCTGCTTGATGAATATGGTTACACCAGAGGATTTGGAAGATGAAGAGGAATATGAAGAAATTTTGGAGGATGTCAAAGAAGAGTGCAGCAAATATGGTTATGTGAAGAGTATAGAGATCCCACGGCCCATTAAGGGTGTGGAAGTGC
 >Ap13_1/1_1.000_160
-GCGAAAACTGGTTTTAACACAAATAATTGTTACAGTACCAGGTTTCGGAACACGTTTGCATATAACCAGCGAGAGTGGTGCTCAGTTCTGTTATGTATGACAGTCCTTCTCCTCAACATGCAACGGAAGCGAGCACTTCCATCATCACATTTGTCAATAA
+TTGGAAGAAGAAATAGAAGAATTAATGCATGAGAATGATGATGAAGATGGAGTTGAAGTTGAAAAATCAAAAGATAAACATGAAAGTGTGGAAAGGGCTGAGAAGAGAGGCAGCAATAAAAATGATGTAAAACATGATGAGAAGGATAGAAGTAAAGACA
 >Ap14_1/1_1.000_160
-TTCGTAATGAATCTTTTTGACTGGTATTCCGCAGGATACTCAATAATTATTGTCGCATTCTTCGAAGTTATCGCCATTTCTTGGATATACGGTCTCCAACGGTTCAAGAAGGACATTCAGATGATGGTTGGCAAGGGGCGATGGATCAATGCTAGTTTCT
+GCGAAAACTGGTTTTAACACAAATAATTGTTACAGTACCAGGTTTCGGAACACGTTTGCATATAACCAGCGAGAGTGGTGCTCAGTTCTGTTATGTATGACAGTCCTTCTCCTCAACATGCAACGGAAGCGAGCACTTCCATCATCACATTTGTCAATAA
 >Ap15_1/1_1.000_160
+TTCGTAATGAATCTTTTTGACTGGTATTCCGCAGGATACTCAATAATTATTGTCGCATTCTTCGAAGTTATCGCCATTTCTTGGATATACGGTCTCCAACGGTTCAAGAAGGACATTCAGATGATGGTTGGCAAGGGGCGATGGATCAATGCTAGTTTCT
+>Ap16_1/1_1.000_160
 GTTGTCAGTGGATCTCGTGATGCAACACTGAGGCTATGGAATGTCGATACTGGCCAGTGTCTGCATGTTCTGATGGGACATATGGCAGCTGTACGGTGTGTGCAGTATGATGGCAAGCGTGTTGTTAGTGGTGCCTATGATTATACAGTTAGAGTGTGGG
->Ap16_1/1_1.000_160
+>Ap17_1/1_1.000_160
 AGATTTATATTTGAGAATGTTTTGAGTACGACTTCTGTACAGACACACAGCAGAATGACCCTTGTATTGTTTAACAACGTTCAAAATTTCCTGATTCTTCTACCGAAAAAAATACATAAGAAGAGCCACCAAGACGATCAGATCACGGAGGTACTGGCAT
->Ap17_1/1_1.000_160
+>Ap18_1/1_1.000_160
 GCAGACTCGGCTGGCACGGCCACCGCCTTCCTCTGTGGAGTGAAGGCTCGCTACGGAACGCTGGGTCTGGGACCGAGAGCCACACGATCTGACTGTAGACAGAGTCACATCAACAAACTGAAGTGTATAGGAGACATGGCACAACAAGCAGGTATGAGGA
->Ap18_1/1_1.000_160
+>Ap19_1/1_1.000_160
 CCGGCCTGCAAGACGCCATTTTACTTCGTCTGTCAATCGAGGTCAAAGGTCACTACCGTTGTCTCCGAGAAGCACACAGACGCCGAGCTGGTTCACACGCTGTGTATTCGGCACAGATCTACTGTTGCTTGGGATATTTTAGCCGGCGAACGAGCGAAAT
->Ap19_1/1_1.000_160
+>Ap20_1/1_1.000_160
 CCGGCGATCGTTCAGAGGGCCAGCGGTCTGGCCATGTCAGAGATCTATCACCTGCGCTTCTGCGATGGGGATCGGCTGAACGTCAGCTGCCCGGACAACTGGCAGATCCACATCTCGTCCAGCTACTTCGTCTACGTCAGCGGCGTCGACGGCCGCGGCG
->Ap20_1/1_1.000_160
+>Ap21_1/1_1.000_160
 CATGAAGGACCTGTGTGGCAGGTGGCTTGGGCACATCCAATGTTTGGTAATCTGATAGCATCATGTAGTTATGACAGAAAGGTGATTATTTGGAAGGAGACTGGAGGGACATGGGCAAAGCTTTATGAATACAACAATCATGATTCCTCAGTTAATTCAG
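After this change, several records in the ApApomp test file have identical sequences under different names. Ap9 and Ap14 are one example, and Ap10 and Ap13 are another. The same is true of Ac3/Ac21 and Ac4/Ac20 in AcAcaud. Below is a small sketch for listing such exact duplicates as a sanity check on the test data. `exact_duplicates` is a hypothetical helper. It compares sequences on one strand only, so reverse-complement duplicates are not grouped.

```python
from collections import defaultdict


def exact_duplicates(records):
    """Group record names by identical sequence and return groups with more than one name.

    `records` is an iterable of (name, sequence) pairs. Comparison ignores case
    but is strand-specific: reverse complements are treated as different.
    """
    by_seq = defaultdict(list)
    for name, seq in records:
        by_seq[seq.upper()].append(name)
    return [names for names in by_seq.values() if len(names) > 1]


if __name__ == "__main__":
    # Truncated toy sequences standing in for the full 160-base records.
    toy = [("Ap9", "GCGAAAAC"), ("Ap10", "TTGGAAGA"), ("Ap13", "ttggaaga"), ("Ap14", "GCGAAAAC")]
    print(exact_duplicates(toy))
```

Groups are returned in the order in which each sequence is first seen.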
diff -r 7a813e633d1c -r a83562c0719f test-data/trinity_out/PfPfiji_trinity.fasta
--- a/test-data/trinity_out/PfPfiji_trinity.fasta Fri Feb 01 10:22:32 2019 -0500
+++ b/test-data/trinity_out/PfPfiji_trinity.fasta Mon Feb 03 14:37:31 2025 +0000
@@ -21,7 +21,7 @@
 >Pf11_1/1_1.000_160
 AGCATTGTCCGTGTTGCGCGGGTCGTCGACGTAACCTCGGTACACCTCAGCGTGCCCGGCCATCTGGTGCGTGAGCCGCTTGAAGACGACTCTCGCCGGCGGCTGCTTGCTGTCCAGCGTCATGCTCTCGAACGCCTTGATCCAGGACTCCTTGACACTG
 >Pf12_1/1_1.000_160
-GCCCTCGGCCACCAAGCCCAAGAGTCCCAACGTGATGCCCAACCTGCCCAAGCACGTGCTGCAGGCCATCGAAGAGAACATGATCTACTACAACAAAATGTACAGTCTCCGAGTCAAGCCGGACCTGCTCCAGGTTCACTAGAGGGCGCTGTGGTGTTCG
+CGAACACCACAGCGCCCTCTAGTGAACCTGGAGCAGGTCCGGCTTGACTCGGAGACTGTACATTTTGTTGTAGTAGATCATGTTCTCTTCGATGGCCTGCAGCACGTGCTTGGGCAGGTTGGGCATCACGTTGGGACTCTTGGGCTTGGTGGCCGAGGGC
 >Pf13_1/1_1.000_160
 CGCGTCCACGACCGCCACGCGCACCGAGGTCTACGACAAACTCGCGCCGCAGGAGGCTCCTCTCAACCTGCACAAGCCTCGCGCCGACAGCGTCCCGACCGACGGCAACGGCTGACGGCAGACACTCGAGCCTTGACTACGTGTATGCACAAAGCTACCC
 >Pf14_1/1_1.000_160
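In this hunk, the Pf12 record appears to be replaced by the reverse complement of its old sequence. The old record starts with `GCCCTCGGCC` and ends with `GTGGTGTTCG`. The new one starts with `CGAACACCAC` and ends with `GGCCGAGGGC`. This is the same orientation that CAP3 marks with '-' in its reports. Below is a minimal reverse-complement sketch for checking such pairs. It handles A, C, G, T and N only, and the helper names are hypothetical.

```python
# Complement table for A/C/G/T/N in both cases; other IUPAC codes pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_reverse_complement(a, b):
    """True if b is the reverse complement of a, ignoring case."""
    return reverse_complement(a.upper()) == b.upper()


if __name__ == "__main__":
    # The old Pf12 ends with GTGGTGTTCG; its reverse complement opens the new record.
    print(reverse_complement("GTGGTGTTCG"))
```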
diff -r 7a813e633d1c -r a83562c0719f test-data/trinity_up.output
--- a/test-data/trinity_up.output Fri Feb 01 10:22:32 2019 -0500
+++ b/test-data/trinity_up.output Mon Feb 03 14:37:31 2025 +0000
@@ -1,12 +1,3 @@
-20
-Number of segment pairs = 380; number of pairwise comparisons = 3
-'+' means given segment; '-' means reverse complement
-
-Overlaps            Containments  No. of Constraints Supporting Overlap
-
-
-DETAILED DISPLAY OF CONTIGS
-21
 Number of segment pairs = 380; number of pairwise comparisons = 3
 '+' means given segment; '-' means reverse complement
 
@@ -14,7 +5,13 @@
 
 
 DETAILED DISPLAY OF CONTIGS
-20
+Number of segment pairs = 420; number of pairwise comparisons = 4
+'+' means given segment; '-' means reverse complement
+
+Overlaps            Containments  No. of Constraints Supporting Overlap
+
+
+DETAILED DISPLAY OF CONTIGS
 Number of segment pairs = 380; number of pairwise comparisons = 3
 '+' means given segment; '-' means reverse complement
 
@@ -22,11 +19,18 @@
 
 
 DETAILED DISPLAY OF CONTIGS
-22
-Number of segment pairs = 342; number of pairwise comparisons = 2
+Number of segment pairs = 462; number of pairwise comparisons = 4
 '+' means given segment; '-' means reverse complement
 
 Overlaps            Containments  No. of Constraints Supporting Overlap
 
 
 DETAILED DISPLAY OF CONTIGS
+cap3 outputs/Pfiji_trinity.fasta -p 100 -o 60
+outputs/Pfiji_trinity.fasta.cap.singlets and outputs/Pfiji_trinity.fasta.cap.contigs
+cap3 outputs/Apomp_trinity.fasta -p 100 -o 60
+outputs/Apomp_trinity.fasta.cap.singlets and outputs/Apomp_trinity.fasta.cap.contigs
+cap3 outputs/Amphi_trinity.fasta -p 100 -o 60
+outputs/Amphi_trinity.fasta.cap.singlets and outputs/Amphi_trinity.fasta.cap.contigs
+cap3 outputs/Acaud_trinity.fasta -p 100 -o 60
+outputs/Acaud_trinity.fasta.cap.singlets and outputs/Acaud_trinity.fasta.cap.contigs
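These expected-output files capture CAP3's console report together with the logged commands. Their counts therefore change whenever the input FASTA files change, as in the 380/3 to 420/4 and 462/4 updates above. Below is a sketch for pulling just the numbers and input names out of such a file when reviewing a diff. The regular expressions only match the line shapes visible here and are not a general CAP3 output parser. `summarize` is a hypothetical helper.

```python
import re

# Line shapes copied from the expected-output files above.
SUMMARY = re.compile(r"Number of segment pairs = (\d+); number of pairwise comparisons = (\d+)")
COMMAND = re.compile(r"^cap3 (\S+) -p (\d+) -o (\d+)$")


def summarize(text):
    """Return ([(segment_pairs, comparisons), ...], [input_fasta, ...]) from captured output."""
    pairs = [(int(s), int(c)) for s, c in SUMMARY.findall(text)]
    commands = [m.group(1) for m in map(COMMAND.match, text.splitlines()) if m]
    return pairs, commands


if __name__ == "__main__":
    sample = (
        "Number of segment pairs = 380; number of pairwise comparisons = 3\n"
        "DETAILED DISPLAY OF CONTIGS\n"
        "Number of segment pairs = 420; number of pairwise comparisons = 4\n"
        "cap3 outputs/Pfiji_trinity.fasta -p 100 -o 60\n"
    )
    print(summarize(sample))
```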
diff -r 7a813e633d1c -r a83562c0719f test-data/velvet_out/AcAc_transcriptome_25591.fasta
--- a/test-data/velvet_out/AcAc_transcriptome_25591.fasta Fri Feb 01 10:22:32 2019 -0500
+++ b/test-data/velvet_out/AcAc_transcriptome_25591.fasta Mon Feb 03 14:37:31 2025 +0000
b'@@ -1,132 +1,398 @@\n->Ac1_1/1_1.000_2580\n-AAGACAACTGCTCTTGATAGTTGTCTCGGAAGAGACTTGAAAACATCCAAGATGGTGAACTTTACGGTAGACGAGATCCGTGCGATCATGGACAAGAAGAAGAACATACGTAACATGTCCGTGATTGCTCATGTGGATCATGGCAAGTCGACGCTGACTGATTCGTTGGTGAGCAAGGCTGGCATTATTGCTGGCTCCAAGGCTGGCGAGACCCGCTTCACAGACACAAGGAAGGATGAGCAGGAAAGATGTATTACCATCAAATCAACAGCAATTTCACTCTTTTACCAGCTGCCAGAAAAAGATTTGAAGTTGATCGAGCAGCCAAGAGAGGAGGGAGAGACTGCTTTCCTGATCAACTTGATTGACTCACCTGGTCACGTGGATTTCTCCTCGGAGGTGACTGCTGCCCTTCGTGTTACAGATGGTGCTCTGGTTGTTGTCGACTGTGTGTCGGGCGTGTGTGTACAAACAGAGACTGTGCTGCGTCAGGCCATTGCTGAGCGTATCAAGCCAGTACTGTTCATGAACAAGATGGACTTGGCTCTGCTGACCCTACAGCTTGGTGCTGAGGACCTCTACCAGACCTTCTCCCGTATCATTGAAAGCATCAATGTAATCATTGCCACTTATGCTGACGACGAGGGACCGATGGGTAACATCCATGTTGATCCATCCAAGGGTACAGTTGGCTTTGGATCTGGACTCCATGGCTGGGCATTCACACTGAAGCAGTTTGCCGAGATGTATGCAGACAAGTTCAAGATTGAGGAACCAAAACTGATGAAGAGGCTGTGGGGAGACCAGTTCTACAACCCAAAGGAGAAGAGATGGGGCAAAGAAATGCAGAAGGGCTATTGTCGTGGTTTCACACAATACATCCTTGACCCCATTTACAAGATGTTTGAGTTCTGCATGAAGAAGCCAAAGGAAGAGACACTGAAGCTGGTTGAGAAACTTGGCATCAAACTGACAAGTGATGACAAGGACCTCATAGACAAACAACTGTTGAAGGTTGTCATGCGTAAATGGCTGCCAGCTGGTGATGCTTTGCTTCAGATGATAACCATCCATCTGCCGTCACCAGTAGCGGCTCAGAGGTACCGTATGGAGATGCTGTATGAGGGGCCACATGACGATGAGGCTGCTCTGGGAATCAAGAACTGTGACCCCAATGGACCACTGATGATGTACATCTCCAAGATGGTACCAACATCAGACAAGGGTAGATTCTATGCATTTGGTCGTGTGTTCTCTGGTGTTGTGTCAACAGGTATGAAGGCTAGGATCATGGGTCCCAACTTTATCCCTGGGAAGAAGGAAGATCTCTATGTGAAGGCCATCCAGAGAACAATCCTTATGATGGGTCGTTACATAGAGCCAATTGAAGATGTGCCCTGTGGTAATGTTTGTGGTCTGGTTGGTGTTGACCAGTACATTCTGAAGACTGGAACCATCAGCACGTACGAGCATGCCCACAACTTGAAAGTGATGAAGTTCAGTGTCAGTCCAGTTGTGCGTGTGGCTGTTGAGTGTAAAAACCCAGCTGATCTGCCCAAGCTTGTTGAAGGATTGAAACGTCTGTCAAAATCTGATCCCCTGGTGCAGTGTTCCATTGAGGAATCTGGAGAGCACATTGTTGCTGGAGCTGGTGAACTTCATCTGGAAATCTGCCTCAAGGACTTGGAAGAAGATCATGCCTGCATCCCAATCAAGAAATCTGACCCTGTTGTCTCATATAGAGAGACTGTCAGTAACACATCTGACAGAACCTGCTTGTCAAAATCACCAAACAAGCACAATCGTCTCTTCATGGTTGCTGCACCACTGCCAGATGGCTTACCTGAAGAGATTGATAGGGGAGAGAAGGTCAGTGCTCGTCAGGATCAGAAGGAGAGAGCTAGATACCTGGCCGACACATACGAGTTTGATGTTACTGAGGCTCGTAAGATCT
GGTGCTTTGGACCTGATGGCACAGGACCAAACCTGGTCATTGACTGCACAAAGGGTGTCCAGTACCTGAATGAAATCAAAGACAGTGTTGTGGCTGGCTTCCAGTGGGCTAGCAAGGAGGGTGTACTCTGTGAAGAGAACATGAGAGGAATCCGCTTCAACATTCTTGATGTCACACTGCATGCTGATGCTATTCACCGTGGTGGTGGCCAGATCATCCCAACAACAAGAAGATGTCTCTATGCATGTGTGCTGACAGCTGAACCAAGGTTGATGGAACCAATATACCTGGTTGAGATCCAGTGCCCTGAGCAAGCTGTTGGTGGCATTTATGGTGTGCTGAACAGAAGACGAGGTGTTGTCATTGAGGAGAACCAAGTGGTGGGAACCCCGATGTTCCAGGTCAAGGCATACCTTCCTGTAAACGAATCATTTGGTTTCACTGCCGACCTGAGGTCCAACACTGGTGGCCAGGCATTCCCACAGTGTGTGTTTGATCACTGGCAGATCCTCCCAGGCGATCCGTTTGTGGACAACTCCAAGCCTAACATAATCGTCCAAGAGACGAGAAAACGCAAAGGGCTGAAGGAGGGCGTTCCTCCACTGGACAACTTCCTGGACAAGTTG\n->Ac2_1/1_1.000_5295\n-GAATTTTGGCCGAGATATCAGCTGATGACTGTAGCTTTGGTCTGGGCACTGGCCATTGTTCCCCAGGTGCTTTGTCAACTGATGATGACCACGACACCACCACCAACTCCAATAGCGTGTAGAGAAAATATGTGGGGTTGTGCCGACGGCAAGCAGTGTATACGTGAACTGTATCGTTGTGATGGTGATTACGACTGTGAGGACCGCTCTGACGAGGCCTTTCTTTTGTGTGCCCTCATTGTTTGCGATGAAAACAGCCAGTTTGAGTGTACTGCCAACAGGTTTACTAATAACACTAAGATCTGCATACCTGTTTCTTATTTGTGTGATGAGGACAATGATTGCGGAGATAACTCAGATGAAGATCCAGCCAACTGTCCTACCACATTCCGTCCTCCGACGACTCCACCGCCTTGTGTTCCTGGTTTCGAGTTCTTCTGTCCAGCTAGTCGTGACAGGGGCTGTATACCAATTGGTTTGAAATGTGACACTAAGCATGACTGTATGAATGGTGAAGATGAACAAGGCTGCACCTACAGAAATTGTTCTGATACAACGGAGTTTCAGTGTCATTCTAAGCAATGTATTGATAGCCGTCTGAAGTGTAATGGTTATGCCGACTGTAGGGATGGAAGTGATGAAACACCAGATATATGTGATGTTGCCCCTTTGCAGTGTGCAAAACATGAGTTTCAGTGTAACAATGGAAAGTGTATGGTTTGGTATGAAGTTCTCTGTAACGGAATAGACGACTGTGGTGATAATTCCGATGAAGATATCTGTAACACACTCCACATAAATGAATGTAACAATAAGACATTGCATCAATGTTCTGATAACTGCGAGGAGATGACTTTTGGCTACAGGTGTACTTGTAATCCTGGATACAGCTTAGCAAAAGATGGAAAAACATGCATCAATTCCAATGAATGCCTAGATTCACCAGGTGTGTGTCCACAGATTTGTATGGACACACCAGGAAGTTACAAATGTCAGTGTGCTACAGGCTACAGGGATATAAATGGAGATGGAACAAAGTGTGTTCGTACAGACAAAACCGAACCATATTTGATTTTCGGCAACAAGTACTATATACGCCGCATGGACATTGATGGTAGTAACTATGTCAGTATGTCCAGTGAACATACCTACACACATGTTTTGGACTTTGATTACCGCAATAAGAAGATATACTATGCCGATGCTCCAAATATGAAACAGGCAATAAAGAGAATGAACTTTGATGGCTCTGGGAAGGAGATTATTGAAAAGCATCATGCCACAGGCATCGAAGGAATTGCTGTTGACTGGGTTGGAGACAATATATACTGGACTAGCAACAAACAATG
GGGGA'..b'GTGGGTGGACCTTGACAACAGTTGTTGGATACTCAGGGAATGTCCTGCCAACGGCGTGGGCACCCTGCATGTTGGTACGCTCAGACAGACCGACAAAGATTTCTCTACCTGTCCAAAGGACATCGCCACCTTCCAGCTTCGTTTCCTCATCTCCTTTGTTCTCCACTTCAACAACTTTAAGTCCGAGTTCCTTTCTTAGCACCTGTCGGACGACAGCCAGCTCTCCCTCCCTGGAGGGCTTGTTCGGCGGCGACTGCGGTCGGCATATGAGTGCCGTTCCGTTGATGACGACGGCTATGTCGTCGACGAACAAGCCGTCCGGATGCTTCTCATCGCACGGCAGTTCTATCACGTCGAGGCTGATGCGTCTCAGGGCGTCGACCAGCTGCTCGTGCTCGGCGCGGGCCTTCTCGATGTTGATCGGCGACGCGCCCGGCTTCAGATCGAAGCTCGACGTCTCGGCGAACGAGTTCGCGATCCGGCTGACCAGAGCGAAGTTGTACTTGAAACAATTAGATCCAGCCATTTTCTCCGGCATGATGAATCCTTCTCCGACCGAGCTACTCTGCGCCGAGACGACAACGAGGCTCGCCGTGACCGCTAC\n+>Ac197_1/1_1.000_794\n+TCCGATCTTGTCTCGTGTTTATTTCTTGTTACACATCACACAGAATGATGCTAGCAGGTCACTTTCTATTGTAATCCATGGTTTACTGGGATTTTGCCGCGATCCTCTTGTCACGTTCCTCTGCTATCGTCAACCTCAGTCCCTTGCCAGCTGGTAATGACACCCACGGCTTGTTACCTTTACCAATAATGAACACGTTGTTCAAACGTGTGGCAAATGAGTGGCCCATGCTGTCCTTGATGTGAACAATATCAAAGCCACCAGGATGACGTTCTCTGTGTGTCACAAGGCCAACACGACCCAAGTTGTGCCCACCAGTGATCATGCACAAATTTCCTGATTCAAACTTGATGAAATCCTTTATCTTGCCTGTGGCAATGTCAACCTGAACTGTGTCATTGACCTTGATCATTGGATCTGGATAGCGAATGGTACGAGCATCATGAGTGACCAGATGTGGAACTCCCTTCAGTCCAATGATTATCTTCTTTACCTTACACAGTTTGTACTTGGCTTCTTGGGAAGTGATACGGTGAATGGTGAAACGACCTTTGACATCATAGATGAGACGGAAGTTTTCAGCCGTCTTCTCTATTGTGATCACATCCATAAAGCCAGCAGGGTATGTCTTGTCAGTTCTCACTTTGCCATCAACCTTGATCAGACGTTGGTTTACAATCTTCTTCACCTCATCATATGTCAAGGCATACTTCAGGCGATTTCTCAAGAACACCACCGGACGAGATGGGTGTTCGTGGTCCAAGGAAGCATTTGAAGAGGCTTCATGCCCCTAA\n+>Ac198_1/1_1.000_1615\n+TCCGATCTTGTCTCGTGTTTATTTCTTGTTACACATCACACAGAATGATGCTAGCAGGTCACTTTCTATTGTAATCCATGGTTTACTGGGATTTTGCCGCGATCCTCTTGTCACGTTCCTCTGCTATCGTCAACCTCAGTCCCTTGCCAGCTGGTAATGACACCCACGGCTTGTTACCTTTACCAATAATGAACACGTTGTTCAAACGTGTGGCAAATGAGTGGCCCATGCTGTCCTTGATGTGAACAATATCAAAGCCACCAGGATGACGTTCTCTGTGTGTCACAAGGCCAACACGACCCAAGTTGTGCCCACCAGTGATCATGCACAAATTTCCTGATTCAAACTTGATGAAATCCTTTATCTTGCCTGTGGCAATGTCAACCTGAACTGTGTCATTGACCTTGATCATTGGATCTGGATAGCGAATGGTACGAGCATCATGAGTGACCAGATGTGGAACTCCCTTCAGTCCAATGATTATCTTCTTTACCTTACACAGTTTGTACTTGGCTTCTTGGGAAGTGATACGGTGAATG
GTGAAACGACCTTTGACATCATAGATGAGACGGAAGTTTTCAGCCGTCTTCTCTATTGTGATCACATCCATAAAGCCAGCAGGGTATGTCTTGTCAGTTCTCACTTTGCCATCAACCTTGATCAGACGTTGGTTTACAATCTTCTTCACCTCATCATATGTCAAGGCATACTTCAGGCGATTTCTCAAGAACACCACCGGAGGGAGACACTCTCGCATCTTGTGTGGACCAGTGCTTGGGCGTGGGGCAAAAACACCCCCAAGCTTGTCCAACATCCAGTGTTTAGGGGCATGAAGCCTCTTCAAATGCTTCCTTGGACCACGAACACCCATCTCGTCCGGTGGTGTTCTTGAGAAATCGCCTGAAGTATGCCTTGACATATGATGAGGTGAAGAAGATTGTAAACCAACGTCTGATCAAGGTTGATGGCAAAGTGAGAACTGACAAGACATACCCTGCTGGCTTTATGGATGTGATCACAATAGAGAAGACGGCTGAAAACTTCCGTCTCATCTATGATGTCAAAGGTCGTTTCACCATTCACCGTATCACTTCCCAAGAAGCCAAGTACAAACTGTGTAAGGTAAAGAAGATAATCATTGGACTGAAGGGAGTTCCACATCTGGTCACTCATGATGCTCGTACCATTCGCTATCCAGATCCAATGATCAAGGTCAATGACACAGTTCAGGTTGACATTGCCACAGGCAAGATAAAGGATTTCATCAAGTTTGAATCAGGAAATTTGTGCATGATCACTGGTGGGCACAACTTGGGTCGTGTTGGCCTTGTGACACACAGAGAACGTCATCCTGGTGGCTTTGATATTGTTCACATCAAGGACAGCATGGGCCACTCATTTGCCACACGTTTGAACAACGTGTTCATTATTGGTAAAGGTAACAAGCCGTGGGTGTCATTACCAGCTGGCAAGGGACTGAGGTTGACGATAGCAGAGGAACGTGACAAGAGGATCGCGGCAAAATCCCAGTAAACCATGGATTACAATAGAAAGTGACCTGCTAGCATCATTCTGTGTGATGTGTAACAAGAAATAAACACGAGACAAGATCGGA\n+>Ac199_1/1_1.000_912\n+TTAGGGGCATGAAGCCTCTTTTTCCCGTACCACCGGACGAGATGGGTGTTCGTGGTCCAAGGAAGCATTTGAAGAGGCTTCATGCCCCTAAACACTGGATGTTGGACAAGCTTGGGGGTGTTTTTGCCCCACGCCCAAGCACTGGTCCACACAAGATGCGAGAGTGTCTCCCTCTGGTGGTGTTCTTGAGAAATCGCCTGAAGTATGCCTTGACATATGATGAGGTGAAGAAGATTGTAAACCAACGTCTGATCAAGGTTGATGGCAAAGTGAGAACTGACAAGACATACCCTGCTGGCTTTATGGATGTGATCACAATAGAGAAGACGGCTGAAAACTTCCGTCTCATCTATGATGTCAAAGGTCGTTTCACCATTCACCGTATCACTTCCCAAGAAGCCAAGTACAAACTGTGTAAGGTAAAGAAGATAATCATTGGACTGAAGGGAGTTCCACATCTGGTCACTCATGATGCTCGTACCATTCGCTATCCAGATCCAATGATCAAGGTCAATGACACAGTTCAGGTTGACATTGCCACAGGCAAGATAAAGGATTTCATCAAGTTTGAATCAGGAAATTTGTGCATGATCACTGGTGGGCACAACTTGGGTCGTGTTGGCCTTGTGACACACAGAGAACGTCATCCTGGTGGCTTTGATATTGTTCACATCAAGGACAGCATGGGCCACTCATTTGCCACACGTTTGAACAACGTGTTCATTATTGGTAAAGGTAACAAGCCGTGGGTGTCATTACCAGCTGGCAAGGGACTGAGGTTGACGATAGCAGAGGAACGTGACAAGAGGATCGCGGCAAAATCCCAGTAAACCATGGATTACAATAGAAAGTGACCTGCTAGCATCATTCTGTGTGATGTGTAACAAGAAATAAACAC
GAGACAAGATCGGA\n'
diff -r 7a813e633d1c -r a83562c0719f test-data/velvet_out/ApAp_transcriptome_35099.fasta
--- a/test-data/velvet_out/ApAp_transcriptome_35099.fasta Fri Feb 01 10:22:32 2019 -0500
+++ b/test-data/velvet_out/ApAp_transcriptome_35099.fasta Mon Feb 03 14:37:31 2025 +0000
b'@@ -1,130 +1,398 @@\n->Ap1_1/1_1.000_256\n-AGCAACAAGACATTCCTTTTTGGTGCTAACTTCTCAATGGATGGCAGATATATCGTGGCCGGCTCACACGAAAACCTGCACCTGTGGAGCACGGAGAACTGCAAGCTGGTCACAACAATCAGACTGCACACCAACGACCACTTCCCAATGGCCGTCTGCTCAGACAGTAACTACATAGCCACCGGCTCAAACATCCACACGGCCATCAAAGTCTGGGACTTGACCAACGTCCAAATGTCCGAGCCGGGCTCCTTGA\n->Ap2_1/1_1.000_225\n-TTTGTCCAACACCCAGGCATACTTGAAGGAGCCTTTACCCATCTCCTGGGCCTCCTTCTCGAACTTTTCAATGGTTCTCTTGTCGATGCCACCACACTTGTAGATCAGATGGCCAGTGGTGGTAGACTTCCCGGAGTCTACGTGGCCAATAACCACGATGTTGATGTGTTTCTTTTCTGCTCCCATGCTGTGTTTCTACGTGCAACTTCTAGAGAATCAAAATAC\n->Ap3_1/1_1.000_189\n-TTGCTTTTGCTTAATGTAATAATATGGCTGGTTCCATTGAACTCCCTAGGTGTCAGCACTGTTTTATTGCCTAGACCGTCTTCTCCTTATAACCCCTCTTGCCAGTGCCTAGTGCCGGAGCAAGTTAAAAACTGTTTGAATGCTGGATCAATAGGAAATATCCAGTACATCACATACTGTGTGGAAACA\n->Ap4_1/1_1.000_330\n-CTCCGTATGGATTGGTGGATCGATCCTCGCCTCCCTGTCCACCTTCCAGCAGATGAGGATCAGCAAGCAGGAGTACGACGAGTCTGGACCATCCATCGTTCACAGGAAGTGCTTCTAAAGATAGTTGTGACCATCCCAACTGCCGTGACCACTCACAACAACAAACAACATTCTGTCTGCTCAGTGGCCCGTGGGCGACCTTTGTTCAGTGCCAGGGAACTCGTCATGACAAAGTCTAAAGAAAGTGCTGATTCCACCGTCAGAAGTTTGCTATACGAAATCCAGTCACACCATTCTGCTCTTCAAACATTACACAAACCAATCTTTTCC\n->Ap5_1/1_1.000_229\n-AGAATGATCTTCACAAAGGGATTAATGTTACACATTACTGTTCAAAGGACACACAGTGAAACAAGCACCAAACCAAGCTTGCTGGTCACCAGACCACACCAGTTTACATCAACATGCATGGACTGTGAATTCTTTGAAGAGCAATCGAGGGATTTGCTGTCATGTAAACAAAGTAGCAGGCTTCTGAACGTCTACTGTTTCATCCACGCCAACATGAGATGGACTGTTA\n->Ap6_1/1_1.000_374\n-AAATATACAGATTTAATAACAGCAATTGATAAACACTTTGAAAGCAAATTGTCATCTGTTGAGAAAGATGACATATTATTATCAGAATCTCACTTTGCACAATGTGGTTCTTTGTCTGACACAGTTAGCAGCTATCTTCACACTGTGATGACAGTCTCCAAGTGGAAGAAGAAAGTCAAAGCAATGCAGCAGACACCAGGACAACCACTACTTCTAATTATTTGTAGTGCTGCAAGTCGAGCAGTAGACTTGATAAGAGATTTGAGGTCCTTTTCTCAAGATAATTGTAAAGTTGCAAAGCTATTTGCAAAACACATGAAGCTGGAAGAACAAGTAAAATTCTTGAAGAAAAATGTAATACAGGCAGGAGTTGG\n->Ap7_1/1_1.000_291\n-CAGAAGGAAAGGCCCAAAGGGTTGTCCCGATGTCCGTCCCGCGGGCCAGACTCGACCCTCTCCGGCAGATCGGCAGCCCCAGTACCACCCTGCCAGATCGTGTTCGGGTGGGTTTTTTATCGACCTGCCGGGGACTGGCCGAGTAGTGACCGATCACGGGCGAACCGGAAACCGACACGACAACCCCGGACATCAGAAGACGGGACGA
CACACACACGCACGAACGGAGAGATAGACGCAAGACGACTACATCAGCACAGACGTCCGCCGCACACGGACTCGGACGCGGAC\n->Ap8_1/1_1.000_147\n-GCGCTGATTGTCATATTGTTATATAGTTCACGGCCTGGTCTGTCGGACAATGGCTCATACAACGCAGACTGCCACCTGAGAATATCTGAAAACTGGCAAATGCGTCATTTTACAAATCGCAGCTATCCAATTAATTATTTAGCTGGG\n->Ap9_1/1_1.000_1956\n-CATTTTCCTTCACGCATTTCATCTGACTTCAAAGTCGCAGAAATGGTAACAACAACAGTGGCATCAGCTCAGGCCGCTGACTCGGACGCCATGGCCCGGTCATACGTGTACGACTTCAAGAACAACACGTTCTCTGTCTGGGATTACGTGGTGTTTGGTGGCGTGCTGGCAGTGTCTGCTGGGATCGGGATATACTACGGCTGTACGGGCGGCAGGCAGAGGACAACATCTGAGTTCCTTATGGCTGACAGAAAGATGCATGTCCTTCCCGTTACCTTGTCACTGCTAGCCAGCTTCATGTCTGCCATTACCTTACTAGGTACCCCAGCTGAAATCTACATGTTTGGCACTCAGTATTGGATGATATGGATTGGATATGTTATTATGATTCCACTAGCTACACACGTTTTCATTCCTGTCTTCTACAATCTACAATTGACAAGTGTATTTGAGTATCTACAAATGAGGTTCGGTACCCACGTCAGGATCTTTGCCTGTCTCTGCTTCATCGTACAAATGATATTATACATGGCCATAGTTTTGTATACACCCTGTTTGGCTCTCTCGGTCGTTACTGGCTTTAATAAGTGGATATCCGTGTGTTTGGTTGGCGTCGTCTGTACCTTTTATACAACAATAGGAGGAATGAAAGCCGTCATGTGGACAGACTCGTTCCAGATCTGCATGATGTTCGCCGGGTTGATAGCTGTGCTCGTTAAAGGATCCATTGACGAAGGAGGCTTCGGTAACATCTGGAGATACATGGAGGAAGGAGACAGGATACAGTTCTGGGACATCGACCCAAGCCCCTTAAAGAGACATTCGCTGTGGGCCTTGATCTTCGGCGGTTGTTTCACGTGGCTTGCCGTCTACGGTGTAAACCAAGCCATGGTACAGAGAGCTTTATGTTGTCCCAGAAAGAAAGACGGACAAATAGCCATGTGGCTCAATCTTCCTGGTTTGACGGCTCTGCTCACTGTATGTGCCCTGTGTGGTATGGTTGTCTACGCAGAGTACAGATACTGTGATCCCTTAATTACCAATAGGATTGAGGCTAAAGATCAGTTACTGCCCCAGTATGTCATGGATCAGTTGTTTTATCCTGGTTTACCTGGTCTATTTACTGCATGTCTCTTCAGCGGAGCTCTAAGTACGATATCCTCAGGACTAAACTCTCTGGCTGCCGTTACACTTCAGGACCTAATCATTGACCGATGTTGTTCAAAAATATCTGAGACCAAGGCCGCCCGCATATCTAAGGCATTAGCCTTCAGCTATGGTCTGCTGATGATTGCTTTGTCGTACGTGGCTTCAAAACTTGGAGGGGTTCTGCAGGCTGCACTTGGTTTGTTTGGGATGATAGGTGGTCCAGTCCTCGGGCTGTTTATTTTGGGAATCATCTACCCTTGGGCTAATCACGTGGGGTCGTTCGTTGGGACGTTTGTCAGTTTGGTTATCACTCTGTGGATTGGCTTTGGTGCTCAGATATACAAGCCTTCAGTGTACAGGCCTCCAGTAAATATAACGGGTTGTCCGCTGAAGGAAGTCAACGAATCCTTCAGTTTCACGACGCTAGCCGCGAATTTCTCTACGACAGTGGCTCCATCTATCCCAGCTAGAGAACGACCAGGATATCTCGTCATTTATGAAGTGTCCTACATGTGGTACAGTCCCATCGCTGTTTTTATCGCAGTCGTCGTAGGCTTGTTGGTTAGCGCATG
CACAGGGTTTAACAAACCT'..b'TTTAGGCTTAGAACGCCCTACCTTAACAATTACTATCTCACATGGTTTTATTTTATGGGGTTCAATTTAGAGACCCCTCTCGTCACTCATGAAAAATACCGAGAAGAAGAATTAGTACAAATAAAAAAGCTGGAAGAATGGCCTGAGAAATAATAGTTGAGGATGGGAGGAGTGGGATCGGTATTAGTAGGGCAATTTCTACGTCGAAGATTAGAAAGATTACTGCAAGTAAAAAAAATCGTAAAGAAAATGGGGTTCGGGCCGAGTG\n+>Ap194_1/1_1.000_914\n+CAGATAGTAGTGCGGAGATAAATAAAATGTTGGTTGGTGTTGCTTTGACTAGGGTGTTGTAGCATATCATAGAGATAGGGAAGGTGTTTAAGCTTCCTGGTTTTAGGCTTAGAACGCCCTACCTTAACAATTACTATCTCCGCACTACTATCTGGAGTGGCAATAGCTCTATCGGCCCACTCCATGCTATCGATATGAATGGGCCTTGAACTCAACCTATTCGGCTTCATTCCCCTTATTATAGCAACAAGATCTAATCAAGAAAAAGAGGCAGCCTGTAAATACTTTCTAGCCCAAGCCATCCCATCCGCTATCTTTCTACTAGCCCTAGTATTAATACCAGACATCCCTACAACCTCTGCCGTAATTCTCGTCGCACTATTTATAAAAATAGGAATCGCCCCATGTCACCAATGATTTCCTTCTGTTATAAACGCACTAGCTTGGCCGCAAGCATGGACCCTCATTACTGTACAAAAAATTGCACCATTCTTCATAATTCTCCACATAGTTGGTAACACGACCATTCTCACTTTCCATAGCAGCCGCTATTTCATCTATTATTGGCGGACTAGGCGGCATAAATCAAACACAACTACGCCCACTATTGGCCTACTCATCTATCGGGCACATAGGCTGAATACTAGGAGCAGTTTTAGTTTCAAATAGCGCTGCCACGCTCTATTTCTCTTCTTATCTCTTTATTGTATCAACAACAATTCTAAGCGCCGTTCTATTAAAAACTAACTCCTTGTTTTCCCTACCACTATTTAAATCATCAACAACTCTATCAACCATTCTATTCCTCTCCTTCATAAACATAGGGGGCCTTCCTCCATTCTTCGGTTTCTTTATTAAAGCTTTCGTAATACTTAACTTACTTTCCAGCAATCTGGCCCCCCTCACCTTCTTCT\n+>Ap195_1/1_1.000_302\n+CAGATAGTAGTGCGGAGATAAATAAAATGTTGGTTGGTGTTGCTTTGACTAGGGTGTTGTAGCATATCATAGAGATAGGGAAGGTGTTTAAGCTTCCTGGTTTTAGGCTTAGAACGCCCTACCTTAACAATTACTATCTCACATGGTTTTATTTTATGGGGTTCAATTTAGAGACCCCTCTCGTCACTCATGAAAAATACCGAGAAGAAGAATTAGTACAAATAAAAAAGCTGGAAGAATGGCCTGAGAAATAATAGTTGAGGATGGGAGGAGTGGGATCGGTATTAGTAGGGCAATTTCTA\n+>Ap196_1/1_1.000_374\n+CACTCGGCCCGAACCCCATTTTCTTTACGATTTTTTTTACTTGCAGTAATATTTCTAATCTTCGACGTAGAAATTGCCCTACTAATACCGATCCCACTCCTCCCATCCTCAACTATTATTTCTCAGGCCATTCTTCCAGCTTTTTTATTTGTACTAATTCTTCTTCTCGGTATTTTTCATGAGTGACGAGAGGGGTCTCTAAATTGAACCCCATAAAATAAAACCATGTGAGATAGTAATTGTTAAGGTAGGGCGTTCTAAGCCTAAAACCAGGAAGCTTAAACACCTTCCCTATCTCTATGATAAGCTACAACACCCTAGTCAAAGCAACACCAACCAACATTTTATTTATCTCCGCACTACTATCTGGAGTG\n+>Ap197_1/1_1.000_821\n+CAGATAGTAGTGCG
GAGATAAATAAAATGTTGGTTGGTGTTGCTTTGACTAGGGTGTTGTAGCATATCATAGAGATAGGGAAGGTGTTTAAGCTTCCTGGTTTTAGGCTTAGAACGCCCTACCTTAACAATTACTATCTCCGCACTACTATCTGGAGTGGCAATAGCTCTATCGGCCCACTCCATGCTATCGATATGAATGGGCCTTGAACTCAACCTATTCGGCTTCATTCCCCTTATTATAGCAACAAGATCTAATCAAGAAAAAGAGGCAGCCTGTAAATACTTTCTAGCCCAAGCCATCCCATCCGCTATCTTTCTACTAGCCCTAGTATTAATACCAGACATCCCTACAACCTCTGCCGTAATTCTCGTCGCACTATTTATAAAAATAGGAATCGCCCCATGTCACCAATGATTTCCTTCTGTTATAAACGCACTAGCTTGGCCGCAAGCATGGACCCTCATTACTGTACAAAAAATTGCACCATTCTTCATAATTCTCCACATAGTTGGTAACACGACCATTCTCACTTTCCATAGCAGCCGCTATTTCATCTATTATTGGCGGACTAGGCGGCATAAATCAAACACAACTACGCCCACTATTGGCCTACTCATCTATCGGGCACATAGGCTGAATACTAGGAGCAGTTTTAGTTTCAAATAGCGCTGCCACGCTCTATTTCTCTTCTTATCTCTTTATTGTATCAACAACAATTCTAAGCGCCGTTCTATTAAAAACTAACTCCTTGTTTTCCCTACCACTATTTAAATCATCAACAACTCTATCAACCATTCTATTCCTCTCCTTCATAAACA\n+>Ap198_1/1_1.000_775\n+AGAAGAAGGTGAGGGGGGCCAGATTGCTGGAAAGTAAGTTAAGTATTACGAAAGCTTTAATAAAGAAACCGAAGAATGGAGGAAGGCCCCCTATGTTTATGAAGGAGAGGAATAGAATGGTTGATAGAGTTGTTGATGATTTAAATAGTGGTAGGGAAAACAAGGAGTTAGTTTTTAATAGAACGGCGCTTAGAATTGTTGTTGATACAATAAAGAGATAAGAAGAGAAATAGAGCGTGGCAGCGCTATTTGAAACTAAAACTGCTCCTAGTATTCAGCCTATGTGCCCGATAGATGAGTAGGCCAATAGTGGGCGTAGTTGTGTTTGATTTATGCCGCCTAGTCCGCCAATAATAGATGAAATAGCGGCTGCTATGGAAGTGAGAATGGTCGTGTTAACCAACTATGTGGAGAATTATGAAGAATGGTGCAATTTTTTGTACAGTAATGAGGGTCCATGCTTGCGGCCAAGCTAGTGCGTTTATAACAGAAGGAAATCATTGGTGACATGGGGCGATTCCTATTTTTATAAATAGTGCGACGAGAATTACGGCAGAGGTTGTAGGGATGTCTGGTATTAATACTAGGGCTAGTAGAAAGATAGCGGATGGGATGGCTTGGGCTAGAAAGTATTTACAGGCTGCCTCTTTTTCTTGATTAGATCTTGTTGCTATAATAAGGGGAATGAAGCCGAATAGGTTGAGTTCAAGGCCCATTCATATCGATAGCATGGAGTGGGCCGATAGAGCTATTGCCACTCCAGATAGTAGTGCGG\n+>Ap199_1/1_1.000_400\n+TGATCGTCTTATAAACCTAACTTGAAAAACCTTCCTACCATTTAGGGCTAGCAGCCCTATTAATTATCACACCTATCGCAGCGCTCTCACTATAATTATAAGTATTGCGCCGGGTTTGAACGGATAGCTCTGATGCTGCTAATTACGGGACCTAATAATCCCCAATACTTTATCCTTAGAGAGCTGTACCTCTTAGCACCAGTCTTTTAAACTGGCGAAAGCACACTTTATGCTTCTAAGGAATGAAACTAATTCTTATAATCCTACTAATCTCTTTTATCATCCCCGCCATTCTATTTTTACTCTCGATCTTTACTACTATGCGCATGCCAGAGAGCCGTGAAAAATTTAGGCCCTACGAGTGCG
GGTTTGACCCCAATCACTCGGCCCGAACCCCATT\n'
diff -r 7a813e633d1c -r a83562c0719f test-data/velvet_out/PgPg_transcriptome_90109.fasta
--- a/test-data/velvet_out/PgPg_transcriptome_90109.fasta Fri Feb 01 10:22:32 2019 -0500
+++ b/test-data/velvet_out/PgPg_transcriptome_90109.fasta Mon Feb 03 14:37:31 2025 +0000
b'@@ -1,128 +1,398 @@\n->Pg1_1/1_1.000_474\n-GAGGTATGTTCGGGTTATAGGTGTGGTCCGACAATGGGTCGAGTAATCAGGAGTCAGCGTAAGGGTGCTGGCAGTGTATTCAAGGCACACACGAAACACAGGAAAGGTGCTGCAAGACTTCGAGCATTTGATTTTTCTGAAAGACATGGCTATATCAAAGGTGTTATAAGGGACATCATTCATGATCCAGGACGTGGCGCTCCATTGGCACGTGTCGTTTTCCGTGATCCATATAGGTACAAGCTGAGACATGAGAACTTCATCGCCTGCGAGGGCATGTACACCGGACAGTTCATTTACTGCGGCAAAAAGGCCACACTCCAGATAGGAAACATCCTTCCCGTCGGTGTGATGCCTGAGGGTACAGTCGTGTGCTCACTGGAAGAGAAGACTGGAGATCGTGGACGACTGGCCAAGTGCTTGGTAACTATGCCACTGTCATCTCCCACAATCCGGAAACAAAAAGGACTAGGG\n->Pg2_1/1_1.000_1300\n-GCTGTAGGCAACTGTGACAGAACAACAGGAGAGTGTAAGAAATGTATATATAATACAGCTGGCTTCTATTGTGAAAGATGTCTTCCTGGTTACTATGGTGATGCCTTAGCTGAACCGAAAGGGCAATGTAAAGCATGTAATTGTTACCCACCTGGTACTAATGACAGAGCCAGACAAGAAGGCTCCCTGACTTGTGATGAGAGATCTGGCCAGTGTCCGTGTAAACGTCAAGTTATTGGTAAAATGTGTGATACTTGTGAAGATGGCTTCTGGAACATAGACAGTGGACGAGGTTGTGAAGCATGTTTATGTAACCCAACTGGATCACACAACAGGAACTGTGACCTGCGTACTGGACAATGTCAGTGTAAGTCTGGTGTTACTGGCAGAAAATGTGATCAGTGTCTGCCTGACCACTGGGGATTCTCTCGTGATGGATGTAAAGCTTGTAACTGTAATATGGAAGGAGCTGTTAATACTCAGTGTGATTTGAGGACTGGGCAATGTATCTGTAGGCCAAGCATAGAGGGAGAGAAATGTGACAGATGTGTGGAGAATAAGTTTAACATCACTGCAGGATGTATTGATTGTCCTCCATGTTACTCACTTGTCCAAGATCAGGTACACATCCTCAGATCAAAAATAAATGAACTTCGTGAAATTATCCATAACATTGGTGACAACCCACATAAGGTTGATGATGCTGACTTCCGTAGGAAGTTAAGGGCTGTAAATGACTCTGTAAATGATCTGTGGAGAGACGCCATGCATGCTGGTGGTGGTGGAGACTCCTCTCTTGGCCAACAGATGGAGGCACTGCAGCAGGCTATCAGTGACATCATGACTGAATGTGGCCAGATAACAATTGATATAACATATGCCACATCATCATCGGACAGCAGTAAGGTAGATATTACTTATGCCGAGGAGGCAATTGATGAAGCAGAGAAGGCATTACTGGCTGCTGAAACCTACCTTCGTACAGAAGGTAGGAAGGCACTTCATGATGCCATTGAAGAGTTGAGGGATCTTGACGAGAAGTCACAACAACTGACTAAAATAGCAAGAGAAGCTCGGGAGGAGGCTGAAAAACAAGAAAAAGAAGCCAGTATGATAGATGAAACAGCTAACAAGGCATTGAATACATCAAAAGAAGCCTTAGTGTTAATTAATGAGGTACTGAGGAAGCCTGACGATATTGCCGATCAGATAGAAGATCTTAGACGAGAGGTTCTAGACACAGAAGTGGAATATCAGGCCACAAAAGCTGAAGCAGAACGAGCAGAGAAGTTGGCTACTG\n->Pg3_1/1_1.000_1813\n-TCTACCTGGAGTCTATCTAGGGGGGTGAGGCTTGTCTCTCCGGTGTTTTGTCACCGTGATACTATCTCGCCTTATCACATCCGATCGCCGTCGAGTGACGTCAGCAAAATGAGTGTCAGCGACTTAGAGGC
TCGTCGAGAGGCCAGGAGGCGGGCCAGAGAAGAAAAACAATCGGTGCTACTCGGCTCTCCGGCCACCCAATCTGCCATAACGACTGACCACGACGACGAAGATATCGTGGAACGAATTGCGAGGAGGCGAAGAGAACGTCAGGAACGACTCGCAAAACTATCTGCTGATACCAGCTCTGTCGATTACGACATCGAGAAACGTCGCCAGGAAAGACGAGCAGCACGCGAAAACATAATGAGGGGTGAGATCAAGGACGGGTCAAACGACCATGAGAAGGAGAGCTCTTATGCAGAAGAGAAGTCGGCAGACGACACGAAAGAAGAATGGCCAACAGCGCGCGATGAAGAAGAAAATGGAAAGGACGAAGAGAAAGACAAAGGACGAGAAGAAGACGAACAGAGACAGAAAGAAGAGGAGGAAACGGTTAGAATAGAAACACAACAACAGGAAGGTGAGATAAATGAGGAAGAGCAAAATGAAAAGCGAGAATGGAAGATGGGTGGAAAATCTACAGTAGAAGAACAGGAAAACGGTCTGAGTGGTGAAGAAGGTGAGGAAGAGGAGGAAGAAGAGGAAGAAGGAGAAGAGGAGGAGGAGGAGGAAGAAGAAGAAGAAGGGGAAAATTATCAGCAGAGAGAGGATGATTTGGCGGAGGAGGAAAGAAAGATCCAGGAGGAAGAGGATCTCATCAGGGAGGAGGAGCAATTGAAAAGAGAAGAAGAACAACGGTGGAGAGAGATGGAAGAGAAGAGGCAGCAACAAGAGAAGGACGAAATGGAGTTTGAAGAAGACGAGAAGAGACGTAGAAAAGAAGAAGCGGAAATGGACGTGGACGTCGGGGAGAAGAACGAAGATCAGAGTTCTCCCGAAGAGAAGGACAAGGGAAGGACGATTGACGAGAAACGTCAGCAGCTGATGAAACACATGAATGGCTCGATCGATGGAACGACCCCGACACGGCCGAAAAATGACTCCCCTGATACGCCACCTAGCGAGACTAAAAGACGACAGAAGGCGATGGAATGGGAACAACGTCTCAAGAGAACGCCCTCGCAAAGCGAACCAAACGACAGAATCAAACAGATTGAGGAACAACGGGCCGCCGAGCGGAAGGAGCTACAGCGCGCGCGCGAGGCTCGCTGGAAAGAACGGGAAGAGAAGCTCAGACAAGAAGCCGAAACTCGGAAGAAGCGGGAAGAAGACTTGGCTGAGAGACGCCGAAAGGCGGCCGAGGAGAGGAAAACGTTGCGCAAACAATCAGATATCGCTCATAACGAACCACAACCTGAAGGTGGAGAGAATGACGATGGCTTGGCAGATATGGAGAAAGGCAAAAGGAAAGGGTTGGGAGGCCTCTCCCCTGAGAAGAAGAAGCTACTAAAGCAACTGATCATGCAAAAGGCAGCGGAAGACTTAAAGAAGCAACAAGAGGCCGAAGCTGAAGCGAAGAGGAAGATCATCCAGCAGCGTGTTCCAAAACTAGAGATTGATGGATTAGATCAAGGTCGTCTGGAGAAAATCGTACGTGACCTTTATAAAAAGGTCGTTGCCCTTGAGGAGGATAAATATGATTGGGAGGTGAAGCTTAGGAAACAGGACCAAGAGATGAATGAGCTTAACATCAAGGTTAACGATATTAAGGGCAAATTTGTTAAGCCCGTCCTGAAGAAGGTATCGAAGA\n->Pg4_1/1_1.000_231\n-ATCACGTTATATTTTTATCGCCTTAAATTCGAAGCCAAATCTTATCGAGTGAACTGTGCGTGTATCCCGACTGATACGTGTATCCCAACTGAATGTGTAAGTCGTCTGCCCGCGTGTTGTGTTGTTGCCGCCGTCGTCATCATCCCAGCGCGACTTAACTCCAGCATTCAGCTGACCTGTATGAAATGTGTGCTATTTTTCCAATGTTGCGGTTTGTGTGCGTGTGTGTGT\n->Pg5_1/1_1.000_1440\n-GTACTCATAGTTGGTGTAGAGACAATGGGCTGTATTTT
GGGATCGCTGG'..b'ACATTAACACTTTGTGATGTGCTACCAACAGAGTTAGTTGCCGTTACTGTGTACTCAGCCGTGTCCTCAACAAGAGACTCTTTCACCTTCAGAGTGTAAGAGTTGTCCTTCGAGATCATCTTGTAATGACTGTCCTTGTCAGTTATAGGTTTTTTGTTCTTTGACCATGTGACCTCCGGCTCTGGATAACCTGTTGTTTTACAGGTGAGCTTGAAGCCTTCACCTGCCTCAACATTAACTGGCTGTAACTTGCCCTTAATATGTGGTGCAACCTTCTTCTTCTCGACACTAACAGTAACCTTCACGATGGTGGTACCTTTGCCAGACTTGGCACTGACTCTGTACTCACCAGCATCATCAGCCGTAGCATCGACGACCTGTAAGTAATACACATCACTGTCCATATCCCAGTCAACTTTGTACTTCTTGTCACCCTTTTTCTTCGGCTTAATCTTATTCGTGTCCTTGAACCAGGTAACCTCTGCCTCCTCTTCAATTTGACAAGTTAACTTAAATGTTTCTCCTTCTGTCACAACCAATGGCTGAGGTTCTTTTGTAAACTTTGGACCTTCTGGCTTCTCCTTTGACTTCTTATCCTTTTCAACTTTTTCTTGTTTTTCTTCTTCTTTCGGTGGTTCCTCTTCCTTTAGCTTTTCTTCTTCCTTAGGTTTTTCTGCTTCTTTTACTTTTTCATCTTCTTTGGGTTTTTCCTCTACGGGTTTTTTTTCTGCCTCCTGCTCTACAGCCTTTTCGGCTGCTTCCTTTGTTACTTCCTTTGTTATCTTCTCTTCCTTTTCAGTCTCTTTCTTCTTTTTTTCCTTTTTCTCTTTCGGTTTTTCTTCTTTTGGCTCTTCCTTTTTTTCCTTCTTCTTTTTCTTCTCTTCCTTTGGCTTATCTTCTGGTTTCTGCTCTTCTTCCTTATCAATTTTGTCCTCCACCACTTCCTTCTTTTTCTTCTTCTCCTTTGGTTTCTTTTCCTTTGGACTTTCTTCCACCTTTTCCACGGCTTCTTCAACAACCTCTTGTGGCTTTTCTTCCTCTCTTATTTCTTCTTTCTGTTTGTCTTCCTTGGGCTTCTCTTCCTTTGACCTTTCCTCTTCAACCCTTTCTTCTTCCATCTTCTCTTCTATCGGCCCTTCCTCTTTAGACTTTTCTCCCTCGGATCTTGGTTCTTTAGGTTTTTCTTTGTTGAGGATTTCTTCAGATGTTTCAATTATTTCTTCCTCTATTACCTTCACTTCTTCAGGAACAGAGACAGCAACATTAACAGTTGCTGAGACGGTTCCGCCGTCATTGCTGGCGACAATGGTGTACTGGCCGGCATCATCTGGCGTGGTATCGCTGATGAGCAACAGATGGACATCATCATTTGAGTCGAAGTCAAGTTTGACACGGTCAGTTGACTTTTCAAACTGCTTACTCCCTTTGAACCAGGTGACCTCAGGCTTTGGTTGTCCTAAGACGCGTGCACCGAGCGTAATGGTCTTTCCCGACTTCACGACGACGGGTTCGGGGAAGAGATCGAACCGAGGCAAAGAGACCGTTTTGTCGGTCACGACTTCCTCCAGTGTGATGTCGCCGAATGCTTCGCTGTCGACGTCCACTGTGGAGTATATTGTCTCCTCAATGATGACGGTCTCTTTCTTTACCACCGTCTCATTTGTCAGCTCCTCATTCTCCTCAAGTGACGTTTTTCCCTCATCCGTCTTATTTTCTTTAATTAATTTGGCTTCCTCTTTCATGGCTTCATCCACCGCAGACGGATTATTCACAGCTAATTTGGTTGCCATTTTGTCATCTTCTTCATCAGCTGCATTGCCTCTTTCTTCTTCATCACCAGTAATGGGTGCCGCCATTTCTTTGACATCCGGCACGTCTTTCTTGCTTTCTTTCATTGCCTCTGATATTTCCGAACTATTTTCTGCAGATGTACCATCCACCAGCGGGTTTGATTTCTCCTCAGCCGTTTCTTTTCTCGATTTCA
TCAGGATCAGCTGTGGCTTACTGTCCGCCTCTTTTGACTCGATTTCTTCCAACAGCTCCTTGGTGTCTGCGATGTCTATTTCTATGTTCAGATCCTGATCTGCGGAGATCTTCAGTGCCCTAGCAGCTTCTTCCTGTACTATGGCCACTTCCGTTAGACCACCATCCCTGCCATCGGCACCCTGGCGCAGCCTGTTCTCACCGGTATAGGAAATCGTGATGATGTCTTGTTTATCTTTATCTTTGTCCTCGACGATCACCGTCACGGTGACGGTCATCGAGACCTCGCCGTGAATGTTCGACGCTTTGACGGTGTAATCATCGGCATCATCAATGGTGCACTCTTTGATGATGAGCGTATACAAATCGGATGCAACATCCCAGTGGATCTCTATGTGCTGATCCTGTTTCTTGGGCTTTAGCTCTTTATCGCCCTTATACCAAGTGACCTGAGGTTGAGGGTCACCTGCTACTTTGCAGCTAAGTTTTACGGTTTCTCCTTCTTTCACAGTGACAGGCTCGGGGATGACTTCGAATCTTGGTCTTGGCAAATCTTCTACTCTCTCTTCTTGCTCTGCGGCCAGCACTTCTTCCACGGATTCTTTTGGCTTCTGGTCCATTTCCACTTCCTTTGGGGCGGTTTCTTTTATTGGTGGTTTTTCCTTCTTTTCAATTGCTTCTTCTCCATCTTCAGATTCTTCATTTACTTCTATCCCTTTCACTTTCTGTTTTTTAACCTTCTTAATTTTCTTTCCTGATATGTCCACTTCTTCTTTCATCTTAATAATAATCTTTTTCTTCTTCTTCTTTTTCACATCTTTCCCCTCAGATTCTCCTTCACTTGATTCTGACTCACTCGATGTTACTTCAGAGTCAACAAGCTCAGACTCAATTTCTGTTGTCTCTTCTTCAGTAGTAGTAAACTCAACCTCAACTTTATCAATCTTTTCGGATGTCTCCATCACATCAACTTCTGCCTTTTCTATTCTTATAGATTCACTCAATGAACTAATATTCACTGACACTGTACAAGATTCGGAGCCACAATCATTTGTAGCTTTCACTATGTAGTCACCAGAATCTTCAACTGTTGCATTTTTAATTATCAACATGTTTAGGTCATTAGAAGTATCCCAGTCAACCTTCACACGACCTTCTTTCTTTGGTTTGATTTTCTTCTCATCTTTATACCAAGTAATTGATGGATCTGGATTTCCTGTGACCCGACACGTTAAACGGATTGTCTCTCCTTCGTTTACAGTGACTGGCTGCGGTTCTGCGGTGATGACAGGGGCACTAGAGGCACGTGTCGCATCCAGCTGGTCATCAACTGGCTCTTGCTTCACTCTTTCCGTTGGCTCCTTGGCTGGGTATCTTCACAGCGTCGTCCAGCTCAGCCATGTCTTCACTGATGCCGACCTTGTTTTCCGCATAAACCCGAAAGAAATACTGATGGCCTTCCTTAACTTTGGTCACAGTCAAAGTTAAGGTTGTGCCATTTGTCTGTCCGACCTTCTTAAATTTATTTTTCTCGGCTTCTCTCATCACGATCAAGTAAGATGTTAGGTCCGTGTTGCCGGCATCAATCGGTGCATCCCAGCTCAGCGTCACCGAACTACTGTTTACTTCCTTAACAATTAGATTTGTGGGAGCCGATGGCACGGTCTGAGGTTTCTCAGCAGATGGCTGTTGATGCTTTTCAGAAGTGATTTCTTCTTCAGATGTCTGTTCGGGTTTGGGGGCTGTCTGCGCCTCGGTTGTGATTTTGGCTTCAGACTGAGGCTCGGCTTCTGTGACTTTGCTCTCCGTGACCTCTACTTGCTCTTCTACTATTTCCACTTCTTCATCCTTCTTGATCACTTTTGATGTTATTTTTACCGCAGAATCTATCTCAGCAGCTGATTCACTGATGCCAGCACTGTTCTCGGCATAAACCCTGATAAAGTATTCTTTGGCTGGCTCGATGTTAGAAGTAATGGAATATTTCAACGTACCGCCAGA
CACTTTACCAACTT\n'
diff -r 7a813e633d1c -r a83562c0719f test-data/velvet_up.output
--- a/test-data/velvet_up.output Fri Feb 01 10:22:32 2019 -0500
+++ b/test-data/velvet_up.output Mon Feb 03 14:37:31 2025 +0000
@@ -1,21 +1,27 @@
-Number of segment pairs = 4032; number of pairwise comparisons = 0
+Number of segment pairs = 39402; number of pairwise comparisons = 402
+'+' means given segment; '-' means reverse complement
+
+Overlaps            Containments  No. of Constraints Supporting Overlap
+
+
+DETAILED DISPLAY OF CONTIGS
+Number of segment pairs = 39402; number of pairwise comparisons = 343
 '+' means given segment; '-' means reverse complement
 
 Overlaps            Containments  No. of Constraints Supporting Overlap
 
 
 DETAILED DISPLAY OF CONTIGS
-Number of segment pairs = 4160; number of pairwise comparisons = 0
+Number of segment pairs = 39402; number of pairwise comparisons = 352
 '+' means given segment; '-' means reverse complement
 
 Overlaps            Containments  No. of Constraints Supporting Overlap
 
 
 DETAILED DISPLAY OF CONTIGS
-Number of segment pairs = 4422; number of pairwise comparisons = 1
-'+' means given segment; '-' means reverse complement
-
-Overlaps            Containments  No. of Constraints Supporting Overlap
-
-
-DETAILED DISPLAY OF CONTIGS
+cap3 outputs/Pg_transcriptome_90109.fasta -p 100 -o 60
+outputs/Pg_transcriptome_90109.fasta.cap.singlets and outputs/Pg_transcriptome_90109.fasta.cap.contigs
+cap3 outputs/Ap_transcriptome_35099.fasta -p 100 -o 60
+outputs/Ap_transcriptome_35099.fasta.cap.singlets and outputs/Ap_transcriptome_35099.fasta.cap.contigs
+cap3 outputs/Ac_transcriptome_25591.fasta -p 100 -o 60
+outputs/Ac_transcriptome_25591.fasta.cap.singlets and outputs/Ac_transcriptome_25591.fasta.cap.contigs