annotate tools/seq_rename/seq_rename.py @ 2:7c0642fc57ad draft

Uploaded v0.0.4, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence. Includes additional tests added in v0.0.3.
author peterjc
date Fri, 11 Oct 2013 04:39:16 -0400
parents
children e1398f2ba9fe
#!/usr/bin/env python
"""Rename FASTA, QUAL, FASTQ or SFF sequences with ID mapping from tabular file.

Takes six command line options: tabular filename, current (old) ID column
number (using one based counting), new ID column number (also using one based
counting), input sequence filename, input type (e.g. FASTA or SFF) and the
output filename (same format as input sequence file).

When renaming sequences in an SFF file, any Roche XML manifest in the input
file is preserved in the output file.

This tool is a short Python script which requires Biopython 1.54 or later
for SFF file support. If you use this tool in scientific work leading to a
publication, please cite the Biopython application note:

Cock et al 2009. Biopython: freely available Python tools for computational
molecular biology and bioinformatics. Bioinformatics 25(11) 1422-3.
http://dx.doi.org/10.1093/bioinformatics/btp163 pmid:19304878.

This script is copyright 2011-2013 by Peter Cock, The James Hutton Institute UK.
All rights reserved. See accompanying text file for licence details (MIT
license).

This is version 0.0.4 of the script.
"""
import sys

if "-v" in sys.argv or "--version" in sys.argv:
    print("v0.0.4")
    sys.exit(0)

def stop_err(msg, err=1):
    sys.stderr.write(msg.rstrip() + "\n")
    sys.exit(err)

#Parse Command Line
try:
    tabular_file, old_col_arg, new_col_arg, in_file, seq_format, out_file = sys.argv[1:]
except ValueError:
    stop_err("Expected six arguments (tabular file, old col, new col, input file, format, output file), got %i:\n%s" % (len(sys.argv)-1, " ".join(sys.argv)))

try:
    if old_col_arg.startswith("c"):
        old_column = int(old_col_arg[1:])-1
    else:
        old_column = int(old_col_arg)-1
except ValueError:
    stop_err("Expected column number, got %s" % old_col_arg)
try:
    if new_col_arg.startswith("c"):
        new_column = int(new_col_arg[1:])-1
    else:
        new_column = int(new_col_arg)-1
except ValueError:
    stop_err("Expected column number, got %s" % new_col_arg)
if old_column == new_column:
    stop_err("Old and new column arguments are the same!")

def parse_ids(tabular_file, old_col, new_col):
    """Read tabular file and record all specified ID mappings."""
    handle = open(tabular_file, "rU")
    for line in handle:
        if not line.startswith("#"):
            parts = line.rstrip("\n").split("\t")
            yield parts[old_col].strip(), parts[new_col].strip()
    handle.close()

#Load the rename mappings
rename = dict(parse_ids(tabular_file, old_column, new_column))
print("Loaded %i ID mappings" % len(rename))

#Rewrite the sequence file
if seq_format.lower()=="sff":
    #Use Biopython for this format
    renamed = 0
    def rename_seqrecords(records, mapping):
        global renamed #nasty, but practical!
        for record in records:
            try:
                record.id = mapping[record.id]
                renamed += 1
            except KeyError:
                pass
            yield record

    try:
        from Bio.SeqIO.SffIO import SffIterator, SffWriter
    except ImportError:
        stop_err("Requires Biopython 1.54 or later")

    try:
        from Bio.SeqIO.SffIO import ReadRocheXmlManifest
    except ImportError:
        #Prior to Biopython 1.56 this was a private function
        from Bio.SeqIO.SffIO import _sff_read_roche_index_xml as ReadRocheXmlManifest

    in_handle = open(in_file, "rb") #must be binary mode!
    try:
        manifest = ReadRocheXmlManifest(in_handle)
    except ValueError:
        manifest = None
    out_handle = open(out_file, "wb")
    writer = SffWriter(out_handle, xml=manifest)
    in_handle.seek(0) #start again after getting manifest
    count = writer.write_file(rename_seqrecords(SffIterator(in_handle), rename))
    out_handle.close()
    in_handle.close()
else:
    #Use Galaxy for FASTA, QUAL or FASTQ
    if seq_format.lower() in ["fasta", "csfasta"] \
    or seq_format.lower().startswith("qual"):
        from galaxy_utils.sequence.fasta import fastaReader, fastaWriter
        reader = fastaReader(open(in_file, "rU"))
        writer = fastaWriter(open(out_file, "w"))
        marker = ">"
    elif seq_format.lower().startswith("fastq"):
        from galaxy_utils.sequence.fastq import fastqReader, fastqWriter
        reader = fastqReader(open(in_file, "rU"))
        writer = fastqWriter(open(out_file, "w"))
        marker = "@"
    else:
        stop_err("Unsupported file type %r" % seq_format)
    #Now do the renaming
    count = 0
    renamed = 0
    for record in reader:
        #The [1:] is because the fastaReader leaves the > on the identifier,
        #likewise the fastqReader leaves the @ on the identifier
        try:
            idn, descr = record.identifier[1:].split(None, 1)
        except ValueError:
            idn = record.identifier[1:]
            descr = None
        if idn in rename:
            if descr:
                record.identifier = "%s%s %s" % (marker, rename[idn], descr)
            else:
                record.identifier = "%s%s" % (marker, rename[idn])
            renamed += 1
        writer.write(record)
        count += 1
    writer.close()
    reader.close()

print("Renamed %i out of %i records" % (renamed, count))
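
The column handling and header rewriting in the script can be tried out without Galaxy or Biopython. The following is only a minimal sketch, not part of seq_rename.py: the helper names parse_column, load_mapping and rename_header and the sample data are hypothetical, and it operates on header lines already held in memory rather than using the galaxy_utils readers and writers.

```python
def parse_column(arg):
    """Convert a one-based column argument such as "c2" or "2" to a zero-based index."""
    if arg.startswith("c"):
        arg = arg[1:]
    return int(arg) - 1


def load_mapping(lines, old_col, new_col):
    """Build an old-ID to new-ID dict from tab-separated lines, skipping # comments."""
    mapping = {}
    for line in lines:
        if line.startswith("#"):
            continue
        parts = line.rstrip("\n").split("\t")
        mapping[parts[old_col].strip()] = parts[new_col].strip()
    return mapping


def rename_header(header, mapping, marker=">"):
    """Return the header with its ID replaced if mapped, keeping any description."""
    fields = header[len(marker):].split(None, 1)
    if not fields or fields[0] not in mapping:
        return header  # unmapped IDs are passed through unchanged
    fields[0] = mapping[fields[0]]
    return marker + " ".join(fields)


# Hypothetical sample data: a two-column mapping table and a small FASTA file.
table = ["#old\tnew\n", "seq1\talpha\n", "seq2\tbeta\n"]
mapping = load_mapping(table, parse_column("c1"), parse_column("2"))

fasta = [">seq1 first record", "ACGT", ">seq3", "TTGA", ">seq2", "GGCC"]
renamed = [rename_header(line, mapping) if line.startswith(">") else line
           for line in fasta]
print("\n".join(renamed))
```

As in the script, an identifier missing from the mapping is left as it is, and any description after the first whitespace is carried over to the renamed record. Passing marker="@" would apply the same rewrite to FASTQ title lines.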