annotate tools/seq_rename/seq_rename.py @ 2:7c0642fc57ad draft
Uploaded v0.0.4, automatic dependency on Biopython 1.62, new README file, citation information, MIT licence.
Includes additional tests added in v0.0.3
author | peterjc |
---|---|
date | Fri, 11 Oct 2013 04:39:16 -0400 |
parents | |
children | e1398f2ba9fe |
rev | line source |
---|---|
#!/usr/bin/env python
"""Rename FASTA, QUAL, FASTQ or SFF sequences with ID mapping from tabular file.

Takes six command line options: tabular filename, current (old) ID column
number (using one-based counting), new ID column number (also using one-based
counting), input sequence filename, input type (e.g. FASTA or SFF) and the
output filename (same format as input sequence file).

When renaming sequences in an SFF file, any Roche XML manifest in the input
file is preserved in the output file.

This tool is a short Python script which requires Biopython 1.54 or later
for SFF file support. If you use this tool in scientific work leading to a
publication, please cite the Biopython application note:

Cock et al 2009. Biopython: freely available Python tools for computational
molecular biology and bioinformatics. Bioinformatics 25(11) 1422-3.
http://dx.doi.org/10.1093/bioinformatics/btp163 pmid:19304878.

This script is copyright 2011-2013 by Peter Cock, The James Hutton Institute UK.
All rights reserved. See accompanying text file for licence details (MIT
licence).

This is version 0.0.4 of the script.
"""
import sys

if "-v" in sys.argv or "--version" in sys.argv:
    print("v0.0.4")
    sys.exit(0)

def stop_err(msg, err=1):
    """Write the message to stderr and exit with the given return code."""
    sys.stderr.write(msg.rstrip() + "\n")
    sys.exit(err)

#Parse Command Line
try:
    tabular_file, old_col_arg, new_col_arg, in_file, seq_format, out_file = sys.argv[1:]
except ValueError:
    stop_err("Expected six arguments (tabular file, old col, new col, input file, format, output file), got %i:\n%s" % (len(sys.argv)-1, " ".join(sys.argv)))

#Column arguments use one-based counting and may carry a leading "c"
try:
    if old_col_arg.startswith("c"):
        old_column = int(old_col_arg[1:])-1
    else:
        old_column = int(old_col_arg)-1
except ValueError:
    stop_err("Expected column number, got %s" % old_col_arg)
try:
    #Test new_col_arg here (the original tested old_col_arg by mistake)
    if new_col_arg.startswith("c"):
        new_column = int(new_col_arg[1:])-1
    else:
        new_column = int(new_col_arg)-1
except ValueError:
    stop_err("Expected column number, got %s" % new_col_arg)
if old_column == new_column:
    stop_err("Old and new column arguments are the same!")

def parse_ids(tabular_file, old_col, new_col):
    """Read tabular file and record all specified ID mappings."""
    handle = open(tabular_file, "rU")
    for line in handle:
        if not line.startswith("#"):
            parts = line.rstrip("\n").split("\t")
            yield parts[old_col].strip(), parts[new_col].strip()
    handle.close()

#Load the rename mappings
rename = dict(parse_ids(tabular_file, old_column, new_column))
print("Loaded %i ID mappings" % len(rename))

#Rewrite the sequence file
if seq_format.lower()=="sff":
    #Use Biopython for this format
    renamed = 0
    def rename_seqrecords(records, mapping):
        global renamed #nasty, but practical!
        for record in records:
            try:
                record.id = mapping[record.id]
                renamed += 1
            except KeyError:
                pass
            yield record

    try:
        from Bio.SeqIO.SffIO import SffIterator, SffWriter
    except ImportError:
        stop_err("Requires Biopython 1.54 or later")

    try:
        from Bio.SeqIO.SffIO import ReadRocheXmlManifest
    except ImportError:
        #Prior to Biopython 1.56 this was a private function
        from Bio.SeqIO.SffIO import _sff_read_roche_index_xml as ReadRocheXmlManifest

    in_handle = open(in_file, "rb") #must be binary mode!
    try:
        manifest = ReadRocheXmlManifest(in_handle)
    except ValueError:
        manifest = None
    out_handle = open(out_file, "wb")
    writer = SffWriter(out_handle, xml=manifest)
    in_handle.seek(0) #start again after getting manifest
    count = writer.write_file(rename_seqrecords(SffIterator(in_handle), rename))
    out_handle.close()
    in_handle.close()
else:
    #Use Galaxy for FASTA, QUAL or FASTQ
    if seq_format.lower() in ["fasta", "csfasta"] \
    or seq_format.lower().startswith("qual"):
        from galaxy_utils.sequence.fasta import fastaReader, fastaWriter
        reader = fastaReader(open(in_file, "rU"))
        writer = fastaWriter(open(out_file, "w"))
        marker = ">"
    elif seq_format.lower().startswith("fastq"):
        from galaxy_utils.sequence.fastq import fastqReader, fastqWriter
        reader = fastqReader(open(in_file, "rU"))
        writer = fastqWriter(open(out_file, "w"))
        marker = "@"
    else:
        stop_err("Unsupported file type %r" % seq_format)
    #Now do the renaming
    count = 0
    renamed = 0
    for record in reader:
        #The [1:] is because the fastaReader leaves the > on the identifier,
        #likewise the fastqReader leaves the @ on the identifier
        try:
            idn, descr = record.identifier[1:].split(None, 1)
        except ValueError:
            idn = record.identifier[1:]
            descr = None
        if idn in rename:
            if descr:
                record.identifier = "%s%s %s" % (marker, rename[idn], descr)
            else:
                record.identifier = "%s%s" % (marker, rename[idn])
            renamed += 1
        writer.write(record)
        count += 1
    writer.close()
    reader.close()

print("Renamed %i out of %i records" % (renamed, count))
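
The FASTA renaming path of the script can be tried out without Galaxy or Biopython. The following is a minimal standalone sketch, not the script's own code: it assumes Python 3, reads from in-memory handles instead of files, and `rename_fasta` is a hypothetical helper that stands in for the `galaxy_utils` reader and writer. The `parse_ids` variant here takes an open handle rather than a filename, which is also an assumption made for the example.

```python
import io


def parse_ids(handle, old_col, new_col):
    """Yield (old_id, new_id) pairs from a tab-separated handle.

    Columns are zero-based here, as in the script after it has
    converted the one-based command line arguments.
    """
    for line in handle:
        if not line.startswith("#"):
            parts = line.rstrip("\n").split("\t")
            yield parts[old_col].strip(), parts[new_col].strip()


def rename_fasta(in_handle, out_handle, mapping):
    """Copy FASTA text, renaming mapped identifiers (hypothetical helper).

    Returns a (renamed, count) tuple like the totals the script reports.
    """
    renamed = count = 0
    for line in in_handle:
        if line.startswith(">"):
            count += 1
            # Split the header into identifier and optional description
            fields = line[1:].rstrip("\n").split(None, 1)
            if fields and fields[0] in mapping:
                fields[0] = mapping[fields[0]]
                renamed += 1
                line = ">" + " ".join(fields) + "\n"
        out_handle.write(line)
    return renamed, count


# Illustrative data only: a two-column mapping table and two FASTA records
table = io.StringIO("#old\tnew\nseq1\talpha\nseq3\tgamma\n")
fasta = io.StringIO(">seq1 first record\nACGT\n>seq2\nGGCC\n")
out = io.StringIO()

mapping = dict(parse_ids(table, 0, 1))
renamed, count = rename_fasta(fasta, out, mapping)
print("Renamed %i out of %i records" % (renamed, count))
result = out.getvalue()
```

As in the script, records whose identifier is missing from the mapping are written unchanged, and any description after the identifier is kept.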