Repository 'sort_by_tissue'
hg clone https://toolshed.g2.bx.psu.edu/repos/lnguyen/sort_by_tissue

Changeset 0:3155d867c056 (2017-09-15)
Commit message:
planemo upload
added:
HPA_selection.txt
Lung.txt
README.txt
Salivary.txt
Trash.txt
Trash3.txt
Trash_detail.txt
Trash_detail3.txt
__init__.py
hpa_tissue_distribution.py
hpa_tissue_distribution.xml
normal_tissue.csv
sort_by_tissue.py
sort_by_tissue.xml
test-data/IDs.txt
test-data/na_sort_by_tissue.txt
test-data/sort_by_tissue_output.txt
test-data/trash_detail_sort_by_tissue.txt
test-data/trash_sort_by_tissue.txt
diff -r 000000000000 -r 3155d867c056 HPA_selection.txt
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/HPA_selection.txt Fri Sep 15 11:04:37 2017 -0400
@@ -0,0 +1,1 @@
+Majority protein IDs Protein names Gene names iBAQ LYOP2 / iBAQ TNEG2 iBAQ LYOP3 / iBAQ TNEG3 Razor + unique peptides Filtered
diff -r 000000000000 -r 3155d867c056 Lung.txt
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/Lung.txt Fri Sep 15 11:04:37 2017 -0400
b'@@ -0,0 +1,12973 @@\n+TSPAN6\n+DPM1\n+SCYL3\n+C1orf112\n+FGR\n+CFH\n+FUCA2\n+GCLC\n+NFYA\n+NIPAL3\n+LAS1L\n+ENPP4\n+SEMA3F\n+CFTR\n+ANKIB1\n+CYP51A1\n+KRIT1\n+BAD\n+LAP3\n+CD99\n+HS3ST1\n+AOC1\n+HECW1\n+MAD1L1\n+LASP1\n+SNX11\n+TMEM176A\n+M6PR\n+KLHL13\n+CYP26B1\n+ICA1\n+DBNDD1\n+ALS2\n+CASP10\n+CFLAR\n+TFPI\n+NDUFAF7\n+RBM5\n+SARM1\n+PLXND1\n+AK2\n+CD38\n+FKBP4\n+KDM1A\n+RBM6\n+CAMKK1\n+RECQL\n+VPS50\n+HSPB6\n+ARHGAP33\n+NDUFAB1\n+PDK4\n+SLC22A16\n+ZMYND10\n+SLC25A13\n+ST7\n+CDC27\n+SLC4A1\n+HCCS\n+DVL2\n+PRSS22\n+UPF1\n+SKAP2\n+SLC25A5\n+CCDC109B\n+HOXA11\n+POLR2J\n+MEOX1\n+THSD7A\n+LIG3\n+RPAP3\n+ACSM3\n+AC004381.6\n+CIAPIN1\n+FAM214B\n+COPZ2\n+PRKAR2B\n+MSL3\n+CREBBP\n+BZRAP1\n+MPO\n+PON1\n+WDR54\n+CROT\n+ABCB4\n+KMT2E\n+RHBDD2\n+SOX8\n+IBTK\n+ZNF195\n+ITGAL\n+PDK2\n+ITGA3\n+ZFX\n+LAMP2\n+ITGA2B\n+C19orf60\n+CRLF1\n+OSBPL7\n+TMEM98\n+YBX2\n+KRT33A\n+ABCC8\n+CACNG3\n+TMEM132A\n+AP2B1\n+TAC1\n+ZNF263\n+CX3CL1\n+SPATA20\n+TNFRSF12A\n+MAP3K9\n+RALA\n+BAIAP2L1\n+KDM7A\n+AGK\n+ALDH3B1\n+TTC22\n+PHTF2\n+FARP2\n+USH1C\n+GGCT\n+IFRD1\n+COX10\n+GTF2IRD1\n+PAF1\n+VPS41\n+ARHGAP44\n+ELAC2\n+SCIN\n+ARSD\n+PNPLA4\n+ADIPOR2\n+PRSS21\n+MARK4\n+PROM1\n+CCDC124\n+CEACAM21\n+PAFAH1B1\n+NOS2\n+DNAH9\n+KIAA0100\n+SLC13A2\n+GAS7\n+TRAPPC6A\n+MATK\n+CEACAM7\n+CD79B\n+ST7L\n+TKTL1\n+PAX6\n+RPUSD1\n+LUC7L\n+CACNA2D2\n+BAIAP3\n+TSR3\n+PIGQ\n+CRAMP1\n+SELE\n+FMO3\n+E2F2\n+PSMB1\n+SYN1\n+JARID2\n+CDKL5\n+CDK11A\n+NADK\n+TFAP2B\n+DLEC1\n+CYTH3\n+SYPL1\n+CYB561\n+SPAG9\n+CELSR3\n+AASS\n+PLEKHG6\n+SS18L2\n+MPND\n+MGST1\n+CRY1\n+PGLYRP1\n+NFIX\n+ST3GAL1\n+MMP25\n+IL32\n+PKD1\n+MAPK8IP2\n+RHOBTB2\n+HEATR5B\n+SEC62\n+RPS20\n+CSDE1\n+UBE3C\n+REV3L\n+MASP2\n+IYD\n+FAM76A\n+TRAF3IP3\n+POMT2\n+VTA1\n+BAZ1B\n+RANBP9\n+SPRTN\n+METTL13\n+ZNF207\n+UQCRC1\n+STARD3NL\n+CD9\n+HHATL\n+NCAPD2\n+IFFO1\n+GIPR\n+PHF7\n+NISCH\n+STAB1\n+FUZ\n+SLC6A13\n+PRSS3\n+ZNF200\n+CD4\n+LRRC23\n+BTK\n+HFE\n+FYN\n+FMO1\n+TCEB3\n+CLCN6\n+MRC2\n+NME1-NME2\n+TSPAN9\n+BTBD7\n+APBA3\n+MKS1\n+ABHD5\n+AKAP8L\n+MBTD1\n+UTP18\n+RNF2
16\n+TTC19\n+PTBP1\n+DPF1\n+SYT7\n+LARS2\n+PIK3C2A\n+PLAUR\n+ANLN\n+WIZ\n+RABGAP1\n+DCN\n+QPCTL\n+PPP5C\n+CEP68\n+MAP4K3\n+TYROBP\n+TMEM159\n+GABRA3\n+BRCA1\n+ERCC1\n+CD22\n+MBTPS2\n+PRICKLE3\n+LTF\n+EXTL3\n+ELOVL5\n+ALOX5\n+CALCOCO1\n+UBR7\n+MAP4K5\n+EHD3\n+PSMC4\n+MAN2B2\n+SLC7A14\n+CLDN11\n+SLC25A39\n+MVP\n+NUB1\n+PGM3\n+RWDD2A\n+CLK1\n+POLR3B\n+ANGEL1\n+RNF14\n+DNASE1L1\n+DDX11\n+HEBP1\n+GPRC5A\n+MAMLD1\n+CD6\n+TACC3\n+UFL1\n+POLA2\n+ZC3H3\n+CAPN1\n+ACPP\n+MDH1\n+SLC30A9\n+MTMR11\n+COX15\n+CCDC88C\n+YAF2\n+WAS\n+DPEP1\n+BID\n+MATR3\n+NPC1L1\n+NUDCD3\n+ISL1\n+CHDH\n+IL20RA\n+CLCA1\n+CLCA4\n+GLT8D1\n+ATP2C1\n+IGF1\n+SLC38A5\n+RALBP1\n+RUFY3\n+CNTN1\n+SLC11A1\n+WWTR1\n+AGPS\n+CXorf56\n+ATP1A2\n+TTC27\n+ZNF582\n+VSIG2\n+PHLDB1\n+MARCO\n+CYP24A1\n+PRDM11\n+SYT13\n+SNAI2\n+CD74\n+HGF\n+ZRANB1\n+NCDN\n+ZFP64\n+MNAT1\n+SAMD4A\n+RUNX3\n+MRE11A\n+SERPINB1\n+CYP3A43\n+SLC7A9\n+SPAST\n+NRXN3\n+OSBPL5\n+CPS1\n+C8B\n+FHL1\n+RTFDC1\n+GABRA1\n+SLC45A4\n+RNF10\n+ZNF839\n+ZDHHC6\n+GRAMD1B\n+RNH1\n+NDUFS1\n+RB1CC1\n+ERP44\n+ALAS1\n+BIRC3\n+AKAP11\n+GLRX2\n+STRAP\n+ABCC2\n+DEF6\n+GCLM\n+UBR2\n+EHD2\n+DEPDC1\n+CCDC28A\n+RRAGD\n+HSF2\n+PHF20\n+HSD17B6\n+NR1H3\n+TYMP\n+NCAPH2\n+TOMM34\n+SEC63\n+KPNA6\n+VIM\n+FAS\n+RNASET2\n+CD44\n+KCNG1\n+AGPAT4\n+SLAMF7\n+BTN3A1\n+MIPEP\n+PRKCH\n+IFNGR1\n+B4GALT7\n+VRK2\n+TNFRSF1B\n+VEZT\n+POU2F2\n+BRD9\n+SNX1\n+TBPL1\n+ARNTL2\n+BCLAF1\n+SLC39A9\n+ANK1\n+TFB1M\n+RABEP1\n+HMGB3\n+BAK1\n+IKZF2\n+GRN\n+FAM13B\n+ARHGAP31\n+CENPQ\n+SARS\n+RANBP3\n+TSSC1\n+PNPLA6\n+ALG1\n+ZCCHC8\n+ABCF2\n+CHPF2\n+FUT8\n+UBA6\n+GAB2\n+ATP6V0A1\n+PIAS1\n+SLC4A7\n+APBA2\n+MAP2K3\n+EFCAB1\n+ASTE1\n+RNF19A\n+PEX3\n+GABARAPL2\n+MYOC\n+SH3YL1\n+FAM136A\n+VCL\n+DEPDC1B\n+NSMAF\n+ADSS\n+STAP1\n+TIMP2\n+RFC1\n+TBC1D23\n+CUL3\n+MYOM2\n+OTC\n+ZZZ3\n+SLC18A1\n+USP2\n+CASR\n+TUBG2\n+RPL26L1\n+NSUN2\n+FBXO42\n+MFAP3\n+MRI1\n+METTL1\n+AGA\n+PI4K2B\n+BOD1L1\n+MAT2B\n+EDC4\n+TRIO\n+VCAN\n+CLEC16A\n+MSR1\n+CDH1\n+SKIV2L2\n+DNAH5\n+ZFYVE16\n+FAM65A\n+C6\n+RAI14\n+SOX30\n+PNKP\n+BEST2\n+PHLPP2
\n+SPDL1\n+STAU2\n+PQLC2\n+PHF23\n+CDH10\n+INPP4A\n+RAB27B\n+PSMA4\n+LSG1\n+PARP3\n+TNC\n+THAP3\n+FAM65C\n+AIFM2\n+C2orf83\n+SPATA7\n+RETSAT\n+CAPG\n+ZPBP\n+TG\n+BARX2\n+DCUN1D1\n+JADE2\n+LCP2\n+TRIT1\n+ADRB1\n+CUL7\n+CTNNA1\n+PHKA2\n+CNTLN\n+EPHA3\n+HSPA5\n+DSG2\n+GEMIN8\n+OFD1\n+GPM6B\n+MAGEC2\n+PREX2\n+WDR37\n+YTHDC2\n+CTPS2\n+ATP6V1H\n+POLR2B\n+FAM214A\n+ARAP2\n+TPR\n+CP\n+KIAA0556\n+DTNBP1\n+XK\n+C12orf4\n+SCML1\n+WWC3\n+ARHGAP6\n+FAM184B\n+MAP4\n+GOPC\n+USP28\n+HDAC9\n+NOP16\n+CC2D2A\n+RRM2B\n+ZNF800\n+SNX29\n+MRPS10\n+GUCA1A\n+RSF1\n+VPS'..b'ZBED5\n+GAGE12F\n+CT47A1\n+TAS2R39\n+TSPY10\n+CT47B1\n+ZNF853\n+ARHGEF38\n+USP17L8\n+B3GNT9\n+CDKN2AIPNL\n+CKMT1B\n+PATE4\n+RGL2\n+KIFC1\n+C2orf74\n+GAGE12C\n+AMY1A\n+CT47A5\n+TSPY9P\n+LRRC37A2\n+C9orf69\n+PAGE2B\n+TXNDC5\n+RNF103\n+RBM14\n+KLHL41\n+C2orf61\n+NME1\n+TNFSF12\n+CDRT4\n+APOBEC3G\n+WBP1\n+MRPS17\n+DEFA3\n+GET4\n+C1orf226\n+ADSL\n+LILRA4\n+TEX35\n+AMY2B\n+LY6G5B\n+PSMB9\n+PCDHGC3\n+COX19\n+DEFA1B\n+ACAD11\n+PPIL3\n+TNFRSF13B\n+AQP1\n+ISY1\n+PNMA2\n+ARHGEF25\n+TMEM189\n+RDH14\n+MIF\n+NSUN6\n+HLA-DOB\n+YAE1D1\n+CRCP\n+RPL36A\n+PDXP\n+EGFL8\n+ATP5J2\n+SSX2\n+ARPC1A\n+ATP5O\n+PLEKHO2\n+C8orf58\n+PISD\n+HOGA1\n+PWP2\n+PI4KA\n+AMACR\n+TCP10L\n+ARFGAP3\n+PEG10\n+CT47A2\n+EIF6\n+RBMY1E\n+MRPL20\n+ARPIN\n+SERPINB10\n+HLA-DMB\n+RGAG4\n+AP5Z1\n+RBMY1B\n+EIF4EBP3\n+PSG11\n+PSG4\n+MRPL33\n+MICAL3\n+STON1\n+PGBD3\n+PRAF2\n+EFNA4\n+C4orf48\n+NAT6\n+AMY2A\n+TNFRSF6B\n+UPK3B\n+IL10RB\n+CFB\n+ZNF487\n+WDR92\n+NME2\n+LEFTY1\n+CFAP57\n+NPIPB5\n+TTC4\n+ZMYM6NB\n+JMJD7\n+APOBEC3D\n+MRPS6\n+ZNF512\n+GSTA1\n+ACY1\n+NFS1\n+DDOST\n+TMEM199\n+GSTA2\n+P2RY11\n+TMEM141\n+XXbac-BPG116M5.17\n+DBNDD2\n+RBMY1D\n+ETV5\n+CFHR1\n+RBM12\n+SCARF2\n+APOBEC3C\n+CCDC13\n+ASPRV1\n+C4A\n+HBB\n+CRYBB2\n+N4BP2L2\n+CEBPA\n+H2AFJ\n+PGAM5\n+TWF2\n+MARS2\n+USP51\n+BCKDHA\n+INSL3\n+ADH1C\n+CDK11B\n+ABHD14A\n+RBM14-RBM4\n+FMN1\n+ATP5J2-PTCD1\n+HAUS5\n+TMEM150C\n+NAIP\n+SPATS1\n+HS3ST5\n+C15orf38-AP3S2\n+YJEFN3\n+XXbac-BPG246D15.9\n+KIAA1456\n+SMIM20\n+KIAA1210
\n+RP5-877J2.1\n+CHCHD10\n+CDK3\n+GPR162\n+IQCJ-SCHIP1\n+SEPP1\n+PRODH2\n+ZNF674\n+ZNF345\n+SHANK3\n+ZNF550\n+PCDHA12\n+RGS21\n+CTC-534A2.2\n+CCDC71L\n+HOXA10\n+SERPINE3\n+C1orf210\n+SMIM18\n+ALG11\n+PRKDC\n+UTP14C\n+ZNF260\n+LYN\n+PINX1\n+PCDHGB7\n+NPIPB11\n+LRRC24\n+SIGLEC14\n+AP5B1\n+CHMP4A\n+ARMS2\n+INS\n+RP11-468E2.1\n+EEF1G\n+CKLF-CMTM1\n+SLC22A18AS\n+NPIPA2\n+BORCS8\n+RP11-872D17.8\n+DPP3\n+ANKHD1-EIF4EBP3\n+SAA2-SAA4\n+CHMP1B\n+EID3\n+FXYD6-FXYD2\n+EID1\n+HCAR3\n+CARD18\n+FDXACB1\n+CTC-435M10.3\n+CYP2A6\n+CTSO\n+DYX1C1\n+ZNF432\n+HMBS\n+ASIC5\n+POLG2\n+ZNF350\n+PGA5\n+HP\n+KIAA1147\n+LSM14A\n+TAS2R38\n+MGAM\n+FNTB\n+CNPY2\n+MGAM2\n+CUX1\n+MAP1LC3B2\n+CHURC1\n+C17orf49\n+PDF\n+RNASE12\n+RP11-574F21.3\n+SPESP1\n+BCL2L2-PABPN1\n+SYNJ2BP-COX16\n+ERCC6-PGBD3\n+DUXA\n+CEP95\n+RP11-407N17.3\n+TUBB3\n+TSPY1\n+NDUFC2-KCTD14\n+SMIM6\n+RP11-298I3.5\n+ITGB3\n+ZHX1-C8orf76\n+GH1\n+THTPA\n+MRPL46\n+RBM15B\n+HOXB7\n+FRRS1L\n+MRC1\n+C16orf95\n+RP11-529K1.3\n+CCPG1\n+SULT1A3\n+TMEM178B\n+EPPK1\n+BOP1\n+PECAM1\n+GAN\n+C15orf65\n+HPR\n+ISY1-RAB43\n+CORO7\n+RP13-1032I1.10\n+MRPL12\n+C19orf84\n+FAM58A\n+GTF2I\n+ZNF234\n+XXbac-BPG32J3.22\n+MYZAP\n+SRSF8\n+IKBKE\n+MSMB\n+NBPF11\n+MYH4\n+OTUD7B\n+GJA5\n+RBP3\n+RBM8A\n+TIMM23\n+RNF115\n+AP000275.65\n+RPL17\n+FSBP\n+TXNIP\n+SRGAP2\n+RASSF5\n+STRADA\n+NBPF15\n+NCOA4\n+GDF10\n+AARSD1\n+AC006538.4\n+UPK3BL\n+APOC4\n+ZNF285\n+S1PR2\n+CGB1\n+FDX1L\n+ZNF224\n+NDUFA7\n+AC011513.3\n+SSX4\n+NBPF12\n+SLC6A14\n+SMIM17\n+FAM156A\n+SSX2B\n+AC018755.18\n+CTAG1A\n+CTD-2207O23.3\n+CTD-3105H18.16\n+CT45A1\n+CTD-2521M24.10\n+CALR3\n+CT45A3\n+TRABD2B\n+IKBKG\n+SPIB\n+CT45A10\n+NBPF9\n+SSX4B\n+COMMD3-BMI1\n+C7orf55-LUC7L2\n+HIST2H4B\n+NBPF14\n+TAF15\n+HSPE1-MOB4\n+GAS2L2\n+RPS10-NUDT3\n+HIST2H4A\n+RASL10B\n+CT45A9\n+AC240274.1\n+NBPF10\n+CT45A2\n+LIX1L\n+FAM231D\n+HIST2H2AA4\n+NUDT3\n+POM121C\n+FAM47E-STBD1\n+CFAP206\n+DOC2B\n+DCP1A\n+C2orf15\n+GRIN2B\n+NBPF26\n+ZBTB8B\n+HIST1H4K\n+CWC25\n+SMIM11B\n+CT45A7\n+HIST1H2BM\n+AL592183.1\n+CYFIP1\n+HIST1H2BG\n
+NOL12\n+HIST1H3G\n+SOCS7\n+HIST1H3B\n+GAGE13\n+CBSL\n+ADRA2B\n+HIST1H2BE\n+TPTE\n+TBC1D3L\n+WBSCR16\n+RIMBP3B\n+HIST1H4F\n+HIST1H2BO\n+CCL23\n+HIST1H3E\n+F8A2\n+TBC1D3B\n+HIST1H2AH\n+MLLT6\n+SYNRG\n+NUDT18\n+GAGE2E\n+HIST1H4L\n+LENG9\n+HIST1H2AK\n+AC171558.2\n+CCL4\n+SGK223\n+HIST1H3I\n+FCGBP\n+HNF1B\n+HIST1H4G\n+AATF\n+HIST1H2BH\n+HIST1H3A\n+LYZL6\n+RP11-449H3.3\n+RIMBP3\n+ARHGAP23\n+TUBGCP5\n+U2AF1L5\n+PRSS2\n+DUSP14\n+ORAI1\n+HIST1H4I\n+PIK3R6\n+PIP4K2B\n+AC004556.1\n+HIST1H2AJ\n+CCL14\n+HIST1H2BB\n+DACH1\n+HIST1H2AL\n+RP4-608O15.3\n+HIST1H4E\n+HIST1H2AE\n+TYW1B\n+F8A3\n+HIST1H4D\n+AC007325.2\n+F8A1\n+HIST1H2BF\n+SRCIN1\n+MARCKS\n+ZNF670\n+GPIHBP1\n+RP13-347D8.7\n+NEFL\n+ABC7-42404400C24.1\n+HIST1H3F\n+PSMB3\n+CISD3\n+CT45A8\n+ZNF8\n+SSTR3\n+MYO19\n+HIST1H3C\n+CT45A6\n+TBC1D3C\n+GGNBP2\n+HIST1H2AB\n+DHRS11\n+ACACA\n+HIST1H2BI\n+MRM1\n+HIST1H4A\n+RP1-321E8.5\n+HIST1H2AM\n+HIST1H4B\n+HIST1H3H\n+MRPL45\n+AC006449.2\n+AC090498.1\n+CH507-9B2.3\n+U51561.1\n+AL137860.1\n+PAGR1\n\\ No newline at end of file\n'
diff -r 000000000000 -r 3155d867c056 README.txt
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/README.txt Fri Sep 15 11:04:37 2017 -0400
@@ -0,0 +1,8 @@
+**Authors**
+
+T.P. Lien Nguyen, Florence Combes, Yves Vandenbrouck CEA, INSERM, CNRS, Grenoble-Alpes University, BIG Institute, FR
+Sandra Dérozier, Olivier Rué, Christophe Caron, Valentin Loux INRA, Paris-Saclay University, MAIAGE Unit, Migale Bioinformatics platform
+
+This work has been partially funded through the French National Agency for Research (ANR) IFB project.
+
+Contact support@proteore.org for any questions or concerns about the Galaxy implementation of this tool.
diff -r 000000000000 -r 3155d867c056 Salivary.txt
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/Salivary.txt Fri Sep 15 11:04:37 2017 -0400
b'@@ -0,0 +1,12954 @@\n+TSPAN6\n+DPM1\n+SCYL3\n+C1orf112\n+FGR\n+CFH\n+FUCA2\n+GCLC\n+NFYA\n+NIPAL3\n+LAS1L\n+ENPP4\n+SEMA3F\n+CFTR\n+ANKIB1\n+CYP51A1\n+KRIT1\n+BAD\n+LAP3\n+CD99\n+HS3ST1\n+AOC1\n+HECW1\n+MAD1L1\n+LASP1\n+SNX11\n+TMEM176A\n+M6PR\n+KLHL13\n+CYP26B1\n+ICA1\n+DBNDD1\n+ALS2\n+CASP10\n+CFLAR\n+TFPI\n+NDUFAF7\n+RBM5\n+SARM1\n+PLXND1\n+AK2\n+CD38\n+FKBP4\n+KDM1A\n+RBM6\n+CAMKK1\n+RECQL\n+VPS50\n+HSPB6\n+ARHGAP33\n+NDUFAB1\n+PDK4\n+SLC22A16\n+ZMYND10\n+SLC25A13\n+ST7\n+CDC27\n+SLC4A1\n+HCCS\n+DVL2\n+PRSS22\n+UPF1\n+SKAP2\n+SLC25A5\n+CCDC109B\n+HOXA11\n+POLR2J\n+MEOX1\n+THSD7A\n+LIG3\n+RPAP3\n+ACSM3\n+AC004381.6\n+CIAPIN1\n+FAM214B\n+COPZ2\n+PRKAR2B\n+MSL3\n+CREBBP\n+BZRAP1\n+MPO\n+PON1\n+WDR54\n+CROT\n+ABCB4\n+KMT2E\n+RHBDD2\n+SOX8\n+IBTK\n+ZNF195\n+ITGAL\n+PDK2\n+ITGA3\n+ZFX\n+LAMP2\n+ITGA2B\n+C19orf60\n+CRLF1\n+OSBPL7\n+TMEM98\n+YBX2\n+KRT33A\n+ABCC8\n+CACNG3\n+TMEM132A\n+AP2B1\n+TAC1\n+ZNF263\n+CX3CL1\n+SPATA20\n+TNFRSF12A\n+MAP3K9\n+RALA\n+BAIAP2L1\n+KDM7A\n+AGK\n+ALDH3B1\n+TTC22\n+PHTF2\n+FARP2\n+USH1C\n+GGCT\n+IFRD1\n+COX10\n+GTF2IRD1\n+PAF1\n+VPS41\n+ARHGAP44\n+ELAC2\n+SCIN\n+ARSD\n+PNPLA4\n+ADIPOR2\n+PRSS21\n+MARK4\n+PROM1\n+CCDC124\n+CEACAM21\n+PAFAH1B1\n+NOS2\n+DNAH9\n+KIAA0100\n+SLC13A2\n+GAS7\n+TRAPPC6A\n+MATK\n+CEACAM7\n+CD79B\n+ST7L\n+TKTL1\n+PAX6\n+RPUSD1\n+LUC7L\n+CACNA2D2\n+BAIAP3\n+TSR3\n+PIGQ\n+CRAMP1\n+SELE\n+FMO3\n+E2F2\n+PSMB1\n+SYN1\n+JARID2\n+CDKL5\n+CDK11A\n+NADK\n+TFAP2B\n+DLEC1\n+CYTH3\n+SYPL1\n+CYB561\n+SPAG9\n+CELSR3\n+AASS\n+PLEKHG6\n+SS18L2\n+MPND\n+MGST1\n+CRY1\n+PGLYRP1\n+NFIX\n+ST3GAL1\n+MMP25\n+IL32\n+PKD1\n+MAPK8IP2\n+RHOBTB2\n+HEATR5B\n+SEC62\n+RPS20\n+CSDE1\n+UBE3C\n+REV3L\n+MASP2\n+IYD\n+FAM76A\n+TRAF3IP3\n+POMT2\n+VTA1\n+BAZ1B\n+RANBP9\n+SPRTN\n+METTL13\n+ZNF207\n+UQCRC1\n+STARD3NL\n+CD9\n+HHATL\n+NCAPD2\n+IFFO1\n+GIPR\n+PHF7\n+NISCH\n+STAB1\n+FUZ\n+SLC6A13\n+PRSS3\n+ZNF200\n+CD4\n+LRRC23\n+BTK\n+HFE\n+FYN\n+FMO1\n+TCEB3\n+CLCN6\n+MRC2\n+NME1-NME2\n+TSPAN9\n+BTBD7\n+APBA3\n+MKS1\n+ABHD5\n+AKAP8L\n+MBTD1\n+UTP18\n+RNF2
16\n+TTC19\n+PTBP1\n+DPF1\n+SYT7\n+LARS2\n+PIK3C2A\n+PLAUR\n+ANLN\n+WIZ\n+RABGAP1\n+DCN\n+QPCTL\n+PPP5C\n+CEP68\n+MAP4K3\n+TYROBP\n+TMEM159\n+GABRA3\n+BRCA1\n+ERCC1\n+CD22\n+MBTPS2\n+PRICKLE3\n+LTF\n+EXTL3\n+ELOVL5\n+ALOX5\n+CALCOCO1\n+UBR7\n+MAP4K5\n+EHD3\n+PSMC4\n+MAN2B2\n+SLC7A14\n+CLDN11\n+SLC25A39\n+MVP\n+NUB1\n+PGM3\n+RWDD2A\n+CLK1\n+POLR3B\n+ANGEL1\n+RNF14\n+DNASE1L1\n+DDX11\n+HEBP1\n+GPRC5A\n+MAMLD1\n+CD6\n+TACC3\n+UFL1\n+POLA2\n+ZC3H3\n+CAPN1\n+ACPP\n+MDH1\n+SLC30A9\n+MTMR11\n+COX15\n+CCDC88C\n+YAF2\n+WAS\n+DPEP1\n+BID\n+MATR3\n+NPC1L1\n+NUDCD3\n+ISL1\n+CHDH\n+IL20RA\n+CLCA1\n+CLCA4\n+GLT8D1\n+ATP2C1\n+IGF1\n+SLC38A5\n+RALBP1\n+RUFY3\n+CNTN1\n+SLC11A1\n+WWTR1\n+AGPS\n+CXorf56\n+ATP1A2\n+TTC27\n+ZNF582\n+VSIG2\n+PHLDB1\n+MARCO\n+CYP24A1\n+PRDM11\n+SYT13\n+SNAI2\n+CD74\n+HGF\n+ZRANB1\n+NCDN\n+ZFP64\n+MNAT1\n+SAMD4A\n+RUNX3\n+MRE11A\n+SERPINB1\n+CYP3A43\n+SLC7A9\n+SPAST\n+NRXN3\n+OSBPL5\n+CPS1\n+C8B\n+FHL1\n+RTFDC1\n+GABRA1\n+SLC45A4\n+RNF10\n+ZNF839\n+ZDHHC6\n+GRAMD1B\n+RNH1\n+NDUFS1\n+RB1CC1\n+ERP44\n+ALAS1\n+BIRC3\n+AKAP11\n+GLRX2\n+STRAP\n+ABCC2\n+DEF6\n+GCLM\n+UBR2\n+EHD2\n+DEPDC1\n+CCDC28A\n+RRAGD\n+HSF2\n+PHF20\n+HSD17B6\n+NR1H3\n+TYMP\n+NCAPH2\n+TOMM34\n+SEC63\n+KPNA6\n+VIM\n+FAS\n+RNASET2\n+CD44\n+KCNG1\n+AGPAT4\n+SLAMF7\n+BTN3A1\n+MIPEP\n+PRKCH\n+IFNGR1\n+B4GALT7\n+VRK2\n+TNFRSF1B\n+VEZT\n+POU2F2\n+BRD9\n+SNX1\n+TBPL1\n+ARNTL2\n+BCLAF1\n+SLC39A9\n+ANK1\n+TFB1M\n+RABEP1\n+HMGB3\n+BAK1\n+IKZF2\n+GRN\n+FAM13B\n+ARHGAP31\n+CENPQ\n+SARS\n+RANBP3\n+TSSC1\n+PNPLA6\n+ALG1\n+ZCCHC8\n+ABCF2\n+CHPF2\n+FUT8\n+UBA6\n+GAB2\n+ATP6V0A1\n+PIAS1\n+SLC4A7\n+APBA2\n+MAP2K3\n+EFCAB1\n+ASTE1\n+RNF19A\n+PEX3\n+GABARAPL2\n+MYOC\n+SH3YL1\n+FAM136A\n+VCL\n+DEPDC1B\n+NSMAF\n+ADSS\n+STAP1\n+TIMP2\n+RFC1\n+TBC1D23\n+CUL3\n+MYOM2\n+OTC\n+ZZZ3\n+SLC18A1\n+USP2\n+CASR\n+TUBG2\n+RPL26L1\n+NSUN2\n+FBXO42\n+MFAP3\n+MRI1\n+METTL1\n+AGA\n+PI4K2B\n+BOD1L1\n+MAT2B\n+EDC4\n+TRIO\n+VCAN\n+CLEC16A\n+MSR1\n+CDH1\n+SKIV2L2\n+DNAH5\n+ZFYVE16\n+FAM65A\n+C6\n+RAI14\n+SOX30\n+PNKP\n+BEST2\n+PHLPP2
\n+SPDL1\n+STAU2\n+PQLC2\n+PHF23\n+CDH10\n+INPP4A\n+RAB27B\n+PSMA4\n+LSG1\n+PARP3\n+TNC\n+THAP3\n+FAM65C\n+AIFM2\n+C2orf83\n+SPATA7\n+RETSAT\n+CAPG\n+ZPBP\n+TG\n+BARX2\n+DCUN1D1\n+JADE2\n+LCP2\n+TRIT1\n+ADRB1\n+CUL7\n+CTNNA1\n+PHKA2\n+CNTLN\n+EPHA3\n+HSPA5\n+DSG2\n+GEMIN8\n+OFD1\n+GPM6B\n+MAGEC2\n+PREX2\n+WDR37\n+YTHDC2\n+CTPS2\n+ATP6V1H\n+POLR2B\n+FAM214A\n+ARAP2\n+TPR\n+CP\n+KIAA0556\n+DTNBP1\n+XK\n+C12orf4\n+SCML1\n+WWC3\n+ARHGAP6\n+FAM184B\n+MAP4\n+GOPC\n+USP28\n+HDAC9\n+NOP16\n+CC2D2A\n+RRM2B\n+ZNF800\n+SNX29\n+MRPS10\n+GUCA1A\n+RSF1\n+VPS'..b'47A3\n+CLEC2L\n+ZBED5\n+GAGE12F\n+CT47A1\n+TAS2R39\n+TSPY10\n+CT47B1\n+ZNF853\n+ARHGEF38\n+USP17L8\n+B3GNT9\n+CDKN2AIPNL\n+CKMT1B\n+PATE4\n+RGL2\n+KIFC1\n+C2orf74\n+GAGE12C\n+AMY1A\n+CT47A5\n+TSPY9P\n+LRRC37A2\n+C9orf69\n+PAGE2B\n+TXNDC5\n+RNF103\n+RBM14\n+KLHL41\n+C2orf61\n+NME1\n+TNFSF12\n+CDRT4\n+APOBEC3G\n+WBP1\n+MRPS17\n+DEFA3\n+GET4\n+C1orf226\n+ADSL\n+LILRA4\n+TEX35\n+AMY2B\n+LY6G5B\n+PSMB9\n+PCDHGC3\n+COX19\n+DEFA1B\n+ACAD11\n+PPIL3\n+TNFRSF13B\n+AQP1\n+ISY1\n+PNMA2\n+ARHGEF25\n+TMEM189\n+RDH14\n+MIF\n+NSUN6\n+HLA-DOB\n+YAE1D1\n+CRCP\n+RPL36A\n+PDXP\n+EGFL8\n+ATP5J2\n+SSX2\n+ARPC1A\n+ATP5O\n+PLEKHO2\n+C8orf58\n+PISD\n+HOGA1\n+PWP2\n+PI4KA\n+AMACR\n+TCP10L\n+ARFGAP3\n+PEG10\n+CT47A2\n+EIF6\n+RBMY1E\n+MRPL20\n+ARPIN\n+SERPINB10\n+HLA-DMB\n+RGAG4\n+AP5Z1\n+RBMY1B\n+EIF4EBP3\n+PSG11\n+PSG4\n+MRPL33\n+MICAL3\n+STON1\n+PGBD3\n+PRAF2\n+EFNA4\n+C4orf48\n+NAT6\n+AMY2A\n+TNFRSF6B\n+UPK3B\n+IL10RB\n+CFB\n+ZNF487\n+WDR92\n+NME2\n+LEFTY1\n+CFAP57\n+NPIPB5\n+TTC4\n+ZMYM6NB\n+JMJD7\n+APOBEC3D\n+MRPS6\n+ZNF512\n+GSTA1\n+ACY1\n+NFS1\n+DDOST\n+TMEM199\n+GSTA2\n+P2RY11\n+TMEM141\n+XXbac-BPG116M5.17\n+DBNDD2\n+RBMY1D\n+ETV5\n+CFHR1\n+RBM12\n+SCARF2\n+APOBEC3C\n+CCDC13\n+ASPRV1\n+C4A\n+HBB\n+CRYBB2\n+N4BP2L2\n+CEBPA\n+H2AFJ\n+PGAM5\n+TWF2\n+MARS2\n+USP51\n+BCKDHA\n+INSL3\n+ADH1C\n+CDK11B\n+ABHD14A\n+RBM14-RBM4\n+FMN1\n+ATP5J2-PTCD1\n+HAUS5\n+TMEM150C\n+NAIP\n+SPATS1\n+HS3ST5\n+C15orf38-AP3S2\n+YJEFN3\n+XXbac-BPG246D15.9\n+KIAA1456\n+S
MIM20\n+KIAA1210\n+RP5-877J2.1\n+CHCHD10\n+CDK3\n+GPR162\n+IQCJ-SCHIP1\n+SEPP1\n+PRODH2\n+ZNF674\n+ZNF345\n+SHANK3\n+ZNF550\n+PCDHA12\n+RGS21\n+CTC-534A2.2\n+CCDC71L\n+HOXA10\n+SERPINE3\n+C1orf210\n+SMIM18\n+ALG11\n+PRKDC\n+UTP14C\n+ZNF260\n+LYN\n+PINX1\n+PCDHGB7\n+NPIPB11\n+LRRC24\n+SIGLEC14\n+AP5B1\n+CHMP4A\n+ARMS2\n+INS\n+RP11-468E2.1\n+EEF1G\n+CKLF-CMTM1\n+SLC22A18AS\n+NPIPA2\n+BORCS8\n+RP11-872D17.8\n+DPP3\n+ANKHD1-EIF4EBP3\n+SAA2-SAA4\n+CHMP1B\n+EID3\n+FXYD6-FXYD2\n+EID1\n+HCAR3\n+CARD18\n+FDXACB1\n+CTC-435M10.3\n+CYP2A6\n+CTSO\n+DYX1C1\n+ZNF432\n+HMBS\n+ASIC5\n+POLG2\n+ZNF350\n+PGA5\n+HP\n+KIAA1147\n+LSM14A\n+TAS2R38\n+MGAM\n+FNTB\n+CNPY2\n+MGAM2\n+CUX1\n+MAP1LC3B2\n+CHURC1\n+C17orf49\n+PDF\n+RNASE12\n+RP11-574F21.3\n+SPESP1\n+BCL2L2-PABPN1\n+SYNJ2BP-COX16\n+ERCC6-PGBD3\n+DUXA\n+CEP95\n+RP11-407N17.3\n+TUBB3\n+TSPY1\n+NDUFC2-KCTD14\n+SMIM6\n+RP11-298I3.5\n+ITGB3\n+ZHX1-C8orf76\n+GH1\n+THTPA\n+MRPL46\n+RBM15B\n+HOXB7\n+FRRS1L\n+MRC1\n+C16orf95\n+RP11-529K1.3\n+CCPG1\n+SULT1A3\n+TMEM178B\n+EPPK1\n+BOP1\n+PECAM1\n+GAN\n+C15orf65\n+HPR\n+ISY1-RAB43\n+CORO7\n+RP13-1032I1.10\n+MRPL12\n+C19orf84\n+FAM58A\n+GTF2I\n+ZNF234\n+XXbac-BPG32J3.22\n+MYZAP\n+SRSF8\n+IKBKE\n+MSMB\n+NBPF11\n+MYH4\n+OTUD7B\n+GJA5\n+RBP3\n+RBM8A\n+TIMM23\n+RNF115\n+AP000275.65\n+RPL17\n+FSBP\n+TXNIP\n+SRGAP2\n+RASSF5\n+STRADA\n+NBPF15\n+NCOA4\n+GDF10\n+AARSD1\n+AC006538.4\n+UPK3BL\n+APOC4\n+ZNF285\n+S1PR2\n+CGB1\n+FDX1L\n+ZNF224\n+NDUFA7\n+AC011513.3\n+SSX4\n+NBPF12\n+SLC6A14\n+SMIM17\n+FAM156A\n+SSX2B\n+AC018755.18\n+CTAG1A\n+CTD-2207O23.3\n+CTD-3105H18.16\n+CT45A1\n+CTD-2521M24.10\n+CALR3\n+CT45A3\n+TRABD2B\n+IKBKG\n+SPIB\n+CT45A10\n+NBPF9\n+SSX4B\n+COMMD3-BMI1\n+C7orf55-LUC7L2\n+HIST2H4B\n+NBPF14\n+TAF15\n+HSPE1-MOB4\n+GAS2L2\n+RPS10-NUDT3\n+HIST2H4A\n+RASL10B\n+CT45A9\n+AC240274.1\n+NBPF10\n+CT45A2\n+LIX1L\n+FAM231D\n+HIST2H2AA4\n+NUDT3\n+POM121C\n+FAM47E-STBD1\n+CFAP206\n+DOC2B\n+DCP1A\n+C2orf15\n+GRIN2B\n+NBPF26\n+ZBTB8B\n+HIST1H4K\n+CWC25\n+SMIM11B\n+CT45A7\n+HIST1H2BM\n+AL592183.1\n+CYFI
P1\n+HIST1H2BG\n+NOL12\n+HIST1H3G\n+HIST1H3B\n+GAGE13\n+CBSL\n+ADRA2B\n+HIST1H2BE\n+TPTE\n+TBC1D3L\n+WBSCR16\n+RIMBP3B\n+HIST1H4F\n+HIST1H2BO\n+CCL23\n+HIST1H3E\n+F8A2\n+TBC1D3B\n+HIST1H2AH\n+SYNRG\n+NUDT18\n+GAGE2E\n+HIST1H4L\n+LENG9\n+HIST1H2AK\n+AC171558.2\n+CCL4\n+SGK223\n+HIST1H3I\n+FCGBP\n+HNF1B\n+HIST1H4G\n+AATF\n+HIST1H2BH\n+HIST1H3A\n+LYZL6\n+RP11-449H3.3\n+RIMBP3\n+ARHGAP23\n+TUBGCP5\n+U2AF1L5\n+PRSS2\n+DUSP14\n+ORAI1\n+HIST1H4I\n+PIK3R6\n+PIP4K2B\n+AC004556.1\n+HIST1H2AJ\n+CCL14\n+HIST1H2BB\n+DACH1\n+HIST1H2AL\n+RP4-608O15.3\n+HIST1H4E\n+HIST1H2AE\n+TYW1B\n+F8A3\n+HIST1H4D\n+AC007325.2\n+F8A1\n+HIST1H2BF\n+SRCIN1\n+MARCKS\n+ZNF670\n+GPIHBP1\n+RP13-347D8.7\n+NEFL\n+ABC7-42404400C24.1\n+HIST1H3F\n+PSMB3\n+CISD3\n+CT45A8\n+ZNF8\n+SSTR3\n+MYO19\n+HIST1H3C\n+CT45A6\n+TBC1D3C\n+GGNBP2\n+HIST1H2AB\n+DHRS11\n+ACACA\n+HIST1H2BI\n+MRM1\n+HIST1H4A\n+RP1-321E8.5\n+HIST1H2AM\n+HIST1H4B\n+HIST1H3H\n+MRPL45\n+AC006449.2\n+AC090498.1\n+CH507-9B2.3\n+U51561.1\n+AL137860.1\n+PAGR1\n\\ No newline at end of file\n'
diff -r 000000000000 -r 3155d867c056 __init__.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/__init__.py Fri Sep 15 11:04:37 2017 -0400
@@ -0,0 +1,1 @@
+
diff -r 000000000000 -r 3155d867c056 hpa_tissue_distribution.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/hpa_tissue_distribution.py Fri Sep 15 11:04:37 2017 -0400
@@ -0,0 +1,166 @@
+import argparse
+import re
+
+def options():
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--input",nargs="+", required=True, help="List of IDs")
+    parser.add_argument("--hpa", required=True, help="HPA file")
+    parser.add_argument("--tissues_del", required=True, help="Comma-separated list of tissues; genes expressed in any of them are discarded")
+    parser.add_argument("--tissues_keep", help="Comma-separated list of tissues; genes expressed in any of them are kept even if also expressed in a tissues_del tissue")
+    parser.add_argument("-o", "--output", default="HPA_selection.txt")
+    parser.add_argument("--trash", default="Trash.txt", help="Write filtered genes into a file")
+    parser.add_argument("--trash_file_detail", default="Trash_detail.txt", help="Write filtered genes with detailed information into a file")
+    parser.add_argument("--na_file", default="NaN.txt", help="Write genes whose names are not found in the HPA file")
+    parser.add_argument("--ncol", default="None", help="Column to filter, e.g. c1 for the first column")
+
+    args = parser.parse_args()
+    #print(args.mq, args.hpa, args.tissues_del, args.tissues_keep, args.output, args.trash, args.trash_file_detail)
+
+    filterHPA(args.input, args.hpa, args.tissues_del, args.tissues_keep, args.output, args.trash, args.trash_file_detail, args.na_file, args.ncol)
+    
+def isnumber(format, n):
+    float_format = re.compile(r"^[\-]?[1-9][0-9]*\.?[0-9]+$")
+    int_format = re.compile(r"^[\-]?[1-9][0-9]*$")
+    test = ""
+    if format == "int":
+        test = re.match(int_format, n)
+    elif format == "float":
+        test = re.match(float_format, n)
+    if test:
+        return True
+    else:
+        return False
+    
+def readHPA(HPAfile, tissues_del, tissues_keep):
+    # Read HPA file:
+    with open(HPAfile, "r") as handle:
+        hpa = handle.readlines()
+    # Extract tissues genes lists
+    tdel_dict = {}
+    tissues_del = tissues_del.split(",")
+    print("List of tissues to del", tissues_del)
+    tkeep_dict = {}
+    tissues_keep = tissues_keep.split(",") if tissues_keep else []  # --tissues_keep is optional
+    print("List of tissues to keep", tissues_keep)
+    for line in hpa[1:]:
+        name = line.replace('"', "").split(",")[1]
+        tissue = line.replace('"', "").split(",")[2]
+        for t in tissues_del:
+            if tissue == t:
+                if t not in tdel_dict:
+                    tdel_dict[t] = [name]
+                else:
+                    if name not in tdel_dict[t]:
+                        tdel_dict[t].append(name)
+        for k in tissues_keep:
+            if tissue == k:
+                if k not in tkeep_dict:
+                    tkeep_dict[k] = [name]
+                else:
+                    if name not in tkeep_dict[k]:
+                        tkeep_dict[k].append(name)
+    
+    return tdel_dict, tkeep_dict
+
+def filterHPA(input, HPAfile, tissues_del, tissues_keep, output, trash_file, trash_file_detail, na_file, ncol):
+    # Assumption: argparse (nargs="+") passes input as [value, mode], with mode "list" or "file"
+    if input[1] == "list":
+        header, content = "", input[0].split()
+    elif input[1] == "file":
+        filename, has_header = input[0].split(",")[:2]  # assumed "filename,header[,ncol]"
+        file = open(filename, "r")
+        file_content = file.readlines()
+        file.close()
+        if has_header == "true":
+            header = file_content[0]
+            content = file_content[1:]
+        else:
+            header = ""
+            content = file_content[:]
+
+    # Remove empty lines
+    content = [line for line in content if not line.isspace()]
+
+    # Read HPA file
+    with open(HPAfile, "r") as handle:
+        hpa = handle.readlines()
+
+    # Get dictionary of tissues : genes
+    tdel_dict, tkeep_dict = readHPA(HPAfile, tissues_del, tissues_keep)
+    #print("Dictionary of tissue:genes to del", tdel_dict)
+    #print("Dictionary of tissue:genes to keep", tkeep_dict)
+
+    # Extract gene names and protein ids column number
+    column_names = header.rstrip("\r\n").split("\t")
+    if isnumber("int", ncol.replace("c", "")):
+        gene_names_index = int(ncol.replace("c", "")) - 1
+        prot_id_index = ""
+        for i in range(len(column_names)):
+            if column_names[i] == "Majority protein IDs":
+                prot_id_index = i
+        if prot_id_index == "":
+            raise ValueError("Could not find 'Majority protein IDs' column")
+    else:
+        raise ValueError("Please fill in the right format of column number")
+
+    # Filter
+    string = header.rstrip()
+    string = string.replace("^M", "") + "\t" + "Filtered" + "\n"
+    filtered_genes = []
+    filtered_prots = []
+    na_genes = []
+    #print(len(content))
+    for line in content:
+        prot_string = line.rstrip() + "\t" 
+        line = line.split("\t")
+        name = line[gene_names_index].split(";")[0].replace('"', "")
+        prot = line[prot_id_index].split(";")[0].replace('"', "")
+
+        if name == "":
+            prot_string += "NaN - No gene name" + "\n"
+            string += prot_string
+        else:
+            tissue = sorted(set([t.replace('"', "").split(",")[2] for t in hpa[1:] if t.replace('"', "").split(",")[1:2] == [name]]))
+
+            if all (name not in genes for genes in tdel_dict.values()):       
+                if len(tissue) != 0:
+                    print("Not in del list", name, len(tissue))
+                    prot_string += ",".join(tissue) + "\n"
+                    string += prot_string
+                else:
+                    print("No tissue information", name)
+                    prot_string += "NaN - no tissue information" + "\n"
+                    string += prot_string
+                    na_genes.append(name)
+            else:
+                if all (name not in genes for genes in tkeep_dict.values()):
+                    print("In del list only", name)
+                    filtered_genes.append(name)
+                    filtered_prots.append(prot)
+                else:
+                    print("In both del and keep", name, len(tissue))
+                    prot_string += ",".join(tissue) + "\n"
+                    string += prot_string
+
+    # Generate output file
+    with open(output, "w") as out:
+        out.write(string)
+
+    # Generate file of unknown gene name
+    with open(na_file, "w") as na_out:
+        na_out.write("\n".join(na_genes))
+        
+    # Generate trash files
+    with open(trash_file, "w") as output_trash:
+        output_trash.write("\n".join(filtered_prots))
+
+    print("Deleted genes", filtered_genes)
+    with open(trash_file_detail, "w") as output_trash_detail:
+        for gene in filtered_genes:
+            lines = [line for line in hpa[1:] if line.replace('"', "").split(",")[1:2] == [gene]]
+            output_trash_detail.write("".join(lines))
+
+if __name__ == "__main__":
+    options()
+
+# python biofilter2.py --mq ../proteinGroups_Maud.txt --hpa /db/proteinatlas/normal_tissue.csv --tissues_del "retina" --tissues_keep "tonsil" --trash "Trash3.txt" --trash_file_detail "Trash_detail3.txt" -o test-data/output3.txt --na_file "Unknown.txt"
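The keep/discard rule that filterHPA applies can be hard to follow in the diff above, so here is a minimal standalone sketch of it. It is not part of the changeset. The helper names (tissues_by_gene, classify), the sample gene names, and the CSV column headers "Gene name" and "Tissue" are assumptions made for illustration; the script itself reads the second and third comma-separated fields by position. A gene is discarded only when it is listed in at least one tissue to delete and in no tissue to keep.

```python
import csv
import io


def tissues_by_gene(hpa_rows):
    """Map each gene name to the set of tissues it is listed in."""
    mapping = {}
    for row in hpa_rows:
        mapping.setdefault(row["Gene name"], set()).add(row["Tissue"])
    return mapping


def classify(gene, mapping, tissues_del, tissues_keep):
    """Return 'unknown', 'discard' or 'keep' for one gene name."""
    tissues = mapping.get(gene)
    if not tissues:
        # No tissue information at all: reported separately (the NaN file).
        return "unknown"
    if (tissues & tissues_del) and not (tissues & tissues_keep):
        # Expressed in a tissue to delete and in no tissue to keep.
        return "discard"
    return "keep"


# Hypothetical excerpt shaped like an HPA normal-tissue CSV (made-up genes).
SAMPLE = '''"Gene","Gene name","Tissue"
"ENSG_A","GENE_A","retina"
"ENSG_B","GENE_B","retina"
"ENSG_B","GENE_B","tonsil"
'''

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
mapping = tissues_by_gene(rows)

for gene in ("GENE_A", "GENE_B", "GENE_C"):
    print(gene, classify(gene, mapping, {"retina"}, {"tonsil"}))
```

Using the csv module and sets here, rather than splitting on commas and scanning lists, avoids substring matches between similar gene names and keeps each lookup cheap; whether that trade-off suits the real HPA file is left to the tool authors.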
diff -r 000000000000 -r 3155d867c056 hpa_tissue_distribution.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/hpa_tissue_distribution.xml Fri Sep 15 11:04:37 2017 -0400
b'@@ -0,0 +1,187 @@\n+<tool id="biofilter" name="Retrieve tissue/cell distribution (resource: Human Protein Atlas)" version="0.1.0">\n+    <description>Filter by tissue name (using Human Protein Atlas resource)\n+    </description>\n+    <requirements>\n+    </requirements>\n+    <stdio>\n+        <exit_code range="1:" />\n+    </stdio>\n+    <command><![CDATA[\n+        python $__tool_directory__/hpa_tissue_distribution.py \n+        --input\n+        #if $input.input == "list"\n+            "$input.list"\n+        else if $inout.input == "file"\n+            "$input.file,$input.header,$input.ncol"\n+\t    #end if\n+        --hpa "$proteinatlas.value"\n+        -o "$hpa_output"\n+        --tissues_del "$opt_del.tdel"\n+        --trash "$trash_output"\n+        --trash_file_detail "$trash_detail_output"\n+        --na_file "$na_file"\n+        #if $opt_keep.tkeep:\n+            --tissues_keep "$opt_keep.tkeep"\n+        #end if\n+    ]]></command>\n+    <inputs>\n+\t    <conditional name="input" >\n+            <param type="select" name="input" label="Input" >\n+                <option value="list">Copy/paste your list of IDs </option>\n+                <option value="file">Choose a multiple-columns file</option>\n+                        \n+            </param>\n+            <when value="file">\n+                <param type="data" name="file" format="txt,tabular" label="Choose a multiple-columns file" help="Input file is a tab-delimited file containing different information of proteins, such as an output of MaxQuant software" />\n+                <param name="header" type="boolean" checked="true" truevalue="true" falsevalue="false" label="Does your input file contain header?" 
/>\n+\t\t        <param type="text" name="ncol" value="c1" label="Please specify the column where you would like to apply the comparison" help =\'For example, fill in "c1" if you want to filter the first column\' />\n+            </when>\n+            <when value="list">\n+                <param type="text" name="list" label="Copy/paste your list of IDs " />\n+            </when>\n+        </conditional>\n+        <param name="proteinatlas" type="select" label="Human Protein Atlas" >\n+            <options from_file="proteinatlas.loc" selected="True" >\n+\t\t        <column name="name" index="1" />\n+\t\t        <column name="value" index="2" />\n+\t\t        <filter type="remove_value" meta_ref="proteinatlas" key="name" value="Full Human Protein Atlas" />\n+\t        </options>\n+\t    </param>\n+        <section name="opt_del" title="Choose tissues where expressed genes need to be discarded" expanded="True">\n+            <param name="tdel" type="select" label="Choose tissues where expressed genes need to be discarded" multiple="True" display="checkboxes">\n+                <option value="adrenal gland" >Adrenal gland</option>\n+                <option value="appendix" >Appendix</option>\n+                <option value="bone marrow" >Bone marrow</option>\n+                <option value="breast" >Breast</option>\n+                <option value="bronchus" >Bronchus</option>\n+                <option value="caudate" >Caudate</option>\n+                <option value="cerebellum" >Cerebellum</option>\n+                <option value="cerebral cortex" >Cerebral cortex</option>\n+                <option value="cervix" >Cervix</option>\n+                <option value="colon" >Colon</option>\n+                <option value="duodenum" >Duodenum</option>\n+                <option value="endometrium 1" >Endometrium 1</option>\n+                <option value="endometrium 2" >Endometrium 2</option>\n+                <option value="epididymis" >Epididymis</option>\n+             
   <option value="esophagus" >Esophagus</option>\n+                \n+                <option value="fallopian tube" >Fallopian tube</option>\n+                <option value="gallbladder" >Gallbladder</option>\n+                \n+                <option value="heart muscle" >Heart muscle</option>\n+                <option value="hippocampus" >Hippocampus</option>\n+             '..b'     <option value="bone marrow" >Bone marrow</option>\n+                <option value="breast" >Breast</option>\n+                <option value="bronchus" >Bronchus</option>\n+                <option value="caudate" >Caudate</option>\n+                <option value="cerebellum" >Cerebellum</option>\n+                <option value="cerebral cortex" >Cerebral cortex</option>\n+                <option value="cervix" >Cervix</option>\n+                <option value="colon" >Colon</option>\n+                <option value="duodenum" >Duodenum</option>\n+                <option value="endometrium 1" >Endometrium 1</option>\n+                <option value="endometrium 2" >Endometrium 2</option>\n+                <option value="epididymis" >Epididymis</option>\n+                <option value="esophagus" >Esophagus</option>\n+                \n+                <option value="fallopian tube" >Fallopian tube</option>\n+                <option value="gallbladder" >Gallbladder</option>\n+                \n+                <option value="heart muscle" >Heart muscle</option>\n+                <option value="hippocampus" >Hippocampus</option>\n+            \n+                <option value="kidney" >Kidney</option>\n+       \n+                <option value="liver" >Liver</option>\n+                <option value="lung" >Lung</option>\n+                <option value="lymph node" >Lymph node</option>\n+                <option value="nasopharynx" >Nasopharynx</option>\n+                <option value="oral mucosa" >Oral mucosa</option>\n+                <option value="ovary" >Ovary</option>\n+               
 <option value="pancreas" >Pancreas</option>\n+                <option value="parathyroid gland" >Parathyroid gland</option>\n+                \n+                <option value="placenta" >Placenta</option>\n+                <option value="prostate" >Prostate</option>\n+                <option value="rectum" >Rectum</option>\n+            \n+                <option value="salivary gland" >Salivary gland</option>\n+                <option value="seminal vesicle" >Seminal vesicle</option>\n+                <option value="skeletal muscle" >Skeletal muscle</option>\n+                <option value="skin 1" >Skin 1</option>\n+                <option value="skin 2" >Skin 2</option>\n+            \n+                <option value="small intestine" >Small intestine</option>\n+                <option value="smooth muscle" >Smooth muscle</option>\n+                <option value="soft tissue 1" >Soft tissue 1</option>\n+                <option value="soft tissue 2" >Soft tissue 2</option>\n+                <option value="spleen" >Spleen</option>\n+                <option value="stomach 1" >Stomach 1</option>\n+                <option value="stomach 2" >Stomach 2</option>\n+                <option value="testis" >Testis</option>\n+                <option value="thyroid gland" >Thyroid gland</option>\n+                <option value="tonsil" >Tonsil</option>\n+                <option value="urinary bladder" >Urinary bladder</option>\n+                <option value="vagina" >Vagina</option>\n+            </param>\n+        </section>\n+    </inputs>\n+    <outputs>\n+        <data name="hpa_output" format="txt" label="HPA selection from ${input1.name}" />\n+        <data name="trash_detail_output" format="txt" label="HPA information of excluded proteins from ${input1.name}" />\n+        <data name="trash_output" format="txt" label="Excluded protein from ${input1.name}" />\n+        <data name="na_file" format="txt" label="Genes without tissues information" />\n+    </outputs>\n+    
<help><![CDATA[\n+This tool filters proteins according to their tissue(s) of origin.\n+\n+**Input**\n+List of protein IDs (UniProt IDs) in text/tabular format\n+\n+**Option**\n+First, choose the tissue(s) whose expressed genes should be discarded. If you want to keep some of these discarded genes because they are also expressed in other tissue(s), choose those tissue(s) in the second list.\n+For example, TODO\n+    ]]></help>\n+    <citations>\n+    </citations>\n+</tool>\n'
diff -r 000000000000 -r 3155d867c056 normal_tissue.csv
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/normal_tissue.csv Fri Sep 15 11:04:37 2017 -0400
b'@@ -0,0 +1,1031836 @@\n+"Gene","Gene name","Tissue","Cell type","Level","Reliability"\n+"ENSG00000000003","TSPAN6","adrenal gland","glandular cells","Not detected","Uncertain"\n+"ENSG00000000003","TSPAN6","appendix","glandular cells","Medium","Uncertain"\n+"ENSG00000000003","TSPAN6","appendix","lymphoid tissue","Not detected","Uncertain"\n+"ENSG00000000003","TSPAN6","bone marrow","hematopoietic cells","Not detected","Uncertain"\n+"ENSG00000000003","TSPAN6","breast","adipocytes","Not detected","Uncertain"\n+"ENSG00000000003","TSPAN6","breast","glandular cells","High","Uncertain"\n+"ENSG00000000003","TSPAN6","breast","myoepithelial cells","Not detected","Uncertain"\n+"ENSG00000000003","TSPAN6","bronchus","respiratory epithelial cells","High","Uncertain"\n+"ENSG00000000003","TSPAN6","caudate","glial cells","Not detected","Uncertain"\n+"ENSG00000000003","TSPAN6","caudate","neuronal cells","Not detected","Uncertain"\n+"ENSG00000000003","TSPAN6","cerebellum","cells in granular layer","Not detected","Uncertain"\n+"ENSG00000000003","TSPAN6","cerebellum","cells in molecular layer","Not detected","Uncertain"\n+"ENSG00000000003","TSPAN6","cerebellum","Purkinje cells","Not detected","Uncertain"\n+"ENSG00000000003","TSPAN6","cerebral cortex","endothelial cells","Not detected","Uncertain"\n+"ENSG00000000003","TSPAN6","cerebral cortex","glial cells","Not detected","Uncertain"\n+"ENSG00000000003","TSPAN6","cerebral cortex","neuronal cells","Medium","Uncertain"\n+"ENSG00000000003","TSPAN6","cerebral cortex","neuropil","Not detected","Uncertain"\n+"ENSG00000000003","TSPAN6","cervix, uterine","glandular cells","High","Uncertain"\n+"ENSG00000000003","TSPAN6","cervix, uterine","squamous epithelial cells","High","Uncertain"\n+"ENSG00000000003","TSPAN6","colon","endothelial cells","Not detected","Uncertain"\n+"ENSG00000000003","TSPAN6","colon","glandular cells","Medium","Uncertain"\n+"ENSG00000000003","TSPAN6","colon","peripheral nerve/ganglion","Not 
detected","Uncertain"\n+"ENSG00000000003","TSPAN6","duodenum","glandular cells","Low","Uncertain"\n+"ENSG00000000003","TSPAN6","endometrium 1","cells in endometrial stroma","Not detected","Uncertain"\n+"ENSG00000000003","TSPAN6","endometrium 1","glandular cells","High","Uncertain"\n+"ENSG00000000003","TSPAN6","endometrium 2","cells in endometrial stroma","Not detected","Uncertain"\n+"ENSG00000000003","TSPAN6","endometrium 2","glandular cells","High","Uncertain"\n+"ENSG00000000003","TSPAN6","epididymis","glandular cells","Medium","Uncertain"\n+"ENSG00000000003","TSPAN6","esophagus","squamous epithelial cells","High","Uncertain"\n+"ENSG00000000003","TSPAN6","fallopian tube","glandular cells","High","Uncertain"\n+"ENSG00000000003","TSPAN6","gallbladder","glandular cells","Medium","Uncertain"\n+"ENSG00000000003","TSPAN6","heart muscle","myocytes","Not detected","Uncertain"\n+"ENSG00000000003","TSPAN6","hippocampus","glial cells","Not detected","Uncertain"\n+"ENSG00000000003","TSPAN6","hippocampus","neuronal cells","Not detected","Uncertain"\n+"ENSG00000000003","TSPAN6","kidney","cells in glomeruli","Not detected","Uncertain"\n+"ENSG00000000003","TSPAN6","kidney","cells in tubules","Medium","Uncertain"\n+"ENSG00000000003","TSPAN6","liver","bile duct cells","Medium","Uncertain"\n+"ENSG00000000003","TSPAN6","liver","hepatocytes","Not detected","Uncertain"\n+"ENSG00000000003","TSPAN6","lung","macrophages","Not detected","Uncertain"\n+"ENSG00000000003","TSPAN6","lung","pneumocytes","Low","Uncertain"\n+"ENSG00000000003","TSPAN6","lymph node","germinal center cells","Not detected","Uncertain"\n+"ENSG00000000003","TSPAN6","lymph node","non-germinal center cells","Not detected","Uncertain"\n+"ENSG00000000003","TSPAN6","nasopharynx","respiratory epithelial cells","High","Uncertain"\n+"ENSG00000000003","TSPAN6","oral mucosa","squamous epithelial cells","Medium","Uncertain"\n+"ENSG00000000003","TSPAN6","ovary","ovarian stroma cells","Not 
detected","Uncertain"\n+"ENSG00000000003","TSPAN6","pancreas","exocrine glandular cells","Medium","Uncertain"\n+"ENSG00000000003"'..b'0283027","CAPS","kidney","cells in tubules","Not detected","Supportive"\n+"ENSG00000283027","CAPS","liver","bile duct cells","Not detected","Supportive"\n+"ENSG00000283027","CAPS","liver","hepatocytes","Not detected","Supportive"\n+"ENSG00000283027","CAPS","lung","macrophages","Not detected","Supportive"\n+"ENSG00000283027","CAPS","lung","pneumocytes","Not detected","Supportive"\n+"ENSG00000283027","CAPS","lymph node","germinal center cells","Not detected","Supportive"\n+"ENSG00000283027","CAPS","lymph node","non-germinal center cells","Not detected","Supportive"\n+"ENSG00000283027","CAPS","nasopharynx","respiratory epithelial cells","Low","Supportive"\n+"ENSG00000283027","CAPS","oral mucosa","squamous epithelial cells","Not detected","Supportive"\n+"ENSG00000283027","CAPS","ovary","follicle cells","Not detected","Supportive"\n+"ENSG00000283027","CAPS","ovary","ovarian stroma cells","Not detected","Supportive"\n+"ENSG00000283027","CAPS","pancreas","exocrine glandular cells","Not detected","Supportive"\n+"ENSG00000283027","CAPS","pancreas","islets of Langerhans","Not detected","Supportive"\n+"ENSG00000283027","CAPS","parathyroid gland","glandular cells","Not detected","Supportive"\n+"ENSG00000283027","CAPS","placenta","trophoblastic cells","Not detected","Supportive"\n+"ENSG00000283027","CAPS","prostate","glandular cells","Not detected","Supportive"\n+"ENSG00000283027","CAPS","rectum","glandular cells","Not detected","Supportive"\n+"ENSG00000283027","CAPS","salivary gland","glandular cells","Not detected","Supportive"\n+"ENSG00000283027","CAPS","seminal vesicle","glandular cells","Not detected","Supportive"\n+"ENSG00000283027","CAPS","skeletal muscle","myocytes","Not detected","Supportive"\n+"ENSG00000283027","CAPS","skin 1","fibroblasts","Not detected","Supportive"\n+"ENSG00000283027","CAPS","skin 1","keratinocytes","Not 
detected","Supportive"\n+"ENSG00000283027","CAPS","skin 1","Langerhans","Not detected","Supportive"\n+"ENSG00000283027","CAPS","skin 1","melanocytes","Not detected","Supportive"\n+"ENSG00000283027","CAPS","skin 2","epidermal cells","Not detected","Supportive"\n+"ENSG00000283027","CAPS","small intestine","glandular cells","Not detected","Supportive"\n+"ENSG00000283027","CAPS","smooth muscle","smooth muscle cells","Not detected","Supportive"\n+"ENSG00000283027","CAPS","soft tissue 1","adipocytes","Not detected","Supportive"\n+"ENSG00000283027","CAPS","soft tissue 1","chondrocytes","Not detected","Supportive"\n+"ENSG00000283027","CAPS","soft tissue 1","fibroblasts","Not detected","Supportive"\n+"ENSG00000283027","CAPS","soft tissue 1","peripheral nerve","Not detected","Supportive"\n+"ENSG00000283027","CAPS","soft tissue 2","adipocytes","Not detected","Supportive"\n+"ENSG00000283027","CAPS","soft tissue 2","chondrocytes","Not detected","Supportive"\n+"ENSG00000283027","CAPS","soft tissue 2","fibroblasts","Not detected","Supportive"\n+"ENSG00000283027","CAPS","soft tissue 2","peripheral nerve","Not detected","Supportive"\n+"ENSG00000283027","CAPS","spleen","cells in red pulp","Not detected","Supportive"\n+"ENSG00000283027","CAPS","spleen","cells in white pulp","Not detected","Supportive"\n+"ENSG00000283027","CAPS","stomach 1","glandular cells","Not detected","Supportive"\n+"ENSG00000283027","CAPS","stomach 2","glandular cells","Not detected","Supportive"\n+"ENSG00000283027","CAPS","testis","cells in seminiferous ducts","Not detected","Supportive"\n+"ENSG00000283027","CAPS","testis","Leydig cells","Not detected","Supportive"\n+"ENSG00000283027","CAPS","thyroid gland","glandular cells","Not detected","Supportive"\n+"ENSG00000283027","CAPS","tonsil","germinal center cells","Not detected","Supportive"\n+"ENSG00000283027","CAPS","tonsil","non-germinal center cells","Not detected","Supportive"\n+"ENSG00000283027","CAPS","tonsil","squamous epithelial cells","Not 
detected","Supportive"\n+"ENSG00000283027","CAPS","urinary bladder","urothelial cells","Not detected","Supportive"\n+"ENSG00000283027","CAPS","vagina","squamous epithelial cells","Not detected","Supportive"\n'
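A note for readers of normal_tissue.csv: some Tissue values contain a comma inside the quotes (for example "cervix, uterine" in the rows above), so stripping the quotes and splitting a line on commas yields "cervix" as the third field, whereas a CSV-aware reader keeps the full name. The sketch below is illustrative and not part of the changeset; it contrasts the two approaches on rows copied from the hunk above, and the helper names `tissues_naive` and `tissues_csv` are hypothetical.

```python
import csv
import io

# Header plus two rows copied from the normal_tissue.csv hunk above.
SAMPLE = (
    '"Gene","Gene name","Tissue","Cell type","Level","Reliability"\n'
    '"ENSG00000000003","TSPAN6","cervix, uterine","glandular cells","High","Uncertain"\n'
    '"ENSG00000000003","TSPAN6","lung","pneumocytes","Low","Uncertain"\n'
)


def tissues_naive(text):
    """Third field after removing quotes and splitting on every comma."""
    return [line.replace('"', "").split(",")[2] for line in text.splitlines()[1:]]


def tissues_csv(text):
    """Tissue column as parsed by the csv module, which honours quoted commas."""
    reader = csv.DictReader(io.StringIO(text))
    return [row["Tissue"] for row in reader]


if __name__ == "__main__":
    print(tissues_naive(SAMPLE))  # ['cervix', 'lung']
    print(tissues_csv(SAMPLE))    # ['cervix, uterine', 'lung']
```

The tool XML in this changeset offers the option value "cervix", which matches the quote-strip-and-split result, so switching to a CSV-aware reader would also require changing that option value.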
diff -r 000000000000 -r 3155d867c056 sort_by_tissue.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/sort_by_tissue.py Fri Sep 15 11:04:37 2017 -0400
@@ -0,0 +1,162 @@
+import argparse
+import re
+
+def options():
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--input", nargs="+", required=True, help="Input IDs or input file, followed by the input type (and, for a file, the header flag and the column number)")
+    parser.add_argument("--hpa", required=True, help="HPA file")
+    parser.add_argument("--tissues_del", required=True, help="Comma-separated list of tissues; genes expressed in them are discarded")
+    parser.add_argument("--tissues_keep", help="Comma-separated list of tissues; genes expressed in them are kept even if they are also expressed in a tissue of --tissues_del")
+    parser.add_argument("-o", "--output", default="HPA_selection.txt")
+    parser.add_argument("--trash", default="Trash.txt", help="Write filtered genes into a file")
+    parser.add_argument("--trash_file_detail", default="Trash_detail.txt", help="Write filtered genes with detailed information into a file")
+    parser.add_argument("--na_file", default="NaN.txt", help="Write genes without tissue information in the HPA file into a file")
+
+    args = parser.parse_args()
+    #print(args.mq, args.hpa, args.tissues_del, args.tissues_keep, args.output, args.trash, args.trash_file_detail)
+
+    filterHPA(args.input, args.hpa, args.tissues_del, args.tissues_keep, args.output, args.trash, args.trash_file_detail, args.na_file)
+    
+def isnumber(format, n):
+    # Check whether the string n is an integer ("int") or a decimal number ("float")
+    float_format = re.compile(r"^-?(0|[1-9][0-9]*)(\.[0-9]+)?$")
+    int_format = re.compile(r"^-?[1-9][0-9]*$")
+    test = None
+    if format == "int":
+        test = re.match(int_format, n)
+    elif format == "float":
+        test = re.match(float_format, n)
+    if test:
+        return True
+    else:
+        return False
+    
+def readHPA(HPAfile, tissues_del, tissues_keep):
+    # Read HPA file
+    with open(HPAfile, "r") as hpa_file:
+        hpa = hpa_file.readlines()
+    # Extract lists of genes expressed in tissues to keep and in tissue to delete 
+    tdel_dict = {}
+    if tissues_del:
+        tissues_del = tissues_del.split(",")
+    else:
+        tissues_del = []
+    #print("List of tissues to del", tissues_del)
+    tkeep_dict = {}
+    if tissues_keep:
+        tissues_keep = tissues_keep.split(",")
+    else:
+        tissues_keep = []
+    #print("List of tissues to keep", tissues_keep)
+    for line in hpa[1:]:
+        ensg = line.replace('"', "").split(",")[0]
+        tissue = line.replace('"', "").split(",")[2]
+        for t in tissues_del:
+            if tissue == t:
+                if t not in tdel_dict:
+                    tdel_dict[t] = [ensg]
+                else:
+                    if ensg not in tdel_dict[t]:
+                        tdel_dict[t].append(ensg)
+        for k in tissues_keep:
+            if tissue == k:
+                if k not in tkeep_dict:
+                    tkeep_dict[k] = [ensg]
+                else:
+                    if ensg not in tkeep_dict[k]:
+                        tkeep_dict[k].append(ensg)
+    
+    return tdel_dict, tkeep_dict
+
+def filterHPA(input, HPAfile, tissues_del, tissues_keep, output, trash_file, trash_file_detail, na_file):
+    input_type = input[1]
+    if input_type == "file":
+        input_file = input[0]
+        header = input[2]
+        ncol = input[3]
+        file_content = open(input_file, "r").readlines()            
+        if isnumber("int", ncol.replace("c", "")):
+            if header == "true":
+                header = file_content[0]
+                content = file_content[1:] #[x.strip() for x in [line.split("\t")[int(ncol.replace("c", ""))-1].split(";")[0] for line in file_content[1:]]]     # take only first IDs
+            else:
+                header = ""
+                content = file_content[:] #[x.strip() for x in [line.split("\t")[int(ncol.replace("c", ""))-1].split(";")[0] for line in file_content]]     # take only first IDs
+                #print(file_content[1:13])
+            ncol = int(ncol.replace("c", "")) - 1
+        else:
+            raise ValueError('Invalid column number: expected a value such as "c1" or "c2"')
+    else:
+        print(input[0])
+        header = ""
+        content = input[0].split()
+
+    # Read HPA file
+    with open(HPAfile, "r") as hpa_file:
+        hpa = hpa_file.readlines()
+
+    # Get dictionary of tissues : genes
+    tdel_dict, tkeep_dict = readHPA(HPAfile, tissues_del, tissues_keep)
+    #print("Dictionary of tissue:genes to del", tdel_dict)
+    #print("Dictionary of tissue:genes to keep", tkeep_dict)
+
+    # Filter
+    string = header.strip() + "\t" + "Filtered" + "\n"
+    filtered_genes = []
+    filtered_lines = []
+    na_genes = []
+    #print(len(mq))
+    for l in content:
+        line_string = l.rstrip() + "\t" #.replace("^M", "")
+        if input_type == "file":
+            gene = l.split("\t")[ncol].split(";")[0].replace('"', "")
+        else:
+            gene = l
+        if gene == "":
+            line_string += "NA - No ENSG ID" + "\n"
+            string += line_string
+        elif gene == "NA":
+            line_string += "NA - No ENSG ID" + "\n"  # never appended to string, so "NA" rows are left out of the output
+        else:
+            tissue = sorted(set([t.split(",")[2].replace('"', "") for t in hpa if gene in t]))
+            if all (gene not in genes for genes in tdel_dict.values()):       
+                if len(tissue) != 0:
+                    print("Not in del list", gene, len(tissue))
+                    line_string += ",".join(tissue) + "\n"
+                    string += line_string
+                else:
+                    print("No tissue information", gene)
+                    line_string += "NA - no tissue information" + "\n"
+                    string += line_string
+                    na_genes.append(gene)
+            else:
+                if all (gene not in genes for genes in tkeep_dict.values()):
+                    print("In del list only", gene)
+                    filtered_genes.append(gene)
+                    filtered_lines.append(l)
+                else:
+                    print("In both del and keep", gene, len(tissue))
+                    line_string += ",".join(tissue) + "\n"
+                    string += line_string
+
+    # Generate output file
+    with open(output, "w") as output_file:
+        output_file.write(string)
+
+    # Generate file of genes without tissue information
+    with open(na_file, "w") as na_output:
+        na_output.write("\n".join(na_genes))
+        
+    # Generate trash files (strip line ends first so rows are not separated by blank lines)
+    with open(trash_file, "w") as output_trash:
+        output_trash.write("\n".join(line.rstrip("\r\n") for line in filtered_lines))
+
+    print("Deleted genes", filtered_genes)
+    with open(trash_file_detail, "w") as output_trash_detail:
+        for gene in filtered_genes:
+            lines = [line for line in hpa if gene in line]
+            output_trash_detail.write("".join(lines))
+
+if __name__ == "__main__":
+    options()
+
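The keep/discard rule implemented by filterHPA above can be condensed into a small pure function. This is a minimal sketch, not code from the changeset: `classify` and the `hpa_tissues` index (ENSG ID mapped to the set of tissues listed for it) are hypothetical names, and the gene IDs in the example are placeholders. Like readHPA, it treats a gene as present in a tissue whenever a row exists for that pair and does not look at the Level column.

```python
def classify(gene, hpa_tissues, tissues_del, tissues_keep):
    """Return 'na', 'kept' or 'discarded' for one gene.

    hpa_tissues is a hypothetical pre-built index: ENSG ID -> set of tissues.
    """
    tissues = hpa_tissues.get(gene, set())
    if not tissues:
        # No row in the HPA table: reported as "NA - no tissue information".
        return "na"
    in_del = bool(tissues & set(tissues_del))
    in_keep = bool(tissues & set(tissues_keep))
    if in_del and not in_keep:
        # Listed in a tissue to eliminate and in none of the tissues to keep.
        return "discarded"
    return "kept"


# Placeholder index for illustration only.
index = {
    "ENSG_A": {"salivary gland", "lung"},
    "ENSG_B": {"salivary gland"},
    "ENSG_C": {"liver"},
}

if __name__ == "__main__":
    for gene in ("ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D"):
        print(gene, classify(gene, index, ["salivary gland"], ["lung"]))
```

With "salivary gland" to eliminate and "lung" to keep, only ENSG_B is discarded; ENSG_D has no entry and is reported as lacking tissue information.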
diff -r 000000000000 -r 3155d867c056 sort_by_tissue.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/sort_by_tissue.xml Fri Sep 15 11:04:37 2017 -0400
b'@@ -0,0 +1,229 @@\n+<tool id="biofilter" name="Sort your proteins by tissue expression profiles (From HPA)" version="0.1.0">\n+    <description>selects/discards proteins according to their expression profiles (absence/presence) in a list of tissues/organs\n+    </description>\n+    <requirements>\n+    </requirements>\n+    <stdio>\n+        <exit_code range="1:" />\n+    </stdio>\n+    <command><![CDATA[\n+        python $__tool_directory__/sort_by_tissue.py \n+        --input\n+        #if $input.ids == "text"\n+            "$input.txt" "list"\n+        #else if $input.ids == "file"\n+            "$input.file" "file" "$input.header" "$input.ncol" \n+\t    #end if\n+        --hpa "$__tool_directory__/normal_tissue.csv"\n+        -o "$hpa_output"\n+        --tissues_del "$opt_del.tdel"\n+        --trash "$trash_output"\n+        --trash_file_detail "$trash_detail_output"\n+        --na_file "$na_file"\n+        #if $opt_keep.tkeep:\n+            --tissues_keep "$opt_keep.tkeep"\n+        #end if\n+    ]]></command>\n+    <inputs>\n+        <conditional name="input" >\n+            <param name="ids" type="select" label="Please provide your ENSG identifiers" help="Copy/paste or ID list from a file (e.g. 
table)" >\n+                <option value="text">Copy/paste your identifiers</option>\n+                <option value="file">Input file containing your identifiers</option>\n+            </param>\n+            <when value="text" >\n+                <param name="txt" type="text" label="Copy/paste your identifiers" help=\'IDs must be separated by spaces in the form field, for example: ENSG00000141510 ENSG00000197561\' >\n+                    <sanitizer>\n+                        <valid initial="string.printable">\n+                            <remove value="&apos;"/>\n+                        </valid>\n+                        <mapping initial="none">\n+                            <add source="&apos;" target="__sq__"/>\n+                        </mapping>\n+                    </sanitizer>\n+                </param>\n+            </when>\n+            <when value="file" >\n+                <param name="file" type="data" format="txt,tabular" label="Choose a file that contains your list of IDs" help="" />\n+                <param name="header" type="boolean" checked="true" truevalue="true" falsevalue="false" label="Does your input file contain a header?" 
/>\n+                <param name="ncol" type="text" label="The column number of ENSG IDs" help=\'For example, fill in "c1" if it is the first column, "c2" if it is the second column and so on\' />                \n+            </when>\n+        </conditional>\n+        <section name="opt_del" title="Step 1: Eliminate from my list genes expressed in the following tissue(s)" expanded="True">\n+            <param name="tdel" type="select" label="Eliminate from my list genes expressed in the following tissue(s)" multiple="True" display="checkboxes">\n+                <option value="adrenal gland" >Adrenal gland</option>\n+                <option value="appendix" >Appendix</option>\n+                <option value="bone marrow" >Bone marrow</option>\n+                <option value="breast" >Breast</option>\n+                <option value="bronchus" >Bronchus</option>\n+                <option value="caudate" >Caudate</option>\n+                <option value="cerebellum" >Cerebellum</option>\n+                <option value="cerebral cortex" >Cerebral cortex</option>\n+                <option value="cervix" >Cervix</option>\n+                <option value="colon" >Colon</option>\n+                <option value="duodenum" >Duodenum</option>\n+                <option value="endometrium 1" >Endometrium 1</option>\n+                <option value="endometrium 2" >Endometrium 2</option>\n+                <option value="epididymis" >Epididymis</option>\n+                <option value="esophagus" >Esophagus</option>\n+                \n+                <option value="fallopian tube" >Fallopian tube</option>\n+                <option value="gallbladder" >Gallbladder</option>\n+             '..b'>\n+                \n+                <option value="placenta" >Placenta</option>\n+                <option value="prostate" >Prostate</option>\n+                <option value="rectum" >Rectum</option>\n+            \n+                <option value="salivary gland" >Salivary 
gland</option>\n+                <option value="seminal vesicle" >Seminal vesicle</option>\n+                <option value="skeletal muscle" >Skeletal muscle</option>\n+                <option value="skin 1" >Skin 1</option>\n+                <option value="skin 2" >Skin 2</option>\n+            \n+                <option value="small intestine" >Small intestine</option>\n+                <option value="smooth muscle" >Smooth muscle</option>\n+                <option value="soft tissue 1" >Soft tissue 1</option>\n+                <option value="soft tissue 2" >Soft tissue 2</option>\n+                <option value="spleen" >Spleen</option>\n+                <option value="stomach 1" >Stomach 1</option>\n+                <option value="stomach 2" >Stomach 2</option>\n+                <option value="testis" >Testis</option>\n+                <option value="thyroid gland" >Thyroid gland</option>\n+                <option value="tonsil" >Tonsil</option>\n+                <option value="urinary bladder" >Urinary bladder</option>\n+                <option value="vagina" >Vagina</option>\n+            </param>\n+        </section>\n+    </inputs>\n+    <outputs>\n+        <data name="hpa_output" format="tabular" label="HPA selection" />\n+        <data name="trash_detail_output" format="tabular" label="HPA information of excluded genes" />\n+        <data name="trash_output" format="tabular" label="Excluded genes ID" />\n+        <data name="na_file" format="tabular" label="Genes without tissues information" />\n+    </outputs>\n+    <tests>\n+        <test>\n+            <conditional name="input">\n+                <param name="ids" value="file" />\n+                <param name="file" value="IDs.txt" />\n+                <param name="header" value="true" />\n+                <param name="ncol" value="c3" />\n+            </conditional>\n+            <section name="opt_del">\n+                <param name="tdel" value="salivary gland" />\n+            </section>\n+         
   <section name="opt_keep">\n+                <param name="tkeep" value="lung" />\n+            </section>\n+            <output name="hpa_output" file="sort_by_tissue_output.txt" />\n+            <output name="trash_detail_output" file="trash_detail_sort_by_tissue.txt" />\n+            <output name="trash_output" file="trash_sort_by_tissue.txt" />\n+            <output name="na_file" file="na_sort_by_tissue.txt" />\n+        </test>\n+    </tests>\n+    <help><![CDATA[\n+This tool filters proteins according to their tissue(s) of origin, using data from the Human Protein Atlas (http://www.proteinatlas.org/).\n+\n+**Input**\n+\n+This tool requires a list of ENSG IDs, either pasted into the text field or read from a column of a file.\n+\n+**Option**\n+\n+First, choose the tissues whose expressed genes should be eliminated. If you want to keep some of these eliminated genes because they are also expressed in other tissues, choose those tissues in the second list.\n+\n+For example, to eliminate from the input file the genes that are expressed in salivary gland, while keeping those that are also expressed in lung:\n+\n+* Step 1: choose salivary gland\n+\n+* Step 2: choose lung\n+\n+-----\n+\n+.. class:: infomark\n+\n+**Authors**\n+\n+T.P. Lien Nguyen, Florence Combes, Yves Vandenbrouck CEA, INSERM, CNRS, Grenoble-Alpes University, BIG Institute, FR\n+Sandra Dérozier, Olivier Rué, Christophe Caron, Valentin Loux INRA, Paris-Saclay University, MAIAGE Unit, Migale Bioinformatics platform\n+\n+This work has been partially funded through the French National Agency for Research (ANR) IFB project.\n+\n+Contact support@proteore.org for any questions or concerns about the Galaxy implementation of this tool.\n+\n+    ]]></help>\n+    <citations>\n+    </citations>\n+</tool>\n'
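To make the interface between the wrapper and the script concrete, the sketch below rebuilds a subset of the options declared in sort_by_tissue.py and parses a hypothetical expansion of the `<command>` template for the test case above. The file names in `argv` are placeholders, and the exact strings Galaxy substitutes are an assumption; only the option names, `nargs="+"` and the comma split come from the script.

```python
import argparse


def build_parser():
    # Subset of the options declared in sort_by_tissue.py.
    parser = argparse.ArgumentParser()
    parser.add_argument("--input", nargs="+", required=True)
    parser.add_argument("--hpa", required=True)
    parser.add_argument("--tissues_del", required=True)
    parser.add_argument("--tissues_keep")
    parser.add_argument("-o", "--output", default="HPA_selection.txt")
    return parser


# Hypothetical command line for the "file" branch of the conditional.
argv = [
    "--input", "IDs.txt", "file", "true", "c3",
    "--hpa", "normal_tissue.csv",
    "-o", "out.txt",
    "--tissues_del", "salivary gland",
    "--tissues_keep", "lung",
]

args = build_parser().parse_args(argv)

# nargs="+" collects the four positional-looking values into one list:
# [path, input type, header flag, column], which filterHPA reads by index.
print(args.input)
# The script splits the tissue options on commas.
print(args.tissues_del.split(","), args.tissues_keep.split(","))
```

In the "text" branch the list would hold only two items, the pasted IDs and the literal "list", which is why filterHPA checks the second element before reading the header flag and column.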
diff -r 000000000000 -r 3155d867c056 test-data/IDs.txt
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/IDs.txt Fri Sep 15 11:04:37 2017 -0400
@@ -0,0 +1,26 @@
+V1 Ensembl.ENSP Ensembl.ENSG neXtProt_ID
+P04637 ENSP00000269305 ENSG00000141510 NX_P04637
+P08246 ENSP00000263621 ENSG00000197561 NX_P08246
+P63244 ENSP00000426909 ENSG00000204628 NX_P63244
+P10275 ENSP00000363822 ENSG00000169083 NX_P10275
+P00533 ENSP00000275493 ENSG00000146648 NX_P00533
+Q14524 ENSP00000328968 ENSG00000183873 NX_Q14524
+P05067 ENSP00000284981 ENSG00000142192 NX_P05067
+P35555 ENSP00000325527 ENSG00000166147 NX_P35555
+P35222 ENSP00000344456 ENSG00000168036 NX_P35222
+O95273 ENSP00000300213 ENSG00000166946 NX_O95273
+P00451 ENSP00000327895 ENSG00000185010 NX_P00451
+P38398 ENSP00000312236 ENSG00000012048 NX_P38398
+Q05086 ENSP00000232165 ENSG00000114062 NX_Q05086
+Q12802 ENSP00000354718 ENSG00000170776 NX_Q12802
+P68871 ENSP00000333994 ENSG00000244734 NX_P68871
+P04585 NA NA NA
+Q96EB6 ENSP00000212015 ENSG00000096717 NX_Q96EB6
+Q9NYL2 ENSP00000340257 ENSG00000091436 NX_Q9NYL2
+P31749 ENSP00000270202 ENSG00000142208 NX_P31749
+P01137 ENSP00000221930 ENSG00000105329 NX_P01137
+Q5S007 ENSP00000298910 ENSG00000188906 NX_Q5S007
+Q08379 ENSP00000416097 ENSG00000167110 NX_Q08379
+P02649 ENSP00000252486 ENSG00000130203 NX_P02649
+P35498 ENSP00000303540 ENSG00000144285 NX_P35498
+P12931 ENSP00000350941 ENSG00000197122 NX_P12931
diff -r 000000000000 -r 3155d867c056 test-data/na_sort_by_tissue.txt
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/na_sort_by_tissue.txt Fri Sep 15 11:04:37 2017 -0400
@@ -0,0 +1,2 @@
+ENSG00000183873
+ENSG00000144285
\ No newline at end of file
diff -r 000000000000 -r 3155d867c056 test-data/sort_by_tissue_output.txt
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/sort_by_tissue_output.txt Fri Sep 15 11:04:37 2017 -0400
b'@@ -0,0 +1,25 @@\n+V1\tEnsembl.ENSP\tEnsembl.ENSG\tneXtProt_ID\tFiltered\n+P04637\tENSP00000269305\tENSG00000141510\tNX_P04637\tadrenal gland,appendix,bone marrow,breast,bronchus,caudate,cerebellum,cerebral cortex,cervix,colon,duodenum,endometrium 1,endometrium 2,epididymis,esophagus,fallopian tube,gallbladder,heart muscle,hippocampus,kidney,liver,lung,lymph node,nasopharynx,oral mucosa,ovary,pancreas,parathyroid gland,placenta,prostate,rectum,salivary gland,seminal vesicle,skeletal muscle,skin 1,skin 2,small intestine,smooth muscle,soft tissue 1,soft tissue 2,spleen,stomach 1,stomach 2,testis,thyroid gland,tonsil,urinary bladder,vagina\n+P08246\tENSP00000263621\tENSG00000197561\tNX_P08246\tadrenal gland,appendix,bone marrow,breast,bronchus,caudate,cerebellum,cerebral cortex,cervix,colon,duodenum,endometrium 1,endometrium 2,epididymis,esophagus,fallopian tube,gallbladder,heart muscle,hippocampus,kidney,liver,lung,lymph node,nasopharynx,oral mucosa,ovary,pancreas,parathyroid gland,placenta,prostate,rectum,salivary gland,seminal vesicle,skeletal muscle,skin 1,skin 2,small intestine,smooth muscle,soft tissue 1,soft tissue 2,spleen,stomach 1,stomach 2,testis,thyroid gland,tonsil,urinary bladder,vagina\n+P63244\tENSP00000426909\tENSG00000204628\tNX_P63244\tadrenal gland,appendix,bone marrow,breast,bronchus,caudate,cerebellum,cerebral cortex,cervix,colon,duodenum,endometrium 1,endometrium 2,epididymis,esophagus,fallopian tube,gallbladder,heart muscle,hippocampus,kidney,liver,lung,lymph node,nasopharynx,oral mucosa,ovary,pancreas,parathyroid gland,placenta,prostate,rectum,salivary gland,seminal vesicle,skeletal muscle,skin 1,skin 2,small intestine,smooth muscle,soft tissue 1,soft tissue 2,spleen,stomach 1,stomach 2,testis,thyroid gland,tonsil,urinary bladder,vagina\n+P10275\tENSP00000363822\tENSG00000169083\tNX_P10275\tadrenal gland,appendix,bone marrow,breast,bronchus,caudate,cerebellum,cerebral cortex,cervix,colon,duodenum,endometrium 1,endometrium 
2,epididymis,esophagus,fallopian tube,gallbladder,heart muscle,hippocampus,kidney,liver,lung,lymph node,nasopharynx,oral mucosa,ovary,pancreas,parathyroid gland,placenta,rectum,salivary gland,seminal vesicle,skeletal muscle,skin 1,skin 2,small intestine,smooth muscle,soft tissue 1,soft tissue 2,spleen,stomach 1,stomach 2,testis,thyroid gland,tonsil,urinary bladder,vagina\n+P00533\tENSP00000275493\tENSG00000146648\tNX_P00533\tadrenal gland,appendix,bone marrow,breast,bronchus,caudate,cerebellum,cerebral cortex,cervix,colon,duodenum,endometrium 1,endometrium 2,epididymis,esophagus,fallopian tube,gallbladder,heart muscle,hippocampus,kidney,liver,lung,lymph node,oral mucosa,ovary,pancreas,parathyroid gland,placenta,prostate,rectum,salivary gland,seminal vesicle,skeletal muscle,skin 1,skin 2,small intestine,smooth muscle,soft tissue 1,soft tissue 2,spleen,stomach 1,stomach 2,testis,thyroid gland,tonsil,urinary bladder,vagina\n+Q14524\tENSP00000328968\tENSG00000183873\tNX_Q14524\tNA - no tissue information\n+P05067\tENSP00000284981\tENSG00000142192\tNX_P05067\tadrenal gland,appendix,bone marrow,breast,bronchus,caudate,cerebellum,cerebral cortex,cervix,colon,duodenum,endometrium 1,endometrium 2,epididymis,esophagus,fallopian tube,gallbladder,heart muscle,hippocampus,kidney,liver,lung,lymph node,nasopharynx,oral mucosa,ovary,pancreas,parathyroid gland,placenta,prostate,rectum,salivary gland,seminal vesicle,skeletal muscle,skin 1,skin 2,small intestine,smooth muscle,soft tissue 1,soft tissue 2,spleen,stomach 1,stomach 2,testis,thyroid gland,tonsil,urinary bladder,vagina\n+P35555\tENSP00000325527\tENSG00000166147\tNX_P35555\tadrenal gland,appendix,bone marrow,breast,bronchus,caudate,cerebellum,cerebral cortex,cervix,colon,duodenum,endometrium 1,endometrium 2,epididymis,esophagus,fallopian tube,gallbladder,heart muscle,hippocampus,kidney,liver,lung,lymph node,nasopharynx,oral mucosa,ovary,pancreas,parathyroid gland,placenta,prostate,rectum,salivary gland,seminal 
vesicle,skeletal muscle,skin 1,skin 2,small intestine,smooth muscle,soft tissue '..b'1436\tNX_Q9NYL2\tadrenal gland,appendix,bone marrow,breast,bronchus,caudate,cerebellum,cerebral cortex,cervix,colon,duodenum,endometrium 1,endometrium 2,epididymis,esophagus,fallopian tube,gallbladder,heart muscle,hippocampus,kidney,liver,lung,lymph node,nasopharynx,oral mucosa,ovary,pancreas,parathyroid gland,placenta,prostate,rectum,salivary gland,seminal vesicle,skeletal muscle,skin 1,skin 2,small intestine,smooth muscle,soft tissue 1,soft tissue 2,spleen,stomach 1,stomach 2,testis,thyroid gland,tonsil,urinary bladder,vagina\n+P31749\tENSP00000270202\tENSG00000142208\tNX_P31749\tadrenal gland,appendix,bone marrow,breast,bronchus,caudate,cerebellum,cerebral cortex,cervix,colon,duodenum,endometrium 1,endometrium 2,epididymis,esophagus,fallopian tube,gallbladder,heart muscle,hippocampus,kidney,liver,lung,lymph node,nasopharynx,oral mucosa,ovary,pancreas,parathyroid gland,placenta,prostate,rectum,salivary gland,seminal vesicle,skeletal muscle,skin 1,skin 2,small intestine,smooth muscle,soft tissue 1,soft tissue 2,spleen,stomach 1,stomach 2,testis,thyroid gland,tonsil,urinary bladder,vagina\n+P01137\tENSP00000221930\tENSG00000105329\tNX_P01137\tadrenal gland,appendix,bone marrow,breast,bronchus,caudate,cerebellum,cerebral cortex,cervix,colon,duodenum,endometrium 1,endometrium 2,epididymis,esophagus,fallopian tube,gallbladder,heart muscle,hippocampus,kidney,liver,lung,lymph node,oral mucosa,ovary,pancreas,parathyroid gland,placenta,prostate,rectum,salivary gland,seminal vesicle,skeletal muscle,skin 1,skin 2,small intestine,smooth muscle,soft tissue 1,soft tissue 2,spleen,stomach 1,stomach 2,testis,thyroid gland,tonsil,urinary bladder,vagina\n+Q5S007\tENSP00000298910\tENSG00000188906\tNX_Q5S007\tadrenal gland,appendix,bone marrow,breast,bronchus,caudate,cerebellum,cerebral cortex,cervix,colon,duodenum,endometrium 1,endometrium 2,epididymis,esophagus,fallopian tube,gallbladder,heart 
muscle,hippocampus,kidney,liver,lung,lymph node,nasopharynx,oral mucosa,ovary,pancreas,parathyroid gland,placenta,prostate,rectum,salivary gland,seminal vesicle,skeletal muscle,skin 1,skin 2,small intestine,smooth muscle,soft tissue 1,soft tissue 2,spleen,stomach 1,stomach 2,testis,thyroid gland,tonsil,urinary bladder,vagina\n+Q08379\tENSP00000416097\tENSG00000167110\tNX_Q08379\tadrenal gland,appendix,bone marrow,breast,bronchus,caudate,cerebellum,cerebral cortex,cervix,colon,duodenum,endometrium 1,endometrium 2,epididymis,esophagus,fallopian tube,gallbladder,heart muscle,hippocampus,kidney,liver,lung,lymph node,nasopharynx,oral mucosa,ovary,pancreas,parathyroid gland,placenta,prostate,rectum,salivary gland,seminal vesicle,skeletal muscle,skin 1,skin 2,small intestine,smooth muscle,soft tissue 1,soft tissue 2,spleen,stomach 1,stomach 2,testis,thyroid gland,tonsil,urinary bladder,vagina\n+P02649\tENSP00000252486\tENSG00000130203\tNX_P02649\tadrenal gland,appendix,bone marrow,breast,bronchus,caudate,cerebellum,cerebral cortex,cervix,colon,duodenum,endometrium 1,endometrium 2,epididymis,esophagus,fallopian tube,gallbladder,heart muscle,hippocampus,kidney,liver,lung,lymph node,nasopharynx,oral mucosa,ovary,pancreas,parathyroid gland,placenta,prostate,rectum,salivary gland,seminal vesicle,skeletal muscle,skin 1,skin 2,small intestine,smooth muscle,soft tissue 1,soft tissue 2,spleen,stomach 1,stomach 2,testis,thyroid gland,tonsil,urinary bladder,vagina\n+P35498\tENSP00000303540\tENSG00000144285\tNX_P35498\tNA - no tissue information\n+P12931\tENSP00000350941\tENSG00000197122\tNX_P12931\tadrenal gland,appendix,bone marrow,breast,bronchus,caudate,cerebellum,cerebral cortex,cervix,colon,duodenum,endometrium 1,endometrium 2,epididymis,esophagus,fallopian tube,gallbladder,heart muscle,hippocampus,kidney,liver,lung,lymph node,nasopharynx,oral mucosa,ovary,pancreas,parathyroid gland,placenta,prostate,rectum,salivary gland,seminal vesicle,skeletal muscle,skin 1,skin 2,small 
intestine,smooth muscle,soft tissue 1,soft tissue 2,spleen,stomach 1,stomach 2,testis,thyroid gland,tonsil,urinary bladder,vagina\n'