view classifier/classifier.xml @ 1:d8547ff82697 draft default tip

Deleted selected files
author nanettec
date Fri, 18 Mar 2016 05:16:02 -0400
parents ef9c2044d86a
children

<tool id="classifier5" name="Classify eQTLs" version="5.0.0">
	<description> as cis or trans</description>
	<command interpreter="python">
            classifier.py --rscript \$R_SCRIPT_PATH/classifier/eqtl_genes_positions_plot.txt --input1 $input1 --input2 $input2 --input3 $input3 --input4 $input4 --output1 $output1 --output2 $output2 --output3 $output3 --output4 $output4 --output5 $output5 --output6 $output6 --output7 $output7 --output8 $output8
	</command>
        <inputs>
            <param label="eQTL results file" name="input1" type="data" format="tabular" help="A tabular file with the mapped eQTLs and their associated statistics"></param>
	    <param label="Chr summary file" name="input2" type="data" format="tabular" help="A tabular file with a data summary per chromosome (bp)"></param>
	    <param label="Gene positions file" name="input3" type="data" format="tabular" help="A tabular file with the positions (bp) of each gene"></param>
            <param label="Lookup table file" name="input4" type="data" format="tabular" help="A tabular file with cM and bp positions for each interval"></param>
        </inputs>
	<outputs>
                <data format="tabular" name="output1" />
		<data format="tabular" name="output2" />
		<data format="tabular" name="output3" />
		<data format="tabular" name="output4" />
		<data format="tabular" name="output5" />
		<data format="tabular" name="output6" />
		<data format="tabular" name="output7" />
		<data format="pdf" name="output8" />
	</outputs>
	<requirements>
		<requirement type="set_environment">R_SCRIPT_PATH</requirement>
	</requirements>
	<tests>
          <test>
          </test>
	</tests>
	<help>
		
**What it does**

Calculates the average genetic interval size across all eQTLs.

Classifies an eQTL as 'cis' if it maps within half of this average interval size of the gene exhibiting the eQTL.

Classifies an eQTL as 'trans' if it maps to a different region of the genome than the gene exhibiting the eQTL, that is, further than half of the average interval size from the gene.

Classifies an eQTL as 'no_result' if the location of the target gene is not known.
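
The rules above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code in classifier.py: the function names are hypothetical, eQTL and gene positions are assumed to share one genome-wide coordinate (such as the eQTL_bin and gene_bin columns of the output files), and a distance of exactly half the average interval size is assumed to count as cis.

```python
from typing import Optional, Sequence, Tuple


def average_interval_size(intervals: Sequence[Tuple[float, float]]) -> float:
    """Return the mean width of the eQTL intervals, given (start, end) pairs."""
    if not intervals:
        raise ValueError("at least one eQTL interval is required")
    return sum(end - start for start, end in intervals) / len(intervals)


def classify_eqtl(eqtl_pos: float, gene_pos: Optional[float], avg_interval: float) -> str:
    """Label one eQTL as 'cis', 'trans' or 'no_result' (hypothetical helper)."""
    if gene_pos is None:
        # The location of the target gene is not known.
        return "no_result"
    distance = abs(eqtl_pos - gene_pos)
    # Assumption: the boundary (exactly half the average interval) counts as cis.
    return "trans" if distance > avg_interval / 2 else "cis"
```

For example, with a hypothetical average interval size of 15 bins, an eQTL in bin 917 whose gene lies in bin 920 would be labelled cis, while an eQTL in bin 902 whose gene lies in bin 700 would be labelled trans.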

-------

**Example input files**

eQTL results file, each row corresponds to an eQTL (21 columns; only a part of the file is shown)::

 trait_name trait_number eQTL_number chr  peak_marker	peak_position	peak_LR		peak_LOD	R2		TR2		S		additive  dominance  LOD1_L_m	LOD1_L_pos  LOD1_R_m	LOD1_R_pos   LOD2_L_m	LOD2_L_pos  LOD2_R_m	LOD2_R_pos
 geneA	    106		 2	     10	  4	 	0.5206		13.0002477	2.821053751	0.1067186	0.2802598	2741.216084	-80.0805117	0	3	0.4045	 	5	0.6791		3	0.3583		5	0.7505
 geneB	    434		 3	     6	  3		0.1455		13.000651	2.821141267	0.0881461	0.3710748	38.650035	502.7692948	0	2	0.0847		3	0.2153		1	0.0112		3	0.2763
 geneC	    343		 2	     4	  10		1.1039		13.0012249	2.821265803	0.1168611	0.3068127	42.9667077	-101.8310204	0	10	1.0217		10	1.1078		9	0.9838		10	1.1118
 geneD	    384		 1	     1	  19		2.3414		13.0022994	2.82149897	0.1372476	0.1985604	2.1933164	-688.0268455	0	19	2.1956		20	2.4956		19	2.0883		20	2.5488
 geneD	    267		 2	     9	  8		1.2052		13.0026682	2.821578999	0.0862225	0.3794662	55.4157254	278.1351403	0	7	1.2023		8	1.2277		7	1.1994		8	1.2466


Chromosome summary file, each row corresponds to a chromosome (6 columns; only a part of the file is shown). The last row gives the total across the genome::

 chr	markers	cM	bp	int_positions	bins
 1	27	324.4	301354135	177	176
 2	14	169.11	237068873	92	91
 3	19	221.29	232140174	123	122
 4	20	188.37	241473504	105	104
 5	20	203.82	217872852	110	109
 6	17	195.85	169174353	106	105
 Total 	117	1302.84	1399083891	713	707
 
Gene positions file, each row corresponds to a gene (4 columns; only a part of the file is shown)::

 gene   chr     start_bp        end_bp
 geneA	1	33214735	33217244
 geneB	2	216416829	216433258
 geneC	6	162556092	162559012
 geneD	4	197750322	197751855
 geneE	7	144322379	144325978
 geneF	10	88551726	88552391
 geneG	8	163218231	163219697
 geneH	4	28738352	28739816
 geneI	5	180868777	180878474
 geneJ	5	182124005	182130631


Lookup table file, each row corresponds to an interval of at most 2 cM (7 columns; only a part of the file is shown)::

 id	chr	marker	int	cM	bp      length_cM
 1	1	1	0.0001	0.0	2038278	2.0
 2	1	1	0.0201	2.0	2466324	2.0
 3	1	1	0.0401	4.0	2894370	2.0
 4	1	1	0.0601	6.0	3322416	1.53
 5	1	2	0.0754	7.53	3649871	2.0
 6	1	2	0.0954	9.53	4095673	2.0
 7	1	2	0.1154	11.53	4541476	2.0
 8	1	2	0.1354	13.53	4987278	2.0

-------

**Example output files**


eQTL full classification file, each row corresponds to an eQTL (16 columns; only a part of the file is shown). A classification column was added to the eQTL results file::

 gene	index  chr  start_marker  start_int  end_marker	 end_int  peak_marker	peak_int	peakLR		rsq		rtsq	parent_up_reg   classification	 eQTL_bin    gene_bin
 geneA  1	6	13	1.5139		15	1.6431		13	1.5539		12.7532485	0.1337606	0.3630217	parentA     trans	691     	800
 geneC 	2	9	5	0.8106		6	0.9614		6	0.9214		20.344489	0.1559524	0.3123026	parentB     trans	902     	700
 geneC	3	9	8	1.2052		8	1.2452		8	1.2052		16.6822024	0.1244943	0.314542	parentA     cis		917     	920
 geneD	4	9	1	0.0001		2	0.2395		1	0.1201		19.531317	0.1753893	0.4300621	parentA     cis		860     	862
 geneH	5	1	1	0.0001		1	0.1001		1	0.0001		19.5727096	0.1373944	0.392982	parentB     trans	939     	465
 geneH	6	1	9	1.0268		11	1.2164		10	1.1261		13.5560176	0.095168	0.4823061	parentB     trans	1000    	465
 geneH	7	6	14	1.5977		15	1.8031		15	1.7231		19.8953622	0.3181244	0.3909106       parentB     no_result	904     	904
 geneI	8	9	7	1.0982		9	1.3079		8	1.2052		20.3966235	0.1305025	0.4233788	parentA     cis		977     	969
 
eQTL cis classification file, each row corresponds to a cis eQTL (16 columns; only a part of the file is shown)::

 gene	index  chr  start_marker  start_int  end_marker	 end_int  peak_marker	peak_int	peakLR		rsq		rtsq	parent_up_reg   classification	 eQTL_bin    gene_bin
 geneC	3	9	8	1.2052		8	1.2452		8	1.2052		16.6822024	0.1244943	0.314542	parentA     cis		917     	920
 geneD	4	9	1	0.0001		2	0.2395		1	0.1201		19.531317	0.1753893	0.4300621	parentA     cis		860     	862
 geneI	8	9	7	1.0982		9	1.3079		8	1.2052		20.3966235	0.1305025	0.4233788	parentA     cis		977     	969

eQTL trans classification file, each row corresponds to a trans eQTL (16 columns; only a part of the file is shown)::

 gene	index  chr  start_marker  start_int  end_marker	 end_int  peak_marker	peak_int	peakLR		rsq		rtsq	parent_up_reg   classification	 eQTL_bin    gene_bin
 geneA  1	6	13	1.5139		15	1.6431		13	1.5539		12.7532485	0.1337606	0.3630217	parentA     trans	691    	 	800
 geneC 	2	9	5	0.8106		6	0.9614		6	0.9214		20.344489	0.1559524	0.3123026	parentB     trans	902     	700
 geneH	5	1	1	0.0001		1	0.1001		1	0.0001		19.5727096	0.1373944	0.392982	parentB     trans	939     	465
 geneH	6	1	9	1.0268		11	1.2164		10	1.1261		13.5560176	0.095168	0.4823061	parentB     trans	1000    	465
 
Classification summary file, each row corresponds to a class (6 columns)::

 class      number_eQTLs    percentage_eQTLs	average_peakLR  average_rsq     average_rtsq
 cis	    	4712		14.93%		36.0    	0.29    	0.47
 trans	    	20726		65.69%		36.0    	0.29    	0.47
 no_result	6111		19.369%		20.1    	0.16    	0.39
 total	    	31549		100.0%		19.5    	0.16    	0.38
 
Chromosome summary v2 file, each row corresponds to a chromosome (11 columns; only a part of the file is shown). The last row gives the total across the genome::

 chr     markers cM      	bp         interval.positions  bins    genes   cis eQTL    trans eQTL   unknown eQTL    all eQTL
 1       27      324.4   	301354135       177     	176     5185    782     	3209    761     	4752
 2       14      169.11  	237068873       92      	91      3782    512     	1897    510    	 	2919
 3       19      221.29  	232140174       123     	122     3608    469     	2098    614     	3181
 4       20      188.37  	241473504       105     	104     3389    493     	2006    491     	2990
 5       20      203.82 	217872852       110     	109     3964    657     	3077    762     	4496
 6       17      195.85  	169174353       106     	105     2744    413    	 	1933    516     	2862
 Total   117     1302.84	1399083891	713		707	22672	3326		14220	3654		21200

Gene positions v2 file, each row corresponds to a gene (9 columns; only a part of the file is shown)::

 gene	chr	start_bp	end_bp	    num_eQTL  num_cis_eQTL   num_trans_eQTL  num_unknown_eQTL	gene_bin
 geneA    6       155513712       155518148       1       0       	1       	0       	682
 geneB    4       230729005       230729064       0       0       	0       	0       	472
 geneC    2       172852270       172853086       2       0       	1       	1       	229
 geneD    1       282744902       282749375       3       0       	3       	0       	154
 geneE    2       6556394 	  6560322 	  0       0       	0       	0       	189
 
eQTL per gene summary file (2 columns)::

 Average number of eQTLs per gene with eQTL      	 2.4
 Average number of cis eQTLs per gene with cis eQTL      1.0
 Average number of trans eQTLs per gene with trans eQTL  1.8
 Number of genes with only cis eQTL (no trans)   	 1402 (8.5%)
 Number of genes with only trans eQTL (no cis)   	 11042 (66.7%)
 Number of genes with cis and trans eQTL 	 	 4121 (24.9%)
 Number of genes with cis or trans eQTL  		 16565 (100.0%)


eQTL vs gene position plot (in PDF format, produced using R).

        </help>
</tool>