records from a file
sampline.py --input=$input --output=$out_file1 --nSample=$nSample --recSize=$recSize --nSkip=$nSkip $replacement
**What it does**
This tool selects random records from a file. Each record is defined by a fixed number of lines.
- When over-sampling (requesting more records than the file contains), the --replacement option is enforced automatically.
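
The behaviour described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the actual code of sampline.py: the function name `sample_records` and its arguments are hypothetical, chosen to mirror the --recSize, --nSample, --nSkip and --replacement options. It also assumes that sampled records are written in their original file order, as the examples below suggest.

```python
import random


def sample_records(lines, rec_size, n_sample, n_skip=0, replacement=False, seed=None):
    """Return the skipped leading lines followed by n_sample randomly chosen records.

    Hypothetical helper: a sketch of the idea, not the tool's implementation.
    """
    rng = random.Random(seed)
    header, body = lines[:n_skip], lines[n_skip:]

    # Group the remaining lines into fixed-size records.
    # A trailing partial record (fewer than rec_size lines) is ignored.
    n_records = len(body) // rec_size
    records = [body[i * rec_size:(i + 1) * rec_size] for i in range(n_records)]
    if n_sample > 0 and n_records == 0:
        raise ValueError("no complete records to sample from")

    # Over-sampling is only possible with replacement, so force it on.
    if n_sample > n_records:
        replacement = True

    if replacement:
        picks = [rng.randrange(n_records) for _ in range(n_sample)]
    else:
        picks = rng.sample(range(n_records), n_sample)
    picks.sort()  # assumption: keep the sampled records in file order

    out = list(header)  # skipped lines are copied to the output unchanged
    for i in picks:
        out.extend(records[i])
    return out
```

With `rec_size=1`, `n_sample=5` and `n_skip=1` this corresponds to the settings of Example 1; with `rec_size=4` and `n_sample=3` it corresponds to Example 2.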
-----
**Example 1: sampling from a BED file**
parameters::
1 line per record, sampling 5 records, without replacement; line 1 (the track line) is skipped during sampling and copied to the output unchanged
Input::
track name=test.bed
chr1 148078400 148078582 CCDS993.1_cds_0_0_chr1_148078401_r 0 -
chr11 116124407 116124501 CCDS8374.1_cds_0_0_chr11_116124408_r 0 -
chr15 41826029 41826196 CCDS10101.1_cds_0_0_chr15_41826030_f 0 +
chr16 142908 143003 CCDS10397.1_cds_0_0_chr16_142909_f 0 +
chr2 220229609 220230869 CCDS2443.1_cds_0_0_chr2_220229610_r 0 -
chr20 33579500 33579527 CCDS13256.1_cds_0_0_chr20_33579501_r 0 -
chr20 33593260 33593348 CCDS13257.1_cds_0_0_chr20_33593261_f 0 +
chr5 131621326 131621419 CCDS4152.1_cds_0_0_chr5_131621327_f 0 +
chr7 113660517 113660685 CCDS5760.1_cds_0_0_chr7_113660518_f 0 +
chrX 152648964 152649196 CCDS14733.1_cds_0_0_chrX_152648965_r 0 -
Output::
track name=test.bed
chr11 116124407 116124501 CCDS8374.1_cds_0_0_chr11_116124408_r 0 -
chr16 142908 143003 CCDS10397.1_cds_0_0_chr16_142909_f 0 +
chr20 33579500 33579527 CCDS13256.1_cds_0_0_chr20_33579501_r 0 -
chr20 33593260 33593348 CCDS13257.1_cds_0_0_chr20_33593261_f 0 +
chr5 131621326 131621419 CCDS4152.1_cds_0_0_chr5_131621327_f 0 +
**Example 2: sampling reads from a FASTQ file**
parameters::
4 lines per record, sampling 3 records, without replacement
Input::
@SRR066787.2496 WICMT-SOLEXA:8:1:28:2047 length=36
NNANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+SRR066787.2496 WICMT-SOLEXA:8:1:28:2047 length=36
!!%!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
@SRR066787.2497 WICMT-SOLEXA:8:1:28:463 length=36
GTGATTAAGAAGAGACTGGCATCACTAAGGTGACAT
+SRR066787.2497 WICMT-SOLEXA:8:1:28:463 length=36
@A=BBCBBAA@:@:@@@:,?AB:B?BB=*2:@=?AA
@SRR066787.2498 WICMT-SOLEXA:8:1:28:704 length=36
GAACCCAATTTTCAAAGAAGTGTGACTGCTTGTTTC
+SRR066787.2498 WICMT-SOLEXA:8:1:28:704 length=36
=?BAABBACCCCAA9>>A=>A?A;;@A>ABBABBB:
@SRR066787.2499 WICMT-SOLEXA:8:1:28:997 length=36
CGACTTCAGGCTCTCGCTAGCCTTCGCTTGACTGAC
+SRR066787.2499 WICMT-SOLEXA:8:1:28:997 length=36
BCCBCCB?A1ACAC>;@CCAAABB?8=BA>@?B?@:
@SRR066787.2500 WICMT-SOLEXA:8:1:28:582 length=36
TCTCTCTCTTTCTCTCTCTCTCTCTCTCTCTCTCTC
+SRR066787.2500 WICMT-SOLEXA:8:1:28:582 length=36
?.?.=9C8CCC:BACBCBC?CCC@CBBBCBBACAC8
Output::
@SRR066787.2497 WICMT-SOLEXA:8:1:28:463 length=36
GTGATTAAGAAGAGACTGGCATCACTAAGGTGACAT
+SRR066787.2497 WICMT-SOLEXA:8:1:28:463 length=36
@A=BBCBBAA@:@:@@@:,?AB:B?BB=*2:@=?AA
@SRR066787.2499 WICMT-SOLEXA:8:1:28:997 length=36
CGACTTCAGGCTCTCGCTAGCCTTCGCTTGACTGAC
+SRR066787.2499 WICMT-SOLEXA:8:1:28:997 length=36
BCCBCCB?A1ACAC>;@CCAAABB?8=BA>@?B?@:
@SRR066787.2500 WICMT-SOLEXA:8:1:28:582 length=36
TCTCTCTCTTTCTCTCTCTCTCTCTCTCTCTCTCTC
+SRR066787.2500 WICMT-SOLEXA:8:1:28:582 length=36
?.?.=9C8CCC:BACBCBC?CCC@CBBBCBBACAC8
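
Example 2 also shows why records are defined by a fixed line count rather than by a marker character. In the input above, the quality line of read SRR066787.2497 itself starts with `@`, so splitting on lines that begin with `@` would cut that read in half. The following sketch illustrates the point using the first two reads of the example input; the helper name `split_records` is hypothetical and is not part of sampline.py.

```python
import random


def split_records(lines, rec_size=4):
    """Group lines into consecutive records of rec_size lines each (hypothetical helper)."""
    if len(lines) % rec_size != 0:
        raise ValueError("line count is not a multiple of the record size")
    return [lines[i:i + rec_size] for i in range(0, len(lines), rec_size)]


# First two reads from the Example 2 input.
fastq_lines = [
    "@SRR066787.2496 WICMT-SOLEXA:8:1:28:2047 length=36",
    "NNANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN",
    "+SRR066787.2496 WICMT-SOLEXA:8:1:28:2047 length=36",
    "!!%!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!",
    "@SRR066787.2497 WICMT-SOLEXA:8:1:28:463 length=36",
    "GTGATTAAGAAGAGACTGGCATCACTAAGGTGACAT",
    "+SRR066787.2497 WICMT-SOLEXA:8:1:28:463 length=36",
    "@A=BBCBBAA@:@:@@@:,?AB:B?BB=*2:@=?AA",
]

records = split_records(fastq_lines, rec_size=4)

# Counting lines that start with "@" over-counts the reads,
# because a quality string may begin with "@" as well.
at_lines = sum(1 for line in fastq_lines if line.startswith("@"))

# Every fixed-size record is a well-formed read: header, sequence, "+" line, quality.
well_formed = all(r[0].startswith("@") and r[2].startswith("+") for r in records)

# Draw one whole read without replacement (seed fixed only for repeatability).
chosen = random.Random(0).sample(records, 1)[0]
```

Here `records` holds two reads while `at_lines` is three, which is why a record size of 4 is the safer choice for FASTQ input.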