In the '/bundle/2.8/b37' directory of the GATK resource bundle FTP site, use 'get dbsnp_138.b37.vcf.gz' to download dbSNP build 138, a database of known polymorphic sites, to the current local working directory. This file is supplied to BaseRecalibrator through the '-knownSites' parameter.
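Before passing the file to BaseRecalibrator, you may want to confirm that the download is an intact VCF and check how its contigs are named. The following minimal Python sketch uses only the standard library. The function name `inspect_vcf_header` is hypothetical, and the sketch assumes the file is gzip- or bgzip-compressed (Python's `gzip` module reads both).

```python
import gzip


def inspect_vcf_header(path, max_records=1000):
    """Sanity-check a gzipped VCF and report its contig naming.

    Returns (fileformat, contigs): the ##fileformat value and the set of
    contig names found in ##contig header lines plus the CHROM column of
    the first `max_records` data lines.
    """
    fileformat = None
    contigs = set()
    n_records = 0
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("##fileformat="):
                fileformat = line.strip().split("=", 1)[1]
            elif line.startswith("##contig=<"):
                # e.g. ##contig=<ID=1,length=249250621>
                body = line.strip()[len("##contig=<"):-1]
                for field in body.split(","):
                    if field.startswith("ID="):
                        contigs.add(field[3:])
            elif line.startswith("#"):
                continue  # other meta-information lines and the #CHROM line
            else:
                contigs.add(line.split("\t", 1)[0])
                n_records += 1
                if n_records >= max_records:
                    break  # enough to see the naming scheme
    if fileformat is None:
        raise ValueError(f"{path} has no ##fileformat line; is it a VCF?")
    return fileformat, contigs
```

The b37 bundle uses contig names without the `chr` prefix (for example `1` rather than `chr1`), so the returned set is a quick way to spot a naming mismatch with your reference before GATK does.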
GATK Pipeline for calling variants from one sample

Synopsis: We will outline the GATK pipeline that takes a single sample from a pair of unaligned paired-end read files (R1, R2) to variant calls in a VCF file. For demonstration, we will download reads for a CEPH sample (SRR062634). This tutorial is based on GATK version 3.7. The next version of GATK (4.0, in beta at the time of writing) will not only introduce a host of new features but also be open source. At this stage, it is assumed that the reference genome (genome.fasta) has been indexed with bwa (bwa index).
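The stages of such a pipeline can be sketched as an ordered list of commands. The Python below only builds the command strings and does not run them. It is a sketch under several assumptions: the helper name is hypothetical, the jars are assumed to be called `GenomeAnalysisTK.jar` and `picard.jar`, the bundle VCF is assumed to have been decompressed with its index alongside, and the step order (bwa mem, Picard SortSam and MarkDuplicates, BaseRecalibrator, PrintReads, HaplotypeCaller) follows the common GATK 3.x layout rather than anything prescribed here.

```python
def gatk37_single_sample_commands(sample, r1, r2, ref="genome.fasta",
                                  known_sites=("dbsnp_138.b37.vcf",),
                                  gatk_jar="GenomeAnalysisTK.jar",
                                  picard_jar="picard.jar", threads=4):
    """Build (but do not run) one sample's GATK 3.x commands, in order.

    Assumes file names contain no spaces and that `ref` already has its
    bwa index, .fai index and .dict dictionary next to it.
    """
    # Read group; bwa mem expands the literal "\t" sequences into tabs.
    # PL:ILLUMINA is an assumption about the sequencing platform.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    # One -knownSites flag per resource VCF (SNPs and indels alike).
    known = " ".join(f"-knownSites {vcf}" for vcf in known_sites)
    return [
        # 1. Align the paired-end reads to the bwa-indexed reference.
        f"bwa mem -t {threads} -R '{read_group}' {ref} {r1} {r2} > {sample}.sam",
        # 2. Sort by coordinate and index (Picard).
        f"java -jar {picard_jar} SortSam I={sample}.sam O={sample}.sorted.bam "
        "SORT_ORDER=coordinate CREATE_INDEX=true",
        # 3. Mark duplicate reads.
        f"java -jar {picard_jar} MarkDuplicates I={sample}.sorted.bam "
        f"O={sample}.dedup.bam M={sample}.dup_metrics.txt CREATE_INDEX=true",
        # 4. Build the base quality recalibration table from known sites.
        f"java -jar {gatk_jar} -T BaseRecalibrator -R {ref} "
        f"-I {sample}.dedup.bam {known} -o {sample}.recal.table",
        # 5. Apply the recalibration table to the reads.
        f"java -jar {gatk_jar} -T PrintReads -R {ref} -I {sample}.dedup.bam "
        f"-BQSR {sample}.recal.table -o {sample}.recal.bam",
        # 6. Call SNVs and indels into a VCF.
        f"java -jar {gatk_jar} -T HaplotypeCaller -R {ref} "
        f"-I {sample}.recal.bam -o {sample}.vcf",
    ]
```

For example, `gatk37_single_sample_commands("SRR062634", "SRR062634_1.fastq.gz", "SRR062634_2.fastq.gz", known_sites=("dbsnp_138.b37.vcf", "known_indels.vcf"))` emits one `-knownSites` flag per file; the FASTQ names and `known_indels.vcf` are placeholders. Returning strings keeps the sketch easy to inspect; each one can be pasted into a shell or run with `subprocess.run(cmd, shell=True, check=True)` (the shell is needed for the `>` redirection in step 1).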
It is also assumed that the reference FASTA has been indexed (genome.fasta.fai, e.g. with samtools faidx) and that a sequence dictionary (genome.dict) has been created. Finally, at least one SNP and one indel reference VCF, along with their indices, must be available.

Use the capture file

For targeted sequencing (e.g., exome sequencing), using the capture file (a file indicating which regions were sequenced) may not only improve the quality of the analysis but also speed up the process by skipping off-target regions. A capture file is usually distributed with the “.bed” extension and is a tab-delimited file of the form:

    chr1	100	101
    chr1	103	104
    chr2	50	112

In this dummy example, chr1 was “captured” in the intervals 100-101 and 103-104, while chr2 was captured in the interval 50-112 (BED start coordinates are 0-based and end coordinates are exclusive). Note that the contigs (chr1 and chr2 in this case) have to match the contigs in your .dict file. For the GATK pipeline, one can introduce the capture file when creating the recalibration table using ‘-L mycapture.bed’.
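Because GATK typically stops with an error when an interval names a contig that is absent from the sequence dictionary, a quick check of the capture file before using `-L` can save a failed run. This minimal Python sketch (function names hypothetical) reads contig names from the `@SQ` lines of the `.dict` file, parses the BED as 0-based half-open intervals, and reports the number of intervals, the total targeted bases, and any contigs missing from the dictionary.

```python
def read_dict_contigs(dict_path):
    """Contig names from the @SQ lines of a Picard-style .dict file."""
    contigs = []
    with open(dict_path) as handle:
        for line in handle:
            if line.startswith("@SQ"):
                for field in line.rstrip("\n").split("\t")[1:]:
                    if field.startswith("SN:"):
                        contigs.append(field[3:])
    return contigs


def check_capture_bed(bed_path, dict_path):
    """Validate a capture BED against the reference dictionary.

    Returns (n_intervals, targeted_bases, unknown_contigs). BED intervals
    are 0-based and half-open, so each line covers end - start bases.
    """
    known = set(read_dict_contigs(dict_path))
    n_intervals, targeted_bases, unknown = 0, 0, set()
    with open(bed_path) as handle:
        for lineno, line in enumerate(handle, start=1):
            # Skip blank lines, comments and UCSC track/browser lines.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                raise ValueError(f"line {lineno}: expected chrom, start, end")
            chrom, start, end = fields[0], int(fields[1]), int(fields[2])
            if start < 0 or end < start:
                raise ValueError(f"line {lineno}: bad interval {start}-{end}")
            if chrom not in known:
                unknown.add(chrom)
            n_intervals += 1
            targeted_bases += end - start
    return n_intervals, targeted_bases, unknown
```

A non-empty third value usually points to a naming mismatch such as `chr1` versus `1`; note that the b37 bundle uses names without the `chr` prefix, so a `chr`-style capture file would need its contigs renamed before being used with a b37 reference.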