Introduction to Whole-Genome Sequencing
Whole-genome sequencing (WGS) provides the most comprehensive collection of an individual’s genetic variation. There are two approaches to assembling short shotgun sequencing reads into longer contiguous genomic sequences. In the de novo assembly approach, reads are compared with one another, and overlapping reads are merged to build longer contiguous sequences (contigs). In the reference-based approach, each read is instead mapped to an existing reference genome sequence.
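The two approaches can be contrasted with a toy Python sketch. It is illustrative only and assumes error-free reads. A greedy overlap merge stands in for de novo assembly (production assemblers build overlap or de Bruijn graphs and tolerate sequencing errors and repeats). An exact string search stands in for reference mapping (real aligners tolerate mismatches and gaps). The function names and example reads are hypothetical.

```python
"""Toy contrast of de novo (overlap) assembly and reference-based mapping."""


def overlap_len(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """De novo sketch: repeatedly merge the pair of sequences with the longest overlap.

    Assumes error-free reads and does not handle reads contained inside other reads.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap_len(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # no overlaps left to merge
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


def map_to_reference(read: str, reference: str) -> int:
    """Reference-based sketch: 0-based exact-match position of a read, or -1 if absent."""
    return reference.find(read)


if __name__ == "__main__":
    reads = ["ATGCGT", "CGTACG", "ACGGTA"]
    print(greedy_assemble(reads))  # ['ATGCGTACGGTA']
    print([map_to_reference(r, "ATGCGTACGGTA") for r in reads])  # [0, 3, 6]
```

Greedy merging is easy to follow, but it can misassemble repeated sequence. This is one reason real assemblers consider all overlaps together instead of committing to the best pair at each step.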
Next-generation sequencing (NGS) has been used largely as a research tool and is now being introduced into clinical practice. In personalized medicine, whole-genome sequence data are expected to become an important tool for guiding therapeutic intervention. Sequencing at single-nucleotide (SNP) resolution also helps pinpoint functional variants identified in association studies, which may lay the foundation for predicting disease susceptibility and drug response, and it expands the knowledge available to researchers in evolutionary biology. Although whole-genome sequencing is commonly associated with human genomes, NGS technology makes it equally useful for sequencing other species, such as agriculturally important livestock and plants or disease-related microbes.
Advantages of Whole-Genome Sequencing
-Provides a high-resolution, base-by-base view of the genome
-Captures both large and small variants that targeted sequencing approaches might miss
-Identifies potential causative variants for further follow-on studies of gene expression and regulation mechanisms
-Delivers large volumes of data in a short amount of time to support assembly of novel genomes
-Supports personalized treatment plans based not only on the mutant genes causing a disease but also on other genes in the patient’s genome
Whole-genome re-sequencing compares a sample genome with an available reference genome (a previously assembled sequence of that species’ genome) to detect variants such as SNPs and indels. It requires a highly parallel sequencing system, such as the Illumina HiSeq, to provide sufficient coverage depth for accurate variant detection. The table below gives very general guidelines for choosing an appropriate coverage depth for different purposes.
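Coverage depth is usually estimated from the amount of sequence produced relative to genome size. The following minimal Python sketch assumes that reads land uniformly at random (the Lander-Waterman model). Under that model the mean depth is C = N × L / G for N reads of length L on a genome of size G, and the expected fraction of bases covered at least once is 1 - e^(-C). The 3 Gb genome, 150 bp read length, and 30X target are example values only, not recommendations taken from the table.

```python
import math


def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Expected mean depth C = N * L / G."""
    return num_reads * read_length / genome_size


def reads_needed(target_depth: float, read_length: int, genome_size: int) -> int:
    """Reads N = C * G / L required to reach a target mean depth (rounded up)."""
    return math.ceil(target_depth * genome_size / read_length)


def fraction_covered(depth: float) -> float:
    """Expected fraction of bases covered at least once (Poisson model): 1 - e^(-C)."""
    return 1.0 - math.exp(-depth)


if __name__ == "__main__":
    genome_size = 3_000_000_000  # example value: roughly human-sized genome
    read_length = 150            # example value: bases per read
    n = reads_needed(30, read_length, genome_size)
    print(n)  # 600000000
    print(mean_coverage(n, read_length, genome_size))  # 30.0
```

Real coverage is uneven because of GC bias, repeats, and ambiguous mapping, so the depth at a given base can fall well below the mean. For paired-end data, each mate counts as one read in N.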
| | Individual Genome / General Disease | Population Genomics | Cancer / Rare Disease |
|---|---|---|---|
| Objective | Obtaining variants of each sample for downstream analysis | Population and phylogenetic analysis; disease and phenotype relationship analysis | Detecting cancer-specific and/or rare variants |

Sample requirements (all applications): minimum quantity = 1 μg; minimum concentration = 30 ng/μl; OD260/280 = 1.6–2.2
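Because quantity equals concentration times volume, these sample requirements imply a minimum volume: 1 μg (1,000 ng) at 30 ng/μl needs at least about 33.3 μl. The minimal Python check below assumes concentration in ng/μl and volume in μl. The `SampleRequirements` class and `check_sample` function are hypothetical helpers, not part of any submission system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleRequirements:
    """Hypothetical container for the submission thresholds listed above."""
    min_quantity_ng: float                     # 1 μg = 1000 ng for re-sequencing
    min_concentration_ng_per_ul: float = 30.0
    od260_280_min: float = 1.6
    od260_280_max: float = 2.2

    def min_volume_ul(self) -> float:
        """Smallest volume delivering the minimum quantity at the minimum concentration."""
        return self.min_quantity_ng / self.min_concentration_ng_per_ul


def check_sample(concentration_ng_per_ul: float, volume_ul: float,
                 od260_280: float, req: SampleRequirements) -> list[str]:
    """Return the failed criteria; an empty list means the sample passes."""
    problems = []
    total_ng = concentration_ng_per_ul * volume_ul  # quantity = concentration x volume
    if concentration_ng_per_ul < req.min_concentration_ng_per_ul:
        problems.append(f"concentration {concentration_ng_per_ul} ng/ul below minimum")
    if total_ng < req.min_quantity_ng:
        problems.append(f"total quantity {total_ng:.0f} ng below minimum")
    if not req.od260_280_min <= od260_280 <= req.od260_280_max:
        problems.append(f"OD260/280 {od260_280} outside accepted range")
    return problems


if __name__ == "__main__":
    wgs = SampleRequirements(min_quantity_ng=1000)
    print(round(wgs.min_volume_ul(), 1))  # 33.3
    print(check_sample(50, 25, 1.85, wgs))  # []
    print(check_sample(20, 100, 1.5, wgs))  # concentration and OD260/280 failures
```

For the de novo requirements (4 μg), the same check can be run with `SampleRequirements(min_quantity_ng=4000)`.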
WGS Basic Analysis Criteria
1. Data filtering
2. Summary of data production
3. Mapping statistics
4. Detection of SNVs
5. Annotation of the resulting SNVs
6. Detection of indels
7. Annotation of the resulting indels
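Steps 1 and 4 can be illustrated with a simplified Python sketch. Reads are filtered on mean base quality decoded from a FASTQ quality string (Phred+33 encoding assumed). An SNV is then called at a single position when enough aligned reads support a non-reference base. The thresholds (mean Q20, depth 10, alternate-allele fraction 0.2) and the function names are illustrative assumptions. Production pipelines use dedicated tools, such as BWA for mapping and GATK or bcftools for variant calling, and rely on statistical genotype models rather than fixed cutoffs.

```python
from collections import Counter


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string; Phred+33 encoding is assumed."""
    return [ord(ch) - offset for ch in quality]


def passes_quality_filter(quality: str, min_mean_q: float = 20.0) -> bool:
    """Step 1 sketch: keep a read only if its mean base quality reaches the threshold."""
    scores = phred_scores(quality)
    return bool(scores) and sum(scores) / len(scores) >= min_mean_q


def call_snv(ref_base: str, pileup: str, min_depth: int = 10,
             min_alt_fraction: float = 0.2):
    """Step 4 sketch: call the most frequent non-reference base at one position.

    `pileup` holds the bases observed at this position across aligned reads.
    Returns (alt_base, alt_fraction), or None when no SNV is called.
    """
    bases = [b for b in pileup.upper() if b in "ACGT"]  # ignore N and other symbols
    depth = len(bases)
    if depth < min_depth:
        return None  # too few reads to decide
    alt_counts = Counter(b for b in bases if b != ref_base.upper())
    if not alt_counts:
        return None  # every read matches the reference
    alt, count = alt_counts.most_common(1)[0]
    fraction = count / depth
    return (alt, fraction) if fraction >= min_alt_fraction else None


if __name__ == "__main__":
    reads = [("ACGTA", "IIIII"), ("ACGTT", "#####")]  # 'I' = Q40, '#' = Q2
    kept = [seq for seq, qual in reads if passes_quality_filter(qual)]
    print(kept)  # ['ACGTA']
    print(call_snv("A", "AAAAAGGGGG"))  # ('G', 0.5)
    print(call_snv("A", "AAAAAAAAAG"))  # None (alt fraction 0.1 is below 0.2)
```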
De novo Sequencing
De novo sequencing requires special library preparation and highly parallel sequencing technology. There are two types of de novo genome sequencing: draft map and fine map. To assemble a de novo genome successfully, several libraries with different insert sizes (200 bp, 500 bp, 2 kb, 5 kb, 10 kb, and 20 kb) are usually prepared. Short-insert libraries provide accurate local sequence, while long-insert (mate-pair) libraries span repeats and link contigs into scaffolds.
| | General Sequencing Depth | Gene Coverage | Genome Coverage | Accuracy | Contig N50 | Scaffold N50 |
|---|---|---|---|---|---|---|
| Draft Map | 60X | 95% | 90% (except heterochromatin regions) | | | |
| Fine Map | 80X | 98% | 95% (except heterochromatin regions) | | | |

Sample requirements: minimum quantity = 4 μg (2-5 kb); minimum concentration = 30 ng/μl; OD260/280 = 1.6–2.2
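Contig N50 and scaffold N50 are defined as the length L such that sequences of length L or longer together contain at least half of all assembled bases. A minimal Python sketch of the calculation follows; the function name is illustrative.

```python
def n50(lengths: list[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("no sequences given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):  # longest sequences first
        running += length
        if 2 * running >= total:  # integer comparison avoids halving an odd total
            return length
    raise AssertionError("not reached for non-negative lengths")


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

The same function applies to both contig and scaffold lengths. Scaffold N50 is usually the larger of the two because long-insert libraries join contigs into longer scaffolds.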