Supporting data for "Deep whole-genome sequencing of 90 Han Chinese genomes"

Dataset type: Genomic
Data released on July 12, 2017

Lan T; Lin H; Zhu W; Asker Melchior Tellier LC; Yang M; Liu X; Wang J; Wang J; Yang H; Xu X; Guo X (2017): Supporting data for "Deep whole-genome sequencing of 90 Han Chinese genomes" GigaScience Database. http://dx.doi.org/10.5524/100302

DOI: 10.5524/100302

Next-generation sequencing provides high-resolution insight into human genetic information. However, because of the high cost of sequencing, previous studies have focused primarily on low-coverage data. The 1000 Genomes Project and the Haplotype Reference Consortium have both provided powerful reference panels for imputation, yet low-frequency and novel variants remain difficult to discover and call accurately from low-coverage data. Deep sequencing offers an optimal solution to this problem. Whole-exome sequencing is a viable alternative for exonic regions, but it cannot capture noncoding regions and can therefore miss important causal variants. For Han Chinese populations, most known variants were discovered from the low-coverage data of the 1000 Genomes Project. High-coverage whole-genome sequencing data remain limited for every population, and a large number of low-frequency, population-specific variants remain uncharacterized. We performed whole-genome sequencing at high depth (~80X) of 90 unrelated individuals of Chinese ancestry drawn from the 1000 Genomes Project samples, comprising 45 Northern Han Chinese and 45 Southern Han Chinese samples. Of these 90 individuals, 83 had not been sequenced by the 1000 Genomes Project. We identified 12,568,804 single-nucleotide polymorphisms (SNPs), 2,074,734 short InDels and 26,142 structural variants across the 90 samples. Compared with the Han Chinese data from the 1000 Genomes Project, we found 7,007,685 novel low-frequency variants (defined as minor allele frequency < 5%), comprising 5,816,839 SNPs, 1,172,919 InDels and 17,927 structural variants. Using these deep sequencing data, we built a greatly expanded spectrum of genetic variation for the Han Chinese genome.
Compared with the 1000 Genomes Project, these deep sequencing data for Han Chinese greatly improve the characterization of low-frequency, novel variants. They will be a valuable resource for Chinese genetics research and medical development, and a useful supplement to the 1000 Genomes Project and other human genome projects.
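The "low-frequency" threshold used above (minor allele frequency < 5%) and the SNP/InDel split can be illustrated with a small sketch. It assumes biallelic sites and diploid genotypes coded as alternate-allele counts (0, 1, 2), and it classifies a site as a SNP when REF and ALT are single bases and as an InDel when their lengths differ. The helper names are hypothetical and this is not the variant-calling pipeline used in the study; the last lines only check that the reported novel-variant subtotals add up to the stated total.

```python
from typing import Sequence

MAF_THRESHOLD = 0.05  # "low frequency" as defined above: MAF < 5%


def minor_allele_frequency(alt_counts: Sequence[int]) -> float:
    """MAF at a biallelic site from diploid genotypes coded as ALT-allele counts (0, 1, 2)."""
    if not alt_counts:
        raise ValueError("no genotypes")
    total_alleles = 2 * len(alt_counts)  # two alleles per diploid sample
    alt_freq = sum(alt_counts) / total_alleles
    # The minor allele is whichever of REF/ALT is rarer.
    return min(alt_freq, 1.0 - alt_freq)


def is_low_frequency(alt_counts: Sequence[int]) -> bool:
    """True when the site's MAF falls below the 5% threshold."""
    return minor_allele_frequency(alt_counts) < MAF_THRESHOLD


def variant_class(ref: str, alt: str) -> str:
    """Rough small-variant class from REF/ALT strings (hypothetical helper)."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "InDel"
    return "other"  # e.g. multi-nucleotide substitutions


if __name__ == "__main__":
    # 90 samples: one heterozygote, 89 homozygous-reference genotypes.
    genotypes = [1] + [0] * 89
    print(round(minor_allele_frequency(genotypes), 4))  # 1/180, about 0.0056
    print(is_low_frequency(genotypes))
    print(variant_class("A", "G"), variant_class("AT", "A"))

    # Novel low-frequency variant counts reported in the description.
    novel = {"SNPs": 5_816_839, "InDels": 1_172_919, "SVs": 17_927}
    assert sum(novel.values()) == 7_007_685
```

With 90 diploid samples there are 180 alleles per site, so a single heterozygote already gives a MAF of about 0.56%, well under the 5% cut-off; such singletons are the kind of variant that low-coverage data tend to miss.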

Additional details

Read the peer-reviewed publication(s):

(PubMed: 28938720)
(PubMed: 30801124)

Additional information:

https://github.com/HaoxiangLin/WGS_of_Han_Chinese_genomes

Accessions (data generated as part of this study):

BioProject: PRJEB11005





Sample ID | Taxonomic ID | Common Name | GenBank Name | Scientific Name | Sample Attributes
SRS000111 | 9606 | Human | human | Homo sapiens | Coriell panel: MGP00017; Coriell plate: HAPMAPPT02; Source material identifier: Coriell:GM18526; ...
SRS000112 | 9606 | Human | human | Homo sapiens | Coriell panel: MGP00017; Coriell plate: HAPMAPPT02; Source material identifier: Coriell:GM18532; ...
SRS000113 | 9606 | Human | human | Homo sapiens | Coriell panel: MGP00017; Coriell plate: HAPMAPPT02; Source material identifier: Coriell:GM18537; ...
SRS000114 | 9606 | Human | human | Homo sapiens | Coriell panel: MGP00017; Coriell plate: HAPMAPPT02; Source material identifier: Coriell:GM18542; ...
SRS000115 | 9606 | Human | human | Homo sapiens | Coriell panel: MGP00017; Coriell plate: HAPMAPPT02; Source material identifier: Coriell:GM18545; ...
SRS000116 | 9606 | Human | human | Homo sapiens | Coriell panel: MGP00017; Coriell plate: HAPMAPPT02; Source material identifier: Coriell:GM18547; ...
SRS000117 | 9606 | Human | human | Homo sapiens | Coriell panel: MGP00017; Coriell plate: HAPMAPPT02; Source material identifier: Coriell:GM18550; ...
SRS000118 | 9606 | Human | human | Homo sapiens | Coriell panel: MGP00017; Coriell plate: HAPMAPPT02; Source material identifier: Coriell:GM18552; ...
SRS000119 | 9606 | Human | human | Homo sapiens | Coriell panel: MGP00017; Coriell plate: HAPMAPPT02; Source material identifier: Coriell:GM18555; ...
SRS000120 | 9606 | Human | human | Homo sapiens | Coriell panel: MGP00017; Coriell plate: HAPMAPPT02; Source material identifier: Coriell:GM18558; ...

Showing samples 1-10 of 90.




Data Type | File Format | Size | Release Date
Sequence variants | VCF | 2.13 GB | 2017-04-17
Sequence variants | VCF | 7.71 GB | 2017-04-17
Sequence variants | VCF | 109.84 MB | 2017-04-17
Text | TEXT | 16.47 MB | 2017-04-17
Text | PDF | 160 KB | 2017-04-17
Text | PDF | 16 KB | 2017-04-17
Directory | UNKNOWN | 479.7 MB | 2017-07-05
Sequence assembly | FASTA | 845.18 MB | 2017-04-17
Sequence assembly | FASTA | 845.03 MB | 2017-04-17
Sequence assembly | FASTA | 838.26 MB | 2017-04-17

Showing files 1-10 of 103.
Funding body | Award ID
Shenzhen Municipal Government of China | CXB201108250094A

Protocols.io: 10.17504/protocols.io.grkbv4w; 10.17504/protocols.io.gr3bv8n; 10.17504/protocols.io.gr4bv8w

Date | Action
July 12, 2017 | Dataset published
July 17, 2017 | File WGS_of_Han_Chinese_genomes-master.zip updated
October 2, 2017 | Manuscript link added: 10.1093/gigascience/gix067
June 18, 2021 | Manuscript link updated: 10.1093/gigascience/gix067
June 18, 2021 | Manuscript link added: 10.1093/gigascience/giz001
May 23, 2023 | External link updated: https://www.protocols.io/widgets/doi?uri=dx.doi.org/10.17504/protocols.io.grkbv4w
May 23, 2023 | External link updated: https://www.protocols.io/widgets/doi?uri=dx.doi.org/10.17504/protocols.io.gr3bv8n
May 23, 2023 | External link updated: https://www.protocols.io/widgets/doi?uri=dx.doi.org/10.17504/protocols.io.gr4bv8w