Supporting data for "Population-wide Sampling of Retrotransposon Insertion Polymorphisms Using Deep Sequencing and Efficient Detection"

Dataset type: Genomic, Software
Data released on July 13, 2017

Yu Q; Zhang W; Zeng Y; Zhang X; Wang Y; Wang Y; Xu L; Huang X; Li N; Zhou X; Lu J; Guo X; Li G; Hou Y; Liu S; Li B (2017): Supporting data for "Population-wide Sampling of Retrotransposon Insertion Polymorphisms Using Deep Sequencing and Efficient Detection" GigaScience Database. http://dx.doi.org/10.5524/100318

DOI: 10.5524/100318

Active retrotransposons have played important roles during evolution and continue to shape our genomes today, notably by generating genetic polymorphisms that underlie a diverse set of diseases. However, population-level studies of human retrotransposon insertion polymorphisms (RIPs) based on deep whole-genome sequencing remain scarce, despite the clear need for a thorough characterization of RIPs in the general population.
Herein, we present a novel and efficient computational tool named Specific Insertions Detector (SID) for detecting non-reference RIPs. Using simulated and real datasets, we demonstrate that SID is suitable for high-depth whole-genome sequencing (WGS) data consisting of paired-end reads. We construct a comprehensive RIP database from a large population of 90 Han Chinese individuals sequenced to a mean depth of 68× per individual. In total, we identify 9342 recent RIPs, of which 8433 are novel relative to dbRIP, comprising 5826 Alu, 2169 long interspersed nuclear element 1 (L1), 383 SVA, and 55 long terminal repeat (LTR) insertions. Among the 9342 RIPs, 4828 are located in gene regions and five in protein-coding regions. We show that RIPs can, in principle, serve as an informative resource for population evolution and phylogenetic analyses. Taking demographic effects into account, the frequency spectrum of RIPs indicates weak negative selection on SVA and L1 elements but approximately neutral evolution of Alu elements.
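The selection result above rests on the frequency spectrum of RIPs. The minimal sketch below shows how an unfolded spectrum could be tabulated. It assumes that each RIP is genotyped as 0, 1 or 2 copies of the insertion per diploid individual, and that the insertion allele is the derived one. The function names and toy data are hypothetical illustrations. They are not code from SID or from the study's analysis, which additionally models demographic history.

```python
from collections import Counter
from typing import Sequence


def site_frequency_spectrum(genotypes: Sequence[Sequence[int]]) -> list[int]:
    """Unfolded site frequency spectrum for a set of insertion loci.

    Hypothetical helper (assumption): genotypes[i][j] is the number of
    insertion alleles (0, 1 or 2) carried by diploid individual j at locus i.
    Entry k of the result counts loci where the insertion is present on
    exactly k of the 2n chromosomes, for k = 0..2n.
    """
    if not genotypes:
        return []
    n_individuals = len(genotypes[0])
    counts = Counter()
    for locus in genotypes:
        if len(locus) != n_individuals:
            raise ValueError("every locus needs one genotype per individual")
        if any(g not in (0, 1, 2) for g in locus):
            raise ValueError("genotypes must be 0, 1 or 2")
        counts[sum(locus)] += 1  # derived-allele count at this locus
    return [counts[k] for k in range(2 * n_individuals + 1)]


def neutral_expected_proportions(n_chrom: int) -> list[float]:
    """Expected share of polymorphic sites in classes k = 1..n_chrom-1 under
    the standard neutral model with constant size (proportional to 1/k)."""
    if n_chrom < 2:
        raise ValueError("need at least two chromosomes")
    weights = [1.0 / k for k in range(1, n_chrom)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    toy = [
        [1, 0, 0],  # insertion on 1 of 6 chromosomes
        [2, 1, 0],  # 3 of 6
        [1, 1, 0],  # 2 of 6
        [0, 1, 0],  # 1 of 6
    ]
    print(site_frequency_spectrum(toy))  # [0, 2, 1, 1, 0, 0, 0]
```

Under the standard neutral model, the expected number of sites in class k is proportional to 1/k. An excess of rare insertions relative to that baseline is consistent with purifying selection. However, population growth can produce a similar skew, so demographic effects must be accounted for before interpreting the observed spectrum.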
SID is a powerful open-source program for detecting non-reference RIPs. We built a non-reference RIP dataset that greatly enhances the diversity of RIPs detected in the general population and should be invaluable to researchers interested in many aspects of human evolution, genetics, and disease. As a proof of concept, we demonstrate that RIPs can be used as biomarkers in the same way as single nucleotide polymorphisms (SNPs).
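Using RIPs in the same way as SNPs amounts to coding each insertion as a biallelic dosage. The minimal sketch below assumes genotypes are stored per individual as 0/1/2 insertion counts over the same ordered set of loci. It computes a pairwise allele-sharing (identity-by-state) distance that could then feed a tree-building method such as neighbor joining. The function names are hypothetical. This is one standard distance choice, not necessarily the one used in the study.

```python
from typing import Sequence


def ibs_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Mean allele-sharing distance between two individuals.

    Hypothetical helper (assumption): a[i] and b[i] are insertion dosages
    (0, 1 or 2) at locus i. Each locus contributes |a[i] - b[i]| / 2, so
    identical genotypes score 0 and opposite homozygotes score 1.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty genotype vectors")
    return sum(abs(x - y) for x, y in zip(a, b)) / (2 * len(a))


def distance_matrix(individuals: Sequence[Sequence[int]]) -> list[list[float]]:
    """Symmetric pairwise IBS distance matrix; rows are individuals."""
    n = len(individuals)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = ibs_distance(individuals[i], individuals[j])
    return d


if __name__ == "__main__":
    people = [
        [0, 1, 2, 0],
        [0, 1, 2, 0],  # identical to the first individual
        [2, 1, 0, 0],
    ]
    print(distance_matrix(people))
    # [[0.0, 0.0, 0.5], [0.0, 0.0, 0.5], [0.5, 0.5, 0.0]]
```

Presence/absence insertions are biallelic, so the same 0/1/2 coding lets RIP genotypes sit alongside SNP genotypes in tools that expect allele dosages.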

Additional details

Read the peer-reviewed publication(s):

(PubMed: 28938719)
(PubMed: 30753694)

Related datasets:

doi:10.5524/100318 Cites doi:10.5524/100096

Additional information:

https://github.com/Jonathanyu2014/SID

Accessions (data referenced by this study):

BioProject: PRJEB11005





Sample ID | Taxonomic ID | Common Name | GenBank Name | Scientific Name | Sample Attributes
HG00418 | 9606 | Human | human | Homo sapiens | Isolation source: peripheral vein; Cell type: B-Lymphocyte; Tissue: Blood; ...
HG00421 | 9606 | Human | human | Homo sapiens | Isolation source: peripheral vein; Cell type: B-Lymphocyte; Tissue: Blood; ...
HG00422 | 9606 | Human | human | Homo sapiens | Isolation source: peripheral vein; Cell type: B-Lymphocyte; Tissue: Blood; ...
HG00427 | 9606 | Human | human | Homo sapiens | Isolation source: peripheral vein; Cell type: B-Lymphocyte; Tissue: Blood; ...
HG00428 | 9606 | Human | human | Homo sapiens | Isolation source: peripheral vein; Cell type: B-Lymphocyte; Tissue: Blood; ...
HG00436 | 9606 | Human | human | Homo sapiens | Isolation source: peripheral vein; Cell type: B-Lymphocyte; Tissue: Blood; ...
HG00437 | 9606 | Human | human | Homo sapiens | Isolation source: peripheral vein; Cell type: B-Lymphocyte; Tissue: Blood; ...
HG00442 | 9606 | Human | human | Homo sapiens | Isolation source: peripheral vein; Cell type: B-Lymphocyte; Tissue: Blood; ...
HG00443 | 9606 | Human | human | Homo sapiens | Isolation source: peripheral vein; Cell type: B-Lymphocyte; Tissue: Blood; ...
HG00448 | 9606 | Human | human | Homo sapiens | Isolation source: peripheral vein; Cell type: B-Lymphocyte; Tissue: Blood; ...
Showing samples 1-10 of 95.




File Name | Sample ID | Data Type | File Format | Size | Release Date
Simulated_data | - | Directory | UNKNOWN | 122 GB | 2017-06-27
- | - | Text | TAR | 116.65 MB | 2017-06-27
- | - | Mixed archive | zip | 17.73 MB | 2017-07-17
- | - | Image | TAR | 208.7 KB | 2017-06-27
- | - | Image | TAR | 1.58 MB | 2017-06-27
- | - | Readme | TEXT | 0.5 KB | 2017-06-27
- | - | Text | TAR | 15.65 MB | 2017-06-27
- | - | GitHub archive | archive | 374.52 KB | 2017-06-27
- | - | Phylogenetic tree | UNKNOWN | 40.32 KB | 2017-06-27
- | - | Text | UNKNOWN | 3.86 KB | 2017-06-27
Showing files 1-10 of 14.
Funding body | Awardee | Award ID | Comments
Shenzhen Municipal Government of China | Yong Hou | JSGG20140702161347218 |
Shenzhen Municipal Government of China | Guanglei Li | KQCX20150330171652450 |

Protocols.io:

Date | Action
July 13, 2017 | Dataset published
July 17, 2017 | Additional file example.zip added
November 13, 2017 | Manuscript link added: 10.1093/gigascience/gix066
November 29, 2018 | File example.zip updated
June 18, 2021 | Manuscript link updated: 10.1093/gigascience/gix066
June 18, 2021 | Manuscript link added: 10.1093/gigascience/giz008
May 23, 2023 | External link updated: https://www.protocols.io/widgets/doi?uri=dx.doi.org/10.17504/protocols.io.imrcc56
May 23, 2023 | External link updated: https://www.protocols.io/widgets/doi?uri=dx.doi.org/10.17504/protocols.io.grkbv4w