Supporting data for "PhenoSpD: an integrated toolkit for phenotypic correlation estimation and multiple testing correction using GWAS summary statistics"

Dataset type: Software
Data released on July 13, 2018

Zheng J; Richardson TG; C Millard LA; Hemani G; Elsworth BL; Raistrick CA; Vilhjalmsson B; Neale BM; Haycock PC; Smith GD; Gaunt TR (2018): Supporting data for "PhenoSpD: an integrated toolkit for phenotypic correlation estimation and multiple testing correction using GWAS summary statistics". GigaScience Database. http://dx.doi.org/10.5524/100474

DOI: 10.5524/100474

Identifying phenotypic correlations between complex traits and diseases can provide useful etiological insights. Restricted access to much individual-level phenotype data makes it difficult to estimate large-scale phenotypic correlation across the human phenome. Two state-of-the-art methods, metaCCA and LD score regression, provide an alternative approach for estimating phenotypic correlation using only genome-wide association study (GWAS) summary results.
Here, we present an integrated R toolkit, PhenoSpD, to 1) use LD score regression to estimate phenotypic correlations from GWAS summary statistics; and 2) use the estimated phenotypic correlations to inform multiple testing correction for complex human traits via the spectral decomposition of matrices (SpD). The simulations suggest that 1) it is possible to identify non-independence of phenotypes using samples with partial overlap; as overlap decreases, the estimated phenotypic correlations attenuate towards zero and multiple testing correction becomes more stringent than in perfectly overlapping samples; and 2) in contrast to LD score regression, metaCCA provides approximate genetic correlations rather than phenotypic correlations, which limits its application for multiple testing correction. In a case study, PhenoSpD using UK Biobank GWAS results suggested 399.6 independent tests among 487 human traits, which is close to the 352.4 independent tests estimated using true phenotypic correlation. We further applied PhenoSpD to an estimated 5618 pairwise phenotypic correlations among 107 metabolites using GWAS summary statistics from Kettunen et al.; PhenoSpD suggested the equivalent of 33.5 independent tests for these metabolites.
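The link between sample overlap, attenuated correlations, and a more stringent correction can be illustrated with a minimal sketch. It is not PhenoSpD's code, and it is written in Python rather than R purely for illustration. It assumes two things not stated in this record: that the expected cross-trait LD score regression intercept is rho * N_overlap / sqrt(N1 * N2) (the standard expectation from Bulik-Sullivan et al. 2015), and that the number of independent tests is taken from Nyholt's (2004) variance-of-eigenvalues estimator, one common SpD variant. All function names and the four-trait toy matrix are hypothetical.

```python
from math import sqrt


def attenuated_correlation(rho, n_overlap, n1, n2):
    """Expected LD score regression intercept for true phenotypic
    correlation rho when only n_overlap samples are shared (assumed form)."""
    return rho * n_overlap / sqrt(n1 * n2)


def effective_tests_nyholt(corr):
    """Nyholt (2004): Meff = 1 + (M - 1) * (1 - Var(eigenvalues) / M).

    For a correlation matrix the eigenvalues sum to M, and their squares
    sum to the sum of squared matrix entries, so the sample variance of
    the eigenvalues is available without an eigen solver.
    """
    m = len(corr)
    if m < 2:
        return float(m)
    sum_sq = sum(r * r for row in corr for r in row)  # sum of eigenvalue^2
    var_eig = (sum_sq - m) / (m - 1)  # eigenvalues have mean 1
    return 1 + (m - 1) * (1 - var_eig / m)


def equicorrelated(m, r):
    """Hypothetical toy matrix: m traits, every pair correlated at r."""
    return [[1.0 if i == j else r for j in range(m)] for i in range(m)]


# Four traits with true pairwise correlation 0.8.
full = effective_tests_nyholt(equicorrelated(4, 0.8))

# Same traits, but only half of the 10,000 samples per GWAS overlap.
r_half = attenuated_correlation(0.8, 5000, 10000, 10000)
half = effective_tests_nyholt(equicorrelated(4, r_half))

print(f"full overlap: Meff = {full:.2f}, threshold = {0.05 / full:.4f}")
print(f"half overlap: Meff = {half:.2f}, threshold = {0.05 / half:.4f}")
```

In this toy setting the estimated correlation drops from 0.8 to 0.4 at half overlap, the estimated number of independent tests rises from about 2.08 to about 3.52, and the Bonferroni-style threshold 0.05 / Meff tightens accordingly. This is the conservative behaviour described above.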
PhenoSpD extends the use of summary level results, providing a simple and conservative way to reduce dimensionality for complex human traits using GWAS summary statistics. This is particularly valuable in the age of large-scale biobank and consortia studies, where GWAS results are much more accessible than individual-level data.
R code and documentation for PhenoSpD V1.0.0 are available online at https://github.com/MRCIEU/PhenoSpD.





File Name | Sample ID | Data Type | File Format | Size | Release Date
 | | GitHub archive | archive | 96.62 KB | 2018-06-20
 | | Readme | TEXT | 3.07 KB | 2018-06-20
Funding body | Awardee | Award ID | Comments
Medical Research Council | NAMED PI? | MC_UU_12013/4 |
Medical Research Council | NAMED PI? | MC_UU_12013/8 |
Cancer Research UK | NAMED PI? | C18281/A19169 | the Integrative Cancer Epidemiology Programme
Cancer Research UK | PC Haycock | C52724/A20138 | Population Research Fellow
UKRI Innovation | TG Richardson | MR/S003886/1 | Research Fellow
Date | Action
July 13, 2018 | Dataset publish
August 28, 2018 | Manuscript Link added: 10.1093/gigascience/giy090
June 14, 2021 | External Link updated: http://mips.helmholtz-muenchen.de/proj/GWAS/gwas/gwas_server/shin_et_al.metal.out.tar.gz
June 14, 2021 | External Link added: http://www.computationalmedicine.fi/data#NMR_GWAS
November 11, 2022 | Manuscript Link updated: 10.1093/gigascience/giy090