Duke Neurogenetics Study Protocol

Genetics

Genotyping

DNA was isolated from saliva collected with Oragene DNA self-collection kits (DNA Genotek) customized for 23andMe (www.23andme.com). DNA extraction and genotyping were performed through 23andMe by the National Genetics Institute (NGI), a CLIA-certified clinical laboratory and subsidiary of Laboratory Corporation of America. Genome-wide SNP data were generated on one of two Illumina arrays with custom content, the HumanOmniExpress or the HumanOmniExpress-24 (Eriksson et al., 2010; Do et al., 2011; Tung et al., 2011; Hu et al., 2016).

Multidimensional Scaling (MDS) components analysis

Because self-reported race and ethnicity do not always reflect genetic ancestry, an identity-by-state analysis of genome-wide SNPs was performed in PLINK (version 1.9; Purcell et al., 2007). The first four multidimensional scaling components were used as covariates to reduce possible confounding by race/ethnicity. Only the first four components were used because a scree plot of the eigenvalues showed that they became very similar to one another after the fourth component. Before the multidimensional scaling analysis was run, the following filters were applied: SNPs were pruned for high LD (r² > 0.1); C/G and A/T SNPs were removed; SNPs with a missing rate > .05 or a minor allele frequency < .01 were removed; SNPs that failed the Hardy-Weinberg equilibrium test (p < 1e-6) were removed; sex chromosomes were removed; and regions with long-range LD were removed (the MHC and 23 additional regions; Price et al., 2008). Further, one individual from each pair with a proportion of identity by descent (i.e., pi-hat) > .1875 was removed from the analysis.
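The marker-level and sample-level filters described above can be illustrated with a minimal Python sketch. This is not the study's code (the filtering was done in PLINK). The `Snp` record, its field names, and the helper functions are hypothetical, the Hardy-Weinberg p-value is assumed to be precomputed, and LD pruning and long-range LD region exclusion are left out. The protocol does not say which member of a related pair was dropped; the sketch arbitrarily drops the second-listed one.

```python
from dataclasses import dataclass

# Thresholds taken from the protocol text.
MAX_MISSING = 0.05   # SNP missing rate
MIN_MAF = 0.01       # minor allele frequency
MIN_HWE_P = 1e-6     # Hardy-Weinberg equilibrium p-value
MAX_PI_HAT = 0.1875  # proportion identity by descent

AUTOSOMES = {str(i) for i in range(1, 23)}
# Strand-ambiguous allele pairs (A/T and C/G).
AMBIGUOUS = {frozenset("AT"), frozenset("CG")}


@dataclass
class Snp:
    # Hypothetical record; field names are illustrative only.
    snp_id: str
    chrom: str
    alleles: tuple       # e.g. ("A", "G")
    missing_rate: float
    maf: float
    hwe_p: float         # assumed to be computed elsewhere


def passes_snp_qc(snp: Snp) -> bool:
    """Return True if the SNP survives the marker-level filters."""
    if snp.chrom not in AUTOSOMES:             # drop sex chromosomes
        return False
    if frozenset(snp.alleles) in AMBIGUOUS:    # drop A/T and C/G SNPs
        return False
    if snp.missing_rate > MAX_MISSING:
        return False
    if snp.maf < MIN_MAF:
        return False
    if snp.hwe_p < MIN_HWE_P:
        return False
    return True


def prune_related(samples, pairs):
    """Drop one member of each pair with pi-hat above MAX_PI_HAT.

    `pairs` is an iterable of (id_a, id_b, pi_hat) tuples. Dropping the
    second-listed member is an arbitrary choice made for this sketch.
    """
    removed = set()
    for id_a, id_b, pi_hat in pairs:
        if pi_hat > MAX_PI_HAT and id_a not in removed and id_b not in removed:
            removed.add(id_b)
    return [s for s in samples if s not in removed]
```

For example, `passes_snp_qc(Snp("rs1", "1", ("A", "T"), 0.0, 0.3, 0.5))` returns `False` because A/T SNPs are strand-ambiguous, and `prune_related(["a", "b", "c"], [("a", "b", 0.5)])` keeps `"a"` and `"c"`.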

Click here for MDS QC plots (components and scree plots).

Imputation

Genotype imputation was performed on all DNS participants with genome-wide chip data using the stepwise prephasing/imputation approach implemented in SHAPEIT/IMPUTE2 (Delaneau et al., 2012; Howie et al., 2011). Imputation was run separately for participants genotyped on the Illumina HumanOmniExpress (n = 458) and the Illumina HumanOmniExpress-24 (n = 729) arrays, using biallelic SNPs only, the default effective population size (20,000), and chunk sizes of 3 Mb and 5 Mb for the respective arrays. Within each array batch, genotyped SNPs used for imputation were required to have missingness < .02, Hardy-Weinberg equilibrium p > 1e-6, and minor allele frequency > .01. The imputation reference set consisted of the phased haplotypes of the 2,504 individuals in the full 1000 Genomes Project Phase 3 data set (May 2013, >70 million variants, release "v5a", build GRCh37). Imputed SNPs were retained if they had high imputation quality (Info > .9), low missingness (< 5%), and minor allele frequency > .01.
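As a rough illustration of the chunking and post-imputation filtering described above, the following Python sketch splits a chromosome into fixed-size windows and applies the three retention thresholds. The function names and the `CHUNK_MB` mapping are hypothetical. The protocol does not specify how chunk boundaries were placed, so the simple consecutive windows here are an assumption.

```python
# Chunk size in megabases per array, as stated in the protocol.
CHUNK_MB = {"HumanOmniExpress": 3, "HumanOmniExpress-24": 5}

# Retention thresholds for imputed SNPs, as stated in the protocol.
MIN_INFO = 0.9
MAX_MISSING = 0.05
MIN_MAF = 0.01


def chunk_intervals(chrom_length_bp, chunk_mb):
    """Split positions 1..chrom_length_bp into consecutive windows.

    Assumption: windows are contiguous and non-overlapping, and the last
    window is simply truncated at the chromosome end.
    """
    size = int(chunk_mb * 1_000_000)
    return [
        (start, min(start + size - 1, chrom_length_bp))
        for start in range(1, chrom_length_bp + 1, size)
    ]


def keep_imputed(info, missing_rate, maf):
    """Return True if an imputed SNP meets all three retention criteria."""
    return info > MIN_INFO and missing_rate < MAX_MISSING and maf > MIN_MAF
```

With these definitions, a hypothetical 7 Mb region on the HumanOmniExpress array yields three windows from `chunk_intervals(7_000_000, CHUNK_MB["HumanOmniExpress"])`, the last one truncated to 1 Mb. An imputed SNP with Info exactly .9 is dropped because the threshold is strict.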

References

Delaneau O, Marchini J, Zagury JF. A linear complexity phasing method for thousands of genomes. Nat Methods. 2012;9(2):179–181. doi:10.1038/nmeth.1785

Howie B, Marchini J, Stephens M. Genotype imputation with thousands of genomes. G3 (Bethesda). 2011;1(6):457–470. doi:10.1534/g3.111.001198

Price AL, Weale ME, Patterson N, et al. Long-range LD can confound genome scans in admixed populations. Am J Hum Genet. 2008;83(1):132–139. doi:10.1016/j.ajhg.2008.06.005

Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81:559–575.
