Linear mixed models for estimating heritability and testing genetic association in family data


Coronary heart disease (CHD) is one of the leading causes of death worldwide. This thesis presents linear mixed models (LMMs) and applies them to family data from the European Multicenter Study on Familial Dyslipidemias in Patients with Premature Coronary Heart Disease (EUFAM) project. The data consist of nearly 1600 individuals from around 150 families and contain 23 quantitative traits related to the risk of CHD and roughly 28 million genetic variants. Linear mixed models are used when the data contain clustering or repeated measurements, in other words, when the observations are dependent. In the EUFAM data the observations come from families, so the linear mixed models take the relatedness of the individuals into account. In this thesis, linear mixed models are applied to both heritability estimation and genome-wide association testing. The need for LMMs is evident both in simulations and in the analyses of the EUFAM data. When the trait is heritable, the LMM has more statistical power than the standard linear model, and the standard linear model also has an inflated type I error rate. Both effects arise because the standard linear model does not take the relatedness of the individuals into account. For example, in the genome-wide analysis of the EUFAM data the standard linear model yields a large number of false positives compared with the linear mixed model. The thesis demonstrates the usefulness of, and the need for, linear mixed models when analyzing family data.
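
As a rough illustration of why ignoring relatedness inflates the type I error rate, the following Python sketch simulates a trait with a shared family effect and no true variant effect, then tests it with an ordinary linear regression. It is not the simulation design used in the thesis: the function name simulate_type1_rate, the number and size of families, the allele frequency of 0.3, and the simplification that all members of a family carry the same genotype are assumptions made for illustration only.

```python
import numpy as np


def simulate_type1_rate(n_families=100, family_size=10, n_reps=500, seed=0):
    """Fraction of null simulations in which an ordinary linear regression
    rejects at the 5% level (normal approximation, |t| > 1.96)."""
    rng = np.random.default_rng(seed)
    n = n_families * family_size
    # Family index of each individual, e.g. [0, 0, ..., 1, 1, ...].
    family = np.repeat(np.arange(n_families), family_size)
    rejections = 0
    for _ in range(n_reps):
        # Toy assumption: every member of a family carries the same genotype.
        g = rng.binomial(2, 0.3, size=n_families)[family].astype(float)
        # The trait has a shared family effect but NO effect of the variant.
        y = rng.normal(size=n_families)[family] + rng.normal(size=n)
        # Simple linear regression of y on g, ignoring family structure.
        gc = g - g.mean()
        yc = y - y.mean()
        sxx = gc @ gc
        beta = (gc @ yc) / sxx
        resid = yc - beta * gc
        se = np.sqrt((resid @ resid) / (n - 2) / sxx)
        rejections += abs(beta / se) > 1.96
    return rejections / n_reps


rate = simulate_type1_rate()
print(f"empirical type I error of the standard linear model: {rate:.3f}")
```

Under these assumptions the empirical rejection rate can be expected to lie well above the nominal 0.05, because observations within a family are treated as independent although they are not. A linear mixed model addresses this by modeling the covariance between relatives.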

University of Helsinki, Faculty of Science, Department of Mathematics and Statistics