Week 6: Analysis

Apr 24, 2020

Hi everyone,

This week, I completed most of my analysis. My goal was to determine whether any of the sleep traits (chronotype, insomnia, sleep duration) has a significant causal effect on lung cancer development. This was done in R using “TwoSampleMR,” a package for two-sample Mendelian randomization (MR). Because the procedure is fairly similar for each of the three sleep traits, I will only discuss my insomnia analysis in this post. If you want to see the others, just leave a comment!

Before starting, there are two key files to have: a cleaned-up insomnia (risk exposure) genome-wide association study (GWAS) and a cleaned-up lung cancer (outcome) GWAS. These provide summary data about each relevant single nucleotide polymorphism (SNP), such as the effect size, effect allele, reference allele, and standard error.

In these first couple of lines, I import the package and locate the files. One complication was that the lung cancer file did not give the effect size of each SNP; it only provided the odds ratio. Because the outcome is binary (lung cancer or no lung cancer), the effect size is on the log-odds scale, so we can take the natural log of the odds ratio to obtain it.

So in line 8, I create a new data frame called “tmp” with a new column called “beta,” which is log(odds ratio). Then I save this to a new file, which I will read from later.
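For readers who want to see the odds ratio conversion spelled out, here is a minimal Python sketch of the same idea. It is not the R code from this analysis: the column names ("SNP", "OR", "beta"), the helper functions, and the example rows are all made up for illustration.

```python
import csv
import io
import math


def add_beta_column(rows, or_col="OR", beta_col="beta"):
    """Return copies of the rows with beta = ln(odds ratio) added.

    The column names are assumptions; a real GWAS file may label them differently.
    """
    out = []
    for row in rows:
        new_row = dict(row)
        new_row[beta_col] = math.log(float(row[or_col]))
        out.append(new_row)
    return out


def to_csv_text(rows):
    """Serialize rows to CSV text (a stand-in for saving to a new file)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Made-up example rows, not values from the real lung cancer GWAS.
example_rows = [
    {"SNP": "rs_example_1", "OR": "1.20"},
    {"SNP": "rs_example_2", "OR": "0.85"},
    {"SNP": "rs_example_3", "OR": "1.00"},
]

tmp = add_beta_column(example_rows)
csv_text = to_csv_text(tmp)
```

An odds ratio above 1 gives a positive beta, an odds ratio below 1 gives a negative beta, and an odds ratio of exactly 1 gives a beta of 0.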

In this chunk, I read the exposure data and conduct MR. The loop is necessary because the exposure file contains four studies, each giving a different effect size, standard error, p-value, etc., so MR was performed for each of the four studies. Below is one of its outputs:

We are interested in the circled regions. There are 5 methods of analysis, but we will use inverse variance weighted (3). Because the p-value is below 0.05, we can conclude that the effect of insomnia on lung cancer development is statistically significant at the 0.05 level. The other three studies had p-values ranging from 0.03 to 0.08, with only one of them above 0.05. The other risk exposures (sleep duration and chronotype) had very large p-values, which suggests that their estimated effects could very well be due to random chance.
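To give a feel for what the inverse variance weighted method computes, here is a minimal Python sketch of the standard fixed-effect formula. It is not the TwoSampleMR code used in this analysis, and the package's own implementation may treat the standard error differently. The function name and the example numbers are made up for illustration.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse variance weighted (IVW) estimate.

    Each SNP contributes a ratio beta_outcome / beta_exposure, weighted by
    beta_exposure^2 / se_outcome^2. Returns (estimate, std_error, p_value).
    """
    numer = 0.0
    denom = 0.0
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        numer += bx * by / (se * se)
        denom += bx * bx / (se * se)
    estimate = numer / denom
    std_error = math.sqrt(1.0 / denom)
    z = estimate / std_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return estimate, std_error, p_value


# Made-up SNP summary statistics, not values from the insomnia or lung cancer GWAS.
bx = [0.10, 0.20, 0.15]     # SNP effects on the exposure
by = [0.05, 0.10, 0.075]    # SNP effects on the outcome (log odds ratios)
se = [0.02, 0.03, 0.025]    # standard errors of the outcome effects

estimate, std_error, p_value = ivw_estimate(bx, by, se)
```

In this toy input every outcome effect is exactly half the exposure effect, so the estimate comes out at about 0.5. With real data the per-SNP ratios disagree, and the weights decide how much each SNP counts.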

Next week, I will begin working on plots to help illustrate these effects. Thanks for reading! 

3 Replies to “Week 6: Analysis”

  1. Ethan H. says:

    Very cool! It’s pretty awesome to see you applying knowledge from AP Statistics to solve real-world problems.
    Can’t wait to see the plots!

  2. Lieselotte D. says:

    It’s cool to see your process for analyzing the data. Is that process easy to adapt to other data sources?

    1. Alan Y. says:

      Absolutely! The process is quite similar if you want to investigate other exposures/outcomes.
