Week 2: Searching for Data

Mar 27, 2020

Hello everyone and welcome back to my blog!

This week, I searched for the public summary data I need for the analysis part of my project. There are several data sets to extract, so it is important to understand what is needed and why. As explained in my introduction blog, the idea behind Mendelian randomization is that we can estimate the strength of a causal relationship between a risk exposure and an outcome while minimizing confounding effects by analyzing genetic variants. These variants are called single nucleotide polymorphisms (SNPs): positions in the genome where a single DNA base differs between people. If SNPs that are associated with the risk exposure are also associated with the outcome, then we can infer that the risk exposure affects the outcome, assuming certain conditions are met.

The goal of my project is to analyze the relationship between multiple risk exposures (e.g. sleep duration, chronotype) and the outcome (lung cancer). To do so, we need summary data on SNPs for each of the risk exposures and for the outcome. These data come from genome-wide association studies (GWAS) and give, for each SNP, the estimated strength of its association with the particular risk exposure (e.g. sleep duration) or the outcome (lung cancer), along with the uncertainty of that estimate.
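To make this concrete, here is a minimal sketch of how per-SNP summary data could be combined into a single causal estimate. It uses the Wald ratio for a single SNP and the standard inverse-variance weighted (IVW) formula for several SNPs, which is one common Mendelian randomization estimator and not necessarily the one I will end up using. The numbers and the function names `wald_ratio` and `ivw_estimate` are made up for illustration; they are not real GWAS results.

```python
import math

# Hypothetical per-SNP summary statistics (illustrative numbers only).
# Each tuple: (SNP-exposure effect, SNP-outcome effect, standard error of SNP-outcome effect)
snps = [
    (0.10, 0.020, 0.010),
    (0.20, 0.050, 0.020),
    (0.15, 0.030, 0.015),
]


def wald_ratio(beta_exposure, beta_outcome):
    """Causal estimate from one SNP: outcome effect divided by exposure effect."""
    return beta_outcome / beta_exposure


def ivw_estimate(snps):
    """Inverse-variance weighted estimate across SNPs.

    Equivalent to a weighted average of the per-SNP Wald ratios, where each
    SNP is weighted by beta_exposure**2 / se_outcome**2.
    """
    numerator = sum(bx * by / se**2 for bx, by, se in snps)
    denominator = sum(bx**2 / se**2 for bx, by, se in snps)
    estimate = numerator / denominator
    std_error = math.sqrt(1.0 / denominator)
    return estimate, std_error


estimate, std_error = ivw_estimate(snps)
print(f"IVW estimate: {estimate:.3f} (SE {std_error:.3f})")
```

This sketch assumes the SNPs are independent of each other and that the exposure and outcome effects are reported for the same allele, which real data would need to be checked for before any analysis.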

The three risk exposures I will investigate are chronotype (morning vs. night preference), sleep duration (hours of sleep), and insomnia (frequently, rarely, or never). For chronotype, I will use the GWAS conducted by Jones et al. (2019), which used data from the UK Biobank cohort and identified 351 chronotype-related SNPs. For sleep duration, I will use the GWAS conducted by Lane et al. (2019), which found 78 associated SNPs. For insomnia, I will use the GWAS conducted by Lane et al. (2019), which identified 45 insomnia-related SNPs.

Data on cancer are less accessible. Some, such as ovarian cancer data, are publicly available, while others are confidential. Unfortunately, lung cancer data cannot be downloaded right off the internet, so my mentor is currently reaching out to request access. By next week, I hope to have the data collected and to start importing them.

Thanks for reading! See you next week!

