Week 3: Bayesian Statistics and Calculations

Mar 27, 2021

Hello everyone, welcome back to another blog post!

This week, I continued working on the calculations for the bacterial species database, which is meant to help analyze factors that affect cell metabolism. Things have been going pretty smoothly, so I will be meeting with Mr. Swain again on Monday to discuss further tasks for my project. Here is a snippet of my completed database:

As for my progress in learning R this week, I completed the 4th data science course, Inference and Modeling. Topics included confidence intervals, association tests, election forecasting, and Bayesian statistics. Applying the Central Limit Theorem to real datasets, and seeing how expected values, standard errors, and p-values relate to one another, gave me a better understanding of the many ways a dataset can be summarized and tested.

There are two main approaches to statistical inference: frequentist and Bayesian. Frequentist statistics draws conclusions only from the frequency of outcomes in the observed data, without including any outside information. Bayesian statistics instead combines prior knowledge about the probabilities of events with the observed data to produce an updated (posterior) estimate. Because frequentist estimates can be unreliable when sample sizes are small or the data are extreme, the Bayesian approach is more adaptable: its hierarchical models specify both a prior distribution for the quantity being estimated and a sampling distribution for the observed data.

Polls also vary over time, since public opinion shifts during a campaign. To model this time effect, the following equation extends the poll model (spread $d$, general bias $b$, house effect $h_j$ of pollster $j$, and sampling error $\varepsilon_{i,j,t}$ of poll $i$) with a time-varying bias term, $b_t$, and a trend over time, $f(t)$:

  • $Y_{i,j,t} = d + b + h_j + b_t + f(t) + \varepsilon_{i,j,t}$
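To make the Bayesian idea above more concrete, here is a minimal sketch in Python (my choice of language; the course itself uses R) of the normal-normal hierarchical model often used for poll aggregation, ignoring the bias and time terms. It assumes the spread d has a prior N(mu, tau^2) and that the observed poll average Y is N(d, sigma^2). The posterior mean is then a weighted average, B*mu + (1 - B)*Y with B = sigma^2 / (sigma^2 + tau^2), so a noisy estimate gets pulled ("shrunk") toward the prior. The function name posterior_spread and all the numbers are hypothetical, for illustration only.

```python
import math


def posterior_spread(prior_mean, prior_sd, observed_avg, observed_se):
    """Posterior mean and SD of the spread d under a normal-normal model.

    Assumes d ~ N(prior_mean, prior_sd**2) and that the poll average
    Y given d is N(d, observed_se**2).
    """
    # B is the weight on the prior: a larger sampling error means more shrinkage.
    b = observed_se**2 / (observed_se**2 + prior_sd**2)
    post_mean = b * prior_mean + (1 - b) * observed_avg
    # Posterior variance is 1 / (1/sigma^2 + 1/tau^2), smaller than either one.
    post_sd = math.sqrt(1 / (1 / observed_se**2 + 1 / prior_sd**2))
    return post_mean, post_sd


# Hypothetical numbers: a prior centered at 0 and one poll average of 0.029.
mean, sd = posterior_spread(prior_mean=0.0, prior_sd=0.035,
                            observed_avg=0.029, observed_se=0.0066)
print(f"posterior spread: {mean:.4f} +/- {1.96 * sd:.4f}")
```

Because the prior here is wide compared with the sampling error, B is small and the posterior stays close to the observed average; a tighter prior or a noisier poll would pull it further toward 0.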

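The t-distribution matters when the standard deviation has to be estimated from a small number of observations, such as a handful of polls. For N observations, the t-distribution has N - 1 degrees of freedom, which determine how heavy its tails are: the fewer the degrees of freedom, the heavier the tails. In R, the function qt() returns quantiles of the t-distribution, so confidence intervals built with qt() instead of qnorm() are wider than those based on a normal distribution, reflecting the extra uncertainty in the estimated standard deviation.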

GeeksforGeeks – https://www.geeksforgeeks.org/students-t-distribution-in-statistics/
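R's qt() returns t quantiles directly. As a dependency-free sketch in Python (my choice of language, with a hypothetical sample of five poll spreads), the code below approximates qt(p, df) by numerically integrating the t density with Simpson's rule and inverting the CDF by bisection, then compares a normal-based 95% interval (the analogue of using qnorm()) with a t-based one. The names t_pdf, t_cdf, t_quantile, and confidence_intervals are illustrative, not from any library.

```python
import math
from statistics import NormalDist, mean, stdev


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    # lgamma avoids overflow of gamma() for large df.
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """CDF via Simpson's rule on [0, |x|], using the symmetry of the density."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df):
    """Approximate inverse CDF (like R's qt(p, df)); assumes 0 < p < 1."""
    if p < 0.5:
        return -t_quantile(1 - p, df)  # the t-distribution is symmetric
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # widen the bracket until it contains the quantile
        hi *= 2
    for _ in range(60):  # bisection
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def confidence_intervals(sample, level=0.95):
    """Return (normal-based, t-based) confidence intervals for the mean."""
    n = len(sample)
    xbar = mean(sample)
    se = stdev(sample) / math.sqrt(n)  # estimated standard error
    p = 1 - (1 - level) / 2
    z = NormalDist().inv_cdf(p)  # analogue of qnorm(p) in R
    t = t_quantile(p, n - 1)  # analogue of qt(p, n - 1) in R
    return (xbar - z * se, xbar + z * se), (xbar - t * se, xbar + t * se)


# Hypothetical spreads from five polls.
polls = [0.02, 0.035, 0.01, 0.04, 0.025]
normal_ci, t_ci = confidence_intervals(polls)
print("normal-based:", normal_ci)
print("t-based:     ", t_ci)
```

With only five observations (4 degrees of freedom), the t quantile is about 2.78 instead of about 1.96, so the t-based interval comes out noticeably wider. In practice, R's qt() or scipy.stats.t.ppf would replace the hand-rolled quantile.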

While learning about these distributions, I have also been practicing in RStudio with filtering and manipulating datasets to create detailed plots and spreads. As I progress further into the course, more advanced functions have helped me develop more accurate and useful visuals.

For next week, I will be starting a new task for my internship, which hopefully involves using the R analysis techniques that I've recently learned. I also plan on finishing the 5th data science course on Productivity Tools, which will entail integrating RStudio, Git, and Unix. Stay tuned and see you next week!




2 Replies to “Week 3: Bayesian Statistics and Calculations”

  1. Jiaming Z. says:

    Wow, we have so many similarities in our projects! I am also reviewing Bayesian statistics and its various probability formulas. R is definitely the best language to do such complicated things. Good luck on using and coding in R!

  2. Eric M. says:

    Love the visuals, but I can't understand them… In what regard are they more accurate and useful, and how do you interpret them?
