I recently happened across this paper, which I found quite astonishing. Firstly, it’s written by the 23andMe research team. If there’s still anyone in need of convincing that important and exciting research happens in industry, they should read this paper, because it is high quality and the data they have...
Linear Regression
Developing intuition for regression coefficients
Linear regression is one of those things that is easy to use in practice but difficult to develop a good intuition for. At least, I struggle to get a good sense of what the regression coefficients are going to look like in all but the most trivial cases.
Reproducible analysis for the very lazy
These days, pretty much everyone acknowledges that scientific analysis should be reproducible. Of course, this is still very often not the case. For the most part, I don’t think that people set out to write code and structure their data so that no one else can reproduce their results. But...
Batch effects in single cell RNA sequencing
Part 2 - Handling batch effects without batch correction
In a previous post, I wrote about the perils of over-diagnosing batch effects. In this post I’m going to do something even more controversial and offer advice on how to deal with batch effects without using a black-box “batch correction” tool. Why would you want to do this? Are...
Batch effects in single cell RNA sequencing
Part 1 - Diagnosing batch effects from UMAP
With the ever-increasing number of single cell transcriptomics data sets available, people want to do combined analyses more and more frequently. Of course, the first thing that happens when people do this is that data from different samples, labs, and experiments don’t “mix well”. The identification of poor...