48-replicate RNA-seq experiment
Other members of the team and I have talked about this work at meetings over the last 18 months. Today, the first of three (hopefully four) papers about an RNA-seq experiment with 48 biological replicates was submitted to a journal and posted on arXiv (http://arxiv.org/abs/1505.00588). The work comes from my group (www.compbio.dundee.ac.uk), the Data Analysis Group (www.compbio.dundee.ac.uk/dag.html), and our collaborators Tom Owen-Hughes (http://bit.ly/1PkCBjH), Gordon Simpson (http://bit.ly/1JobrGZ) and Mark Blaxter (http://bit.ly/1GXtC8M). The data generated for this experiment has also been submitted to ENA and should be released in the next few hours.
Clearly, referees will have things to say about our manuscript, but I thought it was worth writing a brief summary here of why we did this work, and providing somewhere for open discussion.
Paper I: The paper submitted today deals with the statistical models used in Differential Gene Expression (DGE) software, such as edgeR and DESeq, and with the effect of “bad” replicates on these models.
Paper II: This will be on arXiv in the next day or so. It benchmarks the most popular DGE methods with respect to replicate number and leads to a set of recommendations for experimental design.
Paper III: This is in preparation. It examines the benefits of ERCC RNA spike-ins for detecting concerted shifts in expression in RNA-seq experiments, and for estimating the precision of RNA-seq experiments. An R package will accompany this paper.
When we started this work two years ago, the main questions we aimed to answer were:
- How many replicates should we do?
- Which of the growing number of statistical analysis methods should we use?
- Are the assumptions made by any of these methods correct?
- How useful are spike-ins to normalise for concerted shifts in expression?
The aim of the experimental design was to control for as many variables as possible (batch and lane effects, and so on), so that we were really looking at differences between DGE methods and not being confused by variation introduced elsewhere in the experiment. This careful design was the result of close collaboration between us (a dry-lab computational biology group), Tom Owen-Hughes’ yeast lab at Dundee, and Mark Blaxter’s sequencing centre at Edinburgh.
This is probably the most highly replicated RNA-seq experiment to date, and one of the most deeply sequenced. I hope that, thanks to the careful design, the data will be useful not only for our own analysis but also to others interested in developing RNA-seq DGE methods, and to the wider yeast community.