Welcome to the MapMan Family of Software Forum

Please do not hesitate to register and post your question.

Don't forget to subscribe to your posted message so you get notified on updates.
Every question you post will help others and/or enhance the software!

Post a question, post a bug!


Using MapMan

Grouping biological repeats for DEseq using the RobiNA GUI


Richard John Barker

Rank: Youngling

Posts: 7

Join Date: 1/29/13

Grouping biological repeats for DEseq using the RobiNA GUI

edger deseq normalisation

1/29/13 1:59 AM

I have RNAseq data as counts per gene/isoform/cds/tss and would like to use the RobiNA GUI to normalise these files using DESeq (and edgeR). I want to compare my different time points post treatment back to the original control samples, which can be done easily using Ctrl and click. Unfortunately, I'm not sure how to define which files are biological repeats and should be grouped together in order to improve the statistics. Thanks in advance for any advice... Best wishes, Richard

Marc Lohse

Rank: Jedi Master

Posts: 271

Join Date: 6/30/09

RE: Grouping biological repeats for DEseq using the RobiNA GUI

1/29/13 9:48 AM as a reply to Richard John Barker.

Hi Richard,

biological replicates are defined in the library configuration step as follows:
First, you have to define the different treatments you have performed in
your experiment. For example, if you have a cold shock treatment and a
control treatment, both performed on WT and a mutant organism, you would
enter something like WT_control, WT_cold, MUT_control and MUT_cold.

Then you can move on to defining which input fastq file contains reads from which
sample. Let's assume you have done 3 biological replicates for each treatment. You
would have 12 samples: 3 WT_control, 3 WT_cold etc.

To properly enter this design, you would choose the treatment from the drop-down
list below the list of input files, then you would select one file that contains the
reads obtained from one sample, e.g. WT_control replicate 1. In case the data
for one sample has been split into several files (e.g. when the same library was
sequenced several times to obtain a sufficient amount of reads) you would CTRL-click
to select all of these files.

Then you add the sample to the sample list by clicking the "add" button. A blue box
with all the files representing one sample will appear in the list below the treatment
drop-down list.

Repeat this until all files have been assigned to samples of treatments and you're
ready to move on to the next step.

In the end you should see 12 boxes - one for each sample of the 4 conditions of your
experiment.

I hope this answers your question - if not please do not hesitate to ask again!

cheers,
Marc

Richard John Barker

Rank: Youngling

Posts: 7

Join Date: 1/29/13

RE: Grouping biological repeats for DEseq using the RobiNA GUI

1/29/13 6:29 PM as a reply to Marc Lohse.

Hi Marc

Thanks for the swift reply.

Is there any way to assign biological reps when the data is input as a tab-delimited text file? I have already done the fastq alignment and generated read counts using other commercial and academic software packages (my virtual machine containing RobiNA didn't have enough RAM, so it failed to complete). So I'm currently just trying to use the RobiNA GUI to normalise the RNAseq counts I already have... So is there any way to assign biological reps at the normalisation stage using tab-delimited text files?

The first row in my Tab delimited text file looks like this....
ID 0GSRep1 0GSRep2 0GSRep3 5GSRep1 5GSRep2 5GSRep3 20GSRep1 20GSRep2 20GS Rep3 150GSRep1 0MSRep1 0MSRep2 0MSRep3 5MSRep1 5MSRep2 5MSRep3 20MSRep1 20MSRep2 20MSRep3 150MSRep1

In addition, i also have proteomic data that i would like to normalise, do you know if this will work?

Thanks again, Richard

Marc Lohse

Rank: Jedi Master

Posts: 271

Join Date: 6/30/09

RE: Grouping biological repeats for DEseq using the RobiNA GUI

1/30/13 10:57 AM as a reply to Richard John Barker.

Hi Richard,

when importing a counts table, the column names in the header have to adhere to
a format that tells RobiNA which column contains the data of which replicate sample
and condition.

This sounds more complicated than it is - in your case, i guess, you'd just have to
(make a copy of your original file and then) rename the column headers to something like:

0GS_1 0GS_2 0GS_3 5GS_1 5GS_2 5GS_3 20GS_1 20GS_2 20GS_3 150GS_1 0MS_1 0MS_2 0MS_3 5MS_1 5MS_2 5MS_3 20MS_1 20MS_2 20MS_3 150MS_1

... the general column header format for the counts table to be imported properly would be
CONDITION_REPLICATE
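
As a rough illustration of the renaming described above, here is a minimal Python sketch. It is not part of RobiNA; it assumes the original headers follow the pattern shown in the question (condition name, then "Rep", then a number), and the function names are hypothetical.

```python
import re

# Assumed input pattern: <condition>Rep<number>, e.g. "0GSRep1".
HEADER_PATTERN = re.compile(r"^(?P<condition>.+?)Rep(?P<replicate>\d+)$")


def to_robina_header(name):
    """Convert e.g. '0GSRep1' to '0GS_1'; leave unrecognised names unchanged."""
    # Remove stray whitespace inside the name (e.g. "20GS Rep3") first.
    compact = "".join(name.split())
    match = HEADER_PATTERN.match(compact)
    if match is None:
        return name
    return f"{match.group('condition')}_{match.group('replicate')}"


def rename_header_line(line, sep="\t"):
    """Rename every column after the first (ID) column of a header line."""
    fields = line.rstrip("\n").split(sep)
    return sep.join([fields[0]] + [to_robina_header(f) for f in fields[1:]])
```

Applying `rename_header_line` to the first line of a copy of the counts table, and writing the remaining lines back unchanged, would give headers in the CONDITION_REPLICATE form.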

I want to make one comment, though. I am not exactly sure what you mean by
normalizing the data. Using RobiNA, you can normalize the data to be expressed
as RPKM values (quite similar to the FPKM values computed by cufflinks). For this
to work properly, however, RobiNA would need the lengths of each gene - and these
are not provided when importing a counts table. RobiNA also offers the possibility
to do GC content bias normalization of the data - again, this only works if the
GC contents of each gene are known. The GC contents are computed from the
reference transcriptome or genome... when importing a counts table, the
reference mapping step is skipped because the reads have already been mapped
to a reference. So the GC content information is not available and hence this
correction step will not be available.

In any case - the RPKM values are not used for the assessment of differential gene
expression. They are just generated as auxiliary information.

You state that you would like to use DESeq or edgeR - which implies that you would
like to infer differentially expressed genes. Neither of these methods performs
(or requires) an RPKM normalization step. Even if there is a gene-specific effect, this
effect will be the same across all samples (after correcting for differences in library size)
for this gene and hence becomes irrelevant when the task is to assess differential gene expression.
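
The argument above can be checked with a small worked example. The sketch below uses the usual RPKM definition (reads per kilobase per million mapped reads) and made-up numbers for a single gene. Plain total-count scaling stands in here for the library size correction; DESeq and edgeR estimate their own scaling factors, so this is only an illustration of why the gene length cancels out.

```python
def rpkm(count, gene_length_bp, library_size):
    """Reads per kilobase of transcript per million mapped reads."""
    return count * 1e9 / (gene_length_bp * library_size)


def cpm(count, library_size):
    """Counts per million mapped reads (library size scaling only)."""
    return count * 1e6 / library_size


# Hypothetical numbers for one gene measured in two samples.
gene_length_bp = 2000
control = {"count": 100, "library_size": 10_000_000}
treated = {"count": 300, "library_size": 15_000_000}

fold_change_rpkm = (
    rpkm(treated["count"], gene_length_bp, treated["library_size"])
    / rpkm(control["count"], gene_length_bp, control["library_size"])
)
fold_change_cpm = (
    cpm(treated["count"], treated["library_size"])
    / cpm(control["count"], control["library_size"])
)
```

Both ratios come out the same, because the gene length appears in the numerator and the denominator of the RPKM ratio and cancels.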

(For more details, please refer to the excellent edgeR and DESeq user guides:
http://www.bioconductor.org/packages/2.11/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf
http://www.bioconductor.org/packages/release/bioc/vignettes/DESeq/inst/doc/DESeq.pdf
)

So - if you want to do differential expression analysis, you will be able to use RobiNA
after renaming the counts table column headers.

If you want to normalize the count data because you intend to do e.g. a correlation
analysis across all genes and conditions, you can use RobiNA if you feed it with the raw
fastq files and rerun the mapping to a reference transcriptome/genome.

I hope this answers your question,
cheers,
Marc

Richard John Barker

Rank: Youngling

Posts: 7

Join Date: 1/29/13

RE: Grouping biological repeats for DEseq using the RobiNA GUI

1/30/13 5:57 PM as a reply to Marc Lohse.

Hi Marc

Thanks, that's great the new headers worked a treat and my biological repeats were neatly fused.

Unfortunately, when I then started the DESeq analysis with my current data set I got the following error.
I've checked my data: there aren't any decimal places, though there are some 0 values. Alternatively, could it be that my isoform names contain a decimal point? E.g. isoform name in column A = AT1G01020.1

Loading required package: Biobase
Loading required package: BiocGenerics
Attaching package: 'BiocGenerics'
The following object(s) are masked from 'package:stats':
xtabs
The following object(s) are masked from 'package:base':
anyDuplicated, cbind, colnames, duplicated, eval, Filter, Find,
get, intersect, lapply, Map, mapply, mget, order, paste, pmax,
pmax.int, pmin, pmin.int, Position, rbind, Reduce, rep.int,
rownames, sapply, setdiff, table, tapply, union, unique
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
Loading required package: locfit
locfit 1.5-8 2012-04-25
Warning message:
package 'locfit' was built under R version 2.15.1
Error in if (any(round(countData) != countData)) stop("The countData is not integer.") :
missing value where TRUE/FALSE needed
Calls: newCountDataSet
Execution halted

Initially I'm mainly interested in finding the differentially spliced genes, but I am also trying to realign my fastq files using the RobiNA system to get RPKM values. I had issues with the power of my PC the last time I attempted this.

Marc Lohse

Rank: Jedi Master

Posts: 271

Join Date: 6/30/09

RE: Grouping biological repeats for DEseq using the RobiNA GUI

1/31/13 9:24 AM as a reply to Richard John Barker.

Hi Richard,

i fear the problem you are experiencing is something that i cannot easily
help you with remotely... if you could send me the counts table that you
are trying to import i will examine the problem on my machine.
Naturally i will use your data only for analysing the problem and delete
the file as soon as that's done.

If you decide to send the file, please mail it to lohse<at>mpimp-golm.mpg.de

cheers,
Marc

Marc Lohse

Rank: Jedi Master

Posts: 271

Join Date: 6/30/09

RE: Grouping biological repeats for DEseq using the RobiNA GUI

2/1/13 10:08 AM as a reply to Marc Lohse.

Hi Richard,

i got your file, thanks. It turns out that, when importing your counts table into R,
NA values are generated in the table. This causes the estimateGLMCommonDisp
function in edgeR to fail on the data set. Deeper investigation shows that it is
four values in the same row with the AGI code AT5G42560.2.

In the raw counts table this row (#31364) is actually missing values for the last four columns
which is actually conceptually wrong - if the transcript was not detected in these
samples, the count should be given as zero.

If you correct the file and set the value of the four last columns to "0"
the file loads fine and the analysis finishes successfully.
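
A check along these lines can be run on a counts table before importing it. The following is a minimal Python sketch, not the fix that went into RobiNA; it assumes a tab-delimited table with a header line and an ID in the first column, and the function name is hypothetical.

```python
import csv


def fill_missing_counts(lines, sep="\t"):
    """Return (rows, fixed) with empty or absent count cells set to '0'.

    `fixed` lists the 1-based line numbers of the rows that were changed.
    """
    reader = csv.reader(lines, delimiter=sep)
    header = next(reader)
    rows, fixed = [header], []
    for line_number, row in enumerate(reader, start=2):
        if not row:
            continue  # skip completely blank lines
        # Pad rows that are shorter than the header line.
        padded = row + [""] * (len(header) - len(row))
        cleaned = [padded[0]] + [c if c.strip() else "0" for c in padded[1:]]
        if cleaned != row:
            fixed.append(line_number)
        rows.append(cleaned)
    return rows, fixed
```

The returned `fixed` list points to the affected rows, so they can be inspected before deciding that a zero is really the right value.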

best greetings,
Marc

Richard John Barker

Rank: Youngling

Posts: 7

Join Date: 1/29/13

RE: Grouping biological repeats for DEseq using the RobiNA GUI

Answer

2/4/13 8:58 PM as a reply to Marc Lohse.

Hi Marc

Thanks! That's brilliant it works perfectly, i have no idea how those empty cells appeared but now they're gone it works perfectly... Can't wait to put the rest of my data through RobiNA.

Best wishes, Richard

Marc Lohse

Rank: Jedi Master

Posts: 271

Join Date: 6/30/09

RE: Grouping biological repeats for DEseq using the RobiNA GUI

Answer

2/5/13 10:43 AM as a reply to Richard John Barker.

I have added a line to the R code that will prevent this error from breaking the analysis - it will be included in the next release of RobiNA which is due in the first half of this year.

cheers,
Marc

Marc Lohse

Rank: Jedi Master

Posts: 271

Join Date: 6/30/09

RE: Grouping biological repeats for DEseq using the RobiNA GUI

Answer

2/5/13 11:21 AM as a reply to Marc Lohse.

Hi again,

I inspected the other file you had sent me and found a mix of formats that confounded the
data import into the R engine. The first line of the counts table looked like this:

AT1G01010.1 980 488 410 "1,225" 472 458 "1,044" 542 353 969 930 460 333 953 339 458 "1,022" 514 353 "1,017"

As you can see, there is a mixture of plain numbers and quoted numbers with a comma
as thousands separator. The quotes make the import routine interpret the field as text
so that the counts table is, in the end, imported as a mix of text and numerical data. In
addition to this, even when trying to convert the quoted strings to numbers, the comma
could be interpreted as a decimal point (in Germany, the comma is actually the standard
decimal separator that kids learn to use in school). I have removed all quotes and commas
from the file, assuming that "1,225" was actually 1225, and after that the file imports smoothly.
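
The same clean-up can be scripted. Below is a minimal Python sketch, under the same assumption as above that a comma is always a thousands separator and never a decimal mark. It handles data rows only (the header line would need to be skipped first), and the function names are hypothetical.

```python
import csv


def parse_count(cell):
    """Parse '1,225' or '980' as an integer count.

    Assumes the comma is a thousands separator, never a decimal mark.
    """
    return int(cell.strip().strip('"').replace(",", ""))


def clean_count_rows(lines, sep="\t"):
    """Yield [id, int, int, ...] for each data row of a delimited table."""
    # csv.reader already removes the surrounding double quotes.
    for row in csv.reader(lines, delimiter=sep):
        if row:
            yield [row[0]] + [parse_count(cell) for cell in row[1:]]
```

Because `int()` raises an error on anything that is not a whole number, a cell that still contains text or a decimal value is reported immediately instead of slipping through.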

I hope this helps,
cheers,
Marc