Of the remaining unannotated transcripts, 1,670 and 1,873 in the shoot and root, respectively, had ORFs encoding at least 20 amino acids by longest-ORF search. Amino acid length was widely distributed: the mean and median were 125 and 77 amino acids in the shoot, and 123 and 74 in the root. We used the G test with a 1% FDR and identified 213 and 436 differentially expressed Cufflinks transcripts. Even though the lengths of Cufflinks transcripts were not completely identical between shoot and root, at least 55 differentially expressed transcripts were common to the two tissues. In response to salinity stress, 5 and 13 unannotated transcripts were upregulated. These unannotated transcripts encoded, for example, proteins similar to indole-3-glycerol phosphate lyase and gibberellin 2-beta-dioxygenase. Of the other differentially expressed genes, Root CUFF.256193.0 was upregulated; it encoded a protein similar to MSL2. For a complete list of unannotated transcripts, see Additional file 3, Table S3.

Comparison of sequence-based and array-based technologies for gene expression profiling

Our sequence-based gene expression profiling was validated against array-based technology. First, signal intensity and RPKM from the same RNA materials were compared. These two independent measures of transcript abundance were correlated, especially at moderately high signal intensities. However, the correlation was not as strong at extremely high signal intensities, suggesting that the array signal intensity was saturated but the RPKM was not. Next, the ratios of differentially expressed genes were compared.
The ratio obtained from the array and the corresponding ratio obtained from RPKM were highly correlated over a broad range. The histogram was highest at log2 1 (that is, a ratio of 1), suggesting that most genes were expressed evenly both before and 1 h after salinity stress. However, a few discrepancies were found: increases in the expression of 17 genes were detected by using the array but not by using mRNA-Seq; conversely, increases in the expression of 7 genes were detected by using mRNA-Seq but not by using the array. To further examine these discrepancies, we used quantitative real-time polymerase chain reaction (qRT-PCR). The qRT-PCR results suggested that most of the former discrepancy was due to technical inaccuracy in the array experiments. However, qRT-PCR supported only three of the seven mRNA-Seq results in the latter discrepancy.
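The comparison described above rests on two quantities: RPKM as the sequence-based abundance measure, and the log2 ratio of expression after versus before stress. The following is a minimal sketch of how they could be computed and correlated, not the pipeline used in this study. It assumes the standard definition of RPKM (reads per kilobase of transcript per million mapped reads); the function names and the `pseudocount` parameter are illustrative assumptions.

```python
import math


def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def log2_ratio(treated, control, pseudocount=1.0):
    """log2 of treated/control expression.

    The pseudocount is an assumption of this sketch; it only keeps the
    ratio defined when one of the two values is zero.
    """
    return math.log2((treated + pseudocount) / (control + pseudocount))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)
```

Under these assumptions, a gene whose expression does not change yields a log2 ratio of 0, which corresponds to the peak of the histogram, and `pearson` could be applied to the per-gene log2 ratios from the array and from RPKM.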
Despite these discrepancies, our sequence-based approach was generally valid as a gene expression profiling technology for use with previously annotated genes.

Discussion

Estimation of variation and abundance of whole transcripts in rice

How many reads are required to cover whole transcripts in the rice cell? As the number of reads increased, the cumulative coverage approached a plateau. We summed four technical replicates.
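The idea of cumulative coverage approaching a plateau can be illustrated with a toy calculation. The sketch below is an assumption-laden simplification, not the authors' procedure: it treats a single reference of fixed length, represents each mapped read as a half-open interval `(start, end)`, and reports the covered fraction after every `step` reads. The function and parameter names are hypothetical.

```python
def cumulative_coverage(read_intervals, reference_length, step=1):
    """Return (number_of_reads, fraction_covered) pairs as reads accumulate.

    read_intervals: list of (start, end) half-open positions on one reference.
    reference_length: length of that reference in bases.
    step: report coverage after every `step` reads (and after the last read).
    """
    covered = [False] * reference_length
    n_covered = 0
    curve = []
    for i, (start, end) in enumerate(read_intervals, start=1):
        # Mark only positions not yet covered, so repeated reads add nothing.
        for pos in range(max(start, 0), min(end, reference_length)):
            if not covered[pos]:
                covered[pos] = True
                n_covered += 1
        if i % step == 0 or i == len(read_intervals):
            curve.append((i, n_covered / reference_length))
    return curve


# Toy example: the third read overlaps earlier reads and adds no new bases.
reads = [(0, 4), (2, 6), (0, 4), (6, 10)]
curve = cumulative_coverage(reads, reference_length=10)
```

In this toy case the curve is flat between the second and third reads, which is the behaviour that produces a plateau once most expressed positions have already been sequenced. Pooling replicates, as with the four technical replicates summed here, would correspond to concatenating their read lists before calling the function.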