-F.Chang, unpublished). A total of 1,054 additional reactions and three shatter libraries were necessary to close gaps and to raise the quality of the finished sequence. Illumina reads were also used to correct potential base errors and increase consensus quality using the Polisher software developed at JGI [32]. The error rate of the completed genome sequence is less than 1 in 100,000. Together, the combination of the Illumina and 454 sequencing platforms provided 199.5 × coverage of the genome. The final assembly contained 697,305 pyrosequence and 20,331,123 Illumina reads.

Genome annotation

Genes were identified using Prodigal [33] as part of the Oak Ridge National Laboratory genome annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP pipeline [34].
The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database and the UniProt, TIGR-Fam, Pfam, PRIAM, KEGG, COG, and InterPro databases. Additional gene prediction analysis and functional annotation were performed within the Integrated Microbial Genomes – Expert Review (IMG-ER) platform [35].

Genome properties

The genome consists of a 4,888,353 bp long chromosome with a GC content of 33.8% (Table 3 and Figure 3). Of the 4,347 genes predicted, 4,285 were protein-coding genes and 62 were RNA genes; 122 pseudogenes were also identified. The majority of the protein-coding genes (59.5%) were assigned a putative function, while the remaining ones were annotated as hypothetical proteins. The distribution of genes into COG functional categories is presented in Table 4.
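As a quick consistency check on the figures quoted above, the following minimal sketch recomputes a few derived values. The constant and function names are illustrative assumptions, and the counts derived from rounded percentages (GC bases, genes with a putative function) are approximations, not values reported in the text.

```python
# Figures quoted in the genome properties paragraph.
GENOME_LENGTH_BP = 4_888_353
GC_FRACTION = 0.338                 # 33.8% GC content
TOTAL_GENES = 4_347
PROTEIN_CODING = 4_285
RNA_GENES = 62
FUNCTION_ASSIGNED_FRACTION = 0.595  # share of protein-coding genes


def genome_summary():
    """Derive approximate counts from the reported statistics.

    Values based on rounded percentages are estimates only.
    """
    with_function = round(PROTEIN_CODING * FUNCTION_ASSIGNED_FRACTION)
    return {
        # Protein-coding and RNA genes should sum to the gene total.
        "genes_add_up": PROTEIN_CODING + RNA_GENES == TOTAL_GENES,
        # Approximate number of G or C bases on the chromosome.
        "approx_gc_bases": round(GENOME_LENGTH_BP * GC_FRACTION),
        # Approximate split of protein-coding genes by annotation status.
        "approx_with_function": with_function,
        "approx_hypothetical": PROTEIN_CODING - with_function,
        # Predicted genes per kilobase of sequence.
        "genes_per_kb": TOTAL_GENES / (GENOME_LENGTH_BP / 1000),
    }


if __name__ == "__main__":
    for key, value in genome_summary().items():
        print(f"{key}: {value}")
```

Under these assumptions, roughly 2,550 protein-coding genes carry a putative function and roughly 1,735 are annotated as hypothetical proteins.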
Table 3 Genome Statistics

Figure 3 Graphical circular map of the chromosome. From outside to the center: genes on forward strand (colored by COG categories), genes on reverse strand (colored by COG categories), RNA genes (tRNAs green, rRNAs red, other RNAs black), GC content, GC skew.

Table 4 Number of genes associated with the general COG functional categories

Insights from genome sequence

A closer look at the genome sequence of strain IC166T revealed a set of genes that might be responsible for the yellow-orange color of C. algicola cells by encoding enzymes involved in the synthesis of carotenoids. Carotenoids are produced by the action of geranylgeranyl pyrophosphate synthase (Celal_1770), phytoene synthase (Celal_2446), phytoene desaturase (Celal_2447), lycopene cyclase (Celal_1771) and carotene hydroxylase (Celal_2445).
Geranylgeranyl pyrophosphate synthases start the biosynthesis of carotenoids by combining farnesyl pyrophosphate with C5 isoprenoid units to form the C20 molecule geranylgeranyl pyrophosphate. The phytoene synthase catalyzes the condensation of two geranylgeranyl pyrophosphate molecules, followed by the removal of diphosphate and a proton shift, leading to the formation of phytoene.