Annotation of novel genomes is a complex problem. Efforts at automated annotation of molluscan genomic sequences have demonstrated the challenge facing the future annotation of cephalopod genomes. Long branch lengths within the phylum, the taxonomic distances to well-annotated animal genomes, and the relatively low quantity of previous molecular and genetic work in the Mollusca will demand the generation of additional resources to assist and train automated gene detection programs. Of primary importance will be the generation of transcript inventories to identify genes, refine gene models, detect start points and intron-exon boundaries, and train automated gene identification algorithms. Transcriptome data such as those from RNAseq are quick and relatively inexpensive to generate, and will be immensely useful.
Systematic sequencing of nervous system tissues and embryonic stages can be combined with relatively early-stage assemblies to generate gene models and exon structures. In addition, comparative sequence analysis of pairs of Octopus species (O. vulgaris and O. bimaculoides) and Idiosepius species (I. notoides and I. paradoxus) may be critical for annotation. Annotation efforts are labor-intensive but also offer an opportunity to grow the cephalopod research community and attract outside expertise. For example, experts on particular gene families or pathways can be recruited to assist in the description of likely protein function.
Bioinformatics researchers interested in the problems of annotation across long phylogenetic distances, the assessment of unique gene families and the evolution of biochemical novelty, and the likely challenges of extensively RNA-edited transcriptomes, will also be enlisted. Finally, annotation provides an outreach opportunity to involve young scientists and K-12 classrooms in cutting-edge scientific discovery on these fascinating organisms.

Data sharing plan

An important goal of the CephSeq Consortium is to share data rapidly and effectively both within and beyond the Consortium. Data sharing is necessary to foster the broadest possible impact of our sequencing and annotation efforts. This sharing will prove critically important for the cephalopod community. We expect sequence homology within the taxon to be an important foundation for collaboration within the field because cephalopods have evolved many new and unique character features.
Sharing data prior to publication could significantly accelerate cephalopod research. However, data sharing policies must also recognize that there are significant publication, funding, and career recognition risks involved in making data available before publication: often the first to publish a particular observation garners the most recognition. Broad data-sharing agreements such as the Ft. Lauderdale agreement have already been adopted by the international genomics community, and, most significantly, by many large sequencing centers.