Drosophila pseudoobscura genome annotation from FlyBase
Release 1.04. See http://flybase.net/annot/dpse_release1.04.txt
Date 20041216

DATA CONTENTS

Feature                        count
------------------------------------------------------------
CDS                            9946
chromosome_arm                 2
gene                           12197
gene:genewise                  18179
gene:genscan                   17182
gene:twinscan                  18330
golden_path                    21
golden_path_fragment           2649
intron                         7953
mRNA                           9946
mRNA:genewise                  18179
mRNA:genscan                   17182
mRNA:twinscan                  18330
match:blastn:na_dbEST.dpse     34611
match:blastz                   34576
orthologous_region             12179
source                         23
supercontig                    2802
syntenic_region                1258
------------------------------------------------------------

Note on features:

Largest assembly units are
  chromosome_arm = 2, 3
  golden_path    = 4_group1, 4_group2, 4_group3, 4_group4, 4_group5,
                   XL_group1a, XL_group1e, XL_group3b, XL_group3a,
                   XR_group3a, XR_group5, XR_group6, XR_group8, XR_group9
                   U

The 'U' chromosome is the unordered collection of contigs which are
not assigned to other units. These are artificially ordered by ID in
one collection for presentation purposes. Do not be misled by sequence
positions, which are not valid outside the contig.

Computed genes are assigned names with a 'GA' prefix for this species,
with an FBgn ID as the gene id. See the FlyBase document
NOMENCLATURE FOR ANNOTATION-BASED GENE SYMBOLS IN DROSOPHILIDAE

Computational features are :genewise, :genscan, :twinscan and match:

Data are from the Postgres Chado database, release 1.04, 20041216.
A copy is at
ftp://flybase.net/genomes/Drosophila_pseudoobscura/dpse_r10_20041216/pgsql/

BULK FILE SET

See ftp://flybase.net/genomes/Drosophila_pseudoobscura/current/

  blast/  - NCBI blast database set for selected fasta/ feature sets.
  dna/    - contains dna raw format files per chromosome-arm
  fasta/  - dna and protein data per chromosome and feature type
  gff/    - GFF v3 standard feature files per chromosome
  gnomap/ - Gnomap standard feature files per chromosome
            (drive genome map views)

These last two contain chromosome locations of the features listed
above.

-------------------------------------------------
COMPUTATIONAL ANALYSIS OVERVIEW

Date: Wed, 27 Oct 2004 14:22:38 -0400 (EDT)
From: Peili Zhang

Here's a brief description of what FlyBase has done in the comparative
analyses of the pseudoobscura (abbreviated as dpse below) genome
against melanogaster (abbreviated as dmel below) and in generating the
first computation-derived version of the dpse genome annotation.

We started off by mapping the locations of the putative orthologs on
the dpse genome relative to dmel. To achieve this goal, we first
selected one protein isoform per gene from the dmel annotation, then
ran TBLASTN against the dpse WGS contigs using the selected dmel
protein set as query. From this exercise, we derived the locations of
putative orthologs on the dpse genome for more than 12,000 dmel genes.
The putative ortholog locations were further confirmed or modified
when we took into account the synteny information of the genes on the
dmel genome. Finally, we generated the syntenic blocks between dmel and
dpse and further extended the blocks using the blastz HSPs between the
two genomes.

To generate the first version of the dpse genome annotation
computationally, we created a gene feature at each of the putative
ortholog positions. In addition, three gene predictors, Twinscan,
Genscan and Genewise, were run independently on the dpse genome. After
semi-automatic filtering of the predictions to retain only one gene
prediction for each locus, most of the predicted proteins are the
reciprocal best hits to their dmel counterparts.
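The reciprocal-best-hit criterion mentioned above can be illustrated
with a minimal Python sketch. It assumes the similarity hits are
already available as (query, subject, score) tuples in both search
directions. The function names and the toy identifiers are
hypothetical, and this is not a description of the FlyBase pipeline.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query_id, subject_id, score) tuples.
    Ties are resolved in favour of the hit seen first.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(dpse_vs_dmel, dmel_vs_dpse):
    """Return sorted (dpse_id, dmel_id) pairs that are each other's best hit."""
    forward = best_hits(dpse_vs_dmel)
    reverse = best_hits(dmel_vs_dpse)
    return sorted(
        (dpse, dmel)
        for dpse, dmel in forward.items()
        if reverse.get(dmel) == dpse
    )


# Toy data with made-up identifiers and scores (assumptions, not real hits).
dpse_vs_dmel = [("GA_x", "CG_a", 200), ("GA_x", "CG_b", 150), ("GA_y", "CG_b", 90)]
dmel_vs_dpse = [("CG_a", "GA_x", 210), ("CG_b", "GA_x", 160), ("CG_b", "GA_y", 80)]
pairs = reciprocal_best_hits(dpse_vs_dmel, dmel_vs_dpse)
```

In the toy data, GA_y's best hit is CG_b, but CG_b's best hit is GA_x,
so only the GA_x / CG_a pair is reciprocal.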
Next, we checked for the overlap between the dpse gene features
created for each of the putative orthologs derived from TBLASTN and
the gene predictions, and attached a predicted gene model to about 90%
of the gene loci on the dpse genome.

This completes the generation of the dpse genome annotation release
1.0, which is now publicly available through GenBank. The links to the
CON records can be found at the bottom of the WGS project master
record AADE00000000. Please note that only the gene models annotated
on the dpse genome have been submitted to GenBank. The orthologous
regions and syntenic blocks derived primarily from TBLASTN, the blastz
HSPs, the alignments of dpse ESTs onto the dpse genome, the unfiltered
gene predictions from all three predictors, and related data will be
publicly available on the official FlyBase web site
(http://www.flybase.org).
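
The overlap check between ortholog-derived gene features and gene
predictions described above can be sketched in Python as follows. This
is an illustration only: it assumes 1-based inclusive coordinates (as
in GFF), ignores strand, and picks the prediction sharing the most
bases with each locus. The text does not say how overlap was scored,
so that selection rule, the Feature type and all names in the example
are assumptions.

```python
from collections import namedtuple

# Hypothetical minimal feature record; coordinates are 1-based, inclusive.
Feature = namedtuple("Feature", "name seqid start end")


def overlap_length(a, b):
    """Number of bases shared by two features (0 if none or different seqid)."""
    if a.seqid != b.seqid:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def attach_models(loci, predictions):
    """Pair each locus with the prediction that shares the most bases.

    Returns {locus_name: prediction_name}, with None for loci that no
    prediction overlaps.
    """
    result = {}
    for locus in loci:
        best_name, best_len = None, 0
        for pred in predictions:
            shared = overlap_length(locus, pred)
            if shared > best_len:
                best_name, best_len = pred.name, shared
        result[locus.name] = best_name
    return result


# Toy coordinates (made up for illustration).
loci = [
    Feature("locus1", "2", 100, 500),
    Feature("locus2", "2", 1000, 1200),
    Feature("locus3", "3", 100, 200),
]
predictions = [
    Feature("predA", "2", 450, 900),
    Feature("predB", "2", 150, 480),
    Feature("predC", "2", 1100, 1300),
]
attached = attach_models(loci, predictions)
with_model = sum(1 for model in attached.values() if model is not None)
```

Dividing with_model by the number of loci gives the fraction of loci
that received a model, the kind of figure quoted above. The nested
loop is quadratic; for genome-scale inputs, sorting features by
position or using an interval index would be the usual refinement.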