Release 2 Notes

October 2000

ANNOTATED GENOMIC SEQUENCE

RELEASE 2

The annotated D. melanogaster sequence was first released on March 24, 2000, and constitutes Release 1 of the genomic sequence. Approximately 330 of the gaps in that sequence have now been filled. Some annotations have been corrected or added by Celera/BDGP, but no annotations have yet been corrected by FlyBase. Celera/BDGP recently submitted this new annotated sequence to GenBank as Release 2. Because FlyBase/BDGP will continue to update the sequence and annotations on an approximately six-month cycle, there will be future releases (see RELEASE 3). Multiple versions of the sequence and annotations present organizational challenges both to the public databases and to BDGP, and will probably cause confusion. We will try to make it easy to distinguish among the various releases.

[NOTE: The Release 2 scaffolds are intended to be oriented correctly, left to right across the chromosome, whereas the orientation of the Release 1 scaffolds was often random.]

For now, Release 2 is available only at the data libraries (NCBI, EBI, DDBJ). It is not yet available on the BDGP or FlyBase web sites, or through the "Drosophila genome" BLAST database at the NCBI. [NOTE: Release 2 is now available at both BDGP and FlyBase, and NCBI's "Drosophila genome" database has Release 2 data.] In the future, both Release 1 and Release 2 versions of our GadFly annotation database will be available at the BDGP and FlyBase web sites. Users must be certain to check the Release number of any genomic sequence or annotation. Version numbers appear after the accession number, for example:

Date            Release      GenBank Version
March 2000      Release 1    AE003452.1
October 2000    Release 2    AE003452.2

If the genomic sequence did not change between March and October, GenBank has retained the .1 version number but changed the date to October, for example:

Date            Release      GenBank Version
March 2000      Release 1    AE003650.1
October 2000    Release 2    AE003650.1
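
The version convention shown in the two tables above can be checked mechanically. The following is a minimal sketch, assuming identifiers of the form accession.version exactly as they appear in the examples (for instance AE003452.2). The function names split_accession and sequence_changed are illustrative only and are not part of any BDGP, FlyBase, or GenBank tool.

```python
def split_accession(versioned):
    """Split an 'ACCESSION.VERSION' string, e.g. 'AE003452.2' -> ('AE003452', 2)."""
    # rpartition splits on the last '.', so the version is whatever follows it.
    accession, sep, version = versioned.strip().rpartition(".")
    if not sep or not accession or not version.isdigit():
        raise ValueError(f"not an accession.version string: {versioned!r}")
    return accession, int(version)


def sequence_changed(release1_id, release2_id):
    """Return True if the version number increased between the two releases.

    Assumption (taken from the text above): an unchanged version number
    means the genomic sequence did not change between releases.
    """
    acc1, ver1 = split_accession(release1_id)
    acc2, ver2 = split_accession(release2_id)
    if acc1 != acc2:
        raise ValueError("accessions differ; versions cannot be compared")
    return ver2 > ver1


print(sequence_changed("AE003452.1", "AE003452.2"))  # True: sequence was revised
print(sequence_changed("AE003650.1", "AE003650.1"))  # False: sequence unchanged
```

Note that an unchanged version number says nothing about the annotations, which may still differ between releases, so the release number should be recorded separately.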

Links from FlyBase/BDGP Release 1 pages (e.g., from GadFly annotation report pages) to accession numbers at NCBI may go to the Release 2 sequence, though they should go to Release 1 sequence. We are working on fixing this. [NOTE: This is fixed now.] In the future, accession number links from FlyBase/BDGP Release 1 pages will go to Release 1 sequence at the data libraries (NCBI, EBI, DDBJ), links from FlyBase/BDGP Release 2 pages will go to Release 2 sequence, and so on. Links from FlyBase gene reports will go to the most recent release. You can always query at NCBI using the accession with version number.

The release number will appear prominently at the top of each GadFly query and report page, and also at the download sites for sequence and XML-formatted annotations. Please make a note of the release number you are working with.

Because of limited resources, certain analyses (for example, the mapping of P element insertions) performed on the Release 1 data will not be repeated on Release 2. However, the Release 1 results will always be accessible, and we will repeat these analyses for Release 3.

[NOTE: Until recently we had very little evidence for the Release 2 annotations, and much more for Release 1, so Release 1 has been the default. However, we have performed BLAST and InterPro analyses of the Release 2 sequence and will soon make Release 2 the default, as it is at NCBI.]

Some Statistics Comparing the Releases:

                        Release 1    Release 2
Number of genes         13991        13744
Number of peptides      14080        14332

Number of unchanged peptide sequences in Release 2:    13218
Number of changed peptide sequences in Release 2:      748
Number of new transcripts in Release 2:                336
Number of transcripts deleted/changed name:            114

RELEASE 3

The BDGP is currently finishing the genomic sequence to high quality (Phase 3) and FlyBase/BDGP is reannotating this finished sequence to create Release 3, which will gradually be deposited in GenBank during 2001. Release 3 will provide improvements in annotation and sequence quality relative to Release 2, and will include the corrections submitted by the public in error reports.

TRANSPOSABLE ELEMENTS

As a result of the whole genome shotgun assembly, the sequence of each transposon in Releases 1 and 2 is a consensus derived from a number of elements of that transposon type. The extent of the consensus varies among the transposons depending on the length of the traces that run from unique sequence into the transposon. In most cases, the sequence shown is not the actual sequence of the particular transposon at that location. Users are warned not to rely heavily on any analysis of these transposable element sequences.

As we finish the sequence to high quality, we will attempt to replace these consensus sequences with the actual sequences present at each location in the y; cn bw sp strain. This corrected sequence will be found in Release 3.

QUESTIONS?

Thank you for your patience while we devise an appropriate way to manage the problems presented by the new releases and transposons. Please address any questions to bdgp@fruitfly.berkeley.edu