Genetic nomenclature for Drosophila melanogaster

The rules for the genetic nomenclature of Drosophila melanogaster have evolved over the last 85 years or so. This document is a statement of these rules, as adopted by FlyBase. These rules are based on those published in Lindsley and Zimm (1992), The genome of Drosophila melanogaster (Academic Press).

This document is a guide to the nomenclature of Drosophila melanogaster. Although much of the existing nomenclature conforms to these rules, some does not. Past practice, and the continued existence of names and conventions that clearly flout these rules (even in FlyBase), is not an excuse for bad future practice. Now that the Drosophila database is kept in electronic form, a consistent and non-redundant nomenclature is of special importance. The nomenclature now used by FlyBase will evolve towards these standards. For internal reasons FlyBase sometimes differs from or extends current nomenclatural standards. These differences or extensions are explained in this document.

Advice on nomenclature can be obtained from FlyBase:

FlyBase, Biological Laboratories, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA. Telephone (+1) 617 496 5668, Fax (+1) 617 495 9300, e-mail flybase-help at morgan harvard edu.

In particular, the FlyBase consortium welcomes the opportunity to give advice on the naming of genes, alleles, aberrations and transgene constructs. We will undertake checks for users, so as to avoid conflicts of names and symbols.

1. Gene names and symbols

1.1. Names. Gene names must be concise. They should allude to the gene's function, mutant phenotype or other relevant characteristic. The name must be unique and not have been used previously for a Drosophila gene (see paragraph 9). The name should be inoffensive.

A gene can have only one valid FlyBase name and symbol. All other published symbols for a gene are recorded in FlyBase as synonyms (see section 9. Valid Symbols & Synonyms below for an explanation of how valid symbols are determined).

On occasion FlyBase is wholly unable, on the basis of a publication, to assign any meaningful name to a gene, or putative gene. Then, FlyBase will give the gene the name and symbol anon-, with some distinguishing suffix. If and when further information becomes available this name will be changed to something more meaningful, keeping the anon- name stored as a synonym.

1.1.1. Case of initial letter. The name begins with a lowercase letter when the gene is named for a mutant phenotype recessive to the wild-type in a normal diploid.

The name begins with an uppercase letter when the gene is named for a mutant phenotype that is dominant to the wild-type in a normal diploid.

Genes named after a protein product or other molecular feature begin with an uppercase letter.

1.1.2. Genes named for RNAs. Genes for tRNAs have names of the form tRNA:XN:m, where X is the 1-letter amino-acid code (in upper-case) (IUPAC-IUB, 1969, J. Biol. Chem. 243(13): 3557--3559), N is a number signifying the particular isoform and m is, preferably, a cytogenetic map position followed by a lower case letter, e.g., tRNA:S7:23Ea, tRNA:S7:23Eb for the two different serine-7 tRNA genes that map to polytene chromosome region 23E.

Genes named for small-nuclear RNAs have similar names, i.e., snRNA:n:m, where n is the type of snRNA and m signifies a cytogenetic map position, with a distinguishing final letter should more than one similar class of snRNA gene map to the same polytene chromosome lettered subdivision; e.g., snRNA:U6:96Aa, snRNA:U6:96Ab.
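As a rough illustration of the two name formats above, the following Python sketch splits a tRNA or snRNA gene name into its parts. The patterns are assumptions inferred only from the examples in the text (they require a cytogenetic position, although the rule says only that one is preferred), and parse_rna_gene_name is a hypothetical helper, not a FlyBase tool.

```python
import re

# Hypothetical patterns, inferred from the examples tRNA:S7:23Ea and snRNA:U6:96Aa.
TRNA_NAME = re.compile(
    r"tRNA:(?P<amino_acid>[A-Z])(?P<isoform>\d+):"
    r"(?P<division>\d{1,3})(?P<subdivision>[A-F])(?P<distinguisher>[a-z]*)"
)
SNRNA_NAME = re.compile(
    r"snRNA:(?P<type>[A-Za-z0-9]+):"
    r"(?P<division>\d{1,3})(?P<subdivision>[A-F])(?P<distinguisher>[a-z]*)"
)


def parse_rna_gene_name(name):
    """Return the components of a tRNA/snRNA gene name, or None if it does not fit."""
    for kind, pattern in (("tRNA", TRNA_NAME), ("snRNA", SNRNA_NAME)):
        match = pattern.fullmatch(name)
        if match:
            return {"kind": kind, **match.groupdict()}
    return None
```

For example, parse_rna_gene_name("tRNA:S7:23Ea") reports amino acid S, isoform 7 and map position 23E with the distinguishing letter a.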

By historical convention the gene encoding the major ribosomal RNAs is called bobbed (bb).

Annotated RNA genes will also have CR prefix synonyms (see 1.1.3.).

1.1.3. Genes identified by genomic sequencing projects. Large scale sequencing projects use a variety of prediction methods to identify genes. Some of these genes are already known, others are new. Prior to the whole genome shotgun (WGS) sequencing effort of Celera Genomics, each gene identified by the European and Berkeley Drosophila Genome Projects was given a formal name consisting of three components, a prefix (BG: for 'Berkeley' genes, EG: for 'European' genes), a clone name and an integer, for example, BG:DS07851.5 and EG:152A3.3. Genes identified by STS and EST sequences were named with a prefix to indicate the project (E for European, B for Berkeley), either EST: or STS: (to indicate a cDNA or genomic sequence), and a clone name and a suffix to indicate from which end of the clone the sequence was determined (T for T7 promoter, S for SP6 promoter). For example, ESTS:4C4T is a gene named for a European STS determined from the T7 promoter of cosmid 4C4. BEST:CK00010.c is an example of a gene named for a Berkeley EST sequence cluster. STS sequences that show no sequence matches are not named as genes.

The prefix CG (by agreement, for Computed Gene, although annotation of CGs is not limited to computational methods) followed by an integer (no colon) was used for genes identified during the annotation of the WGS sequence, for example, CG1427 and CG1749. Gene symbols based on mutant phenotype, molecular feature or determined function take precedence over CG symbols, and CG symbols take precedence over BG: and EG: symbols. When a gene previously identified by a CG symbol is renamed by virtue of mutant phenotype, molecular feature or determined function, the new symbol should be explicitly related to the corresponding CG symbol, allowing the CG symbol to be made a synonym of the new gene symbol.

In Release 1 and 2 of the WGS sequence annotation, only protein-coding genes were annotated, and CGnnnn identifiers were assigned to genes, CTnnnn identifiers to transcripts, and pp-CTnnnn identifiers to peptides. In Releases 3.0 and 3.1, non-protein-coding genes, such as tRNAs, snRNAs, snoRNAs, microRNAs, miscellaneous non-coding RNAs, and pseudogenes, were assigned identifiers of the form CRnnnnn. Transposable elements were given TEnnnnn identifiers. Transcripts were assigned identifiers composed of the gene identifier followed by a suffix -RX; e.g., CG12345-RA, CG12345-RB. For peptides, the -RX suffix is replaced by a -PX suffix, with the second identifying letter always in agreement with that of the corresponding transcript; e.g., CG12345-PA, CG12345-PB. In Release 3.2, the CG symbols were replaced with valid FlyBase gene symbols where available. For example, CG8094, CG8094-RA, and CG8094-PA become gene Hex-C, transcript Hex-C-RA, and protein Hex-C-PA, and CG8094 became a synonym. The Release 1 and 2 CT identifiers are now obsolete, and there is no mapping between CT identifiers and the Release 3 CGnnnn-RA identifiers, although in most cases the CT identifier has become a synonym of the gene (in some cases, a Release 2 gene corresponds to more than one Release 3 gene, e.g. if exons were redistributed or split between two new Release 3 genes).
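The correspondence between transcript and peptide identifiers described above can be sketched in Python as follows. The helper name peptide_id_for_transcript is hypothetical, and the assumption that the identifying suffix is one or more upper-case letters goes beyond the single-letter examples in the text.

```python
import re

# Assumption: a transcript identifier is "<gene identifier>-R<letters>".
_TRANSCRIPT_ID = re.compile(r"(?P<gene>.+)-R(?P<letters>[A-Z]+)")


def peptide_id_for_transcript(transcript_id):
    """Replace the -RX suffix of a transcript identifier with the matching -PX suffix."""
    match = _TRANSCRIPT_ID.fullmatch(transcript_id)
    if match is None:
        raise ValueError(f"not a transcript identifier: {transcript_id!r}")
    return f"{match.group('gene')}-P{match.group('letters')}"
```

Because only the final suffix is rewritten, the same sketch applies to identifiers built on a valid gene symbol, such as Hex-C-RA.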

1.2. Drosophila prefix. A prefix to indicate that the gene is from Drosophila, e.g. D, Dm, Dmel or Dro is redundant and is, therefore, not used. If it is necessary to draw a distinction between a melanogaster gene and that of another organism which would otherwise have the same symbol, Dmel\ should be used as the preferred prefix, in line with the principle used to denote species other than melanogaster within FlyBase.

1.2.1. Genes from species other than D. melanogaster. FlyBase includes genes from all species of Drosophilidae plus genes from other families that have been introduced into Drosophila (see section 3.2.2). For species other than Drosophila melanogaster, the valid gene symbol follows a species abbreviation indicating the species of origin. The prefix has the form Nnnn\, where N is the initial letter of the genus (i.e., D for Drosophila or Dettopsomyia) and nnn is a unique code, usually the first three letters of the species name (e.g., sim for D. simulans). A list of valid species abbreviations is available on FlyBase.
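A minimal Python sketch of how a symbol carrying an Nnnn\ prefix might be separated into species abbreviation and gene symbol is given below. The function name split_species_prefix is hypothetical, the sketch does not check the abbreviation against the FlyBase list of valid species abbreviations, and treating an unprefixed symbol as Dmel is an assumption based on section 1.2.

```python
import re

# One upper-case genus initial, three lower-case letters, then a backslash.
_SPECIES_PREFIX = re.compile(r"(?P<species>[A-Z][a-z]{3})\\(?P<gene>.+)")


def split_species_prefix(symbol, default_species="Dmel"):
    """Return (species abbreviation, gene symbol) for a possibly prefixed symbol."""
    match = _SPECIES_PREFIX.fullmatch(symbol)
    if match:
        return match.group("species"), match.group("gene")
    # No four-letter prefix found: assume a D. melanogaster symbol.
    return default_species, symbol
```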

The valid gene symbols for other Drosophila species, wherever possible, should be identical to their Drosophila melanogaster homologues. Exceptions to this recommendation include cases where the D. melanogaster name incorporates a polytene chromosome location and hence is only of relevance to melanogaster, and cases where the symbol in the other species has already been used to refer to a different gene.

Outside of the family Drosophilidae, the valid gene symbol for the species of origin of the gene should be used and should respect the capitalization rules for the wild-type gene of that species.

All gene symbols should be italicized, regardless of the nomenclature rules for the species of origin.

1.3. Common prefixes. One of a number of common prefixes may be used in the names of genes that fall into one of the following classes (where n designates the chromosome, m a distinguishing symbol and a a gene whose phenotype is modified by an enhancer or suppressor):

enhancer e(a)m, E(a)m

female sterile fs(n)m, Fs(n)m

lethal l(n)m

male sterile ms(n)m, Ms(n)m

male/female sterile mfs(n)m, Mfs(n)m

maternal mat(n)m, Mat(n)m

meiotic mei

Minute M(n)m

mitotic mutant mit(n)m, Mit(n)m

mutagen sensitive mus

'Polygene' PL(n)m

resistance rst(n)m, Rst(n)m

suppressor su(a)m, Su(a)m

'tumor' tu(n)m (i.e., genes controlling production of melanotic pseudotumors)
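For the parenthesized prefixes in the list above, a symbol can be split into prefix, bracketed part and distinguishing symbol along the following lines. This Python sketch is illustrative only: split_prefixed_symbol and PREFIX_CLASSES are hypothetical names, the lookup ignores the case of the prefix, and the unparenthesized prefixes mei and mus are not handled.

```python
import re

# Class labels copied from the list above; keys are lower-cased prefixes.
PREFIX_CLASSES = {
    "e": "enhancer",
    "fs": "female sterile",
    "l": "lethal",
    "ms": "male sterile",
    "mfs": "male/female sterile",
    "mat": "maternal",
    "m": "Minute",
    "mit": "mitotic mutant",
    "pl": "'Polygene'",
    "rst": "resistance",
    "su": "suppressor",
    "tu": "'tumor'",
}

_PREFIXED = re.compile(r"(?P<prefix>[A-Za-z]+)\((?P<inside>[^()]+)\)(?P<rest>.*)")


def split_prefixed_symbol(symbol):
    """Split e.g. 'l(3)SG44' into prefix, bracketed part and distinguishing symbol."""
    match = _PREFIXED.fullmatch(symbol)
    if match is None:
        return None
    prefix = match.group("prefix")
    return {
        "prefix": prefix,
        "class": PREFIX_CLASSES.get(prefix.lower()),  # None if not a listed prefix
        "inside": match.group("inside"),  # chromosome (n) or modified gene (a)
        "distinguisher": match.group("rest"),
    }
```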

1.4. Lethals.

1.4.1. For lethal mutations, if a specific phenotype or a specific gene product can be associated with a lethal locus, then that phenotype or product should be used for the name of the gene; this is not prefixed by lethal, l(n). Otherwise, the general term 'lethal' (l(n)m) is applied, until analysis of the gene allows a more informative name to be assigned.

1.4.2. Lethals not named for a specific phenotype are named according to the lettered subdivision of the polytene map that they occupy. Separate lethal loci within the same subdivision are differentiated by lower case letters, e.g., l(1)1Aa and l(1)1Ab, etc. for lethals in region 1A. When a lettered subdivision has more than 26 lethal complementation groups, l(1)1Az will be followed by l(1)1Aaa, l(1)1Aab, etc. If no polytene mapping information is available then the gene is given an arbitrary code, e.g., l(3)SG44. When more information becomes available such genes will be renamed.
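The lettering scheme for lethals within a subdivision (a to z, then aa, ab, and so on) can be generated as in the Python sketch below. The function names are hypothetical, and the continuation past the 52nd locus (az followed by ba) is an assumption, since the text only gives the sequence as far as aab.

```python
def lethal_suffix(index):
    """Return the distinguishing letters for the index-th lethal (0 -> a, 26 -> aa)."""
    if index < 0:
        raise ValueError("index must be non-negative")
    letters = []
    n = index + 1
    while n > 0:
        # Bijective base 26: there is no zero digit, so shift by one before dividing.
        n, remainder = divmod(n - 1, 26)
        letters.append(chr(ord("a") + remainder))
    return "".join(reversed(letters))


def lethal_symbol(chromosome, subdivision, index):
    """Build a symbol such as l(1)1Aa from chromosome, subdivision and locus index."""
    return f"l({chromosome}){subdivision}{lethal_suffix(index)}"
```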

1.5. Common series. Genes of similar function may be given names that are only differentiated by a suffix. Preferably, this should be a polytene chromosome position, e.g. Actin-5C, Actin-42A, Actin-57B etc. Lower case letters are to be used to distinguish different genes mapping within the same chromosome subdivision that encode similar proteins (e.g., nicotinic Acetylcholine receptor alpha 96Aa, nicotinic Acetylcholine receptor alpha 96Ab). A similar system is employed for Minutes, except that, for historical reasons, their names include a chromosome designator in parentheses, e.g., M(3)62A.

1.6. X vs. 1 for the X chromosome. The symbols X and 1 for the X chromosome are synonymous. The symbol 1 is preferred in formal description of genes, aberrations and their symbols.

1.7. Symbols. A symbol is assigned to each gene. This symbol is an abbreviation of the name that uniquely designates the gene in question; it combines brevity with information.

1.7.1. A symbol must be unique. A symbol previously used for a gene, but now considered to be a synonym, should not be re-used for a new gene (see section 9. Valid Symbols & Synonyms).

1.7.2. Symbols should not contain spaces, superscripts or subscripts. They should only contain characters from the following set:

a-z A-Z 0-9 : - ( )

The : character is only used in special contexts (e.g., in the symbols of genes named after RNAs, in mitochondrial gene symbols and, as ::, in the symbols of protein fusion genes). The ( and ) characters are only used in compound symbols, e.g., where they bracket a chromosome designation. The use of Greek, or other non-roman letters, is discouraged. The character / is reserved for separating homologues in genotypes and is not supported by FlyBase as a component of any symbol.

An exception to the use of superscripts in symbols is when an allele name is an integral part of a gene name, e.g., su(w^a).
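A simple check of a candidate gene symbol against the permitted character set might look like the Python sketch below. The function name is hypothetical, and the check covers only the character set: it does not test uniqueness, the special contexts required for the : character, or the superscript exception for symbols such as su(w^a), which it would reject.

```python
import re

# Permitted characters from section 1.7.2: a-z A-Z 0-9 : - ( )
_ALLOWED_SYMBOL = re.compile(r"[A-Za-z0-9:()\-]+")


def uses_allowed_symbol_characters(symbol):
    """Return True if every character of a non-empty symbol is in the permitted set."""
    return _ALLOWED_SYMBOL.fullmatch(symbol) is not None
```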

1.7.3. When a gene name has a suffix, e.g., Actin-5C, then the same suffix is added to the symbol, Act5C. Hyphens are not used in symbols, except to separate numbers or letters which, if strung together, would lose their descriptive content.

1.7.4. Mitochondrial genes. Genes encoded by the mitochondrial DNA should all have the prefix mt:. For example, the gene encoding subunit 4 of the mitochondrial NADH dehydrogenase has the symbol mt:ND4, that encoding the mitochondrial leucine tRNA with the UUR anticodon is mt:tRNA:L:UUR. The symbol MT:DNA is used to represent the entire mitochondrial genome.

2. Allele names and symbols

2.1. Superscripts. Alleles at a particular gene are designated by the same name and symbol and are differentiated by distinguishing superscripts. In written text the allele designation may be separated from that of the gene by a hyphen, e.g., white-apricot.

2.2. Symbols. Allele symbols should be short, preferably no more than three characters long, and cannot contain spaces, superscripts, or subscripts. Whenever possible superscript characters should be limited to the following set:

a-z A-Z 0-9 - + : .

The + symbol is reserved for the wild-type allele. Consecutive allele numbers should be used wherever possible.

Greek characters may be used but are discouraged.

The character \ is reserved in all gene symbol contexts for species identification.

The character / is reserved as a homologue separator in genotypes and cannot be used in allele symbols.

In text in which superscripting is not possible, such as ASCII files, superscripted text should be enclosed between the characters [ and ].
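The bracket convention for plain text, together with the preferred superscript character set, can be sketched in Python as follows. All three function names are hypothetical, and the parser assumes a single gene[allele] pair with no nested brackets.

```python
import re

# Preferred superscript characters from section 2.2: a-z A-Z 0-9 - + : .
_PREFERRED_SUPERSCRIPT = re.compile(r"[A-Za-z0-9+:.\-]+")


def is_preferred_superscript(allele):
    """Return True if the allele designation uses only the preferred characters."""
    return _PREFERRED_SUPERSCRIPT.fullmatch(allele) is not None


def to_ascii_allele(gene, allele):
    """Write a superscripted allele in plain text, e.g. ('w', 'a') -> 'w[a]'."""
    return f"{gene}[{allele}]"


def from_ascii_allele(text):
    """Split a plain-text symbol such as 'ry[+3]' into (gene, allele)."""
    if not text.endswith("]") or "[" not in text:
        raise ValueError(f"no bracketed allele in {text!r}")
    gene, allele = text[:-1].split("[", 1)
    return gene, allele
```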

FlyBase makes exceptions to the brevity rule when recording in vitro mutagenesis constructs that are represented with alleles. Where these are not otherwise named FlyBase confers symbols according to a system including the initial of the last name of the first author of the first paper in which the allele was initially reported ('I' in the following examples). The most frequently used classes include:

cIa	for 'construct a of Author-lastname'
Scer\UAS.cIa	for 'S. cerevisiae UAS construct a of Author-lastname'
tIa	for 'transgene a of Author-lastname'
mIa	for 'minigene a of Author-lastname'
hs.PI	for 'heat shock construct of Author-lastname'
gene_symbol.PI	for 'gene promoter fusion of Author-lastname'

In addition, exceptions have been required for some large series of alleles and collections of mutations. Nevertheless, brevity of allele symbols is very much to be encouraged.

2.2.1 It is unacceptable to use, as a superscripted allele symbol, elements of the genotype in which the allele arose, since such a designation implies something more than a trivial connection between allele and element. Alleles that are revertants of a pre-existing allele are an exception to this rule.

2.2.2. While historically, the numeral 1 has been the implied superscript of nonsuperscripted symbols, this practice has created considerable ambiguity and is now discouraged. As with all other alleles, the numeral 1 should be explicitly designated (e.g., sc¹, not sc).

2.2.3. For a recessive allele of a gene named as a dominant, or a dominant allele of a gene named as a recessive, the superscripts r and D, respectively, may be used; e.g., Hn^r, Hn^r2, and ci^D.

2.2.4. For a wild-type allele, a superscripted plus character may be used; e.g., b⁺ or B⁺. The plus symbol alone implies the normal (wild-type) allele or alleles in any context, such as y¹/+.

It may be necessary to distinguish among more than one 'wild-type' allele. In such cases the different wild-type alleles should be given a distinguishing number, which would follow the + character in the superscript, e.g., ry⁺³.

2.2.5. Absence of a particular locus may informally be noted by use of a superscript minus character with the symbol; e.g., bb^-. This is not acceptable as a designation of a particular allele.

2.2.6. Revertants or partial revertants of mutant alleles are designated by the superscript rv followed by a distinguishing number; these are placed after the allele designator, e.g., D^4rv32, the 32nd revertant of D⁴. Revertants of dominant mutations that are deficiencies are treated not as alleles but as deficiencies and are accordingly not superscripted but listed with the distinguishing number, e.g., Df(2L)Sco^rv4.

2.2.7. Alleles specifying the absence of a particular enzyme or other protein are designated by the superscript n (null) followed by a distinguishing number or letter, e.g., Adhⁿ¹, or, where lack of function is inviable, by l (lethal), followed by a distinguishing number, e.g., Nrg^l2.

2.2.8. An allele known to be mutant but whose specific identity is unknown is given an asterisk as an allele designation, e.g., w^*.

3. Transposons and Transgene Constructs

Transposons or transgene constructs integrated into the Drosophila genome, if they cause a mutant phenotype, are both alleles and aberrations (similar to other classes of aberrations that are associated with mutant phenotypes). Where such insertions produce no mutant phenotype, they are named purely according to aberration conventions. Where transposon/transgene insertions produce a mutant phenotype by disrupting an endogenous gene, they are given names both as an allele of the mutated endogenous gene and as an aberration. The name of the allele follows conventions outlined in section 2. Rules for naming natural transposons and transgene constructs and their insertion into the genome follow.

Generic naturally occurring transposons are symbolized as ends{}, where ends stands for the symbol of a given transposon, such as P for P-element. Doc{}, copia{} and P{} are examples. A defined natural variant of the transposon family can be named by including a symbol for that name inside the brackets. A specific insertion of a given transposon is described by including an additional unique symbol following the brackets.

Insertions of natural transposons annotated as genome sequence features also have synonyms of the form TEnnnnn, for example, copia{}910 has the synonym TE20021.

Symbols for constructed transposons, or transgene constructs, must always include a construct symbol, which defines a particular construct. A full transgene construct genotype consists of the source of transposon ends, included genes, construct symbol, and insertion identifier, in the form ends{genes=construct-symbol}. Once defined, ends{construct-symbol} (or less formally, construct-symbol alone) can be used in most circumstances to refer to a specific transgene construct. The symbol for a specific insertion of a given transgene construct has the form ends{construct-symbol}insertion-identifier. Further details are given in the sections that follow.

Some examples:

P{w^+mC ovo^D1-18=ovoD1-18}: the full genotype of the P-element transgene construct P{ovoD1-18}
P{ovoD1-18}13X6: a viable insertion of the construct P{ovoD1-18}
P{Scer\GAL4^wB w^+mW.hs Ecol\ampR Ecol\ori=GawB}: the full genotype of the transgene construct P{GawB}
P{GawB}h^1J3: an insertion of the construct P{GawB} that disrupts the h gene
H{w^+mC Ecol\ori Tn\kanR Ecol\lacZ^HZ50a=Lw2}: the full genotype of the hobo transgene construct H{Lw2}
H{Lw2}dpp^151H: an insertion of the transgene construct H{Lw2} that disrupts the dpp gene
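To make the ends{genes=construct-symbol}insertion-identifier layout concrete, the Python sketch below takes such a string apart. It is an illustration under stated assumptions, not a FlyBase parser: parse_construct is a hypothetical name, only the outermost pair of braces is interpreted (nested ends{} symbols are left inside the construct field), and the included genes are assumed to be separated by spaces.

```python
def parse_construct(symbol):
    """Split ends{genes=construct}insertion into its four components."""
    open_at = symbol.find("{")
    close_at = symbol.rfind("}")
    if open_at < 0 or close_at < open_at:
        raise ValueError(f"no matched braces in {symbol!r}")
    inner = symbol[open_at + 1:close_at]
    # The construct symbol is the final entry in the braces, after the equal sign.
    genes_part, separator, construct = inner.rpartition("=")
    if separator:
        genes = genes_part.split()
        construct = construct.strip()
    else:
        genes = []  # short form: only the construct symbol (or nothing) is given
        construct = inner or None
    return {
        "ends": symbol[:open_at],
        "genes": genes,
        "construct": construct,
        "insertion": symbol[close_at + 1:] or None,
    }
```

Applied to P{GawB}h^1J3, the sketch reports ends P, construct GawB and insertion identifier h^1J3, with an empty gene list because the short form omits the included genes.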

This nomenclature is formally similar to that used for aberrations, where the ends{symbol} prefix is similar to the Df(n), Dp(n;m), etc., prefixes of aberrations, and the identifier suffix is similar to the gene^-allele suffix of aberrations with associated alleles, or the alphanumeric string suffix of other aberrations. Specific rules for assembling the components of a transgene construct genotype follow.

3.1. Transposon ends. Pairs of terminal repeats which together form a transposon are symbolized by opposing braces, {}. The source of the transposon ends is indicated outside the braces, at the left end of the string by a symbol derived from the name of the transposon family:

P	=	P-element
H	=	H-element (hobo)
I	=	I-element
M	=	mariner-element
Mi	=	Minos-element

3.1.1. Isolated terminal repeats are indicated with the family symbol followed by 3' or 5', e.g., P5' represents the isolated 5' end of a P{} transposon.

3.1.2. Multiple sets of matched transposon ends are indicated by nesting ends{} symbols, e.g., P{I{neo[RT]W[+]}}. A P transgene construct containing ry^+t7.2 and an isolated hobo terminal repeat from the 5' end of a hobo element would be described as P{ry^+t7.2 H5'}.

Formally, this system can be extended to any insertion of mobile DNA, for example, the copia, gypsy and FB elements. Thus, the ct^MR2 mutation, caused by the insertion of a gypsy element, is called gypsy{}ct^MR2. When a mobile element inserts into a mutant gene already carrying a mobile element, it is the new insertion that is named. For example, a jockey insertion into ct^MR2 generates ct^MRpD; this is called jockey{}ct^MRpD. The name describes the new insertion which has caused the new phenotype. A full genotype description, including all sets of transposable element ends, is only provided when the progenitor allele is also fully described.

FlyBase uses this nomenclature not only because of its rigor, but also because its more general use may be needed if such elements are engineered.

3.2. Included genes. A full transgene construct description lists within the braces all functional genes, including non-Drosophila genes such as antibiotic resistance genes, bacterial and phage origins of replication, and the FLP1 recombination target (FRT), separated by spaces. The left-right order of these elements reflects their 5' to 3' order (with respect to the transposon ends) within the construct. If the order of a gene is unknown, it is placed at one end of the list, followed or preceded by a comma.

3.2.1. Drosophila melanogaster genes. Valid gene symbols are used to name D. melanogaster genes. Wild-type alleles of intact genes are indicated by a superscripted '+t' followed by an identifier, e.g., ry^+t7.2 or Adh^+t3.2. A convenient identifier (used in these examples) is the size of the genomic fragment carrying the wild-type gene. Transgene-construct-borne genes that do not confer wild-type function are given unique allele designations without the preceding '+t', e.g., ftz^B or y^D225. Replacement of promoter or other control sequences can be indicated in the allele designation: dpp^hs.PP, e.g., for a dpp gene controlled by a heat shock promoter.

3.2.2. Species of origin. Species of origin is indicated for non-melanogaster Drosophila genes present in transgene constructs. A species code composed of the first letter of the genus (capitalized) and a three letter code, usually the first three letters of the species (lower case) is added to the gene symbol with a separating backslash, e.g., Dvir\Dfd^+t7.6 for the wild-type Deformed gene from Drosophila virilis (see paragraph 1.2.1).

For genes from species other than those of Drosophila the valid gene symbols are used following a four-letter symbol, as above, indicating the species of origin, e.g., Hsap, for humans, Gdom, for chicken, Hsim, for Herpes simplex, Ecol for E. coli etc. For viruses, the name or abbreviation, e.g., Abelson, Adeno5, Cmeg, or symbolic name, e.g., T4, M13, the Greek symbol lambda, is sometimes used instead of a genus-species-derived four-letter symbol. In all cases, these symbols are separated from the gene symbol by a backslash \. A file of these species abbreviations is available on FlyBase.

FlyBase considers transposable elements, the mitochondrial DNA and other similar entities to be species (this is because each can contain several different genes). It is for this reason that, for example, the P-element Transposase has the symbol P\T in constructs.

3.2.3. Fusion genes. Fusion genes are defined (by FlyBase) as the fusion of protein coding regions of distinct genes constructed by in vitro mutagenesis. They are named using the gene symbols of their component parts, separated by a double colon, e.g., Antp::Scr or Act88F::Scer\act1.

The order of gene symbols stated in the fusion gene will be alphabetical. The complexity of these constructs is such that were each to be named according to its molecular composition, for example in the 5' to 3' direction, the number of named fusion genes would rapidly become impractical.

An exception to the 'alphabetical order' rule will be made for cases where the fusion is between a D. melanogaster and a non-melanogaster gene. In such cases the melanogaster gene symbol will be stated first, e.g., tra2::Hsap\SFRS2.
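The ordering rule for fusion gene symbols, including the melanogaster-first exception, can be sketched in Python as below. The function name fusion_symbol is hypothetical. Two assumptions are made that the text does not state: a symbol is treated as non-melanogaster whenever it contains a backslash species prefix, and alphabetical comparison ignores case.

```python
def fusion_symbol(symbol_a, symbol_b):
    """Join two component gene symbols with '::' in the order described above."""

    def sort_key(symbol):
        # Assumption: a backslash marks a species prefix, hence a non-melanogaster gene.
        is_foreign = "\\" in symbol
        # False sorts before True, so melanogaster symbols come first;
        # ties are broken alphabetically, ignoring case (an assumption).
        return (is_foreign, symbol.lower())

    return "::".join(sorted([symbol_a, symbol_b], key=sort_key))
```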

For historic reasons, some promoter fusions involving reporter genes such as Ecol\lacZ, though technically protein fusions, are simply treated as alleles of Ecol\lacZ. The symbol for the additional gene(s) contributing to the fusion is indicated as part of a superscript, e.g., Ecol\lacZ^P\T.A92. In these special cases there is no distinction made between promoter fusions and protein fusions in the gene name.

3.2.4. Modified genes. Modified genes, cDNAs and in vitro mutagenized sequences are treated as alleles, and will be curated by FlyBase as such. They should be named, therefore, by the same conventions used to name classical alleles. The following allele symbols have been assigned by FlyBase to the commonly used modified genes of D. melanogaster:

w^+mC: The mini-white gene constructed by Pirrotta (1988) by deleting the HindIII-XbaI fragment from the long 5'-intron of the w⁺ gene. Carried by Casper plasmids and their derivatives.
w^+mW.hs: The mini-white gene constructed by Klemenz et al. (1987). Carried by the W6, W8 family of plasmids and their derivatives.

Genes modified by the addition of a tag allowing the product to be identified, marked or purified represent a special class of modified genes. Tags are used to mark a transcript, e.g., with a piece of M13 DNA allowing the transcript to be identified by in situ hybridization. Tags are also used to mark a protein, for purposes of purification (e.g., (His)₆), for purposes of identification (epitope tags) or for purposes of targeting to a cellular compartment (nls tags). FlyBase considers as tags constructs designed for these purposes and curates these modified genes as alleles of the tagged gene. Tagged genes have symbols with the format 'T:y' where T stands for Tag and y is the species\gene symbol of the tag, e.g., T:Hsap\Myc, T:Ivir\HA1, T:Hsap\p53, T:Zzzz\His6 (the Zzzz 'species' prefix is used when the tag is artificial).

A complete list of tagged gene symbols and their definitions is available from FlyBase through the Genes query form. Change the 'Species' option from the default 'Dmel' to 'All'. Type 'T:*' (don't use the quotation marks) in the 'Symbol/synonym (case insensitive)' field and submit the query.

3.3. Construct symbol. Every construct must be assigned a symbol which, in conjunction with the description of the terminal repeats, uniquely describes a transgene construct, for example, P{lacW}, H{PDelta2-3}. Symbols must be unique, but should be kept as short as possible.

3.3.1. Full genotype. In the full genotype of a transgene construct, the construct symbol is the final entry within the braces, separated from the final gene symbol by the equal sign, e.g., P{lacZ^P\T.W w^+mC ampR ori=lacW} is the full genotype of P{lacW}.

3.3.2. Short form and partial genotypes. Once defined, a transgene construct can be referred to by either the transgene symbol, e.g., P{lacW} (or, less formally, lacW), or the symbol plus insertion identifier (see below) in most contexts. Additional components can be added as needed for clarity. For example, in stock genotypes it is preferable to include the visible markers, as in P{w^+mC=lacW}th^j5C8 or P{w^+t11.7 ry^+t7.2=wA}3-1, to avoid misunderstandings about the expected phenotypes of the flies.

3.4. Insertion identifier. The right-most position of the transgene symbol, outside the outer-most bracket, is reserved for a string that identifies a specific insertion into the genome of the defined construct. There are four cases to consider for naming insertions.

3.4.1. Insertion hits a known gene. When a mutant phenotype associated with a transgene construct insertion is assigned to a known gene, the insertion-induced allele should be named by the normal rules. Since such insertions cause new alleles, the gene-^allele description is used as the identifier of the associated insertion (just as with other alleles identified as aberrations). For example, a P{lacW} insertion referred to as l(2)k05007 and then shown to be an allele of CycE becomes P{lacW}CycE^k05007. Insertion-induced alleles in stock genotypes should include the aberration name of the construct, i.e., P{lacW}CycE^k05007. In most other circumstances the insertion aberration prefix can be dropped and the mutation referred to in the usual way, in this case, CycE^k05007.

3.4.2. Insertion defines a new gene. Often insertions cause a phenotype that cannot be associated with any known gene. In that case the insertion defines the first allele of a new gene, which is named by the normal rules, e.g., P{lacW}Trf¹.

3.4.3. A mapped insertion with no phenotype. If an insertion has no phenotype but is mapped to the polytene chromosomes, then it is preferable to use the polytene chromosome subdivision to which it maps as its identifier, e.g., P{bw⁺L}60B. If a similar construct already has this name then that of the new one would be P{bw⁺L}60B-2 or similar.

If the insertion is not mapped then there is no alternative but to give the insertion an arbitrary number or code, e.g., P{A92}A45. This symbol must be unique and as simple as possible using only characters from the set:

a-z A-Z 0-9 -

4. Cytogenetic descriptions

Breakpoints should be according to the revised salivary gland chromosome maps published by C. B. and P. N. Bridges (see Lindsley and Zimm, 1992), except for chromosome 4, where the map of Sorsa (Chromosome maps of Drosophila Vol. II, CRC Press, 1988) should be used.

4.1. Range designations. For the location of a single object (breakpoint of aberration, gene position, site of transposon insertion, etc.) the range is given as "(d1)(S1)(b1)-(d2)(S2)(b2)", where:

d	=	numbered division (1 to 102)
S	=	lettered subdivision (A to F)
b	=	band number (1 to n, depending upon the particular subdivision)

For ranges not known to the accuracy of a band, see paragraph 4.5.

If the range encompasses two different numbered divisions (i.e., d1 does not equal d2), then the full designations for both the left end and the right end of the range will be used, e.g., 32A3-33A2.

If the range is within a single numbered division (i.e., d1=d2) but within different subdivisions (i.e., S1 does not equal S2), then the numbered division designation is not repeated to the right of the hyphen, e.g., 32A3-D4.

If the range is within both the same single numbered division and the same lettered subdivision (i.e., d1S1=d2S2), then neither the division nor the subdivision designation will be repeated, e.g., 32A3-5.

If a location is known to a single band, then the location will be given as (d1)(S1)(b1) with no hyphen and no repetition of the band location, e.g., 32A3.

If a location is known to a single doublet, then the location will be given as (d1)(S1)(b1)-(b1+1) where (b1) and (b1+1) represent the two succeeding bands of the doublet, e.g., 32A1-2.

If only one end of a location range is within a doublet, the location will simply refer to the band number maximizing the range, e.g., 32C1-D5 will be used, not 32C1,2-D5 and 32B4-C2 will be used, not 32B4-C1,2.
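The abbreviation rules for band ranges given above can be summarized in a short Python sketch. The function name format_band_range and the (division, subdivision, band) tuple representation are assumptions made for illustration. A doublet is passed as an ordinary two-band range, and the choice of which band of a doublet maximizes the range is left to the caller.

```python
def format_band_range(left, right):
    """Format a range between two (division, subdivision, band) positions."""
    d1, s1, b1 = left
    d2, s2, b2 = right
    if left == right:
        # Known to a single band: no hyphen, no repetition.
        return f"{d1}{s1}{b1}"
    if d1 != d2:
        # Different numbered divisions: both ends written in full.
        return f"{d1}{s1}{b1}-{d2}{s2}{b2}"
    if s1 != s2:
        # Same division, different subdivisions: division not repeated.
        return f"{d1}{s1}{b1}-{s2}{b2}"
    # Same division and subdivision: only the band number is repeated.
    return f"{d1}{s1}{b1}-{b2}"
```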

It is sometimes necessary to represent interbands in data curated by FlyBase. Interbands have the same symbol as the immediately preceding band, with the suffix symbol +. The interband between the Bridges' bands 3A4 and 3A5 is, therefore, represented as 3A4+.

4.2. Telomeres. Telomeres are designated by nAt, where n is a chromosome number, A is the chromosome arm, and t indicates the telomere:

1Lt	=	the telomere of the left arm of X
1Rt	=	the telomere of the right arm of X
YLt	=	the telomere of the long arm of Y
YSt	=	the telomere of the short arm of Y
2Lt	=	the telomere of the left arm of 2
2Rt	=	the telomere of the right arm of 2
3Lt	=	the telomere of the left arm of 3
3Rt	=	the telomere of the right arm of 3
4Lt	=	the telomere of the left arm of 4
4Rt	=	the telomere of the right arm of 4

If the telomere is of unknown origin, use:

undefined telomere

4.3. Centromeres and centric heterochromatin. Centromeres are designated as ncen, where n indicates the chromosome, i.e., 1cen, Ycen, 2cen, 3cen and 4cen.

4.3.1. Centric heterochromatic blocks will be indicated as hn, where n is a consecutive number.

4.4. Composite chromosome architecture. The designations of the chromosomes, including polytene band ranges, heterochromatic blocks and centromeres are:

YLt h1 -- h17 Ycen h18 -- h25 YSt
1Lt 1A1 -- 20F4 h26 -- h32 1cen h33 -- h34 1Rt
2Lt 21A1 -- 40F7 h35 -- h37 h38L 2cen h38R h39 -- h46 41A1 -- 60F5 2Rt
3Lt 61A1 -- 80F9 h47 -- h52 h53L 3cen h53R h54 -- h58 81F1 -- 100F5 3Rt
4Lt h59 -- h61 4cen 101F1 -- 102F8 4Rt

Note that the centromeres of chromosomes 2 and 3 lie within heterochromatic bands h38 and h53 respectively. Some heterochromatic bands (h25, h42) are divided into two (h25A, h25B, h42A, h42B) in some stocks.

4.5. Accuracy of cytological descriptions. In designating cytological position, the level of accuracy of the determination should be reflected in the specificity of the statement.

Some examples should make these distinctions clear. Note that the polytene subdivision described here, 77B, has 9 bands.

Case 1 - High level of uncertainty about subdivision location: If the observer thinks that the location of a rearrangement breakpoint might be in 77B but could also possibly be in 77A or 77C, then the position should be reported as 77A-C.
Case 2 - Low level of uncertainty about subdivision location: If the observer's best estimate is that the true breakpoint position is very likely to be in 77B, then the observer should report the position as 77B.
Case 3 - No uncertainty about subdivision location: If the observer is absolutely certain that the location is within 77B, then the location should be reported as 77B1-9.

5. Chromosome aberrations

Chromosome aberrations have names that consist of three parts: a prefix indicating the class of aberration; an indication, contained within parentheses, of the chromosome or chromosomes (or their arms) involved; and a specific designation that identifies the particular rearrangement.

5.1. General principles for naming aberrations.

5.1.1. Aberrations not named after a gene: The suffix (i.e., the component of the name following the parentheses) should include only letters and digits. There should be no superscripts or subscripts except for the particular cases of synthetic inversions with L and R superscripts (see 5.4.4). The suffix should not contain spaces. The characters ( and ) are only to be used to enclose the designation of a chromosome or chromosome arm.

5.1.2. Aberrations named after a gene but not associated with an allele: Here the association with the gene carries circumstantial information about the aberration's breakpoints. The suffix should comprise the gene symbol, followed by a hyphen if needed for clarity, followed by any alphanumeric of the investigator's choosing. There should be no superscripts.

5.1.3. If a gene whose symbol appears in an aberration changes its name, e.g., for reasons of newly-discovered allelism, then this name change is propagated to the aberration(s) in question. The old name will become a synonym.

5.1.4. Aberrations named for a specific associated allele: Here the suffix should be exactly the same as the allele designation, i.e. the gene symbol followed by the superscripted allele symbol. If the allele designation (either gene or allele part) changes, that change will be propagated to the aberration.

5.2. Translocations.

5.2.1. Translocations have the symbol T(n1;n2...)m, where n1, n2 ... indicate the numbers of the chromosomes involved in the translocation.

When chromosomes are listed within the parenthetical information of a translocation symbol they are listed in the order: 1, Y, 2, 3, 4. The numbers of the different chromosomes are separated by semicolons, with no spaces.
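
As a minimal sketch of this ordering rule, the following Python function assembles a translocation symbol from an unordered list of chromosomes. The function name and the placeholder designation "m" are assumptions for illustration only.

```python
# Order in which chromosomes are listed inside the parentheses (5.2.1).
CHROMOSOME_ORDER = ["1", "Y", "2", "3", "4"]


def translocation_symbol(chromosomes, designation):
    """Build T(n1;n2...)m with the chromosomes in the order 1, Y, 2, 3, 4."""
    ordered = sorted(chromosomes, key=CHROMOSOME_ORDER.index)
    # Semicolons separate the chromosomes, with no spaces.
    return "T(" + ";".join(ordered) + ")" + designation
```

Called with ["3", "1", "2"] and the designation OR9, it returns T(1;2;3)OR9.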

5.2.2. The separable components of translocations.

Previous conventions for naming such aneuploid segregants have been difficult to employ and do not contain sufficient information in the derivative name to permit automated recognition of the relationship between aneuploid segregant and euploid progenitor.

FlyBase will employ the following conventions for different classes of euploid chromosomal aberrations and their aneuploid derivatives.

5.2.2.1. Translocation segregants. Translocations, standardly named T(n1;n2)m, consist of two or more translocated chromosomes, each of which can potentially exist as an aneuploid segregant. Such segregants will be named using telomeres of the rearranged chromosomes as landmarks for specific segregants. Two-break translocations are often called reciprocal translocations if two chromosome segments have simply been exchanged.

The general form of the name of a segregant will be Ts(n1Pt;n2Qt)m. Ts stands for 'Translocation segregant', n1Pt and n2Qt for the designations of the landmark telomere(s) (e.g., 2Lt, 3Rt), and m is the same suffix as that of the progenitor translocation from which the segregant is derived.

Example 1: Two-break reciprocal translocation, with no ambiguity about the locations of either breakpoint relative to the centromere: T(2;3)rg35 (= T(2;3)27E-F;62C2-D1). The two aneuploid segregants are therefore named:
Example 2: Three-break reciprocal translocation, with no ambiguity about the locations of any breakpoint relative to the centromere: T(1;2;3)OR9 (= T(1;2;3)19-20;49F;81F). The three aneuploid segregants are accordingly named:

5.2.2.2. Complex segregants and recombinants. For many complex translocations or inversions with four or more breakpoints, multiple aneuploid segregants or recombinants can potentially occur. It is impossible to invent a naming scheme for these complex cases that would automatically reveal the specific aneuploid chromosome complement. In such instances, resulting aneuploids will be given appropriate names as follows:

The first duplication or deletion is assigned the unique suffix of the parental euploid rearrangement. The new order of the resulting chromosome must be reported.

Succeeding duplications or deletions are assigned other unique suffixes. Their new orders must also be reported.

5.3. Rings. Ring chromosomes have the symbol R(n)m, where n indicates the number of the chromosome and m is a specific designation.

5.4. Inversions.

5.4.1. Inversions have the symbol In(nA)m, where n indicates the number of the chromosome involved, A the arm or arms involved and m is a specific designator.

In the case of multiple-break intrachromosomal rearrangements, the distinction between inversions and transpositions often becomes ambiguous. An intrachromosomal rearrangement that can be partitioned into a duplicated and a deficient product by exchange with a normal-sequence chromosome is designated a transposition even though it may carry an inverted segment; otherwise, it is designated an inversion.

5.4.2. If it is not known whether an inversion is paracentric (does not include the centromere) or pericentric (includes the centromere), then the indicator of chromosome arm(s) is omitted, i.e., In(n)m.

5.4.3. By convention, In(1) implies In(1L).

5.4.4. Recombinant products between two inversions. Recombination between similar inversions may produce viable recombinant inversions with the left end of one and the right end of the other. Superscripts L and R are used to identify the sources of the two ends, for example, In(2L)Cy^Lt^R.

5.5. Transpositions. Among interchromosomal rearrangements, the term transposition is reserved for that class in which the telomeres of the chromosomes involved are coupled (that is to say, form the two ends of a single DNA molecule) as in wild-type. Rearrangements that alter the pairing of telomeres are classified as translocations.

5.5.1. Transpositions have the symbol Tp(n1;n2)m, where n1 is the 'donor' chromosome, n2 the 'recipient' chromosome and m a specific designation. For intrachromosomal transpositions n1 = n2.

5.5.2. Separable components of transpositions.

5.5.2.1. Interchromosomal transpositions. Segregants of interchromosomal transpositions will continue to be referred to as in the past. For a transposition with the name Tp(n1;n2)m, the chromosome segregant containing the duplicated material will be named Dp(n1;n2)m, and the chromosome containing the deleted material will be named Df(n1A)m, where A refers to the chromosome arm of the deletion.

Example: Tp(3;1)kar^5l (= Tp(3;1)87C7-D1;88E2-3;20). The two aneuploid segregants are: Dp(3;1)kar^5l (= 1Lt-20|87D1-88E2|20-1Rt) and Df(3R)kar^5l (= 3Lt-87C7|88E3-3Rt).
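
The naming of the two segregants of an interchromosomal transposition follows mechanically from the progenitor symbol. A small Python sketch, with a hypothetical function name and argument layout, might read:

```python
def transposition_segregants(donor, recipient, deleted_arm, designation):
    """Return the (Dp, Df) segregant symbols for Tp(donor;recipient)designation.

    deleted_arm is the arm (e.g. "R") of the donor chromosome that carries
    the deletion; it must be supplied because it is not part of the Tp symbol.
    """
    duplication = f"Dp({donor};{recipient}){designation}"
    deficiency = f"Df({donor}{deleted_arm}){designation}"
    return duplication, deficiency
```

With the arguments "3", "1", "R" and "kar^5l", this yields Dp(3;1)kar^5l and Df(3R)kar^5l.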

5.5.2.2. Intrachromosomal transpositions. Segregants here are produced by recombination with a structurally normal chromosome, not by chromosome segregation. For transpositions in which the transposed segment is in the uninverted orientation relative to the standard map, there may be two potential duplication and two potential deletion derivatives (one set resulting from recombination events in the region between the deficiency and duplication components of the transposition, and one set resulting from recombination events within the transposed segment). For transpositions of the type Tp(n1;n1)m, the reported duplication segregant will be named Dp(n1;n1)m and the new order must be reported to eliminate any ambiguity. Similarly, the reported deletion recombinant is referred to as Df(n1A)m, where A refers to the chromosome arm bearing the deletion. In rare cases in which the alternative duplication or deletion recombinant (generated by recombination within the transposed segment) is also reported, it will be given a different suffix from the progenitor transposition and the new order will be reported.

Example: Tp(3;3)Dl^II13 (= Tp(3;3)88F5-9;91A3-8;92A2). The primary aneuploid recombinants would then be: Dp(3;3)Dl^II13 (= 3Lt-92A2|88F9-91A3|92A2-3Rt) and Df(3R)Dl^II13 (= 3Lt-88F5|91A8-3Rt).

If, subsequently, the other deletion or duplication recombinant is generated, it will be given a novel suffix, perhaps completely unrelated to the progenitor, e.g.:

Df(3R)xxx (= 3Lt-91A3|92A2-3Rt)
Dp(3;3)xxx (= 3Lt-88F5|91A8-92A2|88F5-3Rt)

5.6. Deficiencies (deletions).

Deficiencies (deletions) have the symbol Df(nA)m, where n is the number of the deleted chromosome, A is the chromosome arm and m is a specific designator.

Intragenic deletions are not treated as deficiencies, but as alleles; at least two adjacent loci must be removed or disrupted before a lesion is considered a deletion.

5.7. Duplications.

Duplications have the symbol Dp(n1;n2)m, where n1 is the 'donor' chromosome, n2 the recipient and m a specific designator; n1 may equal n2.

Duplications may be tandem (in direct or inverted order), insertional or free. Direct and inverted tandem duplications are not distinguished by their symbols. Ambiguity must be avoided by explicit description of the new order (see section 6. The cytological description of aberrations).

5.7.1. When the duplicated sequences are carried as a free centric element, the letter f (free) follows the semicolon within the parentheses, replacing n2; e.g., Dp(1;f)101.

5.7.2. Higher order repeats. Higher-order repeats are also symbolized Dp, with the number of repeats indicated in the parenthetical chromosomal designation, i.e., Dp(1;1) = duplication, Dp(1;1;1) = triplication, and so forth.

5.8. Y derivatives. In the past many Y chromosome derivatives (e.g., marked-Y chromosomes) were named in a rather special way, as m1Ym2, where m1 is a marker (or markers) carried on YL and m2 a marker (or markers) carried on YS. Such chromosomes should be named as duplications, following the normal rules. Thus a y⁺Y is Dp(1;Y)y⁺ and Ymal⁺ is Dp(1;Y)mal⁺.

5.9. Autosynaptic elements. A pericentric inversion can be converted to two reciprocal autosynaptic elements by recombination between the inverted segment and a normal homolog. For a pericentric of the type In(nLR)m, the two autosynaptic products are LS(n)m and DS(n)m, where LS refers to the product carrying the two left (L = levo) telomeres and DS to that carrying the two right (D = dextro) telomeres. Chromosome elements of very similar structures to autosynaptic elements can be recovered by other means; by convention, these are also called autosynaptic elements if autosynaptic elements were used in their recovery.

5.9.1. In stocks, autosynaptic elements must be carried as balanced pairs; their symbols are then separated by a double slash thus, LS(n)m1//DS(n)m2. In the special case where the two members of such a balanced pair are reciprocal recombinant products (e.g., LS(n)m1//DS(n)m1) then such a genotype can be called AS(n)m1.

5.10. Compound chromosomes.

Compound chromosomes may be subdivided into two classes, homocompounds, consisting of two copies of the same chromosomal arm attached to a common centromere, and heterocompounds in which two arms from different chromosomes are connected through the centromere of one of them. They are designated by the symbol C followed parenthetically by the designation of the involved chromosome arm or arms.

In stock genotypes, the linkage relationship of markers on compound chromosomes is indicated with a colon, e.g., C(4)RM-P2, ci¹ ey^R: gvl¹ svⁿ.

5.10.1. Homocompounds. Homocompound chromosomes are classified according to relative orientation of their arms (i.e., tandem, reversed or ring) and the position of their centromeres (i.e., acrocentric or metacentric): reversed acrocentrics (C(n)RA), reversed metacentrics (C(n)RM), reversed rings (C(n)RR), tandem acrocentrics (C(n)TA), tandem metacentrics (C(n)TM), and tandem rings (C(n)TR), where n is the number of a chromosome or chromosome arm. In each case the symbol is followed by a specific designator, separated by a hyphen.
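
The six homocompound classes listed above combine one orientation letter with one form letter. As an illustrative sketch (the function and dictionary names are assumptions, not part of any FlyBase software), the symbol could be assembled as follows:

```python
# Letter codes taken from the six classes listed in 5.10.1.
ORIENTATION_CODE = {"reversed": "R", "tandem": "T"}
FORM_CODE = {"acrocentric": "A", "metacentric": "M", "ring": "R"}


def homocompound_symbol(chromosome, orientation, form, designator):
    """Build C(n)XX-designator for a homocompound chromosome."""
    code = ORIENTATION_CODE[orientation] + FORM_CODE[form]
    # The specific designator is separated from the class code by a hyphen.
    return f"C({chromosome}){code}-{designator}"
```

For example, a reversed metacentric compound of chromosome 4 with designator P2 comes out as C(4)RM-P2.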

5.10.1.1. When the component arms differ in sequence by something other than whole-arm inversion, the tandem or reversed classification becomes ambiguous. Furthermore, when the component arms are separable from each other by a single break, the terms acrocentric and metacentric are descriptive; however, when elements of the two arms become interspersed (as for example by interarm rearrangements), these terms lose meaning. Consequently, the more-complex compounds are given arbitrary symbols.

5.11. Heterocompounds. Heterocompound chromosomes have the symbol C followed by the chromosome or arms involved within parentheses, e.g., C(1;Y), C(2L;3R). The chromosomal origin of the centromere in such compounds is frequently ambiguous. It is usually necessary to describe the structure of any given heterocompound in some more detail, by its new order. The distinction between some heterocompound chromosomes and whole-arm translocations can be moot.

5.12. Free chromosome arms.

The term 'free' is used with respect to the left and right arms of the major autosomes, and to the long and short arms of the Y chromosome, when an arm exists as an individual chromosome element. The symbol for a free arm is: F(nA)m, where n = Y, 2 or 3, A = L, R, or S and m is a symbol (note that L indicates Left for the X chromosome and autosomes, but Long for the Y chromosome). In practice, all free arms carry some chromosome material from another chromosome arm or element.

5.13. Complex rearrangements.

Occasionally an author must report an aberration whose cytology is either ambiguous or cannot (with existing knowledge) be described within one of the usual classes of aberration. These aberrations should be named according to the format Ab(N1;N2;..)identifier or, when associated with a named allele, Ab(N)gene^allele. Ab stands for Aberration, N represents the chromosome(s) or chromosome arm(s) that are known to be involved. If one or more of these cannot be identified then a ? symbol is used. If one break is heterochromatic but no further identification is possible then h is used. Examples are: Ab(3R)faf^BX9 and Ab(3L;h)ME178.

The Ab prefix is available only as a last resort and should not be used without very good reason. If further information becomes available that allows a more formal description of a complex aberration, then the Ab symbol should be replaced and relegated to synonymy.

5.14. Combinations of rearrangements.

The elementary categories of chromosome aberrations are not mutually exclusive, and some aberrations combine several of them. In such cases the symbol used should be the one most relevant to the anticipated value of the aberration, such as Df for a deficient translocation that was generated in a screen for deficiencies. When no preference exists, the symbol used is the one that stands highest in the following ranking: T > interchromosomal Tp > R > In > intrachromosomal Tp > Dp > Df. This is especially so when the components are inseparable.
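
When no preference exists, choosing the symbol is a simple lookup in the ranking given above. The following Python sketch assumes the component classes are supplied as the same strings used in that ranking; the function name is hypothetical.

```python
# Ranking from 5.14, highest first.
RANKING = [
    "T",
    "interchromosomal Tp",
    "R",
    "In",
    "intrachromosomal Tp",
    "Dp",
    "Df",
]


def preferred_class(component_classes):
    """Return the component class that stands highest in the ranking."""
    return min(component_classes, key=RANKING.index)
```

This covers only the default case; where the anticipated value of the aberration dictates another symbol (such as Df for a deficient translocation recovered in a deficiency screen), that choice takes priority.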

FlyBase uses the following verbal definitions for classes of three-break aberrations:

Deficient translocation: A translocation in which one of the four broken ends loses a segment before re-joining, e.g., T(1;3)ct^268-21.
Deficient inversion: Three breaks in the same chromosome; one central region lost, the other inverted, e.g., In(1)N^264-108.
Inversion-cum-translocation: The first two breaks are in the same chromosome, and the region between them is rejoined in inverted order to the other side of the first break, such that both sides of break one are present on the same chromosome. The remaining free ends are joined as a translocation with those resulting from the third break, e.g., T(1;2)C324.
Bipartite duplication: The (large) region between the first two breaks listed is lost, and the two flanking segments (one of them centric) are joined as a translocation to the free ends resulting from the third break, e.g., Dp(1;2)K1.
Cyclic translocation: Three breaks in three different chromosomes. The centric segment resulting from the first break listed is joined to the acentric segment resulting from the second, rather than the third, e.g., T(1;2;3)OR14.
Bipartite inversion: Three breaks in the same chromosome; both central segments are inverted in place (i.e., they are not transposed), e.g., In(3LR)BTD7.
Uninverted insertional duplication: A copy of the segment between the first two breaks listed is inserted at the third break; the insertion is in cytologically the same orientation as its flanking segments, e.g., Dp(1;1)hdp-b2.
Uninverted insertional transposition: The segment between the first two breaks listed is removed and inserted at the third break; the insertion is in cytologically the same orientation as its flanking segments, e.g., Tp(1;1)B^263-48.
Inverted insertional duplication: A copy of the segment between the first two breaks listed is inserted at the third break; the insertion is in cytologically inverted orientation with respect to its flanking segments, e.g., Dp(1;1)y^bl.
Inverted insertional transposition: The segment between the first two breaks listed is removed and inserted at the third break; the insertion is in cytologically inverted orientation with respect to its flanking segments, e.g., In(2R)C72.
Unoriented insertional duplication: A copy of the segment between the first two breaks listed is inserted at the third break; the orientation of the insertion with respect to its flanking segments is not recorded, e.g., Dp(1;1)hdp-b4.
Unoriented insertional transposition: The segment between the first two breaks listed is removed and inserted at the third break; the orientation of the insertion with respect to its flanking segments is not recorded, e.g., Tp(1;2)v⁺75d.

5.14.1. A complicated rearrangement may be separable genetically into its simpler component aberrations, which are usually sufficiently designated with the distinguishing symbol of the original aberration. When, however, the original is named after a phenotype associated with one of the component aberrations, designation of the other component with the symbol of the mutant is inappropriate.

5.14.2. A rearrangement superimposed upon another rearrangement may be given a name, which more often than not refers to the entire complex since the newly induced aberration is likely to be inseparable from the original; e.g., In(2LR)SM1 is a large pericentric inversion superimposed upon In(2L)Cy In(2R)Cy.

5.15. Balancers

Balancers can be described in one of three ways: by a complete genotype, by a short genotype or by a single symbol. For FlyBase purposes a single symbol is needed for every balancer variant. If a symbol is not reported for a new balancer variant FlyBase will assign one.

Balancer symbols should be concise, contain no spaces and should contain characters from the following set:

a-z A-Z 0-9 : - ( ) {}

Marked variants of classical balancers should be named beginning with the symbol of the parental variant followed by a hyphen followed by a concise distinguishing string, e.g., TM3-DZ.

Where new balancer variants are reported in the literature the authors' symbol for the variant, if provided, is used by FlyBase. Commas used by authors in publications may be transmuted into hyphens by FlyBase for purposes of making use of a genotype-like string that almost qualifies as a symbol. Likewise, when authors use [] to denote limits of an element insertion, these are transmuted into {} by FlyBase, to maintain consistency with other sections of the database. The use of invalid gene symbols and complete transposable element construct/insertion symbols in balancer symbols is discouraged.
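
The character set and the two substitutions described above lend themselves to a simple check. The sketch below is an assumption about how such a check might look, not a description of FlyBase's own curation software, and the input string in the usage note is invented for illustration.

```python
import re

# Permitted characters for balancer symbols: a-z A-Z 0-9 : - ( ) { }
ALLOWED = re.compile(r"[A-Za-z0-9:(){}-]+")


def normalize_balancer_symbol(text):
    """Turn commas into hyphens and [] into {}, as described in 5.15."""
    return text.replace(",", "-").replace("[", "{").replace("]", "}")


def is_valid_balancer_symbol(symbol):
    """True if the symbol uses only permitted characters (so no spaces)."""
    return ALLOWED.fullmatch(symbol) is not None
```

A hypothetical author string such as TM3,P[x] would be normalized to TM3-P{x}, which then passes the character check.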

As an alternative to the concise balancer symbol, balancers may be reported using balancer short genotypes, which combine the symbol of a classical balancer with new allele, aberration or transgene insertion symbols to define a unique balancer variant, e.g., TM3, ry^RK Sb¹ (= TM3-vKa).

Balancers may, of course, also be reported using a full balancer genotype that lists all aberration, allele and insertion symbols that comprise the unique balancer variant.

Any variant reported in the literature or donated to a stock center but not given a symbol by the authors is given the symbol 'parental_variant-vIa' and the name 'parental_variant-variant a of Initial (of first author last name)' by FlyBase, e.g., TM3-vKa for TM3-variant a of Karess.
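
The pattern for FlyBase-assigned symbols can be illustrated as below. The function names are hypothetical, and two points are assumptions drawn from the TM3-vKa example rather than stated rules: that the name spells out the author's last name in full, and that the variant letter is passed in by the caller.

```python
def assigned_variant_symbol(parent, first_author, letter="a"):
    """Symbol of the form parent-v<Initial><letter>, e.g. TM3-vKa."""
    return f"{parent}-v{first_author[0].upper()}{letter}"


def assigned_variant_name(parent, first_author, letter="a"):
    """Name of the form 'parent-variant <letter> of <Author>'."""
    return f"{parent}-variant {letter} of {first_author}"
```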

6. The cytological description of aberrations

For all but the simplest two-break chromosome aberrations the explicit description of the new chromosome order is essential (see paragraph 6.1).

In descriptions of aberrations the cytological breakpoints of the aberration are listed after the symbol, the different items of chromosomal information being separated by semicolons without spaces. Cytological descriptions of new orders are always in roman type.

6.1. New order. The following conventions for specifying sequences of aberrations are to be adopted. The sequence of each chromosome involved in an aberration is specified from one end to the other according to salivary gland chromosome band terminology. Points of breakage and reunion are indicated by vertical bars, and segments between these points are designated by the most extreme band known to be present at each end, separated by a dash. Thus, the new order of

Tp(2;3)P (= Tp(2;3)58E3-F2;60D12-E2;96B5-C1) is represented as: 2Lt-58E3|60E2-2Rt; 3Lt-96B5|60D12-58F2|96C1-3Rt.
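
To make the structure of a new-order description concrete, here is a minimal Python sketch that splits one into chromosomes and segments. It is an illustration under simplifying assumptions: it handles only the plain form shown above (semicolons between chromosomes, vertical bars at breakpoints, a dash between the two ends of a segment) and does not handle parentheses, centromere symbols or rings. The function name is hypothetical.

```python
def parse_new_order(text):
    """Split a new-order string into chromosomes, each a list of segments.

    Each segment is returned as a (first_end, second_end) tuple of labels,
    in the order in which they are written.
    """
    chromosomes = []
    for element in text.split(";"):
        segments = []
        for segment in element.strip().split("|"):
            first_end, second_end = segment.split("-")
            segments.append((first_end, second_end))
        chromosomes.append(segments)
    return chromosomes
```

Applied to the new order of Tp(2;3)P, the second chromosome comes back as three segments, the middle one being ("60D12", "58F2"), i.e. the inserted segment written in inverted orientation.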

6.2. Ambiguities. Were the order of the inserted segment 60D12-58F2 not known, the segment would have been included within parentheses; i.e.,
3Lt-96B5|(58F2-60D12)|96C1-3Rt.

Hierarchies of ambiguities are represented by parentheses within parentheses.

6.3. Complex rearrangements. Breaks rejoin cyclically to produce chromosome aberrations (e.g., A with B and B with A) and multiple breaks may rejoin in one or more cycles. Thus four breaks may interact to form one four-break rearrangement or two two-break rearrangements. A complex rearrangement consisting of two or more simple cyclic rearrangements is indicated in the descriptive symbol; e.g.

T(1;2)OR72 (= T(1;2)19E;29F + In(2LR)24F;54B)
or
T(1;2)C314 (= T(1;2)5D;40-41 + T(1;2)9D;51D + T(1;2)20;56F)

New symbols are required if any of these components (or any new combination of these components) were to be derived separately.

6.4. Order of description. Information on new order is written as follows: each chromosomal element starts at the free end with the lower value and the elements are listed in ascending order, Y falling between 20 and 21.
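
One way to express the rule that Y falls between 20 and 21 is as a sort key. In this sketch each chromosomal element is represented, purely for illustration, by the numbered division at its lower free end, or by the string "Y" for a Y element; the value 20.5 is an arbitrary device to place Y after division 20 and before division 21.

```python
def element_sort_key(start):
    """Sort key for a chromosomal element identified by its lower free end."""
    if start == "Y":
        return 20.5  # arbitrary value between divisions 20 and 21
    return float(start)


def order_elements(starts):
    """Return the elements in the ascending order used for new orders."""
    return sorted(starts, key=element_sort_key)
```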

6.5. Rings. Rings are differentiated from rod-shaped chromosomes by vertical bars at the beginning and end of the element; the circle is broken for linear designation at the breakpoint with the lowest numerical value; e.g., |1A4-20 1cen 20F-20A1| for R(1)2.

6.6. New orders of Y derivatives. The constitution of a Y fragment may be designated by listing its genetic elements in order with any ambiguities in order enclosed within parentheses, e.g., KL(bw⁺--ba⁺) Ycen bb⁺ KS. When there is a hierarchy of ambiguities in order, a hierarchy of parentheses is used, as in ((ci⁺--spa⁺)KL) Ycen bb⁺KS.

7. Naming genotypes

7.1. Gene separators. In designations of genotypes with several mutant genes, allele symbols of genes on the same chromosome are separated by spaces (e.g., y¹ w¹ f¹ B¹).

7.2. Homologue separators. Allele symbols of genes on homologous chromosomes are separated by a slash bar (e.g., y¹ w¹ f¹/B¹). The X and Y chromosomes are considered to be homologues for this purpose and the different genotypes of males and females are not usually made explicit. For example, Dp(1;Ybb^-)B^S/ y¹ car¹ describes a stock in which females are homozygous for the y¹ car¹ X chromosome, and males are hemizygous for y¹ car¹ and the B^S-marked Y chromosome. If desired, multiple genotypes in a stock can be fully described, using an ampersand (&) to separate the genotypes, e.g., y¹ car¹ & Dp(1;Ybb^-)B^S/ y¹ car¹.

It is convention to list allele symbols only once for a genotype that is homozygous for all of the mutations on a particular chromosome, i.e., y¹ w¹ f¹ implies y¹ w¹ f¹/y¹ w¹ f¹. If, however, any one of these mutations were to be heterozygous, then the mutant genotypes of each chromosome would be given, i.e., y¹ w¹ f¹/y¹ f¹.

It is convention to write genotypes with the maternally contributed chromosomes preceding those paternally contributed. For example, in the cross of cn¹/cn¹ females to cn⁺/cn⁺ males, the progeny genotype would be written cn¹/cn⁺; from the reciprocal cross it would be written cn⁺/cn¹.

7.3. Nonhomologue separators. Allele symbols of genes on nonhomologous chromosomes are separated by semicolons and spaces (e.g., bw¹; e^s; ey¹).
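
The three separators described in 7.1 to 7.3 nest in a fixed way, which the following Python sketch makes explicit. It is a simplification under stated assumptions: allele symbols are written in the ASCII form with square brackets, nonhomologous chromosomes are split on a semicolon followed by a space (so that semicolons inside aberration symbols such as T(2;3) are left alone), and chromosome descriptions containing commas or colons are not handled. The function name is hypothetical.

```python
def parse_genotype(genotype):
    """Parse a genotype into chromosomes -> homologues -> allele symbols."""
    parsed = []
    for chromosome in genotype.split("; "):
        # A slash separates homologues; spaces separate genes on one homologue.
        homologues = [homologue.split() for homologue in chromosome.split("/")]
        if len(homologues) == 1:
            # Alleles listed once imply a homozygous chromosome.
            homologues = homologues * 2
        parsed.append(homologues)
    return parsed
```

For example, bw[1]; e[s]; ey[1] parses into three chromosome pairs, each homozygous for a single allele.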

7.4. Chromosome descriptions.

7.4.1. In describing a chromosome, inclusion of several types of information is often desirable; e.g., arrangement and mutant allele content. Such categories are separated by a comma followed by a space; e.g., In(1)FM7, y^31d w^a v^Of B¹, which designates an X chromosome carrying the FM7 inversion, the recessive alleles yellow-31d, white-apricot and vermilion-of-Offermann, and the dominant allele Bar-1. Alleles are listed in the order of the standard genetic map irrespective of their order on the chromosome in question.

7.4.2. Description of the gene content of autosynaptic elements requires particular rules. Mutations mapping distal to the breakpoint are indicated after a comma that follows the name of the element itself; mutations mapping proximal to the breakpoint (i.e. within the heterosynaptic region and necessarily hemizygous) are indicated after a second comma; e.g., LS(2)m, b¹, cn¹ would be homozygous for b¹ but hemizygous for cn¹. If the status of a particular mutation is unknown, then its symbol is enclosed within ().

7.4.3. Mutant alleles on the different chromosomal components of translocations or interchromosomal transpositions are separated by a colon. The translocated chromosomes are separated from their homologues by a slash. For example: T(2;3)CyO-TM2, Cy¹ l(2)DTS513¹: Ubx¹³⁰/S¹.

In contrast with past practice the + character is not to be used to indicate the presence of more than one separable aberration on the same chromosome, i.e., In(2L)Cy In(2R)Cy is used, rather than either In(2L+2R)Cy or In(2L)Cy + In(2R)Cy.

7.5. Cross descriptions. It is a convention that when genetic crosses are described the female genotype is written to the left of the times symbol (x), and the male genotype to the right.

7.6. Uncertainty. Uncertainty about specific alleles, genes and aberrations is indicated in genotypes with an asterisk, e.g., w^* for a mutant allele of w when the specific allele is unknown, l(2)* for a lethal allele on the second chromosome when the gene is unknown, and C(1)* for a compound X chromosome when the nature of the attachment is unknown.

7.7. Nicknames. In relatively few cases, FlyBase will support an alternative symbol for a genotype component, a nickname. Nicknames are supported when a simplified symbol is already in use by Drosophila workers and is more widely understood than the rigorous valid symbol. For example, Dp(2;2)Cam11 is a valid nickname for In(2LR)TE35B-226^LTE35B-4^R and w^67c23 is a valid nickname for Df(1)w67c23. Implementation of nicknames within FlyBase is still in progress and the distinction between nicknames and synonyms may not be evident in FlyBase reports.

8. Cytotype

It may be necessary to indicate the cytotype of a stock with respect to one or more systems of hybrid dysgenesis. We suggest that this is done by appending the indication of cytotype to the end of the stock description as a single letter code enclosed within <>. This symbol should be separated from the last component of the genotype by a comma, e.g., y¹ w¹ f¹, <P> would indicate a P-cytotype stock with these three markers. If more than one cytotype needs to be designated then these should be separated by a semi-colon, e.g., <P;I>.
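
A stock description written this way can be split back into genotype and cytotype codes. The Python sketch below is illustrative only; the function name is an assumption, and the genotype is written in the ASCII superscript form.

```python
import re

# A trailing ", <...>" at the very end of the stock description.
CYTOTYPE_SUFFIX = re.compile(r",\s*<([^<>]+)>\s*$")


def split_cytotype(stock):
    """Return (genotype, list of cytotype codes) for a stock description."""
    match = CYTOTYPE_SUFFIX.search(stock)
    if match is None:
        return stock, []
    # Multiple cytotypes are separated by semicolons inside the <>.
    return stock[:match.start()], match.group(1).split(";")
```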

9. Valid Symbols & Synonyms

9.1. Precedence. Cases arise in which the same gene, aberration or allele has received two or more names; other things being equal, the earlier-applied name is adopted.

The criterion for 'earlier' is date of publication (or, if this is in doubt, of acceptance of a paper by a journal). Publication in Drosophila Information Service, in FlyBase or, e.g., the abstracts of a Drosophila Research Conference, qualifies. Publication of a name or symbol in a Nucleotide Sequence Data Library (EMBL, DDBJ, GenBank) accession also qualifies.

9.2. Exceptions.

9.2.1. Exceptions to this rule are made in the case of lethals named after their cytogenetic location or lower-order conventions, e.g., lethals given arbitrary names. When, on further study, such a lethal is found to possess a characteristic phenotype that suggests an alternative descriptive name, it is valid to rename the gene.

9.2.2. Exceptions will also be made on a case-by-case basis by FlyBase where two genes, previously considered to be different, are found to be identical and where the younger name (which would normally be relegated to a synonym) has been much more used in the literature than the older.

9.3. Merges. When a name x is found to be synonymous with the name y (and y is the valid name by these criteria) then an allele z of x (i.e., x^z) will be renamed y^x-z, except when z is 1 and is the only mutant allele, in which case x¹ changes to y^x.
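
The renaming rule for merges can be written out as a short function. This is a sketch with hypothetical names; the result uses the ASCII convention of square brackets for the superscripted allele designation, and whether an allele is the only mutant allele is supplied by the caller because it cannot be read from the symbol itself.

```python
def merged_allele_symbol(old_gene, allele, valid_gene, only_mutant_allele=False):
    """Rename allele `allele` of `old_gene` after a merge into `valid_gene`."""
    if allele == "1" and only_mutant_allele:
        # x[1], when it is the only mutant allele, becomes y[x].
        return f"{valid_gene}[{old_gene}]"
    # Otherwise x[z] becomes y[x-z].
    return f"{valid_gene}[{old_gene}-{allele}]"
```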

10. Representation of gene, allele and aberration names and symbols in text

10.1. Italic. Gene, allele, aberration and transposon/transgene-construct names and symbols are italicized in printed text.

10.2. Non-italic. When a full gene name or gene symbol is used to indicate phenotype, rather than genotype, then that name or symbol is printed in roman (non-italic) type; i.e., white in italic type indicates a genotype and white in roman type a phenotype.

10.3. Superscripts and subscripts. In ASCII text the characters [ and ] are used to enclose superscripted characters, and [[ and ]] are used to enclose subscripts.
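
A short Python sketch shows how the ASCII bracket notation might be converted for display. The choice of HTML sup and sub tags as the output format is an assumption made for illustration, as are the function name and the placeholder symbol x; nested brackets are not handled.

```python
import re


def ascii_to_html(symbol):
    """Convert [..] to <sup>..</sup> and [[..]] to <sub>..</sub>.

    Hypothetical helper for illustration; not a FlyBase function.
    """
    # Handle subscripts first so that [[..]] is not read as a superscript.
    html = re.sub(r"\[\[(.+?)\]\]", r"<sub>\1</sub>", symbol)
    html = re.sub(r"\[(.+?)\]", r"<sup>\1</sup>", html)
    return html


print(ascii_to_html("w[1]"))       # w<sup>1</sup>
print(ascii_to_html("x[[2]]"))     # x<sub>2</sub>
print(ascii_to_html("x[a][[2]]"))  # x<sup>a</sup><sub>2</sub>
```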

10.4. Cytogenetic terms. Cytogenetic designations are not italicized except when part of an aberration symbol.

10.5. Reserved characters. The following characters are reserved for special use in gene, allele, and aberration names and symbols or in genotypes:

\	reserved for use in symbols of genes from species other than D. melanogaster
/	reserved for use as a homologue separator in stock genotypes
{ }	reserved for use in transposon and transgene construct symbols
< >	reserved for use in transgene construct names and for cytotype designation in stocks
[]	reserved for indicating superscripts in ASCII text
[[ ]]	reserved for indicating subscripts in ASCII text
( )	reserved for use in compound gene names and symbols (e.g., l(1)) and for aberration symbols, and for the indication of ambiguous genotypes
;	reserved as a separator of chromosome (chromosome arm) numbers in aberration names and symbols, and to separate markers or aberrations on non-homologous chromosomes in stock genotypes
:	reserved for use in symbols of defined classes, i.e., transgene constructs, genes encoding special RNAs (tRNAs, snRNAs), fusion genes and mitochondrial genes, and, in stock genotypes, to indicate the association between markers on reciprocal components of translocations, or arms of compound chromosomes.
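
When checking a proposed symbol, it can help to list which of the reserved characters it contains. The Python sketch below only reports them; it does not judge whether each use is legitimate, since that depends on the rules in the table above. The function and constant names are hypothetical.

```python
# The reserved characters listed in 10.5, as single characters.
RESERVED_CHARACTERS = set("\\/{}<>[]();:")


def reserved_characters_in(text):
    """Return the reserved characters present in text, in sorted order.

    Hypothetical helper for illustration; not a FlyBase function.
    """
    return sorted(set(text) & RESERVED_CHARACTERS)


print(reserved_characters_in("l(1)"))   # ['(', ')']
print(reserved_characters_in("<P;I>"))  # [';', '<', '>']
print(reserved_characters_in("white"))  # []
```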

11. Gene products

11.1. Proteins. Generic protein products that are named for the gene may be symbolically designated in text by the gene symbol, but this symbol is all in roman capital letters. When the full gene name is used for the protein, rather than the gene symbol, only the first letter of the name is capitalized. When the gene name or symbol is used as an adjective modifying 'protein', the rules for gene names and symbols apply. For example, the protein product(s) of the hedgehog gene could be correctly denoted as hedgehog protein, hh protein, Hedgehog, or HH.
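
The four designations in the hedgehog example can be derived from the gene name and symbol, as in the following Python sketch. The function name is hypothetical, and plain strings cannot carry the italic/roman distinction, which still has to be applied when the text is typeset.

```python
def protein_designations(gene_name, gene_symbol):
    """List the designations permitted by 11.1 for a gene-named protein.

    Hypothetical helper for illustration; not a FlyBase function.
    """
    return [
        # Name or symbol used as an adjective: gene rules apply unchanged.
        f"{gene_name} protein",
        f"{gene_symbol} protein",
        # Full name standing for the protein: capitalize the first letter.
        gene_name[:1].upper() + gene_name[1:],
        # Symbol standing for the protein: all capital letters.
        gene_symbol.upper(),
    ]


print(protein_designations("hedgehog", "hh"))
# ['hedgehog protein', 'hh protein', 'Hedgehog', 'HH']
```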

There are no fixed rules for denotation of proteins not named for the gene. Abbreviations are sometimes fully capitalized, such as XDH for xanthine dehydrogenase, the product of the rosy gene, and are sometimes in mixed case, such as AChE for acetylcholine esterase, the product of the Ace gene.

11.2. RNAs. There is no convention for symbolically designating generic RNA products of genes in text.

Appendix A. The naming of genes encoding ribosomal proteins.

The naming of genes encoding ribosomal proteins in Drosophila has been somewhat chaotic in the past. The opportunity now exists to rationalize it, given the complete catalog of yeast ribosomal protein genes now available and a comprehensive review of eukaryotic ribosomal proteins (Wool et al. 1995). Although a universal system for naming ribosomal proteins in eukaryotes is yet to be agreed, there is general agreement on the principles. For this reason Andrew Lambertsson and FlyBase have worked together to establish a rational naming scheme for Drosophila genes encoding ribosomal proteins. The principles we have adopted are these.

A: Proteins of cytoplasmic ribosomes.

1. As far as possible, genes encoding ribosomal proteins are named after the protein encoded, by homology with the ribosomal proteins of mammals, according to the system of Wool et al. (1995). The generic symbols to be used are RpS for genes encoding ribosomal proteins of the small subunit and RpL for those of the large. Thus the gene encoding the ribosomal protein S3 has the gene symbol RpS3 and the gene name Ribosomal protein S3.

2. For historical reasons some mammalian ribosomal proteins have very similar names, distinguished only by the suffix 'a' or 'A', for example, S3 and S3A (S3a), which are quite different proteins. For Drosophila gene symbols and gene names the suffix is a capital 'A'.

Some ribosomal proteins are encoded by duplicate genes in Drosophila. These genes are distinguished by a lower-case suffix, a, b, etc. Thus the two genes encoding the protein S14 are RpS14a and RpS14b.

3. Genes previously named as Minutes and now known to encode a ribosomal protein are renamed according to the ribosomal protein encoded. For example M(3)95A, encoding the S3 ribosomal protein, is now called RpS3. FlyBase preserves M(3)95A as a synonym.

4. Some genes named after specific mutant phenotypes encode ribosomal proteins. These genes are not renamed, but the corresponding 'formal' name is added as a synonym. For example, RpS2 is added as a synonym for sop (string of pearls). Genes named after gene products (Ape, Apurinic endonuclease, for example) are renamed according to the ribosomal protein encoded.

5. Some genes in Drosophila had been named as Ribosomal proteins but using non-standard numbering. A classic example is Rp49, which encodes the L32 ribosomal protein. These genes have been renamed.

6. There remain a few genes said to encode ribosomal proteins but whose identity is uncertain due to lack of available data. These have not been renamed, but will be if information becomes available. Known examples are Rp7/8, Rp21 and Rp34.

B: Proteins of mitochondrial ribosomes.

1. Genes encoding a protein of the mitochondrial ribosomes have the symbol prefix Rpm, followed by the 'name' of the protein. The gene names are of the form Ribosomal protein mitochondrial *, where * indicates the 'name' of the protein. Thus RpmL3, Ribosomal protein mitochondrial L3 would be the symbol and name for the gene encoding the mitochondrial L3 ribosomal protein. Exceptions are genes named after mutant phenotypes, e.g., tko, encoding the mitochondrial S12 protein. By the rules for cytoplasmic ribosomal proteins this does not need to be renamed, since tko refers to a mutant phenotype.
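
The symbol and name patterns in parts A and B of this appendix can be summarized in a short Python sketch. The function name and its parameters are hypothetical. The appendix gives only the symbols (RpS14a, RpS14b) for duplicate genes, so carrying the lower-case suffix into the full name is an assumption.

```python
def ribosomal_gene(protein, mitochondrial=False, copy=""):
    """Build (symbol, name) for a gene encoding a ribosomal protein.

    protein: the protein 'name', e.g. 'S3', 'L32' or 'S3a'.
    copy:    'a', 'b', ... for duplicate genes; empty otherwise.
    Hypothetical helper for illustration; not a FlyBase function.
    """
    # The mammalian suffix a/A is always written as a capital A.
    protein = protein.upper()
    # Duplicate genes are distinguished by a lower-case suffix.
    copy = copy.lower()
    symbol_prefix = "Rpm" if mitochondrial else "Rp"
    name_prefix = "Ribosomal protein"
    if mitochondrial:
        name_prefix += " mitochondrial"
    symbol = f"{symbol_prefix}{protein}{copy}"
    # Assumption: the copy suffix is also carried into the full name.
    name = f"{name_prefix} {protein}{copy}"
    return symbol, name


print(ribosomal_gene("S3"))                      # ('RpS3', 'Ribosomal protein S3')
print(ribosomal_gene("S3a")[0])                  # RpS3A
print(ribosomal_gene("S14", copy="b")[0])        # RpS14b
print(ribosomal_gene("L3", mitochondrial=True))  # ('RpmL3', 'Ribosomal protein mitochondrial L3')
```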

Updates.

Substantive changes made to this document since its presentation at the Atlanta Drosophila meeting in April, 1995, are noted here.

Version 2.01, April 25, 1995: The rules for naming fusion genes (para 3.2.3) have been changed.

Version 2.02, May 13, 1995: A new paragraph (7.7) on the naming of ambiguous genotypes has been added.

Version 2.06, November 22, 1995: Corrections have been made to the examples of names of transposons to conform with current FlyBase practice. The list of 'honorary genes' has been updated.

Version 3.0, March 18, 1996: The symbol for complex aberrations has been changed from complex to Ab. The placement of the <> symbols indicating orientation of an FRT has been changed to conform with current usage. The colon is introduced as a separator of markers on reciprocal components of translocations and arms of compound chromosomes as a way of clarifying the relationships and expected behaviors of these elements in stocks. The list of 'honorary genes' has been updated. A table of contents has been added. Assorted small changes have been made in the document.

Version 3.01, March 29, 1996: The rules for naming genes identified by sequencing projects have been changed, and new FlyBase mirror sites have been added.

Version 3.02, August 7, 1996, corrects the explanation of < and > used to indicate FRT orientation. The list of 'honorary genes' has been removed.

Version 3.03, August 21, 1996, clarifies what constitutes a transposon symbol.

Version 4.0, February 19, 1997, includes naming of in vitro mutagenesis constructs (Section 2.2.) and balancers (Section 5.14.).

Version 4.1, June 3, 1997, includes modification of rules for naming multiple transposon insertions (Section 3.1.2.), a clarification of rules for representing proteins in text (Section 11.1.), and a proposal for naming genes that encode ribosomal proteins (Appendix A.).

Version 4.2, March 8, 1998, includes modified rules for naming genes identified only by genomic sequencing projects (Section 1.1.3).

Version 4.3, May 21, 1998, includes minor changes to the Introduction and format of the document.

Version 4.4, February 9, 1999, includes a change to Section 5.13. supporting the identification of an unknown breakpoint as heterochromatic.

Version 5.0, July 6, 1999: All references to 'honorary genes' have been removed (this category is no longer used by FlyBase) and a description of nicknames has been added (Section 7.7.).

Version 5.01, August 23, 1999: Assorted minor corrections were made.

Version 6.0, November 23, 1999, updates Section 9.1. to include FlyBase's new policy on the use of sequence accessions to determine precedence of gene names and symbols. Many links have been added and assorted corrections made.

Version 6.1, December 27, 1999, updates Section 1.1.3. to include genes identified by Celera.

Version 6.2, April 5, 2000, updates Section 1.1.3. with the derivation of anonymous gene symbol prefixes.

Version 6.3, May 12, 2000, updates Section 2.2. to clarify the current rules for allele symbols.

Version 7, August 28, 2000, updates Section 11.2., eliminating the convention (which was never adopted by Drosophilists) that RNA products of genes are designated in text by the gene symbol in all italic capital letters.

Version 7.1, April 18, 2001, updates Section 1.1.3. to make explicit the need for authors to provide the CG gene symbol when renaming a CG-named gene.

Version 7.2, April 24, 2001, updates Section 1.1.3. to clarify the on-going assignment of CG names.

Version 8.0, August 1, 2001, updates Sections 1.1. to clarify the one-gene:one-valid-symbol rule, 1.1.1. to clarify the case of certain gene symbols, and 5.15. to make explicit the various ways in which balancers can be described.

Version 8.1, August 28, 2001, updates Section 5.13. to change 'h?' to 'h' as the symbol for undefined heterochromatic breakpoints in complex aberration symbols.

Version 8.2, October 25, 2001, updates Section 3 to include foreign gene prefixes in example genotypes.

Version 8.3, November 26, 2001, rewords Section 7 to clarify that genotypes specify alleles of genes.

Version 8.4, March 22, 2002, updates cytology in examples in Sections 5.5.2.2. and 6.2.

Version 9, August 16, 2004, updates sequence annotation nomenclature in Section 1.1.3. and emphasizes in Sections 1.7.2. and 10.5. the prohibition against use of the character / in gene and other symbols. Section 11. is slightly modified to clarify that these options apply to generic proteins and transcripts from a given gene.

enhancer	e(a)m, E(a)m
female sterile	fs(n)m, Fs(n)m
lethal	l(n)m
male sterile	ms(n)m, Ms(n)m
male/female sterile	mfs(n)m, Mfs(n)m
maternal	mat(n)m, Mat(n)m
meiotic	mei
Minute	M(n)m
mitotic mutant	mit(n)m, Mit(n)m
mutagen sensitive	mus
'Polygene'	PL(n)m
resistance	rst(n)m, Rst(n)m
suppressor	su(a)m, Su(a)m
'tumor'	tu(n)m (i.e., genes controlling production of melanotic pseudotumors)
