The Ensembl Release Cycle

Ensembl data is released on an approximately two-month cycle (occasionally longer if a lot of development work is being undertaken). Whatever its length, the cycle works as follows:

  1. Genebuild

    This stage varies in length for each species, depending on the complexity of the genome(s) involved. Individual species are updated on an irregular schedule, depending on the availability of new assemblies and evidence. New species are added frequently from a number of sequencing projects around the world, and all species databases may receive minor updates. These can include patches to correct erroneous data and updates to data that changes regularly (such as cDNAs for human and mouse).

    The genebuild team takes evidence for genes and transcripts, such as proteins and mRNAs, and combines it with manual annotation data in the analysis pipeline to create an Ensembl core database and, optionally, otherfeatures and cdna databases. Once these are complete, they are handed over to the other Ensembl data teams for further processing (see below).

  2. Additional core data

    The role of the core team is two-fold: to provide API support for the core and core-like (otherfeatures and cdna) databases, and to run scripts that add supplementary data to the database (e.g. gene counts) and check that the database contents are as complete and accurate as possible. These latter scripts, known as healthchecks, help to pick out any anomalous data produced by the automated pipeline, such as unusually long genes.

  3. Other databases

  4. Mart

    The mart team build their own denormalised database tables from the Ensembl data, so that it can be accessed through the BioMart data-mining tool.

  5. Web

    Whilst the genomic data is being prepared, the web team works on new displays and new website features. They then bring together all the finished databases and make the content available online in a number of ways.

    The web team also populates an additional database, ensembl_website, which contains help, news, and other web-specific information. If there are new displays, or if existing ones have changed substantially, the outreach team update the help content.

  6. Release

    When the new release is ready to go live, a copy of the current version is set up as an archive, and the webserver is updated to point to the new site.
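
The healthchecks mentioned in step 2 can be pictured with a small sketch. This is not the Ensembl healthcheck code: the Gene record, the find_unusually_long_genes function, the stable IDs and the "50 times the median length" rule are all hypothetical, chosen only to show the general idea of flagging unusually long genes in a set of automated predictions. It assumes 1-based, inclusive start and end coordinates.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Gene:
    """Hypothetical minimal gene record (not the Ensembl API object)."""
    stable_id: str
    start: int  # assumed 1-based, inclusive
    end: int    # assumed 1-based, inclusive

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def find_unusually_long_genes(genes, factor=50.0):
    """Return genes whose length exceeds `factor` times the median length.

    The threshold rule is an illustrative assumption, not the rule
    used by the real healthchecks.
    """
    genes = list(genes)
    if not genes:
        return []
    threshold = factor * median(g.length for g in genes)
    return [g for g in genes if g.length > threshold]


# Made-up example data: three ordinary genes and one suspiciously long one.
genes = [
    Gene("GENE_A", 1_000, 21_000),
    Gene("GENE_B", 50_000, 80_000),
    Gene("GENE_C", 100_000, 125_000),
    Gene("GENE_D", 200_000, 9_200_000),
]

flagged = find_unusually_long_genes(genes)
print([g.stable_id for g in flagged])  # prints ['GENE_D']
```

A check of this kind only reports candidates; whether a flagged gene is a genuine error or a real long gene still needs to be decided by someone looking at the evidence.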

This is necessarily a simplified account of a process that takes around 50 people several months to complete!