  1. What is DOOR2?

    DOOR2 (Database of prOkaryotic OpeRons, Version 2.0) is an operon database developed by the Computational Systems Biology Lab (CSBL) at the University of Georgia. The operons in this database are computationally predicted from genomic features.

  2. What is your prediction algorithm?

    Our algorithm is a data-mining classifier. Its features include intergenic distance, neighborhood conservation, phylogenetic distance, information from short DNA motifs, the similarity score between the GO terms of a gene pair, and the length ratio between a pair of genes. The classifier is a decision tree trained on data from E. coli and B. subtilis. Please see Section 8 in our tutorials and read the paper below for more details.

    Dam, P., Olman, V., Harris, K., & Xu, Y. (2007). Operon prediction using both genome-specific and general genomic information. Nucleic Acids Research, 35(1), 288-98.
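
    As a rough illustration of two of the features listed above, the sketch below computes the intergenic distance and the length ratio for adjacent gene pairs on the same strand. The `Gene` class, the function names, and the exact definition of the ratio (plain length of the first gene divided by length of the second, in coordinate order) are assumptions made for this example; they are not taken from the DOOR code or the paper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Gene:
    """Hypothetical gene record with 1-based, inclusive coordinates."""
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def intergenic_distance(a: Gene, b: Gene) -> int:
    """Number of bases between two genes; negative when they overlap."""
    first, second = sorted((a, b), key=lambda g: g.start)
    return second.start - first.end - 1


def length_ratio(a: Gene, b: Gene) -> float:
    """Length of the first gene divided by the second (assumed definition)."""
    first, second = sorted((a, b), key=lambda g: g.start)
    return first.length / second.length


def adjacent_pair_features(genes: List[Gene]) -> List[Dict[str, object]]:
    """Features for neighboring genes that lie on the same strand."""
    ordered = sorted(genes, key=lambda g: g.start)
    rows = []
    for left, right in zip(ordered, ordered[1:]):
        if left.strand != right.strand:
            continue  # genes on opposite strands cannot share an operon
        rows.append({
            "pair": (left.name, right.name),
            "intergenic_distance": intergenic_distance(left, right),
            "length_ratio": length_ratio(left, right),
        })
    return rows
```

    A classifier such as a decision tree would take rows like these, together with the other features, as input for each adjacent gene pair.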

  3. How is your data quality?

    Based on the algorithm paper, our accuracy can reach 90.2% and 93.7% on B. subtilis and E. coli genomes, respectively.

    According to an independent evaluation by Brouwer RW et al. published in Briefings in Bioinformatics (see reference below), this algorithm is consistently the best in all aspects, including sensitivity and specificity for both true positives and true negatives, and its overall accuracy reaches ~90%.

    Brouwer, R. W., Kuipers, O. P., & van Hijum, S. A. (2008). The relative value of operon predictions. Briefings in Bioinformatics, 9, 367-75.
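
    For readers unfamiliar with the terms used above, the following minimal sketch shows how sensitivity, specificity, and accuracy are conventionally computed from a confusion matrix. It assumes each adjacent gene pair is labeled either "same operon" (positive) or "operon boundary" (negative); the function name and this framing are assumptions for illustration, and the cited papers may define their measures slightly differently.

```python
def prediction_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard confusion-matrix metrics for a binary classifier.

    tp: same-operon pairs predicted as same-operon
    fn: same-operon pairs predicted as boundaries
    tn: boundary pairs predicted as boundaries
    fp: boundary pairs predicted as same-operon
    """
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),  # fraction of true pairs recovered
        "specificity": tn / (tn + fp),  # fraction of boundaries recovered
        "accuracy": (tp + tn) / total,  # fraction of all pairs correct
    }
```

    With made-up counts such as `prediction_metrics(tp=90, fp=20, tn=80, fn=10)`, the three values are 0.9, 0.8, and 0.85 respectively.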

  4. What is the size of your database?

    Currently DOOR has 1,323,902 operons for 2072 prokaryotic genomes.

    Some other research groups also provide operons (predicted or collected from literature), such as OperonDB from Steven L. Salzberg's group at the University of Maryland, the predicted operons in MicrobesOnline at VIMSS (Virtual Institute of Microbial Stress and Survival), ODB at Kyoto University in Japan, DBTBS in Japan, and RegulonDB in Mexico.

    At the time we first started to develop this database, MicrobesOnline provided operons for 620 genomes, OperonDB for 550 genomes, and ODB for 203 genomes, while RegulonDB covered E. coli only and DBTBS covered B. subtilis only. All operons in OperonDB and MicrobesOnline are predicted, and most operons in ODB are also predicted. RegulonDB and DBTBS operons are based on experiments and literature. In addition, we will keep updating this database as new prokaryotic genomes become available.

  5. Where can I find the overall statistics of DOOR?

    You can find our overall statistics under the Data Statistics tab on our Home page.

  6. Are your operons verified by experiments OR are they based on literature information?

    Although most of the operons in DOOR are not verified by experiments, we try to provide relevant literature information extracted from ODB along with the operons to make our database more comprehensive. In addition, we believe that the operon data provided in DOOR will be quite useful for scientific analysis involving operon evolution, operon transfer study, etc.

    We would like to emphasize that if our users are looking for strictly experimentally verified operons, then they should look into DBTBS and RegulonDB first.

  7. Do you provide operons that contain RNA genes?

    Unfortunately no. We currently do not provide operons that include RNA genes, since they are rarely seen in predicted operon databases.

  8. How is your query capability?

    We provide extensive query capabilities to help users find what they are looking for easily. Please refer to our tutorials for a detailed description.

  9. How do I identify the similarities between operons?

    We have defined the similarity scores between operons based on weighted maximum matching between operons. The equation used is shown below. Similar operon groups can be used to predict accurate orthologous genes, and their upstream regions can be used to find the consensus binding motifs.
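
    To illustrate what a weighted maximum matching between two operons looks like, here is a minimal sketch. It is not the DOOR equation: the pairwise gene similarity weights, the function names, and the normalization by the size of the larger operon are all assumptions chosen for this example. The matching is found by brute force, which is only practical for operons with a handful of genes.

```python
from itertools import permutations
from typing import Dict, List, Tuple

Weights = Dict[Tuple[str, str], float]


def max_weight_matching(genes_a: List[str], genes_b: List[str],
                        weight: Weights) -> float:
    """Best total weight when each gene is paired with at most one partner.

    `weight[(a, b)]` is an assumed similarity between gene a of the first
    operon and gene b of the second; missing pairs count as 0.
    """
    if len(genes_a) > len(genes_b):
        # Keep the shorter operon on the left so permutations stay small.
        flipped = {(b, a): w for (a, b), w in weight.items()}
        return max_weight_matching(genes_b, genes_a, flipped)
    best = 0.0
    for chosen in permutations(genes_b, len(genes_a)):
        total = sum(weight.get((a, b), 0.0) for a, b in zip(genes_a, chosen))
        best = max(best, total)
    return best


def operon_similarity(genes_a: List[str], genes_b: List[str],
                      weight: Weights) -> float:
    """Matching weight scaled by the larger operon size (assumed scaling)."""
    if not genes_a or not genes_b:
        return 0.0
    matched = max_weight_matching(genes_a, genes_b, weight)
    return matched / max(len(genes_a), len(genes_b))
```

    For larger inputs, a polynomial-time assignment algorithm (for example the Hungarian method) would replace the permutation loop; the brute-force version is used here only to keep the example self-contained.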


  10. Which motif finding programs are you using?

    We have integrated two motif finding programs in the database: MEME and our in-house program BoBro. MEME is a very popular motif finding program, so we integrated it in response to public interest. In our experience, BoBro outperforms MEME in many aspects, so we integrated it as well.

  11. Why can't I find any data for the binding sites?

    We have incorporated experimentally verified binding site data from the Regtransbase database into DOOR. If you would like to find out more about binding sites, please visit Regtransbase. We have also included the binding motifs for E. coli from RegulonDB.

  12. How many operons need to be selected in order to use the motif finding function?

    Users need to select at least 3 operons for motif finding. Make sure that you select the operons and not just the genes. Otherwise, a message stating "there are not enough sequences in your fasta file" will show up.

  13. What kind of information do you need for operon prediction?

    Three files are required for operon prediction: (1) gene locations in NCBI ptt file format; (2) protein sequences in NCBI faa file format; and (3) the chromosome sequence in NCBI fna file format. For the detailed prediction procedure, please see Section 8 in our tutorials.
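
    As a hedged sketch of what the first of these files contains, the code below reads gene locations from ptt-style text. It assumes the commonly seen tab-separated layout whose data rows begin with a `start..end` location followed by strand, length, PID, gene, and synonym columns; the function name `parse_ptt` and the returned field names are hypothetical, and the sample in the usage note is made up.

```python
import re
from typing import Dict, List

# A data row starts with a location such as "190..255".
LOCATION = re.compile(r"^(\d+)\.\.(\d+)$")


def parse_ptt(text: str) -> List[Dict[str, object]]:
    """Extract gene coordinates from ptt-style, tab-separated text.

    Lines whose first field is not a `start..end` location (title,
    protein count, column header) are skipped.
    """
    genes = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue
        match = LOCATION.match(fields[0])
        if not match:
            continue
        genes.append({
            "start": int(match.group(1)),
            "end": int(match.group(2)),
            "strand": fields[1],
            "gene": fields[4],
            "synonym": fields[5],
        })
    return genes
```

    Given text whose rows look like `"100..400\t+\t100\t1\tgeneA\tX0001\t-\t-\thypothetical protein"`, `parse_ptt` returns one dictionary per gene with integer `start` and `end` values.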

  14. How do I cite your papers?

    If you would like to use the results obtained from our operon database, please cite the following papers:

    Mao, F., Dam, P., Chou, J., Olman, V., & Xu, Y. (2009). DOOR: a Database for prOkaryotic OpeRons. Nucleic Acids Research, 37, D459-D463.

    Dam, P., Olman, V., Harris, K., & Xu, Y. (2007). Operon prediction using both genome-specific and general genomic information. Nucleic Acids Research, 35(1), 288-98.