QServer: QUalitative BIClustering server

Release 2.0.0, December 20, 2011

1. What is QServer?

QServer is a web server based on the biclustering algorithm QUBIC [1]. Detailed information about biclustering can be found on the main page of this server or in the Wikipedia article "Biclustering".

2. How to use QServer?

Click "BiCluster" in the left navigation bar to open the QServer interface. The user then chooses the organism of interest and provides a data matrix generated from multiple expression array data sets. A subset of gene names may also be provided, in which case only biclusters containing these genes are displayed on the Results page.

Users who are familiar with the biclustering parameters, or who want to tune them to obtain stable results, may change the parameters at the bottom of the BiCluster page. The meaning of each parameter is explained in the QUBIC paper [1].

To try QServer without any data sets, go to the Retrieve page and click the "Example" button to load the example job. After clicking "Submit", the results for the example data set (an E. coli K12 microarray data set) will be loaded.

3. How to generate the input data matrix for QServer?

The data matrix file is a TAB-delimited text file with the following format:

#MatrixName Condition1 Condition2 Condition3 Condition4 Condition5
Gene1 7.21 8.36 5.21 9.22 10.11
Gene2 6.55 7.34 7.10 6.99 6.88
Gene3 3.58 5.66 7.12 4.34 3.21
Gene4 5.55 2.12 2.12 8.23 9.99


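The expression levels should be normalized before being included in the matrix. Each row contains the expression levels of one gene, and each column contains the expression levels of all genes under one condition (microarray). As in the example above, the first line is a header that lists the condition names, and the first field of every other line is the gene name.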
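For users who assemble the matrix themselves, the following Python sketch writes a file in the format above and reads it back, checking that every gene has one value per condition. The helper names `write_matrix` and `read_matrix` are hypothetical (not part of QServer or MatrixMaker), the `#` prefix on the first header field simply mirrors the example header, and the sketch does not normalize anything: normalization must be done beforehand.

```python
from typing import Dict, List, Tuple


def write_matrix(name: str, conditions: List[str], rows: Dict[str, List[float]]) -> str:
    """Build TAB-delimited matrix text: a header line, then one line per gene."""
    lines = ["\t".join(["#" + name] + conditions)]  # mirrors "#MatrixName Condition1 ..."
    for gene, values in rows.items():
        if len(values) != len(conditions):
            raise ValueError(f"{gene}: expected {len(conditions)} values, got {len(values)}")
        lines.append("\t".join([gene] + [str(v) for v in values]))
    return "\n".join(lines) + "\n"


def read_matrix(text: str) -> Tuple[str, List[str], Dict[str, List[float]]]:
    """Parse matrix text back into (matrix name, condition names, {gene: values})."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty matrix")
    header = lines[0].split("\t")
    name, conditions = header[0].lstrip("#"), header[1:]
    rows: Dict[str, List[float]] = {}
    for lineno, line in enumerate(lines[1:], start=2):
        fields = line.split("\t")
        if len(fields) != len(header):
            raise ValueError(f"line {lineno}: expected {len(header)} fields, got {len(fields)}")
        rows[fields[0]] = [float(x) for x in fields[1:]]  # raises ValueError on non-numbers
    return name, conditions, rows


if __name__ == "__main__":
    conds = ["Condition1", "Condition2", "Condition3", "Condition4", "Condition5"]
    data = {"Gene1": [7.21, 8.36, 5.21, 9.22, 10.11], "Gene2": [6.55, 7.34, 7.10, 6.99, 6.88]}
    text = write_matrix("MatrixName", conds, data)
    print(text.splitlines()[0])
    print(read_matrix(text)[2]["Gene1"])
```

Reading the file back before uploading is a cheap way to catch rows with missing or non-numeric values.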

4. How to generate the data matrix directly from raw microarray data sets?

For the user's convenience, we provide a Perl script, MatrixMaker 1.0, that generates a normalized data matrix from multiple raw microarray data sets. It can be downloaded from http://csbl1.bmb.uga.edu/~ffzhou/QServer/MatrixMaker.1.0.tar.gz. The program runs on Linux only. You may also need to download the Linux version of Affymetrix Power Tools (specifically the program apt-probeset-summarize) from the Affymetrix web site.

4.1. Decompress MatrixMaker.1.0.tar.gz after downloading:
$ tar -zxvf MatrixMaker.1.0.tar.gz
4.2. Run the program without arguments to display its usage syntax:
$ cd MatrixMaker
$ ./MatrixMaker.pl

We did not incorporate this matrix-making functionality directly into the server because raw microarray data sets are too big to upload. For example, the raw files (.CEL) for study GSE2125 occupy 389 MB even after GZip compression, and 1.5 GB when decompressed. The RMA matrix made locally from these files needs only 21 MB, or 11 MB when compressed with GZip: a 97.17% reduction in file size compared with the compressed raw files.
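The 97.17% figure compares the two GZip-compressed sizes (389 MB of raw files versus an 11 MB matrix). A minimal Python check of that arithmetic follows; `percent_reduction` is a hypothetical helper written only for this illustration.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage by which a size shrinks from `before` to `after` (same units)."""
    if before <= 0:
        raise ValueError("before must be positive")
    return 100.0 * (1.0 - after / before)


if __name__ == "__main__":
    # GSE2125: GZip-compressed raw .CEL files vs. GZip-compressed RMA matrix, in MB
    print(f"{percent_reduction(389, 11):.2f}%")  # 97.17%
```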

5. Explanation of the parameters on the "BiCluster" page.

QUBIC has several parameters: the flag d for discrete input data, the number r of ranks, the percentage q of regulating conditions for each gene, the required consistency level c of a bicluster, the desired number o of output biclusters, and the parameter f that controls overlaps among the identified biclusters. The user may change the default value of each parameter, which gives some flexibility.

Number of blocks to report: the number o of biclusters to report. default: 100

Data type: the flag d indicating whether the input data are discrete or continuous. default: continuous data

Quantile discretization for continuous data: the quantile q used to discretize continuous data, in the range (0, 0.5). default: 0.06

The number of ranks: the number r of ranks into which up- (down-) regulated values are divided during discretization. default: 1
The parameters r and q affect the granularity of the biclusters. A user can start with a small value of r (with the default r = 1, the discretized data matrix contains only the values +1, -1 and 0), evaluate the results, and then try larger values (no larger than half the number of columns) to look for finer structures within the identified biclusters. The choice of q depends on the application: to find genes that respond to local regulators, a relatively small q should be used; otherwise, larger values of q may be considered. The default q = 0.06 in QUBIC was selected based on the best biclustering results on simulated data.
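To make the role of q concrete for the default r = 1, the following Python sketch discretizes one gene's profile: values at or below the gene's q-quantile become -1, values at or above its (1 - q)-quantile become +1, and the rest become 0. This is a simplified illustration that assumes linearly interpolated quantiles; QUBIC's own discretization, described in [1], may differ in details such as tie handling and the treatment of r > 1. The function names `quantile` and `discretize_row` are hypothetical.

```python
from typing import List


def quantile(sorted_values: List[float], p: float) -> float:
    """Linearly interpolated p-quantile of an already sorted, non-empty list."""
    pos = p * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def discretize_row(row: List[float], q: float = 0.06) -> List[int]:
    """Map one gene's expression values to -1 / 0 / +1 (the r = 1 case only)."""
    if not 0 < q < 0.5:
        raise ValueError("q must lie in (0, 0.5)")
    if not row:
        return []
    s = sorted(row)
    lower, upper = quantile(s, q), quantile(s, 1 - q)
    if lower >= upper:  # no spread between the two quantiles: mark nothing as regulated
        return [0] * len(row)
    return [-1 if v <= lower else (1 if v >= upper else 0) for v in row]


if __name__ == "__main__":
    print(discretize_row([7.21, 8.36, 5.21, 9.22, 10.11]))  # Gene1 from the example matrix
    print(discretize_row([5.55, 2.12, 2.12, 8.23, 9.99]))   # Gene4: tied lowest values
```

With only five conditions and q = 0.06, each gene gets at most a few non-zero symbols, which is why q is usually tuned together with the number of columns in the data.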

Filtering overlapping blocks: the threshold f for filtering overlapping biclusters, in the range (0, 1.0]. default: 1 (do not remove any biclusters)
The parameter f controls the level of overlap between the identified biclusters: no two reported biclusters overlap by more than f. The default value of 1 removes no biclusters; users who do not want many overlapping biclusters should decrease f.
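One way to picture f is as a greedy filter over a ranked list of biclusters. The Python sketch below assumes that the overlap of a candidate with an already-kept bicluster is the fraction of the candidate's cells (gene-condition pairs) shared with it, and that the candidate is dropped when that fraction exceeds f. This measure and the helper names `overlap` and `filter_overlapping` are assumptions for illustration, not QServer's code. With f = 1 no overlap can exceed the threshold, so every bicluster is kept, matching the default described above.

```python
from typing import FrozenSet, List, Tuple

# A bicluster as (set of genes, set of conditions).
Bicluster = Tuple[FrozenSet[str], FrozenSet[str]]


def overlap(a: Bicluster, b: Bicluster) -> float:
    """Fraction of a's cells (gene, condition pairs) that also lie in b."""
    genes_a, conds_a = a
    genes_b, conds_b = b
    size_a = len(genes_a) * len(conds_a)
    if size_a == 0:
        return 0.0
    return len(genes_a & genes_b) * len(conds_a & conds_b) / size_a


def filter_overlapping(biclusters: List[Bicluster], f: float = 1.0) -> List[Bicluster]:
    """Keep biclusters in the given order unless one overlaps a kept one by more than f."""
    if not 0 < f <= 1:
        raise ValueError("f must lie in (0, 1]")
    kept: List[Bicluster] = []
    for bc in biclusters:
        if all(overlap(bc, k) <= f for k in kept):
            kept.append(bc)
    return kept


if __name__ == "__main__":
    b1 = (frozenset({"g1", "g2"}), frozenset({"c1", "c2"}))
    b2 = (frozenset({"g1", "g2"}), frozenset({"c1", "c2"}))  # duplicate of b1
    b3 = (frozenset({"g1", "g3"}), frozenset({"c1", "c3"}))  # shares 1 of its 4 cells
    print(len(filter_overlapping([b1, b2, b3], f=0.5)))
```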

Minimum condition (column) numbers: the minimum number of conditions in a bicluster. default: 5% of the total number of columns

Conservative of the block: the consistency level c of the bicluster, in the range (0.5, 1.0]. default: 0.95
The parameter c is the minimum ratio between the number of identical valid symbols in a column and the total number of rows in each output bicluster. The larger c is, the more consistently the genes in an output bicluster are co-expressed under its conditions. Users should start with a large value of c, in [0.95, 1], and relax it if the results are not satisfactory.
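The consistency criterion can be sketched on a discretized bicluster, that is, a small matrix of -1/0/+1 values whose rows are genes and whose columns are conditions. The Python sketch below assumes that "valid" symbols are the non-zero ones and scores each column by its most frequent non-zero symbol; the function names `column_consistency` and `is_consistent` are hypothetical.

```python
from collections import Counter
from typing import List


def column_consistency(column: List[int]) -> float:
    """Count of the most frequent non-zero symbol divided by the number of rows."""
    counts = Counter(v for v in column if v != 0)
    return max(counts.values(), default=0) / len(column)


def is_consistent(submatrix: List[List[int]], c: float = 0.95) -> bool:
    """True if every column of the discretized bicluster reaches consistency level c."""
    if not 0.5 < c <= 1:
        raise ValueError("c must lie in (0.5, 1]")
    return all(column_consistency(list(col)) >= c for col in zip(*submatrix))


if __name__ == "__main__":
    bicluster = [
        [1, -1],
        [1, -1],
        [1, 0],   # one gene is not regulated under the second condition
        [1, -1],
    ]
    print(is_consistent(bicluster, c=0.95), is_consistent(bicluster, c=0.75))
```

In this example the second column has consistency 3/4, so it passes at c = 0.75 but not at the default 0.95.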

Promoter motif finding program: the program used to find promoter motifs, bobro or meme. default: bobro
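The parameters above, with their defaults and allowed ranges, can be collected into one validated record before a job is submitted. The following Python sketch is a hypothetical container, not a QServer or QUBIC API: the field names reuse the letters o, d, q, r, f and c from this section, k for the minimum number of columns is an assumed name, and rounding 5% of the columns to the nearest integer (at least 1) is an assumption about how the default is computed.

```python
from dataclasses import dataclass


@dataclass
class QubicParams:
    """Hypothetical record of the BiCluster page parameters and their defaults."""

    num_columns: int              # number of conditions (columns) in the input matrix
    o: int = 100                  # number of blocks (biclusters) to report
    d: bool = False               # True for discrete input data, False for continuous
    q: float = 0.06               # quantile for discretizing continuous data, in (0, 0.5)
    r: int = 1                    # number of ranks, at most half the number of columns
    f: float = 1.0                # overlap filter, in (0, 1]; 1 removes nothing
    c: float = 0.95               # consistency level, in (0.5, 1]
    k: int = 0                    # minimum number of columns; 0 means "5% of columns"
    motif_program: str = "bobro"  # "bobro" or "meme"

    def __post_init__(self) -> None:
        if self.num_columns < 1:
            raise ValueError("num_columns must be positive")
        if self.k == 0:
            # Assumed rounding: nearest integer, never below 1.
            self.k = max(1, round(self.num_columns * 5 / 100))
        if self.o < 1:
            raise ValueError("o must be at least 1")
        if not 0 < self.q < 0.5:
            raise ValueError("q must lie in (0, 0.5)")
        if not 1 <= self.r <= max(1, self.num_columns // 2):
            raise ValueError("r must be between 1 and half the number of columns")
        if not 0 < self.f <= 1:
            raise ValueError("f must lie in (0, 1]")
        if not 0.5 < self.c <= 1:
            raise ValueError("c must lie in (0.5, 1]")
        if self.motif_program not in ("bobro", "meme"):
            raise ValueError("motif_program must be 'bobro' or 'meme'")


if __name__ == "__main__":
    params = QubicParams(num_columns=100)
    print(params.k, params.c)
```

Validating the ranges up front mirrors the intervals listed for each parameter, so an out-of-range value is caught before any computation starts.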

[1] G. Li, Q. Ma, H. Tang, A. H. Paterson and Y. Xu, "QUBIC: a qualitative biclustering algorithm for analyses of gene expression data", Nucleic Acids Research, 37:e101, 2009.


For scientific or technical questions about this server, please contact Dr Fengfeng Zhou.