Datasets
- Synthetic data: Prelic et al. (2006). [Link]
- Synthetic data: scaling patterns and overlap patterns generated by us (QUBIC benchmark, large_data)
- E. coli data: M3D microarray database. [Link]
- Yeast data: Prelic et al. (2006). [Link]
  This is a subset of the original Gasch dataset.
- Leukemia data: Armstrong et al. (2002). [Link]
  We pre-processed the original data by replacing letters and negative PM-MM values with '0', and discretized the significant values into characteristic symbols.
  Original data, Filtered data, Pre-processing script
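The first half of the pre-processing step (zeroing letters and negative PM-MM values) can be illustrated with a minimal sketch. It assumes a tab-delimited matrix with a header row and gene IDs in the first column; the function names are hypothetical, and this is not the distributed pre-processing script. Any cell that does not parse as a finite number (for example a letter flag) is treated as a "letter" cell. The discretization of significant values is not reproduced here because its exact rule is not specified on this page.

```python
import csv
import io
import math


def clean_value(cell: str) -> float:
    """Map one raw PM-MM cell to a non-negative float.

    Cells that are not finite numbers (assumed to be letter flags) and
    negative PM-MM differences are both replaced with 0.
    """
    try:
        value = float(cell.strip())
    except ValueError:
        return 0.0  # letters or other text
    if not math.isfinite(value):
        return 0.0  # "nan"/"inf" also contain letters
    return max(value, 0.0)  # negative PM-MM values become 0


def clean_matrix(text: str) -> list:
    """Clean a tab-delimited matrix: header row kept, first column is the gene ID."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    cleaned = [header]
    for row in body:
        gene_id, values = row[0], row[1:]
        cleaned.append([gene_id] + [f"{clean_value(v):g}" for v in values])
    return cleaned


# Hypothetical two-gene, three-condition input.
raw = "gene\tc1\tc2\tc3\ng1\t12.5\t-3.1\tA\ng2\t0.7\tP\t4\n"
print(clean_matrix(raw)[1])  # prints ['g1', '12.5', '0', '0']
```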
Results
This section contains results from QUBIC and other programs.
- Prelic's benchmark: QUBIC, Other programs
- QUBIC benchmark: QUBIC, Other programs
- E. coli data: QUBIC, Other programs
- Yeast data: QUBIC, Other programs
- Leukemia data: QUBIC
- The detailed results of Table 10: Table 10
- The software NNN: NNN
In addition, scripts to validate the enrichment of functional classifications for the E. coli and yeast data can be found here. The package contains the following utility scripts.
- evaluate.py: run it as, e.g.,
  $ python evaluate.py qubic.ecoli ecoli kegg
  This produces qubic.ecoli.kegg, which contains the most significant KEGG assignment in E. coli for the biclusters found in qubic.ecoli.
- stats.py: simply run
  $ python stats.py
  This outputs the significance statistics for all the produced files in the current directory.
- makefile: type
  $ make qubic
  to benchmark only QUBIC, or simply
  $ make
  to run all tests.
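To make "most significant KEGG assignment" concrete, here is a minimal sketch of one common way to score functional enrichment of a bicluster: a one-sided hypergeometric test for each category, keeping the category with the smallest p-value. This page does not say which test evaluate.py uses, so the hypergeometric choice, the function names, and the toy gene IDs and pathways are all assumptions for illustration only.

```python
from math import comb


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """Return P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    population: number of annotated genes in the organism
    successes:  genes annotated to the category (e.g. one KEGG pathway)
    draws:      genes in the bicluster
    """
    upper = min(successes, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / comb(population, draws)


def most_significant_category(bicluster_genes, categories, population):
    """Return (category, p_value) with the smallest enrichment p-value, or None."""
    best = None
    for name, members in categories.items():
        overlap = len(bicluster_genes & members)
        if overlap == 0:
            continue  # no shared genes, so no evidence of enrichment
        p = hypergeom_sf(overlap, population, len(members), len(bicluster_genes))
        if best is None or p < best[1]:
            best = (name, p)
    return best


# Toy example with hypothetical gene IDs and pathways.
categories = {
    "pathA": {"g1", "g2", "g3", "g4", "g5"},
    "pathB": {f"g{i}" for i in range(5, 15)},
}
print(most_significant_category({"g1", "g2", "g3", "g4"}, categories, population=20))
```

Exact integer binomials keep the sketch dependency-free; a real pipeline would more likely call a statistics library and correct for testing many categories.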
The comparison results on Prelic's benchmark, the QUBIC benchmark, and the E. coli and yeast datasets are summarized in the following three Excel sheets.
Parameters
The following biclustering programs were tested. Details are included so that the results can be reproduced.

Parameter settings for various biclustering algorithms

| Algorithm | Parameters | Additional parameters on real data* |
|---|---|---|
| QUBIC | o=100, c=0.95, q=0.06, r=1, f=1, k=5% of columns (default setting) | discretize into three classes (up, down, no change) after pre-processing |
| BIMAX | minimum gene and chip number = 2 | n/a |
| ISA | threshold for genes and chips = 2, number of seeds = 100 | n/a |
| SAMBA | overlap factor 0.1, 100 probes to hash, kernel 4-4 | n/a |
| RMSBE | α=0.4, β=0.5, γ=1.2, random 10 genes and 10 chips | random 300 genes and 40 chips |
* Additional parameters are the slight parameter changes made for the larger microarray datasets.
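The QUBIC setting on real data discretizes each gene into three classes (up, down, no change). The sketch below shows a simplified per-gene quantile rule, assuming q is the fraction of each gene's values placed in each tail; it is an illustration under that assumption, not QUBIC's exact discretization, and the function names are hypothetical.

```python
def discretize_row(values, q=0.06):
    """Label each value of one gene's row as -1 (down), 0 (no change) or +1 (up).

    Simplified rule (an assumption, not QUBIC's exact procedure): the lowest
    q fraction of the row is 'down' and the highest q fraction is 'up'.
    Ties are broken by column order.
    """
    n = len(values)
    # At least one value per tail, but never let the two tails overlap.
    k = min(max(1, int(q * n)), n // 2)
    order = sorted(range(n), key=lambda j: values[j])
    labels = [0] * n
    for j in order[:k]:
        labels[j] = -1
    for j in order[n - k:]:
        labels[j] = 1
    return labels


def discretize_matrix(matrix, q=0.06):
    """Apply the per-gene rule to every row (one row per gene)."""
    return [discretize_row(row, q) for row in matrix]


print(discretize_row([5.0, 1.0, 3.0, 9.0, 2.0, 7.0, 4.0, 0.5], q=0.25))
# prints [0, -1, 0, 1, 0, 1, 0, -1]
```

Working per gene rather than over the whole matrix keeps genes with very different expression ranges comparable, which is the usual motivation for row-wise discretization.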
Page last updated: Jan. 06, 2009
