=======================================================================
CUBIC -- Conservative continUous Block with high Information Content
CUBIC Optimized 1.0 (06/15/2004)
Copyright (C) 2004 CSBL/University of Georgia
All Rights Reserved
=======================================================================
1. Introduction
Cubic is a tool developed by the Computational Systems Biology Laboratory
at the University of Georgia for predicting protein binding sites.
Cubic was originally developed by Dr. Victor Olman and was then further
developed and optimized by Dr. Jizhu Lu (up to a 10-times speedup on
single-CPU machines). Cubic takes upstream sequences and a few other
parameters as input and outputs a set of binding sites together with their
statistical significance. Please read the "LICENSE" file carefully before
using this program.
2. Usage
The basic form of the command line to run cubic is "cubic [-options]".
Available options are:
-h : help; print brief help on version and usage
-i : input file in fasta format (default standard input)
-o : output file with motifs found (default standard output)
-p : output file with results for MST (default standard output)
-w : the length of a motif (default 10)
-m : the mandatory number of sequences expected to have motifs
(default 10)
-l : the length of the information-rich left part of a motif
(default 10)
-g : the length of the gap following the information-rich left part
of the motif (default 0)
-b : the number of best consensus motifs to find (default 3)
-a : the number of additional sequences to have motifs (default 3)
--Ind : output sequence indices instead of the annotation lines that
are printed by default
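As a rough illustration, the options above could be assembled from Python as in the sketch below. It assumes that the cubic binary is on the PATH and that each option takes its value as the next command-line argument; this README does not show the exact syntax, so check "cubic -h" first. The helper names cubic_command and run_cubic and the file names are hypothetical.

```python
import subprocess
from typing import List


def cubic_command(fasta_path: str, motif_out: str, mst_out: str,
                  width: int = 10, mandatory: int = 10, left: int = 10,
                  gap: int = 0, best: int = 3, added: int = 3,
                  use_index: bool = False) -> List[str]:
    """Build an argument list for cubic (hypothetical helper).

    Defaults mirror the defaults listed in this README. ASSUMPTION:
    each option is followed by its value as a separate argument.
    """
    argv = ["cubic",
            "-i", fasta_path,        # input file in FASTA format
            "-o", motif_out,         # output file with motifs found
            "-p", mst_out,           # output file with MST results
            "-w", str(width),        # length of a motif
            "-m", str(mandatory),    # sequences expected to have motifs
            "-l", str(left),         # information-rich left part length
            "-g", str(gap),          # gap after the left part
            "-b", str(best),         # number of best consensus motifs
            "-a", str(added)]        # additional sequences to have motifs
    if use_index:
        argv.append("--Ind")         # sequence indices instead of annotations
    return argv


def run_cubic(argv: List[str]) -> None:
    # Requires the cubic executable; raises CalledProcessError on failure.
    subprocess.run(argv, check=True)
```

For example, run_cubic(cubic_command("upstream.fa", "motifs.txt", "mst.txt")) would run cubic with every parameter at its default value.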
3. Explanation of the output
The cubic program can generate two output files: one with the motifs found
and one with the results for the MST (Minimum Spanning Tree).
(1) Here is an example of the motif output file, with explanations:
Candidates_# Profile_Volume Motif_Length Seqs Added BS_structure
2 10 12 109 2 B B B B B B n n B B B B
Consensus=ATTTTTnnAAAA Score= 81.9477 Palindromicity=4 p_value= 6.350228E-02
for 7 first BS ratio=1.131336E+00 coverage=10
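Explanation: These summary lines are largely self-explanatory. The
BS_structure field has one code per motif position; the two "n" positions
correspond to the "nn" gap in the consensus.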
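The summary and consensus lines can be read back with a small parser. This Python sketch assumes the fields are separated by whitespace exactly as in the example above and that the BS_structure codes are the last field on the values line; parse_header and parse_consensus are hypothetical names.

```python
import re
from typing import Dict, Union

Number = Union[int, float]


def _number(text: str) -> Number:
    # "4" -> 4, "81.9477" -> 81.9477, "6.350228E-02" -> 0.06350228
    try:
        return int(text)
    except ValueError:
        return float(text)


def parse_header(names_line: str, values_line: str) -> Dict[str, object]:
    """Pair the column names with their values.

    The last column, BS_structure, holds one code per motif position,
    so every value after the preceding columns is collected into a list.
    """
    names = names_line.split()
    values = values_line.split()
    fields: Dict[str, object] = {
        name: _number(value) for name, value in zip(names[:-1], values)
    }
    fields[names[-1]] = values[len(names) - 1:]
    return fields


def parse_consensus(text: str) -> Dict[str, object]:
    """Collect key=value pairs, allowing spaces after '=' (e.g. 'Score= 81.9')."""
    result: Dict[str, object] = {}
    for key, value in re.findall(r"(\w+)=\s*(\S+)", text):
        # The consensus itself is a string; everything else is numeric.
        result[key] = value if key == "Consensus" else _number(value)
    return result
```

Applied to the example, parse_header returns 12 BS_structure codes, consistent with Motif_Length=12.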
0.083 0.303 0.299 <- 0.5252 0.7777 1.0000 0.5252 1.0000 1.0000 * * 0.4276 1.0000 1.0000 1.0000 -> 0.083 0.083 0.232
Explanation: Each number is the information content of one position of the
binding site candidate; "*" marks the gap positions. The three numbers
before "<-" and after "->" give the information content of three extra
positions on the left and right sides of the candidate, respectively.
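To work with these values programmatically, the line can be split at the "<-" and "->" markers. In this minimal Python sketch (parse_info_line is a hypothetical name), gap positions marked "*" are returned as None.

```python
from typing import List, Optional, Tuple


def parse_info_line(line: str) -> Tuple[List[float], List[Optional[float]], List[float]]:
    """Split an information-content line into left flank, core, right flank."""
    left_text, rest = line.split("<-")
    core_text, right_text = rest.split("->")
    left = [float(tok) for tok in left_text.split()]
    right = [float(tok) for tok in right_text.split()]
    # '*' marks a gap position, which has no information-content value.
    core = [None if tok == "*" else float(tok) for tok in core_text.split()]
    return left, core, right
```

Applied to the first candidate's line, the core has 12 entries with None at positions 7 and 8 (counting from 1), matching the "nn" in ATTTTTnnAAAA.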
14 AATTTTacAAAA 204 6.7642
48 ATTTTTgaAAAA 19 7.2628
57 TTTATTcaAAAA 122 6.8726
59 ATTTTTacAAAA 123 7.2628
76 TTTTTTttGAAA 188 6.8023
77 ATTTTTtaAAAA 207 7.2628
87 TTTTTTttGAAA 265 6.8023
88 ATTTTTttTAAA 91 6.8878
96 ATTATTaaAAAA 201 7.0677
105 ATTATTaaAAAA 201 7.0677
Explanation: The first column is the sequence index. The other three columns
show the motif found in that sequence, its start position, and its score.
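Each binding-site line can be parsed into its four columns. In the examples, the lowercase letters of each site fall on the gap positions written as "n" in the consensus, so the sketch also includes a helper that keeps only the uppercase letters; that reading of letter case is an inference from the examples, not something this README states. Site, parse_site and informative_part are hypothetical names.

```python
from typing import NamedTuple


class Site(NamedTuple):
    seq_index: int   # index of the sequence
    motif: str       # motif found in the sequence
    start: int       # start position of the motif
    score: float     # score of the site


def parse_site(line: str) -> Site:
    seq_index, motif, start, score = line.split()
    return Site(int(seq_index), motif, int(start), float(score))


def informative_part(motif: str) -> str:
    # Keep uppercase letters; in the examples, lowercase letters sit in the gap.
    return "".join(ch for ch in motif if ch.isupper())
```

For instance, parse_site("48 ATTTTTgaAAAA 19 7.2628") gives a Site with start 19, and informative_part on its motif gives "ATTTTTAAAA".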
------------------------------------------
55 AATTTTagGAAA 200 -0.2654
85 AGTATTcaAAAA 209 -0.4099
Explanation: The two lines after the dashed separator show the two
additional sequences in the same format.
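The dashed line separates the binding sites of the candidate from the additional sequences. A minimal Python sketch of that split (split_sites is a hypothetical name) could look like this; for the first candidate it yields 10 lines before the separator and 2 after, which in this example equal the Profile_Volume and Added values in the summary line.

```python
from typing import Iterable, List, Tuple


def split_sites(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split site lines at a separator made only of '-' characters."""
    sites: List[str] = []
    additional: List[str] = []
    target = sites
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank lines
        if set(stripped) == {"-"}:
            target = additional  # lines after the dashes are additional sequences
            continue
        target.append(stripped)
    return sites, additional
```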
Consensus=AAATTTnnCAAA Score= 81.3746 Palindromicity=3 p_value= 7.953626E-02
for 4 first BS ratio=1.187530E+00 coverage=11
0.209 0.299 0.243 <- 0.6259 1.0000 1.0000 0.4673 1.0000 1.0000 * * 0.7777 0.4673 1.0000 1.0000 -> 0.243 0.428 0.368
1 AAATTTagCCAA 224 7.1317
5 AAAATTtaCAAA 63 7.1317
11 AAATTTagCCAA 224 7.1317
14 TAATTTtaCAAA 203 6.9032
16 AAAATTtaCAAA 63 7.1317
31 AAATTTagCCAA 224 7.1317
38 AAAATTtaCAAA 63 7.1317
71 TAAATTgaCAAA 115 6.8109
92 AAATTTaaCCAA 285 7.1317
96 AAATTTaaGAAA 211 6.7267
------------------------------------------
105 AAATTTaaGAAA 211 0.0000
57 AAATTTcaTAAA 2 -0.2142
Explanation: The lines above show the same information for the second
binding site candidate.
(2) Below is an example of the MST output file, with explanations:
Start = ATTTTTnnAAAA
0 5350 -1 0.9748003 ATTTTTgaAAAA 48 19
0 5350 5350 0.9748003 ATTTTTacAAAA 59 123
0 5350 5350 0.9748003 ATTTTTtaAAAA 77 207
1 4993 -1 1.1852221 ATTATTaaAAAA 96 201
1 4993 4993 1.0490668 ATTATTaaAAAA 105 201
2 19038 -1 1.1852221 TTTTTTaaAAAA 77 208
3 18572 4993 1.2594886 TTTATTcaAAAA 57 122
4 5364 -1 1.2805684 ATTTTTttTAAA 88 91
5 19062 19038 1.2842190 TTTTTTttGAAA 76 188
5 19062 19062 1.1262810 TTTTTTttGAAA 87 265
6 18582 18572 1.4291015 TTTATTatTAAA 96 198
6 18582 18582 1.2159903 TTTATTatTAAA 105 198
7 1851 -1 1.6167946 AATTTTacAAAA 14 204
7 1851 1851 1.2013865 AATTTTttAAAA 77 206
8 1860 1851 1.4365389 AATTTTagGAAA 55 200
9 6658 1851 1.6707096 CATTTTttAAAA 88 200
10 10781 6658 1.3313529 GATTTTtgAAAA 48 18
11 10794 6658 1.6371212 GATTTTgtTAAA 12 32
12 10786 10794 1.5758646 GATTTTtaCAAA 59 122
13 5193 5364 1.6886353 ATTGTTcaCAAA 4 85
14 3817 1851 1.6903207 AGTATTcaAAAA 85 209
15 3912 3817 1.6437459 AGTGTTgaAAAA 65 222
16 9524 18572 1.8028722 CTTCTTgaAAAA 29 14
17 16116 1860 1.9991775 TCTTTTaaCAAA 46 146
18 5351 -1 2.0062287 ATTTTTtaAAAT 88 201
19 19039 5351 1.5492558 TTTTTTttAAAG 88 92
20 19040 19039 1.4131005 TTTTTTtgAAAT 76 189
20 19040 19040 1.4131005 TTTTTTtgAAAT 87 266
21 19063 19039 1.6482527 TTTTTTgaGAAG 87 116
22 19069 19063 1.5845160 TTTTTTctTAAC 99 126
23 19070 19069 1.5057576 TTTTTTtaTAAT 77 194
24 19053 19069 1.6576121 TTTTTTttCAAC 12 102
25 5088 5351 1.8081574 ATTCTTtaAAAG 45 65
Explanation: The second column shows the index of the word that is a child
node in the Minimum Spanning Tree (MST), and the third column shows the
index of its parent word in the MST ("-1" means the node is a root). The
4th column shows the distance between the parent node and the child node
in Prim's algorithm. The 5th column shows the motif, the 6th column shows
the index of the sequence in which the motif was found, and the 7th column
shows the start position of the motif in that sequence.
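Because each row names a child word and its parent, the tree can be rebuilt from the MST file. The Python sketch below (parse_mst and path_to_root are hypothetical names) relies on two patterns seen in the examples rather than stated in this README: a word can be listed again with itself as its parent when it occurs in further sequences, and several rows can have "-1", so the result may be a forest with more than one root. Self-parent rows and "Start =" lines are skipped.

```python
from typing import Dict, Iterable, List, Tuple


def parse_mst(rows: Iterable[str]) -> Tuple[List[int], Dict[int, int]]:
    """Return (roots, parents) where parents maps child word -> parent word."""
    roots: List[int] = []
    parents: Dict[int, int] = {}
    for row in rows:
        cols = row.split()
        if not cols or cols[0] == "Start":
            continue  # blank line or "Start = <consensus>" header
        child, parent = int(cols[1]), int(cols[2])
        if parent == child:
            continue  # repeated occurrence of an already listed word
        parents[child] = parent
        if parent == -1:
            roots.append(child)
    return roots, parents


def path_to_root(parents: Dict[int, int], word: int) -> List[int]:
    """Follow parent links from a word up to its root."""
    path = [word]
    while parents.get(word, -1) != -1:
        word = parents[word]
        path.append(word)
    return path
```

With the second candidate's rows shown below, path_to_root(parents, 240) returns [240, 222, 671, 664].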
Start = AAATTTnnCAAA
0 230 -1 1.0918388 AAAATTtaCAAA 5 63
0 230 230 1.0321925 AAAATTtaCAAA 16 63
0 230 230 1.0321925 AAAATTtaCAAA 38 63
1 661 -1 1.0918388 AAATTTagCCAA 1 224
1 661 661 1.0321925 AAATTTagCCAA 11 224
1 661 661 1.0321925 AAATTTagCCAA 31 224
1 661 661 1.0321925 AAATTTaaCCAA 92 285
2 14231 -1 1.3824093 TAATTTtaCAAA 14 203
3 13995 14231 1.2269275 TAAATTgaCAAA 71 115
4 664 -1 1.6416526 AAATTTaaGAAA 96 211
4 664 664 1.2262444 AAATTTaaGAAA 105 211
5 671 664 1.5047567 AAATTTcaTAAA 57 2
6 222 671 1.3467484 AAAATTcaAAAA 10 30
6 222 222 1.2871020 AAAATTcaAAAA 30 30
7 240 222 1.3792827 AAAATTtgTCAA 83 57
8 666 664 1.6166568 AAATTTtaGGAA 55 199
9 237 666 1.4160283 AAAATTtcGTAA 48 275
10 678 666 1.6023598 AAATTTatTTAA 12 216
11 227 678 1.4443514 AAAATTtaAGAA 96 210
11 227 227 1.3847051 AAAATTtaAGAA 105 210
12 14224 671 1.6373188 TAATTTacAAAA 83 243
13 14240 14224 1.3896563 TAATTTttTAAA 77 205
Explanation: The lines above show the results for the second binding
site candidate.
4. Contact information
Please send bug reports or suggestions to Dr. Jizhu Lu at jlu@csbl.bmb.uga.edu,
or by mail to the following address:
Dr. Jizhu Lu
Dept. of Biochemistry & Molecular Biology
Davison Life Sciences Complex, Room A110
120 Green Street
University of Georgia
Athens, GA 30602-7229
Any feedback will be appreciated.