=======================================================================
  CUBIC -- Conservative continUous Block with high Information Content
                    CUBIC Optimized 1.0 (06/15/2004)
             Copyright (C) 2004 CSBL/University of Georgia
                          All Rights Reserved
=======================================================================

1. Introduction
   Cubic is a tool developed by the Computational Systems Biology Laboratory
at the University of Georgia for protein binding site prediction.
Cubic was originally developed by Dr. Victor Olman and was then further
developed and optimized (up to a 10-fold speedup on single-CPU machines) by
Dr. Jizhu Lu. Cubic accepts upstream sequences and a few other parameters as
input and generates a set of binding sites and their statistical significance
as output. Please read the "LICENSE" file carefully before you use this program.

2. Usage
  The basic form of the command line to run cubic is "cubic [-options]".
  Available options are:
   -h     : help; print brief help on version and usage
   -i     : input file in FASTA format (default standard input)
   -o     : output file with motifs found (default standard output)
   -p     : output file with results for MST (default standard output)
   -w     : the length of a motif (default 10)
   -m     : the mandatory number of sequences expected to have motifs
            (default 10)
   -l     : the length of the information-rich left part of a motif
            (default 10)
   -g     : the length of the gap following the information-rich left part
            of the motif (default 0)
   -b     : the number of the best consensus motifs to be found (default 3)
   -a     : the number of additional sequences to have motifs (default 3)
   --Ind  : output the index of each sequence instead of its annotation
            line, which is output by default
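
A minimal sketch of assembling a cubic command line from Python is given
below. It assumes that each option is followed by its value as a separate
argument, which this README does not state explicitly. The helper name
build_cubic_command and the file names are hypothetical.

```python
def build_cubic_command(fasta_path, motifs_path, width=10, best=3, use_index=False):
    """Return an argument list for running cubic.

    Assumption: each option is followed by its value as a separate
    argument (for example "-w 12"); the README does not spell this out.
    """
    cmd = ["cubic", "-i", fasta_path, "-o", motifs_path,
           "-w", str(width), "-b", str(best)]
    if use_index:
        # Ask for sequence indices instead of annotation lines.
        cmd.append("--Ind")
    return cmd


# Hypothetical file names, used only for illustration.
cmd = build_cubic_command("upstream.fasta", "motifs.txt", width=12)
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run once the cubic
executable is on the PATH.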

3. Explanation of the output
   Cubic can generate two output files: one contains the motifs found and
the other contains the results for the MST (Minimum Spanning Tree).

(1) Here is an example of the motif output file, with explanations:

Candidates_# Profile_Volume Motif_Length Seqs Added        BS_structure
           2             10           12  109     2  B B B B B B n n B B B B
Consensus=ATTTTTnnAAAA Score=  81.9477 Palindromicity=4  p_value= 6.350228E-02
for 7 first BS ratio=1.131336E+00 coverage=10

Explanation: These fields are self-explanatory.

0.083 0.303 0.299 <- 0.5252 0.7777 1.0000 0.5252 1.0000 1.0000 * * 0.4276 1.0000 1.0000 1.0000 ->  0.083 0.083 0.232

Explanation: The numbers between "<-" and "->" give the information content
of each position of the binding site candidate; the "*" entries line up with
the "n" positions in BS_structure. The three numbers to the left of "<-" and
the three to the right of "->" give the information content of the three
flanking positions on each side of the candidate.
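
As an illustration, the line above could be split into its flanking and
binding-site parts as sketched below. The function name is hypothetical,
and treating "*" as a position without a value is an assumption based on
the example rather than on documented behavior.

```python
def parse_information_line(line):
    """Split an information-content line into (left, site, right) lists."""
    tokens = line.split()
    start = tokens.index("<-")
    end = tokens.index("->")
    left = [float(t) for t in tokens[:start]]
    # Assumption: "*" marks a position with no value; it is stored as None.
    site = [None if t == "*" else float(t) for t in tokens[start + 1:end]]
    right = [float(t) for t in tokens[end + 1:]]
    return left, site, right


line = ("0.083 0.303 0.299 <- 0.5252 0.7777 1.0000 0.5252 1.0000 1.0000 "
        "* * 0.4276 1.0000 1.0000 1.0000 ->  0.083 0.083 0.232")
left, site, right = parse_information_line(line)
print(len(site), left, right)
```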

 14 AATTTTacAAAA  204   6.7642
 48 ATTTTTgaAAAA   19   7.2628
 57 TTTATTcaAAAA  122   6.8726
 59 ATTTTTacAAAA  123   7.2628
 76 TTTTTTttGAAA  188   6.8023
 77 ATTTTTtaAAAA  207   7.2628
 87 TTTTTTttGAAA  265   6.8023
 88 ATTTTTttTAAA   91   6.8878
 96 ATTATTaaAAAA  201   7.0677
105 ATTATTaaAAAA  201   7.0677

Explanation: Each line describes one binding site. The columns are, in
order: the index of the sequence, the motif found in that sequence, the
start position of the motif, and the score.
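
A small sketch of reading one of these lines into a record follows. The
SiteHit type and parse_site_line function are hypothetical names, and the
sketch assumes the first column is a numeric sequence index, as in the
example above.

```python
from typing import NamedTuple


class SiteHit(NamedTuple):
    seq_index: int   # index of the sequence
    motif: str       # motif found in the sequence
    start: int       # start position of the motif
    score: float     # score of the site


def parse_site_line(line):
    """Parse one whitespace-separated binding-site line."""
    idx, motif, start, score = line.split()
    return SiteHit(int(idx), motif, int(start), float(score))


hit = parse_site_line(" 14 AATTTTacAAAA  204   6.7642")
print(hit)
```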

------------------------------------------
 55 AATTTTagGAAA  200  -0.2654
 85 AGTATTcaAAAA  209  -0.4099

Explanation: The two lines below the dashed separator describe the two
additional sequences (the "Added" value above).
 
Consensus=AAATTTnnCAAA Score=  81.3746 Palindromicity=3  p_value= 7.953626E-02
for 4 first BS ratio=1.187530E+00 coverage=11
0.209 0.299 0.243 <- 0.6259 1.0000 1.0000 0.4673 1.0000 1.0000 * * 0.7777 0.4673 1.0000 1.0000 ->  0.243 0.428 0.368
  1 AAATTTagCCAA  224   7.1317
  5 AAAATTtaCAAA   63   7.1317
 11 AAATTTagCCAA  224   7.1317
 14 TAATTTtaCAAA  203   6.9032
 16 AAAATTtaCAAA   63   7.1317
 31 AAATTTagCCAA  224   7.1317
 38 AAAATTtaCAAA   63   7.1317
 71 TAAATTgaCAAA  115   6.8109
 92 AAATTTaaCCAA  285   7.1317
 96 AAATTTaaGAAA  211   6.7267
------------------------------------------
105 AAATTTaaGAAA  211   0.0000
 57 AAATTTcaTAAA    2  -0.2142

Explanation: The block above shows the same information for the second
binding site candidate.

(2) Below is an example of the output file with the results for MST and the
explanation:

Start = ATTTTTnnAAAA
   0 5350   -1   0.9748003 ATTTTTgaAAAA  48  19
   0 5350 5350   0.9748003 ATTTTTacAAAA  59 123
   0 5350 5350   0.9748003 ATTTTTtaAAAA  77 207
   1 4993   -1   1.1852221 ATTATTaaAAAA  96 201
   1 4993 4993   1.0490668 ATTATTaaAAAA 105 201
   2 19038   -1   1.1852221 TTTTTTaaAAAA  77 208
   3 18572 4993   1.2594886 TTTATTcaAAAA  57 122
   4 5364   -1   1.2805684 ATTTTTttTAAA  88  91
   5 19062 19038   1.2842190 TTTTTTttGAAA  76 188
   5 19062 19062   1.1262810 TTTTTTttGAAA  87 265
   6 18582 18572   1.4291015 TTTATTatTAAA  96 198
   6 18582 18582   1.2159903 TTTATTatTAAA 105 198
   7 1851   -1   1.6167946 AATTTTacAAAA  14 204
   7 1851 1851   1.2013865 AATTTTttAAAA  77 206
   8 1860 1851   1.4365389 AATTTTagGAAA  55 200
   9 6658 1851   1.6707096 CATTTTttAAAA  88 200
  10 10781 6658   1.3313529 GATTTTtgAAAA  48  18
  11 10794 6658   1.6371212 GATTTTgtTAAA  12  32
  12 10786 10794   1.5758646 GATTTTtaCAAA  59 122
  13 5193 5364   1.6886353 ATTGTTcaCAAA   4  85
  14 3817 1851   1.6903207 AGTATTcaAAAA  85 209
  15 3912 3817   1.6437459 AGTGTTgaAAAA  65 222
  16 9524 18572   1.8028722 CTTCTTgaAAAA  29  14
  17 16116 1860   1.9991775 TCTTTTaaCAAA  46 146
  18 5351   -1   2.0062287 ATTTTTtaAAAT  88 201
  19 19039 5351   1.5492558 TTTTTTttAAAG  88  92
  20 19040 19039   1.4131005 TTTTTTtgAAAT  76 189
  20 19040 19040   1.4131005 TTTTTTtgAAAT  87 266
  21 19063 19039   1.6482527 TTTTTTgaGAAG  87 116
  22 19069 19063   1.5845160 TTTTTTctTAAC  99 126
  23 19070 19069   1.5057576 TTTTTTtaTAAT  77 194
  24 19053 19069   1.6576121 TTTTTTttCAAC  12 102
  25 5088 5351   1.8081574 ATTCTTtaAAAG  45  65

Explanation: The second column shows the index of the word that
is a child node in the Minimum Spanning Tree (MST), and the third
column shows the index of the word that is the parent of that
node in the MST ("-1" means the node is a root). The 4th column
shows the distance between the parent node and the child node in
Prim's algorithm. The 5th column shows the motif. The 6th column
shows the index of the sequence in which the motif was found. The
7th column shows the start position of the motif in the sequence.
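
The parent and child columns can be turned into a tree structure as
sketched below. The function names are hypothetical. Rows whose parent
index equals their own word index are skipped when building the tree;
this is an interpretation of the example, not documented behavior. The
first column is kept as-is because it is not described above.

```python
from collections import defaultdict


def parse_mst_line(line):
    """Parse one MST output line into a dict keyed by column meaning."""
    f = line.split()
    return {
        "first": int(f[0]),       # first column; not described in this README
        "word": int(f[1]),        # index of the child word
        "parent": int(f[2]),      # index of the parent word, -1 for a root
        "distance": float(f[3]),  # distance between parent and child
        "motif": f[4],
        "seq_index": int(f[5]),
        "start": int(f[6]),
    }


def children_by_parent(rows):
    """Map each parent word index to the set of its child word indices."""
    tree = defaultdict(set)
    for row in rows:
        # Skip roots and rows whose parent is the word itself (assumption).
        if row["parent"] not in (-1, row["word"]):
            tree[row["parent"]].add(row["word"])
    return dict(tree)


sample = """\
   0 5350   -1   0.9748003 ATTTTTgaAAAA  48  19
   0 5350 5350   0.9748003 ATTTTTacAAAA  59 123
   1 4993   -1   1.1852221 ATTATTaaAAAA  96 201
   3 18572 4993   1.2594886 TTTATTcaAAAA  57 122
"""
rows = [parse_mst_line(l) for l in sample.splitlines() if l.strip()]
tree = children_by_parent(rows)
roots = sorted({r["word"] for r in rows if r["parent"] == -1})
print(roots, tree)
```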
 
Start = AAATTTnnCAAA
   0  230   -1   1.0918388 AAAATTtaCAAA   5  63
   0  230  230   1.0321925 AAAATTtaCAAA  16  63
   0  230  230   1.0321925 AAAATTtaCAAA  38  63
   1  661   -1   1.0918388 AAATTTagCCAA   1 224
   1  661  661   1.0321925 AAATTTagCCAA  11 224
   1  661  661   1.0321925 AAATTTagCCAA  31 224
   1  661  661   1.0321925 AAATTTaaCCAA  92 285
   2 14231   -1   1.3824093 TAATTTtaCAAA  14 203
   3 13995 14231   1.2269275 TAAATTgaCAAA  71 115
   4  664   -1   1.6416526 AAATTTaaGAAA  96 211
   4  664  664   1.2262444 AAATTTaaGAAA 105 211
   5  671  664   1.5047567 AAATTTcaTAAA  57   2
   6  222  671   1.3467484 AAAATTcaAAAA  10  30
   6  222  222   1.2871020 AAAATTcaAAAA  30  30
   7  240  222   1.3792827 AAAATTtgTCAA  83  57
   8  666  664   1.6166568 AAATTTtaGGAA  55 199
   9  237  666   1.4160283 AAAATTtcGTAA  48 275
  10  678  666   1.6023598 AAATTTatTTAA  12 216
  11  227  678   1.4443514 AAAATTtaAGAA  96 210
  11  227  227   1.3847051 AAAATTtaAGAA 105 210
  12 14224  671   1.6373188 TAATTTacAAAA  83 243
  13 14240 14224   1.3896563 TAATTTttTAAA  77 205

Explanation: The above shows the results for the second binding
site candidate.

4. Contact information:
   Please send bug reports or suggestions to Dr. Jizhu Lu at
jlu@csbl.bmb.uga.edu, or mail them to the following address:

   Dr. Jizhu Lu
   Dept. of Biochemistry & Molecular Biology
   Davison Life Sciences Complex, Room A110
   120 Green Street
   University of Georgia
   Athens, GA 30602-7229

   Any feedback will be appreciated.