Running Prospect
We suggest that users run BLAST or PSI-BLAST before using PROSPECT to
check whether any homolog of the target exists in PDB. If a remote homolog
found by PSI-BLAST aligns to only part of the sequence and is not included
in the DALI or FSSP list, we suggest adding it to the template library
(see Templates) to verify whether it is the true fold or to generate the
full alignment. The same can be done for a PDB structure whose function is
similar to that of the target.
The programs included as part of the prospect suite are described in the
sections below. The examples use 'LINUX' as the architecture in the
commands.
Sequence Profile
We use PSI-BLAST to generate sequence profiles. The program
blastpgp produces a 'checkpoint file' at the end of a search iteration.
We include the script get_chk_file, which can be used with the command:
get_chk_file <seq file>
<seq file> : The path to the sequence file
This produces the file <seq file>.chk
You can run blastpgp yourself, with the command:
blastpgp -b 0 -j 3 -h 0.001 -d /data/nr/nr -i test.seq -C test.seq.chk
This tells blastpgp to show none of the alignments (-b 0), to run
3 iterations (-j 3) with an e-value threshold of 0.001 (-h 0.001), to search
the database found at /data/nr/nr (-d), to read the sequence from test.seq
(-i), and to save the checkpoint file, which is what we are interested in
here, to test.seq.chk (-C).
The chk file is a binary (non-ASCII) file that is not transferable between
machines with different byte orders (i.e. between big-endian and
little-endian machines). The read_chk tool converts it to ASCII:
read_chk.<Architecture> <chk file>
This writes all of the information in the file out in ASCII format.
To save it, simply redirect the output to a file, for example:
read_chk.LINUX test.seq.chk > test.seq.freq
This file should consist of one line with the number of amino
acids in the sequence (N), followed by the sequence on the next line, and
then N lines
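The layout just described can be checked before the file is passed to other tools. The following is a minimal sketch that assumes only what is stated above (a count N, the sequence, then N lines). The name check_freq_layout is a hypothetical helper, not part of the prospect suite, and it does not interpret the contents of the N lines.

```shell
# check_freq_layout FILE
# Hypothetical helper, not shipped with prospect. It checks only the layout:
# line 1 holds N, line 2 holds a sequence of length N, and N non-empty
# lines follow. It does not interpret the values on those lines.
check_freq_layout() {
    awk '
        BEGIN   { n = 0; seqlen = 0; rows = 0 }
        NR == 1 { n = $1 + 0; next }             # residue count N
        NR == 2 { seqlen = length($1); next }    # the sequence itself
        NF > 0  { rows++ }                       # every later non-empty line
        END {
            if (n <= 0)      { print "bad residue count on line 1"; exit 1 }
            if (seqlen != n) { print "sequence length " seqlen ", expected " n; exit 1 }
            if (rows != n)   { print "found " rows " lines after the sequence, expected " n; exit 1 }
            print "ok: " n " residues"
        }
    ' "$1"
}
```

For example, check_freq_layout test.seq.freq returns a non-zero status when the counts disagree, which makes it easy to stop a script early.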
Secondary Structure Prediction
prospect_ssp.<Architecture>
[-freqfile <file>]/[-seqfile <file>]/[-chkfile <file>]
[-p]
-chkfile <file>
The checkpoint file holding the frequency profile for a sequence, created by PSI-BLAST.
-freqfile <file>
The ASCII version of a checkpoint (chk) file, created with
the tool read_chk.
-seqfile <file>
The raw sequence, typically in standard FASTA format
-p
Print output in PHD style format
Alignment
threading is the workhorse of the prospect suite: it is the program
that performs the actual threading procedure. Typically you do not call
threading directly, but through prospect, which then runs threading
against an entire database.
threading.<Architecture>
[-phdfile <file>]/[-seqfile <file>] [-freqfile <file>]
[-global]/[-global_local]/[-np]/[-wp] [-reliab] [-o <output file>]
[TemplateName ]/[-tempfile <file>]
-phdfile <file>
A secondary structure prediction in PHD format, which can be generated
by prospect_ssp.
-seqfile <file>
The amino acid sequence file, typically in FASTA format.
Note, if you provide a secondary structure prediction, you don't need
to provide this file.
-freqfile <file>
The frequency profile for the sequence, as output by
read_chk.
-global -global_local -np -wp
Select the type of threading procedure to use. See
Threading Methods for more information.
-reliab
Calculate the zscore when doing global and global-local
threading. Does not apply to NP and WP threading.
-o <output file>
Name of the file to output to.
[TemplateName ]/[-tempfile <file>]
Define the name of the template to thread against or,
if necessary, the path of the template file to use, given with the -tempfile
argument.
e.g.,
threading.LINUX -seqfile 1ltsd.seq 1bova
The options given on the command line complement the settings
specified in configurations.
Threading Methods
The following 'threading methods' are available (they are mutually
exclusive, so you pick one):
- "-global": Global alignment
- "-global_local": Global-local alignment, i.e., no
end gap penalty for a query sequence.
- "-np": Without pairwise interaction
- "-wp": With pairwise interaction
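Because the four method flags are mutually exclusive, a wrapper script can reject a command line that names more than one of them before any threading starts. The following is a minimal sketch. The name check_one_method is a hypothetical helper, and it assumes that giving no method flag at all is acceptable, which the text above does not confirm.

```shell
# check_one_method ARGS...
# Hypothetical helper, not part of the prospect suite. Counts how many of
# the four threading-method flags appear in the argument list and fails
# when more than one is present. Zero flags is accepted (an assumption).
check_one_method() {
    count=0
    for arg in "$@"; do
        case $arg in
            -global|-global_local|-np|-wp) count=$((count + 1)) ;;
        esac
    done
    if [ "$count" -gt 1 ]; then
        echo "error: $count threading methods given, pick one" >&2
        return 1
    fi
    return 0
}
```

A wrapper would call check_one_method "$@" first and pass the same arguments on to the threading program only if the check succeeds.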
Threading against a Database
For fold recognition:
prospect.<Architecture> [-phdfile <file>]/[-seqfile <file>] [-freqfile <file>]
[-global]/[-global_local]/[-np]/[-wp] [-reliab] [-ncpus N] [-o <output file>]
-phdfile <file>
-seqfile <file>
-freqfile <file>
-global -global_local -np -wp
All of these parameters are passed directly to threading.
-scop
Thread against the SCOP database.
-fssp
Thread against the FSSP database (this is the default action).
-custom
Thread against all templates found in the template paths
that are not part of FSSP or SCOP (i.e. templates that you have generated
with make_template).
-all
Thread against all templates found.
-tfile <file>
Thread against all the templates listed in <file>.
-o <output file>
Name of the output XML file
-ncpus N
Try to launch N simultaneous threading jobs.
e.g.,
Run the default template list against a sequence
prospect.LINUX -seqfile t0052.seq
Run a custom subset of templates against a PHD file
prospect.LINUX -tfile subset.list -phdfile agouti.phd
The following optional flag is available:
- "-tfile TemplateListFile": a list of templates to use
for threading (default: the FSSP list located at $PROSPECT_PATH/data/parameters/fssp.list).
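When many sequences have to be threaded, it can help to generate the prospect command lines from a script and review them before running anything. The following is a minimal sketch that only prints commands. The name prospect_cmd, the ARCH variable and the <name>.xml output naming are assumptions made for illustration; only the flags -seqfile, -tfile and -o come from the description above. File names containing spaces are not handled.

```shell
# prospect_cmd SEQFILE [TEMPLATE_LIST]
# Hypothetical helper: prints (does not run) a prospect command line.
# ARCH defaults to LINUX, as in the examples in this document.
prospect_cmd() {
    seq=$1
    list=${2:-}
    arch=${ARCH:-LINUX}
    cmd="prospect.$arch -seqfile $seq"
    if [ -n "$list" ]; then
        # Restrict the run to the templates named in the list file.
        cmd="$cmd -tfile $list"
    fi
    # Name the XML output after the sequence file (an illustrative choice).
    echo "$cmd -o ${seq%.seq}.xml"
}
```

With ARCH unset, prospect_cmd t0052.seq subset.list prints "prospect.LINUX -seqfile t0052.seq -tfile subset.list -o t0052.xml". Without the second argument the -tfile part is omitted, so the default template list is used.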
After Prospect
Sorting prospect Results
sortProspect is provided to scan and sort the names of templates
in prospect files in order to determine which templates are the best
matches.
sortProspect.<Architecture> <prospect outfile> [-r/-z] [-s] [-1] [-top x]
<prospect outfile>
The file output by prospect
-r
Sort by raw score
-z
Sort by zscore
-s
Save the prospect output file with the entries in sorted
order
-1
One-column view: print only the names of the templates
-top x
Print out the top x scores of the sort
There are different methods for sorting scores, among them raw score,
Zscore, and SVM score. SVM sort is the default; Zscore and raw-score
sorting are activated by the -z and -r flags respectively.
To sort according to the raw score in the fold recognition,
use:
sortProspect.LINUX -r OutputFile.xml
To sort according to the SVM score and save the sorted result back to the file:
sortProspect.LINUX OutputFile.xml -s
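The choice of sort key maps onto the flags as follows: no flag for the default SVM sort, -r for raw score, -z for Zscore. The following is a minimal sketch of that mapping that only prints the resulting command. The name sort_cmd and the ARCH variable are hypothetical; the flags -r, -z and -top are the ones listed above.

```shell
# sort_cmd PROSPECT_FILE [raw|z|svm] [TOP]
# Hypothetical helper: prints (does not run) a sortProspect command line.
sort_cmd() {
    file=$1
    mode=${2:-svm}
    top=${3:-}
    arch=${ARCH:-LINUX}
    case $mode in
        raw) flag=" -r" ;;
        z)   flag=" -z" ;;
        svm) flag="" ;;     # SVM sort is the default, so no flag is needed
        *)   echo "unknown sort mode: $mode" >&2; return 1 ;;
    esac
    cmd="sortProspect.$arch $file$flag"
    if [ -n "$top" ]; then
        # Limit the listing to the best TOP entries.
        cmd="$cmd -top $top"
    fi
    echo "$cmd"
}
```

With ARCH unset, sort_cmd OutputFile.xml z 10 prints "sortProspect.LINUX OutputFile.xml -z -top 10".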
Confidence Index
Using Modeller
To create a modeller runfile from a prospect threading, call
modellerProspect.<Architecture> -prosfile <prospect outfile> <template name> [-o output directory name]
-prosfile <prospect outfile> <template name>
[-o output directory name]
For example:
modellerProspect.LINUX -prosfile test.seq.xml 1aac -o test_1aac
This command would create a modeller alignment from the template
1aac, and place the files needed to run modeller in the directory test_1aac.
To actually run modeller:
cd test_1aac
modeller run.top
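The two steps above can be generated for any template name. The following is a minimal sketch that prints the commands rather than running them. The name model_cmds, the ARCH variable and the <base>_<template> directory naming (chosen to match test_1aac in the example) are assumptions for illustration.

```shell
# model_cmds PROSPECT_FILE TEMPLATE
# Hypothetical helper: prints the commands that build a modeller run
# directory for one template and then start modeller in it.
model_cmds() {
    prosfile=$1
    template=$2
    arch=${ARCH:-LINUX}
    base=$(basename "$prosfile")
    # Strip everything from the first dot: test.seq.xml becomes test.
    outdir="${base%%.*}_$template"
    echo "modellerProspect.$arch -prosfile $prosfile $template -o $outdir"
    echo "cd $outdir"
    echo "modeller run.top"
}
```

With ARCH unset, model_cmds test.seq.xml 1aac prints the same three commands as the example above.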
Prospect File Utilities
catProspect.<Architecture> <file1> .... <file n>
Takes several prospect files and places all of them in a single
prospect record.
mergeProspect.<Architecture> <file1> <file2> <file3>
Replaces records in file1 with those from file2 and saves the result
into file3. This tool lets you merge newer threadings into a prospect
file while keeping the older ones that do not need to be replaced.
Viewing Prospect Results
convertProspect.<Architecture> <prospect file> [-wrap] [-table] [-html] [-pdb <templateName>]
<prospect file>
The file created by prospect
-wrap
Line wrap the alignments at 60 characters
-table
For HTML pages, produce a table at the top of the page
that summarizes the results
-html
Output to HTML
-pdb <templateName>
Copy the template's PDB file, setting the temperature values to show
the 'level' of alignment: hotter means aligned positions with exact
matches, while colder means no alignment.
Template Generation
Prospect 2.0 includes the tools needed for you to create new templates.
make_template.<Architecture> [-pdbfile <file>] [-c] [-n <name>]
[-d <start res> <end res>] [-o <file>]
-pdbfile <file>
Path to the PDB file that the template will be based on.
Please check the templates page for notes
on the requirements for base PDB files.
-c
The chain to use for the template
-n <name>
Base name for the template.
-d <start res> <end res>
If you are creating a template based on a domain rather
than a chain, this selects the region between and including the start
residue ID and the end residue ID.
-o <file>
Name of the output template