Template Format and Creating New Templates
PROSPECT users can choose either the SCOP Domain Library or the FSSP
Chain Library as the template database for threading. PROSPECT uses
FSSP as the default template library because FSSP is updated
frequently, following PDB releases. The FSSP library used in PROSPECT
covers PDB structures released before May 2002. The SCOP domain library
was constructed from the version 1.59 release (15 May 2002).
Each template is contained in a single XML file (this
is one of the changes made between Prospect 1 and Prospect 2). We
have tried to include every template that would be necessary
for threading, but 'power users' may wish to create their own
templates and thread against them.
New templates can be generated from PDB files with the Prospect
suite's 'make_template'. This process requires PSI-BLAST, the
NR database, Makemat, and the DSSP program.
Usage:
make_template -pdbfile <file> [-c 'Chain Letter']/[-r <start residue> <end residue>] [-n <template name>]
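The usage line above can be turned into an argument list for scripted runs. The following is only a sketch: it uses the flags exactly as listed in the usage line, and it assumes that -c and -r are alternatives (as the '/' suggests). The function name make_template_args and its parameters are hypothetical helpers, not part of the Prospect suite.

```python
def make_template_args(pdbfile, chain=None, residue_range=None, name=None):
    """Build a make_template command line from the flags in the usage text.

    Assumption: -c and -r are mutually exclusive alternatives.
    """
    if chain is not None and residue_range is not None:
        raise ValueError("give either a chain letter or a residue range, not both")
    args = ["make_template", "-pdbfile", pdbfile]
    if chain is not None:
        args += ["-c", chain]
    if residue_range is not None:
        start, end = residue_range
        args += ["-r", str(start), str(end)]
    if name is not None:
        args += ["-n", name]
    return args
```

The resulting list can be passed to subprocess.run(args, check=True) once the environment is configured.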
The following environment variables must be set:
- BLASTPGP_EXE
- BLASTPGP_DB
- MAKEMAT_EXE
- DSSP_EXE
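Before running make_template, it can help to confirm that these variables are present. The sketch below only checks that each variable is set and non-empty; it does not verify that the values point to working programs or databases. The function name missing_variables is a hypothetical helper.

```python
import os

# Variable names taken from the list above.
REQUIRED_VARS = ("BLASTPGP_EXE", "BLASTPGP_DB", "MAKEMAT_EXE", "DSSP_EXE")

def missing_variables(environ=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling missing_variables() with no arguments inspects the current process environment; an empty list means all four variables are set.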
Note that a 'minimal' PDB file (one with only ATOM entries) will not
work with make_template. Make_template runs PSI-BLAST on the sequence of
the template, but because the ATOM entries frequently lack some of the
residues that are part of the chain, make_template extracts the sequence
from the SEQRES entries.
It is also a good idea to have a properly formatted HEADER entry, so that
make_template can get the ID for the template from the file (alternatively,
the ID can be defined with the '-n' flag).
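A PDB file can be checked for these two requirements before it is handed to make_template. The sketch below assumes the standard PDB column layout, in which the ID code occupies columns 63-66 of the HEADER record; whether make_template reads the ID from exactly those columns is an assumption. The function name inspect_pdb_text is a hypothetical helper.

```python
def inspect_pdb_text(text):
    """Report whether PDB text has SEQRES records and a HEADER ID code.

    Returns (has_seqres, id_code); id_code is None when no HEADER ID is found.
    """
    has_seqres = False
    id_code = None
    for line in text.splitlines():
        record = line[:6].strip()  # record name is in columns 1-6
        if record == "SEQRES":
            has_seqres = True
        elif record == "HEADER" and id_code is None:
            # Standard PDB layout: ID code in columns 63-66 (1-based).
            candidate = line[62:66].strip()
            id_code = candidate or None
    return has_seqres, id_code
```

A result of (False, ...) indicates an ATOM-only file of the kind described above; a result of (..., None) suggests supplying the template name with '-n'.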
Using Custom Templates:
There are two methods that you can use. First, custom
templates can be used by threading with the -tempfile flag, e.g.
threading.LINUX -phdfile myseq.ss -tempfile custom_template.xml
This method is not recommended, however, because later tools, such as
convertProspect, will not be able to find the template. Instead, put the
template in one of the directories defined in
$PROSPECT_PATH/data/parameters/template_paths.
A good plan is to put personal custom templates in $HOME/prospect_templates
and templates that you want to share with other users on your system
in $PROSPECT_PATH/data/templates_local.
Segment Formats:
The fssp portion of the XML file:
REM 1eso with 154 residues (58, 52) and 9 cores
REM  F  RS  NUM  SS  ACC    x-Cb     y-Cb     z-Cb     x-Ca     y-Ca     z-Ca
RES  1  A   2    L    84    3.272    3.144   -9.294    2.420    1.910   -9.163
RES  4  S   3    E    69    3.293   -1.759   -6.554    4.056   -0.439   -6.662
RES  4  E   4    E    78    8.146   -0.153   -4.910    7.456   -1.438   -5.377
RES  4  K   5    E   162    9.042   -5.858   -4.437    8.815   -4.549   -3.671
RES  4  V   6    E     3   10.608   -3.092    0.509   11.042   -4.049   -0.610
RES  4  E   7    E   130   14.309   -7.618   -0.130   13.145   -6.962    0.609
RES  4  M   8    E     0   12.084   -7.323    5.161   13.400   -7.190    4.397
RES  4  N   9    E    43   17.660   -8.126    6.095   16.260   -8.570    6.535
RES  4  L   10   E    38   17.388  -10.690   10.731   17.001   -9.323   10.157
RES  4  V   10A  E     9   18.814   -5.342   11.790   19.210   -6.815   11.942
RES  2  T   10B  E    76   22.120   -8.432   15.178   21.395   -7.098   15.002
RES  1  S   10C  T    58   24.149   -4.845   18.202   23.867   -4.830   16.703
Header: a general description of the template
REM 1eso with 154 residues (58, 52) and 9 cores
1eso: template name, same as PDB code
154: total number of residues in the template
58: number of residues in an alpha-helix or beta-sheet, i.e., with flag 2-4
52: number of residues considered core residues, i.e., with flag 4
9: number of core secondary structures
REM: entry label (REM for remarks and RES for
protein residues)
F: flag (1 for sequence only, 2 for an alpha-helix
or beta-sheet residue, 3 for an alpha-helix or beta-sheet residue without
C-beta coordinates, and 4 for a core residue)
RS: one-letter code of amino acid type (X for
residues other than the standard 20 amino acid types).
NUM: residue number in PDB (including any insertion code
character attached to it, e.g. 10A).
SS: secondary structure type, using the same convention
as in DSSP (e.g., H for alpha-helix and E for beta-sheet).
ACC: Solvent accessible surface area calculated
by DSSP.
x-Cb, y-Cb, z-Cb: C-beta coordinates.
x-Ca, y-Ca, z-Ca: C-alpha coordinates.
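The field descriptions above can be read back with a small parser. This is a minimal sketch, assuming that each REM or RES entry is a single line of whitespace-separated fields in the order shown in the table; the actual layout inside the template XML file may differ, and how residues with flag 3 (no C-beta coordinates) are written is not covered here. The names parse_header, parse_residue, and Residue are hypothetical.

```python
import re
from typing import NamedTuple

class Residue(NamedTuple):
    flag: int    # F: 1, 2, 3, or 4
    aa: str      # RS: one-letter amino acid code
    num: str     # NUM: kept as text because of values such as "10A"
    ss: str      # SS: DSSP-style secondary structure code
    acc: float   # ACC: solvent accessible surface area
    cb: tuple    # (x, y, z) of C-beta
    ca: tuple    # (x, y, z) of C-alpha

HEADER_RE = re.compile(
    r"REM\s+(\S+)\s+with\s+(\d+)\s+residues\s+"
    r"\((\d+),\s*(\d+)\)\s+and\s+(\d+)\s+cores"
)

def parse_header(line):
    """Parse a line such as 'REM 1eso with 154 residues (58, 52) and 9 cores'."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError("not a template header line: %r" % line)
    name, total, ss_res, core_res, cores = match.groups()
    return {
        "name": name,
        "residues": int(total),
        "ss_residues": int(ss_res),      # flag 2-4
        "core_residues": int(core_res),  # flag 4
        "cores": int(cores),
    }

def parse_residue(line):
    """Parse one RES line with the 12 fields shown in the table."""
    fields = line.split()
    if len(fields) != 12 or fields[0] != "RES":
        raise ValueError("not a 12-field RES line: %r" % line)
    coords = [float(value) for value in fields[6:12]]
    return Residue(int(fields[1]), fields[2], fields[3], fields[4],
                   float(fields[5]), tuple(coords[:3]), tuple(coords[3:]))
```

Keeping NUM as a string preserves residue numbers that carry an extra character, which would be lost if the field were converted to an integer.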