Sean Forman
Discussion and Future Work
In this thesis, we presented work that is part of a larger,
multiple-investigator project to build an ab initio protein
structure predictor called HOPS. Our contributions included a
technique for selecting the backbone torsion angles that define the
parameters of our search space and a technique for aligning
β-strands into properly formed β-sheets. We provided background on
the protein folding problem, described the techniques we used to
solve these two problems, and gave examples of the algorithms at
work.
Ab initio protein structure prediction is difficult, and our
results reflect this difficulty. We have not yet successfully
predicted a known structure, but we are continuing to improve our
energy function and to speed up the search through our search tree.
The primary strength of our technique is its full-atom
representation. Most techniques significantly simplify the details of
an actual protein. We believe that the computational sophistication of
our search algorithms allows us to accept the added computational
burden of modeling every atom. Additionally, the exhaustive nature of
our search will make it much less susceptible to local minima.
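To make the point about local minima concrete, here is a toy sketch comparing a greedy search, which commits to the best-looking torsion state one residue at a time, with an exhaustive enumeration of every combination of states. The states "A", "B", "C", their energies, and the penalty that state "A" imposes on the residue after it are all hypothetical and are not taken from the HOPS energy function; a real search tree over discrete torsion angles is far larger than this three-residue example.

```python
from itertools import product

# Hypothetical discrete torsion-angle states available to each residue.
STATES = ("A", "B", "C")

# Hypothetical per-residue energy for each state (lower is better).
LOCAL = {"A": 0.0, "B": 1.0, "C": 2.0}


def pair(prev, nxt):
    """Hypothetical coupling term: state "A" clashes with whatever follows it."""
    return 3.0 if prev == "A" else 0.0


def energy(conf):
    """Total energy of a (partial) conformation given as a tuple of states."""
    local = sum(LOCAL[s] for s in conf)
    coupling = sum(pair(a, b) for a, b in zip(conf, conf[1:]))
    return local + coupling


def greedy(n):
    """Extend the chain one residue at a time, keeping only the best partial chain."""
    conf = ()
    for _ in range(n):
        conf = min((conf + (s,) for s in STATES), key=energy)
    return conf


def exhaustive(n):
    """Enumerate every combination of states and return the lowest-energy one."""
    return min(product(STATES, repeat=n), key=energy)


if __name__ == "__main__":
    g, e = greedy(3), exhaustive(3)
    print("greedy:", g, energy(g))      # greedy: ('A', 'A', 'A') 6.0
    print("exhaustive:", e, energy(e))  # exhaustive: ('B', 'B', 'A') 2.0
```

The greedy search picks "A" for the first residue because it has the lowest local energy, then pays the coupling penalty at every later step; only the exhaustive search finds the lower-energy combination. The price is that enumeration grows as len(STATES) ** n, which is why the computational cost of an exhaustive search matters.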
Preliminary tests of the complete technique indicate that we will need
to make certain that we can pack the protein tightly enough. With
discrete torsion angles, we often see predicted structures that are
much looser and more extended than native proteins. This looseness
stems from the discrete torsion angle set: with only a limited set of
angles available, it is difficult to adjust the chain to work around
steric clashes, so compact, clash-free conformations are hard to reach.
With a project as large and varied as this one, there are numerous
areas where new techniques can be tried and incremental improvements
made. The following are some of the improvements to HOPS that could
be made in the future.
- Tweaking and optimism - Optimism is already incorporated into
tweaking (see Section 5.2.4), but we could expand its use further. We
have done some work studying the characteristics of successful and
unsuccessful tweaks and have looked at a series of rules that could be
checked before a tweak is attempted, so that tweaks unlikely to
succeed are skipped. This would reduce computational cost as we fold
proteins.
- Caching of ineffective tweaks - Along similar lines to the first
point, we could build a cache of unsuccessful tweaking events, which
would prevent us from attempting tweaks we have already determined to
be unsuccessful. This would be more difficult to implement than the
clash cache, which deals with local clashes. Storing an unsuccessful
tweak between amino acids 20 or more residues apart would be more
difficult and of less benefit.
- Tweaking and disulfides - Tweaking could also be used to align two
cysteine (Cys) residues so that they form a disulfide bridge.
- Alignment of secondary structures such as sheets and helices - Just
as with disulfide bridges, we could use tweaking to align existing
secondary structures into larger motifs.
- Incremental alignment of a new strand to a previous strand -
Currently, strand propagation occurs when a specific series of
backbone angles is used in two β-strands. With some modifications, we
may be able to build a strand and then fit a mating strand to it,
applying an optimization step as each amino acid of the mating strand
is added. This could produce sheets that are less rigidly shaped than
the ones we produce now.
- Implementing secondary structure information - Using nagging, we
can freeze a portion of a protein in a given conformation and then
fold the protein around that frozen portion. This could work in
concert with secondary structure prediction techniques. For instance,
if some portion of a protein is predicted to contain a helix with
high probability, we could fix that portion as a helix and then fold
the remainder of the protein around it. Nagging is especially well
suited to this, as a nagger could search this restricted space while
the root processor runs the complete search.
- Refinement of the scoring function - While we feel the scoring
function we use captures a significant portion of the forces at work
in protein folding, there is always room for further incremental
improvement.
- Utilization of other techniques - There is a stunning variety of
ad hoc heuristics in use for predicting protein structure.
Comparative modeling techniques (see Section 2.2.1) are just a small
subset. There are numerous techniques for predicting secondary
structure and other characteristics of an unknown target. We are
currently maintaining a "pure" ab initio technique with no inputs
other than the amino acid sequence and basic thermodynamic
characteristics. Our results would likely improve if they were
informed by results from other techniques.
- Choice of cluster data - The quality and number of cluster points
selected for inclusion in our search tree likely have the greatest
effect on the success of our search. While we have spent time
deciding how best to choose the cluster points from a given set of
data, we have spent less time studying how to construct the input
torsion angle data for a given target protein.
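The "Caching of ineffective tweaks" item could be sketched as a set of failed tweaks that is consulted before each new attempt. This is a hypothetical illustration, not HOPS code: the class and method names, the representation of torsions as a per-residue list, and the decision to key each entry on the residue pair plus the torsion states of the segment between them are all assumptions. Only the 20-residue cutoff comes from the text.

```python
class FailedTweakCache:
    """Hypothetical cache of tweaks that have already failed.

    A tweak is identified by the two residues it tries to bring together and
    by the torsion states of the segment between them, since the same tweak
    may succeed once that segment's conformation changes.
    """

    def __init__(self, max_separation=20):
        # Pairs this far apart (or farther) are not cached, following the
        # observation that distant tweaks are harder to store and less useful.
        self.max_separation = max_separation
        self._failed = set()

    def _key(self, i, j, torsions):
        lo, hi = min(i, j), max(i, j)
        return (lo, hi, tuple(torsions[lo:hi + 1]))

    def record_failure(self, i, j, torsions):
        """Remember a failed tweak; return False if the pair is too far apart."""
        if abs(j - i) >= self.max_separation:
            return False
        self._failed.add(self._key(i, j, torsions))
        return True

    def known_failure(self, i, j, torsions):
        """True if this exact tweak already failed in this local conformation."""
        return self._key(i, j, torsions) in self._failed


if __name__ == "__main__":
    torsions = [(-60.0, -45.0)] * 30  # hypothetical (phi, psi) per residue
    cache = FailedTweakCache()
    cache.record_failure(3, 8, torsions)
    print(cache.known_failure(8, 3, torsions))  # True: same pair, same segment
    torsions[5] = (-120.0, 130.0)               # intervening segment changed
    print(cache.known_failure(3, 8, torsions))  # False: worth retrying
```

Keying on the intervening segment keeps the cache conservative: a tweak is skipped only when nothing between the two residues has changed since it last failed.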
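For the "Tweaking and disulfides" item, a tweak needs a geometric target to aim for. One minimal sketch, assuming the criterion is only that the two cysteine SG (sulfur) atoms end up about one S-S bond length apart, uses the typical disulfide S-S distance of roughly 2.05 Å. The 0.15 Å tolerance and the function names are hypothetical, and a fuller criterion would probably also check the bond and dihedral angles around the bridge.

```python
import math

# Approximate S-S bond length in a disulfide bridge, in angstroms.
SS_BOND_LENGTH = 2.05
# Hypothetical acceptance window around that length.
TOLERANCE = 0.15


def disulfide_error(sg1, sg2, target=SS_BOND_LENGTH):
    """Distance from the ideal S-S bond length; a tweak would try to shrink this."""
    return abs(math.dist(sg1, sg2) - target)


def forms_disulfide(sg1, sg2, target=SS_BOND_LENGTH, tol=TOLERANCE):
    """True if two cysteine SG atoms, given as (x, y, z), are close enough to bond."""
    return disulfide_error(sg1, sg2, target) <= tol


if __name__ == "__main__":
    print(forms_disulfide((0.0, 0.0, 0.0), (2.1, 0.0, 0.0)))  # True
    print(forms_disulfide((0.0, 0.0, 0.0), (4.0, 0.0, 0.0)))  # False
```

`math.dist` requires Python 3.8 or later.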
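The "Choice of cluster data" item depends on clustering backbone (φ, ψ) torsion pairs. As an illustration of what that involves, here is a sketch of one common approach (not necessarily the one HOPS uses): k-means adapted to angles, where distances wrap around at ±180° and each cluster center is a circular mean, so that ψ values of 175° and -175° average to about 180° rather than 0°. The function names, the caller-supplied starting centers, and the iteration cap are assumptions.

```python
import math


def ang_diff(a, b):
    """Signed difference a - b in degrees, wrapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def torus_dist2(p, q):
    """Squared distance between two (phi, psi) points on the torus."""
    return sum(ang_diff(x, y) ** 2 for x, y in zip(p, q))


def circular_mean(angles):
    """Mean direction of a list of angles, in degrees within [-180, 180]."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c))


def cluster_torsions(points, init_centers, max_iter=20):
    """k-means on (phi, psi) pairs using wrapped distances and circular means."""
    centers = [tuple(float(x) for x in c) for c in init_centers]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda k: torus_dist2(p, centers[k]))
            groups[nearest].append(p)
        # Update step: move each center to the circular mean of its members;
        # a center with no members stays where it is.
        new_centers = [
            tuple(circular_mean(col) for col in zip(*group)) if group else center
            for center, group in zip(centers, groups)
        ]
        if new_centers == centers:
            break  # assignments no longer change the centers
        centers = new_centers
    return centers


if __name__ == "__main__":
    # Hypothetical (phi, psi) samples: helix-like and strand-like residues.
    data = [(-60, -45), (-65, -40), (-55, -50),
            (-120, 175), (-125, -175), (-115, 170)]
    for center in cluster_torsions(data, init_centers=[(0, 0), (-100, 150)]):
        print(tuple(round(x, 1) for x in center))
```

Using the wrapped difference matters for strand-like residues, whose ψ values straddle ±180°; an ordinary Euclidean k-means would split them into separate clusters or place their center near ψ = 0°.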
The Fifth Critical Assessment of Techniques for Protein Structure
Prediction (CASP V) will take place during 2002. Continuing our work
in this area and implementing new ideas and features should make it
possible to enter HOPS in this assessment.
Without a doubt, the next ten to twenty years will see dramatic
advances in the field of protein structure prediction. Computational
scientists began taking an active interest in biology only recently,
and their collaboration with biologists, chemists, and biochemists
should bear significant fruit very soon. For instance, IBM's Blue
Gene initiative will invest $100 million in protein structure
prediction and, as part of that effort, aims to develop a petaflop
supercomputer within the next seven years [38].
The completion of the Human Genome Project marks the end of the
first step in understanding the processes at work in our bodies. Next,
we will need to determine the structures of the proteins the genome
encodes, the means by which they fold, and, from there, how they
interact and regulate the processes in our bodies [36].
Protein structure prediction is the next step in that progression.
Better algorithms and computers will allow us to begin predicting
protein structures with improved accuracy in the coming decades.
sforman@sju.edu