




Discussion and Future Work

In this thesis, we presented work that is part of a larger, multiple-investigator project to build an ab initio protein structure predictor called HOPS. Our contributions were a technique for selecting the backbone torsion angles that define the parameters of our search space and a technique for aligning $\beta$-strands into properly formed $\beta$-sheets. We provided background on the protein folding problem, described the techniques we used to solve these two problems, and gave examples of the algorithms at work.

Evaluation of HOPS

Ab initio protein structure prediction is difficult, and our results reflect this difficulty. We have not yet successfully predicted a known structure, but we are continuing to improve our energy function and to speed up the way we explore our search tree.

The primary strength of our technique is its full-atom representation. Most techniques significantly simplify the details of an actual protein. We feel that the computational sophistication of our search algorithms lets us accept the added computational burden. Additionally, the exhaustive nature of our search should make us much less susceptible to local minima.

Preliminary tests of the complete technique indicate that we will need to make certain that we can pack the protein tightly enough. We often see predicted structures that are much looser and more extended than native proteins would be. The cause is the discrete torsion angle set, which makes it difficult to work around steric clashes.
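
One simple way to quantify how loose or extended a predicted structure is would be its radius of gyration, the root-mean-square distance of the atoms from their centroid. The sketch below is illustrative only and is not part of HOPS; the function name and the equal-mass treatment of atoms are assumptions made for the example.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def radius_of_gyration(coords: Sequence[Point]) -> float:
    """Unweighted radius of gyration of a set of 3D points.

    Assumption: every atom counts equally (no mass weighting).
    """
    n = len(coords)
    if n == 0:
        raise ValueError("at least one coordinate is required")

    # Centroid of the point set.
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n

    # Mean squared distance from the centroid.
    mean_sq = sum(
        (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords
    ) / n
    return math.sqrt(mean_sq)


# Toy comparison: four points in a straight line versus four points
# on the corners of a unit square.
extended = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
compact = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
```

For two structures with the same number of atoms, the one with the larger value is the more extended. In the toy data above, the straight line has a larger radius of gyration than the square.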

Future Directions in Our Work

With a project as large and varied as this one, there are numerous areas where new techniques can be tried and incremental improvements can be made. Some of the improvements that could be made to HOPS in the future are listed here.

Tweaking and optimism - Optimism is already incorporated into tweaking (see Section 5.2.4), but we could further expand its use. We have done some work on the characteristics of successful and unsuccessful tweaks and have looked at a series of rules that could be applied before a tweak is attempted. Rejecting unpromising tweaks up front would save computation as we fold the proteins.
Caching of ineffective tweaks - Along similar lines to the first point, we could form a cache of unsuccessful tweaking events, which would prevent us from attempting tweaks we have already determined to be unsuccessful. This would be more difficult to implement than the clash cache, which deals with local clashes. Storing an unsuccessful tweak between amino acids that are 20 or more positions apart would be more difficult and of less benefit.
Tweaking and disulfides - Tweaking could also be used in an attempt to align two Cys residues so that they form a disulfide bridge.
Alignment of secondary structures such as sheets and helices - Just as with disulfide bridges, we could use tweaking to align existing secondary structures into larger motifs.
Incremental alignment of a new strand to a previous strand - Currently, strand propagation occurs when a specific series of backbone angles is used in two $\beta$-strands. With some modifications, we may be able to build a strand and then fit a mating strand to the existing one, using optimization techniques as each amino acid of the mating strand is added. This could produce sheets that are less rigidly shaped than the ones we produce now.
Implementing secondary structure information - Using nagging, we have the ability to freeze a portion of a protein in a certain conformation and then fold the rest of the protein around that frozen portion. This could work in concert with secondary structure prediction techniques. For instance, if some portion of a protein were predicted with high probability to contain a helix, we could set that portion as a helix and then fold the remainder of the protein around it. This would be especially effective with nagging, as a nagger could search this restricted space while the root processor runs the complete search.
Refinement of the scoring function - While we feel the scoring function we use captures many of the forces at work in protein folding, there is always the potential for continued incremental improvement.
Utilization of other techniques - There is a stunning variety of ad hoc heuristics in use for predicting protein structure. Comparative modeling techniques (see Section 2.2.1) are just a small subset. There are numerous techniques for predicting secondary structure and other characteristics of an unknown target. We currently maintain a "pure" ab initio technique with no inputs other than the amino acid sequence and basic thermodynamic characteristics. Our results would likely improve if they were informed by results from other techniques.
Choice of cluster data - The quality and number of cluster points selected for inclusion in our search tree likely have the greatest effect on the success of our search. While we have spent time deciding the best way to choose the cluster points when given a set of data, we have not spent as much time studying how to construct the input torsion angle data for a given target protein.
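
As an illustration of the failed-tweak cache proposed in the list above, the following is a minimal sketch and not the HOPS implementation. The class name, the method names, and the choice of cache key (the two residue indices plus the discrete torsion angle indices of the residues between them) are all assumptions made for the example. Following the observation that distant pairs are harder to store and less useful, the sketch simply declines to cache pairs separated by 20 or more residues.

```python
from typing import Sequence, Set, Tuple

# Hypothetical cache key: (lower residue index, higher residue index,
# discrete torsion-angle indices of the residues spanning the pair).
TweakKey = Tuple[int, int, Tuple[int, ...]]


class FailedTweakCache:
    """Remembers tweak attempts that failed (illustrative sketch only)."""

    def __init__(self, max_separation: int = 20) -> None:
        # Pairs this far apart or farther are not cached (assumption
        # based on the 20-residue remark in the text).
        self.max_separation = max_separation
        self._failed: Set[TweakKey] = set()

    @staticmethod
    def _key(i: int, j: int, angles: Sequence[int]) -> TweakKey:
        # Order the residue indices so (i, j) and (j, i) share one entry.
        lo, hi = sorted((i, j))
        return (lo, hi, tuple(angles))

    def record_failure(self, i: int, j: int, angles: Sequence[int]) -> bool:
        """Store a failed tweak; return False if the pair is too far apart."""
        if abs(i - j) >= self.max_separation:
            return False
        self._failed.add(self._key(i, j, angles))
        return True

    def known_failure(self, i: int, j: int, angles: Sequence[int]) -> bool:
        """True if this exact tweak was already recorded as unsuccessful."""
        return self._key(i, j, angles) in self._failed


cache = FailedTweakCache()
cache.record_failure(3, 8, [1, 4, 2, 2, 0, 3])
```

Before attempting a tweak, the search would call `known_failure` with the same residue pair and intervening angles and skip the attempt on a hit. Keying on the intervening angles is what makes long-range entries expensive: the key grows with the separation, and an exact repeat of a long conformation is less likely.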

During 2002, the Fifth Critical Assessment of Techniques for Protein Structure Prediction (CASP V) will take place. Continuing our work in this area and implementing new ideas and features should make entering HOPS into this assessment possible.

Future Directions in the Field

Without a doubt, the next ten to twenty years will see dramatic advances in the field of protein structure prediction. Computational scientists began taking an active interest in biology only recently, and their collaboration with biologists, chemists, and biochemists should bear significant fruit very soon. For instance, IBM's Blue Gene initiative will pour $100 million into protein structure prediction and, as a result, hopes to develop a petaflop supercomputer during the next seven years [38].

The completion of the Human Genome Project signals the end of the first step in understanding the processes at work in our bodies. From there, we will need to find the structures of these molecules, the means by which they form, and then how they interact and regulate the processes in our bodies [36]. Protein structure prediction is the next step in that progression. Better algorithms and computers will allow us to begin predicting protein structures with improved accuracy in the coming decades.


sforman@sju.edu