December 12, 2011
My project involves finding Eulerian paths through a de Bruijn graph created from DNA reads that combine to form a path through the graph where every edge is traveled only once. Each graph contains a single fragment, and all the fragments can be assembled to for the sequence.
A* search is used to find the shortest paths from designated nodes and then on the remaining edges until all edges are exhausted.
The use of the A* search algorithm to assemble DNA reads to create fragments using a heuristic function that calculated the number of edges between the current node and the goal node.
Evaluation of Results
This was an opportunity to study the A* search algorithm that searches for the best path through the de Bruijn graph to the goal node. The first time through all the edges may not be included in the resulting path. Finding a vertex already in the path and setting the goal as the start node. Finding that path and combining it into the path already found. Checking to see if there are any unvisited edges and finally assembling the fragment.
While working on the project, I realized the need for an error correction phase which was included in the edge building phase of the code. The idea is that the removal of erroneous edges eliminates some of the duplication that occur in the fragments.