Chapter 6: Q30E (page 198)

Reconstructing evolutionary trees by maximum parsimony. Suppose we manage to sequence a particular gene across a whole bunch of different species. For concreteness, say there are n species, and the sequences are strings of length k over alphabet $\Sigma = \{A, C, G, T\}$. How can we use this information to reconstruct the evolutionary history of these species?
Evolutionary history is commonly represented by a tree whose leaves are the different species, whose root is their common ancestor, and whose internal branches represent speciation events (that is, moments when a new species broke off from an existing one). Thus we need to find the following:
• An evolutionary tree with the given species at the leaves.
• For each internal node, a string of length k: the gene sequence for that particular ancestor.
For each possible tree T annotated with sequences $s(u) \in \Sigma^k$ at each of its nodes u, we can assign a score based on the principle of parsimony: fewer mutations are more likely.
$\text{score}(T) = \sum_{(u, v) \in E(T)} (\text{number of positions on which } s(u) \text{ and } s(v) \text{ disagree})$
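As an illustration of the score definition, here is a minimal Python sketch. The edge-list and dictionary representation, the function names `hamming` and `score`, and the toy three-node tree are assumptions made for this example; they are not part of the exercise.

```python
def hamming(s, t):
    """Number of positions on which two equal-length strings disagree."""
    if len(s) != len(t):
        raise ValueError("sequences must have the same length")
    return sum(1 for a, b in zip(s, t) if a != b)


def score(edges, seq):
    """Parsimony score: sum of disagreements over all tree edges.

    edges: iterable of (u, v) node-name pairs (hypothetical representation)
    seq:   dict mapping each node name to its sequence string
    """
    return sum(hamming(seq[u], seq[v]) for u, v in edges)


# Hypothetical toy tree: root "r" with two leaf children "x" and "y".
edges = [("r", "x"), ("r", "y")]
seq = {"r": "ACGT", "x": "ACGA", "y": "TCGT"}
print(score(edges, seq))  # one mismatch on each edge, so this prints 2
```

A lower score means fewer implied mutations, so under this definition the most parsimonious annotation is the one that minimizes the sum.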
Finding the highest-score tree is a difficult problem. Here we will consider just a small part of it: suppose we know the structure of the tree, and we want to fill in the sequences s(u) of the internal nodes u. Here’s an example with k=4 and n=5:

(a) In this particular example, there are several maximum parsimony reconstructions of the internal node sequences. Find one of them.
(b) Give an efficient (in terms of n and k ) algorithm for this task. (Hint: Even though the sequences might be long, you can do just one position at a time.)

Short Answer

(a) One of the reconstructions of the internal node sequences:

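(b) Algorithm for the given task, applied to one position at a time. Here δ is the mutation cost of an edge, so it is 1 when the two characters differ:

$\delta(i, j) = \begin{cases} 1 & \text{if } i \neq j \\ 0 & \text{otherwise} \end{cases}$

$L = \text{left-subtree}(T), \quad R = \text{right-subtree}(T)$

$S(T, c) = \min_{a \in \Sigma,\, b \in \Sigma} \left( S(L, a) + S(R, b) + \delta(c, a) + \delta(c, b) \right)$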

Step by step solution

Step 1: Find one of the reconstructions

(a)

Consider the given tree and handle one position of the sequences at a time; treating each position independently removes any ambiguity in building the sequences. Because different choices of internal characters can tie for the minimum number of mutations, several reconstructions are possible. One of them is as follows:

Therefore, one maximum parsimony reconstruction is obtained.

Step 2: Give an efficient algorithm

(b)

Consider one position at a time. Define the state $S(T, c)$, where $c \in \{A, C, G, T\}$ and $T$ is a subtree: $S(T, c)$ is the minimum score of $T$ when its root node is assigned the character $c$ at this position. For a leaf, $S(T, c) = 0$ if $c$ is the leaf's given character and $\infty$ otherwise. For an internal node, with δ the mutation cost of an edge (1 when the two characters differ), the recurrence is:

$\delta(i, j) = \begin{cases} 1 & \text{if } i \neq j \\ 0 & \text{otherwise} \end{cases}$

$L = \text{left-subtree}(T), \quad R = \text{right-subtree}(T)$

$S(T, c) = \min_{a \in \Sigma,\, b \in \Sigma} \left( S(L, a) + S(R, b) + \delta(c, a) + \delta(c, b) \right)$

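Since every internal node has two children, the number of internal nodes is the number of leaves minus 1, that is, n - 1. With 4 possible characters per node, the number of states for one position is O(n).

Because the alphabet has constant size, the minimum over the pairs (a, b) takes O(1) time per state, so one position costs O(n). Repeating this for each of the k positions gives an overall complexity of O(nk).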

Therefore, an efficient algorithm with O(nk) complexity has been obtained.
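The recurrence can be sketched in Python as follows. This is an illustrative implementation under stated assumptions, not the textbook's own code: the tree is assumed to be given as nested pairs with leaf strings, the function names (`solve_position`, `label`, `merge`, `reconstruct`) are hypothetical, and the joint minimum over (a, b) is split into two independent minima, which gives the same value. Here `delta` is the mutation cost, 1 on a mismatch.

```python
ALPHABET = "ACGT"
INF = float("inf")


def delta(a, b):
    """Mutation cost of one edge at one position: 1 on mismatch, else 0."""
    return 0 if a == b else 1


def solve_position(tree, pos):
    """Bottom-up pass for one position; returns (cost_table, child_results).

    tree is a leaf (a string) or a pair (left, right) of subtrees.
    cost_table[c] is S(T, c) for this subtree.
    """
    if isinstance(tree, str):
        return {c: (0 if tree[pos] == c else INF) for c in ALPHABET}, None
    left = solve_position(tree[0], pos)
    right = solve_position(tree[1], pos)
    cost = {}
    for c in ALPHABET:
        # The minimum over (a, b) separates into one minimum per child.
        best_l = min(left[0][a] + delta(c, a) for a in ALPHABET)
        best_r = min(right[0][b] + delta(c, b) for b in ALPHABET)
        cost[c] = best_l + best_r
    return cost, (left, right)


def label(tree, solved, c):
    """Top-down pass: given this node's character c, pick the children's."""
    _, children = solved
    if children is None:
        return c  # at a leaf, c is forced to be the leaf's own character
    out = [c]
    for sub, sub_solved in zip(tree, children):
        a = min(ALPHABET, key=lambda x: sub_solved[0][x] + delta(c, x))
        out.append(label(sub, sub_solved, a))
    return tuple(out)


def merge(per_pos):
    """Combine per-position character trees into one tree of full strings."""
    if isinstance(per_pos[0], str):
        return "".join(per_pos)
    seq = "".join(t[0] for t in per_pos)
    return (seq, merge([t[1] for t in per_pos]), merge([t[2] for t in per_pos]))


def reconstruct(tree, k):
    """Return (minimum score, annotated tree) for sequences of length k."""
    total, per_pos = 0, []
    for pos in range(k):
        solved = solve_position(tree, pos)
        root_char = min(ALPHABET, key=lambda c: solved[0][c])
        total += solved[0][root_char]
        per_pos.append(label(tree, solved, root_char))
    return total, merge(per_pos)


# Hypothetical input: three species with k = 2, tree shape ((AC, AG), TG).
total, annotated = reconstruct((("AC", "AG"), "TG"), 2)
print(total)      # 2
print(annotated)  # ('AG', ('AG', 'AC', 'AG'), 'TG')
```

In the output, each internal node is a triple (sequence, left, right) and each leaf is its original string. Every internal node does a constant amount of work per position, which matches the O(nk) bound; note that the recursion depth equals the tree height, so a very unbalanced tree may need an iterative traversal instead.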
