Exon chaining.Each gene corresponds to a subregion of the overall genome (the DNA sequence); however, part of this region might be “junk DNA.” Frequently, a gene consists of several pieces called exons, which are separated by junk fragments called introns. This complicates the process of identifying genes in a newly sequenced genome.

Suppose we have a new DNA sequence and we want to check whether a certain gene (a string) is present in it. Because we cannot hope that the gene will be a contiguous subsequence, we look for partial matches—fragments of the DNA that are also present in the gene (actually, even these partial matches will be approximate, not perfect). We then attempt to assemble these fragments.

Let x 1Kndenote the DNA sequence. Each partial match can be represented by a triple li,ri,wi, where xliKriis the fragment and is a weight

representing the strength of the match (it might be a local alignment score or some other statistical quantity). Many of these potential matches could be false, so the goal is to find a subset of the triples that are consistent (nonoverlapping) and have a maximum total weight.

Show how to do this efficiently.

Short Answer

Expert verified

Dynamic programming solves the problem in On

Step by step solution

01

Step 1:Explain Exon Chaining

The DNA sequence or genome is written asx1Kn

Each partial matching given represents the starting and the weight of the match.

From these given sets of intervals, the maximum chain of intervals in required.

The greedy solution to this problem repeats the process of taking the chain with the highest scores first and then the largest left in the range until no more chains can be taken. Then sum the taken chains to get the maximum score.

But by constructing a graph, this problem can be solved in Ontime complexity.

02

Step 2:Give Algorithm

Algorithm is as follows,

With a graph GV,E, the algorithm is given as follows:

(G, n)

for i= 1 to 2n

resi=0

for i = 1 to 2n

if viin G corresponds to right end l

jileft end index of vertex for l

wi-weight

resj=maxresj+wi,resj-1

else

resi=resi-1

return res2n

03

Explain the algorithm

Explanation:

The Exon chaining problem can be solved using dynamic programming in a graph.

And that same approach is provided above.

The problem for n interval can be solved using 2nvertices in graph.

Assume that the set of left and right interval ends is sorted into ascending order.

And all positions are definite. This forms an ordered array of vertices.

There are 3n-1edges in the graph. In the algorithm, residenotes the length of the longest path ending at vertex vi in the graph.

Therefore the final solution is res2n

Thus, dynamic programming solves the problem in O (n)

Unlock Step-by-Step Solutions & Ace Your Exams!

  • Full Textbook Solutions

    Get detailed explanations and key concepts

  • Unlimited Al creation

    Al flashcards, explanations, exams and more...

  • Ads-free access

    To over 500 millions flashcards

  • Money-back guarantee

    We refund you if you fail your exam.

Over 30 million students worldwide already upgrade their learning with Vaia!

One App. One Place for Learning.

All the tools & learning materials you need for study success - in one app.

Get started for free

Most popular questions from this chapter

Pebbling a checkerboard. We are given a checkerboard which has 4 rows and ncolumns, and has an integer written in each square. We are also given a set of 2n pebbles, and we want to place some or all of these on the checkerboard (each pebble can be placed on exactly one square) so as to maximize the sum of the integers in the squares that are covered by pebbles. There is one constraint: for a placement of pebbles to be legal, no two of them can be on horizontally or vertically adjacent squares (diagonal adjacency is fine).

  1. Determine the number of legal patterns that can occur in any column (in isolation, ignoring the pebbles in adjacent columns) and describe these patterns.

Call two patterns compatible if they can be placed on adjacent columns to form a legal placement. Let us consider subproblems consisting of the first columns 1kn. Each subproblem can be assigned a type, which is the pattern occurring in the last column.

  1. Using the notions of compatibility and type, give an O(n)-time dynamic programming algorithm for computing an optimal placement.

Optimal binary search trees. Suppose we know the frequency with which keywords occur in programs of a certain language, for instance:

begin5%do40%else8%end4%

if10%then10%while23%

We want to organize them in a binary search tree, so that the keyword in the root is alphabetically bigger than all the keywords in the left subtree and smaller than all the keywords in the right subtree (and this holds for all nodes). Figure 6.12 has a nicely-balanced example on the left. In this case, when a keyword is being looked up, the number of comparisons needed is at most three: for instance, in finding “while”, only the three nodes “end”, “then”, and “while” get examined. But since we know the frequency 196 Algorithms with which keywords are accessed, we can use an even more fine-tuned cost function, the average number of comparisons to look up a word. For the search tree on the left, it is

cost=1(0.04)+2(0.40+0.10)+3(0.05+0.08+0.10+0.23)=2.42

By this measure, the best search tree is the one on the right, which has a cost of Give an efficient algorithm for the following task. Input: n words (in sorted order); frequencies of these words: p1,p2,...,pn.

Output: The binary search tree of lowest cost (defined above as the expected number of comparisons in looking up a word).

Figure 6.12 Two binary search trees for the keywords of a programming language.

Cutting cloth. You are given a rectangular piece of cloth with dimensions X×Y, whereX and Yare positive integers, and a list of products that can be made using the cloth. For each producti[1,n] you know that a rectangle of cloth of dimensionsai×bi is needed and that the final selling price of the product is ci. Assume the,ai biandci are all positive integers. You have a machine that can cut any rectangular piece of cloth into two pieces either horizontally or vertically. Design an algorithm that determines the best return on theX×Y piece of cloth, that is, a strategy for cutting the cloth so that the products made from the resulting pieces give the maximum sum of selling prices. You are free to make as many copies of a given product as you wish, or none if desired.

Consider the following 3-PARTITION problem. Given integersa1,...,an, we want to determine whether it is possible to partition of {1,...,n} into three disjoint subsets I,J,Ksuch that

aiiI=ajjJ=akkk=13aii1 .

For example, for input(1,2,3,4,4,5,8) the answer is yes, because there is the partition(1,8),(4,5),(2,3,4). On the other hand, for input(2,2,3,5) the answer is no. Devise and analyze a dynamic programming algorithm3-PARTITION for that runs in time polynomial in n and in Σiai.

You are given a string of n characters s[1...n], which you believe to be a corrupted text document in which all punctuation has vanished (so that it looks something like “itwasthebestoftimes...”). You wish to reconstruct the document using a dictionary, which is available in the form of a Boolean function dict(.): for any string w,

dict(w)={trueifwisavalidwordfalseotherwise

Give a dynamic programming algorithm that determines whether the string s[.]can be reconstituted as a sequence of valid words. The running time should be at mostO(n2) , assuming calls to dict take unit time.

In the event that the string is valid, make your algorithm output the corresponding sequence of words.

See all solutions

Recommended explanations on Computer Science Textbooks

View all explanations

What do you think about this solution?

We value your feedback to improve our textbook solutions.

Study anywhere. Anytime. Across all devices.

Sign-up for free