Sequence alignment. When a new gene is discovered, a standard approach to understanding its function is to look through a database of known genes and find close matches. The closeness of two genes is measured by the extent to which they are aligned. To formalize this, think of a gene as being a long string over an alphabet ={A,C,G,T}. Consider two genes (strings) x=ATGCCand y=TACGCA. An alignment of x and y is a way of matching up these two strings by writing them in columns, for instance:

A-T-GCCTA-CGC

Here the “_” indicates a “gap.” The characters of each string must appear in order, and each column must contain a character from at least one of the strings. The score of an alignment is specified by a scoring matrixδof size (+1)×(+1), where the extra row and column are to accommodate gaps. For instance the preceding alignment has the following score:

δ(-T)+δ(A,A)+δ(T,-)+δ(G,G)+δ(C,C)+δ(C,A)

Give a dynamic programming algorithm that takes as input two strings X[1K n] and Y {1K m} and a scoring matrix δand returns the highest-scoring alignment. The running time should be O(mn) .

Short Answer

Expert verified

The dynamic algorithm that runs in O(nm) time has been obtained.

Step by step solution

01

Explain the given problem

Consider the two strings x and y with the length n and m respectively. Define a 2-Dimension array or a matrix that stores the score of aligning the string. Each value of the matrix signifies the score of aligning the sequence .

02

Defining the Recursive Relation

Based on he given condition, The matrix should be constructed using the following condition:

  • Align the strings as it is in input but omit the last element of these strings and then align characters xpand yp
  • The recurrence will be:Si,j=maxSi-1,j+δxi,-Si,j-1,j+δ-,yiSi-1,j-1+0δxi,yj
03

Analysis of the Recurrence Relation

Since string x1x2x3Kxphas n characters and string y1y2y3kyqhas characters, The recursion relation will run up to n*mtimes. Hence time complexity is: O (nm).

Therefore, the dynamic algorithm that runs in O (nm) time has been obtained.

Unlock Step-by-Step Solutions & Ace Your Exams!

  • Full Textbook Solutions

    Get detailed explanations and key concepts

  • Unlimited Al creation

    Al flashcards, explanations, exams and more...

  • Ads-free access

    To over 500 millions flashcards

  • Money-back guarantee

    We refund you if you fail your exam.

Over 30 million students worldwide already upgrade their learning with Vaia!

One App. One Place for Learning.

All the tools & learning materials you need for study success - in one app.

Get started for free

Most popular questions from this chapter

Exon chaining.Each gene corresponds to a subregion of the overall genome (the DNA sequence); however, part of this region might be “junk DNA.” Frequently, a gene consists of several pieces called exons, which are separated by junk fragments called introns. This complicates the process of identifying genes in a newly sequenced genome.

Suppose we have a new DNA sequence and we want to check whether a certain gene (a string) is present in it. Because we cannot hope that the gene will be a contiguous subsequence, we look for partial matches—fragments of the DNA that are also present in the gene (actually, even these partial matches will be approximate, not perfect). We then attempt to assemble these fragments.

Let x 1Kndenote the DNA sequence. Each partial match can be represented by a triple li,ri,wi, where xliKriis the fragment and is a weight

representing the strength of the match (it might be a local alignment score or some other statistical quantity). Many of these potential matches could be false, so the goal is to find a subset of the triples that are consistent (nonoverlapping) and have a maximum total weight.

Show how to do this efficiently.

You are given a convex polygon P on n vertices in the plane (specified by their x and y coordinates). A triangulation of P is a collection of n-3diagonals of such that no two diagonals intersect (except possibly at their endpoints). Notice that a triangulation splits the polygon’s interior into n-2 disjoint triangles. The cost of a triangulation is the sum of the lengths of the diagonals in it. Give an efficient algorithm for finding a triangulation of minimum cost. (Hint: Label the vertices of P by 1,....,n, starting from an arbitrary vertex and walking clockwise. For 1i<jn, let the subproblem A(i,j)denote the minimum cost triangulation of the polygon spanned by vertices i,i+1,...,j.).

A contiguous subsequence of a list Sis a subsequence made up of consecutive elements of S. For instance, if Sis 5,15,30,10,5,40,10

then15,30,10 is a contiguous subsequence but5,15,40 is not. Give a linear-time algorithm for the following task:Input: A list of numbers a1,a2,...,an.

Output: The contiguous subsequence of maximum sum (a subsequence of length zero has sum zero).For the preceding example, the answer would be 10,5,40,10, with a sum of 55. (Hint: For each j{1,2,...,n}, consider contiguous subsequences ending exactly at position j.)

Consider the following variation on the change-making problem (Exercise 6.17): you are given denominations x1,x2,...,xn, and you want to make change for a value v, but you are allowed to use each denomination at most once. For instance, if the denominations are 1,5,10,20,then you can make change for 16=1+15and for 31=1+10+20but not for 40(because you can’t use 20 twice).

Input: Positive integers; x1,x2,...,xnanother integer v.

Output: Can you make change for v, using each denominationxi at most once?Show how to solve this problem in time O(nV).

The garage sale problem (courtesy of Professor Lofti Zadeh). On a given Sunday morning, there are n garage sales going on, g1,g2,g3............gn. For each garage sale gj, you have an estimate of its value to you, vj. For any two garage sales you have an estimate of the transportation cost dijof getting from gito gj. You are also given the costs d0jand dj0of going between your home and each garage sale. You want to find a tour of a subset of the given garage sales, starting and ending at home, that maximizes your total benefit minus your total transportation costs. Give an algorithm that solves this problem in time O(n22n).

(Hint: This is closely related to the traveling salesman problem.)

See all solutions

Recommended explanations on Computer Science Textbooks

View all explanations

What do you think about this solution?

We value your feedback to improve our textbook solutions.

Study anywhere. Anytime. Across all devices.

Sign-up for free