Local sequence alignment. Often two DNA sequences are significantly different, but contain regions that are very similar and are highly conserved. Design an algorithm that takes an input two strings x[1Kn]and y[1Km]and a scoring matrix δ(as defined in Exercise 6.26), and outputs substrings x'andy'of x and y respectively, that have the highest-scoring alignment over all pairs of such substrings. Your algorithm should take time O(mn).

Short Answer

Expert verified

The complexity of the program is Omn

Step by step solution

01

Local Sequence Alignment

The total scoring alignment is the cost of editing the strings using insertion, deletion, or gap penalties.

Suppose the given strings are x=x1,x2,Kxnandy=y1,y2,Kym.

The first step is to determine the similarity score of the elementsδa,b and the gap penalty of length k.

In the next step the first row and column of the scoring matrix M of size role="math" localid="1658918665081" n+1times m+1to zero Then in the next step, the scoring matrix is filled.

Then tracing back from the highest score to zero in the scoring matrix gives the best alignment.

02

Step 2:Give Algorithm

Algorithm:

The algorithm can be written as given below:

δa,b- score

Gk- gap penalty of length k

Mn+1m+1- scoring matrix

for o=0ton

M01=0

for o=0 to m

M10=0

for o=1 to n

for p=1 to n

Mop=max1Mo-1p-1+δao,bpmaxk1,Mo-kp-Gkmaxl1,Mop-1-Gk

Traceback from highest alignment score to 0.

03

Step 3:Explain Algorithm

Explanation:

Mopis a optimal score of aligning.

There are only a polynomial number of subproblems.

Every subproblems can be solved easily by solving smaller subproblems.

See, in step-7 we have three cases. first case is xo=ypsecond case is, xo aligns to a gap and, third case is ypaligns to a gap.

The calculated scoring matrix is of size

So, the complexity of the program is Onmo=

Unlock Step-by-Step Solutions & Ace Your Exams!

  • Full Textbook Solutions

    Get detailed explanations and key concepts

  • Unlimited Al creation

    Al flashcards, explanations, exams and more...

  • Ads-free access

    To over 500 millions flashcards

  • Money-back guarantee

    We refund you if you fail your exam.

Over 30 million students worldwide already upgrade their learning with Vaia!

One App. One Place for Learning.

All the tools & learning materials you need for study success - in one app.

Get started for free

Most popular questions from this chapter

Reconstructing evolutionary trees by maximum parsimony. Suppose we manage to sequence a particular gene across a whole bunch of different species. For concreteness, say there are n species, and the sequences are strings of length k over alphabet={A,C,G,T}. How can we use this information to reconstruct the evolutionary history of these species?

Evolutionary history is commonly represented by a tree whose leaves are the different species, whose root is their common ancestor, and whose internal branches represent speciation events (that is, moments when a new species broke off from an existing one). Thus we need to find the following:

• An evolutionary tree with the given species at the leaves.

• For each internal node, a string of length K: the gene sequence for that particular ancestor.

For each possible tree T annotated with sequencess(u)kat each of its nodes , we can assign a score based on the principle of parsimony: fewer mutations are more likely.

localid="1659249441524" score(T)=(u.v)E(T)(numberofpositionsonwhichs(u)ands(v)disagree)

Finding the highest-score tree is a difficult problem. Here we will consider just a small part of it: suppose we know the structure of the tree, and we want to fill in the sequences s(u) of the internal nodes u. Here’s an example with k=4 and n=5:


(a) In this particular example, there are several maximum parsimony reconstructions of the internal node sequences. Find one of them.

(b) Give an efficient (in terms of n and k ) algorithm for this task. (Hint: Even though the sequences might be long, you can do just one position at a time.)

Exon chaining.Each gene corresponds to a subregion of the overall genome (the DNA sequence); however, part of this region might be “junk DNA.” Frequently, a gene consists of several pieces called exons, which are separated by junk fragments called introns. This complicates the process of identifying genes in a newly sequenced genome.

Suppose we have a new DNA sequence and we want to check whether a certain gene (a string) is present in it. Because we cannot hope that the gene will be a contiguous subsequence, we look for partial matches—fragments of the DNA that are also present in the gene (actually, even these partial matches will be approximate, not perfect). We then attempt to assemble these fragments.

Let x 1Kndenote the DNA sequence. Each partial match can be represented by a triple li,ri,wi, where xliKriis the fragment and is a weight

representing the strength of the match (it might be a local alignment score or some other statistical quantity). Many of these potential matches could be false, so the goal is to find a subset of the triples that are consistent (nonoverlapping) and have a maximum total weight.

Show how to do this efficiently.

A certain string-processing language offers a primitive operation which splits a string into two pieces. Since this operation involves copying the original string, it takes n units of time for a string of length n, regardless of the location of the cut. Suppose, now, that you want to break a string into many pieces. The order in which the breaks are made can affect the total running time. For example, if you want to cut a 20-character string at positions 3 and 10, then making the first cut at position 3 incurs a total cost of 20+17=37, while doing position first has a better cost of 20+17=37.

Give a dynamic programming algorithm that, given the locations of m cuts in a string of length , finds the minimum cost of breaking the string into m +1 pieces.

Given an unlimited supply of coins of denominations, we wish to make change for a value ; that is, we wish to find a set of coins whose total value is . This might not be possible: for instance, if the denominations are and 10 then we can make change for 15 but not for 12. Give an dynamic-programming algorithm for the following problem.Input:,; .Question: Is it possible to make change for using coins of denominations ?

Sequence alignment. When a new gene is discovered, a standard approach to understanding its function is to look through a database of known genes and find close matches. The closeness of two genes is measured by the extent to which they are aligned. To formalize this, think of a gene as being a long string over an alphabet ={A,C,G,T}. Consider two genes (strings) x=ATGCCand y=TACGCA. An alignment of x and y is a way of matching up these two strings by writing them in columns, for instance:

A-T-GCCTA-CGC

Here the “_” indicates a “gap.” The characters of each string must appear in order, and each column must contain a character from at least one of the strings. The score of an alignment is specified by a scoring matrixδof size (+1)×(+1), where the extra row and column are to accommodate gaps. For instance the preceding alignment has the following score:

δ(-T)+δ(A,A)+δ(T,-)+δ(G,G)+δ(C,C)+δ(C,A)

Give a dynamic programming algorithm that takes as input two strings X[1K n] and Y {1K m} and a scoring matrix δand returns the highest-scoring alignment. The running time should be O(mn) .

See all solutions

Recommended explanations on Computer Science Textbooks

View all explanations

What do you think about this solution?

We value your feedback to improve our textbook solutions.

Study anywhere. Anytime. Across all devices.

Sign-up for free