Chapter 6: 20 E (page 169)

Optimal binary search trees. Suppose we know the frequency with which keywords occur in programs of a certain language, for instance:
$\begin{array}{lr} \text{begin} & 5\% \\ \text{do} & 40\% \\ \text{else} & 8\% \\ \text{end} & 4\% \\ \text{if} & 10\% \\ \text{then} & 10\% \\ \text{while} & 23\% \end{array}$
We want to organize them in a binary search tree, so that the keyword in the root is alphabetically larger than all the keywords in the left subtree and smaller than all the keywords in the right subtree (and this holds for all nodes). Figure 6.12 has a nicely-balanced example on the left. In this case, when a keyword is being looked up, the number of comparisons needed is at most three: for instance, in finding “while”, only the three nodes “end”, “then”, and “while” get examined. But since we know the frequency with which keywords are accessed, we can use an even more fine-tuned cost function, the average number of comparisons to look up a word. For the search tree on the left, it is
$cost = 1 (0.04) + 2 (0.40 + 0.10) + 3 (0.05 + 0.08 + 0.10 + 0.23) = 2.42$
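The arithmetic above can be checked with a short Python sketch. The depths are an assumption read off the cost formula and the "while" lookup described in the text, since the figure itself is not reproduced here; the names `freq`, `depth`, and `expected_comparisons` are illustrative only.

```python
# Keyword frequencies from the table above, written as probabilities.
freq = {
    "begin": 0.05, "do": 0.40, "else": 0.08, "end": 0.04,
    "if": 0.10, "then": 0.10, "while": 0.23,
}

# Depth of each keyword in the balanced tree (the root has depth 1).
# Assumption: reconstructed from the cost formula, not copied from the figure.
depth = {
    "end": 1,
    "do": 2, "then": 2,
    "begin": 3, "else": 3, "if": 3, "while": 3,
}


def expected_comparisons(freq, depth):
    """Average number of comparisons: sum of frequency * depth over all keys."""
    return sum(freq[word] * depth[word] for word in freq)


balanced_cost = expected_comparisons(freq, depth)
print(round(balanced_cost, 2))  # prints 2.42
```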
By this measure, the best search tree is the one on the right, which has a cost of 2.18.
Give an efficient algorithm for the following task.
Input: $n$ words (in sorted order); frequencies of these words: $p_{1}, p_{2}, \ldots, p_{n}$.
Output: The binary search tree of lowest cost (defined above as the expected number of comparisons in looking up a word).
Figure 6.12 Two binary search trees for the keywords of a programming language.

Short Answer

To obtain the minimum-cost binary search tree, we have to compare the costs of the candidate trees, but we do not need to enumerate them one by one. Dynamic programming fits because the problem decomposes into subproblems: once a root is chosen, its left and right subtrees must themselves be minimum-cost trees over their ranges of words, so the best cost for each contiguous range of words can be computed once and reused.

Step by step solution

Binary Search Tree (BST) and Dynamic programming approach.

A binary search tree has the following properties:

The left subtree of a node contains only nodes with keys smaller than the node's key.
The right subtree of a node contains only nodes with keys greater than the node's key.
The left and right subtrees must themselves be binary search trees.
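The three properties can be checked recursively. The sketch below is one possible way to do it, assuming a tree is stored as a nested tuple `(left, key, right)` with `None` for an empty subtree; that representation and the names `is_bst` and `leaf` are assumptions made for illustration. The `balanced` tree is the left-hand tree as described in the text.

```python
def is_bst(node, low=None, high=None):
    """Return True if `node` satisfies the binary search tree properties.

    A node is either None (empty subtree) or a tuple (left, key, right).
    `low` and `high` are exclusive bounds inherited from the ancestors.
    """
    if node is None:
        return True
    left, key, right = node
    if low is not None and key <= low:
        return False
    if high is not None and key >= high:
        return False
    # Keys on the left must stay below `key`, keys on the right above it.
    return is_bst(left, low, key) and is_bst(right, key, high)


def leaf(key):
    """Build a node with no children."""
    return (None, key, None)


balanced = (
    (leaf("begin"), "do", leaf("else")),
    "end",
    (leaf("if"), "then", leaf("while")),
)
swapped = (leaf("while"), "end", leaf("begin"))  # violates the ordering

print(is_bst(balanced), is_bst(swapped))  # prints True False
```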

Dynamic programming examines every possible choice, so it takes more time than a greedy method. In return, for a problem like this one, where an optimal tree is built from optimal subtrees, it is guaranteed to find the optimal answer. Each distinct subproblem is computed only once: its result is stored in a table and reused whenever it is needed again.

Defining Recurrence Relation

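Here we need to define two functions:

$W (i, j)$: the sum of the frequencies of the words $i$ through $j$, that is, of all the nodes in the subtree built on them.
$E (i, j)$: the minimum cost of a binary search tree on the words $i$ through $j$, obtained by picking the root $r$ that splits them into the two cheapest subtrees.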

Our recurrence relations are:

$\begin{array}{l} E (i, j) = \min\limits_{i \leq r \leq j} \{E (i, r - 1) + E (r + 1, j)\} + W (i, j) \\ W (i, j) = W (i, j - 1) + p_{j} \\ E (i, i - 1) = 0, \quad W (i, i - 1) = 0 \end{array}$

In $E (i, j)$, the left subtree holds the words $i$ to $r - 1$ and the right subtree holds the words $r + 1$ to $j$.

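$W (i, j)$ accounts for merging the two subtrees under the root $r$: every word in them moves one level deeper, and the root itself costs one comparison, so the total cost grows by the sum of all frequencies from $i$ to $j$.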
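As a small worked check, take only the first two keywords, begin ($p_{1} = 0.05$) and do ($p_{2} = 0.40$). The calculation assumes that an empty range costs nothing, $E (i, i - 1) = 0$, and that each candidate root $r$ contributes $E (i, r - 1) + E (r + 1, j) + W (i, j)$, so $E (1, 1) = 0.05$ and $E (2, 2) = 0.40$:

$\begin{array}{l} W (1, 2) = p_{1} + p_{2} = 0.45 \\ r = 1: \; E (1, 0) + E (2, 2) + W (1, 2) = 0 + 0.40 + 0.45 = 0.85 \\ r = 2: \; E (1, 1) + E (3, 2) + W (1, 2) = 0.05 + 0 + 0.45 = 0.50 \\ E (1, 2) = \min \{0.85, 0.50\} = 0.50 \end{array}$

Choosing do as the root is cheaper because the more frequent keyword then sits at depth 1: $1 (0.40) + 2 (0.05) = 0.50$.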

Algorithm

$p [1 \dots n]$ is the array of frequencies of the $n$ words.

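$\begin{array}{l} \text{for } (i = 1 \text{ to } n + 1) \\ \quad E (i, i - 1) = 0, \; W (i, i - 1) = 0 \\ \text{for } (s = 1 \text{ to } n) \\ \quad \text{for } (i = 1 \text{ to } n - s + 1) \\ \quad \quad j = i + s - 1 \\ \quad \quad W (i, j) = W (i, j - 1) + p [j] \\ \quad \quad E (i, j) = \infty \\ \quad \quad \text{for } (r = i \text{ to } j) \\ \quad \quad \quad c = E (i, r - 1) + E (r + 1, j) + W (i, j) \\ \quad \quad \quad \text{if } (c < E (i, j)) \\ \quad \quad \quad \quad E (i, j) = c, \; root (i, j) = r \\ \text{return } E (1, n) \end{array}$

The optimal tree itself is recovered from $root (1, n)$ by recursing on the ranges $(i, r - 1)$ and $(r + 1, j)$. The three nested loops give $O (n^{3})$ running time, and the tables take $O (n^{2})$ space.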
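A runnable Python sketch of this bottom-up computation follows. It implements the rule "cost of a range = cheapest left cost plus right cost over all roots, plus the total frequency of the range", and it makes several choices that are assumptions rather than part of the original solution: 0-based half-open ranges `words[i:j]` instead of 1-based $(i, j)$, prefix sums in place of a separate $W$ table, trees stored as nested tuples `(left, word, right)`, and the hypothetical names `optimal_bst`, `cost`, and `root`.

```python
def optimal_bst(words, p):
    """Return (cost, tree) for a cheapest BST over the sorted `words`.

    p[k] is the frequency of words[k]. The tree is a nested tuple
    (left, word, right), with None standing for an empty subtree.
    """
    n = len(words)

    # Prefix sums: the total frequency of words[i:j] is pre[j] - pre[i].
    pre = [0.0] * (n + 1)
    for k in range(n):
        pre[k + 1] = pre[k] + p[k]

    # cost[i][j]: minimum expected comparisons for words[i:j] (half-open).
    # Empty ranges cost[i][i] stay 0, which serves as the base case.
    cost = [[0.0] * (n + 1) for _ in range(n + 1)]
    root = [[None] * (n + 1) for _ in range(n + 1)]

    for size in range(1, n + 1):          # solve short ranges first
        for i in range(n - size + 1):
            j = i + size
            # Try every word in the range as the root; keep the cheapest.
            best, best_r = min(
                (cost[i][r] + cost[r + 1][j], r) for r in range(i, j)
            )
            cost[i][j] = best + (pre[j] - pre[i])
            root[i][j] = best_r

    def build(i, j):
        """Rebuild the tree for words[i:j] from the recorded roots."""
        if i == j:
            return None
        r = root[i][j]
        return (build(i, r), words[r], build(r + 1, j))

    return cost[0][n], build(0, n)


words = ["begin", "do", "else", "end", "if", "then", "while"]
p = [0.05, 0.40, 0.08, 0.04, 0.10, 0.10, 0.23]
best_cost, tree = optimal_bst(words, p)
print(round(best_cost, 2), tree[1])  # prints 2.18 do
```

On the keyword frequencies from the exercise, the sketch places "do" at the root with "begin" as its only left descendant. Ties between equally cheap roots are broken toward the smaller index, which is an arbitrary choice.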