Dynamic Programming (DP)

Synopsis

DP is a problem solving strategy based on successive decomposition, that is a problem is solved by imbedding it - via decompostion - in a family of related problems and solving the "relation" between the solutions to these problems. Typically this involves translating a problem into a functional equation and then solving this equation.

Although the basic strategy underlying DP's appraoch to problem solving is old, its modern idiom was formulated in the early 1950s by Richard Bellman - the Father of DP. Bellman's formulation is characterised by two features:

In the OR/MS world DP is regarded primarily as an optimization method and is used as such. However, DP is also used extensively in the context of problems that are not optimization problems.

DP's approach to problem solving is regarded by many more of an Art than Science.

Keywords

Principle of optimality, sequential decision process, functional equation, state, stage, transitions, policy, curse of dimensionality, Richard Bellman

References

Resources

Links

Contributor

Moshe Sniedovich
Department of Mathematics and Statistics
The University of Melbourne
Parkville, VIC 3052
Australia
E-mail: m.sniedovich@ms.unimelb.edu.au
WWW: http://www.ms.unimelb.edu.au/~moshe/


Body

Draft, in preparation .... and subject to changes ...

Table of contents:

Introduction

Dynamic programming (DP) is an approach to problem solving based on successive decomposition. Using DP, you solve a problem by decomposing it into sub-problems and then investigate the relationship between the solutions to these sub-problems. Typically the relationship between the solutions to these problems is expressed as a functional equation, that is an equation with an unknown function.

Sometime it is easy to solve the functional equation, sometime it is difficult, and sometime it is impossible.

The good news is that it is not too difficult to formulate a recipe for solving problem the DP way> In fact, here is one:

The bad news - most students will verify this - is that it is not easy to implement this recipe. Indeed, there is a general agreement that DP is more Art than Science. One manifestation of this is that there is no such thing as a "general purpose DP software package".

We should hasten to stress in this regard that on the face of it this is rather surprising because most human beings use DP (without realising this!) routinely. The difficulty arises when we have to use DP to solve problems that we are not familiar with.

Our first goal is to illustrate how DP solves a number of simple problems. Then we shall refine our recipe and consider some of the more technical aspects of the method.

Examples

The examples we provide in this section are intended to illustrate how you can use the recipe given earlier as a general guide. Each example is designed to illustrate a specific point.

The first point about the recipe (and DP) we wish to illustrate is that it indeed represent an approach to problem solving that we all use regularly.


Example 1

We are given a list of numbers and our mission is to compute the sum of the elements of this list. Let us call the list say x and let x[j] denote the j-th element of the list. Also, let us assume that the list consists of n element so that j=1,2,...,n. . For instance, consider the case where x = (-1,2,-3,4,5) thus n=5.

We are instructed by the recipe to regard the problem we wish to solve as a sequential or multistage problem. This is easy: let us assume that the elements if the list are spread all over the world (say Melbourne, Tel Aviv, Paris, New York, Pretoria) and that we compute the sum of these elements as we go from one city to the other.

This view of the problem can be represented by the following table

jCityNumber

1Melbourne -1
2Tel Aviv 2
3Paris -3
4New York 4
5Pretoria 5

sum ?

Next, our task is to decompose the problem into subproblems. Obviously in anticipation to the next instruction, we shall use a decomposing scheme that will yield subproblems whose solutions till be related to the solution of the original problem.

This is not difficult because we know that our original problem (computing the sum of the 5 elements of x) is strongly related to the subproblem of computing the sum of the first 4 elements of x. This in turn, is strongly related to the subproblem of computing the sum of the first 3 elements of x. and so on.

The keyword here is then partial sums. So let us define

f(j) := sum of the first j elements of x, j=1,2,3,4,5.

and adjust the tabular view of the problem:

jCityNumber   f(j)

1Melbourne -1?
2Tel Aviv 2?
3Paris -3?
4New York 4?
5Pretoria 5?

So, our strategy is to compute the total sum - which by definition is equal to f(5) - by iteratively computing the partial sums of the elements of the list.

To accomplish this we observe that from the definition of f(j) it follows that

f(1)

=

x[1]

(1)

f(j)

=

f(j-1) + x[j] , j=2,3,4,5

This is the DP functional equation for the model we have built for our problem. Our object of desire, f, is regarded as a function and we know that this function satisfies the equation given in (1).

We solve the functonal equation for j=2,3,4,5 - in this order, namely we set f(1) = x[1] and then use (1) to compute f(j) for j=2, then for j=2, and so on. The following is a summary of the process:

jCityNumber   f(j)

1Melbourne -1-1
2Tel Aviv 21
3Paris -3-2
4New York 42
5Pretoria 57

We conclude then that the sum of the elements of the given list is equal to 7.

End of Example 1

The second example is a bit more difficult - but just a bit. It is intendent to stress the fact that the functional equation of DP does not have to be algebraic or numeric in natrure.

Example 2

Suppose that we have a container of volume V and a collection of n items, call them j=1,2,3,...,n. Let v(j) denote the volume of item j.

Your mission is to determine whether there is a sub-collection of this items that will fill the container to the brim. We are not particularly interested in what this sub-collection is, what items it will include. We just want to know whether the answer is "yes" or "no".

And just in case you need a concrete example, here is one:

Item
j

  Volume  
v(j)


1

38

2

23

3

52

4

18

5

49

6

17

7

27

8

14


Container

189

It is possible to fill the container to the brim with a subset of the items?

It is not difficult to cast this problem as a sequential decision problem: we go through the items, one by one, and decide whether or not to place the item in the container.

It is also not difficult to imbed the original problem in a family of related problems. In fact all we have to do is generalise a bit the problem by regarding V - the volume of the container - as a parameter. In any case, consider the following family of problems:

Given a collection of items whose volumes are known, can we fill to the brim a container of volume z, where z is in the interval [0,V] for some given positive number V?

To derive a functional equation for this family of problems, let Problem P(z) denote the above problem for the given value of z, and let f(z) denote the solution to Problem P(z). That is let

f(z)

:=

"yes"

if and only if it is possible to fill to the brim a container of volume z

and

f(z)

:=

"no"

if and only if it is not possible to fill to the brim a container of volume z.

The key observation is this: f(z) is equal to "yes" if and only if there is an item of volume y such that f(z-y)="yes". In words, the only way we can fill a container of volume z to the brim is to fill a (virtual) container of volume z-y to the brim and add an item of volume y to the container.

In short, the functional equation is as follows: f(0) = "yes" and for z > 0 we have

f(z)

=

"yes"

if and only if f(z-v(j)) = "yes" for some item j.

and

f(z)

=

"no"

if and only if f(z-v(j)) = "no" for all items j=1,2,...,n.

Observe that by inspection

f(v(j))

=

"yes"

for all items j=1,2,...,n

and

f(v(j))

=

"no"

for all z in the open interval (0,v(j*))

where j* denotes the item with the smallest volume.

Hence the functional equation can be easily (but perhaps a bit tediously) solved for any positive value of z.

End of Example 2

As indicated in the synopsis, in OR/MS DP is used primarily as an optimization method. Our third example is designed to illustrate this point.

The problem itself is a simple version of the shortest path problem - one of the most famous problems in the OR/MS Universe.

Example 3

Suppose that you are given a set of cities (call them j=1,2,...,n) and a table specifying the travel time (hrs) between cities, call them t(i,j), that is let t(i,j) denote the travel time from city i to city j. You are also given a pair of cities one reffered to as the Home city the other as the Destination. Your mission is to determine the best way to travel from the Home city to the Destination, where best means "as fast as possible". It is assumed that the travel times are additive: the travel time from city A to city C via city B is equal to the sum of the travel time from city A to city B and the travel time from city B to city C.

For simplicity and without loss of generality let j=1 be the Home city and let j=n be the Destination.

The following table represents a concrete instance of this generic problem.

1 2 3 4 5 6
1 - 5 6 3 14 15
2 15 - 3 6 15 14
3 13 16 - 2 7 31
4 11 16 14 - 3 2
5 1 7 12 15 - 6
6 3 2 5 4 7 -

The travel times ( t(i,j) ) are read from the table in the usual manner, that is t(i,j) is the entry at the intesection of row i and column j.

What is the shortest travel time from city 1 to city 6?

The sequential nature of the problem is crystal clear: think about it as if at each city you visit you decide on which city should be visited next.

The decompostion scheme is also intuitive in nature: to get to the destination from the home city we may have to visit some intermediate cities. The question therefore naturally arises: what is the best way to reach city j from the Home city - for an arbitraty city j ?

To derive a functional equation for this problem, let

f(j)

=

shortest travel time from the home city to city j, j=1,2,...,n

and

f(j;i)

=

shortest travel time from the home city to city j, given that city j is visited immediately after city i.

Then clearly

f(j)

=

min {f(j;i) : i in D(j)} , j=2,3,...,n

where D(j) is the set of all cities except city j. This is so because by the nature of the problem we must visit some city (call it i) just before we reach city j.

Upon reflection, we observe that it is not difficult to determine the value of i for which f(j) = f(j;i) as by definition f(j;i) is equal to the sum of two things:

But because we wish to make this sum as small as possible, it is obvious that in this case it would be best to reach city i as fast as possible. Thus,

f(j;i)

=

t(i,j) + f(i)

hence

f(j)

=

min {t(i,j) + f(i): i in D(j)}

, j=2,3,...,n

with f(1) := 0.

So far so good: we have formulated a simple functional equation for the shortest path problem.

Solving this equation, however, is a totally different matter. Depending on certain properties of the problem (eg whether there are cycles in the underlying network) the task of solving the functional equation can be simple, complicated or even very complicated.

Sequential Decision Processes

To be in a DP mood, regard your problem as if it constitutes a sequential decision process. You task is to make a number of decisions in a sequential manner. The conceptual framework is as follows: you observe a system whose initial state, call it s(1), is known. You then make a decision, call it x(1), whereupon the system changes its state to some state s(2)=T(s(1),x(1)) dictated by the transition function T. You then make your second decision, x(2), whereupon the system changes its state to s(3)=T(s(2),x(2)). This process continues: after a number of iterations the system will be in state s(n) and you'll make your n-th decision, x(n), whereupon the system will change its state to s(n+1)=T(s(n),x(n)).

This process can either continues indefinitely, or it may terminate when the system reaches some final state s(k).

The important thing about this process is that the state of the system after n decisions, namely s(n+1), is uniquely determined by the state observed just prior to s(n+1), namely s(n), and the decision made at that state, namely x(n). That is, s(n)=T(x(n-1),s(n-1)).

More to come .....


Disclaimer:This page, its contents and style, are the responsibility of the author (Moshe Sniedovich, Department of Mathematics and Statistics) and do not represent the views, policies or opinions of The University of Melbourne.