Thinking Functionally with Haskell (29 page)

BOOK: Thinking Functionally with Haskell
4.99Mb size Format: txt, pdf, ePub

The accumulating parameter method yields

labels t
= labels2 t []

labels2 (Leaf x) xs
= x:xs

labels2 (Fork u v) xs = labels2 u (labels2 v xs)

and
T
(
labels2
)(
n
) = Θ(
n
). This improves the running time of
labels
from quadratic to linear time.

The induction step in the proof that
labels (build xs) = xs
is to assume the
hypothesis for all lists strictly shorter than
xs
:

labels (build xs)

=
{assume
xs
has length at least two

and let
(ys,zs) = halve xs
}

labels (Fork (build ys) (build zs))

=
{definition of
labels
}

labels (build ys) ++ labels (build zs)

=
{induction, since
ys
and
zs
are strictly shorter than
xs
}

ys ++ zs

=
{definition of
halve xs
}

xs

The induction here is
general induction
: in order to prove
P
(
xs
) for all finite lists
xs
it is sufficient to prove that: (i)
P
(
[]
); and (ii)
P
(
xs
) holds under the assumption that
P
holds for all lists of length strictly less than
xs
.

Answer to Exercise J

One key property is that

(xs ++ [x] ++ ys)!!k | k < n = xs!!k

| k==n = x

| k > n = ys!!(n-k)

where n = length xs

The other key property is that sorting a list does not change the length of the list. Hence

select k []
= error "list too short"

select k (x:xs) | k < n = select k ys

| k==n = x

| otherwise = select (n-k) zs

where ys = [y | y <- xs, y < x]

zs = [z | z <- xs, x <= z]

n = length ys

The worst-case running time for a list of length
n
occurs when
k
= 0 and the length of
ys
is
n
−1,
i.e.
when
x:xs
is in strictly decreasing order. Thus
T
(
select
)(0,
n
) =
T
(
select
)(0,
n
−1) +Θ(
n
), with solution
T
(
select
)(0,
n
) = Θ(
n
2
). But, assuming a reasonable distribution
in which each permutation of the sorted result is equally likely as input, we have
T
(
select
)(
k
,
n
) = Θ(
n
).

7.10 Chapter notes

There are many books on algorithm design, but two that concentrate on functional programming are
Algorithms: A Functional Programming Approach
(second edition) (Addison-Wesley, 1999) by Fethi Rabbi and Guy Lapalme, and my own
Pearls of Functional Algorithm Design
(Cambridge, 2010).

Information about profiling tools comes with the documentation on the Haskell Platform. The source book on sorting is Don Knuth’s
The Art of Computer Programming, Volume 3: Sorting and Searching
(second edition) (Addison-Wesley, 1998).

Chapter 8
Pretty-printing

This chapter is devoted to an example of how to build a small library in Haskell. A library is an organised collection of types and functions made available to users for carrying out some task. The task we have chosen to discuss is
pretty-printing
, the idea of taking a piece of text and laying it out over a number of lines in such a way as to make the content easier to view and understand. We will ignore many of the devices for improving the readability of a piece of text, devices such as a change of colour or size of font. Instead we concentrate only on where to put the line breaks and how to indent the contents of a line. The library won’t help you to lay out bits of mathematics, but it can help in presenting tree-shaped information, or in displaying lists of words as paragraphs.

8.1 Setting the scene

Let’s begin with the problem of displaying conditional expressions. In this book we have used three ways of displaying such expressions:

if p then expr1 else expr2

 

if p then expr1

else expr2

 

if p

then expr1

else expr2

These three layouts, which occupy one, two or three lines, respectively, are considered acceptable, but the following two are not:

if p then
expr1 else expr2

 

if p

then expr1 else expr2

The decision as to what is or is not acceptable is down to me, the author. You may disagree with my choices (some do), and a flexible library should provide you with the ability to make your own reasonable choices. In any case, two basic questions have to be answered. Firstly, how can we describe the acceptable alternatives while rejecting the unacceptable ones? Secondly, how do we choose between the acceptable alternatives?

A quick answer to the second question is that the choice depends on the permitted line width. For instance we might choose a layout with the fewest lines, subject to the condition that each line fits within the allotted line width. Much more on this later.

As to the first question, one answer is just to write out all the acceptable alternatives. That’s going to involve a lot of writing. A better alternative is to provide the user with a suitable
layout description language
. As a rough and ready guide we might write something like

if p <0> then expr1 (<0> + <1>) else expr2 +

if p <1> then expr1 <1> else expr2

where
<0>
means a single space,
<1>
means a line break and
+
means ‘or’. The expression above yields our three layouts described earlier. However, the danger with providing the user with an unfettered choice of alternatives is that it becomes difficult to make a decision about the best layout without exploring every possible alternative, and that could take a long time.

Another possibility is to allow only restricted choice by forcing the user to describe layouts in terms of certain functions and operations provided by the library. For example, consider the description
group (group (if p <1> then expr1) <> <1> else expr2)
where
group
augments a set of layouts with one additional layout in which every
<1>
is replaced by
<0>
, thereby flattening the layout to just one line, and
(<>)
means concatenation lifted to sets of alternatives. For example,

group (if p <1> then expr1)
= {if p <0> then expr1, if p <1> then expr1}

group (if p <1> then expr1) <> <1> else expr2

= {if p <0> then expr1 <1> else expr2,
if p <1> then expr1 <1> else expr2}

group (group (if p <1> then expr1) <> <1> else expr2)
= {if p <0> then expr1 <0> else expr2,
if p <0> then expr1 <1> else expr2,
if p <1> then expr1 <1> else expr2}

Thus our set of three acceptable layouts is captured by the above description which contains two occurrences of
group
.

There is another aspect to the problem of displaying conditional expressions. What if
expr1
or
expr2
are themselves conditional expressions? Here we might want to allow a layout like

if p

then if q
then expr1

else expr2

else expr3

The point is that we should allow for
indentation
in our description language. Indentation means putting in a suitable number of spaces after each line break. This idea can be captured by providing a function
nest
so that
nest i x
is a layout in which
each
line break in layout
x
is followed by
i
spaces.

8.2 Documents

For the sake of a name let us agree to call a
document
some entity that represents the set of possible layouts of a piece of text. Documents are given as elements of the type
Doc
whose definition is left for later on. On the other hand, a layout is simply a string:
type Layout = String
We are deliberately being cagey about what a document actually is because we want to consider two representations of
Doc
. For now we concentrate on the operations on documents that our library might provide.

The first operation is a function
that takes a given line width and a document, and returns the best layout. How to define this function efficiently is really the main concern of the chapter.

pretty :: Int -> Doc -> Layout
The second operation is a function
layouts :: Doc -> [Layout]

that returns the set of possible layouts as a list. Why should we want such a function when we have
pretty
? Well, it takes a little experimentation to find the definitions that describe the layouts we regard as acceptable. The way to experiment is to formulate an initial definition and then rework it after inspecting all the resulting layouts on a small number of examples. That way we can see whether some layouts should be excluded or others added. So, whatever our final representation of documents turns out to be, we should provide
layouts
as a sensible diagnostic tool for the user.

The remaining operations deal with constructing documents. First up is the operation of concatenating two documents to give a new one:
(<>) :: Doc -> Doc -> Doc
Document concatenation should surely be an associative operation so we require of any implementation of
(<>)
that
(x <> y) <> z) = x <> (y <> z)
for all documents
x
,
y
and
z
.

Whenever there is an associative operation there is usually an identity element, so we also provide an empty document
nil :: Doc
We require
nil <> x = x
and
x <> nil = x
for all documents
x
.

The next operation is a function
text :: String -> Doc
that takes a string not containing newlines into a document. To provide for documents containing more than one line, we can provide another basic document
line :: Doc
For example,
is a document with a single layout that consists of two lines. You might think that
line
is unnecessary because we could always allow newline characters in text strings, but to indent a document we would then have to inspect the contents of every
text
. Far better is to have an explicit newline document; that way we know where line breaks are.

text "Hello" <> line <> text "World!"

Next, the function

nest :: Int -> Doc -> Doc
provides a way of nesting documents:
nest i
indents a document by inserting
i
spaces
after
every newline. Note the emphasis: indentation is not done at the beginning of a document unless it begins with a newline. The reason for this choice is explained below.

Finally, to complete a library of eight operations, we have the function
group :: Doc -> Doc
This is the function that produces multiple layouts. The function
group
takes a document and adds an extra layout, one that consists of a single line of text with no line breaks.

We have named eight operations and given informal descriptions of what they are intended to mean, but can we be more precise about their properties and the relationships between them? An even more fundamental question is whether these operations are sufficiently flexible to allow for a reasonable class of layouts.

Let’s first concentrate on what equational laws we might want. Finding such laws can boost our confidence that we have in hand an adequate and smoothly integrated box of tools, and that there isn’t some crucial gadget we have missed. Such laws can also influence the meanings of operations and guide implementations. We have already asserted that
(<>)
should be associative with identity element
nil
, but what else should we require?

Well, for
text
we want the following properties:

text (s ++ t) = text s <> text t
text "" = nil

In mathematical language this asserts that
text
is a
homomorphism
from string concatenation to document concatenation. An impressive (and possibly intimidating) name for something quite simple. Note that the associativity of string concatenation implies the associativity of document concatenation, at least for
text
documents.

For
nest
we require the following equations to hold:

nest i (x <> y)
= nest i x <> nest i y
nest i nil
= nil
nest i (text s)
= text s
nest i line
= line <> text (replicate i ' ')
nest i (nest j x)
= nest (i+j) x
nest 0 x
= x
nest i (group x)
= group (nest i x)

Other books

39 Weeks by Terri Douglas
James and Dolley Madison by Bruce Chadwick
Branding the Virgin by Alexa Riley
The Hollow Land by Jane Gardam
The Edge of Always by J.A. Redmerski
The Essential Galileo by Galilei, Galileo, Finocchiaro, Maurice A.
The Stepmother by Claire Seeber
Sweetest Taboo by J. Kenner