Tuesday, March 26, 2013

SuperFib - an example of using lexical closure in Python

This story begins with simple implementation of a function that calculates Fibonacci numbers.

def fib(n):
    return 1 if n == 1 or n == 2 else fib(n-1) + fib(n-2)

Assuming that we are using 1-based indices, we can run a couple of tests and see that it works. Of course, fib(35) takes some time to calculate - about 15 seconds on my laptop.

Now, if you don't know what lexical closure is, I recommend reading about it on Wikipedia) before going further. Let's try to cache the results, but also be better than using conventional memoization - not only will we cache the final result of each call, but also intermediate results.

def fib():
    _dict = dict()
    def inner(n):

        nonlocal _dict
        if n == 1 or n == 2:
            return 1
                return _dict[n]
            except KeyError:
                _dict[n] = inner(n-1) + inner(n-2)
                return _dict[n]
    return inner

f = fib()
f(1000) returns instantly, with:

However, if I call f(1500) without calling f(1000) first, the calculation will not finish and my interpreter will crash or at least restart because of stack evaluation overflow.

But if I ran f(1000) and then f(1500) and then f(2000) and so on and so forth, advancing by 500 each time, I can very quickly get to quite far elements of Fibonacci sequence. Calculating 10 000th element is quick and easy, compared to conventional approach. We can quickly get to the point where printing the numbers to the console takes more time than actually calculating them.

Friday, March 22, 2013

Unit Testing code with IO file operations (in Python)

We may often come across a piece of code that was written without Unit Tests at all. In addition, the piece of code may be dealing with IO like file writing and reading, which makes it more difficult to Unit Test it when we are trying to refactor and modify. Let's suppose the code in question looks like this:

def writeInitialsToFile(filename, name, surname):
    initials = name[0] + '.' + surname[0] + '.'
    with open(filename, 'w') as file:

def readInitials(filename):
    initials = None
    with open(filename, 'r') as file:
        initials = file.readline()
    return initials

A straightforward and bad idea would be to write a couple of Unit Tests that make use of a real file and simply test the reading and writing. Is therea a better way to test this code?

First of all, we need a way to replace the real file with something else. For both reading and writing we will now have a couple of functions, one that expects a stream for reading or writing and the other that creates the stream and calls the corresponding reading or writing function:

def readInitialsFromFileStream(fileStream):
    return fileStream.readline()

def readInitialsFromFile(filename):
    initials = None
    with open(filename, 'r') as fileStream:
        initials = readInitialsFromFileStream(fileStream)
    return initials

def writeInitialsToFileStream(fileStream, name, surname):
    initials = name[0] + '.' + surname[0] + '.'

def writeInitialsToFile(filename, name, surname):
    with open(filename, 'w') as fileStream:
        writeInitialsToFileStream(fileStream, name, surname)

Now, we can test at least the pair: readInitialsFromFileStream and writeInitialsToFileStream. In order to test these functions, we don't even need to create a file in the file system - we can just pass something that has similar characteristic, but is not a real file: io.StringIO.

A quasi - Unit Test piece (without the full unittest.TestCase class, to shorten things a bit) would look like this:

testStream = io.StringIO()
assert('T.M.', readInitialsFromFileStream(testStream))

testStream = io.StringIO()
writeInitialsToFileStream(testStream, 'Thomas', 'Mann')
assert('T.M.', testStream.readline())

Why is it better than using a real file? Some of the reasons (probably not all) are:
  • unit tests should work flawlessly regardless of the environment; when using real file, we may have different permissions depending on the computer and operating system where the tests are run
  • some tests (although this was not necessarily clear from the example) may require that the file has certain name or even certain content - otherwise they'll fail; by using a real file with pre-set name and content we are creating an unnecessary dependency that may impact the maintainability and, in the long run, usefullness of the tests
  • in order to avoid using a real file we had to separate the creation of a stream from writing to / reading from the stream; this forced us to make a first step towards fulfilling the Single Responsibility Principle

Sunday, March 17, 2013

Long if-else statement without if (in Python)

Let's suppose we find ourselves in a situation where we have no other option than to write a lengthy if-else statement (definitely not something that would make cleaner by polimorphism).

def inevitable_long_if (n):
    if n == 0:
    elif n == 1:
    elif n == 2:

Can we re-write it so that we do not use IFs?

import collections
import functools

d = collections.defaultdict(lambda : functools.partial(print, 'Manu-manu'))
d[0] = functools.partial(print, 'Zero')
d[1] = functools.partial(print, 'One')
d[2] = functools.partial(print, 'Two')

Now, let's call some to see how it works:

d[0]()  ==>  Zero
d[1]()  ==>  One
d[2]()  ==>  Two
d[7]()  ==> Manu-manu

The first version is 9 lines long, the second one (without imports) is 4 lines long. To me, the second version is also much more readable, but you decide.

Monday, March 4, 2013

Code Kata: Eating habits and Markov chains

The Wikipedia entry:
contains nice and clear explanation of the idea of Markov chains. Based on the paragraph about eating habits, we can form an interesting coding exercise:
  • A person eats lunch every day
  • We form a set of rules similar to the one presented in Wikipedia - one part of the exercise is how to represent these rules in a program. What would be the convenient way to represent this internally and how would you translate a possible textual representation (such as one entered from standard input) into the internal format? Would the two format be quite similar or quite different from each other?
  • We run a simulation for a number of days - say, 365 days - and we get statistical distribution of the number of each type of meals that would be eaten
Does it seem like a long program to write? My initial solution in Python was 23 lines long and then refined solution took just 14 lines.

See also