python


2
Jun 11

10 Python one liners to impress your friends

After 10 Scala / Ruby / Clojure / CoffeeScript one liners to impress your friends, i thought it might be interesting to quickly try out the same in Python too.

Without much ado.. here goes. Note that the variable declarations and imports are on separate lines as necessary. Also every line is written so as to print out the results to stdout for quick verification

For what it is worth, this hardly took any time – this post is probably one of the quickest I have written.

1. Multiple Each Item in a List by 2

1
print map(lambda x: x * 2, range(1,11))

2. Sum a List of Numbers

1
print sum(range(1,1001))

3. Verify if Exists in a String

1
2
3
4
wordlist = ["scala", "akka", "play framework", "sbt", "typesafe"]
tweet = "This is an example tweet talking about scala and sbt."

print map(lambda x: x in tweet.split(),wordlist)

4. Read in a File

1
print open("ten_one_liners.py").readlines()

5. Happy Birthday to You!

1
print map(lambda x: "Happy Birthday to " + ("you" if x != 2 else "dear Name"),range(4))

6. Filter list of numbers

1
print reduce(lambda(a,b),c: (a+[c],b) if c > 60 else (a,b + [c]), [49, 58, 76, 82, 88, 90],([],[]))

7. Fetch and Parse an XML web service

1
2
3
4
from xml.dom.minidom import parse, parseString
import urllib2
# note - i convert it back into xml to pretty print it
print parse(urllib2.urlopen("http://search.twitter.com/search.atom?&q=python")).toprettyxml(encoding="utf-8")

8. Find minimum (or maximum) in a List

1
2
print min([14, 35, -7, 46, 98])
print max([14, 35, -7, 46, 98])

9. Parallel Processing

1
2
3
4
import multiprocessing
import math

print list(multiprocessing.Pool(processes=4).map(math.exp,range(1,11)))

10. Sieve of Eratosthenes

There is no Sieve of Eratosthenes operator, but that is hardly a constraint.

1
2
3
n = 50 # We want to find prime numbers between 2 and 50

print sorted(set(range(2,n+1)).difference(set((p * f) for p in range(2,int(n**0.5) + 2) for f in range(2,(n/p)+1))))

1
Sep 10

Code Kata : Ruby Programming Challenge for Newbies in Python

An interesting contest caught my eye today. There’s a site Ruby Learning by Pune’s Satish Talim (Twitter : @IndianGuru) which organises regular Ruby Programming Challenge for Newbies and it introduced the 13th challenge earlier yesterday : RPCFN: Economics 101 (#13) by Dr. Bruce Scharlau.

While neither being a regular rubyist nor being a newbie, I thought it made for a decent exercise and a diversion for a little while, albeit in python. As an added interest I wrote the solutions twice. Once in a very procedural way and once leveraging the functional programming constructs.

Here’s the brief problem as stated in the challenge :

The file cia-1996.xml (links back to rublylearning.org) is the data from the CIA World Factbook of 1996 in XML format. It has details about 260 countries across five continents. Your challenge, should you choose to accept it, is to uncover the following details buried within this file:

  • What is the population of the country with the most people? Yes, we know it’s China, but just how many people lived there in 1996?
  • What are the five countries with the highest inflation rates, and what were those rates in 1996?
  • What are the six continents in the file and which countries belong to which continent? Can you also produce them in alphabetical order?

I used python 2.6 with the lxml xml parser for this exercises.

Parsing the xml
Since the source file is a zip file, one needs to open the zip, and extract the xml out of it. Since I am using python 2.6 and not python 2.7 I couldn’t use the with construct which would’ve not required the explicit zipfile close() statement.

1
2
3
4
5
6
7
8
9
10
import zipfile
from lxml import objectify
from operator import itemgetter
import itertools

# Unzip file, extract xml, convert to object    
zip = zipfile.ZipFile("../cia-1996.zip")
xmlfile = zip.open("cia-1996.xml")
root = objectify.fromstring("".join(line for line in xmlfile.readlines()))
zip.close()

This opens the zip file, extracts the xml out of it, concatenates all the lines as a single string and converts the data into a single object referred to as root
Note: the itemgetter and itertools imports are used subsequently.

Procedural : Find the country with the highest population

1
2
3
4
5
6
7
8
9
# Country with highest population
maxp = 0
maxc = ''
for country in root.country :
    if int(country.get('population',0)) >= maxp :
        maxc, maxp =country.get('name'),int(country.get('population',0))
print (maxc,maxp)

# Output is : ('China', 1210004956)

Procedural : Top 5 countries with highest inflation rates

1
2
3
4
5
6
7
8
# Top 5 countries with highest inflation rates
inflation_country_tuples = []
for country in root.country :
    inflation_country_tuples.append((float(country.get('inflation',0.0)), country.get('name')))
inflation_country_tuples = sorted(inflation_country_tuples, key=itemgetter(0), reverse=True)
print inflation_country_tuples[0:5]

# Output : ((244.0, 'Belarus'), (94.0, 'Turkey'), (85.0, 'Azerbaijan'), (83.299999999999997, 'Malawi'), (71.299999999999997, 'Yemen'))

This extracts the inflation, name tuple from each country and creates a list out of it, sorts the list using inflation in a descending order and then prints the first five elements.

Procedural : Sorted continents, each associated with all their sorted countries

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
# Countries by continent
continent_country_tuples = []
for country in root.country :
    continent_country_tuples.append((country.get('continent',''), country.get('name')))
continent_country_tuples = sorted(continent_country_tuples)
current_continent = None
countries_of_continent = None
continent_grouped_countries = []
for continent, country in continent_country_tuples :
    if continent != current_continent :
        if current_continent :
            continent_grouped_countries.append((current_continent, countries_of_continent))
        countries_of_continent = []
        current_continent = continent
    countries_of_continent.append(country)
continent_grouped_countries.append((current_continent, countries_of_continent))
print continent_grouped_countries

# Output : too long to include here

Functional Programming Solutions

Incidentally the same problem can also be solved using very functional programming constructs as follows. This shows an interesting contrast of solutions in both the procedural and functional programming ways. The logic used across both the sets is virtually the same thought the constructs are different.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
# Country with highest population
print reduce(lambda (maxc,maxp), c :
             (c.get('name'),int(c.get('population',0)))
                if int(c.get('population',0)) >= maxp else
                    (maxc,maxp),
             root.country,('',0))

# Top 5 countries with highest inflation rates
print tuple(itertools.islice(
            sorted((
                (float(country.get('inflation',0.0)), country.get('name'))
                        for country in root.country),
                key=itemgetter(0),reverse=True),
                5))

# Countries by continent
print tuple((continent,tuple(country[1] for country in countries))
        for continent, countries in itertools.groupby(
            sorted((country.get('continent',''), country.get('name'))
                    for country in root.country),itemgetter(0)))

Update

I further structured the functional programming approach code. This code has no list comprehensions (for loops) at all. The code is as follows :

1
2
3
4
5
6
7
8
9
10
# Find the country with the maximum population
print reduce(lambda country_pop_max, country_pop_next : country_pop_next if country_pop_next[1] > country_pop_max[1] else country_pop_max,
        map(lambda country : (country.get('name'), int(country.get('population', 0))),root.country),('',0))

print tuple(itertools.islice(sorted(
            map(lambda country : (country.get('name'), float(country.get('inflation', 0.0))),root.country),
            key=itemgetter(1),reverse=True),5))
       
print map(lambda (continent, continent_country_tuples) : (continent, map(lambda (continent, country) : country, continent_country_tuples)),
    itertools.groupby(sorted(map(lambda country : (country.get('continent'), country.get('name')),root.country)),itemgetter(0)))

Since the above is likely to be a little too cryptic and confusing, here’s the detailed commented code (only comments and whitespace added)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
print reduce(
        # Comparator to decide the country with the maximum population
        lambda country_pop_max, country_pop_next : country_pop_next if country_pop_next[1] > country_pop_max[1] else country_pop_max,
        # get a sequence of tuples of (country name, country population)    
        map(lambda country : (country.get('name'), int(country.get('population', 0))),root.country),
        # initial seed
        ('',0))

print tuple(
        # take the top 5 items
        itertools.islice(
            # sort the list
            sorted(
                # get a sequence of tuples of (country name, inflation)  
                map(lambda country : (country.get('name'), float(country.get('inflation', 0.0))),root.country),
                # sorting to be done using the second element of the tuple
                key=itemgetter(1),
                # sort to be done in the descending order
                reverse=True),
            # count of elements to be sliced
            5))
       
print map(
    # function to flatten the continent_country_tuple tuple into a country tuple      
    lambda (continent, continent_country_tuples) :
        # first element of the tuple is continent
        (continent,
        # as the second element, return only the country from the continent country tuple to form a tuple of countries                                        
        map(lambda (continent, country) : country, continent_country_tuples)),
    # group the continent country tuples by continent
    itertools.groupby(
        # sort the continent country tuples
        sorted(
            # extract a continent country tuple from the country
            map(lambda country : (country.get('continent'), country.get('name')),root.country)),
        # this is the function to extract the key to sort by
        itemgetter(0)))

23
Aug 10

Clojure style multi methods in python

What are multi-methods used for ? Simply put, they allow for function overloading, ie. they allow for different implementations of the same function to be provided for different contexts, and the appropriate context and therefore the implementing function to be automatically selected and performed at runtime. The typical method of function overloading is based on differing type based signatures (a carryover from the C++ / Java days). There is a multimethod module in python. That can be used for type based switching. There is another way function overloading gets used, an approach used by clojure which it terms as multi methods based on an intermediate function used to compute a value, and then switches based upon that value. Note: Switching = Coming to a fork on the road and then deciding which path to take.

For an easier reading on clojure multimethods, you may want to first read Clojure multi-methods, a nice post by Alex Miller. One more reason to read that post, is that the specific situation I use to demonstrate the usage of multi methods is also identical to the one described in that post.

Here’s one way to implement clojure style multimethods in python. However I will put the cart before the horse, and demonstrate how multi methods could be used in python, before explaining the underlying support functions I use to make that feasible.

Situation

We have a person structure with a name and an age as member attributes. We would like to print a person’s name but prefix it with Young, Adult or Senior based on the person’s age.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
# Declare the class
class Person(object):
    """ A person. Has a name and an age"""
    def __init__(self,name,age):
        self.name = name
        self.age = age

# Instantiate the objects
jimmy = Person("Jimmy",5)
alex = Person("Alex",36)
edna = Person("Edna",99)

# Run the multi method
print name(jimmy)
print name(alex)
print name(edna)

#Expected output is
#        Young Jimmy
#        Adult Alex
#        Senior Edna

To do the same, we will actually declare a switcher function which computes a value to switch upon.

1
2
3
4
5
def ticket(person):
    """ The switcher function to decide the result to be used for multi method switching """
    if person.age < 16 : return "young"
    elif person.age >= 66 : return "senior"
    else : return "adult"

We shall need a switcher which will switch based on this value.

1
2
# Declare the existence of a multi method switcher
name = multi(ticket)

Ahh, so we now need three methods to be declared, one each for each value. These we shall do so using decorator @multi_method. Note all the three functions have the same name. Both the decorator and the support that allows the functions to have the same name comes up later.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
# Add the first switched function for result value "young"
@multi_method(name, "young")
def name(person):
    return "Young " + person.name

# Add the next switched function for the result value "adult"
@multi_method(name, "adult")
def name(person):
    return "Adult " + person.name

# Add yet another switched function for the result value "senior"
@multi_method(name, "senior")
def name(person):
    return "Senior " + person.name

Now this can be made to work. But it needs a couple of constructs behind the scenes. First we need the constructor for a multi function, which accepts the switcher function (in this case ticket).

1
2
3
4
5
6
7
8
9
10
11
12
13
def multi(switcher_func):
    """ Declares a multi map based method which will switch to the
    appropriate function based on the results of the switcher func"""

   
    def dispatcher(*args, **kwargs):
        key = switcher_func(*args, **kwargs)
        func = dispatcher.dispatch_map[key]
        if func :
            return func(*args,**kwargs)
        else :
            raise Exception("No function defined for dispatch key: %s" % key)
    dispatcher.dispatch_map = {}
    return dispatcher

This declares an inner function dispatcher, which has the logic to perform the switch based on the supplied switcher function and the run time arguments to the function. What this method however does is that it sets up a dictionary called dispatch_map as an attribute of the dispatcher function which will be used to store the possible result values and the resultant function to switch to.

Having done that, we still need to define each of the individual functions for each possible value of the switcher function. That is done using the following decorator where the dispatch_map that got defined earlier is populated with the appropriate result and the function to dispatch to.. Note that the function name gets changed in order to avoid a conflict.

1
2
3
4
5
6
7
8
9
10
def multi_method(dispatcher, result):
    """ The multi method decorator which allows one method at a time
    to be added to the broader switch for the given result value"""

    def inner(wrapped):
        dispatcher.dispatch_map[result] = wrapped
        # Change the method name from clashing with the multi and allowing
        # all multi methods to be written using the same name
        wrapped.func_name = "_" + wrapped.func_name + "_" + str(result)
        return dispatcher
    return inner

There’s a way to use these capabilities for class members. So if I wanted to dispatch the __str__ method based on the values, the class would now be

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
class Person(object):
    """ A person. Has a name and an age"""
    def __init__(self,name,age):
        self.name = name
        self.age = age

    __str__ = multi(ticket)
   
    @multi_method(__str__,"young")
    def str_young(self):
        return "Young " + self.name
   
    @multi_method(__str__,"adult")
    def str_adult(self):
        return "Adult " + self.name
   
    @multi_method(__str__,"senior")
    def str_senior(self):
        return "Senior " + self.name

Not bad for an hours work.

Update 1:
Even as I had just tweeted this codeblog post, another tweet caught my attention (possibly unrelated) which expected the switching condition for each function to be defined at each function level and not as a composite function returning some value. Since this was fresh in my mind, I thought that would be an interesting option too, and in a few minutes had another implementation based on the conditions being specified at a per function level. Here’s the separate version for the same. Pay attention to the fact that there is no ticket function, and that condition for each function is declared in the decorator itself.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
def multi():
    """ Declares a multi map based method which will switch to the
    appropriate function based on the results of the switcher func"""

   
    def dispatcher(*args, **kwargs):
        for condition, func in dispatcher.functions :
            if condition(*args, **kwargs) :
                return func(*args,**kwargs)
        raise "No function identified for dispatch key"
   
    dispatcher.functions = []
    return dispatcher

def multi_method(dispatcher, condition):
    """ The multi method decorator which allows one method at a time
    to be added to the broader switch for the given result value"""

    def inner(wrapped):
        dispatcher.functions.append((condition, wrapped))
        # Change the method name from clashing with the multi and allowing
        # all multi methods to be written using the same name
        wrapped.func_name = "_" + wrapped.func_name + "_" + str(condition)
        return dispatcher
    return inner

class Person(object):
    """ A person. Has a name and an age"""
    def __init__(self,name,age):
        self.name = name
        self.age = age

# Declare the existence of a multi method switcher
name = multi()

# Add the first switched function for result value "young"
@multi_method(name, lambda p : p.age < 16)
def name(person):
    return "Young " + person.name

# Add the next switched function for the result value "adult"
@multi_method(name, lambda p : 16 <= p.age < 66)
def name(person):
    return "Adult " + person.name

# Add yet another switched function for the result value "senior"
@multi_method(name, lambda p : 66 <= p.age)
def name(person):
    return "Senior " + person.name

# Instantiate the objects
jimmy = Person("Jimmy",5)
alex = Person("Alex",36)
edna = Person("Edna",99)

# Run the multi method
print name(jimmy)
print name(alex)
print name(edna)

Update 2

I was requested to demonstrate this in terms of the clojure polymorphism as described in Clojure – runtime_polymorphism (code snippet from that page listed below) :

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
(defmulti encounter (fn [x y] [(:Species x) (:Species y)]))
(defmethod encounter [:Bunny :Lion] [b l] :run-away)
(defmethod encounter [:Lion :Bunny] [l b] :eat)
(defmethod encounter [:Lion :Lion] [l1 l2] :fight)
(defmethod encounter [:Bunny :Bunny] [b1 b2] :mate)
(def b1 {:Species :Bunny :other :stuff})
(def b2 {:Species :Bunny :other :stuff})
(def l1 {:Species :Lion :other :stuff})
(def l2 {:Species :Lion :other :stuff})
(encounter b1 b2)
-> :mate
(encounter b1 l1)
-> :run-away
(encounter l1 b1)
-> :eat
(encounter l1 l2)
-> :fight

Here’s how the same effect can be had in python using the first version of multi and multi_method I described above.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
# Declare the existence of a multi method switcher
encounter = multi(lambda x,y : (x["Species"], y["Species"]))

@multi_method(encounter, ("Bunny","Lion"))
def encounter(a1, a2):
    return "run-away"

@multi_method(encounter, ("Lion","Bunny"))
def encounter(a1, a2):
    return "eat"

@multi_method(encounter, ("Bunny","Bunny"))
def encounter(a1, a2):
    return "mate"

@multi_method(encounter, ("Lion","Lion"))
def encounter(a1, a2):
    return "fight"

b1 = {"Species" : "Bunny", "Other" : "Stuff"}
b2 = {"Species" : "Bunny", "Other" : "Stuff"}
l1 = {"Species" : "Lion", "Other" : "Stuff"}
l2 = {"Species" : "Lion", "Other" : "Stuff"}

print encounter(b1, b2)
print encounter(b1, l1)
print encounter(l1, b1)
print encounter(l1, l2)

The resultant output is

1
2
3
4
mate
run-away
eat
fight

23
Feb 10

Simple Dependency Injection in Python

I wrote this code a simple dependency injection capability in my application. Hopefully the comments should be self explanatory.

If you want to just understand the usage, see the class decorator usage on lines 52 and 63 and the code from line 69 onwards

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
# Global registry for instance registration
resource_registry = {}

def provides(name):
    """Class decorator for an instance provider.
    The name passed in is what the attribute is called.
    The instance will be auto registered in the registry
    with the value of the handle
    TODO : the name could also be a function which could return the
    attribute name to extract the instance name from
    """

    def wrapper(cls):
        def init_and_register(init):
            def inner(self,*args,**kwargs):
                "The new wrapper constructor"
                # Call the old constructor
                init(self,*args,**kwargs)
                # Update the resource registry with the desired key and instance
                resource_registry[self.__dict__[name]] = self
            return inner
        # Replace the constructor with a wrapped constructor
        cls.__init__ = init_and_register(cls.__init__)
        return cls
    return wrapper

def requires(*resource_types):
    """ Class decorator to inject instances from the registry
    into the instance under construction"""

    def wrap_constructor(self,*args,**kw):
        "The constructor wrapper"
        # First call the original constructor
        self.__oldinit__(*args,**kw)
        # If this instance requires dependencies to be injected
        if hasattr(self,'_resource_types') :
            # for each dependency
            for resource_type in self._resource_types:
                # inject the dependency directly as in instance variable
                # from the resource registry
                self.__dict__[resource_type] = resource_registry[kw[resource_type]]

    def inner(cls):
        cls._resource_types = resource_types
        cls.__oldinit__ = cls.__init__
        cls.__init__ = wrap_constructor
        return cls
   
    return inner

# The following class indicates a sample resource provider which could be
# injected. The 'handle' passed to the class decorator is the name of the attribute
# whole value will be used as the key to register the instance name in the registry
@provides('handle')
class Provider(object):
    def __init__(self,handle):
        self.handle = handle
    def __str__(self):
        return 'Provider(%s)' % self.handle

# A sample user object (ie. the object into which the dependency is to be injected
# The constructor needs to accept kw based arguments for the injection to happen
# The requires decorator lists the attribute names under which the dependencies will
# be injected into the instance of this class
@requires('first','second')
class User(object):
    def __init__(self, **kwargs):
        pass

if __name__ == '__main__' :
    # Initialise the providers. They auto register themselves.    
    p1 = Provider('one')
    p2 = Provider('two')
   
    # Instantiate the user. specify the key  under which the
    # injected objects have been registered in the values of the
    # keyword arguments (the key being the attribute name declared
    # earlier in the class decorator)
    u = User(first = 'one', second = 'two')
   
    # Test that the dependencies have been appropriately injected
    assert u.first == p1
    assert u.second == p2

18
Jan 10

Extracting data using recursive descent parsing

The other day there was an interesting query on the bangpypers mailing list : How should I do it?. It was a question which wondered how to extract data out of a datastructure. The question reproduced below was as follows :


I have a txt file in the following format:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
"confident" => {
  count => 4,
  trans => {
     "ashahvasahta" => 0.74918568,
    "atahmavaishahvaasa" => 0.09095465,
    "pahraaram\.nbha" => 0.06990729,
         "mailatae" => 0.02856427,
           "utanai" => 0.01929341,
             "anaa" => 0.01578552,
         "uthaanae" => 0.01403157,
         "jaitanae" => 0.01227762,
    },
},
"consumers" => {
  count => 4,
  trans => {
    "upabhaokahtaa" => 0.75144362,
    "upabhaokahtaaom\.n" => 0.12980166,
    "sauda\�\�\�dha" => 0.11875471,
    },
},
"a" => {
  count => 1164,
  trans => {
              "eka" => 0.14900491,
           "kaisai" => 0.08834675,
             "haai" => 0.06774697,
             "kaoi" => 0.05394308,
              "kai" => 0.04981982,
         "\(none\)" => 0.04400085,
              "kaa" => 0.03726579,
              "kae" => 0.03446450,
    },
},

and I need to extract “confident” , “ashahvasahta” from the first record, “consumers”, “upabhaokahtaa” from the second record… i.e. “word in english” and the “first word in the probable-translations”

The first level key is an english word and the entries against the subkey “trans” seem translations with their associated weightages. The task is to extract the english word followed by the (sanskrit) translation word that has the highest weightage associated with it corresponding to the word.

While the problem could have been solved in different (and perhaps simpler) ways, it seemed an attractive one since it also could be solved using a recursive descent parser – something that I had not worked upon before (which then became the primary motivator for solving it). A quick study showed up a number of recursive descent parsers written in python language and the one I quickly settled on was pyparsing. A good introductory writeup on the same can be found at O’reilly onlamp.com at Building Recursive Descent Parsers with Python.

The code below is a suggested solution for the same. Hopefully the comments inline are sufficiently self explanatory.

A quick legend to some of the pyparsing constructs :
Forward() : This is a forward declaration to a token whose structure will be defined later
Word(characters) : A contiguous sequence of one or more characters that form the word
Group() : A construct which separately classifies the matching data in the results
token ^ token : The caret indicates an OR ie one of the two tokens should be matched.
Optional(token) : An instruction which matches 0 or 1 occurences of token (equivalent to ‘?’ in regex)
.setResultsName(name) : An instruction which marks the matched result with the given name
token.suppress() : An instruction which matches but then ignores the matched data in the result

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
# coding=utf-8
from pyparsing import *
import pprint
import sys

data = '''
"confident" => {
   count => 4,
   trans => {
     "ashahvasahta" => 0.74918568,
     "atahmavaishahvaasa" => 0.09095465,
     "pahraaram\.nbha" => 0.06990729,
     "mailatae" => 0.02856427,
     "utanai" => 0.01929341,
     "anaa" => 0.01578552,
     "uthaanae" => 0.01403157,
     "jaitanae" => 0.01227762,
   },
},
"consumers" => {
 count => 4,
 trans => {
   "upabhaokahtaa" => 0.75144362,
   "upabhaokahtaaom\.n" => 0.12980166,
   "sauda\�\�\�dha" => 0.11875471,
   },
},
"a" => {
 count => 1164,
 trans => {
             "eka" => 0.14900491,
          "kaisai" => 0.08834675,
            "haai" => 0.06774697,
            "kaoi" => 0.05394308,
             "kai" => 0.04981982,
        "\(none\)" => 0.04400085,
             "kaa" => 0.03726579,
             "kae" => 0.03446450,
   },
}
'''


# Setup pyparsing tokens

# Set up a forward reference to a dict (to be defined later)
dict_ = Forward()
# A key is a word of alphabets & numbers or is a quoted string
key = (Word(alphas + nums) ^ quotedString).setResultsName("key")
# A value is a sequence of digits and . or a dictionary
val = (Word("0123456789.") ^ dict_).setResultsName("value")
# A key value pair is a key followed by => operator followed by value
key_value_pair = Group(key + Literal("=>").suppress() + val)
# A key value pair list is a comma delimited list of key value pairs
key_value_pair_list = delimitedList(key_value_pair)
# A dictionary is { plus a key value pair list plus an optional trailing
# comma followed by a }
dict_ << Group(Literal("{").suppress() + key_value_pair_list +
            Optional(Literal(",").suppress()) + Literal("}").suppress())

# parse data
parsed = key_value_pair_list.parseString(data)

# function to extract the parsed data into a dictionary
# Note : the function is kind of weird and does type checks
# due to the way ParseResult is modeled  
def extract(result):    
        if 'key' in result.keys() :
            if isinstance(result.value,ParseResults) :
                return ( result.key,  extract(result.value) )
            else :
                return ( result.key,  result.value )
        else :
            return(dict(extract(elem) for elem in result))

# This converts the parsed ParseResult structures into a dictionary      
extracted = extract(parsed)

# print the dictionary
pprint.pprint(extracted, sys.stdout)

# print the english word and first translated word

print "\n\n============\nTranslations\n============\n"
translated = {}
for english, translations in extracted.items() :
    current_word = ''
    current_weight = 0.0
    for word,weight in translations['trans'].items() :
        if float(weight) > current_weight :
            current_word = word
            current_weight = float(weight)
    translated[english] = current_word
print translated

As usual comments are most welcome. Enjoy!


14
Jan 10

Implementing request interceptors for tornado

One of my favourite new software introductions last year was the Tornado web server. However ever since my Webwork 2 (and subsequently Struts 2) days, I had learnt to greatly appreciate the power and cleanliness of implementing interceptors for a variety of aspects related to request preprocessing. The same feature is also available on a variety of wsgi based web application frameworks except that these are referred to as wsgi middleware.

So it was with great disappointment that I realised Tornado simply did not have any such interceptors. Worse, there was no way to plug in or roll one’s own, since the tornado request processing pipeline had no place where one could plug something in. In a statically typed language it would have required me to take one of the two options – (a) fork Tornado codebase, change it to introduce plugins or (b) live without it. Well every once in a while in a dynamic typed language you run into a situation where metaprogramming saves your bacon. It was relatively easy to implement one for Tornado.

So how does one do it. Tornado during request processing calls a method called _execute on the handler (which is always available on all handlers since its in the base class for handlers). I wrote a class decorator called interceptor which wraps a class, and in doing so actually wraps one of its methods. It saves the reference to the current _execute method, replaces it with a call to another user supplied method (as a part of applying the interceptor), and calls the saved _execute method reference after the user supplied method returns successfully. Note that as per the semantics, if the user supplied method returns False, further request processing is aborted. It is also possible to chain a number of such interceptors.

In the example below, I have taken the basic authentication logic found in one of the Tornado examples and reformatted the same as an interceptor. The first two functions authenticator and user_extractor are rather trivialised implementations of the user supplied authentication logic.

So here’s the complete sample tornadoweb application with the MainHandler being wrapped with an interceptors which triggers basic authentication. Enjoy.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
import base64
import logging
import logging.config
import tornado.httpserver
import tornado.ioloop
import tornado.web

log = logging.getLogger("root")

def authenticator(realm,handle,password):
    """
    This method is a sample authenticator.
    It treats authentication as successful
    if the handle and passwords are the same.
    It returns a tuple of handle and user name
    """

    if handle == password :
        return (handle,'User Name')
    return None

def user_extractor(user_data):
    """
    This method extracts the user handle from
    the data structure returned by the authenticator
    """

    return user_data[0]

def basic_authenticate(realm, authenticator,user_extractor) :
    """
    This is a basic authentication interceptor which
    protects the desired URIs and requires
    authentication as per configuration
    """

    def wrapper(self, transforms, *args, **kwargs):
        def _request_basic_auth(self):
            if self._headers_written:
                raise Exception('headers have already been written')
           
            self.set_status(401)
            self.set_header('WWW-Authenticate','Basic realm="%s"' % realm)
            self.finish()
           
            return False
        request = self.request
        format = ''
        clazz = self.__class__
        log.debug('intercepting for class : %s', clazz)
        try:
            auth_hdr = request.headers.get('Authorization')
           
            if auth_hdr == None:
                return _request_basic_auth(self)
            if not auth_hdr.startswith('Basic '):
                return _request_basic_auth(self)
           
            auth_decoded = base64.decodestring(auth_hdr[6:])
            username, password = auth_decoded.split(':', 2)
           
            user_info = authenticator(realm, unicode(username), password)
            if user_info :
                self._user_info = user_info
                self._current_user = user_extractor(user_info)
                log.debug('authenticated user is : %s',
                          str(self._user_info))
            else:
                return _request_basic_auth(self)
        except Exception, e:
            return _request_basic_auth(self)
        return True
    return wrapper

def interceptor(func):
    """
    This is a class decorator which is helpful in configuring
    one or more interceptors which are able to intercept, inspect,
    process and approve or reject further processing of the request
    """

    def classwrapper(cls):
        def wrapper(old):
            def inner(self, transforms, *args, **kwargs):
                log.debug('Invoking wrapper %s',func)
                ret = func(self,transforms,*args,**kwargs)
                if ret :  
                    return old(self,transforms,*args,**kwargs)
                else :
                    return ret
            return inner
        cls._execute = wrapper(cls._execute)
        return cls
    return classwrapper

@interceptor(basic_authenticate('dummy_realm',authenticator,user_extractor))
class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("Hello, world")

application = tornado.web.Application([
    (r"/", MainHandler),
])

if __name__ == "__main__":
    http_server = tornado.httpserver.HTTPServer(application)
    http_server.listen(8888)
    tornado.ioloop.IOLoop.instance().start()

14
Jan 10

Functionally programming tic tac toe

I am a novice functional programmer. So if you find any areas of suggested improvement on my FP skills, do add a comment below.

Below is a sample Tic Tac Toe written using Functional Programming constructs (or at least whatever I understood about them).

Right now, I’ve implemented only a simple player who basically marks off a cell randomly. It should be possible to define more intelligent players similarly. Hopefully all the comments in the code should be sufficiently self explanatory.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
import random
import functools
import itertools
import copy

def row_gen(board):
    """
    Generator which returns tuples of tuples.
    The outer tuple represents a row.
    The inner tuple represents each cell with
    its coordinates and data
    """

    return (
        tuple((row,col,board[row][col])
              for col in range(3))
        for row in range(3))

def col_gen(board):
    """
    Generator which returns tuples of tuples.
    The outer tuple represents a column.
    The inner tuple represents each cell with
    its coordinates and data
    """

    return (
        tuple((row,col,board[row][col])
              for row in range(3))
        for col in range(3))

def dia_gen(board):
    """
    Generator which returns two tuples of tuples.
    The outer tuple represents a diagonal.
    The inner tuple represents each cell with
    its coordinates and data
    """

    return (
        tuple((i,i,board[i][i]) for i in range(3)),
        tuple((i,2-i,board[i][2-i]) for i in range(3)))

def empty(board):
    """ Gets a list of all empty cells on the board """
    return (
        (row,col)
        for row in range(3)
            for col in range(3)
                if board[row][col] == None)

def is_empty(board):
    """ checks if any cell on the board is still empty """
    for row in range(3) :
        for col in range(3) :
            if board[row][col] is None :
                return False
    return True

def random_player(chr,board):
    """Represents a player who randomly marks his next move """
    empty_cells = list(empty(board))
    play = empty_cells[random.randint(0,len(empty_cells)-1)]
    board = copy.copy(board)
    board[play[0]][play[1]] = chr
    return board

def show(board):
    """ Shows the current representation of the board """
    print '+-+-+-+'
    for row in range(3) :
        print '|%s|' % '|'.join(
            board[row][col]if board[row][col] is not None else ' '
            for col in range(3))
        print '+-+-+-+'

if __name__ == '__main__' :
    # Initialise the board with all values set to None      
    board = list(list(None for i in range(3)) for j in range(3))
    show(board)
   
    # Initialise the two players. Each is given a separate char
    player1 = functools.partial(random_player,'X')
    player2 = functools.partial(random_player,'O')
   
    # Setup an iterator to continuously cycle through both players
    players = itertools.cycle((player1,player2))
   
    # Play game
    over = False
    while not over and not is_empty(board) :
        # Ask the next player to make his move
        board = players.next()(board)
        show(board)
       
        # Note a cell bag set is all rows,
        # all cols and all diagonals
        for cell_bag_set in (
                row_gen(board),
                col_gen(board),
                dia_gen(board)) :
           
            # Note a cell_bag can be a row, column or a dia
            for cell_bag in cell_bag_set :
                vals = set(val[2] for val in cell_bag)
                if len(vals) == 1 and  \
                        vals.__iter__().next() is not None :
                    over = True
                    print 'Over'
                    break

13
Jan 10

Dynamically adding methods with metaprogramming : Ruby and Python

I came across an interesting post Metaprogramming: Ruby vs. Javascript, which discusses and contrasts about how metaprogramming can be implemented in Ruby and Javascript. I thought it might be fun to document the same from a python perspective as well.

Here are the discussed samples. All the ruby code is quoted from from the blog post linked to above whereas the python code is what I wrote.

1. Initial class declaration and initialisation

We first declare a Ninja class and create two instances.

Ruby

1
2
3
4
5
6
7
8
9
10
class Ninja
  attr_accessor :name
 
  def initialize(name)
    @name = name
  end
end
 
drew = Ninja.new("Drew")
adam = Ninja.new("Adam")

Python

1
2
3
4
5
6
class Ninja(object):
    def __init__(self,name) :
        self.name = name

drew = Ninja('drew')
adam = Ninja('adam')

2. Add a method to the class
In this step we add a method to the class dynamically. It is therefore available to all the instances of the class

Ruby

1
2
3
4
5
6
7
8
9
10
class Ninja
  def battle_cry
    puts "#{name} says zing!!!"
  end
end
 
drew.battle_cry
# => Drew says zing!!!
adam.battle_cry
# => Adam says zing!!!

Python

1
2
3
4
5
6
7
8
def battle_cry(self):
    print '%s says zing!!!' % self.name
Ninja.battle_cry = battle_cry

drew.battle_cry()
# => Drew says zing!!!
adam.battle_cry()
# => Adam says zing!!!

3. Add a method to an instance
In this case we add a method only to a particular instance. The same method will not be available to other instances of the same class.

Ruby

1
2
3
4
5
6
def drew.throw_star
  puts "throwing a star"
end
 
drew.throw_star
# => throwing a star

Python

1
2
3
4
5
6
7
import types
def throw_star(self):
    print 'throwing a star'
drew.throw_star = types.MethodType(throw_star,drew)

drew.throw_star()
# => throwing a star

4. Invoke a method dynamically
In this case we supply the method name as a string and invoke it.
Ruby

1
2
drew.send(:battle_cry)
# => Drew says zing!!!

Python

1
2
drew.__getattribute__('battle_cry')()
# => Drew says zing!!!

5. Defining class level methods dynamically with closures
Here a class level method is defined which closes over some of the attributes in its context (in this case the method color is able to access the variable color_name as a closure).

Ruby

1
2
3
4
5
6
7
8
9
10
color_name = 'black'
 
Ninja.send(:define_method, 'color') do
  puts "#{name}'s color is #{color_name}"
end
 
drew.color
# => Drew's color is black
adam.color
# => Adam's color is black

Python

1
2
3
4
5
6
7
8
9
10
color_name = 'black'

def color(self):
    print "%s's color is %s" % (self.name, color_name)
Ninja.color = color

drew.color()
# => Drew's color is black
adam.color()
# => Adam's color is black

6. Defining a method dynamically on an instance that closes over local scope and accesses the instance’s state

Note that in this case the function is being defined on the instance using a dynamic name (‘swing’)

Ruby

1
2
3
4
5
6
7
8
9
10
11
12
13
14
class Object
  def metaclass
    class << self; self; end
  end
end
 
sword_symbol = "*********"
 
drew.metaclass.send(:define_method, 'swing') do |sound_effect|
  puts "#{name}: #{sword_symbol} #{sound_effect}"
end
 
drew.swing 'slash!!'
# => Drew: ********* slash!!

Python

1
2
3
4
5
6
7
8
sword_symbol = "*********"

def foo(self,sound_effect):
    print "%s : %s %s" % (self.name,sword_symbol,sound_effect)
   
drew.__dict__['swing'] = types.MethodType(foo,drew)
drew.swing('slash!!')
# => Drew : ********* slash!!

4
Jan 10

Find the greatest product of five consecutive digits in the 1000-digit number.

Problem Statement

Find the greatest product of five consecutive digits in the 1000-digit number.

73167176531330624919225119674426574742355349194934
96983520312774506326239578318016984801869478851843
85861560789112949495459501737958331952853208805511
12540698747158523863050715693290963295227443043557
66896648950445244523161731856403098711121722383113
62229893423380308135336276614282806444486645238749
30358907296290491560440772390713810515859307960866
70172427121883998797908792274921901699720888093776
65727333001053367881220235421809751254540594752243
52584907711670556013604839586446706324415722155397
53697817977846174064955149290862569321978468622482
83972241375657056057490261407972968652414535100474
82166370484403199890008895243450658541227588666881
16427171479924442928230863465674813919123162824586
17866458359124566529476545682848912883142607690042
24219022671055626321111109370544217506941658960408
07198403850962455444362981230987879927244284909188
84580156166097919133875499200524063689912560717606
05886116467109405077541002256983155200055935729725
71636269561882670428252483600823257530420752963450

Notes

An interesting side note here is how does one specify a 1000 digit number. In this case I chose to use the python “”" string delimiter to specify a multiline string exactly as described in the problem statement. However such a string has embedded \n characters which need to be removed. (lines 1 to 22).

Lines 31 and 32 create a sequence of all the substrings of 5 consecutive digits in the above number. Lines 29 and 30 compute the product of all the numbers that form the substring. Line 28 represents the tuple of the substring followed by the product. The reduce function on lines 24 to 26 (which gets intialised with the initial data on line 33) then continuously selects the tuple with the maximum product. The final ‘[1]‘ indexing operation on line 33 then just selects the product from the number product tuple selected by the earlier reduce function.

Solution

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
strnum = """
73167176531330624919225119674426574742355349194934
96983520312774506326239578318016984801869478851843
85861560789112949495459501737958331952853208805511
12540698747158523863050715693290963295227443043557
66896648950445244523161731856403098711121722383113
62229893423380308135336276614282806444486645238749
30358907296290491560440772390713810515859307960866
70172427121883998797908792274921901699720888093776
65727333001053367881220235421809751254540594752243
52584907711670556013604839586446706324415722155397
53697817977846174064955149290862569321978468622482
83972241375657056057490261407972968652414535100474
82166370484403199890008895243450658541227588666881
16427171479924442928230863465674813919123162824586
17866458359124566529476545682848912883142607690042
24219022671055626321111109370544217506941658960408
07198403850962455444362981230987879927244284909188
84580156166097919133875499200524063689912560717606
05886116467109405077541002256983155200055935729725
71636269561882670428252483600823257530420752963450
"""
.replace('\n','')

print reduce(
        lambda x,(y,z) :
            x[1] < z and (y,z)
            or x,tuple(
                (substr,
                 reduce(
                    lambda x,y : x * int(y), substr, 1))
                for substr in (strnum[i:i+5]
                    for i in xrange(len(strnum)-4))),
            ('',0))[1]

4
Jan 10

Least common multiple : What is the smallest number that is evenly divisible by all of the numbers from 1 to 20

Problem Statement

2520 is the smallest number that can be divided by each of the numbers from 1 to 10 without any remainder.

What is the smallest number that is evenly divisible by all of the numbers from 1 to 20?

Notes

This is essentially a requirement to compute the least common multiple for the values 1 through 20.

We first need to find the prime factors for each number. For some of the numbers, some of the prime factors occur more than once. eg. for 12, the prime factors are 2 and 3, of which while 3 occurs once, 2 occurs twice. Thus for each number we create a hashmap of the prime factors and the number of occurrences. To do so we define a inc_count(dict_,key) which increments the occurence count of the key in the dictionary. This dictionary for each number is computed once and is referred to as new_factors.

We need to ensure that we eventually create yet another dictionary which keeps track of the maximum count for each factor across all the numbers. We define yet another dictionary factors which is used to keep track of the maximum occurences of a given factor across all the new_factor instances.

We finally fold the factors dictionary by compute a product of all the factors with each factor being used as many times as it occurs in the factors dictionary. That gives us the least common multiple, which is the solution to the problem.

Solution

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
from itertools import chain

# function to take the first value of a generator and ignore the rest
def first(gen):
    try:
        return gen.next()
    except StopIteration :
        return None

# generator to return all the prime factors of a given number
def prime_factors(n):
    while n > 1 :
        ff = first(val for val in chain(
                    xrange(2,int(n**0.5+1.0)),[n]) if n % val == 0)
        yield ff
        n = n / ff

# increment the occurrences value of a key in a dictionary
def inc_count(dict_,key):
    dict_[key] = dict_[key] + 1
    return dict_

# keep track of the maximum occurrences of a key in a dictionary
def set_max_count(dict_,(key,val)):
    if dict_[key] < val :
        dict_[key] = val
    return dict_

# Actual solution
# Initialise a dictionary with all keys with occurrences set to zero
factors = dict((n,0)for n in range(2,21))

# For each number for whom we are computing the least common multiple
for num in range(2,21) :
    # Compute the prime factor occurences dictionary for the number
    new_factors = reduce(
                    inc_count,
                    prime_factors(num),
                    dict((n,0)for n in range(2,21)))
    # Update the tracking dictionary to keep track
    # of the maximum occurrences of a key (factor)
    factors = reduce(set_max_count, new_factors.items(),factors)

# Generate a product by multiplying all the factors
number = reduce(
            lambda num,(key,val) : num * (key ** val),
            factors.items(),
            1)
print number