/code/blog

Code, code and code

Code Kata : Ruby Programming Challenge for Newbies in Python

An interesting contest caught my eye today. Ruby Learning, a site run by Pune's Satish Talim (Twitter : @IndianGuru), organises a regular Ruby Programming Challenge for Newbies, and yesterday it introduced the 13th challenge : RPCFN: Economics 101 (#13) by Dr. Bruce Scharlau. Though I am neither a regular Rubyist nor a newbie, I thought it made for a decent exercise and a short diversion, albeit in Python. For added interest I wrote the solutions twice: once in a very procedural way and once using functional programming constructs. Here's the problem as stated in the challenge :
The file cia-1996.xml (links back to rubylearning.org) is the data from the CIA World Factbook of 1996 in XML format. It has details about 260 countries across five continents. Your challenge, should you choose to accept it, is to uncover the following details buried within this file:
  • What is the population of the country with the most people? Yes, we know it’s China, but just how many people lived there in 1996?
  • What are the five countries with the highest inflation rates, and what were those rates in 1996?
  • What are the six continents in the file and which countries belong to which continent? Can you also produce them in alphabetical order?
I used Python 2.6 with the lxml XML parser for this exercise.
Parsing the xml
Since the source file is a zip file, one needs to open the zip and extract the xml from it. ZipFile only supports the with statement from Python 2.7 onwards, so on Python 2.6 I couldn't use it and had to call close() explicitly.
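From Python 2.7 onwards ZipFile works as a context manager. On 2.6, wrapping it in contextlib.closing() should give the same effect. The sketch below shows the unzip-and-parse step written with a with block.

A few assumptions apply:

* It targets Python 3.
* To stay dependency-free, it swaps lxml's objectify for the standard library's xml.etree.ElementTree. Countries are therefore reached with findall("country") rather than root.country.
* The in-memory archive, the <cia> root element and the two sample countries are hypothetical stand-ins for cia-1996.zip, not the real data.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Hypothetical stand-in for ../cia-1996.zip: a tiny in-memory archive whose
# cia-1996.xml member holds made-up <country> elements, not the real data.
SAMPLE_XML = (
    b'<cia>'
    b'<country name="CountryA" continent="Asia" population="300" inflation="4.5"/>'
    b'<country name="CountryB" continent="Europe" population="1200" inflation="12.0"/>'
    b'</cia>'
)
zip_bytes = io.BytesIO()
with zipfile.ZipFile(zip_bytes, "w") as archive:
    archive.writestr("cia-1996.xml", SAMPLE_XML)

# The with block closes the archive on exit, so no explicit close() is needed.
# ZipFile.read() returns the whole member as a single bytes string.
with zipfile.ZipFile(zip_bytes) as archive:
    root = ET.fromstring(archive.read("cia-1996.xml"))

print([country.get("name") for country in root.findall("country")])
# -> ['CountryA', 'CountryB']
```

The original Python 2.6 version follows.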
import zipfile
from lxml import objectify
from operator import itemgetter
import itertools

# Unzip file, extract xml, convert to object
# ('archive' rather than 'zip', so the builtin zip() is not shadowed)
archive = zipfile.ZipFile("../cia-1996.zip")
xmlfile = archive.open("cia-1996.xml")
root = objectify.fromstring(xmlfile.read())
archive.close()
This opens the zip file, extracts the xml from it, reads its contents into a single string and converts the data into a single object referred to as root. Note: the itemgetter and itertools imports are used later on.
Procedural : Find the country with the highest population
# Country with highest population
maxp = 0
maxc = ''
for country in root.country :
    population = int(country.get('population', 0))
    if population >= maxp :
        maxc, maxp = country.get('name'), population
print (maxc,maxp)

# Output is : ('China', 1210004956)
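The loop above is the classic running-maximum pattern. Python's built-in max() accepts a key function and does that bookkeeping itself.

A minimal sketch of this, written for Python 3, with these assumptions:

* xml.etree.ElementTree stands in for lxml.
* The three sample countries are made up.
* The population() helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature sample in the shape of the factbook's <country>
# elements; names and numbers are made up for illustration.
root = ET.fromstring(
    '<cia>'
    '<country name="CountryA" population="300"/>'
    '<country name="CountryB" population="1200"/>'
    '<country name="CountryC"/>'  # no population attribute
    '</cia>'
)

def population(country):
    """Population as an int, treating a missing attribute as 0."""
    return int(country.get("population", 0))

# max() with a key function replaces the manual maxc/maxp bookkeeping.
most_populous = max(root.findall("country"), key=population)
print((most_populous.get("name"), population(most_populous)))
# -> ('CountryB', 1200)
```

One behavioural difference is worth knowing. On a tie, max() returns the first maximal element, whereas the >= comparison in the loop keeps the last one.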
Procedural : Top 5 countries with highest inflation rates
# Top 5 countries with highest inflation rates
inflation_country_tuples = []
for country in root.country :
    inflation_country_tuples.append((float(country.get('inflation',0.0)), country.get('name')))
inflation_country_tuples = sorted(inflation_country_tuples, key=itemgetter(0), reverse=True)
print inflation_country_tuples[0:5]

# Output : [(244.0, 'Belarus'), (94.0, 'Turkey'), (85.0, 'Azerbaijan'), (83.299999999999997, 'Malawi'), (71.299999999999997, 'Yemen')]
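Sorting every country just to keep five of them is fine for 260 records. Still, the standard library's heapq.nlargest() expresses "top n" directly.

A sketch of that alternative, with these assumptions:

* It targets Python 3.
* xml.etree.ElementTree is used instead of lxml.
* The six sample countries are made up.

```python
import heapq
from operator import itemgetter
import xml.etree.ElementTree as ET

# Hypothetical sample: six made-up countries, one with no inflation attribute.
root = ET.fromstring(
    '<cia>'
    '<country name="CountryA" inflation="4.5"/>'
    '<country name="CountryB" inflation="12.0"/>'
    '<country name="CountryC" inflation="80.0"/>'
    '<country name="CountryD"/>'
    '<country name="CountryE" inflation="33.1"/>'
    '<country name="CountryF" inflation="7.25"/>'
    '</cia>'
)

# (inflation, name) tuples as in the procedural version; a missing rate counts as 0.0.
rates = ((float(c.get("inflation", 0.0)), c.get("name"))
         for c in root.findall("country"))

# nlargest() keeps only the five largest tuples by inflation, largest first.
top_five = heapq.nlargest(5, rates, key=itemgetter(0))
print(top_five)
# -> [(80.0, 'CountryC'), (33.1, 'CountryE'), (12.0, 'CountryB'), (7.25, 'CountryF'), (4.5, 'CountryA')]
```

The heapq documentation describes nlargest(n, iterable, key=key) as equivalent to sorted(iterable, key=key, reverse=True)[:n]. So the result should match the sort-and-slice version.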
The loop builds a list of (inflation, name) tuples, one per country, sorts it by inflation in descending order and prints the first five elements.
Procedural : Sorted continents, each with its countries in sorted order
# Countries by continent
continent_country_tuples = []
for country in root.country :
    continent_country_tuples.append((country.get('continent',''), country.get('name')))
continent_country_tuples = sorted(continent_country_tuples)
current_continent = None
countries_of_continent = None
continent_grouped_countries = []
for continent, country in continent_country_tuples :
    if continent != current_continent :
        if current_continent is not None :
            continent_grouped_countries.append((current_continent, countries_of_continent))
        countries_of_continent = []
        current_continent = continent
    countries_of_continent.append(country)
continent_grouped_countries.append((current_continent, countries_of_continent))
print continent_grouped_countries

# Output : too long to include here
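The manual tracking of current_continent above can be avoided by collecting the names into a dictionary keyed by continent.

A minimal sketch using collections.defaultdict, with these assumptions:

* It targets Python 3.
* xml.etree.ElementTree is used instead of lxml.
* The four sample countries are made up.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

# Hypothetical sample data in the shape of the factbook's <country> elements.
root = ET.fromstring(
    '<cia>'
    '<country name="CountryC" continent="Europe"/>'
    '<country name="CountryA" continent="Asia"/>'
    '<country name="CountryB" continent="Europe"/>'
    '<country name="CountryD" continent="Asia"/>'
    '</cia>'
)

# Collect country names per continent; a missing continent becomes ''.
by_continent = defaultdict(list)
for country in root.findall("country"):
    by_continent[country.get("continent", "")].append(country.get("name"))

# Sort the continents, and the countries within each continent.
continent_grouped_countries = [(continent, sorted(names))
                               for continent, names in sorted(by_continent.items())]
print(continent_grouped_countries)
# -> [('Asia', ['CountryA', 'CountryD']), ('Europe', ['CountryB', 'CountryC'])]
```

This trades one sort over all (continent, country) pairs for a sort of the continents plus a small sort per continent. It also needs no special handling for the first or last group.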
Functional Programming Solutions
Incidentally, the same problem can also be solved with functional programming constructs such as reduce, sorted, islice and groupby, as follows. This makes for an interesting contrast between the procedural and functional styles: the logic in both sets is virtually the same, though the constructs differ.
# Country with highest population
print reduce(lambda (maxc,maxp), c :
             (c.get('name'),int(c.get('population',0)))
                if int(c.get('population',0)) >= maxp else
                    (maxc,maxp),
             root.country,('',0))

# Top 5 countries with highest inflation rates
print tuple(itertools.islice(
            sorted((
                (float(country.get('inflation',0.0)), country.get('name'))
                        for country in root.country),
                key=itemgetter(0),reverse=True),
                5))

# Countries by continent
print tuple((continent,tuple(country[1] for country in countries))
        for continent, countries in itertools.groupby(
            sorted((country.get('continent',''), country.get('name'))
                    for country in root.country),itemgetter(0)))
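These snippets rely on three Python 2 features:

* reduce is a builtin.
* print is a statement.
* lambda (maxc,maxp), c : unpacks a tuple in its parameter list, which PEP 3113 removed in Python 3.

A minimal sketch of the same three expressions ported to Python 3, with these assumptions:

* xml.etree.ElementTree is used in place of lxml.
* The three sample countries are made up.

```python
from functools import reduce
from itertools import groupby, islice
from operator import itemgetter
import xml.etree.ElementTree as ET

# Hypothetical sample data in the shape of the factbook's <country> elements.
root = ET.fromstring(
    '<cia>'
    '<country name="CountryA" continent="Asia" population="300" inflation="4.5"/>'
    '<country name="CountryB" continent="Europe" population="1200" inflation="12.0"/>'
    '<country name="CountryC" continent="Asia" population="50" inflation="80.0"/>'
    '</cia>'
)
countries = root.findall("country")

# Country with highest population: the (name, population) accumulator is
# indexed with best[1] instead of being unpacked in the parameter list.
most_populous = reduce(
    lambda best, c: (c.get("name"), int(c.get("population", 0)))
    if int(c.get("population", 0)) >= best[1] else best,
    countries, ("", 0))

# Top 5 countries with highest inflation rates
top_inflation = tuple(islice(
    sorted(((float(c.get("inflation", 0.0)), c.get("name")) for c in countries),
           key=itemgetter(0), reverse=True),
    5))

# Countries by continent: each group is consumed by tuple() before
# groupby advances to the next continent.
by_continent = tuple(
    (continent, tuple(name for _, name in group))
    for continent, group in groupby(
        sorted((c.get("continent", ""), c.get("name")) for c in countries),
        itemgetter(0)))

print(most_populous)   # -> ('CountryB', 1200)
print(top_inflation)   # -> ((80.0, 'CountryC'), (12.0, 'CountryB'), (4.5, 'CountryA'))
print(by_continent)    # -> (('Asia', ('CountryA', 'CountryC')), ('Europe', ('CountryB',)))
```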
Update
I restructured the functional code further. This version has no comprehensions or generator expressions (i.e. no for loops) at all; it relies only on map, reduce, sorted, islice and groupby. The code is as follows :
# Find the country with the maximum population
print reduce(lambda country_pop_max, country_pop_next : country_pop_next if country_pop_next[1] > country_pop_max[1] else country_pop_max,
        map(lambda country : (country.get('name'), int(country.get('population', 0))),root.country),('',0))

# Top 5 countries with highest inflation rates
print tuple(itertools.islice(sorted(
            map(lambda country : (country.get('name'), float(country.get('inflation', 0.0))),root.country),
            key=itemgetter(1),reverse=True),5))

# Countries by continent
print map(lambda (continent, continent_country_tuples) : (continent, map(lambda (continent, country) : country, continent_country_tuples)),
    itertools.groupby(sorted(map(lambda country : (country.get('continent'), country.get('name')),root.country)),itemgetter(0)))
Since the above is probably a little too cryptic, here is the same code with detailed comments (only comments and whitespace added) :
print reduce(
        # Reducing function: keep whichever (country name, population) tuple has the larger population
        lambda country_pop_max, country_pop_next : country_pop_next if country_pop_next[1] > country_pop_max[1] else country_pop_max,
        # get a sequence of tuples of (country name, country population)
        map(lambda country : (country.get('name'), int(country.get('population', 0))),root.country),
        # initial seed
        ('',0))

print tuple(
        # take the top 5 items
        itertools.islice(
            # sort the list
            sorted(
                # get a sequence of tuples of (country name, inflation)   
                map(lambda country : (country.get('name'), float(country.get('inflation', 0.0))),root.country),
                # sorting to be done using the second element of the tuple
                key=itemgetter(1),
                # sort to be done in the descending order
                reverse=True),
            # count of elements to be sliced
            5))

print map(
    # function that turns each (continent, group of continent country tuples) pair into (continent, list of countries)
    lambda (continent, continent_country_tuples) :
        # first element of the tuple is continent
        (continent,
        # as the second element, keep only the country from each continent country tuple, giving a list of countries
        map(lambda (continent, country) : country, continent_country_tuples)),
    # group the continent country tuples by continent
    itertools.groupby(
        # sort the continent country tuples
        sorted(
            # extract a continent country tuple from the country
            map(lambda country : (country.get('continent'), country.get('name')),root.country)),
        # key function for groupby: group the tuples by their first element, the continent
        itemgetter(0)))
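Two Python 3 details matter if this map-only version is ported:

* map() returns a lazy iterator rather than a list.
* itertools.groupby shares its source iterator with every group, so a group can no longer be read once groupby has moved past it.

A minimal sketch of the continent grouping under those rules, with these assumptions:

* It targets Python 3.
* xml.etree.ElementTree is used instead of lxml.
* The sample countries are made up.
* The lambdas index into each pair, because tuple parameters are gone.

```python
from itertools import groupby
from operator import itemgetter
import xml.etree.ElementTree as ET

# Hypothetical sample data in the shape of the factbook's <country> elements.
root = ET.fromstring(
    '<cia>'
    '<country name="CountryB" continent="Europe"/>'
    '<country name="CountryA" continent="Asia"/>'
    '<country name="CountryC" continent="Asia"/>'
    '</cia>'
)

# Sorted (continent, country) pairs, as in the Python 2 version.
pairs = sorted(map(lambda country: (country.get("continent"), country.get("name")),
                   root.findall("country")))

# The inner list() drains each group while it is still current; the outer
# list() forces the otherwise lazy outer map.
grouped = list(map(lambda item: (item[0], list(map(itemgetter(1), item[1]))),
                   groupby(pairs, itemgetter(0))))
print(grouped)
# -> [('Asia', ['CountryA', 'CountryC']), ('Europe', ['CountryB'])]

# Without the inner list(), the per-continent map objects are only read after
# groupby has advanced, so the first continent's countries are already gone.
lazy = list(map(lambda item: (item[0], map(itemgetter(1), item[1])),
                groupby(pairs, itemgetter(0))))
lost = list(lazy[0][1])
print(lazy[0][0], lost)
# -> Asia []
```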
