Setting Up a CSV File

First Things First

Introduction

Who has not seen the now famous video where Hans Rosling shows the evolution of two hundred countries over two hundred years with respect to income per capita and life expectancy? In this tutorial we will learn how to use a command-line utility call curl and ruby script to create an input file that would allow us to reproduce this video with popcharts.

We will be using a two step process to create our gapMinder.txt input file.

  1. Downloading the source files using curl
  2. Merge them together using a ruby script

Download the scripts and sample files so that you can follow along.

GapMinder Indicators == Row-Major

We will download two types of files from Gap Minder: country mappings and indicators. Country mappings are available in CSV format and already follow a column-major layout. This means that thery are easy to process. On the other hand indicators follow a row-major layout and will need extra processing. Here is what a typical indicator file looks like:

VariableName,1810,1811,1812,1813,1814...2010
Country-A,0.123,0.128;;0.134,0.2,...1.9
Country-B,0.289,0.301;;0.335,0.336,...1.9
...

There will be one such file for every variable that we wish to study. Further more, there might be gaps in the years and the year range is not the same for all files. To further complicate things, some files use quoted values for numerical data using a comma as the thousands separator. We need to lookout for these in our ruby script.

Step 1: Downloading the source files

curl is a really powerful command-line utility and we will use it to download the public google spreadsheets made available by gapminder.org. Open the gap-fetch.sh file in your favorite text editor to see how we do this:

#!/bin/bash
 
# Simple Country Mappings
curl --silent "https://spreadsheets.google.com/pub?key=0ArfEDsV3bBwCdHlocWJBM1Uyd1ZPeHduU3c4WGplMEE&output=csv" > gapContinent.txt
curl --silent "https://spreadsheets.google.com/pub?key=phT4mwjvEuGBtdf1ZeO7_PQ&output=csv" > gapRegion.txt
curl --silent "https://spreadsheets.google.com/pub?key=0ArtujvvFrPjVdHUzTGVicFJZQ1NjaFhqelV5SDNxMVE&output=csv" > gapMainReligion.txt
 
# Indicators
curl --silent "https://spreadsheets.google.com/pub?key=phAwcNAVuyj1jiMAkmq1iMg&output=csv" > gapIncomePerCapita.txt
curl --silent "https://spreadsheets.google.com/pub?key=phAwcNAVuyj0TAlJeCEzcGQ&output=csv" > gapTotalFertility.txt
curl --silent "https://spreadsheets.google.com/pub?key=phAwcNAVuyj2tPLxKvvnNPA&output=csv" > gapLifeExpectancy.txt
curl --silent "https://spreadsheets.google.com/pub?key=phAwcNAVuyj0XOoBL_n5tAQ&output=csv" > gapPopulation.txt

Go to the data section at gapminder.org and find an indicator that interests you. For this example we have chosen "Urban Population". Go over to the Excel icon but don't click on it. Instead, right-click on it and copy the link to the clipboard and past in at the bottom of the script file. Here is what we get:

http://spreadsheets.google.com/pub?key=phAwcNAVuyj0-LE4StzCsEw&output=xls

All we need to do now is change the protocol to https, change the output type at the end of the URL to csv and add the surrounding curl command and redirect. Here is what this line should look like after some manipulation:

curl --silent "https://spreadsheets.google.com/pub?key=phAwcNAVuyj0-LE4StzCsEw&output=csv" > gapUrbanPopulation.txt

Finally, run the script and wait for it to finish downloading the sample files to your local hard drive by issuing the following command:

          gap-fetch.sh

Step 2: Merge the Files together

We use a short ruby script to pivot and merge the gap minder files together. Going over the details of the script is beyond the scope of this tutorial but the code is simple so go ahead and have a look.

To merge the downloaded files use the following command line

        gap-merge.rb 1810 2010 gap*.txt > ../gapMinder.txt

The first few lines of the output file will look something like this:

Country	Year	Continent	MainReligion	Region	IncomePerCapita	LifeExpectancy	Population	TotalFertility	UrbanPopulation
Afghanistan	1810	[Asia]	Muslim	South Asia	472.05349959347	28.1085333333333			
Afghanistan	1811	[Asia]	Muslim	South Asia	472.05349959347	28.0982866666667			
Afghanistan	1812	[Asia]	Muslim	South Asia	472.05349959347	28.08804			
Afghanistan	1813	[Asia]	Muslim	South Asia	472.05349959347	28.0777933333333			
Afghanistan	1814	[Asia]	Muslim	South Asia	472.05349959347	28.0675466666667			
Afghanistan	1815	[Asia]	Muslim	South Asia	472.05349959347	28.0573			
Afghanistan	1816	[Asia]	Muslim	South Asia	472.05349959347	28.0470533333333			
Afghanistan	1817	[Asia]	Muslim	South Asia	472.05349959347	28.0368066666667			
Afghanistan	1818	[Asia]	Muslim	South Asia	472.05349959347	28.02656

Congratulations!

You now have an input file that will allow you to reproduce Hans Roslings famous video.