Who has not seen the now famous video where Hans Rosling shows the evolution of two hundred countries over two hundred years with respect to income per capita and life expectancy? In this tutorial we will learn how to use a command-line utility call
ruby script to create an input file that would allow us to reproduce this video with popcharts.
We will be using a two step process to create our
gapMinder.txt input file.
- Downloading the source files using curl
- Merge them together using a ruby script
Download the scripts and sample files so that you can follow along.
GapMinder Indicators == Row-Major
We will download two types of files from Gap Minder: country mappings and indicators. Country mappings are available in CSV format and already follow a column-major layout. This means that thery are easy to process. On the other hand indicators follow a row-major layout and will need extra processing. Here is what a typical indicator file looks like:
VariableName,1810,1811,1812,1813,1814...2010 Country-A,0.123,0.128;;0.134,0.2,...1.9 Country-B,0.289,0.301;;0.335,0.336,...1.9 ...
There will be one such file for every variable that we wish to study. Further more, there might be gaps in the years and the year range is not the same for all files. To further complicate things, some files use quoted values for numerical data using a comma as the thousands separator. We need to lookout for these in our ruby script.
Step 1: Downloading the source files
curl is a really powerful command-line utility and we will use it to download the public google spreadsheets made available by gapminder.org. Open the
gap-fetch.sh file in your favorite text editor to see how we do this:
#!/bin/bash # Simple Country Mappings curl --silent "https://spreadsheets.google.com/pub?key=0ArfEDsV3bBwCdHlocWJBM1Uyd1ZPeHduU3c4WGplMEE&output=csv" > gapContinent.txt curl --silent "https://spreadsheets.google.com/pub?key=phT4mwjvEuGBtdf1ZeO7_PQ&output=csv" > gapRegion.txt curl --silent "https://spreadsheets.google.com/pub?key=0ArtujvvFrPjVdHUzTGVicFJZQ1NjaFhqelV5SDNxMVE&output=csv" > gapMainReligion.txt # Indicators curl --silent "https://spreadsheets.google.com/pub?key=phAwcNAVuyj1jiMAkmq1iMg&output=csv" > gapIncomePerCapita.txt curl --silent "https://spreadsheets.google.com/pub?key=phAwcNAVuyj0TAlJeCEzcGQ&output=csv" > gapTotalFertility.txt curl --silent "https://spreadsheets.google.com/pub?key=phAwcNAVuyj2tPLxKvvnNPA&output=csv" > gapLifeExpectancy.txt curl --silent "https://spreadsheets.google.com/pub?key=phAwcNAVuyj0XOoBL_n5tAQ&output=csv" > gapPopulation.txt
Go to the data section at gapminder.org and find an indicator that interests you. For this example we have chosen "Urban Population". Go over to the Excel icon but don't click on it. Instead, right-click on it and copy the link to the clipboard and past in at the bottom of the script file. Here is what we get:
All we need to do now is change the protocol to
https, change the output type at the end of the URL to
csv and add the surrounding
curl command and redirect. Here is what this line should look like after some manipulation:
curl --silent "https://spreadsheets.google.com/pub?key=phAwcNAVuyj0-LE4StzCsEw&output=csv" > gapUrbanPopulation.txt
Finally, run the script and wait for it to finish downloading the sample files to your local hard drive by issuing the following command: