GSOC 2016 blog: wrap-up


This blog post summarizes the main tasks I completed during my 3 months as a GSoC’16 intern, and the things I learned along the way.

I have been working on the EcoData Retriever project, with my mentors Henry Senyondo and Ethan White, under NumFOCUS.

All the commits I made during this period are listed at this GitHub link:


Note: All the code has been merged into the master branch.


List of tasks:

a) Upgrade scripts to Datapackage.JSON standard.

This was my main GSoC task. I spent most of the last 6 weeks on it, covering both code and documentation.


Summary 1:

  1. The scripts have been converted from the .script format to .json using the parse_script_to_json module I wrote.
  2. I added a new CLI (command-line interface) tool that can:
    1. Create new JSON scripts: takes input for all the relevant fields from the user, validates it, and stores it in valid JSON format (Datapackage.JSON standard).
    2. Delete JSON scripts: deletes a script based on its shortname. The tool searches the list of Python scripts (SCRIPT_LIST) and, after asking for confirmation, deletes the scripts that match the user's request.
    3. (Experimental) Edit JSON scripts: lets users edit existing retriever scripts. This feature has not been fully tested, so it is currently disabled.
  3. Added unit tests and modified the integration tests to cover input validation and JSON script integration (download and installation regression tests).
  4. Added documentation (link) to guide users through the new tool.
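
To make the "create" step above a bit more concrete, here is a minimal sketch of validating user input and turning it into JSON. The field names (`name`, `title`, `resources`), the `REQUIRED_FIELDS` tuple, the short-name rule and the `build_script` helper are all assumptions made up for illustration; they are not the retriever's actual schema or API.

```python
import json
import re

# Hypothetical required fields; the real retriever schema may differ.
REQUIRED_FIELDS = ("name", "title", "resources")


def build_script(fields):
    """Validate user-supplied fields and return them as a JSON string."""
    missing = [key for key in REQUIRED_FIELDS if not fields.get(key)]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    # Assumed rule: short names use lowercase letters, digits, '-' or '_'.
    if not re.match(r"^[a-z0-9_-]+$", fields["name"]):
        raise ValueError("invalid short name: " + fields["name"])
    return json.dumps(fields, indent=4, sort_keys=True)


# Made-up example input, as the CLI might collect it from the user.
example = {
    "name": "example-dataset",
    "title": "An example dataset",
    "resources": [{"name": "main", "url": "http://example.com/data.csv"}],
}
script_text = build_script(example)
```

Rejecting bad input before anything is written to disk means a half-filled script never ends up in the scripts folder.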


b) Port retriever to Python 3, maintaining backwards compatibility.
This was not a cakewalk at all. I already highlighted the various csv and encoding issues (UTF-8 / latin-1) in the previous post. Nevertheless, the library is now fully compatible with both Python 2 and 3 on all major *NIX and Windows platforms (tested on Ubuntu, Mac, and Windows 7).

I completed this in the first month of the GSoC period, and kept adding fixes for the related bugs that came up during the rest of the coding period. With help from my mentors Henry and Ethan, I refactored the code so that explicit OS checks are no longer needed.
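
As an illustration of the kind of encoding handling involved, the sketch below uses `io.open`, which accepts an explicit `encoding` argument on both Python 2 and Python 3. The file name and the latin-1 sample text are made up for the example; this is not the retriever's actual code.

```python
import io
import os
import tempfile

# Made-up sample: "café" encoded as latin-1 (this byte string is not valid UTF-8).
raw_bytes = b"name\ncaf\xe9\n"

path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with io.open(path, "wb") as handle:
    handle.write(raw_bytes)

# io.open behaves the same on Python 2 and 3: it decodes the file to unicode
# text using the encoding we name, instead of relying on a platform default.
with io.open(path, "r", encoding="latin-1") as handle:
    lines = handle.read().splitlines()
```

Naming the encoding explicitly is also one way to avoid per-platform branches, since the default encoding differs between operating systems.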


Summary 2:

  1. retriever can now be installed under either Python 2 or Python 3 without any difficulties.
  2. Cross-platform compatibility (with both Python 2 and 3).
  3. Updated documentation (link) to reflect Python 3 support.



Things I have learned:

1. Python idioms

2. Unit testing (with pytest)

3. Different text encodings (UTF-8 and ISO 8859-1, also known as latin-1)

4. The Sphinx documentation system

5. git-fu!

6. Python 2 vs Python 3 syntax and package-support differences


In closing, it was an immensely rewarding learning experience, and I look forward to remaining associated with the retriever project 😀

Thanks for reading!


