GSoC 2016 blog: wrap-up


This blog post summarizes the main tasks I completed during my 3 months as a GSoC’16 intern, and what I learned along the way.

I have been working on the EcoData Retriever project under NumFOCUS, with my mentors Henry Senyondo and Ethan White.

All the commits I made during this period are listed at this GitHub link:


Note: All the code has been merged into the master branch.


List of tasks:

a) Upgrade scripts to Datapackage.JSON standard.

This was my main GSoC task, covering both code and documentation, and I spent most of the last 6 weeks on it.


Summary 1:

  1. The scripts have been updated from .script format to .json using the parse_script_to_json module I wrote.
  2. I added a new CLI (command-line interface) tool that can:
    1. Create new JSON scripts: Takes input for all the relevant fields from the user, validates it, and stores it in valid JSON format (Datapackage.JSON standard).
    2. Delete JSON scripts: Deletes any script based on its shortname. Searches the list of Python scripts (SCRIPT_LIST) and, after confirmation, deletes the scripts that match the user’s request.
    3. (Experimental) Edit JSON scripts: Lets users edit existing retriever scripts. This feature has not been completely tested, so it is currently disabled.
  3. Added unit tests and modified the integration tests to cover input validation and JSON script integration (download and installation regression tests).
  4. Added documentation (link) to guide the user on this new tool.
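
To give a rough idea of what the "create" step involves, here is a minimal sketch that validates a few fields and serialises them as JSON. It is not the retriever's actual code: the helper names (build_script, to_json), the shortname rule, and the exact field layout (name, title, resources, url) are assumptions made for illustration, loosely following the datapackage.json layout.

```python
import json
import re


def build_script(shortname, title, url):
    """Validate the fields and return a datapackage-style dict.

    Hypothetical helper: the field names and the shortname rule are
    assumptions for illustration, not the retriever's actual schema.
    """
    if not re.match(r"^[A-Za-z0-9_-]+$", shortname):
        raise ValueError("shortname may only contain letters, digits, '_' and '-'")
    if not title.strip():
        raise ValueError("title must not be empty")
    if not url.startswith(("http://", "https://", "ftp://")):
        raise ValueError("url must start with http://, https:// or ftp://")
    return {
        "name": shortname,
        "title": title.strip(),
        "resources": [{"name": shortname, "url": url}],
    }


def to_json(script):
    """Serialise the script dict as indented, key-sorted JSON text."""
    return json.dumps(script, indent=4, sort_keys=True)
```

Validating before building the dict means a bad value never reaches the file on disk, which is the main point of the input checks mentioned above.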


b) Port retriever to Python 3, maintaining backwards compatibility.
Not a cakewalk at all. I already highlighted the various csv and encoding issues (UTF-8 / latin-1) in the previous post. Nevertheless, the library is now fully compatible with both Python 2 and 3 on all major *NIX and Windows platforms (tested on Ubuntu, Mac, Windows 7).

I completed this in the first month of the GSoC period, and kept fixing the bugs that came up during the rest of the coding period. With help from my mentors Henry and Ethan, I also refactored the code so that explicit OS checks are no longer needed.


Summary 2:

  1. retriever can now be installed on either Python 2 or Python 3 without any difficulties.
  2. Cross-platform compatibility (on both Python 2 and 3).
  3. Updated documentation (link) to reflect Python 3 support.



Things I learned:

1. Python idioms

2. Unit testing (with pytest)

3. Different text encodings (UTF-8 and ISO 8859-1)

4. The Sphinx documentation system

5. git-fu!

6. Python 2 vs Python 3 syntax and package-support differences.


In closing, it was an immensely rewarding learning experience, and I look forward to remaining associated with the retriever project 😀

Thanks for reading!



GSoC Blog – Part II

This post marks the end of the first 4 weeks of my GSoC internship with NumFOCUS. As I have mentioned, I am working on the EcoData Retriever project (an awesome tool to download and examine ecological datasets) and it’s been a great learning experience so far.


Python 3

First things first: EcoData Retriever now fully supports both Python 2 and Python 3 natively. That isn’t to say there are no bugs, but the build passes all tests on Python 2 and 3 on *nix and Windows systems. I would appreciate bug reports about compatibility on the issue tracker.

For this task, I used the future package (installed via pip), which made a lot of these changes very easy. It’s a wonderful piece of software, and if you are looking to port your library to Python 3 while maintaining backwards compatibility, you should look into it as well.

Even after using future though, there were a lot of issues, mainly involving:

  1. Unicode (especially UTF-8) and
  2. The csv module (which is difficult to backport).

The Unicode changes were not that hard. All I did was decode() and encode() strings wherever a Unicode or bytes value was needed (strings are bytes by default on Python 2 and Unicode on Python 3). Unless a bytes type was required, I converted all strings to Unicode (decoding as UTF-8 by default).
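
The decode/encode approach can be sketched with two small helpers. This is only an illustration under assumptions: the names to_text and to_bytes are hypothetical, and the latin-1 fallback is my own addition here, not necessarily what the retriever does. The code runs unchanged on Python 2 and 3, because bytes is an alias for str on Python 2.

```python
def to_text(value, encoding="utf-8"):
    """Return value as Unicode text, decoding bytes if necessary."""
    if isinstance(value, bytes):
        try:
            return value.decode(encoding)
        except UnicodeDecodeError:
            # Assumed fallback: latin-1 maps every byte to a code point,
            # so this decode cannot fail (it may still guess wrong).
            return value.decode("latin-1")
    return value


def to_bytes(value, encoding="utf-8"):
    """Return value as bytes, encoding text if necessary."""
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)
```

Calling to_text() at the point where data enters the program, and to_bytes() only where an API insists on bytes, keeps the rest of the code working with one string type.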

The csv module, though, was a lot of pain. It took me a while to realise that csv doesn’t work that well cross-platform (on Windows, a file opened in text mode gets an extra \r at the end of each line). Plus, it doesn’t play nice with the str type from the future.builtins module. I had to insert Python version checks (sys.version_info) and OS checks (nt vs posix) to make it compatible with both Python 2 & 3 across all platforms.
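
For reference, a minimal sketch of the version check for writing CSV files is below. It follows the usual advice in the csv documentation (text mode with newline='' on Python 3, binary mode on Python 2) rather than reproducing the retriever's code, and the helper names open_csv_writer and write_rows are hypothetical.

```python
import csv
import sys


def open_csv_writer(path):
    """Open path for csv writing without extra '\\r' characters on Windows.

    Python 3's csv module wants a text file opened with newline='';
    Python 2's csv module wants a binary file.
    """
    if sys.version_info[0] >= 3:
        return open(path, "w", newline="", encoding="utf-8")
    return open(path, "wb")


def write_rows(path, rows):
    """Write an iterable of rows to path as CSV."""
    csv_file = open_csv_writer(path)
    try:
        csv.writer(csv_file).writerows(rows)
    finally:
        csv_file.close()
```

With the file opened this way, the csv writer controls the line endings itself, so no separate nt vs posix branch is needed for this particular case.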

Datapackage standard

Next is my main GSoC task: upgrading the dataset scripts to the datapackage.json standard. This, thankfully, is proving to be much easier than the former task. It has three main parts:

  1. Upgrade existing scripts to JSON
  2. Add a CLI tool to create new JSON scripts
  3. Add support for editing the existing ones

I had already done the first part during my community bonding period, and thus did not have to spend a lot of time on that.

I have completed the major portions of the second task by creating a new function that collects input data from the user through Python input() prompts. It was fairly easy, as I had already discussed with Henry the major changes that needed to go into the tool. Based on the datapackage.json specification, I also came up with a nice format for porting the current YAML-like scripts to JSON.

We are currently reviewing these changes. It’s a work in progress, and the final release will only come by the end of this month (or by the end of August).
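
A prompt loop of this kind might look roughly like the sketch below. The function name prompt_field and its parameters are hypothetical, not taken from the retriever. Note that on Python 2 the built-in input() evaluates whatever is typed, so raw_input (or input imported from future.builtins) would be needed there.

```python
def prompt_field(label, validate, read=input):
    """Ask for a value until validate() accepts it, then return it.

    `read` defaults to the built-in input(); passing another callable
    makes the prompt easy to test. All names here are hypothetical.
    """
    while True:
        value = read(label + ": ").strip()
        if validate(value):
            return value
        print("Invalid value for " + label + ", please try again.")
```

For example, prompt_field("shortname", lambda v: bool(v)) keeps asking until the user types something non-empty.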

I hope to add the changes for the third sub-task in the next week. I’ll keep updating and posting as I go along.

Raspbian is overrated (or Why Lubuntu rocks)

Hello all.

One of my very old posts was based on my nascent interaction with Raspbian, the most popular distro that people generally install on their Raspberry Pi the first time around.

Now it’s no secret that Raspbian is very kludgy. The version I was running was based on Debian Wheezy, so I can’t vouch for what upgrades and fixes (Raspbian-related) Jessie might have brought to the table.

I guess it’s more of a personal preference, but Raspbian (Wheezy) felt very outdated and slow for my needs. And it came installed with Scratch and Mathematica, but not the Python GPIO libraries. Go figure.

But after experimenting on the Pi2 for a while, it suddenly dawned on me: why was a long-time Ubuntu user like me not using Lubuntu on this board instead? One of my friends runs it on his old Pentium 4, and it’s pretty smooth. I didn’t even hate using his desktop for some Arduino programming, until its UPS died and I had to switch to another system :p

So I decided to wipe the microSD card for my Pi2 and install Lubuntu 16.04 on it, and it didn’t disappoint me at all. The performance is good, browsers work without any hacks or lags, and best of all, I get an environment that I am very familiar with.

I would personally recommend it for all Pi2 hobbyists out there.




GSoC blog : The beginning

It’s been a long time since I last posted here. Thankfully, I have something useful to talk about this time.

I am very excited to have been selected as one of the interns for the Google Summer of Code (GSoC) program for the year 2016. Thanks to GSoC, I have become very interested in open source, and have even become one of the mentors for an open-source org on GitHub ( Link ).

The organisation that I have been selected under is NumFOCUS. The project that I have chosen is the EcoData Retriever project on GitHub. My mentors are Ethan White and Henry Senyondo. They have been very helpful and encouraging during the GSoC application and community bonding phase.

My project’s main goals are as follows:

  1. Convert data scripts to the datapackage.json standard.
  2. Add Python 3 support.
  3. Resolve important issues to reach the Retriever 2.0 milestone.

Thanks to working on this project, I have improved my Python programming skills and also learned how to use git properly.

Hoping to have a great time this summer!



Trials and tribulations of Theano+Jupyter

An ongoing post about how I set up my machine to use Theano for development.


    1.  Activate GPU support:
      • Install the latest CUDA
      • Get cuDNN libs from NVIDIA
      • Copy cuDNN libs to CUDA folder (/usr/local/cuda-7.5/ for example)
    2. Setup a default editor:
      • Make a python file with name in ~/.jupyter
      • Add the following code (replace ‘subl’ with your editor of choice):

        c = get_config()
        c.JupyterWidget.editor = 'subl'

Always fetch upstream!

Welcome to n00bsland!

Everybody should remember where they come from. Including their repos.

Why, you ask? Because git merge :/

I recently edited a few files in my branch of a repo and created a pull request. Well guess what? The master had moved forward.

So the usual-

1. Delete local repo

2. Add remote upstream

git remote add upstream

3. Update local copy.

git fetch upstream

4. Create changes (again >_<)

5. Commit and pull.

Don’t be a n00b, always add upstream repo.

Getting started with Raspbian – First touches

Disclaimer : All the knowledge here has been scoured from the internet.

So I got myself a Raspberry Pi 2 this month. But due to some display connectivity issues (lack of HDMI monitor, USB keyboard, etc.) on my part, I was unable to use it.

Finally now though, I have got it up and running. But there are a few customizations (and fixes) that I had to do before it became usable for me.

1. The terminal hotkey 

My number one problem was that there was no terminal hotkey (one that launches in the same X session). I sorely missed it, having to click on the taskbar icon and then shift back to the keyboard to type the update and install commands.

Adding the shortcut is no biggie. Do the following:

    sudo leafpad ~/.config/openbox/lxde-pi-rc.xml

Find the tag and paste this line just before it:

and restart.

2. The YouTube play icon

So there is a glitch in Epiphany because of which the play icon of YouTube doesn’t really go away.

(Image: 5 seconds since the video started)

So I just installed Midori and now that’s my default browser.

    sudo apt-get install midori

3. Headphones audio not working

This required configuring from the raspi-config utility.

    sudo raspi-config

Go to Advanced Options -> Audio -> Force 3.5 mm.

I’ll keep adding as and when I find something worth sharing.