GSoC Blog – Part II

This post marks the end of the first 4 weeks of my GSoC internship with NumFOCUS. As I have mentioned, I am working on the EcoData Retriever project (an awesome tool to download and examine ecological datasets), and it’s been a great learning experience so far.


Python 3

First things first: EcoData Retriever now fully supports both Python 2 and Python 3 natively. That isn’t to say that there aren’t bugs, but the build passes all tests on Python 2 and 3 on *nix and Windows systems. I would appreciate bug reports about compatibility on the issue tracker.

For this task, I used the future package (installed with pip), which made a lot of these changes very easy. It’s a wonderful piece of software, and if you are looking to port your library to Python 3 while maintaining backwards compatibility, you should look into it as well.

Even after using future, though, there were a lot of issues, mainly involving:

  1. Unicode (especially UTF-8) and
  2. The csv module (which is difficult to backport).

The Unicode changes were not that hard. All I did was call decode() and encode() on strings wherever a Unicode or bytes value was needed (strings are bytes by default on Python 2 and Unicode on Python 3). Unless a bytes type was specifically required, I converted all strings to Unicode (using UTF-8 as the default encoding).
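To make the idea concrete, here is a minimal sketch of that kind of conversion. The helper names to_text and to_bytes are made up for this post and are not the functions in the Retriever code base; they only assume that UTF-8 is the right default encoding.

```python
# Hypothetical helpers (names invented for illustration) that behave the
# same on Python 2 and Python 3.
text_type = type(u"")  # unicode on Python 2, str on Python 3


def to_text(value, encoding="utf-8"):
    """Return a Unicode string, decoding bytes if necessary."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def to_bytes(value, encoding="utf-8"):
    """Return bytes, encoding a Unicode string if necessary."""
    if isinstance(value, text_type):
        return value.encode(encoding)
    return value
```

Values that are already of the requested type pass through unchanged, so the helpers can be called safely at every boundary where the type is uncertain.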

The csv module, though, was a lot of pain. It took me a while to realise that csv doesn’t behave the same way across platforms (on Windows, opening the file in text mode adds an extra \r to line endings). Plus, it doesn’t play nice with the str type from future.builtins. I had to insert Python version checks (sys.version_info) and OS checks (nt vs posix) to make it compatible with both Python 2 and 3 across all platforms.
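A rough sketch of the version check looks like this. It is not the actual Retriever code: the function names open_csv, write_rows and read_rows are invented for illustration, and the sketch relies only on the documented rule that csv wants binary-mode files on Python 2 and text-mode files opened with newline="" on Python 3. In this simplified form no separate OS check is needed.

```python
import csv
import io
import sys


def open_csv(path, mode="r", encoding="utf-8"):
    """Open a file the way the csv module expects on this Python version."""
    if sys.version_info[0] >= 3:
        # newline="" stops the text layer from translating line endings,
        # which is what produces the stray \r on Windows.
        return io.open(path, mode, newline="", encoding=encoding)
    # Python 2: csv reads and writes bytes, so use binary mode.
    return open(path, mode + "b")


def write_rows(path, rows):
    """Write an iterable of rows to a CSV file."""
    handle = open_csv(path, "w")
    try:
        csv.writer(handle).writerows(rows)
    finally:
        handle.close()


def read_rows(path):
    """Read a CSV file back into a list of rows."""
    handle = open_csv(path, "r")
    try:
        return list(csv.reader(handle))
    finally:
        handle.close()
```

With the file opened this way, the writer's own line terminator is written as-is on every platform, and reading the file back gives the same rows that went in.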

Datapackage standard

Next is my main GSoC task: upgrading the dataset scripts to the datapackage.json standard. This, thankfully, is proving to be much easier than the Python 3 port. It has three main parts:

  1. Upgrade existing scripts to JSON
  2. Add a CLI tool to create new JSON scripts
  3. Add CLI support for editing the existing ones.

I had already done the first part during my community bonding period, and thus did not have to spend a lot of time on that.

I have completed the major portions of the second task by creating a new function that collects data from the user through Python input() prompts. It was fairly easy, as I had already discussed with Henry the major changes that needed to be incorporated into the tool. Based on the datapackage.json specification, I came up with a nice format for porting the current YAML-like scripts to JSON.

We are currently reviewing these changes. It’s a work in progress, and the final release will only come by the end of this month (or by August-end).
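As a very rough illustration of the prompt-driven approach, the sketch below builds a small datapackage-style dictionary and dumps it to JSON. The function names, the prompts and the exact set of keys are assumptions made for this example and do not reflect the final format under review; on Python 2 the input function would come from future.builtins.

```python
import json


def build_script(ask=input):
    """Collect a few fields interactively and return a datapackage-style dict.

    `ask` is any callable that takes a prompt and returns a string; it
    defaults to input() and can be swapped out for testing.
    """
    package = {
        "name": ask("Dataset short name: ").strip(),
        "title": ask("Title: ").strip(),
        "description": ask("Description: ").strip(),
        "resources": [],
    }
    while True:
        table = ask("Table name (blank to finish): ").strip()
        if not table:
            break
        url = ask("URL for {}: ".format(table)).strip()
        # The "name"/"url" keys are illustrative assumptions.
        package["resources"].append({"name": table, "url": url})
    return package


def to_json(package):
    """Serialise the collected fields as pretty-printed JSON."""
    return json.dumps(package, indent=4, sort_keys=True)
```

Passing the prompt function in as an argument keeps the collection logic testable without a terminal, since a scripted sequence of answers can stand in for the user.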

I hope to add the changes for the third sub-task in the next week. I’ll keep updating and posting as I go along.

