This post marks the end of the first 4 weeks of my GSoC internship with NumFOCUS. As I have mentioned, I am working on the project EcoData Retriever (an awesome tool to download and examine ecological datasets), and it's been a great learning experience so far.
First things first: EcoData Retriever now completely supports Python 2 and Python 3 natively. That isn’t to say that there aren’t bugs, but the build passes all tests on Python 2 and 3 on *nix and Windows systems. I would appreciate any compatibility bugs being filed on the issue tracker.
For this task, I used the future package, installed with pip, which made a lot of these changes very easy. It’s a wonderful piece of software, and if you are looking to port your library to Python 3 while maintaining backwards compatibility, you should look into it as well.
Even after using future, though, there were a lot of issues, mainly involving:
- Unicode (especially UTF-8)
- the csv module (which is difficult to backport).
The Unicode changes were not that hard. All I did was encode() strings wherever a specific Unicode or bytes value was needed (strings are bytes by default on Python 2 and Unicode on Python 3). Unless a bytes type was required, I converted all strings to Unicode (UTF-8 by default).
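To make the idea concrete, here is a minimal sketch of that kind of conversion. The helper names `to_text` and `to_bytes` are made up for this post and are not taken from the Retriever code base; the sketch only assumes that text should be Unicode everywhere except where bytes are explicitly needed.

```python
import sys

PY2 = sys.version_info[0] == 2
# `unicode` only exists on Python 2; the conditional expression
# never evaluates it on Python 3.
text_type = unicode if PY2 else str  # noqa: F821


def to_text(value, encoding="utf-8"):
    """Return value as Unicode text, decoding bytes if necessary."""
    if isinstance(value, bytes):  # on Python 2, plain str is bytes
        return value.decode(encoding)
    return text_type(value)


def to_bytes(value, encoding="utf-8"):
    """Return value as bytes, encoding Unicode text if necessary."""
    if isinstance(value, bytes):
        return value
    return to_text(value).encode(encoding)
```

The point of funnelling everything through two helpers is that the rest of the code never has to ask which Python version it is running on.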
The csv module, though, was a lot of pain. It took me a while to realise that csv doesn’t work that well cross-platform (it adds an extra \r when the file is opened in text mode on Windows). Plus, it doesn’t play nice with the str type from the future.builtins module. I had to insert Python version checks (sys.version_info) and OS checks (posix) to make it compatible with both Python 2 and 3 across all platforms.
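For anyone hitting the same extra \r problem, the version check can look roughly like the sketch below. It is an illustration, not the code that went into the Retriever: `open_csv` is a hypothetical helper, and it relies only on the documented advice to open csv files with `newline=""` on Python 3 and in binary mode on Python 2. It does not include the OS checks mentioned above.

```python
import csv
import io
import sys


def open_csv(path, mode="r", encoding="utf-8"):
    """Open a file the way the csv module expects on this Python version."""
    if sys.version_info[0] >= 3:
        # Python 3: text mode, with newline="" so that csv controls the
        # line endings itself and Windows does not turn \r\n into \r\r\n.
        return io.open(path, mode, newline="", encoding=encoding)
    # Python 2: the csv module reads and writes bytes, so use binary mode.
    return open(path, mode + "b")


def write_rows(path, rows):
    """Write an iterable of rows to path as CSV."""
    with open_csv(path, "w") as handle:
        csv.writer(handle).writerows(rows)


def read_rows(path):
    """Read all rows of a CSV file into a list of lists."""
    with open_csv(path, "r") as handle:
        return list(csv.reader(handle))
```

With this in place, `write_rows("out.csv", [["a", "b"], ["1", "2"]])` followed by `read_rows("out.csv")` should give back the same rows on any platform.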
Next is my main GSoC task: upgrading the dataset scripts to the datapackage.json standard. This, thankfully, is proving to be much easier than the previous task. It has three main parts:
- Upgrade existing scripts to JSON
- Add a CLI tool to create new JSON scripts
- Add a CLI tool to edit the existing ones.
I had already done the first part during my community bonding period, and thus did not have to spend a lot of time on that.
I have completed the major portions of the second task by creating a new function that collects data from the user through Python input() prompts. It was fairly easy, as I had already discussed with Henry the major changes that needed to be incorporated into the tool. Based on the datapackage.json specification, I also came up with a nice format for porting the current YAML-like scripts to JSON.
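To give a feel for the prompt-driven approach, here is a small sketch in Python 3. It is not the Retriever's actual CLI or script format: the function names are hypothetical, and the field names (name, title, description, resources) are only loosely modelled on datapackage.json, so treat them as assumptions.

```python
import json


def build_script(ask=input):
    """Collect dataset metadata through prompts and return it as a dict.

    `ask` defaults to the built-in input() (Python 3 semantics); passing
    another callable makes the function easy to test.
    """
    script = {
        "name": ask("Dataset short name: ").strip(),
        "title": ask("Title: ").strip(),
        "description": ask("Description: ").strip(),
        "resources": [],
    }
    while True:
        table = ask("Table name (blank to finish): ").strip()
        if not table:
            break
        url = ask("URL for table '%s': " % table).strip()
        # Illustrative resource entry; the real format may need more fields.
        script["resources"].append({"name": table, "url": url})
    return script


def dump_script(script):
    """Serialise the collected metadata as readable JSON text."""
    return json.dumps(script, indent=4, sort_keys=True)
```

Calling `print(dump_script(build_script()))` walks through the prompts and prints the resulting JSON, which could then be saved as a new script file.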
We are currently reviewing these changes. It’s a work in progress, and the final release will only come by the end of this month (or by the end of August).
I hope to add the changes for the third sub-task in the next week. I’ll keep updating and posting as I go along.