OpenData Archive - MICHA.ELMUELLER

May 7, 2014

Exploring the ZEIT ONLINE API

The German weekly newspaper “DIE ZEIT” has an API available, which makes it easy for developers to use a lot of their data. This is quite interesting, since it provides access to the data of nearly 400,000 articles published since 1945 (sadly, the full texts are not available, but a lot of other information is). This post is about some of the interesting things I found whilst exploring the API.

My initial idea was to visualize how the ratio of articles with anglicisms evolved over time. At the moment this is too complex a project, because getting the necessary data via the current API is difficult. However, I made some other interesting findings along the way.

The Wiktionary project provides a list of anglicisms (around 960 words), which I parsed and used to search for articles containing these words. This gave, for each word, the number of matching articles written each year since 1945. I also made an empty search to find out how many articles were published in total each year. These numbers could then be used to calculate the percentage of articles containing a given anglicism in each year.
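The percentage step can be sketched as follows. This assumes the per-year counts for one word and the per-year totals from the empty search have already been fetched into dictionaries keyed by year; the API calls themselves are left out, and the function name `yearly_percentages` is hypothetical.

```python
from typing import Dict


def yearly_percentages(matches: Dict[int, int], totals: Dict[int, int]) -> Dict[int, float]:
    """Share (in %) of all articles in each year that match a given word.

    matches: year -> number of articles found for the word
    totals:  year -> number of articles found by the empty search
    """
    percentages = {}
    for year, total in totals.items():
        if total == 0:
            continue  # no articles that year, so no meaningful percentage
        # Years without any match for the word count as 0%.
        percentages[year] = 100.0 * matches.get(year, 0) / total
    return percentages
```

For example, 50 matches out of 200 articles in a year yields 25.0 for that year.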

Not all of the words gave interesting results, but here is a selection of some interesting ones. Please be aware that the charts show a zoomed-in range, not a scale of 0 to 100%!

One should be very careful when inferring reasons for the peak just by looking at the visual representation. A potential reason might be the Gulf War in 1990–91 (the German term is “Golfkrieg”). Other causes worth investigating could be successes of German golfers or events around the VW Golf.

Potential reasons for the peaks could be the sinking of the Rainbow Warrior in 1985, the Brent Spar protests in 1995 and the Deepwater Horizon oil spill in 2010.

The peak in 1987 could relate to the increased media coverage of AIDS. Also, in 1987 the Society for the German Language (Gesellschaft für deutsche Sprache) chose “aids” as the word of the year.

The peak in 1970 is the most interesting one to me; a potential cause could be the movement of 1968.

I have made the code used to gather the data and build the visualizations available under the MIT license via GitHub.

May 2, 2014

GTFS Visualizations

GTFS is an abbreviation for General Transit Feed Specification, a standard which “defines a common format for public transportation schedules and associated geographic information”. Basically, it gives public transport agencies, like the Stadtwerke Ulm/Neu-Ulm (SWU), a way to release their data to the public in a proper manner. Fortunately, some agencies have done so (here’s a list). In Germany, the agencies in Ulm and Berlin have released their schedule data under a free license as GTFS. In both cases, the release was pushed forward by local Open Data enthusiasts. Together with some friends from the UlmAPI group, I was involved in the efforts here in Ulm, and ever since I have been tempted to create something from this data.
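A GTFS feed is a ZIP archive of comma-separated text files. Where a feed provides route geometries, they live in the optional shapes.txt, one row per point with the fields shape_id, shape_pt_lat, shape_pt_lon and shape_pt_sequence. As a minimal sketch (not the code of the program described in this post, which used node.js and Processing), the points can be grouped into one ordered polyline per shape; the function name `load_shapes` is hypothetical.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def load_shapes(shapes_lines: Iterable[str]) -> Dict[str, List[Tuple[float, float]]]:
    """Group the points of GTFS shapes.txt into one ordered (lat, lon) polyline per shape_id."""
    points = defaultdict(list)
    for row in csv.DictReader(shapes_lines):
        points[row["shape_id"]].append((
            int(row["shape_pt_sequence"]),
            float(row["shape_pt_lat"]),
            float(row["shape_pt_lon"]),
        ))
    # Do not rely on row order in the file: sort each shape by shape_pt_sequence.
    return {
        shape_id: [(lat, lon) for _, lat, lon in sorted(pts)]
        for shape_id, pts in points.items()
    }
```

Each resulting polyline can then be projected and drawn as one path.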

So I wrote a program that visualizes GTFS feeds. It draws the routes taken by the transit vehicles and emphasizes the more frequently served ones by drawing them thicker and more opaque. Since many agencies have released their schedules as GTFS, the program can easily be reused as a means to visualize the transportation systems of different cities.
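The emphasis on frequently served routes can be sketched like this. It assumes that "frequency" means the number of trips in trips.txt that reference a shape, and maps that count linearly onto line width and opacity; the function names, the width and opacity ranges and the sample rows are all hypothetical, and the actual program may weight routes differently.

```python
import csv
from collections import Counter
from typing import Iterable, Tuple


def trips_per_shape(trips_lines: Iterable[str]) -> Counter:
    """Count how many trips in GTFS trips.txt run along each shape_id."""
    counts: Counter = Counter()
    for row in csv.DictReader(trips_lines):
        shape_id = row.get("shape_id")
        if shape_id:  # shape_id is optional in GTFS; trips without it are skipped
            counts[shape_id] += 1
    return counts


def stroke_style(count: int, min_count: int, max_count: int,
                 min_width: float = 0.5, max_width: float = 4.0,
                 min_alpha: float = 0.2, max_alpha: float = 1.0) -> Tuple[float, float]:
    """Linearly map a trip count to (line width, opacity): more trips, thicker and more opaque."""
    t = 1.0 if max_count == min_count else (count - min_count) / (max_count - min_count)
    return (min_width + t * (max_width - min_width),
            min_alpha + t * (max_alpha - min_alpha))


# Tiny hypothetical trips.txt excerpt: shape s1 is served twice, s2 once.
trips = [
    "route_id,service_id,trip_id,shape_id\n",
    "R1,WD,t1,s1\n",
    "R1,WD,t2,s1\n",
    "R2,WD,t3,s2\n",
]
counts = trips_per_shape(trips)
lo, hi = min(counts.values()), max(counts.values())
styles = {shape: stroke_style(n, lo, hi) for shape, n in counts.items()}
```

Counting across all service_id values mixes weekday and weekend service; restricting the count to one service_id would show a single typical day instead.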

So here are the renderings for some GTFS feeds! The color coding is: red = buses, green = subway/metro, blue = tram.
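The color coding maps onto the route_type field of routes.txt, where GTFS uses 0 for tram/light rail, 1 for subway/metro and 3 for bus. A small lookup might look like this; the exact RGB values and the gray fallback for other modes (rail, ferry and so on) are assumptions.

```python
from typing import Dict, Tuple

RGB = Tuple[int, int, int]

# Basic GTFS route_type codes (routes.txt) relevant to the color coding.
TRAM, SUBWAY, BUS = 0, 1, 3

# Hypothetical RGB values following the post's scheme; other modes fall back to gray.
ROUTE_COLORS: Dict[int, RGB] = {
    BUS: (255, 0, 0),     # red = buses
    SUBWAY: (0, 160, 0),  # green = subway/metro
    TRAM: (0, 0, 255),    # blue = tram
}
FALLBACK: RGB = (128, 128, 128)


def color_for_route_type(route_type: str) -> RGB:
    """Map the raw route_type string from routes.txt to an RGB color."""
    try:
        code = int(route_type)
    except ValueError:
        return FALLBACK  # missing or malformed value
    return ROUTE_COLORS.get(code, FALLBACK)
```

Some feeds use Google's extended route types (for example values in the 700 range for bus services); this sketch would draw those in gray.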

Madrid
GTFS data: Empresa Municipal de Transportes.
Download: PNG (1.4 MB) | PDF (0.4 MB)

Miami
GTFS data: Miami Dade Transit.
Download: PNG (0.3 MB) | PDF (0.8 MB)

San Diego
GTFS data: San Diego Metropolitan Transit System.
Download: PNG (0.5 MB) | PDF (0.6 MB)

Ulm
GTFS data: Stadtwerke Ulm/Neu-Ulm.
Download: PNG (0.4 MB) | PDF (0.12 MB)

Washington DC
GTFS data: DC Circulator & MET.
Download: PNG (1.2 MB)

Los Angeles
GTFS data: Metro Los Angeles.
Download: PNG (0.9 MB)

San Francisco
GTFS data: San Francisco Municipal Transportation Agency.
Download: PNG (1 MB) | PDF (1.1 MB)

I am very satisfied with the resulting images, which in my opinion look really beautiful. I have rendered some of the cities as PDFs as well. With the current program, this is a very time-consuming process and, for some cities, not even possible on my (quite powerful) PC due to performance or memory issues. The reason is the enormous size of some cities’ transportation schedules (> 300 MB of ASCII text). But my program can surely be heavily optimized.
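One typical optimization for feeds of this size is to stream the largest file (usually stop_times.txt) row by row instead of loading it into memory at once. A minimal sketch, assuming the goal is a per-trip aggregate; the function name and the file path in the comment are hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable


def stops_per_trip(stop_times_lines: Iterable[str]) -> Counter:
    """Count stop events per trip_id while streaming stop_times.txt.

    csv.DictReader pulls one line at a time from the iterable, so the whole
    file is never held in memory; only the per-trip counters are.
    """
    counts: Counter = Counter()
    for row in csv.DictReader(stop_times_lines):
        counts[row["trip_id"]] += 1
    return counts


# Usage with a real feed (the path is hypothetical); "utf-8-sig" strips a
# leading byte-order mark, which some feeds include:
# with open("gtfs/stop_times.txt", newline="", encoding="utf-8-sig") as f:
#     counts = stops_per_trip(f)
```

Memory use then grows with the number of distinct trips rather than with the file size.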

Please note: These visualizations would not exist without Open Data. This project was only possible because of transport agencies releasing their data under a free license. One should not forget that the existence of projects like this is a major benefit of Open Data.

Also, one should not forget that standardized formats in the Open Data scene have proven to be a major benefit: existing applications can easily be redeployed elsewhere, as in the case of Mapnificent, OpenSpending or, well, mine.

The best thing to do with your data will be thought of by someone else.
—Rufus Pollock

License & Code
The images are licensed under a Creative Commons Attribution 4.0 International license (CC-BY 4.0). Feel free to print, remix and use them! The source code is available via GitHub under the MIT license. Please note that it definitely needs proper refactoring, since it wasn’t designed but rather grew. That’s also the reason for using two different technologies (node.js and Processing) within the project: I had something different in mind when I started coding.

Preventing misunderstandings
To prevent misunderstandings: the visualizations show only the data released by the respective agencies! In the case of Madrid, for example, there is a metro line which is not shown in the visualization above. This is because it is operated by a different agency, which has not yet released its data as GTFS. I hope that more agencies will start making their data freely available after seeing what unexpected and beautiful results they might get.

Another misunderstanding which I want to address directly: the exact GTFS feed is visualized. This means that when looking closely at the resulting PDF you may find some lines which are very close to one another and might even partly overlap. This is not a bug but the way the shapes are defined in the feed.

Printing
If you want to print the visualizations: I have created two posters (DIN A0). The graphics within them are properly generated PDFs in CMYK. So be aware that the colors will look different on your screen than when printed.

Madrid (PDF, 11 MB)

Madrid, Ulm, Washington, San Diego (PDF, 81 MB)

Apr 15, 2014

The Principles of Datalove — Audiomashup

Some years ago the Telecomix crew came up with the term datalove and wrote a corresponding manifesto (see here for more details):

Love data
Data is essential
Data must flow
Data must be used
Data is neither good nor bad
There is no illegal data
Data is free
Data can not be owned
No man, machine or system shall interrupt the flow of data
Locking data is a crime against datanity
Love data

I use the term datalove quite often when referring to the free culture or open data movement. About two years ago I had the idea to create a voice mashup of the text and recorded various female friends reading it. In order to give the mashup an electronic, digital feeling, I distorted the voices a bit and put them over an ambient electronic track (2012 by pielkor, CC-BY 3.0).

soundcloud direct link

At the time, two years ago, the result was not what I had imagined and I wasn’t satisfied, so I didn’t release it online. Yesterday I listened to the track again and was quite surprised: it was not nearly as bad as I remembered. This somehow angers me. I have a lot of stuff (video interviews, photos, software, visualizations) which I haven’t released because I was unsatisfied with the quality, became aware of technical shortcomings whilst working on the project or realized how it could have been done better. In part, I am also trying to avoid giving other people the opportunity to attack my work. Today I think it was stupid not to release projects like this and I regret it. It was a nice project and I should let other people decide whether they can use it or not.

I have to thank Saron, Zenib, Sonja, Kate, Amrei, Natty, Jenny, Elizabeth and Lisa, without whom this mashup would not have been possible. The track is licensed under a Creative Commons Attribution 4.0 International license (CC-BY 4.0).

The student group I participate in is also called datalove; ulmAPI is an open data project by the datalove group.

Mar 6, 2013

Open Data Hackathon February 2013

On February 23rd the datalove university group participated in a global hackathon centered around Open Data. We gathered in a room at the university and worked on different projects all day. At peak times we were around 17 people: university students and staff, students from the university of applied sciences and local politicians. We organized enough food, coffee and stuff for everyone and spent a nice day working on many different projects. Some highlights:

Falco worked on updating the LiveMap, which we created about two years ago in a 48-hour hackathon. For most of us this had been the first bigger node.js project, so it was time to fix some flaws. To paraphrase Stefan: “While looking for better ways to do such a project, I only found other people who had forked our stuff.” Well, we are not entirely certain whether that is a good thing ;).
Some Open Data activists from Cologne are currently adapting the project to their city: schienenliebe.de. It is always very nice to see other people being able to build upon your work!
Benjamin made use of the shapefiles (i.e. geodata of the local city districts) for Ulm. We gathered this data under a free license about two years ago, but never had any use for it, until now!
Check out Click that ‘hood!
I took the time to work on an idea I had had in mind for a long time: a web-based map showing which facilities in Ulm are currently open. This can be used, e.g., to find out which bakeries in the inner city are still open on a Saturday evening. The application is online at oeffnungszeiten.ulmapi.de.
The opening hours data is gathered from the OpenStreetMap project. I plan to export it from there regularly, although I first have to manually correct some of the entries, since not all of them are valid. I also plan to add new opening hours entries to the map, though I am not yet entirely sure how to approach that.
Stefan is working on visualizing the household budget of Ulm. The respective data has since been made available to the public under CC-BY. If you have any knowledge of Doppik (double-entry bookkeeping in public administration), I am sure he would appreciate help!
Some friends were brainstorming about visualizations of the university network. When I heard of the idea I was quite enthusiastic and went to talk to the local network administrators. As a result we got a real data treasure: sanitized log data of all (~360) access points on the university campus over the duration of one week, released under ODbL v1.0. _This_ is quite nice. The data is available here. I spent nearly all of the hackathon writing a parser for the data. When you have 76 MB of ASCII text (> 500,000 entries), a database is worth it. The parser has since been finished, but we are still missing geolocations for the access points. For this purpose I wrote a very simple web application to crowd-source the collection of geolocations for all access points. But this (and the resulting visualizations) is enough material for a separate blog post, once the project is finished!
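The core check behind the opening-hours map (“which bakeries are open right now?”) can be sketched as follows. The OSM opening_hours syntax is far richer (public holidays, months, overnight ranges, “off” and more); this sketch deliberately handles only a small assumed subset such as `Mo-Fr 08:00-18:00; Sa 08:00-12:00` and raises a ValueError for anything else, valid or not, which is also one simple way to flag entries for manual review. It is not the code behind oeffnungszeiten.ulmapi.de, and all names are hypothetical.

```python
import re
from datetime import datetime
from typing import Dict, List, Tuple

DAYS = ["Mo", "Tu", "We", "Th", "Fr", "Sa", "Su"]  # index matches datetime.weekday()
_DAY = "(Mo|Tu|We|Th|Fr|Sa|Su)"
_SPAN = r"\d\d:\d\d-\d\d:\d\d"
# One rule: a day or day range, whitespace, then one or more comma-separated time spans.
RULE = re.compile(_DAY + "(?:-" + _DAY + r")?\s+(" + _SPAN + "(?:," + _SPAN + ")*)")


def _minutes(hhmm: str) -> int:
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def parse_opening_hours(value: str) -> Dict[int, List[Tuple[int, int]]]:
    """Parse a small subset of opening_hours into {weekday: [(start_min, end_min), ...]}."""
    schedule: Dict[int, List[Tuple[int, int]]] = {}
    for rule in value.split(";"):
        rule = rule.strip()
        if not rule:
            continue
        match = RULE.fullmatch(rule)
        if match is None:
            raise ValueError(f"unsupported or invalid rule: {rule!r}")
        first = DAYS.index(match.group(1))
        last = DAYS.index(match.group(2) or match.group(1))
        # Day ranges may wrap around the week, e.g. "Fr-Mo".
        days = [(first + i) % 7 for i in range((last - first) % 7 + 1)]
        spans = []
        for span in match.group(3).split(","):
            start, end = span.split("-")
            spans.append((_minutes(start), _minutes(end)))
        for day in days:
            schedule[day] = spans  # a later rule replaces earlier ones for the same day
    return schedule


def is_open(value: str, when: datetime) -> bool:
    """True if `when` falls into one of the spans for its weekday (overnight spans unsupported)."""
    now = when.hour * 60 + when.minute
    spans = parse_opening_hours(value).get(when.weekday(), [])
    return any(start <= now < end for start, end in spans)
```

For example, `is_open("Mo-Fr 06:00-18:30; Sa 06:00-12:00", datetime(2013, 2, 23, 19, 0))` returns False: that date is a Saturday and this hypothetical bakery closes at noon.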

The source code for most of the projects described is available online via GitHub, either at github.com/UlmApi or at github.com/cmichi.
