May 7, 2014
The German weekly newspaper “DIE ZEIT” offers an API, which makes it easy for developers to use a lot of its data. It gives access to data on nearly 400,000 articles published since 1945, which makes it quite interesting (access to full texts is sadly missing, but a lot of other stuff is available). This post is about some of the interesting things I found whilst exploring the API.
My initial idea was to visualize how the ratio of articles with anglicisms evolved over time. At the moment this is too complex a project, because getting the necessary data via the current API is difficult. However, I made some other interesting findings along the way.
The Wiktionary project provides a list of anglicisms (around 960 words), which I parsed and used to search for articles matching each word. This gave, for each word, the number of matching articles written in each year since 1945. I also ran an empty search to find out how many articles were created in total each year. These numbers could then be used to calculate the percentage of articles with a given anglicism in each year.
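To make the calculation concrete, here is a minimal sketch of the percentage step. It is not the code from my repository: the function name, the dictionary layout, and the counts are made up for illustration, and it assumes the per-year counts have already been fetched from the API.

```python
def yearly_percentages(matches_by_year, totals_by_year):
    """Return {year: percentage of that year's articles that matched a word}.

    Both arguments are hypothetical dicts mapping a year to an article count.
    """
    percentages = {}
    for year, total in sorted(totals_by_year.items()):
        if total == 0:
            # No articles at all in that year, so there is no meaningful ratio.
            continue
        matched = matches_by_year.get(year, 0)
        percentages[year] = 100.0 * matched / total
    return percentages


# Made-up counts for illustration only, not real API results.
totals = {1986: 4000, 1987: 5000, 1988: 0}
matches = {1986: 20, 1987: 100}
print(yearly_percentages(matches, totals))
```

Years without any articles are skipped rather than reported as 0%, so a gap in the data does not look like a drop in usage.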
Not all of the words provided interesting results, but here is a selection of some interesting ones. Please be aware that the charts show a zoomed-in range. This is not a scale of 0-100%!
One should be very careful about inferring reasons for the peak from the visual representation alone. A potential reason might be the Gulf War in 1990–91 (the German translation is: “Golfkrieg”). Other causes worth investigating could be successes of German golf athletes or events around the VW Golf automobile.
The peak in 1987 could relate to the increased media coverage of AIDS. Also in 1987, the Society for the German Language (Gesellschaft für deutsche Sprache) chose “Aids” as the word of the year.
The peak in 1970 is the most interesting to me. A potential cause could be the movement of 1968.
I have made the code used to gather the data and build the visualizations available under the MIT license via GitHub.