
 

University project: Route planner for the university campus

At my university, every bachelor's student in a computer science degree has to do a team project. For one semester you plan the project, do requirements engineering, and so on in a team of three students. In the following semester you actually implement the project in a team of six students.

All teams had to implement the same project: a route planner for the university campus, built from the ground up.
Some aspects were quite tricky, for example routing across multiple floors stacked on top of each other.

About the routing: the usual algorithms for routing problems (Dijkstra, Bellman-Ford, etc.) operate on a weighted graph. Our team decided to go for a graph-based NoSQL database, Neo4j: since we were going to build a route planner, we might as well use a database that is inherently structured as a graph.
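Just to illustrate the idea (the project itself used Neo4j's embedded Java API, not the code below): a shortest-path query over such a graph can be expressed in a few lines of Cypher. The sketch uses the official Python driver; the Waypoint label, the CONNECTS relationship and the connection details are made-up examples, and shortestPath finds the path with the fewest hops, while weighted Dijkstra-style routing is available through Neo4j's graph algorithm libraries.

from neo4j import GraphDatabase

# Illustrative only: labels, relationship type and credentials are made up.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

query = """
MATCH (a:Waypoint {name: $start}), (b:Waypoint {name: $end})
MATCH p = shortestPath((a)-[:CONNECTS*]-(b))
RETURN [n IN nodes(p) | n.name] AS route
"""

with driver.session() as session:
    record = session.run(query, start="Entrance O27", end="Lecture hall H1").single()
    print(record["route"])  # list of waypoint names along the route

driver.close()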

Many of the other teams had problems with the routing algorithms.
With a relational database, this part gets ugly quite quickly.

Don't choose a technology just because it is the only one you happen to know. Choose the technology that fits the job best.

As a web application framework we decided to go for Vaadin, a framework on top of GWT that lets you write web applications much like you would write a Swing application in Java. A cross-compiler converts the Java code into JavaScript, HTML, and CSS. Since most members of our team were familiar with Java, this was an easy choice. The framework worked quite well and gave us a very fast development cycle.

We also wrote a standalone desktop application for uploading and editing maps. But since we split up the tasks, I was only involved in the synchronization with the web app (which we did using git; see my article "Using git as an autoupdate mechanism").

Other technologies involved: for printing PDFs we decided to go with LaTeX, and we used node.js to scrape the university address book. This way we gathered a large amount of reasonable data for the database.

The student projects will not be used in production. However, the institute is working on a reference implementation that will be.

 

Visualizing WikiLeaks Mirrors

After WikiLeaks released the diplomatic cables on 28 November 2010, several DNS services refused to resolve the domains. In the days and weeks after this incident, about 2,200 mirror sites were set up by volunteers. This event shows how the decentralized structure of the internet was used to circumvent censorship and suppression.

The video below shows a visualization of the WikiLeaks mirrors.

Finding all domains for the mirrors was not a problem; there are several sites listing the addresses on simple HTML pages, which can easily be parsed (I used node.js for that task). For resolving these domains to WGS84 coordinates I used the same free GeoIP database as in the traceroute project. For more details on resolving domains to coordinates and mapping them onto a globe, see my last blog post (Visualizing traceroute).

It is quite interesting that the servers are in fact distributed over the whole world. Most of them are located, not really surprisingly, in Central Europe, but there are also some mirrors in China. Of course these results are not 100% exact locations, but I think the tendency is clearly visible.

I’ve put together a little video of the global mirror distribution:

Direct link to Vimeo.

Visualizing traceroute

Notice: This article was originally published on the blog ioexception.de (in German).

To get more familiar with Processing and OpenGL, I wrote a graphical frontend for the Unix program traceroute. The output of traceroute is the list of stations a packet passes on its way through the network, which makes it easy to debug network connections, for example.

Technically this is realized with the Time-To-Live (TTL) field in the header of IP packets. The TTL field describes after how many hops a packet should be discarded. Each router the packet passes decrements this field; once the TTL reaches 0, the packet is discarded and the sender is notified with an ICMP TIME_EXCEEDED message.

traceroute makes use of this by repeatedly sending packets to the destination host, incrementing the TTL with each packet until the destination is reached. The hosts along the route announce themselves via these ICMP messages, so we gather information about them and can hopefully identify the individual stations on the route. The route is not necessarily correct; there are several reasons for possible deviations, e.g. firewalls often disable ICMP completely.

For the visualization I tied traceroute to Processing. For further explanations on how to do this, see my blog post at ioexception.de; although the post is in German, the code should make things clear. It is not really complicated: the frontend reads the output of the command traceroute domain.org until EOF, parses each line, resolves each host to an IP address, and then assigns a coordinate to that IP.
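As a rough sketch of just the parsing step (the actual frontend does this in Java with Processing), reading the traceroute output and extracting one IP address per hop could look like this in Python:

import re
import subprocess

# Run traceroute and read its output line by line until EOF.
proc = subprocess.Popen(["traceroute", "domain.org"],
                        stdout=subprocess.PIPE, text=True)

hops = []
for line in proc.stdout:
    # traceroute prints the IP of each hop in parentheses, e.g. "(134.60.1.1)"
    match = re.search(r"\((\d+\.\d+\.\d+\.\d+)\)", line)
    if match:
        hops.append(match.group(1))

proc.wait()
print(hops)  # ordered list of hop IPs, ready for the GeoIP lookup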

The coordinates can then, with some sin/cos magic, be mapped onto a globe. Resolving IPs to geolocations is done with a GeoIP database. GeoIP databases assign a coordinate to an IP with a certain probability and are by no means 100% exact, but for our purpose that is good enough. There are some free providers and many commercial ones; I decided to give the free GeoLite City database by MaxMind a go. This way we can resolve IP addresses to WGS84 coordinates.
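The sin/cos magic itself is just the usual conversion from latitude/longitude to Cartesian coordinates on a sphere. A minimal sketch, assuming angles in degrees and an arbitrary globe radius r:

from math import radians, sin, cos

def latlon_to_xyz(lat_deg, lon_deg, r=300.0):
    # convert degrees to radians, then project onto a sphere of radius r
    lat = radians(lat_deg)
    lon = radians(lon_deg)
    x = r * cos(lat) * cos(lon)
    y = r * sin(lat)              # "up" axis; the convention just has to match the texture
    z = r * cos(lat) * sin(lon)
    return x, y, z

print(latlon_to_xyz(48.40, 9.99))  # roughly Ulm, Germany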

For the frontend I wrote a visualization in Java using the Processing API. The texture of the globe is further rendered using a shader written in GLSL. Libraries I used: GLGraphics (an OpenGL rendering engine for Processing), controlP5 (buttons, sliders, text fields) and toxiclibs (interpolation and other numerical methods).

The source code is available under the MIT license on GitHub: visual-traceroute.

Some eye candy can be found in this video:

Direct link to Vimeo.

Using git as an autoupdate mechanism

Notice: This article was originally published on the blog ioexception.de (in German).

If you are developing software with a client-server infrastructure, you may need a way to safely update your client systems. In my case I had several devices in ubiquitous environments without any human technical maintenance, and I needed an update mechanism that makes it easy to deploy new software to all machines.

I had several requirements for the system:

  • Authentication
    The connection has to be authenticated. This was a huge problem in the major operating systems and is still a big problem in many programs. The instant-messenger exploit earlier this year, for example, took advantage of the fact that the client did not authenticate the update server. Attack vectors are packet spoofing or manipulating the hosts file.
  • Fallbacks
    If an update fails, one should always be able to easily restore the last working state. It is unacceptable for any data to get lost; everything should be kept.
  • Scripting
    I want to be able to hook scripts into every part of the update process (before updating, after updating, etc.). You could use this to reboot the device after installing updates, to check whether the update was successful, or to apply database changes after all files have been pulled down to the device.
  • Authorization
    The update server must not be publicly available. Instead, clients fetching updates have to provide authorization data in order to get the software updates. It should be possible to later build a group-based license policy for different software versions on top of this.

I decided to use git for this purpose because it fulfilled several of these requirements from the start and I am already quite familiar with it. The setup looks like this:

On the server side:

  • Web server, accessible only through HTTPS, using a self-signed X.509 certificate.
  • A bare git repository, protected with basic authentication: https://updates.foo.net/bar.git.
    This is needed to ensure that only devices with correct authorization data have access to the repository.
    If you have new updates for the clients you push them to this repository.
  • hooks/post-update: gets executed whenever you push new updates to the repository.

    git update-server-info  # required by git for HTTP repositories
    find ~/bar.git -exec chmod o+rx '{}' \;  # clients must have access to the repository

On the client side:

  • Cronjob. Regularly check for updates. check.sh contains:
    #!/bin/sh
    # your repository folder
    cd "$HOME/bar.git"

    # fetch changes, git stores them in FETCH_HEAD
    git fetch

    # check for remote changes in origin repository
    newUpdatesAvailable=`git diff HEAD FETCH_HEAD`
    if [ "$newUpdatesAvailable" != "" ]
    then
        # create the fallback
        git branch fallbacks
        git checkout fallbacks

        git add .
        git add -u
        git commit -m `date "+%Y-%m-%d"`
        echo "fallback created"

        git checkout master
        git merge FETCH_HEAD
        echo "merged updates"
    else
        echo "no updates available"
    fi
  • Hook hooks/post-merge to apply database changes, reboot, or check whether the update was successful (a small sketch follows below).
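Git hooks are ordinary executables, so the post-merge hook can be written in any language. A hypothetical sketch (the log path and the apply_db_changes.sh script are made-up examples, not part of my actual setup):

#!/usr/bin/env python
# Hypothetical hooks/post-merge: apply database changes shipped with the update
# and log the result. The log path and apply_db_changes.sh are made-up examples.
import subprocess
import sys
from datetime import datetime

LOG = "/var/log/autoupdate.log"

def log(message):
    with open(LOG, "a") as f:
        f.write("%s %s\n" % (datetime.now().isoformat(), message))

try:
    subprocess.check_call(["sh", "apply_db_changes.sh"])
    log("update merged, database changes applied")
except subprocess.CalledProcessError as exc:
    log("post-merge step failed: %s" % exc)
    sys.exit(1)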

The client has a copy of the server certificate and uses it to authenticate the connection, so you can be sure you are connected to the right server. To ensure this, the repository's git config on the client should contain:

[core]
        repositoryformatversion = 0
        filemode = true
        bare = false
        logallrefupdates = true
[remote "origin"]
        fetch = +refs/heads/*:refs/remotes/origin/*
        url = https://user@updates.foo.net/bar.git
[http]
        sslVerify = true
        sslCAInfo = ~/server.crt
[core]
        # needed to provide the password for the HTTP authentication
        askpass = ~/echo_your_https_authentication_pw.sh

 
What are the disadvantages of this setup?
I chose a version control system intentionally, because my main requirement was that no data gets lost.

This might not be ideal for you: if your devices have little disk space or you often have to update a large part of the device's system, this solution does not make sense. Since git stores everything internally, this wastes disk space. In that case you should rather look into a solution based on rsync or similar tools.
rdiff-backup, for example, does incremental backups and offers the possibility of keeping just a fixed number of revisions.

Do not use SSH.
An easier way to set up such a system would be over SSH with restricted file permissions for the clients. The problem is that this gives the devices real access to your server: an attacker could take the client's private SSH key and connect to the remote server. The past has shown that it is not uncommon for flaws to be found in operating system kernels or system libraries. A worst-case scenario would be an attacker manipulating the repository on the server; all clients would then fetch the manipulated content. Fetching manipulated content is not prevented in my setup either, but the attacker would first need a login on the server.

Thanks to nico for his — as always 🙂 — helpful remarks.

Saving disk space by eliminating duplicate files

A friend of mine, matou, wrote a script that searches a file tree recursively, hashes every file, and reports duplicate files.
I forked his script and added some functionality. I added a flag -s, which makes the script keep just one occurrence of each duplicate file and delete the others; hard links are then created from the locations of the deleted files to the one occurrence that still exists.

$ python duplicatefiles.py -s /foo/

In fact, the file system should look exactly the same to the operating system after execution, apart from using less space. I used the script to clean up my media folder and saved nearly 1 GB of disk space.
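The core idea behind the -s flag, as a condensed sketch (not the actual script): hash every file, keep the first occurrence of each content hash, and replace every further occurrence with a hard link.

import hashlib
import os
import sys

def file_hash(path, chunk_size=1 << 20):
    # hash the file contents in chunks so large files do not fill up memory
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

seen = {}  # content hash -> first path seen with that content
for root, _, files in os.walk(sys.argv[1]):
    for name in files:
        path = os.path.join(root, name)
        digest = file_hash(path)
        if digest in seen:
            os.remove(path)              # drop the duplicate ...
            os.link(seen[digest], path)  # ... and hard-link it to the kept copy
        else:
            seen[digest] = path

Hard links only work within a single file system, which is fine for something like a media folder on one disk.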

You can also do a dry run:

$ python duplicatefiles.py -l=error -c ./foo
these files are the same:
.//music/foo/bar.mp3
.//music/bar/foo.mp3
-------------------------------
Duplicate files, total:  15
Estimated space freed after deleting duplicates: ca. 40 MiB

The script is available under a BSD-like license on GitHub.
I tested it under Mac OS X, but it should work on other UNIX systems as well.

I would recommend making a backup before executing the script.

LaTeX Template Collection

I have a significant stream of visitors looking for LaTeX invoice templates (my magento-on-latex post is probably good SEO 😉).
When I started using LaTeX I had exactly the same need and wrote my own invoice template.

In the meantime I have created several other templates. I have now packed them all up and put a repository, latex-template-collection, up on GitHub. Currently there are templates for letters, timesheets (thanks to Dennis Mack!) and invoices. I am going to continuously add new templates.

The templates still have room for optimization (e.g. automatically incrementing the invoice position numbers). Please feel free to contribute.

Google Chrome Extension: Compact Wikipedia Layout

I wrote another simple extension for the Chrome browser. Sometimes I have to work with two browser windows placed side by side, and with the standard MediaWiki layout this looks very messed up and is a pain to read.

The extension overwrites parts of the stylesheet and displays the content in a clean and easily readable way.

 

Standard MediaWiki layout

With the extension

 

The extension is available in the Google Gallery.

Btw: I use Cinch (Mac) to easily arrange floating windows. Great program, really useful!

Magento on LaTeX

Several banks use LaTeX to generate their account statements (in German: Kontoauszüge) as PDFs. There are many advantages to such a setup: speed, small PDF size, and a large number of packages, for example. In my case I had to adjust the invoices of a Magento online shop instance.

I didn't want to dig into the Zend PDF printer, and since I am quite familiar with LaTeX I took the opportunity to develop an extension. Of course no web server runs TeX out of the box, but since it is really easy to install a TeX distribution like TeX Live, this shouldn't be a problem if you own a root server and have some disk space.

 
Basically, I wrote an extension for Magento that serves PDF invoices generated from a LaTeX template.
This means it is very easy to modify your invoice template so that it uses your corporate design, etc. It also means that your invoices will stop looking poor and start looking tight 🙂.
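The extension itself is written in PHP against Magento's APIs, but the general approach is simple to sketch: fill placeholders in a .tex invoice template with the order data and let pdflatex produce the PDF. The template name and placeholder names below are made-up examples:

import subprocess

# Order data that would come from Magento; names are made up.
order = {"CUSTOMER": "Jane Doe", "ORDERID": "100000023", "TOTAL": "42.00 EUR"}

with open("invoice_template.tex") as f:
    tex = f.read()

# The template contains placeholders like @CUSTOMER@ that get substituted.
for key, value in order.items():
    tex = tex.replace("@%s@" % key, value)

with open("invoice_100000023.tex", "w") as f:
    f.write(tex)

subprocess.check_call(["pdflatex", "-interaction=nonstopmode",
                       "invoice_100000023.tex"])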

The code is available for free, licensed under the LGPL.

Check out my GitHub account for more information.

 
Update: Seeing that there were some visitors looking for LaTeX invoice templates: You find them here.

Update: If you are interested in what other industry applications LaTeX is used for check out this post.

Important: Wolfgang Mederle has created an updated version of the module. I wrote my code a couple of years ago and Magento has undergone some changes in the meantime. If you want to use the module today, you will be better off using Wolfgang's module. You can find his code here. Thanks, Wolfgang!

About Me

I am a 29-year-old techno-creative enthusiast who lives and works in Berlin. In a previous life I studied computer science (more specifically, Media Informatics) at Ulm University in Germany.

I care about exploring ideas and developing new things. I like creating great stuff that I am passionate about.

License

All content is licensed under CC BY 4.0 International (unless explicitly noted otherwise).
 
I would be happy to hear if my work gets used! Just drop me a mail.
 
The CC license above applies to all content on this site created by me. It does not apply to linked and sourced material.
 