Milan mobility project

In this project we carried out a comparative study of clustering analysis, which is typically used in mobility scenarios, and supervised classification for identifying the home and work zones of an area, using a mobility dataset from the city of Milan. Passive mobile positioning data offers a powerful tool for studying the geography and mobility of a population. With the available data, we try to identify workplaces and residential areas using both supervised classification and clustering. To generate training data for the classification model, we manually labeled two sub-regions of the available grid: one made of randomly selected cells and another with a 20-by-20 resolution. Experimental results show that the kNN algorithm achieves an acceptable accuracy and can predict whether a cell represents a working or a residential area across the full grid, thanks to the semi-supervised approach of learning from a manually labeled region. In contrast, the k-means and k-medoids clustering results show that clustering does not achieve this goal; instead, it captures how mobile traffic is distributed around the city.
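A minimal sketch of how the two labeling sub-grids could be selected. It assumes the Milan grid is a 100×100 square with 1-based cell IDs numbered row by row; that layout, the helper names (`cell_id`, `square_subgrid`, `random_subgrid`), the window position, the sample size and the seed are all hypothetical illustrations, not the choices made in the project.

```python
import random

# Assumption: the grid is GRID_SIDE x GRID_SIDE cells with 1-based IDs
# numbered row by row (ID = row * GRID_SIDE + col + 1).
GRID_SIDE = 100


def cell_id(row: int, col: int, side: int = GRID_SIDE) -> int:
    """Map a zero-based (row, col) position to a 1-based cell ID."""
    return row * side + col + 1


def square_subgrid(top: int, left: int, size: int = 20, side: int = GRID_SIDE) -> list[int]:
    """IDs of a contiguous size x size window whose first cell is (top, left)."""
    if top < 0 or left < 0 or top + size > side or left + size > side:
        raise ValueError("window does not fit inside the grid")
    return [cell_id(r, c, side)
            for r in range(top, top + size)
            for c in range(left, left + size)]


def random_subgrid(n_cells: int, seed: int = 0, side: int = GRID_SIDE) -> list[int]:
    """n_cells distinct cell IDs drawn uniformly at random from the whole grid."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, side * side + 1), n_cells))


if __name__ == "__main__":
    window = square_subgrid(top=40, left=40)  # hypothetical window position
    sample = random_subgrid(400, seed=42)     # same size as the 20x20 window
    print(len(window), window[0], window[-1])  # 400 4041 5960
    print(len(sample))                         # 400
```

Sampling the random sub-grid with the same number of cells as the 20×20 window keeps the two training sets comparable in size; the contiguous window, by contrast, covers a single neighbourhood.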

The experiment and its results are available in the linked GitHub repository. We created a 20×20 sub-grid and a random sub-grid for the experiments, which consisted of performing a kNN classification of the home/work label and then predicting this label for the whole region, using the previously defined sub-grids as training data.
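The kNN step can be sketched as follows, assuming each cell is described by a small feature vector; here, a hypothetical share of its daily mobile activity in four time bands (night, morning, afternoon, evening). The labeled cells stand in for a manually labeled sub-grid, and the hypothetical `predict_grid` helper extends their labels to the rest of the grid. This is a plain-Python illustration of k-nearest-neighbour majority voting, not the project's code or feature set, and all data values are invented.

```python
import math
from collections import Counter

# Hypothetical feature vector per cell: share of the cell's daily mobile
# activity in four time bands (night, morning, afternoon, evening).
Features = tuple[float, ...]


def euclidean(a: Features, b: Features) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train: list[tuple[Features, str]], query: Features, k: int = 3) -> str:
    """Label a cell by majority vote among its k nearest labeled cells."""
    neighbours = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def predict_grid(train: list[tuple[Features, str]],
                 unlabeled: dict[int, Features], k: int = 3) -> dict[int, str]:
    """Extend the labels of the labeled sub-grid to every other cell."""
    return {cid: knn_predict(train, feats, k) for cid, feats in unlabeled.items()}


if __name__ == "__main__":
    # Toy labeled sub-grid: home-like cells are busier at night/evening,
    # work-like cells in the morning/afternoon. Values are illustrative only.
    labeled = [
        ((0.30, 0.15, 0.15, 0.40), "home"),
        ((0.35, 0.10, 0.15, 0.40), "home"),
        ((0.28, 0.17, 0.20, 0.35), "home"),
        ((0.05, 0.45, 0.40, 0.10), "work"),
        ((0.08, 0.40, 0.42, 0.10), "work"),
        ((0.06, 0.38, 0.44, 0.12), "work"),
    ]
    rest_of_grid = {
        101: (0.32, 0.12, 0.16, 0.40),
        102: (0.07, 0.43, 0.40, 0.10),
    }
    print(predict_grid(labeled, rest_of_grid))  # {101: 'home', 102: 'work'}
```

With only two classes, an odd `k` guarantees the vote cannot tie.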

Milan 20-by-20 grid visualization using geojson.io

Furthermore, we analysed several clustering algorithms to determine whether they could detect home and work areas. We used k-means and k-medoids clustering and found that the results did not provide insight into home or working zones; instead, the clustering separated the areas of Milan by their level of mobile traffic, from most to least.
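To illustrate how clustering can end up tracking traffic volume rather than home/work profiles, here is a minimal k-means (Lloyd's algorithm) sketch in plain Python. It uses a deterministic farthest-first seeding instead of random or k-means++ initialisation so the toy run is reproducible; the two features (daytime and night-time activity) and their values are hypothetical, and k-medoids is not shown.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the point
    farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, iters=100):
    """Lloyd's algorithm. Returns (centroids, labels)."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


if __name__ == "__main__":
    # Toy cells described by (daytime activity, night-time activity).
    cells = [
        (10, 20), (12, 22),      # quiet, home-like profile
        (20, 10), (22, 12),      # quiet, work-like profile
        (200, 400), (210, 390),  # busy, home-like profile
        (400, 200), (390, 210),  # busy, work-like profile
    ]
    _, labels = kmeans(cells, k=2)
    print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Because the raw counts of the busy cells dwarf those of the quiet ones, the split follows volume and each cluster mixes home-like and work-like profiles. Dividing each cell's activity by its total would instead make the distance compare profile shapes.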

We also wrote a second paper presenting a study of different multi-label classification techniques for area identification.
The multi-label approach is novel and more suitable than assigning a single class to each zone, since some areas can be both residential and working places, along with other meaningful classes.
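One simple way to give a zone several labels is a kNN variant that votes on each label independently (a binary-relevance-style scheme): a label is assigned when at least a given fraction of the k nearest labeled cells carry it. The sketch below uses hypothetical features and labels, and `knn_multilabel` is an illustrative helper, not necessarily one of the techniques studied in the paper.

```python
import math


def knn_multilabel(train, query, k=3, threshold=0.5):
    """Return every label carried by at least `threshold` of the k nearest cells.

    `train` is a list of (feature_vector, set_of_labels) pairs."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    candidates = set().union(*(labels for _, labels in neighbours))
    return {lab for lab in candidates
            if sum(lab in labels for _, labels in neighbours) / k >= threshold}


if __name__ == "__main__":
    # Hypothetical activity shares (night, morning, afternoon, evening).
    train = [
        ((0.30, 0.15, 0.15, 0.40), {"home"}),
        ((0.32, 0.12, 0.16, 0.40), {"home"}),
        ((0.05, 0.45, 0.40, 0.10), {"work"}),
        ((0.07, 0.43, 0.40, 0.10), {"work"}),
        ((0.18, 0.30, 0.28, 0.24), {"home", "work"}),
        ((0.20, 0.28, 0.27, 0.25), {"home", "work"}),
        ((0.19, 0.31, 0.29, 0.21), {"home", "work"}),
    ]
    print(sorted(knn_multilabel(train, (0.19, 0.29, 0.28, 0.24))))  # ['home', 'work']
    print(sorted(knn_multilabel(train, (0.31, 0.14, 0.15, 0.40))))  # ['home']
```

Unlike single-label voting, a mixed-use cell surrounded by mixed-use neighbours keeps both labels instead of being forced into one class.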