The islands of music playground demonstration is something like a tag cloud where similar tags are located close to each other. The map was created using clustering algorithms.
These algorithms group similar music on islands. Similar islands are placed close to each other. For example, various flavours of metal are located close to each other in the upper right of the map. The map also suggests several more or less continuous transitions. For example, there is a path from folk to doom metal, progressive rock, and progressive metal.
Another somewhat curious example is the sea of mistagged artist where various flavours of non-English world music can be found. Generally, not all clusters make sense, and part of the explanation is that there is plenty of noise in the data.
In more technical terms:
The map is a self-organizing map of 13,000 randomly sampled Last.fm users, labelled with the tags and artists associated with each user.
Each of these 13,000 users is described with a tag cloud which is extracted from the music the user listens to. This data is normalized in a similar way as described here. One consequence of this is that a large part of alternative indie rock pop is averaged out.
After all the normalization and pre-processing, the 13k users are represented by 2000 distinct tags, resulting in 13k sparse vectors in a 2k-dimensional tag vector space.
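As a rough illustration of what "sparse vectors in a tag vector space" can look like, here is a minimal Python sketch. The exact normalization used for the map is not spelled out here, so the unit-length scaling below is only an assumption, and the function name build_sparse_vectors, the toy users, and their tag counts are all made up for the example.

```python
import math
from collections import Counter

def build_sparse_vectors(user_tags, vocab_size):
    """Turn per-user tag counts into sparse, unit-length tag vectors.

    user_tags: dict mapping user name -> dict mapping tag -> count.
    Returns (vocab, vectors), where vocab maps tag -> dimension index and
    vectors maps user -> {dimension index: weight}.
    """
    # Keep the tags used by the largest number of users (assumed selection rule).
    usage = Counter(tag for tags in user_tags.values() for tag in tags)
    vocab = {tag: i for i, (tag, _) in enumerate(usage.most_common(vocab_size))}

    vectors = {}
    for user, tags in user_tags.items():
        # Only non-zero entries are stored, which is what makes the vector sparse.
        vec = {vocab[t]: float(c) for t, c in tags.items() if t in vocab}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        if norm > 0:
            # Assumed normalization: scale each user to unit Euclidean length.
            vectors[user] = {i: w / norm for i, w in vec.items()}
    return vocab, vectors

# Toy data, purely illustrative.
users = {
    "alice": {"folk": 8, "doom metal": 2},
    "bob": {"progressive rock": 5, "progressive metal": 5},
    "carol": {"folk": 3, "samba": 1},
}
vocab, vectors = build_sparse_vectors(users, vocab_size=3)
```

With 2000 tags and 13k users the same idea applies, only with a larger vocabulary. A user whose tags all fall outside the vocabulary gets no vector at all in this sketch.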
Using singular value decomposition (SVD) the dimensionality is reduced to 120 dimensions. This 120-dimensional space is a latent semantic space
in which no distinction is made, for example, between brazil
clustering 400 prototypical users are computed. Users very close to the zero-vector are not considered for further analysis. (Given the normalization and the latent space mapping, these zero-vector users can be interpreted as either very average users, or so unique that they can’t be described within the 120-dimensional space.)
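The reduction and prototype steps described above can be sketched with NumPy as follows. This is only an illustration under assumptions: plain k-means stands in for the clustering step, and the threshold eps, the toy matrix sizes, and the function names are hypothetical.

```python
import numpy as np

def reduce_with_svd(X, k):
    """Project the rows of X onto the top-k right singular vectors."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:k].T  # shape: (n_users, k)

def drop_near_zero(Z, eps=1e-3):
    """Discard users whose latent vector is (almost) the zero vector.

    eps is a hypothetical threshold chosen for this example.
    """
    keep = np.linalg.norm(Z, axis=1) > eps
    return Z[keep]

def kmeans(Z, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd's k-means (an assumed choice); returns the centroids."""
    rng = np.random.default_rng(seed)
    centroids = Z[rng.choice(len(Z), size=n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distance from every user to every centroid.
        d = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(n_clusters):
            members = Z[labels == j]
            if len(members):  # leave empty clusters where they are
                centroids[j] = members.mean(axis=0)
    return centroids

# Toy stand-in for the user-by-tag matrix: 200 users, 30 tags.
rng = np.random.default_rng(42)
X = rng.random((200, 30))
X[:5] = 0.0  # five users with no signal at all

Z = drop_near_zero(reduce_with_svd(X, k=5))
prototypes = kmeans(Z, n_clusters=8)
```

At the scale of the map, k would be 120 and n_clusters 400. A dense SVD of a 13k by 2k matrix is feasible but wasteful, so a truncated sparse solver would probably be the more practical choice.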
Using a self-organizing map (SOM) the latent space is mapped to a 2-dimensional visualization space. The SOM has a size of 20 rows and 40 columns. A smoothed data histogram of the SOM is computed and visualized so that clusters show up as islands.
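To make the last step more concrete, here is a small NumPy sketch of an online SOM followed by a smoothed data histogram, in which every data point votes for its s closest map units with decreasing weight. The learning-rate and neighbourhood schedules, the value of s, and the toy map size are assumptions for the example and not the settings used for the map.

```python
import numpy as np

def train_som(data, rows, cols, n_iter=2000, lr0=0.5, sigma0=None, seed=0):
    """Train a rectangular SOM with a Gaussian neighbourhood (online updates)."""
    rng = np.random.default_rng(seed)
    weights = rng.normal(scale=0.1, size=(rows * cols, data.shape[1]))
    # Grid position (row, column) of every map unit.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    sigma0 = sigma0 or max(rows, cols) / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # assumed linear decay
        sigma = sigma0 * (1.0 - frac) + 0.5      # assumed linear shrinking radius
        x = data[rng.integers(len(data))]
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
        grid_d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
        weights += lr * h[:, None] * (x - weights)
    return weights

def smoothed_data_histogram(data, weights, rows, cols, s=3):
    """Each data point gives s points to its closest unit, s-1 to the next, and so on."""
    votes = np.zeros(rows * cols)
    for x in data:
        d = ((weights - x) ** 2).sum(axis=1)
        nearest = np.argsort(d)[:s]
        votes[nearest] += np.arange(s, 0, -1)
    return votes.reshape(rows, cols)

# Toy data in a 5-dimensional "latent space" and a small 4 x 8 map.
rng = np.random.default_rng(1)
data = rng.normal(size=(300, 5))
weights = train_som(data, rows=4, cols=8)
sdh = smoothed_data_histogram(data, weights, rows=4, cols=8, s=3)
```

Plotting sdh as a height map, with low values coloured as water and high values as land, gives the island look. The real map would use rows=20 and cols=40.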