Clustering

Lots of confusing math I’m glad I don’t have to do

As each image dimension was analyzed I added it to an Excel spreadsheet. I was able to sort by one of the dimensions, export the file, and load the images into Processing in that particular sort order. This gave me the first glimpse that something was working since I was able to see a sorted list of images by redness, dissimilarity, or any other dimension.
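A one-dimensional sort like this can be sketched in a few lines. This is only an illustration, assuming the spreadsheet was exported as CSV with one row per image; the column names `filename`, `redness`, and `brightness` and the sample values are hypothetical, not taken from my actual data file.

```python
import csv
import io

def sort_by_dimension(csv_text, dimension, descending=True):
    """Return image filenames ordered by one numeric column of the export."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Values arrive as strings, so convert to float before comparing.
    rows.sort(key=lambda row: float(row[dimension]), reverse=descending)
    return [row["filename"] for row in rows]

# Hypothetical export: one row per image, one column per analyzed dimension.
SAMPLE = """filename,redness,brightness
a.jpg,0.10,0.80
b.jpg,0.75,0.20
c.jpg,0.40,0.55
"""

ordered = sort_by_dimension(SAMPLE, "redness")
print(ordered)
```

The resulting list is the order in which the images would be loaded for display.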

The results were clearly sorted, but the similarity was crude because each sort looked at only one dimension. I knew that getting the computer to really “see” the image would require clustering these dimensions together and sorting the images based on all attributes at once. I didn’t know anything about doing that, so I talked with the class about it and then went digging around on Google. Luckily I ran across the wonderful program Cluster, created by folks at Stanford University.

Cluster was designed to do gene analysis, but really it can cluster any set of records based on various data points. It turned out to be exactly what I needed and allowed me to experiment with different techniques to see which one worked best. I tried out K-Means, K-Medians, and self-organizing maps (SOM). I also ran principal components analysis (PCA) on my dataset, which seemed to let the clustering algorithms work faster and with better results. After comparing the sorted sets I chose to go with SOM, since it was able to pull together some particularly tricky similarities that K-Means passed over. I may rerun the various analysis techniques in the future with different dimensional weighting, which Cluster allows for.
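To give a feel for what clustering on all dimensions at once means, here is a minimal K-Means sketch in plain Python. It is not how Cluster implements it, and it leaves out PCA and SOM entirely; the two-column feature rows (think redness and brightness) are made-up values, and the deterministic farthest-point start is my own simplification so the example gives the same answer every run.

```python
import math

def standardize(rows):
    """Rescale each column to zero mean and unit variance so no dimension dominates."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=50):
    """Return one cluster label per point."""
    # Simplified deterministic start: the first point, then the point
    # farthest from every center chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = []
    for _ in range(iterations):
        # Assign each point to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each center to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels

# Hypothetical rows: [redness, brightness] for six images.
features = [[0.90, 0.20], [0.85, 0.25], [0.80, 0.15],
            [0.10, 0.80], [0.15, 0.90], [0.20, 0.85]]
labels = kmeans(standardize(features), k=2)
print(labels)
```

Multiplying a standardized column by a factor before clustering is one simple way to give that dimension more or less weight.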

If you are interested you can download the data file [720K] that I used with Cluster. The unique reference to each image maps to the filename and you can get the images by downloading the zip file [80MB] of my Processing project from the explore section. If you do any additional clustering or analysis of these images I would love to know about it.

It Works!

Similarities start showing up

The results from clustering are much more interesting than the previous one-dimensional sorts. Clustering is particularly good at bringing together images of similar subject matter taken by the same photographer. However, it isn’t limited to these pairings and will often group objects with very different content but some sort of similarity. Sometimes it’s hard to describe how the photos are similar, but many times they just “feel” right.

Here are some examples of good similarity matches:

[Image gallery: clustering examples]