(This article is now three years old. Our latest research on machine learning for art valuation is available in our 2020 research article, “Can Machine Learning Predict The Price Of Art At Auction?”, originally published by MIT Press in the Harvard Data Science Review.)

Since starting Artnome, I’ve been lucky to talk to a lot of really smart folks about the intersection of art and data, perhaps none more brilliant than Ahmed Hosny. I sat down with Ahmed over pizza to talk about his project, “The Green Canvas,” which explores art valuation analytics using machine learning and linear regression.

Before I get into the interview, a bit of background on Ahmed. When Ahmed is not busy hacking cancer with deep learning, designing art museums in China, developing image-guided surgical equipment, or advancing 3D printing at MIT Media Lab, he enjoys flying small planes and practicing Mandarin (one of four languages he speaks). Despite these incredible accomplishments, Ahmed is also humble, down to earth, and a great conversationalist. I hope people enjoy reading about the conversation as much as I did having it.

Jason Bailey (JB): Ahmed, really excited to have you as the first guest in our Artnome interview series! It sets a high bar.

We have had a chance to chat twice now over dinner, and I found both conversations to be some of the most rewarding and enjoyable I’ve had in recent memory. You have done a lot of really interesting things - could you share with us some background on your "Green Canvas" project which explored the use of machine learning for art valuation?

Ahmed Hosny (AH): Thanks Jason, sure thing. The Green Canvas project aimed at studying art valuation with a specific focus on paintings. We were interested in quantifying aesthetics as an extremely subjective and quality-based feature as well as exploring the middle realm between artistic evaluation and scientific statistics. How do we evaluate paintings? Will there be any interesting relationships between price evaluation and pixels?

JB: Cool! Those are great questions and they really resonate well with Artnome’s interests in exploring art through data. Can you give us some detail around how you set up the project? For example, I am curious about the sample size for your analysis and what data sources you used.

AH: All data was acquired from the Blouin Art Sales Index website. We tried to gather a representative, unbiased sample of data including various artists, styles and mediums. We analyzed 35,407 paintings at a total valuation of $9,366,754,845. Prices included a maximum of $119,922,500, an average of $264,545 and a minimum of $3.

JB: Did any trends emerge during the initial data exploration stage?

AH: Yes, there were many. Some of the trends explored included:

Paintings produced in the 1960s recorded the highest sales. This coincides with the many artistic impulses that began to gain momentum during that period, including the explosion of consumerism and popular culture.
Paintings with whites, grays, and blacks as dominant colors are most likely to have high sales values compared to other more saturated colors.

Paintings with low detected corner percentages (that is, fewer edge intersections) are also more likely to have high sales values.

Harris Corner Detection was used to detect the corners in each painting.

Auctions of valuable pieces tend to coincide with successful exhibitions.
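
For readers curious about what a "corner percentage" could look like in practice, here is a minimal sketch of a Harris-style corner measure in plain Python. It is not the Green Canvas code: the function name, the 3x3 window, the constant `k`, the relative threshold, and the definition of the percentage as "share of pixels whose Harris response exceeds a fraction of the strongest response" are all assumptions made for illustration.

```python
def harris_corner_fraction(gray, k=0.05, rel_threshold=0.01):
    """Share of pixels flagged as corners by a basic Harris response.

    gray is a list of rows of floats (a grayscale image).
    k and rel_threshold are illustrative assumptions, not project values.
    """
    h, w = len(gray), len(gray[0])

    # Central-difference gradients; border pixels are left at zero.
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (gray[y][x + 1] - gray[y][x - 1]) / 2.0
            iy[y][x] = (gray[y + 1][x] - gray[y - 1][x]) / 2.0

    # Harris response R = det(M) - k * trace(M)^2, where M is the
    # structure tensor summed over a 3x3 neighbourhood.
    resp = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            sxx = sxy = syy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx = ix[y + dy][x + dx]
                    gy = iy[y + dy][x + dx]
                    sxx += gx * gx
                    sxy += gx * gy
                    syy += gy * gy
            det = sxx * syy - sxy * sxy
            trace = sxx + syy
            resp[y][x] = det - k * trace * trace

    peak = max(max(row) for row in resp)
    if peak <= 0:
        return 0.0  # flat image or straight edges only: no corners
    corners = sum(1 for row in resp for r in row if r > rel_threshold * peak)
    return corners / (h * w)


# Tiny synthetic images (invented for this example).
flat = [[0.5] * 8 for _ in range(8)]
edge = [[1.0 if x >= 4 else 0.0 for x in range(8)] for _ in range(8)]
square = [[1.0 if 4 <= x <= 7 and 4 <= y <= 7 else 0.0 for x in range(12)]
          for y in range(12)]

for name, img in (("flat", flat), ("edge", edge), ("square", square)):
    print(name, harris_corner_fraction(img))
```

A flat canvas and a single straight edge both score zero, while the square, which has real edge intersections, scores above zero. In practice one would more likely use an image library's corner detector on real scans rather than nested loops like these.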

JB: Was the analysis market-wide? Were you able to improve results by focusing in on a single artist or group of artists?

AH: We realized early on that we were dealing with way too many variables and that we had to narrow down our dimension space by looking at a strategically drawn subset of the data. In an attempt to develop a machine-learning platform for pricing artwork, we created a linear regression model fit specifically to paintings by Spanish painter Pablo Picasso. Building a model on a single artist's work meant that we eliminated any price trends directly related to the artist. Picasso was the obvious choice here given the sheer number of his well-documented works. We used a set of 4,000 paintings for training and another set of equal size for testing. Our model reached a prediction score of 0.58, measured as the correlation between true and predicted prices. Applying a single log transform to the price value gave the best results. We also built separate regression models, each using a single parameter as the predictor. We noticed that the ratio of unique colors alone generated a relatively high correlation of 0.46 between predicted prices and actual prices.
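
To make the single-predictor setup concrete, here is a rough sketch of fitting a line to log prices and scoring it by correlation on held-out data. Everything specific in it is an assumption: the helper names are hypothetical, the ratios and prices are invented (they are not Picasso sales), the log is taken base 10, and the correlation is computed in log space. The interview does not say which of these choices the project actually made.

```python
import math


def unique_color_ratio(pixels):
    """Number of distinct colors divided by the total number of pixels."""
    return len(set(pixels)) / len(pixels)


def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Invented illustration data: one unique-color ratio and one sale price
# per painting. These numbers are made up and carry no real-world meaning.
train_ratio = [0.10, 0.25, 0.40, 0.55, 0.70, 0.85]
train_price = [1500, 2600, 7000, 9000, 30000, 41000]
test_ratio = [0.15, 0.35, 0.60, 0.80]
test_price = [2000, 4000, 15000, 20000]

# Fit on log prices so a few very expensive works do not dominate the fit.
slope, intercept = fit_line(train_ratio, [math.log10(p) for p in train_price])

# Score on the held-out set: correlation between true and predicted log prices.
predicted_log = [slope * r + intercept for r in test_ratio]
true_log = [math.log10(p) for p in test_price]
score = pearson(true_log, predicted_log)

print(f"slope={slope:.2f}, held-out correlation={score:.2f}")
```

The log transform matters because auction prices span many orders of magnitude (from $3 to roughly $120 million in the full dataset), and a straight line fit to raw prices would be driven almost entirely by the handful of record sales.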

JB: It has been a few years since you first executed Green Canvas in 2014. If you were to revisit the project today, would you change your approach at all?

AH: This project was done using a combination of hand-crafted features + traditional machine learning. In this case, an "expert" hypothesized on what features in the artwork could be related to its price. We then "hand-crafted" equations that describe these features, including things like how bright the artwork is or how edgy it seems. These features were then used to build a machine learning model. It happens to be a linear regression model in our case, but could very well be based on random forests or support vector machines - aka "traditional machine learning". This was probably state of the art until the early 2010s, but not anymore. Today, deep learning is in. If I were to revisit this project, I would definitely eliminate much of this guesswork by using deep learning. Instead of fitting a model to the data, deep learning learns feature representations from example data automatically and can hence learn very complex non-linear relationships. With both the sheer amount of data and massive processing power we have at our disposal today, deep learning has become the de facto method for many applications.

I am sure you have been following the recent media craze over artificial intelligence and deep learning. What they don't tell you is how difficult it is to train these networks. I have been using them to predict disease prognosis from medical images for a couple of years now. There is very little theory as to how and why these networks work. In the healthcare space, they call them "black box medicine". As a result, training deep learning networks is more of an art that relies on empirical knowledge. Back to art, if (and only if) there is some sort of connection between any feature in the artwork and its price, then these networks would be able to identify it.
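
As a small illustration of the "hand-crafted" features Ahmed mentions, such as how bright or how edgy a painting is, the sketch below turns an image into two numbers that a traditional model could consume. The function names, the Rec. 601 luma weights for brightness, and the adjacent-pixel threshold used for "edginess" are assumptions chosen for this example, not the equations used in the Green Canvas project.

```python
def mean_brightness(pixels):
    """Average luma in [0, 1] for 8-bit (r, g, b) tuples (Rec. 601 weights)."""
    total = sum(0.299 * r + 0.587 * g + 0.114 * b for r, g, b in pixels)
    return total / (255.0 * len(pixels))


def edginess(gray, threshold=0.2):
    """Share of horizontally adjacent pixel pairs that differ sharply.

    gray is a list of rows of floats in [0, 1]; threshold is an assumption.
    """
    pairs = 0
    strong = 0
    for row in gray:
        for a, b in zip(row, row[1:]):
            pairs += 1
            if abs(a - b) > threshold:
                strong += 1
    return strong / pairs if pairs else 0.0


# Invented toy inputs: two RGB pixels and a one-row grayscale strip.
toy_pixels = [(255, 255, 255), (0, 0, 0)]
toy_gray = [[0.0, 0.0, 1.0, 1.0]]

feature_vector = [mean_brightness(toy_pixels), edginess(toy_gray)]
print(feature_vector)
```

Each feature encodes a human guess about what might matter for price. That is the guesswork a deep learning approach tries to replace by learning its own representations from the images.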

JB: So in theory, would deep learning be able to differentiate the artists by style? And then eventually price the work, as well?

AH: The network has no understanding of the real world obviously, so it might learn that a certain style (by some artist) is very expensive art. If we were to ask it to predict the price of an artwork of the same style but made by me (a no one in the art world), then it would be completely overpriced... and make me rich. There are ways to feed metadata about the artwork to the network and allow it to know more about the piece than just how it looks.

JB: What role could something like the Artnome database play in a model like this?

AH: These networks need lots of data: curated, clean data. Because of Artnome’s focus on data acquisition and analytics, I am sure you know what I am getting at. Curating data is expensive and time consuming - so there is an up-front investment there. If you are building a prediction system for modern art, how big of a modern art dataset do you have? Ultimately, there is no guarantee that the network will learn the correct representative features and give good predictions, so it is a gamble. As trends and tastes change, these networks will need to be constantly retrained to reflect that - just something else to keep in mind.

JB: For sure. I made an attempt at building an MVP (minimum viable product) using machine learning for predicting auction values and came to the conclusion that much more data was needed for it to be successful. For that reason, data acquisition has become the primary goal for Artnome, with prediction modeling taking a back seat for a while. What other items would you take into consideration for building a predictive model?

AH: Who would use it? Is it an app for your average "artist" who works on art during the weekend or a more seasoned artist? Would they be "okay" with some algorithm pricing their art?

I used to be an architect in a former life and I remember that I always thought that my creations were "priceless" to some extent. You probably don’t consider stuff on Etsy.com as art - but I wonder if they have some algorithm that helps creators price their creations simply by getting predictions from algorithms trained on previously traded pieces. I also wonder how many of these algorithms were trained using an image of the artwork itself vs. metadata such as material, size, cost of material, artist, year, etc.

JB: All good questions. I wonder, Ahmed, as a technologist, what other emerging tech trends do you think will have an impact on art and the art market?

AH: I encourage you to also look at blockchain-based technologies - very, very hot right now, with the Bitcoin price through the roof and the emergence of a new way to fund your venture: ICOs, or Initial Coin Offerings. An application in art would be cool. You could create an immutable record of art transactions around the world based on cryptocurrencies. Transactions in these records could also be completely anonymous.

You could also create a high-frequency trading platform of art where art is sold and bought just like stock - don't know if that exists and if there is any utility in it.

JB: I think it does exist. I believe a company called Verisart is looking at certifying and verifying artworks and collectibles using the Bitcoin blockchain.

AH: An interesting application of blockchain - pretty sure we will see many more of these next year.

JB: Maybe it’s cheesy, but I listen to a lot of Tim Ferriss interviews. One of my favorite questions he asks is what book or books folks would recommend reading. I’m stealing that question, but it does not have to be a book (can be an article, podcast, etc.). What do you recommend readers check out, especially if someone wanted to learn more about technologies like machine learning, deep learning, blockchain, etc.?

AH: Not cheesy at all. I definitely recommend the conveniently titled "Deep Learning" book by Ian Goodfellow - it includes both the high-level intuition and the technical details. As a designer, I am also intrigued by deep learning applications in that area. I recently came across two interesting applications, both intended to map and organize data: one for typefaces and the other for fashion.

JB: Great! I am looking forward to reading those, and I am sure the Artnome readers will find them interesting, as well.

Ahmed, thank you so much for being so generous with your time! This has been a really fascinating conversation.

AH: Great meeting you again, and thanks for the dinner!

JB: My pleasure!

To learn more, watch the video on the Green Canvas project below.
