Geometric Apple Core, Claes Oldenburg and Coosje van Bruggen, San Francisco Museum of Modern Art, 2016

Dataset Overview

What’s in our data?

The dataset contains detailed information about 18 museums and 10,000 artists whose works are featured in their collections. It records collection details for each museum and, for each artist, name, gender, ethnicity, regional origin, and estimated birth year (given as a decade). These fields make it possible to analyze diversity and representation within the museum collections.

What does our data illuminate?

Our dataset underscores disparities in how major U.S. museums represent artists across gender, ethnicity, regional origin, and birth decade. This observation prompted us to investigate the discriminatory selection processes that have shaped the current landscape, where white and male artists dominate museum collections. The dataset also lets us identify patterns and trends in artist diversity and connect these findings to broader social and historical contexts.

Data Author

CT (Chad Topaz, author) was supported by funding from the Williams College Office of the Dean of Faculty, Science Division, Davis Center, and Department of Mathematics and Statistics. The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Explore the Full Dataset

Data Critique

What can’t our dataset reveal?

All statements about artists’ demographics in the dataset are limited to individual, identifiable artists. This means the coverage of the data varies across museums. For example, some long-standing museums, or museums with wide-ranging collections such as the Museum of Fine Arts, Boston, hold artworks from Egypt, Greece, and other regions whose artists cannot be identified. In the case of antiquities, objects we perceive as art today may have been functional in their original context. The dataset is therefore restricted to known sources and does not fully reflect the diversity of artists. With this limitation in mind, we focus our research on museums known for their modern and contemporary collections, where artist information is less uncertain.

Crowdsourcing methodology

The dataset notes that “Not Inferred values indicate that we were not able to confidently determine the value based on our crowdsourcing approach”, revealing that the original data was gathered through a crowdsourcing methodology. This raises questions about the reliability and consistency of the dataset, as crowdsourcing often relies on varied contributors who may have differing levels of expertise or biases. Furthermore, the inability to confidently infer certain values suggests limitations in the depth or accuracy of the source information, potentially impacting the dataset’s overall credibility and the validity of any insights drawn from it.
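One way to make this concern concrete is to measure how often each demographic field falls back to “Not Inferred”. The sketch below is only an illustration: the column names (`gender`, `ethnicity`) and the sample rows are hypothetical stand-ins, not the dataset’s actual schema or values, and the function name is our own.

```python
from collections import Counter

# Label quoted in the dataset's own note about crowdsourced values.
NOT_INFERRED = "Not Inferred"


def not_inferred_share(rows, columns):
    """Return, for each column, the fraction of rows marked 'Not Inferred'."""
    counts = Counter()
    for row in rows:
        for col in columns:
            if row.get(col) == NOT_INFERRED:
                counts[col] += 1
    total = len(rows)
    # Guard against an empty input so we never divide by zero.
    return {col: (counts[col] / total if total else 0.0) for col in columns}


# Made-up rows for illustration only; not taken from the real dataset.
sample = [
    {"museum": "A", "gender": "woman", "ethnicity": "Not Inferred"},
    {"museum": "A", "gender": "Not Inferred", "ethnicity": "Not Inferred"},
    {"museum": "B", "gender": "man", "ethnicity": "white"},
    {"museum": "B", "gender": "man", "ethnicity": "Not Inferred"},
]

shares = not_inferred_share(sample, ["gender", "ethnicity"])
for column, share in shares.items():
    print(f"{column}: {share:.0%} not inferred")
```

A field with a high share of “Not Inferred” values would call for more cautious conclusions than one where nearly every artist has an inferred value.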

Insufficient granularity

The dataset lacks granularity, and the missing information limits its usefulness. For instance, it does not include the dates when artists’ works entered each museum’s collection. Although the data categorizes artists by birth decade, it does not give their ages at the time of admission, a critical detail for analyzing trends. Knowing the exact year of admission would show more clearly whether museums have become more intentional about inclusivity over time, and would support stronger, more confident conclusions about diversity improvements during specific periods. Furthermore, the dataset lists only artists’ countries of origin without specifying cities or regions, which further diminishes its contextual depth. These omissions hinder a comprehensive analysis of the artists’ backgrounds and the timeline of their representation in museums.

Our Data