Tag Archives: Python

Data Interlinking: Linked Jazz and Carnegie Hall

As linked data technology has developed over the last several years, the Linked Jazz project has continued to experiment, most recently by interlinking our core jazz name entity list, derived from oral histories, with other jazz archival materials and their related metadata. Our research benefits from many ongoing collaborations, including those with Jeff Rubin and The Hogan Jazz Archive at Tulane University, and with Gino Francesconi and Rob Hudson at the Carnegie Hall Archives. (Our work identifying jazz relationships through historical photographs from the Tulane University archives has been described here by William Levay.) This post details a pilot we conducted to identify jazz musicians who appear in both the Linked Jazz network and a subset of the Carnegie Hall Performance History Database focusing on jazz events from 1912 to 1955. From these entity matches, we created a visualization of the shared relationships between the two datasets. This first step in data interlinking allowed us to explore the possibilities and the limitations of the data integration process, and, by reviewing it alongside related use cases, to identify common problems and best practices.

Enriching the Linked Jazz Name List with Gender Information

Inspired by Judy Chaikin’s “The Girls in the Band”, a documentary spotlighting the lesser-known history of women in jazz, Linked Jazz set out in 2014 to amplify the stories of jazz women by processing more interviews with female jazz musicians. As a result, the percentage of women in our list of people mentioned in interviews seemed to grow more rapidly than before. Until then, the list had been overwhelmingly male. We wondered: could we preliminarily assume that jazz women mention other women in the context of their lives and careers more often than men in jazz mention women? For us this was more a tangential observation than a formal research area to pursue, but we realized that adding attributes such as gender to our list of names could enable new discoveries for users. Enriching our dataset of 2000+ names with gender information became Linked Jazz’s first attempt to create a data mash-up with other open datasets that provide semantic definition.

Connecting Musicians through the Photo Archive

The Linked Jazz project has derived most of the social relationships in its dataset from the transcripts of oral histories given by jazz musicians. One question we began to ask some time ago is: what other jazz historical material in digital form would be a good source of additional relationship data? One answer to that question is digitized photographs, specifically those with good-quality metadata.

Tulane University has a rich collection of historical photographs of jazz musicians living and performing in New Orleans and around the world. The Hogan Jazz Archive Photography Collection and Ralston Crawford Collection of Jazz Photography are two such collections, and we received two tab-delimited text files from Tulane, exported from their CONTENTdm system.
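As a rough illustration of working with a tab-delimited export of this kind, the sketch below reads rows into dictionaries with Python's standard csv module. The column names ("Title", "Subjects") and the sample row are hypothetical placeholders, since the actual fields in the Tulane files are not listed here.

```python
import csv
import io

# Hypothetical stand-in for one exported file; the real column names
# and values in the Tulane exports are not given in this post.
SAMPLE_EXPORT = "Title\tSubjects\nParade on Canal Street\tDoe, Jane; Roe, Richard\n"


def read_tab_delimited(handle):
    """Return each row of a tab-delimited export as a dict keyed by column header."""
    return list(csv.DictReader(handle, delimiter="\t"))


rows = read_tab_delimited(io.StringIO(SAMPLE_EXPORT))
for row in rows:
    print(row["Title"], "|", row["Subjects"])
```

For a real file, the same function could be given a handle from open(path, newline="", encoding="utf-8"), assuming the export is UTF-8.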

Some numbers: in this set we have 1,787 images, at least 681 unique individuals, and more than 2,700 depictions. Depiction is the FOAF term that we later used as a predicate in our triples from this dataset. One group photograph might depict several individuals, and one individual might be depicted in several photographs. People depicted in the same photograph can be said to “know” each other in some way.
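The idea of depictions and co-depiction can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual pipeline: the example.org URIs and the sample data are hypothetical, and the sketch only lists pairs of people who share a photograph without committing to any particular predicate for the relationship between them. The only real vocabulary term used is foaf:depiction, which links a thing to an image that depicts it.

```python
from itertools import combinations

FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical data: photograph URI -> URIs of the people shown in it.
photos = {
    "http://example.org/photo/1": [
        "http://example.org/person/a",
        "http://example.org/person/b",
        "http://example.org/person/c",
    ],
    "http://example.org/photo/2": ["http://example.org/person/a"],
}


def depiction_triples(photos):
    """One (person, foaf:depiction, photo) triple per person per photograph."""
    return [
        (person, FOAF + "depiction", photo)
        for photo, people in photos.items()
        for person in people
    ]


def co_depicted_pairs(photos):
    """Unordered pairs of people who appear together in at least one photograph."""
    pairs = set()
    for people in photos.values():
        pairs.update(combinations(sorted(set(people)), 2))
    return pairs


def to_ntriples(triples):
    """Serialize URI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


triples = depiction_triples(photos)
pairs = co_depicted_pairs(photos)
print(to_ntriples(triples))
```

A group photograph with n people yields n depictions but n(n-1)/2 candidate pairs, which is why a modest number of images can produce a much larger set of relationships.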

In this post, we’ll describe the process we used to standardize and reconcile the photograph metadata, and then to describe the photographs, the people, and the relationships they depict using RDF triples.