At the heart of our work are oral histories of jazz musicians. We are currently expanding our data sources to include other document types and music-related datasets.
In the first phase of the project, funded through an OCLC/ALISE Library and Information Science Research grant, we experimented with the actual content of interview transcripts, generating new triples from the interview text itself rather than converting existing metadata. The 50+ interview transcripts we’ve used to create LOD come from the Rutgers Institute for Jazz Studies Archives, Smithsonian Jazz Oral Histories, the Hamilton College Jazz Archive, UCLA’s Central Avenue Sounds series, and the University of Michigan’s Nathaniel C. Standifer Video Archive of Oral History. For a full list of transcripts, see our Data Sources page.
To create LOD, we developed a suite of tools: a transcript analyzer, a name mapping and curator tool, and a crowdsourcing tool. Together, these tools find the names mentioned during an interview and assign a positive identification to each, disambiguating them against online resources such as DBpedia and VIAF. The transcript analyzer also recognizes the question-and-answer structure of the oral history. Each time an interviewee mentions a person, a simple RDF triple linking the interviewee to that person is created with the knowsOf predicate. Each triple can then be mapped back to the question-and-answer block in which the mention occurs. For more on the transcript analyzer and Name Mapping tool, see our Tools page.
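As a rough illustration of the knowsOf step, the sketch below turns one interviewee and a list of already-disambiguated names into N-Triples lines. It is not the project's transcript analyzer: the function name, the example.org namespace, the predicate URI, and the person identifiers are all placeholders assumed for this example.

```python
# Hypothetical namespace and predicate URI, used for illustration only.
BASE = "http://example.org/person/"
KNOWS_OF = "http://example.org/vocab/knowsOf"


def knows_of_triples(interviewee: str, mentioned: list[str]) -> list[str]:
    """Return one N-Triples line per distinct person the interviewee mentions."""
    lines = []
    seen = set()
    for person in mentioned:
        # Skip self-references and repeated mentions of the same person.
        if person == interviewee or person in seen:
            continue
        seen.add(person)
        lines.append(f"<{BASE}{interviewee}> <{KNOWS_OF}> <{BASE}{person}> .")
    return lines


# Placeholder identifiers; real input would come from the name mapping step.
triples = knows_of_triples("interviewee_a", ["musician_x", "musician_y", "musician_x"])
print("\n".join(triples))
```

Collapsing repeated mentions into a single triple is a design choice of this sketch. A pipeline that also needs mention frequency, or a link back to each question-and-answer block, would keep every occurrence instead.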
In the next step, the interview question and answer blocks are passed to Linked Jazz 52nd Street, our crowdsourcing tool. Volunteers are presented with these snippets of interview text and asked to assign more granular terms to describe the relationship between the interviewee and the person mentioned. These relationships include:
has met
is an acquaintance of
is a friend of
is a close friend of
is influenced by
is a mentor of
collaborated with:
was in a band together with
played with
was a member of the band of
toured with
was the bandleader for
In the last step of our data lifecycle, we pass the triples to a network visualization. At present, the visualization shows the knowsOf triples, not the triples generated by 52nd Street. It also displays images, videos, and short biographies of jazz musicians within the network. Although we have a much larger master list of musicians, including names culled from sources such as DBpedia and MusicBrainz, only musicians who were mentioned in transcripts are included in the network visualization. The size of a person’s node reflects how many times they were mentioned across the oral history transcripts. More information about our source data and the data we’ve produced is available under the Data menu of this site.
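The node sizing described above can be sketched as a simple count over (interviewee, mentioned person) pairs. This is only an illustration: the function names, the sample identifiers, and the square-root scaling with its default parameters are assumptions, not the formula the visualization actually uses.

```python
import math
from collections import Counter


def mention_counts(mentions):
    """Count how often each person appears as the mentioned party."""
    return Counter(mentioned for _interviewee, mentioned in mentions)


def node_radius(count, min_radius=4.0, scale=3.0):
    """Map a mention count to a node radius.

    Assumed scaling: a square root keeps heavily mentioned musicians
    from dominating the layout. The real visualization may differ.
    """
    return min_radius + scale * math.sqrt(count)


# Placeholder pairs; real input would be the knowsOf triples.
mentions = [
    ("interviewee_a", "musician_x"),
    ("interviewee_b", "musician_x"),
    ("interviewee_a", "musician_y"),
]

for name, count in mention_counts(mentions).most_common():
    print(name, count, round(node_radius(count), 2))
```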
Future Directions
We continue to explore the possibilities of applying LOD to the jazz domain and beyond. We plan to move the tools we have developed from prototype to full production so they can be adopted widely regardless of context or domain.
We are currently working with jazz archives from Carnegie Hall and Tulane University to integrate and enrich our RDF dataset with new semantic layers. We have recently focused on adding gender data to the dataset as a preliminary step towards a broader representation of women in jazz (Pattuelli, Hwang and Miller, 2016). More data additions are underway to represent various professional and social facets of the jazz community.
Learn more about ongoing projects from our publications and presentations, and from our Work In Progress posts.
If you are interested in collaborating with us, please get in touch!