Transcript Analyzer
The Linked Jazz Transcript Analyzer structures digital archival documents for different purposes and identifies named entities in their text. Within the project, we use the Analyzer to upload interview transcripts from open-access archives and to identify the personal names cited in them by matching against the Linked Jazz Name Directory mentioned above. The Analyzer also employs natural language processing to locate names that are not present in the directory. In these instances, we relate the newly found names to URIs from the name authority files or, if a name is not found in the authorities, we mint new URIs that we then host in the Linked Jazz namespace. Finally, the Analyzer breaks interview transcripts down into discrete question-and-answer segments, which are later used in the Linked Jazz 52nd Street tool.
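As a rough illustration of the directory lookup and question-and-answer segmentation described above, here is a minimal Python sketch. It is not the Analyzer's actual code: the "Q:"/"A:" speaker labels, the function names, and the example.org URIs are all assumptions made for the example, and the natural language processing step for names outside the directory is not shown.

```python
import re

# Hypothetical directory: personal name -> URI (illustrative values only).
DIRECTORY = {
    "Mary Lou Williams": "http://example.org/people/mary_lou_williams",
    "Dizzy Gillespie": "http://example.org/people/dizzy_gillespie",
}


def split_segments(transcript):
    """Split a transcript into (speaker, text) segments.

    Assumes each turn starts on its own line with "Q:" or "A:";
    unlabeled lines are treated as continuations of the current turn.
    """
    segments = []
    for line in transcript.splitlines():
        line = line.strip()
        if not line:
            continue
        match = re.match(r"^(Q|A):\s*(.*)$", line)
        if match:
            segments.append([match.group(1), match.group(2)])
        elif segments:
            segments[-1][1] += " " + line
    return [tuple(segment) for segment in segments]


def find_names(text, directory):
    """Return the directory names that occur in text as whole phrases, sorted."""
    found = []
    for name in directory:
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            found.append(name)
    return sorted(found)


def annotate(transcript, directory):
    """Pair each segment with the URIs of the directory names it mentions."""
    annotated = []
    for speaker, text in split_segments(transcript):
        uris = {name: directory[name] for name in find_names(text, directory)}
        annotated.append((speaker, text, uris))
    return annotated
```

Calling annotate() on a transcript returns one tuple per question or answer, each carrying the names found in that segment together with their URIs.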
Name Mapping & Curator Tool
The Name Mapping and Curator Tool was developed to support the creation of a directory of jazz artist personal names that was as extensive and accurate as possible. We started by creating an application which bootstraps names of jazz artists from DBpedia and then maps individuals’ URIs onto the Library of Congress Name Authority File and VIAF to include preferred and alternate names.
To further refine the directory resulting from the mapping process, we developed the Curator, a user-friendly interface on the front end of a heavily automated process. This tool allows for human curation of the directory, including the approval, removal, and disambiguation of personal names. Access the prototype files here.
Ecco!
Ecco! is a Linked Open Data application designed to disambiguate and reconcile named entities with URIs from authoritative sources.
Technically, Ecco! wraps the LOD APIs of suitable datasets, such as VIAF and Freebase, to retrieve data that supports entity matching. The system automatically ranks the results and groups them into clusters by confidence level, from exact matches to one-to-many matches or no match.
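To make the idea of ranking and clustering by confidence more concrete, here is a minimal Python sketch. It does not reproduce Ecco!'s actual matching logic, which is not described here: the string-similarity measure (difflib from the Python standard library), the 0.85 threshold, the cluster labels, and the function names are all assumptions chosen for illustration. Candidates are assumed to have already been retrieved from an API as (label, URI) pairs.

```python
from difflib import SequenceMatcher


def normalize(label):
    """Lowercase a label and collapse runs of whitespace."""
    return " ".join(label.lower().split())


def rank_candidates(name, candidates):
    """Score (label, uri) candidates against name and sort best first."""
    target = normalize(name)
    scored = [
        (SequenceMatcher(None, target, normalize(label)).ratio(), label, uri)
        for label, uri in candidates
    ]
    return sorted(scored, key=lambda item: item[0], reverse=True)


def cluster(name, candidates, threshold=0.85):
    """Assign a name to a confidence cluster (labels are illustrative).

    "exact":        exactly one candidate matches the normalized name
    "needs_review": one or more plausible candidates for a human to check
    "no_match":     nothing scored at or above the threshold
    """
    ranked = rank_candidates(name, candidates)
    exact = [item for item in ranked if item[0] == 1.0]
    plausible = [item for item in ranked if item[0] >= threshold]
    if len(exact) == 1:
        return "exact", exact
    if plausible:
        return "needs_review", plausible
    return "no_match", []
```

In this sketch, only the "needs_review" cluster would be handed to a human curator, who would either validate the top-ranked candidate or pick the correct URI from the list.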
The quality of the data output can be further refined through human disambiguation consisting of validating a match or identifying the correct URI when multiple matches are possible. Ecco! is designed to enable users to quickly and easily contribute to this curation process.
The system provides an intuitive user interface that supports a collaborative workflow in which a community can work together in a distributed and incremental way. The combination of automated matching and human curation has the potential to produce data of a quality not currently achievable through traditional methods.
Ecco! was demoed at the 2014 International Conference on Dublin Core and Metadata Applications in Austin. You can view the poster here and read the abstract here.
View a demo install of Ecco! reconciling archival terms here: http://ecco.nypl-labs.biz/
Follow development here: https://github.com/thisismattmiller/Ecco-
More information, instructions and tutorials coming soon.
LodLive
Created by Diego Valerio Camarda (diego@lodlive.it), Silvia Mazzini (silvia@lodlive.it), and Alessandro Antonuccio (ale@lodlive.it), LodLive is a browsing tool that draws from a SPARQL endpoint. Starting with one resource (for example, Mary Lou Williams), the user can click to explore connected entities, which expand incrementally.
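As an illustration of what exploring connected entities through a SPARQL endpoint can involve, the Python sketch below builds a query that lists the resources directly linked to a starting resource. This is not LodLive's own code or query: the function name, the query shape, and the default limit are assumptions, and the DBpedia URI in the usage note is given only as an example. The sketch builds the query text without sending it to an endpoint.

```python
def neighbor_query(resource_uri, limit=25):
    """Build a SPARQL query for the resources directly linked to resource_uri.

    Illustrative only: follows links in both directions and keeps
    IRI-valued neighbors, so each result can itself be expanded later.
    """
    # Reject characters that are not allowed inside a SPARQL IRI reference.
    if any(ch in resource_uri for ch in '<>"{}|^`\\ '):
        raise ValueError("not a safe IRI: %r" % resource_uri)
    return (
        "SELECT DISTINCT ?property ?neighbor WHERE {\n"
        "  { <%s> ?property ?neighbor . FILTER(isIRI(?neighbor)) }\n"
        "  UNION\n"
        "  { ?neighbor ?property <%s> . }\n"
        "}\n"
        "LIMIT %d" % (resource_uri, resource_uri, limit)
    )
```

For example, neighbor_query("http://dbpedia.org/resource/Mary_Lou_Williams") produces a query for that resource's neighbors. Running the same function on any returned neighbor gives the next ring of connections, which is one way to picture the incremental expansion.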
The Semantic Lab tools are available here.