
Entity Reconciliation Practice

Introduction

There are three steps to reconciling entities in your data to authority files:

1) Gather context about your entity

Review your source data to see what facts you know about the entity you are trying to reconcile. These facts could come from structured data, such as metadata records, or from sentences in a text document that mention the entity.

2) Find candidate matches

There are many approaches to this step. You may use a tool like OpenRefine that suggests potential matches. If the tool does not suggest a correct match, or you are working with an authority file that the tool does not connect to, then you will likely need to go to the external data source directly and use its search functionality.

3) Review the candidates and decide if one is a match

You need to review the context you have about the entity in your source data and compare it to the information about the candidate in the external data source. Does the evidence match well enough to claim they represent the same entity?

Let’s walk through examples for different types of entities. Try to think of your answer to each problem before looking at our suggestions.

People

Wikidata

Charles O’Brien

Entity

Source:

Name from Metadata:

  • Charles O’Brien

Excerpts:

“September 1998 — Hilary Mantel’s eighth novel, The Giant, O’Brien, published by Fourth Estate, traced the hapless career of a historical figure, the Irish Giant, a man who measured a little under eight feet tall.”

“Charles Byrne travelled from Ireland to London at the end of the eighteenth century to put himself on display as a freak or monster. Though he took ingenious steps to try to keep his body out of the hands of dissecters, he was indeed dissected in the end by the famous Scottish surgeon and scientist Dr John Hunter.”

Candidates

We start by searching Wikidata using the person name in our metadata, Charles O’Brien. We do a quick scan of the results and the Wikidata description of each one. We find a British colonial governor, a cricketer (1921-1980), and a composer (1882-1968). None of these candidates fits an Irish giant who lived in the eighteenth century.

Let's move on and search for the name mentioned in the text, Charles Byrne. We see an Irish painter, an animator who worked at Disney, and an Irish entertainer. The last one is promising, so we will visit the page and investigate.
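The two name searches above can also be run programmatically. The following is a minimal sketch, assuming the public Wikidata `wbsearchentities` API action; the function names and the User-Agent string are illustrative and not part of this guide's workflow.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_search_url(name, language="en", limit=10):
    """Build a wbsearchentities URL that looks up items by label or alias."""
    params = {
        "action": "wbsearchentities",
        "search": name,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return WIKIDATA_API + "?" + urllib.parse.urlencode(params)


def search_candidates(name, limit=10):
    """Return (id, label, description) tuples for a name. Needs network access."""
    request = urllib.request.Request(
        build_search_url(name, limit=limit),
        # Placeholder User-Agent (an assumption); use one that identifies your project.
        headers={"User-Agent": "reconciliation-practice-example/0.1"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    # Each hit carries the item ID plus the label and short description
    # that we scan when judging whether a candidate is worth a closer look.
    return [
        (hit["id"], hit.get("label", ""), hit.get("description", ""))
        for hit in payload.get("search", [])
    ]
```

Calling `search_candidates("Charles O'Brien")` and then `search_candidates("Charles Byrne")` would return the same kind of list we scanned by hand. The descriptions still need a human eye, because the search matches names only and knows nothing about the context in our excerpts.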

Review

Let's review the evidence in Wikidata for this candidate: http://www.wikidata.org/entity/Q1063865

Date of death:

  • Listed as 1783, which fits the excerpt's "end of the eighteenth century" and makes him a historical figure by September 1998, when Hilary Mantel's novel about him was published.

Medical condition:

  • Gigantism, which is strong evidence for a match.

Occupation:

  • Circus performer, which is again strong evidence for a match.

Notable people’s Wikidata pages often link to their Wikipedia pages. In this case, the linked page explicitly references Hilary Mantel’s novel The Giant, O’Brien.
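If you review many candidates, it can help to pull out the same few properties each time. The sketch below assumes the entity JSON layout served by Wikidata's `Special:EntityData` endpoint and the property IDs P570 (date of death), P1050 (medical condition), and P106 (occupation); the helper names are hypothetical.

```python
ENTITY_DATA_URL = "https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"

# Wikidata property IDs for the three facts reviewed above.
PROPERTIES = {
    "P570": "date of death",
    "P1050": "medical condition",
    "P106": "occupation",
}


def entity_url(qid):
    """Return the URL that serves the JSON record for one item."""
    return ENTITY_DATA_URL.format(qid=qid)


def claim_values(entity, property_id):
    """Return the raw values of one property from a Wikidata entity dict."""
    values = []
    for statement in entity.get("claims", {}).get(property_id, []):
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # skip "unknown value" and "no value" statements
        value = snak["datavalue"]["value"]
        if isinstance(value, dict) and "time" in value:
            values.append(value["time"])  # timestamp string for date properties
        elif isinstance(value, dict) and "id" in value:
            values.append(value["id"])  # QID of the linked item, not its label
        else:
            values.append(value)
    return values


def summarize(entity):
    """Map each reviewed property label to the values found on the entity."""
    return {label: claim_values(entity, pid) for pid, label in PROPERTIES.items()}
```

Here `entity` is the dictionary found under `entities` and then the item ID in the downloaded JSON. Item-valued properties come back as QIDs, so a second lookup is needed to see labels such as the condition or occupation name. The decision about whether the evidence is strong enough remains yours.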

Match Confirmed

Mary Fane

Entity

Source:

Name from Metadata:

  • Mary Fane

Excerpts:

“Frances Neville, Baroness Abergavenny, died, having entrusted her collection of prayers to her daughter on her deathbed.”

“Frances Neville, Baroness Abergavenny had one child (or one surviving child), a daughter, Mary (Nevill) Fane, born on 25 March 1554, to whom she entrusted her collection of prayers.”

Candidates

All we know about this person is her name, her date of birth, and her mother. See if you can find Mary in Wikidata. Can you think of a faster way by using the mother relationship?

Review

Instead of reviewing every Mary Fane in Wikidata, it would be faster to find the famous mother, Frances Neville, Baroness Abergavenny, for whom there is only one match in Wikidata. On the mother’s page, we find the child property and click through to Mary Neville, Baroness le Despenser.

On Mary’s Wikidata page, we see:

  • The alternative name Mary Nevill is a match to the name in our excerpt.
  • She married a Thomas Fane, so the last name Fane in our metadata makes sense.
  • We know that the mother is a match.

Match Confirmed

Note that if your data has many known family relationships, you can speed up your reconciliation using SPARQL queries to find the family members of your already reconciled entities.
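As a rough illustration of that note, the sketch below builds a SPARQL query that lists the children (property P40) of an already reconciled parent and a request URL for the public Wikidata Query Service. The function names are hypothetical, and the item ID you pass in must be the one you confirmed during your own reconciliation.

```python
import re
import urllib.parse

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def build_children_query(parent_qid):
    """Return a SPARQL query listing the children (P40) of one Wikidata item."""
    # Reject anything that is not a plain item ID before putting it in the query.
    if not re.fullmatch(r"Q[1-9][0-9]*", parent_qid):
        raise ValueError(f"not a Wikidata item ID: {parent_qid!r}")
    return (
        "SELECT ?child ?childLabel WHERE {\n"
        f"  wd:{parent_qid} wdt:P40 ?child .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )


def build_request_url(query):
    """Return a GET URL that asks the query service for JSON results."""
    params = {"query": query, "format": "json"}
    return SPARQL_ENDPOINT + "?" + urllib.parse.urlencode(params)
```

You could paste the output of `build_children_query` into the query service's web interface, or fetch `build_request_url(...)` from a script. Swapping P40 for P25 (mother) or P22 (father), with the subject and object reversed as needed, would follow other family links in the same way.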

Adelaide Manning

Entity

content

Candidates

content

Review

content

Adolfo Suárez

Entity

content

Candidates

content

Review

content

VIAF

John Venn

Entity

content

Candidates

content

Review

content

John Colet

Entity

content

Candidates

content

Review

content

Hunton Addie W.

Entity

content

Candidates

content

Review

content

Locations

GeoNames

Place name

Entity

content

Candidates

content

Review

content

Nationality

Place name

Entity

content

Candidates

content

Review

content