The Semantic Web, Linked Data and their modelling language, the Resource Description Framework (RDF), have been around for over a decade now. It’s been a slow burner, and uptake by organizations outside of the media (BBC, Guardian) and academia has been slow. Why is this? Maybe it’s down to the perception that it is hard, and also to our entrenchment in 40 years of RDBMS and SQL. I’ve been involved in several projects (e.g. Open PHACTS, GMDSP) where Linked Data has been used to free data from its silos and enrich its meaning and usefulness across and between disciplines and business domains. This post describes the basics behind RDF and gives a simple introduction to modelling a resource.
Linked data is simply a collection of facts expressed as triples, written in RDF, in the form “something has a property with a value”. This bit is really important and underpins the whole concept. Remember and repeat: “Something has a property with a value…”
The something here is the resource in RDF. In RDF terms this translates as subject (something) has predicate (property) with object (value). You will hear the terms Subject, Predicate, Object a lot and will also come across their shorthand form S, P, O. Let’s try an example statement “Bob has blue eyes”. The subject/something is Bob. The property/predicate is eye colour. The object/value of the property is blue. We now have a triple:
Something: "Bob" Property: "eye colour" Value: "blue"
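If it helps to see this as a data structure, here is a minimal Python sketch of the same triple. The Triple class and its field names are purely illustrative choices, not anything defined by RDF.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One fact: something has a property with a value."""
    subject: str    # the "something"
    predicate: str  # the property
    obj: str        # the value ("object" would shadow a Python builtin)


bob_eyes = Triple(subject="Bob", predicate="eye colour", obj="blue")
print(bob_eyes)
```

A plain three-element tuple would work just as well; the named fields only make the S, P, O roles explicit.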
Someone else may want to talk about the properties of Bob, so how can we ensure that we are all talking about the same person? In linked data, if we want 2 descriptions to refer to the same thing, we need to use the same “Uniform Resource Identifier” or URI. You use URIs all the time when visiting websites, although you probably use the term URL. URI is the more general term: the things that URIs identify do not have to be browsable or even exist on the web, although it is great if they do. Let’s define a URI for Bob. We can use anything that uniquely identifies Bob: it could be a personal website, Google+ page, ORCID identifier, GitHub ID etc. It doesn’t really matter as long as it is something that we know uniquely represents Bob. Let’s say it is a personal website, http://example.com/bob. So now we have:
Subject/something: http://example.com/bob Predicate/property: "eye colour" Object/value: "blue"
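To see why a shared URI matters, here is a small Python sketch. It assumes two people have each written down facts about Bob as plain tuples (the "age" fact and the variable names are made up for illustration). Because both use the same URI, the facts can simply be pooled.

```python
BOB = "http://example.com/bob"

# Facts recorded by two different people, each as (subject, property, value).
mine = {(BOB, "eye colour", "blue")}
theirs = {(BOB, "eye colour", "blue"), (BOB, "age", "28")}

# Same URI means same Bob, so the two sets can be merged directly;
# the duplicated eye colour fact collapses into one.
merged = mine | theirs

# Everything we now know about Bob, as (property, value) pairs.
about_bob = sorted((p, o) for s, p, o in merged if s == BOB)
print(about_bob)
```

Had one of us written "Bob" and the other "Robert", there would be no reliable way to tell that the two sets describe the same person.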
Let’s say we had all our friends represented in RDF and we wanted to find out who had blue eyes. We could search for the predicate “eye colour”. But what if someone used the US English spelling “color”, spelled something wrong, or used a more elaborate phrase? Just like the subjects, the predicates and the objects can be represented by URIs. Then we know that other resources described with these URIs are representing the same things. You can google to find URIs to use in your RDF, and there are also lots of predefined ones from DBpedia, Dublin Core, rdf, rdfs, xsd, foaf, vcard and others. We call these collections of properties ontologies (you will also see them called vocabularies), and there are sites where you can search for them. We will use 2 URIs from dbpedia.org, which gives us:
Subject: http://example.com/bob Predicate: http://dbpedia.org/ontology/eyeColor Object: http://dbpedia.org/page/Blue
The predicate is http://dbpedia.org/ontology/eyeColor and the object is http://dbpedia.org/page/Blue. The part which describes the collection of properties, e.g. http://dbpedia.org/ontology/, is known as the prefix. You can click on these links and look them up in your browser.
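As a rough sketch of the "who has blue eyes?" search, the Python below matches on URIs rather than on free-text labels. The friends alice and carol, the Green URI and the helper name subjects_with are all hypothetical, added only to give the search something to do.

```python
EYE_COLOR = "http://dbpedia.org/ontology/eyeColor"
BLUE = "http://dbpedia.org/page/Blue"

# (subject, predicate, object) triples; alice and carol are made-up friends.
triples = [
    ("http://example.com/bob", EYE_COLOR, BLUE),
    ("http://example.com/alice", EYE_COLOR, "http://dbpedia.org/page/Green"),
    ("http://example.com/carol", EYE_COLOR, BLUE),
]


def subjects_with(triples, predicate, obj):
    """Return every subject that has the given predicate with the given object."""
    return [s for (s, p, o) in triples if p == predicate and o == obj]


blue_eyed = subjects_with(triples, EYE_COLOR, BLUE)
print(blue_eyed)
```

Because everyone uses the same predicate and object URIs, a plain equality check is enough; there is no need to guess at spellings such as "colour" versus "color".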
What about Bob’s name? We can’t expect humans or machines to know that it is Bob purely from the identifier http://example.com/bob, so we use another property, this time from the foaf ontology:
S: http://example.com/bob P: http://xmlns.com/foaf/0.1/name O: "Bob"
Let’s add another property for Bob’s age:
S: http://example.com/bob P: http://xmlns.com/foaf/0.1/age O: 28
We can also let people know that Bob is a person, rather than a cat, dog, chair, bicycle etc., by giving him a type. In RDF we should enclose the URIs in angle brackets, which would look like:
<http://example.com/bob> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person>
It is more common to use the shorthand 'a' for type rather than the full http://www.w3.org/1999/02/22-rdf-syntax-ns#type, like this:
<http://example.com/bob> a <http://xmlns.com/foaf/0.1/Person>
Now we have a few triples which describe ‘Bob’ but if we want to share with someone else then we need them to understand our format. Luckily there are standard ways of ‘serialising’ RDF. We will use Turtle but others are available.
<http://example.com/bob> <http://dbpedia.org/ontology/eyeColor> <http://dbpedia.org/page/Blue> .
<http://example.com/bob> <http://xmlns.com/foaf/0.1/name> "Bob" .
<http://example.com/bob> <http://xmlns.com/foaf/0.1/age> 28 .
<http://example.com/bob> a <http://xmlns.com/foaf/0.1/Person> .
Each full stop ‘.’ marks the end of a statement. There is a lot of repetition here, and Turtle allows us to group triples about the same subject:
<http://example.com/bob>
    <http://dbpedia.org/ontology/eyeColor> <http://dbpedia.org/page/Blue> ;
    <http://xmlns.com/foaf/0.1/name> "Bob" ;
    <http://xmlns.com/foaf/0.1/age> 28 ;
    a <http://xmlns.com/foaf/0.1/Person> .
Each semi-colon ‘;’ says that the next predicate-object pair is about the same subject as the one before it. In our example there are 4 statements about the same subject. Just like our earlier example, the full stop ‘.’ marks the end of our statements about a particular subject.
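The grouping rule can be sketched in a few lines of Python. This is only an illustration under some assumptions: the URI marker class and the render_term and to_turtle helpers are hypothetical names, only URIs, plain strings and integers are handled, and a real project would normally use an RDF library rather than hand-rolled string formatting.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class URI(str):
    """Marks a string as a URI so it is not written as a quoted literal."""


def render_term(term):
    """Write one subject, predicate or object in Turtle syntax."""
    if isinstance(term, URI):
        return f"<{term}>"  # URIs go inside angle brackets
    if isinstance(term, int) and not isinstance(term, bool):
        return str(term)  # integers are written bare, e.g. 28
    if isinstance(term, str):
        escaped = term.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'  # plain strings are quoted
    raise TypeError(f"unsupported term: {term!r}")


def to_turtle(triples):
    """Group triples by subject: ';' between pairs, '.' after the last one."""
    grouped = defaultdict(list)
    for s, p, o in triples:
        # The type predicate gets the shorthand 'a'; others are full URIs.
        predicate = "a" if p == RDF_TYPE else render_term(URI(p))
        grouped[s].append(f"    {predicate} {render_term(o)}")
    blocks = []
    for s, lines in grouped.items():
        blocks.append(render_term(URI(s)) + "\n" + " ;\n".join(lines) + " .")
    return "\n\n".join(blocks)


BOB = "http://example.com/bob"
triples = [
    (BOB, "http://dbpedia.org/ontology/eyeColor", URI("http://dbpedia.org/page/Blue")),
    (BOB, "http://xmlns.com/foaf/0.1/name", "Bob"),
    (BOB, "http://xmlns.com/foaf/0.1/age", 28),
    (BOB, RDF_TYPE, URI("http://xmlns.com/foaf/0.1/Person")),
]

turtle = to_turtle(triples)
print(turtle)
```

Run on Bob's four triples, this writes the subject once, then one indented predicate-object pair per line, with the first three lines ending in ‘;’ and the last in ‘.’.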
There is still a lot of repetition here, and we can use a shorthand for our URIs to make it easier to read:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dbpedia: <http://dbpedia.org/ontology/> .
<http://example.com/bob>
    # Blue is under http://dbpedia.org/page/, not the dbpedia: prefix, so it stays a full URI
    dbpedia:eyeColor <http://dbpedia.org/page/Blue> ;
    foaf:name "Bob" ;
    foaf:age 28 ;
    a foaf:Person .
In this example, wherever we see a prefix such as foaf: we mean http://xmlns.com/foaf/0.1/. So foaf:name means http://xmlns.com/foaf/0.1/name.
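Expanding a prefixed name is just string substitution, as this small Python sketch shows. The PREFIXES table copies the three @prefix lines from the example; the expand function name is a hypothetical helper, and the handling of 'a' reflects the type shorthand described earlier.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Same mappings as the @prefix lines in the Turtle example.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dbpedia": "http://dbpedia.org/ontology/",
}


def expand(name, prefixes=PREFIXES):
    """Turn a prefixed name such as foaf:name into its full URI."""
    if name == "a":
        return RDF_TYPE  # 'a' is the shorthand for the type predicate
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {name!r}")
    return prefixes[prefix] + local


print(expand("foaf:name"))
print(expand("dbpedia:eyeColor"))
print(expand("a"))
```

A parser does the same thing in reverse when it reads the file: every prefixed name is replaced by its full URI before the triples are stored, so the prefixes are purely a convenience for human readers.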