RDFa
Latest revision as of 07:03, 28 October 2013
RDFa is a way of expressing RDF in XHTML. Creative Commons uses RDFa to express license and other information about works for the semantic web. When you select a license in our license chooser, you are given a snippet of HTML that contains RDFa. Sites like Thingiverse have implemented RDFa across their platform so that every object uploaded expresses semantic information about itself to machines.
By using RDFa, Creative Commons is helping build the semantic web. Here are some frequently asked questions which can help you understand RDFa and the semantic web.
- 1 What is the point of the semantic web?
- 2 What is RDF?
- 3 What is an RDF triple?
- 4 What is RDFa?
- 5 Why does CC use RDFa?
- 6 How does CC use RDFa?
- 7 What does it mean that RDFa is a W3C recommendation?
- 8 How do I use RDFa?
- 9 Content discovery
What is the point of the semantic web?
Traditionally, machines have had a very poor understanding of what humans are actually talking about when they create content. Computers may understand that a file is a text file or another file is an image file, but they typically don't understand what knowledge, or semantics, are expressed inside that file. This has been remedied to some extent by metadata such as tags or EXIF data.
But to a machine, the web is merely a heap of human-readable data that needs to be classified by seemingly arbitrary rules. Google and other search engines have engineered meaning out of how humans create HTML documents. By interpreting a link expressed in HTML as a semantically meaningful "vote" for another page, Google can value and rank linked pages on the web.
While incoming links are good semantic indicators of value (or popularity) of a page, web pages should be able to express much more sophisticated statements to machines. Currently, web pages only express sophisticated statements to humans, but this can change.
Furthermore, humans should have the ability to query these knowledge statements in sophisticated ways. Imagine asking Google how many US presidents were born in Kentucky and receiving an answer not because Google had somehow magically interpreted what you meant, but simply because Google maintained a database of statements about presidents and Kentucky that was trivially easy to query.
The vision behind the semantic web is that storing and retrieving information on the web should not require machines to parse human language, but rather to parse machine language. The web has the potential to offer massive amounts of structured information alongside the massive amounts of unstructured information that already exist. If we are careful to create the proper standards and platforms that can render these statements, the semantic web will become a reality.
What is RDF?
RDF stands for "Resource Description Framework", which is not a particularly informative acronym. Put simply, RDF is the way information is expressed semantically on the web. RDF is made up of triples, which are subject-predicate-object statements. This lets machines understand human knowledge statements. When many triples are aggregated, they are stored in what's called a "triple store." By querying a triple store, we can learn information that might otherwise be hard to gather by just browsing the web with our eyes. As the semantic web evolves, it will become easier to query triple stores using natural language and, hopefully, to discover things we wouldn't have found otherwise.
What is an RDF triple?
An RDF triple can be formed from any semantically meaningful statement. Here are some examples:
<This website> <has> <a wiki>. <That photo> <is licensed under> <a Creative Commons Attribution license>.
Depending on the format chosen, RDF triples can live inside an XML or XHTML file.
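To make the idea of triples and a "triple store" concrete, here is a minimal Python sketch. It is only an illustration: the list-of-tuples store and the match helper are hypothetical names invented for this example, not part of any RDF library, and real triple stores use URIs rather than plain phrases.

```python
# A toy "triple store": a list of (subject, predicate, object) tuples.
# The two statements are the example triples from the text above.
TRIPLES = [
    ("This website", "has", "a wiki"),
    ("That photo", "is licensed under", "a Creative Commons Attribution license"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# Ask the store: "what is licensed, and under which license?"
licensed = match(TRIPLES, predicate="is licensed under")
for s, p, o in licensed:
    print(s, p, o)
```

Querying by pattern like this is the basic operation that query languages for triple stores build on.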
What is RDFa?
RDFa is a way of expressing RDF triples inside XHTML pages by adding attributes to ordinary elements such as span and anchor tags.
This makes it much easier for people to casually express semantic information in conjunction with a normal web page. While there are many ways to express RDF (such as in serialized XML files that live next to standard web pages), RDFa helps machines and humans read exactly the same content.
Why does CC use RDFa?
Creative Commons licenses are expressed in three different formats: the human-readable license deed, the lawyer-readable legal code, and the machine-readable code. RDFa is one of the ways in which we've chosen to make our licenses machine readable. By using RDFa, CC-licensed objects can be discovered by search engines and auto-discovery mechanisms without the need for a human to hand-curate content directories or lists.
Moreover, by using RDFa, CC has made our licenses and their metadata compatible with a larger movement towards the open standard of a semantic web.
How does CC use RDFa?
When creators fill out the "Additional Information" section of our license chooser form, they are given a snippet of XHTML code that contains an image badge, a link to our license, some text, and some span tags. Inside these span tags, RDFa is expressed.
Let's take a look at some example code to learn more about RDFa:
<a rel="license" href="http://creativecommons.org/licenses/by/3.0/us/">
<img alt="Creative Commons License" style="border-width:0" src="http://i.creativecommons.org/l/by/3.0/us/88x31.png" /></a><br />
<span xmlns:dc="http://purl.org/dc/elements/1.1/" href="http://purl.org/dc/dcmitype/Text" property="dc:title" rel="dc:type">RDFa FAQ</span> by
<a xmlns:cc="http://creativecommons.org/ns#" href="http://www.example.com" property="cc:attributionName" rel="cc:attributionURL">John Doe</a> is licensed under a
<a rel="license" href="http://creativecommons.org/licenses/by/3.0/us/">Creative Commons Attribution 3.0 United States License</a>.<br />
Based on a work at <a xmlns:dc="http://purl.org/dc/elements/1.1/" href="http://wiki.creativecommons.org/RDFa" rel="dc:source">wiki.creativecommons.org</a>.<br />
Permissions beyond the scope of this license may be available at <a xmlns:cc="http://creativecommons.org/ns#" href="http://moreperms" rel="cc:morePermissions">http://moreperms</a>.
Breaking Down RDFa
dc represents "Dublin Core", which is one of the oldest vocabularies or schemas on the semantic web. The full Dublin Core is available here. It allows one to express typical things like a work's title or its date. Here you can see CC using it to express the work's title, which I've fictitiously named RDFa FAQ.
<span xmlns:dc="http://purl.org/dc/elements/1.1/" href="http://purl.org/dc/dcmitype/Text" property="dc:title" rel="dc:type"> RDFa FAQ</span>
CC is using an XML namespace abbreviation, or "xmlns" for short. This enables CC to use shorthand to refer to the Dublin Core schema. Instead of having to repeatedly state a URL such as http://purl.org/dc/elements/1.1/title to reference the title property, we can simply write dc:title and achieve the same thing. This principle works across all RDF.
The dc:type references the Dublin Core definition for "Text" using the rel attribute. Rel specifies the element's relationship to another resource. When rel is used on an element such as an anchor tag, it specifies that the document at the URL in the href attribute stands in a particular relationship to the work. In this case that relationship is the Dublin Core "type" property, and the href is the DC Text specification. Other document types that could be specified with the href are http://purl.org/dc/dcmitype/StillImage or http://purl.org/dc/dcmitype/Sound, depending on the medium being licensed.
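The prefix shorthand can be sketched in a few lines of Python. This is an illustration only: PREFIXES and expand_curie are hypothetical names made up for this example, and the two namespace URIs are the ones declared with xmlns:dc and xmlns:cc in the snippet above.

```python
# Prefix-to-namespace map, mirroring the xmlns:dc and xmlns:cc declarations
# in the example markup. (Hypothetical helper, not a library API.)
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "cc": "http://creativecommons.org/ns#",
}


def expand_curie(curie, prefixes=PREFIXES):
    """Expand a prefixed name such as 'dc:title' into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError("unknown or missing prefix in %r" % curie)
    return prefixes[prefix] + local


print(expand_curie("dc:title"))
print(expand_curie("cc:attributionName"))
```

The expansion is plain string concatenation, which is why the namespace URI ends in a slash or a hash.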
In this case, CC is using its own XML namespace, abbreviated using cc:
<a xmlns:cc="http://creativecommons.org/ns#" href="http://www.example.com" property="cc:attributionName" rel="cc:attributionURL">John Doe</a>
Again, the property is CC's attributionName attribute, the value is the content inside the anchor tag (in this case, the fictitious John Doe), and the cc:attributionURL relationship points to http://www.example.com.
<a rel="license" href="http://creativecommons.org/licenses/by/3.0/"><img alt="Creative Commons License" style="border-width:0" src="http://i.creativecommons.org/l/by/3.0/88x31.png" /></a>
This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/3.0/">Creative Commons Attribution 3.0 Unported License</a>.
Similar to dc:title, dc:source specifies where the original source of the file is located.
Based on a work at <a xmlns:dc="http://purl.org/dc/elements/1.1/" href="http://wiki.creativecommons.org/RDFa" rel="dc:source"> wiki.creativecommons.org</a>
In this case it is pointing to this document.
Finally, as part of the CC+ protocol, creators can specify a URL where re-users of CC licenses can obtain more rights to the work. Visit the CC+ page for more information and examples of CC+ in action. Here, the nonexistent URL of http://moreperms is used as a placeholder.
Permissions beyond the scope of this license may be available at <a xmlns:cc="http://creativecommons.org/ns#" href="http://moreperms" rel="cc:morePermissions">http://moreperms</a>.
CC license deeds scrape RDFa metadata from referring pages in order to create a "live" box on the Deed page that suggests how to attribute the work. This allows reusers to easily and properly attribute CC licensed works using RDFa.
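As a rough idea of what scraping RDFa metadata from a page involves, here is a minimal sketch using Python's standard html.parser module. It is not how the CC deed does it and it is not a conformant RDFa processor: the class name RdfaSketchParser is hypothetical, it assumes well-formed XHTML, and it ignores details such as the about attribute and multiple values in one rel.

```python
from html.parser import HTMLParser


class RdfaSketchParser(HTMLParser):
    """Collects (predicate, value) pairs from rel/href and property attributes."""

    def __init__(self):
        super().__init__()
        self.statements = []  # collected (predicate, value) pairs
        self._open = []       # one (property, text_parts) entry per open element

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # rel + href: the predicate is the rel value, the object is the URL.
        if "rel" in a and "href" in a:
            self.statements.append((a["rel"], a["href"]))
        self._open.append((a.get("property"), []))

    def handle_data(self, data):
        # Text belongs to every enclosing element that carries a property.
        for prop, parts in self._open:
            if prop is not None:
                parts.append(data)

    def handle_endtag(self, tag):
        if not self._open:
            return
        prop, parts = self._open.pop()
        # property: the predicate is the property, the object is the text.
        if prop is not None:
            self.statements.append((prop, "".join(parts).strip()))


SNIPPET = (
    '<a xmlns:cc="http://creativecommons.org/ns#" href="http://www.example.com" '
    'property="cc:attributionName" rel="cc:attributionURL">John Doe</a> '
    'is licensed under a <a rel="license" '
    'href="http://creativecommons.org/licenses/by/3.0/us/">'
    'Creative Commons Attribution 3.0 United States License</a>.'
)

parser = RdfaSketchParser()
parser.feed(SNIPPET)
parser.close()
for predicate, value in parser.statements:
    print(predicate, "->", value)
```

With the attribution name, attribution URL, and license URL in hand, a tool has everything it needs to suggest an attribution line.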
What does it mean that RDFa is a W3C recommendation?
The World Wide Web Consortium (W3C) publishes recommendations, which are standards it endorses to help the web run smoothly. RDFa is one of those recommendations, so it is a stable, endorsed standard that can be used to express license information on the semantic web.
How do I use RDFa?
If you're a creator using CC licenses
Make sure to fill out the "Additional Information" box when selecting your license.
If you're a user of CC licensed works
Make sure to copy the attribution XHTML from the license deed if it is available. Otherwise, feel free to roll your own RDFa span tags.
I'm a developer, how do I implement RDFa into my platform?
Implementing RDFa into your platform shouldn't require much more work than implementing CC; it's merely a matter of adding a couple more print statements inside the RDFa framework detailed above. Just make sure you swap out John Doe's name with the user's name, and fill in the proper license URLs.
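A minimal sketch of such "print statements", assuming a Python platform: the function attribution_markup and its parameters are hypothetical names for this example, and the markup follows the attribution and license portions of the snippet shown earlier. User-supplied values are escaped so they cannot break the markup.

```python
from html import escape


def attribution_markup(author_name, author_url, license_url, license_name):
    """Build the attribution and license part of the RDFa snippet for one user."""
    template = (
        'by <a xmlns:cc="http://creativecommons.org/ns#" href="{author_url}" '
        'property="cc:attributionName" rel="cc:attributionURL">{author_name}</a> '
        'is licensed under a <a rel="license" href="{license_url}">{license_name}</a>.'
    )
    # Escape every value before it is placed in an attribute or in text.
    return template.format(
        author_url=escape(author_url, quote=True),
        author_name=escape(author_name),
        license_url=escape(license_url, quote=True),
        license_name=escape(license_name),
    )


markup = attribution_markup(
    "Jane & Co",
    "http://www.example.com",
    "http://creativecommons.org/licenses/by/3.0/",
    "Creative Commons Attribution 3.0 Unported License",
)
print(markup)
```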
A key use case for RDFa is the annotation of resources included or embedded in web pages. Existing annotations apply to the current document. For example, http://example.com/foo contains
<a rel="license" href="http://creativecommons.org/licenses/by/3.0/">cc</a>
To specify that bar.jpg is licensed, even under a different license, we can qualify the link with an about attribute:
<a about="/bar.jpg" rel="license" href="http://creativecommons.org/licenses/by-nc/3.0/">cc</a>
Found in http://example.com/foo, this says that http://example.com/bar.jpg is licensed under CC BY-NC 3.0.
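The way the about attribute changes the subject of a statement can be sketched with Python's standard urllib.parse.urljoin. The helper name licensed_subject is hypothetical; the sketch only covers the two cases in the text (no about attribute, or an about value resolved against the page URL).

```python
from urllib.parse import urljoin


def licensed_subject(page_url, about=None):
    """Return the URL a rel="license" statement is about.

    Without an about attribute the statement applies to the page itself;
    with one, the value is resolved relative to the page URL.
    """
    if about is None:
        return page_url
    return urljoin(page_url, about)


page = "http://example.com/foo"
print(licensed_subject(page))               # the page itself
print(licensed_subject(page, "/bar.jpg"))   # the embedded image
```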
See Qualifying Other Documents and Document Chunks in the RDFa primer for more examples.
The RDFa highlighter bookmarklet provides visual cues for statements about included resources.
The RDFa Triples Lister (for the Google Chrome browser) can help one identify the triples listed on a particular page.
RDFa is used or supported in the following CC tools:
- License Deeds
- MozCC (see What's New in MozCC 2)
- RdfaDict is a Python RDFa parser
- Wikipedia article
- RDFa @ the W3 Wiki