Difference between revisions of "Reuse tracking"


Revision as of 18:53, 9 August 2012

RDF metadata presents information about a work's ancestry in a machine-readable way. The websites of users who properly used our license chooser tool already have this set up. While it is possible to trace backwards to find a derived work's source, it is impossible to trace forwards to find all of a source work's derivatives without the aid of extra infrastructure.

On this page, you will find proposals for several ethical solutions to this problem (ones that respect users' privacy and do not involve radio-tagging people with malware or DRM). These may be either systems that Creative Commons would prototype as a reference for other organizations building their own infrastructure, or systems that we would build and maintain ourselves, providing an API to interested parties (either free in the spirit of openness, or for a small fee to help offset hosting costs). Each of the proposed systems below has its own advantages and disadvantages; none of them is a silver bullet.


Proposal 1: Independent Refback Tracking

When a user opens a webpage, the browser sends some information to the server. Of particular interest is the referrer string (sent in the HTTP Referer header). To put it simply, the referrer string contains the URL of the webpage that linked the user to the page they are currently viewing.
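As a rough illustration of reading the referrer string, the following Python sketch pulls the referring URL out of a request's headers and ignores links that come from the site's own pages. The function name external_referrer and the own_host parameter are hypothetical names chosen for this example; the only protocol detail relied on is that HTTP spells the header "Referer".

```python
from urllib.parse import urlsplit


def external_referrer(headers, own_host):
    """Return the referring URL if it points to another site, else None."""
    # HTTP spells the header name "Referer"; header names are case-insensitive.
    lowered = {name.lower(): value for name, value in headers.items()}
    referrer = lowered.get("referer", "").strip()
    parts = urlsplit(referrer)
    # Ignore missing, malformed, or non-web referrers.
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return None
    # Ignore navigation within our own site (hypothetical own_host value).
    if parts.hostname == own_host.lower():
        return None
    return referrer
```

Browsers may omit or shorten the referrer string, so a tracker built this way would only see the visits for which the header is actually sent.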

The Refback Tracking framework would be hosted by the respective content providers, and served independently from CC. The advantage is that once a working system is prototyped, it would have little (potentially no) hosting cost for us, and would therefore require a minimal amount of maintenance.

The disadvantage of this approach is that it can trace only direct remixes of a work, not remixes of remixes.


Here is how it works:

[ picture a ]

The above picture describes the sequence of events that triggers the tracking mechanism, shown from the user's perspective. The steps are as follows:

1. The user opens Website A. Website A contains a remixed work. The work provides proper attribution to the work from which it is derived, both visually for the user and invisibly with metadata.

2. The curious user clicks on the link to the original work, and is taken to Website B as expected.

Here is what happens behind the scenes:

[ picture b ]

1. The user opens a website. The user's browser requests a page from a server. The page contains a remixed work, which is attributed with metadata. The server replies to the user's request with the page.

2. The curious user clicks on the link to the original work. The user's browser sends a request to the server hosting the page of the original work. This request's referrer string contains the URL of the webpage with the remixed work on it. The server replies to the user's request as expected, and takes note of the URL in the referrer string (this can happen either using JavaScript embedded in the page, or with special code running on the web server itself).

3. The server hosting the original work downloads the page of the remixed work (as noted from the referrer string). The server of the remixed work replies as expected. The server hosting the original work reads the metadata on the downloaded page to verify that it indeed contains a remix of the original work. The server records the URL in a database, to be used for generating reuse statistics.
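The behind-the-scenes steps 2 and 3 could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not a reference implementation: it assumes the remix page marks its source with an RDFa-style link whose rel attribute is dc:source or dct:source, and the names SourceLinkParser, cites_original, fetch_page, record_reuse, open_db, and the reuses table are all hypothetical.

```python
import sqlite3
from html.parser import HTMLParser
from urllib.request import urlopen


class SourceLinkParser(HTMLParser):
    """Collects href values of links marked as pointing at a source work."""

    # Assumption: attribution is marked up RDFa-style with rel="dc:source"
    # or rel="dct:source"; real pages may use other vocabularies.
    SOURCE_RELS = {"dc:source", "dct:source"}

    def __init__(self):
        super().__init__()
        self.sources = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rels = set((attr.get("rel") or "").split())
        if rels & self.SOURCE_RELS and attr.get("href"):
            self.sources.add(attr["href"])


def cites_original(html, original_url):
    """Return True if the page metadata names original_url as a source."""
    parser = SourceLinkParser()
    parser.feed(html)
    return original_url in parser.sources


def fetch_page(url):
    """Download a page as text (step 3: the server fetches the remix page)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def open_db(path=":memory:"):
    """Open the (hypothetical) reuse database, creating its table if needed."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS reuses ("
        "original TEXT, remix TEXT, UNIQUE (original, remix))"
    )
    return db


def record_reuse(db, referrer_url, original_url, fetch=fetch_page):
    """Verify the referring page and store it; return True if it was a remix."""
    if not cites_original(fetch(referrer_url), original_url):
        return False  # the referrer links here but does not attribute the work
    db.execute(
        "INSERT OR IGNORE INTO reuses (original, remix) VALUES (?, ?)",
        (original_url, referrer_url),
    )
    db.commit()
    return True
```

The verification step matters because any page can link to the original work; only pages whose metadata names the original as their source are counted. The fetch parameter is there so the download can be swapped out, for example during testing.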

Proposal 2: Hosted Refback Tracking

This approach is a variation of the first proposal, in which CC runs an instance of the server-side component described there. The advantage is that if the service sees significant adoption, the aggregated data could be used to construct a better tree of derivative works. However, this advantage is only possible if use of the service is sufficiently widespread. Otherwise, this proposal has the same disadvantage as the first, plus additional cost and liability for CC, since it gives up the first proposal's advantage of requiring no hosting by CC.
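To illustrate how aggregated data could yield a fuller tree, the sketch below assumes the hosted service has collected (original, remix) URL pairs from many sites and walks them to find indirect derivatives as well as direct ones. The function names build_tree and all_derivatives, and the pair format, are assumptions made for this example.

```python
from collections import defaultdict


def build_tree(pairs):
    """Map each original URL to the set of remixes that cite it directly."""
    children = defaultdict(set)
    for original, remix in pairs:
        children[original].add(remix)
    return children


def all_derivatives(children, root):
    """Return every direct and indirect remix of root found in the data."""
    seen = set()
    stack = [root]
    while stack:
        for remix in children.get(stack.pop(), ()):
            # Skip already-visited pages so cyclic citations cannot loop.
            if remix not in seen and remix != root:
                seen.add(remix)
                stack.append(remix)
    return seen
```

A remix of a remix only shows up here if both links in the chain were reported to the same service, which is why the benefit depends on widespread adoption.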


Proposal 3: Hosted Scraped Data API