Planet Venus

From Creative Commons
Revision as of 00:25, 1 March 2008 by Nkinkade (talk | contribs)

Creative Commons uses feed aggregation software to collect in one place CC blogs, jurisdiction blogs, and the blogs of people and organizations that are closely associated with or actively involved in the CC community.

The software we use for this is Planet Venus, a major rewrite of Planet by Sam Ruby. For more information and documentation on Planet Venus, please see the project's page.

CC has slightly extended Planet Venus via a few plugins. First of all, as licensing information is eminently important to CC, we have created a plugin that will pull license information from the feeds and make it available to the HTML templates. For an example of the output, take a look at the CC Planet.

get_license_name.plugin

This plugin can be viewed or downloaded from the cctools Subversion repository at sourceforge.net.

NOTE: the plugin only works for HTMLTMPL templates. XSLT and Genshi templates have full access to every feed element and therefore can extract licensing data directly.

The get_license_name.plugin requires two Python modules that are not part of the standard library: Beautiful Soup and rdfadict. The plugin is to some extent self-documenting through its comments and, in any case, is short.

In order for the plugin to work, a small patch must also be applied to the Planet Venus code itself, in a single file: planet/shell/tmpl.py. Most of the patch below is context and comments.

--- venus/planet/shell/tmpl.py  2007-12-21 19:24:02.000000000 -0800
+++ branches/production/software/planet/shell/tmpl.py   2008-02-22 15:35:18.000000000 -0800
@@ -120,6 +120,8 @@
     ['published', PlanetDate, 'published_parsed'],
     ['published_822', Rfc822, 'published_parsed'],
     ['published_iso', Rfc3399, 'published_parsed'],
+    ['license', String, 'source', 'links', {'rel': 'license'}, 'href'],
+    ['default_license', String, 'source', 'planet_default_license']
 ]
 
 # Add additional rules for source information
@@ -141,6 +143,15 @@
                     elif node.get('type','')=='application/xhtml+xml':
                         node['value'] = empty.sub(r"<\1 />", node['value'])
                 node = node[path]
+                      
+            # This is a special-case elif needed to grab license info from the
+            # feed data.  Normally node will be a simple list or dict, but in
+            # the case of license information, node is a list of lists, so we
+            # need to look inside the first item, which is where the license
+            # data seems to always be.
+            elif isinstance(path, str) and isinstance(node, list) and \
+                    path in node[0]:
+                node = node[0][path]
             elif isinstance(path, int):
                 node = node[path]
             elif isinstance(path, dict):
@@ -155,8 +166,20 @@
             else:
                 break
         else:
-            if node: output[rule[0]] = rule[1](node)
-        
+            # If this node contains license information, indicated by rule[0]
+            # being 'license' or 'default_license' (from list Items), then
+            # drop the license URI into a variable that will be accessible
+            # by the template.  'default_license' is specified in the config
+            # of each blog, and can be used if no other license data is found
+            # in the feed itself.
+            if node:
+                if rule[0] == 'license' or rule[0] == 'default_license':
+                    output[rule[0]] = '<a about="%s" rel="license" \
+                        href="%s" title="License information">License</a>' \
+                        % (source.link, node)
+                else:
+                    output[rule[0]] = rule[1](node)
+
     # copy over all planet namespaced elements from parent source
     for name,value in source.items():
         if name.startswith('planet_'):
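
The markup that the patch stores in the license and default_license template variables can be reproduced outside Planet Venus for testing. The following is a minimal sketch, not part of the patch: the function name license_anchor is hypothetical, and unlike the patch it escapes both attribute values so that a URI containing & or a quote character cannot break the HTML.

```python
from html import escape


def license_anchor(source_link, license_uri):
    """Build the license link markup used by the patched tmpl.py.

    source_link: URI of the blog the entry came from (the patch uses
                 source.link for this).
    license_uri: license URI found in the feed or in default_license.
    Both values are HTML-escaped here; the patch itself does not escape.
    """
    return (
        '<a about="%s" rel="license" href="%s" '
        'title="License information">License</a>'
        % (escape(source_link, quote=True), escape(license_uri, quote=True))
    )


if __name__ == "__main__":
    print(license_anchor(
        "http://somedomain.org/blog",
        "http://creativecommons.org/licenses/by/3.0",
    ))
```

The about attribute names the blog as the subject of the license statement, which is what allows RDFa-aware tools to associate the license with the right source.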

The patch also supports a custom configuration parameter called default_license. Since very few blogs embed license data in a syndication feed in a machine-readable way, it was necessary to provide a mechanism for supplying this information manually on a feed-by-feed basis. The plugin first looks for license information in the feed itself. If it doesn't find any, it checks whether default_license is defined for the feed. If no license information is found in either the feed or default_license, the plugin does nothing.

Here is an example configuration:

[http://somedomain.org/blog/feed/atom]
name = Some Blog Name
default_license = http://creativecommons.org/licenses/by/3.0
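
The lookup order described above (feed data first, then default_license, otherwise nothing) can be sketched in a few lines of Python. This is an illustration only, not the plugin's actual code: the function name resolve_license is hypothetical, and the shape of feed_links (a list of dicts with rel and href keys) is an assumption modelled on the rule added in the patch. The example configuration is read with the standard configparser module.

```python
import configparser

# The example configuration from this page, inlined for the sketch.
EXAMPLE_CONFIG = """
[http://somedomain.org/blog/feed/atom]
name = Some Blog Name
default_license = http://creativecommons.org/licenses/by/3.0
"""


def resolve_license(feed_links, feed_config):
    """Return a license URI, or None if no license can be determined.

    feed_links:  list of dicts with 'rel' and 'href' keys (assumed shape).
    feed_config: mapping holding the per-feed configuration options.
    """
    # 1. Prefer license data embedded in the feed itself.
    for link in feed_links:
        if link.get("rel") == "license" and link.get("href"):
            return link["href"]
    # 2. Fall back to the manually configured default_license.
    # 3. If that is missing too, report nothing (None).
    return feed_config.get("default_license") or None


config = configparser.ConfigParser()
config.read_string(EXAMPLE_CONFIG)
feed_config = config["http://somedomain.org/blog/feed/atom"]

if __name__ == "__main__":
    # No license link in the feed, so the configured default is used.
    print(resolve_license([], feed_config))
```

Returning None in the last case mirrors the "does nothing" behaviour: the template variable is simply never set.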

filter_categories.py

This Planet Venus input filter can be viewed or downloaded from the cctools Subversion repository at sourceforge.net. It is a very small Python script that filters blog entries by category. It is similar to xpath_sifter.py, one of the filters that comes with Planet Venus. However, as best I could tell, xpath_sifter.py filters feed entries based on Planet-wide include/exclude lists. We needed this basic functionality, but with category specifications at the blog level.

NOTE: unlike the xpath_sifter.py filter, this filter only implements an include-type policy. That is, you can't explicitly exclude anything, though you can implicitly exclude all you like. :) If you need explicit include *and* exclude filters then xpath_sifter.py may be better for you, as long as you don't need different include/exclude lists on a blog-by-blog basis.

The filter makes use of a new blog-level configuration parameter named filter_categories. Here is an example configuration:

[http://somedomain.org/blog/feed/atom]
name = Some Blog Name
filters = filter_categories.py
filter_categories = Creative Commons, CC, copyright
default_license = http://creativecommons.org/licenses/by/3.0

filter_categories is a comma-separated list of categories, at least one of which MUST appear in a feed entry for that entry to be included. It uses OR logic: a single matching category is enough, and additional matches make no difference.
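
The include-only, OR-style matching can be illustrated with a short Python sketch. It is not the code of filter_categories.py: the function names are hypothetical, and exact, case-sensitive comparison of category names is an assumption.

```python
def parse_categories(raw_value):
    """Split a filter_categories value into a list of category names."""
    return [name.strip() for name in raw_value.split(",") if name.strip()]


def entry_is_included(entry_categories, wanted_categories):
    """Return True if the entry carries at least one wanted category.

    This is include-only, OR logic: one match is enough, and there is
    no way to exclude an entry that has matched.
    """
    wanted = set(wanted_categories)
    return any(category in wanted for category in entry_categories)


if __name__ == "__main__":
    wanted = parse_categories("Creative Commons, CC, copyright")
    print(entry_is_included(["photos", "CC"], wanted))      # one match: kept
    print(entry_is_included(["photos", "travel"], wanted))  # no match: dropped
```

An entry with no categories at all never matches, so it is implicitly excluded.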