Clarity Project

From Creative Commons

Not too long ago we worked on the "Sanity Overhaul", a rewrite of the CC website engine and its underlying components (cc.i18n, cc.license, cc.licenserdf, etc.). While we all seem to agree that it has been a massive improvement, it left a lot of loose ends to tie up. This page documents what those loose ends are and how to fix them.

We've made things (a lot more) sane. Now it's time to make them a lot more clear.


Exceptions suckiness

Exceptions suck in cc.license. Everything throws CCLicenseError, no matter what kind of error actually occurred, so it's impossible to tell which specific problem happened.

These exceptions are also abused: in various parts of the infrastructure they are thrown when a query returns no results and caught later, where returning None would have worked just as well or better.

Proposed solutions:

  • Subclass CCLicenseError with more specific errors
  • Use None where more appropriate than throwing an exception
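A minimal sketch of both proposals. CCLicenseError is redefined locally so the example runs on its own; the subclasses InvalidURIError and UnknownJurisdictionError, the helper functions, and the jurisdiction list are hypothetical, not existing cc.license API.

```python
class CCLicenseError(Exception):
    """Stand-in for cc.license's existing base exception."""


class InvalidURIError(CCLicenseError):
    """Hypothetical: the string is not a recognisable CC license URI."""


class UnknownJurisdictionError(CCLicenseError):
    """Hypothetical: the jurisdiction code is not one we know about."""


LICENSE_PREFIX = "http://creativecommons.org/licenses/"
# Illustrative only; "" means unported.
KNOWN_JURISDICTIONS = {"", "us", "de", "es"}


def license_code_from_uri(uri):
    """Return the license code (e.g. "by-sa") or raise a *specific* error."""
    if not uri.startswith(LICENSE_PREFIX):
        raise InvalidURIError(uri)
    code = uri[len(LICENSE_PREFIX):].strip("/").split("/")[0]
    if not code:
        raise InvalidURIError(uri)
    return code


def check_jurisdiction(code):
    """Raise UnknownJurisdictionError instead of a generic CCLicenseError."""
    if code not in KNOWN_JURISDICTIONS:
        raise UnknownJurisdictionError(code)
    return code


def first_or_none(results):
    """An empty query result is normal, not exceptional: return None.

    Replaces the pattern of raising on "no results" and catching it later."""
    return next(iter(results), None)
```

Because the new classes subclass CCLicenseError, existing "except CCLicenseError:" handlers keep working, while new code can catch just the case it cares about.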

by_code, by_url, etc monstrosities

There are quite a few "by_code", "by_url", etc. functions in cc.license, and they are all convoluted and mind-melting to read. The cult of the Flying Spaghetti Monster may be great, but not when it comes to our code. We should refactor these so they make sense; far fewer tables will be flipped in the process.
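One possible shape for the refactor, sketched under loud assumptions: build the lookup tables once and make each by_* function a plain dictionary lookup that returns None on a miss. The License record, the LicenseIndex class, the method signatures, and the hard-coded table (standing in for the RDF queries cc.license actually runs) are all hypothetical; only the by_code/by_url names mirror this page.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class License:
    code: str          # e.g. "by-sa"
    version: str       # e.g. "3.0"
    jurisdiction: str  # "" for unported, otherwise e.g. "us"
    uri: str


def _version_key(lic: License) -> Tuple[int, ...]:
    # "2.5" -> (2, 5), so versions compare numerically rather than as strings
    return tuple(int(part) for part in lic.version.split("."))


class LicenseIndex:
    """Build the lookup tables once; every by_* query is then a dict lookup."""

    def __init__(self, licenses: Iterable[License]):
        self._by_url: Dict[str, License] = {}
        self._by_code: Dict[Tuple[str, str], List[License]] = {}
        for lic in licenses:
            self._by_url[lic.uri] = lic
            self._by_code.setdefault((lic.code, lic.jurisdiction), []).append(lic)

    def by_url(self, url: str) -> Optional[License]:
        return self._by_url.get(url)

    def by_code(self, code: str, jurisdiction: str = "",
                version: Optional[str] = None) -> Optional[License]:
        candidates = self._by_code.get((code, jurisdiction), [])
        if version is not None:
            return next((c for c in candidates if c.version == version), None)
        # No version requested: pick the newest one, or None if there are none.
        return max(candidates, key=_version_key, default=None)


# Hypothetical example data; the real data would come from cc.licenserdf.
LICENSES = LicenseIndex([
    License("by-sa", "2.5", "", "http://creativecommons.org/licenses/by-sa/2.5/"),
    License("by-sa", "3.0", "", "http://creativecommons.org/licenses/by-sa/3.0/"),
    License("by-sa", "3.0", "us", "http://creativecommons.org/licenses/by-sa/3.0/us/"),
])
```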

shouldn't have to pass in model to rdf_helper calls

If rdf_helper functions are designed for specific models, shouldn't they just use those models directly? Or at least take them as keyword arguments with default values, such as model=JURI_MODEL. That would make a lot more sense, and would still allow dependency injection for unit testing if we ever wanted it (which we don't currently do anyway!). Or, you know, we could just reference JURI_MODEL in the code and save ourselves an argument.
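The keyword-argument variant might look like this. JURI_MODEL here is a dict standing in for the real RDF model, get_jurisdiction_title is a hypothetical rdf_helper-style function, and the jurisdiction URIs and titles are example data.

```python
# Stand-in for the real JURI_MODEL (an RDF model loaded from the
# jurisdictions RDF); a dict mapping URI -> title keeps this runnable.
JURI_MODEL = {
    "http://creativecommons.org/international/us/": "United States",
    "http://creativecommons.org/international/de/": "Germany",
}


def get_jurisdiction_title(uri, model=JURI_MODEL):
    """Hypothetical rdf_helper-style lookup.

    Normal callers omit `model` and get JURI_MODEL; a unit test can pass
    a small fake model instead (dependency injection). Returns None if
    the URI is unknown."""
    return model.get(uri)
```

Python evaluates the default once, when the function is defined, so rebinding JURI_MODEL later won't change the default; for a module-level constant that is usually what we want.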

cc.license._lib contains stuff which is useful to external programs

The leading underscore in _lib implies that its contents are internal, yet all sorts of things (okay, cc.engine at least) import from it directly.

RDFa tests for Public Domain Mark, CC0 formatters

We have string tests, which are fragile (though that fragility is kind of useful in its own way). We should also have RDFa tests.
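As a rough illustration of testing meaning rather than exact strings, the sketch below extracts the href of every element whose rel attribute includes license, using only the standard library's html.parser. It is not a real RDFa processor (it ignores prefixes, about, property, and so on); a proper test would run the formatter output through an actual RDFa parser. The function and class names are made up for the example.

```python
from html.parser import HTMLParser


class LicenseLinkParser(HTMLParser):
    """Collect href values of elements whose rel attribute includes "license".

    Checks one RDFa-style pattern only; not a full RDFa processor."""

    def __init__(self):
        super().__init__()
        self.license_uris = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rels = (attr.get("rel") or "").split()  # rel may hold several tokens
        if "license" in rels and attr.get("href"):
            self.license_uris.append(attr["href"])


def license_uris(html):
    parser = LicenseLinkParser()
    parser.feed(html)
    parser.close()
    return parser.license_uris
```

A test written this way keeps passing when attribute order or whitespace changes, which is exactly where the string tests break.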


Not enough, not good enough unit tests

cc.engine's test coverage isn't very good right now. We should fix that.

See also Roundup: Add a string test to every page in cc.engine
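A sketch of the "string test for every page" idea: a table of (path, expected string) pairs checked against a WSGI app through wsgiref. demo_app is a stand-in for cc.engine's real application, and the paths and strings are illustrative; in practice this would go through whatever test client cc.engine's existing tests use.

```python
from wsgiref.util import setup_testing_defaults


def demo_app(environ, start_response):
    """Stand-in for the cc.engine WSGI app; serves two fake pages."""
    pages = {
        "/": b"<h1>Choose a License</h1>",
        "/licenses/by/3.0/": b"<h1>Attribution 3.0 Unported</h1>",
    }
    body = pages.get(environ["PATH_INFO"])
    if body is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    start_response("200 OK", [("Content-Type", "text/html")])
    return [body]


def fetch(app, path):
    """Call a WSGI app directly and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


# One expected string per page; covering a new page means adding a row.
PAGE_STRINGS = [
    ("/", b"Choose a License"),
    ("/licenses/by/3.0/", b"Attribution 3.0 Unported"),
]


def check_pages(app, page_strings):
    """Return the paths that did not answer 200 or lack their expected string."""
    failures = []
    for path, expected in page_strings:
        status, body = fetch(app, path)
        if not status.startswith("200") or expected not in body:
            failures.append(path)
    return failures
```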

request.urlgen useless with /ccengine-fcgi/

See Issue669

wsgi_cache sometimes writes 0-byte files instead of caching

A good candidate for table flipping.

I've wondered whether we could at least apply a better band-aid against 0-byte cache files by:

  • using RewriteCond TestString -s in Apache, which only matches if the file exists and is larger than 0 bytes
  • changing wsgi_cache to check whether the cached file is 0 bytes and, if so, regenerate and overwrite it

Confirmed: Nathan thinks this is the way to go.
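A sketch of the wsgi_cache side of the fix, assuming a cache that maps each page to a file on disk. The function names are hypothetical (wsgi_cache's real internals may look nothing like this), and writing through a temporary file plus os.replace is an extra suggestion beyond this page: it means a crash mid-write never leaves an empty or truncated cache file in place.

```python
import os
import tempfile


def cache_is_usable(path):
    """True only if the cache file exists and is non-empty.

    Mirrors what mod_rewrite's -s test checks on the Apache side."""
    try:
        return os.path.getsize(path) > 0
    except OSError:  # missing file, permission problem, ...
        return False


def write_atomically(path, data):
    """Write to a temp file in the same directory, then rename it over `path`.

    Readers see either the old file or the complete new one."""
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.chmod(tmp_path, 0o644)  # mkstemp creates 0600; Apache must be able to read it
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def get_cached(path, render):
    """Return the cached body, regenerating it if the file is missing or 0 bytes."""
    if cache_is_usable(path):
        with open(path, "rb") as cached:
            return cached.read()
    body = render()
    if body:  # never (re)write an empty cache file
        write_atomically(path, body)
    return body
```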

Deployment documentation not good

There is some deployment documentation on Jurisdiction Management, but it's not very good, and that's not really the right place for it.

Instead, we should write clear documentation on cc.engine deployment and just link to it from Jurisdiction Management.

A script that updates cc.engine for us

Now that we have the webadmin user, there's no reason not to just run something like this on the live server:

 $ ccengine_update

which would:

  • git pull cc.engine
  • run buildout
  • reload Apache

and warn if any of those steps fails.

Maybe with an extra argument:

 $ ccengine_update --clear-cache

which would also wipe the cache directory.

Or, to wipe the cache without updating anything:

 $ ccengine_update --clear-cache-only
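A possible starting point for ccengine_update, written in Python with argparse and subprocess. The checkout path, cache path, and exact commands (including how Apache gets reloaded) are assumptions to adjust for the live server; the script stops at the first failing step and prints a warning to stderr.

```python
import argparse
import os
import shutil
import subprocess
import sys

# Hypothetical paths and commands; adjust to the real live-server layout.
CCENGINE_DIR = "/var/www/cc.engine"
CACHE_DIR = os.path.join(CCENGINE_DIR, "cache")
UPDATE_STEPS = [
    ("git pull", ["git", "pull"]),
    ("buildout", ["bin/buildout"]),
    ("reload apache", ["sudo", "apache2ctl", "graceful"]),
]


def parse_args(argv):
    parser = argparse.ArgumentParser(
        prog="ccengine_update", description="Update cc.engine on the live server.")
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--clear-cache", action="store_true",
                      help="update, then wipe the cache directory")
    mode.add_argument("--clear-cache-only", action="store_true",
                      help="only wipe the cache directory")
    return parser.parse_args(argv)


def plan(args):
    """Return (label, command) pairs; a command of None means 'clear the cache'."""
    steps = [] if args.clear_cache_only else list(UPDATE_STEPS)
    if args.clear_cache or args.clear_cache_only:
        steps.append(("clear cache", None))
    return steps


def clear_cache(cache_dir):
    """Delete the cache directory and recreate it empty."""
    if os.path.isdir(cache_dir):
        shutil.rmtree(cache_dir)
    os.makedirs(cache_dir)


def run(steps, workdir=CCENGINE_DIR, cache_dir=CACHE_DIR):
    """Run steps in order; stop at the first failure and warn on stderr."""
    for label, command in steps:
        try:
            if command is None:
                clear_cache(cache_dir)
            else:
                subprocess.run(command, cwd=workdir, check=True)
        except (OSError, subprocess.CalledProcessError) as exc:
            print(f"WARNING: step '{label}' failed: {exc}", file=sys.stderr)
            return False
    return True


def main(argv=None):
    """Exit status 0 if every step succeeded, 1 otherwise."""
    return 0 if run(plan(parse_args(argv))) else 1
```

main() would serve as the entry point, for example via sys.exit(main()) under an if __name__ == "__main__": guard, or as a buildout console script.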


Jurisdiction management doc needs cleanup

Jurisdiction Management mostly has correct information now.

  • verify that the info is right
  • remove the red/blue/green cruft

* all need tests

Some of these things have tests, but not many.

Consistent ordering in RDF writing

Currently, whenever we write out RDF files, the ordering of their assertions changes completely. This makes git diffs useless and, with files as large as index.rdf, needlessly bloats our repository every time. Even just sorting the assertions alphabetically would be wonderful.
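One way to get stable output, sketched under the assumption that we can write (or at least diff) a line-based format: rdflib can serialize a graph as N-Triples with graph.serialize(format="nt"), one statement per line, and sorting those lines makes repeated writes of the same graph identical. Blank node labels can still change between runs, and keeping index.rdf as RDF/XML would need a serializer that orders its own output, so treat this as a sketch rather than a drop-in fix. sort_ntriples is a hypothetical helper.

```python
def sort_ntriples(ntriples_text):
    """Return N-Triples text with statements sorted and duplicates dropped.

    A graph is a set of triples, so neither order nor duplicates carry
    meaning; sorting just makes repeated writes produce identical files.
    N-Triples escapes newlines inside literals, so each line is one
    complete statement."""
    statements = set()
    for line in ntriples_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):  # skip blank lines and comments
            statements.add(line)
    return "".join(s + "\n" for s in sorted(statements))
```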


Document the historical reasoning behind our .po file formats

We have, for example:

  • cc/i18n/master/cc_org.po: The master .po file
  • cc/i18n/po/es/cc_org.po: The transifex-edited translations for es
  • cc/i18n/i18n/es/cc_org.po: The converted (to po-style? cc-style?? I always forget) files for compiling to .mo
  • cc/i18n/mo/es/cc_org.mo: The compiled .mo gettext files

How did we end up with our "unusual" gettext file setup? Why are things this way? I don't really know these things for sure, and I even rewrote some of the tools that work with them!

Do we still need master/cc_org.po?

nkinkade thinks maybe not, since Transifex works with po/en/cc_org.po anyway. We should figure this out.
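One way to start figuring this out: check whether master/cc_org.po and po/en/cc_org.po even contain the same msgids. The minimal standard-library parser below is an approximation (a library such as polib would be more robust); msgids and compare are hypothetical helpers, and compare reports the msgids that appear in only one of the two files.

```python
import ast


def msgids(po_text):
    """Return the set of non-empty msgids in a .po file's text.

    Handles multi-line msgids (msgid "" followed by quoted continuation
    lines); ignores comments, obsolete (#~) entries, msgctxt and plurals.
    PO string escapes are decoded with Python's literal rules, which
    match for the common cases (\\n, \\", \\\\)."""
    ids = set()
    parts = None  # quoted pieces of the msgid being read, or None
    for raw in po_text.splitlines() + [""]:  # trailing "" flushes the last entry
        line = raw.strip()
        if line.startswith("msgid "):
            parts = [line[len("msgid "):]]
        elif parts is not None and line.startswith('"'):
            parts.append(line)
        elif parts is not None:
            text = "".join(ast.literal_eval(p) for p in parts)
            if text:  # skip the header entry, whose msgid is ""
                ids.add(text)
            parts = None
    return ids


def compare(master_text, en_text):
    """Report msgids present in only one of the two files."""
    master, en = msgids(master_text), msgids(en_text)
    return {"only_in_master": master - en, "only_in_en": en - master}
```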

Tests for our tools!

Can we haz?


Convert everything to use cc.i18n.util.locale_to_lower_[lower|upper]()
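The point of converting everything is that every spelling of a locale ("en-us", "EN_us", "en_US") normalizes to one canonical form before comparison or lookup. The two functions below are illustrative stand-ins written for this page, not the cc.i18n.util code; they assume "lower_upper" means en_US and "lower_lower" means en_us, both with an underscore separator, so check the real helpers before relying on that.

```python
def to_lower_upper(locale):
    """Illustrative: 'en-us', 'EN_us', 'en_US' -> 'en_US'; 'DE' -> 'de'."""
    lang, sep, region = locale.replace("-", "_").partition("_")
    return f"{lang.lower()}_{region.upper()}" if sep else lang.lower()


def to_lower_lower(locale):
    """Illustrative: 'en-US', 'en_US' -> 'en_us'; 'DE' -> 'de'."""
    lang, sep, region = locale.replace("-", "_").partition("_")
    return f"{lang.lower()}_{region.lower()}" if sep else lang.lower()
```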