Clarity Project

Not too long ago we worked on the "Sanity Overhaul", a rewrite of the CC website engine and its underlying components (cc.i18n, cc.license, cc.licenserdf, etc). While we all seem to agree that it's been a massive improvement, it feels like there are a lot of loose ends to tie up with the project. This page's purpose is to document what those loose ends are, and how to fix them.

We've made things (a lot more) sane. Now it's time to make them a lot more clear.

cc.license

Exceptions suckiness

Exceptions suck in cc.license. Everything throws CCLicenseError, no matter what the kind of error really is. It's impossible to tell what specific problems have happened.

These exceptions are also abused: in various parts of the infrastructure they are thrown when a query returns no results and caught later, where returning None would have worked just as well or better.

Proposed solutions:

Subclass CCLicenseError with more specific errors
Use None where more appropriate than throwing an exception
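
A minimal sketch of what both proposals could look like together. Only CCLicenseError comes from cc.license; the subclass names, the lookup functions and the little _TITLES table are all made up for illustration.

```python
class CCLicenseError(Exception):
    """Base class, kept so existing `except CCLicenseError` code still works."""


class NoSuchLicenseError(CCLicenseError):  # hypothetical name
    """The requested license does not exist."""


class RDFParseError(CCLicenseError):  # hypothetical name
    """An RDF file could not be loaded or parsed."""


# Hypothetical in-memory stand-in for a real query against the RDF model.
_TITLES = {"by": "Attribution", "by-sa": "Attribution-ShareAlike"}


def find_title(code):
    """A lookup that may legitimately come back empty: return None, don't raise."""
    return _TITLES.get(code)


def require_title(code):
    """A lookup where a miss really is an error: raise the specific subclass."""
    title = find_title(code)
    if title is None:
        raise NoSuchLicenseError("no license with code %r" % code)
    return title
```

Because the new errors subclass CCLicenseError, callers that already catch the base class keep working, and new callers can catch just the case they care about.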

by_code, by_url, etc monstrosities

There are quite a few "by_code", "by_url", etc functions in cc.license, and they are all convoluted and mind-melting to read. The cult of the Flying Spaghetti Monster may be great, but not when it comes to our code. We should refactor these so they make sense, and far fewer tables will be flipped in the process.

shouldn't have to pass in model to rdf_helper calls

If rdf_helper functions are designed for specific models, shouldn't we just call those models? Or at least, pass them in as keyword arguments with default values, such as model=JURI_MODEL. That would make a lot more sense, assuming we wanted to be able to do dependency injection for unit testing purposes or something (which we don't currently do anyway!). Or, you know, we could just call JURI_MODEL in the code and save ourselves an argument.
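
A rough sketch of the keyword-argument option. JURI_MODEL here is just a dict standing in for the real RDF model, and default_language_for is a hypothetical helper name, not an existing rdf_helper function.

```python
# Stand-in for the module-level model; in cc.license this would be the
# real RDF model object rather than a plain dict (assumption for the sketch).
JURI_MODEL = {"http://creativecommons.org/international/jp/": "ja"}


def default_language_for(uri, model=JURI_MODEL):
    """Hypothetical helper: callers normally omit `model`; tests may inject one."""
    return model.get(uri)


# Normal call site: no model argument needed.
language = default_language_for("http://creativecommons.org/international/jp/")

# Unit test call site: inject a tiny fake model instead.
fake_language = default_language_for("urn:test", model={"urn:test": "fr"})
```

One thing to keep in mind with this style: the default is bound when the function is defined, so it only works if the model object already exists at import time.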

cc.license._lib contains stuff which is useful to external programs

The _lib namespace implies that this stuff should be internal, but all sorts of things (okay, cc.engine at least) import stuff out of it.

RDFa tests for Public Domain Mark, CC0 formatters

http://code.creativecommons.org/issues/issue679

We have string tests, which are fragile (kind of useful in their own way). We should have RDFa tests though.
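
To show the difference, here is a crude sketch that compares the RDFa-carrying attributes of two snippets instead of comparing the raw strings. It is not a real RDFa parser (it does no subject resolution or prefix handling), the attribute list is my own pick, and the function name is hypothetical; a real test would more likely use a proper RDFa library.

```python
from html.parser import HTMLParser


class RDFaAttributeCollector(HTMLParser):
    """Collect (tag, attribute, value) for attributes that RDFa relies on."""

    RDFA_ATTRS = ("about", "href", "property", "rel", "resource", "typeof")

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in self.RDFA_ATTRS:
                self.found.add((tag, name, value))


def rdfa_attributes(html):
    """Return the set of RDFa-related attributes found in an HTML snippet."""
    collector = RDFaAttributeCollector()
    collector.feed(html)
    collector.close()
    return collector.found
```

A test built on this would not break when whitespace, attribute order or the human-readable link text changes, only when the machine-readable assertions do.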

cc.engine

Not enough, not good enough unit tests

cc.engine doesn't have the best of coverage currently. We should fix that.

request.urlgen useless with /ccengine-fcgi/

See Issue669

wsgi_cache sometimes writes 0 byte files instead of caching

http://code.creativecommons.org/issues/issue698

A good candidate for table flipping.

I've wondered if we could at least apply a better bandaid for 0 byte cache files by:

Using RewriteCond TestString -s in Apache, which makes sure the file is > 0 bytes
Upgrading wsgi_cache to check whether the file is 0 bytes and, if so, overwrite it

Confirmed: Nathan thinks this is the way to go.
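
The wsgi_cache half of that bandaid might look something like the sketch below. The function names are hypothetical and this is not wsgi_cache's actual code; refusing to cache an empty body is an extra assumption of mine.

```python
import os


def is_usable(path):
    """True only when a cache file exists and is larger than 0 bytes."""
    try:
        return os.path.getsize(path) > 0
    except OSError:  # missing or unreadable file
        return False


def write_cache(path, body):
    """Write `body` (bytes) unless a good cache file is already in place.

    Returns True when the file was written, False when nothing was done.
    """
    if not body:
        return False  # assumption: never cache an empty response
    if is_usable(path):
        return False  # a non-empty cache file is already there
    with open(path, "wb") as cache_file:  # overwrites a 0 byte leftover
        cache_file.write(body)
    return True
```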

Deployment documentation not good

There's kind of some documentation on Jurisdiction Management but it's not very good, and that's not really the right place.

Instead we should write clear documentation on cc.engine deployment and just link to it from Jurisdiction Management.

A script that updates cc.engine for us

Now that we have the webadmin user, there's no reason not to just run something like this on live:

 $ ccengine_update

Which does:

a git pull of cc.engine
buildout
reload of apache

And warns if there are any errors.

Maybe with the addition of this argument:

 $ ccengine_update --clear-cache

It will also wipe the cache directory.

Also, to wipe the cache without updating anything:

 $ ccengine_update --clear-cache-only
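
A sketch of how such a script could be structured. Only the command name and the two flags come from this page; CACHE_DIR, the exact commands (bin/buildout, apachectl graceful, GNU find) and the assumption that it runs from inside the cc.engine checkout are all guesses that would need checking against the live setup.

```python
import argparse
import subprocess
import sys

CACHE_DIR = "/path/to/cache"  # placeholder, not the real location


def plan(argv):
    """Return the commands ccengine_update would run, in order."""
    parser = argparse.ArgumentParser(prog="ccengine_update")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--clear-cache", action="store_true")
    group.add_argument("--clear-cache-only", action="store_true")
    args = parser.parse_args(argv)

    steps = []
    if not args.clear_cache_only:
        # Assumes the current directory is the cc.engine checkout.
        steps += [["git", "pull"], ["bin/buildout"], ["apachectl", "graceful"]]
    if args.clear_cache or args.clear_cache_only:
        # Empties the directory but keeps the directory itself (GNU find).
        steps.append(["find", CACHE_DIR, "-mindepth", "1", "-delete"])
    return steps


def run(steps, runner=subprocess.call):
    """Run each step; warn and stop at the first non-zero exit status."""
    for step in steps:
        status = runner(step)
        if status != 0:
            sys.stderr.write("WARNING: %r exited with %d\n" % (step, status))
            return status
    return 0
```

The entry point would then just be run(plan(sys.argv[1:])). Keeping plan() separate from run() means the flag handling can be tested without touching git or apache.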

license.rdf

Jurisdiction management doc needs cleanup

Jurisdiction Management mostly has correct information now.

verify that the info is right
remove the red/blue/green cruft.

cc.licenserdf.tools.* all need tests

Some of these have tests, but not many.

Consistent ordering in rdf writing

Currently whenever we write out RDF files, the ordering of their assertions changes completely. This makes git diffs useless and, with files as large as index.rdf, bloats our repository unnecessarily every time. Even just sorting the assertions alphabetically would be wonderful.
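
The idea, sketched on plain tuples rather than on our real RDF/XML output: if assertions are sorted before they are written, the same set of assertions always produces the same bytes. The function name is hypothetical, the terms are assumed to be already-formatted strings, and the output is N-Triples-style lines, which is a simplification and not what index.rdf currently uses.

```python
def serialize_sorted(triples):
    """Render (subject, predicate, object) triples as sorted one-per-line text."""
    # A set drops exact duplicates; sorting makes the order deterministic.
    lines = {"%s %s %s ." % triple for triple in triples}
    return "\n".join(sorted(lines)) + "\n"


triples = [
    ("<http://example.org/license>", "<http://purl.org/dc/elements/1.1/title>", '"B"@en'),
    ("<http://example.org/license>", "<http://purl.org/dc/elements/1.1/title>", '"A"@en'),
]

# The input order no longer matters, so diffs only show real changes.
same = serialize_sorted(triples) == serialize_sorted(list(reversed(triples)))
```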

cc.i18n

Document historical reasonings of our .po file formats

We have, for example:

cc/i18n/master/cc_org.po: The master .po file
cc/i18n/po/es/cc_org.po: The transifex-edited translations for es
cc/i18n/i18n/es/cc_org.po: The converted (to po-style? cc-style?? I always forget) files for compiling to .mo
cc/i18n/mo/es/cc_org.po: The compiled .mo gettext files

How did we end up with our "unusual" gettext file setup? Why are things this way? I don't even really know these things for sure and I even rewrote some of the tools that work with them!

Do we still need master/cc_org.po

nkinkade thinks maybe not, since transifex just works with po/en/cc_org.po anyway. We should figure this out.

Tests for our tools!

General

Convert everything to use cc.i18n.util.locale_to_lower_[lower|upper]()
