Clarity Project
Not too long ago we worked on the "Sanity Overhaul", a rewriting of the CC website engine and underlying components (cc.i18n, cc.license, cc.licenserdf, etc). While we all seem to agree that it's been a massive improvement, it feels like there are a lot of loose ends to tie up with the project. This page's purpose is to document what those things are, and how to improve them.
We've made things (a lot more) sane. Now it's time to make them a lot more clear.
cc.license
Exceptions suckiness
Exceptions suck in cc.license. Everything throws CCLicenseError, no matter what the kind of error really is. It's impossible to tell what specific problems have happened.
These exceptions are also abused: various parts of the infrastructure throw them when a query returns no results and catch them later, where returning None would have worked just as well or better.
Proposed solutions:
- Subclass CCLicenseError with more specific errors
- Use None where more appropriate than throwing an exception
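A minimal sketch of both proposals. The subclass names and the two lookup helpers are hypothetical and do not exist in cc.license; the dict is only a stand-in for a real query against the license RDF.

```python
class CCLicenseError(Exception):
    """Stand-in for the existing cc.license base exception."""


class NoSuchLicenseError(CCLicenseError):
    """Hypothetical: the requested license does not exist."""


class InvalidURIError(CCLicenseError):
    """Hypothetical: the supplied URI is malformed."""


# Hypothetical stand-in for a query against the license RDF.
_KNOWN_LICENSES = {"by": "http://creativecommons.org/licenses/by/3.0/"}


def find_license_uri(code):
    """Return the URI for a license code, or None if the query is empty."""
    return _KNOWN_LICENSES.get(code)


def require_license_uri(code):
    """Like find_license_uri, but raise a specific error on no match."""
    uri = find_license_uri(code)
    if uri is None:
        raise NoSuchLicenseError("no license with code %r" % code)
    return uri
```

Because the specific errors subclass CCLicenseError, existing callers that catch CCLicenseError would keep working while new code can catch the narrower type.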
by_code, by_url, etc monstrosities
There are quite a few "by_code", "by_url", etc. functions in cc.license, and they are all convoluted and mind-melting to observe. The cult of the Flying Spaghetti Monster may be great, but not in regards to our code. We should refactor these to make sense, and many fewer tables will be flipped in the process.
shouldn't have to pass in model to rdf_helper calls
If rdf_helper functions are designed for specific models, shouldn't we just call those models? Or at least, pass them in as keyword arguments with default values, such as model=JURI_MODEL. That would make a lot more sense, assuming we wanted to be able to do dependency injection for unit testing purposes or something (which we don't currently do anyway!). Or, you know, we could just call JURI_MODEL in the code and save ourselves an argument.
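A rough sketch of the keyword-argument option. The function name jurisdiction_title is made up for illustration, and the dict is only a stand-in for JURI_MODEL, which is an RDF model in the real code.

```python
# Stand-in for the module-level model; the real JURI_MODEL is an RDF
# model, not a dict.
JURI_MODEL = {"http://creativecommons.org/international/jp/": "Japan"}


def jurisdiction_title(uri, model=JURI_MODEL):
    """Hypothetical helper: look up a title, defaulting to JURI_MODEL."""
    return model.get(uri)
```

Normal callers would write jurisdiction_title(uri) and never mention the model, while a unit test could still pass model=some_fake_model.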
cc.license._lib contains stuff which is useful to external programs
The namespacing of _lib implies that this stuff should be internal, but all sorts of things (okay, cc.engine at least) just import stuff out of it.
RDFa tests for Public Domain Mark, CC0 formatters
http://code.creativecommons.org/issues/issue679
We have string tests, which are fragile (though kind of useful in their own way). We should have RDFa tests too, though.
cc.engine
Not enough, not good enough unit tests
cc.engine doesn't have the best of coverage currently. We should fix that.
See also Roundup: Add a string test to every page in cc.engine
request.urlgen useless with /ccengine-fcgi/
See Issue669
wsgi_cache sometimes writes 0 byte files instead of caching
http://code.creativecommons.org/issues/issue698
A good candidate for table flipping.
I've wondered if we could at least apply a better bandaid so that 0 size cache files aren't written by:
- RewriteCond TestString -s in apache, which makes sure the file is > 0 bytes
- Upgrading wsgi_cache to see if the file == 0 bytes, if so, overwrite
Confirmation: Nathan thinks this is the way to go.
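The second option above (treat a 0 byte file as "not cached" and overwrite it) could look roughly like this. This is not wsgi_cache's actual code; cache_is_usable and write_cache are hypothetical helpers that only show the size check.

```python
import os
import tempfile


def cache_is_usable(path):
    """True only if the cache file exists and is larger than 0 bytes."""
    try:
        return os.path.getsize(path) > 0
    except OSError:
        return False


def write_cache(path, body):
    """Write body (bytes) to path unless a non-empty cache file exists.

    Empty bodies are never cached, and the data goes to a temporary
    file first so a half-written file never sits at the final path.
    Returns True if the cache file was (re)written.
    """
    if not body or cache_is_usable(path):
        return False
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(body)
    # Atomically move the finished file over any 0 byte leftover.
    os.replace(tmp_path, path)
    return True
```

Note that mkstemp creates the file readable only by its owner, so a real version would probably need to fix up permissions if Apache serves the cached file directly.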
Deployment documentation not good
There's kind of some documentation on Jurisdiction Management but it's not very good, and that's not really the right place.
Instead, we should write clear documentation on cc.engine deployment and just link to it from Jurisdiction Management.
A script that updates cc.engine for us
Now that we have the webadmin user, there's no reason not to just run something like this on live:
$ ccengine_update
Which does:
- a git pull of cc.engine
- buildout
- reload of apache
And warns if there are any errors.
Maybe with the addition of this argument:
$ ccengine_update --clear-cache
It will also wipe the cache directory.
Also, to wipe the cache directory without updating anything:
$ ccengine_update --clear-cache-only
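A sketch of what such a script might look like. The directory paths, the buildout path and the Apache reload command are all assumptions and would need to match the real server; stopping at the first failed step is also just one possible choice.

```python
import argparse
import os
import shutil
import subprocess
import sys

# All of these are assumptions; adjust for the real server layout.
ENGINE_DIR = "/var/www/cc.engine"
CACHE_DIR = "/var/www/cc.engine/cache"
UPDATE_STEPS = [
    ["git", "pull"],
    ["bin/buildout"],
    ["sudo", "apache2ctl", "graceful"],
]


def parse_args(argv):
    parser = argparse.ArgumentParser(prog="ccengine_update")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--clear-cache", action="store_true")
    group.add_argument("--clear-cache-only", action="store_true")
    return parser.parse_args(argv)


def plan(args):
    """Return the list of commands to run for the given options."""
    return [] if args.clear_cache_only else list(UPDATE_STEPS)


def main(argv=None):
    args = parse_args(sys.argv[1:] if argv is None else argv)
    for command in plan(args):
        # Warn and stop at the first step that fails.
        if subprocess.call(command, cwd=ENGINE_DIR) != 0:
            sys.stderr.write("WARNING: %s failed\n" % " ".join(command))
            return 1
    if args.clear_cache or args.clear_cache_only:
        # Remove the cache directory and recreate it empty.
        shutil.rmtree(CACHE_DIR, ignore_errors=True)
        os.makedirs(CACHE_DIR, exist_ok=True)
    return 0
```

A console script entry point (or a small wrapper) would then call sys.exit(main()).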
license.rdf
Jurisdiction management doc needs cleanup
Jurisdiction Management mostly has correct information now.
- verify that the info is right
- remove the red/blue/green cruft.
cc.licenserdf.tools.* all need tests
Some of these things have tests, but not many.
Consistent ordering in rdf writing
Currently whenever we write out RDF files, they change completely in their ordering of assertions. This makes git diffs impossible, and with files as large as index.rdf, bloats our repository unnecessarily all the time. Even just sorting the assertions somehow alphabetically would be wonderful.
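To illustrate the idea only: if the assertions are sorted before they are written, the output no longer depends on the order they happened to be in. The sketch below uses plain tuples of strings and N-Triples-style lines rather than our actual RDF/XML writer, and serialize_sorted is a hypothetical name.

```python
def serialize_sorted(triples):
    """Serialize (subject, predicate, object) strings as sorted lines.

    Each term is assumed to already be in N-Triples syntax, e.g.
    "<http://...>" for a URI or '"text"@en' for a literal.
    """
    # Drop duplicate assertions, then sort the finished lines.
    lines = ["%s %s %s ." % triple for triple in set(triples)]
    return "\n".join(sorted(lines)) + "\n"
```

With one assertion per line in a stable order, a git diff would only show the assertions that actually changed.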
cc.i18n
Document historical reasonings of our .po file formats
We have, for example:
- cc/i18n/master/cc_org.po: The master .po file
- cc/i18n/po/es/cc_org.po: The transifex-edited translations for es
- cc/i18n/i18n/es/cc_org.po: The converted (to po-style? cc-style?? I always forget) files for compiling to .mo
- cc/i18n/mo/es/cc_org.po: The compiled .mo gettext files
How did we end up with our "unusual" gettext file setup? Why are things this way? I don't even really know these things for sure and I even rewrote some of the tools that work with them!
Do we still need master/cc_org.po?
nkinkade thinks maybe not, since transifex just works with po/en/cc_org.po anyway. We should figure this out.
Tests for our tools!
Can we haz?