Rewrite Metadata Validator/SoC 2008/IRC Scanner

From Creative Commons
Revision as of 16:30, 15 June 2008 by Hdworak (Initial writing)

To find your own activity in the Creative Commons IRC channel (#cc), you can search with Google or run the following public-domain script, written in PHP 5. Set $nick to your nickname before running it:

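<?php

// Nickname to look for and the directory listing that links to the #cc logs.
// Fetching URLs with file_get_contents() requires allow_url_fopen.
$nick = 'john';
$path = 'http://mirrors.creativecommons.org/irc/cc/';

$index = file_get_contents($path);
if ($index === false) {
    exit('Could not fetch ' . $path . PHP_EOL);
}

// URL-encoded log names for the current year, e.g. %23cc.2008-06-15.log.html.
preg_match_all('/%23cc\.' . date('Y') . '-\d\d-\d\d\.log\.html/', $index, $matches);

// Logs that earlier runs found not to mention $nick.
$irrelevant = array();
if (file_exists('irrelevant.txt')) {
    $irrelevant = unserialize(file_get_contents('irrelevant.txt'));
}

// Relevant logs are saved in this directory; create it on the first run.
if (!is_dir('relevant')) {
    mkdir('relevant');
}

// array_unique() drops links that appear more than once on the index page.
foreach (array_unique($matches[0]) as $url) {
    $filename = str_replace('%23', '', $url);  // drop the encoded "#"
    if (file_exists('relevant/' . $filename) || in_array($filename, $irrelevant)) {
        echo 'Skipped ', $filename, PHP_EOL;
        continue;
    }
    $contents = file_get_contents($path . $url);
    if ($contents === false) {
        echo 'Failed ', $filename, PHP_EOL;
        continue;
    }
    // Assumes the log markup puts the speaker's nick directly before the message cell.
    if (strpos($contents, $nick . '</th><td class="text"') === false) {
        echo 'Irrelevant ', $filename, PHP_EOL;
        $irrelevant[] = $filename;
        continue;
    }
    file_put_contents('relevant/' . $filename, $contents);
    echo 'Downloaded ', $filename, PHP_EOL;
}

// Remember the irrelevant logs so they are not fetched again on the next run.
file_put_contents('irrelevant.txt', serialize($irrelevant));
echo 'Saved irrelevant.txt', PHP_EOL;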

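Note that the script only searches the logs from the current year. Logs that mention the nickname are downloaded to the relevant/ directory on the local machine; the names of the remaining logs are stored in irrelevant.txt, so neither kind is fetched again on later runs.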
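The script hard-codes the current year through date('Y'). To search another year, the log-matching step could be factored into a function that takes the year as a parameter. The sketch below is a minimal illustration, assuming the index page links the logs by URL-encoded names such as %23cc.2008-06-15.log.html, as the original regular expression does; the function name cc_log_files and the sample index HTML are hypothetical.

```php
<?php

/**
 * Hypothetical helper: map each #cc log URL for $year found in $indexHtml
 * to the local file name the scanner uses (the URL without "%23").
 */
function cc_log_files($indexHtml, $year)
{
    $pattern = '/%23cc\.' . preg_quote((string) $year, '/') . '-\d\d-\d\d\.log\.html/';
    preg_match_all($pattern, $indexHtml, $matches);
    $files = array();
    foreach ($matches[0] as $url) {
        // Using the URL as the key also removes duplicate links.
        $files[$url] = str_replace('%23', '', $url);
    }
    return $files;
}

// Made-up index snippet for illustration, not real log data.
$index = '<a href="%23cc.2007-12-31.log.html">#cc.2007-12-31</a>'
       . '<a href="%23cc.2008-06-15.log.html">#cc.2008-06-15</a>'
       . '<a href="%23cc.2008-06-15.log.html">again</a>';

foreach (cc_log_files($index, 2008) as $url => $filename) {
    echo $url, ' -> ', $filename, PHP_EOL;  // prints "%23cc.2008-06-15.log.html -> cc.2008-06-15.log.html"
}
```

In the script, the foreach loop could then iterate over cc_log_files($index, $year) as $url => $filename instead of computing $filename itself.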