Difference between revisions of "AgShare/Tech"

From Creative Commons

Revision as of 15:43, 12 October 2010


The AgShare deployment works analogously to the CC Labs deployment of DiscoverEd. Some important things to note:

  • Username: agshare
  • Host name: search.agshare.org (currently the same as discovered.labs.creativecommons.org)

So, for example, to set up your environment, do:

$ sudo su - agshare

With that in mind, see the Running DiscoverEd page for the general procedure.

Deploying new WARs

To deploy a new WAR, do this:

  • rm -rf ~/tomcat/webapps/ROOT
  • cp nutch-1.1.war ~/tomcat/webapps/ROOT.war

Then restart Tomcat.
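
The two steps above can be wrapped in a small helper. This is only a sketch: the function name deploy_war and its optional second argument are made up for illustration, and it assumes the webapps directory is ~/tomcat/webapps as in the commands above.

```shell
#!/bin/sh
# Sketch: install a WAR as Tomcat's ROOT webapp.
# deploy_war is a hypothetical helper, not something that exists on the server.
deploy_war() {
    war="$1"                                # e.g. nutch-1.1.war
    webapps="${2:-$HOME/tomcat/webapps}"    # assumed default location

    # Refuse to delete the old webapp if the new WAR is missing.
    if [ ! -f "$war" ]; then
        echo "no such WAR: $war" >&2
        return 1
    fi

    rm -rf "$webapps/ROOT"                  # remove the old exploded webapp
    cp "$war" "$webapps/ROOT.war"           # Tomcat unpacks this when it starts
}
```

Run it as the agshare user, for example "deploy_war nutch-1.1.war", and restart Tomcat afterwards. Checking for the WAR first means a typo in the file name does not leave the site without a ROOT webapp.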

Restarting Tomcat

The AgShare deployment uses a Tomcat instance in its $HOME (supported by the tomcat6-instance-create script). It's wrapped as "/etc/init.d/agshare" so the boot process can use it, but you can also restart it by hand:

  • ~/tomcat/bin/shutdown.sh
  • ~/tomcat/bin/startup.sh
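
If you restart often, the two commands can be combined. A minimal sketch, assuming the instance lives in ~/tomcat; the function name restart_tomcat and the pause between stop and start are my own additions, not part of the deployment.

```shell
#!/bin/sh
# Sketch: stop Tomcat, wait briefly, start it again.
# restart_tomcat and the wait time are hypothetical; adjust to taste.
restart_tomcat() {
    base="${1:-$HOME/tomcat}"     # assumed instance directory
    wait_secs="${2:-5}"           # assumed grace period for the JVM to exit

    # shutdown.sh fails if Tomcat is not running; carry on and start it anyway.
    "$base/bin/shutdown.sh" || echo "shutdown.sh failed; Tomcat may not have been running" >&2
    sleep "$wait_secs"
    "$base/bin/startup.sh"
}
```

As the agshare user, "restart_tomcat" with no arguments uses the defaults.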

Starting Tomcat at boot

/etc/rc.local contains a call to run ~/tomcat/bin/startup.sh as the agshare user. That's kind of hackish, I realize.

Piwik analytics

We use a self-hosted package called Piwik (http://piwik.org/) to record search engine queries and measure traffic to the website. All the data stays with us.

You can use the Piwik admin interface (http://search.agshare.org/static/piwik/piwik/index.php) to view the stats, if you have an account. If you want an account, talk to Nathan.

Piwik general configuration

  • Configuration: It uses a MySQL database. You can see the details in the Piwik configuration file.
  • Path on the server: /var/www/search.agshare.org/www/static/piwik/piwik
  • Web serving: Apache + mod_php5 serve it up. We set up /var/www/search.agshare.org/www/static to be served by Apache; you can see that in /etc/apache2/sites-available/search.agshare.org.

To get Piwik running, we had to add Piwik to the default template. See the "changes" section below for more info.

Site search

We added the SiteSearch plugin (still in beta; see this Piwik ticket) to let us analyze site search.

The site search plugin requires that we:

  • Change the default translations so that they
  • Configure it: In the Site Search settings, I set the "Search URL" to "search.jsp" (no leading slash) and the "Search Parameter" to "query". This matches queries like http://search.agshare.org/search.jsp?query=body.
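
To make the two settings concrete, here is a rough sketch of what "Search URL = search.jsp, Search Parameter = query" means for a given request URL. This is not the plugin's code; the function name search_term is made up, and the matching rule is my reading of the settings.

```shell
#!/bin/sh
# Sketch: print the search term if a URL looks like a site search request.
# search_term is a hypothetical illustration, not part of Piwik or SiteSearch.
search_term() {
    url="$1"

    # Only URLs that hit search.jsp with a query string count as searches.
    case "$url" in
        */search.jsp\?*) ;;
        *) return 1 ;;
    esac

    qs="${url#*\?}"       # keep everything after the first "?"
    qs="${qs%%\#*}"       # drop any "#fragment"

    # Split the query string on "&" and pick the first "query=" parameter.
    term=$(printf '%s\n' "$qs" | tr '&' '\n' | sed -n 's/^query=//p' | head -n 1)
    [ -n "$term" ] || return 1
    printf '%s\n' "$term"
}
```

For example, search_term 'http://search.agshare.org/search.jsp?query=body' prints "body", while a URL for any other page returns a non-zero status. The term is printed as it appears in the URL, without URL-decoding.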

Piwik SiteSearch can keep track of the number of results that the search engine returns for each query. To do that, it needs to be able to "scrape" the information out of the web page, or alternatively have the servlet provide it. I chose the "scrape" option. I implemented that in a commit (http://gitorious.org/+discovereders/discovered/agshare-live/commit/4ffdd225670c9af3e57d686111907aa5e5d150fe).
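
For a feel of what "scraping" the result count involves, here is a small sketch. It is not the code from the commit: the function name result_count is made up, and the "out of about N total" wording is an assumed example of how a results page might state its hit count.

```shell
#!/bin/sh
# Sketch: read a results page on stdin and print the total hit count.
# result_count and the matched phrase are assumptions for illustration only.
result_count() {
    # Capture the number between "out of about" and "total",
    # keep the first match, and strip thousands separators.
    sed -n 's/.*out of about \([0-9][0-9,]*\) total.*/\1/p' | head -n 1 | tr -d ','
}
```

Usage would look like: curl -s 'http://search.agshare.org/search.jsp?query=body' | result_count. If the page does not contain the assumed phrase, nothing is printed, which is the main weakness of scraping: any change to the page wording silently breaks the count.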

Version control

The AgShare deployment's git repository can be found on Gitorious (http://gitorious.org/+discovereders/discovered/agshare-live/).

When you want to back up the AgShare deployment's git state, just do:

$ git push mirror --mirror
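
The push above assumes a remote named "mirror" is already configured. A sketch that sets it up when missing; the function name backup_git_state and the URL you pass in are placeholders, since this page does not say where the mirror lives.

```shell
#!/bin/sh
# Sketch: make sure a "mirror" remote exists, then push every ref to it.
# backup_git_state is a hypothetical helper; the mirror URL is up to you.
backup_git_state() {
    repo="$1"          # path to the deployment's git checkout
    mirror_url="$2"    # placeholder: wherever the backup repository lives

    # Add the remote only if it is not there yet.
    if ! git -C "$repo" remote get-url mirror >/dev/null 2>&1; then
        git -C "$repo" remote add mirror "$mirror_url"
    fi

    # --mirror pushes all branches and tags, and deletes refs on the
    # mirror that no longer exist locally.
    git -C "$repo" push mirror --mirror
}
```

Because --mirror also deletes refs on the remote side, point it only at a repository that exists purely as a backup.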