DiscoverEd/Install manually

From Creative Commons
Revision as of 14:38, 7 September 2010 by Paulproteus (talk | contribs) (Switching to MySQL)

DiscoverEd is based on Nutch. As such, you may wish to consult the Nutch Wiki for general deployment questions.

Check out and build the source code

$ git clone git:// discovered
$ cd discovered
$ ant

Add a curator and a feed

DiscoverEd uses feeds to help identify resources to crawl. Feeds are provided by curators, who can also provide metadata about resources.

$ ./bin/feeds addcurator "ND OCW" 
$ ./bin/feeds addfeed rss

Aggregate and crawl resources

$ ./bin/feeds aggregate
$ mkdir seed
$ ./bin/feeds seed > seed/urls.txt
$ ant -f dedbuild.xml crawl
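
If the aggregate step found nothing, the seed file will be empty and the crawl will have nothing to fetch. A minimal sketch of a sanity check to run between the seed and crawl steps follows. It assumes the seed file holds one URL per line, as Nutch seed lists usually do; count_seed_urls is a hypothetical helper, not part of DiscoverEd.

```shell
# Hypothetical helper (not shipped with DiscoverEd).
# Prints the number of http(s) URLs in a seed file; fails if there are none.
count_seed_urls() {
    seed_file="$1"
    if [ ! -f "$seed_file" ]; then
        echo "seed file not found: $seed_file" >&2
        return 1
    fi
    # grep -c prints the number of matching lines; it exits non-zero when
    # that number is 0, so "|| true" keeps the count without aborting.
    count=$(grep -c -E '^https?://' "$seed_file" || true)
    echo "$count"
    [ "$count" -gt 0 ]
}
```

One way to use it is to run the crawl only when the check passes: count_seed_urls seed/urls.txt && ant -f dedbuild.xml crawl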

Run the web application

Edit conf/nutch-site.xml to point to your crawl location.

$ ant war
$ [copy the war file to your J2EE container]

Switching to MySQL

By default, DiscoverEd (at least on the next branch) uses Derby, an embedded on-disk database, to store resource metadata. For production, you should use a different database, such as MySQL.
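
MySQL needs a database and an account for DiscoverEd before the configuration can point at it. A minimal sketch of that preparation follows. The database name, user name and password are all placeholders chosen for illustration, not values DiscoverEd requires; substitute your own.

```shell
# Assumed placeholder names; none of these come from DiscoverEd itself.
DB_NAME="discovered"
DB_USER="discovered"
DB_PASS="change-me"

# Write the setup statements to a file so they can be reviewed first.
cat > discovered-mysql-setup.sql <<EOF
CREATE DATABASE IF NOT EXISTS ${DB_NAME} CHARACTER SET utf8;
CREATE USER '${DB_USER}'@'localhost' IDENTIFIED BY '${DB_PASS}';
GRANT ALL PRIVILEGES ON ${DB_NAME}.* TO '${DB_USER}'@'localhost';
EOF

# After reviewing the file, apply it with the standard mysql client:
#   mysql -u root -p < discovered-mysql-setup.sql
```

Whatever names you pick here are the ones to carry over into conf/discovered.xml.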

To do that, edit conf/discovered.xml and update the following sections as appropriate:





Known issues

Derby and OAI-PMH aren't compatible

If you use the default Derby backend, OAI-PMH crawls won't work; you'll get SQL syntax errors from the code instead. We haven't fully diagnosed the problem. If you run into it, we suggest you switch to MySQL as described in the "Switching to MySQL" section.