DiscoverEd/Install manually

From Creative Commons

Latest revision as of 15:38, 7 September 2010


DiscoverEd is based on Nutch (http://nutch.apache.org/). As such, you may wish to consult the Nutch Wiki (http://wiki.apache.org/nutch/) for general deployment questions.

Check out and build the source code

$ git clone git://gitorious.org/discovered/repo.git discovered
$ cd discovered
$ ant

Add a curator and a feed

DiscoverEd uses feeds to help identify resources to crawl. Feeds are provided by curators, who can also provide metadata about resources.

$ ./bin/feeds addcurator "ND OCW" http://ocw.nd.edu/ 
$ ./bin/feeds addfeed rss http://ocw.nd.edu/front-page/courselist/rss http://ocw.nd.edu/

Aggregate and crawl resources

$ ./bin/feeds aggregate
$ mkdir seed
$ ./bin/feeds seed > seed/urls.txt
$ ant -f dedbuild.xml crawl
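
If the aggregate step finds no resources, the seed list will be empty and the crawl will have nothing to fetch. A minimal sketch of a check you could run between seeding and crawling is shown here; check_seed_file is a hypothetical helper, not something DiscoverEd ships, and it assumes the seed file holds one URL per line.

```shell
#!/bin/sh
# Hypothetical helper (not part of DiscoverEd): sanity-check the seed list.
# Assumes one URL per line, as written by "./bin/feeds seed".
check_seed_file() {
  seed_file="${1:-seed/urls.txt}"
  # -s is true only if the file exists and is not empty.
  if [ ! -s "$seed_file" ]; then
    echo "seed file '$seed_file' is missing or empty" >&2
    return 1
  fi
  # Count lines that contain at least one non-whitespace character.
  count=$(grep -c '[^[:space:]]' "$seed_file")
  if [ "$count" -eq 0 ]; then
    echo "seed file '$seed_file' contains only blank lines" >&2
    return 1
  fi
  echo "$count seed URL(s) in $seed_file"
}
```

With the function loaded in your shell, one possible use is: check_seed_file seed/urls.txt && ant -f dedbuild.xml crawl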

Run the web application

Edit conf/nutch-site.xml to point to your crawl location.
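
The page does not say which property to edit. In Nutch 1.x the search web application reads its crawl directory from the searcher.dir property, so the sketch below assumes DiscoverEd does the same; please verify that against your checkout. The helper name searcher_dir_property and the example path are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print a <property> element to paste inside the
# <configuration> element of conf/nutch-site.xml.
# Assumption: DiscoverEd's web application honours Nutch's searcher.dir.
searcher_dir_property() {
  crawl_dir="$1"   # absolute path to the directory produced by the crawl
  cat <<EOF
<property>
  <name>searcher.dir</name>
  <value>$crawl_dir</value>
</property>
EOF
}
```

For example, searcher_dir_property /var/discovered/crawl prints a fragment whose value is that (hypothetical) path.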

$ ant war
Then copy the resulting war file to your J2EE container.

Switching to MySQL

By default, DiscoverEd (at least on the "next" branch) stores resource metadata in Derby, an on-disk database. In production you should use a different database, such as MySQL.

To do that, edit conf/discovered.xml and update the following sections as appropriate:

<property>
  <name>rdfstore.db.driver</name>
  <value>com.mysql.jdbc.Driver</value>
</property>

<property>
  <name>rdfstore.db.url</name>
  <value>jdbc:mysql://localhost/discovered?autoReconnect=true</value>
</property>

<property>
  <name>rdfstore.db.user</name>
  <value>discovered</value>
</property>

<property>
  <name>rdfstore.db.password</name>
  <value></value>
</property>
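
The settings above assume that a MySQL database and user named discovered already exist on localhost. A minimal sketch of creating them follows; discovered_db_sql is a hypothetical helper that only prints SQL, the defaults mirror the values in the properties above, and the password argument is a placeholder you would choose yourself and then copy into rdfstore.db.password. The driver class com.mysql.jdbc.Driver comes from MySQL Connector/J, so that jar needs to be on the classpath if your build does not already bundle it.

```shell
#!/bin/sh
# Hypothetical helper: print the SQL that creates the database and user
# referenced in conf/discovered.xml. It does not talk to MySQL itself.
discovered_db_sql() {
  db="${1:-discovered}"     # matches the database name in rdfstore.db.url
  user="${2:-discovered}"   # matches rdfstore.db.user
  pass="${3:-}"             # should match rdfstore.db.password
  cat <<EOF
CREATE DATABASE $db CHARACTER SET utf8;
CREATE USER '$user'@'localhost' IDENTIFIED BY '$pass';
GRANT ALL PRIVILEGES ON $db.* TO '$user'@'localhost';
EOF
}
```

One way to apply it, assuming you have a MySQL administrative account: discovered_db_sql discovered discovered 'your-password' | mysql -u root -p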

Known issues

Derby and OAI-PMH aren't compatible

If you use the default backend (Derby), OAI-PMH crawls won't work; you'll get SQL syntax errors instead. We haven't fully diagnosed the problem, so if you run into it, we suggest switching to MySQL as described in the "Switching to MySQL" section.
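
Before starting such a crawl, it can help to confirm which backend is actually configured. The sketch below is a hypothetical helper that prints the driver class from conf/discovered.xml; it assumes the value element sits on the line directly after the name element, as in the property blocks shown on this page. Output of com.mysql.jdbc.Driver means MySQL is configured; a Derby class name or no output suggests the default backend is still in use.

```shell
#!/bin/sh
# Hypothetical helper: print the JDBC driver class set in discovered.xml.
# Assumes <value> is on the line immediately after the matching <name>.
rdfstore_driver() {
  conf_file="${1:-conf/discovered.xml}"
  # Find the rdfstore.db.driver name line, read the next line,
  # and print only the text between <value> and </value>.
  sed -n '/<name>rdfstore\.db\.driver<\/name>/{n;s/.*<value>\(.*\)<\/value>.*/\1/p;}' "$conf_file"
}
```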