DiscoverEd Quickstart

From Creative Commons

Latest revision as of 06:01, 21 August 2010


Getting Started

Run these commands to download and run the quickstart script.

    cd /tmp/    # As good a place as any
    wget http://gitorious.org/discovered/repo/blobs/raw/master/gimme-discovered
    bash gimme-discovered

The script will check for dependencies, build DiscoverEd, perform a small crawl, and launch a J2EE server (Jetty) with the software.
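The dependency check can be as simple as the sketch below. This page does not list the tools gimme-discovered actually requires, so the tool list in the example (git, ant, java) is an assumption: DiscoverEd's source lives in git, it builds with ant, and Jetty needs Java. The check_deps function is a hypothetical helper, not part of the script.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a dependency check; not taken from gimme-discovered.

# check_deps CMD...: report every command that is not on PATH.
# Returns 0 if all are found, 1 if any are missing.
check_deps() {
    local missing=0 cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing dependency: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example; the tool list is an assumption:
#   check_deps git ant java || exit 1
```

Checking every tool before exiting, rather than stopping at the first missing one, lets you install everything you need in one pass.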

What does the script do?

  • Install the DiscoverEd code in ./discovered (relative to the working directory)
  • Create a Derby database in the DISCOVERED_DB directory
  • Add a sample curator and a sample feed
  • Aggregate the resources listed in that feed
  • Perform a simple crawl
  • Run a test search for the term "christianity", and print the results to your terminal
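To run the build, curator, feed, aggregation, and crawl steps by hand, you can use roughly the following sequence from inside ./discovered. These commands come from an earlier version of this quickstart and may have changed since. They also do not cover the Derby database setup or the test search. DRY_RUN, run, and run_to are hypothetical helpers that let you print the commands without executing them.

```shell
#!/usr/bin/env bash
# Sketch: the quickstart steps done by hand from inside ./discovered.
# Commands come from an earlier version of this quickstart and may be out
# of date. DRY_RUN, run, and run_to are hypothetical, not part of DiscoverEd.
set -euo pipefail

# run CMD...: echo the command to stderr, then execute it unless DRY_RUN=1.
run() {
    echo "+ $*" >&2
    if [ "${DRY_RUN:-0}" != 1 ]; then
        "$@"
    fi
}

# run_to FILE CMD...: like run, but write the command's stdout to FILE.
run_to() {
    local out=$1
    shift
    echo "+ $* > $out" >&2
    if [ "${DRY_RUN:-0}" != 1 ]; then
        "$@" > "$out"
    fi
}

manual_quickstart() {
    run ant                                  # build DiscoverEd
    # Register a sample curator and the RSS feed it provides.
    run ./bin/feeds addcurator "ND OCW" http://ocw.nd.edu/
    run ./bin/feeds addfeed rss http://ocw.nd.edu/front-page/courselist/rss http://ocw.nd.edu/
    run ./bin/feeds aggregate                # fetch the resources listed in the feed
    # Write the seed URL list the crawl starts from, then crawl.
    run mkdir -p seed
    run_to seed/urls.txt ./bin/feeds seed
    run ant -f dedbuild.xml crawl
}

# Usage: source this file inside ./discovered, then
#   DRY_RUN=1 manual_quickstart   # print the commands only
#   manual_quickstart             # actually run them
```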

The steps above exercise the search engine without a web browser. To make the search engine usable from a web browser, the script then does the following:

  • Launch an included copy of Jetty
  • Open the search engine in Firefox
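If you launch Jetty yourself, the browser may open before the server accepts connections. A wait loop like the sketch below avoids this. wait_for_port is a hypothetical helper, not part of DiscoverEd. Port 8080 in the example is an assumption (a common Jetty default), so check which port your setup actually uses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until a TCP port accepts connections.
# Relies on bash's /dev/tcp redirection, so it needs bash, not plain sh.

# wait_for_port HOST PORT TIMEOUT_SECONDS: return 0 once HOST:PORT accepts
# a connection, or 1 once about TIMEOUT_SECONDS have passed without one.
wait_for_port() {
    local host=$1 port=$2 timeout=$3 waited=0
    while true; do
        # Opening /dev/tcp/HOST/PORT attempts a TCP connection; the
        # subshell closes it again as soon as it exits.
        if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
            return 0
        fi
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}

# Example (port 8080 is an assumption):
#   if wait_for_port localhost 8080 60; then
#       firefox http://localhost:8080/ &
#   fi
```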

What you should expect to see

This is a development branch, so the search engine you see is still labeled "Nutch", the open-source search engine that DiscoverEd is built on.

Controlling the Installation

The script has several variables that control its behavior, including the install location (DISCOVER_ED_ROOT).
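Shell scripts commonly expose a setting like DISCOVER_ED_ROOT by giving it a default that the environment can override, as in the sketch below. This page does not say whether gimme-discovered reads the variable from the environment. If the script assigns it unconditionally, edit the value at the top of the script instead. The ./discovered default matches the install location described above.

```shell
#!/usr/bin/env bash
# Sketch only: whether gimme-discovered reads DISCOVER_ED_ROOT from the
# environment is an assumption; if it doesn't, edit the value in the script.

# ${VAR:-default} keeps a non-empty value already set in the environment and
# otherwise falls back to the default (./discovered under the working directory).
DISCOVER_ED_ROOT="${DISCOVER_ED_ROOT:-$PWD/discovered}"

echo "Installing DiscoverEd into $DISCOVER_ED_ROOT"
```

If the script used this pattern, running `DISCOVER_ED_ROOT=/opt/discovered bash gimme-discovered` would install into /opt/discovered instead.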

If you need more control (or are using this in production), you'll probably want to do a manual installation.