Hacking DiscoverEd
How to deploy a hackable DiscoverEd, make changes, and update your deployment
These instructions assume a Unix-like environment; please help us by contributing information on Windows-based development.
Contents
- 1 Check out and build the source code
- 2 Add a curator and a feed
- 3 Aggregate and crawl resources
- 4 Run the web application
- 5 Hacking The Code
- 6 Committing Changes and Merging to the Main Repository
- 7 Troubleshooting
- 7.1 I get a big long Java backtrace talking about Jena and MySQL
- 7.2 Database permissions
- 7.3 JAVA_HOME on a Mac
- 7.4 Error message: "Feature 'http://apache.org/xml/features/xinclude' is not recognized."
- 7.5 AccessControlException
- 7.6 Missing build/plugins
- 7.7 Missing parse-mp3 plugin
- 7.8 Eclipse complains: Wrong version number in .class file
Check out and build the source code
$ git clone git://gitorious.org/discovered/repo.git discovered
$ cd discovered
$ git checkout (whatever branch we're working on today)
$ ant
Add a curator and a feed
By default DiscoverEd uses Derby and will create the on-disk database if needed. See the installation instructions for information on using other databases, such as MySQL.
DiscoverEd uses feeds to help identify resources to crawl. Feeds are provided by curators, who can also provide metadata about resources.
$ ./bin/feeds addcurator "ND OCW" http://ocw.nd.edu/
$ ./bin/feeds addfeed rss http://ocw.nd.edu/english/@@rss http://ocw.nd.edu/
See DiscoverEd Feeds for information on supported feed types.
More information on the ./bin/feeds commands is available at Running DiscoverEd (some of that information is specific to discovered.cc).
Aggregate and crawl resources
$ ./bin/feeds aggregate
$ ./bin/feeds seed > seed/urls.txt
$ ant -f dedbuild.xml crawl
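You will probably repeat these steps after adding feeds or when updating a deployment. The following sketch wraps them in a shell function that stops at the first failure. It assumes you run it from the root of your checkout. `ded_recrawl` is a hypothetical name and is not part of DiscoverEd. The function also creates seed/ if it is missing and refuses to crawl when the seed list comes out empty.

```shell
# Hypothetical helper (not shipped with DiscoverEd): run the
# aggregate -> seed -> crawl cycle from the root of a checkout,
# stopping at the first step that fails.
ded_recrawl() {
  # Pull the latest items from every registered feed.
  ./bin/feeds aggregate || return 1
  # Make sure the seed directory exists before redirecting into it.
  mkdir -p seed || return 1
  ./bin/feeds seed > seed/urls.txt || return 1
  # An empty seed list usually means no feeds were added or aggregated.
  if [ ! -s seed/urls.txt ]; then
    echo "seed/urls.txt is empty; check your curators and feeds" >&2
    return 1
  fi
  ant -f dedbuild.xml crawl
}
```

Source the function from your shell (or a script) and run `ded_recrawl` in place of the three commands.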
Run the web application
You can run the web front-end using Jetty (included with your checkout) by running:
$ ant -f dedbuild.xml serve
Hacking The Code
- Run Eclipse
- Choose File -> Import...
- When it asks you to select an import source (such as "Existing Projects into Workspace"), choose "General -> File System"
- Select the location of your source tree
- Click Finish
(There are three options: 1. "Existing projects into workspace", 2. "Create from existing source", and 3. "File System". Some of these trigger an error regarding the Nutch MP3 code; see "Missing parse-mp3 plugin" under Troubleshooting.)
The DiscoverEd source code lives in two locations:
- ded/src/java contains DiscoverEd-specific code, primarily related to interfacing with the RDF store.
- src/plugins/cclearn contains the DiscoverEd Nutch plugin, which provides some filtering features to Nutch and ensures that metadata indexed in the RDF store is injected into the Lucene index.
In general, the plugin may depend on code in the ded/src/java tree, but not the other way around: classes in the plugin are not available to code in ded/src/java.
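A wrong-way dependency will normally just fail to compile. If you want to spot one before building, you can grep for it. The sketch below is a hypothetical helper, not part of the DiscoverEd build. It lists files under ded/src/java that import a package declared in the plugin tree. It assumes the plugin's packages are not also used by ded/src/java, and it only looks at `import` lines, so fully qualified references elsewhere in the code are not caught.

```shell
# Hypothetical helper: print core files that import a package declared in
# the plugin tree. Assumes plugin packages are not shared with the core.
# Returns 1 if any such import is found.
check_plugin_deps() {
  core=${1:-ded/src/java}
  plugin=${2:-src/plugins/cclearn}
  status=0
  # Collect the distinct package names declared by plugin sources.
  pkgs=$(grep -rh --include='*.java' '^package ' "$plugin" |
    sed 's/^package[[:space:]]*\([^;[:space:]]*\).*/\1/' | sort -u)
  for pkg in $pkgs; do
    # Escape dots so they match literally in the regular expression.
    pat=$(printf '%s' "$pkg" | sed 's/\./\\./g')
    # grep -l prints each offending core file and exits 0 on a match.
    if grep -rl --include='*.java' "^import[[:space:]][[:space:]]*$pat\.[A-Za-z_*]" "$core"; then
      status=1
    fi
  done
  return $status
}
```

Run `check_plugin_deps` from the root of the checkout. Any file it prints imports plugin code and should be refactored.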
Note: The DiscoverEd developers will consider you extra special if you indent your code using spaces instead of tabs. You may even earn a gold star.
To use spaces for all indentation for all Java projects in Eclipse 3.5 (Galileo):
- Open Preferences
- Expand the Java group
- Expand the Code Style subgroup within the Java group
- Select Formatter
- Click on "New" in the Formatter section and name your profile
- Check that the "Indentation" tab is active
- Select "Spaces only" from the "Tab policy" dropdown
- Click "Apply" and/or "OK"
You may also do this on a per-project basis by setting it as a project property. The general process is the same.
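Changing the formatter does not touch code that is already tab-indented. To find files that still contain tab characters, you can grep the two source trees. The sketch below is a minimal example under that assumption. `find_tab_indented` is a hypothetical helper name. It reports any tab, not only tabs used for indentation.

```shell
# Hypothetical helper: list Java sources that contain a tab character.
# With no arguments it searches the two DiscoverEd source trees.
find_tab_indented() {
  tab=$(printf '\t')
  if [ "$#" -eq 0 ]; then
    set -- ded/src/java src/plugins/cclearn
  fi
  # -r: recurse into directories, -l: print matching file names only.
  grep -rl --include='*.java' "$tab" "$@"
}
```

To convert a listed file, POSIX `expand -t 4 File.java > File.java.new` replaces every tab character (not only leading ones) with spaces up to the next 4-column stop.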
Committing Changes and Merging to the Main Repository
Troubleshooting
I get a big long Java backtrace talking about Jena and MySQL
If you've configured DiscoverEd to use MySQL as the database backend, you'll need to create the database first (see the installation instructions for details).
Database permissions
You might need to change the MySQL credentials or other database settings in conf/discovered.xml. DiscoverEd does not require that you use the root user; it does require that the database already exist.
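A minimal sketch for creating the database and a dedicated non-root user. The database name, user name and password are placeholders and should match whatever you put in conf/discovered.xml. `ded_mysql_setup_sql` is a hypothetical helper that only prints the SQL. The password must not contain a single quote.

```shell
# Hypothetical helper: print SQL that creates a database and a non-root
# MySQL user with full rights on it. Names are placeholders.
ded_mysql_setup_sql() {
  if [ "$#" -ne 3 ]; then
    echo "usage: ded_mysql_setup_sql DATABASE USER PASSWORD" >&2
    return 1
  fi
  db=$1 user=$2 pass=$3
  # Backticks are escaped so the heredoc emits them literally for MySQL.
  cat <<EOF
CREATE DATABASE IF NOT EXISTS \`$db\`;
CREATE USER '$user'@'localhost' IDENTIFIED BY '$pass';
GRANT ALL PRIVILEGES ON \`$db\`.* TO '$user'@'localhost';
EOF
}
```

Usage: `ded_mysql_setup_sql discovered ded 'secret' | mysql -u root -p`. Root is only needed for this one-time setup; DiscoverEd itself can then use the new account.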
JAVA_HOME on a Mac
Mac users setting JAVA_HOME should use /usr/libexec/java_home to determine the current JAVA_HOME.
If you're really lazy, add export JAVA_HOME=`/usr/libexec/java_home` to your .bash_profile, and JAVA_HOME will be set each time you start a shell. (This is a good idea!) The export makes the variable visible to programs such as ant that the shell launches.
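A slightly more defensive .bash_profile fragment only sets JAVA_HOME when the helper exists and succeeds. Passing `-v 1.6` to /usr/libexec/java_home asks it for a Java 6 JDK specifically, which matters for the compiler-version problem described under Troubleshooting. That form only works if such a JDK is installed. The fragment below is a sketch and not something the project ships.

```shell
# ~/.bash_profile fragment (macOS): export JAVA_HOME from Apple's helper.
# Change the call to "/usr/libexec/java_home -v 1.6" to pin Java 6.
if [ -x /usr/libexec/java_home ] && jh=$(/usr/libexec/java_home 2>/dev/null); then
  export JAVA_HOME="$jh"
fi
unset jh
```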
Error message: "Feature 'http://apache.org/xml/features/xinclude' is not recognized."
"You probably have an older version of Xerces somewhere in your classpath or something is overriding the default parser configuration with one that doesn't support XInclude." (http://marc.info/?l=xerces-j-user&m=117066278506146&w=2)
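To track down stray Xerces jars, a rough first pass is to search by file name in your checkout and in Tomcat's library directories. The helper below is hypothetical. It matches names only, so it will miss Xerces classes bundled inside other jars, and it cannot tell you which version a jar contains.

```shell
# Hypothetical helper: list jars whose names start with "xerces"
# (case-insensitive) under the given directories (default: current dir).
find_xerces_jars() {
  if [ "$#" -eq 0 ]; then
    set -- .
  fi
  find "$@" -type f -iname 'xerces*.jar' | sort
}
```

For example, run `find_xerces_jars . /usr/share/tomcat6/lib` and adjust the second path to wherever your Tomcat keeps its jars. Any old copy that appears ahead of a newer one on the classpath is a candidate for removal.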
AccessControlException
When starting Tomcat, if you get a traceback like this in your Tomcat log (e.g., in /var/lib/tomcat6/logs/localhost-$date.log):
SEVERE: Exception sending context initialized event to listener instance of class org.apache.nutch.searcher.NutchBean$NutchBeanConstructor
java.lang.RuntimeException: java.security.AccessControlException: access denied (java.lang.reflect.ReflectPermission suppressAccessChecks)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1377)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
(and so on), try changing the Tomcat policy in /etc/tomcat6/policy.d/04webapps.policy by adding these lines to the grant {} block:
// Attempt to get Nutch working
// Courtesy of Alex McLintock at http://mail-archives.apache.org/mod_mbox/lucene-nutch-user/200907.mbox/<d398ec7f0907041237j6acffe0fm10b7cd374a77795b@mail.gmail.com>
permission java.security.AllPermission;
This is obviously inappropriate for any site running a public instance of DiscoverEd. But it might be useful for your local dev environment. If you know how to specify a class level permission, please update this document.
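A narrower alternative is to grant only the permission named in the traceback, and only to code in the DiscoverEd webapp. This is an untested sketch. The webapp directory name "discovered" is a placeholder. Nutch and Hadoop will probably need further permissions (for example file access), which will show up as new AccessControlExceptions to add one by one. `ded_policy` is a hypothetical helper that prints a separate grant entry. Policy files may contain several grant entries, so it can be appended after the existing block rather than inside it.

```shell
# Hypothetical helper: print a codeBase-scoped grant for one webapp.
# ${catalina.base} is escaped so Tomcat, not the shell, expands it.
ded_policy() {
  app=${1:-discovered}
  cat <<EOF
grant codeBase "file:\${catalina.base}/webapps/$app/-" {
    permission java.lang.reflect.ReflectPermission "suppressAccessChecks";
};
EOF
}
```

Usage: `ded_policy discovered | sudo tee -a /etc/tomcat6/policy.d/04webapps.policy`. Then restart Tomcat.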
Missing build/plugins
Be sure to run ant in the root repo directory.
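If you often build from inside a subdirectory, you can let Git find the repository root for you. `git rev-parse --show-toplevel` prints the top of the working tree. The wrapper name `ded_build` is hypothetical.

```shell
# Hypothetical helper: run ant from the top of the Git checkout, wherever
# you are inside it. Extra arguments are passed through to ant.
ded_build() {
  top=$(git rev-parse --show-toplevel) || return 1
  # Run in a subshell so the caller's working directory is unchanged.
  (cd "$top" && ant "$@")
}
```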
Missing parse-mp3 plugin
Remove that source folder from the build path (in Eclipse, Project > Properties > Java Build Path > Source).
Eclipse complains: Wrong version number in .class file
Use Java 1.6 as your compiler (in Eclipse, Project > Properties > Java Compiler > Compiler compliance level), and be sure the project uses a Java 1.6 JVM/JRE.
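If you are unsure which Java version produced a given .class file, you can read it from the file header. Bytes 0 to 3 are the 0xCAFEBABE magic number, bytes 4 and 5 are the minor version, and bytes 6 and 7 are the big-endian major version. Major version 50 corresponds to Java 6, and for Java 1.2 and later the release number is the major version minus 44. The two helpers below are hypothetical. `javap -verbose` reports the same value on its "major version" line.

```shell
# Hypothetical helper: print the class-file major version (50 = Java 6).
class_major_version() {
  # Skip 6 bytes, read 2 unsigned bytes, combine them big-endian.
  od -An -j6 -N2 -tu1 "$1" | awk '{ print $1 * 256 + $2 }'
}

# Map a major version to a Java release number (Java 1.2 and later).
java_release_for() {
  echo $(( $1 - 44 ))
}
```

A class built by a newer compiler than the one Eclipse is using (for example major version 50 when Eclipse is on Java 5) is a likely source of this error.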