This is an attempt to use Calais as a metadata-importing plug-in that provides additional metadata for crawled resources.

Calais is a big initiative with a lot of components. What you care about is the Calais Web Service. The web service is an API that accepts unstructured text (news articles, blog postings, etc.), processes it using natural language processing and machine learning algorithms, and returns RDF-formatted entities, facts and events. A request takes about 0.5 to 1.0 seconds, depending on the size of the document you send and the bandwidth of your connection.
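As a rough illustration of the request/response flow described above, the following is a minimal sketch of how a plug-in might package crawled text for the web service. It is not the documented Calais API: the endpoint URL, the header names, and the function names build_calais_request and fetch_rdf are all assumptions made for illustration, and the real values should be taken from the Calais developer documentation.

```python
import urllib.request

# Hypothetical placeholder, NOT the real service URL; use the endpoint
# given in the Calais developer documentation.
CALAIS_ENDPOINT = "https://calais.example.invalid/enrich"


def build_calais_request(text, api_key, endpoint=CALAIS_ENDPOINT):
    """Build (but do not send) a POST request carrying unstructured text.

    The header names below are assumptions for illustration only.
    """
    body = text.encode("utf-8")
    headers = {
        "Content-Type": "text/plain; charset=utf-8",
        "Accept": "application/rdf+xml",  # ask for RDF output (assumed)
        "X-Api-Key": api_key,             # hypothetical key header name
    }
    return urllib.request.Request(
        endpoint, data=body, headers=headers, method="POST"
    )


def fetch_rdf(request, timeout=10.0):
    """Send the request and return the response body as text.

    The timeout is generous compared with the 0.5 to 1.0 second
    processing time quoted above, to allow for slow connections.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Keeping request construction separate from sending means the plug-in can be tested without network access and without consuming any of the service's usage allowance.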

Calais Developer Documentation

Open Calais is a Thomson Reuters product. Using it requires agreeing to its terms of service, registering, and obtaining an API key. It also has usage limits.
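One way a plug-in could handle the key and the usage limits is sketched below. The environment variable name CALAIS_API_KEY, the Throttle class, and the load_api_key function are hypothetical, and the interval passed to Throttle is whatever the service's actual limits call for; no specific limit is assumed here.

```python
import os
import time


def load_api_key(env_var="CALAIS_API_KEY", environ=os.environ):
    """Read the key issued at registration from the environment.

    The variable name is an assumption; keeping the key out of the
    source tree avoids committing it by accident.
    """
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} to the key issued at registration")
    return key


class Throttle:
    """Enforce a minimum interval between calls (client-side only)."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        """Block until at least min_interval_s has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval_s - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling wait() before each request spaces the calls out on the client side; it does not replace whatever enforcement the service itself applies.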

Reason for caution for some users, from the terms of service: If you syndicate, publish or otherwise transmit any content containing, enhanced by or derived from Calais-generated metadata you will use your best efforts to incorporate the correct Calais-provided Globally Unique Identifier (GUID) in that content. You specifically agree not to attach incorrect GUIDs to your content with any intent to mislead, spam, spoof, phish or otherwise deceive downstream consumers of your content.