<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Posting Rich Documents to Apache Solr using SolrJ and Solr Cell (Apache Tika)</title>
	<atom:link href="http://www.lucidimagination.com/blog/2009/09/14/posting-rich-documents-to-apache-solr-using-solrj-and-solr-cell-apache-tika/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.lucidimagination.com/blog/2009/09/14/posting-rich-documents-to-apache-solr-using-solrj-and-solr-cell-apache-tika/</link>
	<description>Exclusively dedicated to Apache Lucene/Solr open source search technology</description>
	<lastBuildDate>Sat, 04 Feb 2012 01:13:03 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
	<item>
		<title>By: Geeta</title>
		<link>http://www.lucidimagination.com/blog/2009/09/14/posting-rich-documents-to-apache-solr-using-solrj-and-solr-cell-apache-tika/comment-page-1/#comment-7416</link>
		<dc:creator>Geeta</dc:creator>
		<pubDate>Tue, 08 Mar 2011 22:21:14 +0000</pubDate>
		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=983#comment-7416</guid>
		<description>Hi,

I have code like the example above.

SolrServer server = new StreamingUpdateSolrServer(&quot;http://localhost:8983/solr&quot;,100,100);
ContentStreamUpdateRequest req = new ContentStreamUpdateRequest(&quot;/update/extract&quot;);
String fileName = &quot;C:\\test.pdf&quot;;
req.addFile(new File(fileName));
req.setParam(ExtractingParams.LITERALS_PREFIX+&quot;contentid&quot;, &quot;test&quot;);
UpdateResponse resp = req.process(server);
System.out.println(&quot;Result: &quot; + resp.getStatus());
resp = server.commit();
System.out.println(&quot;Commit: &quot; + resp.getStatus());

Both calls return a status of 0, but I am still not able to find the document when I search; the number of results shows 0.
Could you please let me know what I have missed?


Thanks,
geeta</description>
		<content:encoded><![CDATA[<p>Hi,</p>
<p>I have code like the example above.</p>
<p>SolrServer server = new StreamingUpdateSolrServer("http://localhost:8983/solr",100,100);<br />
ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");<br />
String fileName = "C:\\test.pdf";<br />
req.addFile(new File(fileName));<br />
req.setParam(ExtractingParams.LITERALS_PREFIX+"contentid", "test");<br />
UpdateResponse resp = req.process(server);<br />
System.out.println("Result: " + resp.getStatus());<br />
resp = server.commit();<br />
System.out.println("Commit: " + resp.getStatus());</p>
<p>Both calls return a status of 0, but I am still not able to find the document when I search; the number of results shows 0.<br />
Could you please let me know what I have missed?</p>
<p>Thanks,<br />
geeta</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Archana</title>
		<link>http://www.lucidimagination.com/blog/2009/09/14/posting-rich-documents-to-apache-solr-using-solrj-and-solr-cell-apache-tika/comment-page-1/#comment-6784</link>
		<dc:creator>Archana</dc:creator>
		<pubDate>Thu, 16 Dec 2010 17:50:22 +0000</pubDate>
		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=983#comment-6784</guid>
		<description>Hi, I am trying to do simple reads over Lucene indexes by using SolrJ. When I try to post my &quot;csv&quot; file to Solr, it throws the exception below, even though I specified this field in schema.xml. Here is the code that I wrote:

SolrServer server = new CommonsHttpSolrServer(&quot;http://localhost:8983/solr&quot;);
ContentStreamUpdateRequest req = new ContentStreamUpdateRequest(&quot;/update/csv&quot;);
req.addFile(new File(&quot;c:/Sample.csv&quot;));
req.setParam(&quot;uprefix&quot;, &quot;attr_&quot;);
req.setParam(&quot;fmap.content&quot;, &quot;attr_content&quot;);
req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
NamedList result = server.request(req);
System.out.println(&quot;Result: &quot; + result);

When I run this, I get this exception:

Exception in thread &quot;main&quot; org.apache.solr.common.SolrException: undefined field isbn
request: http://localhost:8983/solr/update/csv?commit=true&amp;waitFlush=true&amp;waitSearcher=true&amp;wt=javabin&amp;version=1
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at Test.main(Test.java:23)

Here is the schema that I wrote: isbn isbn

Can anyone please help me out on this? Thanks.</description>
		<content:encoded><![CDATA[<p>Hi, I am trying to do simple reads over Lucene indexes by using SolrJ. When I try to post my "csv" file to Solr, it throws the exception below, even though I specified this field in schema.xml. Here is the code that I wrote:</p>
<p>SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");<br />
ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/csv");<br />
req.addFile(new File("c:/Sample.csv"));<br />
req.setParam("uprefix", "attr_");<br />
req.setParam("fmap.content", "attr_content");<br />
req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);<br />
NamedList result = server.request(req);<br />
System.out.println("Result: " + result);</p>
<p>When I run this, I get this exception:</p>
<p>Exception in thread "main" org.apache.solr.common.SolrException: undefined field isbn<br />
request: <a href="http://localhost:8983/solr/update/csv?commit=true&#038;waitFlush=true&#038;waitSearcher=true&#038;wt=javabin&#038;version=1" rel="nofollow">http://localhost:8983/solr/update/csv?commit=true&#038;waitFlush=true&#038;waitSearcher=true&#038;wt=javabin&#038;version=1</a><br />
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)<br />
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)<br />
at Test.main(Test.java:23)</p>
<p>Here is the schema that I wrote: isbn isbn</p>
<p>Can anyone please help me out on this? Thanks.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Eric Mose</title>
		<link>http://www.lucidimagination.com/blog/2009/09/14/posting-rich-documents-to-apache-solr-using-solrj-and-solr-cell-apache-tika/comment-page-1/#comment-5608</link>
		<dc:creator>Eric Mose</dc:creator>
		<pubDate>Mon, 16 Aug 2010 18:12:31 +0000</pubDate>
		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=983#comment-5608</guid>
		<description>Grant - Ah! TikaEntityProcessor.class is not in my Solr 1.4 WAR! Where/how can I get it?</description>
		<content:encoded><![CDATA[<p>Grant - Ah! TikaEntityProcessor.class is not in my Solr 1.4 WAR! Where/how can I get it?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Eric Mose</title>
		<link>http://www.lucidimagination.com/blog/2009/09/14/posting-rich-documents-to-apache-solr-using-solrj-and-solr-cell-apache-tika/comment-page-1/#comment-5607</link>
		<dc:creator>Eric Mose</dc:creator>
		<pubDate>Mon, 16 Aug 2010 18:00:25 +0000</pubDate>
		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=983#comment-5607</guid>
		<description>Oops, you can&#039;t see the code in my other comment. Here it is:
&lt;pre&gt;
      &lt;entity processor=&quot;TikaEntityProcessor&quot; url=&quot;NAManual.pdf&quot; dataSource=&quot;docDataSource&quot; format=&quot;text&quot;&gt;
          &lt;!--Do appropriate mapping here; meta=&quot;true&quot; means it is a metadata field --&gt;
          &lt;field column=&quot;Author&quot; meta=&quot;true&quot; name=&quot;author&quot;/&gt;
          &lt;field column=&quot;title&quot; meta=&quot;true&quot; name=&quot;docTitle&quot;/&gt;
          &lt;!--&#039;text&#039; is an implicit field emitted by TikaEntityProcessor. Map it appropriately--&gt;
          &lt;field column=&quot;text&quot; name=&quot;articleText&quot; /&gt;
        &lt;/entity&gt;
&lt;/pre&gt;</description>
		<content:encoded><![CDATA[<p>Oops, you can't see the code in my other comment. Here it is:</p>
<pre>
      &lt;entity processor="TikaEntityProcessor" url="NAManual.pdf" dataSource="docDataSource" format="text"&gt;
          &lt;!--Do appropriate mapping here; meta="true" means it is a metadata field --&gt;
          &lt;field column="Author" meta="true" name="author"/&gt;
          &lt;field column="title" meta="true" name="docTitle"/&gt;
          &lt;!--'text' is an implicit field emitted by TikaEntityProcessor. Map it appropriately--&gt;
          &lt;field column="text" name="articleText" /&gt;
        &lt;/entity&gt;
</pre>
]]></content:encoded>
	</item>
	<item>
		<title>By: Eric Mose</title>
		<link>http://www.lucidimagination.com/blog/2009/09/14/posting-rich-documents-to-apache-solr-using-solrj-and-solr-cell-apache-tika/comment-page-1/#comment-5606</link>
		<dc:creator>Eric Mose</dc:creator>
		<pubDate>Mon, 16 Aug 2010 17:59:21 +0000</pubDate>
		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=983#comment-5606</guid>
		<description>Grant - I see how to do this now; I didn&#039;t realize you could nest entities in the DIH config XML file. But now I get an error:
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to load EntityProcessor implementation for entity:2859509962269735 

when I try to do:
&lt;pre&gt;
      
          &lt;!--Do appropriate mapping here  meta=&quot;true&quot; means it is a metadata field --&gt;
          
          
          &lt;!--&#039;text&#039; is an implicit field emitted by TikaEntityProcessor. Map it appropriately--&gt;
          
        
&lt;/pre&gt;

Any ideas?</description>
		<content:encoded><![CDATA[<p>Grant - I see how to do this now; I didn't realize you could nest entities in the DIH config XML file. But now I get an error:<br />
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to load EntityProcessor implementation for entity:2859509962269735 </p>
<p>when I try to do:</p>
<pre>

          <!--Do appropriate mapping here  meta="true" means it is a metadata field -->

          <!--'text' is an implicit field emitted by TikaEntityProcessor. Map it appropriately-->
</pre>
<p>Any ideas?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Grant Ingersoll</title>
		<link>http://www.lucidimagination.com/blog/2009/09/14/posting-rich-documents-to-apache-solr-using-solrj-and-solr-cell-apache-tika/comment-page-1/#comment-5604</link>
		<dc:creator>Grant Ingersoll</dc:creator>
		<pubDate>Mon, 16 Aug 2010 14:11:49 +0000</pubDate>
		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=983#comment-5604</guid>
		<description>Eric,

I haven&#039;t tried it yet, but DIH now fully supports Tika as well, so you may be able to deal with it solely from DIH.</description>
		<content:encoded><![CDATA[<p>Eric,</p>
<p>I haven&#8217;t tried it yet, but DIH now fully supports Tika as well, so you may be able to deal with it solely from DIH.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Eric Mose</title>
		<link>http://www.lucidimagination.com/blog/2009/09/14/posting-rich-documents-to-apache-solr-using-solrj-and-solr-cell-apache-tika/comment-page-1/#comment-5603</link>
		<dc:creator>Eric Mose</dc:creator>
		<pubDate>Mon, 16 Aug 2010 13:39:38 +0000</pubDate>
		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=983#comment-5603</guid>
		<description>Is there a way to merge/associate indexed documents with another document that has been indexed using DIH? For instance, a PDF is associated with a document in a database through a Content Management System. I want to search on a term in the PDF, but in the results I want the associated document in the database, *not* the PDF. Is there any way to do this? With SolrJ?

Thanks!</description>
		<content:encoded><![CDATA[<p>Is there a way to merge/associate indexed documents with another document that has been indexed using DIH? For instance, a PDF is associated with a document in a database through a Content Management System. I want to search on a term in the PDF, but in the results I want the associated document in the database, *not* the PDF. Is there any way to do this? With SolrJ?</p>
<p>Thanks!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: 好记性,不如烂博客! &#124; 做人要豁達大道</title>
		<link>http://www.lucidimagination.com/blog/2009/09/14/posting-rich-documents-to-apache-solr-using-solrj-and-solr-cell-apache-tika/comment-page-1/#comment-4997</link>
		<dc:creator>好记性,不如烂博客! &#124; 做人要豁達大道</dc:creator>
		<pubDate>Fri, 04 Jun 2010 06:23:35 +0000</pubDate>
		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=983#comment-4997</guid>
		<description>[...] http://www.lucidimagination.com/blog/2009/09/14/posting-rich-documents-to-apache-solr-using-solrj-an...    This entry was posted in tips and tricks and tagged tips and tricks. Bookmark the permalink.    &#8592; Puppy Arcade:&#160;Super Game Emulator Collection [...]</description>
		<content:encoded><![CDATA[<p>[...] <a href="http://www.lucidimagination.com/blog/2009/09/14/posting-rich-documents-to-apache-solr-using-solrj-an.." rel="nofollow">http://www.lucidimagination.com/blog/2009/09/14/posting-rich-documents-to-apache-solr-using-solrj-an..</a>.    This entry was posted in tips and tricks and tagged tips and tricks. Bookmark the permalink.    &larr; Puppy Arcade:&nbsp;Super Game Emulator Collection [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: dorai</title>
		<link>http://www.lucidimagination.com/blog/2009/09/14/posting-rich-documents-to-apache-solr-using-solrj-and-solr-cell-apache-tika/comment-page-1/#comment-4081</link>
		<dc:creator>dorai</dc:creator>
		<pubDate>Wed, 02 Dec 2009 20:42:57 +0000</pubDate>
		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=983#comment-4081</guid>
		<description>Grant, could you also provide an example of how to use ContentStreamUpdateRequest to remote-stream a local file (this is marked as a TODO on the wiki page)? I tried the following three options with Solr 1.4, with mixed results:

Option 1:
---------
Specifying both calls:
req.addFile(new File(&quot;/tmp/features.pdf&quot;));
req.setParam(&quot;stream.file&quot;, &quot;/tmp/features.pdf&quot;);
This works, but the dump output shows the ContentStream being added twice, i.e.:
INFO: add {[/tmp/features.pdf, /tmp/features.pdf]}

Option 2:
---------
Leaving out the req.setParam call and only using req.addFile(new File(&quot;/tmp/features.pdf&quot;));
This works, and the dump output shows:
INFO: add [/tmp/features.pdf]
i.e. the ContentStream appears only once. But does the Solr server read the local file directly, or does the SolrJ client send the content over HTTP?

Option 3:
---------
Leaving out the req.addFile call and only specifying:
req.setParam(&quot;stream.file&quot;, &quot;/tmp/features.pdf&quot;);
This results in:
- a Java NullPointerException at CommonsHttpSolrServer.java:381
- no dump output
- indexing failure</description>
		<content:encoded><![CDATA[<p>Grant, could you also provide an example of how to use ContentStreamUpdateRequest to remote-stream a local file (this is marked as a TODO on the wiki page)? I tried the following three options with Solr 1.4, with mixed results:</p>
<p>Option 1:<br />
---------<br />
Specifying both calls:<br />
req.addFile(new File("/tmp/features.pdf"));<br />
req.setParam("stream.file", "/tmp/features.pdf");<br />
This works, but the dump output shows the ContentStream being added twice, i.e.:<br />
INFO: add {[/tmp/features.pdf, /tmp/features.pdf]}</p>
<p>Option 2:<br />
---------<br />
Leaving out the req.setParam call and only using req.addFile(new File("/tmp/features.pdf"));<br />
This works, and the dump output shows:<br />
INFO: add [/tmp/features.pdf]<br />
i.e. the ContentStream appears only once. But does the Solr server read the local file directly, or does the SolrJ client send the content over HTTP?</p>
<p>Option 3:<br />
---------<br />
Leaving out the req.addFile call and only specifying:<br />
req.setParam("stream.file", "/tmp/features.pdf");<br />
This results in:<br />
- a Java NullPointerException at CommonsHttpSolrServer.java:381<br />
- no dump output<br />
- indexing failure</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Grant Ingersoll</title>
		<link>http://www.lucidimagination.com/blog/2009/09/14/posting-rich-documents-to-apache-solr-using-solrj-and-solr-cell-apache-tika/comment-page-1/#comment-4052</link>
		<dc:creator>Grant Ingersoll</dc:creator>
		<pubDate>Mon, 23 Nov 2009 15:05:55 +0000</pubDate>
		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=983#comment-4052</guid>
		<description>SolrJ is a client-side technology. I don&#039;t believe you have to do anything with Tomcat.</description>
		<content:encoded><![CDATA[<p>SolrJ is a client-side technology. I don't believe you have to do anything with Tomcat.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

