Hi all, I just pushed a minor change to Bixo, where the FetcherPolicy now has an isTerminated() method. By default this just checks the crawl end time, and if
Hi Brian, ... You should be able to use the Cascading JDBCTap to directly insert values into your DB. See the Bixo SimpleCrawlTool for an example of using the
... I just committed a change to Bixo's master branch on GitHub, where I refactored the robots.txt processing code. You can now customize the
... Hi Ken, You clarified a lot of things for me with this. I will try to follow the rules as much as possible now that you have explained to me how to do it. But also will be