
robotstxt 0.6.2

A 'robots.txt' Parser and 'Webbot'/'Spider'/'Crawler' Permissions Checker

Released Jul 18, 2018 by Peter Meissner

This package can be loaded by Renjin, but 1 out of 21 tests failed.

Dependencies

spiderbar 0.2.1, future.apply 1.0.0, stringr 1.3.1, httr 1.3.1, magrittr 1.5, future 1.9.0

Provides functions to download and parse 'robots.txt' files. Ultimately the package makes it easy to check if bots (spiders, crawlers, scrapers, ...) are allowed to access specific resources on a domain.
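For example, a minimal sketch of the package's workflow in GNU R (robotstxt(), its $check() method, and paths_allowed() are part of the package's documented interface; the domain and paths here are purely illustrative):

library(robotstxt)

# download and parse a domain's robots.txt, then check specific paths
rtxt <- robotstxt(domain = "wikipedia.org")
rtxt$check(paths = c("/w/", "/wiki/"), bot = "*")

# or use the convenience function directly
paths_allowed(paths = "/wiki/Robot", domain = "wikipedia.org", bot = "*")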

Installation

Maven

This package can be included as a dependency from a Java or Scala project by including the following in your project's pom.xml file. Read more about embedding Renjin in JVM-based projects.

<dependencies>
  <dependency>
    <groupId>org.renjin.cran</groupId>
    <artifactId>robotstxt</artifactId>
    <version>0.6.2-b1</version>
  </dependency>
</dependencies>
<repositories>
  <repository>
    <id>bedatadriven</id>
    <name>bedatadriven public repo</name>
    <url>https://nexus.bedatadriven.com/content/groups/public/</url>
  </repository>
</repositories>


Renjin CLI

If you're using Renjin from the command line, you load this library by invoking:

library('org.renjin.cran:robotstxt')
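Once loaded, the package's functions can be called as in GNU R; for example (a sketch with an illustrative domain):

paths_allowed(paths = "/search", domain = "example.com", bot = "*")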

Test Results

This package was last tested against Renjin 0.9.2657 on Aug 18, 2018.