CRAN
robotstxt 0.6.2
A 'robots.txt' Parser and 'Webbot'/'Spider'/'Crawler' Permissions Checker
Released Jul 18, 2018 by Peter Meissner
Dependencies
spiderbar 0.2.1, future.apply 1.0.0, stringr 1.3.1, httr 1.3.1, magrittr 1.5, future 1.9.0
Provides functions to download and parse 'robots.txt' files. Ultimately, the package makes it easy to check whether bots (spiders, crawlers, scrapers, ...) are allowed to access specific resources on a domain.
Installation
Maven
This package can be included as a dependency from a Java or Scala project by adding the following to your project's pom.xml file. Read more about embedding Renjin in JVM-based projects.
<dependencies>
  <dependency>
    <groupId>org.renjin.cran</groupId>
    <artifactId>robotstxt</artifactId>
    <version>0.6.2-b1</version>
  </dependency>
</dependencies>
<repositories>
  <repository>
    <id>bedatadriven</id>
    <name>bedatadriven public repo</name>
    <url>https://nexus.bedatadriven.com/content/groups/public/</url>
  </repository>
</repositories>
Renjin CLI
If you're using Renjin from the command line, you can load this library by invoking:
library('org.renjin.cran:robotstxt')
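Once the library is loaded, a permissions check of the kind described above might look like the following. This is a minimal sketch, not a tested Renjin session: it assumes the package's `robotstxt()` constructor and its `check()` method behave as documented for the CRAN release, and the robots.txt rules and paths are made up for illustration. Supplying the rules as text avoids any download.

```r
# Under Renjin, load with library('org.renjin.cran:robotstxt') instead.
library(robotstxt)

# Hypothetical robots.txt content, supplied as text so no download is needed.
rules <- paste(
  "User-agent: *",
  "Disallow: /private/",
  sep = "\n"
)

# Build a robotstxt object from the text rather than from a domain.
rt <- robotstxt(text = rules)

# Ask whether a generic bot ("*") may fetch each of the example paths.
allowed <- rt$check(paths = c("/", "/private/"), bot = "*")
print(allowed)
```

The result is a logical vector with one entry per path. To check a live site instead, the package's `paths_allowed()` function takes a `domain` argument and fetches the robots.txt file itself, which requires network access.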
Test Results
This package was last tested against Renjin 0.9.2657 on Aug 18, 2018.
- testthat
- useragent_extraction.all_robots_txt_files_are_valid_E1
- useragent_extraction.all_robots_txt_files_are_valid_E10
- useragent_extraction.all_robots_txt_files_are_valid_E11
- useragent_extraction.all_robots_txt_files_are_valid_E12
- useragent_extraction.all_robots_txt_files_are_valid_E13
- useragent_extraction.all_robots_txt_files_are_valid_E14
- useragent_extraction.all_robots_txt_files_are_valid_E15
- useragent_extraction.all_robots_txt_files_are_valid_E16
- useragent_extraction.all_robots_txt_files_are_valid_E17
- useragent_extraction.all_robots_txt_files_are_valid_E18
- useragent_extraction.all_robots_txt_files_are_valid_E19
- useragent_extraction.all_robots_txt_files_are_valid_E2
- useragent_extraction.all_robots_txt_files_are_valid_E20
- useragent_extraction.all_robots_txt_files_are_valid_E3
- useragent_extraction.all_robots_txt_files_are_valid_E4
- useragent_extraction.all_robots_txt_files_are_valid_E5
- useragent_extraction.all_robots_txt_files_are_valid_E6
- useragent_extraction.all_robots_txt_files_are_valid_E7
- useragent_extraction.all_robots_txt_files_are_valid_E8
- useragent_extraction.all_robots_txt_files_are_valid_E9