CRAN

textreuse 0.1.4

Detect Text Reuse and Document Similarity

Released Nov 28, 2016 by Lincoln Mullen

This package can be loaded by Renjin, but 12 out of 14 tests failed.

Dependencies

stringr 1.0.0, dplyr 0.5.0, RcppProgress 0.3, digest 0.6.10, assertthat 0.1, tidyr 0.6.0, NLP 0.1-9, Rcpp, BH 1.62.0-1

Tools for measuring similarity among documents and detecting passages which have been reused. Implements shingled n-gram, skip n-gram, and other tokenizers; similarity/dissimilarity functions; pairwise comparisons; minhash and locality sensitive hashing algorithms; and a version of the Smith-Waterman local alignment algorithm suitable for natural language.
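The description above packs in several techniques. The following standalone Java sketch illustrates two of them under stated assumptions: word 3-gram shingles compared with Jaccard similarity, and a minhash signature whose agreement rate estimates that similarity. Java is used only because of the JVM setting of this page; the sketch does not call textreuse or Renjin. The class name `ShingleSimilarity`, the lowercase and split-on-non-alphanumerics tokenization rule, and the `(a*x + b) mod (2^31 - 1)` hash family are hypothetical choices for illustration, not how textreuse implements these steps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Random;
import java.util.Set;

// Illustrative sketch only: this is NOT the textreuse API (which is R code).
// It shows word n-gram shingling, exact Jaccard similarity, and a minhash
// estimate of Jaccard similarity built from random linear hash functions.
public class ShingleSimilarity {

    // Mersenne prime 2^31 - 1, the modulus of the hash family (an assumption of this sketch).
    static final long PRIME = 2147483647L;

    // Lowercase the text, split it into words on runs of non-letter/non-digit
    // characters, and return the set of n consecutive words ("shingles").
    public static Set<String> shingles(String text, int n) {
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        Set<String> result = new LinkedHashSet<>();
        for (int i = 0; i + n <= words.size(); i++) {
            result.add(String.join(" ", words.subList(i, i + n)));
        }
        return result;
    }

    // |A ∩ B| / |A ∪ B|; two empty sets get similarity 0 by this sketch's convention.
    public static double jaccard(Set<String> a, Set<String> b) {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        return (double) intersection.size() / union.size();
    }

    // Draw k hash functions h_i(x) = (a_i * x + b_i) mod PRIME from a seeded RNG.
    public static long[][] hashCoefficients(int k, long seed) {
        Random rng = new Random(seed);
        long[] a = new long[k];
        long[] b = new long[k];
        for (int i = 0; i < k; i++) {
            a[i] = 1 + rng.nextInt((int) (PRIME - 1)); // 1 .. PRIME-1
            b[i] = rng.nextInt((int) PRIME);           // 0 .. PRIME-1
        }
        return new long[][] {a, b};
    }

    // Minhash signature: for each hash function, the smallest hash value over all shingles.
    public static long[] minhash(Set<String> shingles, long[] a, long[] b) {
        long[] signature = new long[a.length];
        Arrays.fill(signature, Long.MAX_VALUE);
        for (String s : shingles) {
            long x = (s.hashCode() & 0x7fffffffL) % PRIME; // non-negative base hash
            for (int i = 0; i < a.length; i++) {
                long h = (a[i] * x + b[i]) % PRIME; // no overflow: both factors < 2^31
                if (h < signature[i]) {
                    signature[i] = h;
                }
            }
        }
        return signature;
    }

    // The fraction of positions where two signatures agree estimates their Jaccard similarity.
    public static double estimateJaccard(long[] s1, long[] s2) {
        int same = 0;
        for (int i = 0; i < s1.length; i++) {
            if (s1[i] == s2[i]) {
                same++;
            }
        }
        return (double) same / s1.length;
    }

    public static void main(String[] args) {
        Set<String> d1 = shingles("The quick brown fox jumps over the lazy dog", 3);
        Set<String> d2 = shingles("The quick brown fox leaps over the lazy dog", 3);
        long[][] c = hashCoefficients(200, 42L);
        // 4 shared trigrams out of 10 distinct ones: prints 0.400
        System.out.printf(Locale.ROOT, "exact Jaccard: %.3f%n", jaccard(d1, d2));
        System.out.printf(Locale.ROOT, "minhash estimate: %.3f%n",
                estimateJaccard(minhash(d1, c[0], c[1]), minhash(d2, c[0], c[1])));
    }
}
```

The point of the minhash step is that each document is reduced to a fixed-length signature (here 200 numbers), so many documents can be compared without keeping every shingle; locality sensitive hashing then buckets bands of such signatures to find candidate pairs. The package's own tokenizers and minhash functions are R functions, and their documentation, not this sketch, describes their actual behavior.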

Installation

Maven

This package can be included as a dependency in a Java or Scala project by adding the following to your project's pom.xml file. Read more about embedding Renjin in JVM-based projects.

<dependencies>
  <dependency>
    <groupId>org.renjin.cran</groupId>
    <artifactId>textreuse</artifactId>
    <version>0.1.4-b15</version>
  </dependency>
</dependencies>
<repositories>
  <repository>
    <id>bedatadriven</id>
    <name>bedatadriven public repo</name>
    <url>https://nexus.bedatadriven.com/content/groups/public/</url>
  </repository>
</repositories>

Renjin CLI

If you're using Renjin from the command line, you can load this library by invoking:

library('org.renjin.cran:textreuse')

Test Results

This package was last tested against Renjin 0.8.2503 on Oct 29, 2017.

Source

R
C++

Release History