CRAN
textreuse 0.1.4
Detect Text Reuse and Document Similarity
Released Nov 28, 2016 by Lincoln Mullen
Dependencies
stringr 1.0.0, dplyr 0.5.0, RcppProgress 0.3, digest 0.6.10, assertthat 0.1, tidyr 0.6.0, NLP 0.1-9, Rcpp, BH 1.62.0-1
Tools for measuring similarity among documents and detecting passages which have been reused. Implements shingled n-gram, skip n-gram, and other tokenizers; similarity/dissimilarity functions; pairwise comparisons; minhash and locality sensitive hashing algorithms; and a version of the Smith-Waterman local alignment algorithm suitable for natural language.
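To make the tokenizers and similarity functions concrete, here is a minimal Python sketch of word-level shingled n-grams, skip n-grams and Jaccard similarity. This is not the package's R API. The helper names `tokenize_words`, `word_ngrams`, `skip_ngrams` and `jaccard` are hypothetical. The skip n-gram variant shown (a uniform gap of 0 to k words between successive words) is one simple definition, assumed here for illustration.

```python
import re


def tokenize_words(text):
    """Lowercase the text and split it into word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_ngrams(words, n=3):
    """Shingled n-grams: every run of n consecutive words, overlapping by n - 1 words."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def skip_ngrams(words, n=3, k=1):
    """Skip n-grams (assumed variant): n words spaced skip + 1 apart, for every skip in 0..k."""
    grams = []
    for skip in range(k + 1):
        step = skip + 1
        span = (n - 1) * step + 1  # number of words covered by one skip n-gram
        for i in range(len(words) - span + 1):
            grams.append(" ".join(words[i:i + span:step]))
    return grams


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union of two token sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumption: two empty documents are treated as identical
    return len(a & b) / len(a | b)


# Usage: two sentences that differ by one word share 4 of 10 distinct trigrams.
doc_a = word_ngrams(tokenize_words("the quick brown fox jumps over the lazy dog"), 3)
doc_b = word_ngrams(tokenize_words("the quick brown fox leaps over the lazy dog"), 3)
print(jaccard(doc_a, doc_b))  # 0.4
```

Shingling turns each document into a set, so one changed word only perturbs the n shingles that contain it. This is why overlapping n-grams are a common basis for reuse detection.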
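The minhash and locality-sensitive hashing (LSH) approach mentioned in the description can be sketched in Python as follows. This is a generic illustration, not the package's implementation. The helper names are hypothetical. Salting BLAKE2b with an integer seed to simulate many hash functions is an assumption chosen for simplicity. Each document's token set is compressed to a signature of minimum hash values. Signatures are then cut into bands, and only documents that share an identical band are proposed as candidate pairs.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def seeded_hash(token, seed):
    """Deterministic 64-bit hash of a token, salted with an integer seed (assumed scheme)."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_hashes=200):
    """For each of num_hashes seeded hash functions, keep the minimum hash over the token set."""
    unique = set(tokens)
    if not unique:
        raise ValueError("cannot minhash an empty document")
    return [min(seeded_hash(t, seed) for t in unique) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """The fraction of signature positions that agree estimates the Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def lsh_candidates(signatures, bands):
    """Split each signature into bands; documents sharing any identical band become candidates."""
    buckets = defaultdict(list)
    for doc_id, sig in signatures.items():
        if len(sig) % bands != 0:
            raise ValueError("signature length must be divisible by the number of bands")
        rows = len(sig) // bands
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].append(doc_id)
    pairs = set()
    for ids in buckets.values():
        for pair in combinations(sorted(ids), 2):
            pairs.add(pair)
    return sorted(pairs)


def candidate_probability(s, bands, rows):
    """Probability that a pair with Jaccard similarity s shares at least one band."""
    return 1 - (1 - s ** rows) ** bands
```

The band/row split is the main design choice. With b bands of r rows, a pair with Jaccard similarity s becomes a candidate with probability 1 - (1 - s^r)^b. More rows per band make the filter stricter, and more bands make it more permissive. Only candidate pairs then need an exact comparison, which avoids scoring every pair in a large corpus.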
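For the alignment step, here is a minimal Python sketch of plain Smith-Waterman local alignment applied to word sequences. It does not reproduce the package's own natural-language variant. The function name `local_align`, the scores (match 2, mismatch -1, gap -1) and the `-` gap marker are illustrative assumptions.

```python
GAP = "-"  # marker for a word missing from one side (arbitrary choice)


def local_align(a_words, b_words, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment over words; returns (score, aligned_a, aligned_b)."""
    def score(x, y):
        return match if x == y else mismatch

    rows, cols = len(a_words) + 1, len(b_words) + 1
    H = [[0] * cols for _ in range(rows)]  # H[i][j]: best local score ending at a[i-1], b[j-1]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(
                0,  # local alignment: a negative running score restarts at zero
                H[i - 1][j - 1] + score(a_words[i - 1], b_words[j - 1]),
                H[i - 1][j] + gap,  # word in a aligned to a gap in b
                H[i][j - 1] + gap,  # word in b aligned to a gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until the score drops to zero.
    i, j = best_pos
    aligned_a, aligned_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + score(a_words[i - 1], b_words[j - 1]):
            aligned_a.append(a_words[i - 1])
            aligned_b.append(b_words[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            aligned_a.append(a_words[i - 1])
            aligned_b.append(GAP)
            i -= 1
        else:
            aligned_a.append(GAP)
            aligned_b.append(b_words[j - 1])
            j -= 1
    return best, aligned_a[::-1], aligned_b[::-1]
```

Because scores are floored at zero, the alignment isolates the best-matching passage rather than forcing the two whole documents to line up. This is what makes local alignment suited to finding a reused passage embedded in otherwise unrelated text.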
Installation
Maven
This package can be included as a dependency in a Java or Scala project by adding the following to your project's pom.xml file.
Read more about embedding Renjin in JVM-based projects.
<dependencies>
  <dependency>
    <groupId>org.renjin.cran</groupId>
    <artifactId>textreuse</artifactId>
    <version>0.1.4-b15</version>
  </dependency>
</dependencies>
<repositories>
  <repository>
    <id>bedatadriven</id>
    <name>bedatadriven public repo</name>
    <url>https://nexus.bedatadriven.com/content/groups/public/</url>
  </repository>
</repositories>
Renjin CLI
If you're using Renjin from the command line, you can load this library by invoking:
library('org.renjin.cran:textreuse')
Test Results
This package was last tested against Renjin 0.8.2503 on Oct 29, 2017.