CRAN
ngram 3.0.4
Fast n-Gram 'Tokenization'
Released Nov 21, 2017 by Drew Schmidt
An n-gram is a sequence of n "words" taken, in order, from a body of text. This is a collection of utilities for creating, displaying, summarizing, and "babbling" n-grams. The 'tokenization' and "babbling" are handled by very efficient C code, which can even be built as its own standalone library. The babbler is a simple Markov chain. The package also offers a vignette with complete example 'workflows' and information about the utilities offered in the package.
Installation
Maven
This package can be included as a dependency from a Java or Scala project by including
the following your project's pom.xml
file.
Read more
about embedding Renjin in JVM-based projects.
<dependencies> <dependency> <groupId>org.renjin.cran</groupId> <artifactId>ngram</artifactId> <version>3.0.4-b8</version> </dependency> </dependencies> <repositories> <repository> <id>bedatadriven</id> <name>bedatadriven public repo</name> <url>https://nexus.bedatadriven.com/content/groups/public/</url> </repository> </repositories>
Renjin CLI
If you're using Renjin from the command line, you load this library by invoking:
library('org.renjin.cran:ngram')
Test Results
This package was last tested against Renjin 0.9.2644 on Jun 1, 2018.