
AhoCorasickTrie 0.1.0

Fast Searching for Multiple Keywords in Multiple Texts

Released Jul 29, 2016 by Matt Chambers [aut, cre], Tomas Petricek [aut, cph], Vanderbilt University [cph]

This package can be loaded by Renjin but there was an error compiling C/FORTRAN sources and all tests failed.

Dependencies

Rcpp

Aho-Corasick is an optimal algorithm for finding many keywords in a text. It can locate all matches in a text in O(N+M) time; i.e., the time needed scales linearly with the combined length of the keywords (N) and the size of the text (M), plus the number of matches reported. Compare this to the naive approach, which takes O(N*M) time to loop through each pattern and scan for it in the text. This implementation builds the trie (the generic name of the data structure) and runs the search in a single function call. To search multiple texts with the same trie, pass a list or vector of texts; the function returns a list of matches for each text. By default, all 128 ASCII characters are allowed in both the keywords and the text. A more efficient trie is possible if the alphabet size can be reduced. For example, DNA sequences use at most 19 distinct characters and usually only 4; protein sequences use at most 26 distinct characters and usually only 20. UTF-8 (Unicode) matching is not currently supported.
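To make the description above concrete, here is a minimal Python sketch of the Aho-Corasick technique: build a trie with failure links once, then scan each text in a single pass. It is an illustration only and not this package's C++ implementation. The names `build_trie`, `search`, and `search_many` are hypothetical, transitions are stored in dictionaries rather than in a fixed-size alphabet table, empty keywords are skipped, and match offsets are 0-based in this sketch.

```python
from collections import deque


def build_trie(keywords):
    """Build the goto, failure, and output tables for a set of keywords."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # fail[state] -> state for the longest proper suffix in the trie
    out = [[]]    # out[state] -> keywords that end at this state

    # Phase 1: insert every keyword into the trie.
    for word in keywords:
        if not word:
            continue  # assumption: empty keywords are ignored
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(word)

    # Phase 2: breadth-first pass that fills in failure links level by level.
    queue = deque(goto[0].values())  # depth-1 states keep fail = 0 (the root)
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit keywords that end at the failure target.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search(trie, text):
    """Return (keyword, start_offset) pairs for every match in text."""
    goto, fail, out = trie
    matches = []
    state = 0
    for i, ch in enumerate(text):
        # Follow failure links until a transition on ch exists or we hit the root.
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            matches.append((word, i - len(word) + 1))
    return matches


def search_many(keywords, texts):
    """Build the trie once and reuse it for every text."""
    trie = build_trie(keywords)
    return [search(trie, text) for text in texts]
```

For example, `search_many(["he", "she", "his", "hers"], ["ushers", "this"])` builds one trie and returns one list of matches per text. A smaller alphabet helps an array-based trie because each state then needs fewer transition slots; the dictionary-based sketch here sidesteps that trade-off at the cost of hashing each character.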

Installation

Maven

This package can be included as a dependency from a Java or Scala project by including the following in your project's pom.xml file. Read more about embedding Renjin in JVM-based projects.

<dependencies>
  <dependency>
    <groupId>org.renjin.cran</groupId>
    <artifactId>AhoCorasickTrie</artifactId>
    <version>0.1.0-b14</version>
  </dependency>
</dependencies>
<repositories>
  <repository>
    <id>bedatadriven</id>
    <name>bedatadriven public repo</name>
    <url>https://nexus.bedatadriven.com/content/groups/public/</url>
  </repository>
</repositories>


Renjin CLI

If you're using Renjin from the command line, you can load this library by invoking:

library('org.renjin.cran:AhoCorasickTrie')

Test Results

This package was last tested against Renjin 0.8.2442 on Sep 25, 2017.

Source

R
C++

