Nlp library

Functions:

nlp-lemmatize

Synopsis
Gets the lemma of a word.
Usage
(nlp-lemmatize word pos "dictionary" => dict_path)
Returns
lemma
Where
  • word is string: the word whose lemma is to be obtained.
  • pos is string: the POS (part of speech) tag corresponding to the word.
  • dict_path is string[0..1]: the path to the dictionary lemmatizer file. When not specified, no lemmatization will be performed.
  • lemma is string: the lemma of the specified word. When the word is not found in the dictionary, the word itself is returned.
Description
Returns the lemma (canonical form) of the specified word, given its POS tag.
Exceptions
  • "FileNotFoundException": when the specified dictionary file is not found.
Examples
(nlp-lemmatize
  "making"
  "VBG"
  "dictionary" => "en-lemmatizer.dict"
)
"make"
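The fallback rule described above (a word missing from the dictionary is returned unchanged, and no dictionary means no lemmatization) can be modelled outside the library. The following Python sketch is not the library's implementation. It assumes the dictionary holds one tab-separated word, POS tag, lemma entry per line, which is the layout used by Apache OpenNLP dictionary lemmatizers; the names load_dictionary and lemmatize are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

Table = Dict[Tuple[str, str], str]


def load_dictionary(lines: Iterable[str]) -> Table:
    """Build a (word, pos) -> lemma table from tab-separated lines (assumed layout)."""
    table: Table = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:  # skip blank or malformed lines
            word, pos, lemma = fields[0], fields[1], fields[2]
            table[(word, pos)] = lemma
    return table


def lemmatize(word: str, pos: str, table: Optional[Table] = None) -> str:
    """Return the lemma for (word, pos), or the word itself when no entry exists."""
    if table is None:  # no dictionary given: no lemmatization is performed
        return word
    return table.get((word, pos), word)


# Illustrative entries only, not taken from a real dictionary file.
sample_lines = ["making\tVBG\tmake", "names\tNNS\tname"]
table = load_dictionary(sample_lines)

print(lemmatize("making", "VBG", table))  # make
print(lemmatize("John", "NNP", table))    # John (not in the table)
print(lemmatize("making", "VBG"))         # making (no dictionary)
```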
nlp-parse

Synopsis
Creates a natural language parse tree.
Usage
(nlp-parse
  sentence
  "parser" => parser_model_path
  "tokenizer" => tokenizer_model_path
  "dictionary" => dict_path
  "parses" => parses
)
Returns
parse_tree
Where
  • sentence is string: the sentence to parse.
  • parser_model_path is string[0..1]: the path to the parser model file. When not specified, parser_model_path is assumed to be ${user.home}/opennlp/en-parser-chunking.bin.
    See the documentation at http://opennlp.apache.org for more details.
  • tokenizer_model_path is string[0..1]: the path to the tokenizer model file. When not specified, tokenizer_model_path is assumed to be ${user.home}/opennlp/en-token.bin.
    When tokenizer_model_path is an empty string a whitespace tokenizer will be used.
  • dict_path is string[0..1]: the path to the dictionary lemmatizer file. When not specified, no lemmatization will be performed.
  • parses is number[0..1]: the number of parse trees to generate. By default parses is 1.
  • parse_tree is list: a list that represents the parse tree.
Description
Returns a natural language parse tree of the given text. This tree represents the grammatical structure of the text.
Exceptions
  • "FileNotFoundException": when the specified model file is not found.
Examples
(nlp-parse "My name is John.")
(
  "TOP"
  (
    "S"
    ("NP" ("PRP$" "my") ("NN" "name"))
    ("VP" ("VBZ" "is") ("NP" ("NNP" "John")))
    ("." ".")
  )
)
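The returned tree is a nested list in which each node starts with its label, followed either by child nodes or, at the lowest level, by a single word. As an illustration of how such a structure can be consumed, the Python sketch below walks a copy of the example tree and collects the (tag, word) pairs at its leaves. It assumes the tree has been transferred into Python as nested lists; the function name leaves is hypothetical and not part of the library.

```python
from typing import Iterator, List, Tuple, Union

Node = List[Union[str, "Node"]]


def leaves(tree: Node) -> Iterator[Tuple[str, str]]:
    """Yield (tag, word) for every node whose only child is a plain string."""
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):
        yield (label, children[0])
        return
    for child in children:
        yield from leaves(child)


# The example parse tree above, rewritten as Python lists.
tree = [
    "TOP",
    [
        "S",
        ["NP", ["PRP$", "my"], ["NN", "name"]],
        ["VP", ["VBZ", "is"], ["NP", ["NNP", "John"]]],
        [".", "."],
    ],
]

print(list(leaves(tree)))
# [('PRP$', 'my'), ('NN', 'name'), ('VBZ', 'is'), ('NNP', 'John'), ('.', '.')]
```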
nlp-postag

Synopsis
Gets the POS tags of a word list.
Usage
(nlp-postag
  (word ... word)
  "pos" => pos_model_path
  "dictionary" => dict_path
)
Returns
(tag_word ... tag_word)
Where
  • word is string[1..N]: the word whose tag is to be obtained.
  • pos_model_path is string[0..1]: the path to the POS (part of speech) model file. When not specified, pos_model_path is assumed to be ${user.home}/opennlp/en-pos-maxent.bin.
  • dict_path is string[0..1]: the path to the dictionary lemmatizer file. When not specified, no lemmatization will be performed.
  • tag_word is (tag output_word)[1..N]
  • tag is string: the POS tag.
  • output_word is string: the input word when dictionary is not specified, otherwise the lemma of the input word.
Description
Returns a list that contains a pair (tag word) for each input word.
Exceptions
  • "FileNotFoundException": when the specified model file is not found.
Examples
(nlp-postag ("My" "name" "is" "John"))
(
  ("PRP$" "My")
  ("NN" "name")
  ("VBZ" "is")
  ("NNP" "John")
)
nlp-sentences

Synopsis
Splits text into sentences.
Usage
(nlp-sentences
  text
  "sentence-detector" => sentence_model_path
)
Returns
(sentence ... sentence)
Where
  • text is string: the text to split into sentences.
  • sentence_model_path is string[0..1]: the path to the sentence detector model file. When not specified, sentence_model_path is assumed to be ${user.home}/opennlp/en-sent.bin.
    When sentence_model_path is an empty string a newline sentence detector will be used.
  • sentence is string[1..N]: a sentence extracted from text.
Description
Returns a list that contains the sentences of the specified text.
Exceptions
  • "FileNotFoundException": when the specified model file is not found.
Examples
(nlp-sentences
  "My name is John. What is your name?"
)
("My name is John." "What is your name?")
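When sentence_model_path is an empty string, the text is split on newlines instead of by the trained model. The Python sketch below models that fallback to show how it differs from the example above. It is an assumption about the behaviour, not the library's code: in particular, trimming surrounding whitespace and dropping empty lines are choices made here for illustration, and the name newline_sentences is hypothetical.

```python
from typing import List


def newline_sentences(text: str) -> List[str]:
    """Treat every non-empty line as one sentence (assumed fallback behaviour)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


text = "My name is John. What is your name?\n\nNice to meet you.\n"
print(newline_sentences(text))
# ['My name is John. What is your name?', 'Nice to meet you.']
```

Under this model the two sentences on the first line stay together, because only line breaks count as sentence boundaries.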
nlp-tokenize

Synopsis
Extracts tokens from text.
Usage
(nlp-tokenize
  text
  "tokenizer" => tokenizer_model_path
)
Returns
(token ... token)
Where
  • text is string: the text to split into tokens.
  • tokenizer_model_path is string[0..1]: the path to the tokenizer model file. When not specified, tokenizer_model_path is assumed to be ${user.home}/opennlp/en-token.bin.
    When tokenizer_model_path is an empty string a whitespace tokenizer will be used.
  • token is string[1..N]: a token extracted from text.
Description
Returns a list that contains the tokens (strings) of the specified text.
Exceptions
  • "FileNotFoundException": when the specified model file is not found.
Examples
(nlp-tokenize "My name is John.")
("My" "name" "is" "John" ".")
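When tokenizer_model_path is an empty string, a whitespace tokenizer replaces the model-based one. The Python sketch below models that fallback with a plain split on whitespace. It is an approximation for illustration, not the library's implementation, and the name whitespace_tokenize is hypothetical.

```python
from typing import List


def whitespace_tokenize(text: str) -> List[str]:
    """Split on runs of whitespace; punctuation stays attached to its word."""
    return text.split()


print(whitespace_tokenize("My name is John."))
# ['My', 'name', 'is', 'John.']
```

Unlike the model-based example above, the final period is not split off into its own token here.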
nlp-translate

Synopsis
Translates text.
Usage
(nlp-translate
  (text ... text)
  source
  target
  service
  options
)
Returns
(translation ... translation)
Where
  • text is string[1..N]: a text to translate.
  • source is string: the source language (ISO 639-1 code). Some translation services ignore this parameter as they detect the source language automatically.
  • target is string: the target language (ISO 639-1 code).
  • service is string: the name of the translation service (for example, "google").
  • options is list[0..1]: a list containing service specific parameters like credentials, operation mode, etc.
  • translation is (
      "translated-text" => translated_text
      "detected-language" => detected_language
    )[1..N]: a list that contains the translated text.
  • translated_text is string: the translation of the corresponding source text.
  • detected_language is string[0..1]: the detected source language (ISO 639-1 code).
Description
Translates a list of strings to a target language using the specified translation service.
Examples
(nlp-translate
  ("Em dic Ricard." "Como te llamas?")
  null
  "en"
  "google"
  ("key" => "Asd4jgh3hd42hnf9_dgF23k625ghs")
)
(
  (
    "translated-text" => "My name is Ricard."
    "detected-language" => "ca"
  )
  (
    "translated-text" => "What is your name?"
    "detected-language" => "es"
  )
)
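Because each translation corresponds to the source text at the same position, the result can be paired back with the input by index. The Python sketch below illustrates this with the example data above. It assumes the result has been transferred into Python as a list of dictionaries; the name pair_translations is hypothetical and not part of the library.

```python
from typing import Dict, List, Optional, Tuple


def pair_translations(
    sources: List[str], translations: List[Dict[str, str]]
) -> List[Tuple[str, str, Optional[str]]]:
    """Return (source, translated text, detected language or None) per input text."""
    if len(sources) != len(translations):
        raise ValueError("expected one translation per source text")
    return [
        (src, item["translated-text"], item.get("detected-language"))
        for src, item in zip(sources, translations)
    ]


sources = ["Em dic Ricard.", "Como te llamas?"]
translations = [
    {"translated-text": "My name is Ricard.", "detected-language": "ca"},
    {"translated-text": "What is your name?", "detected-language": "es"},
]

for row in pair_translations(sources, translations):
    print(row)
# ('Em dic Ricard.', 'My name is Ricard.', 'ca')
# ('Como te llamas?', 'What is your name?', 'es')
```

The detected language is read with a default of None because detected_language is optional in the result.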