NlpLibrary - Brain4it

Index:

nlp-lemmatize nlp-parse nlp-postag nlp-sentences nlp-tokenize nlp-translate

Functions:

nlp-lemmatize

Synopsis

Gets the lemma of a word.

Usage

(nlp-lemmatize word pos "dictionary" => dict_path)

Returns

lemma

Where

word is string: the word whose lemma is to be obtained.
pos is string: the POS (part of speech) tag corresponding to the word.
dict_path is string[0..1]: the path to the dictionary lemmatizer file. When not specified, no lemmatization will be performed.
lemma is string: the lemma of the specified word. When the word is not found in the dictionary, the word itself is returned.

Description

Returns the lemma (canonical form) of the specified word and POS tag.

Exceptions

"FileNotFoundException": when the specified dictionary file is not found.

Examples

(nlp-lemmatize
  "making"
  "VBG"
  "dictionary" => "en-lemmatizer.dict"
)

"make"

nlp-parse

Synopsis

Creates a natural language parse tree.

Usage

(nlp-parse
  sentence
  "parser" => parser_model_path
  "tokenizer" => tokenizer_model_path
  "dictionary" => dict_path
  "parses" => parses
)

Returns

parse_tree

Where

sentence is string: the sentence to parse.
parser_model_path is string[0..1]: the path to the parser model file. When not specified, parser_model_path is assumed to be ${user.home}/opennlp/en-parser-chunking.bin.
See http://opennlp.apache.org documentation for more details.
tokenizer_model_path is string[0..1]: the path to the tokenizer model file. When not specified, tokenizer_model_path is assumed to be ${user.home}/opennlp/en-token.bin.
When tokenizer_model_path is an empty string, a whitespace tokenizer will be used.
dict_path is string[0..1]: the path to the dictionary lemmatizer file. When not specified, no lemmatization will be performed.
parses is number[0..1]: the number of parse trees to generate. By default parses is 1.
parse_tree is list: a list that represents the parse tree.

Description

Returns a natural language parse tree of the given sentence. This tree represents the grammatical structure of the sentence.

Exceptions

"FileNotFoundException": when the specified model file is not found.

Examples

(nlp-parse "My name is John.")

(
  "TOP"
  (
    "S"
    ("NP" ("PRP$" "my") ("NN" "name"))
    ("VP" ("VBZ" "is") ("NP" ("NNP" "John")))
    ("." ".")
  )
)
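
The optional arguments can be combined in a single call. The sketch below asks for two candidate parse trees and supplies a lemmatizer dictionary. The file name is reused from the nlp-lemmatize example and its location is an assumption; the shape of the result when parses is greater than 1 is not described on this page, so no output is shown.

```brain4it
(nlp-parse
  "My name is John."
  "dictionary" => "en-lemmatizer.dict"
  "parses" => 2
)
```

Because "parser" and "tokenizer" are omitted here, the default model paths listed above would apply.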

nlp-postag

Synopsis

Gets the POS tags of a word list.

Usage

(nlp-postag
  (word ... word)
  "pos" => pos_model_path
  "dictionary" => dict_path
)

Returns

(tag_word ... tag_word)

Where

word is string[1..N]: a word whose tag is to be obtained.
pos_model_path is string[0..1]: the path to the POS (part of speech) model file. When not specified, pos_model_path is assumed to be ${user.home}/opennlp/en-pos-maxent.bin.
dict_path is string[0..1]: the path to the dictionary lemmatizer file. When not specified, no lemmatization will be performed.
tag_word is (tag output_word)[1..N]: a pair that contains the POS tag and the output word for an input word.
tag is string: the POS tag.
output_word is string: the input word when dictionary is not specified, otherwise the lemma of the input word.

Description

Returns a list that contains a pair (tag word) for each input word.

Exceptions

"FileNotFoundException": when the specified model file is not found.

Examples

(nlp-postag ("My" "name" "is" "John"))

(
  ("PRP$" "My")
  ("NN" "name")
  ("VBZ" "is")
  ("NNP" "John")
)
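
To obtain lemmas instead of the original words, a "dictionary" can be passed. This is a minimal sketch: the dictionary file name is reused from the nlp-lemmatize example and its location is an assumption. Per the output_word description, each pair should then carry the lemma of the input word rather than the word itself; the exact output depends on the dictionary, so it is not shown.

```brain4it
(nlp-postag
  ("My" "name" "is" "John")
  "dictionary" => "en-lemmatizer.dict"
)
```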

nlp-sentences

Synopsis

Splits text into sentences.

Usage

(nlp-sentences
  text
  "sentence-detector" => sentence_model_path
)

Returns

(sentence ... sentence)

Where

text is string: the text to split into sentences.
sentence_model_path is string[0..1]: the path to the sentence detector model file. When not specified, sentence_model_path is assumed to be ${user.home}/opennlp/en-sent.bin.
When sentence_model_path is an empty string, a newline sentence detector will be used.
sentence is string[1..N]: a sentence extracted from text.

Description

Returns a list that contains the sentences of the specified text.

Exceptions

"FileNotFoundException": when the specified model file is not found.

Examples

(nlp-sentences
  "My name is John. What is your name?"
)

("My name is John." "What is your name?")
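
When no sentence model is available, an empty string selects the newline sentence detector. The sketch below assumes that "\n" is accepted as a newline escape inside a string literal, which this page does not confirm. Under that assumption, the text would be expected to split at the line break rather than at punctuation.

```brain4it
(nlp-sentences
  "My name is John.\nWhat is your name?"
  "sentence-detector" => ""
)
```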

nlp-tokenize

Synopsis

Extracts tokens from text.

Usage

(nlp-tokenize
  text
  "tokenizer" => tokenizer_model_path
)

Returns

(token ... token)

Where

text is string: the text to split into tokens.
tokenizer_model_path is string[0..1]: the path to the tokenizer model file. When not specified, tokenizer_model_path is assumed to be ${user.home}/opennlp/en-token.bin.
When tokenizer_model_path is an empty string, a whitespace tokenizer will be used.
token is string[1..N]: a token extracted from text.

Description

Returns a list that contains the tokens (strings) of the specified text.

Exceptions

"FileNotFoundException": when the specified model file is not found.

Examples

(nlp-tokenize "My name is John.")

("My" "name" "is" "John" ".")
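
Passing an empty string as the "tokenizer" selects the whitespace tokenizer, so no model file is needed. A minimal sketch of that call follows. With a whitespace tokenizer, punctuation would be expected to stay attached to the adjacent word (for example "John."), unlike in the model-based result above.

```brain4it
(nlp-tokenize
  "My name is John."
  "tokenizer" => ""
)
```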

nlp-translate

Synopsis

Translates text.

Usage

(nlp-translate
  (text ... text)
  source
  target
  service
  options
)

Returns

(translation ... translation)

Where

text is string[1..N]: a text to translate.
source is string: the source language (ISO 639-1 code). Some translation services ignore this parameter as they detect the source language automatically.
target is string: the target language (ISO 639-1 code).
service is string: the name of the translation service (for example, "google").
options is list[0..1]: a list containing service specific parameters like credentials, operation mode, etc.
translation is (
  "translated-text" => translated_text
  "detected-language" => detected_language
)[1..N]: a list that contains the translated text and, when available, the detected source language.
translated_text is string: the translation of the corresponding source text.
detected_language is string[0..1]: the detected source language (ISO 639-1 code).

Description

Translates a list of strings to a target language using the specified translation service.

Examples

(nlp-translate
  ("Em dic Ricard." "Como te llamas?")
  null
  "en"
  "google"
  ("key" => "Asd4jgh3hd42hnf9_dgF23k625ghs")
)

(
  (
    "translated-text" => "My name is Ricard."
    "detected-language" => "ca"
  )
  (
    "translated-text" => "What is your name?"
    "detected-language" => "es"
  )
)
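
The source language can also be given explicitly instead of null. In the sketch below, "YOUR_API_KEY" is a placeholder and not a real credential, and whether the service honors the source argument or detects the language anyway depends on the service, as noted in the source parameter description.

```brain4it
(nlp-translate
  ("Em dic Ricard.")
  "ca"
  "en"
  "google"
  ("key" => "YOUR_API_KEY")
)
```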
