tools/embeddings.lua

embeddings.lua options:

  • -h [] (default: false)
    This help.
  • -md [] (default: false)
    Dump help in Markdown format.
  • -config (default: '')
    Load options from this file.
  • -save_config (default: '')
    Save options to this file.

Data options

  • -dict_file (required)
    Path to the dictionary file generated by preprocess.lua.
  • -embed_file (default: '')
    Path to the embedding file. Ignored if -lang is used.
  • -save_data (required)
    Output file path/label.
  • -save_unknown_dict (default: '')
    Path to a file in which to save vocabulary entries not found in the embedding file.
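
The role of -save_unknown_dict can be illustrated with a minimal Python sketch. It is not the tool's implementation: the function name split_vocab is hypothetical, and the dictionary layout (one entry per line, token first, followed by whitespace-separated fields) is an assumption about the file produced by preprocess.lua.

```python
def split_vocab(dict_lines, embedding_words):
    """Split dictionary tokens into those found in the embeddings and those not.

    dict_lines: iterable of lines, assumed to start with the token.
    embedding_words: set of words present in the embedding file.
    Dictionary order is preserved in both returned lists.
    """
    found, unknown = [], []
    for line in dict_lines:
        fields = line.split()
        if not fields:
            continue  # skip empty lines
        word = fields[0]
        if word in embedding_words:
            found.append(word)
        else:
            unknown.append(word)
    return found, unknown


# Hypothetical dictionary content and embedding vocabulary.
dict_lines = ["<blank> 1", "<unk> 2", "the 3", "zyzzyva 4"]
found, unknown = split_vocab(dict_lines, {"the", "cat"})
print(found)
print(unknown)
```

In this sketch, the `unknown` list corresponds to what would be written to the file named by -save_unknown_dict.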

Embedding options

  • -lang (default: '')
    Wikipedia language code for which to autoload embeddings.
  • -embed_type (accepted: word2vec-bin, word2vec-txt, glove; default: word2vec-bin)
    Source format of the embedding file. Ignored if -lang is used.
  • -normalize [] (default: true)
    Whether to normalize the word vectors.
  • -approximate [] (default: false)
    If set, also look for variants (case, joiner annotation) when matching dictionary entries to word embeddings.
  • -report_every (default: 100000)
    Print statistics every this many lines read from the embedding file.
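
What -normalize and -approximate might amount to can be sketched in Python. This illustrates the ideas only and is not the tool's code. It assumes that normalization means scaling each vector to unit L2 norm, that the joiner marker is the character U+FFED, and that variants are tried in the order shown. The names l2_normalize and lookup are hypothetical.

```python
import math

JOINER = "\uffed"  # assumed joiner annotation marker


def l2_normalize(vec):
    """Scale a vector to unit L2 norm; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def lookup(word, table, approximate=False):
    """Return the vector for word, optionally trying case and joiner variants."""
    if word in table:
        return table[word]
    if not approximate:
        return None
    stripped = word.strip(JOINER)  # drop joiner markers at either end
    for variant in (stripped, word.lower(), stripped.lower()):
        if variant in table:
            return table[variant]
    return None


# Hypothetical embedding table with a single lowercase entry.
table = {"hello": [3.0, 4.0]}
exact = lookup(JOINER + "Hello", table)                    # no exact match
approx = lookup(JOINER + "Hello", table, approximate=True)  # matches "hello"
unit = l2_normalize(approx)
print(exact, approx, unit)
```

Without approximate matching, the token in this sketch would be reported as unknown; with it, the token reuses the vector of its lowercase, joiner-free form.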

Logger options

  • -log_file (default: '')
    Output logs to a file under this path instead of stdout. If the file name ends with json, output structured JSON.
  • -disable_logs [] (default: false)
    If set, output nothing.
  • -log_level (accepted: DEBUG, INFO, WARNING, ERROR, NOERROR; default: INFO)
    Output logs at this level and above.
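
The "this level and above" rule for -log_level can be sketched as a simple threshold comparison. This is an illustration, not the logger's implementation. It assumes the severity order DEBUG < INFO < WARNING < ERROR and leaves out NOERROR, whose behavior is not described here. The name should_log is hypothetical.

```python
# Assumed severity order, lowest to highest.
LEVELS = ["DEBUG", "INFO", "WARNING", "ERROR"]


def should_log(message_level, threshold="INFO"):
    """Return True if a message at message_level passes the threshold."""
    return LEVELS.index(message_level) >= LEVELS.index(threshold)


print(should_log("DEBUG"))             # below the default INFO threshold
print(should_log("WARNING"))           # at or above INFO
print(should_log("INFO", "WARNING"))   # below a WARNING threshold
```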