
Neural Machine Translation

| Title | Description | Status |
| --- | --- | --- |
| Transformer_implementation_and_application | A complete reimplementation of the Transformer model, applied to neural machine translation and to a chatbot. | Continuously updated |
| Tensor2Tensor | TensorFlow library; a very systematic walkthrough of neural machine translation. | Continuously updated |
| Neural Machine Translation (seq2seq) Tutorial | TensorFlow library. | Continuously updated |
| OpenNMT-tf | Harvard's machine translation library. | Continuously updated |
| 习翔宇 | Deep learning, classical machine learning, and NLP algorithms with implementations. | |
| Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention) | Visualizes a neural machine translation model; a very thorough explanation combining animations and video. | |

Neural Machine Translation (seq2seq) Tutorial

WMT English-German — Full Comparison

The first two rows are our models with GNMT attention: model 1 (4 layers) and model 2 (8 layers).

| Systems | newstest2014 (BLEU) | newstest2015 (BLEU) |
| --- | --- | --- |
| Ours — NMT + GNMT attention (4 layers) | 23.7 | 26.5 |
| Ours — NMT + GNMT attention (8 layers) | 24.4 | 27.6 |
| WMT SOTA | 20.6 | 24.9 |
| OpenNMT (Klein et al., 2017) | 19.3 | - |
| tf-seq2seq (Britz et al., 2017) | 22.2 | 25.2 |
| GNMT (Wu et al., 2016) | 24.6 | - |

The above results show our models are very competitive among models of similar architectures.
[Note that OpenNMT uses smaller models, and the current best result (as of this writing) is 28.4 BLEU, obtained by the Transformer network (Vaswani et al., 2017), which has a significantly different architecture.]

Other details for better NMT models

Bidirectional RNNs

Bidirectionality on the encoder side generally gives better performance (with
some degradation in speed as more layers are used). Here, we give a simplified
example of how to build an encoder with a single bidirectional layer:

# Construct forward and backward cells
forward_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
backward_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)

# Run a single bidirectional layer over the embedded source sequence;
# dtype is required here because no initial state is provided.
bi_outputs, encoder_state = tf.nn.bidirectional_dynamic_rnn(
    forward_cell, backward_cell, encoder_emb_inp,
    sequence_length=source_sequence_length,
    dtype=tf.float32, time_major=True)

# Concatenate the forward and backward outputs along the feature dimension.
encoder_outputs = tf.concat(bi_outputs, -1)

The variables encoder_outputs and encoder_state can be used in the same way
as in the Encoder section. Note that, for multiple bidirectional layers, we need to
manipulate encoder_state a bit; see the method _build_bidirectional_rnn() in
model.py for more details, and the sketch below.
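
To give a feel for that manipulation, here is a minimal sketch of a stacked bidirectional encoder whose per-layer forward and backward states are interleaved into a flat tuple. It reuses num_units, encoder_emb_inp and source_sequence_length from the snippet above; the layer count and the interleaving order are assumptions modeled loosely on the tutorial's model.py, not a verbatim excerpt:

# Assumption: the encoder uses 2 bidirectional layers.
num_bi_layers = 2

# One multi-layer cell per direction.
forward_cells = tf.nn.rnn_cell.MultiRNNCell(
    [tf.nn.rnn_cell.BasicLSTMCell(num_units) for _ in range(num_bi_layers)])
backward_cells = tf.nn.rnn_cell.MultiRNNCell(
    [tf.nn.rnn_cell.BasicLSTMCell(num_units) for _ in range(num_bi_layers)])

bi_outputs, bi_encoder_state = tf.nn.bidirectional_dynamic_rnn(
    forward_cells, backward_cells, encoder_emb_inp,
    sequence_length=source_sequence_length,
    dtype=tf.float32, time_major=True)
encoder_outputs = tf.concat(bi_outputs, -1)

# bi_encoder_state is a (forward, backward) pair of per-layer state tuples.
# Interleave them so downstream code sees a flat tuple of 2 * num_bi_layers states.
encoder_state = []
for layer_id in range(num_bi_layers):
    encoder_state.append(bi_encoder_state[0][layer_id])  # forward state of this layer
    encoder_state.append(bi_encoder_state[1][layer_id])  # backward state of this layer
encoder_state = tuple(encoder_state)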

Beam Search

While greedy decoding can give us quite reasonable translation quality, a beam
search decoder can further boost performance. The idea of beam search is to
explore the search space of all possible translations more thoroughly by keeping
a small set of top candidates around as we translate. The size of the beam is
called the beam width; a minimal beam width of, say, 10 is generally sufficient. For
more information, we refer readers to Section 7.2.3 of Neubig (2017). Here's an
example of how beam search can be done:
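
(The sketch below uses the TF 1.x tf.contrib.seq2seq API. decoder_cell, embedding_decoder, start_tokens, end_token, projection_layer and encoder_state are assumed to be built as in the tutorial's decoder sections, and beam_width and maximum_iterations are assumed hyperparameters; treat it as an illustration rather than a drop-in snippet.)

import tensorflow as tf

# Replicate the encoder state beam_width times so every hypothesis
# starts from the same encoder summary.
decoder_initial_state = tf.contrib.seq2seq.tile_batch(
    encoder_state, multiplier=beam_width)

# Define the beam search decoder.
decoder = tf.contrib.seq2seq.BeamSearchDecoder(
    cell=decoder_cell,
    embedding=embedding_decoder,
    start_tokens=start_tokens,
    end_token=end_token,
    initial_state=decoder_initial_state,
    beam_width=beam_width,
    output_layer=projection_layer,
    length_penalty_weight=0.0)

# Run dynamic decoding; outputs.predicted_ids holds the beam hypotheses.
outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(
    decoder, maximum_iterations=maximum_iterations)
translations = outputs.predicted_ids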

OpenNMT-tf

OpenNMT-tf is a general-purpose sequence learning toolkit using TensorFlow. While neural machine translation is the main target task, it has been designed to more generally support:

  • sequence to sequence mapping
  • sequence tagging
  • sequence classification

The project is production-oriented and comes with stability guarantees.

Key features

OpenNMT-tf focuses on modularity to support advanced modeling and training capabilities:

  • arbitrarily complex encoder architectures
    e.g. mixing RNNs, CNNs, self-attention, etc. in parallel or in sequence.
  • hybrid encoder-decoder models
    e.g. self-attention encoder and RNN decoder or vice versa.
  • neural source-target alignment
    train with guided alignment to constrain attention vectors and output alignments as part of the translation API.
  • multi-source training
    e.g. source text and Moses translation as inputs for machine translation.
  • multiple input formats
    text with support of mixed word/character embeddings or real vectors serialized in TFRecord files.
  • on-the-fly tokenization
    apply advanced tokenization dynamically during training and detokenize the predictions during inference or evaluation.
  • domain adaptation
    specialize a model to a new domain in a few training steps by updating the word vocabularies in checkpoints.
  • automatic evaluation
    support for saving evaluation predictions and running external evaluators (e.g. BLEU).
  • mixed precision training
    take advantage of the latest NVIDIA optimizations to train models with half-precision floating points.

All of the above can be used simultaneously to train novel and complex architectures. See the predefined models to discover how they are defined, and the API documentation to customize them.
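
As a taste of what such a model definition looks like, here is a hypothetical sketch of a hybrid model with a self-attention encoder and an RNN decoder, written against the opennmt Python package. The class names (SequenceToSequence, WordEmbedder, SelfAttentionEncoder, AttentionalRNNDecoder) exist in the library, but the exact constructor arguments vary between OpenNMT-tf versions, so consult the predefined models before relying on it:

# my_model.py -- hypothetical OpenNMT-tf model definition (a sketch, not a
# drop-in file): constructor arguments may differ between library versions.
import opennmt

def model():
    return opennmt.models.SequenceToSequence(
        source_inputter=opennmt.inputters.WordEmbedder(embedding_size=512),
        target_inputter=opennmt.inputters.WordEmbedder(embedding_size=512),
        # Hybrid architecture: self-attention encoder paired with an RNN decoder.
        encoder=opennmt.encoders.SelfAttentionEncoder(
            num_layers=6, num_units=512, num_heads=8, ffn_inner_dim=2048),
        decoder=opennmt.decoders.AttentionalRNNDecoder(
            num_layers=4, num_units=512))

Such a file would typically be passed to the onmt-main command line entry point together with a YAML data configuration; the exact flags depend on the installed version, so refer to the OpenNMT-tf documentation.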
