
Neural Machine Translation

| Title | Description | Status |
| --- | --- | --- |
| Transformer_implementation_and_application | A complete reimplementation of the Transformer model, applied to neural machine translation and to a chatbot. | Continuously updated |
| Tensor2Tensor | TensorFlow library; a very systematic walkthrough of neural machine translation. | Continuously updated |
| Neural Machine Translation (seq2seq) Tutorial | TensorFlow library. | Continuously updated |
| OpenNMT-tf | Harvard's machine translation library. | Continuously updated |
| 习翔宇 | Deep learning, classical machine learning, and NLP algorithms with implementations. | |
| Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention) | Visualizes a neural machine translation model; a very thorough explanation combining animations and video. | |

Neural Machine Translation (seq2seq) Tutorial

WMT English-German — Full Comparison

The first two rows are our models with GNMT attention: model 1 (4 layers) and model 2 (8 layers).

| Systems | newstest2014 (BLEU) | newstest2015 (BLEU) |
| --- | --- | --- |
| Ours — NMT + GNMT attention (4 layers) | 23.7 | 26.5 |
| Ours — NMT + GNMT attention (8 layers) | 24.4 | 27.6 |
| WMT SOTA | 20.6 | 24.9 |
| OpenNMT (Klein et al., 2017) | 19.3 | - |
| tf-seq2seq (Britz et al., 2017) | 22.2 | 25.2 |
| GNMT (Wu et al., 2016) | 24.6 | - |

The above results show our models are very competitive among models of similar architectures.
[Note that OpenNMT uses smaller models, and the current best result (as of this writing) is 28.4 BLEU, obtained by the Transformer network (Vaswani et al., 2017), which has a significantly different architecture.]

Other details for better NMT models

Bidirectional RNNs

Bidirectionality on the encoder side generally gives better performance (with
some degradation in speed as more layers are used). Here, we give a simplified
example of how to build an encoder with a single bidirectional layer:

# Construct forward and backward cells
forward_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
backward_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)

# Run a single bidirectional layer over the embedded source sequence;
# dtype is required here because no initial state is provided.
bi_outputs, encoder_state = tf.nn.bidirectional_dynamic_rnn(
    forward_cell, backward_cell, encoder_emb_inp,
    sequence_length=source_sequence_length,
    dtype=tf.float32, time_major=True)

# Concatenate the forward and backward outputs along the feature dimension.
encoder_outputs = tf.concat(bi_outputs, -1)

The variables encoder_outputs and encoder_state can be used in the same way
as in the Encoder section. Note that, for multiple bidirectional layers, we need to
manipulate encoder_state a bit; see the method _build_bidirectional_rnn() in
model.py for more details, and the sketch below.
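
To give a feel for that manipulation, here is a minimal sketch of a stacked bidirectional encoder whose per-layer forward and backward states are interleaved into a flat tuple. It reuses num_units, encoder_emb_inp and source_sequence_length from the snippet above; the layer count and the interleaving order are assumptions modeled loosely on the tutorial's model.py, not a verbatim excerpt:

# Assumption: the encoder uses 2 bidirectional layers.
num_bi_layers = 2

# One multi-layer cell per direction.
forward_cells = tf.nn.rnn_cell.MultiRNNCell(
    [tf.nn.rnn_cell.BasicLSTMCell(num_units) for _ in range(num_bi_layers)])
backward_cells = tf.nn.rnn_cell.MultiRNNCell(
    [tf.nn.rnn_cell.BasicLSTMCell(num_units) for _ in range(num_bi_layers)])

bi_outputs, bi_encoder_state = tf.nn.bidirectional_dynamic_rnn(
    forward_cells, backward_cells, encoder_emb_inp,
    sequence_length=source_sequence_length,
    dtype=tf.float32, time_major=True)
encoder_outputs = tf.concat(bi_outputs, -1)

# bi_encoder_state is a (forward, backward) pair of per-layer state tuples.
# Interleave them so downstream code sees a flat tuple of 2 * num_bi_layers states.
encoder_state = []
for layer_id in range(num_bi_layers):
    encoder_state.append(bi_encoder_state[0][layer_id])  # forward state of this layer
    encoder_state.append(bi_encoder_state[1][layer_id])  # backward state of this layer
encoder_state = tuple(encoder_state)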

Beam Search

While greedy decoding can give us quite reasonable translation quality, a beam
search decoder can further boost performance. The idea of beam search is to
explore the search space of all possible translations more thoroughly by keeping
a small set of top candidates around as we translate. The size of the beam is
called the beam width; a minimal beam width of, say, 10 is generally sufficient. For
more information, we refer readers to Section 7.2.3 of Neubig (2017). Here's an
example of how beam search can be done:
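
(The sketch below uses the TF 1.x tf.contrib.seq2seq API. decoder_cell, embedding_decoder, start_tokens, end_token, projection_layer and encoder_state are assumed to be built as in the tutorial's decoder sections, and beam_width and maximum_iterations are assumed hyperparameters; treat it as an illustration rather than a drop-in snippet.)

import tensorflow as tf

# Replicate the encoder state beam_width times so every hypothesis
# starts from the same encoder summary.
decoder_initial_state = tf.contrib.seq2seq.tile_batch(
    encoder_state, multiplier=beam_width)

# Define the beam search decoder.
decoder = tf.contrib.seq2seq.BeamSearchDecoder(
    cell=decoder_cell,
    embedding=embedding_decoder,
    start_tokens=start_tokens,
    end_token=end_token,
    initial_state=decoder_initial_state,
    beam_width=beam_width,
    output_layer=projection_layer,
    length_penalty_weight=0.0)

# Run dynamic decoding; outputs.predicted_ids holds the beam hypotheses.
outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(
    decoder, maximum_iterations=maximum_iterations)
translations = outputs.predicted_ids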

OpenNMT-tf

OpenNMT-tf is a general-purpose sequence learning toolkit using TensorFlow. While neural machine translation is the main target task, it has been designed to more generally support:

  • sequence to sequence mapping
  • sequence tagging
  • sequence classification

The project is production-oriented and comes with stability guarantees.

Key features

OpenNMT-tf focuses on modularity to support advanced modeling and training capabilities:

  • arbitrarily complex encoder architectures
    e.g. mixing RNNs, CNNs, self-attention, etc. in parallel or in sequence.
  • hybrid encoder-decoder models
    e.g. self-attention encoder and RNN decoder or vice versa.
  • neural source-target alignment
    train with guided alignment to constrain attention vectors and output alignments as part of the translation API.
  • multi-source training
    e.g. source text and Moses translation as inputs for machine translation.
  • multiple input formats
    text with support of mixed word/character embeddings or real vectors serialized in TFRecord files.
  • on-the-fly tokenization
    apply advanced tokenization dynamically during training and detokenize the predictions during inference or evaluation.
  • domain adaptation
    specialize a model to a new domain in a few training steps by updating the word vocabularies in checkpoints.
  • automatic evaluation
    support for saving evaluation predictions and running external evaluators (e.g. BLEU).
  • mixed precision training
    take advantage of the latest NVIDIA optimizations to train models with half-precision floating points.

All of the above can be used simultaneously to train novel and complex architectures. See the predefined models to discover how they are defined, and the API documentation to customize them.
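
As a taste of what such a model definition looks like, here is a hypothetical sketch of a hybrid model with a self-attention encoder and an RNN decoder, written against the opennmt Python package. The class names (SequenceToSequence, WordEmbedder, SelfAttentionEncoder, AttentionalRNNDecoder) exist in the library, but the exact constructor arguments vary between OpenNMT-tf versions, so consult the predefined models before relying on it:

# my_model.py -- hypothetical OpenNMT-tf model definition (a sketch, not a
# drop-in file): constructor arguments may differ between library versions.
import opennmt

def model():
    return opennmt.models.SequenceToSequence(
        source_inputter=opennmt.inputters.WordEmbedder(embedding_size=512),
        target_inputter=opennmt.inputters.WordEmbedder(embedding_size=512),
        # Hybrid architecture: self-attention encoder paired with an RNN decoder.
        encoder=opennmt.encoders.SelfAttentionEncoder(
            num_layers=6, num_units=512, num_heads=8, ffn_inner_dim=2048),
        decoder=opennmt.decoders.AttentionalRNNDecoder(
            num_layers=4, num_units=512))

Such a file would typically be passed to the onmt-main command line entry point together with a YAML data configuration; the exact flags depend on the installed version, so refer to the OpenNMT-tf documentation.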
