- Neural Machine Translation (seq2seq) Tutorial — TensorFlow repository; continuously updated
- Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention) — visualizes the neural machine translation model with a combination of animated diagrams and video; a very thorough explanation
| Systems | newstest2014 (BLEU) | newstest2015 (BLEU) |
| --- | --- | --- |
| Ours — NMT + GNMT attention (4 layers) | 23.7 | 26.5 |
| Ours — NMT + GNMT attention (8 layers) | 24.4 | 27.6 |
| OpenNMT (Klein et al., 2017) | 19.3 | - |
| tf-seq2seq (Britz et al., 2017) | 22.2 | 25.2 |
| GNMT (Wu et al., 2016) | 24.6 | - |
The above results show that our models are very competitive among models of similar architectures.
[Note that OpenNMT uses smaller models, and that the current best result (as of this writing) is 28.4, obtained by the Transformer network (Vaswani et al., 2017), which has a significantly different architecture.]
Bidirectionality on the encoder side generally gives better performance (with
some degradation in speed as more layers are used). Here, we give a simplified
example of how to build an encoder with a single bidirectional layer:
# Construct forward and backward cells
forward_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
backward_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)

# Run the bidirectional RNN over the embedded source sentence;
# bi_outputs is a (forward, backward) pair of output tensors.
bi_outputs, encoder_state = tf.nn.bidirectional_dynamic_rnn(
    forward_cell, backward_cell, encoder_emb_inp,
    sequence_length=source_sequence_lengths, time_major=True)
encoder_outputs = tf.concat(bi_outputs, -1)
The variables encoder_outputs and encoder_state can be used in the same way
as in Section Encoder. Note that for multiple bidirectional layers, we need to
manipulate encoder_state a bit; see model.py, method
_build_bidirectional_rnn(), for more details.
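The state manipulation that _build_bidirectional_rnn() performs can be sketched in plain Python. This is a sketch, not the tutorial's exact code; it assumes the bidirectional state arrives as a (forward, backward) pair of per-layer states, which are interleaved so that downstream code sees twice as many ordinary uni-directional layers:

```python
def interleave_bidirectional_state(bi_encoder_state, num_bi_layers):
    """Alternate forward/backward layer states so downstream code sees
    2 * num_bi_layers ordinary (uni-directional) layer states."""
    fw_states, bw_states = bi_encoder_state
    encoder_state = []
    for layer_id in range(num_bi_layers):
        encoder_state.append(fw_states[layer_id])  # forward state of this layer
        encoder_state.append(bw_states[layer_id])  # backward state of this layer
    return tuple(encoder_state)

# Toy check with placeholder "states":
print(interleave_bidirectional_state((("f0", "f1"), ("b0", "b1")), 2))
# ('f0', 'b0', 'f1', 'b1')
```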
While greedy decoding can give us quite reasonable translation quality, a beam
search decoder can further boost performance. The idea of beam search is to
better explore the search space of all possible translations by keeping around a
small set of top candidates as we translate. The size of the beam is called the
beam width; a modest beam width of, say, 10 is generally sufficient. For
more information, we refer readers to Section 7.2.3 of Neubig (2017). Here is
an example of how beam search can be done:
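In the tutorial's code this is handled by tf.contrib.seq2seq.BeamSearchDecoder; to make the idea itself concrete, here is a toy, self-contained sketch of beam search in plain Python. The scoring table is a hypothetical stand-in for the decoder's per-step softmax (all tokens and probabilities are illustrative):

```python
import math

# Toy stand-in for the decoder's softmax: maps a prefix of tokens to
# the log-probability of each possible next token.
STEP_LOG_PROBS = {
    (): {"a": math.log(0.6), "b": math.log(0.4)},
    ("a",): {"c": math.log(0.5), "</s>": math.log(0.5)},
    ("b",): {"</s>": math.log(1.0)},
    ("a", "c"): {"</s>": math.log(1.0)},
}

def beam_search(beam_width, max_len=3, eos="</s>"):
    # Each hypothesis is (tokens, cumulative log-probability).
    beams = [((), 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for tok, lp in STEP_LOG_PROBS.get(tokens, {}).items():
                hyp = (tokens + (tok,), score + lp)
                (finished if tok == eos else candidates).append(hyp)
        # Keep only the best `beam_width` unfinished hypotheses.
        beams = sorted(candidates, key=lambda h: h[1], reverse=True)[:beam_width]
        if not beams:
            break
    return max(finished, key=lambda h: h[1])

tokens, log_prob = beam_search(beam_width=2)
print(tokens, round(math.exp(log_prob), 2))  # ('b', '</s>') 0.4
```

Note how this beats greedy decoding here: a greedy decoder commits to "a" (probability 0.6) at the first step and can finish with total probability at most 0.3, while keeping the runner-up "b" in the beam recovers the globally better hypothesis with probability 0.4.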
OpenNMT-tf is a general-purpose sequence learning toolkit built on TensorFlow. While neural machine translation is the main target task, it has been designed to more generally support:
- sequence to sequence mapping
- sequence tagging
- sequence classification
The project is production-oriented and comes with stability guarantees.
OpenNMT-tf focuses on modularity to support advanced modeling and training capabilities:
- arbitrarily complex encoder architectures
e.g. mixing RNNs, CNNs, self-attention, etc. in parallel or in sequence.
- hybrid encoder-decoder models
e.g. self-attention encoder and RNN decoder or vice versa.
- neural source-target alignment
train with guided alignment to constrain attention vectors and output alignments as part of the translation API.
- multi-source training
e.g. source text and Moses translation as inputs for machine translation.
- multiple input formats
text with support for mixed word/character embeddings, or real-valued vectors serialized in TFRecord files.
- on-the-fly tokenization
apply advanced tokenization dynamically during training and detokenize the predictions during inference or evaluation.
- domain adaptation
specialize a model to a new domain in a few training steps by updating the word vocabularies in checkpoints.
- automatic evaluation
support for saving evaluation predictions and running external evaluators (e.g. BLEU).
- mixed precision training
take advantage of the latest NVIDIA optimizations to train models with half-precision floating point.
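As an illustration of how such a run is typically configured, here is a minimal sketch of an OpenNMT-tf data configuration. The file names and paths are hypothetical, and the exact schema should be checked against the OpenNMT-tf documentation:

```yaml
# Hypothetical data configuration for an OpenNMT-tf training run.
model_dir: run/                     # where checkpoints and logs are written
data:
  source_vocabulary: src-vocab.txt  # illustrative paths, not real files
  target_vocabulary: tgt-vocab.txt
  train_features_file: src-train.txt
  train_labels_file: tgt-train.txt
  eval_features_file: src-val.txt
  eval_labels_file: tgt-val.txt
```

Training would then be launched with the `onmt-main` command-line entry point, e.g. `onmt-main --model_type Transformer --config data.yml --auto_config train --with_eval` (again a sketch; consult the project's quickstart for the current CLI).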