Title | Description | Status |
---|---|---|
Transformer_implementation_and_application | A complete re-implementation of the Transformer model, applied to neural machine translation and a chatbot. | Continuously updated |
Tensor2Tensor | TensorFlow library; a very systematic walkthrough of neural machine translation | Continuously updated |
Neural Machine Translation (seq2seq) Tutorial | TensorFlow library | Continuously updated |
OpenNMT-tf | Harvard's machine translation toolkit | Continuously updated |
习翔宇 | Deep learning, traditional machine learning, and NLP algorithms with implementations | |
Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention) | Visualizes a neural machine translation model with animated figures and video; a very thorough explanation | |
Neural Machine Translation (seq2seq) Tutorial
WMT English-German — Full Comparison
The first 2 rows are our models with GNMT attention: model 1 (4 layers), model 2 (8 layers).
Systems | newstest2014 (BLEU) | newstest2015 (BLEU) |
---|---|---|
Ours — NMT + GNMT attention (4 layers) | 23.7 | 26.5 |
Ours — NMT + GNMT attention (8 layers) | 24.4 | 27.6 |
WMT SOTA | 20.6 | 24.9 |
OpenNMT (Klein et al., 2017) | 19.3 | - |
tf-seq2seq (Britz et al., 2017) | 22.2 | 25.2 |
GNMT (Wu et al., 2016) | 24.6 | - |
The above results show that our models are very competitive among models of similar architectures.
Note that OpenNMT uses smaller models, and the current best result (as of this writing) is 28.4 BLEU, obtained by the Transformer network (Vaswani et al., 2017), which has a significantly different architecture.
Other details for better NMT models
Bidirectional RNNs
Bidirectionality on the encoder side generally gives better performance (with
some degradation in speed as more layers are used). Here, we give a simplified
example of how to build an encoder with a single bidirectional layer:
# Construct forward and backward cells
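A minimal sketch of this bidirectional encoder in the tutorial's TF 1.x style is shown below; `num_units`, `encoder_emb_inp`, and `source_sequence_length` are assumed to be defined as in the earlier Encoder section, and `dtype` is added so the call runs without an explicit initial state.

```python
import tensorflow as tf

# Construct forward and backward LSTM cells for one bidirectional layer.
forward_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
backward_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)

# Run both cells over the embedded source sequence (time-major, as in the
# tutorial's encoder). bi_outputs is a (forward, backward) pair of outputs,
# and encoder_state is a (forward_state, backward_state) pair; see the note
# below about multiple bidirectional layers.
bi_outputs, encoder_state = tf.nn.bidirectional_dynamic_rnn(
    forward_cell,
    backward_cell,
    encoder_emb_inp,
    sequence_length=source_sequence_length,
    time_major=True,
    dtype=tf.float32)

# Concatenate forward and backward outputs along the depth dimension so the
# attention mechanism sees both directions.
encoder_outputs = tf.concat(bi_outputs, -1)
```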
The variables encoder_outputs and encoder_state can be used in the same way
as in Section Encoder. Note that, for multiple bidirectional layers, we need to
manipulate the encoder_state a bit; see model.py, method
_build_bidirectional_rnn(), for more details.
Beam search
While greedy decoding can give us quite reasonable translation quality, a beam
search decoder can further boost performance. The idea of beam search is to
better explore the search space of all possible translations by keeping around a
small set of top candidates as we translate. The size of the beam is called
beam width; a minimal beam width of, say, 10 is generally sufficient. For
more information, we refer readers to Section 7.2.3 of Neubig (2017). Here's an
example of how beam search can be done:
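Below is a minimal sketch of a beam search decoder using the TF 1.x tf.contrib.seq2seq API. The names `decoder_cell`, `embedding_decoder`, `start_tokens`, `end_token`, `projection_layer`, `beam_width`, and `maximum_iterations` are assumed to be defined as in the tutorial's decoder section; note that when an attention mechanism is used, the encoder outputs and source lengths must also be tiled with `tile_batch`.

```python
import tensorflow as tf

# Replicate the encoder state beam_width times so that every hypothesis in
# the beam gets its own copy of the encoder information.
decoder_initial_state = tf.contrib.seq2seq.tile_batch(
    encoder_state, multiplier=beam_width)

# Wrap the decoder cell and embeddings in a beam search decoder.
decoder = tf.contrib.seq2seq.BeamSearchDecoder(
    cell=decoder_cell,
    embedding=embedding_decoder,
    start_tokens=start_tokens,
    end_token=end_token,
    initial_state=decoder_initial_state,
    beam_width=beam_width,
    output_layer=projection_layer,
    length_penalty_weight=0.0)

# Run dynamic decoding; predicted_ids holds the highest-scoring token ids
# for each beam.
outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(
    decoder, maximum_iterations=maximum_iterations)
translations = outputs.predicted_ids
```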
OpenNMT-tf
OpenNMT-tf is a general purpose sequence learning toolkit using TensorFlow. While neural machine translation is the main target task, it has been designed to more generally support:
- sequence to sequence mapping
- sequence tagging
- sequence classification
The project is production-oriented and comes with stability guarantees.
Key features
OpenNMT-tf focuses on modularity to support advanced modeling and training capabilities:
- arbitrarily complex encoder architectures: e.g. mixing RNNs, CNNs, self-attention, etc. in parallel or in sequence.
- hybrid encoder-decoder models: e.g. self-attention encoder and RNN decoder or vice versa (a sketch follows below).
- neural source-target alignment: train with guided alignment to constrain attention vectors and output alignments as part of the translation API.
- multi-source training: e.g. source text and Moses translation as inputs for machine translation.
- multiple input formats: text with support for mixed word/character embeddings, or real vectors serialized in TFRecord files.
- on-the-fly tokenization: apply advanced tokenization dynamically during training and detokenize the predictions during inference or evaluation.
- domain adaptation: specialize a model to a new domain in a few training steps by updating the word vocabularies in checkpoints.
- automatic evaluation: support for saving evaluation predictions and running external evaluators (e.g. BLEU).
- mixed precision training: take advantage of the latest NVIDIA optimizations to train models with half-precision floating points.
and all of the above can be used simultaneously to train novel and complex architectures. See the predefined models to discover how they are defined and the API documentation to customize them.
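As a rough illustration of the hybrid encoder-decoder point above, the sketch below shows how such a model might be declared with the OpenNMT-tf 1.x Python API. The class and argument names used here (SequenceToSequence, WordEmbedder, SelfAttentionEncoder, AttentionalRNNDecoder, and the vocabulary file keys) are assumptions based on that API generation and should be checked against the repository's predefined models, since the API has changed across releases.

```python
import tensorflow as tf
import opennmt as onmt

def model():
  """A hybrid model: self-attention encoder with an attentional RNN decoder."""
  return onmt.models.SequenceToSequence(
      source_inputter=onmt.inputters.WordEmbedder(
          vocabulary_file_key="source_words_vocabulary",
          embedding_size=512),
      target_inputter=onmt.inputters.WordEmbedder(
          vocabulary_file_key="target_words_vocabulary",
          embedding_size=512),
      encoder=onmt.encoders.SelfAttentionEncoder(num_layers=6),
      decoder=onmt.decoders.AttentionalRNNDecoder(
          num_layers=4,
          num_units=512,
          attention_mechanism_class=tf.contrib.seq2seq.LuongAttention))
```

Such a model definition file is passed to the training command together with the data configuration; the predefined models and API documentation mentioned above are the authoritative reference for the exact names and options.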