'''Train a simple CNN-Capsule Network on the CIFAR10 small images dataset.

Without data augmentation:
it gets to 75% validation accuracy after 10 epochs, 79% after 15 epochs,
and begins to overfit after 20 epochs.

With data augmentation:
it gets to 75% validation accuracy after 10 epochs, 79% after 15 epochs,
and 83% after 30 epochs. In my test, the highest validation accuracy was
83.79% after 50 epochs.

This is a fast implementation: about 20 s/epoch on a GTX 1070 GPU.
'''
""" Train an Auxiliary Classifier Generative Adversarial Network (ACGAN) on the MNIST dataset. See https://arxiv.org/abs/1610.09585 for more details. You should start to see reasonable images after ~5 epochs, and good images by ~15 epochs. You should use a GPU, as the convolution-heavy operations are very slow on the CPU. Prefer the TensorFlow backend if you plan on iterating, as the compilation time can be a blocker using Theano. Timings: Hardware | Backend | Time / Epoch ------------------------------------------- CPU | TF | 3 hrs Titan X (maxwell) | TF | 4 min Titan X (maxwell) | TH | 7 min Consult https://github.com/lukedeo/keras-acgan for more information and example output """ # Adam parameters suggested in https://arxiv.org/abs/1511.06434 adam_lr = 0.0002 adam_beta_1 = 0.5
'''Trains a stacked what-where autoencoder built on residual blocks on the
MNIST dataset. It exemplifies two influential methods that have been developed
in the past few years.

The first is the idea of properly 'unpooling'. During any max pool, the exact
location (the 'where') of the maximal value in a pooled receptive field is
lost, yet it can be very useful in the overall reconstruction of an input
image. Therefore, if the 'where' is handed from the encoder to the
corresponding decoder layer, features being decoded can be 'placed' in the
right location, allowing for reconstructions of much higher fidelity.

# References
- Visualizing and Understanding Convolutional Networks
  Matthew D. Zeiler, Rob Fergus
  https://arxiv.org/abs/1311.2901v3
- Stacked What-Where Auto-encoders
  Junbo Zhao, Michael Mathieu, Ross Goroshin, Yann LeCun
  https://arxiv.org/abs/1506.02351v8

The second idea exploited here is that of residual learning. Residual blocks
ease the training process by allowing skip connections that give the network
the ability to be as linear (or non-linear) as the data sees fit. This allows
much deeper networks to be easily trained. The residual element is
advantageous in the context of this example because it allows a nice symmetry
between the encoder and decoder. Normally, in the decoder, the final
projection to the space where the image is reconstructed is linear; however,
this does not have to be the case for a residual block, as the degree to which
its output is linear or non-linear is determined by the data it is fed.
However, in order to cap the reconstruction in this example, a hard softmax is
applied as a bias because we know the MNIST digits are mapped to [0, 1].

# References
- Deep Residual Learning for Image Recognition
  Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
  https://arxiv.org/abs/1512.03385v1
- Identity Mappings in Deep Residual Networks
  Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
  https://arxiv.org/abs/1603.05027v3
'''
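A minimal sketch of a residual block in the spirit described above (not the example's exact helper); it assumes the input already has the same number of channels as the block, so the identity skip connection adds cleanly:

from keras.layers import Input, Conv2D, Add, Activation
from keras.models import Model

def residual_block(x, filters):
    # Two 3x3 convolutions plus an identity skip connection: the block can
    # behave as linearly or non-linearly as the data requires.
    y = Conv2D(filters, (3, 3), padding='same', activation='relu')(x)
    y = Conv2D(filters, (3, 3), padding='same')(y)
    return Activation('relu')(Add()([x, y]))

inputs = Input(shape=(28, 28, 32))   # feature map with 32 channels
outputs = residual_block(inputs, 32)
model = Model(inputs, outputs)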
'''Transfer learning toy example.

1 - Train a simple convnet on the MNIST dataset for the first 5 digits [0..4].
2 - Freeze the convolutional layers and fine-tune the dense layers for the
    classification of digits [5..9].

Gets to 99.8% test accuracy after 5 epochs for the first-five-digits
classifier, and 99.2% for the last five digits after transfer + fine-tuning.
'''
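A minimal sketch of the freezing step, with a stand-in convnet (layer sizes here are placeholders, not the script's exact architecture):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Stand-in convnet assumed to have been trained on digits 0-4.
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(5, activation='softmax'),
])

# Freeze the convolutional feature-extraction layers; only the dense head
# will be updated when fine-tuning on digits 5-9.
for layer in model.layers[:3]:
    layer.trainable = False

# Recompile so the new trainable flags take effect, then fit on digits 5-9.
model.compile(optimizer='adadelta', loss='categorical_crossentropy',
              metrics=['accuracy'])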
'''An implementation of sequence-to-sequence learning for performing addition.

Input: "535+61"
Output: "596"

Padding is handled by using a repeated sentinel character (space).

Input may optionally be reversed, shown to increase performance in many tasks in:
"Learning to Execute" http://arxiv.org/abs/1410.4615
and
"Sequence to Sequence Learning with Neural Networks"
http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf
Theoretically, it introduces shorter-term dependencies between source and target.

Two digits reversed:
+ One-layer LSTM (128 HN), 5k training examples = 99% train/test accuracy in 55 epochs
Three digits reversed:
+ One-layer LSTM (128 HN), 50k training examples = 99% train/test accuracy in 100 epochs
Four digits reversed:
+ One-layer LSTM (128 HN), 400k training examples = 99% train/test accuracy in 20 epochs
Five digits reversed:
+ One-layer LSTM (128 HN), 550k training examples = 99% train/test accuracy in 30 epochs
'''
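A small sketch of the padding/reversal preprocessing described above (function and constant names are illustrative, not the script's own):

MAXLEN = 7  # longest possible query, e.g. '999+999'

def encode_query(query, maxlen=MAXLEN, reverse=True):
    # Pad with the sentinel space character, then optionally reverse the
    # string, which the papers cited above suggest helps the LSTM.
    query = query + ' ' * (maxlen - len(query))
    return query[::-1] if reverse else query

print(repr(encode_query('535+61')))  # ' 16+535'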
Network structure

Experimental results
Results at epochs = 200
loss: 1.2168e-04 - acc: 1.0000 - val_loss: 0.0011 - val_acc: 0.9997

Q 6+909   T 915  ☑ 915
Q 128+263 T 391  ☑ 391
Q 104+0   T 104  ☑ 104
Q 63+352  T 415  ☑ 415
Q 624+8   T 632  ☑ 632
Q 31+251  T 282  ☑ 282
Q 758+445 T 1203 ☑ 1203
Q 88+534  T 622  ☑ 622
Q 315+624 T 939  ☑ 939
Q 81+459  T 540  ☑ 540
'''Trains two recurrent neural networks based upon a story and a question.

The resulting merged vector is then queried to answer a range of bAbI tasks.
The results are comparable to those for an LSTM model provided in Weston et al.:
"Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks"
http://arxiv.org/abs/1502.05698

Task Number                  | FB LSTM Baseline | Keras QA
---                          | ---              | ---
QA1 - Single Supporting Fact | 50               | 100.0
QA2 - Two Supporting Facts   | 20               | 50.0
QA3 - Three Supporting Facts | 20               | 20.5
QA4 - Two Arg. Relations     | 61               | 62.9
QA5 - Three Arg. Relations   | 70               | 61.9
QA6 - Yes/No Questions       | 48               | 50.7
QA7 - Counting               | 49               | 78.9
QA8 - Lists/Sets             | 45               | 77.2
QA9 - Simple Negation        | 64               | 64.0
QA10 - Indefinite Knowledge  | 44               | 47.7
QA11 - Basic Coreference     | 72               | 74.9
QA12 - Conjunction           | 74               | 76.4
QA13 - Compound Coreference  | 94               | 94.4
QA14 - Time Reasoning        | 27               | 34.8
QA15 - Basic Deduction       | 21               | 32.4
QA16 - Basic Induction       | 23               | 50.6
QA17 - Positional Reasoning  | 51               | 49.1
QA18 - Size Reasoning        | 52               | 90.8
QA19 - Path Finding          | 8                | 9.0
QA20 - Agent's Motivations   | 91               | 90.7

For the resources related to the bAbI project, refer to:
https://research.facebook.com/researchers/1543934539189348

# Notes
- With default word, sentence, and query vector sizes, the GRU model achieves:
  - 100% test accuracy on QA1 in 20 epochs (2 seconds per epoch on CPU)
  - 50% test accuracy on QA2 in 20 epochs (16 seconds per epoch on CPU)
  In comparison, the Facebook paper achieves 50% and 20% for the LSTM baseline.
- The task does not traditionally parse the question separately. This likely
  improves accuracy and is a good example of merging two RNNs.
- The word vector embeddings are not shared between the story and question RNNs.
- See how the accuracy changes given 10,000 training samples (en-10k) instead
  of only 1,000. 1,000 was used in order to be comparable to the original paper.
- Experiment with GRU, LSTM, and JZS1-3, as they give subtly different results.
- The length and noise (i.e. 'useless' story components) impact the ability of
  LSTMs/GRUs to provide the correct answer. Given only the supporting facts,
  these RNNs can achieve 100% accuracy on many tasks. Memory networks and
  neural networks that use attentional processes can efficiently search through
  this noise to find the relevant statements, improving performance
  substantially. This becomes especially obvious on QA2 and QA3, both far
  longer than QA1.
'''
# Load only the sentences that support the answer (only_supporting=True),
# which simplifies the task considerably.
with tarfile.open(path) as tar:
    train = get_stories(tar.extractfile(challenge.format('train')),
                        only_supporting=True)
    test = get_stories(tar.extractfile(challenge.format('test')),
                       only_supporting=True)
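On the model side, the example merges a story RNN and a question RNN before a softmax over the answer vocabulary; roughly like the following sketch (all sizes are placeholders, the script derives them from the parsed bAbI data):

from keras.layers import Input, Embedding, GRU, Dense, concatenate
from keras.models import Model

# Placeholder sizes.
vocab_size, story_maxlen, query_maxlen, embed_dim = 50, 68, 4, 50

story = Input(shape=(story_maxlen,), dtype='int32')
encoded_story = GRU(embed_dim)(Embedding(vocab_size, embed_dim)(story))

question = Input(shape=(query_maxlen,), dtype='int32')
encoded_question = GRU(embed_dim)(Embedding(vocab_size, embed_dim)(question))

# Merge the two sentence encodings, then predict the answer word.
merged = concatenate([encoded_story, encoded_question])
answer = Dense(vocab_size, activation='softmax')(merged)

model = Model([story, question], answer)
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])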
Trains a memory network on the bAbI dataset for reading comprehension.
'''Trains a memory network on the bAbI dataset.

References:
- Jason Weston, Antoine Bordes, Sumit Chopra, Tomas Mikolov, Alexander M. Rush,
  "Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks",
  http://arxiv.org/abs/1502.05698
- Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, Rob Fergus,
  "End-To-End Memory Networks",
  http://arxiv.org/abs/1503.08895

Reaches 98.6% accuracy on task 'single_supporting_fact_10k' after 120 epochs.
Time per epoch: 3s on CPU (Core i7).
'''
Trains a FastText model on the IMDB sentiment classification task.
'''This example demonstrates the use of FastText for text classification.

Based on Joulin et al.'s paper:
"Bag of Tricks for Efficient Text Classification"
https://arxiv.org/abs/1607.01759

Results on the IMDB dataset with uni- and bi-gram embeddings:
  Uni-gram: 0.8813 test accuracy after 5 epochs. 8 s/epoch on an i7 CPU.
  Bi-gram:  0.9056 test accuracy after 5 epochs. 2 s/epoch on a GTX 980M GPU.
'''
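A simplified sketch of the n-gram augmentation step behind the bag-of-tricks idea (the script's own helper is similar in spirit): each known bi-gram in a token sequence is appended as an extra feature id.

def add_ngram_features(sequences, token_indice, ngram_range=2):
    # `token_indice` maps n-gram tuples to integer ids allocated beyond
    # the unigram vocabulary.
    augmented = []
    for seq in sequences:
        new_seq = list(seq)
        for ngram_value in range(2, ngram_range + 1):
            for i in range(len(seq) - ngram_value + 1):
                ngram = tuple(seq[i:i + ngram_value])
                if ngram in token_indice:
                    new_seq.append(token_indice[ngram])
        augmented.append(new_seq)
    return augmented

# e.g. the bi-gram (1, 2) gets its own feature id 2000:
print(add_ngram_features([[1, 2, 3]], {(1, 2): 2000}))  # [[1, 2, 3, 2000]]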
Trains an LSTM model on the IMDB sentiment classification task.
'''Trains an LSTM model on the IMDB sentiment classification task.

The dataset is actually too small for an LSTM to offer any advantage over
simpler, much faster methods such as TF-IDF + LogReg.

# Notes
- RNNs are tricky. The choice of batch size is important, and the choice of
  loss and optimizer is critical, etc. Some configurations won't converge.
- LSTM loss-decrease patterns during training can be quite different from
  what you see with CNNs/MLPs/etc.
'''
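The core of the model is an Embedding followed by a single LSTM and a sigmoid output; a sketch with typical hyperparameters (assumed here, close to what the example uses):

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

max_features, maxlen = 20000, 80  # vocabulary size and padded review length

model = Sequential()
model.add(Embedding(max_features, 128))
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))

# Binary sentiment target, so binary cross-entropy; as noted above,
# the choice of optimizer matters for convergence.
model.compile(loss='binary_crossentropy', optimizer='adam',
              metrics=['accuracy'])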
Network structure

Experimental results
epochs = 15
Test accuracy: 0.81312
lstm_stateful.py Demonstrates how to use stateful RNNs to model long sequences efficiently.
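A minimal sketch of the stateful setup (shapes are placeholders): fixing the batch size via batch_input_shape and setting stateful=True lets the LSTM carry its hidden state from one batch of sub-sequences to the next.

from keras.models import Sequential
from keras.layers import LSTM, Dense

batch_size, tsteps, features = 25, 1, 1  # placeholder shapes

# A stateful LSTM keeps its hidden state across batches, so a long sequence
# can be fed in fixed-size chunks.
model = Sequential()
model.add(LSTM(20,
               batch_input_shape=(batch_size, tsteps, features),
               stateful=True))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')

# After each full pass over the long sequence, reset the carried state:
# model.reset_states()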
'''Sequence-to-sequence example in Keras (character-level).

This script demonstrates how to implement a basic character-level
sequence-to-sequence model. We apply it to translating short English sentences
into short French sentences, character-by-character. Note that it is fairly
unusual to do character-level machine translation, as word-level models are
more common in this domain.

# Summary of the algorithm
- We start with input sequences from a domain (e.g. English sentences) and
  corresponding target sequences from another domain (e.g. French sentences).
- An encoder LSTM turns input sequences into 2 state vectors (we keep the
  last LSTM state and discard the outputs).
- A decoder LSTM is trained to turn the target sequences into the same
  sequence but offset by one timestep in the future, a training process
  called "teacher forcing" in this context. It uses the state vectors from
  the encoder as its initial state. Effectively, the decoder learns to
  generate `targets[t+1...]` given `targets[...t]`, conditioned on the
  input sequence.
- In inference mode, when we want to decode unknown input sequences, we:
  - Encode the input sequence into state vectors.
  - Start with a target sequence of size 1 (just the start-of-sequence
    character).
  - Feed the state vectors and the 1-char target sequence to the decoder
    to produce predictions for the next character.
  - Sample the next character using these predictions (we simply use argmax).
  - Append the sampled character to the target sequence.
  - Repeat until we generate the end-of-sequence character or we hit the
    character limit.

# Data download
English-to-French sentence pairs:
http://www.manythings.org/anki/fra-eng.zip
Lots of neat sentence-pair datasets can be found at:
http://www.manythings.org/anki/

# References
- Sequence to Sequence Learning with Neural Networks
  https://arxiv.org/abs/1409.3215
- Learning Phrase Representations using RNN Encoder-Decoder for
  Statistical Machine Translation
  https://arxiv.org/abs/1406.1078
'''
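A sketch of the training-time encoder/decoder graph described above (token counts and latent size are placeholders):

from keras.models import Model
from keras.layers import Input, LSTM, Dense

num_encoder_tokens, num_decoder_tokens, latent_dim = 71, 93, 256  # placeholders

# Encoder: keep only the final LSTM states, discard the outputs.
encoder_inputs = Input(shape=(None, num_encoder_tokens))
_, state_h, state_c = LSTM(latent_dim, return_state=True)(encoder_inputs)

# Decoder: teacher forcing, initialised with the encoder states.
decoder_inputs = Input(shape=(None, num_decoder_tokens))
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs,
                                     initial_state=[state_h, state_c])
decoder_outputs = Dense(num_decoder_tokens,
                        activation='softmax')(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')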
Network structure

Encoder

Decoder

Experimental results
epochs=100 loss: 0.0602 - val_loss: 0.7592
- Input sentence: Come in.
  Decoded sentence: Entrez !
- Input sentence: Come on!
  Decoded sentence: Viens !
- Input sentence: Drop it!
  Decoded sentence: Laisse tomber !
Loads pre-trained word embeddings (GloVe embeddings) into a frozen Keras Embedding layer, and uses it to train a text classification model on the 20 Newsgroups dataset.
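The key piece is an Embedding layer initialized with the GloVe matrix and marked non-trainable; a sketch with a random placeholder matrix standing in for the real GloVe weights:

import numpy as np
from keras.layers import Embedding

num_words, embedding_dim, max_seq_len = 20000, 100, 1000
# Placeholder; normally filled row-by-row from the GloVe vector file.
embedding_matrix = np.random.rand(num_words, embedding_dim)

embedding_layer = Embedding(num_words,
                            embedding_dim,
                            weights=[embedding_matrix],
                            input_length=max_seq_len,
                            trainable=False)  # keep the GloVe weights frozen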
Network 1 (ReLU):
  Test score:    1.9769948146646827
  Test accuracy: 0.5204808548796103
Network 2 (SELU):
  Test score:    1.530816549927872
  Test accuracy: 0.6714158504007124
'''Example script to generate text from Nietzsche's writings. At least 20 epochs are required before the generated text starts sounding coherent. It is recommended to run this script on GPU, as recurrent networks are quite computationally intensive. If you try this script on new data, make sure your corpus has at least ~100k characters. ~1M is better. '''
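The script samples each next character from the softmax output rescaled by a temperature (the "diversity" in the logs below); its sampling helper looks roughly like this sketch (the small epsilon is an added guard against log(0)):

import numpy as np

def sample(preds, temperature=1.0):
    # Rescale the predicted character distribution by `temperature`
    # and draw one index from it.
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds + 1e-8) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    return np.argmax(np.random.multinomial(1, preds, 1))

print(sample([0.1, 0.2, 0.7], temperature=0.5))  # usually 2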
Network structure

Experimental results
----- Generating text after Epoch: 99

----- diversity: 0.2
----- Generating with seed: " of his tolerance and humanity which pro"
 of his tolerance and humanity which proe ee e e e ee e ee et e ee te ee e" ! esov ehe eeted en enete ee e e ee q te e t eeqo a x een x qe n et" ! ex if i tewn x .o e v - ' e ! q jj j s t qhi xh x ! tetsa k ewetce e e of eeh e e et i e ee e x ee eetk( e e n e x xo = x ! xvekne x e ) x qnb z ehe e n x )edxd xx x xi x x tcnwet x- 'e o q e te eto e x eti '

----- diversity: 0.5
----- Generating with seed: " of his tolerance and humanity which pro"
 of his tolerance and humanity which pro ee e e ee ee ee e beeee ee e ' xx z e e ' en qengg x x evo-n x ! xn 'o xb i it k x n x xbe a wee x tix e i t q etvece e etoe o z eet i q h wete eo xe'e e egovee ! e eese e oe xe x e e zd ti q n x ni tqj x a n zxb x x " x we n n x j e et z v ! xj an ee xq " q styiete nxe! x et ! i qt ta xbn tx

----- diversity: 1.0
----- Generating with seed: " of his tolerance and humanity which pro"
 of his tolerance and humanity which proe eee ee eeeeee et eeeteeei ee (eeiiey qqe i n xxee x ue e jni " x xb z n n e in n e' ve te j e zw eeq(se t x q xve im:t x xxb tek x ehed te e't xheefe e-e xn e ebe eey zn xeg:ti xbs xvfete x !o ee- e e e o e we e ese eet -e oee e x (xb e ee necf e j e e ( ee je ie este)n ax q n . xjf z xi o t xxv x xnocese i

----- diversity: 1.2
----- Generating with seed: " of his tolerance and humanity which pro"
 of his tolerance and humanity which pro eeeee eee e e e tetee" e eteeee-e i eex o xhe ee x e on e te"( t x xg e i hek.tni etf xiecnne evet ecewe e e o ec ? e;e ,ee e e ee e;skyn xe e e x xte he et i q x jw w xn xn e x ' q= t o n nex e =tho xenwei ao? x zn x evq ety q x x et e es be d x xq ie n xetzo ke q y etx xt xsn inn eti e x ei eq t
Demonstrates how to build a variational autoencoder.
'''Example of a VAE on the MNIST dataset using an MLP.

The VAE has a modular design. The encoder, decoder and VAE are 3 models that
share weights. After training the VAE model, the encoder can be used to
generate latent vectors. The decoder can be used to generate MNIST digits by
sampling the latent vector from a Gaussian distribution with mean = 0 and
std = 1.

# Reference
[1] Kingma, Diederik P., and Max Welling.
    "Auto-Encoding Variational Bayes."
    https://arxiv.org/abs/1312.6114
'''
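The sampling step uses the reparameterization trick so gradients can flow through the random draw; a sketch of the encoder head (layer sizes are placeholders consistent with flattened 28x28 MNIST images):

from keras import backend as K
from keras.layers import Input, Dense, Lambda

latent_dim = 2  # 2-D latent space, handy for plotting

def sampling(args):
    # Reparameterization trick: z = mean + exp(0.5 * log_var) * epsilon,
    # with epsilon drawn from a standard Gaussian.
    z_mean, z_log_var = args
    batch = K.shape(z_mean)[0]
    epsilon = K.random_normal(shape=(batch, latent_dim))
    return z_mean + K.exp(0.5 * z_log_var) * epsilon

inputs = Input(shape=(784,))
h = Dense(512, activation='relu')(inputs)
z_mean = Dense(latent_dim)(h)
z_log_var = Dense(latent_dim)(h)
z = Lambda(sampling, output_shape=(latent_dim,))([z_mean, z_log_var])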
Demonstrates how to build a variational autoencoder with Keras using deconvolution layers.
'''Example of a VAE on the MNIST dataset using a CNN.

The VAE has a modular design. The encoder, decoder and VAE are 3 models that
share weights. After training the VAE model, the encoder can be used to
generate latent vectors. The decoder can be used to generate MNIST digits by
sampling the latent vector from a Gaussian distribution with mean = 0 and
std = 1.
'''
Demonstrates how to write custom layers for Keras.
'''This example demonstrates how to write custom layers for Keras.

We build a custom activation layer called 'Antirectifier', which modifies the
shape of the tensor that passes through it. We need to specify two methods:
`compute_output_shape` and `call`.

Note that the same result can also be achieved via a Lambda layer.

Because our custom layer is written with primitives from the Keras backend
(`K`), our code can run both on TensorFlow and Theano.
'''
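The layer boils down to the two methods named above; roughly like this sketch (assuming 2-D input of shape (batch, features), as in the example):

from keras import backend as K
from keras.layers import Layer

class Antirectifier(Layer):
    def compute_output_shape(self, input_shape):
        # Concatenating the positive and negative parts doubles the
        # feature dimension.
        shape = list(input_shape)
        shape[-1] *= 2
        return tuple(shape)

    def call(self, inputs):
        # Center and L2-normalize, then keep both the positive and the
        # negative parts instead of discarding one of them as ReLU does.
        inputs -= K.mean(inputs, axis=1, keepdims=True)
        inputs = K.l2_normalize(inputs, axis=1)
        return K.concatenate([K.relu(inputs), K.relu(-inputs)], axis=1)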
Builds simple CNN models on MNIST and uses sklearn's GridSearchCV to find the best model.
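The example wraps the Keras model in KerasClassifier so sklearn can cross-validate hyperparameters; a reduced sketch (the model builder here is a deliberately tiny stand-in, and the real grid also varies filters, kernel_size and pool_size):

from keras.models import Sequential
from keras.layers import Flatten, Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

def make_model(dense_size=64):
    # Tiny stand-in builder; every grid parameter must appear as an argument.
    model = Sequential([
        Flatten(input_shape=(28, 28, 1)),
        Dense(dense_size, activation='relu'),
        Dense(10, activation='softmax'),
    ])
    model.compile(optimizer='adadelta', loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

classifier = KerasClassifier(make_model, batch_size=32)
validator = GridSearchCV(classifier,
                         param_grid={'dense_size': [32, 64],
                                     'epochs': [3, 6]},
                         scoring='neg_log_loss', n_jobs=1)
# validator.fit(x_train, y_train); validator.best_params_ holds the winner.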
Network structure

Experimental results
The parameters of the best model are:
{'dense_layer_sizes': [64, 64], 'epochs': 6, 'filters': 8, 'kernel_size': 3, 'pool_size': 2}
loss: 0.042469549842912235
acc:  0.9872
mnist_irnn.py Reproduction of the IRNN experiment with pixel-by-pixel sequential MNIST in “A Simple Way to Initialize Recurrent Networks of Rectified Linear Units” by Le et al.
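The trick from the paper is a SimpleRNN with ReLU activation whose recurrent weights start as the identity matrix; a sketch (hidden size and learning rate are assumed values in the spirit of the example):

from keras import initializers
from keras.models import Sequential
from keras.layers import SimpleRNN, Dense
from keras.optimizers import RMSprop

model = Sequential()
model.add(SimpleRNN(100,
                    kernel_initializer=initializers.RandomNormal(stddev=0.001),
                    recurrent_initializer=initializers.Identity(gain=1.0),
                    activation='relu',
                    input_shape=(784, 1)))  # 28*28 pixels fed one at a time
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(lr=1e-6),  # a very small lr keeps the IRNN stable
              metrics=['accuracy'])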
mnist_net2net.py Reproduction of the Net2Net experiment with MNIST in “Net2Net: Accelerating Learning via Knowledge Transfer”.
'''Train a simple deep CNN on the CIFAR10 small images dataset.

Uses the TensorFlow internal augmentation APIs by replacing ImageGenerator
with an embedded AugmentLayer using a Lambda layer, which is faster on GPU.

# Benchmark of `ImageGenerator` vs `AugmentLayer`, both using 2D augmentation
(backend = TensorFlow-GPU, NVIDIA Tesla P100-SXM2)

Settings: horizontal_flip = True
----------------------------------------------------------------------------
Epoch  | ImageGenerator | ImageGenerator | AugmentLayer | AugmentLayer
Number | %Accuracy      | Performance    | %Accuracy    | Performance
----------------------------------------------------------------------------
1      | 44.84          | 15 ms/step     | 45.54        | 358 us/step
2      | 52.34          |  8 ms/step     | 50.55        | 285 us/step
8      | 65.45          |  8 ms/step     | 65.59        | 281 us/step
25     | 76.74          |  8 ms/step     | 76.17        | 280 us/step
100    | 78.81          |  8 ms/step     | 78.70        | 285 us/step
----------------------------------------------------------------------------

Settings: rotation = 30.0
----------------------------------------------------------------------------
Epoch  | ImageGenerator | ImageGenerator | AugmentLayer | AugmentLayer
Number | %Accuracy      | Performance    | %Accuracy    | Performance
----------------------------------------------------------------------------
1      | 43.46          | 15 ms/step     | 42.21        | 334 us/step
2      | 48.95          | 11 ms/step     | 48.06        | 282 us/step
8      | 63.59          | 11 ms/step     | 61.35        | 290 us/step
25     | 72.25          | 12 ms/step     | 71.08        | 287 us/step
100    | 76.35          | 11 ms/step     | 74.62        | 286 us/step
----------------------------------------------------------------------------

(Corner processing and rotation precision differ slightly between
`ImageGenerator` and `AugmentLayer`.)
'''
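The essence of the AugmentLayer approach is a Lambda layer that applies TensorFlow image ops inside the graph, gated so augmentation only runs at training time; a minimal sketch with a random whole-batch horizontal flip (the full example augments per-image and also handles rotation), assuming the TF 1.x API:

import tensorflow as tf
from keras import backend as K
from keras.layers import Input, Lambda, Conv2D
from keras.models import Model

def augment_2d(images):
    # Flip the whole batch horizontally with probability 0.5 (a
    # simplification of the per-image augmentation in the full example).
    flipped = tf.reverse(images, axis=[2])
    coin = tf.less(tf.random_uniform([], 0.0, 1.0), 0.5)
    return tf.cond(coin, lambda: flipped, lambda: images)

inputs = Input(shape=(32, 32, 3))
# Only augment in the training phase; pass images through unchanged at test time.
x = Lambda(lambda imgs: K.in_train_phase(augment_2d(imgs), imgs))(inputs)
x = Conv2D(32, (3, 3), activation='relu')(x)
model = Model(inputs, x)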
Network structure

Experimental results
tensorboard_embeddings_mnist.py Trains a simple convnet on the MNIST dataset and embeds test data which can be later visualized using TensorBoard’s Embedding Projector.
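With the Keras 2.x TensorBoard callback, the embedding export is configured via the embeddings_* arguments; a sketch ('features' is a hypothetical layer name and 'metadata.tsv' a hypothetical labels file):

from keras.callbacks import TensorBoard

tensorboard = TensorBoard(log_dir='./logs',
                          embeddings_freq=1,                    # export every epoch
                          embeddings_layer_names=['features'],  # hypothetical layer name
                          embeddings_metadata='metadata.tsv')   # labels for the Projector

# model.fit(x_train, y_train, epochs=10, callbacks=[tensorboard])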