How to Implement Progressive Growing GAN Models in Keras

The progressive growing generative adversarial network is an approach for training a deep convolutional neural network model for generating synthetic images.

It is an extension of the more traditional GAN architecture that involves incrementally growing the size of the generated image during training, starting with a very small image, such as 4×4 pixels. This allows the stable training and growth of GAN models capable of generating very large high-quality images, such as synthetic celebrity faces at a size of 1024×1024 pixels.

In this tutorial, you will discover how to develop progressive growing generative adversarial network models from scratch with Keras.

After completing this tutorial, you will know:

  • How to develop pre-defined discriminator and generator models at each level of output image growth.
  • How to define composite models for training the generator models via the discriminator models.
  • How to cycle the training of fade-in version and normal versions of models at each level of output image growth.

Discover how to develop DCGANs, conditional GANs, Pix2Pix, CycleGANs, and more with Keras in my new GANs book, with 29 step-by-step tutorials and full source code.

Let’s get started.

How to Implement Progressive Growing GAN Models in Keras
Photo by Diogo Santos Silva, some rights reserved.

Tutorial Overview

This tutorial is divided into five parts; they are:

  1. What Is the Progressive Growing GAN Architecture?
  2. How to Implement the Progressive Growing GAN Discriminator Model
  3. How to Implement the Progressive Growing GAN Generator Model
  4. How to Implement Composite Models for Updating the Generator
  5. How to Train Discriminator and Generator Models

What Is the Progressive Growing GAN Architecture?

GANs are effective at generating crisp synthetic images, although they are typically limited in the size of the images that can be generated.

The Progressive Growing GAN is an extension to the GAN that allows the training of generator models capable of outputting large high-quality images, such as photorealistic faces with the size 1024×1024 pixels. It was described in the 2017 paper by Tero Karras, et al. from Nvidia titled “Progressive Growing of GANs for Improved Quality, Stability, and Variation.”

The key innovation of the Progressive Growing GAN is the incremental increase in the size of the images output by the generator, starting with a 4×4 pixel image and doubling to 8×8, 16×16, and so on, up to the desired output resolution.

Our primary contribution is a training methodology for GANs where we start with low-resolution images, and then progressively increase the resolution by adding layers to the networks.

Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017.

This is achieved by a training procedure that involves periods of fine-tuning the model with a given output resolution, and periods of slowly phasing in a new model with a larger resolution.

When doubling the resolution of the generator (G) and discriminator (D) we fade in the new layers smoothly

Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017.

All layers remain trainable during the training process, including existing layers when new layers are added.

All existing layers in both networks remain trainable throughout the training process.

Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017.

Progressive Growing GAN involves using a generator and discriminator model with the same general structure and starting with very small images. During training, new blocks of convolutional layers are systematically added to both the generator and discriminator models.

Example of Progressively Adding Layers to Generator and Discriminator Models.
Taken from: Progressive Growing of GANs for Improved Quality, Stability, and Variation.

The incremental addition of the layers allows the models to effectively learn coarse-level detail and later learn ever finer detail, both on the generator and discriminator side.

This incremental nature allows the training to first discover the large-scale structure of the image distribution and then shift attention to increasingly finer-scale detail, instead of having to learn all scales simultaneously.

Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017.

The model architecture is complex and cannot be implemented directly the way a standard GAN can.

In this tutorial, we will focus on how the progressive growing GAN can be implemented using the Keras deep learning library.

We will step through how each of the discriminator and generator models can be defined, how the generator can be trained via the discriminator model, and how each model can be updated during the training process.

These implementation details will provide the basis for you to develop a progressive growing GAN for your own applications.

How to Implement the Progressive Growing GAN Discriminator Model

The discriminator model is given images as input and must classify them as either real (from the dataset) or fake (generated).

During the training process, the discriminator must grow to support images with ever-increasing size, starting with 4×4 pixel color images and doubling to 8×8, 16×16, 32×32, and so on.

This is achieved by inserting a new input layer to support the larger input image, followed by a new block of layers. The output of this new block is then downsampled. In parallel, the new larger image is downsampled directly and passed through the old input processing layer before being combined with the output of the new block.

During the transition from a lower resolution to a higher resolution, e.g. 16×16 to 32×32, the discriminator model will have two input pathways as follows:

  • [32×32 Image] -> [fromRGB Conv] -> [NewBlock] -> [Downsample] ->
  • [32×32 Image] -> [Downsample] -> [fromRGB Conv] ->

The output of the new block that is downsampled and the output of the old input processing layer are combined using a weighted average, where the weighting is controlled by a new hyperparameter called alpha. The weighted sum is calculated as follows:

  • Output = ((1 – alpha) * fromRGB) + (alpha * NewBlock)

The weighted average of the two pathways is then fed into the rest of the existing model.

Initially, the weighting is completely biased towards the old input processing layer (alpha=0) and is linearly increased over training iterations so that the new block is given more weight until eventually, the output is entirely the product of the new block (alpha=1). At this time, the old pathway can be removed.
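
For intuition, the sketch below checks this weighted sum for a few alpha values using plain NumPy; the array names are illustrative stand-ins for the two pathways and are not part of the model code.

# illustrative check of the fade-in weighted sum for a few alpha values
from numpy import array

from_rgb = array([1.0, 1.0, 1.0])   # stand-in for the old (downsampled) pathway
new_block = array([0.0, 0.0, 0.0])  # stand-in for the output of the new block

for alpha in [0.0, 0.5, 1.0]:
	# ((1 - alpha) * old pathway) + (alpha * new pathway)
	output = ((1.0 - alpha) * from_rgb) + (alpha * new_block)
	print(alpha, output)

At alpha=0.0 the output equals the old pathway; at alpha=1.0 it equals the new block, matching the schedule described above.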

This can be summarized with the following figure taken from the paper showing a model before growing (a), during the phase-in of the larger resolution (b), and the model after the phase-in (c).

Figure Showing the Growing of the Discriminator Model, Before (a) During (b) and After (c) the Phase-In of a High Resolution.
Taken from: Progressive Growing of GANs for Improved Quality, Stability, and Variation.

The fromRGB layers are implemented as a 1×1 convolutional layer. A block is comprised of two convolutional layers with 3×3 sized filters and the leaky ReLU activation function with a slope of 0.2, followed by a downsampling layer. Average pooling is used for downsampling, unlike most other GAN discriminators, which typically use strided convolutional layers.

The output block of the model involves two convolutional layers with 3×3 and 4×4 sized filters and Leaky ReLU activation, followed by a fully connected layer that outputs a single value prediction. The model uses a linear activation function, instead of the sigmoid activation used by other discriminator models, and is trained directly using either Wasserstein loss (specifically WGAN-GP) or least squares loss; we will use the latter in this tutorial. Model weights are initialized using He Gaussian (he_normal), which is very similar to the method used in the paper.

In the paper, the model uses a custom minibatch standard deviation layer at the beginning of the output block and, instead of batch normalization, applies a local response normalization to each layer, referred to as pixel-wise normalization. For brevity, we will leave out the minibatch standard deviation layer and use batch normalization in place of pixel-wise normalization in this tutorial.
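
If you do want to stay closer to the paper, pixel-wise normalization can be implemented as a small custom Keras layer. The sketch below is one possible implementation, included here as an assumption of how it might look; it is not used in the rest of this tutorial.

# sketch of a pixel-wise feature normalization layer (not used in this tutorial)
from keras import backend
from keras.layers import Layer

class PixelNormalization(Layer):
	# normalize each pixel's feature vector to have (approximately) unit length
	def call(self, inputs):
		# mean of squared activations across the channel dimension
		mean_square = backend.mean(inputs**2.0, axis=-1, keepdims=True)
		# add a small constant to avoid dividing by zero
		l2 = backend.sqrt(mean_square + 1.0e-8)
		# normalize the activations by the l2 norm
		return inputs / l2

	# output shape is unchanged
	def compute_output_shape(self, input_shape):
		return input_shape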

One approach to implementing the progressive growing GAN would be to manually expand a model on demand during training. Another approach is to pre-define all of the models prior to training and carefully use the Keras functional API to ensure that layers are shared across the models and continue training.

I believe the latter approach might be easier and is the approach we will use in this tutorial.

First, we must define a custom layer that we can use when fading in a new higher-resolution input image and block. This new layer must take two sets of activation maps with the same dimensions (width, height, channels) and add them together using a weighted sum.

We can implement this as a new layer called WeightedSum that extends the Add merge layer and uses a hyperparameter 'alpha' to control the contribution of each input. This new class is defined below. The layer assumes only two inputs: the first for the output of the old or existing layers and the second for the newly added layers. The new hyperparameter is defined as a backend variable, meaning that we can change it at any time by changing the value of the variable.

# weighted sum output
class WeightedSum(Add):
	# init with default value
	def __init__(self, alpha=0.0, **kwargs):
		super(WeightedSum, self).__init__(**kwargs)
		self.alpha = backend.variable(alpha, name='ws_alpha')

	# output a weighted sum of inputs
	def _merge_function(self, inputs):
		# only supports a weighted sum of two inputs
		assert (len(inputs) == 2)
		# ((1-a) * input1) + (a * input2)
		output = ((1.0 - self.alpha) * inputs[0]) + (self.alpha * inputs[1])
		return output
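
As a quick sanity check, the layer can be exercised on two small inputs and the alpha variable changed via the Keras backend. The sketch below is illustrative only; it assumes the WeightedSum class above has been defined and the usual Keras imports are available.

# illustrative use of the WeightedSum layer and its alpha variable
from numpy import ones, zeros
from keras import backend
from keras.models import Model
from keras.layers import Input

# two inputs with the same shape
in_a = Input(shape=(2,2,1))
in_b = Input(shape=(2,2,1))
ws = WeightedSum()
out = ws([in_a, in_b])
model = Model([in_a, in_b], out)
# with alpha=0.0 the output is entirely the first input
print(model.predict([ones((1,2,2,1)), zeros((1,2,2,1))]).flatten())
# move halfway through the fade-in by updating the backend variable
backend.set_value(ws.alpha, 0.5)
print(model.predict([ones((1,2,2,1)), zeros((1,2,2,1))]).flatten())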

The discriminator model is far more complex to grow than the generator because we have to change the model input, so let's step through it slowly.

Firstly, we can define a discriminator model that takes a 4×4 color image as input and outputs a prediction of whether the image is real or fake. The model is comprised of a 1×1 input processing layer (fromRGB) and an output block.

...
# base model input
in_image = Input(shape=(4,4,3))
# conv 1x1
g = Conv2D(64, (1,1), padding='same', kernel_initializer='he_normal')(in_image)
g = LeakyReLU(alpha=0.2)(g)
# conv 3x3 (output block)
g = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(g)
g = BatchNormalization()(g)
g = LeakyReLU(alpha=0.2)(g)
# conv 4x4
g = Conv2D(128, (4,4), padding='same', kernel_initializer='he_normal')(g)
g = BatchNormalization()(g)
g = LeakyReLU(alpha=0.2)(g)
# dense output layer
g = Flatten()(g)
out_class = Dense(1)(g)
# define model
model = Model(in_image, out_class)
# compile model
model.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))

Next, we need to define a new model that handles the intermediate stage between this model and a new discriminator model that takes 8×8 color images as input.

The existing input processing layer must receive a downsampled version of the new 8×8 image. A new input processing layer must be defined that takes the 8×8 input image and passes it through a new block of two convolutional layers and a downsampling layer. The downsampled output of the new block and the output of the old input processing layer must then be combined using a weighted average via our new WeightedSum layer, before being fed into the same output block (two convolutional layers and the output layer).

Given the first defined model and our knowledge about this model (e.g. the number of layers in the input processing layer is 2 for the Conv2D and LeakyReLU), we can construct this new intermediate or fade-in model using layer indexes from the old model.

...
old_model = model
# get shape of existing model
in_shape = list(old_model.input.shape)
# define new input shape as double the size
input_shape = (in_shape[-2].value*2, in_shape[-2].value*2, in_shape[-1].value)
in_image = Input(shape=input_shape)
# define new input processing layer
g = Conv2D(64, (1,1), padding='same', kernel_initializer='he_normal')(in_image)
g = LeakyReLU(alpha=0.2)(g)
# define new block
g = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(g)
g = BatchNormalization()(g)
g = LeakyReLU(alpha=0.2)(g)
g = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(g)
g = BatchNormalization()(g)
g = LeakyReLU(alpha=0.2)(g)
g = AveragePooling2D()(g)
# downsample the new larger image
downsample = AveragePooling2D()(in_image)
# connect old input processing to downsampled new input
block_old = old_model.layers[1](downsample)
block_old = old_model.layers[2](block_old)
# fade in output of old model input layer with new input
g = WeightedSum()([block_old, g])
# skip the input, 1x1 and activation for the old model
for i in range(3, len(old_model.layers)):
	g = old_model.layers[i](g)
# define fade-in model
model = Model(in_image, g)
# compile model
model.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))

So far, so good.

We also need a version of the model with the same new layers, but without the fade-in from the old model's input processing layers.

This straight-through version is required for training before we fade in the next doubling of the input image size.

We can update the above example to create two versions of the model. First, the straight-through version as it is simpler, then the version used for the fade-in that reuses the layers from the new block and the output layers of the old model.

The add_discriminator_block() function below implements this; it takes the old model as an argument, defines the number of input layers via a default argument (3), and returns a list of the two defined models (straight-through and fade-in).

To ensure that the WeightedSum layer works correctly, we have fixed all convolutional layers to always have 64 filters and, in turn, output 64 feature maps. If there is a mismatch between the old model's input processing layer and the new block's output in terms of the number of feature maps (channels), then the weighted sum will fail.

# add a discriminator block
def add_discriminator_block(old_model, n_input_layers=3):
	# get shape of existing model
	in_shape = list(old_model.input.shape)
	# define new input shape as double the size
	input_shape = (in_shape[-2].value*2, in_shape[-2].value*2, in_shape[-1].value)
	in_image = Input(shape=input_shape)
	# define new input processing layer
	d = Conv2D(64, (1,1), padding='same', kernel_initializer='he_normal')(in_image)
	d = LeakyReLU(alpha=0.2)(d)
	# define new block
	d = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)
	d = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)
	d = AveragePooling2D()(d)
	block_new = d
	# skip the input, 1x1 and activation for the old model
	for i in range(n_input_layers, len(old_model.layers)):
		d = old_model.layers[i](d)
	# define straight-through model
	model1 = Model(in_image, d)
	# compile model
	model1.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
	# downsample the new larger image
	downsample = AveragePooling2D()(in_image)
	# connect old input processing to downsampled new input
	block_old = old_model.layers[1](downsample)
	block_old = old_model.layers[2](block_old)
	# fade in output of old model input layer with new input
	d = WeightedSum()([block_old, block_new])
	# skip the input, 1x1 and activation for the old model
	for i in range(n_input_layers, len(old_model.layers)):
		d = old_model.layers[i](d)
	# define fade-in model
	model2 = Model(in_image, d)
	# compile model
	model2.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
	return [model1, model2]

It is not an elegant function as we have some repetition, but it is readable and will get the job done.

We can then call this function again and again as we double the size of input images. Importantly, the function expects the straight-through version of the prior model as input.

The example below defines a new function called define_discriminator() that defines our base model, which expects a 4×4 color image as input, then repeatedly adds blocks to create new versions of the discriminator model, each of which expects images with four times the area.

# define the discriminator models for each image resolution
def define_discriminator(n_blocks, input_shape=(4,4,3)):
	model_list = list()
	# base model input
	in_image = Input(shape=input_shape)
	# conv 1x1
	d = Conv2D(64, (1,1), padding='same', kernel_initializer='he_normal')(in_image)
	d = LeakyReLU(alpha=0.2)(d)
	# conv 3x3 (output block)
	d = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)
	# conv 4x4
	d = Conv2D(128, (4,4), padding='same', kernel_initializer='he_normal')(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)
	# dense output layer
	d = Flatten()(d)
	out_class = Dense(1)(d)
	# define model
	model = Model(in_image, out_class)
	# compile model
	model.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
	# store model
	model_list.append([model, model])
	# create submodels
	for i in range(1, n_blocks):
		# get prior straight-through model (without the fade-in)
		old_model = model_list[i - 1][0]
		# create new model for next resolution
		models = add_discriminator_block(old_model)
		# store model
		model_list.append(models)
	return model_list

This function will return a list of models, where each item in the list is a two-element list that contains first the straight-through version of the model at that resolution, and second the fade-in version of the model for that resolution.
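
For example, with three blocks, indexing into this list might look as follows. The variable names are illustrative only.

# illustrative indexing of the list returned by define_discriminator()
discriminators = define_discriminator(3)
# straight-through and fade-in models for the 8x8 level
d_normal_8x8 = discriminators[1][0]
d_fadein_8x8 = discriminators[1][1]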

We can tie all of this together and define a new “discriminator model” that will grow from 4×4, through to 8×8, and finally to 16×16. This is achieved by setting the n_blocks argument to 3 when calling the define_discriminator() function, creating three pairs of models.

The complete example is listed below.

# example of defining discriminator models for the progressive growing gan
from keras.optimizers import Adam
from keras.models import Model
from keras.layers import Input
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import Conv2D
from keras.layers import AveragePooling2D
from keras.layers import LeakyReLU
from keras.layers import BatchNormalization
from keras.layers import Add
from keras.utils.vis_utils import plot_model
from keras import backend

# weighted sum output
class WeightedSum(Add):
	# init with default value
	def __init__(self, alpha=0.0, **kwargs):
		super(WeightedSum, self).__init__(**kwargs)
		self.alpha = backend.variable(alpha, name='ws_alpha')

	# output a weighted sum of inputs
	def _merge_function(self, inputs):
		# only supports a weighted sum of two inputs
		assert (len(inputs) == 2)
		# ((1-a) * input1) + (a * input2)
		output = ((1.0 - self.alpha) * inputs[0]) + (self.alpha * inputs[1])
		return output

# add a discriminator block
def add_discriminator_block(old_model, n_input_layers=3):
	# get shape of existing model
	in_shape = list(old_model.input.shape)
	# define new input shape as double the size
	input_shape = (in_shape[-2].value*2, in_shape[-2].value*2, in_shape[-1].value)
	in_image = Input(shape=input_shape)
	# define new input processing layer
	d = Conv2D(64, (1,1), padding='same', kernel_initializer='he_normal')(in_image)
	d = LeakyReLU(alpha=0.2)(d)
	# define new block
	d = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)
	d = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)
	d = AveragePooling2D()(d)
	block_new = d
	# skip the input, 1x1 and activation for the old model
	for i in range(n_input_layers, len(old_model.layers)):
		d = old_model.layers[i](d)
	# define straight-through model
	model1 = Model(in_image, d)
	# compile model
	model1.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
	# downsample the new larger image
	downsample = AveragePooling2D()(in_image)
	# connect old input processing to downsampled new input
	block_old = old_model.layers[1](downsample)
	block_old = old_model.layers[2](block_old)
	# fade in output of old model input layer with new input
	d = WeightedSum()([block_old, block_new])
	# skip the input, 1x1 and activation for the old model
	for i in range(n_input_layers, len(old_model.layers)):
		d = old_model.layers[i](d)
	# define fade-in model
	model2 = Model(in_image, d)
	# compile model
	model2.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
	return [model1, model2]

# define the discriminator models for each image resolution
def define_discriminator(n_blocks, input_shape=(4,4,3)):
	model_list = list()
	# base model input
	in_image = Input(shape=input_shape)
	# conv 1x1
	d = Conv2D(64, (1,1), padding='same', kernel_initializer='he_normal')(in_image)
	d = LeakyReLU(alpha=0.2)(d)
	# conv 3x3 (output block)
	d = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)
	# conv 4x4
	d = Conv2D(128, (4,4), padding='same', kernel_initializer='he_normal')(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)
	# dense output layer
	d = Flatten()(d)
	out_class = Dense(1)(d)
	# define model
	model = Model(in_image, out_class)
	# compile model
	model.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
	# store model
	model_list.append([model, model])
	# create submodels
	for i in range(1, n_blocks):
		# get prior straight-through model (without the fade-in)
		old_model = model_list[i - 1][0]
		# create new model for next resolution
		models = add_discriminator_block(old_model)
		# store model
		model_list.append(models)
	return model_list

# define models
discriminators = define_discriminator(3)
# spot check
m = discriminators[2][1]
m.summary()
plot_model(m, to_file='discriminator_plot.png', show_shapes=True, show_layer_names=True)

Running the example first summarizes the fade-in version of the third model showing the 16×16 color image inputs and the single value output.

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_3 (InputLayer)            (None, 16, 16, 3)    0
__________________________________________________________________________________________________
conv2d_7 (Conv2D)               (None, 16, 16, 64)   256         input_3[0][0]
__________________________________________________________________________________________________
leaky_re_lu_7 (LeakyReLU)       (None, 16, 16, 64)   0           conv2d_7[0][0]
__________________________________________________________________________________________________
conv2d_8 (Conv2D)               (None, 16, 16, 64)   36928       leaky_re_lu_7[0][0]
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, 16, 16, 64)   256         conv2d_8[0][0]
__________________________________________________________________________________________________
leaky_re_lu_8 (LeakyReLU)       (None, 16, 16, 64)   0           batch_normalization_5[0][0]
__________________________________________________________________________________________________
conv2d_9 (Conv2D)               (None, 16, 16, 64)   36928       leaky_re_lu_8[0][0]
__________________________________________________________________________________________________
average_pooling2d_4 (AveragePoo (None, 8, 8, 3)      0           input_3[0][0]
__________________________________________________________________________________________________
batch_normalization_6 (BatchNor (None, 16, 16, 64)   256         conv2d_9[0][0]
__________________________________________________________________________________________________
conv2d_4 (Conv2D)               (None, 8, 8, 64)     256         average_pooling2d_4[0][0]
__________________________________________________________________________________________________
leaky_re_lu_9 (LeakyReLU)       (None, 16, 16, 64)   0           batch_normalization_6[0][0]
__________________________________________________________________________________________________
leaky_re_lu_4 (LeakyReLU)       (None, 8, 8, 64)     0           conv2d_4[1][0]
__________________________________________________________________________________________________
average_pooling2d_3 (AveragePoo (None, 8, 8, 64)     0           leaky_re_lu_9[0][0]
__________________________________________________________________________________________________
weighted_sum_2 (WeightedSum)    (None, 8, 8, 64)     0           leaky_re_lu_4[1][0]
                                                                 average_pooling2d_3[0][0]
__________________________________________________________________________________________________
conv2d_5 (Conv2D)               (None, 8, 8, 64)     36928       weighted_sum_2[0][0]
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, 8, 8, 64)     256         conv2d_5[2][0]
__________________________________________________________________________________________________
leaky_re_lu_5 (LeakyReLU)       (None, 8, 8, 64)     0           batch_normalization_3[2][0]
__________________________________________________________________________________________________
conv2d_6 (Conv2D)               (None, 8, 8, 64)     36928       leaky_re_lu_5[2][0]
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 8, 8, 64)     256         conv2d_6[2][0]
__________________________________________________________________________________________________
leaky_re_lu_6 (LeakyReLU)       (None, 8, 8, 64)     0           batch_normalization_4[2][0]
__________________________________________________________________________________________________
average_pooling2d_1 (AveragePoo (None, 4, 4, 64)     0           leaky_re_lu_6[2][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 4, 4, 128)    73856       average_pooling2d_1[2][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 4, 4, 128)    512         conv2d_2[4][0]
__________________________________________________________________________________________________
leaky_re_lu_2 (LeakyReLU)       (None, 4, 4, 128)    0           batch_normalization_1[4][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D)               (None, 4, 4, 128)    262272      leaky_re_lu_2[4][0]
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 4, 4, 128)    512         conv2d_3[4][0]
__________________________________________________________________________________________________
leaky_re_lu_3 (LeakyReLU)       (None, 4, 4, 128)    0           batch_normalization_2[4][0]
__________________________________________________________________________________________________
flatten_1 (Flatten)             (None, 2048)         0           leaky_re_lu_3[4][0]
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 1)            2049        flatten_1[4][0]
==================================================================================================
Total params: 488,449
Trainable params: 487,425
Non-trainable params: 1,024
__________________________________________________________________________________________________

A plot of the same fade-in version of the model is created and saved to file.

Note: creating this plot assumes that the pydot and graphviz libraries are installed. If this is a problem, comment out the import statement and the call to plot_model().

The plot shows the 16×16 input image that is downsampled and passed through the 8×8 input processing layers from the prior model (left). It also shows the addition of the new block (right) and the weighted average that combines both streams of input, before using the existing model layers to continue processing and outputting a prediction.

Plot of the Fade-In Discriminator Model For the Progressive Growing GAN Transitioning From 8×8 to 16×16 Input Images

Now that we have seen how we can define the discriminator models, let’s look at how we can define the generator models.

How to Implement the Progressive Growing GAN Generator Model

The generator models for the progressive growing GAN are easier to implement in Keras than the discriminator models.

The reason for this is that each fade-in requires a minor change to the output of the model, rather than to its input.

Increasing the resolution of the generator involves first upsampling the output of the end of the last block. This is then connected to the new block and a new output layer for an image that is double the height and width dimensions or quadruple the area. During the phase-in, the upsampling is also connected to the output layer from the old model and the output from both output layers is merged using a weighted average.

After the phase-in is complete, the old output layer is removed.

This can be summarized with the following figure, taken from the paper showing a model before growing (a), during the phase-in of the larger resolution (b), and the model after the phase-in (c).

Figure Showing the Growing of the Generator Model, Before (a), During (b), and After (c) the Phase-In of a High Resolution.
Taken from: Progressive Growing of GANs for Improved Quality, Stability, and Variation.

The toRGB layer is a convolutional layer with three 1×1 filters, sufficient to output a color image.

The model takes a point in the latent space as input, e.g. a 100-element or 512-element vector as described in the paper. This is scaled up to provide the basis for 4×4 activation maps, followed by a convolutional layer with 4×4 filters and another with 3×3 filters. Like the discriminator, LeakyReLU activations are used, as is pixel normalization, which we will substitute with batch normalization for brevity.

A block involves an upsampling layer followed by two convolutional layers with 3×3 filters. Upsampling is achieved using a nearest neighbor method (e.g. duplicating input rows and columns) via an UpSampling2D layer instead of the more common transpose convolutional layer.
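
The behavior of nearest-neighbor upsampling is easy to check in isolation. The small example below is purely illustrative: it doubles a 2×2 single-channel input to 4×4 by duplicating rows and columns.

# demonstrate the default nearest-neighbor behavior of UpSampling2D
from numpy import asarray
from keras.models import Sequential
from keras.layers import UpSampling2D

# a single 2x2 input "image" with one channel
X = asarray([[1, 2],
             [3, 4]]).reshape((1, 2, 2, 1))
# model with a single upsampling layer
model = Sequential()
model.add(UpSampling2D(input_shape=(2, 2, 1)))
# each row and column is duplicated, giving a 4x4 output
yhat = model.predict(X)
print(yhat.reshape((4, 4)))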

We can define the baseline model that will take a point in latent space as input and output a 4×4 color image as follows:

...
# base model latent input
in_latent = Input(shape=(100,))
# linear scale up to activation maps
g = Dense(128 * 4 * 4, kernel_initializer='he_normal')(in_latent)
g = Reshape((4, 4, 128))(g)
# conv 4x4, input block
g = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(g)
g = BatchNormalization()(g)
g = LeakyReLU(alpha=0.2)(g)
# conv 3x3
g = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(g)
g = BatchNormalization()(g)
g = LeakyReLU(alpha=0.2)(g)
# conv 1x1, output block
out_image = Conv2D(3, (1,1), padding='same', kernel_initializer='he_normal')(g)
# define model
model = Model(in_latent, out_image)

Next, we need to define a version of the model that uses all of the same input layers, but adds a new block (upsampling and two convolutional layers) and a new output layer (a 1×1 convolutional layer).

This would be the model after the phase-in to the new output resolution. It can be achieved by using our knowledge of the baseline model and the fact that the end of the last block is the second-to-last layer, i.e. the layer at index -2 in the model's list of layers.

The new model with the addition of a new block and output layer is defined as follows:

...
old_model = model
# get the end of the last block
block_end = old_model.layers[-2].output
# upsample, and define new block
upsampling = UpSampling2D()(block_end)
g = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(upsampling)
g = BatchNormalization()(g)
g = LeakyReLU(alpha=0.2)(g)
g = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(g)
g = BatchNormalization()(g)
g = LeakyReLU(alpha=0.2)(g)
# add new output layer
out_image = Conv2D(3, (1,1), padding='same', kernel_initializer='he_normal')(g)
# define model
model = Model(old_model.input, out_image)

That is pretty straightforward; we have chopped off the old output layer at the end of the last block and grafted on a new block and output layer.

Now we need a version of this new model to use during the fade-in.

This involves connecting the old output layer to the new upsampling layer at the start of the new block and using an instance of our WeightedSum layer defined in the previous section to combine the output of the old and new output layers.

...
# get the output layer from old model
out_old = old_model.layers[-1]
# connect the upsampling to the old output layer
out_image2 = out_old(upsampling)
# define new output image as the weighted sum of the old and new models
merged = WeightedSum()([out_image2, out_image])
# define model
model2 = Model(old_model.input, merged)

We can combine the definition of these two operations into a function named add_generator_block(), defined below, that will expand a given model and return both the new generator model with the added block (model1) and a fade-in version that blends the new block's output with the old output layer (model2).

# add a generator block
def add_generator_block(old_model):
	# get the end of the last block
	block_end = old_model.layers[-2].output
	# upsample, and define new block
	upsampling = UpSampling2D()(block_end)
	g = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(upsampling)
	g = BatchNormalization()(g)
	g = LeakyReLU(alpha=0.2)(g)
	g = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(g)
	g = BatchNormalization()(g)
	g = LeakyReLU(alpha=0.2)(g)
	# add new output layer
	out_image = Conv2D(3, (1,1), padding='same', kernel_initializer='he_normal')(g)
	# define straight-through model
	model1 = Model(old_model.input, out_image)
	# get the output layer from old model
	out_old = old_model.layers[-1]
	# connect the upsampling to the old output layer
	out_image2 = out_old(upsampling)
	# define new output image as the weighted sum of the old and new models
	merged = WeightedSum()([out_image2, out_image])
	# define fade-in model
	model2 = Model(old_model.input, merged)
	return [model1, model2]

We can then call this function with our baseline model to create models with one added block and continue to call it with subsequent models to keep adding blocks.

The define_generator() function below implements this, taking the size of the latent space and number of blocks to add (models to create).

The baseline model is defined as outputting a color image with the shape 4×4, controlled by the default argument in_dim.

# define generator models
def define_generator(latent_dim, n_blocks, in_dim=4):
	model_list = list()
	# base model latent input
	in_latent = Input(shape=(latent_dim,))
	# linear scale up to activation maps
	g = Dense(128 * in_dim * in_dim, kernel_initializer='he_normal')(in_latent)
	g = Reshape((in_dim, in_dim, 128))(g)
	# conv 4x4, input block
	g = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(g)
	g = BatchNormalization()(g)
	g = LeakyReLU(alpha=0.2)(g)
	# conv 3x3
	g = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(g)
	g = BatchNormalization()(g)
	g = LeakyReLU(alpha=0.2)(g)
	# conv 1x1, output block
	out_image = Conv2D(3, (1,1), padding='same', kernel_initializer='he_normal')(g)
	# define model
	model = Model(in_latent, out_image)
	# store model
	model_list.append([model, model])
	# create submodels
	for i in range(1, n_blocks):
		# get prior straight-through model (without the fade-in)
		old_model = model_list[i - 1][0]
		# create new model for next resolution
		models = add_generator_block(old_model)
		# store model
		model_list.append(models)
	return model_list

We can tie all of this together and define a baseline generator and the addition of two blocks, so three models in total, where a straight-through and fade-in version of each model is defined.

The complete example is listed below.

# example of defining generator models for the progressive growing gan
from keras.models import Model
from keras.layers import Input
from keras.layers import Dense
from keras.layers import Reshape
from keras.layers import Conv2D
from keras.layers import UpSampling2D
from keras.layers import LeakyReLU
from keras.layers import BatchNormalization
from keras.layers import Add
from keras.utils.vis_utils import plot_model
from keras import backend

# weighted sum output
class WeightedSum(Add):
	# init with default value
	def __init__(self, alpha=0.0, **kwargs):
		super(WeightedSum, self).__init__(**kwargs)
		self.alpha = backend.variable(alpha, name='ws_alpha')

	# output a weighted sum of inputs
	def _merge_function(self, inputs):
		# only supports a weighted sum of two inputs
		assert (len(inputs) == 2)
		# ((1-a) * input1) + (a * input2)
		output = ((1.0 - self.alpha) * inputs[0]) + (self.alpha * inputs[1])
		return output

# add a generator block
def add_generator_block(old_model):
	# get the end of the last block
	block_end = old_model.layers[-2].output
	# upsample, and define new block
	upsampling = UpSampling2D()(block_end)
	g = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(upsampling)
	g = BatchNormalization()(g)
	g = LeakyReLU(alpha=0.2)(g)
	g = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(g)
	g = BatchNormalization()(g)
	g = LeakyReLU(alpha=0.2)(g)
	# add new output layer
	out_image = Conv2D(3, (1,1), padding='same', kernel_initializer='he_normal')(g)
	# define straight-through model
	model1 = Model(old_model.input, out_image)
	# get the output layer from old model
	out_old = old_model.layers[-1]
	# connect the upsampling to the old output layer
	out_image2 = out_old(upsampling)
	# define new output image as the weighted sum of the old and new models
	merged = WeightedSum()([out_image2, out_image])
	# define fade-in model
	model2 = Model(old_model.input, merged)
	return [model1, model2]

# define generator models
def define_generator(latent_dim, n_blocks, in_dim=4):
	model_list = list()
	# base model latent input
	in_latent = Input(shape=(latent_dim,))
	# linear scale up to activation maps
	g = Dense(128 * in_dim * in_dim, kernel_initializer='he_normal')(in_latent)
	g = Reshape((in_dim, in_dim, 128))(g)
	# conv 4x4, input block
	g = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(g)
	g = BatchNormalization()(g)
	g = LeakyReLU(alpha=0.2)(g)
	# conv 3x3
	g = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(g)
	g = BatchNormalization()(g)
	g = LeakyReLU(alpha=0.2)(g)
	# conv 1x1, output block
	out_image = Conv2D(3, (1,1), padding='same', kernel_initializer='he_normal')(g)
	# define model
	model = Model(in_latent, out_image)
	# store model
	model_list.append([model, model])
	# create submodels
	for i in range(1, n_blocks):
		# get prior straight-through model (without the fade-in)
		old_model = model_list[i - 1][0]
		# create new model for next resolution
		models = add_generator_block(old_model)
		# store model
		model_list.append(models)
	return model_list

# define models
generators = define_generator(100, 3)
# spot check
m = generators[2][1]
m.summary()
plot_model(m, to_file='generator_plot.png', show_shapes=True, show_layer_names=True)

The example chooses the fade-in model for the last model to summarize.

Running the example first summarizes a linear list of the layers in the model. We can see that the last model takes a point from the latent space and outputs a 16×16 image.

This matches our expectations, as the baseline model outputs a 4×4 image, adding one block increases this to 8×8, and adding one more block increases this to 16×16.

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            (None, 100)          0
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 2048)         206848      input_1[0][0]
__________________________________________________________________________________________________
reshape_1 (Reshape)             (None, 4, 4, 128)    0           dense_1[0][0]
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 4, 4, 128)    147584      reshape_1[0][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 4, 4, 128)    512         conv2d_1[0][0]
__________________________________________________________________________________________________
leaky_re_lu_1 (LeakyReLU)       (None, 4, 4, 128)    0           batch_normalization_1[0][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 4, 4, 128)    147584      leaky_re_lu_1[0][0]
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 4, 4, 128)    512         conv2d_2[0][0]
__________________________________________________________________________________________________
leaky_re_lu_2 (LeakyReLU)       (None, 4, 4, 128)    0           batch_normalization_2[0][0]
__________________________________________________________________________________________________
up_sampling2d_1 (UpSampling2D)  (None, 8, 8, 128)    0           leaky_re_lu_2[0][0]
__________________________________________________________________________________________________
conv2d_4 (Conv2D)               (None, 8, 8, 64)     73792       up_sampling2d_1[0][0]
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, 8, 8, 64)     256         conv2d_4[0][0]
__________________________________________________________________________________________________
leaky_re_lu_3 (LeakyReLU)       (None, 8, 8, 64)     0           batch_normalization_3[0][0]
__________________________________________________________________________________________________
conv2d_5 (Conv2D)               (None, 8, 8, 64)     36928       leaky_re_lu_3[0][0]
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 8, 8, 64)     256         conv2d_5[0][0]
__________________________________________________________________________________________________
leaky_re_lu_4 (LeakyReLU)       (None, 8, 8, 64)     0           batch_normalization_4[0][0]
__________________________________________________________________________________________________
up_sampling2d_2 (UpSampling2D)  (None, 16, 16, 64)   0           leaky_re_lu_4[0][0]
__________________________________________________________________________________________________
conv2d_7 (Conv2D)               (None, 16, 16, 64)   36928       up_sampling2d_2[0][0]
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, 16, 16, 64)   256         conv2d_7[0][0]
__________________________________________________________________________________________________
leaky_re_lu_5 (LeakyReLU)       (None, 16, 16, 64)   0           batch_normalization_5[0][0]
__________________________________________________________________________________________________
conv2d_8 (Conv2D)               (None, 16, 16, 64)   36928       leaky_re_lu_5[0][0]
__________________________________________________________________________________________________
batch_normalization_6 (BatchNor (None, 16, 16, 64)   256         conv2d_8[0][0]
__________________________________________________________________________________________________
leaky_re_lu_6 (LeakyReLU)       (None, 16, 16, 64)   0           batch_normalization_6[0][0]
__________________________________________________________________________________________________
conv2d_6 (Conv2D)               multiple             195         up_sampling2d_2[0][0]
__________________________________________________________________________________________________
conv2d_9 (Conv2D)               (None, 16, 16, 3)    195         leaky_re_lu_6[0][0]
__________________________________________________________________________________________________
weighted_sum_2 (WeightedSum)    (None, 16, 16, 3)    0           conv2d_6[1][0]
                                                                 conv2d_9[0][0]
==================================================================================================
Total params: 689,030
Trainable params: 688,006
Non-trainable params: 1,024
__________________________________________________________________________________________________

A plot of the same fade-in version of the model is created and saved to file.

Note: creating this plot assumes that the pydot and graphviz libraries are installed. If this is a problem, comment out the import statement and the call to plot_model().

We can see that the output from the last block passes through an UpSampling2D layer that feeds both the new block with its new output layer and the old output layer, and that the two output layers are then merged via a weighted sum to produce the final output.

Plot of the Fade-In Generator Model For the Progressive Growing GAN Transitioning From 8×8 to 16×16 Output Images

Now that we have seen how to define the generator models, we can review how the generator models may be updated via the discriminator models.

How to Implement Composite Models for Updating the Generator

The discriminator models are trained directly with real and fake images as input and a target value of 0 for fake and 1 for real.

The generator models are not trained directly; instead, they are trained indirectly via the discriminator models, just like a normal GAN model.

We can create a composite model for each level of growth of the model, e.g. pair 4×4 generators and 4×4 discriminators. We can also pair the straight-through models together, and the fade-in models together.

For example, we can retrieve the generator and discriminator models for a given level of growth.

...
g_models, d_models = generators[0], discriminators[0]

Then we can use them to create a composite model for training the straight-through generator, where the output of the generator is fed directly to the discriminator in order to classify.

# straight-through model
d_models[0].trainable = False
model1 = Sequential()
model1.add(g_models[0])
model1.add(d_models[0])
model1.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))

And do the same for the composite model for the fade-in generator.

# fade-in model
d_models[1].trainable = False
model2 = Sequential()
model2.add(g_models[1])
model2.add(d_models[1])
model2.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))

The function below, named define_composite(), automates this; given a list of defined discriminator and generator models, it will create an appropriate composite model for training each generator model.

# define composite models for training generators via discriminators
def define_composite(discriminators, generators):
	model_list = list()
	# create composite models
	for i in range(len(discriminators)):
		g_models, d_models = generators[i], discriminators[i]
		# straight-through model
		d_models[0].trainable = False
		model1 = Sequential()
		model1.add(g_models[0])
		model1.add(d_models[0])
		model1.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
		# fade-in model
		d_models[1].trainable = False
		model2 = Sequential()
		model2.add(g_models[1])
		model2.add(d_models[1])
		model2.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
		# store
		model_list.append([model1, model2])
	return model_list

Tying this together with the definition of the discriminator and generator models above, the complete example of defining all models at each pre-defined level of growth is listed below.

# example of defining composite models for the progressive growing gan
from keras.optimizers import Adam
from keras.models import Sequential
from keras.models import Model
from keras.layers import Input
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import Reshape
from keras.layers import Conv2D
from keras.layers import UpSampling2D
from keras.layers import AveragePooling2D
from keras.layers import LeakyReLU
from keras.layers import BatchNormalization
from keras.layers import Add
from keras.utils.vis_utils import plot_model
from keras import backend

# weighted sum output
class WeightedSum(Add):
	# init with default value
	def __init__(self, alpha=0.0, **kwargs):
		super(WeightedSum, self).__init__(**kwargs)
		self.alpha = backend.variable(alpha, name='ws_alpha')

	# output a weighted sum of inputs
	def _merge_function(self, inputs):
		# only supports a weighted sum of two inputs
		assert (len(inputs) == 2)
		# ((1-a) * input1) + (a * input2)
		output = ((1.0 - self.alpha) * inputs[0]) + (self.alpha * inputs[1])
		return output

# add a discriminator block
def add_discriminator_block(old_model, n_input_layers=3):
	# get shape of existing model
	in_shape = list(old_model.input.shape)
	# define new input shape as double the size
	input_shape = (in_shape[-2].value*2, in_shape[-2].value*2, in_shape[-1].value)
	in_image = Input(shape=input_shape)
	# define new input processing layer
	d = Conv2D(64, (1,1), padding='same', kernel_initializer='he_normal')(in_image)
	d = LeakyReLU(alpha=0.2)(d)
	# define new block
	d = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)
	d = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)
	d = AveragePooling2D()(d)
	block_new = d
	# skip the input, 1x1 and activation for the old model
	for i in range(n_input_layers, len(old_model.layers)):
		d = old_model.layers[i](d)
	# define straight-through model
	model1 = Model(in_image, d)
	# compile model
	model1.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
	# downsample the new larger image
	downsample = AveragePooling2D()(in_image)
	# connect old input processing to downsampled new input
	block_old = old_model.layers[1](downsample)
	block_old = old_model.layers[2](block_old)
	# fade in output of old model input layer with new input
	d = WeightedSum()([block_old, block_new])
	# skip the input, 1x1 and activation for the old model
	for i in range(n_input_layers, len(old_model.layers)):
		d = old_model.layers[i](d)
	# define fade-in model
	model2 = Model(in_image, d)
	# compile model
	model2.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
	return [model1, model2]

# define the discriminator models for each image resolution
def define_discriminator(n_blocks, input_shape=(4,4,3)):
	model_list = list()
	# base model input
	in_image = Input(shape=input_shape)
	# conv 1x1
	d = Conv2D(64, (1,1), padding='same', kernel_initializer='he_normal')(in_image)
	d = LeakyReLU(alpha=0.2)(d)
	# conv 3x3 (output block)
	d = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)
	# conv 4x4
	d = Conv2D(128, (4,4), padding='same', kernel_initializer='he_normal')(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)
	# dense output layer
	d = Flatten()(d)
	out_class = Dense(1)(d)
	# define model
	model = Model(in_image, out_class)
	# compile model
	model.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
	# store model
	model_list.append([model, model])
	# create submodels
	for i in range(1, n_blocks):
		# get prior straight-through model (without the fade-in)
		old_model = model_list[i - 1][0]
		# create new model for next resolution
		models = add_discriminator_block(old_model)
		# store model
		model_list.append(models)
	return model_list

# add a generator block
def add_generator_block(old_model):
	# get the end of the last block
	block_end = old_model.layers[-2].output
	# upsample, and define new block
	upsampling = UpSampling2D()(block_end)
	g = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(upsampling)
	g = BatchNormalization()(g)
	g = LeakyReLU(alpha=0.2)(g)
	g = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(g)
	g = BatchNormalization()(g)
	g = LeakyReLU(alpha=0.2)(g)
	# add new output layer
	out_image = Conv2D(3, (1,1), padding='same', kernel_initializer='he_normal')(g)
	# define straight-through model
	model1 = Model(old_model.input, out_image)
	# get the output layer from old model
	out_old = old_model.layers[-1]
	# connect the upsampling to the old output layer
	out_image2 = out_old(upsampling)
	# define new output image as the weighted sum of the old and new models
	merged = WeightedSum()([out_image2, out_image])
	# define fade-in model
	model2 = Model(old_model.input, merged)
	return [model1, model2]

# define generator models
def define_generator(latent_dim, n_blocks, in_dim=4):
	model_list = list()
	# base model latent input
	in_latent = Input(shape=(latent_dim,))
	# linear scale up to activation maps
	g = Dense(128 * in_dim * in_dim, kernel_initializer='he_normal')(in_latent)
	g = Reshape((in_dim, in_dim, 128))(g)
	# conv 4x4, input block
	g = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(g)
	g = BatchNormalization()(g)
	g = LeakyReLU(alpha=0.2)(g)
	# conv 3x3
	g = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(g)
	g = BatchNormalization()(g)
	g = LeakyReLU(alpha=0.2)(g)
	# conv 1x1, output block
	out_image = Conv2D(3, (1,1), padding='same', kernel_initializer='he_normal')(g)
	# define model
	model = Model(in_latent, out_image)
	# store model
	model_list.append([model, model])
	# create submodels
	for i in range(1, n_blocks):
		# get prior straight-through model (without the fade-in)
		old_model = model_list[i - 1][0]
		# create new model for next resolution
		models = add_generator_block(old_model)
		# store model
		model_list.append(models)
	return model_list

# define composite models for training generators via discriminators
def define_composite(discriminators, generators):
	model_list = list()
	# create composite models
	for i in range(len(discriminators)):
		g_models, d_models = generators[i], discriminators[i]
		# straight-through model
		d_models[0].trainable = False
		model1 = Sequential()
		model1.add(g_models[0])
		model1.add(d_models[0])
		model1.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
		# fade-in model
		d_models[1].trainable = False
		model2 = Sequential()
		model2.add(g_models[1])
		model2.add(d_models[1])
		model2.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
		# store
		model_list.append([model1, model2])
	return model_list

# define discriminator models
discriminators = define_discriminator(3)
# define generator models
generators = define_generator(100, 3)
# define composite models
composite = define_composite(discriminators, generators)

Now that we know how to define all of the models, we can review how the models might be updated during training.

How to Train Discriminator and Generator Models

Pre-defining the generator, discriminator, and composite models was the hard part; training the models is straightforward and much like training any other GAN.

Importantly, during the fade-in phase, the alpha variable in each WeightedSum layer must be updated at each training iteration. The value must be set on the layers in both the generator and discriminator models, and it controls the smooth linear transition from the old layers to the new layers, with alpha increasing from 0 to 1 over a fixed number of training iterations.

The update_fadein() function below implements this and will loop through a list of models and set the alpha value on each based on the current step in a given number of training steps. You may be able to implement this more elegantly using a callback.

# update the alpha value on each instance of WeightedSum
def update_fadein(models, step, n_steps):
	# calculate current alpha (linear from 0 to 1)
	alpha = step / float(n_steps - 1)
	# update the alpha for each model
	for model in models:
		for layer in model.layers:
			if isinstance(layer, WeightedSum):
				backend.set_value(layer.alpha, alpha)
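If you were training with fit() rather than train_on_batch(), the same fade-in schedule could be expressed as a Keras callback. The sketch below is a hypothetical FadeInCallback (not part of the tutorial); it assumes the total number of fade-in steps is known when the callback is created and that the WeightedSum layer defined above is in scope.

# minimal sketch of the fade-in schedule as a Keras callback (hypothetical;
# only fires automatically when training with fit(), not train_on_batch())
from keras.callbacks import Callback
from keras import backend

class FadeInCallback(Callback):
	def __init__(self, n_steps):
		super(FadeInCallback, self).__init__()
		self.n_steps = n_steps
		self.step = 0

	def on_batch_begin(self, batch, logs=None):
		# linearly increase alpha from 0 to 1 over n_steps batches
		alpha = self.step / float(self.n_steps - 1)
		for layer in self.model.layers:
			if isinstance(layer, WeightedSum):
				backend.set_value(layer.alpha, alpha)
		self.step = min(self.step + 1, self.n_steps - 1)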

We can define a generic function for training a given generator, discriminator, and composite model for a given number of training epochs.

The train_epochs() function below implements this: first the discriminator model is updated on real and fake images, then the generator model is updated, and the process is repeated for the required number of training iterations, based on the dataset size and the number of epochs.

This function calls helper functions for retrieving a batch of real images via generate_real_samples(), generating a batch of fake samples with the generator via generate_fake_samples(), and generating a sample of points in latent space via generate_latent_points(). You can define these functions yourself quite trivially.
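For completeness, the listing below gives minimal sketches of these three helpers. They are assumptions rather than the tutorial's definitive implementations: latent points are sampled from a standard Gaussian, real images are labeled 1, and fake images are labeled -1 (with the mean squared error loss used here, a label of 0 for fake images would also work).

# hedged sketches of the helper functions used by train_epochs() below
from numpy import ones
from numpy.random import randn
from numpy.random import randint

# generate points in latent space as input for the generator
def generate_latent_points(latent_dim, n_samples):
	# sample from a standard Gaussian and reshape into a batch of inputs
	x_input = randn(latent_dim * n_samples)
	return x_input.reshape(n_samples, latent_dim)

# select a random batch of real images with class labels of 1
def generate_real_samples(dataset, n_samples):
	ix = randint(0, dataset.shape[0], n_samples)
	X = dataset[ix]
	y = ones((n_samples, 1))
	return X, y

# use the generator to create fake images with class labels of -1
def generate_fake_samples(generator, latent_dim, n_samples):
	x_input = generate_latent_points(latent_dim, n_samples)
	X = generator.predict(x_input)
	y = -ones((n_samples, 1))
	return X, y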

# train a generator and discriminator
# note: latent_dim is assumed to be defined in the enclosing scope
def train_epochs(g_model, d_model, gan_model, dataset, n_epochs, n_batch, fadein=False):
	# calculate the number of batches per training epoch
	bat_per_epo = int(dataset.shape[0] / n_batch)
	# calculate the number of training iterations
	n_steps = bat_per_epo * n_epochs
	# calculate the size of half a batch of samples
	half_batch = int(n_batch / 2)
	# manually enumerate epochs
	for i in range(n_steps):
		# update alpha for all WeightedSum layers when fading in new blocks
		if fadein:
			update_fadein([g_model, d_model, gan_model], i, n_steps)
		# prepare real and fake samples
		X_real, y_real = generate_real_samples(dataset, half_batch)
		X_fake, y_fake = generate_fake_samples(g_model, latent_dim, half_batch)
		# update discriminator model
		d_loss1 = d_model.train_on_batch(X_real, y_real)
		d_loss2 = d_model.train_on_batch(X_fake, y_fake)
		# update the generator via the discriminator's error
		z_input = generate_latent_points(latent_dim, n_batch)
		y_real2 = ones((n_batch, 1))
		g_loss = gan_model.train_on_batch(z_input, y_real2)
		# summarize loss on this batch
		print('>%d, d1=%.3f, d2=%.3f g=%.3f' % (i+1, d_loss1, d_loss2, g_loss))

The images must be scaled to the size of each model. If the images are in-memory, we can define a simple scale_dataset() function to scale the loaded images.

In this case, we will use the skimage.transform.resize() function from the scikit-image library to resize the NumPy array of pixels to the required size, using nearest neighbor interpolation.

# scale images to preferred size
from numpy import asarray
from skimage.transform import resize

def scale_dataset(images, new_shape):
	images_list = list()
	for image in images:
		# resize with nearest neighbor interpolation (order=0)
		new_image = resize(image, new_shape, 0)
		# store
		images_list.append(new_image)
	return asarray(images_list)

First, the baseline model must be fit for a given number of training epochs, e.g. the model that outputs 4×4 sized images.

This requires that the loaded images be scaled to the required size, defined by the shape of the generator model's output layer.

# fit the baseline model
g_normal, d_normal, gan_normal = g_models[0][0], d_models[0][0], gan_models[0][0]
# scale dataset to appropriate size
gen_shape = g_normal.output_shape
scaled_data = scale_dataset(dataset, gen_shape[1:])
print('Scaled Data', scaled_data.shape)
# train normal or straight-through models
train_epochs(g_normal, d_normal, gan_normal, scaled_data, e_norm, n_batch)

We can then process each level of growth, e.g. the first being 8×8.

This involves first retrieving the models, scaling the data to the appropriate size, then fitting the fade-in model, followed by training the straight-through version of the model for fine-tuning.

We can repeat this for each level of growth in a loop.

# process each level of growth
for i in range(1, len(g_models)):
	# retrieve models for this level of growth
	[g_normal, g_fadein] = g_models[i]
	[d_normal, d_fadein] = d_models[i]
	[gan_normal, gan_fadein] = gan_models[i]
	# scale dataset to appropriate size
	gen_shape = g_normal.output_shape
	scaled_data = scale_dataset(dataset, gen_shape[1:])
	print('Scaled Data', scaled_data.shape)
	# train fade-in models for next level of growth
	train_epochs(g_fadein, d_fadein, gan_fadein, scaled_data, e_fadein, n_batch, True)
	# train normal or straight-through models
	train_epochs(g_normal, d_normal, gan_normal, scaled_data, e_norm, n_batch)

We can tie this together into a function called train() that trains the progressive growing GAN models.

# train the generator and discriminator
def train(g_models, d_models, gan_models, dataset, latent_dim, e_norm, e_fadein, n_batch):
	# fit the baseline model
	g_normal, d_normal, gan_normal = g_models[0][0], d_models[0][0], gan_models[0][0]
	# scale dataset to appropriate size
	gen_shape = g_normal.output_shape
	scaled_data = scale_dataset(dataset, gen_shape[1:])
	print('Scaled Data', scaled_data.shape)
	# train normal or straight-through models
	train_epochs(g_normal, d_normal, gan_normal, scaled_data, e_norm, n_batch)
	# process each level of growth
	for i in range(1, len(g_models)):
		# retrieve models for this level of growth
		[g_normal, g_fadein] = g_models[i]
		[d_normal, d_fadein] = d_models[i]
		[gan_normal, gan_fadein] = gan_models[i]
		# scale dataset to appropriate size
		gen_shape = g_normal.output_shape
		scaled_data = scale_dataset(dataset, gen_shape[1:])
		print('Scaled Data', scaled_data.shape)
		# train fade-in models for next level of growth
		train_epochs(g_fadein, d_fadein, gan_fadein, scaled_data, e_fadein, n_batch, True)
		# train normal or straight-through models
		train_epochs(g_normal, d_normal, gan_normal, scaled_data, e_norm, n_batch)

The number of epochs for the normal phase is defined by the e_norm argument and the number of epochs during the fade-in phase is defined by the e_fadein argument.

The number of epochs must be chosen based on the size of the image dataset, and the same number of epochs can be used for each phase, as was done in the paper.

We start with 4×4 resolution and train the networks until we have shown the discriminator 800k real images in total. We then alternate between two phases: fade in the first 3-layer block during the next 800k images, stabilize the networks for 800k images, fade in the next 3-layer block during 800k images, etc.

Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017.
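As a rough guide, the number of real images shown per phase can be converted into epochs by dividing by the size of your dataset. The short sketch below uses a hypothetical dataset size of 50,000 images purely for illustration.

# sketch: convert the paper's 800k real images per phase into training epochs
n_images_per_phase = 800000
dataset_size = 50000  # hypothetical dataset size
n_epochs = n_images_per_phase // dataset_size
print(n_epochs)  # 16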

We can then define our models as we did in the previous section, then call the training function.

# number of growth phases, e.g. 3 = 16x16 images
n_blocks = 3
# size of the latent space
latent_dim = 100
# define models
d_models = define_discriminator(n_blocks)
# define models
g_models = define_generator(latent_dim, n_blocks)
# define composite models
gan_models = define_composite(d_models, g_models)
# load image data
dataset = load_real_samples()
# train model
train(g_models, d_models, gan_models, dataset, latent_dim, 100, 100, 16)
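Note that load_real_samples() is not defined in this excerpt; it simply needs to return a NumPy array of images with pixel values scaled to [-1,1]. A minimal sketch, assuming the images have been saved to a compressed NumPy file (the filename below is a placeholder), might look as follows.

# hedged sketch of load_real_samples(); the filename is hypothetical
from numpy import load

def load_real_samples(filename='image_dataset.npz'):
	# load the array of images
	data = load(filename)
	X = data['arr_0']
	# convert from ints to floats and scale from [0,255] to [-1,1]
	X = X.astype('float32')
	X = (X - 127.5) / 127.5
	return X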


Summary

In this tutorial, you discovered how to develop progressive growing generative adversarial network models from scratch with Keras.

Specifically, you learned:

  • How to develop pre-defined discriminator and generator models at each level of output image growth.
  • How to define composite models for training the generator models via the discriminator models.
  • How to cycle the training of fade-in version and normal versions of models at each level of output image growth.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.


How to Implement CycleGAN Models From Scratch With Keras

The Cycle Generative adversarial Network, or CycleGAN for short, is a generator model for converting images from one domain to another domain.

For example, the model can be used to translate images of horses to images of zebras, or photographs of city landscapes at night to city landscapes during the day.

The benefit of the CycleGAN model is that it can be trained without paired examples. That is, it does not require examples of photographs before and after the translation in order to train the model, e.g. photos of the same city landscape during the day and at night. Instead, it is able to use a collection of photographs from each domain and extract and harness the underlying style of images in the collection in order to perform the translation.

The model is very impressive but has an architecture that appears quite complicated to implement for beginners.

In this tutorial, you will discover how to implement the CycleGAN architecture from scratch using the Keras deep learning framework.

After completing this tutorial, you will know:

  • How to implement the discriminator and generator models.
  • How to define composite models to train the generator models via adversarial and cycle loss.
  • How to implement the training process to update model weights each training iteration.

Discover how to develop DCGANs, conditional GANs, Pix2Pix, CycleGANs, and more with Keras in my new GANs book, with 29 step-by-step tutorials and full source code.

Let’s get started.

How to Develop CycleGAN Models From Scratch With Keras

How to Develop CycleGAN Models From Scratch With Keras
Photo by anokarina, some rights reserved.

Tutorial Overview

This tutorial is divided into five parts; they are:

  1. What Is the CycleGAN Architecture?
  2. How to Implement the CycleGAN Discriminator Model
  3. How to Implement the CycleGAN Generator Model
  4. How to Implement Composite Models for Least Squares and Cycle Loss
  5. How to Update Discriminator and Generator Models

What Is the CycleGAN Architecture?

The CycleGAN model was described by Jun-Yan Zhu, et al. in their 2017 paper titled “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks.”

The model architecture is comprised of two generator models: one generator (Generator-A) for generating images for the first domain (Domain-A) and the second generator (Generator-B) for generating images for the second domain (Domain-B).

  • Generator-A -> Domain-A
  • Generator-B -> Domain-B

The generator models perform image translation, meaning that the image generation process is conditional on an input image, specifically an image from the other domain. Generator-A takes an image from Domain-B as input and Generator-B takes an image from Domain-A as input.

  • Domain-B -> Generator-A -> Domain-A
  • Domain-A -> Generator-B -> Domain-B

Each generator has a corresponding discriminator model.

The first discriminator model (Discriminator-A) takes real images from Domain-A and generated images from Generator-A and predicts whether they are real or fake. The second discriminator model (Discriminator-B) takes real images from Domain-B and generated images from Generator-B and predicts whether they are real or fake.

  • Domain-A -> Discriminator-A -> [Real/Fake]
  • Domain-B -> Generator-A -> Discriminator-A -> [Real/Fake]
  • Domain-B -> Discriminator-B -> [Real/Fake]
  • Domain-A -> Generator-B -> Discriminator-B -> [Real/Fake]

The discriminator and generator models are trained in an adversarial zero-sum process, like normal GAN models.

The generators learn to better fool the discriminators and the discriminators learn to better detect fake images. Together, the models find an equilibrium during the training process.

Additionally, the generator models are regularized not just to create new images in the target domain, but instead create translated versions of the input images from the source domain. This is achieved by using generated images as input to the corresponding generator model and comparing the output image to the original images.

Passing an image through both generators is called a cycle. Together, each pair of generator models are trained to better reproduce the original source image, referred to as cycle consistency.

  • Domain-B -> Generator-A -> Domain-A -> Generator-B -> Domain-B
  • Domain-A -> Generator-B -> Domain-B -> Generator-A -> Domain-A

There is one further element to the architecture referred to as the identity mapping.

This is where a generator is provided with images as input from the target domain and is expected to generate the same image without change. This addition to the architecture is optional, although it results in a better matching of the color profile of the input image.

  • Domain-A -> Generator-A -> Domain-A
  • Domain-B -> Generator-B -> Domain-B

Now that we are familiar with the model architecture, we can take a closer look at each model in turn and how they can be implemented.

The paper provides a good description of the models and training process, although the official Torch implementation was used as the definitive reference for each model and training process and provides the basis for the model implementations described below.


How to Implement the CycleGAN Discriminator Model

The discriminator model is responsible for taking a real or generated image as input and predicting whether it is real or fake.

The discriminator model is implemented as a PatchGAN model.

For the discriminator networks we use 70 × 70 PatchGANs, which aim to classify whether 70 × 70 overlapping image patches are real or fake.

Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, 2017.

The PatchGAN was described in the 2016 paper titled “Precomputed Real-time Texture Synthesis With Markovian Generative Adversarial Networks” and was used in the pix2pix model for image translation described in the 2016 paper titled “Image-to-Image Translation with Conditional Adversarial Networks.”

The architecture is described as discriminating an input image as real or fake by averaging the predictions for n×n squares or patches of the source image.

… we design a discriminator architecture – which we term a PatchGAN – that only penalizes structure at the scale of patches. This discriminator tries to classify if each NxN patch in an image is real or fake. We run this discriminator convolutionally across the image, averaging all responses to provide the ultimate output of D.

Image-to-Image Translation with Conditional Adversarial Networks, 2016.

This can be implemented directly by using a somewhat standard deep convolutional discriminator model.

Instead of outputting a single value like a traditional discriminator model, the PatchGAN discriminator outputs a square one-channel feature map of predictions. The 70×70 refers to the effective receptive field of the model on the input, not the actual shape of the output feature map.

The receptive field of a convolutional layer refers to the number of pixels that one output of the layer maps to in the input to the layer. The effective receptive field refers to the mapping of one pixel in the output of a deep convolutional model (multiple layers) to the input image. Here, the PatchGAN is an approach to designing a deep convolutional network based on the effective receptive field, where one output activation of the model maps to a 70×70 patch of the input image, regardless of the size of the input image.

The PatchGAN has the effect of predicting whether each 70×70 patch in the input image is real or fake. These predictions can then be averaged to give the output of the model (if needed) or compared directly to a matrix (or a vector if flattened) of expected values (e.g. 0 or 1 values).
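The effective receptive field can be checked with a little arithmetic. The sketch below works backwards through the canonical 70×70 PatchGAN layer listing from pix2pix (three 4×4 stride-2 layers followed by two 4×4 stride-1 layers) using the standard recurrence rf = (rf - 1) * stride + kernel; the exact value depends on which layers use stride 2, so treat this as an illustration of the calculation rather than a property of any specific model below.

# sketch: effective receptive field of a stack of conv layers, listed input-to-output
def receptive_field(layers):
	rf = 1
	for kernel, stride in reversed(layers):
		rf = (rf - 1) * stride + kernel
	return rf

# canonical 70x70 PatchGAN: three 4x4 stride-2 layers, then two 4x4 stride-1 layers
layers = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
print(receptive_field(layers))  # 70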

The discriminator model described in the paper takes 256×256 color images as input and defines an explicit architecture that is used on all of the test problems. The architecture uses blocks of Conv2D-InstanceNorm-LeakyReLU layers, with 4×4 filters and a 2×2 stride.

Let Ck denote a 4×4 Convolution-InstanceNorm-LeakyReLU layer with k filters and stride 2. After the last layer, we apply a convolution to produce a 1-dimensional output. We do not use InstanceNorm for the first C64 layer. We use leaky ReLUs with a slope of 0.2.

Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, 2017.

The architecture for the discriminator is as follows:

  • C64-C128-C256-C512

This is referred to as a 3-layer PatchGAN in the CycleGAN and Pix2Pix nomenclature, as excluding the first hidden layer, the model has three hidden layers that could be scaled up or down to give different sized PatchGAN models.

Not listed in the paper, the model also has a final hidden layer C512 with a 1×1 stride, and an output layer C1, also with a 1×1 stride with a linear activation function. Given the model is mostly used with 256×256 sized images as input, the size of the output feature map of activations is 16×16. If 128×128 images were used as input, then the size of the output feature map of activations would be 8×8.

The model does not use batch normalization; instead, instance normalization is used.

Instance normalization was described in the 2016 paper titled “Instance Normalization: The Missing Ingredient for Fast Stylization.” It is a very simple type of normalization and involves standardizing (e.g. scaling to a standard Gaussian) the values on each feature map.

The intent is to remove image-specific contrast information from the image during image generation, resulting in better generated images.

The key idea is to replace batch normalization layers in the generator architecture with instance normalization layers, and to keep them at test time (as opposed to freeze and simplify them out as done for batch normalization). Intuitively, the normalization process allows to remove instance-specific contrast information from the content image, which simplifies generation. In practice, this results in vastly improved images.

Instance Normalization: The Missing Ingredient for Fast Stylization, 2016.

Although designed for generator models, it can also prove effective in discriminator models.
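Conceptually, instance normalization standardizes each feature map of each image independently (the Keras layer also supports a learned scale and offset). A minimal NumPy sketch of the core operation, assuming channels-last arrays, is given below.

# sketch: instance normalization without the learned scale/offset parameters
import numpy as np

def instance_norm(x, eps=1e-5):
	# x has shape (batch, height, width, channels)
	mean = x.mean(axis=(1, 2), keepdims=True)
	std = x.std(axis=(1, 2), keepdims=True)
	return (x - mean) / (std + eps)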

An implementation of instance normalization is provided in the keras-contrib project that provides early access to community-supplied Keras features.

The keras-contrib library can be installed via pip as follows:

sudo pip install git+https://www.github.com/keras-team/keras-contrib.git

Or, if you are using an Anaconda virtual environment, such as on EC2:

git clone https://www.github.com/keras-team/keras-contrib.git
cd keras-contrib
sudo ~/anaconda3/envs/tensorflow_p36/bin/python setup.py install

The new InstanceNormalization layer can then be used as follows:

...
from keras_contrib.layers.normalization.instancenormalization import InstanceNormalization
# define layer
layer = InstanceNormalization(axis=-1)
...

The “axis” argument is set to -1 to ensure that features are normalized per feature map.

The network weights are initialized to Gaussian random numbers with a standard deviation of 0.02, as is described for DCGANs more generally.

Weights are initialized from a Gaussian distribution N (0, 0.02).

Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, 2017.

The discriminator model is updated using a least squares loss (L2), a so-called Least-Squared Generative Adversarial Network, or LSGAN.

… we replace the negative log likelihood objective by a least-squares loss. This loss is more stable during training and generates higher quality results.

Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, 2017.

This can be implemented using “mean squared error” between the target values of class=1 for real images and class=0 for fake images.

Additionally, the paper suggests dividing the loss for the discriminator by half during training, in an effort to slow down updates to the discriminator relative to the generator.

In practice, we divide the objective by 2 while optimizing D, which slows down the rate at which D learns, relative to the rate of G.

Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, 2017.

This can be achieved by setting the “loss_weights” argument to 0.5 when compiling the model. Note that this weighting does not appear to be implemented in the official Torch implementation, where the discriminator updates are defined in the fDx_basic() function.

We can tie all of this together in the example below with a define_discriminator() function that defines the PatchGAN discriminator. The model configuration matches the description in the appendix of the paper with additional details from the official Torch implementation defined in the defineD_n_layers() function.

# example of defining a 70x70 patchgan discriminator model from keras.optimizers import Adam from keras.initializers import RandomNormal from keras.models import Model from keras.models import Input from keras.layers import Conv2D from keras.layers import LeakyReLU from keras.layers import Activation from keras.layers import Concatenate from keras.layers import BatchNormalization from keras_contrib.layers.normalization.instancenormalization import InstanceNormalization from keras.utils.vis_utils import plot_model  # define the discriminator model def define_discriminator(image_shape): 	# weight initialization 	init = RandomNormal(stddev=0.02) 	# source image input 	in_image = Input(shape=image_shape) 	# C64 	d = Conv2D(64, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(in_image) 	d = LeakyReLU(alpha=0.2)(d) 	# C128 	d = Conv2D(128, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d) 	d = InstanceNormalization(axis=-1)(d) 	d = LeakyReLU(alpha=0.2)(d) 	# C256 	d = Conv2D(256, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d) 	d = InstanceNormalization(axis=-1)(d) 	d = LeakyReLU(alpha=0.2)(d) 	# C512 	d = Conv2D(512, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d) 	d = InstanceNormalization(axis=-1)(d) 	d = LeakyReLU(alpha=0.2)(d) 	# second last output layer 	d = Conv2D(512, (4,4), padding='same', kernel_initializer=init)(d) 	d = InstanceNormalization(axis=-1)(d) 	d = LeakyReLU(alpha=0.2)(d) 	# patch output 	patch_out = Conv2D(1, (4,4), padding='same', kernel_initializer=init)(d) 	# define model 	model = Model(in_image, patch_out) 	# compile model 	model.compile(loss='mse', optimizer=Adam(lr=0.0002, beta_1=0.5), loss_weights=[0.5]) 	return model  # define image shape image_shape = (256,256,3) # create the model model = define_discriminator(image_shape) # summarize the model model.summary() # plot the model plot_model(model, to_file='discriminator_model_plot.png', show_shapes=True, show_layer_names=True)

Note: the plot_model() function requires that both the pydot and pygraphviz libraries are installed. If this is a problem, you can comment out both the import and call to this function.

Running the example summarizes the model, showing the output shape and number of parameters for each layer.

_________________________________________________________________ Layer (type)                 Output Shape              Param # ================================================================= input_1 (InputLayer)         (None, 256, 256, 3)       0 _________________________________________________________________ conv2d_1 (Conv2D)            (None, 128, 128, 64)      3136 _________________________________________________________________ leaky_re_lu_1 (LeakyReLU)    (None, 128, 128, 64)      0 _________________________________________________________________ conv2d_2 (Conv2D)            (None, 64, 64, 128)       131200 _________________________________________________________________ instance_normalization_1 (In (None, 64, 64, 128)       256 _________________________________________________________________ leaky_re_lu_2 (LeakyReLU)    (None, 64, 64, 128)       0 _________________________________________________________________ conv2d_3 (Conv2D)            (None, 32, 32, 256)       524544 _________________________________________________________________ instance_normalization_2 (In (None, 32, 32, 256)       512 _________________________________________________________________ leaky_re_lu_3 (LeakyReLU)    (None, 32, 32, 256)       0 _________________________________________________________________ conv2d_4 (Conv2D)            (None, 16, 16, 512)       2097664 _________________________________________________________________ instance_normalization_3 (In (None, 16, 16, 512)       1024 _________________________________________________________________ leaky_re_lu_4 (LeakyReLU)    (None, 16, 16, 512)       0 _________________________________________________________________ conv2d_5 (Conv2D)            (None, 16, 16, 512)       4194816 _________________________________________________________________ instance_normalization_4 (In (None, 16, 16, 512)       1024 _________________________________________________________________ leaky_re_lu_5 (LeakyReLU)    (None, 16, 16, 512)       0 _________________________________________________________________ conv2d_6 (Conv2D)            (None, 16, 16, 1)         8193 ================================================================= Total params: 6,962,369 Trainable params: 6,962,369 Non-trainable params: 0 _________________________________________________________________

A plot of the model architecture is also created to help get an idea of the inputs, outputs, and transitions of the image data through the model.

Plot of the PatchGAN Discriminator Model for the CycleGAN

Plot of the PatchGAN Discriminator Model for the CycleGAN

How to Implement the CycleGAN Generator Model

The CycleGAN Generator model takes an image as input and generates a translated image as output.

The model uses a sequence of downsampling convolutional blocks to encode the input image, a number of residual network (ResNet) convolutional blocks to transform the image, and a number of upsampling convolutional blocks to generate the output image.

Let c7s1-k denote a 7×7 Convolution-InstanceNormReLU layer with k filters and stride 1. dk denotes a 3×3 Convolution-InstanceNorm-ReLU layer with k filters and stride 2. Reflection padding was used to reduce artifacts. Rk denotes a residual block that contains two 3 × 3 convolutional layers with the same number of filters on both layer. uk denotes a 3 × 3 fractional-strided-ConvolutionInstanceNorm-ReLU layer with k filters and stride 1/2.

Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, 2017.

The architecture for the 6-resnet block generator for 128×128 images is as follows:

  • c7s1-64,d128,d256,R256,R256,R256,R256,R256,R256,u128,u64,c7s1-3

First, we need a function to define the ResNet blocks. These are blocks comprised of two 3×3 CNN layers where the input to the block is concatenated to the output of the block, channel-wise.

This is implemented in the resnet_block() function that creates two Conv-InstanceNorm blocks with 3×3 filters and 1×1 stride, without a ReLU activation after the second block, matching the official Torch implementation in the build_conv_block() function. Same padding is used instead of the reflection padding recommended in the paper, for simplicity.

# generate a resnet block
def resnet_block(n_filters, input_layer):
	# weight initialization
	init = RandomNormal(stddev=0.02)
	# first convolutional layer
	g = Conv2D(n_filters, (3,3), padding='same', kernel_initializer=init)(input_layer)
	g = InstanceNormalization(axis=-1)(g)
	g = Activation('relu')(g)
	# second convolutional layer
	g = Conv2D(n_filters, (3,3), padding='same', kernel_initializer=init)(g)
	g = InstanceNormalization(axis=-1)(g)
	# concatenate merge channel-wise with input layer
	g = Concatenate()([g, input_layer])
	return g

Next, we can define a function that will create the 9-resnet block version for 256×256 input images. This can easily be changed to the 6-resnet block version by setting the image_shape argument to (128,128,3) and the n_resnet argument to 6.
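For example (the variable name here is purely illustrative):

...
# hypothetical 6-resnet block generator for 128x128 images
model_128 = define_generator(image_shape=(128,128,3), n_resnet=6)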

Importantly, the model outputs pixel values with the same shape as the input image, and the pixel values are in the range [-1,1], as is typical for GAN generator models.

# define the standalone generator model def define_generator(image_shape=(256,256,3), n_resnet=9): 	# weight initialization 	init = RandomNormal(stddev=0.02) 	# image input 	in_image = Input(shape=image_shape) 	# c7s1-64 	g = Conv2D(64, (7,7), padding='same', kernel_initializer=init)(in_image) 	g = InstanceNormalization(axis=-1)(g) 	g = Activation('relu')(g) 	# d128 	g = Conv2D(128, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g) 	g = InstanceNormalization(axis=-1)(g) 	g = Activation('relu')(g) 	# d256 	g = Conv2D(256, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g) 	g = InstanceNormalization(axis=-1)(g) 	g = Activation('relu')(g) 	# R256 	for _ in range(n_resnet): 		g = resnet_block(256, g) 	# u128 	g = Conv2DTranspose(128, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g) 	g = InstanceNormalization(axis=-1)(g) 	g = Activation('relu')(g) 	# u64 	g = Conv2DTranspose(64, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g) 	g = InstanceNormalization(axis=-1)(g) 	g = Activation('relu')(g) 	# c7s1-3 	g = Conv2D(3, (7,7), padding='same', kernel_initializer=init)(g) 	g = InstanceNormalization(axis=-1)(g) 	out_image = Activation('tanh')(g) 	# define model 	model = Model(in_image, out_image) 	return model

The generator model is not compiled as it is trained via a composite model, seen in the next section.

Tying this together, the complete example is listed below.

# example of an encoder-decoder generator for the cyclegan from keras.optimizers import Adam from keras.models import Model from keras.models import Input from keras.layers import Conv2D from keras.layers import Conv2DTranspose from keras.layers import Activation from keras.initializers import RandomNormal from keras.layers import Concatenate from keras_contrib.layers.normalization.instancenormalization import InstanceNormalization from keras.utils.vis_utils import plot_model  # generator a resnet block def resnet_block(n_filters, input_layer): 	# weight initialization 	init = RandomNormal(stddev=0.02) 	# first layer convolutional layer 	g = Conv2D(n_filters, (3,3), padding='same', kernel_initializer=init)(input_layer) 	g = InstanceNormalization(axis=-1)(g) 	g = Activation('relu')(g) 	# second convolutional layer 	g = Conv2D(n_filters, (3,3), padding='same', kernel_initializer=init)(g) 	g = InstanceNormalization(axis=-1)(g) 	# concatenate merge channel-wise with input layer 	g = Concatenate()([g, input_layer]) 	return g  # define the standalone generator model def define_generator(image_shape=(256,256,3), n_resnet=9): 	# weight initialization 	init = RandomNormal(stddev=0.02) 	# image input 	in_image = Input(shape=image_shape) 	# c7s1-64 	g = Conv2D(64, (7,7), padding='same', kernel_initializer=init)(in_image) 	g = InstanceNormalization(axis=-1)(g) 	g = Activation('relu')(g) 	# d128 	g = Conv2D(128, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g) 	g = InstanceNormalization(axis=-1)(g) 	g = Activation('relu')(g) 	# d256 	g = Conv2D(256, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g) 	g = InstanceNormalization(axis=-1)(g) 	g = Activation('relu')(g) 	# R256 	for _ in range(n_resnet): 		g = resnet_block(256, g) 	# u128 	g = Conv2DTranspose(128, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g) 	g = InstanceNormalization(axis=-1)(g) 	g = Activation('relu')(g) 	# u64 	g = Conv2DTranspose(64, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g) 	g = InstanceNormalization(axis=-1)(g) 	g = Activation('relu')(g) 	# c7s1-3 	g = Conv2D(3, (7,7), padding='same', kernel_initializer=init)(g) 	g = InstanceNormalization(axis=-1)(g) 	out_image = Activation('tanh')(g) 	# define model 	model = Model(in_image, out_image) 	return model  # create the model model = define_generator() # summarize the model model.summary() # plot the model plot_model(model, to_file='generator_model_plot.png', show_shapes=True, show_layer_names=True)

Running the example first summarizes the model.

__________________________________________________________________________________________________ Layer (type)                    Output Shape         Param #     Connected to ================================================================================================== input_1 (InputLayer)            (None, 256, 256, 3)  0 __________________________________________________________________________________________________ conv2d_1 (Conv2D)               (None, 256, 256, 64) 9472        input_1[0][0] __________________________________________________________________________________________________ instance_normalization_1 (Insta (None, 256, 256, 64) 128         conv2d_1[0][0] __________________________________________________________________________________________________ activation_1 (Activation)       (None, 256, 256, 64) 0           instance_normalization_1[0][0] __________________________________________________________________________________________________ conv2d_2 (Conv2D)               (None, 128, 128, 128 73856       activation_1[0][0] __________________________________________________________________________________________________ instance_normalization_2 (Insta (None, 128, 128, 128 256         conv2d_2[0][0] __________________________________________________________________________________________________ activation_2 (Activation)       (None, 128, 128, 128 0           instance_normalization_2[0][0] __________________________________________________________________________________________________ conv2d_3 (Conv2D)               (None, 64, 64, 256)  295168      activation_2[0][0] __________________________________________________________________________________________________ instance_normalization_3 (Insta (None, 64, 64, 256)  512         conv2d_3[0][0] __________________________________________________________________________________________________ activation_3 (Activation)       (None, 64, 64, 256)  0           instance_normalization_3[0][0] __________________________________________________________________________________________________ conv2d_4 (Conv2D)               (None, 64, 64, 256)  590080      activation_3[0][0] __________________________________________________________________________________________________ instance_normalization_4 (Insta (None, 64, 64, 256)  512         conv2d_4[0][0] __________________________________________________________________________________________________ activation_4 (Activation)       (None, 64, 64, 256)  0           instance_normalization_4[0][0] __________________________________________________________________________________________________ conv2d_5 (Conv2D)               (None, 64, 64, 256)  590080      activation_4[0][0] __________________________________________________________________________________________________ instance_normalization_5 (Insta (None, 64, 64, 256)  512         conv2d_5[0][0] __________________________________________________________________________________________________ concatenate_1 (Concatenate)     (None, 64, 64, 512)  0           instance_normalization_5[0][0]                                                                  activation_3[0][0] __________________________________________________________________________________________________ conv2d_6 (Conv2D)               (None, 64, 64, 256)  1179904     concatenate_1[0][0] __________________________________________________________________________________________________ instance_normalization_6 (Insta (None, 64, 64, 256)  512         
conv2d_6[0][0] __________________________________________________________________________________________________ activation_5 (Activation)       (None, 64, 64, 256)  0           instance_normalization_6[0][0] __________________________________________________________________________________________________ conv2d_7 (Conv2D)               (None, 64, 64, 256)  590080      activation_5[0][0] __________________________________________________________________________________________________ instance_normalization_7 (Insta (None, 64, 64, 256)  512         conv2d_7[0][0] __________________________________________________________________________________________________ concatenate_2 (Concatenate)     (None, 64, 64, 768)  0           instance_normalization_7[0][0]                                                                  concatenate_1[0][0] __________________________________________________________________________________________________ conv2d_8 (Conv2D)               (None, 64, 64, 256)  1769728     concatenate_2[0][0] __________________________________________________________________________________________________ instance_normalization_8 (Insta (None, 64, 64, 256)  512         conv2d_8[0][0] __________________________________________________________________________________________________ activation_6 (Activation)       (None, 64, 64, 256)  0           instance_normalization_8[0][0] __________________________________________________________________________________________________ conv2d_9 (Conv2D)               (None, 64, 64, 256)  590080      activation_6[0][0] __________________________________________________________________________________________________ instance_normalization_9 (Insta (None, 64, 64, 256)  512         conv2d_9[0][0] __________________________________________________________________________________________________ concatenate_3 (Concatenate)     (None, 64, 64, 1024) 0           instance_normalization_9[0][0]                                                                  concatenate_2[0][0] __________________________________________________________________________________________________ conv2d_10 (Conv2D)              (None, 64, 64, 256)  2359552     concatenate_3[0][0] __________________________________________________________________________________________________ instance_normalization_10 (Inst (None, 64, 64, 256)  512         conv2d_10[0][0] __________________________________________________________________________________________________ activation_7 (Activation)       (None, 64, 64, 256)  0           instance_normalization_10[0][0] __________________________________________________________________________________________________ conv2d_11 (Conv2D)              (None, 64, 64, 256)  590080      activation_7[0][0] __________________________________________________________________________________________________ instance_normalization_11 (Inst (None, 64, 64, 256)  512         conv2d_11[0][0] __________________________________________________________________________________________________ concatenate_4 (Concatenate)     (None, 64, 64, 1280) 0           instance_normalization_11[0][0]                                                                  concatenate_3[0][0] __________________________________________________________________________________________________ conv2d_12 (Conv2D)              (None, 64, 64, 256)  2949376     concatenate_4[0][0] __________________________________________________________________________________________________ 
instance_normalization_12 (Inst (None, 64, 64, 256)  512         conv2d_12[0][0] __________________________________________________________________________________________________ activation_8 (Activation)       (None, 64, 64, 256)  0           instance_normalization_12[0][0] __________________________________________________________________________________________________ conv2d_13 (Conv2D)              (None, 64, 64, 256)  590080      activation_8[0][0] __________________________________________________________________________________________________ instance_normalization_13 (Inst (None, 64, 64, 256)  512         conv2d_13[0][0] __________________________________________________________________________________________________ concatenate_5 (Concatenate)     (None, 64, 64, 1536) 0           instance_normalization_13[0][0]                                                                  concatenate_4[0][0] __________________________________________________________________________________________________ conv2d_14 (Conv2D)              (None, 64, 64, 256)  3539200     concatenate_5[0][0] __________________________________________________________________________________________________ instance_normalization_14 (Inst (None, 64, 64, 256)  512         conv2d_14[0][0] __________________________________________________________________________________________________ activation_9 (Activation)       (None, 64, 64, 256)  0           instance_normalization_14[0][0] __________________________________________________________________________________________________ conv2d_15 (Conv2D)              (None, 64, 64, 256)  590080      activation_9[0][0] __________________________________________________________________________________________________ instance_normalization_15 (Inst (None, 64, 64, 256)  512         conv2d_15[0][0] __________________________________________________________________________________________________ concatenate_6 (Concatenate)     (None, 64, 64, 1792) 0           instance_normalization_15[0][0]                                                                  concatenate_5[0][0] __________________________________________________________________________________________________ conv2d_16 (Conv2D)              (None, 64, 64, 256)  4129024     concatenate_6[0][0] __________________________________________________________________________________________________ instance_normalization_16 (Inst (None, 64, 64, 256)  512         conv2d_16[0][0] __________________________________________________________________________________________________ activation_10 (Activation)      (None, 64, 64, 256)  0           instance_normalization_16[0][0] __________________________________________________________________________________________________ conv2d_17 (Conv2D)              (None, 64, 64, 256)  590080      activation_10[0][0] __________________________________________________________________________________________________ instance_normalization_17 (Inst (None, 64, 64, 256)  512         conv2d_17[0][0] __________________________________________________________________________________________________ concatenate_7 (Concatenate)     (None, 64, 64, 2048) 0           instance_normalization_17[0][0]                                                                  concatenate_6[0][0] __________________________________________________________________________________________________ conv2d_18 (Conv2D)              (None, 64, 64, 256)  4718848     concatenate_7[0][0] 
__________________________________________________________________________________________________ instance_normalization_18 (Inst (None, 64, 64, 256)  512         conv2d_18[0][0] __________________________________________________________________________________________________ activation_11 (Activation)      (None, 64, 64, 256)  0           instance_normalization_18[0][0] __________________________________________________________________________________________________ conv2d_19 (Conv2D)              (None, 64, 64, 256)  590080      activation_11[0][0] __________________________________________________________________________________________________ instance_normalization_19 (Inst (None, 64, 64, 256)  512         conv2d_19[0][0] __________________________________________________________________________________________________ concatenate_8 (Concatenate)     (None, 64, 64, 2304) 0           instance_normalization_19[0][0]                                                                  concatenate_7[0][0] __________________________________________________________________________________________________ conv2d_20 (Conv2D)              (None, 64, 64, 256)  5308672     concatenate_8[0][0] __________________________________________________________________________________________________ instance_normalization_20 (Inst (None, 64, 64, 256)  512         conv2d_20[0][0] __________________________________________________________________________________________________ activation_12 (Activation)      (None, 64, 64, 256)  0           instance_normalization_20[0][0] __________________________________________________________________________________________________ conv2d_21 (Conv2D)              (None, 64, 64, 256)  590080      activation_12[0][0] __________________________________________________________________________________________________ instance_normalization_21 (Inst (None, 64, 64, 256)  512         conv2d_21[0][0] __________________________________________________________________________________________________ concatenate_9 (Concatenate)     (None, 64, 64, 2560) 0           instance_normalization_21[0][0]                                                                  concatenate_8[0][0] __________________________________________________________________________________________________ conv2d_transpose_1 (Conv2DTrans (None, 128, 128, 128 2949248     concatenate_9[0][0] __________________________________________________________________________________________________ instance_normalization_22 (Inst (None, 128, 128, 128 256         conv2d_transpose_1[0][0] __________________________________________________________________________________________________ activation_13 (Activation)      (None, 128, 128, 128 0           instance_normalization_22[0][0] __________________________________________________________________________________________________ conv2d_transpose_2 (Conv2DTrans (None, 256, 256, 64) 73792       activation_13[0][0] __________________________________________________________________________________________________ instance_normalization_23 (Inst (None, 256, 256, 64) 128         conv2d_transpose_2[0][0] __________________________________________________________________________________________________ activation_14 (Activation)      (None, 256, 256, 64) 0           instance_normalization_23[0][0] __________________________________________________________________________________________________ conv2d_22 (Conv2D)              (None, 256, 256, 3)  9411        activation_14[0][0] 
__________________________________________________________________________________________________ instance_normalization_24 (Inst (None, 256, 256, 3)  6           conv2d_22[0][0] __________________________________________________________________________________________________ activation_15 (Activation)      (None, 256, 256, 3)  0           instance_normalization_24[0][0] ================================================================================================== Total params: 35,276,553 Trainable params: 35,276,553 Non-trainable params: 0 __________________________________________________________________________________________________

A plot of the generator model is also created, showing the skip connections in the ResNet blocks.

Plot of the Generator Model for the CycleGAN

Plot of the Generator Model for the CycleGAN

How to Implement Composite Models for Least Squares and Cycle Loss

The generator models are not updated directly. Instead, the generator models are updated via composite models.

An update to each generator model involves changes to the model weights based on four concerns:

  • Adversarial loss (L2 or mean squared error).
  • Identity loss (L1 or mean absolute error).
  • Forward cycle loss (L1 or mean absolute error).
  • Backward cycle loss (L1 or mean absolute error).

The adversarial loss is the standard approach for updating the generator via the discriminator, although in this case, the least squares loss function is used instead of the negative log likelihood (e.g. binary cross entropy).

First, we can use our function to define the two generators and two discriminators used in the CycleGAN.

...
# input shape
image_shape = (256,256,3)
# generator: A -> B
g_model_AtoB = define_generator(image_shape)
# generator: B -> A
g_model_BtoA = define_generator(image_shape)
# discriminator: A -> [real/fake]
d_model_A = define_discriminator(image_shape)
# discriminator: B -> [real/fake]
d_model_B = define_discriminator(image_shape)

A composite model is required for each generator model that is responsible for only updating the weights of that generator model, although it is required to share the weights with the related discriminator model and the other generator model.

This can be achieved by marking the weights of the other models as not trainable in the context of the composite model to ensure we are only updating the intended generator.

...
# ensure the model we're updating is trainable
g_model_1.trainable = True
# mark discriminator as not trainable
d_model.trainable = False
# mark other generator model as not trainable
g_model_2.trainable = False

The model can be constructed piecewise using the Keras functional API.

The first step is to define the input of the real image from the source domain, pass it through our generator model, then connect the output of the generator to the discriminator and classify it as real or fake.

...
# discriminator element
input_gen = Input(shape=image_shape)
gen1_out = g_model_1(input_gen)
output_d = d_model(gen1_out)

Next, we can connect the identity mapping element with a new input for the real image from the target domain, pass it through our generator model, and output the (hopefully) untranslated image directly.

...
# identity element
input_id = Input(shape=image_shape)
output_id = g_model_1(input_id)

So far, we have a composite model with two real image inputs and a discriminator classification and identity image output. Next, we need to add the forward and backward cycles.

The forward cycle can be achieved by connecting the output of our generator to the other generator, the output of which can be compared to the input to our generator and should be identical.

...
# forward cycle
output_f = g_model_2(gen1_out)

The backward cycle is more complex and involves the input for the real image from the target domain passing through the other generator, then passing through our generator, which should match the real image from the target domain.

...
# backward cycle
gen2_out = g_model_2(input_id)
output_b = g_model_1(gen2_out)

That’s it.

We can then define this composite model with two inputs, one real image each for the source and target domains, and four outputs: one for the discriminator, one from the generator for the identity mapping, one from the other generator for the forward cycle, and one from our generator for the backward cycle.

...
# define model graph
model = Model([input_gen, input_id], [output_d, output_id, output_f, output_b])

The adversarial loss for the discriminator output uses a least squares loss, implemented as L2 or mean squared error. The outputs from the generators are compared to the expected images and optimized using an L1 loss, implemented as mean absolute error.

The generator is updated as a weighted average of the four loss values. The adversarial loss is weighted normally, whereas the forward and backward cycle loss is weighted using a parameter called lambda and is set to 10, e.g. 10 times more important than adversarial loss. The identity loss is also weighted as a fraction of the lambda parameter and is set to 0.5 * 10 or 5 in the official Torch implementation.

...
# compile model with weighting of least squares loss and L1 loss
model.compile(loss=['mse', 'mae', 'mae', 'mae'], loss_weights=[1, 5, 10, 10], optimizer=opt)

We can tie all of this together and define the function define_composite_model() for creating a composite model for training a given generator model.

# define a composite model for updating generators by adversarial and cycle loss
def define_composite_model(g_model_1, d_model, g_model_2, image_shape):
	# ensure the model we're updating is trainable
	g_model_1.trainable = True
	# mark discriminator as not trainable
	d_model.trainable = False
	# mark other generator model as not trainable
	g_model_2.trainable = False
	# discriminator element
	input_gen = Input(shape=image_shape)
	gen1_out = g_model_1(input_gen)
	output_d = d_model(gen1_out)
	# identity element
	input_id = Input(shape=image_shape)
	output_id = g_model_1(input_id)
	# forward cycle
	output_f = g_model_2(gen1_out)
	# backward cycle
	gen2_out = g_model_2(input_id)
	output_b = g_model_1(gen2_out)
	# define model graph
	model = Model([input_gen, input_id], [output_d, output_id, output_f, output_b])
	# define optimization algorithm configuration
	opt = Adam(lr=0.0002, beta_1=0.5)
	# compile model with weighting of least squares loss and L1 loss
	model.compile(loss=['mse', 'mae', 'mae', 'mae'], loss_weights=[1, 5, 10, 10], optimizer=opt)
	return model

This function can then be called to prepare a composite model for training both the g_model_AtoB generator model and the g_model_BtoA model; for example:

...
# composite: A -> B -> [real/fake, A]
c_model_AtoBtoA = define_composite_model(g_model_AtoB, d_model_B, g_model_BtoA, image_shape)
# composite: B -> A -> [real/fake, B]
c_model_BtoAtoB = define_composite_model(g_model_BtoA, d_model_A, g_model_AtoB, image_shape)

Summarizing and plotting the composite model is a bit of a mess, as it does not show the inputs and outputs of the model clearly.

We can summarize the inputs and outputs for each of the composite models below. Recall that we are sharing or reusing the same set of weights if a given model is used more than once in the composite model.

Generator-A Composite Model

Only Generator-A weights are trainable; the weights of the other models are not trainable.

  • Adversarial Loss: Domain-B -> Generator-A -> Domain-A -> Discriminator-A -> [real/fake]
  • Identity Loss: Domain-A -> Generator-A -> Domain-A
  • Forward Cycle Loss: Domain-B -> Generator-A -> Domain-A -> Generator-B -> Domain-B
  • Backward Cycle Loss: Domain-A -> Generator-B -> Domain-B -> Generator-A -> Domain-A

Generator-B Composite Model

Only Generator-B weights are trainable and weights for other models are not trainable.

  • Adversarial Loss: Domain-A -> Generator-B -> Domain-B -> Discriminator-B -> [real/fake]
  • Identity Loss: Domain-B -> Generator-B -> Domain-B
  • Forward Cycle Loss: Domain-A -> Generator-B -> Domain-B -> Generator-A -> Domain-A
  • Backward Cycle Loss: Domain-B -> Generator-A -> Domain-A -> Generator-B -> Domain-B

A complete example of creating all of the models is listed below for completeness.

# example of defining composite models for training cyclegan generators from keras.optimizers import Adam from keras.models import Model from keras.models import Sequential from keras.models import Input from keras.layers import Conv2D from keras.layers import Conv2DTranspose from keras.layers import Activation from keras.layers import LeakyReLU from keras.initializers import RandomNormal from keras.layers import Concatenate from keras_contrib.layers.normalization.instancenormalization import InstanceNormalization from keras.utils.vis_utils import plot_model  # define the discriminator model def define_discriminator(image_shape): 	# weight initialization 	init = RandomNormal(stddev=0.02) 	# source image input 	in_image = Input(shape=image_shape) 	# C64 	d = Conv2D(64, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(in_image) 	d = LeakyReLU(alpha=0.2)(d) 	# C128 	d = Conv2D(128, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d) 	d = InstanceNormalization(axis=-1)(d) 	d = LeakyReLU(alpha=0.2)(d) 	# C256 	d = Conv2D(256, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d) 	d = InstanceNormalization(axis=-1)(d) 	d = LeakyReLU(alpha=0.2)(d) 	# C512 	d = Conv2D(512, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d) 	d = InstanceNormalization(axis=-1)(d) 	d = LeakyReLU(alpha=0.2)(d) 	# second last output layer 	d = Conv2D(512, (4,4), padding='same', kernel_initializer=init)(d) 	d = InstanceNormalization(axis=-1)(d) 	d = LeakyReLU(alpha=0.2)(d) 	# patch output 	patch_out = Conv2D(1, (4,4), padding='same', kernel_initializer=init)(d) 	# define model 	model = Model(in_image, patch_out) 	# compile model 	model.compile(loss='mse', optimizer=Adam(lr=0.0002, beta_1=0.5), loss_weights=[0.5]) 	return model  # generator a resnet block def resnet_block(n_filters, input_layer): 	# weight initialization 	init = RandomNormal(stddev=0.02) 	# first layer convolutional layer 	g = Conv2D(n_filters, (3,3), padding='same', kernel_initializer=init)(input_layer) 	g = InstanceNormalization(axis=-1)(g) 	g = Activation('relu')(g) 	# second convolutional layer 	g = Conv2D(n_filters, (3,3), padding='same', kernel_initializer=init)(g) 	g = InstanceNormalization(axis=-1)(g) 	# concatenate merge channel-wise with input layer 	g = Concatenate()([g, input_layer]) 	return g  # define the standalone generator model def define_generator(image_shape, n_resnet=9): 	# weight initialization 	init = RandomNormal(stddev=0.02) 	# image input 	in_image = Input(shape=image_shape) 	# c7s1-64 	g = Conv2D(64, (7,7), padding='same', kernel_initializer=init)(in_image) 	g = InstanceNormalization(axis=-1)(g) 	g = Activation('relu')(g) 	# d128 	g = Conv2D(128, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g) 	g = InstanceNormalization(axis=-1)(g) 	g = Activation('relu')(g) 	# d256 	g = Conv2D(256, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g) 	g = InstanceNormalization(axis=-1)(g) 	g = Activation('relu')(g) 	# R256 	for _ in range(n_resnet): 		g = resnet_block(256, g) 	# u128 	g = Conv2DTranspose(128, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g) 	g = InstanceNormalization(axis=-1)(g) 	g = Activation('relu')(g) 	# u64 	g = Conv2DTranspose(64, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g) 	g = InstanceNormalization(axis=-1)(g) 	g = Activation('relu')(g) 	# c7s1-3 	g = Conv2D(3, (7,7), padding='same', kernel_initializer=init)(g) 	g = InstanceNormalization(axis=-1)(g) 	out_image = Activation('tanh')(g) 	# define 
model 	model = Model(in_image, out_image) 	return model  # define a composite model for updating generators by adversarial and cycle loss def define_composite_model(g_model_1, d_model, g_model_2, image_shape): 	# ensure the model we're updating is trainable 	g_model_1.trainable = True 	# mark discriminator as not trainable 	d_model.trainable = False 	# mark other generator model as not trainable 	g_model_2.trainable = False 	# discriminator element 	input_gen = Input(shape=image_shape) 	gen1_out = g_model_1(input_gen) 	output_d = d_model(gen1_out) 	# identity element 	input_id = Input(shape=image_shape) 	output_id = g_model_1(input_id) 	# forward cycle 	output_f = g_model_2(gen1_out) 	# backward cycle 	gen2_out = g_model_2(input_id) 	output_b = g_model_1(gen2_out) 	# define model graph 	model = Model([input_gen, input_id], [output_d, output_id, output_f, output_b]) 	# define optimization algorithm configuration 	opt = Adam(lr=0.0002, beta_1=0.5) 	# compile model with weighting of least squares loss and L1 loss 	model.compile(loss=['mse', 'mae', 'mae', 'mae'], loss_weights=[1, 5, 10, 10], optimizer=opt) 	return model  # input shape image_shape = (256,256,3) # generator: A -> B g_model_AtoB = define_generator(image_shape) # generator: B -> A g_model_BtoA = define_generator(image_shape) # discriminator: A -> [real/fake] d_model_A = define_discriminator(image_shape) # discriminator: B -> [real/fake] d_model_B = define_discriminator(image_shape) # composite: A -> B -> [real/fake, A] c_model_AtoB = define_composite_model(g_model_AtoB, d_model_B, g_model_BtoA, image_shape) # composite: B -> A -> [real/fake, B] c_model_BtoA = define_composite_model(g_model_BtoA, d_model_A, g_model_AtoB, image_shape)

How to Update Discriminator and Generator Models

Training the defined models is relatively straightforward.

First, we must define a helper function that will select a batch of real images and the associated target of 1.0 (the class label for real), shaped to match the PatchGAN discriminator output.

from numpy import ones
from numpy.random import randint

# select a batch of random samples, returns images and target
def generate_real_samples(dataset, n_samples, patch_shape):
	# choose random instances
	ix = randint(0, dataset.shape[0], n_samples)
	# retrieve selected images
	X = dataset[ix]
	# generate 'real' class labels (1)
	y = ones((n_samples, patch_shape, patch_shape, 1))
	return X, y

Similarly, we need a function to generate a batch of fake images and the associated target of 0.0 (the class label for fake).

from numpy import zeros

# generate a batch of images, returns images and targets
def generate_fake_samples(g_model, dataset, patch_shape):
	# generate fake instance
	X = g_model.predict(dataset)
	# create 'fake' class labels (0)
	y = zeros((len(X), patch_shape, patch_shape, 1))
	return X, y

Now, we can define the steps of a single training iteration. We will model the order of updates on the OptimizeParameters() function in the official Torch implementation (note: the official code uses a more confusing inverted naming convention).

  1. Update Generator-B (A->B)
  2. Update Discriminator-B
  3. Update Generator-A (B->A)
  4. Update Discriminator-A

First, we must select a batch of real images by calling generate_real_samples() for both Domain-A and Domain-B.

Typically, the batch size (n_batch) is set to 1. In this case, we will assume 256×256 input images, which means that the n_patch for the PatchGAN discriminator will be 16 (i.e. a 16×16 map of patch predictions).
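Rather than hard coding this value, the patch size can be read directly from the discriminator's output shape; for example (a small sketch, using the d_model_A defined earlier):

# batch size of one image, as used in the paper
n_batch = 1
# one side of the square PatchGAN output, e.g. 16 for 256x256 inputs
n_patch = d_model_A.output_shape[1]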

...
# select a batch of real samples
X_realA, y_realA = generate_real_samples(trainA, n_batch, n_patch)
X_realB, y_realB = generate_real_samples(trainB, n_batch, n_patch)

Next, we can use the batches of selected real images to generate corresponding batches of generated or fake images.

...
# generate a batch of fake samples
X_fakeA, y_fakeA = generate_fake_samples(g_model_BtoA, X_realB, n_patch)
X_fakeB, y_fakeB = generate_fake_samples(g_model_AtoB, X_realA, n_patch)

The paper describes using a pool of previously generated images from which examples are randomly selected and used to update the discriminator model, where the pool size was set to 50 images.

… [we] update the discriminators using a history of generated images rather than the ones produced by the latest generators. We keep an image buffer that stores the 50 previously created images.

Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, 2017.

This can be implemented using a list for each domain and a function to populate the pool, which, once the pool is at capacity, either uses a new image directly or randomly replaces an existing image in the pool and uses the replaced image instead.

The update_image_pool() function below implements this based on the official Torch implementation in image_pool.lua.

from random import random
from numpy import asarray
from numpy.random import randint

# update image pool for fake images
def update_image_pool(pool, images, max_size=50):
	selected = list()
	for image in images:
		if len(pool) < max_size:
			# stock the pool
			pool.append(image)
			selected.append(image)
		elif random() < 0.5:
			# use image, but don't add it to the pool
			selected.append(image)
		else:
			# replace an existing image and use replaced image
			ix = randint(0, len(pool))
			selected.append(pool[ix])
			pool[ix] = image
	return asarray(selected)

We can then update our image pool with generated fake images, the results of which can be used to train the discriminator models.

...
# update fakes from pool
X_fakeA = update_image_pool(poolA, X_fakeA)
X_fakeB = update_image_pool(poolB, X_fakeB)

Next, we can update Generator-A (B -> A) via its composite model.

The train_on_batch() function will return five values: the weighted sum of the four loss values (the first value), which is used to update the model weights and is the value we are interested in, followed by one value for each of the four outputs.

...
# update generator B->A via adversarial and cycle loss
g_loss2, _, _, _, _ = c_model_BtoA.train_on_batch([X_realB, X_realA], [y_realA, X_realA, X_realB, X_realA])

We can then update the discriminator model using the fake images that may or may not have come from the image pool.

...
# update discriminator for A -> [real/fake]
dA_loss1 = d_model_A.train_on_batch(X_realA, y_realA)
dA_loss2 = d_model_A.train_on_batch(X_fakeA, y_fakeA)

We can then do the same for the other generator and discriminator models.

...
# update generator A->B via adversarial and cycle loss
g_loss1, _, _, _, _ = c_model_AtoB.train_on_batch([X_realA, X_realB], [y_realB, X_realB, X_realA, X_realB])
# update discriminator for B -> [real/fake]
dB_loss1 = d_model_B.train_on_batch(X_realB, y_realB)
dB_loss2 = d_model_B.train_on_batch(X_fakeB, y_fakeB)

At the end of each training iteration, we can then report the current loss of the discriminator models on real and fake images and the loss of each generator model.

...
# summarize performance
print('>%d, dA[%.3f,%.3f] dB[%.3f,%.3f] g[%.3f,%.3f]' % (i+1, dA_loss1,dA_loss2, dB_loss1,dB_loss2, g_loss1,g_loss2))

Tying this all together, we can define a function named train() that takes an instance of each of the defined models and a loaded dataset (list of two NumPy arrays, one for each domain) and trains the model.

A batch size of 1 is used, as described in the paper, and the models are fit for 100 training epochs.

# train cyclegan models
def train(d_model_A, d_model_B, g_model_AtoB, g_model_BtoA, c_model_AtoB, c_model_BtoA, dataset):
	# define properties of the training run
	n_epochs, n_batch = 100, 1
	# determine the output square shape of the discriminator
	n_patch = d_model_A.output_shape[1]
	# unpack dataset
	trainA, trainB = dataset
	# prepare image pool for fakes
	poolA, poolB = list(), list()
	# calculate the number of batches per training epoch
	bat_per_epo = int(len(trainA) / n_batch)
	# calculate the number of training iterations
	n_steps = bat_per_epo * n_epochs
	# manually enumerate training steps
	for i in range(n_steps):
		# select a batch of real samples
		X_realA, y_realA = generate_real_samples(trainA, n_batch, n_patch)
		X_realB, y_realB = generate_real_samples(trainB, n_batch, n_patch)
		# generate a batch of fake samples
		X_fakeA, y_fakeA = generate_fake_samples(g_model_BtoA, X_realB, n_patch)
		X_fakeB, y_fakeB = generate_fake_samples(g_model_AtoB, X_realA, n_patch)
		# update fakes from pool
		X_fakeA = update_image_pool(poolA, X_fakeA)
		X_fakeB = update_image_pool(poolB, X_fakeB)
		# update generator B->A via adversarial and cycle loss
		g_loss2, _, _, _, _ = c_model_BtoA.train_on_batch([X_realB, X_realA], [y_realA, X_realA, X_realB, X_realA])
		# update discriminator for A -> [real/fake]
		dA_loss1 = d_model_A.train_on_batch(X_realA, y_realA)
		dA_loss2 = d_model_A.train_on_batch(X_fakeA, y_fakeA)
		# update generator A->B via adversarial and cycle loss
		g_loss1, _, _, _, _ = c_model_AtoB.train_on_batch([X_realA, X_realB], [y_realB, X_realB, X_realA, X_realB])
		# update discriminator for B -> [real/fake]
		dB_loss1 = d_model_B.train_on_batch(X_realB, y_realB)
		dB_loss2 = d_model_B.train_on_batch(X_fakeB, y_fakeB)
		# summarize performance
		print('>%d, dA[%.3f,%.3f] dB[%.3f,%.3f] g[%.3f,%.3f]' % (i+1, dA_loss1,dA_loss2, dB_loss1,dB_loss2, g_loss1,g_loss2))

The train function can then be called directly with our defined models and loaded dataset.

...
# load a dataset as a list of two numpy arrays
dataset = ...
# train models
train(d_model_A, d_model_B, g_model_AtoB, g_model_BtoA, c_model_AtoB, c_model_BtoA, dataset)

As an improvement, it may be desirable to combine the updates to each discriminator model into a single operation, as is performed in the fDx_basic() function of the official implementation.
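For example, the two separate calls could be collapsed into one by stacking the real and fake samples and their labels (a minimal sketch, not the official implementation):

...
# combine real and fake samples into a single discriminator update
from numpy import vstack

X_disA = vstack((X_realA, X_fakeA))
y_disA = vstack((y_realA, y_fakeA))
dA_loss = d_model_A.train_on_batch(X_disA, y_disA)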

Additionally, the paper describes training the models for another 100 epochs (200 in total), during which the learning rate is linearly decayed to 0.0. This too can be added as a minor extension to the training process; a sketch is given below.
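The sketch below assumes the compiled Keras models expose their optimizer learning rate to the backend; the helper name and the initial learning rate of 0.0002 are illustrative assumptions, not part of the tutorial's code.

# sketch: linearly decay the learning rate to 0.0 over the second half of training
from keras import backend as K

def update_learning_rate(models, step, n_steps, initial_lr=0.0002):
	# keep the learning rate fixed for the first half of training
	if step < n_steps / 2:
		return
	# linearly decay to zero over the second half
	new_lr = initial_lr * (1.0 - (step - n_steps / 2) / (n_steps / 2))
	for model in models:
		K.set_value(model.optimizer.lr, new_lr)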

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Papers

API

Projects

Articles

Summary

In this tutorial, you discovered how to implement the CycleGAN architecture from scratch using the Keras deep learning framework.

Specifically, you learned:

  • How to implement the discriminator and generator models.
  • How to define composite models to train the generator models via adversarial and cycle loss.
  • How to implement the training process to update model weights each training iteration.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.


How to Implement Pix2Pix GAN Models From Scratch With Keras

The Pix2Pix GAN is a generator model for performing image-to-image translation trained on paired examples.

For example, the model can be used to translate images of daytime to nighttime, or from sketches of products like shoes to photographs of products.

The benefit of the Pix2Pix model is that compared to other GANs for conditional image generation, it is relatively simple and capable of generating large high-quality images across a variety of image translation tasks.

The model is very impressive but has an architecture that appears somewhat complicated to implement for beginners.

In this tutorial, you will discover how to implement the Pix2Pix GAN architecture from scratch using the Keras deep learning framework.

After completing this tutorial, you will know:

  • How to develop the PatchGAN discriminator model for the Pix2Pix GAN.
  • How to develop the U-Net encoder-decoder generator model for the Pix2Pix GAN.
  • How to implement the composite model for updating the generator and how to train both models.

Discover how to develop DCGANs, conditional GANs, Pix2Pix, CycleGANs, and more with Keras in my new GANs book, with 29 step-by-step tutorials and full source code.

Let’s get started.

How to Implement Pix2Pix GAN Models From Scratch With Keras

How to Implement Pix2Pix GAN Models From Scratch With Keras
Photo by Ray in Manila, some rights reserved.

Tutorial Overview

This tutorial is divided into five parts; they are:

  1. What Is the Pix2Pix GAN?
  2. How to Implement the PatchGAN Discriminator Model
  3. How to Implement the U-Net Generator Model
  4. How to Implement Adversarial and L1 Loss
  5. How to Update Model Weights

What Is the Pix2Pix GAN?

Pix2Pix is a Generative Adversarial Network, or GAN, model designed for general purpose image-to-image translation.

The approach was presented by Phillip Isola, et al. in their 2016 paper titled “Image-to-Image Translation with Conditional Adversarial Networks” and presented at CVPR in 2017.

The GAN architecture is comprised of a generator model for outputting new plausible synthetic images and a discriminator model that classifies images as real (from the dataset) or fake (generated). The discriminator model is updated directly, whereas the generator model is updated via the discriminator model. As such, the two models are trained simultaneously in an adversarial process where the generator seeks to better fool the discriminator and the discriminator seeks to better identify the counterfeit images.

The Pix2Pix model is a type of conditional GAN, or cGAN, where the generation of the output image is conditional on an input, in this case, a source image. The discriminator is provided both with a source image and the target image and must determine whether the target is a plausible transformation of the source image.

Again, the discriminator model is updated directly, and the generator model is updated via the discriminator model, although the loss function is updated. The generator is trained via adversarial loss, which encourages the generator to generate plausible images in the target domain. The generator is also updated via L1 loss measured between the generated image and the expected output image. This additional loss encourages the generator model to create plausible translations of the source image.

The Pix2Pix GAN has been demonstrated on a range of image-to-image translation tasks such as converting maps to satellite photographs, black and white photographs to color, and sketches of products to product photographs.

Now that we are familiar with the Pix2Pix GAN, let’s explore how we can implement it using the Keras deep learning library.


How to Implement the PatchGAN Discriminator Model

The discriminator model in the Pix2Pix GAN is implemented as a PatchGAN.

The PatchGAN is designed based on the size of the receptive field, sometimes called the effective receptive field. The receptive field is the relationship between one output activation of the model and the area of the input image that influences it (in fact, an input volume, as it spans the image channels).

A PatchGAN with the size 70×70 is used, which means that the output (or each output) of the model maps to a 70×70 square of the input image. In effect, a 70×70 PatchGAN will classify 70×70 patches of the input image as real or fake.

… we design a discriminator architecture – which we term a PatchGAN – that only penalizes structure at the scale of patches. This discriminator tries to classify if each NxN patch in an image is real or fake. We run this discriminator convolutionally across the image, averaging all responses to provide the ultimate output of D.

Image-to-Image Translation with Conditional Adversarial Networks, 2016.

Before we dive into the configuration details of the PatchGAN, it is important to get a handle on the calculation of the receptive field.

The receptive field is not the size of the output of the discriminator model, e.g. it does not refer to the shape of the activation map output by the model. It is a property of the model that relates one pixel in the output activation map back to the area of the input image that influences it. The output of the model may be a single value or a square activation map of values that predict whether each patch of the input image is real or fake.

Traditionally, the receptive field refers to the area of a single convolutional layer's input that maps to one activation in its output, as determined by the size of the filter and the size of the stride. The effective receptive field generalizes this idea and calculates the receptive field for the output of a stack of convolutional layers with regard to the raw image input. The terms are often used interchangeably.

The authors of the Pix2Pix GAN provide a Matlab script to calculate the effective receptive field size for different model configurations in a script called receptive_field_sizes.m. It can be helpful to work through an example for the 70×70 PatchGAN receptive field calculation.

The 70×70 PatchGAN has a fixed number of three layers (excluding the output and second last layers), regardless of the size of the input image. The receptive field in one dimension is calculated as:

  • receptive field = (output size – 1) * stride + kernel size

Where output size is the size of the prior layer's activation map, stride is the number of pixels the filter is moved when applied to the activation map, and kernel size is the size of the filter to be applied.

The PatchGAN uses a fixed stride of 2×2 (except in the output and second last layers) and a fixed kernel size of 4×4. We can, therefore, calculate the receptive field size starting with one pixel in the output of the model and working backward to the input image.

We can develop a Python function called receptive_field() to calculate the receptive field, then calculate and print the receptive field for each layer in the Pix2Pix PatchGAN model. The complete example is listed below.

# example of calculating the receptive field for the PatchGAN

# calculate the effective receptive field size
def receptive_field(output_size, kernel_size, stride_size):
	return (output_size - 1) * stride_size + kernel_size

# output layer 1x1 pixel with 4x4 kernel and 1x1 stride
rf = receptive_field(1, 4, 1)
print(rf)
# second last layer with 4x4 kernel and 1x1 stride
rf = receptive_field(rf, 4, 1)
print(rf)
# 3 PatchGAN layers with 4x4 kernel and 2x2 stride
rf = receptive_field(rf, 4, 2)
print(rf)
rf = receptive_field(rf, 4, 2)
print(rf)
rf = receptive_field(rf, 4, 2)
print(rf)

Running the example prints the size of the receptive field for each layer in the model from the output layer to the input layer.

We can see that each 1×1 pixel in the output layer maps to a 70×70 receptive field in the input layer.

4
7
16
34
70

The authors of the Pix2Pix paper explore different PatchGAN configurations, including a 1×1 receptive field called a PixelGAN and a receptive field that matches the 256×256 pixel images input to the model (resampled to 286×286) called an ImageGAN. They found that the 70×70 PatchGAN resulted in the best trade-off of performance and image quality.

The 70×70 PatchGAN […] achieves slightly better scores. Scaling beyond this, to the full 286×286 ImageGAN, does not appear to improve the visual quality of the results.

Image-to-Image Translation with Conditional Adversarial Networks, 2016.

The configuration for the PatchGAN is provided in the appendix of the paper and can be confirmed by reviewing the defineD_n_layers() function in the official Torch implementation.

The model takes two images as input, specifically a source and a target image. These images are concatenated together at the channel level, e.g. 3 color channels of each image become 6 channels of the input.

Let Ck denote a Convolution-BatchNorm-ReLU layer with k filters. […] All convolutions are 4× 4 spatial filters applied with stride 2. […] The 70 × 70 discriminator architecture is: C64-C128-C256-C512. After the last layer, a convolution is applied to map to a 1-dimensional output, followed by a Sigmoid function. As an exception to the above notation, BatchNorm is not applied to the first C64 layer. All ReLUs are leaky, with slope 0.2.

Image-to-Image Translation with Conditional Adversarial Networks, 2016.

The PatchGAN configuration is defined using a shorthand notation as: C64-C128-C256-C512, where C refers to a block of Convolution-BatchNorm-LeakyReLU layers and the number indicates the number of filters. Batch normalization is not used in the first layer. As mentioned, the kernel size is fixed at 4×4 and a stride of 2×2 is used on all but the last 2 layers of the model. The slope of the LeakyReLU is set to 0.2, and a sigmoid activation function is used in the output layer.

Random jitter was applied by resizing the 256×256 input images to 286 × 286 and then randomly cropping back to size 256 × 256. Weights were initialized from a Gaussian distribution with mean 0 and standard deviation 0.02.

Image-to-Image Translation with Conditional Adversarial Networks, 2016.

Model weights were initialized via random Gaussian with a mean of 0.0 and standard deviation of 0.02. Images input to the model are 256×256.

… we divide the objective by 2 while optimizing D, which slows down the rate at which D learns relative to G. We use minibatch SGD and apply the Adam solver, with a learning rate of 0.0002, and momentum parameters β1 = 0.5, β2 = 0.999.

Image-to-Image Translation with Conditional Adversarial Networks, 2016.

The model is trained with a batch size of one image, and the Adam version of stochastic gradient descent is used with a small learning rate and modest momentum. The loss for the discriminator is weighted by 50% for each model update.
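This halving of the discriminator loss can be implemented in Keras via the loss_weights argument when compiling the model; the compile step below is repeated here from the complete example that follows, purely for clarity.

# compile the discriminator, halving its loss via loss_weights
# (Adam is imported from keras.optimizers, as in the complete example)
opt = Adam(lr=0.0002, beta_1=0.5)
model.compile(loss='binary_crossentropy', optimizer=opt, loss_weights=[0.5])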

Tying this all together, we can define a function named define_discriminator() that creates the 70×70 PatchGAN discriminator model.

The complete example of defining the model is listed below.

# example of defining a 70x70 patchgan discriminator model from keras.optimizers import Adam from keras.initializers import RandomNormal from keras.models import Model from keras.models import Input from keras.layers import Conv2D from keras.layers import LeakyReLU from keras.layers import Activation from keras.layers import Concatenate from keras.layers import BatchNormalization from keras.utils.vis_utils import plot_model  # define the discriminator model def define_discriminator(image_shape): 	# weight initialization 	init = RandomNormal(stddev=0.02) 	# source image input 	in_src_image = Input(shape=image_shape) 	# target image input 	in_target_image = Input(shape=image_shape) 	# concatenate images channel-wise 	merged = Concatenate()([in_src_image, in_target_image]) 	# C64 	d = Conv2D(64, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(merged) 	d = LeakyReLU(alpha=0.2)(d) 	# C128 	d = Conv2D(128, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d) 	d = BatchNormalization()(d) 	d = LeakyReLU(alpha=0.2)(d) 	# C256 	d = Conv2D(256, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d) 	d = BatchNormalization()(d) 	d = LeakyReLU(alpha=0.2)(d) 	# C512 	d = Conv2D(512, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d) 	d = BatchNormalization()(d) 	d = LeakyReLU(alpha=0.2)(d) 	# second last output layer 	d = Conv2D(512, (4,4), padding='same', kernel_initializer=init)(d) 	d = BatchNormalization()(d) 	d = LeakyReLU(alpha=0.2)(d) 	# patch output 	d = Conv2D(1, (4,4), padding='same', kernel_initializer=init)(d) 	patch_out = Activation('sigmoid')(d) 	# define model 	model = Model([in_src_image, in_target_image], patch_out) 	# compile model 	opt = Adam(lr=0.0002, beta_1=0.5) 	model.compile(loss='binary_crossentropy', optimizer=opt, loss_weights=[0.5]) 	return model  # define image shape image_shape = (256,256,3) # create the model model = define_discriminator(image_shape) # summarize the model model.summary() # plot the model plot_model(model, to_file='discriminator_model_plot.png', show_shapes=True, show_layer_names=True)

Running the example first summarizes the model, providing insight into how the input shape is transformed across the layers and the number of parameters in the model.

We can see that the two input images are concatenated together to create one 256x256x6 input to the first hidden convolutional layer. This concatenation of input images could occur before the input layer of the model, but allowing the model to perform the concatenation makes the behavior of the model clearer.

We can see that the model output will be an activation map with the size 16×16 pixels or activations and a single channel, with each value in the map corresponding to a 70×70 pixel patch of the input 256×256 image. If the input image was half the size at 128×128, then the output feature map would also be halved to 8×8.

The model is a binary classification model, meaning it predicts each output as a probability in the range [0,1], in this case, the likelihood that the input image is real (i.e. from the target dataset). The patch of values can be averaged to give a single real/fake prediction by the model (a small sketch of this averaging is shown after the model summary below). During training, the model output is compared to a matrix of target values, 0 for fake and 1 for real.

__________________________________________________________________________________________________ Layer (type)                    Output Shape         Param #     Connected to ================================================================================================== input_1 (InputLayer)            (None, 256, 256, 3)  0 __________________________________________________________________________________________________ input_2 (InputLayer)            (None, 256, 256, 3)  0 __________________________________________________________________________________________________ concatenate_1 (Concatenate)     (None, 256, 256, 6)  0           input_1[0][0]                                                                  input_2[0][0] __________________________________________________________________________________________________ conv2d_1 (Conv2D)               (None, 128, 128, 64) 6208        concatenate_1[0][0] __________________________________________________________________________________________________ leaky_re_lu_1 (LeakyReLU)       (None, 128, 128, 64) 0           conv2d_1[0][0] __________________________________________________________________________________________________ conv2d_2 (Conv2D)               (None, 64, 64, 128)  131200      leaky_re_lu_1[0][0] __________________________________________________________________________________________________ batch_normalization_1 (BatchNor (None, 64, 64, 128)  512         conv2d_2[0][0] __________________________________________________________________________________________________ leaky_re_lu_2 (LeakyReLU)       (None, 64, 64, 128)  0           batch_normalization_1[0][0] __________________________________________________________________________________________________ conv2d_3 (Conv2D)               (None, 32, 32, 256)  524544      leaky_re_lu_2[0][0] __________________________________________________________________________________________________ batch_normalization_2 (BatchNor (None, 32, 32, 256)  1024        conv2d_3[0][0] __________________________________________________________________________________________________ leaky_re_lu_3 (LeakyReLU)       (None, 32, 32, 256)  0           batch_normalization_2[0][0] __________________________________________________________________________________________________ conv2d_4 (Conv2D)               (None, 16, 16, 512)  2097664     leaky_re_lu_3[0][0] __________________________________________________________________________________________________ batch_normalization_3 (BatchNor (None, 16, 16, 512)  2048        conv2d_4[0][0] __________________________________________________________________________________________________ leaky_re_lu_4 (LeakyReLU)       (None, 16, 16, 512)  0           batch_normalization_3[0][0] __________________________________________________________________________________________________ conv2d_5 (Conv2D)               (None, 16, 16, 512)  4194816     leaky_re_lu_4[0][0] __________________________________________________________________________________________________ batch_normalization_4 (BatchNor (None, 16, 16, 512)  2048        conv2d_5[0][0] __________________________________________________________________________________________________ leaky_re_lu_5 (LeakyReLU)       (None, 16, 16, 512)  0           batch_normalization_4[0][0] __________________________________________________________________________________________________ conv2d_6 (Conv2D)               (None, 16, 16, 1)    8193        leaky_re_lu_5[0][0] 
__________________________________________________________________________________________________ activation_1 (Activation)       (None, 16, 16, 1)    0           conv2d_6[0][0] ================================================================================================== Total params: 6,968,257 Trainable params: 6,965,441 Non-trainable params: 2,816 __________________________________________________________________________________________________
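As a small illustration (a sketch, not part of the tutorial's code; the array names X_src and X_tar are hypothetical source and target image batches), the 16×16 patch predictions can be averaged into a single score per image, and matching PatchGAN-shaped target arrays can be created for training:

# average the 16x16 patch predictions into one real/fake score per image
# and build matching PatchGAN-shaped target arrays
from numpy import ones, zeros

yhat = model.predict([X_src, X_tar])     # shape (n_samples, 16, 16, 1)
scores = yhat.mean(axis=(1, 2, 3))       # one probability per image pair
y_real = ones((len(X_src), 16, 16, 1))   # targets when the target image is real
y_fake = zeros((len(X_src), 16, 16, 1))  # targets when the target image is fake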

A plot of the model is created showing much the same information in a graphical form. The model is not complex, with a linear path with two input images and a single output prediction.

Note: creating the plot assumes that pydot and pygraphviz libraries are installed. If this is a problem, you can comment out the import and call to the plot_model() function.

Plot of the PatchGAN Model Used in the Pix2Pix GAN Architecture

Plot of the PatchGAN Model Used in the Pix2Pix GAN Architecture

Now that we know how to implement the PatchGAN discriminator model, we can now look at implementing the U-Net generator model.

How to Implement the U-Net Generator Model

The generator model for the Pix2Pix GAN is implemented as a U-Net.

The U-Net model is an encoder-decoder model for image translation where skip connections are used to connect layers in the encoder with corresponding layers in the decoder that have the same sized feature maps.

The encoder part of the model is comprised of convolutional layers that use a 2×2 stride to downsample the input source image down to a bottleneck layer. The decoder part of the model reads the bottleneck output and uses transpose convolutional layers to upsample to the required output image size.

… the input is passed through a series of layers that progressively downsample, until a bottleneck layer, at which point the process is reversed.

Image-to-Image Translation with Conditional Adversarial Networks, 2016.

Architecture of the U-Net Generator Model

Architecture of the U-Net Generator Model
Taken from Image-to-Image Translation With Conditional Adversarial Networks.

Skip connections are added between the layers with the same sized feature maps so that the first downsampling layer is connected with the last upsampling layer, the second downsampling layer is connected with the second last upsampling layer, and so on. The connections concatenate the channels of the feature map in the downsampling layer with the feature map in the upsampling layer.

Specifically, we add skip connections between each layer i and layer n − i, where n is the total number of layers. Each skip connection simply concatenates all channels at layer i with those at layer n − i.

Image-to-Image Translation with Conditional Adversarial Networks, 2016.

Unlike traditional generator models in the GAN architecture, the U-Net generator does not take a point from the latent space as input. Instead, dropout layers are used as a source of randomness both during training and when the model is used to make a prediction, e.g. generate an image at inference time.

Similarly, batch normalization is used in the same way during training and inference, meaning that statistics are calculated for each batch and not fixed at the end of the training process. This is referred to as instance normalization, specifically when the batch size is set to 1 as it is with the Pix2Pix model.

At inference time, we run the generator net in exactly the same manner as during the training phase. This differs from the usual protocol in that we apply dropout at test time, and we apply batch normalization using the statistics of the test batch, rather than aggregated statistics of the training batch.

Image-to-Image Translation with Conditional Adversarial Networks, 2016.

In Keras, layers like Dropout and BatchNormalization operate differently during training and during inference. We can set the “training” argument to “True” when calling these layers to ensure that they always operate in training mode, even when used during inference.

For example, a Dropout layer that will drop out during inference as well as training can be added to the model as follows:

...
g = Dropout(0.5)(g, training=True)
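Similarly, a BatchNormalization layer can be forced to always use the statistics of the current batch by setting the same argument, which is how the layers are used in the complete example below:

...
g = BatchNormalization()(g, training=True)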

As with the discriminator model, the configuration details of the generator model are defined in the appendix of the paper and can be confirmed when comparing against the defineG_unet() function in the official Torch implementation.

The encoder uses blocks of Convolution-BatchNorm-LeakyReLU like the discriminator model, whereas the decoder model uses blocks of Convolution-BatchNorm-Dropout-ReLU with a dropout rate of 50%. All convolutional layers use a filter size of 4×4 and a stride of 2×2.

Let Ck denote a Convolution-BatchNorm-ReLU layer with k filters. CDk denotes a Convolution-BatchNormDropout-ReLU layer with a dropout rate of 50%. All convolutions are 4× 4 spatial filters applied with stride 2.

Image-to-Image Translation with Conditional Adversarial Networks, 2016.

The architecture of the U-Net model is defined using the shorthand notation as:

  • Encoder: C64-C128-C256-C512-C512-C512-C512-C512
  • Decoder: CD512-CD1024-CD1024-C1024-C1024-C512-C256-C128

The last layer of the encoder is the bottleneck layer, which does not use batch normalization, according to an amendment to the paper and confirmation in the code, and uses a ReLU activation instead of LeakyReLU.

… the activations of the bottleneck layer are zeroed by the batchnorm operation, effectively making the innermost layer skipped. This issue can be fixed by removing batchnorm from this layer, as has been done in the public code

Image-to-Image Translation with Conditional Adversarial Networks, 2016.

The number of filters in the U-Net decoder is a little misleading as it is the number of filters for the layer after concatenation with the equivalent layer in the encoder. This may become more clear when we create a plot of the model.

The output of the model uses a single convolutional layer with three channels, and a tanh activation function is used in the output layer, as is common for GAN generator models. Batch normalization is not used in the first layer of the encoder.

After the last layer in the decoder, a convolution is applied to map to the number of output channels (3 in general […]), followed by a Tanh function […] BatchNorm is not applied to the first C64 layer in the encoder. All ReLUs in the encoder are leaky, with slope 0.2, while ReLUs in the decoder are not leaky.

Image-to-Image Translation with Conditional Adversarial Networks, 2016.

Tying this all together, we can define a function named define_generator() that defines the U-Net encoder-decoder generator model. Two helper functions are also provided for defining encoder blocks of layers and decoder blocks of layers.

The complete example of defining the model is listed below.

# example of defining a u-net encoder-decoder generator model from keras.initializers import RandomNormal from keras.models import Model from keras.models import Input from keras.layers import Conv2D from keras.layers import Conv2DTranspose from keras.layers import LeakyReLU from keras.layers import Activation from keras.layers import Concatenate from keras.layers import Dropout from keras.layers import BatchNormalization from keras.layers import LeakyReLU from keras.utils.vis_utils import plot_model  # define an encoder block def define_encoder_block(layer_in, n_filters, batchnorm=True): 	# weight initialization 	init = RandomNormal(stddev=0.02) 	# add downsampling layer 	g = Conv2D(n_filters, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(layer_in) 	# conditionally add batch normalization 	if batchnorm: 		g = BatchNormalization()(g, training=True) 	# leaky relu activation 	g = LeakyReLU(alpha=0.2)(g) 	return g  # define a decoder block def decoder_block(layer_in, skip_in, n_filters, dropout=True): 	# weight initialization 	init = RandomNormal(stddev=0.02) 	# add upsampling layer 	g = Conv2DTranspose(n_filters, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(layer_in) 	# add batch normalization 	g = BatchNormalization()(g, training=True) 	# conditionally add dropout 	if dropout: 		g = Dropout(0.5)(g, training=True) 	# merge with skip connection 	g = Concatenate()([g, skip_in]) 	# relu activation 	g = Activation('relu')(g) 	return g  # define the standalone generator model def define_generator(image_shape=(256,256,3)): 	# weight initialization 	init = RandomNormal(stddev=0.02) 	# image input 	in_image = Input(shape=image_shape) 	# encoder model: C64-C128-C256-C512-C512-C512-C512-C512 	e1 = define_encoder_block(in_image, 64, batchnorm=False) 	e2 = define_encoder_block(e1, 128) 	e3 = define_encoder_block(e2, 256) 	e4 = define_encoder_block(e3, 512) 	e5 = define_encoder_block(e4, 512) 	e6 = define_encoder_block(e5, 512) 	e7 = define_encoder_block(e6, 512) 	# bottleneck, no batch norm and relu 	b = Conv2D(512, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(e7) 	b = Activation('relu')(b) 	# decoder model: CD512-CD1024-CD1024-C1024-C1024-C512-C256-C128 	d1 = decoder_block(b, e7, 512) 	d2 = decoder_block(d1, e6, 512) 	d3 = decoder_block(d2, e5, 512) 	d4 = decoder_block(d3, e4, 512, dropout=False) 	d5 = decoder_block(d4, e3, 256, dropout=False) 	d6 = decoder_block(d5, e2, 128, dropout=False) 	d7 = decoder_block(d6, e1, 64, dropout=False) 	# output 	g = Conv2DTranspose(3, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d7) 	out_image = Activation('tanh')(g) 	# define model 	model = Model(in_image, out_image) 	return model  # define image shape image_shape = (256,256,3) # create the model model = define_generator(image_shape) # summarize the model model.summary() # plot the model plot_model(model, to_file='generator_model_plot.png', show_shapes=True, show_layer_names=True)

Running the example first summarizes the model.

The model has a single input and output, but the skip connections make the summary difficult to read.

__________________________________________________________________________________________________ Layer (type)                    Output Shape         Param #     Connected to ================================================================================================== input_1 (InputLayer)            (None, 256, 256, 3)  0 __________________________________________________________________________________________________ conv2d_1 (Conv2D)               (None, 128, 128, 64) 3136        input_1[0][0] __________________________________________________________________________________________________ leaky_re_lu_1 (LeakyReLU)       (None, 128, 128, 64) 0           conv2d_1[0][0] __________________________________________________________________________________________________ conv2d_2 (Conv2D)               (None, 64, 64, 128)  131200      leaky_re_lu_1[0][0] __________________________________________________________________________________________________ batch_normalization_1 (BatchNor (None, 64, 64, 128)  512         conv2d_2[0][0] __________________________________________________________________________________________________ leaky_re_lu_2 (LeakyReLU)       (None, 64, 64, 128)  0           batch_normalization_1[0][0] __________________________________________________________________________________________________ conv2d_3 (Conv2D)               (None, 32, 32, 256)  524544      leaky_re_lu_2[0][0] __________________________________________________________________________________________________ batch_normalization_2 (BatchNor (None, 32, 32, 256)  1024        conv2d_3[0][0] __________________________________________________________________________________________________ leaky_re_lu_3 (LeakyReLU)       (None, 32, 32, 256)  0           batch_normalization_2[0][0] __________________________________________________________________________________________________ conv2d_4 (Conv2D)               (None, 16, 16, 512)  2097664     leaky_re_lu_3[0][0] __________________________________________________________________________________________________ batch_normalization_3 (BatchNor (None, 16, 16, 512)  2048        conv2d_4[0][0] __________________________________________________________________________________________________ leaky_re_lu_4 (LeakyReLU)       (None, 16, 16, 512)  0           batch_normalization_3[0][0] __________________________________________________________________________________________________ conv2d_5 (Conv2D)               (None, 8, 8, 512)    4194816     leaky_re_lu_4[0][0] __________________________________________________________________________________________________ batch_normalization_4 (BatchNor (None, 8, 8, 512)    2048        conv2d_5[0][0] __________________________________________________________________________________________________ leaky_re_lu_5 (LeakyReLU)       (None, 8, 8, 512)    0           batch_normalization_4[0][0] __________________________________________________________________________________________________ conv2d_6 (Conv2D)               (None, 4, 4, 512)    4194816     leaky_re_lu_5[0][0] __________________________________________________________________________________________________ batch_normalization_5 (BatchNor (None, 4, 4, 512)    2048        conv2d_6[0][0] __________________________________________________________________________________________________ leaky_re_lu_6 (LeakyReLU)       (None, 4, 4, 512)    0           batch_normalization_5[0][0] 
__________________________________________________________________________________________________ conv2d_7 (Conv2D)               (None, 2, 2, 512)    4194816     leaky_re_lu_6[0][0] __________________________________________________________________________________________________ batch_normalization_6 (BatchNor (None, 2, 2, 512)    2048        conv2d_7[0][0] __________________________________________________________________________________________________ leaky_re_lu_7 (LeakyReLU)       (None, 2, 2, 512)    0           batch_normalization_6[0][0] __________________________________________________________________________________________________ conv2d_8 (Conv2D)               (None, 1, 1, 512)    4194816     leaky_re_lu_7[0][0] __________________________________________________________________________________________________ activation_1 (Activation)       (None, 1, 1, 512)    0           conv2d_8[0][0] __________________________________________________________________________________________________ conv2d_transpose_1 (Conv2DTrans (None, 2, 2, 512)    4194816     activation_1[0][0] __________________________________________________________________________________________________ batch_normalization_7 (BatchNor (None, 2, 2, 512)    2048        conv2d_transpose_1[0][0] __________________________________________________________________________________________________ dropout_1 (Dropout)             (None, 2, 2, 512)    0           batch_normalization_7[0][0] __________________________________________________________________________________________________ concatenate_1 (Concatenate)     (None, 2, 2, 1024)   0           dropout_1[0][0]                                                                  leaky_re_lu_7[0][0] __________________________________________________________________________________________________ activation_2 (Activation)       (None, 2, 2, 1024)   0           concatenate_1[0][0] __________________________________________________________________________________________________ conv2d_transpose_2 (Conv2DTrans (None, 4, 4, 512)    8389120     activation_2[0][0] __________________________________________________________________________________________________ batch_normalization_8 (BatchNor (None, 4, 4, 512)    2048        conv2d_transpose_2[0][0] __________________________________________________________________________________________________ dropout_2 (Dropout)             (None, 4, 4, 512)    0           batch_normalization_8[0][0] __________________________________________________________________________________________________ concatenate_2 (Concatenate)     (None, 4, 4, 1024)   0           dropout_2[0][0]                                                                  leaky_re_lu_6[0][0] __________________________________________________________________________________________________ activation_3 (Activation)       (None, 4, 4, 1024)   0           concatenate_2[0][0] __________________________________________________________________________________________________ conv2d_transpose_3 (Conv2DTrans (None, 8, 8, 512)    8389120     activation_3[0][0] __________________________________________________________________________________________________ batch_normalization_9 (BatchNor (None, 8, 8, 512)    2048        conv2d_transpose_3[0][0] __________________________________________________________________________________________________ dropout_3 (Dropout)             (None, 8, 8, 512)    0           batch_normalization_9[0][0] 
__________________________________________________________________________________________________ concatenate_3 (Concatenate)     (None, 8, 8, 1024)   0           dropout_3[0][0]                                                                  leaky_re_lu_5[0][0] __________________________________________________________________________________________________ activation_4 (Activation)       (None, 8, 8, 1024)   0           concatenate_3[0][0] __________________________________________________________________________________________________ conv2d_transpose_4 (Conv2DTrans (None, 16, 16, 512)  8389120     activation_4[0][0] __________________________________________________________________________________________________ batch_normalization_10 (BatchNo (None, 16, 16, 512)  2048        conv2d_transpose_4[0][0] __________________________________________________________________________________________________ concatenate_4 (Concatenate)     (None, 16, 16, 1024) 0           batch_normalization_10[0][0]                                                                  leaky_re_lu_4[0][0] __________________________________________________________________________________________________ activation_5 (Activation)       (None, 16, 16, 1024) 0           concatenate_4[0][0] __________________________________________________________________________________________________ conv2d_transpose_5 (Conv2DTrans (None, 32, 32, 256)  4194560     activation_5[0][0] __________________________________________________________________________________________________ batch_normalization_11 (BatchNo (None, 32, 32, 256)  1024        conv2d_transpose_5[0][0] __________________________________________________________________________________________________ concatenate_5 (Concatenate)     (None, 32, 32, 512)  0           batch_normalization_11[0][0]                                                                  leaky_re_lu_3[0][0] __________________________________________________________________________________________________ activation_6 (Activation)       (None, 32, 32, 512)  0           concatenate_5[0][0] __________________________________________________________________________________________________ conv2d_transpose_6 (Conv2DTrans (None, 64, 64, 128)  1048704     activation_6[0][0] __________________________________________________________________________________________________ batch_normalization_12 (BatchNo (None, 64, 64, 128)  512         conv2d_transpose_6[0][0] __________________________________________________________________________________________________ concatenate_6 (Concatenate)     (None, 64, 64, 256)  0           batch_normalization_12[0][0]                                                                  leaky_re_lu_2[0][0] __________________________________________________________________________________________________ activation_7 (Activation)       (None, 64, 64, 256)  0           concatenate_6[0][0] __________________________________________________________________________________________________ conv2d_transpose_7 (Conv2DTrans (None, 128, 128, 64) 262208      activation_7[0][0] __________________________________________________________________________________________________ batch_normalization_13 (BatchNo (None, 128, 128, 64) 256         conv2d_transpose_7[0][0] __________________________________________________________________________________________________ concatenate_7 (Concatenate)     (None, 128, 128, 128 0           batch_normalization_13[0][0]                                        
                          leaky_re_lu_1[0][0] __________________________________________________________________________________________________ activation_8 (Activation)       (None, 128, 128, 128 0           concatenate_7[0][0] __________________________________________________________________________________________________ conv2d_transpose_8 (Conv2DTrans (None, 256, 256, 3)  6147        activation_8[0][0] __________________________________________________________________________________________________ activation_9 (Activation)       (None, 256, 256, 3)  0           conv2d_transpose_8[0][0] ================================================================================================== Total params: 54,429,315 Trainable params: 54,419,459 Non-trainable params: 9,856 __________________________________________________________________________________________________

A plot of the model is created showing much the same information in a graphical form. The model is complex, and the plot helps to understand the skip connections and their impact on the number of filters in the decoder.

Note: creating the plot assumes that pydot and pygraphviz libraries are installed. If this is a problem, you can comment out the import and call to the plot_model() function.

Working backward from the output layer, if we look at the Concatenate layers and the first Conv2DTranspose layer of the decoder, we can see the number of channels as:

  • [128, 256, 512, 1024, 1024, 1024, 1024, 512].

Reversing this list gives the stated configuration of the number of filters for each layer in the decoder from the paper of:

  • CD512-CD1024-CD1024-C1024-C1024-C512-C256-C128

Plot of the U-Net Encoder-Decoder Model Used in the Pix2Pix GAN Architecture

Plot of the U-Net Encoder-Decoder Model Used in the Pix2Pix GAN Architecture

Now that we have defined both models, we can look at how the generator model is updated via the discriminator model.

How to Implement Adversarial and L1 Loss

The discriminator model can be updated directly, whereas the generator model must be updated via the discriminator model.

This can be achieved by defining a new composite model in Keras that connects the output of the generator model as input to the discriminator model. The discriminator model can then predict whether a generated image is real or fake. We can update the weights of the composite model in such a way that the generated image has the label of “real” instead of “fake“, which will cause the generator weights to be updated towards generating a better fake image. We can also mark the discriminator weights as not trainable in this context, to avoid the misleading update.

Additionally, the generator needs to be updated to better match the targeted translation of the input image. This means that the composite model must also output the generated image directly, allowing it to be compared to the target image.

Therefore, we can summarize the inputs and outputs of this composite model as follows:

  • Inputs: Source image
  • Outputs: Classification of real/fake, generated target image.

The weights of the generator will be updated via both the adversarial loss calculated from the discriminator output and the L1 loss calculated from the direct image output. The two loss scores are added together, where the L1 loss is treated as a regularizing term and weighted via a hyperparameter called lambda, set to 100.

  • loss = adversarial loss + lambda * L1 loss

The define_gan() function below implements this, taking the defined generator and discriminator models as input and creating the composite GAN model that can be used to update the generator model weights.

The source image input is provided both to the generator and the discriminator as input and the output of the generator is also connected to the discriminator as input.

Two loss functions are specified when the model is compiled for the discriminator and generator outputs respectively. The loss_weights argument is used to define the weighting of each loss when added together to update the generator model weights.

# define the combined generator and discriminator model, for updating the generator
def define_gan(g_model, d_model, image_shape):
	# make weights in the discriminator not trainable
	d_model.trainable = False
	# define the source image
	in_src = Input(shape=image_shape)
	# connect the source image to the generator input
	gen_out = g_model(in_src)
	# connect the source input and generator output to the discriminator input
	dis_out = d_model([in_src, gen_out])
	# src image as input, generated image and classification output
	model = Model(in_src, [dis_out, gen_out])
	# compile model
	opt = Adam(lr=0.0002, beta_1=0.5)
	model.compile(loss=['binary_crossentropy', 'mae'], optimizer=opt, loss_weights=[1,100])
	return model

Tying this together with the model definitions from the previous sections, the complete example is listed below.

# example of defining a composite model for training the generator model from keras.optimizers import Adam from keras.initializers import RandomNormal from keras.models import Model from keras.models import Input from keras.layers import Conv2D from keras.layers import Conv2DTranspose from keras.layers import LeakyReLU from keras.layers import Activation from keras.layers import Concatenate from keras.layers import Dropout from keras.layers import BatchNormalization from keras.layers import LeakyReLU from keras.utils.vis_utils import plot_model  # define the discriminator model def define_discriminator(image_shape): 	# weight initialization 	init = RandomNormal(stddev=0.02) 	# source image input 	in_src_image = Input(shape=image_shape) 	# target image input 	in_target_image = Input(shape=image_shape) 	# concatenate images channel-wise 	merged = Concatenate()([in_src_image, in_target_image]) 	# C64 	d = Conv2D(64, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(merged) 	d = LeakyReLU(alpha=0.2)(d) 	# C128 	d = Conv2D(128, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d) 	d = BatchNormalization()(d) 	d = LeakyReLU(alpha=0.2)(d) 	# C256 	d = Conv2D(256, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d) 	d = BatchNormalization()(d) 	d = LeakyReLU(alpha=0.2)(d) 	# C512 	d = Conv2D(512, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d) 	d = BatchNormalization()(d) 	d = LeakyReLU(alpha=0.2)(d) 	# second last output layer 	d = Conv2D(512, (4,4), padding='same', kernel_initializer=init)(d) 	d = BatchNormalization()(d) 	d = LeakyReLU(alpha=0.2)(d) 	# patch output 	d = Conv2D(1, (4,4), padding='same', kernel_initializer=init)(d) 	patch_out = Activation('sigmoid')(d) 	# define model 	model = Model([in_src_image, in_target_image], patch_out) 	# compile model 	opt = Adam(lr=0.0002, beta_1=0.5) 	model.compile(loss='binary_crossentropy', optimizer=opt, loss_weights=[0.5]) 	return model  # define an encoder block def define_encoder_block(layer_in, n_filters, batchnorm=True): 	# weight initialization 	init = RandomNormal(stddev=0.02) 	# add downsampling layer 	g = Conv2D(n_filters, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(layer_in) 	# conditionally add batch normalization 	if batchnorm: 		g = BatchNormalization()(g, training=True) 	# leaky relu activation 	g = LeakyReLU(alpha=0.2)(g) 	return g  # define a decoder block def decoder_block(layer_in, skip_in, n_filters, dropout=True): 	# weight initialization 	init = RandomNormal(stddev=0.02) 	# add upsampling layer 	g = Conv2DTranspose(n_filters, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(layer_in) 	# add batch normalization 	g = BatchNormalization()(g, training=True) 	# conditionally add dropout 	if dropout: 		g = Dropout(0.5)(g, training=True) 	# merge with skip connection 	g = Concatenate()([g, skip_in]) 	# relu activation 	g = Activation('relu')(g) 	return g  # define the standalone generator model def define_generator(image_shape=(256,256,3)): 	# weight initialization 	init = RandomNormal(stddev=0.02) 	# image input 	in_image = Input(shape=image_shape) 	# encoder model: C64-C128-C256-C512-C512-C512-C512-C512 	e1 = define_encoder_block(in_image, 64, batchnorm=False) 	e2 = define_encoder_block(e1, 128) 	e3 = define_encoder_block(e2, 256) 	e4 = define_encoder_block(e3, 512) 	e5 = define_encoder_block(e4, 512) 	e6 = define_encoder_block(e5, 512) 	e7 = define_encoder_block(e6, 512) 	# bottleneck, no batch norm and relu 	b = Conv2D(512, (4,4), 
strides=(2,2), padding='same', kernel_initializer=init)(e7) 	b = Activation('relu')(b) 	# decoder model: CD512-CD1024-CD1024-C1024-C1024-C512-C256-C128 	d1 = decoder_block(b, e7, 512) 	d2 = decoder_block(d1, e6, 512) 	d3 = decoder_block(d2, e5, 512) 	d4 = decoder_block(d3, e4, 512, dropout=False) 	d5 = decoder_block(d4, e3, 256, dropout=False) 	d6 = decoder_block(d5, e2, 128, dropout=False) 	d7 = decoder_block(d6, e1, 64, dropout=False) 	# output 	g = Conv2DTranspose(3, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d7) 	out_image = Activation('tanh')(g) 	# define model 	model = Model(in_image, out_image) 	return model  # define the combined generator and discriminator model, for updating the generator def define_gan(g_model, d_model, image_shape): 	# make weights in the discriminator not trainable 	d_model.trainable = False 	# define the source image 	in_src = Input(shape=image_shape) 	# connect the source image to the generator input 	gen_out = g_model(in_src) 	# connect the source input and generator output to the discriminator input 	dis_out = d_model([in_src, gen_out]) 	# src image as input, generated image and classification output 	model = Model(in_src, [dis_out, gen_out]) 	# compile model 	opt = Adam(lr=0.0002, beta_1=0.5) 	model.compile(loss=['binary_crossentropy', 'mae'], optimizer=opt, loss_weights=[1,100]) 	return model  # define image shape image_shape = (256,256,3) # define the models d_model = define_discriminator(image_shape) g_model = define_generator(image_shape) # define the composite model gan_model = define_gan(g_model, d_model, image_shape) # summarize the model gan_model.summary() # plot the model plot_model(gan_model, to_file='gan_model_plot.png', show_shapes=True, show_layer_names=True)

Running the example first summarizes the composite model, showing the 256×256 image input, the same shaped output from model_2 (the generator) and the PatchGAN classification prediction from model_1 (the discriminator).

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_4 (InputLayer)            (None, 256, 256, 3)  0
__________________________________________________________________________________________________
model_2 (Model)                 (None, 256, 256, 3)  54429315    input_4[0][0]
__________________________________________________________________________________________________
model_1 (Model)                 (None, 16, 16, 1)    6968257     input_4[0][0]
                                                                 model_2[1][0]
==================================================================================================
Total params: 61,397,572
Trainable params: 54,419,459
Non-trainable params: 6,978,113
__________________________________________________________________________________________________

A plot of the composite model is also created, showing how the input image flows into both the generator and the discriminator, and that the composite model has two outputs, or end-points, one from each of the two models.

Note: creating the plot assumes that pydot and pygraphviz libraries are installed. If this is a problem, you can comment out the import and call to the plot_model() function.

Plot of the Composite GAN Model Used to Train the Generator in the Pix2Pix GAN Architecture

How to Update Model Weights

Training the defined models is relatively straightforward.

First, we must define a helper function that will select a batch of real source and target images and the associated output (1.0). Here, the dataset is a list of two arrays of images.

# select a batch of random samples, returns images and target
def generate_real_samples(dataset, n_samples, patch_shape):
    # unpack dataset
    trainA, trainB = dataset
    # choose random instances
    ix = randint(0, trainA.shape[0], n_samples)
    # retrieve selected images
    X1, X2 = trainA[ix], trainB[ix]
    # generate 'real' class labels (1)
    y = ones((n_samples, patch_shape, patch_shape, 1))
    return [X1, X2], y

Similarly, we need a function to generate a batch of fake images and the associated output (0.0). Here, the samples are an array of source images for which target images will be generated.

# generate a batch of images, returns images and targets
def generate_fake_samples(g_model, samples, patch_shape):
    # generate fake instance
    X = g_model.predict(samples)
    # create 'fake' class labels (0)
    y = zeros((len(X), patch_shape, patch_shape, 1))
    return X, y

Now, we can define the steps of a single training iteration.

First, we must select a batch of source and target images by calling generate_real_samples().

Typically, the batch size (n_batch) is set to 1. In this case, we will assume 256×256 input images, which means the n_patch for the PatchGAN discriminator will be 16 to indicate a 16×16 output feature map.
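As an aside, rather than hard-coding n_patch, the value can be read directly from the discriminator model; an optional one-liner (assuming d_model is the discriminator defined earlier):

...
# derive the PatchGAN output square size from the discriminator model
n_patch = d_model.output_shape[1]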

...
# select a batch of real samples
[X_realA, X_realB], y_real = generate_real_samples(dataset, n_batch, n_patch)

Next, we can use the batches of selected real source images to generate corresponding batches of generated or fake target images.

...
# generate a batch of fake samples
X_fakeB, y_fake = generate_fake_samples(g_model, X_realA, n_patch)

We can then use the real and fake images, as well as their targets, to update the standalone discriminator model.

...
# update discriminator for real samples
d_loss1 = d_model.train_on_batch([X_realA, X_realB], y_real)
# update discriminator for generated samples
d_loss2 = d_model.train_on_batch([X_realA, X_fakeB], y_fake)

So far, this is normal for updating a GAN in Keras.

Next, we can update the generator model via adversarial loss and L1 loss. Recall that the composite GAN model takes a batch of source images as input and predicts first the classification of real/fake and second the generated target. Here, we provide a target to indicate the generated images are “real” (class=1) to the discriminator output of the composite model. The real target images are provided for calculating the L1 loss between them and the generated target images.

We have two loss functions, but train_on_batch() returns three loss values for a batch update: the weighted sum of the adversarial and L1 losses, followed by one value for each output. Only the first value is of interest here, as it is the weighted loss used to update the generator for the batch.
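If you are unsure what the three returned values correspond to, the metrics_names property of the compiled composite model lists them in order (the exact names depend on your layer names):

...
# list the names of the loss values returned by train_on_batch()
print(gan_model.metrics_names)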

...
# update the generator
g_loss, _, _ = gan_model.train_on_batch(X_realA, [y_real, X_realB])

That’s all there is to it.

We can define all of this in a function called train() that takes the defined models and a loaded dataset (as a list of two NumPy arrays) and trains the models.

# train pix2pix models
def train(d_model, g_model, gan_model, dataset, n_epochs=100, n_batch=1, n_patch=16):
    # unpack dataset
    trainA, trainB = dataset
    # calculate the number of batches per training epoch
    bat_per_epo = int(len(trainA) / n_batch)
    # calculate the number of training iterations
    n_steps = bat_per_epo * n_epochs
    # manually enumerate epochs
    for i in range(n_steps):
        # select a batch of real samples
        [X_realA, X_realB], y_real = generate_real_samples(dataset, n_batch, n_patch)
        # generate a batch of fake samples
        X_fakeB, y_fake = generate_fake_samples(g_model, X_realA, n_patch)
        # update discriminator for real samples
        d_loss1 = d_model.train_on_batch([X_realA, X_realB], y_real)
        # update discriminator for generated samples
        d_loss2 = d_model.train_on_batch([X_realA, X_fakeB], y_fake)
        # update the generator
        g_loss, _, _ = gan_model.train_on_batch(X_realA, [y_real, X_realB])
        # summarize performance
        print('>%d, d1[%.3f] d2[%.3f] g[%.3f]' % (i+1, d_loss1, d_loss2, g_loss))

The train function can then be called directly with our defined models and loaded dataset.

...
# load image data
dataset = ...
# train model
train(d_model, g_model, gan_model, dataset)
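The loading of the paired image dataset depends on your data and is beyond the scope of this section. As one illustrative sketch (the filename maps_256.npz, the use of a compressed NumPy file saved with savez_compressed(), and the array keys are assumptions for illustration, not part of this tutorial), paired source and target images could be loaded and scaled to [-1,1] like this:

# hypothetical example of loading paired images saved as a compressed NumPy file
from numpy import load

def load_real_samples(filename):
    # load the compressed arrays (assumed keys 'arr_0' and 'arr_1')
    data = load(filename)
    X1, X2 = data['arr_0'], data['arr_1']
    # scale pixel values from [0,255] to [-1,1]
    X1 = (X1 - 127.5) / 127.5
    X2 = (X2 - 127.5) / 127.5
    return [X1, X2]

dataset = load_real_samples('maps_256.npz')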

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Official

API

Articles

Summary

In this tutorial, you discovered how to implement the Pix2Pix GAN architecture from scratch using the Keras deep learning framework.

Specifically, you learned:

  • How to develop the PatchGAN discriminator model for the Pix2Pix GAN.
  • How to develop the U-Net encoder-decoder generator model for the Pix2Pix GAN.
  • How to implement the composite model for updating the generator and how to train both models.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

The post How to Implement Pix2Pix GAN Models From Scratch With Keras appeared first on Machine Learning Mastery.


How to Implement a Semi-Supervised GAN (SGAN) From Scratch in Keras

Semi-supervised learning is the challenging problem of training a classifier in a dataset that contains a small number of labeled examples and a much larger number of unlabeled examples.

The Generative Adversarial Network, or GAN, is an architecture that makes effective use of large, unlabeled datasets to train an image generator model via an image discriminator model. The discriminator model can be used as a starting point for developing a classifier model in some cases.

The semi-supervised GAN, or SGAN, model is an extension of the GAN architecture that involves the simultaneous training of a supervised discriminator, unsupervised discriminator, and a generator model. The result is both a supervised classification model that generalizes well to unseen examples and a generator model that outputs plausible examples of images from the domain.

In this tutorial, you will discover how to develop a Semi-Supervised Generative Adversarial Network from scratch.

After completing this tutorial, you will know:

  • The semi-supervised GAN is an extension of the GAN architecture for training a classifier model while making use of labeled and unlabeled data.
  • There are at least three approaches to implementing the supervised and unsupervised discriminator models in Keras used in the semi-supervised GAN.
  • How to train a semi-supervised GAN from scratch on MNIST and load and use the trained classifier for making predictions.

Discover how to develop DCGANs, conditional GANs, Pix2Pix, CycleGANs, and more with Keras in my new GANs book, with 29 step-by-step tutorials and full source code.

Let’s get started.

How to Implement a Semi-Supervised Generative Adversarial Network From Scratch.
Photo by Carlos Johnson, some rights reserved.

Tutorial Overview

This tutorial is divided into four parts; they are:

  1. What Is the Semi-Supervised GAN?
  2. How to Implement the Semi-Supervised Discriminator Model
  3. How to Develop a Semi-Supervised GAN for MNIST
  4. How to Load and Use the Final SGAN Classifier Model

What Is the Semi-Supervised GAN?

Semi-supervised learning refers to a problem where a predictive model is required and there are few labeled examples and many unlabeled examples.

The most common example is a classification predictive modeling problem in which there may be a very large dataset of examples, but only a small fraction have target labels. The model must learn from the small set of labeled examples and somehow harness the larger dataset of unlabeled examples in order to generalize to classifying new examples in the future.

The Semi-Supervised GAN, or sometimes SGAN for short, is an extension of the Generative Adversarial Network architecture for addressing semi-supervised learning problems.

One of the primary goals of this work is to improve the effectiveness of generative adversarial networks for semi-supervised learning (improving the performance of a supervised task, in this case, classification, by learning on additional unlabeled examples).

Improved Techniques for Training GANs, 2016.

The discriminator in a traditional GAN is trained to predict whether a given image is real (from the dataset) or fake (generated), allowing it to learn features from unlabeled images. The discriminator can then be used via transfer learning as a starting point when developing a classifier for the same dataset, allowing the supervised prediction task to benefit from the unsupervised training of the GAN.

In the Semi-Supervised GAN, the discriminator model is updated to predict K+1 classes, where K is the number of classes in the prediction problem and the additional class label is added for a new “fake” class. It involves directly training the discriminator model for both the unsupervised GAN task and the supervised classification task simultaneously.

We train a generative model G and a discriminator D on a dataset with inputs belonging to one of N classes. At training time, D is made to predict which of N+1 classes the input belongs to, where an extra class is added to correspond to the outputs of G.

Semi-Supervised Learning with Generative Adversarial Networks, 2016.
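To make this concrete for MNIST (K=10), real images keep their digit labels 0-9 and generated images receive the extra label 10. A tiny illustration of the label scheme (the variable names are for illustration only):

# hypothetical illustration of the K+1 label scheme for MNIST (K=10)
n_classes = 10
real_class_labels = list(range(n_classes))  # 0..9 for real, labeled digits
fake_class_label = n_classes                # 10 reserved for generated images
print(real_class_labels, fake_class_label)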

As such, the discriminator is trained in two modes: a supervised and unsupervised mode.

  • Unsupervised Training: In the unsupervised mode, the discriminator is trained in the same way as the traditional GAN, to predict whether the example is either real or fake.
  • Supervised Training: In the supervised mode, the discriminator is trained to predict the class label of real examples.

Training in unsupervised mode allows the model to learn useful feature extraction capabilities from a large unlabeled dataset, whereas training in supervised mode allows the model to use the extracted features and apply class labels.

The result is a classifier model that can achieve state-of-the-art results on standard problems such as MNIST when trained on very few labeled examples, such as tens, hundreds, or one thousand. Additionally, the training process can also result in better quality images output by the generator model.

For example, Augustus Odena in his 2016 paper titled “Semi-Supervised Learning with Generative Adversarial Networks” shows how a GAN-trained classifier is able to perform as well as or better than a standalone CNN model on the MNIST handwritten digit recognition task when trained with 25, 50, 100, and 1,000 labeled examples.

Example of the Table of Results Comparing Classification Accuracy of a CNN and SGAN on MNIST.
Taken from: Semi-Supervised Learning with Generative Adversarial Networks

Tim Salimans, et al. from OpenAI in their 2016 paper titled “Improved Techniques for Training GANs” achieved at the time state-of-the-art results on a number of image classification tasks using a semi-supervised GAN, including MNIST.

Example of the Table of Results Comparing Classification Accuracy of other GAN models to a SGAN on MNIST.
Taken From: Improved Techniques for Training GANs


How to Implement the Semi-Supervised Discriminator Model

There are a number of ways that we can implement the discriminator model for the semi-supervised GAN.

In this section, we will review three candidate approaches.

Traditional Discriminator Model

Consider a discriminator model for the standard GAN model.

It must take an image as input and predict whether it is real or fake. More specifically, it predicts the likelihood of the input image being real. The output layer uses a sigmoid activation function to predict a probability value in [0,1] and the model is typically optimized using a binary cross entropy loss function.

For example, we can define a simple discriminator model that takes grayscale images as input with the size of 28×28 pixels and predicts a probability of the image being real. We can use best practices and downsample the image using convolutional layers with a 2×2 stride and a leaky ReLU activation function.

The define_discriminator() function below implements this and defines our standard discriminator model.

# example of defining the discriminator model from keras.models import Model from keras.layers import Input from keras.layers import Dense from keras.layers import Conv2D from keras.layers import LeakyReLU from keras.layers import Dropout from keras.layers import Flatten from keras.optimizers import Adam from keras.utils.vis_utils import plot_model  # define the standalone discriminator model def define_discriminator(in_shape=(28,28,1)): 	# image input 	in_image = Input(shape=in_shape) 	# downsample 	fe = Conv2D(128, (3,3), strides=(2,2), padding='same')(in_image) 	fe = LeakyReLU(alpha=0.2)(fe) 	# downsample 	fe = Conv2D(128, (3,3), strides=(2,2), padding='same')(fe) 	fe = LeakyReLU(alpha=0.2)(fe) 	# downsample 	fe = Conv2D(128, (3,3), strides=(2,2), padding='same')(fe) 	fe = LeakyReLU(alpha=0.2)(fe) 	# flatten feature maps 	fe = Flatten()(fe) 	# dropout 	fe = Dropout(0.4)(fe) 	# output layer 	d_out_layer = Dense(1, activation='sigmoid')(fe) 	# define and compile discriminator model 	d_model = Model(in_image, d_out_layer) 	d_model.compile(loss='binary_crossentropy', optimizer=Adam(lr=0.0002, beta_1=0.5)) 	return d_model  # create model model = define_discriminator() # plot the model plot_model(model, to_file='discriminator_plot.png', show_shapes=True, show_layer_names=True)

Running the example creates a plot of the discriminator model, clearly showing the 28x28x1 shape of the input image and the prediction of a single probability value.

Plot of a Standard GAN Discriminator Model

Separate Discriminator Models With Shared Weights

Starting with the standard GAN discriminator model, we can update it to create two models that share feature extraction weights.

Specifically, we can define one classifier model that predicts whether an input image is real or fake, and a second classifier model that predicts the class of a given image.

Both models have different output layers but share all feature extraction layers. This means that updates to one of the classifier models will impact both models.

The example below creates the traditional discriminator model with binary output first, then re-uses the feature extraction layers and creates a new multi-class prediction model, in this case with 10 classes.

# example of defining semi-supervised discriminator model from keras.models import Model from keras.layers import Input from keras.layers import Dense from keras.layers import Conv2D from keras.layers import LeakyReLU from keras.layers import Dropout from keras.layers import Flatten from keras.optimizers import Adam from keras.utils.vis_utils import plot_model  # define the standalone supervised and unsupervised discriminator models def define_discriminator(in_shape=(28,28,1), n_classes=10): 	# image input 	in_image = Input(shape=in_shape) 	# downsample 	fe = Conv2D(128, (3,3), strides=(2,2), padding='same')(in_image) 	fe = LeakyReLU(alpha=0.2)(fe) 	# downsample 	fe = Conv2D(128, (3,3), strides=(2,2), padding='same')(fe) 	fe = LeakyReLU(alpha=0.2)(fe) 	# downsample 	fe = Conv2D(128, (3,3), strides=(2,2), padding='same')(fe) 	fe = LeakyReLU(alpha=0.2)(fe) 	# flatten feature maps 	fe = Flatten()(fe) 	# dropout 	fe = Dropout(0.4)(fe) 	# unsupervised output 	d_out_layer = Dense(1, activation='sigmoid')(fe) 	# define and compile unsupervised discriminator model 	d_model = Model(in_image, d_out_layer) 	d_model.compile(loss='binary_crossentropy', optimizer=Adam(lr=0.0002, beta_1=0.5)) 	# supervised output 	c_out_layer = Dense(n_classes, activation='softmax')(fe) 	# define and compile supervised discriminator model 	c_model = Model(in_image, c_out_layer) 	c_model.compile(loss='sparse_categorical_crossentropy', optimizer=Adam(lr=0.0002, beta_1=0.5), metrics=['accuracy']) 	return d_model, c_model  # create model d_model, c_model = define_discriminator() # plot the model plot_model(d_model, to_file='discriminator1_plot.png', show_shapes=True, show_layer_names=True) plot_model(c_model, to_file='discriminator2_plot.png', show_shapes=True, show_layer_names=True)

Running the example creates and plots both models.

The plot for the first model is the same as before.

Plot of an Unsupervised Binary Classification GAN Discriminator Model

The plot of the second model shows the same expected input shape and same feature extraction layers, with a new 10 class classification output layer.

Plot of a Supervised Multi-Class Classification GAN Discriminator Model
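Because the two models are built from the same graph of layers, the feature extraction layers are literally the same layer objects in memory. As an optional check (not part of the original example), you can count the layers the two models have in common:

...
# optional check: count how many layer objects the two models share
shared = [layer for layer in d_model.layers if layer in c_model.layers]
print('shared layers: %d of %d' % (len(shared), len(d_model.layers)))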

Single Discriminator Model With Multiple Outputs

Another approach to implementing the semi-supervised discriminator model is to have a single model with multiple output layers.

Specifically, this is a single model with one output layer for the unsupervised task and one output layer for the supervised task.

This is like having separate models for the supervised and unsupervised tasks in that they both share the same feature extraction layers, except that in this case, each input image always has two output predictions, specifically a real/fake prediction and a supervised class prediction.

A problem with this approach is that when the model is updated with unlabeled and generated images, there is no supervised class label. In that case, these images must be given a target label of “unknown” or “fake” for the supervised output. This means that an additional class label is required for the supervised output layer.

The example below implements the multi-output single model approach for the discriminator model in the semi-supervised GAN architecture.

We can see that the model is defined with two output layers and that the output layer for the supervised task is defined with n_classes + 1, in this case 11, making room for the additional “unknown” class label.

We can also see that the model is compiled with two loss functions, one for each output layer of the model.

# example of defining semi-supervised discriminator model from keras.models import Model from keras.layers import Input from keras.layers import Dense from keras.layers import Conv2D from keras.layers import LeakyReLU from keras.layers import Dropout from keras.layers import Flatten from keras.optimizers import Adam from keras.utils.vis_utils import plot_model  # define the standalone supervised and unsupervised discriminator models def define_discriminator(in_shape=(28,28,1), n_classes=10): 	# image input 	in_image = Input(shape=in_shape) 	# downsample 	fe = Conv2D(128, (3,3), strides=(2,2), padding='same')(in_image) 	fe = LeakyReLU(alpha=0.2)(fe) 	# downsample 	fe = Conv2D(128, (3,3), strides=(2,2), padding='same')(fe) 	fe = LeakyReLU(alpha=0.2)(fe) 	# downsample 	fe = Conv2D(128, (3,3), strides=(2,2), padding='same')(fe) 	fe = LeakyReLU(alpha=0.2)(fe) 	# flatten feature maps 	fe = Flatten()(fe) 	# dropout 	fe = Dropout(0.4)(fe) 	# unsupervised output 	d_out_layer = Dense(1, activation='sigmoid')(fe) 	# supervised output 	c_out_layer = Dense(n_classes + 1, activation='softmax')(fe) 	# define and compile supervised discriminator model 	model = Model(in_image, [d_out_layer, c_out_layer]) 	model.compile(loss=['binary_crossentropy', 'sparse_categorical_crossentropy'], optimizer=Adam(lr=0.0002, beta_1=0.5), metrics=['accuracy']) 	return model  # create model model = define_discriminator() # plot the model plot_model(model, to_file='multioutput_discriminator_plot.png', show_shapes=True, show_layer_names=True)

Running the example creates and plots the single multi-output model.

The plot clearly shows the shared layers and the separate unsupervised and supervised output layers.

Plot of a Semi-Supervised GAN Discriminator Model With Unsupervised and Supervised Output Layers

Stacked Discriminator Models With Shared Weights

A final approach is very similar to the prior two approaches and involves creating separate logical unsupervised and supervised models, but reuses the output layer of the supervised model as the input to the unsupervised model.

The approach is based on the definition of the semi-supervised model in the 2016 paper by Tim Salimans, et al. from OpenAI titled “Improved Techniques for Training GANs.”

In the paper, they describe an efficient implementation where the supervised model is first created with K output classes and a softmax activation function. The unsupervised model is then defined to take the output of the supervised model prior to the softmax activation and to calculate a normalized sum of the exponential outputs.

Example of the Output Function for the Unsupervised Discriminator Model in the SGAN.
Taken from: Improved Techniques for Training GANs
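For reference, the output function shown in the figure can be written as D(x) = Z(x) / (Z(x) + 1), where Z(x) = sum(exp(l_k(x))) is the sum of the exponentiated class outputs (logits) l_k(x) produced by the supervised model before the softmax activation is applied.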

To make this clearer, we can implement this activation function in NumPy and run some sample activations through it to see what happens.

The complete example is listed below.

# example of custom activation function
import numpy as np

# custom activation function
def custom_activation(output):
    logexpsum = np.sum(np.exp(output))
    result = logexpsum / (logexpsum + 1.0)
    return result

# all -10s
output = np.asarray([-10.0, -10.0, -10.0])
print(custom_activation(output))
# all -1s
output = np.asarray([-1.0, -1.0, -1.0])
print(custom_activation(output))
# all 0s
output = np.asarray([0.0, 0.0, 0.0])
print(custom_activation(output))
# all 1s
output = np.asarray([1.0, 1.0, 1.0])
print(custom_activation(output))
# all 10s
output = np.asarray([10.0, 10.0, 10.0])
print(custom_activation(output))

Remember, the input to this function is the output of the shared Dense layer prior to the softmax activation, i.e. the raw node activations. They will be small positive or negative values, not normalized, as that normalization would otherwise be performed by the softmax activation.

The custom activation function will output a value between 0.0 and 1.0.

A value close to 0.0 is output for a small or negative activation and a value close to 1.0 for a positive or large activation. We can see this when we run the example.

0.00013618124143106674
0.5246331135813284
0.75
0.890768227426964
0.9999848669190928
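These values can be checked by hand: with three equal activations a, the sum is Z = 3 * exp(a), giving an output of 3*exp(a) / (3*exp(a) + 1). For a=0 this is 3/4 = 0.75, and for a=1 it is approximately 8.155 / 9.155 ≈ 0.891, matching the third and fourth values above.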

This means that the model is encouraged to output a strong class prediction for real examples, and a small class prediction or low activation for fake examples. It’s a clever trick and allows the re-use of the same output nodes from the supervised model in both models.

The activation function can be implemented almost directly via the Keras backend and called from a Lambda layer, e.g. a layer that will apply a custom function to the input to the layer.

The complete example is listed below. First, the supervised model is defined with a softmax activation and categorical cross entropy loss function. The unsupervised model is stacked on top of the output layer of the supervised model before the softmax activation, and the activations of the nodes pass through our custom activation function via the Lambda layer.

There is no need for a sigmoid activation function, as the custom activation function already produces a normalized value in [0,1]. As before, the unsupervised model is fit using binary cross entropy loss.

# example of defining semi-supervised discriminator model from keras.models import Model from keras.layers import Input from keras.layers import Dense from keras.layers import Conv2D from keras.layers import LeakyReLU from keras.layers import Dropout from keras.layers import Flatten from keras.layers import Activation from keras.layers import Lambda from keras.optimizers import Adam from keras.utils.vis_utils import plot_model from keras import backend  # custom activation function def custom_activation(output): 	logexpsum = backend.sum(backend.exp(output), axis=-1, keepdims=True) 	result = logexpsum / (logexpsum + 1.0) 	return result  # define the standalone supervised and unsupervised discriminator models def define_discriminator(in_shape=(28,28,1), n_classes=10): 	# image input 	in_image = Input(shape=in_shape) 	# downsample 	fe = Conv2D(128, (3,3), strides=(2,2), padding='same')(in_image) 	fe = LeakyReLU(alpha=0.2)(fe) 	# downsample 	fe = Conv2D(128, (3,3), strides=(2,2), padding='same')(fe) 	fe = LeakyReLU(alpha=0.2)(fe) 	# downsample 	fe = Conv2D(128, (3,3), strides=(2,2), padding='same')(fe) 	fe = LeakyReLU(alpha=0.2)(fe) 	# flatten feature maps 	fe = Flatten()(fe) 	# dropout 	fe = Dropout(0.4)(fe) 	# output layer nodes 	fe = Dense(n_classes)(fe) 	# supervised output 	c_out_layer = Activation('softmax')(fe) 	# define and compile supervised discriminator model 	c_model = Model(in_image, c_out_layer) 	c_model.compile(loss='sparse_categorical_crossentropy', optimizer=Adam(lr=0.0002, beta_1=0.5), metrics=['accuracy']) 	# unsupervised output 	d_out_layer = Lambda(custom_activation)(fe) 	# define and compile unsupervised discriminator model 	d_model = Model(in_image, d_out_layer) 	d_model.compile(loss='binary_crossentropy', optimizer=Adam(lr=0.0002, beta_1=0.5)) 	return d_model, c_model  # create model d_model, c_model = define_discriminator() # plot the model plot_model(d_model, to_file='stacked_discriminator1_plot.png', show_shapes=True, show_layer_names=True) plot_model(c_model, to_file='stacked_discriminator2_plot.png', show_shapes=True, show_layer_names=True)

Running the example creates and plots the two models, which look much the same as the two models in the first example.

Stacked version of the unsupervised discriminator model:

Plot of the Stacked Version of the Unsupervised Discriminator Model of the Semi-Supervised GAN

Stacked version of the supervised discriminator model:

Plot of the Stacked Version of the Supervised Discriminator Model of the Semi-Supervised GAN

Now that we have seen how to implement the discriminator model in the semi-supervised GAN, we can develop a complete example for image generation and semi-supervised classification.

How to Develop a Semi-Supervised GAN for MNIST

In this section, we will develop a semi-supervised GAN model for the MNIST handwritten digit dataset.

The dataset has 10 classes for the digits 0-9, therefore the classifier model will have 10 output nodes. The model will be fit on the training dataset that contains 60,000 examples. Only 100 of the images in the training dataset will be used with labels, 10 from each of the 10 classes.

We will start off by defining the models.

We will use the stacked discriminator model, exactly as defined in the previous section.

Next, we can define the generator model. In this case, the generator model will take as input a point in the latent space and will use transpose convolutional layers to output a 28×28 grayscale image. The define_generator() function below implements this and returns the defined generator model.

# define the standalone generator model
def define_generator(latent_dim):
    # image generator input
    in_lat = Input(shape=(latent_dim,))
    # foundation for 7x7 image
    n_nodes = 128 * 7 * 7
    gen = Dense(n_nodes)(in_lat)
    gen = LeakyReLU(alpha=0.2)(gen)
    gen = Reshape((7, 7, 128))(gen)
    # upsample to 14x14
    gen = Conv2DTranspose(128, (4,4), strides=(2,2), padding='same')(gen)
    gen = LeakyReLU(alpha=0.2)(gen)
    # upsample to 28x28
    gen = Conv2DTranspose(128, (4,4), strides=(2,2), padding='same')(gen)
    gen = LeakyReLU(alpha=0.2)(gen)
    # output
    out_layer = Conv2D(1, (7,7), activation='tanh', padding='same')(gen)
    # define model
    model = Model(in_lat, out_layer)
    return model

The generator model will be fit via the unsupervised discriminator model.

We will use the composite model architecture commonly used for training the generator model in Keras. Specifically, the output of the generator model is passed directly into the unsupervised discriminator model, and the weights of the discriminator are marked as not trainable within the composite model, so that only the generator's weights are updated.

The define_gan() function below implements this, taking the already-defined generator and discriminator models as input and returning the composite model used to train the weights of the generator model.

# define the combined generator and discriminator model, for updating the generator
def define_gan(g_model, d_model):
    # make weights in the discriminator not trainable
    d_model.trainable = False
    # connect image output from generator as input to discriminator
    gan_output = d_model(g_model.output)
    # define gan model as taking noise and outputting a classification
    model = Model(g_model.input, gan_output)
    # compile model
    opt = Adam(lr=0.0002, beta_1=0.5)
    model.compile(loss='binary_crossentropy', optimizer=opt)
    return model

We can load the training dataset and scale the pixels to the range [-1, 1] to match the output values of the generator model.

# load the images
def load_real_samples():
    # load dataset
    (trainX, trainy), (_, _) = load_data()
    # expand to 3d, e.g. add channels
    X = expand_dims(trainX, axis=-1)
    # convert from ints to floats
    X = X.astype('float32')
    # scale from [0,255] to [-1,1]
    X = (X - 127.5) / 127.5
    print(X.shape, trainy.shape)
    return [X, trainy]

We can also define a function to select a subset of the training dataset in which we keep the labels and train the supervised version of the discriminator model.

The select_supervised_samples() function below implements this and is careful to ensure that the selection of examples is random and that the classes are balanced. The number of labeled examples is parameterized and set at 100, meaning that each of the 10 classes will have 10 randomly selected examples.

# select a supervised subset of the dataset, ensures classes are balanced
def select_supervised_samples(dataset, n_samples=100, n_classes=10):
    X, y = dataset
    X_list, y_list = list(), list()
    n_per_class = int(n_samples / n_classes)
    for i in range(n_classes):
        # get all images for this class
        X_with_class = X[y == i]
        # choose random instances
        ix = randint(0, len(X_with_class), n_per_class)
        # add to list
        [X_list.append(X_with_class[j]) for j in ix]
        [y_list.append(i) for j in ix]
    return asarray(X_list), asarray(y_list)

Next, we can define a function for retrieving a batch of real training examples.

A sample of images and labels is selected, with replacement. This same function can be used to retrieve examples from the labeled and unlabeled dataset, later when we train the models. In the case of the “unlabeled dataset“, we will ignore the labels.

# select real samples
def generate_real_samples(dataset, n_samples):
    # split into images and labels
    images, labels = dataset
    # choose random instances
    ix = randint(0, images.shape[0], n_samples)
    # select images and labels
    X, labels = images[ix], labels[ix]
    # generate class labels
    y = ones((n_samples, 1))
    return [X, labels], y

Next, we can define functions to help in generating images using the generator model.

First, the generate_latent_points() function will create a batch worth of random points in the latent space that can be used as input for generating images. The generate_fake_samples() function will call this function to generate a batch worth of images that can be fed to the unsupervised discriminator model or the composite GAN model during training.

# generate points in latent space as input for the generator
def generate_latent_points(latent_dim, n_samples):
    # generate points in the latent space
    z_input = randn(latent_dim * n_samples)
    # reshape into a batch of inputs for the network
    z_input = z_input.reshape(n_samples, latent_dim)
    return z_input

# use the generator to generate n fake examples, with class labels
def generate_fake_samples(generator, latent_dim, n_samples):
    # generate points in latent space
    z_input = generate_latent_points(latent_dim, n_samples)
    # predict outputs
    images = generator.predict(z_input)
    # create class labels
    y = zeros((n_samples, 1))
    return images, y

Next, we can define a function to be called when we want to evaluate the performance of the model.

This function will generate and plot 100 images using the current state of the generator model. This plot of images can be used to subjectively evaluate the performance of the generator model.

The supervised discriminator model is then evaluated on the entire training dataset, and the classification accuracy is reported. Finally, the generator model and the supervised discriminator model are saved to file, to be used later.

The summarize_performance() function below implements this and can be called periodically, such as at the end of every training epoch. The results can be reviewed at the end of the run to select a classifier and even a generator model.

# generate samples and save as a plot and save the model def summarize_performance(step, g_model, c_model, latent_dim, dataset, n_samples=100): 	# prepare fake examples 	X, _ = generate_fake_samples(g_model, latent_dim, n_samples) 	# scale from [-1,1] to [0,1] 	X = (X + 1) / 2.0 	# plot images 	for i in range(100): 		# define subplot 		pyplot.subplot(10, 10, 1 + i) 		# turn off axis 		pyplot.axis('off') 		# plot raw pixel data 		pyplot.imshow(X[i, :, :, 0], cmap='gray_r') 	# save plot to file 	filename1 = 'generated_plot_%04d.png' % (step+1) 	pyplot.savefig(filename1) 	pyplot.close() 	# evaluate the classifier model 	X, y = dataset 	_, acc = c_model.evaluate(X, y, verbose=0) 	print('Classifier Accuracy: %.3f%%' % (acc * 100)) 	# save the generator model 	filename2 = 'g_model_%04d.h5' % (step+1) 	g_model.save(filename2) 	# save the classifier model 	filename3 = 'c_model_%04d.h5' % (step+1) 	c_model.save(filename3) 	print('>Saved: %s, %s, and %s' % (filename1, filename2, filename3))

Next, we can define a function to train the models. The defined models and loaded training dataset are provided as arguments, and the number of training epochs and batch size are parameterized with default values, in this case 20 epochs and a batch size of 100.

The chosen model configuration was found to overfit the training dataset quickly, hence the relatively small number of training epochs. Increasing the epochs to 100 or more results in much higher-quality generated images, but a lower-quality classifier model. Balancing these two concerns might make a fun extension.

First, the labeled subset of the training dataset is selected, and the number of training steps is calculated.

The training process is almost identical to the training of a vanilla GAN model, with the addition of updating the supervised model with labeled examples.

A single cycle through updating the models involves first updating the supervised discriminator model with labeled examples, then updating the unsupervised discriminator model with unlabeled real and generated examples. Finally, the generator model is updated via the composite model.

The shared weights of the discriminator model get updated with 1.5 batches worth of samples, whereas the weights of the generator model are updated with one batch worth of samples each iteration. Changing this so that each model is updated by the same amount might improve the model training process.

# train the generator and discriminator def train(g_model, d_model, c_model, gan_model, dataset, latent_dim, n_epochs=20, n_batch=100): 	# select supervised dataset 	X_sup, y_sup = select_supervised_samples(dataset) 	print(X_sup.shape, y_sup.shape) 	# calculate the number of batches per training epoch 	bat_per_epo = int(dataset[0].shape[0] / n_batch) 	# calculate the number of training iterations 	n_steps = bat_per_epo * n_epochs 	# calculate the size of half a batch of samples 	half_batch = int(n_batch / 2) 	print('n_epochs=%d, n_batch=%d, 1/2=%d, b/e=%d, steps=%d' % (n_epochs, n_batch, half_batch, bat_per_epo, n_steps)) 	# manually enumerate epochs 	for i in range(n_steps): 		# update supervised discriminator (c) 		[Xsup_real, ysup_real], _ = generate_real_samples([X_sup, y_sup], half_batch) 		c_loss, c_acc = c_model.train_on_batch(Xsup_real, ysup_real) 		# update unsupervised discriminator (d) 		[X_real, _], y_real = generate_real_samples(dataset, half_batch) 		d_loss1 = d_model.train_on_batch(X_real, y_real) 		X_fake, y_fake = generate_fake_samples(g_model, latent_dim, half_batch) 		d_loss2 = d_model.train_on_batch(X_fake, y_fake) 		# update generator (g) 		X_gan, y_gan = generate_latent_points(latent_dim, n_batch), ones((n_batch, 1)) 		g_loss = gan_model.train_on_batch(X_gan, y_gan) 		# summarize loss on this batch 		print('>%d, c[%.3f,%.0f], d[%.3f,%.3f], g[%.3f]' % (i+1, c_loss, c_acc*100, d_loss1, d_loss2, g_loss)) 		# evaluate the model performance every so often 		if (i+1) % (bat_per_epo * 1) == 0: 			summarize_performance(i, g_model, c_model, latent_dim, dataset)

Finally, we can define the models and call the function to train and save the models.

# size of the latent space
latent_dim = 100
# create the discriminator models
d_model, c_model = define_discriminator()
# create the generator
g_model = define_generator(latent_dim)
# create the gan
gan_model = define_gan(g_model, d_model)
# load image data
dataset = load_real_samples()
# train model
train(g_model, d_model, c_model, gan_model, dataset, latent_dim)

Tying all of this together, the complete example of training a semi-supervised GAN on the MNIST handwritten digit image classification task is listed below.

# example of semi-supervised gan for mnist from numpy import expand_dims from numpy import zeros from numpy import ones from numpy import asarray from numpy.random import randn from numpy.random import randint from keras.datasets.mnist import load_data from keras.optimizers import Adam from keras.models import Model from keras.layers import Input from keras.layers import Dense from keras.layers import Reshape from keras.layers import Flatten from keras.layers import Conv2D from keras.layers import Conv2DTranspose from keras.layers import LeakyReLU from keras.layers import Dropout from keras.layers import Lambda from keras.layers import Activation from matplotlib import pyplot from keras import backend  # custom activation function def custom_activation(output): 	logexpsum = backend.sum(backend.exp(output), axis=-1, keepdims=True) 	result = logexpsum / (logexpsum + 1.0) 	return result  # define the standalone supervised and unsupervised discriminator models def define_discriminator(in_shape=(28,28,1), n_classes=10): 	# image input 	in_image = Input(shape=in_shape) 	# downsample 	fe = Conv2D(128, (3,3), strides=(2,2), padding='same')(in_image) 	fe = LeakyReLU(alpha=0.2)(fe) 	# downsample 	fe = Conv2D(128, (3,3), strides=(2,2), padding='same')(fe) 	fe = LeakyReLU(alpha=0.2)(fe) 	# downsample 	fe = Conv2D(128, (3,3), strides=(2,2), padding='same')(fe) 	fe = LeakyReLU(alpha=0.2)(fe) 	# flatten feature maps 	fe = Flatten()(fe) 	# dropout 	fe = Dropout(0.4)(fe) 	# output layer nodes 	fe = Dense(n_classes)(fe) 	# supervised output 	c_out_layer = Activation('softmax')(fe) 	# define and compile supervised discriminator model 	c_model = Model(in_image, c_out_layer) 	c_model.compile(loss='sparse_categorical_crossentropy', optimizer=Adam(lr=0.0002, beta_1=0.5), metrics=['accuracy']) 	# unsupervised output 	d_out_layer = Lambda(custom_activation)(fe) 	# define and compile unsupervised discriminator model 	d_model = Model(in_image, d_out_layer) 	d_model.compile(loss='binary_crossentropy', optimizer=Adam(lr=0.0002, beta_1=0.5)) 	return d_model, c_model  # define the standalone generator model def define_generator(latent_dim): 	# image generator input 	in_lat = Input(shape=(latent_dim,)) 	# foundation for 7x7 image 	n_nodes = 128 * 7 * 7 	gen = Dense(n_nodes)(in_lat) 	gen = LeakyReLU(alpha=0.2)(gen) 	gen = Reshape((7, 7, 128))(gen) 	# upsample to 14x14 	gen = Conv2DTranspose(128, (4,4), strides=(2,2), padding='same')(gen) 	gen = LeakyReLU(alpha=0.2)(gen) 	# upsample to 28x28 	gen = Conv2DTranspose(128, (4,4), strides=(2,2), padding='same')(gen) 	gen = LeakyReLU(alpha=0.2)(gen) 	# output 	out_layer = Conv2D(1, (7,7), activation='tanh', padding='same')(gen) 	# define model 	model = Model(in_lat, out_layer) 	return model  # define the combined generator and discriminator model, for updating the generator def define_gan(g_model, d_model): 	# make weights in the discriminator not trainable 	d_model.trainable = False 	# connect image output from generator as input to discriminator 	gan_output = d_model(g_model.output) 	# define gan model as taking noise and outputting a classification 	model = Model(g_model.input, gan_output) 	# compile model 	opt = Adam(lr=0.0002, beta_1=0.5) 	model.compile(loss='binary_crossentropy', optimizer=opt) 	return model  # load the images def load_real_samples(): 	# load dataset 	(trainX, trainy), (_, _) = load_data() 	# expand to 3d, e.g. 
add channels 	X = expand_dims(trainX, axis=-1) 	# convert from ints to floats 	X = X.astype('float32') 	# scale from [0,255] to [-1,1] 	X = (X - 127.5) / 127.5 	print(X.shape, trainy.shape) 	return [X, trainy]  # select a supervised subset of the dataset, ensures classes are balanced def select_supervised_samples(dataset, n_samples=100, n_classes=10): 	X, y = dataset 	X_list, y_list = list(), list() 	n_per_class = int(n_samples / n_classes) 	for i in range(n_classes): 		# get all images for this class 		X_with_class = X[y == i] 		# choose random instances 		ix = randint(0, len(X_with_class), n_per_class) 		# add to list 		[X_list.append(X_with_class[j]) for j in ix] 		[y_list.append(i) for j in ix] 	return asarray(X_list), asarray(y_list)  # select real samples def generate_real_samples(dataset, n_samples): 	# split into images and labels 	images, labels = dataset 	# choose random instances 	ix = randint(0, images.shape[0], n_samples) 	# select images and labels 	X, labels = images[ix], labels[ix] 	# generate class labels 	y = ones((n_samples, 1)) 	return [X, labels], y  # generate points in latent space as input for the generator def generate_latent_points(latent_dim, n_samples): 	# generate points in the latent space 	z_input = randn(latent_dim * n_samples) 	# reshape into a batch of inputs for the network 	z_input = z_input.reshape(n_samples, latent_dim) 	return z_input  # use the generator to generate n fake examples, with class labels def generate_fake_samples(generator, latent_dim, n_samples): 	# generate points in latent space 	z_input = generate_latent_points(latent_dim, n_samples) 	# predict outputs 	images = generator.predict(z_input) 	# create class labels 	y = zeros((n_samples, 1)) 	return images, y  # generate samples and save as a plot and save the model def summarize_performance(step, g_model, c_model, latent_dim, dataset, n_samples=100): 	# prepare fake examples 	X, _ = generate_fake_samples(g_model, latent_dim, n_samples) 	# scale from [-1,1] to [0,1] 	X = (X + 1) / 2.0 	# plot images 	for i in range(100): 		# define subplot 		pyplot.subplot(10, 10, 1 + i) 		# turn off axis 		pyplot.axis('off') 		# plot raw pixel data 		pyplot.imshow(X[i, :, :, 0], cmap='gray_r') 	# save plot to file 	filename1 = 'generated_plot_%04d.png' % (step+1) 	pyplot.savefig(filename1) 	pyplot.close() 	# evaluate the classifier model 	X, y = dataset 	_, acc = c_model.evaluate(X, y, verbose=0) 	print('Classifier Accuracy: %.3f%%' % (acc * 100)) 	# save the generator model 	filename2 = 'g_model_%04d.h5' % (step+1) 	g_model.save(filename2) 	# save the classifier model 	filename3 = 'c_model_%04d.h5' % (step+1) 	c_model.save(filename3) 	print('>Saved: %s, %s, and %s' % (filename1, filename2, filename3))  # train the generator and discriminator def train(g_model, d_model, c_model, gan_model, dataset, latent_dim, n_epochs=20, n_batch=100): 	# select supervised dataset 	X_sup, y_sup = select_supervised_samples(dataset) 	print(X_sup.shape, y_sup.shape) 	# calculate the number of batches per training epoch 	bat_per_epo = int(dataset[0].shape[0] / n_batch) 	# calculate the number of training iterations 	n_steps = bat_per_epo * n_epochs 	# calculate the size of half a batch of samples 	half_batch = int(n_batch / 2) 	print('n_epochs=%d, n_batch=%d, 1/2=%d, b/e=%d, steps=%d' % (n_epochs, n_batch, half_batch, bat_per_epo, n_steps)) 	# manually enumerate epochs 	for i in range(n_steps): 		# update supervised discriminator (c) 		[Xsup_real, ysup_real], _ = generate_real_samples([X_sup, y_sup], half_batch) 		c_loss, 
c_acc = c_model.train_on_batch(Xsup_real, ysup_real) 		# update unsupervised discriminator (d) 		[X_real, _], y_real = generate_real_samples(dataset, half_batch) 		d_loss1 = d_model.train_on_batch(X_real, y_real) 		X_fake, y_fake = generate_fake_samples(g_model, latent_dim, half_batch) 		d_loss2 = d_model.train_on_batch(X_fake, y_fake) 		# update generator (g) 		X_gan, y_gan = generate_latent_points(latent_dim, n_batch), ones((n_batch, 1)) 		g_loss = gan_model.train_on_batch(X_gan, y_gan) 		# summarize loss on this batch 		print('>%d, c[%.3f,%.0f], d[%.3f,%.3f], g[%.3f]' % (i+1, c_loss, c_acc*100, d_loss1, d_loss2, g_loss)) 		# evaluate the model performance every so often 		if (i+1) % (bat_per_epo * 1) == 0: 			summarize_performance(i, g_model, c_model, latent_dim, dataset)  # size of the latent space latent_dim = 100 # create the discriminator models d_model, c_model = define_discriminator() # create the generator g_model = define_generator(latent_dim) # create the gan gan_model = define_gan(g_model, d_model) # load image data dataset = load_real_samples() # train model train(g_model, d_model, c_model, gan_model, dataset, latent_dim)

The example can be run on a workstation with a CPU or GPU hardware, although a GPU is recommended for faster execution.

Given the stochastic nature of the training algorithm, your specific results will vary. Consider running the example a few times.

At the start of the run, the size of the training dataset is summarized, as is the supervised subset, confirming our configuration.

The performance of each model is summarized at the end of each update, including the loss and accuracy of the supervised discriminator model (c), the loss of the unsupervised discriminator model on real and generated examples (d), and the loss of the generator model updated via the composite model (g).

The loss for the supervised model will shrink to a small value close to zero and accuracy will hit 100%, which will be maintained for the entire run. The loss of the unsupervised discriminator and generator should remain at modest values throughout the run if they are kept in equilibrium.

(60000, 28, 28, 1) (60000,)
(100, 28, 28, 1) (100,)
n_epochs=20, n_batch=100, 1/2=50, b/e=600, steps=12000
>1, c[2.305,6], d[0.096,2.399], g[0.095]
>2, c[2.298,18], d[0.089,2.399], g[0.095]
>3, c[2.308,10], d[0.084,2.401], g[0.095]
>4, c[2.304,8], d[0.080,2.404], g[0.095]
>5, c[2.254,18], d[0.077,2.407], g[0.095]
...

The supervised classification model is evaluated on the entire training dataset at the end of every training epoch, in this case after every 600 training updates. At this time, the performance of the model is summarized, showing that it rapidly achieves good skill.

This is surprising given that the model is only trained on 10 labeled examples of each class.

Classifier Accuracy: 85.543%
Classifier Accuracy: 91.487%
Classifier Accuracy: 92.628%
Classifier Accuracy: 94.017%
Classifier Accuracy: 94.252%
Classifier Accuracy: 93.828%
Classifier Accuracy: 94.122%
Classifier Accuracy: 93.597%
Classifier Accuracy: 95.283%
Classifier Accuracy: 95.287%
Classifier Accuracy: 95.263%
Classifier Accuracy: 95.432%
Classifier Accuracy: 95.270%
Classifier Accuracy: 95.212%
Classifier Accuracy: 94.803%
Classifier Accuracy: 94.640%
Classifier Accuracy: 93.622%
Classifier Accuracy: 91.870%
Classifier Accuracy: 92.525%
Classifier Accuracy: 92.180%

The models are saved at the end of each training epoch, and plots of generated images are also created.

The quality of the generated images is good given the relatively small number of training epochs.

Plot of Handwritten Digits Generated by the Semi-Supervised GAN After 8400 Updates.

How to Load and Use the Final SGAN Classifier Model

Now that we have trained the generator and discriminator models, we can make use of them.

In the case of the semi-supervised GAN, we are less interested in the generator model and more interested in the supervised model.

Reviewing the results for this specific run, we can select a saved model that is known to have good performance. In this case, we will use the model saved after 12 training epochs, or 7,200 updates, which had a classification accuracy of about 95.432% on the training dataset.

We can load the model directly via the load_model() Keras function.

...
# load the model
model = load_model('c_model_7200.h5')

Once loaded, we can evaluate it on the entire training dataset again to confirm the finding, then evaluate it on the holdout test dataset.

Recall that the feature extraction layers expect the input images to have pixel values scaled to the range [-1,1]; therefore, this scaling must be performed before any images are provided to the model.

The complete example of loading the saved semi-supervised classifier model and evaluating it on the complete MNIST dataset is listed below.

# example of loading the classifier model and evaluating it on the mnist dataset
from numpy import expand_dims
from keras.models import load_model
from keras.datasets.mnist import load_data
# load the model
model = load_model('c_model_7200.h5')
# load the dataset
(trainX, trainy), (testX, testy) = load_data()
# expand to 3d, e.g. add channels
trainX = expand_dims(trainX, axis=-1)
testX = expand_dims(testX, axis=-1)
# convert from ints to floats
trainX = trainX.astype('float32')
testX = testX.astype('float32')
# scale from [0,255] to [-1,1]
trainX = (trainX - 127.5) / 127.5
testX = (testX - 127.5) / 127.5
# evaluate the model
_, train_acc = model.evaluate(trainX, trainy, verbose=0)
print('Train Accuracy: %.3f%%' % (train_acc * 100))
_, test_acc = model.evaluate(testX, testy, verbose=0)
print('Test Accuracy: %.3f%%' % (test_acc * 100))

Running the example loads the model and evaluates it on the MNIST dataset.

We can see that, in this case, the model achieves the expected performance of 95.432% on the training dataset, confirming we have loaded the correct model.

We can also see that the accuracy on the holdout test dataset is as good, or slightly better, at about 95.920%. This shows that the learned classifier has good generalization.

Train Accuracy: 95.432%
Test Accuracy: 95.920%

We have successfully demonstrated the training and evaluation of a semi-supervised classifier model fit via the GAN architecture.
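Although not shown in the original example, the loaded classifier can also be used to predict the class of individual images; a minimal sketch, reusing model, testX, and testy from the example above:

...
# predict the digit class for the first five test images
from numpy import argmax
yhat = model.predict(testX[:5])
print('Predicted:', argmax(yhat, axis=1))
print('Expected: ', testy[:5])

The index of the largest softmax probability is the predicted digit for each image.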

Extensions

This section lists some ideas for extending the tutorial that you may wish to explore.

  • Standalone Classifier. Fit a standalone classifier model on the labeled dataset directly and compare performance to the SGAN model (a starting point is sketched after this list).
  • Number of Labeled Examples. Repeat the example with more or fewer labeled examples and compare the performance of the model.
  • Model Tuning. Tune the performance of the discriminator and generator model to further lift the performance of the supervised model closer toward state-of-the-art results.
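As a starting point for the first extension, the sketch below (an outline under assumptions, not code from the tutorial, and assuming the functions and imports from the complete example above are available) fits the same supervised discriminator architecture directly on the 100 labeled images and evaluates it on the MNIST test set; the number of epochs is an arbitrary choice:

...
# hypothetical baseline: fit the supervised architecture on the labeled subset only
dataset = load_real_samples()
X_sup, y_sup = select_supervised_samples(dataset)
# reuse the supervised model architecture defined earlier
_, c_model = define_discriminator()
c_model.fit(X_sup, y_sup, epochs=50, batch_size=32, verbose=0)
# evaluate on the hold-out test set (scaled the same way as the training data)
(_, _), (testX, testy) = load_data()
testX = expand_dims(testX, axis=-1).astype('float32')
testX = (testX - 127.5) / 127.5
_, test_acc = c_model.evaluate(testX, testy, verbose=0)
print('Standalone classifier test accuracy: %.3f%%' % (test_acc * 100))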

If you explore any of these extensions, I’d love to know.
Post your findings in the comments below.

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Papers

API

Articles

Projects

Summary

In this tutorial, you discovered how to develop a Semi-Supervised Generative Adversarial Network from scratch.

Specifically, you learned:

  • The semi-supervised GAN is an extension of the GAN architecture for training a classifier model while making use of labeled and unlabeled data.
  • There are at least three approaches to implementing the supervised and unsupervised discriminator models in Keras used in the semi-supervised GAN.
  • How to train a semi-supervised GAN from scratch on MNIST and load and use the trained classifier for making predictions.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

The post How to Implement a Semi-Supervised GAN (SGAN) From Scratch in Keras appeared first on Machine Learning Mastery.


How to Implement Wasserstein Loss for Generative Adversarial Networks

The Wasserstein Generative Adversarial Network, or Wasserstein GAN, is an extension to the generative adversarial network that both improves the stability when training the model and provides a loss function that correlates with the quality of generated images.

It is an important extension to the GAN model and requires a conceptual shift away from a discriminator that predicts the probability of a generated image being “real” and toward the idea of a critic model that scores the “realness” of a given image.

This conceptual shift is motivated mathematically using the earth mover distance, or Wasserstein distance, to train the GAN that measures the distance between the data distribution observed in the training dataset and the distribution observed in the generated examples.

In this post, you will discover how to implement Wasserstein loss for Generative Adversarial Networks.

After reading this post, you will know:

  • The conceptual shift in the WGAN from discriminator predicting a probability to a critic predicting a score.
  • The implementation details for the WGAN as minor changes to the standard deep convolutional GAN.
  • The intuition behind the Wasserstein loss function and how to implement it from scratch.

Discover how to develop DCGANs, conditional GANs, Pix2Pix, CycleGANs, and more with Keras in my new GANs book, with 29 step-by-step tutorials and full source code.

Let’s get started.

How to Implement Wasserstein Loss for Generative Adversarial Networks

How to Implement Wasserstein Loss for Generative Adversarial Networks
Photo by Brandon Levinger, some rights reserved.

Overview

This tutorial is divided into five parts; they are:

  1. GAN Stability and the Discriminator
  2. What Is a Wasserstein GAN?
  3. Implementation Details of the Wasserstein GAN
  4. How to Implement Wasserstein Loss
  5. Common Point of Confusion With Expected Labels

GAN Stability and the Discriminator

Generative Adversarial Networks, or GANs, are challenging to train.

The discriminator model must classify a given input image as real (from the dataset) or fake (generated), and the generator model must generate new and plausible images.

The reason GANs are difficult to train is that the architecture involves the simultaneous training of a generator and a discriminator model in a zero-sum game. Stable training requires finding and maintaining an equilibrium between the capabilities of the two models.

The discriminator model is a neural network that learns a binary classification problem, using a sigmoid activation function in the output layer, and is fit using a binary cross entropy loss function. As such, the model predicts the probability that a given input is real (or fake, as 1 minus the predicted probability) as a value between 0 and 1.

The loss function has the effect of penalizing the model proportionally to how far the predicted probability distribution differs from the expected probability distribution for a given image. This provides the basis for the error that is back propagated through the discriminator and the generator in order to perform better on the next batch.
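
For reference, a standard discriminator output of this kind can be defined in Keras with a single node, a sigmoid activation, and binary cross entropy loss. The sketch below is illustrative only; the layer sizes and the 28×28 grayscale input shape are assumptions rather than part of the original post.

# minimal sketch of a standard GAN discriminator output (illustrative)
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import LeakyReLU
from keras.layers import Flatten
from keras.layers import Dense
# define a small discriminator
model = Sequential()
model.add(Conv2D(64, (3,3), strides=(2,2), padding='same', input_shape=(28,28,1)))
model.add(LeakyReLU(0.2))
model.add(Flatten())
# sigmoid output predicts the probability that the input image is real
model.add(Dense(1, activation='sigmoid'))
# binary cross entropy penalizes divergence from the expected probability
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()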

The WGAN relaxes the role of the discriminator when training a GAN and proposes the alternative of a critic.

What Is a Wasserstein GAN?

The Wasserstein GAN, or WGAN for short, was introduced by Martin Arjovsky, et al. in their 2017 paper titled “Wasserstein GAN.”

It is an extension of the GAN that seeks an alternate way of training the generator model to better approximate the distribution of data observed in a given training dataset.

Instead of using a discriminator to classify or predict the probability of generated images as being real or fake, the WGAN changes or replaces the discriminator model with a critic that scores the realness or fakeness of a given image.

This change is motivated by a mathematical argument that training the generator should seek a minimization of the distance between the distribution of the data observed in the training dataset and the distribution observed in generated examples. The argument contrasts different distribution distance measures, such as Kullback-Leibler (KL) divergence, Jensen-Shannon (JS) divergence, and the Earth-Mover (EM) distance, referred to as Wasserstein distance.

The most fundamental difference between such distances is their impact on the convergence of sequences of probability distributions.

Wasserstein GAN, 2017.

They demonstrate that a critic neural network can be trained to approximate the Wasserstein distance, and, in turn, used to effectively train a generator model.

… we define a form of GAN called Wasserstein-GAN that minimizes a reasonable and efficient approximation of the EM distance, and we theoretically show that the corresponding optimization problem is sound.

Wasserstein GAN, 2017.

Importantly, the Wasserstein distance has the properties that it is continuous and differentiable and continues to provide a linear gradient, even after the critic is well trained.

The fact that the EM distance is continuous and differentiable a.e. means that we can (and should) train the critic till optimality. […] the more we train the critic, the more reliable gradient of the Wasserstein we get, which is actually useful by the fact that Wasserstein is differentiable almost everywhere.

Wasserstein GAN, 2017.

This is unlike the discriminator model that, once trained, may fail to provide useful gradient information for updating the generator model.

The discriminator learns very quickly to distinguish between fake and real, and as expected provides no reliable gradient information. The critic, however, can’t saturate, and converges to a linear function that gives remarkably clean gradients everywhere.

Wasserstein GAN, 2017.

The benefit of the WGAN is that the training process is more stable and less sensitive to model architecture and choice of hyperparameter configurations.

… training WGANs does not require maintaining a careful balance in training of the discriminator and the generator, and does not require a careful design of the network architecture either. The mode dropping phenomenon that is typical in GANs is also drastically reduced.

Wasserstein GAN, 2017.

Perhaps most importantly, the loss of the discriminator appears to relate to the quality of images created by the generator.

Specifically, the lower the loss of the critic when evaluating generated images, the higher the expected quality of the generated images. This is important as unlike other GANs that seek stability in terms of finding an equilibrium between two models, the WGAN seeks convergence, lowering generator loss.

To our knowledge, this is the first time in GAN literature that such a property is shown, where the loss of the GAN shows properties of convergence. This property is extremely useful when doing research in adversarial networks as one does not need to stare at the generated samples to figure out failure modes and to gain information on which models are doing better over others.

Wasserstein GAN, 2017.

Implementation Details of the Wasserstein GAN

Although the theoretical grounding for the WGAN is dense, the implementation of a WGAN requires a few minor changes to the standard deep convolutional GAN, or DCGAN.

Those changes are as follows:

  • Use a linear activation function in the output layer of the critic model (instead of sigmoid).
  • Use Wasserstein loss to train the critic and generator models, which promotes a larger difference between scores for real and generated images.
  • Constrain critic model weights to a limited range after each mini-batch update (e.g. [-0.01, 0.01]).

In order to have parameters w lie in a compact space, something simple we can do is clamp the weights to a fixed box (say W = [−0.01, 0.01]l ) after each gradient update.

Wasserstein GAN, 2017.

  • Update the critic model more times than the generator each iteration (e.g. 5).
  • Use the RMSProp version of gradient descent with small learning rate and no momentum (e.g. 0.00005).

… we report that WGAN training becomes unstable at times when one uses a momentum based optimizer such as Adam […] We therefore switched to RMSProp …

Wasserstein GAN, 2017.

The image below provides a summary of the main training loop for training a WGAN, taken from the paper. Note the listing of recommended hyperparameters used in the model.

Algorithm for the Wasserstein Generative Adversarial Networks.
Taken from: Wasserstein GAN.

How to Implement Wasserstein Loss

The Wasserstein loss function seeks to increase the gap between the scores for real and generated images.

We can summarize the function as it is described in the paper as follows:

  • Critic Loss = [average critic score on real images] – [average critic score on fake images]
  • Generator Loss = -[average critic score on fake images]

Where the average scores are calculated across a mini-batch of samples.

This is precisely how the loss is implemented for graph-based deep learning frameworks such as PyTorch and TensorFlow.

The calculations are straightforward to interpret once we recall that stochastic gradient descent seeks to minimize loss.

In the case of the generator, a larger score from the critic will result in a smaller loss for the generator, encouraging the generator to produce fake images that receive larger scores from the critic. For example, an average score of 10 becomes -10, and an average score of 50 becomes -50, which is smaller, and so on.

In the case of the critic, a larger score for real images results in a larger resulting loss for the critic, penalizing the model. This encourages the critic to output smaller scores for real images. For example, an average score of 20 for real images and 50 for fake images results in a loss of -30; an average score of 10 for real images and 50 for fake images results in a loss of -40, which is better, and so on.

The sign of the loss does not matter in this case, as long as the score for real images is a small number and the score for fake images is a large number. The Wasserstein loss simply encourages the critic to separate these two numbers.

We can also reverse the situation and encourage the critic to output a large score for real images and a small score for fake images and achieve the same result. Some implementations make this change.
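
As a quick sanity check of the arithmetic above, the short snippet below computes the critic loss as [average critic score on real images] minus [average critic score on fake images] for the two worked examples from the previous paragraph; the helper function is only a convenience for illustration.

# check the worked critic loss examples
from numpy import mean

# critic loss = average score on real images - average score on fake images
def critic_loss(real_scores, fake_scores):
	return mean(real_scores) - mean(fake_scores)

print(critic_loss([20.0], [50.0]))  # -30.0
print(critic_loss([10.0], [50.0]))  # -40.0, which is better (smaller)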

In the Keras deep learning library (and some others), we cannot implement the Wasserstein loss function directly as described in the paper and as implemented in PyTorch and TensorFlow. Instead, we can achieve the same effect without the critic's loss calculation depending jointly on the scores for real and fake images.

A good way to think about this is a negative score for real images and a positive score for fake images, although this negative/positive split of scores learned during training is not required; just larger and smaller is sufficient.

  • Small Critic Score (e.g. < 0): Real
  • Large Critic Score (e.g. > 0): Fake

We can multiply the average predicted score by -1 in the case of fake images so that larger averages become smaller averages and the gradient is in the correct direction, i.e. minimizing loss. For example, average scores on fake images of [0.5, 0.8, and 1.0] across three batches of fake images would become [-0.5, -0.8, and -1.0] when calculating weight updates.

  • Loss For Fake Images = -1 * Average Critic Score

No change is needed for the case of real scores, as we want to encourage smaller average scores for real images.

  • Loss For Real Images = Average Critic Score

This can be implemented consistently by assigning an expected outcome target of -1 for fake images and 1 for real images and implementing the loss function as the expected label multiplied by the average score. The -1 label will be multiplied by the average score for fake images and encourage a larger predicted average, and the +1 label will be multiplied by the average score for real images and have no effect, encouraging a smaller predicted average.

  • Wasserstein Loss = Label * Average Critic Score

Or

  • Wasserstein Loss(Real Images) = 1 * Average Predicted Score
  • Wasserstein Loss(Fake Images) = -1 * Average Predicted Score

We can implement this in Keras by assigning the expected labels of -1 and 1 for fake and real images respectively. The inverse labels could be used to the same effect, e.g. -1 for real and +1 for fake to encourage small scores for fake images and large scores for real images. Some developers do implement the WGAN in this alternate way, which is just as correct.

The loss function can be implemented by multiplying the expected label for each sample by the predicted score (element wise), then calculating the mean.

def wasserstein_loss(y_true, y_pred):
	return mean(y_true * y_pred)

The above function is the elegant way to implement the loss function; an alternative, less-elegant implementation that might be more intuitive is as follows:

def wasserstein_loss(y_true, y_pred):
	return mean(y_true) * mean(y_pred)

In Keras, the mean function can be implemented using the Keras backend API to ensure the mean is calculated across samples in the provided tensors; for example:

from keras import backend

# implementation of wasserstein loss
def wasserstein_loss(y_true, y_pred):
	return backend.mean(y_true * y_pred)
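
To see how this loss could fit into a model definition, below is a minimal sketch of compiling a small critic with the Wasserstein loss, a linear output layer, weight clipping via a custom constraint, and the RMSProp optimizer, mirroring the implementation details listed above. The architecture, the 28×28 grayscale input shape, and the ClipConstraint helper are illustrative assumptions, not the author's full WGAN implementation.

# sketch: compiling a critic model with wasserstein loss (illustrative only)
from keras import backend
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import LeakyReLU
from keras.layers import Flatten
from keras.layers import Dense
from keras.constraints import Constraint
from keras.optimizers import RMSprop

# wasserstein loss as defined above
def wasserstein_loss(y_true, y_pred):
	return backend.mean(y_true * y_pred)

# clip model weights to a small hypercube after each update (weight clipping)
class ClipConstraint(Constraint):
	def __init__(self, clip_value):
		self.clip_value = clip_value
	def __call__(self, weights):
		return backend.clip(weights, -self.clip_value, self.clip_value)
	def get_config(self):
		return {'clip_value': self.clip_value}

# define a small critic with a linear output layer (no sigmoid)
const = ClipConstraint(0.01)
model = Sequential()
model.add(Conv2D(64, (3,3), strides=(2,2), padding='same', kernel_constraint=const, input_shape=(28,28,1)))
model.add(LeakyReLU(0.2))
model.add(Flatten())
model.add(Dense(1, kernel_constraint=const))
# compile with wasserstein loss and RMSProp with a small learning rate
opt = RMSprop(lr=0.00005)
model.compile(loss=wasserstein_loss, optimizer=opt)
model.summary()

When training such a critic, batches of real images would be given the target label +1 and batches of fake images the target label -1, as described above.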

Now that we know how to implement the Wasserstein loss function in Keras, let’s clarify one common point of misunderstanding.

Common Point of Confusion With Expected Labels

Recall we are using the expected labels of -1 for fake images and +1 for real images.

A common point of confusion is the belief that a perfect critic model will output -1 for every fake image and +1 for every real image.

This is incorrect.

Again, recall we are using stochastic gradient descent to find the set of weights in the critic (and generator) models that minimize the loss function.

We have established that we want the critic model to output larger scores on average for fake images and smaller scores on average for real images. We then designed a loss function to encourage this outcome.

This is the key point about loss functions used to train neural network models. They encourage a desired model behavior, and they do not have to achieve this by providing the expected outcomes. In this case, we defined our Wasserstein loss function to interpret the average score predicted by the critic model and used labels for the real and fake cases to help with this interpretation.

So what is a good loss for real and fake images under Wasserstein loss?

The Wasserstein loss is not an absolute value that can be compared across GAN models. Instead, it is relative and depends on your model configuration and dataset. What is important is that it is consistent for a given critic model, and that convergence of the generator (a better loss) does correlate with better generated image quality.

It could be negative scores for real images and positive scores for fake images, but this is not required. All scores could be positive or all scores could be negative.

The loss function only encourages a separation between scores for fake and real images as larger and smaller, not necessarily positive and negative.

Summary

In this post, you discovered how to implement Wasserstein loss for Generative Adversarial Networks.

Specifically, you learned:

  • The conceptual shift in the WGAN from discriminator predicting a probability to a critic predicting a score.
  • The implementation details for the WGAN as minor changes to the standard deep convolutional GAN.
  • The intuition behind the Wasserstein loss function and how to implement it from scratch.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

How to Implement GAN Hacks to Train Stable Generative Adversarial Networks

Generative Adversarial Networks, or GANs, are challenging to train.

This is because the architecture involves both a generator and a discriminator model that compete in a zero-sum game. It means that improvements to one model come at the cost of a degrading of performance in the other model. The result is a very unstable training process that can often lead to failure, e.g. a generator that generates the same image all the time or generates nonsense.

As such, there are a number of heuristics or best practices (called “GAN hacks“) that can be used when configuring and training your GAN models. These heuristics have been hard won by practitioners testing and evaluating hundreds or thousands of combinations of configuration options on a range of problems over many years.

Some of these heuristics can be challenging to implement, especially for beginners.

Further, some or all of them may be required for a given project, although it may not be clear which subset of heuristics should be adopted, requiring experimentation. This means a practitioner must be ready to implement a given heuristic with little notice.

In this tutorial, you will discover how to implement a suite of best practices or GAN hacks that you can copy-and-paste directly into your GAN project.

After reading this tutorial, you will know:

  • The best sources for practical heuristics or hacks when developing generative adversarial networks.
  • How to implement seven best practices for the deep convolutional GAN model architecture from scratch.
  • How to implement four additional best practices from Soumith Chintala’s GAN Hacks presentation and list.

Let’s get started.

How to Implement Hacks to Train Stable Generative Adversarial Networks

How to Implement Hacks to Train Stable Generative Adversarial Networks
Photo by BLM Nevada, some rights reserved.

Tutorial Overview

This tutorial is divided into three parts; they are:

  1. Heuristics for Training Stable GANs
  2. Best Practices for Deep Convolutional GANs
    1. Downsample Using Strided Convolutions
    2. Upsample Using Strided Convolutions
    3. Use LeakyReLU
    4. Use Batch Normalization
    5. Use Gaussian Weight Initialization
    6. Use Adam Stochastic Gradient Descent
    7. Scale Images to the Range [-1,1]
  3. Soumith Chintala’s GAN Hacks
    1. Use a Gaussian Latent Space
    2. Separate Batches of Real and Fake Images
    3. Use Label Smoothing
    4. Use Noisy Labels

Heuristics for Training Stable GANs

GANs are difficult to train.

At the time of writing, there is no good theoretical foundation for how to design and train GAN models, but there is an established literature of heuristics, or “hacks,” that have been empirically demonstrated to work well in practice.

As such, there are a range of best practices to consider and implement when developing a GAN model.

Perhaps the two most important sources of suggested configuration and training parameters are:

  1. Alec Radford, et al.’s 2015 paper that introduced the DCGAN architecture.
  2. Soumith Chintala’s 2016 presentation and associated “GAN Hacks” list.

In this tutorial, we will explore how to implement the most important best practices from these two sources.

Best Practices for Deep Convolutional GANs

Perhaps one of the most important steps forward in the design and training of stable GAN models was the 2015 paper by Alec Radford, et al. titled “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks.”

In the paper, they describe the Deep Convolutional GAN, or DCGAN, approach to GAN development that has become the de facto standard.

We will look at how to implement seven best practices for the DCGAN model architecture in this section.

1. Downsample Using Strided Convolutions

The discriminator model is a standard convolutional neural network model that takes an image as input and must output a binary classification as to whether it is real or fake.

It is standard practice with deep convolutional networks to use pooling layers to downsample the input and feature maps with the depth of the network.

This is not recommended for the DCGAN, and instead, they recommend downsampling using strided convolutions.

This involves defining a convolutional layer as per normal, but changing the default two-dimensional stride of (1,1) to (2,2). This has the effect of downsampling the input, specifically halving the width and height of the input, resulting in output feature maps with one quarter the area.

The example below demonstrates this with a single hidden convolutional layer that uses downsampling strided convolutions by setting the ‘strides‘ argument to (2,2). The effect is the model will downsample the input from 64×64 to 32×32.

# example of downsampling with strided convolutions
from keras.models import Sequential
from keras.layers import Conv2D
# define model
model = Sequential()
model.add(Conv2D(64, kernel_size=(3,3), strides=(2,2), padding='same', input_shape=(64,64,3)))
# summarize model
model.summary()

Running the example shows the shape of the output of the convolutional layer, where the feature maps have one quarter of the area.

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 32, 32, 64)        1792
=================================================================
Total params: 1,792
Trainable params: 1,792
Non-trainable params: 0
_________________________________________________________________

2. Upsample Using Strided Convolutions

The generator model must generate an output image given as input at a random point from the latent space.

The recommended approach for achieving this is to use a transpose convolutional layer with a strided convolution. This is a special type of layer that performs the convolution operation in reverse. Intuitively, this means that setting a stride of 2×2 will have the opposite effect, upsampling the input rather than downsampling it as a normal convolutional layer would.

By stacking transpose convolutional layers with strided convolutions, the generator model is able to scale a given input up to the desired output dimensions.

The example below demonstrates this with a single hidden transpose convolutional layer that uses upsampling strided convolutions by setting the ‘strides‘ argument to (2,2).

The effect is the model will upsample the input from 64×64 to 128×128.

# example of upsampling with strided convolutions
from keras.models import Sequential
from keras.layers import Conv2DTranspose
# define model
model = Sequential()
model.add(Conv2DTranspose(64, kernel_size=(4,4), strides=(2,2), padding='same', input_shape=(64,64,3)))
# summarize model
model.summary()

Running the example shows the shape of the output of the convolutional layer, where the feature maps have quadruple the area.

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_transpose_1 (Conv2DTr (None, 128, 128, 64)      3136
=================================================================
Total params: 3,136
Trainable params: 3,136
Non-trainable params: 0
_________________________________________________________________

3. Use LeakyReLU

The rectified linear activation unit, or ReLU for short, is a simple calculation that returns the value provided as input directly, or the value 0.0 if the input is 0.0 or less.

It has become a best practice when developing deep convolutional neural networks generally.

The best practice for GANs is to use a variation of the ReLU that allows some values less than zero, scaling negative inputs by a small fixed slope rather than clipping them to zero. This is called the leaky rectified linear activation unit, or LeakyReLU for short.

The slope applied to negative inputs can be specified for the LeakyReLU; a value of 0.2 is recommended.

Originally, ReLU was recommended for use in the generator model and LeakyReLU was recommended for use in the discriminator model, although more recently, the LeakyReLU is recommended in both models.

The example below demonstrates using the LeakyReLU with the default slope of 0.2 after a convolutional layer in a discriminator model.

# example of using leakyrelu in a discriminator model
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import LeakyReLU
# define model
model = Sequential()
model.add(Conv2D(64, kernel_size=(3,3), strides=(2,2), padding='same', input_shape=(64,64,3)))
model.add(LeakyReLU(0.2))
# summarize model
model.summary()

Running the example demonstrates the structure of the model with a single convolutional layer followed by the activation layer.

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 32, 32, 64)        1792
_________________________________________________________________
leaky_re_lu_1 (LeakyReLU)    (None, 32, 32, 64)        0
=================================================================
Total params: 1,792
Trainable params: 1,792
Non-trainable params: 0
_________________________________________________________________

4. Use Batch Normalization

Batch normalization standardizes the activations from a prior layer to have a zero mean and unit variance. This has the effect of stabilizing the training process.

Batch normalization is used after the convolutional and transpose convolutional layers in the discriminator and generator models respectively.

It is added to the model after the hidden layer, but before the activation, such as LeakyReLU.

The example below demonstrates adding a Batch Normalization layer after a Conv2D layer in a discriminator model but before the activation.

# example of using batch norm in a discriminator model
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import BatchNormalization
from keras.layers import LeakyReLU
# define model
model = Sequential()
model.add(Conv2D(64, kernel_size=(3,3), strides=(2,2), padding='same', input_shape=(64,64,3)))
model.add(BatchNormalization())
model.add(LeakyReLU(0.2))
# summarize model
model.summary()

Running the example shows the desired usage of batch norm between the outputs of the convolutional layer and the activation function.

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 32, 32, 64)        1792
_________________________________________________________________
batch_normalization_1 (Batch (None, 32, 32, 64)        256
_________________________________________________________________
leaky_re_lu_1 (LeakyReLU)    (None, 32, 32, 64)        0
=================================================================
Total params: 2,048
Trainable params: 1,920
Non-trainable params: 128
_________________________________________________________________

5. Use Gaussian Weight Initialization

Before a neural network can be trained, the model weights (parameters) must be initialized to small random values.

The best practice for DCGAN models reported in the paper is to initialize all weights using a zero-centered Gaussian distribution (the normal or bell-shaped distribution) with a standard deviation of 0.02.

The example below demonstrates defining a random Gaussian weight initializer with a mean of 0 and a standard deviation of 0.02 for use in a transpose convolutional layer in a generator model.

The same weight initializer instance could be used for each layer in a given model.

# example of gaussian weight initialization in a generator model
from keras.models import Sequential
from keras.layers import Conv2DTranspose
from keras.initializers import RandomNormal
# define model
model = Sequential()
init = RandomNormal(mean=0.0, stddev=0.02)
model.add(Conv2DTranspose(64, kernel_size=(4,4), strides=(2,2), padding='same', kernel_initializer=init, input_shape=(64,64,3)))

6. Use Adam Stochastic Gradient Descent

Stochastic gradient descent, or SGD for short, is the standard algorithm used to optimize the weights of convolutional neural network models.

There are many variants of the training algorithm. The best practice for training DCGAN models is to use the Adam version of stochastic gradient descent with the learning rate of 0.0002 and the beta1 momentum value of 0.5 instead of the default of 0.9.

The Adam optimization algorithm with this configuration is recommended when optimizing both the discriminator and generator models.

The example below demonstrates configuring the Adam stochastic gradient descent optimization algorithm for training a discriminator model.

# example of using adam when training a discriminator model
from keras.models import Sequential
from keras.layers import Conv2D
from keras.optimizers import Adam
# define model
model = Sequential()
model.add(Conv2D(64, kernel_size=(3,3), strides=(2,2), padding='same', input_shape=(64,64,3)))
# compile model
opt = Adam(lr=0.0002, beta_1=0.5)
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])

7. Scale Images to the Range [-1,1]

It is recommended to use the hyperbolic tangent activation function as the output from the generator model.

As such, it is also recommended that real images used to train the discriminator are scaled so that their pixel values are in the range [-1,1]. This is so that the discriminator will always receive images as input, real and fake, that have pixel values in the same range.

Typically, image data is loaded as a NumPy array such that pixel values are 8-bit unsigned integer (uint8) values in the range [0, 255].

First, the array must be converted to floating point values, then rescaled to the required range.

The example below provides a function that will appropriately scale a NumPy array of loaded image data to the required range of [-1,1].

# example of a function for scaling images

# scale image data from [0,255] to [-1,1]
def scale_images(images):
	# convert from uint8 to float32
	images = images.astype('float32')
	# scale from [0,255] to [-1,1]
	images = (images - 127.5) / 127.5
	return images

Soumith Chintala’s GAN Hacks

Soumith Chintala, one of the co-authors of the DCGAN paper, made a presentation at NIPS 2016 titled “How to Train a GAN?” summarizing many tips and tricks.

The video is available on YouTube and is highly recommended. A summary of the tips is also available as a GitHub repository titled “How to Train a GAN? Tips and tricks to make GANs work.”

The tips draw upon the suggestions from the DCGAN paper as well as elsewhere.

In this section, we will review how to implement four additional GAN best practices not covered in the previous section.

1. Use a Gaussian Latent Space

The latent space defines the shape and distribution of the input to the generator model used to generate new images.

The DCGAN recommends sampling from a uniform distribution, meaning that the shape of the latent space is a hypercube.

The more recent best practice is to sample from a standard Gaussian distribution, meaning that the shape of the latent space is a hypersphere, with a mean of zero and a standard deviation of one.

The example below demonstrates how to generate 500 random Gaussian points from a 100-dimensional latent space that can be used as input to a generator model; each point could be used to generate an image.

# example of sampling from a gaussian latent space
from numpy.random import randn

# generate points in latent space as input for the generator
def generate_latent_points(latent_dim, n_samples):
	# generate points in the latent space
	x_input = randn(latent_dim * n_samples)
	# reshape into a batch of inputs for the network
	x_input = x_input.reshape((n_samples, latent_dim))
	return x_input

# size of latent space
n_dim = 100
# number of samples to generate
n_samples = 500
# generate samples
samples = generate_latent_points(n_dim, n_samples)
# summarize
print(samples.shape, samples.mean(), samples.std())

Running the example summarizes the generation of 500 points, each comprised of 100 random Gaussian values with a mean close to zero and a standard deviation close to 1, e.g. a standard Gaussian distribution.

(500, 100) -0.004791256735601787 0.9976912528950904

2. Separate Batches of Real and Fake Images

The discriminator model is trained using stochastic gradient descent with mini-batches.

The best practice is to update the discriminator with separate batches of real and fake images rather than combining real and fake images into a single batch.

This can be achieved by updating the model weights for the discriminator model with two separate calls to the train_on_batch() function.

The code snippet below demonstrates how you can do this within the inner loop of code when training your discriminator model.

...
# get randomly selected 'real' samples
X_real, y_real = ...
# update discriminator model weights
discriminator.train_on_batch(X_real, y_real)
# generate 'fake' examples
X_fake, y_fake = ...
# update discriminator model weights
discriminator.train_on_batch(X_fake, y_fake)

3. Use Label Smoothing

It is common to use the class label 1 to represent real images and class label 0 to represent fake images when training the discriminator model.

These are called hard labels, as the label values are precise or crisp.

It is a good practice to use soft labels, such as values slightly more or less than 1.0 or slightly more than 0.0 for real and fake images respectively, where the variation for each image is random.

This is often referred to as label smoothing and can have a regularizing effect when training the model.

The example below demonstrates defining 1,000 labels for the positive class (class=1) and smoothing the label values uniformly into the range [0.7,1.2] as recommended.

# example of positive label smoothing
from numpy import ones
from numpy.random import random

# example of smoothing class=1 to [0.7, 1.2]
def smooth_positive_labels(y):
	return y - 0.3 + (random(y.shape) * 0.5)

# generate 'real' class labels (1)
n_samples = 1000
y = ones((n_samples, 1))
# smooth labels
y = smooth_positive_labels(y)
# summarize smooth labels
print(y.shape, y.min(), y.max())

Running the example summarizes the min and max values for the smooth values, showing they are close to the expected values.

(1000, 1) 0.7003103006957805 1.1997858934066357

There have been some suggestions that only positive-class label smoothing is required, and that the smoothed values should be less than 1.0. Nevertheless, you can also smooth negative class labels.

The example below demonstrates generating 1,000 labels for the negative class (class=0) and smoothing the label values uniformly into the range [0.0, 0.3] as recommended.

# example of negative label smoothing
from numpy import zeros
from numpy.random import random

# example of smoothing class=0 to [0.0, 0.3]
def smooth_negative_labels(y):
	return y + random(y.shape) * 0.3

# generate 'fake' class labels (0)
n_samples = 1000
y = zeros((n_samples, 1))
# smooth labels
y = smooth_negative_labels(y)
# summarize smooth labels
print(y.shape, y.min(), y.max())

4. Use Noisy Labels

The labels used when training the discriminator model are always correct.

This means that fake images are always labeled with class 0 and real images are always labeled with class 1.

It is recommended to introduce some errors to these labels where some fake images are marked as real, and some real images are marked as fake.

If you are using separate batches to update the discriminator for real and fake images, this may mean randomly adding some fake images to the batch of real images, or randomly adding some real images to the batch of fake images.

If you are updating the discriminator with a combined batch of real and fake images, then this may involve randomly flipping the labels on some images.

The example below demonstrates this by creating 1,000 samples of real (class=1) labels and flipping them with a 5% probability, then doing the same with 1,000 samples of fake (class=0) labels.

# example of noisy labels
from numpy import ones
from numpy import zeros
from numpy.random import choice

# randomly flip some labels
def noisy_labels(y, p_flip):
	# determine the number of labels to flip
	n_select = int(p_flip * y.shape[0])
	# choose labels to flip
	flip_ix = choice([i for i in range(y.shape[0])], size=n_select)
	# invert the labels in place
	y[flip_ix] = 1 - y[flip_ix]
	return y

# generate 'real' class labels (1)
n_samples = 1000
y = ones((n_samples, 1))
# flip labels with 5% probability
y = noisy_labels(y, 0.05)
# summarize labels
print(y.sum())

# generate 'fake' class labels (0)
y = zeros((n_samples, 1))
# flip labels with 5% probability
y = noisy_labels(y, 0.05)
# summarize labels
print(y.sum())

Try running the example a few times.

The results show that approximately 50 of the “1”s are flipped to 0s for the positive labels (e.g. 5% of 1,000) and approximately 50 “0”s are flipped to 1s for the negative labels.

950.0
49.0

Summary

In this tutorial, you discovered how to implement a suite of best practices or GAN hacks that you can copy-and-paste directly into your GAN project.

Specifically, you learned:

  • The best sources for practical heuristics or hacks when developing generative adversarial networks.
  • How to implement seven best practices for the deep convolutional GAN model architecture from scratch.
  • How to implement four additional best practices from Soumith Chintala’s GAN Hacks presentation and list.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

How to Implement VGG, Inception and ResNet Modules for Convolutional Neural Networks from Scratch

There are discrete architectural elements from milestone models that you can use in the design of your own convolutional neural networks.

Specifically, models that have achieved state-of-the-art results for tasks like image classification use discrete architecture elements repeated multiple times, such as the VGG block in the VGG models, the inception module in the GoogLeNet, and the residual module in the ResNet.

Once you are able to implement parameterized versions of these architecture elements, you can use them in the design of your own models for computer vision and other applications.

In this tutorial, you will discover how to implement the key architecture elements from milestone convolutional neural network models, from scratch.

After completing this tutorial, you will know:

  • How to implement a VGG module used in the VGG-16 and VGG-19 convolutional neural network models.
  • How to implement the naive and optimized inception module used in the GoogLeNet model.
  • How to implement the identity residual module used in the ResNet model.

Let’s get started.

How to Implement Major Architecture Innovations for Convolutional Neural Networks

How to Implement Major Architecture Innovations for Convolutional Neural Networks
Photo by daveynin, some rights reserved.

Tutorial Overview

This tutorial is divided into three parts; they are:

  1. How to implement VGG Blocks
  2. How to implement the Inception Module
  3. How to implement the Residual Module

How to Implement VGG Blocks

The VGG convolutional neural network architecture, named for the Visual Geometry Group at Oxford, was an important milestone in the use of deep learning methods for computer vision.

The architecture was described in the 2014 paper titled “Very Deep Convolutional Networks for Large-Scale Image Recognition” by Karen Simonyan and Andrew Zisserman and achieved top results in the LSVRC-2014 computer vision competition.

The key innovation in this architecture was the definition and repetition of what we will refer to as VGG-blocks. These are groups of convolutional layers that use small filters (e.g. 3×3 pixels) followed by a max pooling layer.

The image is passed through a stack of convolutional (conv.) layers, where we use filters with a very small receptive field: 3 x 3 (which is the smallest size to capture the notion of left/right, up/down, center). […] Max-pooling is performed over a 2 x 2 pixel window, with stride 2.

Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014.

A convolutional neural network with VGG-blocks is a sensible starting point when developing a new model from scratch as it is easy to understand, easy to implement, and very effective at extracting features from images.

We can generalize the specification of a VGG-block as one or more convolutional layers with the same number of filters, a filter size of 3×3, a stride of 1×1, same padding so that the output feature maps have the same width and height as the input, and the use of a rectified linear activation function. These layers are then followed by a max pooling layer with a size of 2×2 and a stride of the same dimensions.

We can define a function to create a VGG-block using the Keras functional API with a given number of convolutional layers and with a given number of filters per layer.

# function for creating a vgg block
def vgg_block(layer_in, n_filters, n_conv):
	# add convolutional layers
	for _ in range(n_conv):
		layer_in = Conv2D(n_filters, (3,3), padding='same', activation='relu')(layer_in)
	# add max pooling layer
	layer_in = MaxPooling2D((2,2), strides=(2,2))(layer_in)
	return layer_in

To use the function, one would pass in the layer prior to the block and receive the layer at the end of the block that can then be integrated into the model.

For example, the first layer might be an input layer which could be passed into the function as an argument. The function then returns a reference to the final layer in the block, the pooling layer, that could be connected to a flatten layer and subsequent dense layers for making a classification prediction.

We can demonstrate how to use this function by defining a small model that expects square color images as input and adds a single VGG block to the model with two convolutional layers, each with 64 filters.

# Example of creating a CNN model with a VGG block
from keras.models import Model
from keras.layers import Input
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.utils import plot_model

# function for creating a vgg block
def vgg_block(layer_in, n_filters, n_conv):
	# add convolutional layers
	for _ in range(n_conv):
		layer_in = Conv2D(n_filters, (3,3), padding='same', activation='relu')(layer_in)
	# add max pooling layer
	layer_in = MaxPooling2D((2,2), strides=(2,2))(layer_in)
	return layer_in

# define model input
visible = Input(shape=(256, 256, 3))
# add vgg module
layer = vgg_block(visible, 64, 2)
# create model
model = Model(inputs=visible, outputs=layer)
# summarize model
model.summary()
# plot model architecture
plot_model(model, show_shapes=True, to_file='vgg_block.png')

Running the example creates the model and summarizes the structure.

We can see that, as intended, the model added a single VGG block with two convolutional layers each with 64 filters, followed by a max pooling layer.

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 256, 256, 3)       0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 256, 256, 64)      1792
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 256, 256, 64)      36928
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 128, 128, 64)      0
=================================================================
Total params: 38,720
Trainable params: 38,720
Non-trainable params: 0
_________________________________________________________________

A plot is also created of the model architecture that may help to make the model layout more concrete.

Note, creating the plot assumes that you have pydot and graphviz installed. If this is not the case, you can comment out the import statement and the call to the plot_model() function in the example.

Plot of Convolutional Neural Network Architecture With a VGG Block

Plot of Convolutional Neural Network Architecture With a VGG Block

Using VGG blocks in your own models should be common because they are so simple and effective.

We can expand the example and demonstrate a single model that has three VGG blocks; the first two blocks each have two convolutional layers with 64 and 128 filters respectively, and the third block has four convolutional layers with 256 filters. This is a common usage of VGG blocks where the number of filters is increased with the depth of the model.

The complete code listing is provided below.

# Example of creating a CNN model with many VGG blocks
from keras.models import Model
from keras.layers import Input
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.utils import plot_model

# function for creating a vgg block
def vgg_block(layer_in, n_filters, n_conv):
	# add convolutional layers
	for _ in range(n_conv):
		layer_in = Conv2D(n_filters, (3,3), padding='same', activation='relu')(layer_in)
	# add max pooling layer
	layer_in = MaxPooling2D((2,2), strides=(2,2))(layer_in)
	return layer_in

# define model input
visible = Input(shape=(256, 256, 3))
# add vgg module
layer = vgg_block(visible, 64, 2)
# add vgg module
layer = vgg_block(layer, 128, 2)
# add vgg module
layer = vgg_block(layer, 256, 4)
# create model
model = Model(inputs=visible, outputs=layer)
# summarize model
model.summary()
# plot model architecture
plot_model(model, show_shapes=True, to_file='multiple_vgg_blocks.png')

Again, running the example summarizes the model architecture and we can clearly see the pattern of VGG blocks.

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 256, 256, 3)       0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 256, 256, 64)      1792
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 256, 256, 64)      36928
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 128, 128, 64)      0
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 128, 128, 128)     73856
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 128, 128, 128)     147584
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 64, 64, 128)       0
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 64, 64, 256)       295168
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 64, 64, 256)       590080
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 64, 64, 256)       590080
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 64, 64, 256)       590080
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 32, 32, 256)       0
=================================================================
Total params: 2,325,568
Trainable params: 2,325,568
Non-trainable params: 0
_________________________________________________________________

A plot of the model architecture is created providing a different perspective on the same linear progression of layers.

Plot of Convolutional Neural Network Architecture With Multiple VGG Blocks

Plot of Convolutional Neural Network Architecture With Multiple VGG Blocks

How to Implement the Inception Module

The inception module was described and used in the GoogLeNet model in the 2015 paper by Christian Szegedy, et al. titled “Going Deeper with Convolutions.”

Like the VGG model, the GoogLeNet model achieved top results in the 2014 version of the ILSVRC challenge.

The key innovation on the inception model is called the inception module. This is a block of parallel convolutional layers with different sized filters (e.g. 1×1, 3×3, 5×5) and a 3×3 max pooling layer, the results of which are then concatenated.

In order to avoid patch-alignment issues, current incarnations of the Inception architecture are restricted to filter sizes 1×1, 3×3 and 5×5; this decision was based more on convenience rather than necessity. […] Additionally, since pooling operations have been essential for the success of current convolutional networks, it suggests that adding an alternative parallel pooling path in each such stage should have additional beneficial effect, too

Going Deeper with Convolutions, 2015.

This is a very simple and powerful architectural unit that allows the model to learn not only parallel filters of the same size, but parallel filters of differing sizes, allowing learning at multiple scales.

We can implement an inception module directly using the Keras functional API. The function below will create a single inception module with a fixed number of filters for each of the parallel convolutional layers. The GoogLeNet architecture described in the paper does not appear to use a systematic sizing of filters for the parallel convolutional layers, as the model is highly optimized. As such, we can parameterize the module definition so that we can specify the number of filters to use in each of the 1×1, 3×3, and 5×5 convolutional layers.

# function for creating a naive inception block
def inception_module(layer_in, f1, f2, f3):
	# 1x1 conv
	conv1 = Conv2D(f1, (1,1), padding='same', activation='relu')(layer_in)
	# 3x3 conv
	conv3 = Conv2D(f2, (3,3), padding='same', activation='relu')(layer_in)
	# 5x5 conv
	conv5 = Conv2D(f3, (5,5), padding='same', activation='relu')(layer_in)
	# 3x3 max pooling
	pool = MaxPooling2D((3,3), strides=(1,1), padding='same')(layer_in)
	# concatenate filters, assumes filters/channels last
	layer_out = concatenate([conv1, conv3, conv5, pool], axis=-1)
	return layer_out

To use the function, provide the reference to the prior layer as input along with the number of filters, and it will return a reference to the concatenated filters layer that you can then connect to more inception modules or a submodel for making a prediction.

We can demonstrate how to use this function by creating a model with a single inception module. In this case, the number of filters is based on “inception (3a)” from Table 1 in the paper.

The complete example is listed below.

# example of creating a CNN with an inception module
from keras.models import Model
from keras.layers import Input
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers.merge import concatenate
from keras.utils import plot_model

# function for creating a naive inception block
def naive_inception_module(layer_in, f1, f2, f3):
	# 1x1 conv
	conv1 = Conv2D(f1, (1,1), padding='same', activation='relu')(layer_in)
	# 3x3 conv
	conv3 = Conv2D(f2, (3,3), padding='same', activation='relu')(layer_in)
	# 5x5 conv
	conv5 = Conv2D(f3, (5,5), padding='same', activation='relu')(layer_in)
	# 3x3 max pooling
	pool = MaxPooling2D((3,3), strides=(1,1), padding='same')(layer_in)
	# concatenate filters, assumes filters/channels last
	layer_out = concatenate([conv1, conv3, conv5, pool], axis=-1)
	return layer_out

# define model input
visible = Input(shape=(256, 256, 3))
# add inception module
layer = naive_inception_module(visible, 64, 128, 32)
# create model
model = Model(inputs=visible, outputs=layer)
# summarize model
model.summary()
# plot model architecture
plot_model(model, show_shapes=True, to_file='naive_inception_module.png')

Running the example creates the model and summarizes the layers.

We know the convolutional and pooling layers are parallel, but this summary does not capture the structure easily.

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            (None, 256, 256, 3)  0
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 256, 256, 64) 256         input_1[0][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 256, 256, 128 3584        input_1[0][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D)               (None, 256, 256, 32) 2432        input_1[0][0]
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D)  (None, 256, 256, 3)  0           input_1[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 256, 256, 227 0           conv2d_1[0][0]
                                                                 conv2d_2[0][0]
                                                                 conv2d_3[0][0]
                                                                 max_pooling2d_1[0][0]
==================================================================================================
Total params: 6,272
Trainable params: 6,272
Non-trainable params: 0
__________________________________________________________________________________________________

A plot of the model architecture is also created that helps to clearly see the parallel structure of the module as well as the matching shapes of the output of each element of the module that allows their direct concatenation by the third dimension (filters or channels).

Plot of Convolutional Neural Network Architecture With a Naive Inception Module

Plot of Convolutional Neural Network Architecture With a Naive Inception Module

The version of the inception module that we have implemented is called the naive inception module.

A modification to the module was made in order to reduce the amount of computation required. Specifically, 1×1 convolutional layers were added to reduce the number of filters before the 3×3 and 5×5 convolutional layers, and to increase the number of filters after the pooling layer.

This leads to the second idea of the Inception architecture: judiciously reducing dimension wherever the computational requirements would increase too much otherwise. […] That is, 1×1 convolutions are used to compute reductions before the expensive 3×3 and 5×5 convolutions. Besides being used as reductions, they also include the use of rectified linear activation making them dual-purpose

Going Deeper with Convolutions, 2015.

If you intend to use many inception modules in your model, you may require this computational performance-based modification.

The function below implements this optimization improvement with parameterization so that you can control the amount of reduction in the number of filters prior to the 3×3 and 5×5 convolutional layers and the number of increased filters after max pooling.

# function for creating a projected inception module
def inception_module(layer_in, f1, f2_in, f2_out, f3_in, f3_out, f4_out):
	# 1x1 conv
	conv1 = Conv2D(f1, (1,1), padding='same', activation='relu')(layer_in)
	# 3x3 conv
	conv3 = Conv2D(f2_in, (1,1), padding='same', activation='relu')(layer_in)
	conv3 = Conv2D(f2_out, (3,3), padding='same', activation='relu')(conv3)
	# 5x5 conv
	conv5 = Conv2D(f3_in, (1,1), padding='same', activation='relu')(layer_in)
	conv5 = Conv2D(f3_out, (5,5), padding='same', activation='relu')(conv5)
	# 3x3 max pooling
	pool = MaxPooling2D((3,3), strides=(1,1), padding='same')(layer_in)
	pool = Conv2D(f4_out, (1,1), padding='same', activation='relu')(pool)
	# concatenate filters, assumes filters/channels last
	layer_out = concatenate([conv1, conv3, conv5, pool], axis=-1)
	return layer_out

We can create a model with two of these optimized inception modules to get a concrete idea of how the architecture looks in practice.

In this case, the filter configurations are based on “inception (3a)” and “inception (3b)” from Table 1 in the paper.

The complete example is listed below.

# example of creating a CNN with an efficient inception module
from keras.models import Model
from keras.layers import Input
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers.merge import concatenate
from keras.utils import plot_model

# function for creating a projected inception module
def inception_module(layer_in, f1, f2_in, f2_out, f3_in, f3_out, f4_out):
	# 1x1 conv
	conv1 = Conv2D(f1, (1,1), padding='same', activation='relu')(layer_in)
	# 3x3 conv
	conv3 = Conv2D(f2_in, (1,1), padding='same', activation='relu')(layer_in)
	conv3 = Conv2D(f2_out, (3,3), padding='same', activation='relu')(conv3)
	# 5x5 conv
	conv5 = Conv2D(f3_in, (1,1), padding='same', activation='relu')(layer_in)
	conv5 = Conv2D(f3_out, (5,5), padding='same', activation='relu')(conv5)
	# 3x3 max pooling
	pool = MaxPooling2D((3,3), strides=(1,1), padding='same')(layer_in)
	pool = Conv2D(f4_out, (1,1), padding='same', activation='relu')(pool)
	# concatenate filters, assumes filters/channels last
	layer_out = concatenate([conv1, conv3, conv5, pool], axis=-1)
	return layer_out

# define model input
visible = Input(shape=(256, 256, 3))
# add inception block 1
layer = inception_module(visible, 64, 96, 128, 16, 32, 32)
# add inception block 2
layer = inception_module(layer, 128, 128, 192, 32, 96, 64)
# create model
model = Model(inputs=visible, outputs=layer)
# summarize model
model.summary()
# plot model architecture
plot_model(model, show_shapes=True, to_file='inception_module.png')

Running the example creates a linear summary of the layers that does not really help to understand what is going on.

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            (None, 256, 256, 3)  0
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 256, 256, 96) 384         input_1[0][0]
__________________________________________________________________________________________________
conv2d_4 (Conv2D)               (None, 256, 256, 16) 64          input_1[0][0]
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D)  (None, 256, 256, 3)  0           input_1[0][0]
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 256, 256, 64) 256         input_1[0][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D)               (None, 256, 256, 128 110720      conv2d_2[0][0]
__________________________________________________________________________________________________
conv2d_5 (Conv2D)               (None, 256, 256, 32) 12832       conv2d_4[0][0]
__________________________________________________________________________________________________
conv2d_6 (Conv2D)               (None, 256, 256, 32) 128         max_pooling2d_1[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 256, 256, 256 0           conv2d_1[0][0]
                                                                 conv2d_3[0][0]
                                                                 conv2d_5[0][0]
                                                                 conv2d_6[0][0]
__________________________________________________________________________________________________
conv2d_8 (Conv2D)               (None, 256, 256, 128 32896       concatenate_1[0][0]
__________________________________________________________________________________________________
conv2d_10 (Conv2D)              (None, 256, 256, 32) 8224        concatenate_1[0][0]
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D)  (None, 256, 256, 256 0           concatenate_1[0][0]
__________________________________________________________________________________________________
conv2d_7 (Conv2D)               (None, 256, 256, 128 32896       concatenate_1[0][0]
__________________________________________________________________________________________________
conv2d_9 (Conv2D)               (None, 256, 256, 192 221376      conv2d_8[0][0]
__________________________________________________________________________________________________
conv2d_11 (Conv2D)              (None, 256, 256, 96) 76896       conv2d_10[0][0]
__________________________________________________________________________________________________
conv2d_12 (Conv2D)              (None, 256, 256, 64) 16448       max_pooling2d_2[0][0]
__________________________________________________________________________________________________
concatenate_2 (Concatenate)     (None, 256, 256, 480 0           conv2d_7[0][0]
                                                                 conv2d_9[0][0]
                                                                 conv2d_11[0][0]
                                                                 conv2d_12[0][0]
==================================================================================================
Total params: 513,120
Trainable params: 513,120
Non-trainable params: 0
__________________________________________________________________________________________________

A plot of the model architecture is created that does make the layout of each module clear, and shows how the first module feeds the second module.

Note that the first 1×1 convolution in each inception module is on the far right for space reasons, but besides that, the other layers are organized left to right within each module.

Plot of Convolutional Neural Network Architecture With an Efficient Inception Module

How to Implement the Residual Module

The Residual Network, or ResNet, architecture for convolutional neural networks was proposed by Kaiming He, et al. in their 2016 paper titled “Deep Residual Learning for Image Recognition,” which achieved success on the 2015 version of the ILSVRC challenge.

A key innovation in the ResNet was the residual module. The residual module, specifically the identity residual module, is a block of two convolutional layers with the same number of filters and a small filter size, where the output of the second layer is added to the input of the first convolutional layer. Drawn as a graph, this connection from the input of the module to its output is called a shortcut connection.

We can implement this directly in Keras using the functional API and the add() merge function.

# function for creating an identity residual module
def residual_module(layer_in, n_filters):
	# conv1
	conv1 = Conv2D(n_filters, (3,3), padding='same', activation='relu', kernel_initializer='he_normal')(layer_in)
	# conv2
	conv2 = Conv2D(n_filters, (3,3), padding='same', activation='linear', kernel_initializer='he_normal')(conv1)
	# add filters, assumes filters/channels last
	layer_out = add([conv2, layer_in])
	# activation function
	layer_out = Activation('relu')(layer_out)
	return layer_out

A limitation with this direct implementation is that if the number of filters in the input layer does not match the number of filters in the last convolutional layer of the module (defined by n_filters), then we will get an error.
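For example, a minimal sketch (assuming the identity-only residual_module() defined above and the same imports as the earlier complete examples) that reproduces the problem on a three-channel input:

# demonstrate the shape mismatch with the identity-only residual module
visible = Input(shape=(256, 256, 3))
try:
	# 64 filters in the module vs 3 channels in the input: the add() merge cannot match the shapes
	layer = residual_module(visible, 64)
except Exception as e:
	# typically raised as a ValueError by the add() merge
	print(e)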

One solution is to use a 1×1 convolution layer, often referred to as a projection layer, to either increase the number of filters for the input layer or reduce the number of filters for the last convolutional layer in the module. The former solution makes more sense, and is the approach proposed in the paper, referred to as a projection shortcut.

When the dimensions increase […], we consider two options: (A) The shortcut still performs identity mapping, with extra zero entries padded for increasing dimensions. This option introduces no extra parameter; (B) The projection shortcut […] is used to match dimensions (done by 1×1 convolutions).

Deep Residual Learning for Image Recognition, 2015.

Below is an updated version of the function that will use the identity shortcut if possible, or a projection via a 1×1 convolution if the number of filters in the input does not match the n_filters argument.

# function for creating an identity or projection residual module
def residual_module(layer_in, n_filters):
	merge_input = layer_in
	# check if the number of filters needs to be increased, assumes channels last format
	if layer_in.shape[-1] != n_filters:
		merge_input = Conv2D(n_filters, (1,1), padding='same', activation='relu', kernel_initializer='he_normal')(layer_in)
	# conv1
	conv1 = Conv2D(n_filters, (3,3), padding='same', activation='relu', kernel_initializer='he_normal')(layer_in)
	# conv2
	conv2 = Conv2D(n_filters, (3,3), padding='same', activation='linear', kernel_initializer='he_normal')(conv1)
	# add filters, assumes filters/channels last
	layer_out = add([conv2, merge_input])
	# activation function
	layer_out = Activation('relu')(layer_out)
	return layer_out

We can demonstrate the usage of this module in a simple model.

The complete example is listed below.

# example of a CNN model with an identity or projection residual module
from keras.models import Model
from keras.layers import Input
from keras.layers import Activation
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import add
from keras.utils import plot_model

# function for creating an identity or projection residual module
def residual_module(layer_in, n_filters):
	merge_input = layer_in
	# check if the number of filters needs to be increased, assumes channels last format
	if layer_in.shape[-1] != n_filters:
		merge_input = Conv2D(n_filters, (1,1), padding='same', activation='relu', kernel_initializer='he_normal')(layer_in)
	# conv1
	conv1 = Conv2D(n_filters, (3,3), padding='same', activation='relu', kernel_initializer='he_normal')(layer_in)
	# conv2
	conv2 = Conv2D(n_filters, (3,3), padding='same', activation='linear', kernel_initializer='he_normal')(conv1)
	# add filters, assumes filters/channels last
	layer_out = add([conv2, merge_input])
	# activation function
	layer_out = Activation('relu')(layer_out)
	return layer_out

# define model input
visible = Input(shape=(256, 256, 3))
# add residual module
layer = residual_module(visible, 64)
# create model
model = Model(inputs=visible, outputs=layer)
# summarize model
model.summary()
# plot model architecture
plot_model(model, show_shapes=True, to_file='residual_module.png')

Running the example first creates the model then prints a summary of the layers.

Because the model contains only a single small module, this summary is still helpful for seeing what is going on.

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            (None, 256, 256, 3)  0
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 256, 256, 64) 1792        input_1[0][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D)               (None, 256, 256, 64) 36928       conv2d_2[0][0]
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 256, 256, 64) 256         input_1[0][0]
__________________________________________________________________________________________________
add_1 (Add)                     (None, 256, 256, 64) 0           conv2d_3[0][0]
                                                                 conv2d_1[0][0]
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 256, 256, 64) 0           add_1[0][0]
==================================================================================================
Total params: 38,976
Trainable params: 38,976
Non-trainable params: 0
__________________________________________________________________________________________________

A plot of the model architecture is also created.

We can see the 1×1 projection layer inflating the number of filters in the input, and the addition of the two branches at the end of the module.

Plot of Convolutional Neural Network Architecture With a Residual Module

The paper describes other types of residual connections such as bottlenecks. These are left as an exercise for the reader and can be implemented easily by updating the residual_module() function; a possible starting point is sketched below.
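For example, a minimal sketch of a bottleneck-style residual module is shown below (an illustration based on the paper's description rather than code from this tutorial, assuming the same imports as the complete example above); the n_bottleneck argument is introduced here to control the reduced number of filters.

# function for creating a bottleneck residual module (illustrative sketch)
def bottleneck_residual_module(layer_in, n_filters, n_bottleneck):
	merge_input = layer_in
	# project the shortcut if the number of channels does not match, assumes channels last format
	if layer_in.shape[-1] != n_filters:
		merge_input = Conv2D(n_filters, (1,1), padding='same', activation='relu', kernel_initializer='he_normal')(layer_in)
	# 1x1 conv to reduce the number of filters
	conv1 = Conv2D(n_bottleneck, (1,1), padding='same', activation='relu', kernel_initializer='he_normal')(layer_in)
	# 3x3 conv on the reduced representation
	conv2 = Conv2D(n_bottleneck, (3,3), padding='same', activation='relu', kernel_initializer='he_normal')(conv1)
	# 1x1 conv to restore the number of filters, linear activation prior to the merge
	conv3 = Conv2D(n_filters, (1,1), padding='same', activation='linear', kernel_initializer='he_normal')(conv2)
	# add the shortcut and the residual branch
	layer_out = add([conv3, merge_input])
	# activation function applied after the merge
	layer_out = Activation('relu')(layer_out)
	return layer_out

For example, calling bottleneck_residual_module(layer, 256, 64) approximates the 64-64-256 bottleneck configuration described in the paper.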

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Papers

  • Going Deeper with Convolutions, 2015.
  • Deep Residual Learning for Image Recognition, 2015.

Summary

In this tutorial, you discovered how to implement key architecture elements from milestone convolutional neural network models, from scratch.

Specifically, you learned:

  • How to implement a VGG module used in the VGG-16 and VGG-19 convolutional neural network models.
  • How to implement the naive and optimized inception module used in the GoogLeNet model.
  • How to implement the identity residual module used in the ResNet model.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
