How to Implement Progressive Growing GAN Models in Keras

The progressive growing generative adversarial network is an approach for training a deep convolutional neural network model for generating synthetic images.

It is an extension of the more traditional GAN architecture that involves incrementally growing the size of the generated image during training, starting with a very small image, such as 4×4 pixels. This allows the stable training and growth of GAN models capable of generating very large, high-quality images, such as synthetic celebrity faces at a size of 1024×1024 pixels.

In this tutorial, you will discover how to develop progressive growing generative adversarial network models from scratch with Keras.

After completing this tutorial, you will know:

  • How to develop pre-defined discriminator and generator models at each level of output image growth.
  • How to define composite models for training the generator models via the discriminator models.
  • How to cycle the training of fade-in version and normal versions of models at each level of output image growth.

Let’s get started.

How to Implement Progressive Growing GAN Models in Keras.
Photo by Diogo Santos Silva, some rights reserved.

Tutorial Overview

This tutorial is divided into five parts; they are:

  1. What Is the Progressive Growing GAN Architecture?
  2. How to Implement the Progressive Growing GAN Discriminator Model
  3. How to Implement the Progressive Growing GAN Generator Model
  4. How to Implement Composite Models for Updating the Generator
  5. How to Train Discriminator and Generator Models

What Is the Progressive Growing GAN Architecture?

GANs are effective at generating crisp synthetic images, although they are typically limited in the size of the images that can be generated.

The Progressive Growing GAN is an extension to the GAN that allows the training of generator models capable of outputting large high-quality images, such as photorealistic faces with the size 1024×1024 pixels. It was described in the 2017 paper by Tero Karras, et al. from Nvidia titled “Progressive Growing of GANs for Improved Quality, Stability, and Variation.”

The key innovation of the Progressive Growing GAN is the incremental increase in the size of the images output by the generator, starting with a 4×4 pixel image and doubling to 8×8, 16×16, and so on, until the desired output resolution is reached.

Our primary contribution is a training methodology for GANs where we start with low-resolution images, and then progressively increase the resolution by adding layers to the networks.

Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017.

This is achieved by a training procedure that involves periods of fine-tuning the model with a given output resolution, and periods of slowly phasing in a new model with a larger resolution.

When doubling the resolution of the generator (G) and discriminator (D) we fade in the new layers smoothly

Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017.

All layers remain trainable during the training process, including existing layers when new layers are added.

All existing layers in both networks remain trainable throughout the training process.

Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017.

Progressive Growing GAN involves using a generator and discriminator model with the same general structure and starting with very small images. During training, new blocks of convolutional layers are systematically added to both the generator and the discriminator models.

Example of Progressively Adding Layers to Generator and Discriminator Models.
Taken from: Progressive Growing of GANs for Improved Quality, Stability, and Variation.

The incremental addition of the layers allows the models to effectively learn coarse-level detail and later learn ever finer detail, both on the generator and discriminator side.

This incremental nature allows the training to first discover the large-scale structure of the image distribution and then shift attention to increasingly finer-scale detail, instead of having to learn all scales simultaneously.

Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017.

The model architecture is complex and cannot be implemented directly as a single static model.

In this tutorial, we will focus on how the progressive growing GAN can be implemented using the Keras deep learning library.

We will step through how each of the discriminator and generator models can be defined, how the generator can be trained via the discriminator model, and how each model can be updated during the training process.

These implementation details will provide the basis for you to develop a progressive growing GAN for your own applications.

How to Implement the Progressive Growing GAN Discriminator Model

The discriminator model is given images as input and must classify them as either real (from the dataset) or fake (generated).

During the training process, the discriminator must grow to support images with ever-increasing size, starting with 4×4 pixel color images and doubling to 8×8, 16×16, 32×32, and so on.

This is achieved by inserting a new input layer to support the larger input image followed by a new block of layers. The output of this new block is then downsampled. Additionally, the new image is also downsampled directly and passed through the old input processing layer before it is combined with the output of the new block.

During the transition from a lower resolution to a higher resolution, e.g. 16×16 to 32×32, the discriminator model will have two input pathways as follows:

  • [32×32 Image] -> [fromRGB Conv] -> [NewBlock] -> [Downsample] ->
  • [32×32 Image] -> [Downsample] -> [fromRGB Conv] ->

The output of the new block that is downsampled and the output of the old input processing layer are combined using a weighted average, where the weighting is controlled by a new hyperparameter called alpha. The weighted sum is calculated as follows:

  • Output = ((1 – alpha) * fromRGB) + (alpha * NewBlock)

The weighted average of the two pathways is then fed into the rest of the existing model.

Initially, the weighting is completely biased towards the old input processing layer (alpha=0) and is linearly increased over training iterations so that the new block is given more weight until eventually, the output is entirely the product of the new block (alpha=1). At this time, the old pathway can be removed.

This can be summarized with the following figure taken from the paper showing a model before growing (a), during the phase-in of the larger resolution (b), and the model after the phase-in (c).

Figure Showing the Growing of the Discriminator Model, Before (a), During (b), and After (c) the Phase-In of a High Resolution.
Taken from: Progressive Growing of GANs for Improved Quality, Stability, and Variation.

The fromRGB layers are implemented as a 1×1 convolutional layer. A block is comprised of two convolutional layers with 3×3 sized filters and the leaky ReLU activation function with a slope of 0.2, followed by a downsampling layer. Average pooling is used for downsampling, unlike most other GAN discriminators, which typically downsample using strided convolutions.

The output block of the model involves two convolutional layers with 3×3 and 4×4 sized filters and Leaky ReLU activation, followed by a fully connected layer that outputs a single-value prediction. The model uses a linear activation function instead of the sigmoid activation used by other discriminator models, and it can be trained directly with Wasserstein loss (specifically WGAN-GP, as in the paper) or least squares loss; we will use the latter in this tutorial. Model weights are initialized using He Gaussian initialization (he_normal), which is very similar to the method used in the paper.

In the paper, the model uses a custom minibatch standard deviation layer at the beginning of the output block, and instead of batch normalization, each layer uses a local response normalization referred to as pixel-wise normalization. For brevity, we will leave out the minibatch standard deviation layer and use batch normalization in this tutorial.

One approach to implementing the progressive growing GAN would be to manually expand a model on demand during training. Another approach is to pre-define all of the models prior to training and carefully use the Keras functional API to ensure that layers are shared across the models and continue training.

I believe the latter approach might be easier and is the approach we will use in this tutorial.

First, we must define a custom layer that we can use when fading in a new higher-resolution input image and block. This new layer must take two sets of activation maps with the same dimensions (width, height, channels) and add them together using a weighted sum.

We can implement this as a new layer called WeightedSum that extends the Add merge layer and uses a hyperparameter ‘alpha’ to control the contribution of each input. This new class is defined below. The layer assumes only two inputs: the first for the output of the old or existing layers and the second for the newly added layers. The new hyperparameter is defined as a backend variable, meaning that we can change it at any time by changing the value of the variable.

# weighted sum output
class WeightedSum(Add):
	# init with default value
	def __init__(self, alpha=0.0, **kwargs):
		super(WeightedSum, self).__init__(**kwargs)
		self.alpha = backend.variable(alpha, name='ws_alpha')

	# output a weighted sum of inputs
	def _merge_function(self, inputs):
		# only supports a weighted sum of two inputs
		assert (len(inputs) == 2)
		# ((1-a) * input1) + (a * input2)
		output = ((1.0 - self.alpha) * inputs[0]) + (self.alpha * inputs[1])
		return output
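
Because alpha is stored as a backend variable, it can be inspected and updated in place at any time. As a quick standalone sketch (not part of the final program), the value on an instance could be read and changed like this:

# sketch: the alpha variable can be changed at any time via the backend
ws = WeightedSum()
backend.set_value(ws.alpha, 0.5)
print(backend.get_value(ws.alpha))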

The discriminator model is far more complex to grow than the generator because we have to change the model input, so let's step through it slowly.

Firstly, we can define a discriminator model that takes a 4×4 color image as input and outputs a prediction of whether the image is real or fake. The model is comprised of a 1×1 input processing layer (fromRGB) and an output block.

...
# base model input
in_image = Input(shape=(4,4,3))
# conv 1x1
d = Conv2D(64, (1,1), padding='same', kernel_initializer='he_normal')(in_image)
d = LeakyReLU(alpha=0.2)(d)
# conv 3x3 (output block)
d = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(d)
d = BatchNormalization()(d)
d = LeakyReLU(alpha=0.2)(d)
# conv 4x4
d = Conv2D(128, (4,4), padding='same', kernel_initializer='he_normal')(d)
d = BatchNormalization()(d)
d = LeakyReLU(alpha=0.2)(d)
# dense output layer
d = Flatten()(d)
out_class = Dense(1)(d)
# define model
model = Model(in_image, out_class)
# compile model
model.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))

Next, we need to define a new model that handles the intermediate stage between this model and a new discriminator model that takes 8×8 color images as input.

The existing input processing layer must receive a downsampled version of the new 8×8 image. A new input processing layer must be defined that takes the 8×8 input image and passes it through a new block of two convolutional layers and a downsampling layer. The output of the new block after downsampling must be combined with the output of the old input processing layer using a weighted sum via our new WeightedSum layer, and the result must then be passed through the same output block (two convolutional layers and the output layer) reused from the old model.

Given the first defined model and our knowledge about this model (e.g. the input processing involves the first three layers: the Input, Conv2D, and LeakyReLU layers), we can construct this new intermediate or fade-in model using layer indexes from the old model.

...
old_model = model
# get shape of existing model
in_shape = list(old_model.input.shape)
# define new input shape as double the size
input_shape = (in_shape[-2].value*2, in_shape[-2].value*2, in_shape[-1].value)
in_image = Input(shape=input_shape)
# define new input processing layer
d = Conv2D(64, (1,1), padding='same', kernel_initializer='he_normal')(in_image)
d = LeakyReLU(alpha=0.2)(d)
# define new block
d = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(d)
d = BatchNormalization()(d)
d = LeakyReLU(alpha=0.2)(d)
d = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(d)
d = BatchNormalization()(d)
d = LeakyReLU(alpha=0.2)(d)
d = AveragePooling2D()(d)
# downsample the new larger image
downsample = AveragePooling2D()(in_image)
# connect old input processing to downsampled new input
block_old = old_model.layers[1](downsample)
block_old = old_model.layers[2](block_old)
# fade in output of old model input layer with new input
d = WeightedSum()([block_old, d])
# skip the input, 1x1 and activation for the old model
for i in range(3, len(old_model.layers)):
	d = old_model.layers[i](d)
# define fade-in model
model = Model(in_image, d)
# compile model
model.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))

So far, so good.

We also need a version of the model with the same new layers but without the fade-in from the old model’s input processing layers.

This straight-through version is required for training before we fade in the next doubling of the input image size.

We can update the above example to create two versions of the model. First, the straight-through version as it is simpler, then the version used for the fade-in that reuses the layers from the new block and the output layers of the old model.

The add_discriminator_block() function below implements this; it takes the old model as an argument, along with the number of input layers to skip as a default argument (3), and returns a list of the two defined models: the straight-through model and the fade-in model.

To ensure that the WeightedSum layer works correctly, we have fixed all convolutional layers to always have 64 filters and, in turn, output 64 feature maps. If there is a mismatch between the old model’s input processing layer and the new block’s output in terms of the number of feature maps (channels), then the weighted sum will fail.

# add a discriminator block
def add_discriminator_block(old_model, n_input_layers=3):
	# get shape of existing model
	in_shape = list(old_model.input.shape)
	# define new input shape as double the size
	input_shape = (in_shape[-2].value*2, in_shape[-2].value*2, in_shape[-1].value)
	in_image = Input(shape=input_shape)
	# define new input processing layer
	d = Conv2D(64, (1,1), padding='same', kernel_initializer='he_normal')(in_image)
	d = LeakyReLU(alpha=0.2)(d)
	# define new block
	d = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)
	d = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)
	d = AveragePooling2D()(d)
	block_new = d
	# skip the input, 1x1 and activation for the old model
	for i in range(n_input_layers, len(old_model.layers)):
		d = old_model.layers[i](d)
	# define straight-through model
	model1 = Model(in_image, d)
	# compile model
	model1.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
	# downsample the new larger image
	downsample = AveragePooling2D()(in_image)
	# connect old input processing to downsampled new input
	block_old = old_model.layers[1](downsample)
	block_old = old_model.layers[2](block_old)
	# fade in output of old model input layer with new input
	d = WeightedSum()([block_old, block_new])
	# skip the input, 1x1 and activation for the old model
	for i in range(n_input_layers, len(old_model.layers)):
		d = old_model.layers[i](d)
	# define fade-in model
	model2 = Model(in_image, d)
	# compile model
	model2.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
	return [model1, model2]

It is not an elegant function as we have some repetition, but it is readable and will get the job done.

We can then call this function again and again as we double the size of input images. Importantly, the function expects the straight-through version of the prior model as input.

The example below defines a new function called define_discriminator() that defines our base model, which expects a 4×4 color image as input, then repeatedly adds blocks to create new versions of the discriminator model, each expecting images with quadruple the area.

# define the discriminator models for each image resolution
def define_discriminator(n_blocks, input_shape=(4,4,3)):
	model_list = list()
	# base model input
	in_image = Input(shape=input_shape)
	# conv 1x1
	d = Conv2D(64, (1,1), padding='same', kernel_initializer='he_normal')(in_image)
	d = LeakyReLU(alpha=0.2)(d)
	# conv 3x3 (output block)
	d = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)
	# conv 4x4
	d = Conv2D(128, (4,4), padding='same', kernel_initializer='he_normal')(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)
	# dense output layer
	d = Flatten()(d)
	out_class = Dense(1)(d)
	# define model
	model = Model(in_image, out_class)
	# compile model
	model.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
	# store model
	model_list.append([model, model])
	# create submodels
	for i in range(1, n_blocks):
		# get prior model without the fade-in
		old_model = model_list[i - 1][0]
		# create new model for next resolution
		models = add_discriminator_block(old_model)
		# store model
		model_list.append(models)
	return model_list

This function will return a list of models, where each item in the list is a two-element list that contains first the straight-through version of the model at that resolution, and second the fade-in version of the model for that resolution.

We can tie all of this together and define a new “discriminator model” that will grow from 4×4, through to 8×8, and finally to 16×16. This is achieved by setting the n_blocks argument to 3 when calling the define_discriminator() function, creating three pairs of models.

The complete example is listed below.

# example of defining discriminator models for the progressive growing gan
from keras.optimizers import Adam
from keras.models import Model
from keras.layers import Input
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import Conv2D
from keras.layers import AveragePooling2D
from keras.layers import LeakyReLU
from keras.layers import BatchNormalization
from keras.layers import Add
from keras.utils.vis_utils import plot_model
from keras import backend

# weighted sum output
class WeightedSum(Add):
	# init with default value
	def __init__(self, alpha=0.0, **kwargs):
		super(WeightedSum, self).__init__(**kwargs)
		self.alpha = backend.variable(alpha, name='ws_alpha')

	# output a weighted sum of inputs
	def _merge_function(self, inputs):
		# only supports a weighted sum of two inputs
		assert (len(inputs) == 2)
		# ((1-a) * input1) + (a * input2)
		output = ((1.0 - self.alpha) * inputs[0]) + (self.alpha * inputs[1])
		return output

# add a discriminator block
def add_discriminator_block(old_model, n_input_layers=3):
	# get shape of existing model
	in_shape = list(old_model.input.shape)
	# define new input shape as double the size
	input_shape = (in_shape[-2].value*2, in_shape[-2].value*2, in_shape[-1].value)
	in_image = Input(shape=input_shape)
	# define new input processing layer
	d = Conv2D(64, (1,1), padding='same', kernel_initializer='he_normal')(in_image)
	d = LeakyReLU(alpha=0.2)(d)
	# define new block
	d = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)
	d = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)
	d = AveragePooling2D()(d)
	block_new = d
	# skip the input, 1x1 and activation for the old model
	for i in range(n_input_layers, len(old_model.layers)):
		d = old_model.layers[i](d)
	# define straight-through model
	model1 = Model(in_image, d)
	# compile model
	model1.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
	# downsample the new larger image
	downsample = AveragePooling2D()(in_image)
	# connect old input processing to downsampled new input
	block_old = old_model.layers[1](downsample)
	block_old = old_model.layers[2](block_old)
	# fade in output of old model input layer with new input
	d = WeightedSum()([block_old, block_new])
	# skip the input, 1x1 and activation for the old model
	for i in range(n_input_layers, len(old_model.layers)):
		d = old_model.layers[i](d)
	# define fade-in model
	model2 = Model(in_image, d)
	# compile model
	model2.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
	return [model1, model2]

# define the discriminator models for each image resolution
def define_discriminator(n_blocks, input_shape=(4,4,3)):
	model_list = list()
	# base model input
	in_image = Input(shape=input_shape)
	# conv 1x1
	d = Conv2D(64, (1,1), padding='same', kernel_initializer='he_normal')(in_image)
	d = LeakyReLU(alpha=0.2)(d)
	# conv 3x3 (output block)
	d = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)
	# conv 4x4
	d = Conv2D(128, (4,4), padding='same', kernel_initializer='he_normal')(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)
	# dense output layer
	d = Flatten()(d)
	out_class = Dense(1)(d)
	# define model
	model = Model(in_image, out_class)
	# compile model
	model.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
	# store model
	model_list.append([model, model])
	# create submodels
	for i in range(1, n_blocks):
		# get prior model without the fade-in
		old_model = model_list[i - 1][0]
		# create new model for next resolution
		models = add_discriminator_block(old_model)
		# store model
		model_list.append(models)
	return model_list

# define models
discriminators = define_discriminator(3)
# spot check
m = discriminators[2][1]
m.summary()
plot_model(m, to_file='discriminator_plot.png', show_shapes=True, show_layer_names=True)

Running the example first summarizes the fade-in version of the third model, showing the 16×16 color image input and the single-value output.

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_3 (InputLayer)            (None, 16, 16, 3)    0
__________________________________________________________________________________________________
conv2d_7 (Conv2D)               (None, 16, 16, 64)   256         input_3[0][0]
__________________________________________________________________________________________________
leaky_re_lu_7 (LeakyReLU)       (None, 16, 16, 64)   0           conv2d_7[0][0]
__________________________________________________________________________________________________
conv2d_8 (Conv2D)               (None, 16, 16, 64)   36928       leaky_re_lu_7[0][0]
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, 16, 16, 64)   256         conv2d_8[0][0]
__________________________________________________________________________________________________
leaky_re_lu_8 (LeakyReLU)       (None, 16, 16, 64)   0           batch_normalization_5[0][0]
__________________________________________________________________________________________________
conv2d_9 (Conv2D)               (None, 16, 16, 64)   36928       leaky_re_lu_8[0][0]
__________________________________________________________________________________________________
average_pooling2d_4 (AveragePoo (None, 8, 8, 3)      0           input_3[0][0]
__________________________________________________________________________________________________
batch_normalization_6 (BatchNor (None, 16, 16, 64)   256         conv2d_9[0][0]
__________________________________________________________________________________________________
conv2d_4 (Conv2D)               (None, 8, 8, 64)     256         average_pooling2d_4[0][0]
__________________________________________________________________________________________________
leaky_re_lu_9 (LeakyReLU)       (None, 16, 16, 64)   0           batch_normalization_6[0][0]
__________________________________________________________________________________________________
leaky_re_lu_4 (LeakyReLU)       (None, 8, 8, 64)     0           conv2d_4[1][0]
__________________________________________________________________________________________________
average_pooling2d_3 (AveragePoo (None, 8, 8, 64)     0           leaky_re_lu_9[0][0]
__________________________________________________________________________________________________
weighted_sum_2 (WeightedSum)    (None, 8, 8, 64)     0           leaky_re_lu_4[1][0]
                                                                 average_pooling2d_3[0][0]
__________________________________________________________________________________________________
conv2d_5 (Conv2D)               (None, 8, 8, 64)     36928       weighted_sum_2[0][0]
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, 8, 8, 64)     256         conv2d_5[2][0]
__________________________________________________________________________________________________
leaky_re_lu_5 (LeakyReLU)       (None, 8, 8, 64)     0           batch_normalization_3[2][0]
__________________________________________________________________________________________________
conv2d_6 (Conv2D)               (None, 8, 8, 64)     36928       leaky_re_lu_5[2][0]
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 8, 8, 64)     256         conv2d_6[2][0]
__________________________________________________________________________________________________
leaky_re_lu_6 (LeakyReLU)       (None, 8, 8, 64)     0           batch_normalization_4[2][0]
__________________________________________________________________________________________________
average_pooling2d_1 (AveragePoo (None, 4, 4, 64)     0           leaky_re_lu_6[2][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 4, 4, 128)    73856       average_pooling2d_1[2][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 4, 4, 128)    512         conv2d_2[4][0]
__________________________________________________________________________________________________
leaky_re_lu_2 (LeakyReLU)       (None, 4, 4, 128)    0           batch_normalization_1[4][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D)               (None, 4, 4, 128)    262272      leaky_re_lu_2[4][0]
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 4, 4, 128)    512         conv2d_3[4][0]
__________________________________________________________________________________________________
leaky_re_lu_3 (LeakyReLU)       (None, 4, 4, 128)    0           batch_normalization_2[4][0]
__________________________________________________________________________________________________
flatten_1 (Flatten)             (None, 2048)         0           leaky_re_lu_3[4][0]
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 1)            2049        flatten_1[4][0]
==================================================================================================
Total params: 488,449
Trainable params: 487,425
Non-trainable params: 1,024
__________________________________________________________________________________________________

A plot of the same fade-in version of the model is created and saved to file.

Note: creating this plot assumes that the pygraphviz and pydot libraries are installed. If this is a problem, comment out the import statement and call to plot_model().

The plot shows the 16×16 input image that is downsampled and passed through the 8×8 input processing layers from the prior model (left). It also shows the addition of the new block (right) and the weighted average that combines both streams of input, before using the existing model layers to continue processing and outputting a prediction.

Plot of the Fade-In Discriminator Model for the Progressive Growing GAN Transitioning From 8×8 to 16×16 Input Images

Now that we have seen how we can define the discriminator models, let’s look at how we can define the generator models.

How to Implement the Progressive Growing GAN Generator Model

The generator models for the progressive growing GAN are easier to implement in Keras than the discriminator models.

The reason is that each fade-in requires only a minor change to the output of the model, rather than a change to the input as in the discriminator.

Increasing the resolution of the generator involves first upsampling the output of the end of the last block. This is then connected to the new block and a new output layer for an image that is double the height and width dimensions or quadruple the area. During the phase-in, the upsampling is also connected to the output layer from the old model and the output from both output layers is merged using a weighted average.

After the phase-in is complete, the old output layer is removed.

This can be summarized with the following figure, taken from the paper showing a model before growing (a), during the phase-in of the larger resolution (b), and the model after the phase-in (c).

Figure Showing the Growing of the Generator Model, Before (a), During (b), and After (c) the Phase-In of a High Resolution.
Taken from: Progressive Growing of GANs for Improved Quality, Stability, and Variation.

The toRGB layer is a convolutional layer with three 1×1 filters, sufficient to output a color image.

The model takes a point in the latent space as input, e.g. a 100-element or 512-element vector as described in the paper. This is scaled up to provide the basis for 4×4 activation maps, followed by a convolutional layer with 4×4 filters and another with 3×3 filters. Like the discriminator, LeakyReLU activations are used, as is pixel-wise normalization, which we will substitute with batch normalization for brevity.

A block involves an upsampling layer followed by two convolutional layers with 3×3 filters. Upsampling is achieved using a nearest neighbor method (e.g. duplicating input rows and columns) via an UpSampling2D layer instead of the more common transpose convolutional layer.
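
If the behavior of UpSampling2D is unfamiliar, the small standalone sketch below (not part of the GAN itself) demonstrates how the layer duplicates the rows and columns of its input:

# sketch: demonstrate that UpSampling2D duplicates rows and columns
from numpy import asarray
from keras.models import Sequential
from keras.layers import UpSampling2D
# a single 2x2 single-channel "image"
X = asarray([[1, 2],
             [3, 4]]).reshape((1, 2, 2, 1)).astype('float32')
# model with a single nearest-neighbor upsampling layer
model = Sequential()
model.add(UpSampling2D(input_shape=(2, 2, 1)))
# each row and column is repeated, giving a 4x4 output
print(model.predict(X).reshape((4, 4)))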

We can define the baseline model that will take a point in latent space as input and output a 4×4 color image as follows:

...
# base model latent input
in_latent = Input(shape=(100,))
# linear scale up to activation maps
g = Dense(128 * 4 * 4, kernel_initializer='he_normal')(in_latent)
g = Reshape((4, 4, 128))(g)
# conv 4x4, input block
g = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(g)
g = BatchNormalization()(g)
g = LeakyReLU(alpha=0.2)(g)
# conv 3x3
g = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(g)
g = BatchNormalization()(g)
g = LeakyReLU(alpha=0.2)(g)
# conv 1x1, output block
out_image = Conv2D(3, (1,1), padding='same', kernel_initializer='he_normal')(g)
# define model
model = Model(in_latent, out_image)

Next, we need to define a version of the model that uses all of the same input layers but adds a new block (upsampling and two convolutional layers) and a new output layer (a 1×1 convolutional layer).

This would be the model after the phase-in to the new output resolution. It can be achieved by using our knowledge of the baseline model, namely that the end of the last block is the second-to-last layer, e.g. the layer at index -2 in the model's list of layers.

The new model with the addition of a new block and output layer is defined as follows:

...
old_model = model
# get the end of the last block
block_end = old_model.layers[-2].output
# upsample, and define new block
upsampling = UpSampling2D()(block_end)
g = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(upsampling)
g = BatchNormalization()(g)
g = LeakyReLU(alpha=0.2)(g)
g = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(g)
g = BatchNormalization()(g)
g = LeakyReLU(alpha=0.2)(g)
# add new output layer
out_image = Conv2D(3, (1,1), padding='same', kernel_initializer='he_normal')(g)
# define model
model = Model(old_model.input, out_image)

That is pretty straightforward; we have chopped off the old output layer at the end of the last block and grafted on a new block and output layer.

Now we need a version of this new model to use during the fade-in.

This involves connecting the old output layer to the new upsampling layer at the start of the new block and using an instance of our WeightedSum layer defined in the previous section to combine the output of the old and new output layers.

...
# get the output layer from old model
out_old = old_model.layers[-1]
# connect the upsampling to the old output layer
out_image2 = out_old(upsampling)
# define new output image as the weighted sum of the old and new models
merged = WeightedSum()([out_image2, out_image])
# define model
model2 = Model(old_model.input, merged)

We can combine the definition of these two operations into a function named add_generator_block(), defined below, that will expand a given model and return both the new generator model with the added block (model1) and a version that fades in the new block alongside the old output layer (model2).

# add a generator block
def add_generator_block(old_model):
	# get the end of the last block
	block_end = old_model.layers[-2].output
	# upsample, and define new block
	upsampling = UpSampling2D()(block_end)
	g = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(upsampling)
	g = BatchNormalization()(g)
	g = LeakyReLU(alpha=0.2)(g)
	g = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(g)
	g = BatchNormalization()(g)
	g = LeakyReLU(alpha=0.2)(g)
	# add new output layer
	out_image = Conv2D(3, (1,1), padding='same', kernel_initializer='he_normal')(g)
	# define model
	model1 = Model(old_model.input, out_image)
	# get the output layer from old model
	out_old = old_model.layers[-1]
	# connect the upsampling to the old output layer
	out_image2 = out_old(upsampling)
	# define new output image as the weighted sum of the old and new models
	merged = WeightedSum()([out_image2, out_image])
	# define model
	model2 = Model(old_model.input, merged)
	return [model1, model2]

We can then call this function with our baseline model to create models with one added block and continue to call it with subsequent models to keep adding blocks.

The define_generator() function below implements this, taking the size of the latent space and number of blocks to add (models to create).

The baseline model is defined as outputting a color image with the shape 4×4, controlled by the default argument in_dim.

# define generator models
def define_generator(latent_dim, n_blocks, in_dim=4):
	model_list = list()
	# base model latent input
	in_latent = Input(shape=(latent_dim,))
	# linear scale up to activation maps
	g = Dense(128 * in_dim * in_dim, kernel_initializer='he_normal')(in_latent)
	g = Reshape((in_dim, in_dim, 128))(g)
	# conv 4x4, input block
	g = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(g)
	g = BatchNormalization()(g)
	g = LeakyReLU(alpha=0.2)(g)
	# conv 3x3
	g = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(g)
	g = BatchNormalization()(g)
	g = LeakyReLU(alpha=0.2)(g)
	# conv 1x1, output block
	out_image = Conv2D(3, (1,1), padding='same', kernel_initializer='he_normal')(g)
	# define model
	model = Model(in_latent, out_image)
	# store model
	model_list.append([model, model])
	# create submodels
	for i in range(1, n_blocks):
		# get prior model without the fade-in
		old_model = model_list[i - 1][0]
		# create new model for next resolution
		models = add_generator_block(old_model)
		# store model
		model_list.append(models)
	return model_list

We can tie all of this together and define a baseline generator and the addition of two blocks, so three models in total, where a straight-through and fade-in version of each model is defined.

The complete example is listed below.

# example of defining generator models for the progressive growing gan
from keras.models import Model
from keras.layers import Input
from keras.layers import Dense
from keras.layers import Reshape
from keras.layers import Conv2D
from keras.layers import UpSampling2D
from keras.layers import LeakyReLU
from keras.layers import BatchNormalization
from keras.layers import Add
from keras.utils.vis_utils import plot_model
from keras import backend

# weighted sum output
class WeightedSum(Add):
	# init with default value
	def __init__(self, alpha=0.0, **kwargs):
		super(WeightedSum, self).__init__(**kwargs)
		self.alpha = backend.variable(alpha, name='ws_alpha')

	# output a weighted sum of inputs
	def _merge_function(self, inputs):
		# only supports a weighted sum of two inputs
		assert (len(inputs) == 2)
		# ((1-a) * input1) + (a * input2)
		output = ((1.0 - self.alpha) * inputs[0]) + (self.alpha * inputs[1])
		return output

# add a generator block
def add_generator_block(old_model):
	# get the end of the last block
	block_end = old_model.layers[-2].output
	# upsample, and define new block
	upsampling = UpSampling2D()(block_end)
	g = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(upsampling)
	g = BatchNormalization()(g)
	g = LeakyReLU(alpha=0.2)(g)
	g = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(g)
	g = BatchNormalization()(g)
	g = LeakyReLU(alpha=0.2)(g)
	# add new output layer
	out_image = Conv2D(3, (1,1), padding='same', kernel_initializer='he_normal')(g)
	# define model
	model1 = Model(old_model.input, out_image)
	# get the output layer from old model
	out_old = old_model.layers[-1]
	# connect the upsampling to the old output layer
	out_image2 = out_old(upsampling)
	# define new output image as the weighted sum of the old and new models
	merged = WeightedSum()([out_image2, out_image])
	# define model
	model2 = Model(old_model.input, merged)
	return [model1, model2]

# define generator models
def define_generator(latent_dim, n_blocks, in_dim=4):
	model_list = list()
	# base model latent input
	in_latent = Input(shape=(latent_dim,))
	# linear scale up to activation maps
	g = Dense(128 * in_dim * in_dim, kernel_initializer='he_normal')(in_latent)
	g = Reshape((in_dim, in_dim, 128))(g)
	# conv 4x4, input block
	g = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(g)
	g = BatchNormalization()(g)
	g = LeakyReLU(alpha=0.2)(g)
	# conv 3x3
	g = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(g)
	g = BatchNormalization()(g)
	g = LeakyReLU(alpha=0.2)(g)
	# conv 1x1, output block
	out_image = Conv2D(3, (1,1), padding='same', kernel_initializer='he_normal')(g)
	# define model
	model = Model(in_latent, out_image)
	# store model
	model_list.append([model, model])
	# create submodels
	for i in range(1, n_blocks):
		# get prior model without the fade-in
		old_model = model_list[i - 1][0]
		# create new model for next resolution
		models = add_generator_block(old_model)
		# store model
		model_list.append(models)
	return model_list

# define models
generators = define_generator(100, 3)
# spot check
m = generators[2][1]
m.summary()
plot_model(m, to_file='generator_plot.png', show_shapes=True, show_layer_names=True)

The example chooses the fade-in version of the final model to summarize.

Running the example first summarizes a linear list of the layers in the model. We can see that the last model takes a point from the latent space and outputs a 16×16 image.

This matches our expectations, as the baseline model outputs a 4×4 image; adding one block increases this to 8×8, and adding one more block increases it to 16×16.

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            (None, 100)          0
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 2048)         206848      input_1[0][0]
__________________________________________________________________________________________________
reshape_1 (Reshape)             (None, 4, 4, 128)    0           dense_1[0][0]
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 4, 4, 128)    147584      reshape_1[0][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 4, 4, 128)    512         conv2d_1[0][0]
__________________________________________________________________________________________________
leaky_re_lu_1 (LeakyReLU)       (None, 4, 4, 128)    0           batch_normalization_1[0][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 4, 4, 128)    147584      leaky_re_lu_1[0][0]
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 4, 4, 128)    512         conv2d_2[0][0]
__________________________________________________________________________________________________
leaky_re_lu_2 (LeakyReLU)       (None, 4, 4, 128)    0           batch_normalization_2[0][0]
__________________________________________________________________________________________________
up_sampling2d_1 (UpSampling2D)  (None, 8, 8, 128)    0           leaky_re_lu_2[0][0]
__________________________________________________________________________________________________
conv2d_4 (Conv2D)               (None, 8, 8, 64)     73792       up_sampling2d_1[0][0]
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, 8, 8, 64)     256         conv2d_4[0][0]
__________________________________________________________________________________________________
leaky_re_lu_3 (LeakyReLU)       (None, 8, 8, 64)     0           batch_normalization_3[0][0]
__________________________________________________________________________________________________
conv2d_5 (Conv2D)               (None, 8, 8, 64)     36928       leaky_re_lu_3[0][0]
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 8, 8, 64)     256         conv2d_5[0][0]
__________________________________________________________________________________________________
leaky_re_lu_4 (LeakyReLU)       (None, 8, 8, 64)     0           batch_normalization_4[0][0]
__________________________________________________________________________________________________
up_sampling2d_2 (UpSampling2D)  (None, 16, 16, 64)   0           leaky_re_lu_4[0][0]
__________________________________________________________________________________________________
conv2d_7 (Conv2D)               (None, 16, 16, 64)   36928       up_sampling2d_2[0][0]
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, 16, 16, 64)   256         conv2d_7[0][0]
__________________________________________________________________________________________________
leaky_re_lu_5 (LeakyReLU)       (None, 16, 16, 64)   0           batch_normalization_5[0][0]
__________________________________________________________________________________________________
conv2d_8 (Conv2D)               (None, 16, 16, 64)   36928       leaky_re_lu_5[0][0]
__________________________________________________________________________________________________
batch_normalization_6 (BatchNor (None, 16, 16, 64)   256         conv2d_8[0][0]
__________________________________________________________________________________________________
leaky_re_lu_6 (LeakyReLU)       (None, 16, 16, 64)   0           batch_normalization_6[0][0]
__________________________________________________________________________________________________
conv2d_6 (Conv2D)               multiple             195         up_sampling2d_2[0][0]
__________________________________________________________________________________________________
conv2d_9 (Conv2D)               (None, 16, 16, 3)    195         leaky_re_lu_6[0][0]
__________________________________________________________________________________________________
weighted_sum_2 (WeightedSum)    (None, 16, 16, 3)    0           conv2d_6[1][0]
                                                                 conv2d_9[0][0]
==================================================================================================
Total params: 689,030
Trainable params: 688,006
Non-trainable params: 1,024
__________________________________________________________________________________________________

A plot of the same fade-in version of the model is created and saved to file.

Note: creating this plot assumes that the pygraphviz and pydot libraries are installed. If this is a problem, comment out the import statement and call to plot_model().

We can see that the output from the last block passes through an UpSampling2D layer, which feeds both the new block with its new output layer and the old output layer, before the two output images are merged via a weighted sum to produce the final output.

Plot of the Fade-In Generator Model for the Progressive Growing GAN Transitioning From 8×8 to 16×16 Output Images

Now that we have seen how to define the generator models, we can review how the generator models may be updated via the discriminator models.

How to Implement Composite Models for Updating the Generator

The discriminator models are trained directly with real and fake images as input and a target value of 0 for fake and 1 for real.

The generator models are not trained directly; instead, they are trained indirectly via the discriminator models, just like a normal GAN model.

We can create a composite model for each level of growth of the model, e.g. pair 4×4 generators and 4×4 discriminators. We can also pair the straight-through models together, and the fade-in models together.

For example, we can retrieve the generator and discriminator models for a given level of growth.

...
g_models, d_models = generators[0], discriminators[0]

Then we can use them to create a composite model for training the straight-through generator, where the output of the generator is fed directly into the discriminator for classification.

# straight-through model
d_models[0].trainable = False
model1 = Sequential()
model1.add(g_models[0])
model1.add(d_models[0])
model1.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))

And do the same for the composite model for the fade-in generator.

# fade-in model
d_models[1].trainable = False
model2 = Sequential()
model2.add(g_models[1])
model2.add(d_models[1])
model2.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))

The function below, named define_composite(), automates this; given a list of defined discriminator and generator models, it will create an appropriate composite model for training each generator model.

# define composite models for training generators via discriminators
def define_composite(discriminators, generators):
	model_list = list()
	# create composite models
	for i in range(len(discriminators)):
		g_models, d_models = generators[i], discriminators[i]
		# straight-through model
		d_models[0].trainable = False
		model1 = Sequential()
		model1.add(g_models[0])
		model1.add(d_models[0])
		model1.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
		# fade-in model
		d_models[1].trainable = False
		model2 = Sequential()
		model2.add(g_models[1])
		model2.add(d_models[1])
		model2.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
		# store
		model_list.append([model1, model2])
	return model_list

Tying this together with the definition of the discriminator and generator models above, the complete example of defining all models at each pre-defined level of growth is listed below.

# example of defining composite models for the progressive growing gan
from keras.optimizers import Adam
from keras.models import Sequential
from keras.models import Model
from keras.layers import Input
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import Reshape
from keras.layers import Conv2D
from keras.layers import UpSampling2D
from keras.layers import AveragePooling2D
from keras.layers import LeakyReLU
from keras.layers import BatchNormalization
from keras.layers import Add
from keras.utils.vis_utils import plot_model
from keras import backend

# weighted sum output
class WeightedSum(Add):
	# init with default value
	def __init__(self, alpha=0.0, **kwargs):
		super(WeightedSum, self).__init__(**kwargs)
		self.alpha = backend.variable(alpha, name='ws_alpha')

	# output a weighted sum of inputs
	def _merge_function(self, inputs):
		# only supports a weighted sum of two inputs
		assert (len(inputs) == 2)
		# ((1-a) * input1) + (a * input2)
		output = ((1.0 - self.alpha) * inputs[0]) + (self.alpha * inputs[1])
		return output

# add a discriminator block
def add_discriminator_block(old_model, n_input_layers=3):
	# get shape of existing model
	in_shape = list(old_model.input.shape)
	# define new input shape as double the size
	input_shape = (in_shape[-2].value*2, in_shape[-2].value*2, in_shape[-1].value)
	in_image = Input(shape=input_shape)
	# define new input processing layer
	d = Conv2D(64, (1,1), padding='same', kernel_initializer='he_normal')(in_image)
	d = LeakyReLU(alpha=0.2)(d)
	# define new block
	d = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)
	d = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)
	d = AveragePooling2D()(d)
	block_new = d
	# skip the input, 1x1 and activation for the old model
	for i in range(n_input_layers, len(old_model.layers)):
		d = old_model.layers[i](d)
	# define straight-through model
	model1 = Model(in_image, d)
	# compile model
	model1.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
	# downsample the new larger image
	downsample = AveragePooling2D()(in_image)
	# connect old input processing to downsampled new input
	block_old = old_model.layers[1](downsample)
	block_old = old_model.layers[2](block_old)
	# fade in output of old model input layer with new input
	d = WeightedSum()([block_old, block_new])
	# skip the input, 1x1 and activation for the old model
	for i in range(n_input_layers, len(old_model.layers)):
		d = old_model.layers[i](d)
	# define fade-in model
	model2 = Model(in_image, d)
	# compile model
	model2.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
	return [model1, model2]

# define the discriminator models for each image resolution
def define_discriminator(n_blocks, input_shape=(4,4,3)):
	model_list = list()
	# base model input
	in_image = Input(shape=input_shape)
	# conv 1x1
	d = Conv2D(64, (1,1), padding='same', kernel_initializer='he_normal')(in_image)
	d = LeakyReLU(alpha=0.2)(d)
	# conv 3x3 (output block)
	d = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)
	# conv 4x4
	d = Conv2D(128, (4,4), padding='same', kernel_initializer='he_normal')(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)
	# dense output layer
	d = Flatten()(d)
	out_class = Dense(1)(d)
	# define model
	model = Model(in_image, out_class)
	# compile model
	model.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
	# store model
	model_list.append([model, model])
	# create submodels
	for i in range(1, n_blocks):
		# get prior model without the fade-in
		old_model = model_list[i - 1][0]
		# create new model for next resolution
		models = add_discriminator_block(old_model)
		# store model
		model_list.append(models)
	return model_list

# add a generator block
def add_generator_block(old_model):
	# get the end of the last block
	block_end = old_model.layers[-2].output
	# upsample, and define new block
	upsampling = UpSampling2D()(block_end)
	g = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(upsampling)
	g = BatchNormalization()(g)
	g = LeakyReLU(alpha=0.2)(g)
	g = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(g)
	g = BatchNormalization()(g)
	g = LeakyReLU(alpha=0.2)(g)
	# add new output layer
	out_image = Conv2D(3, (1,1), padding='same', kernel_initializer='he_normal')(g)
	# define model
	model1 = Model(old_model.input, out_image)
	# get the output layer from old model
	out_old = old_model.layers[-1]
	# connect the upsampling to the old output layer
	out_image2 = out_old(upsampling)
	# define new output image as the weighted sum of the old and new models
	merged = WeightedSum()([out_image2, out_image])
	# define model
	model2 = Model(old_model.input, merged)
	return [model1, model2]

# define generator models
def define_generator(latent_dim, n_blocks, in_dim=4):
	model_list = list()
	# base model latent input
	in_latent = Input(shape=(latent_dim,))
	# linear scale up to activation maps
	g = Dense(128 * in_dim * in_dim, kernel_initializer='he_normal')(in_latent)
	g = Reshape((in_dim, in_dim, 128))(g)
	# conv 4x4, input block
	g = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(g)
	g = BatchNormalization()(g)
	g = LeakyReLU(alpha=0.2)(g)
	# conv 3x3
	g = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(g)
	g = BatchNormalization()(g)
	g = LeakyReLU(alpha=0.2)(g)
	# conv 1x1, output block
	out_image = Conv2D(3, (1,1), padding='same', kernel_initializer='he_normal')(g)
	# define model
	model = Model(in_latent, out_image)
	# store model
	model_list.append([model, model])
	# create submodels
	for i in range(1, n_blocks):
		# get prior model without the fade-in
		old_model = model_list[i - 1][0]
		# create new model for next resolution
		models = add_generator_block(old_model)
		# store model
		model_list.append(models)
	return model_list

# define composite models for training generators via discriminators
def define_composite(discriminators, generators):
	model_list = list()
	# create composite models
	for i in range(len(discriminators)):
		g_models, d_models = generators[i], discriminators[i]
		# straight-through model
		d_models[0].trainable = False
		model1 = Sequential()
		model1.add(g_models[0])
		model1.add(d_models[0])
		model1.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
		# fade-in model
		d_models[1].trainable = False
		model2 = Sequential()
		model2.add(g_models[1])
		model2.add(d_models[1])
		model2.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
		# store
		model_list.append([model1, model2])
	return model_list

# define models
discriminators = define_discriminator(3)
# define models
generators = define_generator(100, 3)
# define composite models
composite = define_composite(discriminators, generators)

Now that we know how to define all of the models, we can review how the models might be updated during training.

How to Train Discriminator and Generator Models

Pre-defining the generator, discriminator, and composite models was the hard part; training the models is straightforward and much like training any other GAN.

Importantly, in each training iteration the alpha variable in each WeightedSum layer must be set to a new value. The value must be set in the WeightedSum layers of both the generator and discriminator models, allowing a smooth linear transition from the old model layers to the new model layers, e.g. alpha values moving from 0 to 1 over a fixed number of training iterations.

The update_fadein() function below implements this and will loop through a list of models and set the alpha value on each based on the current step in a given number of training steps. You may be able to implement this more elegantly using a callback.

# update the alpha value on each instance of WeightedSum
def update_fadein(models, step, n_steps):
	# calculate current alpha (linear from 0 to 1)
	alpha = step / float(n_steps - 1)
	# update the alpha for each model
	for model in models:
		for layer in model.layers:
			if isinstance(layer, WeightedSum):
				backend.set_value(layer.alpha, alpha)
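As an aside, the same fade-in schedule could be wrapped in a Keras callback, as suggested above. A minimal sketch is shown below; it reuses the WeightedSum layer and the Keras backend imported earlier, and note that callbacks are only invoked when training via fit(), not train_on_batch(), so the loop-based update_fadein() above remains the simpler fit for this tutorial.

# a sketch of the fade-in schedule as a Keras callback (only useful when training with fit())
from keras.callbacks import Callback
from keras import backend

class FadeIn(Callback):
	def __init__(self, n_steps):
		super(FadeIn, self).__init__()
		self.n_steps = n_steps
		self.step = 0

	def on_batch_begin(self, batch, logs=None):
		# linearly increase alpha from 0 to 1 over n_steps batches
		alpha = min(self.step / float(self.n_steps - 1), 1.0)
		for layer in self.model.layers:
			if isinstance(layer, WeightedSum):
				backend.set_value(layer.alpha, alpha)
		self.step += 1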

We can define a generic function for training a given generator, discriminator, and composite model for a given number of training epochs.

The train_epochs() function below implements this where first the discriminator model is updated on real and fake images, then the generator model is updated, and the process is repeated for the required number of training iterations based on the dataset size and the number of epochs.

This function calls helper functions for retrieving a batch of real images via generate_real_samples(), generating a batch of fake samples with the generator via generate_fake_samples(), and generating a sample of points in latent space via generate_latent_points(). You can define these functions yourself quite trivially; minimal sketches are provided after the listing below.

# train a generator and discriminator
def train_epochs(g_model, d_model, gan_model, dataset, n_epochs, n_batch, fadein=False):
	# calculate the number of batches per training epoch
	bat_per_epo = int(dataset.shape[0] / n_batch)
	# calculate the number of training iterations
	n_steps = bat_per_epo * n_epochs
	# calculate the size of half a batch of samples
	half_batch = int(n_batch / 2)
	# manually enumerate epochs
	for i in range(n_steps):
		# update alpha for all WeightedSum layers when fading in new blocks
		if fadein:
			update_fadein([g_model, d_model, gan_model], i, n_steps)
		# prepare real and fake samples
		X_real, y_real = generate_real_samples(dataset, half_batch)
		X_fake, y_fake = generate_fake_samples(g_model, latent_dim, half_batch)
		# update discriminator model
		d_loss1 = d_model.train_on_batch(X_real, y_real)
		d_loss2 = d_model.train_on_batch(X_fake, y_fake)
		# update the generator via the discriminator's error
		z_input = generate_latent_points(latent_dim, n_batch)
		y_real2 = ones((n_batch, 1))
		g_loss = gan_model.train_on_batch(z_input, y_real2)
		# summarize loss on this batch
		print('>%d, d1=%.3f, d2=%.3f g=%.3f' % (i+1, d_loss1, d_loss2, g_loss))
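The three helper functions are not defined in this tutorial. Minimal sketches might look like the following, assuming pixel values already scaled to [-1,1] and class labels of 1 for real and 0 for fake images, consistent with the least squares (mse) loss used when compiling the models.

# minimal sketches of the helper functions used by train_epochs()
from numpy import ones, zeros
from numpy.random import randn, randint

# select a random batch of real images and 'real' class labels (1)
def generate_real_samples(dataset, n_samples):
	ix = randint(0, dataset.shape[0], n_samples)
	X = dataset[ix]
	y = ones((n_samples, 1))
	return X, y

# generate random points in the latent space as input for the generator
def generate_latent_points(latent_dim, n_samples):
	x_input = randn(latent_dim * n_samples)
	return x_input.reshape(n_samples, latent_dim)

# use the generator to create fake images with 'fake' class labels (0)
def generate_fake_samples(g_model, latent_dim, n_samples):
	x_input = generate_latent_points(latent_dim, n_samples)
	X = g_model.predict(x_input)
	y = zeros((n_samples, 1))
	return X, y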

The images must be scaled to the size of each model. If the images are in-memory, we can define a simple scale_dataset() function to scale the loaded images.

In this case, we use the skimage.transform.resize() function from the scikit-image library to resize the NumPy array of pixels to the required size, using nearest neighbor interpolation.

# scale images to preferred size
from numpy import asarray
from skimage.transform import resize

def scale_dataset(images, new_shape):
	images_list = list()
	for image in images:
		# resize with nearest neighbor interpolation (order=0)
		new_image = resize(image, new_shape, 0)
		# store
		images_list.append(new_image)
	return asarray(images_list)

First, the baseline model must be fit for a given number of training epochs, e.g. the model that outputs 4×4 sized images.

This will require that the loaded images be scaled to the required size defined by the shape of the generator model's output layer.

# fit the baseline model
g_normal, d_normal, gan_normal = g_models[0][0], d_models[0][0], gan_models[0][0]
# scale dataset to appropriate size
gen_shape = g_normal.output_shape
scaled_data = scale_dataset(dataset, gen_shape[1:])
print('Scaled Data', scaled_data.shape)
# train normal or straight-through models
train_epochs(g_normal, d_normal, gan_normal, scaled_data, e_norm, n_batch)

We can then process each level of growth, e.g. the first being 8×8.

This involves first retrieving the models, scaling the data to the appropriate size, then fitting the fade-in model followed by training the straight-through version of the model for fine tuning.

We can repeat this for each level of growth in a loop.

# process each level of growth
for i in range(1, len(g_models)):
	# retrieve models for this level of growth
	[g_normal, g_fadein] = g_models[i]
	[d_normal, d_fadein] = d_models[i]
	[gan_normal, gan_fadein] = gan_models[i]
	# scale dataset to appropriate size
	gen_shape = g_normal.output_shape
	scaled_data = scale_dataset(dataset, gen_shape[1:])
	print('Scaled Data', scaled_data.shape)
	# train fade-in models for next level of growth
	train_epochs(g_fadein, d_fadein, gan_fadein, scaled_data, e_fadein, n_batch, True)
	# train normal or straight-through models
	train_epochs(g_normal, d_normal, gan_normal, scaled_data, e_norm, n_batch)

We can tie this together and define a function called train() that implements the complete training process for the progressive growing GAN models.

# train the generator and discriminator
def train(g_models, d_models, gan_models, dataset, latent_dim, e_norm, e_fadein, n_batch):
	# fit the baseline model
	g_normal, d_normal, gan_normal = g_models[0][0], d_models[0][0], gan_models[0][0]
	# scale dataset to appropriate size
	gen_shape = g_normal.output_shape
	scaled_data = scale_dataset(dataset, gen_shape[1:])
	print('Scaled Data', scaled_data.shape)
	# train normal or straight-through models
	train_epochs(g_normal, d_normal, gan_normal, scaled_data, e_norm, n_batch)
	# process each level of growth
	for i in range(1, len(g_models)):
		# retrieve models for this level of growth
		[g_normal, g_fadein] = g_models[i]
		[d_normal, d_fadein] = d_models[i]
		[gan_normal, gan_fadein] = gan_models[i]
		# scale dataset to appropriate size
		gen_shape = g_normal.output_shape
		scaled_data = scale_dataset(dataset, gen_shape[1:])
		print('Scaled Data', scaled_data.shape)
		# train fade-in models for next level of growth
		train_epochs(g_fadein, d_fadein, gan_fadein, scaled_data, e_fadein, n_batch, True)
		# train normal or straight-through models
		train_epochs(g_normal, d_normal, gan_normal, scaled_data, e_norm, n_batch)

The number of epochs for the normal phase is defined by the e_norm argument and the number of epochs during the fade-in phase is defined by the e_fadein argument.

The number of epochs should be chosen based on the size of the image dataset; the same number of epochs can be used for the fade-in and normal phases, as was done in the paper.

We start with 4×4 resolution and train the networks until we have shown the discriminator 800k real images in total. We then alternate between two phases: fade in the first 3-layer block during the next 800k images, stabilize the networks for 800k images, fade in the next 3-layer block during 800k images, etc.

Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017.
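For example, matching the paper's schedule of 800k real images per phase to a hypothetical dataset of 30,000 images works out to roughly 26 epochs for both e_norm and e_fadein; a small, stand-alone calculation:

# rough epoch count to match 800k images per phase (30,000 is a hypothetical dataset size)
n_epochs = 800000 // 30000
print(n_epochs)  # 26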

We can then define our models as we did in the previous section, then call the training function.

# number of growth phases, e.g. 3 = 16x16 images
n_blocks = 3
# size of the latent space
latent_dim = 100
# define models
d_models = define_discriminator(n_blocks)
# define models
g_models = define_generator(latent_dim, n_blocks)
# define composite models
gan_models = define_composite(d_models, g_models)
# load image data
dataset = load_real_samples()
# train model
train(g_models, d_models, gan_models, dataset, latent_dim, 100, 100, 16)
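The load_real_samples() function is not defined here. A minimal sketch is given below, assuming the training images have been saved to a compressed NumPy archive (the filename is hypothetical) and scaling pixel values to [-1,1], a common convention for GAN image data.

# minimal sketch of load_real_samples(), assuming an .npz archive of uint8 images
from numpy import load

def load_real_samples(filename='images.npz'):
	# load the images from the archive (assumes the default 'arr_0' key from numpy.savez)
	data = load(filename)
	X = data['arr_0'].astype('float32')
	# scale pixel values from [0,255] to [-1,1]
	X = (X - 127.5) / 127.5
	return X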


Summary

In this tutorial, you discovered how to develop progressive growing generative adversarial network models from scratch with Keras.

Specifically, you learned:

  • How to develop pre-defined discriminator and generator models at each level of output image growth.
  • How to define composite models for training the generator models via the discriminator models.
  • How to cycle the training of fade-in version and normal versions of models at each level of output image growth.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

How to Implement CycleGAN Models From Scratch With Keras

The Cycle Generative Adversarial Network, or CycleGAN for short, is a generator model for translating images from one domain to another domain.

For example, the model can be used to translate images of horses to images of zebras, or photographs of city landscapes at night to city landscapes during the day.

The benefit of the CycleGAN model is that it can be trained without paired examples. That is, it does not require examples of photographs before and after the translation in order to train the model, e.g. photos of the same city landscape during the day and at night. Instead, it is able to use a collection of photographs from each domain and extract and harness the underlying style of images in the collection in order to perform the translation.

The model is very impressive but has an architecture that appears quite complicated to implement for beginners.

In this tutorial, you will discover how to implement the CycleGAN architecture from scratch using the Keras deep learning framework.

After completing this tutorial, you will know:

  • How to implement the discriminator and generator models.
  • How to define composite models to train the generator models via adversarial and cycle loss.
  • How to implement the training process to update model weights each training iteration.

Discover how to develop DCGANs, conditional GANs, Pix2Pix, CycleGANs, and more with Keras in my new GANs book, with 29 step-by-step tutorials and full source code.

Let’s get started.

How to Develop CycleGAN Models From Scratch With Keras

How to Develop CycleGAN Models From Scratch With Keras
Photo by anokarina, some rights reserved.

Tutorial Overview

This tutorial is divided into five parts; they are:

  1. What Is the CycleGAN Architecture?
  2. How to Implement the CycleGAN Discriminator Model
  3. How to Implement the CycleGAN Generator Model
  4. How to Implement Composite Models for Least Squares and Cycle Loss
  5. How to Update Discriminator and Generator Models

What Is the CycleGAN Architecture?

The CycleGAN model was described by Jun-Yan Zhu, et al. in their 2017 paper titled “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks.”

The model architecture is comprised of two generator models: one generator (Generator-A) for generating images for the first domain (Domain-A) and a second generator (Generator-B) for generating images for the second domain (Domain-B).

  • Generator-A -> Domain-A
  • Generator-B -> Domain-B

The generator models perform image translation, meaning that the image generation process is conditional on an input image, specifically an image from the other domain. Generator-A takes an image from Domain-B as input and Generator-B takes an image from Domain-A as input.

  • Domain-B -> Generator-A -> Domain-A
  • Domain-A -> Generator-B -> Domain-B

Each generator has a corresponding discriminator model.

The first discriminator model (Discriminator-A) takes real images from Domain-A and generated images from Generator-A and predicts whether they are real or fake. The second discriminator model (Discriminator-B) takes real images from Domain-B and generated images from Generator-B and predicts whether they are real or fake.

  • Domain-A -> Discriminator-A -> [Real/Fake]
  • Domain-B -> Generator-A -> Discriminator-A -> [Real/Fake]
  • Domain-B -> Discriminator-B -> [Real/Fake]
  • Domain-A -> Generator-B -> Discriminator-B -> [Real/Fake]

The discriminator and generator models are trained in an adversarial zero-sum process, like normal GAN models.

The generators learn to better fool the discriminators and the discriminators learn to better detect fake images. Together, the models find an equilibrium during the training process.

Additionally, the generator models are regularized not just to create new images in the target domain, but instead to create translated versions of the input images from the source domain. This is achieved by using generated images as input to the corresponding generator model and comparing the output image to the original image.

Passing an image through both generators is called a cycle. Together, each pair of generator models is trained to better reproduce the original source image, a property referred to as cycle consistency.

  • Domain-B -> Generator-A -> Domain-A -> Generator-B -> Domain-B
  • Domain-A -> Generator-B -> Domain-B -> Generator-A -> Domain-A

There is one further element to the architecture referred to as the identity mapping.

This is where a generator is provided with images as input from the target domain and is expected to generate the same image without change. This addition to the architecture is optional, although it results in a better matching of the color profile of the input image.

  • Domain-A -> Generator-A -> Domain-A
  • Domain-B -> Generator-B -> Domain-B

Now that we are familiar with the model architecture, we can take a closer look at each model in turn and how they can be implemented.

The paper provides a good description of the models and training process, although the official Torch implementation was used as the definitive description of each model and training process and provides the basis for the model implementations described below.


How to Implement the CycleGAN Discriminator Model

The discriminator model is responsible for taking a real or generated image as input and predicting whether it is real or fake.

The discriminator model is implemented as a PatchGAN model.

For the discriminator networks we use 70 × 70 PatchGANs, which aim to classify whether 70 × 70 overlapping image patches are real or fake.

Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, 2017.

The PatchGAN was described in the 2016 paper titled “Precomputed Real-time Texture Synthesis With Markovian Generative Adversarial Networks” and was used in the pix2pix model for image translation described in the 2016 paper titled “Image-to-Image Translation with Conditional Adversarial Networks.”

The architecture is described as discriminating an input image as real or fake by averaging the predictions for n×n squares, or patches, of the source image.

… we design a discriminator architecture – which we term a PatchGAN – that only penalizes structure at the scale of patches. This discriminator tries to classify if each NxN patch in an image is real or fake. We run this discriminator convolutionally across the image, averaging all responses to provide the ultimate output of D.

Image-to-Image Translation with Conditional Adversarial Networks, 2016.

This can be implemented directly by using a somewhat standard deep convolutional discriminator model.

Instead of outputting a single value like a traditional discriminator model, the PatchGAN discriminator model outputs a square, one-channel feature map of predictions. The 70×70 refers to the effective receptive field of the model on the input, not the actual shape of the output feature map.

The receptive field of a convolutional layer refers to the region of the layer's input that a single output value maps to. The effective receptive field refers to the mapping of one pixel in the output of a deep convolutional model (multiple layers) back to the input image. Here, the PatchGAN is an approach to designing a deep convolutional network based on the effective receptive field, where one output activation of the model maps to a 70×70 patch of the input image, regardless of the size of the input image.

The PatchGAN has the effect of predicting whether each 70×70 patch in the input image is real or fake. These predictions can then be averaged to give the output of the model (if needed) or compared directly to a matrix (or a vector if flattened) of expected values (e.g. 0 or 1 values).
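If a single real/fake score per image is needed, the patch predictions can simply be averaged. A minimal sketch is below, where d_model and images are placeholders for a PatchGAN discriminator like the one defined later and a batch of input images.

# optional: reduce PatchGAN patch predictions to one score per image
patch_preds = d_model.predict(images)       # e.g. shape (n, 16, 16, 1)
scores = patch_preds.mean(axis=(1, 2, 3))   # shape (n,)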

The discriminator model described in the paper takes 256×256 color images as input and defines an explicit architecture that is used on all of the test problems. The architecture uses blocks of Conv2D-InstanceNorm-LeakyReLU layers, with 4×4 filters and a 2×2 stride.

Let Ck denote a 4×4 Convolution-InstanceNorm-LeakyReLU layer with k filters and stride 2. After the last layer, we apply a convolution to produce a 1-dimensional output. We do not use InstanceNorm for the first C64 layer. We use leaky ReLUs with a slope of 0.2.

Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, 2017.

The architecture for the discriminator is as follows:

  • C64-C128-C256-C512

This is referred to as a 3-layer PatchGAN in the CycleGAN and Pix2Pix nomenclature, as excluding the first hidden layer, the model has three hidden layers that could be scaled up or down to give different sized PatchGAN models.

Not listed in the paper, the model also has a final hidden layer C512 with a 1×1 stride, and an output layer C1, also with a 1×1 stride with a linear activation function. Given the model is mostly used with 256×256 sized images as input, the size of the output feature map of activations is 16×16. If 128×128 images were used as input, then the size of the output feature map of activations would be 8×8.
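As a quick sanity check, the output activation map size follows directly from the four stride-2 downsampling layers in the model; a small, stand-alone calculation (not part of the tutorial code):

# four stride-2 layers reduce the input size by a factor of 2**4
for size in (256, 128):
	print('%dx%d input -> %dx%d output' % (size, size, size // 2**4, size // 2**4))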

The model does not use batch normalization; instead, instance normalization is used.

Instance normalization was described in the 2016 paper titled “Instance Normalization: The Missing Ingredient for Fast Stylization.” It is a very simple type of normalization and involves standardizing (e.g. scaling to a standard Gaussian) the values on each feature map.

The intent is to remove image-specific contrast information from the image during image generation, resulting in better generated images.

The key idea is to replace batch normalization layers in the generator architecture with instance normalization layers, and to keep them at test time (as opposed to freeze and simplify them out as done for batch normalization). Intuitively, the normalization process allows to remove instance-specific contrast information from the content image, which simplifies generation. In practice, this results in vastly improved images.

Instance Normalization: The Missing Ingredient for Fast Stylization, 2016.

Although designed for generator models, it can also prove effective in discriminator models.
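As a rough illustration of what the layer does, instance normalization standardizes each feature map of each image independently. A minimal NumPy sketch is below, ignoring the layer's learned scale and offset parameters.

# standardize each feature map of each image independently (the core of instance norm)
import numpy as np
x = np.random.rand(2, 4, 4, 3)               # 2 images, 4x4, 3 feature maps
mean = x.mean(axis=(1, 2), keepdims=True)    # per-image, per-channel mean
std = x.std(axis=(1, 2), keepdims=True)      # per-image, per-channel std
x_norm = (x - mean) / (std + 1e-5)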

An implementation of instance normalization is provided in the keras-contrib project that provides early access to community-supplied Keras features.

The keras-contrib library can be installed via pip as follows:

sudo pip install git+https://www.github.com/keras-team/keras-contrib.git

Or, if you are using an Anaconda virtual environment, such as on EC2:

git clone https://www.github.com/keras-team/keras-contrib.git cd keras-contrib sudo ~/anaconda3/envs/tensorflow_p36/bin/python setup.py install

The new InstanceNormalization layer can then be used as follows:

... from keras_contrib.layers.normalization.instancenormalization import InstanceNormalization # define layer layer = InstanceNormalization(axis=-1) ...

The “axis” argument is set to -1 to ensure that features are normalized per feature map.

The network weights are initialized to Gaussian random numbers with a standard deviation of 0.02, as is described for DCGANs more generally.

Weights are initialized from a Gaussian distribution N (0, 0.02).

Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, 2017.

The discriminator model is updated using a least squares loss (L2), a so-called Least-Squared Generative Adversarial Network, or LSGAN.

… we replace the negative log likelihood objective by a least-squares loss. This loss is more stable during training and generates higher quality results.

Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, 2017.

This can be implemented using “mean squared error” between the target values of class=1 for real images and class=0 for fake images.

Additionally, the paper suggests dividing the loss for the discriminator by half during training, in an effort to slow down updates to the discriminator relative to the generator.

In practice, we divide the objective by 2 while optimizing D, which slows down the rate at which D learns, relative to the rate of G.

Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, 2017.

This can be achieved by setting the “loss_weights” argument to 0.5 when compiling the model. Note that this weighting does not appear to be implemented in the official Torch implementation, where the discriminator updates are defined in the fDx_basic() function.

We can tie all of this together in the example below with a define_discriminator() function that defines the PatchGAN discriminator. The model configuration matches the description in the appendix of the paper with additional details from the official Torch implementation defined in the defineD_n_layers() function.

# example of defining a 70x70 patchgan discriminator model from keras.optimizers import Adam from keras.initializers import RandomNormal from keras.models import Model from keras.models import Input from keras.layers import Conv2D from keras.layers import LeakyReLU from keras.layers import Activation from keras.layers import Concatenate from keras.layers import BatchNormalization from keras_contrib.layers.normalization.instancenormalization import InstanceNormalization from keras.utils.vis_utils import plot_model  # define the discriminator model def define_discriminator(image_shape): 	# weight initialization 	init = RandomNormal(stddev=0.02) 	# source image input 	in_image = Input(shape=image_shape) 	# C64 	d = Conv2D(64, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(in_image) 	d = LeakyReLU(alpha=0.2)(d) 	# C128 	d = Conv2D(128, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d) 	d = InstanceNormalization(axis=-1)(d) 	d = LeakyReLU(alpha=0.2)(d) 	# C256 	d = Conv2D(256, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d) 	d = InstanceNormalization(axis=-1)(d) 	d = LeakyReLU(alpha=0.2)(d) 	# C512 	d = Conv2D(512, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d) 	d = InstanceNormalization(axis=-1)(d) 	d = LeakyReLU(alpha=0.2)(d) 	# second last output layer 	d = Conv2D(512, (4,4), padding='same', kernel_initializer=init)(d) 	d = InstanceNormalization(axis=-1)(d) 	d = LeakyReLU(alpha=0.2)(d) 	# patch output 	patch_out = Conv2D(1, (4,4), padding='same', kernel_initializer=init)(d) 	# define model 	model = Model(in_image, patch_out) 	# compile model 	model.compile(loss='mse', optimizer=Adam(lr=0.0002, beta_1=0.5), loss_weights=[0.5]) 	return model  # define image shape image_shape = (256,256,3) # create the model model = define_discriminator(image_shape) # summarize the model model.summary() # plot the model plot_model(model, to_file='discriminator_model_plot.png', show_shapes=True, show_layer_names=True)

Note: the plot_model() function requires that both the pydot library and the Graphviz software are installed. If this is a problem, you can comment out both the import and the call to this function.

Running the example summarizes the model, showing the shape of the inputs and outputs for each layer.

_________________________________________________________________ Layer (type)                 Output Shape              Param # ================================================================= input_1 (InputLayer)         (None, 256, 256, 3)       0 _________________________________________________________________ conv2d_1 (Conv2D)            (None, 128, 128, 64)      3136 _________________________________________________________________ leaky_re_lu_1 (LeakyReLU)    (None, 128, 128, 64)      0 _________________________________________________________________ conv2d_2 (Conv2D)            (None, 64, 64, 128)       131200 _________________________________________________________________ instance_normalization_1 (In (None, 64, 64, 128)       256 _________________________________________________________________ leaky_re_lu_2 (LeakyReLU)    (None, 64, 64, 128)       0 _________________________________________________________________ conv2d_3 (Conv2D)            (None, 32, 32, 256)       524544 _________________________________________________________________ instance_normalization_2 (In (None, 32, 32, 256)       512 _________________________________________________________________ leaky_re_lu_3 (LeakyReLU)    (None, 32, 32, 256)       0 _________________________________________________________________ conv2d_4 (Conv2D)            (None, 16, 16, 512)       2097664 _________________________________________________________________ instance_normalization_3 (In (None, 16, 16, 512)       1024 _________________________________________________________________ leaky_re_lu_4 (LeakyReLU)    (None, 16, 16, 512)       0 _________________________________________________________________ conv2d_5 (Conv2D)            (None, 16, 16, 512)       4194816 _________________________________________________________________ instance_normalization_4 (In (None, 16, 16, 512)       1024 _________________________________________________________________ leaky_re_lu_5 (LeakyReLU)    (None, 16, 16, 512)       0 _________________________________________________________________ conv2d_6 (Conv2D)            (None, 16, 16, 1)         8193 ================================================================= Total params: 6,962,369 Trainable params: 6,962,369 Non-trainable params: 0 _________________________________________________________________

A plot of the model architecture is also created to help get an idea of the inputs, outputs, and transitions of the image data through the model.

Plot of the PatchGAN Discriminator Model for the CycleGAN

Plot of the PatchGAN Discriminator Model for the CycleGAN

How to Implement the CycleGAN Generator Model

The CycleGAN Generator model takes an image as input and generates a translated image as output.

The model uses a sequence of downsampling convolutional blocks to encode the input image, a number of residual network (ResNet) convolutional blocks to transform the image, and a number of upsampling convolutional blocks to generate the output image.

Let c7s1-k denote a 7×7 Convolution-InstanceNorm-ReLU layer with k filters and stride 1. dk denotes a 3×3 Convolution-InstanceNorm-ReLU layer with k filters and stride 2. Reflection padding was used to reduce artifacts. Rk denotes a residual block that contains two 3 × 3 convolutional layers with the same number of filters on both layer. uk denotes a 3 × 3 fractional-strided-Convolution-InstanceNorm-ReLU layer with k filters and stride 1/2.

Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, 2017.

The architecture for the 6-resnet block generator for 128×128 images is as follows:

  • c7s1-64,d128,d256,R256,R256,R256,R256,R256,R256,u128,u64,c7s1-3

First, we need a function to define the ResNet blocks. These are blocks comprised of two 3×3 CNN layers where the input to the block is concatenated to the output of the block, channel-wise.

This is implemented in the resnet_block() function that creates two Conv-InstanceNorm blocks with 3×3 filters and 1×1 stride and without a ReLU activation after the second block, matching the official Torch implementation in the build_conv_block() function. Same padding is used instead of the reflection padding recommended in the paper, for simplicity.

# generator a resnet block def resnet_block(n_filters, input_layer): 	# weight initialization 	init = RandomNormal(stddev=0.02) 	# first layer convolutional layer 	g = Conv2D(n_filters, (3,3), padding='same', kernel_initializer=init)(input_layer) 	g = InstanceNormalization(axis=-1)(g) 	g = Activation('relu')(g) 	# second convolutional layer 	g = Conv2D(n_filters, (3,3), padding='same', kernel_initializer=init)(g) 	g = InstanceNormalization(axis=-1)(g) 	# concatenate merge channel-wise with input layer 	g = Concatenate()([g, input_layer]) 	return g

Next, we can define a function that will create the 9-resnet block version for 256×256 input images. This can easily be changed to the 6-resnet block version by setting the image_shape argument to (128,128,3) and the n_resnet function argument to 6; a usage example is given after the listing below.

Importantly, the model outputs pixel values with the same shape as the input, and the pixel values are in the range [-1, 1], typical for GAN generator models.

# define the standalone generator model def define_generator(image_shape=(256,256,3), n_resnet=9): 	# weight initialization 	init = RandomNormal(stddev=0.02) 	# image input 	in_image = Input(shape=image_shape) 	# c7s1-64 	g = Conv2D(64, (7,7), padding='same', kernel_initializer=init)(in_image) 	g = InstanceNormalization(axis=-1)(g) 	g = Activation('relu')(g) 	# d128 	g = Conv2D(128, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g) 	g = InstanceNormalization(axis=-1)(g) 	g = Activation('relu')(g) 	# d256 	g = Conv2D(256, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g) 	g = InstanceNormalization(axis=-1)(g) 	g = Activation('relu')(g) 	# R256 	for _ in range(n_resnet): 		g = resnet_block(256, g) 	# u128 	g = Conv2DTranspose(128, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g) 	g = InstanceNormalization(axis=-1)(g) 	g = Activation('relu')(g) 	# u64 	g = Conv2DTranspose(64, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g) 	g = InstanceNormalization(axis=-1)(g) 	g = Activation('relu')(g) 	# c7s1-3 	g = Conv2D(3, (7,7), padding='same', kernel_initializer=init)(g) 	g = InstanceNormalization(axis=-1)(g) 	out_image = Activation('tanh')(g) 	# define model 	model = Model(in_image, out_image) 	return model
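For example, the 6-resnet block version mentioned above for 128×128 images could be created with:

# 6-resnet block generator for 128x128 images
model_128 = define_generator(image_shape=(128,128,3), n_resnet=6)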

The generator model is not compiled as it is trained via a composite model, seen in the next section.

Tying this together, the complete example is listed below.

# example of an encoder-decoder generator for the cyclegan from keras.optimizers import Adam from keras.models import Model from keras.models import Input from keras.layers import Conv2D from keras.layers import Conv2DTranspose from keras.layers import Activation from keras.initializers import RandomNormal from keras.layers import Concatenate from keras_contrib.layers.normalization.instancenormalization import InstanceNormalization from keras.utils.vis_utils import plot_model  # generator a resnet block def resnet_block(n_filters, input_layer): 	# weight initialization 	init = RandomNormal(stddev=0.02) 	# first layer convolutional layer 	g = Conv2D(n_filters, (3,3), padding='same', kernel_initializer=init)(input_layer) 	g = InstanceNormalization(axis=-1)(g) 	g = Activation('relu')(g) 	# second convolutional layer 	g = Conv2D(n_filters, (3,3), padding='same', kernel_initializer=init)(g) 	g = InstanceNormalization(axis=-1)(g) 	# concatenate merge channel-wise with input layer 	g = Concatenate()([g, input_layer]) 	return g  # define the standalone generator model def define_generator(image_shape=(256,256,3), n_resnet=9): 	# weight initialization 	init = RandomNormal(stddev=0.02) 	# image input 	in_image = Input(shape=image_shape) 	# c7s1-64 	g = Conv2D(64, (7,7), padding='same', kernel_initializer=init)(in_image) 	g = InstanceNormalization(axis=-1)(g) 	g = Activation('relu')(g) 	# d128 	g = Conv2D(128, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g) 	g = InstanceNormalization(axis=-1)(g) 	g = Activation('relu')(g) 	# d256 	g = Conv2D(256, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g) 	g = InstanceNormalization(axis=-1)(g) 	g = Activation('relu')(g) 	# R256 	for _ in range(n_resnet): 		g = resnet_block(256, g) 	# u128 	g = Conv2DTranspose(128, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g) 	g = InstanceNormalization(axis=-1)(g) 	g = Activation('relu')(g) 	# u64 	g = Conv2DTranspose(64, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g) 	g = InstanceNormalization(axis=-1)(g) 	g = Activation('relu')(g) 	# c7s1-3 	g = Conv2D(3, (7,7), padding='same', kernel_initializer=init)(g) 	g = InstanceNormalization(axis=-1)(g) 	out_image = Activation('tanh')(g) 	# define model 	model = Model(in_image, out_image) 	return model  # create the model model = define_generator() # summarize the model model.summary() # plot the model plot_model(model, to_file='generator_model_plot.png', show_shapes=True, show_layer_names=True)

Running the example first summarizes the model.

__________________________________________________________________________________________________ Layer (type)                    Output Shape         Param #     Connected to ================================================================================================== input_1 (InputLayer)            (None, 256, 256, 3)  0 __________________________________________________________________________________________________ conv2d_1 (Conv2D)               (None, 256, 256, 64) 9472        input_1[0][0] __________________________________________________________________________________________________ instance_normalization_1 (Insta (None, 256, 256, 64) 128         conv2d_1[0][0] __________________________________________________________________________________________________ activation_1 (Activation)       (None, 256, 256, 64) 0           instance_normalization_1[0][0] __________________________________________________________________________________________________ conv2d_2 (Conv2D)               (None, 128, 128, 128 73856       activation_1[0][0] __________________________________________________________________________________________________ instance_normalization_2 (Insta (None, 128, 128, 128 256         conv2d_2[0][0] __________________________________________________________________________________________________ activation_2 (Activation)       (None, 128, 128, 128 0           instance_normalization_2[0][0] __________________________________________________________________________________________________ conv2d_3 (Conv2D)               (None, 64, 64, 256)  295168      activation_2[0][0] __________________________________________________________________________________________________ instance_normalization_3 (Insta (None, 64, 64, 256)  512         conv2d_3[0][0] __________________________________________________________________________________________________ activation_3 (Activation)       (None, 64, 64, 256)  0           instance_normalization_3[0][0] __________________________________________________________________________________________________ conv2d_4 (Conv2D)               (None, 64, 64, 256)  590080      activation_3[0][0] __________________________________________________________________________________________________ instance_normalization_4 (Insta (None, 64, 64, 256)  512         conv2d_4[0][0] __________________________________________________________________________________________________ activation_4 (Activation)       (None, 64, 64, 256)  0           instance_normalization_4[0][0] __________________________________________________________________________________________________ conv2d_5 (Conv2D)               (None, 64, 64, 256)  590080      activation_4[0][0] __________________________________________________________________________________________________ instance_normalization_5 (Insta (None, 64, 64, 256)  512         conv2d_5[0][0] __________________________________________________________________________________________________ concatenate_1 (Concatenate)     (None, 64, 64, 512)  0           instance_normalization_5[0][0]                                                                  activation_3[0][0] __________________________________________________________________________________________________ conv2d_6 (Conv2D)               (None, 64, 64, 256)  1179904     concatenate_1[0][0] __________________________________________________________________________________________________ instance_normalization_6 (Insta (None, 64, 64, 256)  512         
conv2d_6[0][0] __________________________________________________________________________________________________ activation_5 (Activation)       (None, 64, 64, 256)  0           instance_normalization_6[0][0] __________________________________________________________________________________________________ conv2d_7 (Conv2D)               (None, 64, 64, 256)  590080      activation_5[0][0] __________________________________________________________________________________________________ instance_normalization_7 (Insta (None, 64, 64, 256)  512         conv2d_7[0][0] __________________________________________________________________________________________________ concatenate_2 (Concatenate)     (None, 64, 64, 768)  0           instance_normalization_7[0][0]                                                                  concatenate_1[0][0] __________________________________________________________________________________________________ conv2d_8 (Conv2D)               (None, 64, 64, 256)  1769728     concatenate_2[0][0] __________________________________________________________________________________________________ instance_normalization_8 (Insta (None, 64, 64, 256)  512         conv2d_8[0][0] __________________________________________________________________________________________________ activation_6 (Activation)       (None, 64, 64, 256)  0           instance_normalization_8[0][0] __________________________________________________________________________________________________ conv2d_9 (Conv2D)               (None, 64, 64, 256)  590080      activation_6[0][0] __________________________________________________________________________________________________ instance_normalization_9 (Insta (None, 64, 64, 256)  512         conv2d_9[0][0] __________________________________________________________________________________________________ concatenate_3 (Concatenate)     (None, 64, 64, 1024) 0           instance_normalization_9[0][0]                                                                  concatenate_2[0][0] __________________________________________________________________________________________________ conv2d_10 (Conv2D)              (None, 64, 64, 256)  2359552     concatenate_3[0][0] __________________________________________________________________________________________________ instance_normalization_10 (Inst (None, 64, 64, 256)  512         conv2d_10[0][0] __________________________________________________________________________________________________ activation_7 (Activation)       (None, 64, 64, 256)  0           instance_normalization_10[0][0] __________________________________________________________________________________________________ conv2d_11 (Conv2D)              (None, 64, 64, 256)  590080      activation_7[0][0] __________________________________________________________________________________________________ instance_normalization_11 (Inst (None, 64, 64, 256)  512         conv2d_11[0][0] __________________________________________________________________________________________________ concatenate_4 (Concatenate)     (None, 64, 64, 1280) 0           instance_normalization_11[0][0]                                                                  concatenate_3[0][0] __________________________________________________________________________________________________ conv2d_12 (Conv2D)              (None, 64, 64, 256)  2949376     concatenate_4[0][0] __________________________________________________________________________________________________ 
instance_normalization_12 (Inst (None, 64, 64, 256)  512         conv2d_12[0][0] __________________________________________________________________________________________________ activation_8 (Activation)       (None, 64, 64, 256)  0           instance_normalization_12[0][0] __________________________________________________________________________________________________ conv2d_13 (Conv2D)              (None, 64, 64, 256)  590080      activation_8[0][0] __________________________________________________________________________________________________ instance_normalization_13 (Inst (None, 64, 64, 256)  512         conv2d_13[0][0] __________________________________________________________________________________________________ concatenate_5 (Concatenate)     (None, 64, 64, 1536) 0           instance_normalization_13[0][0]                                                                  concatenate_4[0][0] __________________________________________________________________________________________________ conv2d_14 (Conv2D)              (None, 64, 64, 256)  3539200     concatenate_5[0][0] __________________________________________________________________________________________________ instance_normalization_14 (Inst (None, 64, 64, 256)  512         conv2d_14[0][0] __________________________________________________________________________________________________ activation_9 (Activation)       (None, 64, 64, 256)  0           instance_normalization_14[0][0] __________________________________________________________________________________________________ conv2d_15 (Conv2D)              (None, 64, 64, 256)  590080      activation_9[0][0] __________________________________________________________________________________________________ instance_normalization_15 (Inst (None, 64, 64, 256)  512         conv2d_15[0][0] __________________________________________________________________________________________________ concatenate_6 (Concatenate)     (None, 64, 64, 1792) 0           instance_normalization_15[0][0]                                                                  concatenate_5[0][0] __________________________________________________________________________________________________ conv2d_16 (Conv2D)              (None, 64, 64, 256)  4129024     concatenate_6[0][0] __________________________________________________________________________________________________ instance_normalization_16 (Inst (None, 64, 64, 256)  512         conv2d_16[0][0] __________________________________________________________________________________________________ activation_10 (Activation)      (None, 64, 64, 256)  0           instance_normalization_16[0][0] __________________________________________________________________________________________________ conv2d_17 (Conv2D)              (None, 64, 64, 256)  590080      activation_10[0][0] __________________________________________________________________________________________________ instance_normalization_17 (Inst (None, 64, 64, 256)  512         conv2d_17[0][0] __________________________________________________________________________________________________ concatenate_7 (Concatenate)     (None, 64, 64, 2048) 0           instance_normalization_17[0][0]                                                                  concatenate_6[0][0] __________________________________________________________________________________________________ conv2d_18 (Conv2D)              (None, 64, 64, 256)  4718848     concatenate_7[0][0] 
__________________________________________________________________________________________________ instance_normalization_18 (Inst (None, 64, 64, 256)  512         conv2d_18[0][0] __________________________________________________________________________________________________ activation_11 (Activation)      (None, 64, 64, 256)  0           instance_normalization_18[0][0] __________________________________________________________________________________________________ conv2d_19 (Conv2D)              (None, 64, 64, 256)  590080      activation_11[0][0] __________________________________________________________________________________________________ instance_normalization_19 (Inst (None, 64, 64, 256)  512         conv2d_19[0][0] __________________________________________________________________________________________________ concatenate_8 (Concatenate)     (None, 64, 64, 2304) 0           instance_normalization_19[0][0]                                                                  concatenate_7[0][0] __________________________________________________________________________________________________ conv2d_20 (Conv2D)              (None, 64, 64, 256)  5308672     concatenate_8[0][0] __________________________________________________________________________________________________ instance_normalization_20 (Inst (None, 64, 64, 256)  512         conv2d_20[0][0] __________________________________________________________________________________________________ activation_12 (Activation)      (None, 64, 64, 256)  0           instance_normalization_20[0][0] __________________________________________________________________________________________________ conv2d_21 (Conv2D)              (None, 64, 64, 256)  590080      activation_12[0][0] __________________________________________________________________________________________________ instance_normalization_21 (Inst (None, 64, 64, 256)  512         conv2d_21[0][0] __________________________________________________________________________________________________ concatenate_9 (Concatenate)     (None, 64, 64, 2560) 0           instance_normalization_21[0][0]                                                                  concatenate_8[0][0] __________________________________________________________________________________________________ conv2d_transpose_1 (Conv2DTrans (None, 128, 128, 128 2949248     concatenate_9[0][0] __________________________________________________________________________________________________ instance_normalization_22 (Inst (None, 128, 128, 128 256         conv2d_transpose_1[0][0] __________________________________________________________________________________________________ activation_13 (Activation)      (None, 128, 128, 128 0           instance_normalization_22[0][0] __________________________________________________________________________________________________ conv2d_transpose_2 (Conv2DTrans (None, 256, 256, 64) 73792       activation_13[0][0] __________________________________________________________________________________________________ instance_normalization_23 (Inst (None, 256, 256, 64) 128         conv2d_transpose_2[0][0] __________________________________________________________________________________________________ activation_14 (Activation)      (None, 256, 256, 64) 0           instance_normalization_23[0][0] __________________________________________________________________________________________________ conv2d_22 (Conv2D)              (None, 256, 256, 3)  9411        activation_14[0][0] 
__________________________________________________________________________________________________ instance_normalization_24 (Inst (None, 256, 256, 3)  6           conv2d_22[0][0] __________________________________________________________________________________________________ activation_15 (Activation)      (None, 256, 256, 3)  0           instance_normalization_24[0][0] ================================================================================================== Total params: 35,276,553 Trainable params: 35,276,553 Non-trainable params: 0 __________________________________________________________________________________________________

A plot of the generator model is also created, showing the skip connections in the ResNet blocks.

Plot of the Generator Model for the CycleGAN

Plot of the Generator Model for the CycleGAN

How to Implement Composite Models for Least Squares and Cycle Loss

The generator models are not updated directly. Instead, the generator models are updated via composite models.

An update to each generator model involves changes to the model weights based on four concerns:

  • Adversarial loss (L2 or mean squared error).
  • Identity loss (L1 or mean absolute error).
  • Forward cycle loss (L1 or mean absolute error).
  • Backward cycle loss (L1 or mean absolute error).

The adversarial loss is the standard approach for updating the generator via the discriminator, although in this case, the least squares loss function is used instead of the negative log likelihood (e.g. binary cross entropy).

First, we can use our function to define the two generators and two discriminators used in the CycleGAN.

... # input shape image_shape = (256,256,3) # generator: A -> B g_model_AtoB = define_generator(image_shape) # generator: B -> A g_model_BtoA = define_generator(image_shape) # discriminator: A -> [real/fake] d_model_A = define_discriminator(image_shape) # discriminator: B -> [real/fake] d_model_B = define_discriminator(image_shape)

A composite model is required for each generator model that is responsible for only updating the weights of that generator model, although it must share weights with the related discriminator model and the other generator model.

This can be achieved by marking the weights of the other models as not trainable in the context of the composite model to ensure we are only updating the intended generator.

... # ensure the model we're updating is trainable g_model_1.trainable = True # mark discriminator as not trainable d_model.trainable = False # mark other generator model as not trainable g_model_2.trainable = False

The model can be constructed piecewise using the Keras functional API.

The first step is to define the input of the real image from the source domain, pass it through our generator model, then connect the output of the generator to the discriminator and classify it as real or fake.

... # discriminator element input_gen = Input(shape=image_shape) gen1_out = g_model_1(input_gen) output_d = d_model(gen1_out)

Next, we can connect the identity mapping element with a new input for the real image from the target domain, pass it through our generator model, and output the (hopefully) untranslated image directly.

... # identity element input_id = Input(shape=image_shape) output_id = g_model_1(input_id)

So far, we have a composite model with two real image inputs and a discriminator classification and identity image output. Next, we need to add the forward and backward cycles.

The forward cycle can be achieved by connecting the output of our generator to the other generator, the output of which can be compared to the input to our generator and should be identical.

... # forward cycle output_f = g_model_2(gen1_out)

The backward cycle is more complex and involves the input for the real image from the target domain passing through the other generator, then passing through our generator, which should match the real image from the target domain.

... # backward cycle gen2_out = g_model_2(input_id) output_b = g_model_1(gen2_out)

That’s it.

We can then define this composite model with two inputs: one real image from the source domain and one from the target domain, and four outputs: one for the discriminator, one from our generator for the identity mapping, one from the other generator for the forward cycle, and one from our generator for the backward cycle.

... # define model graph model = Model([input_gen, input_id], [output_d, output_id, output_f, output_b])

The adversarial loss for the discriminator output uses least squares loss which is implemented as L2 or mean squared error. The outputs from the generators are compared to images and are optimized using L1 loss implemented as mean absolute error.

The generator is updated as a weighted average of the four loss values. The adversarial loss is given a weighting of 1, whereas the forward and backward cycle losses are weighted using a parameter called lambda, set to 10, i.e. 10 times more important than the adversarial loss. The identity loss is also weighted as a fraction of the lambda parameter and is set to 0.5 * 10, or 5, in the official Torch implementation.

... # compile model with weighting of least squares loss and L1 loss model.compile(loss=['mse', 'mae', 'mae', 'mae'], loss_weights=[1, 5, 10, 10], optimizer=opt)

We can tie all of this together and define the function define_composite_model() for creating a composite model for training a given generator model.

# define a composite model for updating generators by adversarial and cycle loss def define_composite_model(g_model_1, d_model, g_model_2, image_shape): 	# ensure the model we're updating is trainable 	g_model_1.trainable = True 	# mark discriminator as not trainable 	d_model.trainable = False 	# mark other generator model as not trainable 	g_model_2.trainable = False 	# discriminator element 	input_gen = Input(shape=image_shape) 	gen1_out = g_model_1(input_gen) 	output_d = d_model(gen1_out) 	# identity element 	input_id = Input(shape=image_shape) 	output_id = g_model_1(input_id) 	# forward cycle 	output_f = g_model_2(gen1_out) 	# backward cycle 	gen2_out = g_model_2(input_id) 	output_b = g_model_1(gen2_out) 	# define model graph 	model = Model([input_gen, input_id], [output_d, output_id, output_f, output_b]) 	# define optimization algorithm configuration 	opt = Adam(lr=0.0002, beta_1=0.5) 	# compile model with weighting of least squares loss and L1 loss 	model.compile(loss=['mse', 'mae', 'mae', 'mae'], loss_weights=[1, 5, 10, 10], optimizer=opt) 	return model

This function can then be called to prepare a composite model for training both the g_model_AtoB generator model and the g_model_BtoA model; for example:

... # composite: A -> B -> [real/fake, A] c_model_AtoBtoA = define_composite_model(g_model_AtoB, d_model_B, g_model_BtoA, image_shape) # composite: B -> A -> [real/fake, B] c_model_BtoAtoB = define_composite_model(g_model_BtoA, d_model_A, g_model_AtoB, image_shape)

Summarizing and plotting the composite model is a bit of a mess, as the plot does not show the inputs and outputs of the model clearly.

We can summarize the inputs and outputs for each of the composite models below. Recall that we are sharing or reusing the same set of weights if a given model is used more than once in the composite model.

Generator-A Composite Model

Only the Generator-A weights are trainable; the weights of the other models are not trainable.

  • Adversarial Loss: Domain-B -> Generator-A -> Domain-A -> Discriminator-A -> [real/fake]
  • Identity Loss: Domain-A -> Generator-A -> Domain-A
  • Forward Cycle Loss: Domain-B -> Generator-A -> Domain-A -> Generator-B -> Domain-B
  • Backward Cycle Loss: Domain-A -> Generator-B -> Domain-B -> Generator-A -> Domain-A

Generator-B Composite Model

Only Generator-B weights are trainable and weights for other models are not trainable.

  • Adversarial Loss: Domain-A -> Generator-B -> Domain-B -> Discriminator-B -> [real/fake]
  • Identity Loss: Domain-B -> Generator-B -> Domain-B
  • Forward Cycle Loss: Domain-A -> Generator-B -> Domain-B -> Generator-A -> Domain-A
  • Backward Cycle Loss: Domain-B -> Generator-A -> Domain-A -> Generator-B -> Domain-B

A complete example of creating all of the models is listed below for completeness.

# example of defining composite models for training cyclegan generators from keras.optimizers import Adam from keras.models import Model from keras.models import Sequential from keras.models import Input from keras.layers import Conv2D from keras.layers import Conv2DTranspose from keras.layers import Activation from keras.layers import LeakyReLU from keras.initializers import RandomNormal from keras.layers import Concatenate from keras_contrib.layers.normalization.instancenormalization import InstanceNormalization from keras.utils.vis_utils import plot_model  # define the discriminator model def define_discriminator(image_shape): 	# weight initialization 	init = RandomNormal(stddev=0.02) 	# source image input 	in_image = Input(shape=image_shape) 	# C64 	d = Conv2D(64, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(in_image) 	d = LeakyReLU(alpha=0.2)(d) 	# C128 	d = Conv2D(128, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d) 	d = InstanceNormalization(axis=-1)(d) 	d = LeakyReLU(alpha=0.2)(d) 	# C256 	d = Conv2D(256, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d) 	d = InstanceNormalization(axis=-1)(d) 	d = LeakyReLU(alpha=0.2)(d) 	# C512 	d = Conv2D(512, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d) 	d = InstanceNormalization(axis=-1)(d) 	d = LeakyReLU(alpha=0.2)(d) 	# second last output layer 	d = Conv2D(512, (4,4), padding='same', kernel_initializer=init)(d) 	d = InstanceNormalization(axis=-1)(d) 	d = LeakyReLU(alpha=0.2)(d) 	# patch output 	patch_out = Conv2D(1, (4,4), padding='same', kernel_initializer=init)(d) 	# define model 	model = Model(in_image, patch_out) 	# compile model 	model.compile(loss='mse', optimizer=Adam(lr=0.0002, beta_1=0.5), loss_weights=[0.5]) 	return model  # generator a resnet block def resnet_block(n_filters, input_layer): 	# weight initialization 	init = RandomNormal(stddev=0.02) 	# first layer convolutional layer 	g = Conv2D(n_filters, (3,3), padding='same', kernel_initializer=init)(input_layer) 	g = InstanceNormalization(axis=-1)(g) 	g = Activation('relu')(g) 	# second convolutional layer 	g = Conv2D(n_filters, (3,3), padding='same', kernel_initializer=init)(g) 	g = InstanceNormalization(axis=-1)(g) 	# concatenate merge channel-wise with input layer 	g = Concatenate()([g, input_layer]) 	return g  # define the standalone generator model def define_generator(image_shape, n_resnet=9): 	# weight initialization 	init = RandomNormal(stddev=0.02) 	# image input 	in_image = Input(shape=image_shape) 	# c7s1-64 	g = Conv2D(64, (7,7), padding='same', kernel_initializer=init)(in_image) 	g = InstanceNormalization(axis=-1)(g) 	g = Activation('relu')(g) 	# d128 	g = Conv2D(128, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g) 	g = InstanceNormalization(axis=-1)(g) 	g = Activation('relu')(g) 	# d256 	g = Conv2D(256, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g) 	g = InstanceNormalization(axis=-1)(g) 	g = Activation('relu')(g) 	# R256 	for _ in range(n_resnet): 		g = resnet_block(256, g) 	# u128 	g = Conv2DTranspose(128, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g) 	g = InstanceNormalization(axis=-1)(g) 	g = Activation('relu')(g) 	# u64 	g = Conv2DTranspose(64, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g) 	g = InstanceNormalization(axis=-1)(g) 	g = Activation('relu')(g) 	# c7s1-3 	g = Conv2D(3, (7,7), padding='same', kernel_initializer=init)(g) 	g = InstanceNormalization(axis=-1)(g) 	out_image = Activation('tanh')(g) 	# define 
model 	model = Model(in_image, out_image) 	return model  # define a composite model for updating generators by adversarial and cycle loss def define_composite_model(g_model_1, d_model, g_model_2, image_shape): 	# ensure the model we're updating is trainable 	g_model_1.trainable = True 	# mark discriminator as not trainable 	d_model.trainable = False 	# mark other generator model as not trainable 	g_model_2.trainable = False 	# discriminator element 	input_gen = Input(shape=image_shape) 	gen1_out = g_model_1(input_gen) 	output_d = d_model(gen1_out) 	# identity element 	input_id = Input(shape=image_shape) 	output_id = g_model_1(input_id) 	# forward cycle 	output_f = g_model_2(gen1_out) 	# backward cycle 	gen2_out = g_model_2(input_id) 	output_b = g_model_1(gen2_out) 	# define model graph 	model = Model([input_gen, input_id], [output_d, output_id, output_f, output_b]) 	# define optimization algorithm configuration 	opt = Adam(lr=0.0002, beta_1=0.5) 	# compile model with weighting of least squares loss and L1 loss 	model.compile(loss=['mse', 'mae', 'mae', 'mae'], loss_weights=[1, 5, 10, 10], optimizer=opt) 	return model  # input shape image_shape = (256,256,3) # generator: A -> B g_model_AtoB = define_generator(image_shape) # generator: B -> A g_model_BtoA = define_generator(image_shape) # discriminator: A -> [real/fake] d_model_A = define_discriminator(image_shape) # discriminator: B -> [real/fake] d_model_B = define_discriminator(image_shape) # composite: A -> B -> [real/fake, A] c_model_AtoB = define_composite_model(g_model_AtoB, d_model_B, g_model_BtoA, image_shape) # composite: B -> A -> [real/fake, B] c_model_BtoA = define_composite_model(g_model_BtoA, d_model_A, g_model_AtoB, image_shape)

How to Update Discriminator and Generator Models

Training the defined models is relatively straightforward.

First, we must define a helper function that will select a batch of real images and the associated target (1.0).

# select a batch of random samples, returns images and target
def generate_real_samples(dataset, n_samples, patch_shape):
    # choose random instances
    ix = randint(0, dataset.shape[0], n_samples)
    # retrieve selected images
    X = dataset[ix]
    # generate 'real' class labels (1)
    y = ones((n_samples, patch_shape, patch_shape, 1))
    return X, y

Similarly, we need a function to generate a batch of fake images and the associated target (0.0).

# generate a batch of images, returns images and targets
def generate_fake_samples(g_model, dataset, patch_shape):
    # generate fake instance
    X = g_model.predict(dataset)
    # create 'fake' class labels (0)
    y = zeros((len(X), patch_shape, patch_shape, 1))
    return X, y

Now, we can define the steps of a single training iteration. We will model the order of updates on the OptimizeParameters() function in the official Torch implementation (note: the official code uses a more confusing, inverted naming convention).

  1. Update Generator-A (B->A)
  2. Update Discriminator-A
  3. Update Generator-B (A->B)
  4. Update Discriminator-B

First, we must select a batch of real images by calling generate_real_samples() for both Domain-A and Domain-B.

Typically, the batch size (n_batch) is set to 1. In this case, we will assume 256×256 input images, which means the n_patch for the PatchGAN discriminator will be 16.
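Rather than hard-coding this value, the patch size can be read directly from the output shape of the discriminator model, as is done later in the train() function. A minimal sketch, assuming the d_model_A discriminator defined above:

...
# batch size of 1, as used in the paper
n_batch = 1
# read the square patch size from the discriminator output shape (16 for 256x256 inputs)
n_patch = d_model_A.output_shape[1]
print(n_patch)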

...
# select a batch of real samples
X_realA, y_realA = generate_real_samples(trainA, n_batch, n_patch)
X_realB, y_realB = generate_real_samples(trainB, n_batch, n_patch)

Next, we can use the batches of selected real images to generate corresponding batches of generated or fake images.

...
# generate a batch of fake samples
X_fakeA, y_fakeA = generate_fake_samples(g_model_BtoA, X_realB, n_patch)
X_fakeB, y_fakeB = generate_fake_samples(g_model_AtoB, X_realA, n_patch)

The paper describes using a pool of previously generated images from which examples are randomly selected and used to update the discriminator model, where the pool size was set to 50 images.

… [we] update the discriminators using a history of generated images rather than the ones produced by the latest generators. We keep an image buffer that stores the 50 previously created images.

Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, 2017.

This can be implemented using a list for each domain and a function to populate the pool, then randomly replace elements in the pool once it is at capacity.

The update_image_pool() function below implements this based on the official Torch implementation in image_pool.lua.

# update image pool for fake images
def update_image_pool(pool, images, max_size=50):
    selected = list()
    for image in images:
        if len(pool) < max_size:
            # stock the pool
            pool.append(image)
            selected.append(image)
        elif random() < 0.5:
            # use image, but don't add it to the pool
            selected.append(image)
        else:
            # replace an existing image and use replaced image
            ix = randint(0, len(pool))
            selected.append(pool[ix])
            pool[ix] = image
    return asarray(selected)

We can then update our image pool with generated fake images, the results of which can be used to train the discriminator models.

...
# update fakes from pool
X_fakeA = update_image_pool(poolA, X_fakeA)
X_fakeB = update_image_pool(poolB, X_fakeB)

Next, we can update Generator-A, the generator that translates images from Domain-B to Domain-A (B->A).

The train_on_batch() function will return a value for each of the four loss functions, one for each output, as well as the weighted sum (the first value returned), which is used to update the model weights and is the value we are interested in.

...
# update generator B->A via adversarial and cycle loss
g_loss2, _, _, _, _ = c_model_BtoA.train_on_batch([X_realB, X_realA], [y_realA, X_realA, X_realB, X_realA])

We can then update the discriminator model using the fake images that may or may not have come from the image pool.

...
# update discriminator for A -> [real/fake]
dA_loss1 = d_model_A.train_on_batch(X_realA, y_realA)
dA_loss2 = d_model_A.train_on_batch(X_fakeA, y_fakeA)

We can then do the same for the other generator and discriminator models.

...
# update generator A->B via adversarial and cycle loss
g_loss1, _, _, _, _ = c_model_AtoB.train_on_batch([X_realA, X_realB], [y_realB, X_realB, X_realA, X_realB])
# update discriminator for B -> [real/fake]
dB_loss1 = d_model_B.train_on_batch(X_realB, y_realB)
dB_loss2 = d_model_B.train_on_batch(X_fakeB, y_fakeB)

At the end of each training iteration, we can then report the current loss for the discriminator models on real and fake images and for each generator model.

...
# summarize performance
print('>%d, dA[%.3f,%.3f] dB[%.3f,%.3f] g[%.3f,%.3f]' % (i+1, dA_loss1,dA_loss2, dB_loss1,dB_loss2, g_loss1,g_loss2))

Tying this all together, we can define a function named train() that takes an instance of each of the defined models and a loaded dataset (list of two NumPy arrays, one for each domain) and trains the model.

A batch size of 1 is used, as described in the paper, and the models are fit for 100 training epochs.

# train cyclegan models
def train(d_model_A, d_model_B, g_model_AtoB, g_model_BtoA, c_model_AtoB, c_model_BtoA, dataset):
    # define properties of the training run
    n_epochs, n_batch = 100, 1
    # determine the output square shape of the discriminator
    n_patch = d_model_A.output_shape[1]
    # unpack dataset
    trainA, trainB = dataset
    # prepare image pool for fakes
    poolA, poolB = list(), list()
    # calculate the number of batches per training epoch
    bat_per_epo = int(len(trainA) / n_batch)
    # calculate the number of training iterations
    n_steps = bat_per_epo * n_epochs
    # manually enumerate epochs
    for i in range(n_steps):
        # select a batch of real samples
        X_realA, y_realA = generate_real_samples(trainA, n_batch, n_patch)
        X_realB, y_realB = generate_real_samples(trainB, n_batch, n_patch)
        # generate a batch of fake samples
        X_fakeA, y_fakeA = generate_fake_samples(g_model_BtoA, X_realB, n_patch)
        X_fakeB, y_fakeB = generate_fake_samples(g_model_AtoB, X_realA, n_patch)
        # update fakes from pool
        X_fakeA = update_image_pool(poolA, X_fakeA)
        X_fakeB = update_image_pool(poolB, X_fakeB)
        # update generator B->A via adversarial and cycle loss
        g_loss2, _, _, _, _ = c_model_BtoA.train_on_batch([X_realB, X_realA], [y_realA, X_realA, X_realB, X_realA])
        # update discriminator for A -> [real/fake]
        dA_loss1 = d_model_A.train_on_batch(X_realA, y_realA)
        dA_loss2 = d_model_A.train_on_batch(X_fakeA, y_fakeA)
        # update generator A->B via adversarial and cycle loss
        g_loss1, _, _, _, _ = c_model_AtoB.train_on_batch([X_realA, X_realB], [y_realB, X_realB, X_realA, X_realB])
        # update discriminator for B -> [real/fake]
        dB_loss1 = d_model_B.train_on_batch(X_realB, y_realB)
        dB_loss2 = d_model_B.train_on_batch(X_fakeB, y_fakeB)
        # summarize performance
        print('>%d, dA[%.3f,%.3f] dB[%.3f,%.3f] g[%.3f,%.3f]' % (i+1, dA_loss1,dA_loss2, dB_loss1,dB_loss2, g_loss1,g_loss2))

The train function can then be called directly with our defined models and loaded dataset.

...
# load a dataset as a list of two numpy arrays
dataset = ...
# train models
train(d_model_A, d_model_B, g_model_AtoB, g_model_BtoA, c_model_AtoB, c_model_BtoA, dataset)
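For example, if the two domains had been saved to a compressed NumPy file, the dataset could be loaded, scaled to the [-1,1] range expected by the tanh-based generators, and passed to train(). This is only a sketch; the filename 'dataset.npz' and the array names are hypothetical placeholders.

# sketch of loading and preparing a dataset; 'dataset.npz' is a hypothetical
# file created with numpy.savez_compressed(filename, trainA, trainB)
from numpy import load

data = load('dataset.npz')
trainA, trainB = data['arr_0'], data['arr_1']
# scale pixel values from [0,255] to [-1,1] to match the tanh output of the generators
trainA = (trainA.astype('float32') - 127.5) / 127.5
trainB = (trainB.astype('float32') - 127.5) / 127.5
# train models
train(d_model_A, d_model_B, g_model_AtoB, g_model_BtoA, c_model_AtoB, c_model_BtoA, [trainA, trainB])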

As an improvement, it may be desirable to combine the update to each discriminator model into a single operation as is performed in the fDx_basic() function of the official implementation.
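A minimal sketch of this idea (not the official fDx_basic() code) is to stack the real and fake batches and their targets and make a single call to train_on_batch() per discriminator:

# sketch: update discriminator-A once on a combined batch of real and fake images
from numpy import vstack

X = vstack((X_realA, X_fakeA))
y = vstack((y_realA, y_fakeA))
dA_loss = d_model_A.train_on_batch(X, y)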

Additionally, the paper describes training the models for another 100 epochs (200 in total), during which the learning rate is linearly decayed to zero. This too can be added as a minor extension to the training process.
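A minimal sketch of such a schedule is shown below, assuming a hypothetical helper that is called once per training iteration for each compiled model and linearly decays the learning rate to zero over the second half of training:

# sketch: linearly decay the learning rate to zero over the second half of training
from keras import backend

def update_learning_rate(models, i, n_steps, lr_start=0.0002):
    # no decay during the first half of training
    if i < n_steps / 2:
        return
    # fraction of the decay phase remaining
    remaining = 1.0 - (i - n_steps / 2) / (n_steps / 2)
    for model in models:
        backend.set_value(model.optimizer.lr, lr_start * remaining)

This helper could be called at the start of each iteration in train() for the two discriminator models and the two composite models.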


Summary

In this tutorial, you discovered how to implement the CycleGAN architecture from scratch using the Keras deep learning framework.

Specifically, you learned:

  • How to implement the discriminator and generator models.
  • How to define composite models to train the generator models via adversarial and cycle loss.
  • How to implement the training process to update model weights each training iteration.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.



How to Implement Pix2Pix GAN Models From Scratch With Keras

The Pix2Pix GAN is a generative adversarial network for image-to-image translation that is trained on paired examples.

For example, the model can be used to translate images of daytime to nighttime, or from sketches of products like shoes to photographs of products.

The benefit of the Pix2Pix model is that compared to other GANs for conditional image generation, it is relatively simple and capable of generating large high-quality images across a variety of image translation tasks.

The model is very impressive, but its architecture can appear somewhat complicated for beginners to implement.

In this tutorial, you will discover how to implement the Pix2Pix GAN architecture from scratch using the Keras deep learning framework.

After completing this tutorial, you will know:

  • How to develop the PatchGAN discriminator model for the Pix2Pix GAN.
  • How to develop the U-Net encoder-decoder generator model for the Pix2Pix GAN.
  • How to implement the composite model for updating the generator and how to train both models.

Discover how to develop DCGANs, conditional GANs, Pix2Pix, CycleGANs, and more with Keras in my new GANs book, with 29 step-by-step tutorials and full source code.

Let’s get started.

How to Implement Pix2Pix GAN Models From Scratch With Keras
Photo by Ray in Manila, some rights reserved.

Tutorial Overview

This tutorial is divided into five parts; they are:

  1. What Is the Pix2Pix GAN?
  2. How to Implement the PatchGAN Discriminator Model
  3. How to Implement the U-Net Generator Model
  4. How to Implement Adversarial and L1 Loss
  5. How to Update Model Weights

What Is the Pix2Pix GAN?

Pix2Pix is a Generative Adversarial Network, or GAN, model designed for general purpose image-to-image translation.

The approach was presented by Phillip Isola, et al. in their 2016 paper titled “Image-to-Image Translation with Conditional Adversarial Networks” and presented at CVPR in 2017.

The GAN architecture is comprised of a generator model for outputting new plausible synthetic images and a discriminator model that classifies images as real (from the dataset) or fake (generated). The discriminator model is updated directly, whereas the generator model is updated via the discriminator model. As such, the two models are trained simultaneously in an adversarial process where the generator seeks to better fool the discriminator and the discriminator seeks to better identify the counterfeit images.

The Pix2Pix model is a type of conditional GAN, or cGAN, where the generation of the output image is conditional on an input, in this case, a source image. The discriminator is provided both with a source image and the target image and must determine whether the target is a plausible transformation of the source image.

Again, the discriminator model is updated directly, and the generator model is updated via the discriminator model, although the loss function is updated. The generator is trained via adversarial loss, which encourages the generator to generate plausible images in the target domain. The generator is also updated via L1 loss measured between the generated image and the expected output image. This additional loss encourages the generator model to create plausible translations of the source image.

The Pix2Pix GAN has been demonstrated on a range of image-to-image translation tasks such as converting maps to satellite photographs, black and white photographs to color, and sketches of products to product photographs.

Now that we are familiar with the Pix2Pix GAN, let’s explore how we can implement it using the Keras deep learning library.


How to Implement the PatchGAN Discriminator Model

The discriminator model in the Pix2Pix GAN is implemented as a PatchGAN.

The PatchGAN is designed based on the size of the receptive field, sometimes called the effective receptive field. The receptive field is the relationship between one output activation of the model and the area of the input image that influences it (strictly a volume, as the convolution operates across all of the input channels).

A PatchGAN with the size 70×70 is used, which means that the output (or each output) of the model maps to a 70×70 square of the input image. In effect, a 70×70 PatchGAN will classify 70×70 patches of the input image as real or fake.

… we design a discriminator architecture – which we term a PatchGAN – that only penalizes structure at the scale of patches. This discriminator tries to classify if each NxN patch in an image is real or fake. We run this discriminator convolutionally across the image, averaging all responses to provide the ultimate output of D.

Image-to-Image Translation with Conditional Adversarial Networks, 2016.

Before we dive into the configuration details of the PatchGAN, it is important to get a handle on the calculation of the receptive field.

The receptive field is not the size of the output of the discriminator model, e.g. it does not refer to the shape of the activation map output by the model. It is a definition of the mapping from one pixel in the output activation map back to the region of the input image that influences it. The output of the model may be a single value or a square activation map of values that predict whether each patch of the input image is real or fake.

Traditionally, the receptive field refers to the region of a layer's input that influences a single activation in its output, determined by the size of the filter and the size of the stride. The effective receptive field generalizes this idea and calculates the receptive field for the output of a stack of convolutional layers with regard to the raw image input. The terms are often used interchangeably.

The authors of the Pix2Pix GAN provide a Matlab script to calculate the effective receptive field size for different model configurations in a script called receptive_field_sizes.m. It can be helpful to work through an example for the 70×70 PatchGAN receptive field calculation.

The 70×70 PatchGAN has a fixed number of three layers (excluding the output and second last layers), regardless of the size of the input image. The calculation of the receptive field in one dimension is calculated as:

  • receptive field = (output size – 1) * stride + kernel size

Where output size is the size of the prior layer's activation map (or the receptive field accumulated so far when working backward through the model), stride is the number of pixels the filter is moved when applied to the activation, and kernel size is the size of the filter to be applied.

The PatchGAN uses a fixed stride of 2×2 (except in the output and second last layers) and a fixed kernel size of 4×4. We can, therefore, calculate the receptive field size starting with one pixel in the output of the model and working backward to the input image.

We can develop a Python function called receptive_field() to calculate the receptive field, then calculate and print the receptive field for each layer in the Pix2Pix PatchGAN model. The complete example is listed below.

# example of calculating the receptive field for the PatchGAN

# calculate the effective receptive field size
def receptive_field(output_size, kernel_size, stride_size):
    return (output_size - 1) * stride_size + kernel_size

# output layer 1x1 pixel with 4x4 kernel and 1x1 stride
rf = receptive_field(1, 4, 1)
print(rf)
# second last layer with 4x4 kernel and 1x1 stride
rf = receptive_field(rf, 4, 1)
print(rf)
# 3 PatchGAN layers with 4x4 kernel and 2x2 stride
rf = receptive_field(rf, 4, 2)
print(rf)
rf = receptive_field(rf, 4, 2)
print(rf)
rf = receptive_field(rf, 4, 2)
print(rf)

Running the example prints the size of the receptive field for each layer in the model from the output layer to the input layer.

We can see that each 1×1 pixel in the output layer maps to a 70×70 receptive field in the input layer.

4
7
16
34
70

The authors of the Pix2Pix paper explore different PatchGAN configurations, including a 1×1 receptive field called a PixelGAN and a receptive field that matches the 256×256 pixel images input to the model (resampled to 286×286) called an ImageGAN. They found that the 70×70 PatchGAN resulted in the best trade-off of performance and image quality.

The 70×70 PatchGAN […] achieves slightly better scores. Scaling beyond this, to the full 286×286 ImageGAN, does not appear to improve the visual quality of the results.

Image-to-Image Translation with Conditional Adversarial Networks, 2016.
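The receptive_field() function above can be used to sanity-check these variations. A minimal sketch, assuming the PixelGAN uses 1×1 convolutions and the ImageGAN adds two further 4×4 stride-2 layers on top of the 70×70 receptive field chain:

# sketch: receptive field of the PixelGAN and ImageGAN variations

# calculate the effective receptive field size
def receptive_field(output_size, kernel_size, stride_size):
    return (output_size - 1) * stride_size + kernel_size

# PixelGAN: 1x1 kernels with a 1x1 stride give a 1x1 receptive field
print(receptive_field(1, 1, 1))
# ImageGAN: two further 4x4 stride-2 layers on top of the 70x70 chain
rf = receptive_field(70, 4, 2)
rf = receptive_field(rf, 4, 2)
print(rf)

Running this sketch prints 1 and 286, matching the PixelGAN and ImageGAN receptive fields described in the paper.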

The configuration for the PatchGAN is provided in the appendix of the paper and can be confirmed by reviewing the defineD_n_layers() function in the official Torch implementation.

The model takes two images as input, specifically a source and a target image. These images are concatenated together at the channel level, e.g. 3 color channels of each image become 6 channels of the input.

Let Ck denote a Convolution-BatchNorm-ReLU layer with k filters. […] All convolutions are 4× 4 spatial filters applied with stride 2. […] The 70 × 70 discriminator architecture is: C64-C128-C256-C512. After the last layer, a convolution is applied to map to a 1-dimensional output, followed by a Sigmoid function. As an exception to the above notation, BatchNorm is not applied to the first C64 layer. All ReLUs are leaky, with slope 0.2.

Image-to-Image Translation with Conditional Adversarial Networks, 2016.

The PatchGAN configuration is defined using a shorthand notation as: C64-C128-C256-C512, where C refers to a block of Convolution-BatchNorm-LeakyReLU layers and the number indicates the number of filters. Batch normalization is not used in the first layer. As mentioned, the kernel size is fixed at 4×4 and a stride of 2×2 is used on all but the last 2 layers of the model. The slope of the LeakyReLU is set to 0.2, and a sigmoid activation function is used in the output layer.

Random jitter was applied by resizing the 256×256 input images to 286 × 286 and then randomly cropping back to size 256 × 256. Weights were initialized from a Gaussian distribution with mean 0 and standard deviation 0.02.

Image-to-Image Translation with Conditional Adversarial Networks, 2016.

Model weights were initialized via random Gaussian with a mean of 0.0 and standard deviation of 0.02. Images input to the model are 256×256.

… we divide the objective by 2 while optimizing D, which slows down the rate at which D learns relative to G. We use minibatch SGD and apply the Adam solver, with a learning rate of 0.0002, and momentum parameters β1 = 0.5, β2 = 0.999.

Image-to-Image Translation with Conditional Adversarial Networks, 2016.

The model is trained with a batch size of one image, and the Adam version of stochastic gradient descent is used with a small learning rate and modest momentum. The loss for the discriminator is weighted by 50% for each model update.

Tying this all together, we can define a function named define_discriminator() that creates the 70×70 PatchGAN discriminator model.

The complete example of defining the model is listed below.

# example of defining a 70x70 patchgan discriminator model
from keras.optimizers import Adam
from keras.initializers import RandomNormal
from keras.models import Model
from keras.models import Input
from keras.layers import Conv2D
from keras.layers import LeakyReLU
from keras.layers import Activation
from keras.layers import Concatenate
from keras.layers import BatchNormalization
from keras.utils.vis_utils import plot_model

# define the discriminator model
def define_discriminator(image_shape):
    # weight initialization
    init = RandomNormal(stddev=0.02)
    # source image input
    in_src_image = Input(shape=image_shape)
    # target image input
    in_target_image = Input(shape=image_shape)
    # concatenate images channel-wise
    merged = Concatenate()([in_src_image, in_target_image])
    # C64
    d = Conv2D(64, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(merged)
    d = LeakyReLU(alpha=0.2)(d)
    # C128
    d = Conv2D(128, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d)
    d = BatchNormalization()(d)
    d = LeakyReLU(alpha=0.2)(d)
    # C256
    d = Conv2D(256, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d)
    d = BatchNormalization()(d)
    d = LeakyReLU(alpha=0.2)(d)
    # C512
    d = Conv2D(512, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d)
    d = BatchNormalization()(d)
    d = LeakyReLU(alpha=0.2)(d)
    # second last output layer
    d = Conv2D(512, (4,4), padding='same', kernel_initializer=init)(d)
    d = BatchNormalization()(d)
    d = LeakyReLU(alpha=0.2)(d)
    # patch output
    d = Conv2D(1, (4,4), padding='same', kernel_initializer=init)(d)
    patch_out = Activation('sigmoid')(d)
    # define model
    model = Model([in_src_image, in_target_image], patch_out)
    # compile model
    opt = Adam(lr=0.0002, beta_1=0.5)
    model.compile(loss='binary_crossentropy', optimizer=opt, loss_weights=[0.5])
    return model

# define image shape
image_shape = (256,256,3)
# create the model
model = define_discriminator(image_shape)
# summarize the model
model.summary()
# plot the model
plot_model(model, to_file='discriminator_model_plot.png', show_shapes=True, show_layer_names=True)

Running the example first summarizes the model, providing insight into how the input shape is transformed across the layers and the number of parameters in the model.

We can see that the two input images are concatenated together to create one 256x256x6 input to the first hidden convolutional layer. This concatenation of input images could occur before the input layer of the model, but allowing the model to perform the concatenation makes the behavior of the model clearer.
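For comparison, a minimal sketch of the alternative, where the concatenation is performed with NumPy before the data reaches a six-channel input layer, might look as follows (the dummy arrays are for illustration only):

# sketch of concatenating source and target images outside the model
from numpy import concatenate
from numpy import zeros
from keras.models import Input

# dummy source and target batches, shape (n, 256, 256, 3)
src_images = zeros((1, 256, 256, 3))
tar_images = zeros((1, 256, 256, 3))
merged = concatenate([src_images, tar_images], axis=-1)
print(merged.shape)
# the discriminator would then be defined with a single six-channel input
in_image = Input(shape=(256, 256, 6))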

We can see that the model output will be an activation map with the size 16×16 pixels or activations and a single channel, with each value in the map corresponding to a 70×70 pixel patch of the input 256×256 image. If the input image was half the size at 128×128, then the output feature map would also be halved to 8×8.

The model is a binary classification model, meaning it predicts an output as a probability in the range [0,1]; in this case, the likelihood that the target image is a real and plausible translation of the source image. The patch of values can be averaged to give a single real/fake prediction by the model. When trained, the model output is compared to a matrix of target values, 0 for fake and 1 for real.
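For example, a small sketch of preparing these target matrices for a batch of images, assuming the 16×16 patch output of the model above:

# sketch: PatchGAN target matrices for a batch of samples with a 16x16 patch output
from numpy import ones
from numpy import zeros

n_samples, n_patch = 1, 16
# 'real' targets (1) and 'fake' targets (0), one value per patch
y_real = ones((n_samples, n_patch, n_patch, 1))
y_fake = zeros((n_samples, n_patch, n_patch, 1))
# a single score per image can be obtained by averaging a prediction y_hat,
# e.g. y_hat.mean(axis=(1,2,3))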

__________________________________________________________________________________________________ Layer (type)                    Output Shape         Param #     Connected to ================================================================================================== input_1 (InputLayer)            (None, 256, 256, 3)  0 __________________________________________________________________________________________________ input_2 (InputLayer)            (None, 256, 256, 3)  0 __________________________________________________________________________________________________ concatenate_1 (Concatenate)     (None, 256, 256, 6)  0           input_1[0][0]                                                                  input_2[0][0] __________________________________________________________________________________________________ conv2d_1 (Conv2D)               (None, 128, 128, 64) 6208        concatenate_1[0][0] __________________________________________________________________________________________________ leaky_re_lu_1 (LeakyReLU)       (None, 128, 128, 64) 0           conv2d_1[0][0] __________________________________________________________________________________________________ conv2d_2 (Conv2D)               (None, 64, 64, 128)  131200      leaky_re_lu_1[0][0] __________________________________________________________________________________________________ batch_normalization_1 (BatchNor (None, 64, 64, 128)  512         conv2d_2[0][0] __________________________________________________________________________________________________ leaky_re_lu_2 (LeakyReLU)       (None, 64, 64, 128)  0           batch_normalization_1[0][0] __________________________________________________________________________________________________ conv2d_3 (Conv2D)               (None, 32, 32, 256)  524544      leaky_re_lu_2[0][0] __________________________________________________________________________________________________ batch_normalization_2 (BatchNor (None, 32, 32, 256)  1024        conv2d_3[0][0] __________________________________________________________________________________________________ leaky_re_lu_3 (LeakyReLU)       (None, 32, 32, 256)  0           batch_normalization_2[0][0] __________________________________________________________________________________________________ conv2d_4 (Conv2D)               (None, 16, 16, 512)  2097664     leaky_re_lu_3[0][0] __________________________________________________________________________________________________ batch_normalization_3 (BatchNor (None, 16, 16, 512)  2048        conv2d_4[0][0] __________________________________________________________________________________________________ leaky_re_lu_4 (LeakyReLU)       (None, 16, 16, 512)  0           batch_normalization_3[0][0] __________________________________________________________________________________________________ conv2d_5 (Conv2D)               (None, 16, 16, 512)  4194816     leaky_re_lu_4[0][0] __________________________________________________________________________________________________ batch_normalization_4 (BatchNor (None, 16, 16, 512)  2048        conv2d_5[0][0] __________________________________________________________________________________________________ leaky_re_lu_5 (LeakyReLU)       (None, 16, 16, 512)  0           batch_normalization_4[0][0] __________________________________________________________________________________________________ conv2d_6 (Conv2D)               (None, 16, 16, 1)    8193        leaky_re_lu_5[0][0] 
__________________________________________________________________________________________________ activation_1 (Activation)       (None, 16, 16, 1)    0           conv2d_6[0][0] ================================================================================================== Total params: 6,968,257 Trainable params: 6,965,441 Non-trainable params: 2,816 __________________________________________________________________________________________________

A plot of the model is created showing much the same information in a graphical form. The model is not complex, with a mostly linear path from the two input images to the single output prediction.

Note: creating the plot assumes that pydot and pygraphviz libraries are installed. If this is a problem, you can comment out the import and call to the plot_model() function.

Plot of the PatchGAN Model Used in the Pix2Pix GAN Architecture

Now that we know how to implement the PatchGAN discriminator model, we can now look at implementing the U-Net generator model.

How to Implement the U-Net Generator Model

The generator model for the Pix2Pix GAN is implemented as a U-Net.

The U-Net model is an encoder-decoder model for image translation where skip connections are used to connect layers in the encoder with corresponding layers in the decoder that have the same sized feature maps.

The encoder part of the model is comprised of convolutional layers that use a 2×2 stride to downsample the input source image down to a bottleneck layer. The decoder part of the model reads the bottleneck output and uses transpose convolutional layers to upsample to the required output image size.

… the input is passed through a series of layers that progressively downsample, until a bottleneck layer, at which point the process is reversed.

Image-to-Image Translation with Conditional Adversarial Networks, 2016.
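As a small illustration of these building blocks (a sketch only, not the full generator defined later), a 4×4 convolution with a 2×2 stride halves the spatial dimensions, and the matching transpose convolution doubles them again:

# sketch: effect of a stride-2 Conv2D (downsample) and Conv2DTranspose (upsample)
from keras.models import Input
from keras.models import Model
from keras.layers import Conv2D
from keras.layers import Conv2DTranspose

in_image = Input(shape=(256, 256, 3))
# downsample: (256, 256, 3) -> (128, 128, 64)
d = Conv2D(64, (4,4), strides=(2,2), padding='same')(in_image)
# upsample: (128, 128, 64) -> (256, 256, 64)
u = Conv2DTranspose(64, (4,4), strides=(2,2), padding='same')(d)
model = Model(in_image, u)
model.summary()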

Architecture of the U-Net Generator Model
Taken from Image-to-Image Translation With Conditional Adversarial Networks.

Skip connections are added between the layers with the same sized feature maps so that the first downsampling layer is connected with the last upsampling layer, the second downsampling layer is connected with the second last upsampling layer, and so on. The connections concatenate the channels of the feature map in the downsampling layer with the feature map in the upsampling layer.

Specifically, we add skip connections between each layer i and layer n − i, where n is the total number of layers. Each skip connection simply concatenates all channels at layer i with those at layer n − i.

Image-to-Image Translation with Conditional Adversarial Networks, 2016.

Unlike traditional generator models in the GAN architecture, the U-Net generator does not take a point from the latent space as input. Instead, dropout layers are used as a source of randomness both during training and when the model is used to make a prediction, e.g. generate an image at inference time.

Similarly, batch normalization is used in the same way during training and inference, meaning that statistics are calculated for each batch and not fixed at the end of the training process. This is referred to as instance normalization, specifically when the batch size is set to 1 as it is with the Pix2Pix model.

At inference time, we run the generator net in exactly the same manner as during the training phase. This differs from the usual protocol in that we apply dropout at test time, and we apply batch normalization using the statistics of the test batch, rather than aggregated statistics of the training batch.

Image-to-Image Translation with Conditional Adversarial Networks, 2016.

In Keras, layers like Dropout and BatchNormalization operate differently during training and during inference. We can set the “training” argument to “True” when calling these layers to ensure that they always operate in training mode, even when used during inference.

For example, a Dropout layer that will drop out during inference as well as training can be added to the model as follows:

...
g = Dropout(0.5)(g, training=True)
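Similarly, a BatchNormalization layer can be forced to use the statistics of the current batch at inference time in the same way, as is done in the generator blocks below:

...
g = BatchNormalization()(g, training=True)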

As with the discriminator model, the configuration details of the generator model are defined in the appendix of the paper and can be confirmed when comparing against the defineG_unet() function in the official Torch implementation.

The encoder uses blocks of Convolution-BatchNorm-LeakyReLU like the discriminator model, whereas the decoder model uses blocks of Convolution-BatchNorm-Dropout-ReLU with a dropout rate of 50%. All convolutional layers use a filter size of 4×4 and a stride of 2×2.

Let Ck denote a Convolution-BatchNorm-ReLU layer with k filters. CDk denotes a Convolution-BatchNormDropout-ReLU layer with a dropout rate of 50%. All convolutions are 4× 4 spatial filters applied with stride 2.

Image-to-Image Translation with Conditional Adversarial Networks, 2016.

The architecture of the U-Net model is defined using the shorthand notation as:

  • Encoder: C64-C128-C256-C512-C512-C512-C512-C512
  • Decoder: CD512-CD1024-CD1024-C1024-C1024-C512-C256-C128

The last layer of the encoder is the bottleneck layer, which does not use batch normalization, according to an amendment to the paper and confirmation in the code, and uses a ReLU activation instead of LeakyReLU.

… the activations of the bottleneck layer are zeroed by the batchnorm operation, effectively making the innermost layer skipped. This issue can be fixed by removing batchnorm from this layer, as has been done in the public code

Image-to-Image Translation with Conditional Adversarial Networks, 2016.

The number of filters in the U-Net decoder is a little misleading as it is the number of filters for the layer after concatenation with the equivalent layer in the encoder. This may become more clear when we create a plot of the model.

The output of the model uses a single convolutional layer with three channels, and a tanh activation function is used in the output layer, common to GAN generator models. Batch normalization is not used in the first layer of the encoder.

After the last layer in the decoder, a convolution is applied to map to the number of output channels (3 in general […]), followed by a Tanh function […] BatchNorm is not applied to the first C64 layer in the encoder. All ReLUs in the encoder are leaky, with slope 0.2, while ReLUs in the decoder are not leaky.

Image-to-Image Translation with Conditional Adversarial Networks, 2016.

Tying this all together, we can define a function named define_generator() that defines the U-Net encoder-decoder generator model. Two helper functions are also provided for defining encoder blocks of layers and decoder blocks of layers.

The complete example of defining the model is listed below.

# example of defining a u-net encoder-decoder generator model
from keras.initializers import RandomNormal
from keras.models import Model
from keras.models import Input
from keras.layers import Conv2D
from keras.layers import Conv2DTranspose
from keras.layers import LeakyReLU
from keras.layers import Activation
from keras.layers import Concatenate
from keras.layers import Dropout
from keras.layers import BatchNormalization
from keras.utils.vis_utils import plot_model

# define an encoder block
def define_encoder_block(layer_in, n_filters, batchnorm=True):
    # weight initialization
    init = RandomNormal(stddev=0.02)
    # add downsampling layer
    g = Conv2D(n_filters, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(layer_in)
    # conditionally add batch normalization
    if batchnorm:
        g = BatchNormalization()(g, training=True)
    # leaky relu activation
    g = LeakyReLU(alpha=0.2)(g)
    return g

# define a decoder block
def decoder_block(layer_in, skip_in, n_filters, dropout=True):
    # weight initialization
    init = RandomNormal(stddev=0.02)
    # add upsampling layer
    g = Conv2DTranspose(n_filters, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(layer_in)
    # add batch normalization
    g = BatchNormalization()(g, training=True)
    # conditionally add dropout
    if dropout:
        g = Dropout(0.5)(g, training=True)
    # merge with skip connection
    g = Concatenate()([g, skip_in])
    # relu activation
    g = Activation('relu')(g)
    return g

# define the standalone generator model
def define_generator(image_shape=(256,256,3)):
    # weight initialization
    init = RandomNormal(stddev=0.02)
    # image input
    in_image = Input(shape=image_shape)
    # encoder model: C64-C128-C256-C512-C512-C512-C512-C512
    e1 = define_encoder_block(in_image, 64, batchnorm=False)
    e2 = define_encoder_block(e1, 128)
    e3 = define_encoder_block(e2, 256)
    e4 = define_encoder_block(e3, 512)
    e5 = define_encoder_block(e4, 512)
    e6 = define_encoder_block(e5, 512)
    e7 = define_encoder_block(e6, 512)
    # bottleneck, no batch norm and relu
    b = Conv2D(512, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(e7)
    b = Activation('relu')(b)
    # decoder model: CD512-CD1024-CD1024-C1024-C1024-C512-C256-C128
    d1 = decoder_block(b, e7, 512)
    d2 = decoder_block(d1, e6, 512)
    d3 = decoder_block(d2, e5, 512)
    d4 = decoder_block(d3, e4, 512, dropout=False)
    d5 = decoder_block(d4, e3, 256, dropout=False)
    d6 = decoder_block(d5, e2, 128, dropout=False)
    d7 = decoder_block(d6, e1, 64, dropout=False)
    # output
    g = Conv2DTranspose(3, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d7)
    out_image = Activation('tanh')(g)
    # define model
    model = Model(in_image, out_image)
    return model

# define image shape
image_shape = (256,256,3)
# create the model
model = define_generator(image_shape)
# summarize the model
model.summary()
# plot the model
plot_model(model, to_file='generator_model_plot.png', show_shapes=True, show_layer_names=True)

Running the example first summarizes the model.

The model has a single input and output, but the skip connections make the summary difficult to read.

__________________________________________________________________________________________________ Layer (type)                    Output Shape         Param #     Connected to ================================================================================================== input_1 (InputLayer)            (None, 256, 256, 3)  0 __________________________________________________________________________________________________ conv2d_1 (Conv2D)               (None, 128, 128, 64) 3136        input_1[0][0] __________________________________________________________________________________________________ leaky_re_lu_1 (LeakyReLU)       (None, 128, 128, 64) 0           conv2d_1[0][0] __________________________________________________________________________________________________ conv2d_2 (Conv2D)               (None, 64, 64, 128)  131200      leaky_re_lu_1[0][0] __________________________________________________________________________________________________ batch_normalization_1 (BatchNor (None, 64, 64, 128)  512         conv2d_2[0][0] __________________________________________________________________________________________________ leaky_re_lu_2 (LeakyReLU)       (None, 64, 64, 128)  0           batch_normalization_1[0][0] __________________________________________________________________________________________________ conv2d_3 (Conv2D)               (None, 32, 32, 256)  524544      leaky_re_lu_2[0][0] __________________________________________________________________________________________________ batch_normalization_2 (BatchNor (None, 32, 32, 256)  1024        conv2d_3[0][0] __________________________________________________________________________________________________ leaky_re_lu_3 (LeakyReLU)       (None, 32, 32, 256)  0           batch_normalization_2[0][0] __________________________________________________________________________________________________ conv2d_4 (Conv2D)               (None, 16, 16, 512)  2097664     leaky_re_lu_3[0][0] __________________________________________________________________________________________________ batch_normalization_3 (BatchNor (None, 16, 16, 512)  2048        conv2d_4[0][0] __________________________________________________________________________________________________ leaky_re_lu_4 (LeakyReLU)       (None, 16, 16, 512)  0           batch_normalization_3[0][0] __________________________________________________________________________________________________ conv2d_5 (Conv2D)               (None, 8, 8, 512)    4194816     leaky_re_lu_4[0][0] __________________________________________________________________________________________________ batch_normalization_4 (BatchNor (None, 8, 8, 512)    2048        conv2d_5[0][0] __________________________________________________________________________________________________ leaky_re_lu_5 (LeakyReLU)       (None, 8, 8, 512)    0           batch_normalization_4[0][0] __________________________________________________________________________________________________ conv2d_6 (Conv2D)               (None, 4, 4, 512)    4194816     leaky_re_lu_5[0][0] __________________________________________________________________________________________________ batch_normalization_5 (BatchNor (None, 4, 4, 512)    2048        conv2d_6[0][0] __________________________________________________________________________________________________ leaky_re_lu_6 (LeakyReLU)       (None, 4, 4, 512)    0           batch_normalization_5[0][0] 
__________________________________________________________________________________________________ conv2d_7 (Conv2D)               (None, 2, 2, 512)    4194816     leaky_re_lu_6[0][0] __________________________________________________________________________________________________ batch_normalization_6 (BatchNor (None, 2, 2, 512)    2048        conv2d_7[0][0] __________________________________________________________________________________________________ leaky_re_lu_7 (LeakyReLU)       (None, 2, 2, 512)    0           batch_normalization_6[0][0] __________________________________________________________________________________________________ conv2d_8 (Conv2D)               (None, 1, 1, 512)    4194816     leaky_re_lu_7[0][0] __________________________________________________________________________________________________ activation_1 (Activation)       (None, 1, 1, 512)    0           conv2d_8[0][0] __________________________________________________________________________________________________ conv2d_transpose_1 (Conv2DTrans (None, 2, 2, 512)    4194816     activation_1[0][0] __________________________________________________________________________________________________ batch_normalization_7 (BatchNor (None, 2, 2, 512)    2048        conv2d_transpose_1[0][0] __________________________________________________________________________________________________ dropout_1 (Dropout)             (None, 2, 2, 512)    0           batch_normalization_7[0][0] __________________________________________________________________________________________________ concatenate_1 (Concatenate)     (None, 2, 2, 1024)   0           dropout_1[0][0]                                                                  leaky_re_lu_7[0][0] __________________________________________________________________________________________________ activation_2 (Activation)       (None, 2, 2, 1024)   0           concatenate_1[0][0] __________________________________________________________________________________________________ conv2d_transpose_2 (Conv2DTrans (None, 4, 4, 512)    8389120     activation_2[0][0] __________________________________________________________________________________________________ batch_normalization_8 (BatchNor (None, 4, 4, 512)    2048        conv2d_transpose_2[0][0] __________________________________________________________________________________________________ dropout_2 (Dropout)             (None, 4, 4, 512)    0           batch_normalization_8[0][0] __________________________________________________________________________________________________ concatenate_2 (Concatenate)     (None, 4, 4, 1024)   0           dropout_2[0][0]                                                                  leaky_re_lu_6[0][0] __________________________________________________________________________________________________ activation_3 (Activation)       (None, 4, 4, 1024)   0           concatenate_2[0][0] __________________________________________________________________________________________________ conv2d_transpose_3 (Conv2DTrans (None, 8, 8, 512)    8389120     activation_3[0][0] __________________________________________________________________________________________________ batch_normalization_9 (BatchNor (None, 8, 8, 512)    2048        conv2d_transpose_3[0][0] __________________________________________________________________________________________________ dropout_3 (Dropout)             (None, 8, 8, 512)    0           batch_normalization_9[0][0] 
__________________________________________________________________________________________________ concatenate_3 (Concatenate)     (None, 8, 8, 1024)   0           dropout_3[0][0]                                                                  leaky_re_lu_5[0][0] __________________________________________________________________________________________________ activation_4 (Activation)       (None, 8, 8, 1024)   0           concatenate_3[0][0] __________________________________________________________________________________________________ conv2d_transpose_4 (Conv2DTrans (None, 16, 16, 512)  8389120     activation_4[0][0] __________________________________________________________________________________________________ batch_normalization_10 (BatchNo (None, 16, 16, 512)  2048        conv2d_transpose_4[0][0] __________________________________________________________________________________________________ concatenate_4 (Concatenate)     (None, 16, 16, 1024) 0           batch_normalization_10[0][0]                                                                  leaky_re_lu_4[0][0] __________________________________________________________________________________________________ activation_5 (Activation)       (None, 16, 16, 1024) 0           concatenate_4[0][0] __________________________________________________________________________________________________ conv2d_transpose_5 (Conv2DTrans (None, 32, 32, 256)  4194560     activation_5[0][0] __________________________________________________________________________________________________ batch_normalization_11 (BatchNo (None, 32, 32, 256)  1024        conv2d_transpose_5[0][0] __________________________________________________________________________________________________ concatenate_5 (Concatenate)     (None, 32, 32, 512)  0           batch_normalization_11[0][0]                                                                  leaky_re_lu_3[0][0] __________________________________________________________________________________________________ activation_6 (Activation)       (None, 32, 32, 512)  0           concatenate_5[0][0] __________________________________________________________________________________________________ conv2d_transpose_6 (Conv2DTrans (None, 64, 64, 128)  1048704     activation_6[0][0] __________________________________________________________________________________________________ batch_normalization_12 (BatchNo (None, 64, 64, 128)  512         conv2d_transpose_6[0][0] __________________________________________________________________________________________________ concatenate_6 (Concatenate)     (None, 64, 64, 256)  0           batch_normalization_12[0][0]                                                                  leaky_re_lu_2[0][0] __________________________________________________________________________________________________ activation_7 (Activation)       (None, 64, 64, 256)  0           concatenate_6[0][0] __________________________________________________________________________________________________ conv2d_transpose_7 (Conv2DTrans (None, 128, 128, 64) 262208      activation_7[0][0] __________________________________________________________________________________________________ batch_normalization_13 (BatchNo (None, 128, 128, 64) 256         conv2d_transpose_7[0][0] __________________________________________________________________________________________________ concatenate_7 (Concatenate)     (None, 128, 128, 128 0           batch_normalization_13[0][0]                                        
                          leaky_re_lu_1[0][0] __________________________________________________________________________________________________ activation_8 (Activation)       (None, 128, 128, 128 0           concatenate_7[0][0] __________________________________________________________________________________________________ conv2d_transpose_8 (Conv2DTrans (None, 256, 256, 3)  6147        activation_8[0][0] __________________________________________________________________________________________________ activation_9 (Activation)       (None, 256, 256, 3)  0           conv2d_transpose_8[0][0] ================================================================================================== Total params: 54,429,315 Trainable params: 54,419,459 Non-trainable params: 9,856 __________________________________________________________________________________________________

A plot of the model is created showing much the same information in a graphical form. The model is complex, and the plot helps to understand the skip connections and their impact on the number of filters in the decoder.

Note: creating the plot assumes that pydot and pygraphviz libraries are installed. If this is a problem, you can comment out the import and call to the plot_model() function.

Working backward from the output layer, if we look at the Concatenate layers and the first Conv2DTranspose layer of the decoder, we can see the number of channels as:

  • [128, 256, 512, 1024, 1024, 1024, 1024, 512].

Reversing this list gives the stated configuration of the number of filters for each layer in the decoder from the paper of:

  • CD512-CD1024-CD1024-C1024-C1024-C512-C256-C128
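A quick way to confirm these numbers (a small sketch, assuming the define_generator() function from the previous section) is to print the output shape of each Concatenate layer in the model:

# sketch: print the output shape of each skip-connection concatenation
from keras.layers import Concatenate

model = define_generator((256,256,3))
for layer in model.layers:
    if isinstance(layer, Concatenate):
        print(layer.name, layer.output_shape)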

Plot of the U-Net Encoder-Decoder Model Used in the Pix2Pix GAN Architecture

Now that we have defined both models, we can look at how the generator model is updated via the discriminator model.

How to Implement Adversarial and L1 Loss

The discriminator model can be updated directly, whereas the generator model must be updated via the discriminator model.

This can be achieved by defining a new composite model in Keras that connects the output of the generator model as input to the discriminator model. The discriminator model can then predict whether a generated image is real or fake. We can update the weights of the composite model in such a way that the generated image has the label of “real” instead of “fake“, which will cause the generator weights to be updated towards generating a better fake image. We can also mark the discriminator weights as not trainable in this context, to avoid the misleading update.

Additionally, the generator needs to be updated to better match the targeted translation of the input image. This means that the composite model must also output the generated image directly, allowing it to be compared to the target image.

Therefore, we can summarize the inputs and outputs of this composite model as follows:

  • Inputs: Source image
  • Outputs: Classification of real/fake, generated target image.

The weights of the generator will be updated via both the adversarial loss from the discriminator output and the L1 loss from the direct image output. The loss scores are added together, where the L1 loss is treated as a regularizing term and weighted via a hyperparameter called lambda, set to 100.

  • loss = adversarial loss + lambda * L1 loss

The define_gan() function below implements this, taking the defined generator and discriminator models as input and creating the composite GAN model that can be used to update the generator model weights.

The source image is provided as input to both the generator and the discriminator, and the output of the generator is also connected to the discriminator as the corresponding target image.

Two loss functions are specified when the model is compiled for the discriminator and generator outputs respectively. The loss_weights argument is used to define the weighting of each loss when added together to update the generator model weights.

# define the combined generator and discriminator model, for updating the generator
def define_gan(g_model, d_model, image_shape):
    # make weights in the discriminator not trainable
    d_model.trainable = False
    # define the source image
    in_src = Input(shape=image_shape)
    # connect the source image to the generator input
    gen_out = g_model(in_src)
    # connect the source input and generator output to the discriminator input
    dis_out = d_model([in_src, gen_out])
    # src image as input, generated image and classification output
    model = Model(in_src, [dis_out, gen_out])
    # compile model
    opt = Adam(lr=0.0002, beta_1=0.5)
    model.compile(loss=['binary_crossentropy', 'mae'], optimizer=opt, loss_weights=[1,100])
    return model

Tying this together with the model definitions from the previous sections, the complete example is listed below.

# example of defining a composite model for training the generator model
from keras.optimizers import Adam
from keras.initializers import RandomNormal
from keras.models import Model
from keras.models import Input
from keras.layers import Conv2D
from keras.layers import Conv2DTranspose
from keras.layers import LeakyReLU
from keras.layers import Activation
from keras.layers import Concatenate
from keras.layers import Dropout
from keras.layers import BatchNormalization
from keras.utils.vis_utils import plot_model

# define the discriminator model
def define_discriminator(image_shape):
	# weight initialization
	init = RandomNormal(stddev=0.02)
	# source image input
	in_src_image = Input(shape=image_shape)
	# target image input
	in_target_image = Input(shape=image_shape)
	# concatenate images channel-wise
	merged = Concatenate()([in_src_image, in_target_image])
	# C64
	d = Conv2D(64, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(merged)
	d = LeakyReLU(alpha=0.2)(d)
	# C128
	d = Conv2D(128, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)
	# C256
	d = Conv2D(256, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)
	# C512
	d = Conv2D(512, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)
	# second last output layer
	d = Conv2D(512, (4,4), padding='same', kernel_initializer=init)(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)
	# patch output
	d = Conv2D(1, (4,4), padding='same', kernel_initializer=init)(d)
	patch_out = Activation('sigmoid')(d)
	# define model
	model = Model([in_src_image, in_target_image], patch_out)
	# compile model
	opt = Adam(lr=0.0002, beta_1=0.5)
	model.compile(loss='binary_crossentropy', optimizer=opt, loss_weights=[0.5])
	return model

# define an encoder block
def define_encoder_block(layer_in, n_filters, batchnorm=True):
	# weight initialization
	init = RandomNormal(stddev=0.02)
	# add downsampling layer
	g = Conv2D(n_filters, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(layer_in)
	# conditionally add batch normalization
	if batchnorm:
		g = BatchNormalization()(g, training=True)
	# leaky relu activation
	g = LeakyReLU(alpha=0.2)(g)
	return g

# define a decoder block
def decoder_block(layer_in, skip_in, n_filters, dropout=True):
	# weight initialization
	init = RandomNormal(stddev=0.02)
	# add upsampling layer
	g = Conv2DTranspose(n_filters, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(layer_in)
	# add batch normalization
	g = BatchNormalization()(g, training=True)
	# conditionally add dropout
	if dropout:
		g = Dropout(0.5)(g, training=True)
	# merge with skip connection
	g = Concatenate()([g, skip_in])
	# relu activation
	g = Activation('relu')(g)
	return g

# define the standalone generator model
def define_generator(image_shape=(256,256,3)):
	# weight initialization
	init = RandomNormal(stddev=0.02)
	# image input
	in_image = Input(shape=image_shape)
	# encoder model: C64-C128-C256-C512-C512-C512-C512-C512
	e1 = define_encoder_block(in_image, 64, batchnorm=False)
	e2 = define_encoder_block(e1, 128)
	e3 = define_encoder_block(e2, 256)
	e4 = define_encoder_block(e3, 512)
	e5 = define_encoder_block(e4, 512)
	e6 = define_encoder_block(e5, 512)
	e7 = define_encoder_block(e6, 512)
	# bottleneck, no batch norm and relu
	b = Conv2D(512, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(e7)
	b = Activation('relu')(b)
	# decoder model: CD512-CD1024-CD1024-C1024-C1024-C512-C256-C128
	d1 = decoder_block(b, e7, 512)
	d2 = decoder_block(d1, e6, 512)
	d3 = decoder_block(d2, e5, 512)
	d4 = decoder_block(d3, e4, 512, dropout=False)
	d5 = decoder_block(d4, e3, 256, dropout=False)
	d6 = decoder_block(d5, e2, 128, dropout=False)
	d7 = decoder_block(d6, e1, 64, dropout=False)
	# output
	g = Conv2DTranspose(3, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d7)
	out_image = Activation('tanh')(g)
	# define model
	model = Model(in_image, out_image)
	return model

# define the combined generator and discriminator model, for updating the generator
def define_gan(g_model, d_model, image_shape):
	# make weights in the discriminator not trainable
	d_model.trainable = False
	# define the source image
	in_src = Input(shape=image_shape)
	# connect the source image to the generator input
	gen_out = g_model(in_src)
	# connect the source input and generator output to the discriminator input
	dis_out = d_model([in_src, gen_out])
	# src image as input, generated image and classification output
	model = Model(in_src, [dis_out, gen_out])
	# compile model
	opt = Adam(lr=0.0002, beta_1=0.5)
	model.compile(loss=['binary_crossentropy', 'mae'], optimizer=opt, loss_weights=[1,100])
	return model

# define image shape
image_shape = (256,256,3)
# define the models
d_model = define_discriminator(image_shape)
g_model = define_generator(image_shape)
# define the composite model
gan_model = define_gan(g_model, d_model, image_shape)
# summarize the model
gan_model.summary()
# plot the model
plot_model(gan_model, to_file='gan_model_plot.png', show_shapes=True, show_layer_names=True)

Running the example first summarizes the composite model, showing the 256×256 image input, the same shaped output from model_2 (the generator) and the PatchGAN classification prediction from model_1 (the discriminator).

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_4 (InputLayer)            (None, 256, 256, 3)  0
__________________________________________________________________________________________________
model_2 (Model)                 (None, 256, 256, 3)  54429315    input_4[0][0]
__________________________________________________________________________________________________
model_1 (Model)                 (None, 16, 16, 1)    6968257     input_4[0][0]
                                                                 model_2[1][0]
==================================================================================================
Total params: 61,397,572
Trainable params: 54,419,459
Non-trainable params: 6,978,113
__________________________________________________________________________________________________

A plot of the composite model is also created, showing how the input image flows into the generator and the discriminator, and that the model has two outputs, or end-points, one from each of the two sub-models.

Note: creating the plot assumes that the pydot and graphviz libraries are installed. If this is a problem, you can comment out the import statement and the call to the plot_model() function.
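Alternatively, if you want the rest of the script to run even when the plotting dependencies are missing, one option (a small sketch, not part of the listing above, used in place of the top-level import and call) is to wrap the plotting step in a try/except:

# optional: create the plot only if the plotting dependencies are available
try:
	from keras.utils.vis_utils import plot_model
	plot_model(gan_model, to_file='gan_model_plot.png', show_shapes=True, show_layer_names=True)
except Exception as e:
	# pydot/graphviz are not installed; skip the plot rather than failing
	print('Skipping model plot: %s' % e)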

Plot of the Composite GAN Model Used to Train the Generator in the Pix2Pix GAN Architecture

How to Update Model Weights

Training the defined models is relatively straightforward.

First, we must define a helper function that will select a batch of real source and target images, along with the associated 'real' class labels (1.0) for the discriminator's patch output. Here, the dataset is a list of two arrays of images.

from numpy import ones
from numpy.random import randint

# select a batch of random samples, returns images and target
def generate_real_samples(dataset, n_samples, patch_shape):
	# unpack dataset
	trainA, trainB = dataset
	# choose random instances
	ix = randint(0, trainA.shape[0], n_samples)
	# retrieve selected images
	X1, X2 = trainA[ix], trainB[ix]
	# generate 'real' class labels (1)
	y = ones((n_samples, patch_shape, patch_shape, 1))
	return [X1, X2], y

Similarly, we need a function to generate a batch of fake images and the associated output (0.0). Here, the samples are an array of source images for which target images will be generated.

from numpy import zeros

# generate a batch of images, returns images and targets
def generate_fake_samples(g_model, samples, patch_shape):
	# generate fake instance
	X = g_model.predict(samples)
	# create 'fake' class labels (0)
	y = zeros((len(X), patch_shape, patch_shape, 1))
	return X, y

Now, we can define the steps of a single training iteration.

First, we must select a batch of source and target images by calling generate_real_samples().

Typically, the batch size (n_batch) is set to 1. In this case, we will assume 256×256 input images, which means the n_patch for the PatchGAN discriminator will be 16 to indicate a 16×16 output feature map.
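Rather than hard-coding this value, one option (a minimal sketch, assuming d_model is the discriminator defined above) is to read the patch size directly from the discriminator's output shape:

# infer the size of the square PatchGAN output from the discriminator itself
n_patch = d_model.output_shape[1]
print(n_patch)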

...
# select a batch of real samples
[X_realA, X_realB], y_real = generate_real_samples(dataset, n_batch, n_patch)

Next, we can use the batches of selected real source images to generate corresponding batches of generated or fake target images.

...
# generate a batch of fake samples
X_fakeB, y_fake = generate_fake_samples(g_model, X_realA, n_patch)

We can then use the real and fake images, as well as their targets, to update the standalone discriminator model.

...
# update discriminator for real samples
d_loss1 = d_model.train_on_batch([X_realA, X_realB], y_real)
# update discriminator for generated samples
d_loss2 = d_model.train_on_batch([X_realA, X_fakeB], y_fake)

So far, this follows the normal procedure for updating a GAN in Keras.

Next, we can update the generator model via the adversarial loss and the L1 loss. Recall that the composite GAN model takes a batch of source images as input and predicts, first, the real/fake classification and, second, the generated target image. Here, we provide a target of class=1 for the discriminator output of the composite model, indicating that the generated images are "real". The real target images are also provided so that the L1 loss can be calculated between them and the generated target images.

The composite model is compiled with two loss functions, but calling train_on_batch() returns three loss values for each batch update; only the first is of interest here, as it is the weighted sum of the adversarial and L1 loss values for the batch.

...
# update the generator
g_loss, _, _ = gan_model.train_on_batch(X_realA, [y_real, X_realB])
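If you do want to monitor the adversarial and L1 components separately, one option (a small sketch, not part of the listing above) is to keep all three returned values instead of discarding the last two:

# keep the weighted total, the adversarial loss, and the L1 loss separately
g_loss, g_adv, g_l1 = gan_model.train_on_batch(X_realA, [y_real, X_realB])
print('g[%.3f] adv[%.3f] L1[%.3f]' % (g_loss, g_adv, g_l1))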

That’s all there is to it.

We can define all of this in a function called train() that takes the defined models and a loaded dataset (as a list of two NumPy arrays) and trains the models.

# train pix2pix models
def train(d_model, g_model, gan_model, dataset, n_epochs=100, n_batch=1, n_patch=16):
	# unpack dataset
	trainA, trainB = dataset
	# calculate the number of batches per training epoch
	bat_per_epo = int(len(trainA) / n_batch)
	# calculate the number of training iterations
	n_steps = bat_per_epo * n_epochs
	# manually enumerate training steps
	for i in range(n_steps):
		# select a batch of real samples
		[X_realA, X_realB], y_real = generate_real_samples(dataset, n_batch, n_patch)
		# generate a batch of fake samples
		X_fakeB, y_fake = generate_fake_samples(g_model, X_realA, n_patch)
		# update discriminator for real samples
		d_loss1 = d_model.train_on_batch([X_realA, X_realB], y_real)
		# update discriminator for generated samples
		d_loss2 = d_model.train_on_batch([X_realA, X_fakeB], y_fake)
		# update the generator
		g_loss, _, _ = gan_model.train_on_batch(X_realA, [y_real, X_realB])
		# summarize performance
		print('>%d, d1[%.3f] d2[%.3f] g[%.3f]' % (i+1, d_loss1, d_loss2, g_loss))

The train function can then be called directly with our defined models and loaded dataset.

...
# load image data
dataset = ...
# train model
train(d_model, g_model, gan_model, dataset)
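As an illustration of what the loading step might look like, here is a minimal sketch of a hypothetical load_real_samples() helper; the filename 'maps_256.npz' and the use of a compressed NumPy archive of paired source and target images are assumptions, not part of the listing above. Pixel values are scaled from [0,255] to [-1,1] to match the tanh output of the generator.

from numpy import load

# hypothetical helper: load paired images from a compressed NumPy file and scale to [-1,1]
def load_real_samples(filename):
	# load the compressed arrays
	data = load(filename)
	# unpack the source and target image arrays
	X1, X2 = data['arr_0'], data['arr_1']
	# scale pixel values from [0,255] to [-1,1]
	X1 = (X1 - 127.5) / 127.5
	X2 = (X2 - 127.5) / 127.5
	return [X1, X2]

# example usage (the filename is an assumption)
dataset = load_real_samples('maps_256.npz')
train(d_model, g_model, gan_model, dataset)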


Summary

In this tutorial, you discovered how to implement the Pix2Pix GAN architecture from scratch using the Keras deep learning framework.

Specifically, you learned:

  • How to develop the PatchGAN discriminator model for the Pix2Pix GAN.
  • How to develop the U-Net encoder-decoder generator model for the Pix2Pix GAN.
  • How to implement the composite model for updating the generator and how to train both models.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
