## How to Train a Progressive Growing GAN in Keras for Synthesizing Faces

Generative adversarial networks, or GANs, are effective at generating high-quality synthetic images.

A limitation of GANs is that they are only capable of generating relatively small images, such as 64×64 pixels.

The Progressive Growing GAN is an extension to the GAN training procedure that involves training a GAN to generate very small images, such as 4×4, and incrementally increasing the size of the generated images to 8×8, 16×16, and so on until the desired output size is met. This has allowed the progressive GAN to generate photorealistic synthetic faces with 1024×1024 pixel resolution.

The key innovation of the progressive growing GAN is the two-phase training procedure that involves the fading-in of new blocks to support higher-resolution images followed by fine-tuning.

In this tutorial, you will discover how to implement and train a progressive growing generative adversarial network for generating celebrity faces.

After completing this tutorial, you will know:

- How to prepare the celebrity faces dataset for training a progressive growing GAN model.
- How to define and train the progressive growing GAN on the celebrity faces dataset.
- How to load saved generator models and use them for generating ad hoc synthetic celebrity faces.

Let’s get started.

## Tutorial Overview

This tutorial is divided into five parts; they are:

- What Is the Progressive Growing GAN
- How to Prepare the Celebrity Faces Dataset
- How to Develop Progressive Growing GAN Models
- How to Train Progressive Growing GAN Models
- How to Synthesize Images With a Progressive Growing GAN Model

## What Is the Progressive Growing GAN

GANs are effective at generating crisp synthetic images, although they are typically limited in the size of the images that can be generated.

The Progressive Growing GAN is an extension to the GAN that allows the training of generator models capable of generating large, high-quality images, such as photorealistic faces with the size 1024×1024 pixels. It was described in the 2017 paper by Tero Karras, et al. from Nvidia titled “Progressive Growing of GANs for Improved Quality, Stability, and Variation.”

The key innovation of the Progressive Growing GAN is the incremental increase in the size of images output by the generator, starting with a 4×4 pixel image and doubling to 8×8, 16×16, and so on until the desired output resolution.

This is achieved by a training procedure that involves periods of fine-tuning the model with a given output resolution, and periods of slowly phasing in a new model with a larger resolution. All layers remain trainable during the training process, including existing layers when new layers are added.
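The alternation between fade-in and fine-tuning periods can be sketched in plain Python. The helper below is a hypothetical illustration and not part of the tutorial code: the name *growth_schedule()* and its arguments are assumptions, and it only lists the order of the phases rather than training anything.

```python
# Hypothetical helper: list the training phases of a progressive growing GAN.
def growth_schedule(start_size=4, target_size=128):
    """Return (image_size, phase) pairs in the order they would be trained."""
    # the base model has nothing to fade in, so it is only fine-tuned
    schedule = [(start_size, 'tune')]
    size = start_size * 2
    while size <= target_size:
        # slowly phase in the new block that handles the larger resolution
        schedule.append((size, 'fade-in'))
        # then fine-tune the grown model at that resolution
        schedule.append((size, 'tune'))
        size *= 2
    return schedule

for size, phase in growth_schedule():
    print('%dx%d: %s' % (size, size, phase))
```

With the default arguments, this gives one fine-tuning phase at 4×4 followed by a fade-in and a fine-tuning phase for each of the five larger resolutions.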

The Progressive Growing GAN involves using generator and discriminator models with the same general structure, starting with very small images. During training, new blocks of convolutional layers are systematically added to both the generator and the discriminator models.

The incremental addition of the layers allows the models to effectively learn coarse-level detail and later learn ever-finer detail, both on the generator and discriminator sides.

This incremental nature allows the training to first discover large-scale structure of the image distribution and then shift attention to increasingly finer-scale detail, instead of having to learn all scales simultaneously.

The next step is to select a dataset to use for developing a Progressive Growing GAN.

## How to Prepare the Celebrity Faces Dataset

In this tutorial, we will use the Large-scale Celebrity Faces Attributes Dataset, referred to as CelebA.

This dataset was developed and published by Ziwei Liu, et al. for their 2015 paper titled “Deep Learning Face Attributes in the Wild.”

The dataset provides about 200,000 photographs of celebrity faces along with annotations for what appears in given photos, such as glasses, face shape, hats, hair type, etc. As part of the dataset, the authors provide a version of each photo centered on the face and cropped to the portrait with varying sizes around 150 pixels wide and 200 pixels tall. We will use this as the basis for developing our GAN model.

The dataset can be easily downloaded from the Kaggle webpage. Note: this may require an account with Kaggle.

Specifically, download the file “*img_align_celeba.zip*”, which is about 1.3 gigabytes. To do this, click on the filename on the Kaggle website and then click the download icon.

The download might take a while depending on the speed of your internet connection.

After downloading, unzip the archive.

This will create a new directory named “*img_align_celeba*” that contains all of the images with filenames like *202599.jpg* and *202598.jpg*.

When working with a GAN, it is easier to model a dataset if all of the images are small and square in shape.

Further, as we are only interested in the face in each photo and not the background, we can perform face detection and extract only the face before resizing the result to a fixed size.

There are many ways to perform face detection. In this case, we will use a pre-trained Multi-Task Cascaded Convolutional Neural Network, or MTCNN. This is a state-of-the-art deep learning model for face detection, described in the 2016 paper titled “Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks.”

We will use the implementation provided by Iván de Paz Centeno in the ipazc/mtcnn project. This can also be installed via pip as follows:

```
sudo pip install mtcnn
```

We can confirm that the library was installed correctly by importing the library and printing the version; for example:

```python
# confirm mtcnn was installed correctly
import mtcnn
# print version
print(mtcnn.__version__)
```

Running the example prints the current version of the library.

```
0.0.8
```

The MTCNN model is very easy to use.

First, an instance of the MTCNN model is created, then the *detect_faces()* function can be called passing in the pixel data for one image.

The result is a list of detected faces, each with a bounding box defined in pixel offset values.

```python
...
# prepare model
model = MTCNN()
# detect face in the image
faces = model.detect_faces(pixels)
# extract details of the face
x1, y1, width, height = faces[0]['box']
```

Although the progressive growing GAN supports the synthesis of large images, such as 1024×1024, this requires enormous resources, such as a single top-of-the-line GPU training the model for a month.

Instead, we will reduce the size of the generated images to 128×128, which will, in turn, allow us to train a reasonable model on a GPU in a few hours and still discover how the progressive growing model can be implemented, trained, and used.

As such, we can develop a function to load a file, extract the face from the photo, and then resize the extracted face pixels to a predefined size. In this case, we will use the square shape of 128×128 pixels.

The *load_image()* function below will load a given photo file name as a NumPy array of pixels.

```python
# load an image as an rgb numpy array
def load_image(filename):
    # load image from file
    image = Image.open(filename)
    # convert to RGB, if needed
    image = image.convert('RGB')
    # convert to array
    pixels = asarray(image)
    return pixels
```

The *extract_face()* function below takes the MTCNN model and pixel values for a single photograph as arguments and returns a 128x128x3 array of pixel values with just the face, or *None* if no face was detected (which can happen rarely).

```python
# extract the face from a loaded image and resize
def extract_face(model, pixels, required_size=(128, 128)):
    # detect face in the image
    faces = model.detect_faces(pixels)
    # skip cases where we could not detect a face
    if len(faces) == 0:
        return None
    # extract details of the face
    x1, y1, width, height = faces[0]['box']
    # force detected pixel values to be positive (bug fix)
    x1, y1 = abs(x1), abs(y1)
    # convert into coordinates
    x2, y2 = x1 + width, y1 + height
    # retrieve face pixels
    face_pixels = pixels[y1:y2, x1:x2]
    # resize pixels to the model size
    image = Image.fromarray(face_pixels)
    image = image.resize(required_size)
    face_array = asarray(image)
    return face_array
```
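The *abs()* call in *extract_face()* guards against negative bounding box offsets. One alternative, offered here as an assumption rather than as the tutorial's method, is to clamp the box to the image bounds before slicing. The *clamp_box()* helper and the image dimensions in the example are hypothetical.

```python
# Hypothetical alternative to abs(): clamp a detected box to the image bounds.
def clamp_box(box, image_width, image_height):
    """Convert an (x, y, width, height) box to clipped corner coordinates."""
    x1, y1, width, height = box
    # compute the opposite corner before clipping anything
    x2, y2 = x1 + width, y1 + height
    # clip each corner so slicing never uses a negative or oversized index
    x1, y1 = max(0, x1), max(0, y1)
    x2, y2 = min(image_width, x2), min(image_height, y2)
    return x1, y1, x2, y2

# a box that starts 5 pixels to the left of an image 178 wide and 218 tall
print(clamp_box((-5, 10, 100, 120), 178, 218))
```

The returned corners could then be used directly in a slice such as *pixels[y1:y2, x1:x2]*. Unlike *abs()*, clamping keeps the right and bottom edges of the box where the detector placed them.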

The *load_faces()* function below enumerates all photograph files in a directory and extracts and resizes the face from each and returns a NumPy array of faces.

We limit the total number of faces loaded via the *n_faces* argument, as we don’t need them all.

```python
# load images and extract faces for all images in a directory
def load_faces(directory, n_faces):
    # prepare model
    model = MTCNN()
    faces = list()
    # enumerate files
    for filename in listdir(directory):
        # load the image
        pixels = load_image(directory + filename)
        # get face
        face = extract_face(model, pixels)
        if face is None:
            continue
        # store
        faces.append(face)
        print(len(faces), face.shape)
        # stop once we have enough
        if len(faces) >= n_faces:
            break
    return asarray(faces)
```

Tying this together, the complete example of preparing a dataset of celebrity faces for training a GAN model is listed below.

In this case, we increase the total number of loaded faces to 50,000 to provide a good training dataset for our GAN model.

```python
# example of extracting and resizing faces into a new dataset
from os import listdir
from numpy import asarray
from numpy import savez_compressed
from PIL import Image
from mtcnn.mtcnn import MTCNN
from matplotlib import pyplot

# load an image as an rgb numpy array
def load_image(filename):
    # load image from file
    image = Image.open(filename)
    # convert to RGB, if needed
    image = image.convert('RGB')
    # convert to array
    pixels = asarray(image)
    return pixels

# extract the face from a loaded image and resize
def extract_face(model, pixels, required_size=(128, 128)):
    # detect face in the image
    faces = model.detect_faces(pixels)
    # skip cases where we could not detect a face
    if len(faces) == 0:
        return None
    # extract details of the face
    x1, y1, width, height = faces[0]['box']
    # force detected pixel values to be positive (bug fix)
    x1, y1 = abs(x1), abs(y1)
    # convert into coordinates
    x2, y2 = x1 + width, y1 + height
    # retrieve face pixels
    face_pixels = pixels[y1:y2, x1:x2]
    # resize pixels to the model size
    image = Image.fromarray(face_pixels)
    image = image.resize(required_size)
    face_array = asarray(image)
    return face_array

# load images and extract faces for all images in a directory
def load_faces(directory, n_faces):
    # prepare model
    model = MTCNN()
    faces = list()
    # enumerate files
    for filename in listdir(directory):
        # load the image
        pixels = load_image(directory + filename)
        # get face
        face = extract_face(model, pixels)
        if face is None:
            continue
        # store
        faces.append(face)
        print(len(faces), face.shape)
        # stop once we have enough
        if len(faces) >= n_faces:
            break
    return asarray(faces)

# directory that contains all images
directory = 'img_align_celeba/'
# load and extract all faces
all_faces = load_faces(directory, 50000)
print('Loaded: ', all_faces.shape)
# save in compressed format
savez_compressed('img_align_celeba_128.npz', all_faces)
```

Running the example may take a few minutes given the larger number of faces to be loaded.

At the end of the run, the array of extracted and resized faces is saved as a compressed NumPy array with the filename ‘*img_align_celeba_128.npz*’.

The prepared dataset can then be loaded any time, as follows.

```python
# load the prepared dataset
from numpy import load
# load the face dataset
data = load('img_align_celeba_128.npz')
faces = data['arr_0']
print('Loaded: ', faces.shape)
```

Loading the dataset summarizes the shape of the array, showing 50K images with the size of 128×128 pixels and three color channels.

```
Loaded: (50000, 128, 128, 3)
```
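The *'arr_0'* key used when loading comes from how NumPy names arrays that are passed to *savez_compressed()* positionally. A minimal sketch of this round trip is shown below. It uses a tiny placeholder array and an in-memory buffer instead of the real file, both of which are assumptions made for illustration.

```python
from io import BytesIO
import numpy as np

# a tiny stand-in for the face array: 2 images, 128x128 pixels, 3 channels
faces = np.zeros((2, 128, 128, 3), dtype='uint8')

# save to an in-memory buffer, passing the array positionally (no keyword)
buffer = BytesIO()
np.savez_compressed(buffer, faces)

# rewind and load; positional arrays are stored under arr_0, arr_1, ...
buffer.seek(0)
data = np.load(buffer)
print(data.files)
print(data['arr_0'].shape)
```

Passing the array with a keyword instead, for example *savez_compressed(filename, faces=all_faces)*, would store it under that keyword name.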

We can elaborate on this example and plot the first 100 faces in the dataset as a 10×10 grid. The complete example is listed below.

```python
# load the prepared dataset
from numpy import load
from matplotlib import pyplot

# plot a list of loaded faces
def plot_faces(faces, n):
    for i in range(n * n):
        # define subplot
        pyplot.subplot(n, n, 1 + i)
        # turn off axis
        pyplot.axis('off')
        # plot raw pixel data
        pyplot.imshow(faces[i].astype('uint8'))
    pyplot.show()

# load the face dataset
data = load('img_align_celeba_128.npz')
faces = data['arr_0']
print('Loaded: ', faces.shape)
plot_faces(faces, 10)
```

Running the example loads the dataset and creates a plot of the first 100 images.

We can see that each image only contains the face and all faces have the same square shape. Our goal is to generate new faces with the same general properties.

We are now ready to develop a GAN model to generate faces using this dataset.

## How to Develop Progressive Growing GAN Models

There are many ways to implement the progressive growing GAN models.

In this tutorial, we will develop and implement each phase of growth as a separate Keras model and each model will share the same layers and weights.

This approach allows for the convenient training of each model, just like a normal Keras model, although it requires a slightly complicated model construction process to ensure that the layers are reused correctly.

First, we will define some custom layers required in the definition of the generator and discriminator models, then proceed to define functions to create and grow the discriminator and generator models themselves.

### Progressive Growing Custom Layers

There are three custom layers required to implement the progressive growing generative adversarial network.

They are the layers:

- **WeightedSum**: Used to control the weighted sum of the old and new layers during a growth phase.
- **MinibatchStdev**: Used to summarize statistics for a batch of images in the discriminator.
- **PixelNormalization**: Used to normalize activation maps in the generator model.

Additionally, a weight constraint is used in the paper referred to as “*equalized learning rate*”. This too would need to be implemented as a custom layer. In the interest of brevity, we won’t use equalized learning rate in this tutorial and instead we use a simple max norm weight constraint.

#### WeightedSum Layer

The *WeightedSum* layer is a merge layer that combines the activations from two input layers, such as two input paths in a discriminator or two output paths in a generator model. It uses a variable called *alpha* that controls how much to weight the first and second inputs.

It is used during the growth phase of training when the model is in transition from one image size to a new image size with double the width and height (quadruple the area), such as from 4×4 to 8×8 pixels.

During the growth phase, the alpha parameter is linearly scaled from 0.0 at the beginning to 1.0 at the end, allowing the output of the layer to transition from giving full weight to the old layers (first input) to giving full weight to the new layers (second input).

- weighted sum = ((1.0 - alpha) * input1) + (alpha * input2)
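The arithmetic of the fade-in can be checked with scalars before involving Keras at all. The two helpers below are hypothetical and only illustrate the formula above. The linear rule for alpha assumes that training steps are counted from 0 to *n_steps - 1* and that *n_steps* is greater than 1.

```python
# Hypothetical helpers illustrating the fade-in arithmetic (not the Keras layer).
def fade_in_alpha(step, n_steps):
    """Linearly scale alpha from 0.0 at step 0 to 1.0 at the final step."""
    return step / float(n_steps - 1)

def weighted_sum(old_value, new_value, alpha):
    """Blend two values: alpha=0 gives the old value, alpha=1 gives the new."""
    return ((1.0 - alpha) * old_value) + (alpha * new_value)

n_steps = 5
for step in range(n_steps):
    alpha = fade_in_alpha(step, n_steps)
    # the old pathway outputs 10.0 and the new pathway outputs 20.0
    print(step, alpha, weighted_sum(10.0, 20.0, alpha))
```

The blended value moves from the old output at the first step to the new output at the last step, which is the behavior the layer applies elementwise to whole activation maps.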

The *WeightedSum* class is defined below as an extension to the *Add* merge layer.

```python
# weighted sum output
class WeightedSum(Add):
    # init with default value
    def __init__(self, alpha=0.0, **kwargs):
        super(WeightedSum, self).__init__(**kwargs)
        self.alpha = backend.variable(alpha, name='ws_alpha')

    # output a weighted sum of inputs
    def _merge_function(self, inputs):
        # only supports a weighted sum of two inputs
        assert (len(inputs) == 2)
        # ((1-a) * input1) + (a * input2)
        output = ((1.0 - self.alpha) * inputs[0]) + (self.alpha * inputs[1])
        return output
```

#### MinibatchStdev

The mini-batch standard deviation layer, or *MinibatchStdev*, is only used in the output block of the discriminator model.

The objective of the layer is to provide a statistical summary of the batch of activations. The discriminator can then learn to better distinguish batches of fake samples from batches of real samples. This, in turn, encourages the generator that is trained via the discriminator to create batches of samples with realistic batch statistics.

It is implemented as calculating the standard deviation for each pixel value in the activation maps across the batch, calculating the average of this value, and then creating a new activation map (one channel) that is appended to the list of activation maps provided as input.

The *MinibatchStdev* layer is defined below.

```python
# mini-batch standard deviation layer
class MinibatchStdev(Layer):
    # initialize the layer
    def __init__(self, **kwargs):
        super(MinibatchStdev, self).__init__(**kwargs)

    # perform the operation
    def call(self, inputs):
        # calculate the mean value for each pixel across the batch
        mean = backend.mean(inputs, axis=0, keepdims=True)
        # calculate the squared differences between pixel values and mean
        squ_diffs = backend.square(inputs - mean)
        # calculate the average of the squared differences (variance)
        mean_sq_diff = backend.mean(squ_diffs, axis=0, keepdims=True)
        # add a small value to avoid a blow-up when we calculate stdev
        mean_sq_diff += 1e-8
        # square root of the variance (stdev)
        stdev = backend.sqrt(mean_sq_diff)
        # calculate the mean standard deviation across each pixel coord
        mean_pix = backend.mean(stdev, keepdims=True)
        # scale this up to be the size of one input feature map for each sample
        shape = backend.shape(inputs)
        output = backend.tile(mean_pix, (shape[0], shape[1], shape[2], 1))
        # concatenate with the output
        combined = backend.concatenate([inputs, output], axis=-1)
        return combined

    # define the output shape of the layer
    def compute_output_shape(self, input_shape):
        # create a copy of the input shape as a list
        input_shape = list(input_shape)
        # add one to the channel dimension (assume channels-last)
        input_shape[-1] += 1
        # convert list to a tuple
        return tuple(input_shape)
```
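To see what the layer computes without building a model, the same steps can be reproduced in NumPy. This is a sketch under the assumption of channels-last activations. The function name *minibatch_stdev()* and the random test data are illustrative and not part of the tutorial code.

```python
import numpy as np

def minibatch_stdev(inputs):
    """Append one channel holding the mean stdev measured across the batch."""
    # standard deviation of every activation across the batch axis
    mean = inputs.mean(axis=0, keepdims=True)
    variance = ((inputs - mean) ** 2).mean(axis=0, keepdims=True)
    stdev = np.sqrt(variance + 1e-8)
    # collapse to a single scalar summary for the whole batch
    mean_stdev = stdev.mean()
    # broadcast the scalar into one extra feature map per sample
    batch, height, width, _ = inputs.shape
    extra = np.full((batch, height, width, 1), mean_stdev)
    return np.concatenate([inputs, extra], axis=-1)

# a batch of 16 samples, each with 128 activation maps of 4x4
activations = np.random.default_rng(1).normal(size=(16, 4, 4, 128))
output = minibatch_stdev(activations)
print(activations.shape, '->', output.shape)
```

The original activation maps pass through unchanged, and every position in the appended map holds the same batch-level value.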

#### PixelNormalization

The generator and discriminator models don’t use batch normalization like other GAN models; instead, each pixel in the activation maps is normalized to unit length.

This is a variation of local response normalization and is referred to in the paper as pixelwise feature vector normalization. Also, unlike other GAN models, normalization is only used in the generator model, not the discriminator.

This is a type of activity regularization and could be implemented as an activity constraint, although it is easily implemented as a new layer that scales the activations of the prior layer.

The *PixelNormalization* class below implements this and can be used after each Convolution layer in the generator, but before any activation function.

```python
# pixel-wise feature vector normalization layer
class PixelNormalization(Layer):
    # initialize the layer
    def __init__(self, **kwargs):
        super(PixelNormalization, self).__init__(**kwargs)

    # perform the operation
    def call(self, inputs):
        # calculate square pixel values
        values = inputs**2.0
        # calculate the mean pixel values
        mean_values = backend.mean(values, axis=-1, keepdims=True)
        # ensure the mean is not zero
        mean_values += 1.0e-8
        # calculate the sqrt of the mean squared value (L2 norm)
        l2 = backend.sqrt(mean_values)
        # normalize values by the l2 norm
        normalized = inputs / l2
        return normalized

    # define the output shape of the layer
    def compute_output_shape(self, input_shape):
        return input_shape
```
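A quick NumPy check can confirm the effect of this normalization: after it is applied, the mean squared value across the channels at each pixel position is approximately 1. The *pixel_norm()* function below is a hypothetical stand-in for the layer, assuming channels-last activations.

```python
import numpy as np

def pixel_norm(inputs, epsilon=1.0e-8):
    """Scale the feature vector at each pixel by its root mean square."""
    # mean of the squared activations across the channel axis
    mean_square = (inputs ** 2.0).mean(axis=-1, keepdims=True)
    # divide by the root mean square, with epsilon guarding against zero
    return inputs / np.sqrt(mean_square + epsilon)

# activations with a deliberately large scale
activations = np.random.default_rng(1).normal(size=(2, 4, 4, 128)) * 5.0
normalized = pixel_norm(activations)

# mean squared value per pixel position after normalization
check = (normalized ** 2.0).mean(axis=-1)
print(check.min(), check.max())
```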

We now have all of the custom layers required and can define our models.

### Progressive Growing Discriminator Model

The discriminator model is defined as a deep convolutional neural network that expects a 4×4 color image as input and predicts whether it is real or fake.

The first hidden layer is a 1×1 convolutional layer. The output block involves a *MinibatchStdev* layer, 3×3 and 4×4 convolutional layers, and a fully connected layer that outputs a prediction. Leaky ReLU activation functions are used after all hidden layers, and the output layer uses a linear activation function.

This model is trained for a normal interval, then the model undergoes a growth phase to 8×8. This involves adding a block of two 3×3 convolutional layers and an average pooling downsample layer. The input image passes through the new block with a new 1×1 convolutional hidden layer. The input image is also passed through a downsample layer and through the old 1×1 convolutional hidden layer. The outputs of the old 1×1 convolutional layer and the new block are then combined via a *WeightedSum* layer.
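The downsampling step on the old pathway can be illustrated in NumPy. The sketch below averages each non-overlapping 2×2 block of pixels, which is what a 2×2 average pooling layer with a matching stride does. The function name is hypothetical, and the code assumes an even height and width.

```python
import numpy as np

def average_pool_2x2(image):
    """Halve the height and width of a (height, width, channels) array."""
    height, width, channels = image.shape
    # group pixels into non-overlapping 2x2 blocks
    blocks = image.reshape(height // 2, 2, width // 2, 2, channels)
    # average over the two axes that index positions inside each block
    return blocks.mean(axis=(1, 3))

# an 8x8 color image filled with increasing values
image = np.arange(8 * 8 * 3, dtype='float32').reshape(8, 8, 3)
small = average_pool_2x2(image)
print(image.shape, '->', small.shape)
```

This is how an 8×8 input can be fed to layers that were trained on 4×4 images during the fade-in.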

After an interval of training transitioning the *WeightedSum’s* alpha parameter from 0.0 (all old) to 1.0 (all new), another training phase is run to tune the new model with the old layer and pathway removed.

This process repeats until the desired image size is met, in our case, 128×128 pixel images.

We can achieve this with two functions. The *define_discriminator()* function defines the base model that accepts 4×4 images. The *add_discriminator_block()* function takes a model and creates two versions of the next, larger model: a growth version with two pathways joined by a *WeightedSum* layer, and a second version with the same layers and weights but without the old 1×1 layer and the *WeightedSum* layer. The *define_discriminator()* function can then call *add_discriminator_block()* as many times as needed to create the models up to the desired level of growth.

All layers are initialized with small Gaussian random numbers with a standard deviation of 0.02, which is common for GAN models. A maxnorm weight constraint is used with a value of 1.0, instead of the more elaborate ‘*equalized learning rate*’ weight constraint used in the paper.
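As a rough illustration of what a max norm constraint does, the NumPy sketch below rescales any weight vector whose norm exceeds the limit and leaves smaller ones almost untouched. It is modeled on the general idea of the constraint and is an assumption, not a copy of the Keras implementation. The function name, the choice of axis, and the example weights are hypothetical.

```python
import numpy as np

def apply_max_norm(weights, max_value=1.0, axis=0, epsilon=1e-7):
    """Rescale weight vectors along an axis so their norm is at most max_value."""
    # norm of each weight vector along the chosen axis
    norms = np.sqrt((weights ** 2).sum(axis=axis, keepdims=True))
    # the norm each vector is allowed to keep
    desired = np.clip(norms, 0.0, max_value)
    # shrink vectors that are too long; short vectors are scaled by about 1
    return weights * (desired / (epsilon + norms))

# two weight vectors stored as columns, with norms 5.0 and 0.5
weights = np.array([[3.0, 0.3], [4.0, 0.4]])
constrained = apply_max_norm(weights)
print(np.sqrt((constrained ** 2).sum(axis=0)))
```

The first column is scaled down to a norm of about 1.0, and the second keeps its norm of about 0.5.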

The paper defines a number of filters that increases with the depth of the model from 16 to 32, 64, all the way up to 512. This requires projection of the number of feature maps during the growth phase so that the weighted sum can be calculated correctly. To avoid this complication, we fix the number of filters to be the same in all layers.

Each model is compiled so that it can be fit. In this case, we will use Wasserstein loss (or WGAN loss) and the Adam version of stochastic gradient descent, configured as specified in the paper. The authors of the paper explored both WGAN-GP loss and least squares loss and found that the former performed slightly better. Nevertheless, we will use Wasserstein loss as it greatly simplifies the implementation.

First, we must define the loss function as the average predicted value multiplied by the target value. The target value will be 1 for real images and -1 for fake images. This means that weight updates will seek to increase the divide between real and fake images.

```python
# calculate wasserstein loss
def wasserstein_loss(y_true, y_pred):
    return backend.mean(y_true * y_pred)
```
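The effect of the target values can be seen with a few made-up scores. The NumPy version of the loss below mirrors the Keras function above, and the score arrays are invented purely for illustration.

```python
import numpy as np

def wasserstein_loss(y_true, y_pred):
    """Average of the target value multiplied by the predicted score."""
    return np.mean(y_true * y_pred)

# made-up discriminator scores for three real and three fake images
real_scores = np.array([-2.0, -1.0, -3.0])
fake_scores = np.array([1.5, 2.5, 2.0])

# targets are 1 for real images and -1 for fake images
loss_real = wasserstein_loss(np.ones(3), real_scores)
loss_fake = wasserstein_loss(-np.ones(3), fake_scores)
print(loss_real, loss_fake)
```

With these targets, minimizing the loss pushes the scores for real images down and the scores for fake images up, so both losses become more negative as the gap between the two groups widens.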

The functions for defining and creating the growth versions of the discriminator models are listed below.

We make careful use of the functional API and knowledge of the model structure to create the two models for each growth phase. The growth phase also always doubles the expected input shape.

```python
# add a discriminator block
def add_discriminator_block(old_model, n_input_layers=3):
    # weight initialization
    init = RandomNormal(stddev=0.02)
    # weight constraint
    const = max_norm(1.0)
    # get shape of existing model
    in_shape = list(old_model.input.shape)
    # define new input shape as double the size
    # (the .value attribute assumes TensorFlow 1.x Dimension objects)
    input_shape = (in_shape[-2].value*2, in_shape[-2].value*2, in_shape[-1].value)
    in_image = Input(shape=input_shape)
    # define new input processing layer
    d = Conv2D(128, (1,1), padding='same', kernel_initializer=init, kernel_constraint=const)(in_image)
    d = LeakyReLU(alpha=0.2)(d)
    # define new block
    d = Conv2D(128, (3,3), padding='same', kernel_initializer=init, kernel_constraint=const)(d)
    d = LeakyReLU(alpha=0.2)(d)
    d = Conv2D(128, (3,3), padding='same', kernel_initializer=init, kernel_constraint=const)(d)
    d = LeakyReLU(alpha=0.2)(d)
    d = AveragePooling2D()(d)
    block_new = d
    # skip the input, 1x1 and activation for the old model
    for i in range(n_input_layers, len(old_model.layers)):
        d = old_model.layers[i](d)
    # define straight-through model
    model1 = Model(in_image, d)
    # compile model (epsilon of 1e-8, as specified in the paper)
    model1.compile(loss=wasserstein_loss, optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=1e-8))
    # downsample the new larger image
    downsample = AveragePooling2D()(in_image)
    # connect old input processing to downsampled new input
    block_old = old_model.layers[1](downsample)
    block_old = old_model.layers[2](block_old)
    # fade in output of old model input layer with new input
    d = WeightedSum()([block_old, block_new])
    # skip the input, 1x1 and activation for the old model
    for i in range(n_input_layers, len(old_model.layers)):
        d = old_model.layers[i](d)
    # define fade-in model
    model2 = Model(in_image, d)
    # compile model
    model2.compile(loss=wasserstein_loss, optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=1e-8))
    return [model1, model2]

# define the discriminator models for each image resolution
def define_discriminator(n_blocks, input_shape=(4,4,3)):
    # weight initialization
    init = RandomNormal(stddev=0.02)
    # weight constraint
    const = max_norm(1.0)
    model_list = list()
    # base model input
    in_image = Input(shape=input_shape)
    # conv 1x1
    d = Conv2D(128, (1,1), padding='same', kernel_initializer=init, kernel_constraint=const)(in_image)
    d = LeakyReLU(alpha=0.2)(d)
    # conv 3x3 (output block)
    d = MinibatchStdev()(d)
    d = Conv2D(128, (3,3), padding='same', kernel_initializer=init, kernel_constraint=const)(d)
    d = LeakyReLU(alpha=0.2)(d)
    # conv 4x4
    d = Conv2D(128, (4,4), padding='same', kernel_initializer=init, kernel_constraint=const)(d)
    d = LeakyReLU(alpha=0.2)(d)
    # dense output layer
    d = Flatten()(d)
    out_class = Dense(1)(d)
    # define model
    model = Model(in_image, out_class)
    # compile model
    model.compile(loss=wasserstein_loss, optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=1e-8))
    # store model
    model_list.append([model, model])
    # create submodels
    for i in range(1, n_blocks):
        # get prior model without the fade-in
        old_model = model_list[i - 1][0]
        # create new model for next resolution
        models = add_discriminator_block(old_model)
        # store model
        model_list.append(models)
    return model_list
```

The *define_discriminator()* function is called by specifying the number of blocks to create.

We will create 6 blocks, which will create 6 pairs of models that expect the input image sizes of 4×4, 8×8, 16×16, 32×32, 64×64, 128×128.

The function returns a list where each element in the list contains two models. The first model is the ‘*normal model*’ or straight-through model, and the second is the version of the model that includes the old 1×1 and new block with the weighted sum, used for the transition or growth phase of training.

### Progressive Growing Generator Model

The generator model takes a random point from the latent space as input and generates a synthetic image.

The generator models are defined in the same way as the discriminator models.

Specifically, a base model for generating 4×4 images is defined and growth versions of the model are created for the larger image output sizes.

The main difference is that during the growth phase, the output of the model is the output of the *WeightedSum* layer. The growth phase version of the model involves first adding a nearest neighbor upsampling layer; this is then connected to the new block with the new output layer and to the old output layer. The old and new output layers are then combined via a *WeightedSum* output layer.
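Nearest neighbor upsampling simply repeats each pixel, which can be shown with a small NumPy sketch. The function name is hypothetical, and the example assumes a channels-last array and a scale factor of 2 in each dimension.

```python
import numpy as np

def upsample_nearest_2x(image):
    """Double the height and width by repeating each pixel."""
    # repeat rows, then repeat columns; channels are left alone
    return image.repeat(2, axis=0).repeat(2, axis=1)

# a 2x2 image with a single channel
image = np.array([[1, 2], [3, 4]]).reshape(2, 2, 1)
large = upsample_nearest_2x(image)
print(large[:, :, 0])
```

Each original pixel becomes a 2×2 block of identical values, so the old output layer can produce an image at the new, larger size while the new block is faded in.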

The base model has an input block defined with a fully connected layer with a sufficient number of activations to create a given number of 4×4 feature maps. This is followed by 4×4 and 3×3 convolution layers and a 1×1 output layer that generates color images. New blocks are added with an upsample layer and two 3×3 convolutional layers.

The *LeakyReLU* activation function is used and the *PixelNormalization* layer is used after each convolutional layer. A linear activation function is used in the output layer, instead of the more common tanh function, yet real images are still scaled to the range [-1,1], which is common for most GAN models.
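Scaling pixel values to the range [-1,1] takes one line of arithmetic. The helpers below are a minimal sketch with hypothetical names. The clipping in the second function is an assumption added because a linear output layer is not guaranteed to stay inside the range.

```python
import numpy as np

def scale_pixels(images):
    """Map uint8 pixel values in [0, 255] to floats in [-1, 1]."""
    return (images.astype('float32') - 127.5) / 127.5

def unscale_pixels(images):
    """Map values from [-1, 1] back to [0, 1] for plotting, clipping overshoot."""
    return np.clip((images + 1.0) / 2.0, 0.0, 1.0)

pixels = np.array([0, 255], dtype='uint8')
scaled = scale_pixels(pixels)
print(scaled)
print(unscale_pixels(scaled))
```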

The paper defines the number of feature maps decreasing with the depth of the model from 512 to 16. As with the discriminator, the difference in the number of feature maps across blocks introduces a challenge for the *WeightedSum*, so for simplicity, we fix all layers to have the same number of filters.

Also like the discriminator model, weights are initialized with Gaussian random numbers with a standard deviation of 0.02 and the maxnorm weight constraint is used with a value of 1.0, instead of the equalized learning rate weight constraint used in the paper.

The functions for defining and growing the generator models are defined below.

```python
# add a generator block
def add_generator_block(old_model):
    # weight initialization
    init = RandomNormal(stddev=0.02)
    # weight constraint
    const = max_norm(1.0)
    # get the end of the last block
    block_end = old_model.layers[-2].output
    # upsample, and define new block
    upsampling = UpSampling2D()(block_end)
    g = Conv2D(128, (3,3), padding='same', kernel_initializer=init, kernel_constraint=const)(upsampling)
    g = PixelNormalization()(g)
    g = LeakyReLU(alpha=0.2)(g)
    g = Conv2D(128, (3,3), padding='same', kernel_initializer=init, kernel_constraint=const)(g)
    g = PixelNormalization()(g)
    g = LeakyReLU(alpha=0.2)(g)
    # add new output layer
    out_image = Conv2D(3, (1,1), padding='same', kernel_initializer=init, kernel_constraint=const)(g)
    # define model
    model1 = Model(old_model.input, out_image)
    # get the output layer from old model
    out_old = old_model.layers[-1]
    # connect the upsampling to the old output layer
    out_image2 = out_old(upsampling)
    # define new output image as the weighted sum of the old and new models
    merged = WeightedSum()([out_image2, out_image])
    # define model
    model2 = Model(old_model.input, merged)
    return [model1, model2]

# define generator models
def define_generator(latent_dim, n_blocks, in_dim=4):
    # weight initialization
    init = RandomNormal(stddev=0.02)
    # weight constraint
    const = max_norm(1.0)
    model_list = list()
    # base model latent input
    in_latent = Input(shape=(latent_dim,))
    # linear scale up to activation maps
    g = Dense(128 * in_dim * in_dim, kernel_initializer=init, kernel_constraint=const)(in_latent)
    g = Reshape((in_dim, in_dim, 128))(g)
    # conv 4x4, input block
    g = Conv2D(128, (4,4), padding='same', kernel_initializer=init, kernel_constraint=const)(g)
    g = PixelNormalization()(g)
    g = LeakyReLU(alpha=0.2)(g)
    # conv 3x3
    g = Conv2D(128, (3,3), padding='same', kernel_initializer=init, kernel_constraint=const)(g)
    g = PixelNormalization()(g)
    g = LeakyReLU(alpha=0.2)(g)
    # conv 1x1, output block
    out_image = Conv2D(3, (1,1), padding='same', kernel_initializer=init, kernel_constraint=const)(g)
    # define model
    model = Model(in_latent, out_image)
    # store model
    model_list.append([model, model])
    # create submodels
    for i in range(1, n_blocks):
        # get prior model without the fade-in
        old_model = model_list[i - 1][0]
        # create new model for next resolution
        models = add_generator_block(old_model)
        # store model
        model_list.append(models)
    return model_list
```

Calling the *define_generator()* function requires that the size of the latent space be defined.

Like the discriminator, we will set the *n_blocks* argument to 6 to create six pairs of models.

The function returns a list of models where each item in the list contains the normal or straight-through version of each generator and the growth version for phasing in the new block at the larger output image size.

### Composite Models for Training the Generators

The generator models are not compiled as they are not trained directly.

Instead, the generator models are trained via the discriminator models using Wasserstein loss.

This involves presenting generated images to the discriminator as real images and calculating the loss that is then used to update the generator models.

A given generator model must be paired with a given discriminator model both in terms of the same image size (e.g. 4×4 or 8×8) and in terms of the same phase of training, such as growth phase (introducing the new block) or fine-tuning phase (normal or straight-through).

We can achieve this by creating a new model for each pair of models that stacks the generator on top of the discriminator so that the synthetic image feeds directly into the discriminator model to be deemed real or fake. This composite model can then be used to train the generator via the discriminator, and the weights of the discriminator can be marked as not trainable (only in this model) to ensure they are not changed while the generator is being updated.

As such, we can create pairs of composite models, e.g. six pairs for the six levels of image growth, where each pair consists of a composite model for the normal or straight-through version and a composite model for the growth version.

The *define_composite()* function implements this and is defined below.

    # define composite models for training generators via discriminators
    def define_composite(discriminators, generators):
        model_list = list()
        # create composite models
        for i in range(len(discriminators)):
            g_models, d_models = generators[i], discriminators[i]
            # straight-through model
            d_models[0].trainable = False
            model1 = Sequential()
            model1.add(g_models[0])
            model1.add(d_models[0])
            model1.compile(loss=wasserstein_loss, optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
            # fade-in model
            d_models[1].trainable = False
            model2 = Sequential()
            model2.add(g_models[1])
            model2.add(d_models[1])
            model2.compile(loss=wasserstein_loss, optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
            # store
            model_list.append([model1, model2])
        return model_list
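
To see why presenting generated images "as real" updates the generator in the intended direction, it may help to evaluate the Wasserstein loss by hand. The sketch below is a NumPy stand-in for the *wasserstein_loss()* function (which uses the Keras backend), applied to made-up discriminator scores. It assumes the target values used in this tutorial: -1 for fake images when updating the discriminator and +1 when updating the generator through the composite model.

```python
import numpy as np

# NumPy stand-in for wasserstein_loss(): mean of target times predicted score
def wasserstein_loss_np(y_true, y_pred):
    return float(np.mean(y_true * y_pred))

# made-up discriminator scores for a batch of four generated images
scores = np.array([0.5, -0.25, 1.0, 0.75])

# discriminator update on fake images uses a target of -1
d_loss_fake = wasserstein_loss_np(-np.ones(4), scores)
# generator update through the composite model uses a target of +1
g_loss = wasserstein_loss_np(np.ones(4), scores)
print(d_loss_fake, g_loss)
```

The two losses are equal in magnitude and opposite in sign, so on the same batch the discriminator and the generator push the scores of generated images in opposite directions.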

Now that we have seen how to define the generator and discriminator models, let’s look at how we can fit these models on the celebrity faces dataset.

## How to Train Progressive Growing GAN Models

First, we need to define some convenience functions for working with samples of data.

The *load_real_samples()* function below loads our prepared celebrity faces dataset, then converts the pixels to floating point values and scales them to the range [-1,1], common to most GAN implementations.

    # load dataset
    def load_real_samples(filename):
        # load dataset
        data = load(filename)
        # extract numpy array
        X = data['arr_0']
        # convert from ints to floats
        X = X.astype('float32')
        # scale from [0,255] to [-1,1]
        X = (X - 127.5) / 127.5
        return X
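
The pixel scaling can be checked on a handful of values without loading the dataset. This is a minimal sketch: *to_model_range()* repeats the arithmetic from *load_real_samples()*, while *to_uint8()* is a hypothetical inverse added here for illustration and is not used elsewhere in the tutorial.

```python
import numpy as np

# scale uint8 pixels from [0,255] to [-1,1], as load_real_samples() does
def to_model_range(pixels):
    return (pixels.astype('float32') - 127.5) / 127.5

# hypothetical inverse: map values in [-1,1] back to displayable uint8 pixels
def to_uint8(scaled):
    return np.clip(np.rint(scaled * 127.5 + 127.5), 0, 255).astype('uint8')

pixels = np.array([0, 64, 128, 255], dtype='uint8')
scaled = to_model_range(pixels)
restored = to_uint8(scaled)
print(scaled.min(), scaled.max(), restored)
```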

Next, we need to be able to retrieve a random sample of images used to update the discriminator.

The *generate_real_samples()* function below implements this, returning a random sample of images from the loaded dataset and their corresponding target value of *class=1* to indicate that the images are real.

    # select real samples
    def generate_real_samples(dataset, n_samples):
        # choose random instances
        ix = randint(0, dataset.shape[0], n_samples)
        # select images
        X = dataset[ix]
        # generate class labels
        y = ones((n_samples, 1))
        return X, y

Next, we need a sample of latent points used to create synthetic images with the generator model.

The *generate_latent_points()* function below implements this, returning a batch of latent points with the required dimensionality.

    # generate points in latent space as input for the generator
    def generate_latent_points(latent_dim, n_samples):
        # generate points in the latent space
        x_input = randn(latent_dim * n_samples)
        # reshape into a batch of inputs for the network
        x_input = x_input.reshape(n_samples, latent_dim)
        return x_input
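
If you want to regenerate the same images later, the latent points need to be reproducible. The sketch below follows the same draw-then-reshape steps but takes an optional seed. The *seed* argument and the name *latent_batch()* are additions for this illustration and are not part of the tutorial code.

```python
import numpy as np

# same steps as generate_latent_points(): draw a flat vector, then reshape
def latent_batch(latent_dim, n_samples, seed=None):
    # the seed is an added convenience for repeatable samples
    rng = np.random.RandomState(seed)
    flat = rng.randn(latent_dim * n_samples)
    return flat.reshape(n_samples, latent_dim)

z = latent_batch(100, 25, seed=1)
z_again = latent_batch(100, 25, seed=1)
print(z.shape)
```

Each row is one latent point, so the result has shape *(n_samples, latent_dim)*, and two calls with the same seed return identical batches.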

The latent points can be used as input to the generator to create a batch of synthetic images.

This is required to update the discriminator model. It is also required to update the generator model via the discriminator model with the composite models defined in the previous section.

The *generate_fake_samples()* function below takes a generator model and generates and returns a batch of synthetic images and the corresponding target for the discriminator of *class=-1* to indicate that the images are fake. The *generate_latent_points()* function is called to create the required batch worth of random latent points.

    # use the generator to generate n fake examples, with class labels
    def generate_fake_samples(generator, latent_dim, n_samples):
        # generate points in latent space
        x_input = generate_latent_points(latent_dim, n_samples)
        # predict outputs
        X = generator.predict(x_input)
        # create class labels
        y = -ones((n_samples, 1))
        return X, y

Training the models occurs in two phases: a fade-in phase that involves the transition from a lower-resolution to a higher-resolution image, and the normal phase that involves the fine-tuning of the models at a given higher resolution image.

During the fade-in phase, the *alpha* value of the *WeightedSum* layers in the discriminator and generator models at a given level must transition linearly from 0.0 to 1.0 based on the training step. The *update_fadein()* function below implements this; given a list of models (such as the generator, discriminator, and composite model), the function locates the *WeightedSum* layer in each and sets the value of the alpha attribute based on the current training step number.

Importantly, this alpha attribute is not a constant; it is defined as a changeable variable in the *WeightedSum* class, and its value can be changed using the Keras backend *set_value()* function.

This is a clumsy but effective approach to changing the *alpha* values. Perhaps a cleaner implementation would involve a Keras Callback and is left as an exercise for the reader.

    # update the alpha value on each instance of WeightedSum
    def update_fadein(models, step, n_steps):
        # calculate current alpha (linear from 0 to 1)
        alpha = step / float(n_steps - 1)
        # update the alpha for each model
        for model in models:
            for layer in model.layers:
                if isinstance(layer, WeightedSum):
                    backend.set_value(layer.alpha, alpha)
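
The effect of the schedule is easier to see with small numbers. The sketch below reproduces the alpha formula and the weighted sum in plain NumPy, outside of Keras. The function names are made up for this illustration, and the guard for a one-step phase is an addition that the tutorial code does not have.

```python
import numpy as np

# linear fade-in weight, following the formula in update_fadein()
def fade_alpha(step, n_steps):
    # guard added for this sketch: a one-step phase would divide by zero
    if n_steps < 2:
        return 1.0
    return step / float(n_steps - 1)

# weighted sum of the old path and the new block, as in WeightedSum
def blend(old_out, new_out, alpha):
    return (1.0 - alpha) * old_out + alpha * new_out

alphas = [fade_alpha(i, 5) for i in range(5)]
old_out = np.zeros((2, 2))
new_out = np.ones((2, 2))
halfway = blend(old_out, new_out, alphas[2])
print(alphas)
```

At the first step the output comes entirely from the old path, at the last step entirely from the new block, and in between the two are mixed in proportion to *alpha*.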

Next, we can define the procedure for training the models for a given training phase.

A training phase takes one generator, discriminator, and composite model and updates them on the dataset for a given number of training epochs. The training phase may be a fade-in transition to a higher resolution, in which case the *update_fadein()* must be called each iteration, or it may be a normal tuning training phase, in which case there are no *WeightedSum* layers present.

The *train_epochs()* function below implements the training of the discriminator and generator models for a single training phase.

A single training iteration involves first selecting a half batch of real images from the dataset and generating a half batch of fake images from the current state of the generator model. These samples are then used to update the discriminator model.

Next, the generator model is updated via the discriminator with the composite model, indicating that the generated images are, in fact, real, and updating generator weights in an effort to better fool the discriminator.

A summary of model performance is printed at the end of each training iteration, summarizing the loss of the discriminator on the real (d1) and fake (d2) images and the loss of the generator (g).

    # train a generator and discriminator
    def train_epochs(g_model, d_model, gan_model, dataset, n_epochs, n_batch, fadein=False):
        # calculate the number of batches per training epoch
        bat_per_epo = int(dataset.shape[0] / n_batch)
        # calculate the number of training iterations
        n_steps = bat_per_epo * n_epochs
        # calculate the size of half a batch of samples
        half_batch = int(n_batch / 2)
        # manually enumerate epochs
        for i in range(n_steps):
            # update alpha for all WeightedSum layers when fading in new blocks
            if fadein:
                update_fadein([g_model, d_model, gan_model], i, n_steps)
            # prepare real and fake samples
            X_real, y_real = generate_real_samples(dataset, half_batch)
            X_fake, y_fake = generate_fake_samples(g_model, latent_dim, half_batch)
            # update discriminator model
            d_loss1 = d_model.train_on_batch(X_real, y_real)
            d_loss2 = d_model.train_on_batch(X_fake, y_fake)
            # update the generator via the discriminator's error
            z_input = generate_latent_points(latent_dim, n_batch)
            y_real2 = ones((n_batch, 1))
            g_loss = gan_model.train_on_batch(z_input, y_real2)
            # summarize loss on this batch
            print('>%d, d1=%.3f, d2=%.3f g=%.3f' % (i+1, d_loss1, d_loss2, g_loss))
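
The step arithmetic at the top of *train_epochs()* can be worked through on its own. The sketch below uses 50,000 images, 5 epochs, and a batch size of 16, which are the values for the first training phase in this tutorial. The function name *phase_steps()* is made up for this illustration.

```python
# step arithmetic from train_epochs(), separated out for inspection
def phase_steps(n_images, n_epochs, n_batch):
    bat_per_epo = int(n_images / n_batch)
    n_steps = bat_per_epo * n_epochs
    half_batch = int(n_batch / 2)
    return bat_per_epo, n_steps, half_batch

# 50,000 images, 5 epochs, batch size 16
first_phase = phase_steps(50000, 5, 16)
print(first_phase)
```

Because only half a batch of real images is drawn per step, the discriminator samples 3,125 × 8 = 25,000 real images per nominal epoch in this setting.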

Next, we need to call the *train_epochs()* function for each training phase.

This involves first scaling the training dataset to the required pixel dimensions, such as 4×4 or 8×8. The *scale_dataset()* function below implements this, taking the dataset and returning a scaled version.

These scaled versions of the dataset could be pre-computed and loaded instead of re-scaled on each run. This might be a nice extension if you intend to run the example many times.

    # scale images to preferred size
    def scale_dataset(images, new_shape):
        images_list = list()
        for image in images:
            # resize with nearest neighbor interpolation
            new_image = resize(image, new_shape, 0)
            # store
            images_list.append(new_image)
        return asarray(images_list)
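
One possible way to pre-compute the scaled datasets is to cache each resolution in its own compressed NumPy file. The sketch below is an assumption about how that could look, not part of the tutorial: *cached_scale()* and the file naming are hypothetical, and *subsample()* is a simplified stand-in for *scale_dataset()* that only handles integer scale factors.

```python
import os
import tempfile
import numpy as np

# stand-in for scale_dataset(): keeps every k-th pixel (integer factors only)
def subsample(images, size):
    step = images.shape[1] // size
    return images[:, ::step, ::step, :]

# hypothetical helper: load a scaled copy from disk, or compute and store it
def cached_scale(images, size, cache_dir, scale_fn=subsample):
    path = os.path.join(cache_dir, 'scaled_%03dx%03d.npz' % (size, size))
    if os.path.exists(path):
        return np.load(path)['arr_0']
    scaled = scale_fn(images, size)
    np.savez_compressed(path, scaled)
    return scaled

cache_dir = tempfile.mkdtemp()
images = np.random.rand(10, 16, 16, 3).astype('float32')
first = cached_scale(images, 4, cache_dir)   # computed and written to disk
second = cached_scale(images, 4, cache_dir)  # read back from the cache
print(first.shape, os.listdir(cache_dir))
```

To use the tutorial's own resizing, a wrapper such as `lambda x, s: scale_dataset(x, (s, s, 3))` could be passed as *scale_fn*.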

After each training run, we also need to save a plot of generated images and the current state of the generator model.

This is useful so that at the end of the run we can see the progression of the capability and quality of the model, and load and use a generator model at any point during the training process. A generator model could be used to create ad hoc images, or used as the starting point for continued training.

The *summarize_performance()* function below implements this, given a status string such as “*faded*” or “*tuned*”, a generator model, and the size of the latent space. The function creates a unique name for the state of the system from the output image size and the “*status*” string, such as “*008x008-faded*”, then creates a plot of 25 generated images and saves the plot and the generator model to file using that name.

    # generate samples and save as a plot and save the model
    def summarize_performance(status, g_model, latent_dim, n_samples=25):
        # devise name
        gen_shape = g_model.output_shape
        name = '%03dx%03d-%s' % (gen_shape[1], gen_shape[2], status)
        # generate images
        X, _ = generate_fake_samples(g_model, latent_dim, n_samples)
        # normalize pixel values to the range [0,1]
        X = (X - X.min()) / (X.max() - X.min())
        # plot generated images
        square = int(sqrt(n_samples))
        for i in range(n_samples):
            pyplot.subplot(square, square, 1 + i)
            pyplot.axis('off')
            pyplot.imshow(X[i])
        # save plot to file
        filename1 = 'plot_%s.png' % (name)
        pyplot.savefig(filename1)
        pyplot.close()
        # save the generator model
        filename2 = 'model_%s.h5' % (name)
        g_model.save(filename2)
        print('>Saved: %s and %s' % (filename1, filename2))
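
The naming scheme can be reproduced without a model, which is handy when you later need to locate a saved file for a given resolution. A minimal sketch, assuming the Keras output shape convention of (batch, height, width, channels); the helper name *snapshot_names()* is made up for this illustration.

```python
# rebuild the file names used by summarize_performance() for a given model
def snapshot_names(output_shape, status):
    # output_shape is assumed to be (batch, height, width, channels)
    name = '%03dx%03d-%s' % (output_shape[1], output_shape[2], status)
    return 'plot_%s.png' % name, 'model_%s.h5' % name

plot_file, model_file = snapshot_names((None, 8, 8, 3), 'faded')
print(plot_file, model_file)
```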

The *train()* function below pulls this together, taking the lists of defined models as input as well as the list of batch sizes and the number of training epochs for the normal and fade-in phases at each level of growth for the model.

The first generator and discriminator model for 4×4 images are fit by calling *train_epochs()* and saved by calling *summarize_performance()*.

Then the steps of growth are enumerated, involving first scaling the image dataset to the preferred size, training and saving the fade-in model for the new image size, then training and saving the normal or fine-tuned model for the new image size.

    # train the generator and discriminator
    def train(g_models, d_models, gan_models, dataset, latent_dim, e_norm, e_fadein, n_batch):
        # fit the baseline model
        g_normal, d_normal, gan_normal = g_models[0][0], d_models[0][0], gan_models[0][0]
        # scale dataset to appropriate size
        gen_shape = g_normal.output_shape
        scaled_data = scale_dataset(dataset, gen_shape[1:])
        print('Scaled Data', scaled_data.shape)
        # train normal or straight-through models
        train_epochs(g_normal, d_normal, gan_normal, scaled_data, e_norm[0], n_batch[0])
        summarize_performance('tuned', g_normal, latent_dim)
        # process each level of growth
        for i in range(1, len(g_models)):
            # retrieve models for this level of growth
            [g_normal, g_fadein] = g_models[i]
            [d_normal, d_fadein] = d_models[i]
            [gan_normal, gan_fadein] = gan_models[i]
            # scale dataset to appropriate size
            gen_shape = g_normal.output_shape
            scaled_data = scale_dataset(dataset, gen_shape[1:])
            print('Scaled Data', scaled_data.shape)
            # train fade-in models for next level of growth
            train_epochs(g_fadein, d_fadein, gan_fadein, scaled_data, e_fadein[i], n_batch[i], True)
            summarize_performance('faded', g_normal, latent_dim)
            # train normal or straight-through models
            train_epochs(g_normal, d_normal, gan_normal, scaled_data, e_norm[i], n_batch[i])
            summarize_performance('tuned', g_normal, latent_dim)
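
The order in which *train()* works through the phases can be listed without running any training. The sketch below assumes a 4×4 base model and a doubling of resolution at each level of growth; the function name *training_schedule()* is made up for this illustration.

```python
# list the (resolution, status) pairs in the order train() saves them
def training_schedule(n_blocks, in_dim=4):
    # the base model is only tuned, never faded in
    phases = [(in_dim, 'tuned')]
    for i in range(1, n_blocks):
        size = in_dim * (2 ** i)
        phases.append((size, 'faded'))
        phases.append((size, 'tuned'))
    return phases

schedule = training_schedule(6)
for size, status in schedule:
    print('%03dx%03d-%s' % (size, size, status))
```

For six levels of growth this gives eleven training phases, and therefore eleven saved plots and generator models.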

We can then define the configuration, models, and call *train()* to start the training process.

The paper recommends using a batch size of 16 for images sized between 4×4 and 128×128 before reducing the size. It also recommends training each phase for about 800K images. The paper also recommends a latent space of 512 dimensions.

The models are defined with six levels of growth to meet the 128×128 pixel size of our dataset. We also shrink the latent space accordingly to 100 dimensions.

Instead of keeping the batch size and number of epochs constant, we vary them to speed up the training process, using larger batch sizes for early training phases and smaller batch sizes for later training phases for fine-tuning and stability. Additionally, fewer training epochs are used for the smaller models and more epochs for the larger models.

The choice of batch sizes and training epochs is somewhat arbitrary and you may want to experiment with different values and review their effects.

    # number of growth phases, e.g. 6 == [4, 8, 16, 32, 64, 128]
    n_blocks = 6
    # size of the latent space
    latent_dim = 100
    # define models
    d_models = define_discriminator(n_blocks)
    # define models
    g_models = define_generator(latent_dim, n_blocks)
    # define composite models
    gan_models = define_composite(d_models, g_models)
    # load image data
    dataset = load_real_samples('img_align_celeba_128.npz')
    print('Loaded', dataset.shape)
    # train model
    n_batch = [16, 16, 16, 8, 4, 4]
    # 10 epochs == 500K images per training phase
    n_epochs = [5, 8, 8, 10, 10, 10]
    train(g_models, d_models, gan_models, dataset, latent_dim, n_epochs, n_epochs, n_batch)
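
Before starting a long run, it can be worth checking that the configuration lists line up with the number of growth levels and seeing how many images each phase covers. The sketch below assumes the prepared dataset holds 50,000 images and counts one epoch as one pass over that many images. It is only a rough check, not part of the tutorial code.

```python
# values taken from the configuration above; dataset size is an assumption
n_blocks = 6
n_images = 50000
n_batch = [16, 16, 16, 8, 4, 4]
n_epochs = [5, 8, 8, 10, 10, 10]

# train() indexes both lists once per level of growth
assert len(n_batch) == n_blocks and len(n_epochs) == n_blocks

# images covered per phase if an epoch is one pass over n_images
images_per_phase = [e * n_images for e in n_epochs]
for i in range(n_blocks):
    size = 4 * (2 ** i)
    print('%3dx%-3d batch=%2d epochs=%2d images=%d' % (size, size, n_batch[i], n_epochs[i], images_per_phase[i]))
```

The largest phases reach 500K images, which is below the roughly 800K images per phase recommended in the paper. Note also that the first entry of the fade-in epoch list is never used, because the 4×4 model has no fade-in phase.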

We can tie all of this together.

The complete example of training a progressive growing generative adversarial network on the celebrity faces dataset is listed below.

    # example of progressive growing gan on celebrity faces dataset
    from math import sqrt
    from numpy import load
    from numpy import asarray
    from numpy import zeros
    from numpy import ones
    from numpy.random import randn
    from numpy.random import randint
    from skimage.transform import resize
    from keras.optimizers import Adam
    from keras.models import Sequential
    from keras.models import Model
    from keras.layers import Input
    from keras.layers import Dense
    from keras.layers import Flatten
    from keras.layers import Reshape
    from keras.layers import Conv2D
    from keras.layers import UpSampling2D
    from keras.layers import AveragePooling2D
    from keras.layers import LeakyReLU
    from keras.layers import Layer
    from keras.layers import Add
    from keras.constraints import max_norm
    from keras.initializers import RandomNormal
    from keras import backend
    from matplotlib import pyplot

    # pixel-wise feature vector normalization layer
    class PixelNormalization(Layer):
        # initialize the layer
        def __init__(self, **kwargs):
            super(PixelNormalization, self).__init__(**kwargs)

        # perform the operation
        def call(self, inputs):
            # calculate square pixel values
            values = inputs**2.0
            # calculate the mean pixel values
            mean_values = backend.mean(values, axis=-1, keepdims=True)
            # ensure the mean is not zero
            mean_values += 1.0e-8
            # calculate the sqrt of the mean squared value (L2 norm)
            l2 = backend.sqrt(mean_values)
            # normalize values by the l2 norm
            normalized = inputs / l2
            return normalized

        # define the output shape of the layer
        def compute_output_shape(self, input_shape):
            return input_shape

    # mini-batch standard deviation layer
    class MinibatchStdev(Layer):
        # initialize the layer
        def __init__(self, **kwargs):
            super(MinibatchStdev, self).__init__(**kwargs)

        # perform the operation
        def call(self, inputs):
            # calculate the mean value for each pixel across the batch
            mean = backend.mean(inputs, axis=0, keepdims=True)
            # calculate the squared differences between pixel values and mean
            squ_diffs = backend.square(inputs - mean)
            # calculate the average of the squared differences (variance)
            mean_sq_diff = backend.mean(squ_diffs, axis=0, keepdims=True)
            # add a small value to avoid a blow-up when we calculate stdev
            mean_sq_diff += 1e-8
            # square root of the variance (stdev)
            stdev = backend.sqrt(mean_sq_diff)
            # calculate the mean standard deviation across each pixel coord
            mean_pix = backend.mean(stdev, keepdims=True)
            # scale this up to be the size of one input feature map for each sample
            shape = backend.shape(inputs)
            output = backend.tile(mean_pix, (shape[0], shape[1], shape[2], 1))
            # concatenate with the output
            combined = backend.concatenate([inputs, output], axis=-1)
            return combined

        # define the output shape of the layer
        def compute_output_shape(self, input_shape):
            # create a copy of the input shape as a list
            input_shape = list(input_shape)
            # add one to the channel dimension (assume channels-last)
            input_shape[-1] += 1
            # convert list to a tuple
            return tuple(input_shape)

    # weighted sum output
    class WeightedSum(Add):
        # init with default value
        def __init__(self, alpha=0.0, **kwargs):
            super(WeightedSum, self).__init__(**kwargs)
            self.alpha = backend.variable(alpha, name='ws_alpha')

        # output a weighted sum of inputs
        def _merge_function(self, inputs):
            # only supports a weighted sum of two inputs
            assert (len(inputs) == 2)
            # ((1-a) * input1) + (a * input2)
            output = ((1.0 - self.alpha) * inputs[0]) + (self.alpha * inputs[1])
            return output

    # calculate wasserstein loss
    def wasserstein_loss(y_true, y_pred):
        return backend.mean(y_true * y_pred)

    # add a discriminator block
    def add_discriminator_block(old_model, n_input_layers=3):
        # weight initialization
        init = RandomNormal(stddev=0.02)
        # weight constraint
        const = max_norm(1.0)
        # get shape of existing model
        in_shape = list(old_model.input.shape)
        # define new input shape as double the size
        input_shape = (in_shape[-2].value*2, in_shape[-2].value*2, in_shape[-1].value)
        in_image = Input(shape=input_shape)
        # define new input processing layer
        d = Conv2D(128, (1,1), padding='same', kernel_initializer=init, kernel_constraint=const)(in_image)
        d = LeakyReLU(alpha=0.2)(d)
        # define new block
        d = Conv2D(128, (3,3), padding='same', kernel_initializer=init, kernel_constraint=const)(d)
        d = LeakyReLU(alpha=0.2)(d)
        d = Conv2D(128, (3,3), padding='same', kernel_initializer=init, kernel_constraint=const)(d)
        d = LeakyReLU(alpha=0.2)(d)
        d = AveragePooling2D()(d)
        block_new = d
        # skip the input, 1x1 and activation for the old model
        for i in range(n_input_layers, len(old_model.layers)):
            d = old_model.layers[i](d)
        # define straight-through model
        model1 = Model(in_image, d)
        # compile model
        model1.compile(loss=wasserstein_loss, optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
        # downsample the new larger image
        downsample = AveragePooling2D()(in_image)
        # connect old input processing to downsampled new input
        block_old = old_model.layers[1](downsample)
        block_old = old_model.layers[2](block_old)
        # fade in output of old model input layer with new input
        d = WeightedSum()([block_old, block_new])
        # skip the input, 1x1 and activation for the old model
        for i in range(n_input_layers, len(old_model.layers)):
            d = old_model.layers[i](d)
        # define fade-in model
        model2 = Model(in_image, d)
        # compile model
        model2.compile(loss=wasserstein_loss, optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
        return [model1, model2]

    # define the discriminator models for each image resolution
    def define_discriminator(n_blocks, input_shape=(4,4,3)):
        # weight initialization
        init = RandomNormal(stddev=0.02)
        # weight constraint
        const = max_norm(1.0)
        model_list = list()
        # base model input
        in_image = Input(shape=input_shape)
        # conv 1x1
        d = Conv2D(128, (1,1), padding='same', kernel_initializer=init, kernel_constraint=const)(in_image)
        d = LeakyReLU(alpha=0.2)(d)
        # conv 3x3 (output block)
        d = MinibatchStdev()(d)
        d = Conv2D(128, (3,3), padding='same', kernel_initializer=init, kernel_constraint=const)(d)
        d = LeakyReLU(alpha=0.2)(d)
        # conv 4x4
        d = Conv2D(128, (4,4), padding='same', kernel_initializer=init, kernel_constraint=const)(d)
        d = LeakyReLU(alpha=0.2)(d)
        # dense output layer
        d = Flatten()(d)
        out_class = Dense(1)(d)
        # define model
        model = Model(in_image, out_class)
        # compile model
        model.compile(loss=wasserstein_loss, optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
        # store model
        model_list.append([model, model])
        # create submodels
        for i in range(1, n_blocks):
            # get prior model without the fade-on
            old_model = model_list[i - 1][0]
            # create new model for next resolution
            models = add_discriminator_block(old_model)
            # store model
            model_list.append(models)
        return model_list

    # add a generator block
    def add_generator_block(old_model):
        # weight initialization
        init = RandomNormal(stddev=0.02)
        # weight constraint
        const = max_norm(1.0)
        # get the end of the last block
        block_end = old_model.layers[-2].output
        # upsample, and define new block
        upsampling = UpSampling2D()(block_end)
        g = Conv2D(128, (3,3), padding='same', kernel_initializer=init, kernel_constraint=const)(upsampling)
        g = PixelNormalization()(g)
        g = LeakyReLU(alpha=0.2)(g)
        g = Conv2D(128, (3,3), padding='same', kernel_initializer=init, kernel_constraint=const)(g)
        g = PixelNormalization()(g)
        g = LeakyReLU(alpha=0.2)(g)
        # add new output layer
        out_image = Conv2D(3, (1,1), padding='same', kernel_initializer=init, kernel_constraint=const)(g)
        # define model
        model1 = Model(old_model.input, out_image)
        # get the output layer from old model
        out_old = old_model.layers[-1]
        # connect the upsampling to the old output layer
        out_image2 = out_old(upsampling)
        # define new output image as the weighted sum of the old and new models
        merged = WeightedSum()([out_image2, out_image])
        # define model
        model2 = Model(old_model.input, merged)
        return [model1, model2]

    # define generator models
    def define_generator(latent_dim, n_blocks, in_dim=4):
        # weight initialization
        init = RandomNormal(stddev=0.02)
        # weight constraint
        const = max_norm(1.0)
        model_list = list()
        # base model latent input
        in_latent = Input(shape=(latent_dim,))
        # linear scale up to activation maps
        g = Dense(128 * in_dim * in_dim, kernel_initializer=init, kernel_constraint=const)(in_latent)
        g = Reshape((in_dim, in_dim, 128))(g)
        # conv 4x4, input block
        g = Conv2D(128, (3,3), padding='same', kernel_initializer=init, kernel_constraint=const)(g)
        g = PixelNormalization()(g)
        g = LeakyReLU(alpha=0.2)(g)
        # conv 3x3
        g = Conv2D(128, (3,3), padding='same', kernel_initializer=init, kernel_constraint=const)(g)
        g = PixelNormalization()(g)
        g = LeakyReLU(alpha=0.2)(g)
        # conv 1x1, output block
        out_image = Conv2D(3, (1,1), padding='same', kernel_initializer=init, kernel_constraint=const)(g)
        # define model
        model = Model(in_latent, out_image)
        # store model
        model_list.append([model, model])
        # create submodels
        for i in range(1, n_blocks):
            # get prior model without the fade-on
            old_model = model_list[i - 1][0]
            # create new model for next resolution
            models = add_generator_block(old_model)
            # store model
            model_list.append(models)
        return model_list

    # define composite models for training generators via discriminators
    def define_composite(discriminators, generators):
        model_list = list()
        # create composite models
        for i in range(len(discriminators)):
            g_models, d_models = generators[i], discriminators[i]
            # straight-through model
            d_models[0].trainable = False
            model1 = Sequential()
            model1.add(g_models[0])
            model1.add(d_models[0])
            model1.compile(loss=wasserstein_loss, optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
            # fade-in model
            d_models[1].trainable = False
            model2 = Sequential()
            model2.add(g_models[1])
            model2.add(d_models[1])
            model2.compile(loss=wasserstein_loss, optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
            # store
            model_list.append([model1, model2])
        return model_list

    # load dataset
    def load_real_samples(filename):
        # load dataset
        data = load(filename)
        # extract numpy array
        X = data['arr_0']
        # convert from ints to floats
        X = X.astype('float32')
        # scale from [0,255] to [-1,1]
        X = (X - 127.5) / 127.5
        return X

    # select real samples
    def generate_real_samples(dataset, n_samples):
        # choose random instances
        ix = randint(0, dataset.shape[0], n_samples)
        # select images
        X = dataset[ix]
        # generate class labels
        y = ones((n_samples, 1))
        return X, y

    # generate points in latent space as input for the generator
    def generate_latent_points(latent_dim, n_samples):
        # generate points in the latent space
        x_input = randn(latent_dim * n_samples)
        # reshape into a batch of inputs for the network
        x_input = x_input.reshape(n_samples, latent_dim)
        return x_input

    # use the generator to generate n fake examples, with class labels
    def generate_fake_samples(generator, latent_dim, n_samples):
        # generate points in latent space
        x_input = generate_latent_points(latent_dim, n_samples)
        # predict outputs
        X = generator.predict(x_input)
        # create class labels
        y = -ones((n_samples, 1))
        return X, y

    # update the alpha value on each instance of WeightedSum
    def update_fadein(models, step, n_steps):
        # calculate current alpha (linear from 0 to 1)
        alpha = step / float(n_steps - 1)
        # update the alpha for each model
        for model in models:
            for layer in model.layers:
                if isinstance(layer, WeightedSum):
                    backend.set_value(layer.alpha, alpha)

    # train a generator and discriminator
    def train_epochs(g_model, d_model, gan_model, dataset, n_epochs, n_batch, fadein=False):
        # calculate the number of batches per training epoch
        bat_per_epo = int(dataset.shape[0] / n_batch)
        # calculate the number of training iterations
        n_steps = bat_per_epo * n_epochs
        # calculate the size of half a batch of samples
        half_batch = int(n_batch / 2)
        # manually enumerate epochs
        for i in range(n_steps):
            # update alpha for all WeightedSum layers when fading in new blocks
            if fadein:
                update_fadein([g_model, d_model, gan_model], i, n_steps)
            # prepare real and fake samples
            X_real, y_real = generate_real_samples(dataset, half_batch)
            X_fake, y_fake = generate_fake_samples(g_model, latent_dim, half_batch)
            # update discriminator model
            d_loss1 = d_model.train_on_batch(X_real, y_real)
            d_loss2 = d_model.train_on_batch(X_fake, y_fake)
            # update the generator via the discriminator's error
            z_input = generate_latent_points(latent_dim, n_batch)
            y_real2 = ones((n_batch, 1))
            g_loss = gan_model.train_on_batch(z_input, y_real2)
            # summarize loss on this batch
            print('>%d, d1=%.3f, d2=%.3f g=%.3f' % (i+1, d_loss1, d_loss2, g_loss))

    # scale images to preferred size
    def scale_dataset(images, new_shape):
        images_list = list()
        for image in images:
            # resize with nearest neighbor interpolation
            new_image = resize(image, new_shape, 0)
            # store
            images_list.append(new_image)
        return asarray(images_list)

    # generate samples and save as a plot and save the model
    def summarize_performance(status, g_model, latent_dim, n_samples=25):
        # devise name
        gen_shape = g_model.output_shape
        name = '%03dx%03d-%s' % (gen_shape[1], gen_shape[2], status)
        # generate images
        X, _ = generate_fake_samples(g_model, latent_dim, n_samples)
        # normalize pixel values to the range [0,1]
        X = (X - X.min()) / (X.max() - X.min())
        # plot generated images
        square = int(sqrt(n_samples))
        for i in range(n_samples):
            pyplot.subplot(square, square, 1 + i)
            pyplot.axis('off')
            pyplot.imshow(X[i])
        # save plot to file
        filename1 = 'plot_%s.png' % (name)
        pyplot.savefig(filename1)
        pyplot.close()
        # save the generator model
        filename2 = 'model_%s.h5' % (name)
        g_model.save(filename2)
        print('>Saved: %s and %s' % (filename1, filename2))

    # train the generator and discriminator
    def train(g_models, d_models, gan_models, dataset, latent_dim, e_norm, e_fadein, n_batch):
        # fit the baseline model
        g_normal, d_normal, gan_normal = g_models[0][0], d_models[0][0], gan_models[0][0]
        # scale dataset to appropriate size
        gen_shape = g_normal.output_shape
        scaled_data = scale_dataset(dataset, gen_shape[1:])
        print('Scaled Data', scaled_data.shape)
        # train normal or straight-through models
        train_epochs(g_normal, d_normal, gan_normal, scaled_data, e_norm[0], n_batch[0])
        summarize_performance('tuned', g_normal, latent_dim)
        # process each level of growth
        for i in range(1, len(g_models)):
            # retrieve models for this level of growth
            [g_normal, g_fadein] = g_models[i]
            [d_normal, d_fadein] = d_models[i]
            [gan_normal, gan_fadein] = gan_models[i]
            # scale dataset to appropriate size
            gen_shape = g_normal.output_shape
            scaled_data = scale_dataset(dataset, gen_shape[1:])
            print('Scaled Data', scaled_data.shape)
            # train fade-in models for next level of growth
            train_epochs(g_fadein, d_fadein, gan_fadein, scaled_data, e_fadein[i], n_batch[i], True)
            summarize_performance('faded', g_normal, latent_dim)
            # train normal or straight-through models
            train_epochs(g_normal, d_normal, gan_normal, scaled_data, e_norm[i], n_batch[i])
            summarize_performance('tuned', g_normal, latent_dim)

    # number of growth phases, e.g. 6 == [4, 8, 16, 32, 64, 128]
    n_blocks = 6
    # size of the latent space
    latent_dim = 100
    # define models
    d_models = define_discriminator(n_blocks)
    # define models
    g_models = define_generator(latent_dim, n_blocks)
    # define composite models
    gan_models = define_composite(d_models, g_models)
    # load image data
    dataset = load_real_samples('img_align_celeba_128.npz')
    print('Loaded', dataset.shape)
    # train model
    n_batch = [16, 16, 16, 8, 4, 4]
    # 10 epochs == 500K images per training phase
    n_epochs = [5, 8, 8, 10, 10, 10]
    train(g_models, d_models, gan_models, dataset, latent_dim, n_epochs, n_epochs, n_batch)

**Note**: The example can be run on the CPU, although a GPU is recommended.

Running the example may take a number of hours to complete on modern GPU hardware.

**Note**: Your specific results will vary given the stochastic nature of the learning algorithm. Consider running the example a few times.

If loss values during the training iterations go to zero or very large/small numbers, this may be an example of a failure mode and may require a restart of the training process.
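
One way to notice such a failure early is to test the three reported losses at each step and stop the run when they look implausible. The sketch below is only an illustration: the function name is hypothetical, and the limit of 1000 is an arbitrary placeholder rather than a tuned value, so you would need to choose a threshold that suits your own runs.

```python
import math

# hypothetical check on the d1, d2, and g losses reported at each step
def loss_looks_unstable(losses, limit=1000.0):
    for value in losses:
        # NaN, infinite, or very large magnitudes suggest divergence
        if math.isnan(value) or math.isinf(value) or abs(value) > limit:
            return True
    # every loss sitting at exactly zero is also treated as suspicious here
    return len(losses) > 0 and all(value == 0.0 for value in losses)

print(loss_looks_unstable([0.993, 0.001, 0.951]))
```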

Running the example first reports the successful loading of the prepared dataset and the scaling of the dataset to the first image size, then reports the loss of each model for each step of the training process.

    Loaded (50000, 128, 128, 3)
    Scaled Data (50000, 4, 4, 3)
    >1, d1=0.993, d2=0.001 g=0.951
    >2, d1=0.861, d2=0.118 g=0.982
    >3, d1=0.829, d2=0.126 g=0.875
    >4, d1=0.774, d2=0.202 g=0.912
    >5, d1=0.687, d2=0.035 g=0.911
    ...

Plots of generated images and the generator model are saved after each fade-in training phase with filenames like:

- *plot_008x008-faded.png*
- *model_008x008-faded.h5*

Plots and models are also saved after each tuning phase, with filenames like:

- *plot_008x008-tuned.png*
- *model_008x008-tuned.h5*

Reviewing plots of the generated images at each point helps to see the progression both in the size of supported images and their quality before and after the tuning phase.

For example, below is a sample of images generated after the first 4×4 training phase (*plot_004x004-tuned.png*). At this point, we cannot see much at all.

Reviewing generated images after the fade-in training phase for 8×8 images shows more structure (*plot_008x008-faded.png*). The images are blocky but we can see faces.

Next, we can contrast the generated images for 16×16 after the fade-in training phase (*plot_016x016-faded.png*) and after the tuning training phase (*plot_016x016-tuned.png*).

We can see that the images are clearly faces and we can see that the fine-tuning phase appears to improve the coloring or tone of the faces and perhaps the structure.

Finally, we can review generated faces after tuning for the remaining 32×32, 64×64, and 128×128 resolutions. We can see that with each step up in resolution, the image quality improves, allowing the model to fill in more structure and detail.

Although not perfect, the generated images show that the progressive growing GAN can not only generate plausible human faces at different resolutions, but can also build upon what was learned at lower resolutions to generate plausible faces at higher resolutions.

Now that we have seen how the generator models can be fit, next we can see how we might load and use a saved generator model.

## How to Synthesize Images With a Progressive Growing GAN Model

In this section, we will explore how to load a generator model and use it to generate synthetic images on demand.

The saved Keras models can be loaded via the *load_model()* function.

Because the generator models use custom layers, we must specify how to load the custom layers. This is achieved by providing a dict to the load_model() function that maps each of the custom layer names to the appropriate class.

```python
...
# load model
cust = {'PixelNormalization': PixelNormalization, 'MinibatchStdev': MinibatchStdev, 'WeightedSum': WeightedSum}
model = load_model('model_016x016-tuned.h5', cust)
```

We can then use the *generate_latent_points()* function from the previous section to generate points in latent space as input for the generator model.

```python
...
# size of the latent space
latent_dim = 100
# number of images to generate
n_images = 25
# generate points in latent space
latent_points = generate_latent_points(latent_dim, n_images)
# generate images
X = model.predict(latent_points)
```

We can then plot the results by first scaling the pixel values to the range [0,1] and plotting each image, in this case in a square grid pattern.

```python
# create a plot of generated images
def plot_generated(images, n_images):
    # plot images
    square = int(sqrt(n_images))
    # normalize pixel values to the range [0,1]
    images = (images - images.min()) / (images.max() - images.min())
    for i in range(n_images):
        # define subplot
        pyplot.subplot(square, square, 1 + i)
        # turn off axis
        pyplot.axis('off')
        # plot raw pixel data
        pyplot.imshow(images[i])
    pyplot.show()
```
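To make the scaling step concrete, the sketch below applies the same min-max formula to a short list of numbers in plain Python and computes the side length of the square grid. The helper names are assumptions for illustration, and the guard against a zero range is an addition that the tutorial function does not have.

```python
from math import sqrt

def min_max_scale(values):
    """Scale values to [0,1] using (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    # assumption: return zeros when all values are equal to avoid dividing by zero
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]

def grid_side(n_images):
    """Side length of the square grid used to plot n_images."""
    return int(sqrt(n_images))

# generator outputs are roughly in [-1,1]; after scaling they lie in [0,1]
print(min_max_scale([-1.0, 0.0, 1.0]))
print(grid_side(25))
```

Because the scaling uses the minimum and maximum of the whole batch, the relative brightness between images in one plot is preserved.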

Tying this together, the complete example of loading a saved progressive growing GAN generator model and using it to generate new faces is listed below.

In this case, we demonstrate loading the tuned model for generating 16×16 faces.

```python
# example of loading the generator model and generating images
from math import sqrt
from numpy import asarray
from numpy.random import randn
from numpy.random import randint
from keras.layers import Layer
from keras.layers import Add
from keras import backend
from keras.models import load_model
from matplotlib import pyplot

# pixel-wise feature vector normalization layer
class PixelNormalization(Layer):
    # initialize the layer
    def __init__(self, **kwargs):
        super(PixelNormalization, self).__init__(**kwargs)

    # perform the operation
    def call(self, inputs):
        # calculate square pixel values
        values = inputs**2.0
        # calculate the mean pixel values
        mean_values = backend.mean(values, axis=-1, keepdims=True)
        # ensure the mean is not zero
        mean_values += 1.0e-8
        # calculate the sqrt of the mean squared value (L2 norm)
        l2 = backend.sqrt(mean_values)
        # normalize values by the l2 norm
        normalized = inputs / l2
        return normalized

    # define the output shape of the layer
    def compute_output_shape(self, input_shape):
        return input_shape

# mini-batch standard deviation layer
class MinibatchStdev(Layer):
    # initialize the layer
    def __init__(self, **kwargs):
        super(MinibatchStdev, self).__init__(**kwargs)

    # perform the operation
    def call(self, inputs):
        # calculate the mean value for each pixel across the batch
        mean = backend.mean(inputs, axis=0, keepdims=True)
        # calculate the squared differences between pixel values and mean
        squ_diffs = backend.square(inputs - mean)
        # calculate the average of the squared differences (variance)
        mean_sq_diff = backend.mean(squ_diffs, axis=0, keepdims=True)
        # add a small value to avoid a blow-up when we calculate stdev
        mean_sq_diff += 1e-8
        # square root of the variance (stdev)
        stdev = backend.sqrt(mean_sq_diff)
        # calculate the mean standard deviation across each pixel coord
        mean_pix = backend.mean(stdev, keepdims=True)
        # scale this up to be the size of one input feature map for each sample
        shape = backend.shape(inputs)
        output = backend.tile(mean_pix, (shape[0], shape[1], shape[2], 1))
        # concatenate with the output
        combined = backend.concatenate([inputs, output], axis=-1)
        return combined

    # define the output shape of the layer
    def compute_output_shape(self, input_shape):
        # create a copy of the input shape as a list
        input_shape = list(input_shape)
        # add one to the channel dimension (assume channels-last)
        input_shape[-1] += 1
        # convert list to a tuple
        return tuple(input_shape)

# weighted sum output
class WeightedSum(Add):
    # init with default value
    def __init__(self, alpha=0.0, **kwargs):
        super(WeightedSum, self).__init__(**kwargs)
        self.alpha = backend.variable(alpha, name='ws_alpha')

    # output a weighted sum of inputs
    def _merge_function(self, inputs):
        # only supports a weighted sum of two inputs
        assert (len(inputs) == 2)
        # ((1-a) * input1) + (a * input2)
        output = ((1.0 - self.alpha) * inputs[0]) + (self.alpha * inputs[1])
        return output

# generate points in latent space as input for the generator
def generate_latent_points(latent_dim, n_samples):
    # generate points in the latent space
    x_input = randn(latent_dim * n_samples)
    # reshape into a batch of inputs for the network
    z_input = x_input.reshape(n_samples, latent_dim)
    return z_input

# create a plot of generated images
def plot_generated(images, n_images):
    # plot images
    square = int(sqrt(n_images))
    # normalize pixel values to the range [0,1]
    images = (images - images.min()) / (images.max() - images.min())
    for i in range(n_images):
        # define subplot
        pyplot.subplot(square, square, 1 + i)
        # turn off axis
        pyplot.axis('off')
        # plot raw pixel data
        pyplot.imshow(images[i])
    pyplot.show()

# load model
cust = {'PixelNormalization': PixelNormalization, 'MinibatchStdev': MinibatchStdev, 'WeightedSum': WeightedSum}
model = load_model('model_016x016-tuned.h5', cust)
# size of the latent space
latent_dim = 100
# number of images to generate
n_images = 25
# generate points in latent space
latent_points = generate_latent_points(latent_dim, n_images)
# generate images
X = model.predict(latent_points)
# plot the result
plot_generated(X, n_images)
```

Running the example loads the model and generates 25 faces that are plotted in a 5×5 grid.

We can then change the filename to a different model, such as the tuned model for generating 128×128 faces.

```python
...
model = load_model('model_128x128-tuned.h5', cust)
```

Re-running the example generates a plot of higher-resolution synthetic faces.

## Extensions

This section lists some ideas for extending the tutorial that you may wish to explore.

- **Change Alpha via Callback**. Update the example to use a Keras callback to update the alpha value for the WeightedSum layers during fade-in training.
- **Pre-Scale Dataset**. Update the example to pre-scale each dataset and save each version to file to be loaded when needed during training.
- **Equalized Learning Rate**. Update the example to implement the equalized learning rate weight scaling method described in the paper.
- **Progression in Number of Filters**. Update the example to decrease the number of filters with depth in the generator and increase the number of filters with depth in the discriminator to match the configuration in the paper.
- **Larger Image Size**. Update the example to generate larger image sizes, such as 512×512.

If you explore any of these extensions, I’d love to know.

Post your findings in the comments below.

## Further Reading

This section provides more resources on the topic if you are looking to go deeper.

### Official

- Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017.
- Progressive Growing of GANs for Improved Quality, Stability, and Variation, Official.
- progressive_growing_of_gans Project (official), GitHub.
- Progressive Growing of GANs for Improved Quality, Stability, and Variation. Open Review.
- Progressive Growing of GANs for Improved Quality, Stability, and Variation, YouTube.
- Progressive growing of GANs for improved quality, stability and variation, KeyNote, YouTube.

### API

- Keras Datasets API
- Keras Sequential Model API
- Keras Convolutional Layers API
- How can I “freeze” Keras layers?
- Keras Contrib Project
- skimage.transform.resize API

### Articles

- Keras-progressive_growing_of_gans Project, GitHub.
- Hands-On-Generative-Adversarial-Networks-with-Keras Project, GitHub.

## Summary

In this tutorial, you discovered how to implement and train a progressive growing generative adversarial network for generating celebrity faces.

Specifically, you learned:

- How to prepare the celebrity faces dataset for training a progressive growing GAN model.
- How to define and train the progressive growing GAN on the celebrity faces dataset.
- How to load saved generator models and use them for generating ad hoc synthetic celebrity faces.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

The post How to Train a Progressive Growing GAN in Keras for Synthesizing Faces appeared first on Machine Learning Mastery.

## How to Implement Progressive Growing GAN Models in Keras

The progressive growing generative adversarial network is an approach for training a deep convolutional neural network model for generating synthetic images.

It is an extension of the more traditional GAN architecture that involves incrementally growing the size of the generated image during training, starting with a very small image, such as 4×4 pixels. This allows the stable training and growth of GAN models capable of generating very large high-quality images, such as images of synthetic celebrity faces with the size of 1024×1024 pixels.

In this tutorial, you will discover how to develop progressive growing generative adversarial network models from scratch with Keras.

After completing this tutorial, you will know:

- How to develop pre-defined discriminator and generator models at each level of output image growth.
- How to define composite models for training the generator models via the discriminator models.
- How to cycle the training of fade-in and normal versions of models at each level of output image growth.

Let’s get started.

## Tutorial Overview

This tutorial is divided into five parts; they are:

- What Is the Progressive Growing GAN Architecture?
- How to Implement the Progressive Growing GAN Discriminator Model
- How to Implement the Progressive Growing GAN Generator Model
- How to Implement Composite Models for Updating the Generator
- How to Train Discriminator and Generator Models

## What Is the Progressive Growing GAN Architecture?

GANs are effective at generating crisp synthetic images, although they are typically limited in the size of the images that can be generated.

The Progressive Growing GAN is an extension to the GAN that allows the training of generator models capable of outputting large high-quality images, such as photorealistic faces with the size 1024×1024 pixels. It was described in the 2017 paper by Tero Karras, et al. from Nvidia titled “Progressive Growing of GANs for Improved Quality, Stability, and Variation.”

The key innovation of the Progressive Growing GAN is the incremental increase in the size of images output by the generator, starting with a 4×4 pixel image and doubling to 8×8, 16×16, and so on until the desired output resolution is reached.
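The doubling schedule is easy to enumerate. The sketch below uses a hypothetical helper, *resolution_schedule()*, to list the image sizes visited between a starting size and a final size; the name and its default arguments are assumptions for illustration only.

```python
def resolution_schedule(start=4, final=1024):
    """List the square image sizes visited when doubling from start to final."""
    sizes = [start]
    while sizes[-1] < final:
        sizes.append(sizes[-1] * 2)
    # the final size must be reachable by repeated doubling
    if sizes[-1] != final:
        raise ValueError("final size is not start multiplied by a power of two")
    return sizes

sizes = resolution_schedule()
print(sizes)
print("number of growth stages:", len(sizes))
```

Growing from 4×4 to 1024×1024 therefore involves nine resolutions, which is eight doublings after the base model.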

Our primary contribution is a training methodology for GANs where we start with low-resolution images, and then progressively increase the resolution by adding layers to the networks.

— Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017.

This is achieved by a training procedure that involves periods of fine-tuning the model with a given output resolution, and periods of slowly phasing in a new model with a larger resolution.

When doubling the resolution of the generator (G) and discriminator (D) we fade in the new layers smoothly

— Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017.

All layers remain trainable during the training process, including existing layers when new layers are added.

All existing layers in both networks remain trainable throughout the training process.

— Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017.

Progressive Growing GAN involves using a generator and discriminator model with the same general structure and starting with very small images. During training, new blocks of convolutional layers are systematically added to both the generator model and the discriminator models.

The incremental addition of the layers allows the models to effectively learn coarse-level detail and later learn ever finer detail, both on the generator and discriminator side.

This incremental nature allows the training to first discover the large-scale structure of the image distribution and then shift attention to increasingly finer-scale detail, instead of having to learn all scales simultaneously.

— Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017.

The model architecture is complex and cannot be implemented directly.

In this tutorial, we will focus on how the progressive growing GAN can be implemented using the Keras deep learning library.

We will step through how each of the discriminator and generator models can be defined, how the generator can be trained via the discriminator model, and how each model can be updated during the training process.

These implementation details will provide the basis for developing a progressive growing GAN for your own applications.

## How to Implement the Progressive Growing GAN Discriminator Model

The discriminator model is given images as input and must classify them as either real (from the dataset) or fake (generated).

During the training process, the discriminator must grow to support images with ever-increasing size, starting with 4×4 pixel color images and doubling to 8×8, 16×16, 32×32, and so on.

This is achieved by inserting a new input layer to support the larger input image followed by a new block of layers. The output of this new block is then downsampled. Additionally, the new image is also downsampled directly and passed through the old input processing layer before it is combined with the output of the new block.

During the transition from a lower resolution to a higher resolution, e.g. 16×16 to 32×32, the discriminator model will have two input pathways as follows:

- [32×32 Image] -> [fromRGB Conv] -> [NewBlock] -> [Downsample] ->
- [32×32 Image] -> [Downsample] -> [fromRGB Conv] ->

The output of the new block that is downsampled and the output of the old input processing layer are combined using a weighted average, where the weighting is controlled by a new hyperparameter called *alpha*. The weighted sum is calculated as follows:

- Output = ((1 - alpha) * fromRGB) + (alpha * NewBlock)

The weighted average of the two pathways is then fed into the rest of the existing model.
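A small numeric example may help. The sketch below applies the weighted sum to two short lists standing in for activation maps; the function name and the sample values are illustrative assumptions, not part of the Keras implementation.

```python
def weighted_sum(old, new, alpha):
    """Blend two equally sized activation lists: ((1 - alpha) * old) + (alpha * new)."""
    if len(old) != len(new):
        raise ValueError("both pathways must produce the same number of values")
    return [((1.0 - alpha) * o) + (alpha * n) for o, n in zip(old, new)]

old_path = [2.0, 4.0]   # stand-in for the old fromRGB pathway
new_path = [10.0, 0.0]  # stand-in for the downsampled new block

print(weighted_sum(old_path, new_path, 0.0))  # all weight on the old pathway
print(weighted_sum(old_path, new_path, 0.5))  # an even blend
print(weighted_sum(old_path, new_path, 1.0))  # all weight on the new block
```

The two inputs must have the same shape, which is why the new block is downsampled before it is combined with the old pathway.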

Initially, the weighting is completely biased towards the old input processing layer (*alpha=0*) and is linearly increased over training iterations so that the new block is given more weight until eventually, the output is entirely the product of the new block (*alpha=1*). At this time, the old pathway can be removed.

This can be summarized with the following figure taken from the paper showing a model before growing (a), during the phase-in of the larger resolution (b), and the model after the phase-in (c).

The *fromRGB* layers are implemented as a 1×1 convolutional layer. A block is comprised of two convolutional layers with 3×3 sized filters and the leaky ReLU activation function with a slope of 0.2, followed by a downsampling layer. Average pooling is used for downsampling, unlike most other GAN discriminators, which downsample with strided convolutional layers.
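To show what the downsampling layer does, the sketch below implements 2×2 average pooling with a stride of 2 on a single-channel image stored as nested lists. It is a plain-Python illustration under those assumptions; in the models, the Keras *AveragePooling2D* layer performs this operation on each feature map.

```python
def average_pool_2x2(image):
    """Halve the width and height by averaging each non-overlapping 2x2 patch."""
    rows, cols = len(image), len(image[0])
    pooled = []
    for r in range(0, rows - 1, 2):
        pooled_row = []
        for c in range(0, cols - 1, 2):
            total = (image[r][c] + image[r][c + 1] +
                     image[r + 1][c] + image[r + 1][c + 1])
            pooled_row.append(total / 4.0)
        pooled.append(pooled_row)
    return pooled

image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
# a 4x4 input becomes a 2x2 output
print(average_pool_2x2(image))
```

Unlike a strided convolution, average pooling has no weights to learn, so the downsampling itself adds no parameters to the model.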

The output of the model involves two convolutional layers with 3×3 and 4×4 sized filters and Leaky ReLU activation, followed by a fully connected layer that outputs the single value prediction. The model uses a linear activation function instead of a sigmoid activation function like other discriminator models and is trained directly either by Wasserstein loss (specifically WGAN-GP) or least squares loss; we will use the latter in this tutorial. Model weights are initialized using He Gaussian (he_normal), which is very similar to the method used in the paper.

The model uses a custom layer called minibatch standard deviation at the beginning of the output block, and instead of batch normalization, each layer uses local response normalization, referred to as pixel-wise normalization in the paper. We will leave out the minibatch standard deviation layer and use batch normalization in this tutorial for brevity.
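Although pixel-wise normalization is left out of the models in this tutorial, the calculation itself is simple: the feature vector at each pixel is divided by the square root of its mean squared value. The sketch below shows this for a single pixel in plain Python; the function name is an assumption, and the small epsilon is a common safeguard against division by zero.

```python
from math import sqrt

def pixel_normalize(features, eps=1.0e-8):
    """Normalize one pixel's feature vector by the root of its mean squared value."""
    mean_sq = sum(v * v for v in features) / len(features)
    scale = sqrt(mean_sq + eps)
    return [v / scale for v in features]

normalized = pixel_normalize([3.0, 4.0])
print(normalized)
# after normalization the mean squared value is very close to 1
print(sum(v * v for v in normalized) / len(normalized))
```

The same operation is applied independently at every pixel location, so no statistics are shared across the batch.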

One approach to implementing the progressive growing GAN would be to manually expand a model on demand during training. Another approach is to pre-define all of the models prior to training and carefully use the Keras functional API to ensure that layers are shared across the models and continue training.

I believe the latter approach might be easier and is the approach we will use in this tutorial.

First, we must define a custom layer that we can use when fading in a new higher-resolution input image and block. This new layer must take two sets of activation maps with the same dimensions (width, height, channels) and add them together using a weighted sum.

We can implement this as a new layer called *WeightedSum* that extends the *Add* merge layer and uses a hyperparameter *alpha* to control the contribution of each input. This new class is defined below. The layer assumes only two inputs: the first for the output of the old or existing layers and the second for the newly added layers. The new hyperparameter is defined as a backend variable, meaning that we can change it at any time by setting the value of the variable.

```python
# weighted sum output
class WeightedSum(Add):
    # init with default value
    def __init__(self, alpha=0.0, **kwargs):
        super(WeightedSum, self).__init__(**kwargs)
        self.alpha = backend.variable(alpha, name='ws_alpha')

    # output a weighted sum of inputs
    def _merge_function(self, inputs):
        # only supports a weighted sum of two inputs
        assert (len(inputs) == 2)
        # ((1-a) * input1) + (a * input2)
        output = ((1.0 - self.alpha) * inputs[0]) + (self.alpha * inputs[1])
        return output
```
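During the fade-in phase, the value of alpha needs to move from 0.0 to 1.0 over the training steps. One simple way to do this is a linear schedule, sketched below in plain Python. The function name and the exact formula are assumptions for illustration; in a Keras model the resulting value would then be written to each *WeightedSum* layer's alpha variable.

```python
def linear_alpha(step, n_steps):
    """Return alpha for a training step, rising linearly from 0.0 to 1.0."""
    # assumption: with fewer than two steps there is nothing to fade, so use 1.0
    if n_steps < 2:
        return 1.0
    return step / float(n_steps - 1)

# alpha over a five-step fade-in
print([linear_alpha(step, 5) for step in range(5)])
```

With this schedule, the first step uses only the old pathway and the final step uses only the new block.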

The discriminator model is far more complex to grow than the generator because we have to change the model input, so let’s step through this slowly.

Firstly, we can define a discriminator model that takes a 4×4 color image as input and outputs a prediction of whether the image is real or fake. The model is comprised of a 1×1 input processing layer (fromRGB) and an output block.

```python
...
# base model input
in_image = Input(shape=(4,4,3))
# conv 1x1
g = Conv2D(64, (1,1), padding='same', kernel_initializer='he_normal')(in_image)
g = LeakyReLU(alpha=0.2)(g)
# conv 3x3 (output block)
g = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(g)
g = BatchNormalization()(g)
g = LeakyReLU(alpha=0.2)(g)
# conv 4x4
g = Conv2D(128, (4,4), padding='same', kernel_initializer='he_normal')(g)
g = BatchNormalization()(g)
g = LeakyReLU(alpha=0.2)(g)
# dense output layer
g = Flatten()(g)
out_class = Dense(1)(g)
# define model
model = Model(in_image, out_class)
# compile model
model.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
```

Next, we need to define a new model that handles the intermediate stage between this model and a new discriminator model that takes 8×8 color images as input.

The existing input processing layer must receive a downsampled version of the new 8×8 image. A new input processing layer must be defined that takes the 8×8 input image and passes it through a new block of two convolutional layers and a downsampling layer. The output of the new block after downsampling and the output of the old input processing layer must be added together using a weighted sum via our new *WeightedSum* layer, and the result must then be passed through the same output block (two convolutional layers and the output layer).

Given the first defined model and our knowledge about this model (e.g. the number of layers in the input processing layer is 2 for the Conv2D and LeakyReLU), we can construct this new intermediate or fade-in model using layer indexes from the old model.

```python
...
old_model = model
# get shape of existing model
in_shape = list(old_model.input.shape)
# define new input shape as double the size
input_shape = (in_shape[-2].value*2, in_shape[-2].value*2, in_shape[-1].value)
in_image = Input(shape=input_shape)
# define new input processing layer
g = Conv2D(64, (1,1), padding='same', kernel_initializer='he_normal')(in_image)
g = LeakyReLU(alpha=0.2)(g)
# define new block
g = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(g)
g = BatchNormalization()(g)
g = LeakyReLU(alpha=0.2)(g)
g = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(g)
g = BatchNormalization()(g)
g = LeakyReLU(alpha=0.2)(g)
g = AveragePooling2D()(g)
# downsample the new larger image
downsample = AveragePooling2D()(in_image)
# connect old input processing to downsampled new input
block_old = old_model.layers[1](downsample)
block_old = old_model.layers[2](block_old)
# fade in output of old model input layer with new input
g = WeightedSum()([block_old, g])
# skip the input, 1x1 and activation for the old model
for i in range(3, len(old_model.layers)):
    g = old_model.layers[i](g)
# define fade-in model
model = Model(in_image, g)
# compile model
model.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
```
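The index-based reuse of the old model's layers can be pictured with a plain list. In the sketch below, the layer labels are hypothetical stand-ins for the layers of the 4×4 base model: index 0 is the input layer, indexes 1 and 2 are the input processing layers, and everything from index 3 onward is the output block that is reused unchanged.

```python
# hypothetical labels for the layers of the 4x4 base model, in order
old_layers = [
    "input_4x4", "fromRGB_conv1x1", "fromRGB_leakyrelu",
    "conv3x3", "batchnorm_1", "leakyrelu_1",
    "conv4x4", "batchnorm_2", "leakyrelu_2",
    "flatten", "dense",
]

def split_old_model(layers, n_input_layers=3):
    """Separate the old input processing layers from the reusable output block."""
    # index 0 is the old input layer, which is replaced by the new, larger input
    input_processing = layers[1:n_input_layers]
    output_block = layers[n_input_layers:]
    return input_processing, output_block

input_processing, output_block = split_old_model(old_layers)
print(input_processing)
print(output_block)
```

This only works because we know the exact layout of the old model; if the input processing layers change, the index of the first reused layer must change with them.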

So far, so good.

We also need a version of the same model with the same layers without the fade-in of the input from the old model’s input processing layers.

This straight-through version is required for training before we fade-in the next doubling of the input image size.

We can update the above example to create two versions of the model. First, the straight-through version as it is simpler, then the version used for the fade-in that reuses the layers from the new block and the output layers of the old model.

The *add_discriminator_block()* function below implements this, returning a list of the two defined models (straight-through and fade-in), and takes the old model as an argument and defines the number of input layers as a default argument (3).

To ensure that the *WeightedSum* layer works correctly, we have fixed all convolutional layers to always have 64 filters, and in turn, output 64 feature maps. If there is a mismatch between the old model’s input processing layer and the new block’s output in terms of the number of feature maps (channels), then the weighted sum will fail.

```python
# add a discriminator block
def add_discriminator_block(old_model, n_input_layers=3):
    # get shape of existing model
    in_shape = list(old_model.input.shape)
    # define new input shape as double the size
    input_shape = (in_shape[-2].value*2, in_shape[-2].value*2, in_shape[-1].value)
    in_image = Input(shape=input_shape)
    # define new input processing layer
    d = Conv2D(64, (1,1), padding='same', kernel_initializer='he_normal')(in_image)
    d = LeakyReLU(alpha=0.2)(d)
    # define new block
    d = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(d)
    d = BatchNormalization()(d)
    d = LeakyReLU(alpha=0.2)(d)
    d = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(d)
    d = BatchNormalization()(d)
    d = LeakyReLU(alpha=0.2)(d)
    d = AveragePooling2D()(d)
    block_new = d
    # skip the input, 1x1 and activation for the old model
    for i in range(n_input_layers, len(old_model.layers)):
        d = old_model.layers[i](d)
    # define straight-through model
    model1 = Model(in_image, d)
    # compile model
    model1.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
    # downsample the new larger image
    downsample = AveragePooling2D()(in_image)
    # connect old input processing to downsampled new input
    block_old = old_model.layers[1](downsample)
    block_old = old_model.layers[2](block_old)
    # fade in output of old model input layer with new input
    d = WeightedSum()([block_old, block_new])
    # skip the input, 1x1 and activation for the old model
    for i in range(n_input_layers, len(old_model.layers)):
        d = old_model.layers[i](d)
    # define fade-in model
    model2 = Model(in_image, d)
    # compile model
    model2.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
    return [model1, model2]
```

It is not an elegant function as we have some repetition, but it is readable and will get the job done.

We can then call this function again and again as we double the size of input images. Importantly, the function expects the straight-through version of the prior model as input.

The example below defines a new function called *define_discriminator()* that defines our base model that expects a 4×4 color image as input, then repeatedly adds blocks to create new versions of the discriminator model each time that expects images with quadruple the area.

```python
# define the discriminator models for each image resolution
def define_discriminator(n_blocks, input_shape=(4,4,3)):
    model_list = list()
    # base model input
    in_image = Input(shape=input_shape)
    # conv 1x1
    d = Conv2D(64, (1,1), padding='same', kernel_initializer='he_normal')(in_image)
    d = LeakyReLU(alpha=0.2)(d)
    # conv 3x3 (output block)
    d = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(d)
    d = BatchNormalization()(d)
    d = LeakyReLU(alpha=0.2)(d)
    # conv 4x4
    d = Conv2D(128, (4,4), padding='same', kernel_initializer='he_normal')(d)
    d = BatchNormalization()(d)
    d = LeakyReLU(alpha=0.2)(d)
    # dense output layer
    d = Flatten()(d)
    out_class = Dense(1)(d)
    # define model
    model = Model(in_image, out_class)
    # compile model
    model.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
    # store model
    model_list.append([model, model])
    # create submodels
    for i in range(1, n_blocks):
        # get prior model without the fade-in
        old_model = model_list[i - 1][0]
        # create new model for next resolution
        models = add_discriminator_block(old_model)
        # store model
        model_list.append(models)
    return model_list
```

This function will return a list of models, where each item in the list is a two-element list that contains first the straight-through version of the model at that resolution, and second the fade-in version of the model for that resolution.
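The layout of the returned list can be mimicked with strings in place of Keras models. The sketch below is only an illustration of the indexing scheme; the function name and the labels are assumptions. Note that the base entry holds the same model twice, because the 4×4 model has no fade-in version.

```python
def build_model_table(n_blocks, start=4):
    """Mimic the structure of the model list using labels instead of models."""
    base = "straight_%dx%d" % (start, start)
    # the base resolution has no fade-in version, so the same entry is stored twice
    table = [[base, base]]
    for i in range(1, n_blocks):
        size = start * (2 ** i)
        table.append(["straight_%dx%d" % (size, size), "fadein_%dx%d" % (size, size)])
    return table

table = build_model_table(3)
# first index selects the resolution, second selects straight-through (0) or fade-in (1)
print(table[2][1])
print(table[1][0])
```

The first index therefore selects the resolution and the second index selects the version of the model at that resolution.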

We can tie all of this together and define a new “discriminator model” that will grow from 4×4, through to 8×8, and finally to 16×16. This is achieved by setting the *n_blocks* argument to 3 when calling the *define_discriminator()* function, for the creation of three sets of models.

The complete example is listed below.

```python
# example of defining discriminator models for the progressive growing gan
from keras.optimizers import Adam
from keras.models import Model
from keras.layers import Input
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import Conv2D
from keras.layers import AveragePooling2D
from keras.layers import LeakyReLU
from keras.layers import BatchNormalization
from keras.layers import Add
from keras.utils.vis_utils import plot_model
from keras import backend

# weighted sum output
class WeightedSum(Add):
    # init with default value
    def __init__(self, alpha=0.0, **kwargs):
        super(WeightedSum, self).__init__(**kwargs)
        self.alpha = backend.variable(alpha, name='ws_alpha')

    # output a weighted sum of inputs
    def _merge_function(self, inputs):
        # only supports a weighted sum of two inputs
        assert (len(inputs) == 2)
        # ((1-a) * input1) + (a * input2)
        output = ((1.0 - self.alpha) * inputs[0]) + (self.alpha * inputs[1])
        return output

# add a discriminator block
def add_discriminator_block(old_model, n_input_layers=3):
    # get shape of existing model
    in_shape = list(old_model.input.shape)
    # define new input shape as double the size
    input_shape = (in_shape[-2].value*2, in_shape[-2].value*2, in_shape[-1].value)
    in_image = Input(shape=input_shape)
    # define new input processing layer
    d = Conv2D(64, (1,1), padding='same', kernel_initializer='he_normal')(in_image)
    d = LeakyReLU(alpha=0.2)(d)
    # define new block
    d = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(d)
    d = BatchNormalization()(d)
    d = LeakyReLU(alpha=0.2)(d)
    d = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(d)
    d = BatchNormalization()(d)
    d = LeakyReLU(alpha=0.2)(d)
    d = AveragePooling2D()(d)
    block_new = d
    # skip the input, 1x1 and activation for the old model
    for i in range(n_input_layers, len(old_model.layers)):
        d = old_model.layers[i](d)
    # define straight-through model
    model1 = Model(in_image, d)
    # compile model
    model1.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
    # downsample the new larger image
    downsample = AveragePooling2D()(in_image)
    # connect old input processing to downsampled new input
    block_old = old_model.layers[1](downsample)
    block_old = old_model.layers[2](block_old)
    # fade in output of old model input layer with new input
    d = WeightedSum()([block_old, block_new])
    # skip the input, 1x1 and activation for the old model
    for i in range(n_input_layers, len(old_model.layers)):
        d = old_model.layers[i](d)
    # define fade-in model
    model2 = Model(in_image, d)
    # compile model
    model2.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
    return [model1, model2]

# define the discriminator models for each image resolution
def define_discriminator(n_blocks, input_shape=(4,4,3)):
    model_list = list()
    # base model input
    in_image = Input(shape=input_shape)
    # conv 1x1
    d = Conv2D(64, (1,1), padding='same', kernel_initializer='he_normal')(in_image)
    d = LeakyReLU(alpha=0.2)(d)
    # conv 3x3 (output block)
    d = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(d)
    d = BatchNormalization()(d)
    d = LeakyReLU(alpha=0.2)(d)
    # conv 4x4
    d = Conv2D(128, (4,4), padding='same', kernel_initializer='he_normal')(d)
    d = BatchNormalization()(d)
    d = LeakyReLU(alpha=0.2)(d)
    # dense output layer
    d = Flatten()(d)
    out_class = Dense(1)(d)
    # define model
    model = Model(in_image, out_class)
    # compile model
    model.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
    # store model
    model_list.append([model, model])
    # create submodels
    for i in range(1, n_blocks):
        # get prior model without the fade-in
        old_model = model_list[i - 1][0]
        # create new model for next resolution
        models = add_discriminator_block(old_model)
        # store model
        model_list.append(models)
    return model_list

# define models
discriminators = define_discriminator(3)
# spot check
m = discriminators[2][1]
m.summary()
plot_model(m, to_file='discriminator_plot.png', show_shapes=True, show_layer_names=True)
```

Running the example first summarizes the fade-in version of the third model showing the 16×16 color image inputs and the single value output.

```
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_3 (InputLayer)            (None, 16, 16, 3)    0
__________________________________________________________________________________________________
conv2d_7 (Conv2D)               (None, 16, 16, 64)   256         input_3[0][0]
__________________________________________________________________________________________________
leaky_re_lu_7 (LeakyReLU)       (None, 16, 16, 64)   0           conv2d_7[0][0]
__________________________________________________________________________________________________
conv2d_8 (Conv2D)               (None, 16, 16, 64)   36928       leaky_re_lu_7[0][0]
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, 16, 16, 64)   256         conv2d_8[0][0]
__________________________________________________________________________________________________
leaky_re_lu_8 (LeakyReLU)       (None, 16, 16, 64)   0           batch_normalization_5[0][0]
__________________________________________________________________________________________________
conv2d_9 (Conv2D)               (None, 16, 16, 64)   36928       leaky_re_lu_8[0][0]
__________________________________________________________________________________________________
average_pooling2d_4 (AveragePoo (None, 8, 8, 3)      0           input_3[0][0]
__________________________________________________________________________________________________
batch_normalization_6 (BatchNor (None, 16, 16, 64)   256         conv2d_9[0][0]
__________________________________________________________________________________________________
conv2d_4 (Conv2D)               (None, 8, 8, 64)     256         average_pooling2d_4[0][0]
__________________________________________________________________________________________________
leaky_re_lu_9 (LeakyReLU)       (None, 16, 16, 64)   0           batch_normalization_6[0][0]
__________________________________________________________________________________________________
leaky_re_lu_4 (LeakyReLU)       (None, 8, 8, 64)     0           conv2d_4[1][0]
__________________________________________________________________________________________________
average_pooling2d_3 (AveragePoo (None, 8, 8, 64)     0           leaky_re_lu_9[0][0]
__________________________________________________________________________________________________
weighted_sum_2 (WeightedSum)    (None, 8, 8, 64)     0           leaky_re_lu_4[1][0]
                                                                 average_pooling2d_3[0][0]
__________________________________________________________________________________________________
conv2d_5 (Conv2D)               (None, 8, 8, 64)     36928       weighted_sum_2[0][0]
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, 8, 8, 64)     256         conv2d_5[2][0]
__________________________________________________________________________________________________
leaky_re_lu_5 (LeakyReLU)       (None, 8, 8, 64)     0           batch_normalization_3[2][0]
__________________________________________________________________________________________________
conv2d_6 (Conv2D)               (None, 8, 8, 64)     36928       leaky_re_lu_5[2][0]
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 8, 8, 64)     256         conv2d_6[2][0]
__________________________________________________________________________________________________
leaky_re_lu_6 (LeakyReLU)       (None, 8, 8, 64)     0           batch_normalization_4[2][0]
__________________________________________________________________________________________________
average_pooling2d_1 (AveragePoo (None, 4, 4, 64)     0           leaky_re_lu_6[2][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 4, 4, 128)    73856       average_pooling2d_1[2][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 4, 4, 128)    512         conv2d_2[4][0]
__________________________________________________________________________________________________
leaky_re_lu_2 (LeakyReLU)       (None, 4, 4, 128)    0           batch_normalization_1[4][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D)               (None, 4, 4, 128)    262272      leaky_re_lu_2[4][0]
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 4, 4, 128)    512         conv2d_3[4][0]
__________________________________________________________________________________________________
leaky_re_lu_3 (LeakyReLU)       (None, 4, 4, 128)    0           batch_normalization_2[4][0]
__________________________________________________________________________________________________
flatten_1 (Flatten)             (None, 2048)         0           leaky_re_lu_3[4][0]
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 1)            2049        flatten_1[4][0]
==================================================================================================
Total params: 488,449
Trainable params: 487,425
Non-trainable params: 1,024
__________________________________________________________________________________________________
```

A plot of the same fade-in version of the model is created and saved to file.

**Note**: creating this plot assumes that the pygraphviz and pydot libraries are installed. If this is a problem, comment out the import statement and call to plot_model().

The plot shows the 16×16 input image that is downsampled and passed through the 8×8 input processing layers from the prior model (left). It also shows the addition of the new block (right) and the weighted average that combines both streams of input, before using the existing model layers to continue processing and outputting a prediction.

Now that we have seen how we can define the discriminator models, let’s look at how we can define the generator models.

## How to Implement the Progressive Growing GAN Generator Model

The generator models for the progressive growing GAN are easier to implement in Keras than the discriminator models.

This is because each fade-in requires only a minor change to the output of the model.

Increasing the resolution of the generator involves first upsampling the output of the end of the last block. This is then connected to the new block and a new output layer for an image that is double the height and width dimensions or quadruple the area. During the phase-in, the upsampling is also connected to the output layer from the old model and the output from both output layers is merged using a weighted average.

After the phase-in is complete, the old output layer is removed.
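To make the weighted average concrete, the sketch below blends two stand-in output images with NumPy, using the formula (1 - alpha) * old + alpha * new. The function name `blend_outputs` and the toy arrays are assumptions made for illustration; they are not part of the tutorial's models.

```python
import numpy as np

def blend_outputs(old_image, new_image, alpha):
    """Weighted average used during a phase-in: (1 - alpha) * old + alpha * new."""
    return (1.0 - alpha) * old_image + alpha * new_image

# toy stand-ins for the two 8x8 RGB outputs that exist during a phase-in
old_out = np.zeros((8, 8, 3))  # upsampled activations passed through the old output layer
new_out = np.ones((8, 8, 3))   # output of the new block's output layer

for alpha in (0.0, 0.5, 1.0):
    merged = blend_outputs(old_out, new_out, alpha)
    print('alpha=%.1f mean pixel=%.2f' % (alpha, merged.mean()))
```

With alpha at 0 the merged image is entirely the old output, and with alpha at 1 it is entirely the new output, which is why the old output layer can be dropped once the phase-in ends.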

This can be summarized with the following figure, taken from the paper, showing a model before growing (a), during the phase-in of the larger resolution (b), and after the phase-in (c).

The toRGB layer is a convolutional layer with three 1×1 filters, sufficient to output a color image.

The model takes a point in the latent space as input, e.g. a 100-element or 512-element vector as described in the paper. This is scaled up to provide the basis for 4×4 activation maps, followed by a convolutional layer with 4×4 filters and another with 3×3 filters. Like the discriminator, LeakyReLU activations are used, as is pixel normalization, which we will substitute with batch normalization for brevity.

A block involves an upsample layer followed by two convolutional layers with 3×3 filters. Upsampling is achieved using a nearest neighbor method (e.g. duplicating input rows and columns) via an UpSampling2D layer instead of the more common transpose convolutional layer.
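The row and column duplication can be illustrated without Keras. The sketch below is a NumPy stand-in for nearest neighbor upsampling by a factor of two; the helper name `upsample_nearest` is hypothetical, and the code only illustrates the idea rather than reproducing the UpSampling2D layer's implementation.

```python
import numpy as np

def upsample_nearest(feature_map, factor=2):
    """Duplicate rows and columns of a (height, width, channels) array."""
    rows_repeated = np.repeat(feature_map, factor, axis=0)
    return np.repeat(rows_repeated, factor, axis=1)

# a tiny 2x2 single-channel activation map
small = np.array([[1, 2],
                  [3, 4]]).reshape(2, 2, 1)
large = upsample_nearest(small)

print(large.shape)     # (4, 4, 1)
print(large[:, :, 0])  # each input value now covers a 2x2 patch
```

Because the operation only copies values, it has no weights to learn; the two convolutional layers that follow it in each block do the learning.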

We can define the baseline model that will take a point in latent space as input and output a 4×4 color image as follows:

```python
...
# base model latent input
in_latent = Input(shape=(100,))
# linear scale up to activation maps
g = Dense(128 * 4 * 4, kernel_initializer='he_normal')(in_latent)
g = Reshape((4, 4, 128))(g)
# conv 3x3, input block (the paper uses 4x4 filters here)
g = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(g)
g = BatchNormalization()(g)
g = LeakyReLU(alpha=0.2)(g)
# conv 3x3
g = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(g)
g = BatchNormalization()(g)
g = LeakyReLU(alpha=0.2)(g)
# conv 1x1, output block
out_image = Conv2D(3, (1,1), padding='same', kernel_initializer='he_normal')(g)
# define model
model = Model(in_latent, out_image)
```

Next, we need to define a version of the model that uses all of the same input layers, but adds a new block (upsample and two convolutional layers) and a new output layer (a 1×1 convolutional layer).

This would be the model after the phase-in to the new output resolution. This can be achieved by using our own knowledge about the baseline model: the end of the last block is the second-to-last layer, e.g. the layer at index -2 in the model’s list of layers.

The new model with the addition of a new block and output layer is defined as follows:

```python
...
old_model = model
# get the end of the last block
block_end = old_model.layers[-2].output
# upsample, and define new block
upsampling = UpSampling2D()(block_end)
g = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(upsampling)
g = BatchNormalization()(g)
g = LeakyReLU(alpha=0.2)(g)
g = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(g)
g = BatchNormalization()(g)
g = LeakyReLU(alpha=0.2)(g)
# add new output layer
out_image = Conv2D(3, (1,1), padding='same', kernel_initializer='he_normal')(g)
# define model
model = Model(old_model.input, out_image)
```

That is pretty straightforward; we have chopped off the old output layer at the end of the last block and grafted on a new block and output layer.

Now we need a version of this new model to use during the fade-in.

This involves connecting the old output layer to the new upsampling layer at the start of the new block and using an instance of our WeightedSum layer defined in the previous section to combine the output of the old and new output layers.

```python
...
# get the output layer from old model
out_old = old_model.layers[-1]
# connect the upsampling to the old output layer
out_image2 = out_old(upsampling)
# define new output image as the weighted sum of the old and new models
merged = WeightedSum()([out_image2, out_image])
# define model
model2 = Model(old_model.input, merged)
```

We can combine the definition of these two operations into a function named *add_generator_block()*, defined below, that will expand a given model and return both the new generator model with the added block (*model1*) and a version of the model with the fading in of the new block with the old output layer (*model2*).

```python
# add a generator block
def add_generator_block(old_model):
    # get the end of the last block
    block_end = old_model.layers[-2].output
    # upsample, and define new block
    upsampling = UpSampling2D()(block_end)
    g = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(upsampling)
    g = BatchNormalization()(g)
    g = LeakyReLU(alpha=0.2)(g)
    g = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(g)
    g = BatchNormalization()(g)
    g = LeakyReLU(alpha=0.2)(g)
    # add new output layer
    out_image = Conv2D(3, (1,1), padding='same', kernel_initializer='he_normal')(g)
    # define model
    model1 = Model(old_model.input, out_image)
    # get the output layer from old model
    out_old = old_model.layers[-1]
    # connect the upsampling to the old output layer
    out_image2 = out_old(upsampling)
    # define new output image as the weighted sum of the old and new models
    merged = WeightedSum()([out_image2, out_image])
    # define model
    model2 = Model(old_model.input, merged)
    return [model1, model2]
```

We can then call this function with our baseline model to create models with one added block and continue to call it with subsequent models to keep adding blocks.

The *define_generator()* function below implements this, taking the size of the latent space and number of blocks to add (models to create).

The baseline model is defined as outputting a color image with the shape 4×4, controlled by the default argument *in_dim*.

```python
# define generator models
def define_generator(latent_dim, n_blocks, in_dim=4):
    model_list = list()
    # base model latent input
    in_latent = Input(shape=(latent_dim,))
    # linear scale up to activation maps
    g = Dense(128 * in_dim * in_dim, kernel_initializer='he_normal')(in_latent)
    g = Reshape((in_dim, in_dim, 128))(g)
    # conv 3x3, input block (the paper uses 4x4 filters here)
    g = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(g)
    g = BatchNormalization()(g)
    g = LeakyReLU(alpha=0.2)(g)
    # conv 3x3
    g = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(g)
    g = BatchNormalization()(g)
    g = LeakyReLU(alpha=0.2)(g)
    # conv 1x1, output block
    out_image = Conv2D(3, (1,1), padding='same', kernel_initializer='he_normal')(g)
    # define model
    model = Model(in_latent, out_image)
    # store model (the baseline has no fade-in version, so it is stored twice)
    model_list.append([model, model])
    # create submodels
    for i in range(1, n_blocks):
        # get prior model without the fade-in
        old_model = model_list[i - 1][0]
        # create new model for next resolution
        models = add_generator_block(old_model)
        # store model
        model_list.append(models)
    return model_list
```

We can tie all of this together and define a baseline generator and the addition of two blocks, so three models in total, where a straight-through and fade-in version of each model is defined.

The complete example is listed below.

```python
# example of defining generator models for the progressive growing gan
from keras.models import Model
from keras.layers import Input
from keras.layers import Dense
from keras.layers import Reshape
from keras.layers import Conv2D
from keras.layers import UpSampling2D
from keras.layers import LeakyReLU
from keras.layers import BatchNormalization
from keras.layers import Add
from keras.utils.vis_utils import plot_model
from keras import backend

# weighted sum output
class WeightedSum(Add):
    # init with default value
    def __init__(self, alpha=0.0, **kwargs):
        super(WeightedSum, self).__init__(**kwargs)
        self.alpha = backend.variable(alpha, name='ws_alpha')

    # output a weighted sum of inputs
    def _merge_function(self, inputs):
        # only supports a weighted sum of two inputs
        assert (len(inputs) == 2)
        # ((1-a) * input1) + (a * input2)
        output = ((1.0 - self.alpha) * inputs[0]) + (self.alpha * inputs[1])
        return output

# add a generator block
def add_generator_block(old_model):
    # get the end of the last block
    block_end = old_model.layers[-2].output
    # upsample, and define new block
    upsampling = UpSampling2D()(block_end)
    g = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(upsampling)
    g = BatchNormalization()(g)
    g = LeakyReLU(alpha=0.2)(g)
    g = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(g)
    g = BatchNormalization()(g)
    g = LeakyReLU(alpha=0.2)(g)
    # add new output layer
    out_image = Conv2D(3, (1,1), padding='same', kernel_initializer='he_normal')(g)
    # define model
    model1 = Model(old_model.input, out_image)
    # get the output layer from old model
    out_old = old_model.layers[-1]
    # connect the upsampling to the old output layer
    out_image2 = out_old(upsampling)
    # define new output image as the weighted sum of the old and new models
    merged = WeightedSum()([out_image2, out_image])
    # define model
    model2 = Model(old_model.input, merged)
    return [model1, model2]

# define generator models
def define_generator(latent_dim, n_blocks, in_dim=4):
    model_list = list()
    # base model latent input
    in_latent = Input(shape=(latent_dim,))
    # linear scale up to activation maps
    g = Dense(128 * in_dim * in_dim, kernel_initializer='he_normal')(in_latent)
    g = Reshape((in_dim, in_dim, 128))(g)
    # conv 3x3, input block (the paper uses 4x4 filters here)
    g = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(g)
    g = BatchNormalization()(g)
    g = LeakyReLU(alpha=0.2)(g)
    # conv 3x3
    g = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(g)
    g = BatchNormalization()(g)
    g = LeakyReLU(alpha=0.2)(g)
    # conv 1x1, output block
    out_image = Conv2D(3, (1,1), padding='same', kernel_initializer='he_normal')(g)
    # define model
    model = Model(in_latent, out_image)
    # store model
    model_list.append([model, model])
    # create submodels
    for i in range(1, n_blocks):
        # get prior model without the fade-in
        old_model = model_list[i - 1][0]
        # create new model for next resolution
        models = add_generator_block(old_model)
        # store model
        model_list.append(models)
    return model_list

# define models
generators = define_generator(100, 3)
# spot check
m = generators[2][1]
m.summary()
plot_model(m, to_file='generator_plot.png', show_shapes=True, show_layer_names=True)
```

The example chooses the fade-in model for the last model to summarize.

Running the example first summarizes a linear list of the layers in the model. We can see that the last model takes a point from the latent space and outputs a 16×16 image.

This matches our expectations: the baseline model outputs a 4×4 image, adding one block increases this to 8×8, and adding one more block increases this to 16×16.

```
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            (None, 100)          0
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 2048)         206848      input_1[0][0]
__________________________________________________________________________________________________
reshape_1 (Reshape)             (None, 4, 4, 128)    0           dense_1[0][0]
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 4, 4, 128)    147584      reshape_1[0][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 4, 4, 128)    512         conv2d_1[0][0]
__________________________________________________________________________________________________
leaky_re_lu_1 (LeakyReLU)       (None, 4, 4, 128)    0           batch_normalization_1[0][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 4, 4, 128)    147584      leaky_re_lu_1[0][0]
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 4, 4, 128)    512         conv2d_2[0][0]
__________________________________________________________________________________________________
leaky_re_lu_2 (LeakyReLU)       (None, 4, 4, 128)    0           batch_normalization_2[0][0]
__________________________________________________________________________________________________
up_sampling2d_1 (UpSampling2D)  (None, 8, 8, 128)    0           leaky_re_lu_2[0][0]
__________________________________________________________________________________________________
conv2d_4 (Conv2D)               (None, 8, 8, 64)     73792       up_sampling2d_1[0][0]
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, 8, 8, 64)     256         conv2d_4[0][0]
__________________________________________________________________________________________________
leaky_re_lu_3 (LeakyReLU)       (None, 8, 8, 64)     0           batch_normalization_3[0][0]
__________________________________________________________________________________________________
conv2d_5 (Conv2D)               (None, 8, 8, 64)     36928       leaky_re_lu_3[0][0]
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 8, 8, 64)     256         conv2d_5[0][0]
__________________________________________________________________________________________________
leaky_re_lu_4 (LeakyReLU)       (None, 8, 8, 64)     0           batch_normalization_4[0][0]
__________________________________________________________________________________________________
up_sampling2d_2 (UpSampling2D)  (None, 16, 16, 64)   0           leaky_re_lu_4[0][0]
__________________________________________________________________________________________________
conv2d_7 (Conv2D)               (None, 16, 16, 64)   36928       up_sampling2d_2[0][0]
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, 16, 16, 64)   256         conv2d_7[0][0]
__________________________________________________________________________________________________
leaky_re_lu_5 (LeakyReLU)       (None, 16, 16, 64)   0           batch_normalization_5[0][0]
__________________________________________________________________________________________________
conv2d_8 (Conv2D)               (None, 16, 16, 64)   36928       leaky_re_lu_5[0][0]
__________________________________________________________________________________________________
batch_normalization_6 (BatchNor (None, 16, 16, 64)   256         conv2d_8[0][0]
__________________________________________________________________________________________________
leaky_re_lu_6 (LeakyReLU)       (None, 16, 16, 64)   0           batch_normalization_6[0][0]
__________________________________________________________________________________________________
conv2d_6 (Conv2D)               multiple             195         up_sampling2d_2[0][0]
__________________________________________________________________________________________________
conv2d_9 (Conv2D)               (None, 16, 16, 3)    195         leaky_re_lu_6[0][0]
__________________________________________________________________________________________________
weighted_sum_2 (WeightedSum)    (None, 16, 16, 3)    0           conv2d_6[1][0]
                                                                 conv2d_9[0][0]
==================================================================================================
Total params: 689,030
Trainable params: 688,006
Non-trainable params: 1,024
__________________________________________________________________________________________________
```
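Several figures in this summary can be checked by hand. The sketch below recomputes the output resolution and a few of the parameter counts using the standard formulas for fully connected and convolutional layers (weights plus one bias per unit or filter). The helper names `output_resolution`, `dense_params`, and `conv2d_params` are hypothetical and exist only for this illustration.

```python
def output_resolution(n_models, in_dim=4):
    """Image side length of the last model when the resolution doubles per added block."""
    return in_dim * 2 ** (n_models - 1)

def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

def conv2d_params(kernel_size, in_channels, n_filters):
    """Weights plus biases of a square-kernel convolutional layer."""
    return kernel_size * kernel_size * in_channels * n_filters + n_filters

print(output_resolution(3))            # 16, three models: 4x4, 8x8, 16x16
print(dense_params(100, 128 * 4 * 4))  # 206848, matches dense_1
print(conv2d_params(3, 128, 128))      # 147584, matches conv2d_1 and conv2d_2
print(conv2d_params(3, 128, 64))       # 73792, matches conv2d_4
print(conv2d_params(1, 64, 3))         # 195, matches both output layers
```

The two 195-parameter layers are the old and new 1×1 output layers that the weighted sum combines during the fade-in.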

A plot of the same fade-in version of the model is created and saved to file.

**Note**: creating this plot assumes that the pygraphviz and pydot libraries are installed. If this is a problem, comment out the import statement and call to *plot_model()*.

We can see that the output from the last block passes through an UpSampling2D layer, which feeds both the added block with its new output layer and the old output layer. The two outputs are then merged via a weighted sum to produce the final output.

Now that we have seen how to define the generator models, we can review how the generator models may be updated via the discriminator models.

## How to Implement Composite Models for Updating the Generator

The discriminator models are trained directly with real and fake images as input and a target value of 0 for fake and 1 for real.

The generator models are not trained directly; instead, they are trained indirectly via the discriminator models, just like a normal GAN model.

We can create a composite model for each level of growth of the model, e.g. pair 4×4 generators and 4×4 discriminators. We can also pair the straight-through models together, and the fade-in models together.

For example, we can retrieve the generator and discriminator models for a given level of growth.

```python
...
g_models, d_models = generators[0], discriminators[0]
```

Then we can use them to create a composite model for training the straight-through generator, where the output of the generator is fed directly to the discriminator in order to classify.

```python
# straight-through model
d_models[0].trainable = False
model1 = Sequential()
model1.add(g_models[0])
model1.add(d_models[0])
model1.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
```

And do the same for the composite model for the fade-in generator.

```python
# fade-in model
d_models[1].trainable = False
model2 = Sequential()
model2.add(g_models[1])
model2.add(d_models[1])
model2.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
```

The function below, named *define_composite()*, automates this; given a list of defined discriminator and generator models, it will create an appropriate composite model for training each generator model.

```python
# define composite models for training generators via discriminators
def define_composite(discriminators, generators):
    model_list = list()
    # create composite models
    for i in range(len(discriminators)):
        g_models, d_models = generators[i], discriminators[i]
        # straight-through model
        d_models[0].trainable = False
        model1 = Sequential()
        model1.add(g_models[0])
        model1.add(d_models[0])
        model1.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
        # fade-in model
        d_models[1].trainable = False
        model2 = Sequential()
        model2.add(g_models[1])
        model2.add(d_models[1])
        model2.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
        # store
        model_list.append([model1, model2])
    return model_list
```

Tying this together with the definition of the discriminator and generator models above, the complete example of defining all models at each pre-defined level of growth is listed below.

```python
# example of defining composite models for the progressive growing gan
from keras.optimizers import Adam
from keras.models import Sequential
from keras.models import Model
from keras.layers import Input
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import Reshape
from keras.layers import Conv2D
from keras.layers import UpSampling2D
from keras.layers import AveragePooling2D
from keras.layers import LeakyReLU
from keras.layers import BatchNormalization
from keras.layers import Add
from keras.utils.vis_utils import plot_model
from keras import backend

# weighted sum output
class WeightedSum(Add):
    # init with default value
    def __init__(self, alpha=0.0, **kwargs):
        super(WeightedSum, self).__init__(**kwargs)
        self.alpha = backend.variable(alpha, name='ws_alpha')

    # output a weighted sum of inputs
    def _merge_function(self, inputs):
        # only supports a weighted sum of two inputs
        assert (len(inputs) == 2)
        # ((1-a) * input1) + (a * input2)
        output = ((1.0 - self.alpha) * inputs[0]) + (self.alpha * inputs[1])
        return output

# add a discriminator block
def add_discriminator_block(old_model, n_input_layers=3):
    # get shape of existing model
    in_shape = list(old_model.input.shape)
    # define new input shape as double the size
    input_shape = (in_shape[-2].value*2, in_shape[-2].value*2, in_shape[-1].value)
    in_image = Input(shape=input_shape)
    # define new input processing layer
    d = Conv2D(64, (1,1), padding='same', kernel_initializer='he_normal')(in_image)
    d = LeakyReLU(alpha=0.2)(d)
    # define new block
    d = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(d)
    d = BatchNormalization()(d)
    d = LeakyReLU(alpha=0.2)(d)
    d = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(d)
    d = BatchNormalization()(d)
    d = LeakyReLU(alpha=0.2)(d)
    d = AveragePooling2D()(d)
    block_new = d
    # skip the input, 1x1 and activation for the old model
    for i in range(n_input_layers, len(old_model.layers)):
        d = old_model.layers[i](d)
    # define straight-through model
    model1 = Model(in_image, d)
    # compile model
    model1.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
    # downsample the new larger image
    downsample = AveragePooling2D()(in_image)
    # connect old input processing to downsampled new input
    block_old = old_model.layers[1](downsample)
    block_old = old_model.layers[2](block_old)
    # fade in output of old model input layer with new input
    d = WeightedSum()([block_old, block_new])
    # skip the input, 1x1 and activation for the old model
    for i in range(n_input_layers, len(old_model.layers)):
        d = old_model.layers[i](d)
    # define fade-in model
    model2 = Model(in_image, d)
    # compile model
    model2.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
    return [model1, model2]

# define the discriminator models for each image resolution
def define_discriminator(n_blocks, input_shape=(4,4,3)):
    model_list = list()
    # base model input
    in_image = Input(shape=input_shape)
    # conv 1x1
    d = Conv2D(64, (1,1), padding='same', kernel_initializer='he_normal')(in_image)
    d = LeakyReLU(alpha=0.2)(d)
    # conv 3x3 (output block)
    d = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(d)
    d = BatchNormalization()(d)
    d = LeakyReLU(alpha=0.2)(d)
    # conv 4x4
    d = Conv2D(128, (4,4), padding='same', kernel_initializer='he_normal')(d)
    d = BatchNormalization()(d)
    d = LeakyReLU(alpha=0.2)(d)
    # dense output layer
    d = Flatten()(d)
    out_class = Dense(1)(d)
    # define model
    model = Model(in_image, out_class)
    # compile model
    model.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
    # store model
    model_list.append([model, model])
    # create submodels
    for i in range(1, n_blocks):
        # get prior model without the fade-in
        old_model = model_list[i - 1][0]
        # create new model for next resolution
        models = add_discriminator_block(old_model)
        # store model
        model_list.append(models)
    return model_list

# add a generator block
def add_generator_block(old_model):
    # get the end of the last block
    block_end = old_model.layers[-2].output
    # upsample, and define new block
    upsampling = UpSampling2D()(block_end)
    g = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(upsampling)
    g = BatchNormalization()(g)
    g = LeakyReLU(alpha=0.2)(g)
    g = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(g)
    g = BatchNormalization()(g)
    g = LeakyReLU(alpha=0.2)(g)
    # add new output layer
    out_image = Conv2D(3, (1,1), padding='same', kernel_initializer='he_normal')(g)
    # define model
    model1 = Model(old_model.input, out_image)
    # get the output layer from old model
    out_old = old_model.layers[-1]
    # connect the upsampling to the old output layer
    out_image2 = out_old(upsampling)
    # define new output image as the weighted sum of the old and new models
    merged = WeightedSum()([out_image2, out_image])
    # define model
    model2 = Model(old_model.input, merged)
    return [model1, model2]

# define generator models
def define_generator(latent_dim, n_blocks, in_dim=4):
    model_list = list()
    # base model latent input
    in_latent = Input(shape=(latent_dim,))
    # linear scale up to activation maps
    g = Dense(128 * in_dim * in_dim, kernel_initializer='he_normal')(in_latent)
    g = Reshape((in_dim, in_dim, 128))(g)
    # conv 3x3, input block (the paper uses 4x4 filters here)
    g = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(g)
    g = BatchNormalization()(g)
    g = LeakyReLU(alpha=0.2)(g)
    # conv 3x3
    g = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(g)
    g = BatchNormalization()(g)
    g = LeakyReLU(alpha=0.2)(g)
    # conv 1x1, output block
    out_image = Conv2D(3, (1,1), padding='same', kernel_initializer='he_normal')(g)
    # define model
    model = Model(in_latent, out_image)
    # store model
    model_list.append([model, model])
    # create submodels
    for i in range(1, n_blocks):
        # get prior model without the fade-in
        old_model = model_list[i - 1][0]
        # create new model for next resolution
        models = add_generator_block(old_model)
        # store model
        model_list.append(models)
    return model_list

# define composite models for training generators via discriminators
def define_composite(discriminators, generators):
    model_list = list()
    # create composite models
    for i in range(len(discriminators)):
        g_models, d_models = generators[i], discriminators[i]
        # straight-through model
        d_models[0].trainable = False
        model1 = Sequential()
        model1.add(g_models[0])
        model1.add(d_models[0])
        model1.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
        # fade-in model
        d_models[1].trainable = False
        model2 = Sequential()
        model2.add(g_models[1])
        model2.add(d_models[1])
        model2.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
        # store
        model_list.append([model1, model2])
    return model_list

# define models
discriminators = define_discriminator(3)
# define models
generators = define_generator(100, 3)
# define composite models
composite = define_composite(discriminators, generators)
```

Now that we know how to define all of the models, we can review how the models might be updated during training.

## How to Train Discriminator and Generator Models

Pre-defining the generator, discriminator, and composite models was the hard part; training the models is straightforward and much like training any other GAN.

Importantly, in each training iteration the alpha variable in each *WeightedSum* layer must be set to a new value. This must be set for the layer in both the generator and discriminator models and allows for the smooth linear transition from the old model layers to the new model layers, e.g. alpha values set from 0 to 1 over a fixed number of training iterations.

The *update_fadein()* function below implements this and will loop through a list of models and set the alpha value on each based on the current step in a given number of training steps. You may be able to implement this more elegantly using a callback.

```python
# update the alpha value on each instance of WeightedSum
def update_fadein(models, step, n_steps):
    # calculate current alpha (linear from 0 to 1)
    alpha = step / float(n_steps - 1)
    # update the alpha for each model
    for model in models:
        for layer in model.layers:
            if isinstance(layer, WeightedSum):
                backend.set_value(layer.alpha, alpha)
```
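The linear schedule can be inspected on its own, without any Keras models. The sketch below applies the same formula, step / (n_steps - 1), to a toy run of five steps; the helper name `fadein_alpha` and the step count are assumptions chosen for illustration.

```python
def fadein_alpha(step, n_steps):
    """Linear fade-in weight for a zero-based step index."""
    return step / float(n_steps - 1)

n_steps = 5
alphas = [fadein_alpha(i, n_steps) for i in range(n_steps)]
print(alphas)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

The first step uses only the old layers (alpha of 0) and the final step uses only the new layers (alpha of 1). Note that the formula divides by n_steps - 1, so it assumes a fade-in phase of at least two training steps.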

We can define a generic function for training a given generator, discriminator, and composite model for a given number of training epochs.

The *train_epochs()* function below implements this where first the discriminator model is updated on real and fake images, then the generator model is updated, and the process is repeated for the required number of training iterations based on the dataset size and the number of epochs.

This function calls helper functions for retrieving a batch of real images via *generate_real_samples()*, generating a batch of fake samples with the generator *generate_fake_samples()*, and generating a sample of points in latent space *generate_latent_points()*. You can define these functions yourself quite trivially.
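One possible set of definitions is sketched below. It follows the targets described earlier (1 for real images, 0 for fake images) and assumes that the dataset is a NumPy array of images and that latent points are drawn from a standard Gaussian; both assumptions, and the function bodies themselves, are illustrative rather than the only way to write these helpers.

```python
import numpy as np

def generate_real_samples(dataset, n_samples):
    """Select random real images and label them with a target of 1."""
    ix = np.random.randint(0, dataset.shape[0], n_samples)
    X = dataset[ix]
    y = np.ones((n_samples, 1))
    return X, y

def generate_latent_points(latent_dim, n_samples):
    """Draw points in latent space, one row per sample (Gaussian is an assumption)."""
    return np.random.randn(n_samples, latent_dim)

def generate_fake_samples(generator, latent_dim, n_samples):
    """Generate images with the generator and label them with a target of 0."""
    x_input = generate_latent_points(latent_dim, n_samples)
    X = generator.predict(x_input)
    y = np.zeros((n_samples, 1))
    return X, y
```

The `generator` argument only needs a `predict()` method, so any of the straight-through or fade-in generator models can be passed in.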

```python
from numpy import ones

# train a generator and discriminator
# note: latent_dim is not an argument and is assumed to be defined at module level
def train_epochs(g_model, d_model, gan_model, dataset, n_epochs, n_batch, fadein=False):
    # calculate the number of batches per training epoch
    bat_per_epo = int(dataset.shape[0] / n_batch)
    # calculate the number of training iterations
    n_steps = bat_per_epo * n_epochs
    # calculate the size of half a batch of samples
    half_batch = int(n_batch / 2)
    # manually enumerate training iterations
    for i in range(n_steps):
        # update alpha for all WeightedSum layers when fading in new blocks
        if fadein:
            update_fadein([g_model, d_model, gan_model], i, n_steps)
        # prepare real and fake samples
        X_real, y_real = generate_real_samples(dataset, half_batch)
        X_fake, y_fake = generate_fake_samples(g_model, latent_dim, half_batch)
        # update discriminator model
        d_loss1 = d_model.train_on_batch(X_real, y_real)
        d_loss2 = d_model.train_on_batch(X_fake, y_fake)
        # update the generator via the discriminator's error
        z_input = generate_latent_points(latent_dim, n_batch)
        y_real2 = ones((n_batch, 1))
        g_loss = gan_model.train_on_batch(z_input, y_real2)
        # summarize loss on this batch
        print('>%d, d1=%.3f, d2=%.3f g=%.3f' % (i+1, d_loss1, d_loss2, g_loss))
```
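The bookkeeping at the top of *train_epochs()* is plain arithmetic and can be checked in isolation. The sketch below repeats those three calculations for a hypothetical dataset of 50,000 images trained for 5 epochs with a batch size of 16; the helper name `training_schedule` and all three numbers are assumptions chosen for illustration.

```python
def training_schedule(n_images, n_epochs, n_batch):
    """Return (batches per epoch, total training steps, half-batch size)."""
    bat_per_epo = int(n_images / n_batch)
    n_steps = bat_per_epo * n_epochs
    half_batch = int(n_batch / 2)
    return bat_per_epo, n_steps, half_batch

print(training_schedule(50000, 5, 16))  # (3125, 15625, 8)
```

The total number of steps is also the length of the fade-in when `fadein=True`, so more epochs or a smaller batch size produce a slower transition from the old layers to the new ones.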

The images must be scaled to the size of each model. If the images are in-memory, we can define a simple scale_dataset() function to scale the loaded images.

In this case, we are using the skimage.transform.resize function from the scikit-image library to resize the NumPy array of pixels to the required size and use nearest neighbor interpolation.

    # scale images to preferred size
    def scale_dataset(images, new_shape):
        images_list = list()
        for image in images:
            # resize with nearest neighbor interpolation (order=0)
            new_image = resize(image, new_shape, order=0)
            # store
            images_list.append(new_image)
        return asarray(images_list)
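To build intuition for what nearest neighbor interpolation does, here is a simplified pure-Python sketch that resizes a 2D list of pixel values. It is an illustration only; the hypothetical *nearest_resize()* function does not reproduce how scikit-image maps pixel coordinates, so the exact pixels chosen may differ.

```python
# simplified nearest-neighbour resize for a 2D list of pixel values
def nearest_resize(image, new_h, new_w):
    h, w = len(image), len(image[0])
    # each output pixel copies the input pixel at the proportionally scaled index
    return [[image[r * h // new_h][c * w // new_w] for c in range(new_w)]
            for r in range(new_h)]

# a 4x4 image with values 0..15, reduced to 2x2
image = [[r * 4 + c for c in range(4)] for r in range(4)]
small = nearest_resize(image, 2, 2)
print(small)
```

No new pixel values are invented: every output pixel is a copy of one input pixel, which is why this method is cheap and keeps the original value range.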

First, the baseline model must be fit for a given number of training epochs, e.g. the model that outputs 4×4 sized images.

This will require that the loaded images be scaled to the required size defined by the shape of the generator model’s output layer.

    # fit the baseline model
    g_normal, d_normal, gan_normal = g_models[0][0], d_models[0][0], gan_models[0][0]
    # scale dataset to appropriate size
    gen_shape = g_normal.output_shape
    scaled_data = scale_dataset(dataset, gen_shape[1:])
    print('Scaled Data', scaled_data.shape)
    # train normal or straight-through models
    train_epochs(g_normal, d_normal, gan_normal, scaled_data, e_norm, n_batch)

We can then process each level of growth, e.g. the first being 8×8.

This involves first retrieving the models, scaling the data to the appropriate size, then fitting the fade-in model followed by training the straight-through version of the model for fine tuning.

We can repeat this for each level of growth in a loop.

    # process each level of growth
    for i in range(1, len(g_models)):
        # retrieve models for this level of growth
        [g_normal, g_fadein] = g_models[i]
        [d_normal, d_fadein] = d_models[i]
        [gan_normal, gan_fadein] = gan_models[i]
        # scale dataset to appropriate size
        gen_shape = g_normal.output_shape
        scaled_data = scale_dataset(dataset, gen_shape[1:])
        print('Scaled Data', scaled_data.shape)
        # train fade-in models for next level of growth (fadein=True updates alpha)
        train_epochs(g_fadein, d_fadein, gan_fadein, scaled_data, e_fadein, n_batch, True)
        # train normal or straight-through models
        train_epochs(g_normal, d_normal, gan_normal, scaled_data, e_norm, n_batch)
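The order of training phases can be listed with a short sketch. It assumes, as in this tutorial, that the baseline model outputs 4×4 images and that each level of growth doubles the width and height; the *training_plan()* helper is hypothetical.

```python
# list the training phases for a model with n_blocks levels of growth
def training_plan(n_blocks, base_size=4):
    # the baseline model is only trained in normal (straight-through) mode
    plan = [('normal', base_size)]
    for i in range(1, n_blocks):
        size = base_size * 2 ** i
        # each new level is faded in first, then fine-tuned
        plan.append(('fadein', size))
        plan.append(('normal', size))
    return plan

for phase, size in training_plan(3):
    print('%s %dx%d' % (phase, size, size))
```

With three blocks, this gives one baseline phase at 4×4 followed by a fade-in and a normal phase at 8×8 and again at 16×16.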

We can tie this together and define a function called *train()* to train the progressive growing GAN.

    # train the generator and discriminator
    def train(g_models, d_models, gan_models, dataset, latent_dim, e_norm, e_fadein, n_batch):
        # fit the baseline model
        g_normal, d_normal, gan_normal = g_models[0][0], d_models[0][0], gan_models[0][0]
        # scale dataset to appropriate size
        gen_shape = g_normal.output_shape
        scaled_data = scale_dataset(dataset, gen_shape[1:])
        print('Scaled Data', scaled_data.shape)
        # train normal or straight-through models
        train_epochs(g_normal, d_normal, gan_normal, scaled_data, e_norm, n_batch)
        # process each level of growth
        for i in range(1, len(g_models)):
            # retrieve models for this level of growth
            [g_normal, g_fadein] = g_models[i]
            [d_normal, d_fadein] = d_models[i]
            [gan_normal, gan_fadein] = gan_models[i]
            # scale dataset to appropriate size
            gen_shape = g_normal.output_shape
            scaled_data = scale_dataset(dataset, gen_shape[1:])
            print('Scaled Data', scaled_data.shape)
            # train fade-in models for next level of growth
            train_epochs(g_fadein, d_fadein, gan_fadein, scaled_data, e_fadein, n_batch, True)
            # train normal or straight-through models
            train_epochs(g_normal, d_normal, gan_normal, scaled_data, e_norm, n_batch)

The number of epochs for the normal phase is defined by the *e_norm* argument and the number of epochs during the fade-in phase is defined by the *e_fadein* argument.

The number of epochs must be specified based on the size of the image dataset and the same number of epochs can be used for each phase, as was used in the paper.

We start with 4×4 resolution and train the networks until we have shown the discriminator 800k real images in total. We then alternate between two phases: fade in the first 3-layer block during the next 800k images, stabilize the networks for 800k images, fade in the next 3-layer block during 800k images, etc.

— Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017.
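To relate the paper's schedule to the *e_norm* and *e_fadein* arguments, the number of images shown per phase can be converted into epochs. The sketch below uses a hypothetical dataset of 50,000 images; substitute the size of your own dataset.

```python
from math import ceil

# number of passes over the dataset needed to show a given number of images
def epochs_for_images(n_images_to_show, dataset_size):
    return ceil(n_images_to_show / dataset_size)

# hypothetical dataset of 50,000 faces, with 800k images per phase as in the paper
print(epochs_for_images(800000, 50000))
```

For smaller experiments, far fewer images per phase may be practical; the point is only that the epoch count should scale with dataset size.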

We can then define our models as we did in the previous section, then call the training function.

    # number of growth phases, e.g. 3 = 16x16 images
    n_blocks = 3
    # size of the latent space
    latent_dim = 100
    # define models
    d_models = define_discriminator(n_blocks)
    # define models
    g_models = define_generator(latent_dim, n_blocks)
    # define composite models
    gan_models = define_composite(d_models, g_models)
    # load image data
    dataset = load_real_samples()
    # train model
    train(g_models, d_models, gan_models, dataset, latent_dim, 100, 100, 16)

## Further Reading

This section provides more resources on the topic if you are looking to go deeper.

### Official

- Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017.
- Progressive Growing of GANs for Improved Quality, Stability, and Variation, Official.
- progressive_growing_of_gans Project (official), GitHub.
- Progressive Growing of GANs for Improved Quality, Stability, and Variation. Open Review.
- Progressive Growing of GANs for Improved Quality, Stability, and Variation, YouTube.
- Progressive growing of GANs for improved quality, stability and variation, KeyNote, YouTube.

### API

- Keras Datasets API.
- Keras Sequential Model API
- Keras Convolutional Layers API
- How can I “freeze” Keras layers?
- Keras Contrib Project
- skimage.transform.resize API

### Articles

- Keras-progressive_growing_of_gans Project, GitHub.
- Hands-On-Generative-Adversarial-Networks-with-Keras Project, GitHub.

## Summary

In this tutorial, you discovered how to develop progressive growing generative adversarial network models from scratch with Keras.

Specifically, you learned:

- How to develop pre-defined discriminator and generator models at each level of output image growth.
- How to define composite models for training the generator models via the discriminator models.
- How to cycle the training of fade-in version and normal versions of models at each level of output image growth.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

The post How to Implement Progressive Growing GAN Models in Keras appeared first on Machine Learning Mastery.

## How to Develop a CycleGAN for Image-to-Image Translation with Keras

The Cycle Generative Adversarial Network, or CycleGAN, is an approach to training a deep convolutional neural network for image-to-image translation tasks.

Unlike other GAN models for image translation, the CycleGAN does not require a dataset of paired images. For example, if we are interested in translating photographs of oranges to apples, we do not require a training dataset of oranges that have been manually converted to apples. This allows the development of a translation model on problems where training datasets may not exist, such as translating paintings to photographs.

In this tutorial, you will discover how to develop a CycleGAN model to translate photos of horses to zebras, and back again.

After completing this tutorial, you will know:

- How to load and prepare the horses to zebras image translation dataset for modeling.
- How to train a pair of CycleGAN generator models for translating horses to zebras and zebras to horses.
- How to load saved CycleGAN models and use them to translate photographs.


Let’s get started.

## Tutorial Overview

This tutorial is divided into four parts; they are:

- What Is the CycleGAN?
- How to Prepare the Horses to Zebras Dataset
- How to Develop a CycleGAN to Translate Horses to Zebras
- How to Perform Image Translation with CycleGAN Generators

## What Is the CycleGAN?

The CycleGAN model was described by Jun-Yan Zhu, et al. in their 2017 paper titled “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks.”

The benefit of the CycleGAN model is that it can be trained without paired examples. That is, it does not require examples of photographs before and after the translation in order to train the model, e.g. photos of the same city landscape during the day and at night. Instead, the model is able to use a collection of photographs from each domain and extract and harness the underlying style of images in the collection in order to perform the translation.

The model architecture is comprised of two generator models: one generator (Generator-A) for generating images for the first domain (Domain-A) and the second generator (Generator-B) for generating images for the second domain (Domain-B).

- Generator-A -> Domain-A
- Generator-B -> Domain-B

The generator models perform image translation, meaning that the image generation process is conditional on an input image, specifically an image from the other domain. Generator-A takes an image from Domain-B as input and Generator-B takes an image from Domain-A as input.

- Domain-B -> Generator-A -> Domain-A
- Domain-A -> Generator-B -> Domain-B

Each generator has a corresponding discriminator model. The first discriminator model (Discriminator-A) takes real images from Domain-A and generated images from Generator-A and predicts whether they are real or fake. The second discriminator model (Discriminator-B) takes real images from Domain-B and generated images from Generator-B and predicts whether they are real or fake.

- Domain-A -> Discriminator-A -> [Real/Fake]
- Domain-B -> Generator-A -> Discriminator-A -> [Real/Fake]
- Domain-B -> Discriminator-B -> [Real/Fake]
- Domain-A -> Generator-B -> Discriminator-B -> [Real/Fake]

The discriminator and generator models are trained in an adversarial zero-sum process, like normal GAN models. The generators learn to better fool the discriminators and the discriminators learn to better detect fake images. Together, the models find an equilibrium during the training process.

Additionally, the generator models are regularized so that they do not just create new images in the target domain, but instead translate the input images from the source domain in a way that can be reversed. This is achieved by using generated images as input to the other generator model and comparing the output image to the original image. Passing an image through both generators is called a cycle. Together, each pair of generator models is trained to better reproduce the original source image, referred to as cycle consistency.

- Domain-B -> Generator-A -> Domain-A -> Generator-B -> Domain-B
- Domain-A -> Generator-B -> Domain-B -> Generator-A -> Domain-A
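The idea of a cycle can be illustrated with a toy example in which the "generators" are simple functions on lists of numbers rather than neural networks on images. Everything in this sketch is hypothetical; it only shows how a reconstruction is compared to the original, here using the mean absolute error (L1 distance).

```python
# toy "generators" acting on lists of numbers instead of images (illustration only)
def generator_b(xs):
    # pretend translation from Domain-A to Domain-B
    return [x * 2.0 + 0.1 for x in xs]

def generator_a(xs):
    # pretend translation from Domain-B to Domain-A (an imperfect inverse)
    return [x / 2.0 for x in xs]

# mean absolute error (L1 distance) between two equal-length lists
def l1_distance(xs, ys):
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

# cycle: Domain-A -> Generator-B -> Domain-B -> Generator-A -> Domain-A
real_a = [0.0, 0.5, 1.0]
reconstructed_a = generator_a(generator_b(real_a))
cycle_loss = l1_distance(real_a, reconstructed_a)
print(cycle_loss)
```

The loss is about 0.05 because the toy inverse is slightly off; a perfectly cycle-consistent pair of generators would give a loss of zero.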

There is one further element to the architecture, referred to as the identity mapping. This is where a generator is provided with images as input from the target domain and is expected to generate the same image without change. This addition to the architecture is optional, although it results in a better matching of the color profile of the input image.

- Domain-A -> Generator-A -> Domain-A
- Domain-B -> Generator-B -> Domain-B

Now that we are familiar with the model architecture, we can take a closer look at each model in turn and how they can be implemented.

The paper provides a good description of the models and training process, although the official Torch implementation was used as the definitive description for each model and the training process, and it provides the basis for the model implementations described below.

## How to Prepare the Horses to Zebras Dataset

One of the impressive examples of the CycleGAN in the paper was to transform photographs of horses to zebras, and the reverse, zebras to horses.

The authors of the paper referred to this as the problem of “*object transfiguration*” and it was also demonstrated on photographs of apples and oranges.

In this tutorial, we will develop a CycleGAN from scratch for image-to-image translation (or object transfiguration) from horses to zebras and the reverse.

We will refer to this dataset as “*horse2zebra*”. The zip file for this dataset is about 111 megabytes and can be downloaded from the CycleGAN webpage.

Download the dataset into your current working directory.

You will see the following directory structure:

    horse2zebra
    ├── testA
    ├── testB
    ├── trainA
    └── trainB

The “*A*” category refers to horse and “*B*” category refers to zebra, and the dataset is comprised of train and test elements. We will load all photographs and use them as a training dataset.

The photographs are square with the shape 256×256 and have filenames like “*n02381460_2.jpg*”.

The example below will load all photographs from the train and test folders and create an array of images for category A and another for category B.

Both arrays are then saved to a new file in compressed NumPy array format.

    # example of preparing the horses and zebra dataset
    from os import listdir
    from numpy import asarray
    from numpy import vstack
    from keras.preprocessing.image import img_to_array
    from keras.preprocessing.image import load_img
    from numpy import savez_compressed

    # load all images in a directory into memory
    def load_images(path, size=(256,256)):
        data_list = list()
        # enumerate filenames in directory, assume all are images
        for filename in listdir(path):
            # load and resize the image
            pixels = load_img(path + filename, target_size=size)
            # convert to numpy array
            pixels = img_to_array(pixels)
            # store
            data_list.append(pixels)
        return asarray(data_list)

    # dataset path
    path = 'horse2zebra/'
    # load dataset A
    dataA1 = load_images(path + 'trainA/')
    dataA2 = load_images(path + 'testA/')
    dataA = vstack((dataA1, dataA2))
    print('Loaded dataA: ', dataA.shape)
    # load dataset B
    dataB1 = load_images(path + 'trainB/')
    dataB2 = load_images(path + 'testB/')
    dataB = vstack((dataB1, dataB2))
    print('Loaded dataB: ', dataB.shape)
    # save as compressed numpy array
    filename = 'horse2zebra_256.npz'
    savez_compressed(filename, dataA, dataB)
    print('Saved dataset: ', filename)

Running the example first loads all images into memory, showing that there are 1,187 photos in category A (horses) and 1,474 in category B (zebras).

The arrays are then saved in compressed NumPy format with the filename “*horse2zebra_256.npz*”. Note: this data file is about 570 megabytes, larger than the raw images as we are storing pixel values as 32-bit floating point values.

    Loaded dataA: (1187, 256, 256, 3)
    Loaded dataB: (1474, 256, 256, 3)
    Saved dataset: horse2zebra_256.npz
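The size of the arrays in memory follows directly from the reported shapes. The short calculation below assumes 4 bytes per value (32-bit floats) and shows why the saved file is large even after compression.

```python
# images in each category, as reported when running the example
n_images = 1187 + 1474
# values per image: height x width x channels
values_per_image = 256 * 256 * 3
# 32-bit floats use 4 bytes per value
n_bytes = n_images * values_per_image * 4
print('%.2f GB uncompressed' % (n_bytes / 1e9))
```

If memory is limited, one option is to store the pixels as 8-bit unsigned integers and convert to floats when loading, which would cut the in-memory size to a quarter.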

We can then load the dataset and plot some of the photos to confirm that we are handling the image data correctly.

The complete example is listed below.

    # load and plot the prepared dataset
    from numpy import load
    from matplotlib import pyplot
    # load the dataset
    data = load('horse2zebra_256.npz')
    dataA, dataB = data['arr_0'], data['arr_1']
    print('Loaded: ', dataA.shape, dataB.shape)
    # plot source images
    n_samples = 3
    for i in range(n_samples):
        pyplot.subplot(2, n_samples, 1 + i)
        pyplot.axis('off')
        pyplot.imshow(dataA[i].astype('uint8'))
    # plot target image
    for i in range(n_samples):
        pyplot.subplot(2, n_samples, 1 + n_samples + i)
        pyplot.axis('off')
        pyplot.imshow(dataB[i].astype('uint8'))
    pyplot.show()

Running the example first loads the dataset, confirming the number of examples and shape of the color images match our expectations.

Loaded: (1187, 256, 256, 3) (1474, 256, 256, 3)

A plot is created showing a row of three images from the horse photo dataset (dataA) and a row of three images from the zebra dataset (dataB).

Now that we have prepared the dataset for modeling, we can develop the CycleGAN generator models that can translate photos from one category to the other, and the reverse.

## How to Develop a CycleGAN to Translate Horses to Zebras

In this section, we will develop the CycleGAN model for translating photos of horses to zebras and photos of zebras to horses.

The same model architecture and configuration described in the paper was used across a range of image-to-image translation tasks. This architecture is described in the body of the paper, with additional detail in the appendix, and a fully working implementation is provided as open source for the Torch deep learning framework.

The implementation in this section will use the Keras deep learning framework and is based directly on the model described in the paper and implemented in the authors’ codebase, designed to take and generate color images with the size 256×256 pixels.

The architecture is comprised of four models: two discriminator models and two generator models.

The discriminator is a deep convolutional neural network that performs image classification. It takes an image as input and predicts the likelihood that the image is real or fake. Two discriminator models are used, one for Domain-A (horses) and one for Domain-B (zebras).

The discriminator design is based on the effective receptive field of the model, which defines the relationship between one output of the model to the number of pixels in the input image. This is called a PatchGAN model and is carefully designed so that each output prediction of the model maps to a 70×70 square or patch of the input image. The benefit of this approach is that the same model can be applied to input images of different sizes, e.g. larger or smaller than 256×256 pixels.

The output of the model depends on the size of the input image but may be one value or a square activation map of values. Each value is a probability for the likelihood that a patch in the input image is real. These values can be averaged to give an overall likelihood or classification score if needed.
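How the output size depends on the input size can be worked out with a little arithmetic. The sketch below assumes a discriminator with four stride-2 convolutions using 'same' padding, followed only by stride-1 layers, which is how the discriminator is defined later in this tutorial; the *patch_output_size()* helper is hypothetical.

```python
from math import ceil

# spatial size of the output map after n stride-2 convolutions with 'same' padding
def patch_output_size(input_size, n_downsample=4):
    size = input_size
    for _ in range(n_downsample):
        # 'same' padding with stride 2 halves the size, rounding up
        size = ceil(size / 2)
    return size

for input_size in [128, 256, 512]:
    print(input_size, '->', patch_output_size(input_size))
```

For 256×256 inputs this gives a 16×16 map of predictions, which is the patch shape used for the target values during training.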

A pattern of Convolutional-BatchNorm-LeakyReLU layers is used in the model, which is common to deep convolutional discriminator models. Unlike other models, the CycleGAN discriminator uses *InstanceNormalization* instead of *BatchNormalization*. It is a very simple type of normalization and involves standardizing (e.g. scaling to a standard Gaussian) the values on each output feature map, rather than across features in a batch.
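The calculation itself can be sketched in a few lines of NumPy. This is a simplified illustration for channels-last data, not the keras-contrib implementation: it omits the learned scale and offset parameters that a normalization layer would normally include.

```python
import numpy as np

# instance normalization for a batch shaped (samples, height, width, channels)
def instance_norm(x, eps=1e-5):
    # statistics are computed per sample and per channel, over height and width only
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

batch = np.random.rand(2, 8, 8, 3) * 10.0
normed = instance_norm(batch)
# each feature map of each sample now has a mean close to 0
print(normed.mean(axis=(1, 2)).round(3))
```

In contrast, batch normalization would also average over the samples axis, so the statistics of one image would influence the normalization of another.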

An implementation of instance normalization is provided in the keras-contrib project that provides early access to community supplied Keras features.

The keras-contrib library can be installed via pip as follows:

sudo pip install git+https://www.github.com/keras-team/keras-contrib.git

Or, if you are using an Anaconda virtual environment, such as on EC2:

    git clone https://www.github.com/keras-team/keras-contrib.git
    cd keras-contrib
    sudo ~/anaconda3/envs/tensorflow_p36/bin/python setup.py install

The new *InstanceNormalization* layer can then be used as follows:

    ...
    from keras_contrib.layers.normalization.instancenormalization import InstanceNormalization
    # define layer
    layer = InstanceNormalization(axis=-1)
    ...

The “*axis*” argument is set to -1 to ensure that features are normalized per feature map.

The *define_discriminator()* function below implements the 70×70 PatchGAN discriminator model as per the design of the model in the paper. The model takes a 256×256 sized image as input and outputs a patch of predictions. The model is optimized using least squares loss (L2) implemented as mean squared error, and a weighting is used so that updates to the model have half (0.5) the usual effect. The authors of the CycleGAN paper recommend this weighting of model updates to slow down changes to the discriminator, relative to the generator model during training.

    # define the discriminator model
    def define_discriminator(image_shape):
        # weight initialization
        init = RandomNormal(stddev=0.02)
        # source image input
        in_image = Input(shape=image_shape)
        # C64
        d = Conv2D(64, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(in_image)
        d = LeakyReLU(alpha=0.2)(d)
        # C128
        d = Conv2D(128, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d)
        d = InstanceNormalization(axis=-1)(d)
        d = LeakyReLU(alpha=0.2)(d)
        # C256
        d = Conv2D(256, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d)
        d = InstanceNormalization(axis=-1)(d)
        d = LeakyReLU(alpha=0.2)(d)
        # C512
        d = Conv2D(512, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d)
        d = InstanceNormalization(axis=-1)(d)
        d = LeakyReLU(alpha=0.2)(d)
        # second last output layer
        d = Conv2D(512, (4,4), padding='same', kernel_initializer=init)(d)
        d = InstanceNormalization(axis=-1)(d)
        d = LeakyReLU(alpha=0.2)(d)
        # patch output
        patch_out = Conv2D(1, (4,4), padding='same', kernel_initializer=init)(d)
        # define model
        model = Model(in_image, patch_out)
        # compile model
        model.compile(loss='mse', optimizer=Adam(lr=0.0002, beta_1=0.5), loss_weights=[0.5])
        return model
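The loss that the compiled discriminator reports can be reproduced by hand for a small example. The patch predictions below are made-up values used only to show the arithmetic of mean squared error scaled by the 0.5 loss weight.

```python
# mean squared error between predictions and targets (least squares loss)
def mse(predictions, targets):
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)

# made-up discriminator outputs for four patches of a real image (target 1.0)
patch_predictions = [0.9, 0.6, 0.8, 0.7]
patch_targets = [1.0, 1.0, 1.0, 1.0]

loss = mse(patch_predictions, patch_targets)
# loss_weights=[0.5] scales the loss value by one half
weighted_loss = 0.5 * loss
print(loss, weighted_loss)
```

Here the unweighted loss is about 0.075 and the weighted loss is about 0.0375.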

The generator model is more complex than the discriminator model.

The generator is an encoder-decoder model architecture. The model takes a source image (e.g. horse photo) and generates a target image (e.g. zebra photo). It does this by first downsampling or encoding the input image down to a bottleneck layer, then interpreting the encoding with a number of ResNet layers that use skip connections, followed by a series of layers that upsample or decode the representation to the size of the output image.

First, we need a function to define the ResNet blocks. These are blocks comprised of two 3×3 CNN layers where the input to the block is concatenated to the output of the block, channel-wise.

This is implemented in the *resnet_block()* function that creates two *Convolution-InstanceNorm* blocks with 3×3 filters and 1×1 stride and without a ReLU activation after the second block, matching the official Torch implementation in the build_conv_block() function. For simplicity, same padding is used instead of the reflection padding recommended in the paper.

    # generate a resnet block
    def resnet_block(n_filters, input_layer):
        # weight initialization
        init = RandomNormal(stddev=0.02)
        # first layer convolutional layer
        g = Conv2D(n_filters, (3,3), padding='same', kernel_initializer=init)(input_layer)
        g = InstanceNormalization(axis=-1)(g)
        g = Activation('relu')(g)
        # second convolutional layer
        g = Conv2D(n_filters, (3,3), padding='same', kernel_initializer=init)(g)
        g = InstanceNormalization(axis=-1)(g)
        # concatenate merge channel-wise with input layer
        g = Concatenate()([g, input_layer])
        return g
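One consequence of concatenating the input with the output, rather than adding them as in the original residual network design, is that the number of channels grows with every block. The sketch below counts the channels after stacking blocks; the helper name is hypothetical.

```python
# channels after stacking n_blocks resnet blocks that concatenate their input
def channels_after_blocks(in_channels, n_filters, n_blocks):
    channels = in_channels
    for _ in range(n_blocks):
        # block output = n_filters new feature maps + all input feature maps
        channels = n_filters + channels
    return channels

print(channels_after_blocks(256, 256, 1))
print(channels_after_blocks(256, 256, 9))
```

With 256 input channels and 256 filters, one block outputs 512 channels and nine blocks output 2,560 channels, which increases the number of weights in later layers.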

Next, we can define a function that will create the 9-resnet block version for 256×256 input images. This can easily be changed to the 6-resnet block version by setting *image_shape* to (128x128x3) and *n_resnet* function argument to 6.

Importantly, the model outputs pixel values with the same shape as the input, and pixel values are in the range [-1, 1], typical for GAN generator models.

    # define the standalone generator model
    def define_generator(image_shape, n_resnet=9):
        # weight initialization
        init = RandomNormal(stddev=0.02)
        # image input
        in_image = Input(shape=image_shape)
        # c7s1-64
        g = Conv2D(64, (7,7), padding='same', kernel_initializer=init)(in_image)
        g = InstanceNormalization(axis=-1)(g)
        g = Activation('relu')(g)
        # d128
        g = Conv2D(128, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g)
        g = InstanceNormalization(axis=-1)(g)
        g = Activation('relu')(g)
        # d256
        g = Conv2D(256, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g)
        g = InstanceNormalization(axis=-1)(g)
        g = Activation('relu')(g)
        # R256
        for _ in range(n_resnet):
            g = resnet_block(256, g)
        # u128
        g = Conv2DTranspose(128, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g)
        g = InstanceNormalization(axis=-1)(g)
        g = Activation('relu')(g)
        # u64
        g = Conv2DTranspose(64, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g)
        g = InstanceNormalization(axis=-1)(g)
        g = Activation('relu')(g)
        # c7s1-3
        g = Conv2D(3, (7,7), padding='same', kernel_initializer=init)(g)
        g = InstanceNormalization(axis=-1)(g)
        out_image = Activation('tanh')(g)
        # define model
        model = Model(in_image, out_image)
        return model

The discriminator models are trained directly on real and generated images, whereas the generator models are not.

Instead, the generator models are trained via their related discriminator models. Specifically, they are updated to minimize the loss predicted by the discriminator for generated images marked as “*real*”, called adversarial loss. As such, they are encouraged to generate images that better fit into the target domain.

The generator models are also updated based on how effective they are at the regeneration of a source image when used with the other generator model, called cycle loss. Finally, a generator model is expected to output an image without translation when provided an example from the target domain, called identity loss.

Altogether, each generator model is optimized via the combination of four outputs with four loss functions:

- Adversarial loss (L2 or mean squared error).
- Identity loss (L1 or mean absolute error).
- Forward cycle loss (L1 or mean absolute error).
- Backward cycle loss (L1 or mean absolute error).

This can be achieved by defining a composite model used to train each generator model that is responsible for only updating the weights of that generator model, although it is required to share the weights with the related discriminator model and the other generator model.

This is implemented in the *define_composite_model()* function below that takes a defined generator model (*g_model_1*) as well as the defined discriminator model for the generator model’s output (*d_model*) and the other generator model (*g_model_2*). The weights of the other models are marked as not trainable as we are only interested in updating the first generator model, i.e. the focus of this composite model.

The discriminator is connected to the output of the generator in order to classify generated images as real or fake. A second input for the composite model is defined as an image from the target domain (instead of the source domain), which the generator is expected to output without translation for the identity mapping. Next, forward cycle loss involves connecting the output of the generator to the other generator, which will reconstruct the source image. Finally, the backward cycle loss involves the image from the target domain used for the identity mapping that is also passed through the other generator whose output is connected to our main generator as input and outputs a reconstructed version of that image from the target domain.

To summarize, a composite model has two inputs for the real photos from Domain-A and Domain-B, and four outputs for the discriminator output, identity generated image, forward cycle generated image, and backward cycle generated image.

Only the weights of the first or main generator model are updated for the composite model and this is done via the weighted sum of all loss functions. The cycle loss is given more weight (10-times) than the adversarial loss as described in the paper, and the identity loss is always used with a weighting half that of the cycle loss (5-times), matching the official implementation source code.

    # define a composite model for updating generators by adversarial and cycle loss
    def define_composite_model(g_model_1, d_model, g_model_2, image_shape):
        # ensure the model we're updating is trainable
        g_model_1.trainable = True
        # mark discriminator as not trainable
        d_model.trainable = False
        # mark other generator model as not trainable
        g_model_2.trainable = False
        # discriminator element
        input_gen = Input(shape=image_shape)
        gen1_out = g_model_1(input_gen)
        output_d = d_model(gen1_out)
        # identity element
        input_id = Input(shape=image_shape)
        output_id = g_model_1(input_id)
        # forward cycle
        output_f = g_model_2(gen1_out)
        # backward cycle
        gen2_out = g_model_2(input_id)
        output_b = g_model_1(gen2_out)
        # define model graph
        model = Model([input_gen, input_id], [output_d, output_id, output_f, output_b])
        # define optimization algorithm configuration
        opt = Adam(lr=0.0002, beta_1=0.5)
        # compile model with weighting of least squares loss and L1 loss
        model.compile(loss=['mse', 'mae', 'mae', 'mae'], loss_weights=[1, 5, 10, 10], optimizer=opt)
        return model
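The single value minimized for the composite model is the weighted sum of the four losses. The sketch below shows the arithmetic with the same weights passed to *compile()*; the individual loss values are hypothetical.

```python
# loss weights in the order used by define_composite_model():
# adversarial, identity, forward cycle, backward cycle
loss_weights = [1, 5, 10, 10]

# weighted sum of the per-output losses
def total_generator_loss(losses, weights=loss_weights):
    return sum(w * l for w, l in zip(weights, losses))

# hypothetical per-output loss values for one batch
example_losses = [0.5, 0.2, 0.1, 0.1]
print(total_generator_loss(example_losses))
```

In this example the two cycle losses contribute 2.0 of the total of about 3.5, which shows how strongly the weighting favors reconstruction over fooling the discriminator.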

We need to create a composite model for each generator model, e.g. the Generator-A (BtoA) for zebra to horse translation, and the Generator-B (AtoB) for horse to zebra translation.

All of this forward and backward across two domains gets confusing. Below is a complete listing of all of the inputs and outputs for each of the composite models. Identity and cycle loss are calculated as the L1 distance between the input and output image for each sequence of translations. Adversarial loss is calculated as the L2 distance between the model output and the target values of 1.0 for real and 0.0 for fake.

**Generator-A Composite Model (BtoA or Zebra to Horse)**

The inputs, transformations, and outputs of the model are as follows:

- **Adversarial Loss**: Domain-B -> Generator-A -> Domain-A -> Discriminator-A -> [real/fake]
- **Identity Loss**: Domain-A -> Generator-A -> Domain-A
- **Forward Cycle Loss**: Domain-B -> Generator-A -> Domain-A -> Generator-B -> Domain-B
- **Backward Cycle Loss**: Domain-A -> Generator-B -> Domain-B -> Generator-A -> Domain-A

We can summarize the inputs and outputs as:

- **Inputs**: Domain-B, Domain-A
- **Outputs**: Real, Domain-A, Domain-B, Domain-A

**Generator-B Composite Model (AtoB or Horse to Zebra)**

The inputs, transformations, and outputs of the model are as follows:

- **Adversarial Loss**: Domain-A -> Generator-B -> Domain-B -> Discriminator-B -> [real/fake]
- **Identity Loss**: Domain-B -> Generator-B -> Domain-B
- **Forward Cycle Loss**: Domain-A -> Generator-B -> Domain-B -> Generator-A -> Domain-A
- **Backward Cycle Loss**: Domain-B -> Generator-A -> Domain-A -> Generator-B -> Domain-B

We can summarize the inputs and outputs as:

- **Inputs**: Domain-A, Domain-B
- **Outputs**: Real, Domain-B, Domain-A, Domain-B

Defining the models is the hard part of the CycleGAN; the rest is standard GAN training and relatively straightforward.

Next, we can load our prepared image dataset in compressed NumPy array format. This will return a list of two NumPy arrays: the first for Domain-A (horse) images and the second for Domain-B (zebra) images. The images are unpaired, so there is no correspondence between the two arrays.

    # load and prepare training images
    def load_real_samples(filename):
        # load the dataset
        data = load(filename)
        # unpack arrays
        X1, X2 = data['arr_0'], data['arr_1']
        # scale from [0,255] to [-1,1]
        X1 = (X1 - 127.5) / 127.5
        X2 = (X2 - 127.5) / 127.5
        return [X1, X2]
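The scaling used when loading the images, and its inverse, can be checked on individual pixel values. The two helper functions below are hypothetical names for the same arithmetic.

```python
# scale a pixel value from [0,255] to [-1,1], as in load_real_samples()
def to_model_range(pixel):
    return (pixel - 127.5) / 127.5

# inverse mapping from [-1,1] back to [0,255]
def to_pixel_range(value):
    return value * 127.5 + 127.5

for pixel in [0, 127.5, 255]:
    scaled = to_model_range(pixel)
    print(pixel, '->', scaled, '->', to_pixel_range(scaled))
```

This range matches the tanh activation on the output layer of the generator, so real and generated images share the same scale.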

In each training iteration, we require a sample of real images from each domain as input to the discriminator and composite generator models. This can be achieved by selecting a random batch of samples.

The *generate_real_samples()* function below implements this, taking a NumPy array for a domain as input and returning the requested number of randomly selected images, as well as the target for the PatchGAN discriminator model indicating the images are real (*target=1.0*). As such, the shape of the PatchGAN output is also provided, which in the case of 256×256 images will be 16, or a 16x16x1 activation map, defined by the patch_shape function argument.

    # select a batch of random samples, returns images and target
    def generate_real_samples(dataset, n_samples, patch_shape):
        # choose random instances
        ix = randint(0, dataset.shape[0], n_samples)
        # retrieve selected images
        X = dataset[ix]
        # generate 'real' class labels (1)
        y = ones((n_samples, patch_shape, patch_shape, 1))
        return X, y

Similarly, a sample of generated images is required to update each discriminator model in each training iteration.

The *generate_fake_samples()* function below generates this sample given a generator model and the sample of real images from the source domain. Again, target values for each generated image are provided with the correct shape of the PatchGAN, indicating that they are fake or generated (*target=0.0*).

```python
# generate a batch of images, returns images and targets
def generate_fake_samples(g_model, dataset, patch_shape):
    # generate fake instance
    X = g_model.predict(dataset)
    # create 'fake' class labels (0)
    y = zeros((len(X), patch_shape, patch_shape, 1))
    return X, y
```

Typically, GAN models do not converge; instead, an equilibrium is found between the generator and discriminator models. As such, we cannot easily judge whether training should stop. Therefore, we can save the model and use it to generate sample image-to-image translations periodically during training, such as every one or five training epochs.

We can then review the generated images at the end of training and use the image quality to choose a final model.

The *save_models()* function below will save each generator model to the current directory in H5 format, including the training iteration number in the filename. This will require that the h5py library is installed.

```python
# save the generator models to file
def save_models(step, g_model_AtoB, g_model_BtoA):
    # save the first generator model
    filename1 = 'g_model_AtoB_%06d.h5' % (step+1)
    g_model_AtoB.save(filename1)
    # save the second generator model
    filename2 = 'g_model_BtoA_%06d.h5' % (step+1)
    g_model_BtoA.save(filename2)
    print('>Saved: %s and %s' % (filename1, filename2))
```

The *summarize_performance()* function below uses a given generator model to generate translated versions of a few randomly selected source photographs and saves the plot to file.

The source images are plotted on the first row and the generated images are plotted on the second row. Again, the plot filename includes the training iteration number.

```python
# generate samples and save as a plot and save the model
def summarize_performance(step, g_model, trainX, name, n_samples=5):
    # select a sample of input images
    X_in, _ = generate_real_samples(trainX, n_samples, 0)
    # generate translated images
    X_out, _ = generate_fake_samples(g_model, X_in, 0)
    # scale all pixels from [-1,1] to [0,1]
    X_in = (X_in + 1) / 2.0
    X_out = (X_out + 1) / 2.0
    # plot real images
    for i in range(n_samples):
        pyplot.subplot(2, n_samples, 1 + i)
        pyplot.axis('off')
        pyplot.imshow(X_in[i])
    # plot translated image
    for i in range(n_samples):
        pyplot.subplot(2, n_samples, 1 + n_samples + i)
        pyplot.axis('off')
        pyplot.imshow(X_out[i])
    # save plot to file
    filename1 = '%s_generated_plot_%06d.png' % (name, (step+1))
    pyplot.savefig(filename1)
    pyplot.close()
```
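The two-row layout relies on matplotlib numbering subplots from 1, left to right and then top to bottom. The short sketch below only reproduces the index arithmetic from the function above, so it runs without matplotlib.

```python
# subplot indices used by summarize_performance() for a 2 x n_samples grid
n_samples = 5
top_row = [1 + i for i in range(n_samples)]                 # source images
bottom_row = [1 + n_samples + i for i in range(n_samples)]  # translated images
print(top_row)     # [1, 2, 3, 4, 5]
print(bottom_row)  # [6, 7, 8, 9, 10]
```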

We are nearly ready to define the training of the models.

The discriminator models are updated directly on real and generated images, although in an effort to further manage how quickly the discriminator models learn, a pool of fake images is maintained.

The paper defines an image pool of 50 generated images for each discriminator model that is first populated and probabilistically either adds new images to the pool by replacing an existing image or uses a generated image directly. We can implement this as a Python list of images for each discriminator and use the *update_image_pool()* function below to maintain each pool list.

```python
# update image pool for fake images
def update_image_pool(pool, images, max_size=50):
    selected = list()
    for image in images:
        if len(pool) < max_size:
            # stock the pool
            pool.append(image)
            selected.append(image)
        elif random() < 0.5:
            # use image, but don't add it to the pool
            selected.append(image)
        else:
            # replace an existing image and use replaced image
            ix = randint(0, len(pool))
            selected.append(pool[ix])
            pool[ix] = image
    return asarray(selected)
```
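To see how the pool behaves over many iterations, the sketch below repeats the same logic in plain Python, with integers standing in for image arrays. The function name `update_pool()` and the use of the standard library `randrange()` in place of NumPy's `randint()` are substitutions made for illustration only.

```python
from random import random, randrange, seed

# plain-Python sketch of the pool logic (no NumPy), for illustration only
def update_pool(pool, images, max_size=50):
    selected = []
    for image in images:
        if len(pool) < max_size:
            # stock the pool and use the new image
            pool.append(image)
            selected.append(image)
        elif random() < 0.5:
            # use the new image directly, leaving the pool unchanged
            selected.append(image)
        else:
            # use an older image and store the new one in its place
            ix = randrange(len(pool))
            selected.append(pool[ix])
            pool[ix] = image
    return selected

seed(1)
pool = []
# feed 200 "images" (integers stand in for image arrays), one per iteration
history = [update_pool(pool, [i], max_size=50)[0] for i in range(200)]
print(len(pool), history[:3])  # 50 [0, 1, 2]
```

The first 50 images are returned unchanged while the pool fills. After that, roughly half of the discriminator updates see an image generated in an earlier iteration, which is how the pool slows down how quickly the discriminator adapts to the current generator.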

We can now define the training of each of the generator models.

The *train()* function below takes all six models (two discriminator, two generator, and two composite models) as arguments along with the dataset and trains the models.

The batch size is fixed at one image to match the description in the paper and the models are fit for 100 epochs. Given that the horses dataset has 1,187 images, one epoch is defined as 1,187 batches and the same number of training iterations. Images are generated using both generators each epoch and models are saved every five epochs or (1187 * 5) 5,935 training iterations.
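The figures in this paragraph can be checked with a few lines of arithmetic. The variable names mirror those used in the *train()* function, and the dataset size is the 1,187 images quoted above.

```python
# training-schedule arithmetic using the figures quoted above
n_images, n_batch, n_epochs = 1187, 1, 100
bat_per_epo = n_images // n_batch    # batches (iterations) per epoch
n_steps = bat_per_epo * n_epochs     # total training iterations
save_every = bat_per_epo * 5         # iterations between saved models
print(bat_per_epo, n_steps, save_every)  # 1187 118700 5935
print(n_steps // save_every)             # 20 saved checkpoints per generator
```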

The order of model updates is implemented to match the official Torch implementation. First, a batch of real images from each domain is selected, then a batch of fake images for each domain is generated. The fake images are then used to update each discriminator’s fake image pool.

Next, the Generator-A model (zebras to horses) is updated via the composite model, followed by the Discriminator-A model (horses). Then the Generator-B (horses to zebra) composite model and Discriminator-B (zebras) models are updated.

Loss for each of the updated models is then reported at the end of the training iteration. Importantly, only the weighted sum of the losses used to update each generator is reported, not the individual adversarial, identity, and cycle losses.

```python
# train cyclegan models
def train(d_model_A, d_model_B, g_model_AtoB, g_model_BtoA, c_model_AtoB, c_model_BtoA, dataset):
    # define properties of the training run
    n_epochs, n_batch = 100, 1
    # determine the output square shape of the discriminator
    n_patch = d_model_A.output_shape[1]
    # unpack dataset
    trainA, trainB = dataset
    # prepare image pool for fakes
    poolA, poolB = list(), list()
    # calculate the number of batches per training epoch
    bat_per_epo = int(len(trainA) / n_batch)
    # calculate the number of training iterations
    n_steps = bat_per_epo * n_epochs
    # manually enumerate epochs
    for i in range(n_steps):
        # select a batch of real samples
        X_realA, y_realA = generate_real_samples(trainA, n_batch, n_patch)
        X_realB, y_realB = generate_real_samples(trainB, n_batch, n_patch)
        # generate a batch of fake samples
        X_fakeA, y_fakeA = generate_fake_samples(g_model_BtoA, X_realB, n_patch)
        X_fakeB, y_fakeB = generate_fake_samples(g_model_AtoB, X_realA, n_patch)
        # update fakes from pool
        X_fakeA = update_image_pool(poolA, X_fakeA)
        X_fakeB = update_image_pool(poolB, X_fakeB)
        # update generator B->A via adversarial and cycle loss
        g_loss2, _, _, _, _ = c_model_BtoA.train_on_batch([X_realB, X_realA], [y_realA, X_realA, X_realB, X_realA])
        # update discriminator for A -> [real/fake]
        dA_loss1 = d_model_A.train_on_batch(X_realA, y_realA)
        dA_loss2 = d_model_A.train_on_batch(X_fakeA, y_fakeA)
        # update generator A->B via adversarial and cycle loss
        g_loss1, _, _, _, _ = c_model_AtoB.train_on_batch([X_realA, X_realB], [y_realB, X_realB, X_realA, X_realB])
        # update discriminator for B -> [real/fake]
        dB_loss1 = d_model_B.train_on_batch(X_realB, y_realB)
        dB_loss2 = d_model_B.train_on_batch(X_fakeB, y_fakeB)
        # summarize performance
        print('>%d, dA[%.3f,%.3f] dB[%.3f,%.3f] g[%.3f,%.3f]' % (i+1, dA_loss1,dA_loss2, dB_loss1,dB_loss2, g_loss1,g_loss2))
        # evaluate the model performance every so often
        if (i+1) % (bat_per_epo * 1) == 0:
            # plot A->B translation
            summarize_performance(i, g_model_AtoB, trainA, 'AtoB')
            # plot B->A translation
            summarize_performance(i, g_model_BtoA, trainB, 'BtoA')
        if (i+1) % (bat_per_epo * 5) == 0:
            # save the models
            save_models(i, g_model_AtoB, g_model_BtoA)
```

Tying all of this together, the complete example of training a CycleGAN model to translate photos of horses to zebras and zebras to horses is listed below.

```python
# example of training a cyclegan on the horse2zebra dataset
from random import random
from numpy import load
from numpy import zeros
from numpy import ones
from numpy import asarray
from numpy.random import randint
from keras.optimizers import Adam
from keras.initializers import RandomNormal
from keras.models import Model
from keras.models import Input
from keras.layers import Conv2D
from keras.layers import Conv2DTranspose
from keras.layers import LeakyReLU
from keras.layers import Activation
from keras.layers import Concatenate
from keras_contrib.layers.normalization.instancenormalization import InstanceNormalization
from matplotlib import pyplot

# define the discriminator model
def define_discriminator(image_shape):
    # weight initialization
    init = RandomNormal(stddev=0.02)
    # source image input
    in_image = Input(shape=image_shape)
    # C64
    d = Conv2D(64, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(in_image)
    d = LeakyReLU(alpha=0.2)(d)
    # C128
    d = Conv2D(128, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d)
    d = InstanceNormalization(axis=-1)(d)
    d = LeakyReLU(alpha=0.2)(d)
    # C256
    d = Conv2D(256, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d)
    d = InstanceNormalization(axis=-1)(d)
    d = LeakyReLU(alpha=0.2)(d)
    # C512
    d = Conv2D(512, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d)
    d = InstanceNormalization(axis=-1)(d)
    d = LeakyReLU(alpha=0.2)(d)
    # second last output layer
    d = Conv2D(512, (4,4), padding='same', kernel_initializer=init)(d)
    d = InstanceNormalization(axis=-1)(d)
    d = LeakyReLU(alpha=0.2)(d)
    # patch output
    patch_out = Conv2D(1, (4,4), padding='same', kernel_initializer=init)(d)
    # define model
    model = Model(in_image, patch_out)
    # compile model
    model.compile(loss='mse', optimizer=Adam(lr=0.0002, beta_1=0.5), loss_weights=[0.5])
    return model

# generator a resnet block
def resnet_block(n_filters, input_layer):
    # weight initialization
    init = RandomNormal(stddev=0.02)
    # first layer convolutional layer
    g = Conv2D(n_filters, (3,3), padding='same', kernel_initializer=init)(input_layer)
    g = InstanceNormalization(axis=-1)(g)
    g = Activation('relu')(g)
    # second convolutional layer
    g = Conv2D(n_filters, (3,3), padding='same', kernel_initializer=init)(g)
    g = InstanceNormalization(axis=-1)(g)
    # concatenate merge channel-wise with input layer
    g = Concatenate()([g, input_layer])
    return g

# define the standalone generator model
def define_generator(image_shape, n_resnet=9):
    # weight initialization
    init = RandomNormal(stddev=0.02)
    # image input
    in_image = Input(shape=image_shape)
    # c7s1-64
    g = Conv2D(64, (7,7), padding='same', kernel_initializer=init)(in_image)
    g = InstanceNormalization(axis=-1)(g)
    g = Activation('relu')(g)
    # d128
    g = Conv2D(128, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g)
    g = InstanceNormalization(axis=-1)(g)
    g = Activation('relu')(g)
    # d256
    g = Conv2D(256, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g)
    g = InstanceNormalization(axis=-1)(g)
    g = Activation('relu')(g)
    # R256
    for _ in range(n_resnet):
        g = resnet_block(256, g)
    # u128
    g = Conv2DTranspose(128, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g)
    g = InstanceNormalization(axis=-1)(g)
    g = Activation('relu')(g)
    # u64
    g = Conv2DTranspose(64, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g)
    g = InstanceNormalization(axis=-1)(g)
    g = Activation('relu')(g)
    # c7s1-3
    g = Conv2D(3, (7,7), padding='same', kernel_initializer=init)(g)
    g = InstanceNormalization(axis=-1)(g)
    out_image = Activation('tanh')(g)
    # define model
    model = Model(in_image, out_image)
    return model

# define a composite model for updating generators by adversarial and cycle loss
def define_composite_model(g_model_1, d_model, g_model_2, image_shape):
    # ensure the model we're updating is trainable
    g_model_1.trainable = True
    # mark discriminator as not trainable
    d_model.trainable = False
    # mark other generator model as not trainable
    g_model_2.trainable = False
    # discriminator element
    input_gen = Input(shape=image_shape)
    gen1_out = g_model_1(input_gen)
    output_d = d_model(gen1_out)
    # identity element
    input_id = Input(shape=image_shape)
    output_id = g_model_1(input_id)
    # forward cycle
    output_f = g_model_2(gen1_out)
    # backward cycle
    gen2_out = g_model_2(input_id)
    output_b = g_model_1(gen2_out)
    # define model graph
    model = Model([input_gen, input_id], [output_d, output_id, output_f, output_b])
    # define optimization algorithm configuration
    opt = Adam(lr=0.0002, beta_1=0.5)
    # compile model with weighting of least squares loss and L1 loss
    model.compile(loss=['mse', 'mae', 'mae', 'mae'], loss_weights=[1, 5, 10, 10], optimizer=opt)
    return model

# load and prepare training images
def load_real_samples(filename):
    # load the dataset
    data = load(filename)
    # unpack arrays
    X1, X2 = data['arr_0'], data['arr_1']
    # scale from [0,255] to [-1,1]
    X1 = (X1 - 127.5) / 127.5
    X2 = (X2 - 127.5) / 127.5
    return [X1, X2]

# select a batch of random samples, returns images and target
def generate_real_samples(dataset, n_samples, patch_shape):
    # choose random instances
    ix = randint(0, dataset.shape[0], n_samples)
    # retrieve selected images
    X = dataset[ix]
    # generate 'real' class labels (1)
    y = ones((n_samples, patch_shape, patch_shape, 1))
    return X, y

# generate a batch of images, returns images and targets
def generate_fake_samples(g_model, dataset, patch_shape):
    # generate fake instance
    X = g_model.predict(dataset)
    # create 'fake' class labels (0)
    y = zeros((len(X), patch_shape, patch_shape, 1))
    return X, y

# save the generator models to file
def save_models(step, g_model_AtoB, g_model_BtoA):
    # save the first generator model
    filename1 = 'g_model_AtoB_%06d.h5' % (step+1)
    g_model_AtoB.save(filename1)
    # save the second generator model
    filename2 = 'g_model_BtoA_%06d.h5' % (step+1)
    g_model_BtoA.save(filename2)
    print('>Saved: %s and %s' % (filename1, filename2))

# generate samples and save as a plot and save the model
def summarize_performance(step, g_model, trainX, name, n_samples=5):
    # select a sample of input images
    X_in, _ = generate_real_samples(trainX, n_samples, 0)
    # generate translated images
    X_out, _ = generate_fake_samples(g_model, X_in, 0)
    # scale all pixels from [-1,1] to [0,1]
    X_in = (X_in + 1) / 2.0
    X_out = (X_out + 1) / 2.0
    # plot real images
    for i in range(n_samples):
        pyplot.subplot(2, n_samples, 1 + i)
        pyplot.axis('off')
        pyplot.imshow(X_in[i])
    # plot translated image
    for i in range(n_samples):
        pyplot.subplot(2, n_samples, 1 + n_samples + i)
        pyplot.axis('off')
        pyplot.imshow(X_out[i])
    # save plot to file
    filename1 = '%s_generated_plot_%06d.png' % (name, (step+1))
    pyplot.savefig(filename1)
    pyplot.close()

# update image pool for fake images
def update_image_pool(pool, images, max_size=50):
    selected = list()
    for image in images:
        if len(pool) < max_size:
            # stock the pool
            pool.append(image)
            selected.append(image)
        elif random() < 0.5:
            # use image, but don't add it to the pool
            selected.append(image)
        else:
            # replace an existing image and use replaced image
            ix = randint(0, len(pool))
            selected.append(pool[ix])
            pool[ix] = image
    return asarray(selected)

# train cyclegan models
def train(d_model_A, d_model_B, g_model_AtoB, g_model_BtoA, c_model_AtoB, c_model_BtoA, dataset):
    # define properties of the training run
    n_epochs, n_batch = 100, 1
    # determine the output square shape of the discriminator
    n_patch = d_model_A.output_shape[1]
    # unpack dataset
    trainA, trainB = dataset
    # prepare image pool for fakes
    poolA, poolB = list(), list()
    # calculate the number of batches per training epoch
    bat_per_epo = int(len(trainA) / n_batch)
    # calculate the number of training iterations
    n_steps = bat_per_epo * n_epochs
    # manually enumerate epochs
    for i in range(n_steps):
        # select a batch of real samples
        X_realA, y_realA = generate_real_samples(trainA, n_batch, n_patch)
        X_realB, y_realB = generate_real_samples(trainB, n_batch, n_patch)
        # generate a batch of fake samples
        X_fakeA, y_fakeA = generate_fake_samples(g_model_BtoA, X_realB, n_patch)
        X_fakeB, y_fakeB = generate_fake_samples(g_model_AtoB, X_realA, n_patch)
        # update fakes from pool
        X_fakeA = update_image_pool(poolA, X_fakeA)
        X_fakeB = update_image_pool(poolB, X_fakeB)
        # update generator B->A via adversarial and cycle loss
        g_loss2, _, _, _, _ = c_model_BtoA.train_on_batch([X_realB, X_realA], [y_realA, X_realA, X_realB, X_realA])
        # update discriminator for A -> [real/fake]
        dA_loss1 = d_model_A.train_on_batch(X_realA, y_realA)
        dA_loss2 = d_model_A.train_on_batch(X_fakeA, y_fakeA)
        # update generator A->B via adversarial and cycle loss
        g_loss1, _, _, _, _ = c_model_AtoB.train_on_batch([X_realA, X_realB], [y_realB, X_realB, X_realA, X_realB])
        # update discriminator for B -> [real/fake]
        dB_loss1 = d_model_B.train_on_batch(X_realB, y_realB)
        dB_loss2 = d_model_B.train_on_batch(X_fakeB, y_fakeB)
        # summarize performance
        print('>%d, dA[%.3f,%.3f] dB[%.3f,%.3f] g[%.3f,%.3f]' % (i+1, dA_loss1,dA_loss2, dB_loss1,dB_loss2, g_loss1,g_loss2))
        # evaluate the model performance every so often
        if (i+1) % (bat_per_epo * 1) == 0:
            # plot A->B translation
            summarize_performance(i, g_model_AtoB, trainA, 'AtoB')
            # plot B->A translation
            summarize_performance(i, g_model_BtoA, trainB, 'BtoA')
        if (i+1) % (bat_per_epo * 5) == 0:
            # save the models
            save_models(i, g_model_AtoB, g_model_BtoA)

# load image data
dataset = load_real_samples('horse2zebra_256.npz')
print('Loaded', dataset[0].shape, dataset[1].shape)
# define input shape based on the loaded dataset
image_shape = dataset[0].shape[1:]
# generator: A -> B
g_model_AtoB = define_generator(image_shape)
# generator: B -> A
g_model_BtoA = define_generator(image_shape)
# discriminator: A -> [real/fake]
d_model_A = define_discriminator(image_shape)
# discriminator: B -> [real/fake]
d_model_B = define_discriminator(image_shape)
# composite: A -> B -> [real/fake, A]
c_model_AtoB = define_composite_model(g_model_AtoB, d_model_B, g_model_BtoA, image_shape)
# composite: B -> A -> [real/fake, B]
c_model_BtoA = define_composite_model(g_model_BtoA, d_model_A, g_model_AtoB, image_shape)
# train models
train(d_model_A, d_model_B, g_model_AtoB, g_model_BtoA, c_model_AtoB, c_model_BtoA, dataset)
```

The example can be run on CPU hardware, although GPU hardware is recommended.

The example might take a number of hours to run on modern GPU hardware.

If needed, you can access cheap GPU hardware via Amazon EC2.

**Note**: your specific results may vary given the stochastic nature of the learning algorithm. Consider running the example a few times.

The loss is reported each training iteration, including the Discriminator-A loss on real and fake examples (*dA*), Discriminator-B loss on real and fake examples (*dB*), and Generator-AtoB and Generator-BtoA loss, each of which is a weighted sum of adversarial, identity, forward, and backward cycle loss (*g*).
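A note on how the reported generator value is composed: as far as I understand the Keras API, the first value returned by *train_on_batch()* for a model compiled with *loss_weights* is the sum of each output's loss multiplied by its weight. The sketch below reproduces that arithmetic with made-up loss values; only the weights are taken from *define_composite_model()*.

```python
# hypothetical per-output loss values for one composite-model update
losses = {'adversarial': 0.5, 'identity': 0.2, 'forward': 0.1, 'backward': 0.1}
# weights matching loss_weights=[1, 5, 10, 10] in define_composite_model()
weights = {'adversarial': 1, 'identity': 5, 'forward': 10, 'backward': 10}
# combined value: each loss multiplied by its weight, then summed
total = sum(weights[name] * losses[name] for name in losses)
print(round(total, 3))  # 3.5
```

Because the cycle terms carry a weight of 10 each, they tend to dominate the reported value early in training, which is consistent with the large *g* values in the first few iterations of the log.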

If loss for the discriminator goes to zero and stays there for a long time, consider re-starting the training run as it is an example of a training failure.

```
>1, dA[2.284,0.678] dB[1.422,0.918] g[18.747,18.452]
>2, dA[2.129,1.226] dB[1.039,1.331] g[19.469,22.831]
>3, dA[1.644,3.909] dB[1.097,1.680] g[19.192,23.757]
>4, dA[1.427,1.757] dB[1.236,3.493] g[20.240,18.390]
>5, dA[1.737,0.808] dB[1.662,2.312] g[16.941,14.915]
...
>118696, dA[0.004,0.016] dB[0.001,0.001] g[2.623,2.359]
>118697, dA[0.001,0.028] dB[0.003,0.002] g[3.045,3.194]
>118698, dA[0.002,0.008] dB[0.001,0.002] g[2.685,2.071]
>118699, dA[0.010,0.010] dB[0.001,0.001] g[2.430,2.345]
>118700, dA[0.002,0.008] dB[0.000,0.004] g[2.487,2.169]
>Saved: g_model_AtoB_118700.h5 and g_model_BtoA_118700.h5
```
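If the output is redirected to a file, the check for a collapsed discriminator can be automated. The following is a rough sketch that assumes the log format printed by *train()*; the function names, the threshold, and the patience value are arbitrary choices for illustration, and a real run would likely need a much larger patience.

```python
import re

# pattern matching the progress lines printed by train()
LINE = re.compile(r'>(\d+), dA\[([\d.]+),([\d.]+)\] dB\[([\d.]+),([\d.]+)\] g\[([\d.]+),([\d.]+)\]')

def parse_line(line):
    # returns (step, [dA_real, dA_fake, dB_real, dB_fake, g_AtoB, g_BtoA]) or None
    match = LINE.match(line.strip())
    if match is None:
        return None
    return int(match.group(1)), [float(v) for v in match.groups()[1:]]

def discriminator_collapsed(lines, threshold=0.001, patience=3):
    # True if all four discriminator losses stay below threshold
    # for `patience` consecutive logged iterations
    run = 0
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            # skip lines that are not progress lines, such as '>Saved: ...'
            continue
        _, values = parsed
        run = run + 1 if max(values[:4]) < threshold else 0
        if run >= patience:
            return True
    return False

log = [
    '>1, dA[2.284,0.678] dB[1.422,0.918] g[18.747,18.452]',
    '>2, dA[2.129,1.226] dB[1.039,1.331] g[19.469,22.831]',
]
print(parse_line(log[0]))
print(discriminator_collapsed(log))  # False
```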

Plots of generated images are saved at the end of every epoch or after every 1,187 training iterations and the iteration number is used in the filename.

```
AtoB_generated_plot_001187.png
AtoB_generated_plot_002374.png
...
BtoA_generated_plot_001187.png
BtoA_generated_plot_002374.png
```

Models are saved after every five epochs or (1187 * 5) 5,935 training iterations, and again the iteration number is used in the filenames.

```
g_model_AtoB_053415.h5
g_model_AtoB_059350.h5
...
g_model_BtoA_053415.h5
g_model_BtoA_059350.h5
```
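Because the filenames record the training iteration rather than the epoch, it can be handy to convert between the two when choosing a model. The helper below is a hypothetical convenience function that assumes 1,187 training iterations per epoch, as in this tutorial.

```python
import re

# hypothetical helper: recover the epoch from a saved model filename,
# assuming 1,187 training iterations per epoch as in this tutorial
def epoch_from_filename(filename, bat_per_epo=1187):
    step = int(re.search(r'_(\d+)\.h5$', filename).group(1))
    return step / bat_per_epo

for name in ['g_model_AtoB_053415.h5', 'g_model_AtoB_059350.h5', 'g_model_AtoB_118700.h5']:
    print(name, epoch_from_filename(name))
# g_model_AtoB_053415.h5 45.0
# g_model_AtoB_059350.h5 50.0
# g_model_AtoB_118700.h5 100.0
```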

The plots of generated images can be used to choose a model and more training iterations may not necessarily mean better quality generated images.

Horses to Zebras translation starts to become reliable after about 50 epochs.

The translation from Zebras to Horses appears to be more challenging for the model to learn, although somewhat plausible translations also begin to be generated after 50 to 60 epochs.

I suspect that better quality results could be achieved with an additional 100 training epochs with learning rate decay, as is used in the paper (where the learning rate is linearly decayed to zero over the second 100 epochs), and perhaps with a data generator that systematically works through each dataset rather than randomly sampling.

Now that we have fit our CycleGAN generators, we can use them to translate photographs in an ad hoc manner.

## How to Perform Image Translation With CycleGAN Generators

The saved generator models can be loaded and used for ad hoc image translation.

The first step is to load the dataset. We can use the same *load_real_samples()* function as we developed in the previous section.

```python
...
# load dataset
A_data, B_data = load_real_samples('horse2zebra_256.npz')
print('Loaded', A_data.shape, B_data.shape)
```

Review the plots of generated images and select a pair of models that we can use for image generation. In this case, we will use the models saved at epoch 75 (training iteration 89,025, or 75 × 1,187).

Our generator models used a custom layer from the *keras_contrib* library, specifically the *InstanceNormalization* layer. Therefore, we need to specify how to load this layer when loading each generator model.

This can be achieved by specifying a dictionary mapping of the layer name to the object and passing this as the second (*custom_objects*) argument to the *load_model()* Keras function.

```python
...
# load the models
cust = {'InstanceNormalization': InstanceNormalization}
model_AtoB = load_model('g_model_AtoB_089025.h5', cust)
model_BtoA = load_model('g_model_BtoA_089025.h5', cust)
```

We can use the *select_sample()* function below, a simplified version of the *generate_real_samples()* function from the previous section that returns only the images, to select a random photo from the dataset.

```python
# select a random sample of images from the dataset
def select_sample(dataset, n_samples):
    # choose random instances
    ix = randint(0, dataset.shape[0], n_samples)
    # retrieve selected images
    X = dataset[ix]
    return X
```

Next, we can select a random image from Domain-A (horses) as input, use the Generator-AtoB model to translate it to Domain-B (zebras), then use the Generator-BtoA model to reconstruct the original image (horse).

```python
# plot A->B->A
A_real = select_sample(A_data, 1)
B_generated = model_AtoB.predict(A_real)
A_reconstructed = model_BtoA.predict(B_generated)
```

We can then plot the three photos side by side as the original or real photo, the translated photo, and the reconstruction of the original photo. The *show_plot()* function below implements this.

```python
# plot the image, the translation, and the reconstruction
def show_plot(imagesX, imagesY1, imagesY2):
    images = vstack((imagesX, imagesY1, imagesY2))
    titles = ['Real', 'Generated', 'Reconstructed']
    # scale from [-1,1] to [0,1]
    images = (images + 1) / 2.0
    # plot images row by row
    for i in range(len(images)):
        # define subplot
        pyplot.subplot(1, len(images), 1 + i)
        # turn off axis
        pyplot.axis('off')
        # plot raw pixel data
        pyplot.imshow(images[i])
        # title
        pyplot.title(titles[i])
    pyplot.show()
```

We can then call this function to plot our real and generated photos.

```python
...
show_plot(A_real, B_generated, A_reconstructed)
```

This is a good test of both models; however, we can also perform the same operation in reverse.

Specifically, a real photo from Domain-B (zebra) translated to Domain-A (horse), then reconstructed as Domain-B (zebra).

```python
# plot B->A->B
B_real = select_sample(B_data, 1)
A_generated = model_BtoA.predict(B_real)
B_reconstructed = model_AtoB.predict(A_generated)
show_plot(B_real, A_generated, B_reconstructed)
```

Tying all of this together, the complete example is listed below.

```python
# example of using saved cyclegan models for image translation
from keras.models import load_model
from numpy import load
from numpy import vstack
from matplotlib import pyplot
from numpy.random import randint
from keras_contrib.layers.normalization.instancenormalization import InstanceNormalization

# load and prepare training images
def load_real_samples(filename):
    # load the dataset
    data = load(filename)
    # unpack arrays
    X1, X2 = data['arr_0'], data['arr_1']
    # scale from [0,255] to [-1,1]
    X1 = (X1 - 127.5) / 127.5
    X2 = (X2 - 127.5) / 127.5
    return [X1, X2]

# select a random sample of images from the dataset
def select_sample(dataset, n_samples):
    # choose random instances
    ix = randint(0, dataset.shape[0], n_samples)
    # retrieve selected images
    X = dataset[ix]
    return X

# plot the image, the translation, and the reconstruction
def show_plot(imagesX, imagesY1, imagesY2):
    images = vstack((imagesX, imagesY1, imagesY2))
    titles = ['Real', 'Generated', 'Reconstructed']
    # scale from [-1,1] to [0,1]
    images = (images + 1) / 2.0
    # plot images row by row
    for i in range(len(images)):
        # define subplot
        pyplot.subplot(1, len(images), 1 + i)
        # turn off axis
        pyplot.axis('off')
        # plot raw pixel data
        pyplot.imshow(images[i])
        # title
        pyplot.title(titles[i])
    pyplot.show()

# load dataset
A_data, B_data = load_real_samples('horse2zebra_256.npz')
print('Loaded', A_data.shape, B_data.shape)
# load the models
cust = {'InstanceNormalization': InstanceNormalization}
model_AtoB = load_model('g_model_AtoB_089025.h5', cust)
model_BtoA = load_model('g_model_BtoA_089025.h5', cust)
# plot A->B->A
A_real = select_sample(A_data, 1)
B_generated = model_AtoB.predict(A_real)
A_reconstructed = model_BtoA.predict(B_generated)
show_plot(A_real, B_generated, A_reconstructed)
# plot B->A->B
B_real = select_sample(B_data, 1)
A_generated = model_BtoA.predict(B_real)
B_reconstructed = model_AtoB.predict(A_generated)
show_plot(B_real, A_generated, B_reconstructed)
```

Running the example first selects a random photo of a horse, translates it, and then tries to reconstruct the original photo.

Then a similar process is performed in reverse, selecting a random photo of a zebra, translating it to a horse, then reconstructing the original photo of the zebra.

**Note**: your results will vary given the stochastic training of the CycleGAN model and choice of a random photograph. Try running the example a few times.

The models are not perfect, especially the zebra to horse model, so you may want to generate many translated examples to review.

It also seems that both models are more effective when reconstructing an image, which is interesting as they are essentially performing the same translation task as when operating on real photographs. This may be a sign that the adversarial loss is not strong enough during training.

We may also want to use a generator model in a standalone way on individual photograph files.

First, we can select a photo from the training dataset. In this case, we will use “*horse2zebra/trainA/n02381460_541.jpg*”.

We can develop a function to load this image and scale it to the preferred size of 256×256, scale pixel values to the range [-1,1], and convert the array of pixels to a single sample.

The *load_image()* function below implements this.

```python
def load_image(filename, size=(256,256)):
    # load and resize the image
    pixels = load_img(filename, target_size=size)
    # convert to numpy array
    pixels = img_to_array(pixels)
    # transform in a sample
    pixels = expand_dims(pixels, 0)
    # scale from [0,255] to [-1,1]
    pixels = (pixels - 127.5) / 127.5
    return pixels
```

We can then load our selected image as well as the AtoB generator model, as we did before.

```python
...
# load the image
image_src = load_image('horse2zebra/trainA/n02381460_541.jpg')
# load the model
cust = {'InstanceNormalization': InstanceNormalization}
model_AtoB = load_model('g_model_AtoB_089025.h5', cust)
```

We can then translate the loaded image, scale the pixel values back to the expected range, and plot the result.

```python
...
# translate image
image_tar = model_AtoB.predict(image_src)
# scale from [-1,1] to [0,1]
image_tar = (image_tar + 1) / 2.0
# plot the translated image
pyplot.imshow(image_tar[0])
pyplot.show()
```

Tying this all together, the complete example is listed below.

```python
# example of using saved cyclegan models for image translation
from numpy import load
from numpy import expand_dims
from keras.models import load_model
from keras_contrib.layers.normalization.instancenormalization import InstanceNormalization
from keras.preprocessing.image import img_to_array
from keras.preprocessing.image import load_img
from matplotlib import pyplot

# load an image to the preferred size
def load_image(filename, size=(256,256)):
    # load and resize the image
    pixels = load_img(filename, target_size=size)
    # convert to numpy array
    pixels = img_to_array(pixels)
    # transform in a sample
    pixels = expand_dims(pixels, 0)
    # scale from [0,255] to [-1,1]
    pixels = (pixels - 127.5) / 127.5
    return pixels

# load the image
image_src = load_image('horse2zebra/trainA/n02381460_541.jpg')
# load the model (same saved generator as in the snippet above)
cust = {'InstanceNormalization': InstanceNormalization}
model_AtoB = load_model('g_model_AtoB_089025.h5', cust)
# translate image
image_tar = model_AtoB.predict(image_src)
# scale from [-1,1] to [0,1]
image_tar = (image_tar + 1) / 2.0
# plot the translated image
pyplot.imshow(image_tar[0])
pyplot.show()
```

Running the example loads the selected image, loads the generator model, translates the photograph of a horse to a zebra, and plots the results.

## Extensions

This section lists some ideas for extending the tutorial that you may wish to explore.

- **Smaller Image Size**. Update the example to use a smaller image size, such as 128×128, and adjust the size of the generator model to use 6 ResNet layers as is used in the CycleGAN paper.
- **Different Dataset**. Update the example to use the apples to oranges dataset.
- **Without Identity Mapping**. Update the example to train the generator models without the identity mapping and compare results.
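As a starting point for the first extension, the number of ResNet blocks could be chosen from the image size. The helper below is a hypothetical sketch that follows the paper's choice of 6 blocks for 128×128 images and 9 blocks for 256×256 or larger images; it is not part of the tutorial code.

```python
# hypothetical helper: pick the number of ResNet blocks from the image size,
# following the paper's choice of 6 blocks for 128x128 and 9 for 256x256 or larger
def choose_n_resnet(image_shape):
    height, width = image_shape[0], image_shape[1]
    return 6 if max(height, width) <= 128 else 9

print(choose_n_resnet((128, 128, 3)))  # 6
print(choose_n_resnet((256, 256, 3)))  # 9
```

The result could then be passed to the generator definition, for example `define_generator(image_shape, n_resnet=choose_n_resnet(image_shape))`.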

If you explore any of these extensions, I’d love to know.

Post your findings in the comments below.

## Further Reading

This section provides more resources on the topic if you are looking to go deeper.

### Projects

- CycleGAN Project (official), GitHub.
- pytorch-CycleGAN-and-pix2pix (official), GitHub.
- CycleGAN Project Page (official)

### API

- Keras Datasets API.
- Keras Sequential Model API
- Keras Convolutional Layers API
- How can I “freeze” Keras layers?
- Keras Contrib Project

## Summary

In this tutorial, you discovered how to develop a CycleGAN model to translate photos of horses to zebras, and back again.

Specifically, you learned:

- How to load and prepare the horses to zebra image translation dataset for modeling.
- How to train a pair of CycleGAN generator models for translating horses to zebra and zebra to horses.
- How to load saved CycleGAN models and use them to translate photographs.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

The post How to Develop a CycleGAN for Image-to-Image Translation with Keras appeared first on Machine Learning Mastery.

## How to Implement CycleGAN Models From Scratch With Keras

The Cycle Generative Adversarial Network, or CycleGAN for short, is a generator model for converting images from one domain to another domain.

For example, the model can be used to translate images of horses to images of zebras, or photographs of city landscapes at night to city landscapes during the day.

The benefit of the CycleGAN model is that it can be trained without paired examples. That is, it does not require examples of photographs before and after the translation in order to train the model, e.g. photos of the same city landscape during the day and at night. Instead, it is able to use a collection of photographs from each domain and extract and harness the underlying style of images in the collection in order to perform the translation.

The model is very impressive but has an architecture that appears quite complicated to implement for beginners.

In this tutorial, you will discover how to implement the CycleGAN architecture from scratch using the Keras deep learning framework.

After completing this tutorial, you will know:

- How to implement the discriminator and generator models.
- How to define composite models to train the generator models via adversarial and cycle loss.
- How to implement the training process to update model weights each training iteration.

Let’s get started.

## Tutorial Overview

This tutorial is divided into five parts; they are:

- What Is the CycleGAN Architecture?
- How to Implement the CycleGAN Discriminator Model
- How to Implement the CycleGAN Generator Model
- How to Implement Composite Models for Least Squares and Cycle Loss
- How to Update Discriminator and Generator Models

## What Is the CycleGAN Architecture?

The CycleGAN model was described by Jun-Yan Zhu, et al. in their 2017 paper titled “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks.”

The model architecture comprises two generator models: one generator (Generator-A) for generating images for the first domain (Domain-A) and a second generator (Generator-B) for generating images for the second domain (Domain-B).

- Generator-A -> Domain-A
- Generator-B -> Domain-B

The generator models perform image translation, meaning that the image generation process is conditional on an input image, specifically an image from the other domain. Generator-A takes an image from Domain-B as input and Generator-B takes an image from Domain-A as input.

- Domain-B -> Generator-A -> Domain-A
- Domain-A -> Generator-B -> Domain-B

Each generator has a corresponding discriminator model.

The first discriminator model (Discriminator-A) takes real images from Domain-A and generated images from Generator-A and predicts whether they are real or fake. The second discriminator model (Discriminator-B) takes real images from Domain-B and generated images from Generator-B and predicts whether they are real or fake.

- Domain-A -> Discriminator-A -> [Real/Fake]
- Domain-B -> Generator-A -> Discriminator-A -> [Real/Fake]
- Domain-B -> Discriminator-B -> [Real/Fake]
- Domain-A -> Generator-B -> Discriminator-B -> [Real/Fake]

The discriminator and generator models are trained in an adversarial zero-sum process, like normal GAN models.

The generators learn to better fool the discriminators and the discriminators learn to better detect fake images. Together, the models find an equilibrium during the training process.

Additionally, the generator models are regularized so that they do not simply create new images in the target domain but instead create translated versions of the input images from the source domain. This is achieved by passing generated images to the other generator model and comparing its output to the original images.

Passing an image through both generators is called a cycle. Together, each pair of generator models is trained to better reproduce the original source image, a property referred to as cycle consistency.

- Domain-B -> Generator-A -> Domain-A -> Generator-B -> Domain-B
- Domain-A -> Generator-B -> Domain-B -> Generator-A -> Domain-A
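The cycle listed above can be sketched with toy stand-ins for the two generators. This is only an illustration, not the Keras code developed later: `generator_a` and `generator_b` are hypothetical placeholder functions operating on short lists of pixel values, and the comparison is assumed to be a mean absolute (L1) difference between the original and the reconstructed image.

```python
# Toy forward cycle: Domain-A -> Generator-B -> Domain-B -> Generator-A -> Domain-A.
# generator_a and generator_b are hypothetical stand-ins, not trained models.

def generator_b(image_a):
    # pretend translation from Domain-A to Domain-B: brighten each pixel
    return [p + 0.25 for p in image_a]

def generator_a(image_b):
    # pretend translation from Domain-B to Domain-A: imperfectly undo the brightening
    return [p - 0.2 for p in image_b]

def mean_absolute_error(original, reconstructed):
    # average absolute pixel difference, assumed here as the cycle comparison
    return sum(abs(o - r) for o, r in zip(original, reconstructed)) / len(original)

image_a = [0.0, 0.5, -0.5, 1.0]
fake_b = generator_b(image_a)    # translated image in Domain-B
cycled_a = generator_a(fake_b)   # reconstruction back in Domain-A
cycle_loss = mean_absolute_error(image_a, cycled_a)
print(round(cycle_loss, 4))
```

A perfect pair of generators would reconstruct the input exactly and give a cycle loss of zero; here the toy pair leaves a small residual error that training would push down.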

There is one further element to the architecture referred to as the identity mapping.

This is where a generator is provided with images as input from the target domain and is expected to generate the same image without change. This addition to the architecture is optional, although it results in a better matching of the color profile of the input image.

- Domain-A -> Generator-A -> Domain-A
- Domain-B -> Generator-B -> Domain-B

Now that we are familiar with the model architecture, we can take a closer look at each model in turn and how they can be implemented.

The paper provides a good description of the models and training process, although the official Torch implementation was used as the definitive description for each model and training process and provides the basis for the model implementations described below.

## How to Implement the CycleGAN Discriminator Model

The discriminator model is responsible for taking a real or generated image as input and predicting whether it is real or fake.

The discriminator model is implemented as a PatchGAN model.

For the discriminator networks we use 70 × 70 PatchGANs, which aim to classify whether 70 × 70 overlapping image patches are real or fake.

— Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, 2017.

The PatchGAN was described in the 2016 paper titled “Precomputed Real-time Texture Synthesis With Markovian Generative Adversarial Networks” and was used in the pix2pix model for image translation described in the 2016 paper titled “Image-to-Image Translation with Conditional Adversarial Networks.”

The architecture is described as discriminating an input image as real or fake by averaging the predictions for n×n squares, or patches, of the source image.

… we design a discriminator architecture – which we term a PatchGAN – that only penalizes structure at the scale of patches. This discriminator tries to classify if each NxN patch in an image is real or fake. We run this discriminator convolutionally across the image, averaging all responses to provide the ultimate output of D.

— Image-to-Image Translation with Conditional Adversarial Networks, 2016.

This can be implemented directly by using a somewhat standard deep convolutional discriminator model.

Instead of outputting a single value like a traditional discriminator model, the PatchGAN discriminator model can output a square or one-channel feature map of predictions. The 70×70 refers to the effective receptive field of the model on the input, not the actual shape of the output feature map.

The receptive field of a convolutional layer refers to the number of pixels that one output of the layer maps to in the input to the layer. The effective receptive field refers to the mapping of one pixel in the output of a deep convolutional model (multiple layers) to the input image. Here, the PatchGAN is an approach to designing a deep convolutional network based on the effective receptive field, where one output activation of the model maps to a 70×70 patch of the input image, regardless of the size of the input image.
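The effective receptive field can be worked out by walking backward through a stack of convolutional layers. The sketch below is a minimal calculation under an assumed layer list: three 4×4 convolutions with stride 2 followed by two 4×4 convolutions with stride 1. The function name and the layer list are illustrative and are not taken from the tutorial code.

```python
def effective_receptive_field(layers):
    # layers: list of (kernel_size, stride) tuples ordered from input to output
    field = 1  # a single activation in the final output
    for kernel, stride in reversed(layers):
        # each layer widens the patch seen by one output activation
        field = (field - 1) * stride + kernel
    return field

# assumed stack: three stride-2 layers, then two stride-1 layers, all 4x4 kernels
assumed_patchgan = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
print(effective_receptive_field(assumed_patchgan))
```

Under this assumed stack the result is 70, the patch size quoted from the paper. By the same formula, a stack with a fourth stride-2 layer ahead of the two stride-1 layers would see a 142×142 patch, so it is worth recomputing the field whenever layers are added or removed.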

The PatchGAN has the effect of predicting whether each 70×70 patch in the input image is real or fake. These predictions can then be averaged to give the output of the model (if needed) or compared directly to a matrix (or a vector if flattened) of expected values (e.g. 0 or 1 values).

The discriminator model described in the paper takes 256×256 color images as input and defines an explicit architecture that is used on all of the test problems. The architecture uses blocks of Conv2D-InstanceNorm-LeakyReLU layers, with 4×4 filters and a 2×2 stride.

Let Ck denote a 4×4 Convolution-InstanceNorm-LeakyReLU layer with k filters and stride 2. After the last layer, we apply a convolution to produce a 1-dimensional output. We do not use InstanceNorm for the first C64 layer. We use leaky ReLUs with a slope of 0.2.

— Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, 2017.

The architecture for the discriminator is as follows:

- C64-C128-C256-C512

This is referred to as a 3-layer PatchGAN in the CycleGAN and Pix2Pix nomenclature, as excluding the first hidden layer, the model has three hidden layers that could be scaled up or down to give different sized PatchGAN models.

Not listed in the paper, the model also has a final hidden layer C512 with a 1×1 stride, and an output layer C1, also with a 1×1 stride with a linear activation function. Given the model is mostly used with 256×256 sized images as input, the size of the output feature map of activations is 16×16. If 128×128 images were used as input, then the size of the output feature map of activations would be 8×8.
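The feature map sizes quoted above can be checked with a little arithmetic. The sketch below assumes 'same' padding, where each convolution produces an output of size ceil(input / stride), and a stride sequence of four stride-2 layers followed by two stride-1 layers, as described in this section. The helper name is illustrative.

```python
import math

def same_padding_output_size(size, strides):
    # with 'same' padding, each layer outputs ceil(size / stride) along each side
    for stride in strides:
        size = math.ceil(size / stride)
    return size

# C64, C128, C256, C512 with stride 2, then the final C512 and C1 with stride 1
strides = [2, 2, 2, 2, 1, 1]
for input_size in (256, 128):
    print(input_size, '->', same_padding_output_size(input_size, strides))
```

This reproduces the 16×16 map for 256×256 inputs and the 8×8 map for 128×128 inputs.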

The model does not use batch normalization; instead, instance normalization is used.

Instance normalization was described in the 2016 paper titled “Instance Normalization: The Missing Ingredient for Fast Stylization.” It is a very simple type of normalization and involves standardizing (e.g. scaling to a standard Gaussian) the values on each feature map.

The intent is to remove image-specific contrast information from the image during image generation, resulting in better generated images.

The key idea is to replace batch normalization layers in the generator architecture with instance normalization layers, and to keep them at test time (as opposed to freeze and simplify them out as done for batch normalization). Intuitively, the normalization process allows to remove instance-specific contrast information from the content image, which simplifies generation. In practice, this results in vastly improved images.

— Instance Normalization: The Missing Ingredient for Fast Stylization, 2016.

Although designed for generator models, it can also prove effective in discriminator models.
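To make the standardization concrete, the sketch below normalizes each feature map of a single image independently. It is a plain-Python illustration rather than the keras-contrib layer: the learned scale and offset are left out, and the `epsilon` value is an assumption added for numerical stability.

```python
import math

def instance_normalize(feature_maps, epsilon=1e-5):
    # feature_maps: list of channels for ONE image, each a flat list of activations
    normalized = []
    for channel in feature_maps:
        mean = sum(channel) / len(channel)
        variance = sum((v - mean) ** 2 for v in channel) / len(channel)
        std = math.sqrt(variance + epsilon)
        # standardize this feature map using only its own statistics
        normalized.append([(v - mean) / std for v in channel])
    return normalized

# two feature maps with the same pattern but very different contrast
image = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]
result = instance_normalize(image)
for channel in result:
    print([round(v, 2) for v in channel])
```

Both feature maps end up with nearly identical values after normalization, which is the sense in which instance-specific contrast is removed.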

An implementation of instance normalization is provided in the keras-contrib project that provides early access to community-supplied Keras features.

The keras-contrib library can be installed via *pip* as follows:

    sudo pip install git+https://www.github.com/keras-team/keras-contrib.git

Or, if you are using an Anaconda virtual environment, such as on EC2:

    git clone https://www.github.com/keras-team/keras-contrib.git
    cd keras-contrib
    sudo ~/anaconda3/envs/tensorflow_p36/bin/python setup.py install

The new *InstanceNormalization* layer can then be used as follows:

    ...
    from keras_contrib.layers.normalization.instancenormalization import InstanceNormalization
    # define layer
    layer = InstanceNormalization(axis=-1)
    ...

The *axis* argument is set to -1 to ensure that features are normalized per feature map.

The network weights are initialized to Gaussian random numbers with a standard deviation of 0.02, as is described for DCGANs more generally.

Weights are initialized from a Gaussian distribution N (0, 0.02).

— Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, 2017.

The discriminator model is updated using a least squares loss (L2), as in the so-called Least Squares Generative Adversarial Network, or LSGAN.

… we replace the negative log likelihood objective by a least-squares loss. This loss is more stable during training and generates higher quality results.

— Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, 2017.

This can be implemented using “mean squared error” between the target values of class=1 for real images and class=0 for fake images.

Additionally, the paper suggests halving the loss for the discriminator during training, in an effort to slow down updates to the discriminator relative to the generator.

In practice, we divide the objective by 2 while optimizing D, which slows down the rate at which D learns, relative to the rate of G.

— Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, 2017.

This can be achieved by setting the *loss_weights* argument to 0.5 when compiling the model. Note that this weighting does not appear to be implemented in the official Torch implementation, where the discriminator updates are defined in the fDx_basic() function.
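The least squares loss and its weighting can be illustrated without Keras. The sketch below is a plain-Python approximation of what a mean squared error loss with a weight of 0.5 computes for a batch of patch predictions. The function names and the example predictions are made up for illustration.

```python
def mean_squared_error(targets, predictions):
    # average squared difference between target and predicted values
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)

def discriminator_loss(real_predictions, fake_predictions, weight=0.5):
    # real patches are compared against a target of 1, fake patches against 0
    real_loss = mean_squared_error([1.0] * len(real_predictions), real_predictions)
    fake_loss = mean_squared_error([0.0] * len(fake_predictions), fake_predictions)
    # the weight scales each loss, slowing discriminator updates
    return weight * real_loss, weight * fake_loss

# hypothetical patch predictions for one real and one fake image
real_loss, fake_loss = discriminator_loss([0.9, 0.8, 1.0, 0.7], [0.1, 0.3, 0.0, 0.2])
print(round(real_loss, 4), round(fake_loss, 4))
```

Setting `weight=1.0` recovers the unweighted least squares loss, which makes it easy to compare the two settings.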

We can tie all of this together in the example below with a *define_discriminator()* function that defines the PatchGAN discriminator. The model configuration matches the description in the appendix of the paper with additional details from the official Torch implementation defined in the defineD_n_layers() function.

    # example of defining a 70x70 patchgan discriminator model
    from keras.optimizers import Adam
    from keras.initializers import RandomNormal
    from keras.models import Model
    from keras.models import Input
    from keras.layers import Conv2D
    from keras.layers import LeakyReLU
    from keras.layers import Activation
    from keras.layers import Concatenate
    from keras.layers import BatchNormalization
    from keras_contrib.layers.normalization.instancenormalization import InstanceNormalization
    from keras.utils.vis_utils import plot_model

    # define the discriminator model
    def define_discriminator(image_shape):
        # weight initialization
        init = RandomNormal(stddev=0.02)
        # source image input
        in_image = Input(shape=image_shape)
        # C64
        d = Conv2D(64, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(in_image)
        d = LeakyReLU(alpha=0.2)(d)
        # C128
        d = Conv2D(128, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d)
        d = InstanceNormalization(axis=-1)(d)
        d = LeakyReLU(alpha=0.2)(d)
        # C256
        d = Conv2D(256, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d)
        d = InstanceNormalization(axis=-1)(d)
        d = LeakyReLU(alpha=0.2)(d)
        # C512
        d = Conv2D(512, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d)
        d = InstanceNormalization(axis=-1)(d)
        d = LeakyReLU(alpha=0.2)(d)
        # second last output layer
        d = Conv2D(512, (4,4), padding='same', kernel_initializer=init)(d)
        d = InstanceNormalization(axis=-1)(d)
        d = LeakyReLU(alpha=0.2)(d)
        # patch output
        patch_out = Conv2D(1, (4,4), padding='same', kernel_initializer=init)(d)
        # define model
        model = Model(in_image, patch_out)
        # compile model
        model.compile(loss='mse', optimizer=Adam(lr=0.0002, beta_1=0.5), loss_weights=[0.5])
        return model

    # define image shape
    image_shape = (256,256,3)
    # create the model
    model = define_discriminator(image_shape)
    # summarize the model
    model.summary()
    # plot the model
    plot_model(model, to_file='discriminator_model_plot.png', show_shapes=True, show_layer_names=True)

**Note**: the *plot_model()* function requires that both the pydot and pygraphviz libraries are installed. If this is a problem, you can comment out both the import and call to this function.

Running the example summarizes the model, showing the size of the inputs and outputs for each layer.

    _________________________________________________________________
    Layer (type)                 Output Shape              Param #
    =================================================================
    input_1 (InputLayer)         (None, 256, 256, 3)       0
    _________________________________________________________________
    conv2d_1 (Conv2D)            (None, 128, 128, 64)      3136
    _________________________________________________________________
    leaky_re_lu_1 (LeakyReLU)    (None, 128, 128, 64)      0
    _________________________________________________________________
    conv2d_2 (Conv2D)            (None, 64, 64, 128)       131200
    _________________________________________________________________
    instance_normalization_1 (In (None, 64, 64, 128)       256
    _________________________________________________________________
    leaky_re_lu_2 (LeakyReLU)    (None, 64, 64, 128)       0
    _________________________________________________________________
    conv2d_3 (Conv2D)            (None, 32, 32, 256)       524544
    _________________________________________________________________
    instance_normalization_2 (In (None, 32, 32, 256)       512
    _________________________________________________________________
    leaky_re_lu_3 (LeakyReLU)    (None, 32, 32, 256)       0
    _________________________________________________________________
    conv2d_4 (Conv2D)            (None, 16, 16, 512)       2097664
    _________________________________________________________________
    instance_normalization_3 (In (None, 16, 16, 512)       1024
    _________________________________________________________________
    leaky_re_lu_4 (LeakyReLU)    (None, 16, 16, 512)       0
    _________________________________________________________________
    conv2d_5 (Conv2D)            (None, 16, 16, 512)       4194816
    _________________________________________________________________
    instance_normalization_4 (In (None, 16, 16, 512)       1024
    _________________________________________________________________
    leaky_re_lu_5 (LeakyReLU)    (None, 16, 16, 512)       0
    _________________________________________________________________
    conv2d_6 (Conv2D)            (None, 16, 16, 1)         8193
    =================================================================
    Total params: 6,962,369
    Trainable params: 6,962,369
    Non-trainable params: 0
    _________________________________________________________________
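The parameter counts in the summary can be reproduced by hand, which is a useful check that the layers are wired as intended. The sketch below assumes each convolution has one bias per filter and each instance normalization layer has one scale and one offset per channel. The helper names are illustrative.

```python
def conv2d_params(kernel_size, in_channels, out_channels):
    # one kernel x kernel x in_channels filter per output channel, plus one bias each
    return kernel_size * kernel_size * in_channels * out_channels + out_channels

def instance_norm_params(channels):
    # assumed: one learned scale and one learned offset per channel
    return 2 * channels

# (input channels, output channels) for the six 4x4 convolutions in the discriminator
conv_layers = [(3, 64), (64, 128), (128, 256), (256, 512), (512, 512), (512, 1)]
# channel counts of the four instance normalization layers
norm_layers = [128, 256, 512, 512]

conv_counts = [conv2d_params(4, i, o) for i, o in conv_layers]
norm_counts = [instance_norm_params(c) for c in norm_layers]
total = sum(conv_counts) + sum(norm_counts)
print(conv_counts)
print(norm_counts)
print(total)
```

The per-layer values and the total agree with the figures reported by `model.summary()` above.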

A plot of the model architecture is also created to help get an idea of the inputs, outputs, and transitions of the image data through the model.

## How to Implement the CycleGAN Generator Model

The CycleGAN Generator model takes an image as input and generates a translated image as output.

The model uses a sequence of downsampling convolutional blocks to encode the input image, a number of residual network (ResNet) convolutional blocks to transform the image, and a number of upsampling convolutional blocks to generate the output image.

Let c7s1-k denote a 7×7 Convolution-InstanceNorm-ReLU layer with k filters and stride 1. dk denotes a 3×3 Convolution-InstanceNorm-ReLU layer with k filters and stride 2. Reflection padding was used to reduce artifacts. Rk denotes a residual block that contains two 3×3 convolutional layers with the same number of filters on both layer. uk denotes a 3×3 fractional-strided-Convolution-InstanceNorm-ReLU layer with k filters and stride 1/2.

— Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, 2017.

The architecture for the 6-resnet block generator for 128×128 images is as follows:

- c7s1-64,d128,d256,R256,R256,R256,R256,R256,R256,u128,u64,c7s1-3
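The shorthand above can be read mechanically. The sketch below parses the architecture string and traces the spatial size and filter count after each layer, assuming that every stride-2 layer halves the image exactly and every stride-1/2 layer doubles it. The `trace_generator()` helper is hypothetical and exists only to illustrate the notation.

```python
def trace_generator(spec, size):
    # returns a list of (layer_name, spatial_size, filters) tuples
    trace = []
    for name in spec.split(','):
        if name.startswith('c7s1-'):
            # 7x7 convolution with stride 1: spatial size unchanged
            filters = int(name[len('c7s1-'):])
        else:
            filters = int(name[1:])
            if name[0] == 'd':
                size //= 2   # downsampling block, stride 2
            elif name[0] == 'u':
                size *= 2    # upsampling block, stride 1/2
            # 'R' residual blocks leave the spatial size unchanged
        trace.append((name, size, filters))
    return trace

spec = 'c7s1-64,d128,d256,R256,R256,R256,R256,R256,R256,u128,u64,c7s1-3'
for layer in trace_generator(spec, 128):
    print(layer)
```

For a 128×128 input the trace shrinks to 32×32 through the residual blocks and returns to 128×128 with 3 channels at the output.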

First, we need a function to define the ResNet blocks. These are blocks comprised of two 3×3 CNN layers where the input to the block is concatenated to the output of the block, channel-wise.

This is implemented in the *resnet_block()* function that creates two Conv-InstanceNorm blocks with 3×3 filters and 1×1 stride and without a ReLU activation after the second block, matching the official Torch implementation in the build_conv_block() function. For simplicity, same padding is used instead of the reflection padding recommended in the paper.

    # generate a resnet block
    def resnet_block(n_filters, input_layer):
        # weight initialization
        init = RandomNormal(stddev=0.02)
        # first layer convolutional layer
        g = Conv2D(n_filters, (3,3), padding='same', kernel_initializer=init)(input_layer)
        g = InstanceNormalization(axis=-1)(g)
        g = Activation('relu')(g)
        # second convolutional layer
        g = Conv2D(n_filters, (3,3), padding='same', kernel_initializer=init)(g)
        g = InstanceNormalization(axis=-1)(g)
        # concatenate merge channel-wise with input layer
        g = Concatenate()([g, input_layer])
        return g

Next, we can define a function that will create the 9-resnet block version for 256×256 input images. This can easily be changed to the 6-resnet block version by setting the *image_shape* argument to (128,128,3) and the *n_resnet* argument to 6.

Importantly, the model outputs pixel values with the same shape as the input, and the pixel values are in the range [-1, 1], typical for GAN generator models.

    # define the standalone generator model
    def define_generator(image_shape=(256,256,3), n_resnet=9):
        # weight initialization
        init = RandomNormal(stddev=0.02)
        # image input
        in_image = Input(shape=image_shape)
        # c7s1-64
        g = Conv2D(64, (7,7), padding='same', kernel_initializer=init)(in_image)
        g = InstanceNormalization(axis=-1)(g)
        g = Activation('relu')(g)
        # d128
        g = Conv2D(128, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g)
        g = InstanceNormalization(axis=-1)(g)
        g = Activation('relu')(g)
        # d256
        g = Conv2D(256, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g)
        g = InstanceNormalization(axis=-1)(g)
        g = Activation('relu')(g)
        # R256
        for _ in range(n_resnet):
            g = resnet_block(256, g)
        # u128
        g = Conv2DTranspose(128, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g)
        g = InstanceNormalization(axis=-1)(g)
        g = Activation('relu')(g)
        # u64
        g = Conv2DTranspose(64, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g)
        g = InstanceNormalization(axis=-1)(g)
        g = Activation('relu')(g)
        # c7s1-3
        g = Conv2D(3, (7,7), padding='same', kernel_initializer=init)(g)
        g = InstanceNormalization(axis=-1)(g)
        out_image = Activation('tanh')(g)
        # define model
        model = Model(in_image, out_image)
        return model
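Because the generator ends in a tanh activation, its outputs lie in [-1, 1], so images need to be scaled into that range before training and scaled back before display. The sketch below shows one common way to do this, assuming 8-bit pixel values in [0, 255]. The helper names are illustrative and do not appear in the tutorial code.

```python
def to_generator_range(pixel):
    # map an 8-bit pixel value in [0, 255] to [-1, 1]
    return (pixel - 127.5) / 127.5

def to_display_range(value):
    # map a generator output in [-1, 1] back to an integer in [0, 255]
    return int(round((value + 1.0) * 127.5))

for pixel in (0, 64, 128, 255):
    scaled = to_generator_range(pixel)
    print(pixel, round(scaled, 3), to_display_range(scaled))
```

The two functions are inverses of each other for integer pixel values, so an image survives a round trip unchanged.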

The generator model is not compiled as it is trained via a composite model, seen in the next section.

Tying this together, the complete example is listed below.

    # example of an encoder-decoder generator for the cyclegan
    from keras.optimizers import Adam
    from keras.models import Model
    from keras.models import Input
    from keras.layers import Conv2D
    from keras.layers import Conv2DTranspose
    from keras.layers import Activation
    from keras.initializers import RandomNormal
    from keras.layers import Concatenate
    from keras_contrib.layers.normalization.instancenormalization import InstanceNormalization
    from keras.utils.vis_utils import plot_model

    # generate a resnet block
    def resnet_block(n_filters, input_layer):
        # weight initialization
        init = RandomNormal(stddev=0.02)
        # first layer convolutional layer
        g = Conv2D(n_filters, (3,3), padding='same', kernel_initializer=init)(input_layer)
        g = InstanceNormalization(axis=-1)(g)
        g = Activation('relu')(g)
        # second convolutional layer
        g = Conv2D(n_filters, (3,3), padding='same', kernel_initializer=init)(g)
        g = InstanceNormalization(axis=-1)(g)
        # concatenate merge channel-wise with input layer
        g = Concatenate()([g, input_layer])
        return g

    # define the standalone generator model
    def define_generator(image_shape=(256,256,3), n_resnet=9):
        # weight initialization
        init = RandomNormal(stddev=0.02)
        # image input
        in_image = Input(shape=image_shape)
        # c7s1-64
        g = Conv2D(64, (7,7), padding='same', kernel_initializer=init)(in_image)
        g = InstanceNormalization(axis=-1)(g)
        g = Activation('relu')(g)
        # d128
        g = Conv2D(128, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g)
        g = InstanceNormalization(axis=-1)(g)
        g = Activation('relu')(g)
        # d256
        g = Conv2D(256, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g)
        g = InstanceNormalization(axis=-1)(g)
        g = Activation('relu')(g)
        # R256
        for _ in range(n_resnet):
            g = resnet_block(256, g)
        # u128
        g = Conv2DTranspose(128, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g)
        g = InstanceNormalization(axis=-1)(g)
        g = Activation('relu')(g)
        # u64
        g = Conv2DTranspose(64, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g)
        g = InstanceNormalization(axis=-1)(g)
        g = Activation('relu')(g)
        # c7s1-3
        g = Conv2D(3, (7,7), padding='same', kernel_initializer=init)(g)
        g = InstanceNormalization(axis=-1)(g)
        out_image = Activation('tanh')(g)
        # define model
        model = Model(in_image, out_image)
        return model

    # create the model
    model = define_generator()
    # summarize the model
    model.summary()
    # plot the model
    plot_model(model, to_file='generator_model_plot.png', show_shapes=True, show_layer_names=True)

Running the example first summarizes the model.

    __________________________________________________________________________________________________
    Layer (type)                    Output Shape          Param #     Connected to
    ==================================================================================================
    input_1 (InputLayer)            (None, 256, 256, 3)   0
    __________________________________________________________________________________________________
    conv2d_1 (Conv2D)               (None, 256, 256, 64)  9472        input_1[0][0]
    __________________________________________________________________________________________________
    instance_normalization_1 (Insta (None, 256, 256, 64)  128         conv2d_1[0][0]
    __________________________________________________________________________________________________
    activation_1 (Activation)       (None, 256, 256, 64)  0           instance_normalization_1[0][0]
    __________________________________________________________________________________________________
    conv2d_2 (Conv2D)               (None, 128, 128, 128  73856       activation_1[0][0]
    __________________________________________________________________________________________________
    instance_normalization_2 (Insta (None, 128, 128, 128  256         conv2d_2[0][0]
    __________________________________________________________________________________________________
    activation_2 (Activation)       (None, 128, 128, 128  0           instance_normalization_2[0][0]
    __________________________________________________________________________________________________
    conv2d_3 (Conv2D)               (None, 64, 64, 256)   295168      activation_2[0][0]
    __________________________________________________________________________________________________
    instance_normalization_3 (Insta (None, 64, 64, 256)   512         conv2d_3[0][0]
    __________________________________________________________________________________________________
    activation_3 (Activation)       (None, 64, 64, 256)   0           instance_normalization_3[0][0]
    __________________________________________________________________________________________________
    conv2d_4 (Conv2D)               (None, 64, 64, 256)   590080      activation_3[0][0]
    __________________________________________________________________________________________________
    instance_normalization_4 (Insta (None, 64, 64, 256)   512         conv2d_4[0][0]
    __________________________________________________________________________________________________
    activation_4 (Activation)       (None, 64, 64, 256)   0           instance_normalization_4[0][0]
    __________________________________________________________________________________________________
    conv2d_5 (Conv2D)               (None, 64, 64, 256)   590080      activation_4[0][0]
    __________________________________________________________________________________________________
    instance_normalization_5 (Insta (None, 64, 64, 256)   512         conv2d_5[0][0]
    __________________________________________________________________________________________________
    concatenate_1 (Concatenate)     (None, 64, 64, 512)   0           instance_normalization_5[0][0]
                                                                      activation_3[0][0]
    __________________________________________________________________________________________________
    conv2d_6 (Conv2D)               (None, 64, 64, 256)   1179904     concatenate_1[0][0]
    __________________________________________________________________________________________________
    instance_normalization_6 (Insta (None, 64, 64, 256)   512         conv2d_6[0][0]
    __________________________________________________________________________________________________
    activation_5 (Activation)       (None, 64, 64, 256)   0           instance_normalization_6[0][0]
    __________________________________________________________________________________________________
    conv2d_7 (Conv2D)               (None, 64, 64, 256)   590080      activation_5[0][0]
    __________________________________________________________________________________________________
    instance_normalization_7 (Insta (None, 64, 64, 256)   512         conv2d_7[0][0]
    __________________________________________________________________________________________________
    concatenate_2 (Concatenate)     (None, 64, 64, 768)   0           instance_normalization_7[0][0]
                                                                      concatenate_1[0][0]
    __________________________________________________________________________________________________
    conv2d_8 (Conv2D)               (None, 64, 64, 256)   1769728     concatenate_2[0][0]
    __________________________________________________________________________________________________
    instance_normalization_8 (Insta (None, 64, 64, 256)   512         conv2d_8[0][0]
    __________________________________________________________________________________________________
    activation_6 (Activation)       (None, 64, 64, 256)   0           instance_normalization_8[0][0]
    __________________________________________________________________________________________________
    conv2d_9 (Conv2D)               (None, 64, 64, 256)   590080      activation_6[0][0]
    __________________________________________________________________________________________________
    instance_normalization_9 (Insta (None, 64, 64, 256)   512         conv2d_9[0][0]
    __________________________________________________________________________________________________
    concatenate_3 (Concatenate)     (None, 64, 64, 1024)  0           instance_normalization_9[0][0]
                                                                      concatenate_2[0][0]
    __________________________________________________________________________________________________
    conv2d_10 (Conv2D)              (None, 64, 64, 256)   2359552     concatenate_3[0][0]
    __________________________________________________________________________________________________
    instance_normalization_10 (Inst (None, 64, 64, 256)   512         conv2d_10[0][0]
    __________________________________________________________________________________________________
    activation_7 (Activation)       (None, 64, 64, 256)   0           instance_normalization_10[0][0]
    __________________________________________________________________________________________________
    conv2d_11 (Conv2D)              (None, 64, 64, 256)   590080      activation_7[0][0]
    __________________________________________________________________________________________________
    instance_normalization_11 (Inst (None, 64, 64, 256)   512         conv2d_11[0][0]
    __________________________________________________________________________________________________
    concatenate_4 (Concatenate)     (None, 64, 64, 1280)  0           instance_normalization_11[0][0]
                                                                      concatenate_3[0][0]
    __________________________________________________________________________________________________
    conv2d_12 (Conv2D)              (None, 64, 64, 256)   2949376     concatenate_4[0][0]
    __________________________________________________________________________________________________
    instance_normalization_12 (Inst (None, 64, 64, 256)   512         conv2d_12[0][0]
    __________________________________________________________________________________________________
    activation_8 (Activation)       (None, 64, 64, 256)   0           instance_normalization_12[0][0]
    __________________________________________________________________________________________________
    conv2d_13 (Conv2D)              (None, 64, 64, 256)   590080      activation_8[0][0]
    __________________________________________________________________________________________________
    instance_normalization_13 (Inst (None, 64, 64, 256)   512         conv2d_13[0][0]
    __________________________________________________________________________________________________
    concatenate_5 (Concatenate)     (None, 64, 64, 1536)  0           instance_normalization_13[0][0]
                                                                      concatenate_4[0][0]
    __________________________________________________________________________________________________
    conv2d_14 (Conv2D)              (None, 64, 64, 256)   3539200     concatenate_5[0][0]
    __________________________________________________________________________________________________
    instance_normalization_14 (Inst (None, 64, 64, 256)   512         conv2d_14[0][0]
    __________________________________________________________________________________________________
    activation_9 (Activation)       (None, 64, 64, 256)   0           instance_normalization_14[0][0]
    __________________________________________________________________________________________________
    conv2d_15 (Conv2D)              (None, 64, 64, 256)   590080      activation_9[0][0]
    __________________________________________________________________________________________________
    instance_normalization_15 (Inst (None, 64, 64, 256)   512         conv2d_15[0][0]
    __________________________________________________________________________________________________
    concatenate_6 (Concatenate)     (None, 64, 64, 1792)  0           instance_normalization_15[0][0]
                                                                      concatenate_5[0][0]
    __________________________________________________________________________________________________
    conv2d_16 (Conv2D)              (None, 64, 64, 256)   4129024     concatenate_6[0][0]
    __________________________________________________________________________________________________
    instance_normalization_16 (Inst (None, 64, 64, 256)   512         conv2d_16[0][0]
    __________________________________________________________________________________________________
    activation_10 (Activation)      (None, 64, 64, 256)   0           instance_normalization_16[0][0]
    __________________________________________________________________________________________________
    conv2d_17 (Conv2D)              (None, 64, 64, 256)   590080      activation_10[0][0]
    __________________________________________________________________________________________________
    instance_normalization_17 (Inst (None, 64, 64, 256)   512         conv2d_17[0][0]
    __________________________________________________________________________________________________
    concatenate_7 (Concatenate)     (None, 64, 64, 2048)  0           instance_normalization_17[0][0]
                                                                      concatenate_6[0][0]
    __________________________________________________________________________________________________
    conv2d_18 (Conv2D)              (None, 64, 64, 256)   4718848     concatenate_7[0][0]
    __________________________________________________________________________________________________
    instance_normalization_18 (Inst (None, 64, 64, 256)   512         conv2d_18[0][0]
    __________________________________________________________________________________________________
    activation_11 (Activation)      (None, 64, 64, 256)   0           instance_normalization_18[0][0]
    __________________________________________________________________________________________________
    conv2d_19 (Conv2D)              (None, 64, 64, 256)   590080      activation_11[0][0]
    __________________________________________________________________________________________________
    instance_normalization_19 (Inst (None, 64, 64, 256)   512         conv2d_19[0][0]
    __________________________________________________________________________________________________
    concatenate_8 (Concatenate)     (None, 64, 64, 2304)  0           instance_normalization_19[0][0]
                                                                      concatenate_7[0][0]
    __________________________________________________________________________________________________
    conv2d_20 (Conv2D)              (None, 64, 64, 256)   5308672     concatenate_8[0][0]
    __________________________________________________________________________________________________
    instance_normalization_20 (Inst (None, 64, 64, 256)   512         conv2d_20[0][0]
    __________________________________________________________________________________________________
    activation_12 (Activation)      (None, 64, 64, 256)   0           instance_normalization_20[0][0]
    __________________________________________________________________________________________________
    conv2d_21 (Conv2D)              (None, 64, 64, 256)   590080      activation_12[0][0]
    __________________________________________________________________________________________________
    instance_normalization_21 (Inst (None, 64, 64, 256)   512         conv2d_21[0][0]
    __________________________________________________________________________________________________
    concatenate_9 (Concatenate)     (None, 64, 64, 2560)  0           instance_normalization_21[0][0]
                                                                      concatenate_8[0][0]
    __________________________________________________________________________________________________
    conv2d_transpose_1 (Conv2DTrans (None, 128, 128, 128  2949248     concatenate_9[0][0]
    __________________________________________________________________________________________________
    instance_normalization_22 (Inst (None, 128, 128, 128  256         conv2d_transpose_1[0][0]
    __________________________________________________________________________________________________
    activation_13 (Activation)      (None, 128, 128, 128  0           instance_normalization_22[0][0]
    __________________________________________________________________________________________________
    conv2d_transpose_2 (Conv2DTrans (None, 256, 256, 64)  73792       activation_13[0][0]
    __________________________________________________________________________________________________
    instance_normalization_23 (Inst (None, 256, 256, 64)  128         conv2d_transpose_2[0][0]
    __________________________________________________________________________________________________
    activation_14 (Activation)      (None, 256, 256, 64)  0           instance_normalization_23[0][0]
    __________________________________________________________________________________________________
    conv2d_22 (Conv2D)              (None, 256, 256, 3)   9411        activation_14[0][0]
    __________________________________________________________________________________________________
    instance_normalization_24 (Inst (None, 256, 256, 3)   6           conv2d_22[0][0]
    __________________________________________________________________________________________________
    activation_15 (Activation)      (None, 256, 256, 3)   0           instance_normalization_24[0][0]
    ==================================================================================================
    Total params: 35,276,553
    Trainable params: 35,276,553
    Non-trainable params: 0
    __________________________________________________________________________________________________

A plot of the generator model is also created, showing the skip connections in the ResNet blocks.

## How to Implement Composite Models for Least Squares and Cycle Loss

The generator models are not updated directly. Instead, the generator models are updated via composite models.

An update to each generator model involves changes to the model weights based on four concerns:

- Adversarial loss (L2 or mean squared error).
- Identity loss (L1 or mean absolute error).
- Forward cycle loss (L1 or mean absolute error).
- Backward cycle loss (L1 or mean absolute error).

The adversarial loss is the standard approach for updating the generator via the discriminator, although in this case, the least squares loss function is used instead of the negative log likelihood (e.g. binary cross entropy).
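To make the difference between the two losses concrete, the short sketch below compares them on a few discriminator outputs. It is a standalone illustration in plain Python rather than part of the tutorial's Keras code; the function names *least_squares_loss()* and *binary_cross_entropy()* and the example values are assumptions chosen for illustration.

```python
# illustrative comparison of least squares loss and binary cross entropy
from math import log

# mean squared error between targets and predictions (least squares loss)
def least_squares_loss(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# binary cross entropy (negative log likelihood) for probabilities in (0, 1)
def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    total = 0.0
    for t, p in zip(y_true, y_pred):
        # clip the prediction to avoid log(0)
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * log(p) + (1.0 - t) * log(1.0 - p))
    return total / len(y_true)

# the generator wants the discriminator to output 'real' (1.0) for its fakes
targets = [1.0, 1.0, 1.0]
# hypothetical discriminator outputs for three generated images
outputs = [0.9, 0.5, 0.1]
print('least squares: %.3f' % least_squares_loss(targets, outputs))
print('cross entropy: %.3f' % binary_cross_entropy(targets, outputs))
```

The least squares loss grows with the squared distance from the target, whereas the cross entropy grows without bound as a prediction approaches the wrong class.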

First, we can use our function to define the two generators and two discriminators used in the CycleGAN.

    ...
    # input shape
    image_shape = (256,256,3)
    # generator: A -> B
    g_model_AtoB = define_generator(image_shape)
    # generator: B -> A
    g_model_BtoA = define_generator(image_shape)
    # discriminator: A -> [real/fake]
    d_model_A = define_discriminator(image_shape)
    # discriminator: B -> [real/fake]
    d_model_B = define_discriminator(image_shape)

A composite model is required for each generator model that is responsible for only updating the weights of that generator model, although it is required to share the weights with the related discriminator model and the other generator model.

This can be achieved by marking the weights of the other models as not trainable in the context of the composite model to ensure we are only updating the intended generator.

    ...
    # ensure the model we're updating is trainable
    g_model_1.trainable = True
    # mark discriminator as not trainable
    d_model.trainable = False
    # mark other generator model as not trainable
    g_model_2.trainable = False

The model can be constructed piecewise using the Keras functional API.

The first step is to define the input of the real image from the source domain, pass it through our generator model, then connect the output of the generator to the discriminator and classify it as real or fake.

    ...
    # discriminator element
    input_gen = Input(shape=image_shape)
    gen1_out = g_model_1(input_gen)
    output_d = d_model(gen1_out)

Next, we can connect the identity mapping element with a new input for the real image from the target domain, pass it through our generator model, and output the (hopefully) untranslated image directly.

    ...
    # identity element
    input_id = Input(shape=image_shape)
    output_id = g_model_1(input_id)

So far, we have a composite model with two real image inputs and a discriminator classification and identity image output. Next, we need to add the forward and backward cycles.

The forward cycle can be achieved by connecting the output of our generator to the other generator, the output of which can be compared to the input to our generator and should be identical.

    ...
    # forward cycle
    output_f = g_model_2(gen1_out)

The backward cycle is more complex and involves the input for the real image from the target domain passing through the other generator, then passing through our generator, which should match the real image from the target domain.

    ...
    # backward cycle
    gen2_out = g_model_2(input_id)
    output_b = g_model_1(gen2_out)

That’s it.

We can then define this composite model with two inputs, one real image each for the source and target domains, and four outputs: one from the discriminator, one from our generator for the identity mapping, one from the other generator for the forward cycle, and one from our generator for the backward cycle.

    ...
    # define model graph
    model = Model([input_gen, input_id], [output_d, output_id, output_f, output_b])

The adversarial loss for the discriminator output uses least squares loss which is implemented as L2 or mean squared error. The outputs from the generators are compared to images and are optimized using L1 loss implemented as mean absolute error.

The generator is updated using a weighted sum of the four loss values. The adversarial loss is weighted normally (a weight of 1), whereas the forward and backward cycle losses are weighted using a parameter called *lambda*, which is set to 10, i.e. 10 times more important than the adversarial loss. The identity loss is weighted as a fraction of the lambda parameter and is set to 0.5 * 10, or 5, in the official Torch implementation.

    ...
    # compile model with weighting of least squares loss and L1 loss
    model.compile(loss=['mse', 'mae', 'mae', 'mae'], loss_weights=[1, 5, 10, 10], optimizer=opt)
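As a worked example of the weighting, the sketch below combines four loss values using the weights passed to *compile()*. The individual loss values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
# illustrative calculation of the weighted generator loss
# hypothetical loss values: adversarial, identity, forward cycle, backward cycle
names = ['adversarial', 'identity', 'forward cycle', 'backward cycle']
losses = [0.40, 0.10, 0.20, 0.30]
# weights used when compiling the composite model
loss_weights = [1, 5, 10, 10]
# report the contribution of each term
for name, weight, loss in zip(names, loss_weights, losses):
    print('%s: %d * %.2f = %.2f' % (name, weight, loss, weight * loss))
# the total loss is the sum of each loss multiplied by its weight
total = sum(weight * loss for weight, loss in zip(loss_weights, losses))
print('total: %.2f' % total)
```

With these values, the two cycle terms contribute most of the total even though their raw values are smaller than the adversarial loss.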

We can tie all of this together and define the function *define_composite_model()* for creating a composite model for training a given generator model.

    # define a composite model for updating generators by adversarial and cycle loss
    def define_composite_model(g_model_1, d_model, g_model_2, image_shape):
        # ensure the model we're updating is trainable
        g_model_1.trainable = True
        # mark discriminator as not trainable
        d_model.trainable = False
        # mark other generator model as not trainable
        g_model_2.trainable = False
        # discriminator element
        input_gen = Input(shape=image_shape)
        gen1_out = g_model_1(input_gen)
        output_d = d_model(gen1_out)
        # identity element
        input_id = Input(shape=image_shape)
        output_id = g_model_1(input_id)
        # forward cycle
        output_f = g_model_2(gen1_out)
        # backward cycle
        gen2_out = g_model_2(input_id)
        output_b = g_model_1(gen2_out)
        # define model graph
        model = Model([input_gen, input_id], [output_d, output_id, output_f, output_b])
        # define optimization algorithm configuration
        opt = Adam(lr=0.0002, beta_1=0.5)
        # compile model with weighting of least squares loss and L1 loss
        model.compile(loss=['mse', 'mae', 'mae', 'mae'], loss_weights=[1, 5, 10, 10], optimizer=opt)
        return model

This function can then be called to prepare a composite model for training both the *g_model_AtoB* generator model and the *g_model_BtoA* model; for example:

    ...
    # composite: A -> B -> [real/fake, A]
    c_model_AtoB = define_composite_model(g_model_AtoB, d_model_B, g_model_BtoA, image_shape)
    # composite: B -> A -> [real/fake, B]
    c_model_BtoA = define_composite_model(g_model_BtoA, d_model_A, g_model_AtoB, image_shape)

Summarizing and plotting the composite model is a bit of a mess, as neither view shows the inputs and outputs of the model clearly.

We can summarize the inputs and outputs for each of the composite models below. Recall that we are sharing or reusing the same set of weights if a given model is used more than once in the composite model.

**Generator-A Composite Model**

Only Generator-A weights are trainable and weights for other models are not trainable.

- **Adversarial Loss**: Domain-B -> Generator-A -> Domain-A -> Discriminator-A -> [real/fake]
- **Identity Loss**: Domain-A -> Generator-A -> Domain-A
- **Forward Cycle Loss**: Domain-B -> Generator-A -> Domain-A -> Generator-B -> Domain-B
- **Backward Cycle Loss**: Domain-A -> Generator-B -> Domain-B -> Generator-A -> Domain-A

**Generator-B Composite Model**

Only Generator-B weights are trainable and weights for other models are not trainable.

- **Adversarial Loss**: Domain-A -> Generator-B -> Domain-B -> Discriminator-B -> [real/fake]
- **Identity Loss**: Domain-B -> Generator-B -> Domain-B
- **Forward Cycle Loss**: Domain-A -> Generator-B -> Domain-B -> Generator-A -> Domain-A
- **Backward Cycle Loss**: Domain-B -> Generator-A -> Domain-A -> Generator-B -> Domain-B
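The four paths of the Generator-B composite model can be traced with toy stand-ins for the models. In the sketch below, a "domain" is just a range of numbers, and the functions *generator_a()*, *generator_b()* and *discriminator_b()* are hypothetical idealized models, not Keras models. It shows which value each output is compared against and that ideal generators incur zero loss on every path.

```python
# toy trace of the four paths in the Generator-B (A -> B) composite model
# a 'domain' is a range of numbers: Domain-A is below 10, Domain-B is 10 or above

# hypothetical ideal generator for A -> B: leaves Domain-B inputs untouched
def generator_b(x):
    return x if x >= 10.0 else x + 10.0

# hypothetical ideal generator for B -> A: leaves Domain-A inputs untouched
def generator_a(x):
    return x if x < 10.0 else x - 10.0

# hypothetical discriminator for Domain-B: 1.0 for 'real', 0.0 for 'fake'
def discriminator_b(x):
    return 1.0 if x >= 10.0 else 0.0

# least squares (L2) loss for a single value
def mse(expected, actual):
    return (expected - actual) ** 2

# absolute (L1) loss for a single value
def mae(expected, actual):
    return abs(expected - actual)

real_a, real_b = 1.0, 12.0
# adversarial: Domain-A -> Generator-B -> Discriminator-B -> [real/fake]
fake_b = generator_b(real_a)
loss_adv = mse(1.0, discriminator_b(fake_b))
# identity: Domain-B -> Generator-B -> Domain-B
loss_id = mae(real_b, generator_b(real_b))
# forward cycle: Domain-A -> Generator-B -> Generator-A -> Domain-A
loss_fwd = mae(real_a, generator_a(fake_b))
# backward cycle: Domain-B -> Generator-A -> Generator-B -> Domain-B
loss_bwd = mae(real_b, generator_b(generator_a(real_b)))
print(loss_adv, loss_id, loss_fwd, loss_bwd)
```

Replacing either generator with a function that is not the inverse of the other makes the cycle losses non-zero, which is the signal the composite model uses to update the generator.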

A complete example of creating all of the models is listed below.

    # example of defining composite models for training cyclegan generators
    from keras.optimizers import Adam
    from keras.models import Model
    from keras.models import Sequential
    from keras.models import Input
    from keras.layers import Conv2D
    from keras.layers import Conv2DTranspose
    from keras.layers import Activation
    from keras.layers import LeakyReLU
    from keras.initializers import RandomNormal
    from keras.layers import Concatenate
    from keras_contrib.layers.normalization.instancenormalization import InstanceNormalization
    from keras.utils.vis_utils import plot_model

    # define the discriminator model
    def define_discriminator(image_shape):
        # weight initialization
        init = RandomNormal(stddev=0.02)
        # source image input
        in_image = Input(shape=image_shape)
        # C64
        d = Conv2D(64, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(in_image)
        d = LeakyReLU(alpha=0.2)(d)
        # C128
        d = Conv2D(128, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d)
        d = InstanceNormalization(axis=-1)(d)
        d = LeakyReLU(alpha=0.2)(d)
        # C256
        d = Conv2D(256, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d)
        d = InstanceNormalization(axis=-1)(d)
        d = LeakyReLU(alpha=0.2)(d)
        # C512
        d = Conv2D(512, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d)
        d = InstanceNormalization(axis=-1)(d)
        d = LeakyReLU(alpha=0.2)(d)
        # second last output layer
        d = Conv2D(512, (4,4), padding='same', kernel_initializer=init)(d)
        d = InstanceNormalization(axis=-1)(d)
        d = LeakyReLU(alpha=0.2)(d)
        # patch output
        patch_out = Conv2D(1, (4,4), padding='same', kernel_initializer=init)(d)
        # define model
        model = Model(in_image, patch_out)
        # compile model
        model.compile(loss='mse', optimizer=Adam(lr=0.0002, beta_1=0.5), loss_weights=[0.5])
        return model

    # generator a resnet block
    def resnet_block(n_filters, input_layer):
        # weight initialization
        init = RandomNormal(stddev=0.02)
        # first layer convolutional layer
        g = Conv2D(n_filters, (3,3), padding='same', kernel_initializer=init)(input_layer)
        g = InstanceNormalization(axis=-1)(g)
        g = Activation('relu')(g)
        # second convolutional layer
        g = Conv2D(n_filters, (3,3), padding='same', kernel_initializer=init)(g)
        g = InstanceNormalization(axis=-1)(g)
        # concatenate merge channel-wise with input layer
        g = Concatenate()([g, input_layer])
        return g

    # define the standalone generator model
    def define_generator(image_shape, n_resnet=9):
        # weight initialization
        init = RandomNormal(stddev=0.02)
        # image input
        in_image = Input(shape=image_shape)
        # c7s1-64
        g = Conv2D(64, (7,7), padding='same', kernel_initializer=init)(in_image)
        g = InstanceNormalization(axis=-1)(g)
        g = Activation('relu')(g)
        # d128
        g = Conv2D(128, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g)
        g = InstanceNormalization(axis=-1)(g)
        g = Activation('relu')(g)
        # d256
        g = Conv2D(256, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g)
        g = InstanceNormalization(axis=-1)(g)
        g = Activation('relu')(g)
        # R256
        for _ in range(n_resnet):
            g = resnet_block(256, g)
        # u128
        g = Conv2DTranspose(128, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g)
        g = InstanceNormalization(axis=-1)(g)
        g = Activation('relu')(g)
        # u64
        g = Conv2DTranspose(64, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(g)
        g = InstanceNormalization(axis=-1)(g)
        g = Activation('relu')(g)
        # c7s1-3
        g = Conv2D(3, (7,7), padding='same', kernel_initializer=init)(g)
        g = InstanceNormalization(axis=-1)(g)
        out_image = Activation('tanh')(g)
        # define model
        model = Model(in_image, out_image)
        return model

    # define a composite model for updating generators by adversarial and cycle loss
    def define_composite_model(g_model_1, d_model, g_model_2, image_shape):
        # ensure the model we're updating is trainable
        g_model_1.trainable = True
        # mark discriminator as not trainable
        d_model.trainable = False
        # mark other generator model as not trainable
        g_model_2.trainable = False
        # discriminator element
        input_gen = Input(shape=image_shape)
        gen1_out = g_model_1(input_gen)
        output_d = d_model(gen1_out)
        # identity element
        input_id = Input(shape=image_shape)
        output_id = g_model_1(input_id)
        # forward cycle
        output_f = g_model_2(gen1_out)
        # backward cycle
        gen2_out = g_model_2(input_id)
        output_b = g_model_1(gen2_out)
        # define model graph
        model = Model([input_gen, input_id], [output_d, output_id, output_f, output_b])
        # define optimization algorithm configuration
        opt = Adam(lr=0.0002, beta_1=0.5)
        # compile model with weighting of least squares loss and L1 loss
        model.compile(loss=['mse', 'mae', 'mae', 'mae'], loss_weights=[1, 5, 10, 10], optimizer=opt)
        return model

    # input shape
    image_shape = (256,256,3)
    # generator: A -> B
    g_model_AtoB = define_generator(image_shape)
    # generator: B -> A
    g_model_BtoA = define_generator(image_shape)
    # discriminator: A -> [real/fake]
    d_model_A = define_discriminator(image_shape)
    # discriminator: B -> [real/fake]
    d_model_B = define_discriminator(image_shape)
    # composite: A -> B -> [real/fake, A]
    c_model_AtoB = define_composite_model(g_model_AtoB, d_model_B, g_model_BtoA, image_shape)
    # composite: B -> A -> [real/fake, B]
    c_model_BtoA = define_composite_model(g_model_BtoA, d_model_A, g_model_AtoB, image_shape)

## How to Update Discriminator and Generator Models

Training the defined models is relatively straightforward.

First, we must define a helper function that will select a batch of real images and the associated target (1.0).

    # select a batch of random samples, returns images and target
    def generate_real_samples(dataset, n_samples, patch_shape):
        # choose random instances
        ix = randint(0, dataset.shape[0], n_samples)
        # retrieve selected images
        X = dataset[ix]
        # generate 'real' class labels (1)
        y = ones((n_samples, patch_shape, patch_shape, 1))
        return X, y

Similarly, we need a function to generate a batch of fake images and the associated target (0.0).

    # generate a batch of images, returns images and targets
    def generate_fake_samples(g_model, dataset, patch_shape):
        # generate fake instance
        X = g_model.predict(dataset)
        # create 'fake' class labels (0)
        y = zeros((len(X), patch_shape, patch_shape, 1))
        return X, y

Now, we can define the steps of a single training iteration. We will model the order of updates on the OptimizeParameters() function in the official Torch implementation (**Note**: the official code uses a more confusing inverted naming convention).

- Update Generator-B (A->B)
- Update Discriminator-B
- Update Generator-A (B->A)
- Update Discriminator-A

First, we must select a batch of real images by calling *generate_real_samples()* for both Domain-A and Domain-B.

Typically, the batch size (*n_batch*) is set to 1. In this case, we will assume 256×256 input images, which means the *n_patch* for the PatchGAN discriminator will be 16.

    ...
    # select a batch of real samples
    X_realA, y_realA = generate_real_samples(trainA, n_batch, n_patch)
    X_realB, y_realB = generate_real_samples(trainB, n_batch, n_patch)
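The patch size of 16 for 256×256 inputs can be checked by hand. The sketch below is an illustration in plain Python, assuming the discriminator defined earlier (four convolutions with a 2×2 stride followed by two with a 1×1 stride, all with 'same' padding); the helper name *conv_same_output()* is hypothetical. The *train()* function later reads the same value from *d_model_A.output_shape[1]* instead of calculating it.

```python
# illustrative calculation of the PatchGAN output size for a square input
from math import ceil

# size of the output of one convolution with 'same' padding
def conv_same_output(size, stride):
    return ceil(size / stride)

# strides of the six convolutional layers in the discriminator:
# C64, C128, C256 and C512 use a stride of 2, the last two layers use a stride of 1
strides = [2, 2, 2, 2, 1, 1]
size = 256
for stride in strides:
    size = conv_same_output(size, stride)
print(size)
```

Each stride of 2 halves the width and height, so four such layers reduce 256 to 256 / 2^4 = 16, and the two remaining layers leave the size unchanged.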

Next, we can use the batches of selected real images to generate corresponding batches of generated or fake images.

    ...
    # generate a batch of fake samples
    X_fakeA, y_fakeA = generate_fake_samples(g_model_BtoA, X_realB, n_patch)
    X_fakeB, y_fakeB = generate_fake_samples(g_model_AtoB, X_realA, n_patch)

The paper describes using a pool of previously generated images from which examples are randomly selected and used to update the discriminator model, where the pool size was set to 50 images.

… [we] update the discriminators using a history of generated images rather than the ones produced by the latest generators. We keep an image buffer that stores the 50 previously created images.

— Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, 2017.

This can be implemented using a list for each domain and a function that populates the pool, then randomly replaces elements in the pool once it is at capacity.

The *update_image_pool()* function below implements this based on the official Torch implementation in image_pool.lua.

    # update image pool for fake images
    def update_image_pool(pool, images, max_size=50):
        selected = list()
        for image in images:
            if len(pool) < max_size:
                # stock the pool
                pool.append(image)
                selected.append(image)
            elif random() < 0.5:
                # use image, but don't add it to the pool
                selected.append(image)
            else:
                # replace an existing image and use replaced image
                ix = randint(0, len(pool))
                selected.append(pool[ix])
                pool[ix] = image
        return asarray(selected)
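The behavior of the pool is easier to see with a small simulation. The sketch below repeats the same logic using only the Python standard library so that it runs on its own; the name *update_pool()* is hypothetical, integers stand in for generated images, and a plain list is returned instead of a NumPy array.

```python
# standalone demonstration of the image pool using only the standard library
from random import random, randrange, seed

# same logic as update_image_pool(), but returning a plain list
def update_pool(pool, images, max_size=50):
    selected = list()
    for image in images:
        if len(pool) < max_size:
            # stock the pool
            pool.append(image)
            selected.append(image)
        elif random() < 0.5:
            # use image, but don't add it to the pool
            selected.append(image)
        else:
            # replace an existing image and use replaced image
            ix = randrange(len(pool))
            selected.append(pool[ix])
            pool[ix] = image
    return selected

# fix the random seed so the run is repeatable
seed(1)
pool = list()
n_from_history = 0
# one 'image' per step, as with a batch size of 1
for step in range(200):
    out = update_pool(pool, [step])
    # count the steps where an older image was returned instead of the new one
    if out[0] != step:
        n_from_history += 1
print('pool size: %d' % len(pool))
print('served from history: %d of 200' % n_from_history)
```

The first 50 images fill the pool and are returned unchanged. After that, roughly half of the calls return an older image from the pool, which is what exposes the discriminator to a history of generated images.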

We can then update our image pool with generated fake images, the results of which can be used to train the discriminator models.

    ...
    # update fakes from pool
    X_fakeA = update_image_pool(poolA, X_fakeA)
    X_fakeB = update_image_pool(poolB, X_fakeB)

Next, we can update Generator-A.

The *train_on_batch()* function returns the weighted sum of the losses as the first value, followed by one value for each of the four outputs. We are only interested in the weighted sum, which is the value used to update the model weights.

    ...
    # update generator B->A via adversarial and cycle loss
    g_loss2, _, _, _, _ = c_model_BtoA.train_on_batch([X_realB, X_realA], [y_realA, X_realA, X_realB, X_realA])

We can then update the discriminator model using the fake images that may or may not have come from the image pool.

    ...
    # update discriminator for A -> [real/fake]
    dA_loss1 = d_model_A.train_on_batch(X_realA, y_realA)
    dA_loss2 = d_model_A.train_on_batch(X_fakeA, y_fakeA)

We can then do the same for the other generator and discriminator models.

    ...
    # update generator A->B via adversarial and cycle loss
    g_loss1, _, _, _, _ = c_model_AtoB.train_on_batch([X_realA, X_realB], [y_realB, X_realB, X_realA, X_realB])
    # update discriminator for B -> [real/fake]
    dB_loss1 = d_model_B.train_on_batch(X_realB, y_realB)
    dB_loss2 = d_model_B.train_on_batch(X_fakeB, y_fakeB)

At the end of each training iteration, we can then report the current loss for the discriminator models on real and fake images and for each generator model.

    ...
    # summarize performance
    print('>%d, dA[%.3f,%.3f] dB[%.3f,%.3f] g[%.3f,%.3f]' % (i+1, dA_loss1,dA_loss2, dB_loss1,dB_loss2, g_loss1,g_loss2))

Tying this all together, we can define a function named *train()* that takes an instance of each of the defined models and a loaded dataset (list of two NumPy arrays, one for each domain) and trains the model.

A batch size of 1 is used as is described in the paper and the models are fit for 100 training epochs.

    # train cyclegan models
    def train(d_model_A, d_model_B, g_model_AtoB, g_model_BtoA, c_model_AtoB, c_model_BtoA, dataset):
        # define properties of the training run
        n_epochs, n_batch = 100, 1
        # determine the output square shape of the discriminator
        n_patch = d_model_A.output_shape[1]
        # unpack dataset
        trainA, trainB = dataset
        # prepare image pool for fakes
        poolA, poolB = list(), list()
        # calculate the number of batches per training epoch
        bat_per_epo = int(len(trainA) / n_batch)
        # calculate the number of training iterations
        n_steps = bat_per_epo * n_epochs
        # manually enumerate epochs
        for i in range(n_steps):
            # select a batch of real samples
            X_realA, y_realA = generate_real_samples(trainA, n_batch, n_patch)
            X_realB, y_realB = generate_real_samples(trainB, n_batch, n_patch)
            # generate a batch of fake samples
            X_fakeA, y_fakeA = generate_fake_samples(g_model_BtoA, X_realB, n_patch)
            X_fakeB, y_fakeB = generate_fake_samples(g_model_AtoB, X_realA, n_patch)
            # update fakes from pool
            X_fakeA = update_image_pool(poolA, X_fakeA)
            X_fakeB = update_image_pool(poolB, X_fakeB)
            # update generator B->A via adversarial and cycle loss
            g_loss2, _, _, _, _ = c_model_BtoA.train_on_batch([X_realB, X_realA], [y_realA, X_realA, X_realB, X_realA])
            # update discriminator for A -> [real/fake]
            dA_loss1 = d_model_A.train_on_batch(X_realA, y_realA)
            dA_loss2 = d_model_A.train_on_batch(X_fakeA, y_fakeA)
            # update generator A->B via adversarial and cycle loss
            g_loss1, _, _, _, _ = c_model_AtoB.train_on_batch([X_realA, X_realB], [y_realB, X_realB, X_realA, X_realB])
            # update discriminator for B -> [real/fake]
            dB_loss1 = d_model_B.train_on_batch(X_realB, y_realB)
            dB_loss2 = d_model_B.train_on_batch(X_fakeB, y_fakeB)
            # summarize performance
            print('>%d, dA[%.3f,%.3f] dB[%.3f,%.3f] g[%.3f,%.3f]' % (i+1, dA_loss1,dA_loss2, dB_loss1,dB_loss2, g_loss1,g_loss2))

The train function can then be called directly with our defined models and loaded dataset.

    ...
    # load a dataset as a list of two numpy arrays
    dataset = ...
    # train models
    train(d_model_A, d_model_B, g_model_AtoB, g_model_BtoA, c_model_AtoB, c_model_BtoA, dataset)

As an improvement, it may be desirable to combine the update to each discriminator model into a single operation as is performed in the fDx_basic() function of the official implementation.

Additionally, the paper describes updating the models for another 100 epochs (200 in total), where the learning rate is decayed to 0.0. This too can be added as a minor extension to the training process.
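One possible reading of this schedule is sketched below: the learning rate is held constant for the first 100 epochs and then reduced linearly so that it reaches 0.0 after 200 epochs. The function name *linear_decay_lr()* and its arguments are assumptions made for illustration, and how the value is applied to the optimizer at the start of each epoch is left open.

```python
# illustrative linear learning rate decay schedule
def linear_decay_lr(epoch, base_lr=0.0002, n_constant=100, n_decay=100):
    # hold the learning rate fixed for the first n_constant epochs
    if epoch < n_constant:
        return base_lr
    # then reduce it linearly so that it reaches 0.0 after n_decay more epochs
    remaining = n_constant + n_decay - epoch
    return base_lr * max(0.0, remaining / n_decay)

# report the learning rate at a few points in the training run
for epoch in [0, 99, 100, 150, 199, 200]:
    print('epoch %d: %.6f' % (epoch, linear_decay_lr(epoch)))
```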

## Further Reading

This section provides more resources on the topic if you are looking to go deeper.

### Papers

- Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, 2017.
- Perceptual Losses for Real-Time Style Transfer and Super-Resolution, 2016.
- Image-to-Image Translation with Conditional Adversarial Networks, 2016.
- Least Squares Generative Adversarial Networks, 2016.
- Precomputed Real-time Texture Synthesis With Markovian Generative Adversarial Networks, 2016.
- Instance Normalization: The Missing Ingredient for Fast Stylization, 2016.
- Layer Normalization, 2016.

### Projects

- CycleGAN Project (official), GitHub.
- pytorch-CycleGAN-and-pix2pix (official), GitHub.
- CycleGAN Project Page (official).
- Keras-GAN: Keras implementations of Generative Adversarial Networks.
- CycleGAN-Keras: Keras implementation of CycleGAN using a tensorflow backend.
- cyclegan-keras: keras implementation of cycle-gan.

## Summary

In this tutorial, you discovered how to implement the CycleGAN architecture from scratch using the Keras deep learning framework.

Specifically, you learned:

- How to implement the discriminator and generator models.
- How to define composite models to train the generator models via adversarial and cycle loss.
- How to implement the training process to update model weights each training iteration.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

The post How to Implement CycleGAN Models From Scratch With Keras appeared first on Machine Learning Mastery.

## How to Implement Pix2Pix GAN Models From Scratch With Keras

The Pix2Pix GAN is a generator model for performing image-to-image translation trained on paired examples.

For example, the model can be used to translate images of daytime to nighttime, or from sketches of products like shoes to photographs of products.

The benefit of the Pix2Pix model is that compared to other GANs for conditional image generation, it is relatively simple and capable of generating large high-quality images across a variety of image translation tasks.

The model is very impressive but has an architecture that appears somewhat complicated to implement for beginners.

In this tutorial, you will discover how to implement the Pix2Pix GAN architecture from scratch using the Keras deep learning framework.

After completing this tutorial, you will know:

- How to develop the PatchGAN discriminator model for the Pix2Pix GAN.
- How to develop the U-Net encoder-decoder generator model for the Pix2Pix GAN.
- How to implement the composite model for updating the generator and how to train both models.

Let’s get started.

## Tutorial Overview

This tutorial is divided into five parts; they are:

- What Is the Pix2Pix GAN?
- How to Implement the PatchGAN Discriminator Model
- How to Implement the U-Net Generator Model
- How to Implement Adversarial and L1 Loss
- How to Update Model Weights

## What Is the Pix2Pix GAN?

Pix2Pix is a Generative Adversarial Network, or GAN, model designed for general purpose image-to-image translation.

The approach was presented by Phillip Isola, et al. in their 2016 paper titled “Image-to-Image Translation with Conditional Adversarial Networks” and presented at CVPR in 2017.

The GAN architecture is comprised of a generator model for outputting new plausible synthetic images and a discriminator model that classifies images as real (from the dataset) or fake (generated). The discriminator model is updated directly, whereas the generator model is updated via the discriminator model. As such, the two models are trained simultaneously in an adversarial process where the generator seeks to better fool the discriminator and the discriminator seeks to better identify the counterfeit images.

The Pix2Pix model is a type of conditional GAN, or cGAN, where the generation of the output image is conditional on an input, in this case, a source image. The discriminator is provided both with a source image and the target image and must determine whether the target is a plausible transformation of the source image.

Again, the discriminator model is updated directly, and the generator model is updated via the discriminator model, although the loss function is updated. The generator is trained via adversarial loss, which encourages the generator to generate plausible images in the target domain. The generator is also updated via L1 loss measured between the generated image and the expected output image. This additional loss encourages the generator model to create plausible translations of the source image.

The Pix2Pix GAN has been demonstrated on a range of image-to-image translation tasks such as converting maps to satellite photographs, black and white photographs to color, and sketches of products to product photographs.

Now that we are familiar with the Pix2Pix GAN, let’s explore how we can implement it using the Keras deep learning library.

## How to Implement the PatchGAN Discriminator Model

The discriminator model in the Pix2Pix GAN is implemented as a PatchGAN.

The PatchGAN is designed based on the size of the receptive field, sometimes called the effective receptive field. The receptive field is the relationship between one output activation of the model and an area of the input image (strictly a volume, as it extends through the input channels).

A PatchGAN with the size 70×70 is used, which means that the output (or each output) of the model maps to a 70×70 square of the input image. In effect, a 70×70 PatchGAN will classify 70×70 patches of the input image as real or fake.

… we design a discriminator architecture – which we term a PatchGAN – that only penalizes structure at the scale of patches. This discriminator tries to classify if each NxN patch in an image is real or fake. We run this discriminator convolutionally across the image, averaging all responses to provide the ultimate output of D.

— Image-to-Image Translation with Conditional Adversarial Networks, 2016.

Before we dive into the configuration details of the PatchGAN, it is important to get a handle on the calculation of the receptive field.

The receptive field is not the size of the output of the discriminator model, e.g. it does not refer to the shape of the activation map output by the model. It describes the relationship between one pixel in the output activation map and the region of the input image that influences it. The output of the model may be a single value or a square activation map of values that predict whether each patch of the input image is real or fake.

Traditionally, the receptive field refers to the size of the activation map of a single convolutional layer with regards to the input of the layer, the size of the filter, and the size of the stride. The effective receptive field generalizes this idea and calculates the receptive field for the output of a stack of convolutional layers with regard to the raw image input. The terms are often used interchangeably.

The authors of the Pix2Pix GAN provide a Matlab script to calculate the effective receptive field size for different model configurations in a script called receptive_field_sizes.m. It can be helpful to work through an example for the 70×70 PatchGAN receptive field calculation.

The 70×70 PatchGAN has a fixed number of three layers (excluding the output and second last layers), regardless of the size of the input image. The receptive field in one dimension is calculated as:

- receptive field = (output size – 1) * stride + kernel size

Where output size is the size of the prior layer's activation map, stride is the number of pixels the filter is moved when applied to the activation, and kernel size is the size of the filter to be applied.

The PatchGAN uses a fixed stride of 2×2 (except in the output and second last layers) and a fixed kernel size of 4×4. We can, therefore, calculate the receptive field size starting with one pixel in the output of the model and working backward to the input image.

We can develop a Python function called *receptive_field()* to calculate the receptive field, then calculate and print the receptive field for each layer in the Pix2Pix PatchGAN model. The complete example is listed below.

    # example of calculating the receptive field for the PatchGAN

    # calculate the effective receptive field size
    def receptive_field(output_size, kernel_size, stride_size):
        return (output_size - 1) * stride_size + kernel_size

    # output layer 1x1 pixel with 4x4 kernel and 1x1 stride
    rf = receptive_field(1, 4, 1)
    print(rf)
    # second last layer with 4x4 kernel and 1x1 stride
    rf = receptive_field(rf, 4, 1)
    print(rf)
    # 3 PatchGAN layers with 4x4 kernel and 2x2 stride
    rf = receptive_field(rf, 4, 2)
    print(rf)
    rf = receptive_field(rf, 4, 2)
    print(rf)
    rf = receptive_field(rf, 4, 2)
    print(rf)

Running the example prints the size of the receptive field for each layer in the model from the output layer to the input layer.

We can see that each 1×1 pixel in the output layer maps to a 70×70 receptive field in the input layer.

    4
    7
    16
    34
    70

The authors of the Pix2Pix paper explore different PatchGAN configurations, including a 1×1 receptive field called a PixelGAN and a receptive field that matches the 256×256 pixel images input to the model (resampled to 286×286) called an ImageGAN. They found that the 70×70 PatchGAN resulted in the best trade-off of performance and image quality.

The 70×70 PatchGAN […] achieves slightly better scores. Scaling beyond this, to the full 286×286 ImageGAN, does not appear to improve the visual quality of the results.

— Image-to-Image Translation with Conditional Adversarial Networks, 2016.
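The different configurations can be compared using the same backward calculation. The sketch below assumes that the variants differ only in the number of stride-2 layers placed before the two stride-1 layers. The helper name *patchgan_receptive_field()* is hypothetical and is not taken from the paper or its code.

```python
# sketch: receptive field for PatchGAN variants with different numbers of stride-2 layers

# calculate the effective receptive field size, as in the earlier example
def receptive_field(output_size, kernel_size, stride_size):
    return (output_size - 1) * stride_size + kernel_size

# hypothetical helper: work backward from one output pixel to the input image
def patchgan_receptive_field(n_strided_layers, kernel_size=4):
    rf = 1
    # output layer and second last layer both use a 1x1 stride
    for _ in range(2):
        rf = receptive_field(rf, kernel_size, 1)
    # the remaining layers use a 2x2 stride
    for _ in range(n_strided_layers):
        rf = receptive_field(rf, kernel_size, 2)
    return rf

# compare a few depths
for n_layers in [1, 3, 5]:
    print(n_layers, patchgan_receptive_field(n_layers))
```

With three stride-2 layers the result is 70, and with five it is 286, which matches the 70×70 PatchGAN and 286×286 ImageGAN sizes mentioned above.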

The configuration for the PatchGAN is provided in the appendix of the paper and can be confirmed by reviewing the defineD_n_layers() function in the official Torch implementation.

The model takes two images as input, specifically a source and a target image. These images are concatenated together at the channel level, e.g. 3 color channels of each image become 6 channels of the input.
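The channel-wise concatenation can be illustrated without Keras. The sketch below is a pure Python stand-in for the *Concatenate* layer that uses tiny 4×4 nested lists in place of 256×256 images. The helper name *concatenate_channels()* and the pixel values are made up for illustration.

```python
# sketch: channel-wise concatenation of a source and a target image

# hypothetical helper: images are nested lists indexed as [row][column][channel]
def concatenate_channels(image_a, image_b):
    merged = []
    for row_a, row_b in zip(image_a, image_b):
        # joining two pixel lists appends the channels of b after the channels of a
        merged.append([pixel_a + pixel_b for pixel_a, pixel_b in zip(row_a, row_b)])
    return merged

# two tiny 4x4 images with 3 color channels each (illustrative values)
source = [[[0.1, 0.2, 0.3] for _ in range(4)] for _ in range(4)]
target = [[[0.7, 0.8, 0.9] for _ in range(4)] for _ in range(4)]
merged = concatenate_channels(source, target)
# rows, columns, channels
print(len(merged), len(merged[0]), len(merged[0][0]))
```

The height and width are unchanged, and the channel dimension grows from 3 to 6.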

Let Ck denote a Convolution-BatchNorm-ReLU layer with k filters. […] All convolutions are 4× 4 spatial filters applied with stride 2. […] The 70 × 70 discriminator architecture is: C64-C128-C256-C512. After the last layer, a convolution is applied to map to a 1-dimensional output, followed by a Sigmoid function. As an exception to the above notation, BatchNorm is not applied to the first C64 layer. All ReLUs are leaky, with slope 0.2.

— Image-to-Image Translation with Conditional Adversarial Networks, 2016.

The PatchGAN configuration is defined using a shorthand notation as: C64-C128-C256-C512, where C refers to a block of Convolution-BatchNorm-LeakyReLU layers and the number indicates the number of filters. Batch normalization is not used in the first layer. As mentioned, the kernel size is fixed at 4×4 and a stride of 2×2 is used on all but the last 2 layers of the model. The slope of the LeakyReLU is set to 0.2, and a sigmoid activation function is used in the output layer.

Random jitter was applied by resizing the 256×256 input images to 286 × 286 and then randomly cropping back to size 256 × 256. Weights were initialized from a Gaussian distribution with mean 0 and standard deviation 0.02.

— Image-to-Image Translation with Conditional Adversarial Networks, 2016.

Model weights were initialized via random Gaussian with a mean of 0.0 and standard deviation of 0.02. Images input to the model are 256×256.

… we divide the objective by 2 while optimizing D, which slows down the rate at which D learns relative to G. We use minibatch SGD and apply the Adam solver, with a learning rate of 0.0002, and momentum parameters β1 = 0.5, β2 = 0.999.

— Image-to-Image Translation with Conditional Adversarial Networks, 2016.

The model is trained with a batch size of one image and the Adam version of stochastic gradient descent is used with a small learning rate and modest momentum. The loss for the discriminator is weighted by 50% for each model update.

Tying this all together, we can define a function named *define_discriminator()* that creates the 70×70 PatchGAN discriminator model.

The complete example of defining the model is listed below.

    # example of defining a 70x70 patchgan discriminator model
    from keras.optimizers import Adam
    from keras.initializers import RandomNormal
    from keras.models import Model
    from keras.models import Input
    from keras.layers import Conv2D
    from keras.layers import LeakyReLU
    from keras.layers import Activation
    from keras.layers import Concatenate
    from keras.layers import BatchNormalization
    from keras.utils.vis_utils import plot_model

    # define the discriminator model
    def define_discriminator(image_shape):
        # weight initialization
        init = RandomNormal(stddev=0.02)
        # source image input
        in_src_image = Input(shape=image_shape)
        # target image input
        in_target_image = Input(shape=image_shape)
        # concatenate images channel-wise
        merged = Concatenate()([in_src_image, in_target_image])
        # C64
        d = Conv2D(64, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(merged)
        d = LeakyReLU(alpha=0.2)(d)
        # C128
        d = Conv2D(128, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d)
        d = BatchNormalization()(d)
        d = LeakyReLU(alpha=0.2)(d)
        # C256
        d = Conv2D(256, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d)
        d = BatchNormalization()(d)
        d = LeakyReLU(alpha=0.2)(d)
        # C512
        d = Conv2D(512, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d)
        d = BatchNormalization()(d)
        d = LeakyReLU(alpha=0.2)(d)
        # second last output layer
        d = Conv2D(512, (4,4), padding='same', kernel_initializer=init)(d)
        d = BatchNormalization()(d)
        d = LeakyReLU(alpha=0.2)(d)
        # patch output
        d = Conv2D(1, (4,4), padding='same', kernel_initializer=init)(d)
        patch_out = Activation('sigmoid')(d)
        # define model
        model = Model([in_src_image, in_target_image], patch_out)
        # compile model
        opt = Adam(lr=0.0002, beta_1=0.5)
        model.compile(loss='binary_crossentropy', optimizer=opt, loss_weights=[0.5])
        return model

    # define image shape
    image_shape = (256,256,3)
    # create the model
    model = define_discriminator(image_shape)
    # summarize the model
    model.summary()
    # plot the model
    plot_model(model, to_file='discriminator_model_plot.png', show_shapes=True, show_layer_names=True)

Running the example first summarizes the model, providing insight into how the input shape is transformed across the layers and the number of parameters in the model.

We can see that the two input images are concatenated together to create one 256×256×6 input to the first hidden convolutional layer. This concatenation of input images could occur before the input layer of the model, but allowing the model to perform the concatenation makes the behavior of the model clearer.

We can see that the model output will be an activation map with the size 16×16 pixels or activations and a single channel, with each value in the map corresponding to a 70×70 pixel patch of the input 256×256 image. If the input image was half the size at 128×128, then the output feature map would also be halved to 8×8.

The model is a binary classification model, meaning it predicts an output as a probability in the range [0,1], in this case, the likelihood that the input image is real (from the target dataset) rather than fake (generated). The patch of values can be averaged to give a single real/fake prediction by the model. When trained, the output activation map is compared to a matrix of target values, 0 for fake and 1 for real.

    __________________________________________________________________________________________________
    Layer (type)                    Output Shape         Param #     Connected to
    ==================================================================================================
    input_1 (InputLayer)            (None, 256, 256, 3)  0
    __________________________________________________________________________________________________
    input_2 (InputLayer)            (None, 256, 256, 3)  0
    __________________________________________________________________________________________________
    concatenate_1 (Concatenate)     (None, 256, 256, 6)  0           input_1[0][0]
                                                                     input_2[0][0]
    __________________________________________________________________________________________________
    conv2d_1 (Conv2D)               (None, 128, 128, 64) 6208        concatenate_1[0][0]
    __________________________________________________________________________________________________
    leaky_re_lu_1 (LeakyReLU)       (None, 128, 128, 64) 0           conv2d_1[0][0]
    __________________________________________________________________________________________________
    conv2d_2 (Conv2D)               (None, 64, 64, 128)  131200      leaky_re_lu_1[0][0]
    __________________________________________________________________________________________________
    batch_normalization_1 (BatchNor (None, 64, 64, 128)  512         conv2d_2[0][0]
    __________________________________________________________________________________________________
    leaky_re_lu_2 (LeakyReLU)       (None, 64, 64, 128)  0           batch_normalization_1[0][0]
    __________________________________________________________________________________________________
    conv2d_3 (Conv2D)               (None, 32, 32, 256)  524544      leaky_re_lu_2[0][0]
    __________________________________________________________________________________________________
    batch_normalization_2 (BatchNor (None, 32, 32, 256)  1024        conv2d_3[0][0]
    __________________________________________________________________________________________________
    leaky_re_lu_3 (LeakyReLU)       (None, 32, 32, 256)  0           batch_normalization_2[0][0]
    __________________________________________________________________________________________________
    conv2d_4 (Conv2D)               (None, 16, 16, 512)  2097664     leaky_re_lu_3[0][0]
    __________________________________________________________________________________________________
    batch_normalization_3 (BatchNor (None, 16, 16, 512)  2048        conv2d_4[0][0]
    __________________________________________________________________________________________________
    leaky_re_lu_4 (LeakyReLU)       (None, 16, 16, 512)  0           batch_normalization_3[0][0]
    __________________________________________________________________________________________________
    conv2d_5 (Conv2D)               (None, 16, 16, 512)  4194816     leaky_re_lu_4[0][0]
    __________________________________________________________________________________________________
    batch_normalization_4 (BatchNor (None, 16, 16, 512)  2048        conv2d_5[0][0]
    __________________________________________________________________________________________________
    leaky_re_lu_5 (LeakyReLU)       (None, 16, 16, 512)  0           batch_normalization_4[0][0]
    __________________________________________________________________________________________________
    conv2d_6 (Conv2D)               (None, 16, 16, 1)    8193        leaky_re_lu_5[0][0]
    __________________________________________________________________________________________________
    activation_1 (Activation)       (None, 16, 16, 1)    0           conv2d_6[0][0]
    ==================================================================================================
    Total params: 6,968,257
    Trainable params: 6,965,441
    Non-trainable params: 2,816
    __________________________________________________________________________________________________
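The output size and parameter totals in the summary can be reproduced by hand. The sketch below assumes the layer configuration used in *define_discriminator()*, that 'same' padding gives an output size of ceil(input / stride), and that each batch normalization layer holds four values per channel, two of which are not trainable. The helper names are hypothetical.

```python
# sketch: reproduce the discriminator output size and parameter counts by hand
from math import ceil

# weights of one kernel x kernel x in_channels block per filter, plus one bias per filter
def conv2d_params(kernel, in_channels, filters):
    return kernel * kernel * in_channels * filters + filters

# gamma and beta are trainable, the moving mean and variance are not
def batchnorm_params(channels):
    return 4 * channels

# (filters, stride, uses batch norm) for each Conv2D layer in define_discriminator()
LAYERS = [(64, 2, False), (128, 2, True), (256, 2, True),
          (512, 2, True), (512, 1, True), (1, 1, False)]

# hypothetical helper: walk the layers and track size and parameter counts
def summarize_discriminator(input_size=256, input_channels=6):
    size, channels = input_size, input_channels
    total, non_trainable = 0, 0
    for filters, stride, batchnorm in LAYERS:
        # 'same' padding: the output size depends only on the stride
        size = ceil(size / stride)
        total += conv2d_params(4, channels, filters)
        if batchnorm:
            total += batchnorm_params(filters)
            non_trainable += 2 * filters
        channels = filters
    return size, total, non_trainable

for input_size in [256, 128]:
    size, total, non_trainable = summarize_discriminator(input_size)
    print('%dx%d input: %dx%d output, %d params, %d non-trainable' % (
        input_size, input_size, size, size, total, non_trainable))
```

The parameter counts do not depend on the input size, only the size of the output activation map does.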

A plot of the model is created showing much the same information in a graphical form. The model is not complex, with a linear path with two input images and a single output prediction.

**Note**: creating the plot assumes that pydot and pygraphviz libraries are installed. If this is a problem, you can comment out the import and call to the *plot_model()* function.
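The patch averaging and target matrices described earlier can be sketched without Keras. The example below uses nested lists in place of the 16×16 activation map and a hand-written binary cross-entropy. The helper names and the predicted value of 0.9 are made up for illustration.

```python
# sketch: target matrices and averaging for a 16x16 patch output
from math import log

# hypothetical helper: a patch_size x patch_size matrix filled with the class label
def make_patch_targets(label, patch_size=16):
    return [[float(label)] * patch_size for _ in range(patch_size)]

# average all patch predictions into a single real/fake score
def mean_patch(patch):
    values = [value for row in patch for value in row]
    return sum(values) / len(values)

# binary cross-entropy averaged over every value in the patch
def binary_crossentropy(targets, predictions, eps=1e-7):
    total, count = 0.0, 0
    for target_row, predicted_row in zip(targets, predictions):
        for t, p in zip(target_row, predicted_row):
            # clip to avoid log(0)
            p = min(max(p, eps), 1.0 - eps)
            total += -(t * log(p) + (1.0 - t) * log(1.0 - p))
            count += 1
    return total / count

# illustrative discriminator output: every patch is judged 90% likely to be real
predicted = [[0.9] * 16 for _ in range(16)]
score = mean_patch(predicted)
loss_if_real = binary_crossentropy(make_patch_targets(1), predicted)
loss_if_fake = binary_crossentropy(make_patch_targets(0), predicted)
print(score, loss_if_real, loss_if_fake)
```

The loss is small when the target matrix is all ones and large when it is all zeros, as expected for a confident "real" prediction.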

Now that we know how to implement the PatchGAN discriminator model, we can look at implementing the U-Net generator model.

## How to Implement the U-Net Generator Model

The generator model for the Pix2Pix GAN is implemented as a U-Net.

The U-Net model is an encoder-decoder model for image translation where skip connections are used to connect layers in the encoder with corresponding layers in the decoder that have the same sized feature maps.

The encoder part of the model is comprised of convolutional layers that use a 2×2 stride to downsample the input source image down to a bottleneck layer. The decoder part of the model reads the bottleneck output and uses transpose convolutional layers to upsample to the required output image size.

… the input is passed through a series of layers that progressively downsample, until a bottleneck layer, at which point the process is reversed.

— Image-to-Image Translation with Conditional Adversarial Networks, 2016.

Skip connections are added between the layers with the same sized feature maps so that the first downsampling layer is connected with the last upsampling layer, the second downsampling layer is connected with the second last upsampling layer, and so on. The connections concatenate the channels of the feature map in the downsampling layer with the feature map in the upsampling layer.

Specifically, we add skip connections between each layer i and layer n − i, where n is the total number of layers. Each skip connection simply concatenates all channels at layer i with those at layer n − i.

— Image-to-Image Translation with Conditional Adversarial Networks, 2016.
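The pairing of layers can be checked with a little arithmetic. The sketch below assumes a 256×256 input, seven encoder blocks followed by a bottleneck, and seven decoder blocks, with every layer halving or doubling the feature map size. The names e1 to e7 and d1 to d7 are labels chosen for illustration.

```python
# sketch: which encoder block each decoder block is connected to

# feature map sizes after each stride-2 encoder block
def encoder_sizes(input_size=256, n_blocks=7):
    sizes, size = [], input_size
    for _ in range(n_blocks):
        size //= 2
        sizes.append(size)
    return sizes

enc = encoder_sizes()
# the bottleneck downsamples one more time
bottleneck_size = enc[-1] // 2

pairs = []
size = bottleneck_size
n_blocks = len(enc)
for j in range(1, n_blocks + 1):
    # each decoder block doubles the feature map size
    size *= 2
    # decoder block j is paired with encoder block n_blocks + 1 - j
    i = n_blocks + 1 - j
    pairs.append(('d%d' % j, 'e%d' % i, size, enc[i - 1]))

for decoder_name, encoder_name, decoder_size, encoder_size in pairs:
    print('%s (%dx%d) <- %s (%dx%d)' % (decoder_name, decoder_size, decoder_size,
                                        encoder_name, encoder_size, encoder_size))
```

Each decoder block is paired with the encoder block that has the same feature map size, which is what allows the channels to be concatenated.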

Unlike traditional generator models in the GAN architecture, the U-Net generator does not take a point from the latent space as input. Instead, dropout layers are used as a source of randomness both during training and when the model is used to make a prediction, e.g. generate an image at inference time.

Similarly, batch normalization is used in the same way during training and inference, meaning that statistics are calculated for each batch and not fixed at the end of the training process. This is referred to as instance normalization, specifically when the batch size is set to 1 as it is with the Pix2Pix model.

At inference time, we run the generator net in exactly the same manner as during the training phase. This differs from the usual protocol in that we apply dropout at test time, and we apply batch normalization using the statistics of the test batch, rather than aggregated statistics of the training batch.

— Image-to-Image Translation with Conditional Adversarial Networks, 2016.
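The effect of using batch statistics with a batch size of one can be sketched in plain Python. The example below normalizes a single channel, ignores the learned scale and shift, and uses an epsilon of 0.001. The pixel values and the helper name *normalize_channel()* are made up for illustration.

```python
# sketch: batch statistics with a batch of one image depend only on that image
from math import sqrt

# hypothetical helper: normalize all values of one channel using their own mean and variance
def normalize_channel(values, eps=1e-3):
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / sqrt(variance + eps) for v in values]

# two single-channel images flattened to lists of pixel values (illustrative)
image_a = [0.0, 2.0, 4.0, 6.0]
image_b = [10.0, 12.0, 14.0, 16.0]

# batch size of one: statistics come from image_a alone
batch_of_one = normalize_channel(image_a)
# batch size of two: statistics are shared across both images
batch_of_two = normalize_channel(image_a + image_b)[:len(image_a)]

print(batch_of_one)
print(batch_of_two)
```

With a batch of one, the result for an image does not change with whatever else happens to be in the batch, which is the behavior of instance normalization.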

In Keras, layers like Dropout and BatchNormalization operate differently during training and in inference mode. We can set the “*training*” argument when calling these layers to “True” to ensure that they always operate in training mode, even when used during inference.

For example, a Dropout layer that will drop out during inference as well as training can be added to the model as follows:

    ...
    g = Dropout(0.5)(g, training=True)
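To make the difference between the two modes concrete, the sketch below is a hand-written stand-in for dropout, not the Keras implementation. It assumes the common "inverted dropout" formulation in which kept activations are scaled by 1 / (1 - rate). The function name and the seed are chosen for illustration.

```python
# sketch: dropout behaves as an identity at inference unless training mode is forced
from random import Random

# hypothetical stand-in for a dropout layer applied to a list of activations
def dropout(values, rate, training, rng):
    if not training:
        # inference mode: activations pass through unchanged
        return list(values)
    keep = 1.0 - rate
    # training mode: zero each activation with probability rate, scale the rest
    return [v / keep if rng.random() >= rate else 0.0 for v in values]

rng = Random(1)
activations = [1.0] * 8
inference_out = dropout(activations, 0.5, training=False, rng=rng)
forced_out_1 = dropout(activations, 0.5, training=True, rng=rng)
forced_out_2 = dropout(activations, 0.5, training=True, rng=rng)
print(inference_out)
print(forced_out_1)
print(forced_out_2)
```

With training mode forced on, repeated calls on the same input can produce different outputs, which is the source of randomness the generator relies on.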

As with the discriminator model, the configuration details of the generator model are defined in the appendix of the paper and can be confirmed when comparing against the defineG_unet() function in the official Torch implementation.

The encoder uses blocks of Convolution-BatchNorm-LeakyReLU like the discriminator model, whereas the decoder model uses blocks of Convolution-BatchNorm-Dropout-ReLU with a dropout rate of 50%. All convolutional layers use a filter size of 4×4 and a stride of 2×2.

Let Ck denote a Convolution-BatchNorm-ReLU layer with k filters. CDk denotes a Convolution-BatchNorm-Dropout-ReLU layer with a dropout rate of 50%. All convolutions are 4×4 spatial filters applied with stride 2.

— Image-to-Image Translation with Conditional Adversarial Networks, 2016.

The architecture of the U-Net model is defined using the shorthand notation as:

- **Encoder**: C64-C128-C256-C512-C512-C512-C512-C512
- **Decoder**: CD512-CD1024-CD1024-C1024-C1024-C512-C256-C128

The last layer of the encoder is the bottleneck layer, which does not use batch normalization, according to an amendment to the paper and confirmation in the code, and uses a ReLU activation instead of LeakyReLU.

… the activations of the bottleneck layer are zeroed by the batchnorm operation, effectively making the innermost layer skipped. This issue can be fixed by removing batchnorm from this layer, as has been done in the public code

— Image-to-Image Translation with Conditional Adversarial Networks, 2016.

The number of filters in the U-Net decoder is a little misleading as it is the number of filters for the layer after concatenation with the equivalent layer in the encoder. This may become more clear when we create a plot of the model.

The output of the model uses a single convolutional layer with three channels, and a tanh activation function is used in the output layer, common to GAN generator models. Batch normalization is not used in the first layer of the encoder.

After the last layer in the decoder, a convolution is applied to map to the number of output channels (3 in general […]), followed by a Tanh function […] BatchNorm is not applied to the first C64 layer in the encoder. All ReLUs in the encoder are leaky, with slope 0.2, while ReLUs in the decoder are not leaky.

— Image-to-Image Translation with Conditional Adversarial Networks, 2016.

Tying this all together, we can define a function named *define_generator()* that defines the U-Net encoder-decoder generator model. Two helper functions are also provided for defining encoder blocks of layers and decoder blocks of layers.

The complete example of defining the model is listed below.

    # example of defining a u-net encoder-decoder generator model
    from keras.initializers import RandomNormal
    from keras.models import Model
    from keras.models import Input
    from keras.layers import Conv2D
    from keras.layers import Conv2DTranspose
    from keras.layers import LeakyReLU
    from keras.layers import Activation
    from keras.layers import Concatenate
    from keras.layers import Dropout
    from keras.layers import BatchNormalization
    from keras.utils.vis_utils import plot_model

    # define an encoder block
    def define_encoder_block(layer_in, n_filters, batchnorm=True):
        # weight initialization
        init = RandomNormal(stddev=0.02)
        # add downsampling layer
        g = Conv2D(n_filters, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(layer_in)
        # conditionally add batch normalization
        if batchnorm:
            g = BatchNormalization()(g, training=True)
        # leaky relu activation
        g = LeakyReLU(alpha=0.2)(g)
        return g

    # define a decoder block
    def decoder_block(layer_in, skip_in, n_filters, dropout=True):
        # weight initialization
        init = RandomNormal(stddev=0.02)
        # add upsampling layer
        g = Conv2DTranspose(n_filters, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(layer_in)
        # add batch normalization
        g = BatchNormalization()(g, training=True)
        # conditionally add dropout
        if dropout:
            g = Dropout(0.5)(g, training=True)
        # merge with skip connection
        g = Concatenate()([g, skip_in])
        # relu activation
        g = Activation('relu')(g)
        return g

    # define the standalone generator model
    def define_generator(image_shape=(256,256,3)):
        # weight initialization
        init = RandomNormal(stddev=0.02)
        # image input
        in_image = Input(shape=image_shape)
        # encoder model: C64-C128-C256-C512-C512-C512-C512-C512
        e1 = define_encoder_block(in_image, 64, batchnorm=False)
        e2 = define_encoder_block(e1, 128)
        e3 = define_encoder_block(e2, 256)
        e4 = define_encoder_block(e3, 512)
        e5 = define_encoder_block(e4, 512)
        e6 = define_encoder_block(e5, 512)
        e7 = define_encoder_block(e6, 512)
        # bottleneck, no batch norm and relu
        b = Conv2D(512, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(e7)
        b = Activation('relu')(b)
        # decoder model: CD512-CD1024-CD1024-C1024-C1024-C512-C256-C128
        d1 = decoder_block(b, e7, 512)
        d2 = decoder_block(d1, e6, 512)
        d3 = decoder_block(d2, e5, 512)
        d4 = decoder_block(d3, e4, 512, dropout=False)
        d5 = decoder_block(d4, e3, 256, dropout=False)
        d6 = decoder_block(d5, e2, 128, dropout=False)
        d7 = decoder_block(d6, e1, 64, dropout=False)
        # output
        g = Conv2DTranspose(3, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d7)
        out_image = Activation('tanh')(g)
        # define model
        model = Model(in_image, out_image)
        return model

    # define image shape
    image_shape = (256,256,3)
    # create the model
    model = define_generator(image_shape)
    # summarize the model
    model.summary()
    # plot the model
    plot_model(model, to_file='generator_model_plot.png', show_shapes=True, show_layer_names=True)

Running the example first summarizes the model.

The model has a single input and output, but the skip connections make the summary difficult to read.

    __________________________________________________________________________________________________
    Layer (type)                    Output Shape         Param #     Connected to
    ==================================================================================================
    input_1 (InputLayer)            (None, 256, 256, 3)  0
    __________________________________________________________________________________________________
    conv2d_1 (Conv2D)               (None, 128, 128, 64) 3136        input_1[0][0]
    __________________________________________________________________________________________________
    leaky_re_lu_1 (LeakyReLU)       (None, 128, 128, 64) 0           conv2d_1[0][0]
    __________________________________________________________________________________________________
    conv2d_2 (Conv2D)               (None, 64, 64, 128)  131200      leaky_re_lu_1[0][0]
    __________________________________________________________________________________________________
    batch_normalization_1 (BatchNor (None, 64, 64, 128)  512         conv2d_2[0][0]
    __________________________________________________________________________________________________
    leaky_re_lu_2 (LeakyReLU)       (None, 64, 64, 128)  0           batch_normalization_1[0][0]
    __________________________________________________________________________________________________
    conv2d_3 (Conv2D)               (None, 32, 32, 256)  524544      leaky_re_lu_2[0][0]
    __________________________________________________________________________________________________
    batch_normalization_2 (BatchNor (None, 32, 32, 256)  1024        conv2d_3[0][0]
    __________________________________________________________________________________________________
    leaky_re_lu_3 (LeakyReLU)       (None, 32, 32, 256)  0           batch_normalization_2[0][0]
    __________________________________________________________________________________________________
    conv2d_4 (Conv2D)               (None, 16, 16, 512)  2097664     leaky_re_lu_3[0][0]
    __________________________________________________________________________________________________
    batch_normalization_3 (BatchNor (None, 16, 16, 512)  2048        conv2d_4[0][0]
    __________________________________________________________________________________________________
    leaky_re_lu_4 (LeakyReLU)       (None, 16, 16, 512)  0           batch_normalization_3[0][0]
    __________________________________________________________________________________________________
    conv2d_5 (Conv2D)               (None, 8, 8, 512)    4194816     leaky_re_lu_4[0][0]
    __________________________________________________________________________________________________
    batch_normalization_4 (BatchNor (None, 8, 8, 512)    2048        conv2d_5[0][0]
    __________________________________________________________________________________________________
    leaky_re_lu_5 (LeakyReLU)       (None, 8, 8, 512)    0           batch_normalization_4[0][0]
    __________________________________________________________________________________________________
    conv2d_6 (Conv2D)               (None, 4, 4, 512)    4194816     leaky_re_lu_5[0][0]
    __________________________________________________________________________________________________
    batch_normalization_5 (BatchNor (None, 4, 4, 512)    2048        conv2d_6[0][0]
    __________________________________________________________________________________________________
    leaky_re_lu_6 (LeakyReLU)       (None, 4, 4, 512)    0           batch_normalization_5[0][0]
    __________________________________________________________________________________________________
    conv2d_7 (Conv2D)               (None, 2, 2, 512)    4194816     leaky_re_lu_6[0][0]
    __________________________________________________________________________________________________
    batch_normalization_6 (BatchNor (None, 2, 2, 512)    2048        conv2d_7[0][0]
    __________________________________________________________________________________________________
    leaky_re_lu_7 (LeakyReLU)       (None, 2, 2, 512)    0           batch_normalization_6[0][0]
    __________________________________________________________________________________________________
    conv2d_8 (Conv2D)               (None, 1, 1, 512)    4194816     leaky_re_lu_7[0][0]
    __________________________________________________________________________________________________
    activation_1 (Activation)       (None, 1, 1, 512)    0           conv2d_8[0][0]
    __________________________________________________________________________________________________
    conv2d_transpose_1 (Conv2DTrans (None, 2, 2, 512)    4194816     activation_1[0][0]
    __________________________________________________________________________________________________
    batch_normalization_7 (BatchNor (None, 2, 2, 512)    2048        conv2d_transpose_1[0][0]
    __________________________________________________________________________________________________
    dropout_1 (Dropout)             (None, 2, 2, 512)    0           batch_normalization_7[0][0]
    __________________________________________________________________________________________________
    concatenate_1 (Concatenate)     (None, 2, 2, 1024)   0           dropout_1[0][0]
                                                                     leaky_re_lu_7[0][0]
    __________________________________________________________________________________________________
    activation_2 (Activation)       (None, 2, 2, 1024)   0           concatenate_1[0][0]
    __________________________________________________________________________________________________
    conv2d_transpose_2 (Conv2DTrans (None, 4, 4, 512)    8389120     activation_2[0][0]
    __________________________________________________________________________________________________
    batch_normalization_8 (BatchNor (None, 4, 4, 512)    2048        conv2d_transpose_2[0][0]
    __________________________________________________________________________________________________
    dropout_2 (Dropout)             (None, 4, 4, 512)    0           batch_normalization_8[0][0]
    __________________________________________________________________________________________________
    concatenate_2 (Concatenate)     (None, 4, 4, 1024)   0           dropout_2[0][0]
                                                                     leaky_re_lu_6[0][0]
    __________________________________________________________________________________________________
    activation_3 (Activation)       (None, 4, 4, 1024)   0           concatenate_2[0][0]
    __________________________________________________________________________________________________
    conv2d_transpose_3 (Conv2DTrans (None, 8, 8, 512)    8389120     activation_3[0][0]
    __________________________________________________________________________________________________
    batch_normalization_9 (BatchNor (None, 8, 8, 512)    2048        conv2d_transpose_3[0][0]
    __________________________________________________________________________________________________
    dropout_3 (Dropout)             (None, 8, 8, 512)    0           batch_normalization_9[0][0]
    __________________________________________________________________________________________________
    concatenate_3 (Concatenate)     (None, 8, 8, 1024)   0           dropout_3[0][0]
                                                                     leaky_re_lu_5[0][0]
    __________________________________________________________________________________________________
    activation_4 (Activation)       (None, 8, 8, 1024)   0           concatenate_3[0][0]
    __________________________________________________________________________________________________
    conv2d_transpose_4 (Conv2DTrans (None, 16, 16, 512)  8389120     activation_4[0][0]
    __________________________________________________________________________________________________
    batch_normalization_10 (BatchNo (None, 16, 16, 512)  2048        conv2d_transpose_4[0][0]
    __________________________________________________________________________________________________
    concatenate_4 (Concatenate)     (None, 16, 16, 1024) 0           batch_normalization_10[0][0]
                                                                     leaky_re_lu_4[0][0]
    __________________________________________________________________________________________________
    activation_5 (Activation)       (None, 16, 16, 1024) 0           concatenate_4[0][0]
    __________________________________________________________________________________________________
    conv2d_transpose_5 (Conv2DTrans (None, 32, 32, 256)  4194560     activation_5[0][0]
    __________________________________________________________________________________________________
    batch_normalization_11 (BatchNo (None, 32, 32, 256)  1024        conv2d_transpose_5[0][0]
    __________________________________________________________________________________________________
    concatenate_5 (Concatenate)     (None, 32, 32, 512)  0           batch_normalization_11[0][0]
                                                                     leaky_re_lu_3[0][0]
    __________________________________________________________________________________________________
    activation_6 (Activation)       (None, 32, 32, 512)  0           concatenate_5[0][0]
    __________________________________________________________________________________________________
    conv2d_transpose_6 (Conv2DTrans (None, 64, 64, 128)  1048704     activation_6[0][0]
    __________________________________________________________________________________________________
    batch_normalization_12 (BatchNo (None, 64, 64, 128)  512         conv2d_transpose_6[0][0]
    __________________________________________________________________________________________________
    concatenate_6 (Concatenate)     (None, 64, 64, 256)  0           batch_normalization_12[0][0]
                                                                     leaky_re_lu_2[0][0]
    __________________________________________________________________________________________________
    activation_7 (Activation)       (None, 64, 64, 256)  0           concatenate_6[0][0]
    __________________________________________________________________________________________________
    conv2d_transpose_7 (Conv2DTrans (None, 128, 128, 64) 262208      activation_7[0][0]
    __________________________________________________________________________________________________
    batch_normalization_13 (BatchNo (None, 128, 128, 64) 256         conv2d_transpose_7[0][0]
    __________________________________________________________________________________________________
    concatenate_7 (Concatenate)     (None, 128, 128, 128 0           batch_normalization_13[0][0]
                                                                     leaky_re_lu_1[0][0]
    __________________________________________________________________________________________________
    activation_8 (Activation)       (None, 128, 128, 128 0           concatenate_7[0][0]
    __________________________________________________________________________________________________
    conv2d_transpose_8 (Conv2DTrans (None, 256, 256, 3)  6147        activation_8[0][0]
    __________________________________________________________________________________________________
    activation_9 (Activation)       (None, 256, 256, 3)  0           conv2d_transpose_8[0][0]
    ==================================================================================================
    Total params: 54,429,315
    Trainable params: 54,419,459
    Non-trainable params: 9,856
    __________________________________________________________________________________________________

A plot of the model is created showing much the same information in a graphical form. The model is complex, and the plot helps to understand the skip connections and their impact on the number of filters in the decoder.

**Note**: creating the plot assumes that pydot and pygraphviz libraries are installed. If this is a problem, you can comment out the import and call to the *plot_model()* function.

Working backward from the output layer, if we look at the Concatenate layers and the first Conv2DTranspose layer of the decoder, we can see the number of channels as:

- [128, 256, 512, 1024, 1024, 1024, 1024, 512].

Reversing this list gives the stated configuration of the number of filters for each layer in the decoder from the paper of:

- CD512-CD1024-CD1024-C1024-C1024-C512-C256-C128
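The same list can be derived from the filter counts alone. The sketch below assumes the filter counts passed to the encoder and decoder blocks in *define_generator()*, and that each skip connection adds the channels of the paired encoder block to the output of the decoder block. The variable names are chosen for illustration.

```python
# sketch: derive the decoder channel counts in the paper's notation

# filters for encoder blocks e1..e7 (the bottleneck is not used as a skip connection)
encoder_filters = [64, 128, 256, 512, 512, 512, 512]
# filters for the Conv2DTranspose layer in decoder blocks d1..d7
decoder_filters = [512, 512, 512, 512, 256, 128, 64]

# d1 is concatenated with e7, d2 with e6, and so on
skip_filters = list(reversed(encoder_filters))
concat_channels = [d + s for d, s in zip(decoder_filters, skip_filters)]

# the notation lists the first decoder convolution, then each concatenated feature map
notation = [decoder_filters[0]] + concat_channels
print(concat_channels)
print(notation)
```

Each entry after the first is therefore the sum of the decoder filters and the filters of the matching encoder block, rather than a filter count that is set directly.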

Now that we have defined both models, we can look at how the generator model is updated via the discriminator model.

## How to Implement Adversarial and L1 Loss

The discriminator model can be updated directly, whereas the generator model must be updated via the discriminator model.

This can be achieved by defining a new composite model in Keras that connects the output of the generator model as input to the discriminator model. The discriminator model can then predict whether a generated image is real or fake. We can update the weights of the composite model in such a way that the generated image has the label of “*real*” instead of “*fake*”, which will cause the generator weights to be updated towards generating a better fake image. We can also mark the discriminator weights as not trainable in this context, to avoid the misleading update.

Additionally, the generator needs to be updated to better match the targeted translation of the input image. This means that the composite model must also output the generated image directly, allowing it to be compared to the target image.

Therefore, we can summarize the inputs and outputs of this composite model as follows:

- **Inputs**: Source image.
- **Outputs**: Classification of real/fake, generated target image.

The weights of the generator will be updated via both adversarial loss via the discriminator output and L1 loss via the direct image output. The loss scores are added together, where the L1 loss is treated as a regularizing term and weighted via a hyperparameter called *lambda*, set to 100.

- loss = adversarial loss + lambda * L1 loss

The *define_gan()* function below implements this, taking the defined generator and discriminator models as input and creating the composite GAN model that can be used to update the generator model weights.

The source image input is provided both to the generator and the discriminator as input and the output of the generator is also connected to the discriminator as input.

Two loss functions are specified when the model is compiled for the discriminator and generator outputs respectively. The *loss_weights* argument is used to define the weighting of each loss when added together to update the generator model weights.

```python
# define the combined generator and discriminator model, for updating the generator
def define_gan(g_model, d_model, image_shape):
    # make weights in the discriminator not trainable
    d_model.trainable = False
    # define the source image
    in_src = Input(shape=image_shape)
    # connect the source image to the generator input
    gen_out = g_model(in_src)
    # connect the source input and generator output to the discriminator input
    dis_out = d_model([in_src, gen_out])
    # src image as input, generated image and classification output
    model = Model(in_src, [dis_out, gen_out])
    # compile model
    opt = Adam(lr=0.0002, beta_1=0.5)
    model.compile(loss=['binary_crossentropy', 'mae'], optimizer=opt, loss_weights=[1,100])
    return model
```
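To make the weighting concrete, the combined loss can be worked through by hand. The following is a minimal plain-Python sketch, not the Keras implementation: the helper functions and the tiny lists of patch predictions and pixel values are hypothetical and exist only to show how the weights 1 and 100 combine the two loss terms.

```python
from math import log

# mean binary cross-entropy over a flat list of patch predictions
def binary_crossentropy(y_true, y_pred, eps=1e-7):
    total = 0.0
    for t, p in zip(y_true, y_pred):
        # clip to avoid log(0)
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * log(p) + (1.0 - t) * log(1.0 - p))
    return total / len(y_true)

# mean absolute error (L1 loss) over a flat list of pixel values
def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# loss = adversarial loss + lambda * L1 loss
def composite_loss(adv_loss, l1_loss, lam=100.0):
    return 1.0 * adv_loss + lam * l1_loss

# hypothetical discriminator outputs for a generated image, target is 'real' (1.0)
patch_pred = [0.9, 0.6, 0.8, 0.7]
patch_target = [1.0, 1.0, 1.0, 1.0]
# hypothetical generated and target pixel values in [-1,1]
gen_pixels = [0.10, -0.20, 0.55, 0.90]
target_pixels = [0.00, -0.25, 0.50, 1.00]

adv = binary_crossentropy(patch_target, patch_pred)
l1 = mean_absolute_error(target_pixels, gen_pixels)
total = composite_loss(adv, l1)
print('adversarial=%.3f, l1=%.3f, total=%.3f' % (adv, l1, total))
```

With a weight of 100, even a small per-pixel error contributes heavily to the total, which is why the L1 term strongly pulls the generator toward the target image.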

Tying this together with the model definitions from the previous sections, the complete example is listed below.

```python
# example of defining a composite model for training the generator model
from keras.optimizers import Adam
from keras.initializers import RandomNormal
from keras.models import Model
from keras.layers import Input
from keras.layers import Conv2D
from keras.layers import Conv2DTranspose
from keras.layers import LeakyReLU
from keras.layers import Activation
from keras.layers import Concatenate
from keras.layers import Dropout
from keras.layers import BatchNormalization
from keras.utils.vis_utils import plot_model

# define the discriminator model
def define_discriminator(image_shape):
    # weight initialization
    init = RandomNormal(stddev=0.02)
    # source image input
    in_src_image = Input(shape=image_shape)
    # target image input
    in_target_image = Input(shape=image_shape)
    # concatenate images channel-wise
    merged = Concatenate()([in_src_image, in_target_image])
    # C64
    d = Conv2D(64, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(merged)
    d = LeakyReLU(alpha=0.2)(d)
    # C128
    d = Conv2D(128, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d)
    d = BatchNormalization()(d)
    d = LeakyReLU(alpha=0.2)(d)
    # C256
    d = Conv2D(256, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d)
    d = BatchNormalization()(d)
    d = LeakyReLU(alpha=0.2)(d)
    # C512
    d = Conv2D(512, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d)
    d = BatchNormalization()(d)
    d = LeakyReLU(alpha=0.2)(d)
    # second last output layer
    d = Conv2D(512, (4,4), padding='same', kernel_initializer=init)(d)
    d = BatchNormalization()(d)
    d = LeakyReLU(alpha=0.2)(d)
    # patch output
    d = Conv2D(1, (4,4), padding='same', kernel_initializer=init)(d)
    patch_out = Activation('sigmoid')(d)
    # define model
    model = Model([in_src_image, in_target_image], patch_out)
    # compile model
    opt = Adam(lr=0.0002, beta_1=0.5)
    model.compile(loss='binary_crossentropy', optimizer=opt, loss_weights=[0.5])
    return model

# define an encoder block
def define_encoder_block(layer_in, n_filters, batchnorm=True):
    # weight initialization
    init = RandomNormal(stddev=0.02)
    # add downsampling layer
    g = Conv2D(n_filters, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(layer_in)
    # conditionally add batch normalization
    if batchnorm:
        g = BatchNormalization()(g, training=True)
    # leaky relu activation
    g = LeakyReLU(alpha=0.2)(g)
    return g

# define a decoder block
def decoder_block(layer_in, skip_in, n_filters, dropout=True):
    # weight initialization
    init = RandomNormal(stddev=0.02)
    # add upsampling layer
    g = Conv2DTranspose(n_filters, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(layer_in)
    # add batch normalization
    g = BatchNormalization()(g, training=True)
    # conditionally add dropout
    if dropout:
        g = Dropout(0.5)(g, training=True)
    # merge with skip connection
    g = Concatenate()([g, skip_in])
    # relu activation
    g = Activation('relu')(g)
    return g

# define the standalone generator model
def define_generator(image_shape=(256,256,3)):
    # weight initialization
    init = RandomNormal(stddev=0.02)
    # image input
    in_image = Input(shape=image_shape)
    # encoder model: C64-C128-C256-C512-C512-C512-C512-C512
    e1 = define_encoder_block(in_image, 64, batchnorm=False)
    e2 = define_encoder_block(e1, 128)
    e3 = define_encoder_block(e2, 256)
    e4 = define_encoder_block(e3, 512)
    e5 = define_encoder_block(e4, 512)
    e6 = define_encoder_block(e5, 512)
    e7 = define_encoder_block(e6, 512)
    # bottleneck, no batch norm and relu
    b = Conv2D(512, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(e7)
    b = Activation('relu')(b)
    # decoder model: CD512-CD1024-CD1024-C1024-C1024-C512-C256-C128
    d1 = decoder_block(b, e7, 512)
    d2 = decoder_block(d1, e6, 512)
    d3 = decoder_block(d2, e5, 512)
    d4 = decoder_block(d3, e4, 512, dropout=False)
    d5 = decoder_block(d4, e3, 256, dropout=False)
    d6 = decoder_block(d5, e2, 128, dropout=False)
    d7 = decoder_block(d6, e1, 64, dropout=False)
    # output
    g = Conv2DTranspose(3, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d7)
    out_image = Activation('tanh')(g)
    # define model
    model = Model(in_image, out_image)
    return model

# define the combined generator and discriminator model, for updating the generator
def define_gan(g_model, d_model, image_shape):
    # make weights in the discriminator not trainable
    d_model.trainable = False
    # define the source image
    in_src = Input(shape=image_shape)
    # connect the source image to the generator input
    gen_out = g_model(in_src)
    # connect the source input and generator output to the discriminator input
    dis_out = d_model([in_src, gen_out])
    # src image as input, generated image and classification output
    model = Model(in_src, [dis_out, gen_out])
    # compile model
    opt = Adam(lr=0.0002, beta_1=0.5)
    model.compile(loss=['binary_crossentropy', 'mae'], optimizer=opt, loss_weights=[1,100])
    return model

# define image shape
image_shape = (256,256,3)
# define the models
d_model = define_discriminator(image_shape)
g_model = define_generator(image_shape)
# define the composite model
gan_model = define_gan(g_model, d_model, image_shape)
# summarize the model
gan_model.summary()
# plot the model
plot_model(gan_model, to_file='gan_model_plot.png', show_shapes=True, show_layer_names=True)
```

Running the example first summarizes the composite model, showing the 256×256 image input, the same shaped output from *model_2* (the generator) and the PatchGAN classification prediction from *model_1* (the discriminator).

```
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_4 (InputLayer)            (None, 256, 256, 3)  0
__________________________________________________________________________________________________
model_2 (Model)                 (None, 256, 256, 3)  54429315    input_4[0][0]
__________________________________________________________________________________________________
model_1 (Model)                 (None, 16, 16, 1)    6968257     input_4[0][0]
                                                                 model_2[1][0]
==================================================================================================
Total params: 61,397,572
Trainable params: 54,419,459
Non-trainable params: 6,978,113
__________________________________________________________________________________________________
```

A plot of the composite model is also created, showing how the input image flows into the generator and discriminator, and that the model has two outputs or end-points from each of the two models.

**Note**: creating the plot assumes that pydot and pygraphviz libraries are installed. If this is a problem, you can comment out the import and call to the *plot_model()* function.

## How to Update Model Weights

Training the defined models is relatively straightforward.

First, we must define a helper function that will select a batch of real source and target images and the associated output (1.0). Here, the dataset is a list of two arrays of images.

```python
from numpy import ones
from numpy.random import randint

# select a batch of random samples, returns images and target
def generate_real_samples(dataset, n_samples, patch_shape):
    # unpack dataset
    trainA, trainB = dataset
    # choose random instances
    ix = randint(0, trainA.shape[0], n_samples)
    # retrieve selected images
    X1, X2 = trainA[ix], trainB[ix]
    # generate 'real' class labels (1)
    y = ones((n_samples, patch_shape, patch_shape, 1))
    return [X1, X2], y
```

Similarly, we need a function to generate a batch of fake images and the associated output (0.0). Here, the samples are an array of source images for which target images will be generated.

```python
from numpy import zeros

# generate a batch of images, returns images and targets
def generate_fake_samples(g_model, samples, patch_shape):
    # generate fake instance
    X = g_model.predict(samples)
    # create 'fake' class labels (0)
    y = zeros((len(X), patch_shape, patch_shape, 1))
    return X, y
```

Now, we can define the steps of a single training iteration.

First, we must select a batch of source and target images by calling *generate_real_samples()*.

Typically, the batch size (*n_batch*) is set to 1. In this case, we will assume 256×256 input images, which means the *n_patch* for the PatchGAN discriminator will be 16 to indicate a 16×16 output feature map.

```python
...
# select a batch of real samples
[X_realA, X_realB], y_real = generate_real_samples(dataset, n_batch, n_patch)
```
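The value of 16 for *n_patch* follows from the layers of the discriminator. As a rough check, the sketch below assumes the discriminator defined earlier (four stride-2 convolutions followed by two stride-1 convolutions, all with 'same' padding) and the standard rule that a 'same'-padded convolution produces an output of size ceil(input / stride). The function names are hypothetical.

```python
from math import ceil

# output size of a convolution with padding='same'
def same_conv_output_size(size, stride):
    return ceil(size / stride)

# pass an image width through the assumed discriminator strides
def patch_output_size(image_size, strides=(2, 2, 2, 2, 1, 1)):
    size = image_size
    for stride in strides:
        size = same_conv_output_size(size, stride)
    return size

n_patch = patch_output_size(256)
print(n_patch)  # 16
```

If the input image size or the number of downsampling layers is changed, *n_patch* must be recalculated in the same way so that the label arrays match the discriminator output.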

Next, we can use the batches of selected real source images to generate corresponding batches of generated or fake target images.

```python
...
# generate a batch of fake samples
X_fakeB, y_fake = generate_fake_samples(g_model, X_realA, n_patch)
```

We can then use the real and fake images, as well as their targets, to update the standalone discriminator model.

```python
...
# update discriminator for real samples
d_loss1 = d_model.train_on_batch([X_realA, X_realB], y_real)
# update discriminator for generated samples
d_loss2 = d_model.train_on_batch([X_realA, X_fakeB], y_fake)
```

So far, this is normal for updating a GAN in Keras.

Next, we can update the generator model via adversarial loss and L1 loss. Recall that the composite GAN model takes a batch of source images as input and predicts first the classification of real/fake and second the generated target. Here, we provide a target to indicate the generated images are “*real*” (class=1) to the discriminator output of the composite model. The real target images are provided for calculating the L1 loss between them and the generated target images.

We have two loss functions, but three loss values calculated for a batch update, where only the first loss value is of interest as it is the weighted sum of the adversarial and L1 loss values for the batch.

```python
...
# update the generator
g_loss, _, _ = gan_model.train_on_batch(X_realA, [y_real, X_realB])
```

That’s all there is to it.

We can define all of this in a function called *train()* that takes the defined models and a loaded dataset (as a list of two NumPy arrays) and trains the models.

```python
# train pix2pix models
def train(d_model, g_model, gan_model, dataset, n_epochs=100, n_batch=1, n_patch=16):
    # unpack dataset
    trainA, trainB = dataset
    # calculate the number of batches per training epoch
    bat_per_epo = int(len(trainA) / n_batch)
    # calculate the number of training iterations
    n_steps = bat_per_epo * n_epochs
    # manually enumerate epochs
    for i in range(n_steps):
        # select a batch of real samples
        [X_realA, X_realB], y_real = generate_real_samples(dataset, n_batch, n_patch)
        # generate a batch of fake samples
        X_fakeB, y_fake = generate_fake_samples(g_model, X_realA, n_patch)
        # update discriminator for real samples
        d_loss1 = d_model.train_on_batch([X_realA, X_realB], y_real)
        # update discriminator for generated samples
        d_loss2 = d_model.train_on_batch([X_realA, X_fakeB], y_fake)
        # update the generator
        g_loss, _, _ = gan_model.train_on_batch(X_realA, [y_real, X_realB])
        # summarize performance
        print('>%d, d1[%.3f] d2[%.3f] g[%.3f]' % (i+1, d_loss1, d_loss2, g_loss))
```

The train function can then be called directly with our defined models and loaded dataset.

```python
...
# load image data
dataset = ...
# train model
train(d_model, g_model, gan_model, dataset)
```

## Further Reading

This section provides more resources on the topic if you are looking to go deeper.

### Official

- Image-to-Image Translation with Conditional Adversarial Networks, 2016.
- Image-to-Image Translation with Conditional Adversarial Nets, Homepage.
- Image-to-image translation with conditional adversarial nets, GitHub.
- pytorch-CycleGAN-and-pix2pix, GitHub.
- Interactive Image-to-Image Demo, 2017.
- Pix2Pix Datasets

### API

- Keras Datasets API.
- Keras Sequential Model API
- Keras Convolutional Layers API
- How can I “freeze” Keras layers?

### Articles

- A guide to receptive field arithmetic for Convolutional Neural Networks, 2017.
- Question: PatchGAN Discriminator, 2017.
- receptive_field_sizes.m

## Summary

In this tutorial, you discovered how to implement the Pix2Pix GAN architecture from scratch using the Keras deep learning framework.

Specifically, you learned:

- How to develop the PatchGAN discriminator model for the Pix2Pix GAN.
- How to develop the U-Net encoder-decoder generator model for the Pix2Pix GAN.
- How to implement the composite model for updating the generator and how to train both models.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

The post How to Implement Pix2Pix GAN Models From Scratch With Keras appeared first on Machine Learning Mastery.

## How to Develop a Least Squares Generative Adversarial Network (LSGAN) in Keras

The Least Squares Generative Adversarial Network, or LSGAN for short, is an extension to the GAN architecture that addresses the problem of vanishing gradients and loss saturation.

It is motivated by the desire to provide a signal to the generator about fake samples that are far from the discriminator model’s decision boundary for classifying them as real or fake. The further the generated images are from the decision boundary, the larger the error signal provided to the generator, encouraging the generation of more realistic images.

The LSGAN can be implemented with a minor change to the output layer of the discriminator model and the adoption of the least squares, or L2, loss function.

In this tutorial, you will discover how to develop a least squares generative adversarial network.

After completing this tutorial, you will know:

- The LSGAN addresses vanishing gradients and loss saturation of the deep convolutional GAN.
- The LSGAN can be implemented by a mean squared error or L2 loss function for the discriminator model.
- How to implement the LSGAN model for generating handwritten digits for the MNIST dataset.

Let’s get started.

## Tutorial Overview

This tutorial is divided into three parts; they are:

- What Is Least Squares GAN
- How to Develop an LSGAN for MNIST Handwritten Digits
- How to Generate Images With LSGAN

## What Is Least Squares GAN

The standard Generative Adversarial Network, or GAN for short, is an effective architecture for training an unsupervised generative model.

The architecture involves training a discriminator model to tell the difference between real (from the dataset) and fake (generated) images, and using the discriminator, in turn, to train the generator model. The generator is updated in such a way that it is encouraged to generate images that are more likely to fool the discriminator.

The discriminator is a binary classifier and is trained using the binary cross-entropy loss function. A limitation of this loss function is that it is primarily concerned with whether the predictions are correct or not, and less so with how correct or incorrect they might be.

… when we use the fake samples to update the generator by making the discriminator believe they are from real data, it will cause almost no error because they are on the correct side, i.e., the real data side, of the decision boundary

— Least Squares Generative Adversarial Networks, 2016.

This can be conceptualized in two dimensions as a line or decision boundary separating dots that represent real and fake images. The discriminator is responsible for devising the decision boundary to best separate real and fake images and the generator is responsible for creating new points that look like real points, confusing the discriminator.

The choice of cross-entropy loss means that points generated far from the boundary are simply scored as right or wrong, and provide very little gradient information to the generator on how to generate better images.

This small gradient for generated images far from the decision boundary is referred to as a vanishing gradient problem or a loss saturation. The loss function is unable to give a strong signal as to how to best update the model.

The Least Squares Generative Adversarial Network, or LSGAN for short, is an extension to the GAN architecture proposed by Xudong Mao, et al. in their 2016 paper titled “Least Squares Generative Adversarial Networks.” The LSGAN is a modification to the GAN architecture that changes the loss function for the discriminator from binary cross entropy to a least squares loss.

The motivation for this change is that the least squares loss will penalize generated images based on their distance from the decision boundary. This will provide a strong gradient signal for generated images that are very different or far from the existing data and address the problem of saturated loss.

… minimizing the objective function of regular GAN suffers from vanishing gradients, which makes it hard to update the generator. LSGANs can relieve this problem because LSGANs penalize samples based on their distances to the decision boundary, which generates more gradients to update the generator.

— Least Squares Generative Adversarial Networks, 2016.
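The difference in gradient signal can be illustrated numerically. The following is a simplified one-dimensional sketch, not code from the paper: it assumes *z* is the raw (pre-activation) discriminator score for a generated sample, that the generator is updated with a target of 1.0, and it compares the derivative of the sigmoid cross-entropy loss with the derivative of the squared error loss. The function names are invented for this example.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

# generator loss with cross-entropy: -log(sigmoid(z)), derivative is sigmoid(z) - 1
def bce_generator_grad(z):
    return sigmoid(z) - 1.0

# generator loss with least squares: (z - target)^2, derivative is 2 * (z - target)
def ls_generator_grad(z, target=1.0):
    return 2.0 * (z - target)

# scores that lie increasingly far on the 'real' side of the decision boundary
for z in [1.0, 5.0, 10.0]:
    print('z=%.1f bce_grad=%.6f ls_grad=%.1f' % (z, bce_generator_grad(z), ls_generator_grad(z)))
```

As the score moves further onto the real side, the cross-entropy gradient shrinks toward zero, whereas the least squares gradient grows in proportion to the distance from the target.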

This can be conceptualized with a plot, below, taken from the paper, that shows on the left the sigmoid decision boundary (blue) and generated fake points far from the decision boundary (pink), and on the right the least squares decision boundary (red) and the points far from the boundary (pink) given a gradient that moves them closer to the boundary.

In addition to avoiding loss saturation, the LSGAN also results in a more stable training process and the generation of higher quality and larger images than the traditional deep convolutional GAN.

First, LSGANs are able to generate higher quality images than regular GANs. Second, LSGANs perform more stable during the learning process.

— Least Squares Generative Adversarial Networks, 2016.

The LSGAN can be implemented by using the target values of 1.0 for real and 0.0 for fake images and optimizing the model using the mean squared error (MSE) loss function, i.e. the L2 loss. The output layer of the discriminator model must use a linear activation function.
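As a small worked illustration of these targets, the sketch below computes the MSE loss by hand for a few made-up discriminator scores. The scores are hypothetical; because the output activation is linear, they are not confined to the range [0,1].

```python
# mean squared error between targets and predictions
def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# hypothetical linear discriminator outputs
real_scores = [0.9, 1.2, 0.7]
fake_scores = [0.1, -0.3, 0.4]

# discriminator update: real images target 1.0, fake images target 0.0
d_loss_real = mse([1.0, 1.0, 1.0], real_scores)
d_loss_fake = mse([0.0, 0.0, 0.0], fake_scores)
# generator update: fake images are presented with the 'real' target of 1.0
g_loss = mse([1.0, 1.0, 1.0], fake_scores)
print('d_real=%.3f d_fake=%.3f g=%.3f' % (d_loss_real, d_loss_fake, g_loss))
```

Note that the same fake scores give a small discriminator loss but a large generator loss, since the two models are trained against opposite targets.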

The authors propose a generator and discriminator model architecture, inspired by the VGG model architecture, and use interleaving upsampling and normal convolutional layers in the generator model, seen on the left in the image below.

## How to Develop an LSGAN for MNIST Handwritten Digits

In this section, we will develop an LSGAN for the MNIST handwritten digit dataset.

The first step is to define the models.

Both the discriminator and the generator will be based on the Deep Convolutional GAN, or DCGAN, architecture. This involves the use of Convolution-BatchNorm-Activation layer blocks with the use of 2×2 stride for downsampling and transpose convolutional layers for upsampling. LeakyReLU activation layers are used in the discriminator and ReLU activation layers are used in the generator.

The discriminator expects grayscale input images with the shape 28×28, the shape of images in the MNIST dataset, and the output layer is a single node with a linear activation function. The model is optimized using the mean squared error (MSE) loss function as per the LSGAN. The *define_discriminator()* function below defines the discriminator model.

```python
# define the standalone discriminator model
def define_discriminator(in_shape=(28,28,1)):
    # weight initialization
    init = RandomNormal(stddev=0.02)
    # define model
    model = Sequential()
    # downsample to 14x14
    model.add(Conv2D(64, (4,4), strides=(2,2), padding='same', kernel_initializer=init, input_shape=in_shape))
    model.add(BatchNormalization())
    model.add(LeakyReLU(alpha=0.2))
    # downsample to 7x7
    model.add(Conv2D(128, (4,4), strides=(2,2), padding='same', kernel_initializer=init))
    model.add(BatchNormalization())
    model.add(LeakyReLU(alpha=0.2))
    # classifier
    model.add(Flatten())
    model.add(Dense(1, activation='linear', kernel_initializer=init))
    # compile model with L2 loss
    model.compile(loss='mse', optimizer=Adam(lr=0.0002, beta_1=0.5))
    return model
```

The generator model takes a point in latent space as input and outputs a grayscale image with the shape 28×28 pixels, where pixel values are in the range [-1,1] via the tanh activation function on the output layer.

The *define_generator()* function below defines the generator model. This model is not compiled as it is not trained in a standalone manner.

```python
# define the standalone generator model
def define_generator(latent_dim):
    # weight initialization
    init = RandomNormal(stddev=0.02)
    # define model
    model = Sequential()
    # foundation for 7x7 image
    n_nodes = 256 * 7 * 7
    model.add(Dense(n_nodes, kernel_initializer=init, input_dim=latent_dim))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(Reshape((7, 7, 256)))
    # upsample to 14x14
    model.add(Conv2DTranspose(128, (4,4), strides=(2,2), padding='same', kernel_initializer=init))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    # upsample to 28x28
    model.add(Conv2DTranspose(64, (4,4), strides=(2,2), padding='same', kernel_initializer=init))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    # output 28x28x1
    model.add(Conv2D(1, (7,7), padding='same', kernel_initializer=init))
    model.add(Activation('tanh'))
    return model
```
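The shapes in the generator can be checked with simple arithmetic. This is a minimal sketch, assuming that a transpose convolution with 'same' padding multiplies the width and height by its stride; the helper name is hypothetical.

```python
# output size of a transpose convolution with padding='same'
def transpose_conv_same_output_size(size, stride):
    return size * stride

# number of nodes in the Dense layer that is reshaped to 7x7x256
n_nodes = 256 * 7 * 7

# follow the feature map width through the two upsampling layers
size = 7
sizes = [size]
for stride in (2, 2):
    size = transpose_conv_same_output_size(size, stride)
    sizes.append(size)

print(n_nodes)  # 12544
print(sizes)    # [7, 14, 28]
```

The final convolutional layer uses a stride of 1 with 'same' padding, so it keeps the 28×28 size and only reduces the number of channels to one.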

The generator model is updated via the discriminator model. This is achieved by creating a composite model that stacks the generator on top of the discriminator so that error signals can flow back through the discriminator to the generator.

The weights of the discriminator are marked as not trainable when used in this composite model. Updates via the composite model involve using the generator to create new images by providing random points in the latent space as input. The generated images are passed to the discriminator, which will classify them as real or fake. The weights are updated as though the generated images are real (e.g. target of 1.0), allowing the generator to be updated toward generating more realistic images.

The *define_gan()* function defines and compiles the composite model for updating the generator model via the discriminator, again optimized via mean squared error as per the LSGAN.

```python
# define the combined generator and discriminator model, for updating the generator
def define_gan(generator, discriminator):
    # make weights in the discriminator not trainable
    discriminator.trainable = False
    # connect them
    model = Sequential()
    # add generator
    model.add(generator)
    # add the discriminator
    model.add(discriminator)
    # compile model with L2 loss
    model.compile(loss='mse', optimizer=Adam(lr=0.0002, beta_1=0.5))
    return model
```

Next, we can define a function to load the MNIST handwritten digit dataset and scale the pixel values to the range [-1,1] to match the images output by the generator model.

Only the training part of the MNIST dataset is used, which contains 60,000 centered grayscale images of digits zero through nine.

```python
# load mnist images
def load_real_samples():
    # load dataset
    (trainX, _), (_, _) = load_data()
    # expand to 3d, e.g. add channels
    X = expand_dims(trainX, axis=-1)
    # convert from ints to floats
    X = X.astype('float32')
    # scale from [0,255] to [-1,1]
    X = (X - 127.5) / 127.5
    return X
```
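The pixel scaling is easy to verify on single values. The sketch below uses two hypothetical helper functions: the first applies the same formula as *load_real_samples()*, and the second applies the inverse-style mapping from [-1,1] to [0,1] that is useful when plotting generated images.

```python
# scale a pixel value from [0,255] to [-1,1]
def scale_to_generator_range(pixel):
    return (pixel - 127.5) / 127.5

# scale a generated value from [-1,1] to [0,1] for plotting
def scale_to_plot_range(value):
    return (value + 1) / 2.0

for pixel in [0, 127.5, 255]:
    scaled = scale_to_generator_range(pixel)
    print(pixel, scaled, scale_to_plot_range(scaled))
```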

We can then define a function to retrieve a batch of randomly selected images from the training dataset.

The real images are returned with corresponding target values for the discriminator model, e.g. y=1.0, to indicate they are real.

```python
# select real samples
def generate_real_samples(dataset, n_samples):
    # choose random instances
    ix = randint(0, dataset.shape[0], n_samples)
    # select images
    X = dataset[ix]
    # generate class labels
    y = ones((n_samples, 1))
    return X, y
```

Next, we can develop the corresponding functions for the generator.

First, a function for generating random points in the latent space to use as input for generating images via the generator model.

```python
# generate points in latent space as input for the generator
def generate_latent_points(latent_dim, n_samples):
    # generate points in the latent space
    x_input = randn(latent_dim * n_samples)
    # reshape into a batch of inputs for the network
    x_input = x_input.reshape(n_samples, latent_dim)
    return x_input
```

Next, a function that will use the generator model to generate a batch of fake images for updating the discriminator model, along with the target value (y=0) to indicate the images are fake.

```python
# use the generator to generate n fake examples, with class labels
def generate_fake_samples(generator, latent_dim, n_samples):
    # generate points in latent space
    x_input = generate_latent_points(latent_dim, n_samples)
    # predict outputs
    X = generator.predict(x_input)
    # create class labels
    y = zeros((n_samples, 1))
    return X, y
```

We need to use the generator periodically during training to generate images that we can subjectively inspect and use as the basis for choosing a final generator model.

The *summarize_performance()* function below can be called during training to generate and save a plot of images and save the generator model. Images are plotted using a reverse grayscale color map to make the digits black on a white background.

```python
# generate samples and save as a plot and save the model
def summarize_performance(step, g_model, latent_dim, n_samples=100):
    # prepare fake examples
    X, _ = generate_fake_samples(g_model, latent_dim, n_samples)
    # scale from [-1,1] to [0,1]
    X = (X + 1) / 2.0
    # plot images
    for i in range(10 * 10):
        # define subplot
        pyplot.subplot(10, 10, 1 + i)
        # turn off axis
        pyplot.axis('off')
        # plot raw pixel data
        pyplot.imshow(X[i, :, :, 0], cmap='gray_r')
    # save plot to file
    filename1 = 'generated_plot_%06d.png' % (step+1)
    pyplot.savefig(filename1)
    pyplot.close()
    # save the generator model
    filename2 = 'model_%06d.h5' % (step+1)
    g_model.save(filename2)
    print('Saved %s and %s' % (filename1, filename2))
```

We are also interested in the behavior of loss during training.

As such, we can record loss in lists across each training iteration, then create and save a line plot of the learning dynamics of the models. Creating and saving the plot of learning curves is implemented in the *plot_history()* function.

```python
# create a line plot of loss for the gan and save to file
def plot_history(d1_hist, d2_hist, g_hist):
    pyplot.plot(d1_hist, label='dloss1')
    pyplot.plot(d2_hist, label='dloss2')
    pyplot.plot(g_hist, label='gloss')
    pyplot.legend()
    filename = 'plot_line_plot_loss.png'
    pyplot.savefig(filename)
    pyplot.close()
    print('Saved %s' % (filename))
```

Finally, we can define the main training loop via the *train()* function.

The function takes the defined models and dataset as arguments and parameterizes the number of training epochs and batch size as default function arguments.

Each training iteration involves first generating a half-batch of real samples and a half-batch of fake samples, and using them to apply one batch's worth of weight updates to the discriminator. Next, the generator is updated via the composite model, providing the real (y=1) target as the expected output for the model.

The loss is reported each training iteration, and the model performance is summarized in terms of a plot of generated images at the end of every epoch. The plot of learning curves is created and saved at the end of the run.

```python
# train the generator and discriminator
def train(g_model, d_model, gan_model, dataset, latent_dim, n_epochs=20, n_batch=64):
    # calculate the number of batches per training epoch
    bat_per_epo = int(dataset.shape[0] / n_batch)
    # calculate the number of training iterations
    n_steps = bat_per_epo * n_epochs
    # calculate the size of half a batch of samples
    half_batch = int(n_batch / 2)
    # lists for storing loss, for plotting later
    d1_hist, d2_hist, g_hist = list(), list(), list()
    # manually enumerate epochs
    for i in range(n_steps):
        # prepare real and fake samples
        X_real, y_real = generate_real_samples(dataset, half_batch)
        X_fake, y_fake = generate_fake_samples(g_model, latent_dim, half_batch)
        # update discriminator model
        d_loss1 = d_model.train_on_batch(X_real, y_real)
        d_loss2 = d_model.train_on_batch(X_fake, y_fake)
        # update the generator via the discriminator's error
        z_input = generate_latent_points(latent_dim, n_batch)
        y_real2 = ones((n_batch, 1))
        g_loss = gan_model.train_on_batch(z_input, y_real2)
        # summarize loss on this batch
        print('>%d, d1=%.3f, d2=%.3f g=%.3f' % (i+1, d_loss1, d_loss2, g_loss))
        # record history
        d1_hist.append(d_loss1)
        d2_hist.append(d_loss2)
        g_hist.append(g_loss)
        # evaluate the model performance every 'epoch'
        if (i+1) % (bat_per_epo * 1) == 0:
            summarize_performance(i, g_model, latent_dim)
    # create line plot of training history
    plot_history(d1_hist, d2_hist, g_hist)
```
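It can help to work out the training schedule before starting a run. The sketch below repeats the arithmetic from *train()* for the 60,000 MNIST training images with the default 20 epochs and batch size of 64. The *training_schedule()* function is a hypothetical helper for this illustration and trains nothing.

```python
# reproduce the schedule arithmetic used in train(), without any training
def training_schedule(n_samples, n_epochs=20, n_batch=64):
    # number of batches per training epoch
    bat_per_epo = int(n_samples / n_batch)
    # total number of training iterations
    n_steps = bat_per_epo * n_epochs
    # size of half a batch of samples
    half_batch = int(n_batch / 2)
    # iterations (counted from 1) at which plots and models are saved
    checkpoints = [i + 1 for i in range(n_steps) if (i + 1) % bat_per_epo == 0]
    return bat_per_epo, n_steps, half_batch, checkpoints

bat_per_epo, n_steps, half_batch, checkpoints = training_schedule(60000)
first_plot = 'generated_plot_%06d.png' % checkpoints[0]
print(bat_per_epo, n_steps, half_batch)  # 937 18740 32
print(first_plot)  # generated_plot_000937.png
```

With these defaults, each epoch is 937 iterations, so 20 plots and 20 saved generator models are written over 18,740 iterations.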

Tying all of this together, the complete code example of training an LSGAN on the MNIST handwritten digit dataset is listed below.

```python
# example of lsgan for mnist
from numpy import expand_dims
from numpy import zeros
from numpy import ones
from numpy.random import randn
from numpy.random import randint
from keras.datasets.mnist import load_data
from keras.optimizers import Adam
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Reshape
from keras.layers import Flatten
from keras.layers import Conv2D
from keras.layers import Conv2DTranspose
from keras.layers import Activation
from keras.layers import LeakyReLU
from keras.layers import BatchNormalization
from keras.initializers import RandomNormal
from matplotlib import pyplot

# define the standalone discriminator model
def define_discriminator(in_shape=(28,28,1)):
    # weight initialization
    init = RandomNormal(stddev=0.02)
    # define model
    model = Sequential()
    # downsample to 14x14
    model.add(Conv2D(64, (4,4), strides=(2,2), padding='same', kernel_initializer=init, input_shape=in_shape))
    model.add(BatchNormalization())
    model.add(LeakyReLU(alpha=0.2))
    # downsample to 7x7
    model.add(Conv2D(128, (4,4), strides=(2,2), padding='same', kernel_initializer=init))
    model.add(BatchNormalization())
    model.add(LeakyReLU(alpha=0.2))
    # classifier
    model.add(Flatten())
    model.add(Dense(1, activation='linear', kernel_initializer=init))
    # compile model with L2 loss
    model.compile(loss='mse', optimizer=Adam(lr=0.0002, beta_1=0.5))
    return model

# define the standalone generator model
def define_generator(latent_dim):
    # weight initialization
    init = RandomNormal(stddev=0.02)
    # define model
    model = Sequential()
    # foundation for 7x7 image
    n_nodes = 256 * 7 * 7
    model.add(Dense(n_nodes, kernel_initializer=init, input_dim=latent_dim))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(Reshape((7, 7, 256)))
    # upsample to 14x14
    model.add(Conv2DTranspose(128, (4,4), strides=(2,2), padding='same', kernel_initializer=init))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    # upsample to 28x28
    model.add(Conv2DTranspose(64, (4,4), strides=(2,2), padding='same', kernel_initializer=init))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    # output 28x28x1
    model.add(Conv2D(1, (7,7), padding='same', kernel_initializer=init))
    model.add(Activation('tanh'))
    return model

# define the combined generator and discriminator model, for updating the generator
def define_gan(generator, discriminator):
    # make weights in the discriminator not trainable
    discriminator.trainable = False
    # connect them
    model = Sequential()
    # add generator
    model.add(generator)
    # add the discriminator
    model.add(discriminator)
    # compile model with L2 loss
    model.compile(loss='mse', optimizer=Adam(lr=0.0002, beta_1=0.5))
    return model

# load mnist images
def load_real_samples():
    # load dataset
    (trainX, _), (_, _) = load_data()
    # expand to 3d, e.g. add channels
    X = expand_dims(trainX, axis=-1)
    # convert from ints to floats
    X = X.astype('float32')
    # scale from [0,255] to [-1,1]
    X = (X - 127.5) / 127.5
    return X

# select real samples
def generate_real_samples(dataset, n_samples):
    # choose random instances
    ix = randint(0, dataset.shape[0], n_samples)
    # select images
    X = dataset[ix]
    # generate class labels
    y = ones((n_samples, 1))
    return X, y

# generate points in latent space as input for the generator
def generate_latent_points(latent_dim, n_samples):
    # generate points in the latent space
    x_input = randn(latent_dim * n_samples)
    # reshape into a batch of inputs for the network
    x_input = x_input.reshape(n_samples, latent_dim)
    return x_input

# use the generator to generate n fake examples, with class labels
def generate_fake_samples(generator, latent_dim, n_samples):
    # generate points in latent space
    x_input = generate_latent_points(latent_dim, n_samples)
    # predict outputs
    X = generator.predict(x_input)
    # create class labels
    y = zeros((n_samples, 1))
    return X, y

# generate samples and save as a plot and save the model
def summarize_performance(step, g_model, latent_dim, n_samples=100):
    # prepare fake examples
    X, _ = generate_fake_samples(g_model, latent_dim, n_samples)
    # scale from [-1,1] to [0,1]
    X = (X + 1) / 2.0
    # plot images
    for i in range(10 * 10):
        # define subplot
        pyplot.subplot(10, 10, 1 + i)
        # turn off axis
        pyplot.axis('off')
        # plot raw pixel data
        pyplot.imshow(X[i, :, :, 0], cmap='gray_r')
    # save plot to file
    filename1 = 'generated_plot_%06d.png' % (step+1)
    pyplot.savefig(filename1)
    pyplot.close()
    # save the generator model
    filename2 = 'model_%06d.h5' % (step+1)
    g_model.save(filename2)
    print('Saved %s and %s' % (filename1, filename2))

# create a line plot of loss for the gan and save to file
def plot_history(d1_hist, d2_hist, g_hist):
    pyplot.plot(d1_hist, label='dloss1')
    pyplot.plot(d2_hist, label='dloss2')
    pyplot.plot(g_hist, label='gloss')
    pyplot.legend()
    filename = 'plot_line_plot_loss.png'
    pyplot.savefig(filename)
    pyplot.close()
    print('Saved %s' % (filename))

# train the generator and discriminator
def train(g_model, d_model, gan_model, dataset, latent_dim, n_epochs=20, n_batch=64):
    # calculate the number of batches per training epoch
    bat_per_epo = int(dataset.shape[0] / n_batch)
    # calculate the number of training iterations
    n_steps = bat_per_epo * n_epochs
    # calculate the size of half a batch of samples
    half_batch = int(n_batch / 2)
    # lists for storing loss, for plotting later
    d1_hist, d2_hist, g_hist = list(), list(), list()
    # manually enumerate epochs
    for i in range(n_steps):
        # prepare real and fake samples
        X_real, y_real = generate_real_samples(dataset, half_batch)
        X_fake, y_fake = generate_fake_samples(g_model, latent_dim, half_batch)
        # update discriminator model
        d_loss1 = d_model.train_on_batch(X_real, y_real)
        d_loss2 = d_model.train_on_batch(X_fake, y_fake)
        # update the generator via the discriminator's error
        z_input = generate_latent_points(latent_dim, n_batch)
        y_real2 = ones((n_batch, 1))
        g_loss = gan_model.train_on_batch(z_input, y_real2)
        # summarize loss on this batch
        print('>%d, d1=%.3f, d2=%.3f g=%.3f' % (i+1, d_loss1, d_loss2, g_loss))
        # record history
        d1_hist.append(d_loss1)
        d2_hist.append(d_loss2)
        g_hist.append(g_loss)
        # evaluate the model performance every 'epoch'
        if (i+1) % (bat_per_epo * 1) == 0:
            summarize_performance(i, g_model, latent_dim)
    # create line plot of training history
    plot_history(d1_hist, d2_hist, g_hist)

# size of the latent space
latent_dim = 100
# create the discriminator
discriminator = define_discriminator()
# create the generator
generator = define_generator(latent_dim)
# create the gan
gan_model = define_gan(generator, discriminator)
# load image data
dataset = load_real_samples()
print(dataset.shape)
# train model
train(generator, discriminator, gan_model, dataset, latent_dim)
```

**Note**: the example can be run on the CPU, although it may take a while and running on GPU hardware is recommended.

Running the example will report the loss of the discriminator on real (*d1*) and fake (*d2*) examples and the loss of the generator via the discriminator on generated examples presented as real (*g*). These scores are printed after each training iteration (batch update) and are expected to remain small throughout the training process. Values of zero for an extended period may indicate a failure mode, in which case the training process should be restarted.
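To make the three reported numbers concrete, the sketch below computes them by hand as mean squared errors between a target label and the raw (linear) discriminator score. The scores are made-up values chosen for illustration, not outputs of the trained model.

```python
# minimal sketch: the least squares (MSE) losses reported as d1, d2 and g
# the discriminator scores below are made-up values, not real model outputs

def mse(targets, scores):
    # mean squared error between target labels and raw discriminator scores
    return sum((t - s) ** 2 for t, s in zip(targets, scores)) / len(targets)

# hypothetical linear outputs of the discriminator
scores_real = [0.9, 1.2, 0.7, 1.0]
scores_fake = [0.1, -0.2, 0.4, 0.0]

# d1: real images should score 1.0
d1 = mse([1.0] * 4, scores_real)
# d2: generated images should score 0.0
d2 = mse([0.0] * 4, scores_fake)
# g: the generator wants the same fakes to score 1.0 (inverted labels)
g = mse([1.0] * 4, scores_fake)

print('d1=%.3f, d2=%.3f g=%.3f' % (d1, d2, g))
```

Because the loss is quadratic in the distance from the target, a fake that the discriminator scores far from 1.0 still gives the generator a large gradient, which is the motivation for the least squares loss.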

**Note**: your specific results will vary given the stochastic nature of the learning algorithm.

```
>1, d1=9.292, d2=0.153 g=2.530
>2, d1=1.173, d2=2.057 g=0.903
>3, d1=1.347, d2=1.922 g=2.215
>4, d1=0.604, d2=0.545 g=1.846
>5, d1=0.643, d2=0.734 g=1.619
...
```

Plots of generated images are created at the end of every epoch.

The generated images at the beginning of the run are rough.

After a handful of training epochs, the generated images begin to look crisp and realistic.

Remember: more training epochs may or may not correspond to a generator that outputs higher quality images. Review the generated plots and choose a final model with the best quality images.

At the end of the training run, a plot of learning curves is created for the discriminator and generator.

In this case, we can see that training remains somewhat stable throughout the run, with some very large peaks observed, which wash out the scale of the plot.

## How to Generate Images With LSGAN

We can use the saved generator model to create new images on demand.

This can be achieved by first selecting a final model based on image quality, then loading it and providing new points from the latent space as input in order to generate new plausible images from the domain.

In this case, we will use the model saved after 20 epochs, or 18,740 (60K/64 or 937 batches per epoch * 20 epochs) training iterations.
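The iteration count and the name of the saved file follow directly from the training configuration. The short sketch below repeats that arithmetic, assuming the default batch size and epoch count used in the *train()* function above.

```python
# sketch: recompute the iteration count and the saved model filename
n_images = 60000   # size of the MNIST training set
n_batch = 64       # batch size used by train()
n_epochs = 20      # number of training epochs

# batches per epoch, rounded down as in the train() function
bat_per_epo = int(n_images / n_batch)
# total number of training iterations
n_steps = bat_per_epo * n_epochs
# summarize_performance() names the file with the 1-based step number
filename = 'model_%06d.h5' % n_steps

print(bat_per_epo, n_steps, filename)
```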

```python
# example of loading the generator model and generating images
from keras.models import load_model
from numpy.random import randn
from matplotlib import pyplot

# generate points in latent space as input for the generator
def generate_latent_points(latent_dim, n_samples):
    # generate points in the latent space
    x_input = randn(latent_dim * n_samples)
    # reshape into a batch of inputs for the network
    x_input = x_input.reshape(n_samples, latent_dim)
    return x_input

# create a plot of generated images (reversed grayscale)
def plot_generated(examples, n):
    # plot images
    for i in range(n * n):
        # define subplot
        pyplot.subplot(n, n, 1 + i)
        # turn off axis
        pyplot.axis('off')
        # plot raw pixel data
        pyplot.imshow(examples[i, :, :, 0], cmap='gray_r')
    pyplot.show()

# load model
model = load_model('model_018740.h5')
# generate points in latent space
latent_points = generate_latent_points(100, 100)
# generate images
X = model.predict(latent_points)
# plot the result
plot_generated(X, 10)
```

Running the example generates a plot of 10×10, or 100, new and plausible handwritten digits.

## Further Reading

This section provides more resources on the topic if you are looking to go deeper.

### Papers

### API

- Keras Datasets API
- Keras Sequential Model API
- Keras Convolutional Layers API
- How can I “freeze” Keras layers?
- Matplotlib API
- NumPy Random sampling (numpy.random) API
- NumPy Array manipulation routines

### Articles

## Summary

In this tutorial, you discovered how to develop a least squares generative adversarial network.

Specifically, you learned:

- The LSGAN addresses vanishing gradients and loss saturation of the deep convolutional GAN.
- The LSGAN can be implemented by a mean squared error or L2 loss function for the discriminator model.
- How to implement the LSGAN model for generating handwritten digits for the MNIST dataset.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

The post How to Develop a Least Squares Generative Adversarial Network (LSGAN) in Keras appeared first on Machine Learning Mastery.

## How to Implement a Semi-Supervised GAN (SGAN) From Scratch in Keras

Semi-supervised learning is the challenging problem of training a classifier on a dataset that contains a small number of labeled examples and a much larger number of unlabeled examples.

The Generative Adversarial Network, or GAN, is an architecture that makes effective use of large, unlabeled datasets to train an image generator model via an image discriminator model. The discriminator model can be used as a starting point for developing a classifier model in some cases.

The semi-supervised GAN, or SGAN, model is an extension of the GAN architecture that involves the simultaneous training of a supervised discriminator, unsupervised discriminator, and a generator model. The result is both a supervised classification model that generalizes well to unseen examples and a generator model that outputs plausible examples of images from the domain.

In this tutorial, you will discover how to develop a Semi-Supervised Generative Adversarial Network from scratch.

After completing this tutorial, you will know:

- The semi-supervised GAN is an extension of the GAN architecture for training a classifier model while making use of labeled and unlabeled data.
- There are at least three approaches to implementing the supervised and unsupervised discriminator models in Keras used in the semi-supervised GAN.
- How to train a semi-supervised GAN from scratch on MNIST and load and use the trained classifier for making predictions.

Let’s get started.

## Tutorial Overview

This tutorial is divided into four parts; they are:

- What Is the Semi-Supervised GAN?
- How to Implement the Semi-Supervised Discriminator Model
- How to Develop a Semi-Supervised GAN for MNIST
- How to Load and Use the Final SGAN Classifier Model

## What Is the Semi-Supervised GAN?

Semi-supervised learning refers to a problem where a predictive model is required and there are few labeled examples and many unlabeled examples.

The most common example is a classification predictive modeling problem in which there may be a very large dataset of examples, but only a small fraction have target labels. The model must learn from the small set of labeled examples and somehow harness the larger dataset of unlabeled examples in order to generalize to classifying new examples in the future.

The Semi-Supervised GAN, or sometimes SGAN for short, is an extension of the Generative Adversarial Network architecture for addressing semi-supervised learning problems.

One of the primary goals of this work is to improve the effectiveness of generative adversarial networks for semi-supervised learning (improving the performance of a supervised task, in this case, classification, by learning on additional unlabeled examples).

— Improved Techniques for Training GANs, 2016.

The discriminator in a traditional GAN is trained to predict whether a given image is real (from the dataset) or fake (generated), allowing it to learn features from unlabeled images. The discriminator can then be used via transfer learning as a starting point when developing a classifier for the same dataset, allowing the supervised prediction task to benefit from the unsupervised training of the GAN.

In the Semi-Supervised GAN, the discriminator model is updated to predict K+1 classes, where K is the number of classes in the prediction problem and the additional class label is added for a new “*fake*” class. It involves directly training the discriminator model for both the unsupervised GAN task and the supervised classification task simultaneously.

We train a generative model G and a discriminator D on a dataset with inputs belonging to one of N classes. At training time, D is made to predict which of N+1 classes the input belongs to, where an extra class is added to correspond to the outputs of G.

— Semi-Supervised Learning with Generative Adversarial Networks, 2016.
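The K+1 class idea can be sketched without a neural network. In the example below, the logits are made-up values for a single image and the last index is reserved for the extra "fake" class; the probability that the image is real is then whatever mass is left over the K real classes.

```python
# sketch: a K+1 way prediction where the last index is the "fake" class
from math import exp

def softmax(logits):
    # subtract the max for numerical stability before exponentiating
    m = max(logits)
    exps = [exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

n_classes = 10           # K real classes (digits 0-9)
fake_class = n_classes   # index K is reserved for generated images

# made-up logits for one image: strongest for digit 3, weak for "fake"
logits = [0.0] * (n_classes + 1)
logits[3] = 4.0
logits[fake_class] = -2.0

probs = softmax(logits)
p_fake = probs[fake_class]
# total probability assigned to the K real classes
p_real = 1.0 - p_fake
# most likely class among all K+1 outputs
predicted = max(range(n_classes + 1), key=lambda k: probs[k])
print(len(probs), predicted, round(p_real, 3))
```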

As such, the discriminator is trained in two modes: a supervised mode and an unsupervised mode.

- **Unsupervised Training**: In the unsupervised mode, the discriminator is trained in the same way as the traditional GAN, to predict whether the example is either real or fake.
- **Supervised Training**: In the supervised mode, the discriminator is trained to predict the class label of real examples.

Training in unsupervised mode allows the model to learn useful feature extraction capabilities from a large unlabeled dataset, whereas training in supervised mode allows the model to use the extracted features and apply class labels.

The result is a classifier model that can achieve state-of-the-art results on standard problems such as MNIST when trained on very few labeled examples, such as tens, hundreds, or one thousand. Additionally, the training process can also result in better quality images output by the generator model.

For example, Augustus Odena in his 2016 paper titled “Semi-Supervised Learning with Generative Adversarial Networks” shows how a GAN-trained classifier is able to perform as well as or better than a standalone CNN model on the MNIST handwritten digit recognition task when trained with 25, 50, 100, and 1,000 labeled examples.

Tim Salimans, et al. from OpenAI, in their 2016 paper titled “Improved Techniques for Training GANs,” achieved what were at the time state-of-the-art results on a number of image classification tasks using a semi-supervised GAN, including MNIST.


## How to Implement the Semi-Supervised Discriminator Model

There are a number of ways that we can implement the discriminator model for the semi-supervised GAN.

In this section, we will review three candidate approaches.

### Traditional Discriminator Model

Consider a discriminator model for the standard GAN model.

It must take an image as input and predict whether it is real or fake. More specifically, it predicts the likelihood of the input image being real. The output layer uses a sigmoid activation function to predict a probability value in [0,1] and the model is typically optimized using a binary cross entropy loss function.
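As a small worked example of that output layer and loss, the sketch below applies a sigmoid to a few made-up activations and scores the resulting probabilities with binary cross entropy. It is plain Python written for illustration and does not call Keras.

```python
# sketch: sigmoid output and binary cross entropy computed by hand
from math import exp, log

def sigmoid(x):
    # squash an activation into a probability in (0, 1)
    return 1.0 / (1.0 + exp(-x))

def binary_crossentropy(y_true, y_prob):
    # average negative log likelihood of the true real/fake labels
    eps = 1e-7
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # clip to avoid log(0)
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * log(p) + (1.0 - y) * log(1.0 - p))
    return total / len(y_true)

# made-up activations of the single output node for three images
activations = [2.0, -1.5, 0.0]
probs = [sigmoid(a) for a in activations]
# 1.0 means real, 0.0 means fake
labels = [1.0, 0.0, 1.0]
loss = binary_crossentropy(labels, probs)
print(probs, loss)
```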

For example, we can define a simple discriminator model that takes grayscale images as input with the size of 28×28 pixels and predicts a probability of the image being real. We can use best practices and downsample the image using convolutional layers with a 2×2 stride and a leaky ReLU activation function.

The *define_discriminator()* function below implements this and defines our standard discriminator model.

```python
# example of defining the discriminator model
from keras.models import Model
from keras.layers import Input
from keras.layers import Dense
from keras.layers import Conv2D
from keras.layers import LeakyReLU
from keras.layers import Dropout
from keras.layers import Flatten
from keras.optimizers import Adam
from keras.utils.vis_utils import plot_model

# define the standalone discriminator model
def define_discriminator(in_shape=(28,28,1)):
    # image input
    in_image = Input(shape=in_shape)
    # downsample
    fe = Conv2D(128, (3,3), strides=(2,2), padding='same')(in_image)
    fe = LeakyReLU(alpha=0.2)(fe)
    # downsample
    fe = Conv2D(128, (3,3), strides=(2,2), padding='same')(fe)
    fe = LeakyReLU(alpha=0.2)(fe)
    # downsample
    fe = Conv2D(128, (3,3), strides=(2,2), padding='same')(fe)
    fe = LeakyReLU(alpha=0.2)(fe)
    # flatten feature maps
    fe = Flatten()(fe)
    # dropout
    fe = Dropout(0.4)(fe)
    # output layer
    d_out_layer = Dense(1, activation='sigmoid')(fe)
    # define and compile discriminator model
    d_model = Model(in_image, d_out_layer)
    d_model.compile(loss='binary_crossentropy', optimizer=Adam(lr=0.0002, beta_1=0.5))
    return d_model

# create model
model = define_discriminator()
# plot the model
plot_model(model, to_file='discriminator_plot.png', show_shapes=True, show_layer_names=True)
```

Running the example creates a plot of the discriminator model, clearly showing the 28x28x1 shape of the input image and the prediction of a single probability value.
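The intermediate shapes in that plot can be checked by hand. With `padding='same'`, a strided convolution produces an output of size ceil(input / stride), so the three downsampling layers take the image from 28 to 14 to 7 to 4 pixels per side. The sketch below repeats that arithmetic; the helper name is hypothetical.

```python
# sketch: spatial size after each strided 'same' convolution
from math import ceil

def same_conv_output(size, stride):
    # output size of a convolution with padding='same'
    return ceil(size / stride)

size = 28
sizes = [size]
# three downsampling layers, each with a 2x2 stride
for _ in range(3):
    size = same_conv_output(size, 2)
    sizes.append(size)

# number of values fed to the output layer after Flatten()
n_filters = 128
flat = size * size * n_filters
print(sizes, flat)
```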

### Separate Discriminator Models With Shared Weights

Starting with the standard GAN discriminator model, we can update it to create two models that share feature extraction weights.

Specifically, we can define one classifier model that predicts whether an input image is real or fake, and a second classifier model that predicts the class of a given image.

- **Binary Classifier Model**. Predicts whether the image is real or fake, sigmoid activation function in the output layer, and optimized using the binary cross entropy loss function.
- **Multi-Class Classifier Model**. Predicts the class of the image, softmax activation function in the output layer, and optimized using the categorical cross entropy loss function.

Both models have different output layers but share all feature extraction layers. This means that updates to one of the classifier models will impact both models.
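The effect of sharing layers can be illustrated with a toy analogy in plain Python. The classes below are hypothetical stand-ins, not Keras objects: two "heads" hold a reference to the same feature extractor, so changing its weight through one head changes the predictions of the other.

```python
# toy analogy: two "models" that share one feature-extraction weight
class SharedFeatures:
    def __init__(self, weight):
        self.weight = weight

    def __call__(self, x):
        # a one-parameter stand-in for the shared convolutional layers
        return self.weight * x

class Head:
    def __init__(self, features, scale):
        # keep a reference to the shared features, not a copy
        self.features = features
        self.scale = scale

    def predict(self, x):
        return self.scale * self.features(x)

features = SharedFeatures(weight=0.5)
d_head = Head(features, scale=1.0)   # stands in for the real/fake model
c_head = Head(features, scale=2.0)   # stands in for the class model

before = c_head.predict(4.0)
# pretend a training step on the real/fake model changed the shared weight
d_head.features.weight = 1.0
after = c_head.predict(4.0)
print(before, after)
```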

The example below creates the traditional discriminator model with binary output first, then re-uses the feature extraction layers and creates a new multi-class prediction model, in this case with 10 classes.

```python
# example of defining semi-supervised discriminator model
from keras.models import Model
from keras.layers import Input
from keras.layers import Dense
from keras.layers import Conv2D
from keras.layers import LeakyReLU
from keras.layers import Dropout
from keras.layers import Flatten
from keras.optimizers import Adam
from keras.utils.vis_utils import plot_model

# define the standalone supervised and unsupervised discriminator models
def define_discriminator(in_shape=(28,28,1), n_classes=10):
    # image input
    in_image = Input(shape=in_shape)
    # downsample
    fe = Conv2D(128, (3,3), strides=(2,2), padding='same')(in_image)
    fe = LeakyReLU(alpha=0.2)(fe)
    # downsample
    fe = Conv2D(128, (3,3), strides=(2,2), padding='same')(fe)
    fe = LeakyReLU(alpha=0.2)(fe)
    # downsample
    fe = Conv2D(128, (3,3), strides=(2,2), padding='same')(fe)
    fe = LeakyReLU(alpha=0.2)(fe)
    # flatten feature maps
    fe = Flatten()(fe)
    # dropout
    fe = Dropout(0.4)(fe)
    # unsupervised output
    d_out_layer = Dense(1, activation='sigmoid')(fe)
    # define and compile unsupervised discriminator model
    d_model = Model(in_image, d_out_layer)
    d_model.compile(loss='binary_crossentropy', optimizer=Adam(lr=0.0002, beta_1=0.5))
    # supervised output
    c_out_layer = Dense(n_classes, activation='softmax')(fe)
    # define and compile supervised discriminator model
    c_model = Model(in_image, c_out_layer)
    c_model.compile(loss='sparse_categorical_crossentropy', optimizer=Adam(lr=0.0002, beta_1=0.5), metrics=['accuracy'])
    return d_model, c_model

# create model
d_model, c_model = define_discriminator()
# plot the model
plot_model(d_model, to_file='discriminator1_plot.png', show_shapes=True, show_layer_names=True)
plot_model(c_model, to_file='discriminator2_plot.png', show_shapes=True, show_layer_names=True)
```

Running the example creates and plots both models.

The plot for the first model is the same as before.

The plot of the second model shows the same expected input shape and same feature extraction layers, with a new 10 class classification output layer.

### Single Discriminator Model With Multiple Outputs

Another approach to implementing the semi-supervised discriminator model is to have a single model with multiple output layers.

Specifically, this is a single model with one output layer for the unsupervised task and one output layer for the supervised task.

This is like having separate models for the supervised and unsupervised tasks in that they both share the same feature extraction layers, except that in this case, each input image always has two output predictions, specifically a real/fake prediction and a supervised class prediction.

A problem with this approach is that when the model is updated on unlabeled and generated images, there is no supervised class label. In that case, these images must have an output label of “*unknown*” or “*fake*” from the supervised output. This means that an additional class label is required for the supervised output layer.

The example below implements the multi-output single model approach for the discriminator model in the semi-supervised GAN architecture.

We can see that the model is defined with two output layers and that the output layer for the supervised task is defined with n_classes + 1 nodes, in this case 11, making room for the additional “*unknown*” class label.

We can also see that the model is compiled with two loss functions, one for each output layer of the model.

```python
# example of defining semi-supervised discriminator model
from keras.models import Model
from keras.layers import Input
from keras.layers import Dense
from keras.layers import Conv2D
from keras.layers import LeakyReLU
from keras.layers import Dropout
from keras.layers import Flatten
from keras.optimizers import Adam
from keras.utils.vis_utils import plot_model

# define the standalone supervised and unsupervised discriminator models
def define_discriminator(in_shape=(28,28,1), n_classes=10):
    # image input
    in_image = Input(shape=in_shape)
    # downsample
    fe = Conv2D(128, (3,3), strides=(2,2), padding='same')(in_image)
    fe = LeakyReLU(alpha=0.2)(fe)
    # downsample
    fe = Conv2D(128, (3,3), strides=(2,2), padding='same')(fe)
    fe = LeakyReLU(alpha=0.2)(fe)
    # downsample
    fe = Conv2D(128, (3,3), strides=(2,2), padding='same')(fe)
    fe = LeakyReLU(alpha=0.2)(fe)
    # flatten feature maps
    fe = Flatten()(fe)
    # dropout
    fe = Dropout(0.4)(fe)
    # unsupervised output
    d_out_layer = Dense(1, activation='sigmoid')(fe)
    # supervised output
    c_out_layer = Dense(n_classes + 1, activation='softmax')(fe)
    # define and compile supervised discriminator model
    model = Model(in_image, [d_out_layer, c_out_layer])
    model.compile(loss=['binary_crossentropy', 'sparse_categorical_crossentropy'], optimizer=Adam(lr=0.0002, beta_1=0.5), metrics=['accuracy'])
    return model

# create model
model = define_discriminator()
# plot the model
plot_model(model, to_file='multioutput_discriminator_plot.png', show_shapes=True, show_layer_names=True)
```

Running the example creates and plots the single multi-output model.

The plot clearly shows the shared layers and the separate unsupervised and supervised output layers.
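The extra class label described above determines how the two targets are built for each kind of image. The sketch below is one way this could be done; the *make_targets()* helper is hypothetical and uses integer class labels, as expected by a sparse categorical cross entropy loss.

```python
# sketch: building the two targets for a multi-output discriminator
n_classes = 10
# index 10 is used when no supervised label is available
unknown_class = n_classes

def make_targets(is_real, class_label=None):
    # real/fake target for the unsupervised output
    y_real = 1.0 if is_real else 0.0
    # class target for the supervised output
    y_class = unknown_class if class_label is None else class_label
    return y_real, y_class

# a labeled real image of the digit 7
labeled_real = make_targets(True, class_label=7)
# a real image whose label is not used
unlabeled_real = make_targets(True)
# an image produced by the generator
generated = make_targets(False)
print(labeled_real, unlabeled_real, generated)
```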

### Stacked Discriminator Models With Shared Weights

A final approach is very similar to the prior two approaches and involves creating separate logical unsupervised and supervised models but attempts to reuse the output layers of one model to feed as input into another model.

The approach is based on the definition of the semi-supervised model in the 2016 paper by Tim Salimans, et al. from OpenAI titled “Improved Techniques for Training GANs.”

In the paper, they describe an efficient implementation, where first the supervised model is created with K output classes and a softmax activation function. The unsupervised model is then defined that takes the output of the supervised model prior to the softmax activation, then calculates a normalized sum of the exponential outputs.
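Written out, with $l_k(x)$ denoting the $K$ outputs of the supervised model before the softmax, the normalized sum of exponentials used as the real/fake output is:

$$Z(x) = \sum_{k=1}^{K} \exp\left[l_k(x)\right], \qquad D(x) = \frac{Z(x)}{Z(x) + 1}$$

When all activations are strongly negative, $Z(x)$ is close to zero and so is $D(x)$; when any activation is large and positive, $Z(x)$ is large and $D(x)$ approaches one.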

To make this clearer, we can implement this activation function in NumPy and run some sample activations through it to see what happens.

The complete example is listed below.

```python
# example of custom activation function
import numpy as np

# custom activation function
def custom_activation(output):
    logexpsum = np.sum(np.exp(output))
    result = logexpsum / (logexpsum + 1.0)
    return result

# all -10s
output = np.asarray([-10.0, -10.0, -10.0])
print(custom_activation(output))
# all -1s
output = np.asarray([-1.0, -1.0, -1.0])
print(custom_activation(output))
# all 0s
output = np.asarray([0.0, 0.0, 0.0])
print(custom_activation(output))
# all 1s
output = np.asarray([1.0, 1.0, 1.0])
print(custom_activation(output))
# all 10s
output = np.asarray([10.0, 10.0, 10.0])
print(custom_activation(output))
```

Remember, the output of the supervised model prior to the softmax activation function will be the activations of the nodes directly. They will be small positive or negative values, but not normalized, as normalization would be performed by the softmax activation.

The custom activation function will output a value between 0.0 and 1.0.

A value close to 0.0 is output for a small or negative activation and a value close to 1.0 for a positive or large activation. We can see this when we run the example.

```
0.00013618124143106674
0.5246331135813284
0.75
0.890768227426964
0.9999848669190928
```

This means that the model is encouraged to output a strong class prediction for real examples, and a small class prediction or low activation for fake examples. It’s a clever trick and allows the re-use of the same output nodes from the supervised model in both models.
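One practical caveat is that the direct form exponentiates the raw activations, which can overflow for very large values. Since Z / (Z + 1) equals sigmoid(log Z), the same quantity can be computed from a log-sum-exp. The sketch below is a variation offered for illustration, not the version used in this tutorial, and checks that both forms agree on the sample inputs.

```python
# sketch: the custom activation rewritten as sigmoid(logsumexp(logits))
from math import exp, log

def custom_activation(logits):
    # direct form from the text: Z / (Z + 1) with Z = sum(exp(logits))
    z = sum(exp(v) for v in logits)
    return z / (z + 1.0)

def stable_activation(logits):
    # log-sum-exp with the max subtracted to keep exp() in range
    m = max(logits)
    lse = m + log(sum(exp(v - m) for v in logits))
    # sigmoid(lse), branching on the sign to avoid overflow
    if lse >= 0:
        return 1.0 / (1.0 + exp(-lse))
    e = exp(lse)
    return e / (1.0 + e)

# compare both forms on the same inputs used above
pairs = []
for value in [-10.0, -1.0, 0.0, 1.0, 10.0]:
    logits = [value] * 3
    pairs.append((custom_activation(logits), stable_activation(logits)))
    print(pairs[-1])

# the direct form would raise OverflowError here; the stable form does not
big = stable_activation([1000.0, 1000.0, 1000.0])
print(big)
```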

The activation function can be implemented almost directly via the Keras backend and called from a *Lambda* layer, i.e. a layer that applies a custom function to its input.

The complete example is listed below. First, the supervised model is defined with a softmax activation and sparse categorical cross entropy loss function. The unsupervised model is stacked on top of the output layer of the supervised model before the softmax activation, and the activations of the nodes pass through our custom activation function via the Lambda layer.

No need for a sigmoid activation function as we have already normalized the activation. As before, the unsupervised model is fit using binary cross entropy loss.

```python
# example of defining semi-supervised discriminator model
from keras.models import Model
from keras.layers import Input
from keras.layers import Dense
from keras.layers import Conv2D
from keras.layers import LeakyReLU
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers import Activation
from keras.layers import Lambda
from keras.optimizers import Adam
from keras.utils.vis_utils import plot_model
from keras import backend

# custom activation function
def custom_activation(output):
    logexpsum = backend.sum(backend.exp(output), axis=-1, keepdims=True)
    result = logexpsum / (logexpsum + 1.0)
    return result

# define the standalone supervised and unsupervised discriminator models
def define_discriminator(in_shape=(28,28,1), n_classes=10):
    # image input
    in_image = Input(shape=in_shape)
    # downsample
    fe = Conv2D(128, (3,3), strides=(2,2), padding='same')(in_image)
    fe = LeakyReLU(alpha=0.2)(fe)
    # downsample
    fe = Conv2D(128, (3,3), strides=(2,2), padding='same')(fe)
    fe = LeakyReLU(alpha=0.2)(fe)
    # downsample
    fe = Conv2D(128, (3,3), strides=(2,2), padding='same')(fe)
    fe = LeakyReLU(alpha=0.2)(fe)
    # flatten feature maps
    fe = Flatten()(fe)
    # dropout
    fe = Dropout(0.4)(fe)
    # output layer nodes
    fe = Dense(n_classes)(fe)
    # supervised output
    c_out_layer = Activation('softmax')(fe)
    # define and compile supervised discriminator model
    c_model = Model(in_image, c_out_layer)
    c_model.compile(loss='sparse_categorical_crossentropy', optimizer=Adam(lr=0.0002, beta_1=0.5), metrics=['accuracy'])
    # unsupervised output
    d_out_layer = Lambda(custom_activation)(fe)
    # define and compile unsupervised discriminator model
    d_model = Model(in_image, d_out_layer)
    d_model.compile(loss='binary_crossentropy', optimizer=Adam(lr=0.0002, beta_1=0.5))
    return d_model, c_model

# create model
d_model, c_model = define_discriminator()
# plot the model
plot_model(d_model, to_file='stacked_discriminator1_plot.png', show_shapes=True, show_layer_names=True)
plot_model(c_model, to_file='stacked_discriminator2_plot.png', show_shapes=True, show_layer_names=True)
```

Running the example creates and plots the two models, which look much the same as the two models in the first example.

Stacked version of the unsupervised discriminator model:

Stacked version of the supervised discriminator model:

Now that we have seen how to implement the discriminator model in the semi-supervised GAN, we can develop a complete example for image generation and semi-supervised classification.

## How to Develop a Semi-Supervised GAN for MNIST

In this section, we will develop a semi-supervised GAN model for the MNIST handwritten digit dataset.

The dataset has 10 classes for the digits 0-9, therefore the classifier model will have 10 output nodes. The model will be fit on the training dataset that contains 60,000 examples. Only 100 of the images in the training dataset will be used with labels, 10 from each of the 10 classes.

We will start off by defining the models.

We will use the stacked discriminator model, exactly as defined in the previous section.

Next, we can define the generator model. In this case, the generator model will take as input a point in the latent space and will use transpose convolutional layers to output a 28×28 grayscale image. The *define_generator()* function below implements this and returns the defined generator model.

```python
# define the standalone generator model
def define_generator(latent_dim):
    # image generator input
    in_lat = Input(shape=(latent_dim,))
    # foundation for 7x7 image
    n_nodes = 128 * 7 * 7
    gen = Dense(n_nodes)(in_lat)
    gen = LeakyReLU(alpha=0.2)(gen)
    gen = Reshape((7, 7, 128))(gen)
    # upsample to 14x14
    gen = Conv2DTranspose(128, (4,4), strides=(2,2), padding='same')(gen)
    gen = LeakyReLU(alpha=0.2)(gen)
    # upsample to 28x28
    gen = Conv2DTranspose(128, (4,4), strides=(2,2), padding='same')(gen)
    gen = LeakyReLU(alpha=0.2)(gen)
    # output
    out_layer = Conv2D(1, (7,7), activation='tanh', padding='same')(gen)
    # define model
    model = Model(in_lat, out_layer)
    return model
```
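The sizes used in the generator can be verified with a little arithmetic. The dense layer must produce enough values to reshape into 128 feature maps of 7×7, and each transpose convolution with a 2×2 stride and `padding='same'` doubles the width and height. The sketch below walks through those numbers.

```python
# sketch: shape arithmetic for the generator
n_filters, height, width = 128, 7, 7
# number of nodes needed before reshaping to 7x7x128
n_nodes = n_filters * height * width

size = height
sizes = [size]
# two transpose convolutions, each doubling the spatial size
for _ in range(2):
    size = size * 2
    sizes.append(size)

# the final Conv2D has one filter and stride 1, so only the channels change
output_shape = (size, size, 1)
print(n_nodes, sizes, output_shape)
```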

The generator model will be fit via the unsupervised discriminator model.

We will use the composite model architecture, common to training the generator model when implemented in Keras. Specifically, weight sharing is used where the output of the generator model is passed directly to the unsupervised discriminator model, and the weights of the discriminator are marked as not trainable.

The *define_gan()* function below implements this, taking the already-defined generator and discriminator models as input and returning the composite model used to train the weights of the generator model.

```python
# define the combined generator and discriminator model, for updating the generator
def define_gan(g_model, d_model):
    # make weights in the discriminator not trainable
    d_model.trainable = False
    # connect image output from generator as input to discriminator
    gan_output = d_model(g_model.output)
    # define gan model as taking noise and outputting a classification
    model = Model(g_model.input, gan_output)
    # compile model
    opt = Adam(lr=0.0002, beta_1=0.5)
    model.compile(loss='binary_crossentropy', optimizer=opt)
    return model
```

We can load the training dataset and scale the pixels to the range [-1, 1] to match the output values of the generator model.

```python
# load the images
def load_real_samples():
    # load dataset
    (trainX, trainy), (_, _) = load_data()
    # expand to 3d, e.g. add channels
    X = expand_dims(trainX, axis=-1)
    # convert from ints to floats
    X = X.astype('float32')
    # scale from [0,255] to [-1,1]
    X = (X - 127.5) / 127.5
    print(X.shape, trainy.shape)
    return [X, trainy]
```
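The scaling is a simple linear map, shown below on individual pixel values along with the inverse map to [0, 1] that is convenient for plotting. The function names are hypothetical and chosen for this sketch.

```python
# sketch: pixel scaling to [-1, 1] and back to [0, 1] for plotting
def scale_pixels(value):
    # map a pixel value in [0, 255] to [-1, 1], matching the tanh output
    return (value - 127.5) / 127.5

def unscale_pixels(value):
    # map a value in [-1, 1] to [0, 1]
    return (value + 1.0) / 2.0

print(scale_pixels(0), scale_pixels(127.5), scale_pixels(255))
print(unscale_pixels(-1.0), unscale_pixels(1.0))
```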

We can also define a function to select a subset of the training dataset in which we keep the labels and train the supervised version of the discriminator model.

The *select_supervised_samples()* function below implements this and is careful to ensure that the selection of examples is random and that the classes are balanced. The number of labeled examples is parameterized and set at 100, meaning that each of the 10 classes will have 10 randomly selected examples.

```python
# select a supervised subset of the dataset, ensures classes are balanced
def select_supervised_samples(dataset, n_samples=100, n_classes=10):
    X, y = dataset
    X_list, y_list = list(), list()
    n_per_class = int(n_samples / n_classes)
    for i in range(n_classes):
        # get all images for this class
        X_with_class = X[y == i]
        # choose random instances
        ix = randint(0, len(X_with_class), n_per_class)
        # add to list
        [X_list.append(X_with_class[j]) for j in ix]
        [y_list.append(i) for j in ix]
    return asarray(X_list), asarray(y_list)
```
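One detail worth knowing is that `randint()` draws indices with replacement, so the same image could, in principle, be selected twice for a class. If strictly distinct examples are wanted, a variant along the following lines could be used. This is a sketch on synthetic data, not the tutorial's method; the function name `select_balanced_subset` and the `seed` argument are assumptions made for illustration.

```python
import numpy as np

def select_balanced_subset(X, y, n_samples=100, n_classes=10, seed=1):
    """Pick n_samples / n_classes distinct examples from each class."""
    rng = np.random.default_rng(seed)
    n_per_class = n_samples // n_classes
    X_list, y_list = [], []
    for label in range(n_classes):
        # indices of all examples that carry this label
        candidates = np.flatnonzero(y == label)
        # sample without replacement so no example is picked twice
        chosen = rng.choice(candidates, size=n_per_class, replace=False)
        X_list.append(X[chosen])
        y_list.append(y[chosen])
    return np.concatenate(X_list), np.concatenate(y_list)

# synthetic stand-in for MNIST: 1,000 tiny "images" with labels 0..9
y_all = np.repeat(np.arange(10), 100)
X_all = np.arange(1000, dtype='float32').reshape(1000, 1, 1, 1)
X_sub, y_sub = select_balanced_subset(X_all, y_all)
print(X_sub.shape, np.bincount(y_sub))
```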

Next, we can define a function for retrieving a batch of real training examples.

A sample of images and labels is selected, with replacement. This same function can be used later, when we train the models, to retrieve examples from both the labeled and unlabeled datasets. In the case of the “*unlabeled dataset*”, we will ignore the labels.

```python
# select real samples
def generate_real_samples(dataset, n_samples):
    # split into images and labels
    images, labels = dataset
    # choose random instances
    ix = randint(0, images.shape[0], n_samples)
    # select images and labels
    X, labels = images[ix], labels[ix]
    # generate class labels
    y = ones((n_samples, 1))
    return [X, labels], y
```

Next, we can define functions to help in generating images using the generator model.

First, the *generate_latent_points()* function will create a batch worth of random points in the latent space that can be used as input for generating images. The *generate_fake_samples()* function will call this function to generate a batch worth of images that can be fed to the unsupervised discriminator model or the composite GAN model during training.

```python
# generate points in latent space as input for the generator
def generate_latent_points(latent_dim, n_samples):
    # generate points in the latent space
    z_input = randn(latent_dim * n_samples)
    # reshape into a batch of inputs for the network
    z_input = z_input.reshape(n_samples, latent_dim)
    return z_input

# use the generator to generate n fake examples, with class labels
def generate_fake_samples(generator, latent_dim, n_samples):
    # generate points in latent space
    z_input = generate_latent_points(latent_dim, n_samples)
    # predict outputs
    images = generator.predict(z_input)
    # create class labels
    y = zeros((n_samples, 1))
    return images, y
```
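To make the shape of the generator input concrete, the short sketch below draws a batch of latent points the same way as *generate_latent_points()* and compares it with drawing the 2D array directly. The batch size of 16 is an arbitrary value chosen for illustration.

```python
import numpy as np

latent_dim, n_samples = 100, 16
# same construction as generate_latent_points(): draw a flat vector, then reshape
z = np.random.randn(latent_dim * n_samples).reshape(n_samples, latent_dim)
# drawing the 2D array directly gives the same shape and the same distribution
z_direct = np.random.randn(n_samples, latent_dim)
print(z.shape, z_direct.shape)
```

Each row is one point in the latent space, so the generator receives one row per image to be generated.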

Next, we can define a function to be called when we want to evaluate the performance of the model.

This function will generate and plot 100 images using the current state of the generator model. This plot of images can be used to subjectively evaluate the performance of the generator model.

The supervised discriminator model is then evaluated on the entire training dataset, and the classification accuracy is reported. Finally, the generator model and the supervised discriminator model are saved to file, to be used later.

The *summarize_performance()* function below implements this and can be called periodically, such as at the end of every training epoch. The results can be reviewed at the end of the run to select a classifier model and even a generator model.

```python
# generate samples and save as a plot and save the model
def summarize_performance(step, g_model, c_model, latent_dim, dataset, n_samples=100):
    # prepare fake examples
    X, _ = generate_fake_samples(g_model, latent_dim, n_samples)
    # scale from [-1,1] to [0,1]
    X = (X + 1) / 2.0
    # plot images
    for i in range(100):
        # define subplot
        pyplot.subplot(10, 10, 1 + i)
        # turn off axis
        pyplot.axis('off')
        # plot raw pixel data
        pyplot.imshow(X[i, :, :, 0], cmap='gray_r')
    # save plot to file
    filename1 = 'generated_plot_%04d.png' % (step+1)
    pyplot.savefig(filename1)
    pyplot.close()
    # evaluate the classifier model
    X, y = dataset
    _, acc = c_model.evaluate(X, y, verbose=0)
    print('Classifier Accuracy: %.3f%%' % (acc * 100))
    # save the generator model
    filename2 = 'g_model_%04d.h5' % (step+1)
    g_model.save(filename2)
    # save the classifier model
    filename3 = 'c_model_%04d.h5' % (step+1)
    c_model.save(filename3)
    print('>Saved: %s, %s, and %s' % (filename1, filename2, filename3))
```
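Because the file names are built from the zero-based step index plus one, it helps to see which names are produced at which point in training. The sketch below reproduces only the naming logic; the helper `checkpoint_names` is hypothetical, and the value of 600 updates per epoch assumes the 60,000 MNIST training images and the batch size of 100 used in this tutorial.

```python
def checkpoint_names(step):
    """Return the file names written when summarize_performance(step, ...) runs."""
    return ('generated_plot_%04d.png' % (step + 1),
            'g_model_%04d.h5' % (step + 1),
            'c_model_%04d.h5' % (step + 1))

bat_per_epo = 600  # 60,000 images / batch size of 100
# the function is called with the zero-based index of the last update in an epoch
for epoch in (1, 12, 20):
    step = epoch * bat_per_epo - 1
    print(epoch, checkpoint_names(step))
```

Note that `%04d` pads to at least four digits, so numbers with five digits are written in full rather than truncated.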

Next, we can define a function to train the models. The defined models and loaded training dataset are provided as arguments, and the number of training epochs and batch size are parameterized with default values, in this case 20 epochs and a batch size of 100.

The chosen model configuration was found to overfit the training dataset quickly, hence the relatively small number of training epochs. Increasing the epochs to 100 or more results in much higher-quality generated images, but a lower-quality classifier model. Balancing these two concerns might make a fun extension.

First, the labeled subset of the training dataset is selected, and the number of training steps is calculated.

The training process is almost identical to the training of a vanilla GAN model, with the addition of updating the supervised model with labeled examples.

A single cycle through updating the models involves first updating the supervised discriminator model with labeled examples, then updating the unsupervised discriminator model with unlabeled real and generated examples. Finally, the generator model is updated via the composite model.

The shared weights of the discriminator model get updated with 1.5 batches worth of samples, whereas the weights of the generator model are updated with one batch worth of samples each iteration. Changing this so that each model is updated by the same amount might improve the model training process.

```python
# train the generator and discriminator
def train(g_model, d_model, c_model, gan_model, dataset, latent_dim, n_epochs=20, n_batch=100):
    # select supervised dataset
    X_sup, y_sup = select_supervised_samples(dataset)
    print(X_sup.shape, y_sup.shape)
    # calculate the number of batches per training epoch
    bat_per_epo = int(dataset[0].shape[0] / n_batch)
    # calculate the number of training iterations
    n_steps = bat_per_epo * n_epochs
    # calculate the size of half a batch of samples
    half_batch = int(n_batch / 2)
    print('n_epochs=%d, n_batch=%d, 1/2=%d, b/e=%d, steps=%d' % (n_epochs, n_batch, half_batch, bat_per_epo, n_steps))
    # manually enumerate epochs
    for i in range(n_steps):
        # update supervised discriminator (c)
        [Xsup_real, ysup_real], _ = generate_real_samples([X_sup, y_sup], half_batch)
        c_loss, c_acc = c_model.train_on_batch(Xsup_real, ysup_real)
        # update unsupervised discriminator (d)
        [X_real, _], y_real = generate_real_samples(dataset, half_batch)
        d_loss1 = d_model.train_on_batch(X_real, y_real)
        X_fake, y_fake = generate_fake_samples(g_model, latent_dim, half_batch)
        d_loss2 = d_model.train_on_batch(X_fake, y_fake)
        # update generator (g)
        X_gan, y_gan = generate_latent_points(latent_dim, n_batch), ones((n_batch, 1))
        g_loss = gan_model.train_on_batch(X_gan, y_gan)
        # summarize loss on this batch
        print('>%d, c[%.3f,%.0f], d[%.3f,%.3f], g[%.3f]' % (i+1, c_loss, c_acc*100, d_loss1, d_loss2, g_loss))
        # evaluate the model performance every so often
        if (i+1) % (bat_per_epo * 1) == 0:
            summarize_performance(i, g_model, c_model, latent_dim, dataset)
```
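The bookkeeping in this function can be checked with a few lines of arithmetic. The sketch below assumes the 60,000 MNIST training images and the default arguments; the names `disc_samples` and `gen_samples` are introduced here for illustration only.

```python
n_examples, n_epochs, n_batch = 60000, 20, 100

bat_per_epo = n_examples // n_batch   # updates per epoch
n_steps = bat_per_epo * n_epochs      # total updates
half_batch = n_batch // 2

# samples seen per iteration by the shared discriminator weights:
# labeled real (c) + unlabeled real (d) + generated (d)
disc_samples = 3 * half_batch
# samples seen per iteration by the generator via the composite model
gen_samples = n_batch

print(bat_per_epo, n_steps, half_batch)
print(disc_samples / n_batch, gen_samples / n_batch)
```

This is where the figure of 1.5 batches per iteration for the discriminator comes from: three half batches against one full batch for the generator.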

Finally, we can define the models and call the function to train and save the models.

```python
# size of the latent space
latent_dim = 100
# create the discriminator models
d_model, c_model = define_discriminator()
# create the generator
g_model = define_generator(latent_dim)
# create the gan
gan_model = define_gan(g_model, d_model)
# load image data
dataset = load_real_samples()
# train model
train(g_model, d_model, c_model, gan_model, dataset, latent_dim)
```

Tying all of this together, the complete example of training a semi-supervised GAN on the MNIST handwritten digit image classification task is listed below.

```python
# example of semi-supervised gan for mnist
from numpy import expand_dims
from numpy import zeros
from numpy import ones
from numpy import asarray
from numpy.random import randn
from numpy.random import randint
from keras.datasets.mnist import load_data
from keras.optimizers import Adam
from keras.models import Model
from keras.layers import Input
from keras.layers import Dense
from keras.layers import Reshape
from keras.layers import Flatten
from keras.layers import Conv2D
from keras.layers import Conv2DTranspose
from keras.layers import LeakyReLU
from keras.layers import Dropout
from keras.layers import Lambda
from keras.layers import Activation
from matplotlib import pyplot
from keras import backend

# custom activation function
def custom_activation(output):
    logexpsum = backend.sum(backend.exp(output), axis=-1, keepdims=True)
    result = logexpsum / (logexpsum + 1.0)
    return result

# define the standalone supervised and unsupervised discriminator models
def define_discriminator(in_shape=(28,28,1), n_classes=10):
    # image input
    in_image = Input(shape=in_shape)
    # downsample
    fe = Conv2D(128, (3,3), strides=(2,2), padding='same')(in_image)
    fe = LeakyReLU(alpha=0.2)(fe)
    # downsample
    fe = Conv2D(128, (3,3), strides=(2,2), padding='same')(fe)
    fe = LeakyReLU(alpha=0.2)(fe)
    # downsample
    fe = Conv2D(128, (3,3), strides=(2,2), padding='same')(fe)
    fe = LeakyReLU(alpha=0.2)(fe)
    # flatten feature maps
    fe = Flatten()(fe)
    # dropout
    fe = Dropout(0.4)(fe)
    # output layer nodes
    fe = Dense(n_classes)(fe)
    # supervised output
    c_out_layer = Activation('softmax')(fe)
    # define and compile supervised discriminator model
    c_model = Model(in_image, c_out_layer)
    c_model.compile(loss='sparse_categorical_crossentropy', optimizer=Adam(lr=0.0002, beta_1=0.5), metrics=['accuracy'])
    # unsupervised output
    d_out_layer = Lambda(custom_activation)(fe)
    # define and compile unsupervised discriminator model
    d_model = Model(in_image, d_out_layer)
    d_model.compile(loss='binary_crossentropy', optimizer=Adam(lr=0.0002, beta_1=0.5))
    return d_model, c_model

# define the standalone generator model
def define_generator(latent_dim):
    # image generator input
    in_lat = Input(shape=(latent_dim,))
    # foundation for 7x7 image
    n_nodes = 128 * 7 * 7
    gen = Dense(n_nodes)(in_lat)
    gen = LeakyReLU(alpha=0.2)(gen)
    gen = Reshape((7, 7, 128))(gen)
    # upsample to 14x14
    gen = Conv2DTranspose(128, (4,4), strides=(2,2), padding='same')(gen)
    gen = LeakyReLU(alpha=0.2)(gen)
    # upsample to 28x28
    gen = Conv2DTranspose(128, (4,4), strides=(2,2), padding='same')(gen)
    gen = LeakyReLU(alpha=0.2)(gen)
    # output
    out_layer = Conv2D(1, (7,7), activation='tanh', padding='same')(gen)
    # define model
    model = Model(in_lat, out_layer)
    return model

# define the combined generator and discriminator model, for updating the generator
def define_gan(g_model, d_model):
    # make weights in the discriminator not trainable
    d_model.trainable = False
    # connect image output from generator as input to discriminator
    gan_output = d_model(g_model.output)
    # define gan model as taking noise and outputting a classification
    model = Model(g_model.input, gan_output)
    # compile model
    opt = Adam(lr=0.0002, beta_1=0.5)
    model.compile(loss='binary_crossentropy', optimizer=opt)
    return model

# load the images
def load_real_samples():
    # load dataset
    (trainX, trainy), (_, _) = load_data()
    # expand to 3d, e.g. add channels
    X = expand_dims(trainX, axis=-1)
    # convert from ints to floats
    X = X.astype('float32')
    # scale from [0,255] to [-1,1]
    X = (X - 127.5) / 127.5
    print(X.shape, trainy.shape)
    return [X, trainy]

# select a supervised subset of the dataset, ensures classes are balanced
def select_supervised_samples(dataset, n_samples=100, n_classes=10):
    X, y = dataset
    X_list, y_list = list(), list()
    n_per_class = int(n_samples / n_classes)
    for i in range(n_classes):
        # get all images for this class
        X_with_class = X[y == i]
        # choose random instances
        ix = randint(0, len(X_with_class), n_per_class)
        # add to list
        [X_list.append(X_with_class[j]) for j in ix]
        [y_list.append(i) for j in ix]
    return asarray(X_list), asarray(y_list)

# select real samples
def generate_real_samples(dataset, n_samples):
    # split into images and labels
    images, labels = dataset
    # choose random instances
    ix = randint(0, images.shape[0], n_samples)
    # select images and labels
    X, labels = images[ix], labels[ix]
    # generate class labels
    y = ones((n_samples, 1))
    return [X, labels], y

# generate points in latent space as input for the generator
def generate_latent_points(latent_dim, n_samples):
    # generate points in the latent space
    z_input = randn(latent_dim * n_samples)
    # reshape into a batch of inputs for the network
    z_input = z_input.reshape(n_samples, latent_dim)
    return z_input

# use the generator to generate n fake examples, with class labels
def generate_fake_samples(generator, latent_dim, n_samples):
    # generate points in latent space
    z_input = generate_latent_points(latent_dim, n_samples)
    # predict outputs
    images = generator.predict(z_input)
    # create class labels
    y = zeros((n_samples, 1))
    return images, y

# generate samples and save as a plot and save the model
def summarize_performance(step, g_model, c_model, latent_dim, dataset, n_samples=100):
    # prepare fake examples
    X, _ = generate_fake_samples(g_model, latent_dim, n_samples)
    # scale from [-1,1] to [0,1]
    X = (X + 1) / 2.0
    # plot images
    for i in range(100):
        # define subplot
        pyplot.subplot(10, 10, 1 + i)
        # turn off axis
        pyplot.axis('off')
        # plot raw pixel data
        pyplot.imshow(X[i, :, :, 0], cmap='gray_r')
    # save plot to file
    filename1 = 'generated_plot_%04d.png' % (step+1)
    pyplot.savefig(filename1)
    pyplot.close()
    # evaluate the classifier model
    X, y = dataset
    _, acc = c_model.evaluate(X, y, verbose=0)
    print('Classifier Accuracy: %.3f%%' % (acc * 100))
    # save the generator model
    filename2 = 'g_model_%04d.h5' % (step+1)
    g_model.save(filename2)
    # save the classifier model
    filename3 = 'c_model_%04d.h5' % (step+1)
    c_model.save(filename3)
    print('>Saved: %s, %s, and %s' % (filename1, filename2, filename3))

# train the generator and discriminator
def train(g_model, d_model, c_model, gan_model, dataset, latent_dim, n_epochs=20, n_batch=100):
    # select supervised dataset
    X_sup, y_sup = select_supervised_samples(dataset)
    print(X_sup.shape, y_sup.shape)
    # calculate the number of batches per training epoch
    bat_per_epo = int(dataset[0].shape[0] / n_batch)
    # calculate the number of training iterations
    n_steps = bat_per_epo * n_epochs
    # calculate the size of half a batch of samples
    half_batch = int(n_batch / 2)
    print('n_epochs=%d, n_batch=%d, 1/2=%d, b/e=%d, steps=%d' % (n_epochs, n_batch, half_batch, bat_per_epo, n_steps))
    # manually enumerate epochs
    for i in range(n_steps):
        # update supervised discriminator (c)
        [Xsup_real, ysup_real], _ = generate_real_samples([X_sup, y_sup], half_batch)
        c_loss, c_acc = c_model.train_on_batch(Xsup_real, ysup_real)
        # update unsupervised discriminator (d)
        [X_real, _], y_real = generate_real_samples(dataset, half_batch)
        d_loss1 = d_model.train_on_batch(X_real, y_real)
        X_fake, y_fake = generate_fake_samples(g_model, latent_dim, half_batch)
        d_loss2 = d_model.train_on_batch(X_fake, y_fake)
        # update generator (g)
        X_gan, y_gan = generate_latent_points(latent_dim, n_batch), ones((n_batch, 1))
        g_loss = gan_model.train_on_batch(X_gan, y_gan)
        # summarize loss on this batch
        print('>%d, c[%.3f,%.0f], d[%.3f,%.3f], g[%.3f]' % (i+1, c_loss, c_acc*100, d_loss1, d_loss2, g_loss))
        # evaluate the model performance every so often
        if (i+1) % (bat_per_epo * 1) == 0:
            summarize_performance(i, g_model, c_model, latent_dim, dataset)

# size of the latent space
latent_dim = 100
# create the discriminator models
d_model, c_model = define_discriminator()
# create the generator
g_model = define_generator(latent_dim)
# create the gan
gan_model = define_gan(g_model, d_model)
# load image data
dataset = load_real_samples()
# train model
train(g_model, d_model, c_model, gan_model, dataset, latent_dim)
```
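The *custom_activation()* function in the listing above turns the 10 class activations (before softmax) into a single real/fake probability by computing Z / (Z + 1), where Z is the sum of the exponentiated activations. Algebraically this is the same as applying a sigmoid to log(Z). The NumPy sketch below checks that equivalence on two hand-picked inputs; the function names and the example activations are illustrative assumptions, not part of the tutorial.

```python
import numpy as np

def custom_activation_np(logits):
    """NumPy version of custom_activation(): Z / (Z + 1) with Z = sum(exp(logits))."""
    z = np.sum(np.exp(logits), axis=-1, keepdims=True)
    return z / (z + 1.0)

def sigmoid_of_logsumexp(logits):
    """Equivalent form: sigmoid applied to log(sum(exp(logits)))."""
    lse = np.log(np.sum(np.exp(logits), axis=-1, keepdims=True))
    return 1.0 / (1.0 + np.exp(-lse))

logits = np.array([
    [-5.0] * 10,         # every class activation is weak
    [9.0] + [-5.0] * 9,  # one class activation is strong
])
d_custom = custom_activation_np(logits).ravel()
d_sigmoid = sigmoid_of_logsumexp(logits).ravel()
print(d_custom)
print(d_sigmoid)
```

When all class activations are weak, the output is close to 0 (fake); when any one class activation is strong, the output approaches 1 (real).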

The example can be run on a workstation with CPU or GPU hardware, although a GPU is recommended for faster execution.

Given the stochastic nature of the training algorithm, your specific results will vary. Consider running the example a few times.

At the start of the run, the size of the training dataset is summarized, as is the supervised subset, confirming our configuration.

The performance of each model is summarized at the end of each update, including the loss and accuracy of the supervised discriminator model (*c*), the loss of the unsupervised discriminator model on real and generated examples (*d*), and the loss of the generator model updated via the composite model (*g*).

The loss for the supervised model will shrink to a small value close to zero and accuracy will hit 100%, which will be maintained for the entire run. The loss of the unsupervised discriminator and generator should remain at modest values throughout the run if they are kept in equilibrium.

```
(60000, 28, 28, 1) (60000,)
(100, 28, 28, 1) (100,)
n_epochs=20, n_batch=100, 1/2=50, b/e=600, steps=12000
>1, c[2.305,6], d[0.096,2.399], g[0.095]
>2, c[2.298,18], d[0.089,2.399], g[0.095]
>3, c[2.308,10], d[0.084,2.401], g[0.095]
>4, c[2.304,8], d[0.080,2.404], g[0.095]
>5, c[2.254,18], d[0.077,2.407], g[0.095]
...
```
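As a sanity check on the log above, the supervised loss of roughly 2.30 in the first few updates is about what one would expect from a 10-class classifier that has not yet learned anything and spreads its probability evenly across the classes. A quick calculation, assuming cross-entropy measured in nats as Keras reports it:

```python
from math import log

n_classes = 10
# cross-entropy of a classifier that assigns probability 1/n_classes to every class
chance_loss = -log(1.0 / n_classes)
print('%.3f' % chance_loss)
```

A starting loss near this value, together with accuracy near 10%, suggests the classifier begins at chance level, as expected.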

The supervised classification model is evaluated on the entire training dataset at the end of every training epoch, in this case after every 600 training updates. At this time, the performance of the model is summarized, showing that it rapidly achieves good skill.

This is surprising given that the model is only trained on 10 labeled examples of each class.

```
Classifier Accuracy: 85.543%
Classifier Accuracy: 91.487%
Classifier Accuracy: 92.628%
Classifier Accuracy: 94.017%
Classifier Accuracy: 94.252%
Classifier Accuracy: 93.828%
Classifier Accuracy: 94.122%
Classifier Accuracy: 93.597%
Classifier Accuracy: 95.283%
Classifier Accuracy: 95.287%
Classifier Accuracy: 95.263%
Classifier Accuracy: 95.432%
Classifier Accuracy: 95.270%
Classifier Accuracy: 95.212%
Classifier Accuracy: 94.803%
Classifier Accuracy: 94.640%
Classifier Accuracy: 93.622%
Classifier Accuracy: 91.870%
Classifier Accuracy: 92.525%
Classifier Accuracy: 92.180%
```

The models are saved at the end of each training epoch, and plots of generated images are also created.

The quality of the generated images is good given the relatively small number of training epochs.

## How to Load and Use the Final SGAN Classifier Model

Now that we have trained the generator and discriminator models, we can make use of them.

In the case of the semi-supervised GAN, we are less interested in the generator model and more interested in the supervised model.

Reviewing the results for the specific run, we can select a saved model that is known to have good performance on the training dataset. In this case, we select the model saved after 12 training epochs, or 7,200 updates, which had a classification accuracy of about 95.432% on the training dataset.
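The selection can also be done programmatically. The sketch below copies the accuracies reported for this specific run into a list and finds the best epoch and the matching file name; it assumes 600 updates per epoch and the file naming scheme used by *summarize_performance()*.

```python
# classifier accuracy (%) on the training dataset after each epoch, from the run above
accuracies = [85.543, 91.487, 92.628, 94.017, 94.252, 93.828, 94.122, 93.597,
              95.283, 95.287, 95.263, 95.432, 95.270, 95.212, 94.803, 94.640,
              93.622, 91.870, 92.525, 92.180]
bat_per_epo = 600

# index of the highest accuracy, converted to a one-based epoch number
best_index = max(range(len(accuracies)), key=lambda i: accuracies[i])
best_epoch = best_index + 1
best_step = best_epoch * bat_per_epo
filename = 'c_model_%04d.h5' % best_step
print(best_epoch, best_step, filename)
```

Your own run will produce different accuracies, so the best epoch and file name will likely differ.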

We can load the model directly via the *load_model()* Keras function.

```python
...
# load the model
model = load_model('c_model_7200.h5')
```

Once loaded, we can evaluate it on the entire training dataset again to confirm the finding, then evaluate it on the holdout test dataset.

Recall that the feature extraction layers expect input images with pixel values scaled to the range [-1,1]; this scaling must therefore be applied before any images are provided to the model.

The complete example of loading the saved semi-supervised classifier model and evaluating it on the complete MNIST dataset is listed below.

```python
# example of loading the classifier model and evaluating it on mnist
from numpy import expand_dims
from keras.models import load_model
from keras.datasets.mnist import load_data
# load the model
model = load_model('c_model_7200.h5')
# load the dataset
(trainX, trainy), (testX, testy) = load_data()
# expand to 3d, e.g. add channels
trainX = expand_dims(trainX, axis=-1)
testX = expand_dims(testX, axis=-1)
# convert from ints to floats
trainX = trainX.astype('float32')
testX = testX.astype('float32')
# scale from [0,255] to [-1,1]
trainX = (trainX - 127.5) / 127.5
testX = (testX - 127.5) / 127.5
# evaluate the model
_, train_acc = model.evaluate(trainX, trainy, verbose=0)
print('Train Accuracy: %.3f%%' % (train_acc * 100))
_, test_acc = model.evaluate(testX, testy, verbose=0)
print('Test Accuracy: %.3f%%' % (test_acc * 100))
```

Running the example loads the model and evaluates it on the MNIST dataset.

We can see that, in this case, the model achieves the expected performance of 95.432% on the training dataset, confirming we have loaded the correct model.

We can also see that the accuracy on the holdout test dataset is as good, or slightly better, at about 95.920%. This shows that the learned classifier has good generalization.

```
Train Accuracy: 95.432%
Test Accuracy: 95.920%
```

We have successfully demonstrated the training and evaluation of a semi-supervised classifier model fit via the GAN architecture.

## Extensions

This section lists some ideas for extending the tutorial that you may wish to explore.

- **Standalone Classifier**. Fit a standalone classifier model on the labeled dataset directly and compare performance to the SGAN model.
- **Number of Labeled Examples**. Repeat the example with more or fewer labeled examples and compare the performance of the model.
- **Model Tuning**. Tune the performance of the discriminator and generator model to further lift the performance of the supervised model closer toward state-of-the-art results.

If you explore any of these extensions, I’d love to know.

Post your findings in the comments below.

## Further Reading

This section provides more resources on the topic if you are looking to go deeper.

### Papers

- Semi-Supervised Learning with Generative Adversarial Networks, 2016.
- Improved Techniques for Training GANs, 2016.
- Unsupervised and Semi-supervised Learning with Categorical Generative Adversarial Networks, 2015.
- Semi-supervised Learning with GANs: Manifold Invariance with Improved Inference, 2017.
- Semi-Supervised Learning with GANs: Revisiting Manifold Regularization, 2018.

### API

- Keras Datasets API.
- Keras Sequential Model API
- Keras Convolutional Layers API
- How can I “freeze” Keras layers?
- Matplotlib API
- NumPy Random sampling (numpy.random) API
- NumPy Array manipulation routines

### Articles

- Semi-supervised learning with GANs, 2018.
- Semi-supervised learning with Generative Adversarial Networks (GANs), 2017.

### Projects

- Improved GAN Project (Official), GitHub.
- Keras GAN Project, GitHub.
- GAN for semi-supervised learning Project, GitHub

## Summary

In this tutorial, you discovered how to develop a Semi-Supervised Generative Adversarial Network from scratch.

Specifically, you learned:

- The semi-supervised GAN is an extension of the GAN architecture for training a classifier model while making use of labeled and unlabeled data.
- There are at least three approaches to implementing the supervised and unsupervised discriminator models in Keras used in the semi-supervised GAN.
- How to train a semi-supervised GAN from scratch on MNIST and load and use the trained classifier for making predictions.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

The post How to Implement a Semi-Supervised GAN (SGAN) From Scratch in Keras appeared first on Machine Learning Mastery.

## How to Develop an Information Maximizing GAN (InfoGAN) in Keras

The Generative Adversarial Network, or GAN, is an architecture for training deep convolutional models for generating synthetic images.

Although remarkably effective, the default GAN provides no control over the types of images that are generated. The Information Maximizing GAN, or InfoGAN for short, is an extension to the GAN architecture that introduces control variables that are automatically learned by the architecture and allow control over the generated image, such as style, thickness, and type in the case of generating images of handwritten digits.

In this tutorial, you will discover how to implement an Information Maximizing Generative Adversarial Network model from scratch.

After completing this tutorial, you will know:

- The InfoGAN is motivated by the desire to disentangle and control the properties in generated images.
- The InfoGAN involves the addition of control variables as input to the generator and an auxiliary model that predicts the control variables, trained via a mutual information loss function.
- How to develop and train an InfoGAN model from scratch and use the control variables to control which digit is generated by the model.

Let’s get started.

## Tutorial Overview

This tutorial is divided into four parts; they are:

- What Is the Information Maximizing GAN
- How to Implement the InfoGAN Loss Function
- How to Develop an InfoGAN for MNIST
- How to Use Control Codes With a Trained InfoGAN Model

## What Is the Information Maximizing GAN

The Generative Adversarial Network, or GAN for short, is an architecture for training a generative model, such as a model for generating synthetic images.

It involves the simultaneous training of the generator model for generating images with a discriminator model that learns to classify images as either real (from the training dataset) or fake (generated). The two models compete in a zero-sum game such that convergence of the training process involves finding a balance between the generator’s skill in generating convincing images and the discriminator’s skill in detecting them.

The generator model takes as input a random point from a latent space, typically 50 to 100 random Gaussian variables. The generator applies a unique meaning to the points in the latent space via training and maps points to specific output synthetic images. This means that although the latent space is structured by the generator model, there is no control over the generated image.

The GAN formulation uses a simple factored continuous input noise vector z, while imposing no restrictions on the manner in which the generator may use this noise. As a result, it is possible that the noise will be used by the generator in a highly entangled way, causing the individual dimensions of z to not correspond to semantic features of the data.

— InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets, 2016.

The latent space can be explored and generated images compared in an attempt to understand the mapping that the generator model has learned. Alternatively, the generation process can be conditioned, such as via a class label, so that images of a specific type can be created on demand. This is the basis for the Conditional Generative Adversarial Network, CGAN or cGAN for short.

Another approach is to provide control variables as input to the generator, along with the point in latent space (noise). The generator can be trained to use the control variables to influence specific properties of the generated images. This is the approach taken with the Information Maximizing Generative Adversarial Network, or InfoGAN for short.

InfoGAN, an information-theoretic extension to the Generative Adversarial Network that is able to learn disentangled representations in a completely unsupervised manner.

— InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets, 2016.

The structured mapping learned by the generator during the training process is somewhat random. Although the generator model learns to spatially separate properties of generated images in the latent space, there is no control. The properties are entangled. The InfoGAN is motivated by the desire to disentangle the properties of generated images.

For example, in the case of faces, the properties of generating a face can be disentangled and controlled, such as the shape of the face, hair color, hairstyle, and so on.

For example, for a dataset of faces, a useful disentangled representation may allocate a separate set of dimensions for each of the following attributes: facial expression, eye color, hairstyle, presence or absence of eyeglasses, and the identity of the corresponding person.

— InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets, 2016.

Control variables are provided along with the noise as input to the generator and the model is trained via a mutual information loss function.

… we present a simple modification to the generative adversarial network objective that encourages it to learn interpretable and meaningful representations. We do so by maximizing the mutual information between a fixed small subset of the GAN’s noise variables and the observations, which turns out to be relatively straightforward.

Mutual information refers to the amount of information learned about one variable given another variable. In this case, we are interested in the information about the control variables given the image generated using noise and the control variables.

In information theory, mutual information between X and Y, I(X; Y), measures the “*amount of information*” learned from knowledge of random variable Y about the other random variable X.

The Mutual Information (MI) is calculated as the conditional entropy of the control variables (c) given the image (created by the generator (G) from the noise (z) and the control variables (c)), subtracted from the marginal entropy of the control variables (c); for example:

- MI = Entropy(c) – Entropy(c | G(z,c))
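To build some intuition for this quantity, the sketch below computes it for two tiny discrete joint distributions, where a single discrete observation x stands in for the generated image. This is a toy illustration of the definition only, not how the InfoGAN estimates mutual information in practice; the function names and the example tables are assumptions.

```python
from math import log

def entropy(p):
    """Shannon entropy in nats of a discrete distribution given as a list."""
    return -sum(pi * log(pi) for pi in p if pi > 0)

def mutual_information(joint):
    """I(C; X) = H(C) - H(C | X) for a joint probability table joint[c][x]."""
    p_c = [sum(row) for row in joint]
    p_x = [sum(col) for col in zip(*joint)]
    # H(C | X) is the average over x of the entropy of p(c | x)
    h_c_given_x = 0.0
    for x, px in enumerate(p_x):
        if px > 0:
            h_c_given_x += px * entropy([row[x] / px for row in joint])
    return entropy(p_c) - h_c_given_x

# the code is fully recoverable from the observation: MI equals H(C) = ln(2)
recoverable = [[0.5, 0.0], [0.0, 0.5]]
# the observation says nothing about the code: MI is zero
independent = [[0.25, 0.25], [0.25, 0.25]]
mi_recoverable = mutual_information(recoverable)
mi_independent = mutual_information(independent)
print(mi_recoverable, mi_independent)
```

The InfoGAN aims for the first situation: the control code should be recoverable from the generated image.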

Calculating the true mutual information, in practice, is often intractable, although simplifications are adopted in the paper, referred to as Variational Information Maximization, and the entropy for the control codes is kept constant.
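For reference, the variational lower bound from the InfoGAN paper can be written as follows, where Q(c|x) is the auxiliary distribution that approximates the true posterior, V(D, G) is the standard GAN objective, and λ is a weighting hyperparameter. The notation follows the paper rather than any code in this tutorial.

```latex
\begin{aligned}
I\big(c;\, G(z, c)\big)
  &= H(c) - H\big(c \mid G(z, c)\big) \\
  &\ge \mathbb{E}_{c \sim P(c),\; x \sim G(z, c)}\big[\log Q(c \mid x)\big] + H(c)
   \;=\; L_I(G, Q) \\[6pt]
\min_{G,\, Q}\; \max_{D}\;\; V_{\text{InfoGAN}}(D, G, Q)
  &= V(D, G) - \lambda\, L_I(G, Q)
\end{aligned}
```

Because H(c) is fixed by the chosen code distribution, maximizing the bound amounts to maximizing the expected log-likelihood that Q assigns to the codes that were actually used.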

Training the generator via mutual information is achieved through the use of a new model, referred to as Q or the auxiliary model. The new model shares all of the same weights as the discriminator model for interpreting an input image, but unlike the discriminator model that predicts whether the image is real or fake, the auxiliary model predicts the control codes that were used to generate the image.

Both models are used to update the generator model, first to improve the likelihood of generating images that fool the discriminator model, and second to improve the mutual information between the control codes used to generate an image and the auxiliary model’s prediction of the control codes.

The result is that the generator model is regularized via mutual information loss such that the control codes capture salient properties of the generated images and, in turn, can be used to control the image generation process.

… mutual information can be utilized whenever we are interested in learning a parametrized mapping from a given input X to a higher level representation Y which preserves information about the original input. […] show that the task of maximizing mutual information is essentially equivalent to training an autoencoder to minimize reconstruction error.

— Understanding Mutual Information and its use in InfoGAN, 2016.


## How to Implement the InfoGAN Loss Function

The InfoGAN is reasonably straightforward to implement once you are familiar with the inputs and outputs of the model.

The only stumbling block might be the mutual information loss function, especially if you don’t have a strong math background, like most developers.

There are two main types of control variables used with the InfoGAN: categorical and continuous. Continuous variables may have different data distributions, which impacts how the mutual information loss is calculated. The loss can be calculated and summed across all control variables based on the variable type, and this is the approach used in the official InfoGAN implementation released by OpenAI for TensorFlow.

In Keras, it might be easier to simplify the control variables to categorical and either Gaussian or Uniform continuous variables and have separate outputs on the auxiliary model for each control variable type. This is so that a different loss function can be used, greatly simplifying the implementation.

See the papers and posts in the further reading section for more background on the recommendations in this section.

### Categorical Control Variables

The categorical variable may be used to control the type or class of the generated image.

This is implemented as a one hot encoded vector. That is, if the class has 10 values, then the control code would be one class, e.g. 6, and the categorical control vector input to the generator model would be a 10 element vector of all zero values with a one value for class 6, for example, [0, 0, 0, 0, 0, 0, 1, 0, 0, 0].
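As a minimal sketch of this encoding, the snippet below builds the control vector for class 6 out of 10 in plain Python. The *one_hot()* helper is a hypothetical name used only for illustration; the tutorial itself uses the Keras *to_categorical()* function later on.

```python
# minimal sketch: one hot encode a single categorical control code
def one_hot(index, n_classes):
    # reject class indexes that do not fit in the vector
    if not 0 <= index < n_classes:
        raise ValueError('index must be in [0, n_classes)')
    # start with all zeros and mark the chosen class with a one
    vector = [0] * n_classes
    vector[index] = 1
    return vector

# control code for class 6 (counting from zero) with 10 possible classes
code = one_hot(6, 10)
print(code)
```

The vector always has as many elements as there are classes, with exactly one element set to 1.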

We do not need to choose the categorical control variables when training the model; instead, they are generated randomly, e.g. each selected with a uniform probability for each sample.

… a uniform categorical distribution on latent codes c ∼ Cat(K = 10, p = 0.1)

In the auxiliary model, the output layer for the categorical variable would also be a one hot encoded vector to match the input control code, and the softmax activation function is used.

For categorical latent code ci , we use the natural choice of softmax nonlinearity to represent Q(ci |x).

Recall that the mutual information is calculated as the entropy of the control variable provided as input to the generator, minus the conditional entropy of the control variable given the output of the auxiliary model. We can implement this directly, but it’s not necessary.
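For reference, this relationship can be written out as below. The notation follows the InfoGAN paper rather than this tutorial: c is the control code, z is the noise vector, G is the generator, and Q is the auxiliary model. The second line is the variational lower bound that the paper optimizes in place of the exact mutual information.

```latex
I(c;\, G(z, c)) = H(c) - H(c \mid G(z, c))

L_I(G, Q) = \mathbb{E}_{c \sim P(c),\; x \sim G(z, c)}\left[ \log Q(c \mid x) \right] + H(c) \;\le\; I(c;\, G(z, c))
```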

The entropy of the control variable is a constant that does not depend on the model weights (for a uniform categorical variable with 10 values it is ln(10), or about 2.3); as such, we can remove it from our calculation. The conditional entropy can be calculated directly as the cross-entropy between the control variable input and the output from the auxiliary model. Therefore, the categorical cross-entropy loss function can be used, as we would on any multi-class classification problem.
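The two quantities discussed above can be checked with a few lines of plain Python. This is a minimal sketch, assuming natural logarithms (as Keras uses for its cross-entropy losses) and two made-up auxiliary model predictions; the *cross_entropy()* helper is a hypothetical name for illustration only.

```python
from math import log

# entropy of a uniform categorical control variable with 10 values
n_cat = 10
prior = [1.0 / n_cat] * n_cat
entropy = -sum(p * log(p) for p in prior)
print('entropy of control variable: %.3f' % entropy)

# cross-entropy between a one hot control code and a predicted distribution
def cross_entropy(target, predicted, eps=1e-7):
    # clip predictions away from zero to avoid log(0)
    return -sum(t * log(max(p, eps)) for t, p in zip(target, predicted))

# one hot control code for class 6
target = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
# hypothetical prediction from an untrained auxiliary model (no information)
uninformed = [0.1] * n_cat
# hypothetical prediction from a well trained auxiliary model
confident = [0.01] * n_cat
confident[6] = 0.91

loss_uninformed = cross_entropy(target, uninformed)
loss_confident = cross_entropy(target, confident)
print('loss (uninformed): %.3f' % loss_uninformed)
print('loss (confident): %.3f' % loss_confident)
```

The entropy term is fixed by the choice of prior, whereas the cross-entropy falls as the auxiliary model learns to recover the control code, which is why only the cross-entropy needs to be minimized.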

A hyperparameter, lambda, is used to scale the mutual information loss function and is set to 1, and therefore can be ignored.

Even though InfoGAN introduces an extra hyperparameter λ, it’s easy to tune and simply setting to 1 is sufficient for discrete latent codes.

### Continuous Control Variables

The continuous control variables may be used to control the style of the image.

The continuous variables are sampled from a uniform distribution, such as between -1 and 1, and provided as input to the generator model.

… continuous codes that can capture variations that are continuous in nature: c2, c3 ∼ Unif(−1, 1)
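Sampling such codes is straightforward. The sketch below draws two continuous control codes per sample from Unif(-1, 1) using only the Python standard library; the *sample_continuous_codes()* function and its arguments are hypothetical and are not part of the tutorial code, which focuses on the categorical code only.

```python
import random

# minimal sketch: sample continuous control codes from a uniform distribution
def sample_continuous_codes(n_samples, n_codes=2, low=-1.0, high=1.0, seed=None):
    # a dedicated generator keeps the sketch reproducible when a seed is given
    rng = random.Random(seed)
    # one row per sample, one column per continuous control code
    return [[rng.uniform(low, high) for _ in range(n_codes)] for _ in range(n_samples)]

# two continuous codes (e.g. c2 and c3) for each of four samples
codes = sample_continuous_codes(4, seed=1)
for row in codes:
    print(row)
```

Each row would be concatenated with the noise vector and any categorical code before being passed to the generator.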

The auxiliary model can implement the prediction of continuous control variables with a Gaussian distribution, where the output layer is configured to have one node for the mean and one node for the standard deviation of the Gaussian, i.e. two outputs are required for each continuous control variable.

For continuous latent code cj , there are more options depending on what is the true posterior P(cj |x). In our experiments, we have found that simply treating Q(cj |x) as a factored Gaussian is sufficient.

Nodes that output the mean can use a linear activation function, whereas nodes that output the standard deviation must produce a positive value, therefore an activation function such as the sigmoid can be used to create a value between 0 and 1.

For continuous latent codes, we parameterize the approximate posterior through a diagonal Gaussian distribution, and the recognition network outputs its mean and standard deviation, where the standard deviation is parameterized through an exponential transformation of the network output to ensure positivity.

The loss function must be calculated as the mutual information on the Gaussian control codes, meaning they must be reconstructed from the mean and standard deviation prior to calculating the loss. Calculating the entropy and conditional entropy for Gaussian distributed variables can be implemented directly, although it is not necessary. Instead, the mean squared error loss can be used.

Alternately, the output distribution can be simplified to a uniform distribution for each control variable, a single output node for each variable in the auxiliary model with linear activation can be used, and the model can use the mean squared error loss function.
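To see why mean squared error is a reasonable stand-in, the sketch below compares the negative log-likelihood of a control code under a predicted Gaussian with the squared error on the mean alone. It assumes the standard deviation is parameterized through an exponential, as described in the quote above; the function names are hypothetical and the numbers are made up for illustration.

```python
from math import exp, log, pi

# negative log-likelihood of a control code under a predicted Gaussian
def gaussian_nll(code, mean, log_std):
    # the exponential transform keeps the standard deviation positive
    std = exp(log_std)
    return 0.5 * log(2.0 * pi) + log_std + 0.5 * ((code - mean) / std) ** 2

# squared error between the control code and the predicted mean
def squared_error(code, mean):
    return (code - mean) ** 2

# a control code and two hypothetical predictions of its mean
code = 0.3
near, far = 0.25, -0.6

# with the standard deviation fixed at 1 (log_std = 0), the two losses
# differ only by a constant and a factor of one half
constant = 0.5 * log(2.0 * pi)
for mean in (near, far):
    nll = gaussian_nll(code, mean, 0.0)
    mse = squared_error(code, mean)
    print('mean=%.2f nll=%.4f constant+mse/2=%.4f' % (mean, nll, constant + 0.5 * mse))
```

With a fixed standard deviation, minimizing the squared error and minimizing the Gaussian negative log-likelihood lead to the same predicted mean, which is the simplification the single linear output node relies on.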

## How to Develop an InfoGAN for MNIST

In this section, we will take a closer look at the generator (g), discriminator (d), and auxiliary models (q) and how to implement them in Keras.

We will develop an InfoGAN implementation for the MNIST dataset, as was done in the InfoGAN paper.

The paper explores two versions; the first uses just categorical control codes and allows the model to map one categorical variable to approximately one digit (although there is no ordering of the digits by categorical variables).

The paper also explores a version of the InfoGAN architecture with the one hot encoded categorical variable (c1) and two continuous control variables (c2 and c3).

The first continuous variable is discovered to control the rotation of the digits and the second controls the thickness of the digits.

We will focus on the simpler case of using a categorical control variable with 10 values and encourage the model to learn to let this variable control the generated digit. You may want to extend this example by either changing the cardinality of the categorical control variable or adding continuous control variables.

The configuration of the GAN model used for training on the MNIST dataset was provided as an appendix to the paper. We will use the listed configuration as a starting point in developing our own generator (g), discriminator (d), and auxiliary (q) models.

Let’s start off by developing the generator model as a deep convolutional neural network (e.g. a DCGAN).

The model could take the noise vector (z) and control vector (c) as separate inputs and concatenate them before using them as the basis for generating the image. Alternately, the vectors can be concatenated beforehand and provided to a single input layer in the model. The approaches are equivalent and we will use the latter in this case to keep the model simple.

The *define_generator()* function below defines the generator model and takes the size of the input vector as an argument.

A fully connected layer takes the input vector and produces a sufficient number of activations to create 512 7×7 feature maps, into which the activations are reshaped. These then pass through a normal convolutional layer with 1×1 stride, then two subsequent upsampling transpose convolutional layers with a 2×2 stride, first to 14×14 feature maps and then to the desired 1 channel 28×28 feature map output with pixel values in the range [-1,1] via a tanh activation function.

We follow good generator configuration heuristics, including a random Gaussian weight initialization, ReLU activations in the hidden layers, and use of batch normalization.

```python
# define the standalone generator model
def define_generator(gen_input_size):
    # weight initialization
    init = RandomNormal(stddev=0.02)
    # image generator input
    in_lat = Input(shape=(gen_input_size,))
    # foundation for 7x7 image
    n_nodes = 512 * 7 * 7
    gen = Dense(n_nodes, kernel_initializer=init)(in_lat)
    gen = Activation('relu')(gen)
    gen = BatchNormalization()(gen)
    gen = Reshape((7, 7, 512))(gen)
    # normal
    gen = Conv2D(128, (4,4), padding='same', kernel_initializer=init)(gen)
    gen = Activation('relu')(gen)
    gen = BatchNormalization()(gen)
    # upsample to 14x14
    gen = Conv2DTranspose(64, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(gen)
    gen = Activation('relu')(gen)
    gen = BatchNormalization()(gen)
    # upsample to 28x28
    gen = Conv2DTranspose(1, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(gen)
    # tanh output
    out_layer = Activation('tanh')(gen)
    # define model
    model = Model(in_lat, out_layer)
    return model
```

Next, we can define the discriminator and auxiliary models.

The discriminator model is trained in a standalone manner on real and fake images, as per a normal GAN. Neither the generator nor the auxiliary models are fit directly; instead, they are fit as part of a composite model.

Both the discriminator and auxiliary models share the same input and feature extraction layers but differ in their output layers. Therefore, it makes sense to define them both at the same time.

Again, there are many ways that this architecture could be implemented, but defining the discriminator and auxiliary models as separate models first allows us later to combine them into a larger GAN model directly via the functional API.

The *define_discriminator()* function below defines the discriminator and auxiliary models and takes the cardinality of the categorical variable (e.g. number of values, such as 10) as an input. The shape of the input image is also parameterized as a function argument and set to the default value of the size of the MNIST images.

The feature extraction layers involve two downsampling layers, used instead of pooling layers as a best practice. Also following best practice for DCGAN models, we use the LeakyReLU activation and batch normalization.

The discriminator model (d) has a single output node and predicts the probability of an input image being real via the sigmoid activation function. The model is compiled as it will be used in a standalone way, optimizing the binary cross entropy function via the Adam version of stochastic gradient descent with best practice learning rate and momentum.

The auxiliary model (q) has one output node for each value in the categorical variable and uses a softmax activation function. A fully connected layer is added between the feature extraction layers and the output layer, as was used in the InfoGAN paper. The model is not compiled as it is not used in a standalone manner.

```python
# define the standalone discriminator model
def define_discriminator(n_cat, in_shape=(28,28,1)):
    # weight initialization
    init = RandomNormal(stddev=0.02)
    # image input
    in_image = Input(shape=in_shape)
    # downsample to 14x14
    d = Conv2D(64, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(in_image)
    d = LeakyReLU(alpha=0.1)(d)
    # downsample to 7x7
    d = Conv2D(128, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d)
    d = LeakyReLU(alpha=0.1)(d)
    d = BatchNormalization()(d)
    # normal
    d = Conv2D(256, (4,4), padding='same', kernel_initializer=init)(d)
    d = LeakyReLU(alpha=0.1)(d)
    d = BatchNormalization()(d)
    # flatten feature maps
    d = Flatten()(d)
    # real/fake output
    out_classifier = Dense(1, activation='sigmoid')(d)
    # define d model
    d_model = Model(in_image, out_classifier)
    # compile d model
    d_model.compile(loss='binary_crossentropy', optimizer=Adam(lr=0.0002, beta_1=0.5))
    # create q model layers
    q = Dense(128)(d)
    q = BatchNormalization()(q)
    q = LeakyReLU(alpha=0.1)(q)
    # q model output
    out_codes = Dense(n_cat, activation='softmax')(q)
    # define q model
    q_model = Model(in_image, out_codes)
    return d_model, q_model
```

Next, we can define the composite GAN model.

This model uses all submodels and is the basis for training the weights of the generator model.

The *define_gan()* function below implements this and defines and returns the model, taking the three submodels as input.

The discriminator is trained in a standalone manner as mentioned, therefore all weights of the discriminator are set as not trainable (in this context only). The output of the generator model is connected to the input of the discriminator model, and to the input of the auxiliary model.

This creates a new composite model that takes a [noise + control] vector as input, that then passes through the generator to produce an image. The image then passes through the discriminator model to produce a classification and through the auxiliary model to produce a prediction of the control variables.

The model has two output layers that need to be trained with different loss functions. Binary cross entropy loss is used for the discriminator output, as we did when compiling the discriminator for standalone use, and mutual information loss is used for the auxiliary model, which, in this case, can be implemented directly as categorical cross-entropy and achieve the desired result.

```python
# define the combined discriminator, generator and q network model
def define_gan(g_model, d_model, q_model):
    # make weights in the discriminator (some shared with the q model) as not trainable
    d_model.trainable = False
    # connect g outputs to d inputs
    d_output = d_model(g_model.output)
    # connect g outputs to q inputs
    q_output = q_model(g_model.output)
    # define composite model
    model = Model(g_model.input, [d_output, q_output])
    # compile model
    opt = Adam(lr=0.0002, beta_1=0.5)
    model.compile(loss=['binary_crossentropy', 'categorical_crossentropy'], optimizer=opt)
    return model
```

To make the GAN model architecture clearer, we can create the models and a plot of the composite model.

The complete example is listed below.

```python
# create and plot the infogan model for mnist
from keras.optimizers import Adam
from keras.models import Model
from keras.layers import Input
from keras.layers import Dense
from keras.layers import Reshape
from keras.layers import Flatten
from keras.layers import Conv2D
from keras.layers import Conv2DTranspose
from keras.layers import LeakyReLU
from keras.layers import BatchNormalization
from keras.layers import Activation
from keras.initializers import RandomNormal
from keras.utils.vis_utils import plot_model

# define the standalone discriminator model
def define_discriminator(n_cat, in_shape=(28,28,1)):
    # weight initialization
    init = RandomNormal(stddev=0.02)
    # image input
    in_image = Input(shape=in_shape)
    # downsample to 14x14
    d = Conv2D(64, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(in_image)
    d = LeakyReLU(alpha=0.1)(d)
    # downsample to 7x7
    d = Conv2D(128, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d)
    d = LeakyReLU(alpha=0.1)(d)
    d = BatchNormalization()(d)
    # normal
    d = Conv2D(256, (4,4), padding='same', kernel_initializer=init)(d)
    d = LeakyReLU(alpha=0.1)(d)
    d = BatchNormalization()(d)
    # flatten feature maps
    d = Flatten()(d)
    # real/fake output
    out_classifier = Dense(1, activation='sigmoid')(d)
    # define d model
    d_model = Model(in_image, out_classifier)
    # compile d model
    d_model.compile(loss='binary_crossentropy', optimizer=Adam(lr=0.0002, beta_1=0.5))
    # create q model layers
    q = Dense(128)(d)
    q = BatchNormalization()(q)
    q = LeakyReLU(alpha=0.1)(q)
    # q model output
    out_codes = Dense(n_cat, activation='softmax')(q)
    # define q model
    q_model = Model(in_image, out_codes)
    return d_model, q_model

# define the standalone generator model
def define_generator(gen_input_size):
    # weight initialization
    init = RandomNormal(stddev=0.02)
    # image generator input
    in_lat = Input(shape=(gen_input_size,))
    # foundation for 7x7 image
    n_nodes = 512 * 7 * 7
    gen = Dense(n_nodes, kernel_initializer=init)(in_lat)
    gen = Activation('relu')(gen)
    gen = BatchNormalization()(gen)
    gen = Reshape((7, 7, 512))(gen)
    # normal
    gen = Conv2D(128, (4,4), padding='same', kernel_initializer=init)(gen)
    gen = Activation('relu')(gen)
    gen = BatchNormalization()(gen)
    # upsample to 14x14
    gen = Conv2DTranspose(64, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(gen)
    gen = Activation('relu')(gen)
    gen = BatchNormalization()(gen)
    # upsample to 28x28
    gen = Conv2DTranspose(1, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(gen)
    # tanh output
    out_layer = Activation('tanh')(gen)
    # define model
    model = Model(in_lat, out_layer)
    return model

# define the combined discriminator, generator and q network model
def define_gan(g_model, d_model, q_model):
    # make weights in the discriminator (some shared with the q model) as not trainable
    d_model.trainable = False
    # connect g outputs to d inputs
    d_output = d_model(g_model.output)
    # connect g outputs to q inputs
    q_output = q_model(g_model.output)
    # define composite model
    model = Model(g_model.input, [d_output, q_output])
    # compile model
    opt = Adam(lr=0.0002, beta_1=0.5)
    model.compile(loss=['binary_crossentropy', 'categorical_crossentropy'], optimizer=opt)
    return model

# number of values for the categorical control code
n_cat = 10
# size of the latent space
latent_dim = 62
# create the discriminator
d_model, q_model = define_discriminator(n_cat)
# create the generator
gen_input_size = latent_dim + n_cat
g_model = define_generator(gen_input_size)
# create the gan
gan_model = define_gan(g_model, d_model, q_model)
# plot the model
plot_model(gan_model, to_file='gan_plot.png', show_shapes=True, show_layer_names=True)
```

Running the example creates all three models, then creates the composite GAN model and saves a plot of the model architecture.

**Note**: creating this plot assumes that the pydot and graphviz libraries are installed. If this is a problem, you can comment out the import statement and the call to the *plot_model()* function.

The plot shows all of the detail for the generator model and the compressed description of the discriminator and auxiliary models. Importantly, note the shape of the output of the discriminator as a single node for predicting whether the image is real or fake, and the 10 nodes for the auxiliary model to predict the categorical control code.

Recall that this composite model will only be used to update the model weights of the generator and auxiliary models, and all weights in the discriminator model will remain untrainable, i.e. only updated when the standalone discriminator model is updated.

Next, we will develop inputs for the generator.

Each input will be a vector comprised of noise and the control codes. Specifically, a vector of Gaussian random numbers and a one hot encoded randomly selected categorical value.

The *generate_latent_points()* function below implements this, taking the size of the latent space, the number of categorical values, and the number of samples to generate as arguments. The function returns the concatenated vectors as input for the generator model, as well as the standalone control codes. The standalone control codes will be required when updating the generator and auxiliary models via the composite GAN model, specifically for calculating the mutual information loss for the auxiliary model.

```python
# generate points in latent space as input for the generator
def generate_latent_points(latent_dim, n_cat, n_samples):
    # generate points in the latent space
    z_latent = randn(latent_dim * n_samples)
    # reshape into a batch of inputs for the network
    z_latent = z_latent.reshape(n_samples, latent_dim)
    # generate categorical codes
    cat_codes = randint(0, n_cat, n_samples)
    # one hot encode
    cat_codes = to_categorical(cat_codes, num_classes=n_cat)
    # concatenate latent points and control codes
    z_input = hstack((z_latent, cat_codes))
    return [z_input, cat_codes]
```

Next, we can generate real and fake examples.

The MNIST dataset can be loaded, transformed into 3D input by adding an additional dimension for the grayscale images, and scaling all pixel values to the range [-1,1] to match the output from the generator model. This is implemented in the *load_real_samples()* function below.

We can retrieve batches of real samples required when training the discriminator by choosing a random subset of the dataset. This is implemented in the *generate_real_samples()* function below that returns the images and the class label of 1, to indicate to the discriminator that they are real images.

The discriminator also requires batches of fake samples generated via the generator, using the vectors from *generate_latent_points()* function as input. The *generate_fake_samples()* function below implements this, returning the generated images along with the class label of 0, to indicate to the discriminator that they are fake images.

```python
# load images
def load_real_samples():
    # load dataset
    (trainX, _), (_, _) = load_data()
    # expand to 3d, e.g. add channels
    X = expand_dims(trainX, axis=-1)
    # convert from ints to floats
    X = X.astype('float32')
    # scale from [0,255] to [-1,1]
    X = (X - 127.5) / 127.5
    print(X.shape)
    return X

# select real samples
def generate_real_samples(dataset, n_samples):
    # choose random instances
    ix = randint(0, dataset.shape[0], n_samples)
    # select images and labels
    X = dataset[ix]
    # generate class labels
    y = ones((n_samples, 1))
    return X, y

# use the generator to generate n fake examples, with class labels
def generate_fake_samples(generator, latent_dim, n_cat, n_samples):
    # generate points in latent space and control codes
    z_input, _ = generate_latent_points(latent_dim, n_cat, n_samples)
    # predict outputs
    images = generator.predict(z_input)
    # create class labels
    y = zeros((n_samples, 1))
    return images, y
```

Next, we need to keep track of the quality of the generated images.

We will periodically use the generator to generate a sample of images and save the generator and composite models to file. We can then review the generated images at the end of training in order to choose a final generator model and load the model to start using it to generate images.

The *summarize_performance()* function below implements this, first generating 100 images, scaling their pixel values back to the range [0,1], and saving them as a plot of images in a 10×10 square.

The generator and composite GAN models are also saved to file, with a unique filename based on the training iteration number.

```python
# generate samples and save as a plot and save the model
def summarize_performance(step, g_model, gan_model, latent_dim, n_cat, n_samples=100):
    # prepare fake examples
    X, _ = generate_fake_samples(g_model, latent_dim, n_cat, n_samples)
    # scale from [-1,1] to [0,1]
    X = (X + 1) / 2.0
    # plot images
    for i in range(100):
        # define subplot
        pyplot.subplot(10, 10, 1 + i)
        # turn off axis
        pyplot.axis('off')
        # plot raw pixel data
        pyplot.imshow(X[i, :, :, 0], cmap='gray_r')
    # save plot to file
    filename1 = 'generated_plot_%04d.png' % (step+1)
    pyplot.savefig(filename1)
    pyplot.close()
    # save the generator model
    filename2 = 'model_%04d.h5' % (step+1)
    g_model.save(filename2)
    # save the gan model
    filename3 = 'gan_model_%04d.h5' % (step+1)
    gan_model.save(filename3)
    print('>Saved: %s, %s, and %s' % (filename1, filename2, filename3))
```

Finally, we can train the InfoGAN.

This is implemented in the *train()* function below that takes the defined models and configuration as arguments and runs the training process.

The models are trained for 100 epochs and 64 samples are used in each batch. There are 60,000 images in the MNIST training dataset, therefore one epoch involves 60,000/64, rounded down to 937, batches or training iterations. Multiplying this by the number of epochs, or 100, means that there will be a total of 93,700 training iterations.
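The arithmetic can be checked with a few lines of Python. This is a small sketch using the values stated above; the variable names mirror those used in the *train()* function, apart from *save_every*, which is introduced here for illustration.

```python
# number of images in the MNIST training dataset
n_images = 60000
# training configuration
n_batch = 64
n_epochs = 100
# batches per epoch, rounded down as the final partial batch is dropped
bat_per_epo = n_images // n_batch
# total number of training iterations
n_steps = bat_per_epo * n_epochs
# interval between saved plots and models (every 10 epochs)
save_every = bat_per_epo * 10
print(bat_per_epo, n_steps, save_every)
# prints: 937 93700 9370
```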

Each training iteration involves first updating the discriminator with half a batch of real samples and half a batch of fake samples to form one batch worth of weight updates, or 64, each iteration. Next, the composite GAN model is updated based on a batch worth of noise and control code inputs. The loss of the discriminator on real and fake images and the loss of the generator and auxiliary model is reported each training iteration.

```python
# train the generator and discriminator
def train(g_model, d_model, gan_model, dataset, latent_dim, n_cat, n_epochs=100, n_batch=64):
    # calculate the number of batches per training epoch
    bat_per_epo = int(dataset.shape[0] / n_batch)
    # calculate the number of training iterations
    n_steps = bat_per_epo * n_epochs
    # calculate the size of half a batch of samples
    half_batch = int(n_batch / 2)
    # manually enumerate training iterations
    for i in range(n_steps):
        # get randomly selected 'real' samples
        X_real, y_real = generate_real_samples(dataset, half_batch)
        # update discriminator model weights on real samples
        d_loss1 = d_model.train_on_batch(X_real, y_real)
        # generate 'fake' examples
        X_fake, y_fake = generate_fake_samples(g_model, latent_dim, n_cat, half_batch)
        # update discriminator model weights on fake samples
        d_loss2 = d_model.train_on_batch(X_fake, y_fake)
        # prepare points in latent space as input for the generator
        z_input, cat_codes = generate_latent_points(latent_dim, n_cat, n_batch)
        # create inverted labels for the fake samples
        y_gan = ones((n_batch, 1))
        # update the g via the d and q error
        _,g_1,g_2 = gan_model.train_on_batch(z_input, [y_gan, cat_codes])
        # summarize loss on this batch
        print('>%d, d[%.3f,%.3f], g[%.3f] q[%.3f]' % (i+1, d_loss1, d_loss2, g_1, g_2))
        # evaluate the model performance every 10 epochs
        if (i+1) % (bat_per_epo * 10) == 0:
            summarize_performance(i, g_model, gan_model, latent_dim, n_cat)
```

We can then configure and create the models, then run the training process.

We will use 10 values for the single categorical variable to match the 10 known classes in the MNIST dataset. We will use a latent space with 62 dimensions to match the InfoGAN paper, meaning, in this case, each input vector to the generator model will be 62 (random Gaussian variables) + 10 (one hot encoded control variable) or 72 elements in length.

```python
# number of values for the categorical control code
n_cat = 10
# size of the latent space
latent_dim = 62
# create the discriminator
d_model, q_model = define_discriminator(n_cat)
# create the generator
gen_input_size = latent_dim + n_cat
g_model = define_generator(gen_input_size)
# create the gan
gan_model = define_gan(g_model, d_model, q_model)
# load image data
dataset = load_real_samples()
# train model
train(g_model, d_model, gan_model, dataset, latent_dim, n_cat)
```

Tying this together, the complete example of training an InfoGAN model on the MNIST dataset with a single categorical control variable is listed below.

```python
# example of training an infogan on mnist
from numpy import zeros
from numpy import ones
from numpy import expand_dims
from numpy import hstack
from numpy.random import randn
from numpy.random import randint
from keras.datasets.mnist import load_data
from keras.optimizers import Adam
from keras.initializers import RandomNormal
from keras.utils import to_categorical
from keras.models import Model
from keras.layers import Input
from keras.layers import Dense
from keras.layers import Reshape
from keras.layers import Flatten
from keras.layers import Conv2D
from keras.layers import Conv2DTranspose
from keras.layers import LeakyReLU
from keras.layers import BatchNormalization
from keras.layers import Activation
from matplotlib import pyplot

# define the standalone discriminator model
def define_discriminator(n_cat, in_shape=(28,28,1)):
    # weight initialization
    init = RandomNormal(stddev=0.02)
    # image input
    in_image = Input(shape=in_shape)
    # downsample to 14x14
    d = Conv2D(64, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(in_image)
    d = LeakyReLU(alpha=0.1)(d)
    # downsample to 7x7
    d = Conv2D(128, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d)
    d = LeakyReLU(alpha=0.1)(d)
    d = BatchNormalization()(d)
    # normal
    d = Conv2D(256, (4,4), padding='same', kernel_initializer=init)(d)
    d = LeakyReLU(alpha=0.1)(d)
    d = BatchNormalization()(d)
    # flatten feature maps
    d = Flatten()(d)
    # real/fake output
    out_classifier = Dense(1, activation='sigmoid')(d)
    # define d model
    d_model = Model(in_image, out_classifier)
    # compile d model
    d_model.compile(loss='binary_crossentropy', optimizer=Adam(lr=0.0002, beta_1=0.5))
    # create q model layers
    q = Dense(128)(d)
    q = BatchNormalization()(q)
    q = LeakyReLU(alpha=0.1)(q)
    # q model output
    out_codes = Dense(n_cat, activation='softmax')(q)
    # define q model
    q_model = Model(in_image, out_codes)
    return d_model, q_model

# define the standalone generator model
def define_generator(gen_input_size):
    # weight initialization
    init = RandomNormal(stddev=0.02)
    # image generator input
    in_lat = Input(shape=(gen_input_size,))
    # foundation for 7x7 image
    n_nodes = 512 * 7 * 7
    gen = Dense(n_nodes, kernel_initializer=init)(in_lat)
    gen = Activation('relu')(gen)
    gen = BatchNormalization()(gen)
    gen = Reshape((7, 7, 512))(gen)
    # normal
    gen = Conv2D(128, (4,4), padding='same', kernel_initializer=init)(gen)
    gen = Activation('relu')(gen)
    gen = BatchNormalization()(gen)
    # upsample to 14x14
    gen = Conv2DTranspose(64, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(gen)
    gen = Activation('relu')(gen)
    gen = BatchNormalization()(gen)
    # upsample to 28x28
    gen = Conv2DTranspose(1, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(gen)
    # tanh output
    out_layer = Activation('tanh')(gen)
    # define model
    model = Model(in_lat, out_layer)
    return model

# define the combined discriminator, generator and q network model
def define_gan(g_model, d_model, q_model):
    # make weights in the discriminator (some shared with the q model) as not trainable
    d_model.trainable = False
    # connect g outputs to d inputs
    d_output = d_model(g_model.output)
    # connect g outputs to q inputs
    q_output = q_model(g_model.output)
    # define composite model
    model = Model(g_model.input, [d_output, q_output])
    # compile model
    opt = Adam(lr=0.0002, beta_1=0.5)
    model.compile(loss=['binary_crossentropy', 'categorical_crossentropy'], optimizer=opt)
    return model

# load images
def load_real_samples():
    # load dataset
    (trainX, _), (_, _) = load_data()
    # expand to 3d, e.g. add channels
    X = expand_dims(trainX, axis=-1)
    # convert from ints to floats
    X = X.astype('float32')
    # scale from [0,255] to [-1,1]
    X = (X - 127.5) / 127.5
    print(X.shape)
    return X

# select real samples
def generate_real_samples(dataset, n_samples):
    # choose random instances
    ix = randint(0, dataset.shape[0], n_samples)
    # select images and labels
    X = dataset[ix]
    # generate class labels
    y = ones((n_samples, 1))
    return X, y

# generate points in latent space as input for the generator
def generate_latent_points(latent_dim, n_cat, n_samples):
    # generate points in the latent space
    z_latent = randn(latent_dim * n_samples)
    # reshape into a batch of inputs for the network
    z_latent = z_latent.reshape(n_samples, latent_dim)
    # generate categorical codes
    cat_codes = randint(0, n_cat, n_samples)
    # one hot encode
    cat_codes = to_categorical(cat_codes, num_classes=n_cat)
    # concatenate latent points and control codes
    z_input = hstack((z_latent, cat_codes))
    return [z_input, cat_codes]

# use the generator to generate n fake examples, with class labels
def generate_fake_samples(generator, latent_dim, n_cat, n_samples):
    # generate points in latent space and control codes
    z_input, _ = generate_latent_points(latent_dim, n_cat, n_samples)
    # predict outputs
    images = generator.predict(z_input)
    # create class labels
    y = zeros((n_samples, 1))
    return images, y

# generate samples and save as a plot and save the model
def summarize_performance(step, g_model, gan_model, latent_dim, n_cat, n_samples=100):
    # prepare fake examples
    X, _ = generate_fake_samples(g_model, latent_dim, n_cat, n_samples)
    # scale from [-1,1] to [0,1]
    X = (X + 1) / 2.0
    # plot images
    for i in range(100):
        # define subplot
        pyplot.subplot(10, 10, 1 + i)
        # turn off axis
        pyplot.axis('off')
        # plot raw pixel data
        pyplot.imshow(X[i, :, :, 0], cmap='gray_r')
    # save plot to file
    filename1 = 'generated_plot_%04d.png' % (step+1)
    pyplot.savefig(filename1)
    pyplot.close()
    # save the generator model
    filename2 = 'model_%04d.h5' % (step+1)
    g_model.save(filename2)
    # save the gan model
    filename3 = 'gan_model_%04d.h5' % (step+1)
    gan_model.save(filename3)
    print('>Saved: %s, %s, and %s' % (filename1, filename2, filename3))

# train the generator and discriminator
def train(g_model, d_model, gan_model, dataset, latent_dim, n_cat, n_epochs=100, n_batch=64):
    # calculate the number of batches per training epoch
    bat_per_epo = int(dataset.shape[0] / n_batch)
    # calculate the number of training iterations
    n_steps = bat_per_epo * n_epochs
    # calculate the size of half a batch of samples
    half_batch = int(n_batch / 2)
    # manually enumerate training iterations
    for i in range(n_steps):
        # get randomly selected 'real' samples
        X_real, y_real = generate_real_samples(dataset, half_batch)
        # update discriminator model weights on real samples
        d_loss1 = d_model.train_on_batch(X_real, y_real)
        # generate 'fake' examples
        X_fake, y_fake = generate_fake_samples(g_model, latent_dim, n_cat, half_batch)
        # update discriminator model weights on fake samples
        d_loss2 = d_model.train_on_batch(X_fake, y_fake)
        # prepare points in latent space as input for the generator
        z_input, cat_codes = generate_latent_points(latent_dim, n_cat, n_batch)
        # create inverted labels for the fake samples
        y_gan = ones((n_batch, 1))
        # update the g via the d and q error
        _,g_1,g_2 = gan_model.train_on_batch(z_input, [y_gan, cat_codes])
        # summarize loss on this batch
        print('>%d, d[%.3f,%.3f], g[%.3f] q[%.3f]' % (i+1, d_loss1, d_loss2, g_1, g_2))
        # evaluate the model performance every 10 epochs
        if (i+1) % (bat_per_epo * 10) == 0:
            summarize_performance(i, g_model, gan_model, latent_dim, n_cat)

# number of values for the categorical control code
n_cat = 10
# size of the latent space
latent_dim = 62
# create the discriminator
d_model, q_model = define_discriminator(n_cat)
# create the generator
gen_input_size = latent_dim + n_cat
g_model = define_generator(gen_input_size)
# create the gan
gan_model = define_gan(g_model, d_model, q_model)
# load image data
dataset = load_real_samples()
# train model
train(g_model, d_model, gan_model, dataset, latent_dim, n_cat)
```

Running the example may take some time, and GPU hardware is recommended, but not required.

**Note**: Given the stochastic nature of the training algorithm, your results may vary. Try running the example a few times.

The loss across the models is reported each training iteration. If the loss for the discriminator remains at 0.0 or goes to 0.0 for an extended time, this may be a sign of a training failure and you may want to restart the training process. The discriminator loss may start at 0.0, but will likely rise, as it did in this specific case.

The loss for the auxiliary model will likely go to zero, as it perfectly predicts the categorical variable. Loss for the generator and discriminator models is likely to hover around 1.0 eventually, to demonstrate a stable training process or equilibrium between the training of the two models.

```
>1, d[0.924,0.758], g[0.448] q[2.909]
>2, d[0.000,2.699], g[0.547] q[2.704]
>3, d[0.000,1.557], g[1.175] q[2.820]
>4, d[0.000,0.941], g[1.466] q[2.813]
>5, d[0.000,1.013], g[1.908] q[2.715]
...
>93696, d[0.814,1.212], g[1.283] q[0.000]
>93697, d[1.063,0.920], g[1.132] q[0.000]
>93698, d[0.999,1.188], g[1.128] q[0.000]
>93699, d[0.935,0.985], g[1.229] q[0.000]
>93700, d[0.968,1.016], g[1.200] q[0.001]
>Saved: generated_plot_93700.png, model_93700.h5, and gan_model_93700.h5
```
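The failure sign described above, a discriminator loss that stays at 0.0 for an extended time, can also be checked automatically rather than by eye. The sketch below is one possible way to do it; the function name `discriminator_stalled` and the `patience` and `tol` parameters are hypothetical and not part of the tutorial code.

```python
# minimal sketch: flag a discriminator loss that has been stuck near zero
# (function name, patience and tol are illustrative assumptions)
def discriminator_stalled(losses, patience=50, tol=1e-3):
    # not enough history yet to make a decision
    if len(losses) < patience:
        return False
    # stalled if every one of the most recent losses is at or below the tolerance
    return all(loss <= tol for loss in losses[-patience:])

# a loss that starts at zero but recovers is not flagged
healthy = [0.0, 0.0, 0.9, 1.0, 0.95]
# a loss that collapses to zero and stays there is flagged
collapsed = [0.9] + [0.0] * 5
print(discriminator_stalled(healthy, patience=5))
print(discriminator_stalled(collapsed, patience=5))
```

In a training loop, you could append `d_loss1` to a list each iteration and stop or restart the run when the check returns True.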

Plots and models are saved every 10 epochs or every 9,370 training iterations.
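The iteration counts follow directly from the batch arithmetic in the train() function. The short check below reproduces them, assuming the dataset holds the 60,000 images of the standard MNIST training set (the dataset size is an assumption here; the batch size and number of epochs are the defaults from train()).

```python
# worked arithmetic for the number of training iterations
n_images = 60000  # assumption: the full MNIST training set
n_batch = 64      # default batch size in train()
n_epochs = 100    # default number of epochs in train()

# batches per epoch, rounded down as in train()
bat_per_epo = int(n_images / n_batch)
# total number of training iterations
n_steps = bat_per_epo * n_epochs
# iterations between saved plots and models (every 10 epochs)
save_every = bat_per_epo * 10
print(bat_per_epo, save_every, n_steps)
```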

Reviewing the plots should show poor quality images in early epochs and improved and stable quality images in later epochs.

For example, the plot of images saved after the first 10 epochs shows low-quality generated images.

More epochs do not necessarily mean better quality, so the best quality images may not be those from the final model saved at the end of training.

Review the plots and choose a final model with the best image quality. In this case, we will use the model saved after 100 epochs or 93,700 training iterations.

## How to Use Control Codes With a Trained InfoGAN Model

Now that we have trained the InfoGAN model, we can explore how to use it.

First, we can load the model and use it to generate random images, as we did during training.

The complete example is listed below.

Change the model filename to match the model filename that generated the best images during your training run.

```python
# example of loading the generator model and generating images
from math import sqrt
from numpy import hstack
from numpy.random import randn
from numpy.random import randint
from keras.models import load_model
from keras.utils import to_categorical
from matplotlib import pyplot

# generate points in latent space as input for the generator
def generate_latent_points(latent_dim, n_cat, n_samples):
    # generate points in the latent space
    z_latent = randn(latent_dim * n_samples)
    # reshape into a batch of inputs for the network
    z_latent = z_latent.reshape(n_samples, latent_dim)
    # generate categorical codes
    cat_codes = randint(0, n_cat, n_samples)
    # one hot encode
    cat_codes = to_categorical(cat_codes, num_classes=n_cat)
    # concatenate latent points and control codes
    z_input = hstack((z_latent, cat_codes))
    return [z_input, cat_codes]

# create a plot of generated images
def create_plot(examples, n_examples):
    # subplot() needs integer grid dimensions
    n_side = int(sqrt(n_examples))
    # plot images
    for i in range(n_examples):
        # define subplot
        pyplot.subplot(n_side, n_side, 1 + i)
        # turn off axis
        pyplot.axis('off')
        # plot raw pixel data
        pyplot.imshow(examples[i, :, :, 0], cmap='gray_r')
    pyplot.show()

# load model
model = load_model('model_93700.h5')
# number of values for the categorical control code
n_cat = 10
# size of the latent space
latent_dim = 62
# number of examples to generate
n_samples = 100
# generate points in latent space and control codes
z_input, _ = generate_latent_points(latent_dim, n_cat, n_samples)
# predict outputs
X = model.predict(z_input)
# scale from [-1,1] to [0,1]
X = (X + 1) / 2.0
# plot the result
create_plot(X, n_samples)
```

Running the example will load the saved generator model and use it to generate 100 random images and plot the images on a 10×10 grid.

Next, we can update the example to test how much control our control variable gives us.

We can update the *generate_latent_points()* function to take the value of the categorical variable, in [0,9], as an argument, encode it, and use it as input along with the noise vectors.

```python
# generate points in latent space as input for the generator
def generate_latent_points(latent_dim, n_cat, n_samples, digit):
    # generate points in the latent space
    z_latent = randn(latent_dim * n_samples)
    # reshape into a batch of inputs for the network
    z_latent = z_latent.reshape(n_samples, latent_dim)
    # define categorical codes
    cat_codes = asarray([digit for _ in range(n_samples)])
    # one hot encode
    cat_codes = to_categorical(cat_codes, num_classes=n_cat)
    # concatenate latent points and control codes
    z_input = hstack((z_latent, cat_codes))
    return [z_input, cat_codes]
```

We can test this by generating a grid of 25 images with the categorical value 1.

The complete example is listed below.

```python
# example of testing different values of the categorical control variable
from math import sqrt
from numpy import asarray
from numpy import hstack
from numpy.random import randn
from keras.models import load_model
from keras.utils import to_categorical
from matplotlib import pyplot

# generate points in latent space as input for the generator
def generate_latent_points(latent_dim, n_cat, n_samples, digit):
    # generate points in the latent space
    z_latent = randn(latent_dim * n_samples)
    # reshape into a batch of inputs for the network
    z_latent = z_latent.reshape(n_samples, latent_dim)
    # define categorical codes
    cat_codes = asarray([digit for _ in range(n_samples)])
    # one hot encode
    cat_codes = to_categorical(cat_codes, num_classes=n_cat)
    # concatenate latent points and control codes
    z_input = hstack((z_latent, cat_codes))
    return [z_input, cat_codes]

# create and show a plot of generated images
def save_plot(examples, n_examples):
    # subplot() needs integer grid dimensions
    n_side = int(sqrt(n_examples))
    # plot images
    for i in range(n_examples):
        # define subplot
        pyplot.subplot(n_side, n_side, 1 + i)
        # turn off axis
        pyplot.axis('off')
        # plot raw pixel data
        pyplot.imshow(examples[i, :, :, 0], cmap='gray_r')
    pyplot.show()

# load model
model = load_model('model_93700.h5')
# number of categorical control codes
n_cat = 10
# size of the latent space
latent_dim = 62
# number of examples to generate
n_samples = 25
# define digit
digit = 1
# generate points in latent space and control codes
z_input, _ = generate_latent_points(latent_dim, n_cat, n_samples, digit)
# predict outputs
X = model.predict(z_input)
# scale from [-1,1] to [0,1]
X = (X + 1) / 2.0
# plot the result
save_plot(X, n_samples)
```

The result is a grid of 25 images generated with the categorical code set to the value 1.

Note: Given the stochastic nature of the training algorithm, your results may vary.

The values of the control code are expected to influence the generated images; specifically, they are expected to influence the digit type. They are not expected to be ordered, though; for example, control codes of 1, 2, and 3 will not necessarily create those digits.

Nevertheless, in this case, the control code with a value of 1 has resulted in images generated that look like a 1.

Experiment with different digits and review what the value is controlling exactly about the image.

For example, setting the value to 5 in this case (digit = 5) results in generated images that look like the number “8”.
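To review all ten control codes in one pass, we could prepare one batch of generator inputs per code value. The sketch below builds those inputs with NumPy only, so it runs without a trained model; the helper name `latent_points_for_code` is hypothetical, and the one hot encoding is done by hand rather than with to_categorical().

```python
# minimal sketch: build generator inputs for every value of the control code
import numpy as np

def latent_points_for_code(latent_dim, n_cat, n_samples, code, rng):
    # random noise part of the input
    z_latent = rng.standard_normal((n_samples, latent_dim))
    # one hot encoding with the same code for every sample
    cat_codes = np.zeros((n_samples, n_cat))
    cat_codes[:, code] = 1.0
    # concatenate noise and control code, as the generator expects
    return np.hstack((z_latent, cat_codes))

rng = np.random.default_rng(1)
latent_dim, n_cat, n_samples = 62, 10, 5
# one batch of inputs per control code value
inputs = {code: latent_points_for_code(latent_dim, n_cat, n_samples, code, rng)
          for code in range(n_cat)}
print(inputs[3].shape)
```

With a generator loaded as in the examples above, each batch could be passed to model.predict() and plotted on its own row to compare what each code value controls.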

## Extensions

This section lists some ideas for extending the tutorial that you may wish to explore.

- **Change Cardinality**. Update the example to use different cardinality of the categorical control variable (e.g. more or fewer values) and review the effect on the training process and the control over generated images.
- **Uniform Control Variables**. Update the example and add two uniform continuous control variables to the auxiliary model and review the effect on the training process and the control over generated images.
- **Gaussian Control Variables**. Update the example and add two Gaussian continuous control variables to the auxiliary model and review the effect on the training process and the control over generated images.
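As a possible starting point for the uniform control variable extension, the generator input could be widened with extra continuous codes. The sketch below is an assumption about how that might look, not part of the tutorial: the function name, the `n_con` parameter, and the sampling range of [-1, 1] are all illustrative choices, and the auxiliary model would also need a matching output that is not shown here.

```python
# minimal sketch: latent points with categorical and uniform continuous codes
import numpy as np

def generate_latent_points_with_continuous(latent_dim, n_cat, n_con, n_samples, rng):
    # noise part of the generator input
    z_latent = rng.standard_normal((n_samples, latent_dim))
    # random categorical codes, one hot encoded
    cat_idx = rng.integers(0, n_cat, n_samples)
    cat_codes = np.eye(n_cat)[cat_idx]
    # continuous codes drawn uniformly (the range is an assumption)
    con_codes = rng.uniform(-1.0, 1.0, (n_samples, n_con))
    # concatenate everything into a single input vector per sample
    z_input = np.hstack((z_latent, cat_codes, con_codes))
    return z_input, cat_codes, con_codes

rng = np.random.default_rng(7)
z_input, cat_codes, con_codes = generate_latent_points_with_continuous(62, 10, 2, 8, rng)
print(z_input.shape, cat_codes.shape, con_codes.shape)
```

The generator input size would then be latent_dim + n_cat + n_con, in this case 74.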

If you explore any of these extensions, I’d love to know.

Post your findings in the comments below.

## Further Reading

This section provides more resources on the topic if you are looking to go deeper.

### Papers

- InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets, 2016.
- Understanding Mutual Information and its use in InfoGAN, 2016.
- The IM algorithm: a variational approach to information maximization, 2004.

### API

- Keras Datasets API
- Keras Sequential Model API
- Keras Convolutional Layers API
- How can I “freeze” Keras layers?
- Matplotlib API
- NumPy Random sampling (numpy.random) API
- NumPy Array manipulation routines

### Projects

- InfoGAN (Official), OpenAI, GitHub.
- Keras Infogan: Keras implementation of InfoGAN, GitHub.
- Keras-GAN: Keras implementations of Generative Adversarial Networks, GitHub.
- Advanced-Deep-Learning-with-Keras, PacktPublishing, GitHub.
- DeepLearningImplementations: Implementation of recent Deep Learning papers, GitHub.

### Articles

- Mutual information, Wikipedia.
- Conditional mutual information, Wikipedia.
- InfoGAN: using the variational bound on mutual information (twice), 2016.
- GANs, mutual information, and possibly algorithm selection?, 2016.
- Implementing InfoGAN: easier than it seems?, 2016.
- cost function question, DeepLearningImplementations Project, GitHub.

## Summary

In this tutorial, you discovered how to implement an Information Maximizing Generative Adversarial Network model from scratch.

Specifically, you learned:

- The InfoGAN is motivated by the desire to disentangle and control the properties in generated images.
- The InfoGAN adds control variables to the generator input and an auxiliary model that predicts those control variables, trained via a mutual information loss function.
- How to develop and train an InfoGAN model from scratch and use the control variables to control which digit is generated by the model.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

The post How to Develop an Information Maximizing GAN (InfoGAN) in Keras appeared first on Machine Learning Mastery.

## How to Develop an Auxiliary Classifier GAN (AC-GAN) From Scratch with Keras

Generative Adversarial Networks, or GANs, are an architecture for training generative models, such as deep convolutional neural networks for generating images.

The conditional generative adversarial network, or cGAN for short, is a type of GAN that involves the conditional generation of images by a generator model. Image generation can be conditional on a class label, if available, allowing the targeted generation of images of a given type.

The Auxiliary Classifier GAN, or AC-GAN for short, is an extension of the conditional GAN that changes the discriminator to predict the class label of a given image rather than receive it as input. It has the effect of stabilizing the training process and allowing the generation of large high-quality images whilst learning a representation in the latent space that is independent of the class label.

In this tutorial, you will discover how to develop an auxiliary classifier generative adversarial network for generating photographs of clothing.

After completing this tutorial, you will know:

- The auxiliary classifier GAN is a type of conditional GAN that requires that the discriminator predict the class label of a given image.
- How to develop generator, discriminator, and composite models for the AC-GAN.
- How to train, evaluate, and use an AC-GAN to generate photographs of clothing from the Fashion-MNIST dataset.

Let’s get started.

## Tutorial Overview

This tutorial is divided into five parts; they are:

- Auxiliary Classifier Generative Adversarial Networks
- Fashion-MNIST Clothing Photograph Dataset
- How to Define AC-GAN Models
- How to Develop an AC-GAN for Fashion-MNIST
- How to Generate Items of Clothing With the AC-GAN

## Auxiliary Classifier Generative Adversarial Networks

The generative adversarial network is an architecture for training a generative model, typically deep convolutional neural networks for generating images.

The architecture is comprised of both a generator model that takes random points from a latent space as input and generates images, and a discriminator for classifying images as either real (from the dataset) or fake (generated). Both models are then trained simultaneously in a zero-sum game.

A conditional GAN, cGAN or CGAN for short, is an extension of the GAN architecture that adds structure to the latent space. The training of the GAN model is changed so that the generator is provided both with a point in the latent space and a class label as input, and attempts to generate an image for that class. The discriminator is provided with both an image and the class label and must classify whether the image is real or fake as before.

The addition of the class as input makes the image generation process, and image classification process, conditional on the class label, hence the name. The effect is both a more stable training process and a resulting generator model that can be used to generate images of a given specific type, e.g. for a class label.

The Auxiliary Classifier GAN, or AC-GAN for short, is a further extension of the GAN architecture building upon the CGAN extension. It was introduced by Augustus Odena, et al. from Google Brain in the 2016 paper titled “Conditional Image Synthesis with Auxiliary Classifier GANs.”

As with the conditional GAN, the generator model in the AC-GAN is provided both with a point in the latent space and the class label as input, i.e. the image generation process is conditional.

The main difference is in the discriminator model, which is only provided with the image as input, unlike the conditional GAN that is provided with the image and class label as input. The discriminator model must then predict whether the given image is real or fake as before, and must also predict the class label of the image.

… the model […] is class conditional, but with an auxiliary decoder that is tasked with reconstructing class labels.

— Conditional Image Synthesis With Auxiliary Classifier GANs, 2016.

The architecture is described in such a way that the discriminator and auxiliary classifier may be considered separate models that share model weights. In practice, the discriminator and auxiliary classifier can be implemented as a single neural network model with two outputs.

The first output is a single probability via the sigmoid activation function that indicates the “*realness*” of the input image and is optimized using binary cross entropy like a normal GAN discriminator model.

The second output is a probability of the image belonging to each class via the softmax activation function, like any given multi-class classification neural network model, and is optimized using categorical cross entropy.
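The two outputs and their losses can be illustrated with a few lines of NumPy. This is only a sketch under stated assumptions: the random `features` array stands in for the shared features computed by the discriminator, the weight matrices are random rather than learned, and three classes are used to keep the example small.

```python
# minimal sketch: one shared feature vector, two output heads, two losses
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    # subtract the row maximum for numerical stability
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def binary_crossentropy(y_true, p, eps=1e-7):
    p = np.clip(p, eps, 1.0 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))))

def categorical_crossentropy(y_onehot, probs, eps=1e-7):
    probs = np.clip(probs, eps, 1.0)
    return float(np.mean(-np.sum(y_onehot * np.log(probs), axis=1)))

rng = np.random.default_rng(0)
# stand-in for the shared features of 4 images (assumption)
features = rng.standard_normal((4, 8))
# random stand-ins for the weights of the two output layers
w_real = rng.standard_normal((8, 1))
w_class = rng.standard_normal((8, 3))
# first head: probability that each image is real
p_real = sigmoid(features @ w_real)
# second head: probability of each class for each image
p_class = softmax(features @ w_class)
# targets: all images are real, with class labels 0, 1, 2, 0
y_real = np.ones((4, 1))
y_class = np.eye(3)[[0, 1, 2, 0]]
# the model is optimized on the sum of both losses
total_loss = binary_crossentropy(y_real, p_real) + categorical_crossentropy(y_class, p_class)
print(p_real.shape, p_class.shape, total_loss)
```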

To summarize:

#### Generator Model:

- **Input**: Random point from the latent space, and the class label.
- **Output**: Generated image.

#### Discriminator Model:

- **Input**: Image.
- **Output**: Probability that the provided image is real, probability of the image belonging to each known class.

The plot below summarizes the inputs and outputs of a range of conditional GANs, including the AC-GAN, providing some context for the differences.

The discriminator seeks to maximize the probability of correctly classifying real and fake images (LS) and correctly predicting the class label (LC) of a real or fake image (i.e. LS + LC). The generator seeks to minimize the ability of the discriminator to discriminate real and fake images whilst also maximizing the ability of the discriminator to predict the class label of real and fake images (i.e. LC − LS).

The objective function has two parts: the log-likelihood of the correct source, LS, and the log-likelihood of the correct class, LC. […] D is trained to maximize LS + LC while G is trained to maximize LC − LS.

— Conditional Image Synthesis With Auxiliary Classifier GANs, 2016.
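For reference, the two terms can be written out explicitly. The block below restates the definitions given in the AC-GAN paper in LaTeX, where S is the source of an image (real or fake), C is its class label, and X_real and X_fake are the real and generated images.

```latex
% log-likelihood of the correct source
L_S = \mathbb{E}\left[\log P(S = \text{real} \mid X_{\text{real}})\right]
    + \mathbb{E}\left[\log P(S = \text{fake} \mid X_{\text{fake}})\right]

% log-likelihood of the correct class
L_C = \mathbb{E}\left[\log P(C = c \mid X_{\text{real}})\right]
    + \mathbb{E}\left[\log P(C = c \mid X_{\text{fake}})\right]
```

The discriminator maximizes the sum of the two terms, while the generator maximizes their difference, LC − LS.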

The resulting generator learns a latent space representation that is independent of the class label, unlike the conditional GAN.

AC-GANs learn a representation for z that is independent of class label.

— Conditional Image Synthesis With Auxiliary Classifier GANs, 2016.

The effect of changing the conditional GAN in this way is both a more stable training process and the ability of the model to generate higher quality images with a larger size than had been previously possible, e.g. 128×128 pixels.

… we demonstrate that adding more structure to the GAN latent space along with a specialized cost function results in higher quality samples. […] Importantly, we demonstrate quantitatively that our high resolution samples are not just naive resizings of low resolution samples.

— Conditional Image Synthesis With Auxiliary Classifier GANs, 2016.

## Fashion-MNIST Clothing Photograph Dataset

The Fashion-MNIST dataset is proposed as a more challenging replacement dataset for the MNIST handwritten digit dataset.

It is a dataset comprised of 60,000 small square 28×28 pixel grayscale images of items of 10 types of clothing, such as shoes, t-shirts, dresses, and more.

Keras provides access to the Fashion-MNIST dataset via the fashion_mnist.load_data() function. It returns two tuples, one with the input and output elements for the standard training dataset, and another with the input and output elements for the standard test dataset.

The example below loads the dataset and summarizes the shape of the loaded dataset.

**Note**: the first time you load the dataset, Keras will automatically download a compressed version of the images and save them under your home directory in *~/.keras/datasets/*. The download is fast as the dataset is only about 25 megabytes in its compressed form.

```python
# example of loading the fashion_mnist dataset
from keras.datasets.fashion_mnist import load_data
# load the images into memory
(trainX, trainy), (testX, testy) = load_data()
# summarize the shape of the dataset
print('Train', trainX.shape, trainy.shape)
print('Test', testX.shape, testy.shape)
```

Running the example loads the dataset and prints the shape of the input and output components of the train and test splits of images.

We can see that there are 60K examples in the training set and 10K in the test set and that each image is a square of 28 by 28 pixels.

```
Train (60000, 28, 28) (60000,)
Test (10000, 28, 28) (10000,)
```

The images are grayscale with a black background (0 pixel value) and the items of clothing in white (pixel values near 255). This means if the images were plotted, they would be mostly black with a white item of clothing in the middle.

We can plot some of the images from the training dataset using the matplotlib library with the imshow() function and specify the color map via the ‘*cmap*’ argument as ‘*gray*’ to show the pixel values correctly.

```python
# plot raw pixel data
pyplot.imshow(trainX[i], cmap='gray')
```

Alternatively, we can reverse the colors and plot the background as white and the clothing in black.

The images are easier to review this way, as most of each image is white with the area of interest in black. This can be achieved using a reverse grayscale color map, as follows:

```python
# plot raw pixel data
pyplot.imshow(trainX[i], cmap='gray_r')
```

The example below plots the first 100 images from the training dataset in a 10 by 10 square.

```python
# example of loading the fashion_mnist dataset
from keras.datasets.fashion_mnist import load_data
from matplotlib import pyplot
# load the images into memory
(trainX, trainy), (testX, testy) = load_data()
# plot images from the training dataset
for i in range(100):
    # define subplot
    pyplot.subplot(10, 10, 1 + i)
    # turn off axis
    pyplot.axis('off')
    # plot raw pixel data
    pyplot.imshow(trainX[i], cmap='gray_r')
pyplot.show()
```

Running the example creates a figure with a plot of 100 images from the Fashion-MNIST training dataset, arranged in a 10×10 square.

We will use the images in the training dataset as the basis for training a Generative Adversarial Network.

Specifically, the generator model will learn how to generate new plausible items of clothing, using a discriminator that will try to distinguish between real images from the Fashion MNIST training dataset and new images output by the generator model, and predict the class label for each.

This is a relatively simple problem that does not require a sophisticated generator or discriminator model, although it does require the generation of a grayscale output image.

## How to Define AC-GAN Models

In this section, we will develop the generator, discriminator, and composite models for the AC-GAN.

The appendix of the AC-GAN paper provides suggestions for generator and discriminator configurations that we will use as inspiration. The table below summarizes these suggestions for the CIFAR-10 dataset, taken from the paper.

### AC-GAN Discriminator Model

Let’s start with the discriminator model.

The discriminator model must take as input an image and predict both the probability of the ‘*realness*‘ of the image and the probability of the image belonging to each of the given classes.

The input images will have the shape 28x28x1 and there are 10 classes for the items of clothing in the Fashion MNIST dataset.

The model can be defined as per the DCGAN architecture. That is, using Gaussian weight initialization, batch normalization, LeakyReLU, Dropout, and a 2×2 stride for downsampling instead of pooling layers.

For example, below is the bulk of the discriminator model defined using the Keras functional API.

```python
...
# weight initialization
init = RandomNormal(stddev=0.02)
# image input
in_image = Input(shape=in_shape)
# downsample to 14x14
fe = Conv2D(32, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(in_image)
fe = LeakyReLU(alpha=0.2)(fe)
fe = Dropout(0.5)(fe)
# normal
fe = Conv2D(64, (3,3), padding='same', kernel_initializer=init)(fe)
fe = BatchNormalization()(fe)
fe = LeakyReLU(alpha=0.2)(fe)
fe = Dropout(0.5)(fe)
# downsample to 7x7
fe = Conv2D(128, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(fe)
fe = BatchNormalization()(fe)
fe = LeakyReLU(alpha=0.2)(fe)
fe = Dropout(0.5)(fe)
# normal
fe = Conv2D(256, (3,3), padding='same', kernel_initializer=init)(fe)
fe = BatchNormalization()(fe)
fe = LeakyReLU(alpha=0.2)(fe)
fe = Dropout(0.5)(fe)
# flatten feature maps
fe = Flatten()(fe)
...
```
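The feature map sizes noted in the comments can be checked by hand. With 'same' padding, a convolutional layer produces an output whose width and height are the input size divided by the stride, rounded up. The sketch below applies that rule to the four convolutional layers; the helper name `same_padding_output_size` is hypothetical.

```python
# worked arithmetic for feature map sizes with 'same' padding
from math import ceil

def same_padding_output_size(size, stride):
    # 'same' padding: output size is the input size divided by the stride, rounded up
    return ceil(size / stride)

size = 28
sizes = [size]
# strides of the four convolutional layers in the discriminator
for stride in (2, 1, 2, 1):
    size = same_padding_output_size(size, stride)
    sizes.append(size)
print(sizes)
# number of values after flattening the final 7x7 maps with 256 filters
n_flat = sizes[-1] * sizes[-1] * 256
print(n_flat)
```

The flattened size of 12,544 values is what feeds the output layers of the model.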

The main difference is that the model has two output layers.

The first is a single node with the sigmoid activation for predicting the real-ness of the image.

```python
...
# real/fake output
out1 = Dense(1, activation='sigmoid')(fe)
```

The second is multiple nodes, one for each class, using the softmax activation function to predict the class label of the given image.

```python
...
# class label output
out2 = Dense(n_classes, activation='softmax')(fe)
```

We can then construct the model with a single input and two outputs.

```python
...
# define model
model = Model(in_image, [out1, out2])
```

The model must be trained with two loss functions, binary cross entropy for the first output layer, and categorical cross-entropy loss for the second output layer.

Rather than comparing a one hot encoding of the class labels to the second output layer, as we might do normally, we can compare the integer class labels directly. We can achieve this automatically using the sparse categorical cross-entropy loss function. This has the same effect as the categorical cross-entropy but avoids the step of having to manually one hot encode the target labels.
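The equivalence of the two loss functions can be shown with a small NumPy example. The functions below are simplified re-implementations written for illustration, not the Keras versions, and the probabilities and labels are made-up values.

```python
# minimal sketch: sparse and one hot cross-entropy give the same values
import numpy as np

def categorical_crossentropy(y_onehot, probs):
    # sum over classes, weighted by the one hot target
    return -np.sum(y_onehot * np.log(probs), axis=1)

def sparse_categorical_crossentropy(y_int, probs):
    # pick out the predicted probability of the true class directly
    return -np.log(probs[np.arange(len(y_int)), y_int])

# predicted class probabilities for two images and three classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
# integer class labels and their one hot encoding
labels = np.array([0, 1])
onehot = np.eye(3)[labels]

dense_loss = categorical_crossentropy(onehot, probs)
sparse_loss = sparse_categorical_crossentropy(labels, probs)
print(dense_loss, sparse_loss)
```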

When compiling the model, we can inform Keras to use the two different loss functions for the two output layers by specifying a list of function names as strings; for example:

```python
loss=['binary_crossentropy', 'sparse_categorical_crossentropy']
```

The model is fit using the Adam version of stochastic gradient descent with a small learning rate and modest momentum, as is recommended for DCGANs.

```python
...
# compile model
opt = Adam(lr=0.0002, beta_1=0.5)
model.compile(loss=['binary_crossentropy', 'sparse_categorical_crossentropy'], optimizer=opt)
```

Tying this together, the define_discriminator() function will define and compile the discriminator model for the AC-GAN.

The shape of the input images and the number of classes are parameterized and set with defaults, allowing them to be easily changed for your own project in the future.

```python
# define the standalone discriminator model
def define_discriminator(in_shape=(28,28,1), n_classes=10):
    # weight initialization
    init = RandomNormal(stddev=0.02)
    # image input
    in_image = Input(shape=in_shape)
    # downsample to 14x14
    fe = Conv2D(32, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(in_image)
    fe = LeakyReLU(alpha=0.2)(fe)
    fe = Dropout(0.5)(fe)
    # normal
    fe = Conv2D(64, (3,3), padding='same', kernel_initializer=init)(fe)
    fe = BatchNormalization()(fe)
    fe = LeakyReLU(alpha=0.2)(fe)
    fe = Dropout(0.5)(fe)
    # downsample to 7x7
    fe = Conv2D(128, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(fe)
    fe = BatchNormalization()(fe)
    fe = LeakyReLU(alpha=0.2)(fe)
    fe = Dropout(0.5)(fe)
    # normal
    fe = Conv2D(256, (3,3), padding='same', kernel_initializer=init)(fe)
    fe = BatchNormalization()(fe)
    fe = LeakyReLU(alpha=0.2)(fe)
    fe = Dropout(0.5)(fe)
    # flatten feature maps
    fe = Flatten()(fe)
    # real/fake output
    out1 = Dense(1, activation='sigmoid')(fe)
    # class label output
    out2 = Dense(n_classes, activation='softmax')(fe)
    # define model
    model = Model(in_image, [out1, out2])
    # compile model
    opt = Adam(lr=0.0002, beta_1=0.5)
    model.compile(loss=['binary_crossentropy', 'sparse_categorical_crossentropy'], optimizer=opt)
    return model
```

We can define and summarize this model.

The complete example is listed below.

```python
# example of defining the discriminator model
from keras.models import Model
from keras.layers import Input
from keras.layers import Dense
from keras.layers import Conv2D
from keras.layers import LeakyReLU
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers import BatchNormalization
from keras.initializers import RandomNormal
from keras.optimizers import Adam
from keras.utils import plot_model

# define the standalone discriminator model
def define_discriminator(in_shape=(28,28,1), n_classes=10):
    # weight initialization
    init = RandomNormal(stddev=0.02)
    # image input
    in_image = Input(shape=in_shape)
    # downsample to 14x14
    fe = Conv2D(32, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(in_image)
    fe = LeakyReLU(alpha=0.2)(fe)
    fe = Dropout(0.5)(fe)
    # normal
    fe = Conv2D(64, (3,3), padding='same', kernel_initializer=init)(fe)
    fe = BatchNormalization()(fe)
    fe = LeakyReLU(alpha=0.2)(fe)
    fe = Dropout(0.5)(fe)
    # downsample to 7x7
    fe = Conv2D(128, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(fe)
    fe = BatchNormalization()(fe)
    fe = LeakyReLU(alpha=0.2)(fe)
    fe = Dropout(0.5)(fe)
    # normal
    fe = Conv2D(256, (3,3), padding='same', kernel_initializer=init)(fe)
    fe = BatchNormalization()(fe)
    fe = LeakyReLU(alpha=0.2)(fe)
    fe = Dropout(0.5)(fe)
    # flatten feature maps
    fe = Flatten()(fe)
    # real/fake output
    out1 = Dense(1, activation='sigmoid')(fe)
    # class label output
    out2 = Dense(n_classes, activation='softmax')(fe)
    # define model
    model = Model(in_image, [out1, out2])
    # compile model
    opt = Adam(lr=0.0002, beta_1=0.5)
    model.compile(loss=['binary_crossentropy', 'sparse_categorical_crossentropy'], optimizer=opt)
    return model

# define the discriminator model
model = define_discriminator()
# summarize the model
model.summary()
# plot the model
plot_model(model, to_file='discriminator_plot.png', show_shapes=True, show_layer_names=True)
```

Running the example first prints a summary of the model.

This confirms the expected shape of the input images and the two output layers, although the linear organization does not make the two separate output layers clear.

```
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            (None, 28, 28, 1)    0
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 14, 14, 32)   320         input_1[0][0]
__________________________________________________________________________________________________
leaky_re_lu_1 (LeakyReLU)       (None, 14, 14, 32)   0           conv2d_1[0][0]
__________________________________________________________________________________________________
dropout_1 (Dropout)             (None, 14, 14, 32)   0           leaky_re_lu_1[0][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 14, 14, 64)   18496       dropout_1[0][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 14, 14, 64)   256         conv2d_2[0][0]
__________________________________________________________________________________________________
leaky_re_lu_2 (LeakyReLU)       (None, 14, 14, 64)   0           batch_normalization_1[0][0]
__________________________________________________________________________________________________
dropout_2 (Dropout)             (None, 14, 14, 64)   0           leaky_re_lu_2[0][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D)               (None, 7, 7, 128)    73856       dropout_2[0][0]
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 7, 7, 128)    512         conv2d_3[0][0]
__________________________________________________________________________________________________
leaky_re_lu_3 (LeakyReLU)       (None, 7, 7, 128)    0           batch_normalization_2[0][0]
__________________________________________________________________________________________________
dropout_3 (Dropout)             (None, 7, 7, 128)    0           leaky_re_lu_3[0][0]
__________________________________________________________________________________________________
conv2d_4 (Conv2D)               (None, 7, 7, 256)    295168      dropout_3[0][0]
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, 7, 7, 256)    1024        conv2d_4[0][0]
__________________________________________________________________________________________________
leaky_re_lu_4 (LeakyReLU)       (None, 7, 7, 256)    0           batch_normalization_3[0][0]
__________________________________________________________________________________________________
dropout_4 (Dropout)             (None, 7, 7, 256)    0           leaky_re_lu_4[0][0]
__________________________________________________________________________________________________
flatten_1 (Flatten)             (None, 12544)        0           dropout_4[0][0]
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 1)            12545       flatten_1[0][0]
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 10)           125450      flatten_1[0][0]
==================================================================================================
Total params: 527,627
Trainable params: 526,731
Non-trainable params: 896
__________________________________________________________________________________________________
```
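The parameter counts in the summary can be reproduced by hand, which is a useful check that the model is wired as intended. The sketch below uses the standard formulas for convolutional, dense, and batch normalization layers; the helper function names are hypothetical.

```python
# worked arithmetic for the parameter counts in the model summary
def conv2d_params(kernel, in_channels, out_channels):
    # one kernel x kernel filter per input/output channel pair, plus one bias per filter
    return kernel * kernel * in_channels * out_channels + out_channels

def dense_params(n_inputs, n_outputs):
    # one weight per input/output pair, plus one bias per output
    return n_inputs * n_outputs + n_outputs

def batchnorm_params(channels):
    # gamma, beta, moving mean and moving variance for each channel
    return 4 * channels

counts = [
    conv2d_params(3, 1, 32),
    conv2d_params(3, 32, 64),
    batchnorm_params(64),
    conv2d_params(3, 64, 128),
    batchnorm_params(128),
    conv2d_params(3, 128, 256),
    batchnorm_params(256),
    dense_params(7 * 7 * 256, 1),
    dense_params(7 * 7 * 256, 10),
]
total = sum(counts)
# the moving mean and variance of batch normalization are not trained
non_trainable = 2 * (64 + 128 + 256)
print(total, total - non_trainable, non_trainable)
```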

A plot of the model is also created, showing the linear processing of the input image and the two clear output layers.

Now that we have defined our AC-GAN discriminator model, we can develop the generator model.

### AC-GAN Generator Model

The generator model must take a random point from the latent space and a class label as input, then output a generated grayscale image with the shape 28×28×1.

The AC-GAN paper describes the generator taking a single vector input: the concatenation of the point in latent space (100 dimensions) and the one hot encoded class label (10 dimensions), giving 110 dimensions in total.
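The paper-style input can be illustrated in plain Python. This is only a sketch of the concatenation idea, not code used later in the tutorial; the helper names `one_hot` and `paper_style_input` are hypothetical.

```python
# Sketch: build the 110-element generator input described in the AC-GAN paper.
import random

def one_hot(label, n_classes=10):
    """Return a list of zeros with a single 1.0 at position `label`."""
    if not 0 <= label < n_classes:
        raise ValueError('label out of range')
    return [1.0 if i == label else 0.0 for i in range(n_classes)]

def paper_style_input(latent_point, label, n_classes=10):
    """Concatenate a latent point with the one hot encoded class label."""
    return list(latent_point) + one_hot(label, n_classes)

rng = random.Random(1)
# 100-dimensional Gaussian latent point
z = [rng.gauss(0.0, 1.0) for _ in range(100)]
x = paper_style_input(z, label=7)
print(len(x))    # 100 latent values + 10 label values
print(x[100:])   # the one hot part, with the 1.0 at index 7
```

The tutorial does not use this input format; it follows the embedding approach described next.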

An alternative approach that has proven effective and is now generally recommended is to interpret the class label as an additional channel or feature map early in the generator model.

This can be achieved by using a learned embedding with an arbitrary number of dimensions (e.g. 50), the output of which can be interpreted by a fully connected layer with a linear activation resulting in one additional 7×7 feature map.

    ...
    # label input
    in_label = Input(shape=(1,))
    # embedding for categorical input
    li = Embedding(n_classes, 50)(in_label)
    # linear multiplication
    n_nodes = 7 * 7
    li = Dense(n_nodes, kernel_initializer=init)(li)
    # reshape to additional channel
    li = Reshape((7, 7, 1))(li)

The point in latent space can be interpreted by a fully connected layer with sufficient activations to create multiple 7×7 feature maps, in this case, 384, and provide the basis for a low-resolution version of our output image.

The 7×7 single feature map interpretation of the class label can then be channel-wise concatenated, resulting in 385 feature maps.

    ...
    # image generator input
    in_lat = Input(shape=(latent_dim,))
    # foundation for 7x7 image
    n_nodes = 384 * 7 * 7
    gen = Dense(n_nodes, kernel_initializer=init)(in_lat)
    gen = Activation('relu')(gen)
    gen = Reshape((7, 7, 384))(gen)
    # merge image gen and label input
    merge = Concatenate()([gen, li])

These feature maps can then go through the process of two transpose convolutional layers to upsample the 7×7 feature maps first to 14×14 pixels, and then finally to 28×28 features, quadrupling the area of the feature maps with each upscaling step.

The output of the generator is a single feature map or grayscale image with the shape 28×28 and pixel values in the range [-1, 1] given the choice of a tanh activation function. We use ReLU activation for the upscaling layers instead of LeakyReLU, as suggested in the AC-GAN paper.

    # upsample to 14x14
    gen = Conv2DTranspose(192, (5,5), strides=(2,2), padding='same', kernel_initializer=init)(merge)
    gen = BatchNormalization()(gen)
    gen = Activation('relu')(gen)
    # upsample to 28x28
    gen = Conv2DTranspose(1, (5,5), strides=(2,2), padding='same', kernel_initializer=init)(gen)
    out_layer = Activation('tanh')(gen)
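The 7×7 to 14×14 to 28×28 progression follows from the stride: with padding='same', a transpose convolution multiplies the width and height by the stride. A small sketch of that arithmetic, using a hypothetical helper name:

```python
# Sketch: spatial size after each 2x2-stride transpose convolution with padding='same'.
def transpose_conv_same_size(size, stride):
    """Output width/height of a transpose convolution with padding='same'."""
    return size * stride

size = 7
sizes = [size]
for _ in range(2):
    size = transpose_conv_same_size(size, stride=2)
    sizes.append(size)

# area growth between consecutive feature map sizes
area_ratios = [(b * b) // (a * a) for a, b in zip(sizes, sizes[1:])]
print(sizes)        # [7, 14, 28]
print(area_ratios)  # [4, 4]
```

Doubling each side is what quadruples the area at each upscaling step.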

We can tie all of this together into the *define_generator()* function below, which creates and returns the generator model for the AC-GAN.

The model is intentionally not compiled as it is not trained directly; instead, it is trained via the discriminator model.

    # define the standalone generator model
    def define_generator(latent_dim, n_classes=10):
        # weight initialization
        init = RandomNormal(stddev=0.02)
        # label input
        in_label = Input(shape=(1,))
        # embedding for categorical input
        li = Embedding(n_classes, 50)(in_label)
        # linear multiplication
        n_nodes = 7 * 7
        li = Dense(n_nodes, kernel_initializer=init)(li)
        # reshape to additional channel
        li = Reshape((7, 7, 1))(li)
        # image generator input
        in_lat = Input(shape=(latent_dim,))
        # foundation for 7x7 image
        n_nodes = 384 * 7 * 7
        gen = Dense(n_nodes, kernel_initializer=init)(in_lat)
        gen = Activation('relu')(gen)
        gen = Reshape((7, 7, 384))(gen)
        # merge image gen and label input
        merge = Concatenate()([gen, li])
        # upsample to 14x14
        gen = Conv2DTranspose(192, (5,5), strides=(2,2), padding='same', kernel_initializer=init)(merge)
        gen = BatchNormalization()(gen)
        gen = Activation('relu')(gen)
        # upsample to 28x28
        gen = Conv2DTranspose(1, (5,5), strides=(2,2), padding='same', kernel_initializer=init)(gen)
        out_layer = Activation('tanh')(gen)
        # define model
        model = Model([in_lat, in_label], out_layer)
        return model

We can create this model and summarize and plot its structure.

The complete example is listed below.

    # example of defining the generator model
    from keras.models import Model
    from keras.layers import Input
    from keras.layers import Dense
    from keras.layers import Reshape
    from keras.layers import Conv2DTranspose
    from keras.layers import Embedding
    from keras.layers import Concatenate
    from keras.layers import Activation
    from keras.layers import BatchNormalization
    from keras.initializers import RandomNormal
    from keras.utils.vis_utils import plot_model

    # define the standalone generator model
    def define_generator(latent_dim, n_classes=10):
        # weight initialization
        init = RandomNormal(stddev=0.02)
        # label input
        in_label = Input(shape=(1,))
        # embedding for categorical input
        li = Embedding(n_classes, 50)(in_label)
        # linear multiplication
        n_nodes = 7 * 7
        li = Dense(n_nodes, kernel_initializer=init)(li)
        # reshape to additional channel
        li = Reshape((7, 7, 1))(li)
        # image generator input
        in_lat = Input(shape=(latent_dim,))
        # foundation for 7x7 image
        n_nodes = 384 * 7 * 7
        gen = Dense(n_nodes, kernel_initializer=init)(in_lat)
        gen = Activation('relu')(gen)
        gen = Reshape((7, 7, 384))(gen)
        # merge image gen and label input
        merge = Concatenate()([gen, li])
        # upsample to 14x14
        gen = Conv2DTranspose(192, (5,5), strides=(2,2), padding='same', kernel_initializer=init)(merge)
        gen = BatchNormalization()(gen)
        gen = Activation('relu')(gen)
        # upsample to 28x28
        gen = Conv2DTranspose(1, (5,5), strides=(2,2), padding='same', kernel_initializer=init)(gen)
        out_layer = Activation('tanh')(gen)
        # define model
        model = Model([in_lat, in_label], out_layer)
        return model

    # define the size of the latent space
    latent_dim = 100
    # define the generator model
    model = define_generator(latent_dim)
    # summarize the model
    model.summary()
    # plot the model
    plot_model(model, to_file='generator_plot.png', show_shapes=True, show_layer_names=True)

Running the example first prints a summary of the layers and their output shape in the model.

We can confirm that the latent space input has 100 dimensions and that the class label input is a single integer. We can also confirm that the embedded class label is correctly concatenated as an additional channel, resulting in 385 7×7 feature maps prior to the transpose convolutional layers.

The summary also confirms the expected output shape of a single grayscale 28×28 image.

    Layer (type)                     Output Shape          Param #   Connected to
    ===============================================================================================
    input_2 (InputLayer)             (None, 100)           0
    input_1 (InputLayer)             (None, 1)             0
    dense_2 (Dense)                  (None, 18816)         1900416   input_2[0][0]
    embedding_1 (Embedding)          (None, 1, 50)         500       input_1[0][0]
    activation_1 (Activation)        (None, 18816)         0         dense_2[0][0]
    dense_1 (Dense)                  (None, 1, 49)         2499      embedding_1[0][0]
    reshape_2 (Reshape)              (None, 7, 7, 384)     0         activation_1[0][0]
    reshape_1 (Reshape)              (None, 7, 7, 1)       0         dense_1[0][0]
    concatenate_1 (Concatenate)      (None, 7, 7, 385)     0         reshape_2[0][0]
                                                                     reshape_1[0][0]
    conv2d_transpose_1 (Conv2DTrans  (None, 14, 14, 192)   1848192   concatenate_1[0][0]
    batch_normalization_1 (BatchNor  (None, 14, 14, 192)   768       conv2d_transpose_1[0][0]
    activation_2 (Activation)        (None, 14, 14, 192)   0         batch_normalization_1[0][0]
    conv2d_transpose_2 (Conv2DTrans  (None, 28, 28, 1)     4801      activation_2[0][0]
    activation_3 (Activation)        (None, 28, 28, 1)     0         conv2d_transpose_2[0][0]
    ===============================================================================================
    Total params: 3,757,176
    Trainable params: 3,756,792
    Non-trainable params: 384

A plot of the network is also created summarizing the input and output shapes for each layer.

The plot confirms the two inputs to the network and the correct concatenation of the inputs.
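The parameter counts reported in the summary can be reproduced by hand, which is a useful check that the layers are wired as intended. The sketch below uses the standard counting rules for each layer type; the helper names are hypothetical and the layer sizes are taken from the generator defined above.

```python
# Sketch: recompute the generator's parameter counts from the layer sizes.
def dense_params(n_in, n_out):
    # one weight per input/output pair plus one bias per output
    return n_in * n_out + n_out

def embedding_params(n_classes, dim):
    # one learned vector per class
    return n_classes * dim

def conv_transpose_params(kernel, c_in, c_out):
    # square kernel weights for every input/output channel pair plus biases
    return kernel * kernel * c_in * c_out + c_out

def batchnorm_params(channels):
    # gamma, beta, moving mean and moving variance per channel
    return 4 * channels

layers = {
    'dense_2': dense_params(100, 384 * 7 * 7),
    'embedding_1': embedding_params(10, 50),
    'dense_1': dense_params(50, 7 * 7),
    'conv2d_transpose_1': conv_transpose_params(5, 384 + 1, 192),
    'batch_normalization_1': batchnorm_params(192),
    'conv2d_transpose_2': conv_transpose_params(5, 192, 1),
}
total = sum(layers.values())
# the moving mean and variance are updated by statistics, not by gradients
non_trainable = 2 * 192
for name, count in layers.items():
    print('%-24s %d' % (name, count))
print('total=%d trainable=%d non-trainable=%d' % (total, total - non_trainable, non_trainable))
```

Note the 385 input channels to the first transpose convolution: 384 feature maps from the latent point plus the one label feature map.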

Now that we have defined the generator model, we can show how it might be fit.

### AC-GAN Composite Model

The generator model is not updated directly; instead, it is updated via the discriminator model.

This can be achieved by creating a composite model that stacks the generator model on top of the discriminator model.

The input to this composite model is the input to the generator model, namely a random point from the latent space and a class label. The generator model is connected directly to the discriminator model, which takes the generated image directly as input. Finally, the discriminator model predicts both the realness of the generated image and the class label. As such, the composite model is optimized using two loss functions, one for each output of the discriminator model.

The discriminator model is updated in a standalone manner using real and fake examples, and we will review how to do this in the next section. Therefore, we do not want to update the discriminator model when updating (training) the composite model; we only want to use this composite model to update the weights of the generator model.

This can be achieved by setting the layers of the discriminator as not trainable prior to compiling the composite model. This only has an effect on the layer weights when viewed or used by the composite model and prevents them from being updated when the composite model is updated.

The *define_gan()* function below implements this, taking the already defined generator and discriminator models as input and defining a new composite model that can be used to update the generator model only.

    # define the combined generator and discriminator model, for updating the generator
    def define_gan(g_model, d_model):
        # make weights in the discriminator not trainable
        d_model.trainable = False
        # connect the outputs of the generator to the inputs of the discriminator
        gan_output = d_model(g_model.output)
        # define gan model as taking noise and label and outputting real/fake and label outputs
        model = Model(g_model.input, gan_output)
        # compile model
        opt = Adam(lr=0.0002, beta_1=0.5)
        model.compile(loss=['binary_crossentropy', 'sparse_categorical_crossentropy'], optimizer=opt)
        return model

Now that we have defined the models used in the AC-GAN, we can fit them on the Fashion-MNIST dataset.

## How to Develop an AC-GAN for Fashion-MNIST

The first step is to load and prepare the Fashion MNIST dataset.

We only require the images in the training dataset. The images are grayscale and are loaded as two-dimensional arrays, so we must add a channel dimension to make each image three-dimensional, as expected by the convolutional layers of our models. Finally, the pixel values must be scaled to the range [-1,1] to match the output of the generator model.

The *load_real_samples()* function below implements this, returning the loaded and scaled Fashion MNIST training dataset ready for modeling.

    # load images
    def load_real_samples():
        # load dataset
        (trainX, trainy), (_, _) = load_data()
        # expand to 3d, e.g. add channels
        X = expand_dims(trainX, axis=-1)
        # convert from ints to floats
        X = X.astype('float32')
        # scale from [0,255] to [-1,1]
        X = (X - 127.5) / 127.5
        print(X.shape, trainy.shape)
        return [X, trainy]
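The scaling step and its inverse (used later when plotting generated images) can be checked on individual pixel values. This is a plain-Python sketch with hypothetical helper names; the tutorial applies the same arithmetic to whole NumPy arrays.

```python
# Sketch: pixel scaling used for training and the inverse used for plotting.
def scale_pixel(value):
    """Map a pixel value from [0,255] to [-1,1]."""
    return (value - 127.5) / 127.5

def unscale_for_plot(value):
    """Map a generator output from [-1,1] to [0,1]."""
    return (value + 1) / 2.0

for pixel in (0, 127.5, 255):
    scaled = scale_pixel(pixel)
    print(pixel, scaled, unscale_for_plot(scaled))
```

The endpoints 0 and 255 map to -1 and 1, matching the range of the generator's tanh output.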

We will require one batch (or a half batch) of real images from the dataset each update to the GAN model. A simple way to achieve this is to select a random sample of images from the dataset each time.

The *generate_real_samples()* function below implements this, taking the prepared dataset as an argument, selecting and returning a random sample of Fashion MNIST images and clothing class labels.

The “*dataset*” argument provided to the function is a list comprised of the images and class labels as returned from the *load_real_samples()* function. The function also returns their corresponding class label for the discriminator, specifically class=1 indicating that they are real images.

    # select real samples
    def generate_real_samples(dataset, n_samples):
        # split into images and labels
        images, labels = dataset
        # choose random instances
        ix = randint(0, images.shape[0], n_samples)
        # select images and labels
        X, labels = images[ix], labels[ix]
        # generate class labels
        y = ones((n_samples, 1))
        return [X, labels], y

Next, we need inputs for the generator model.

These are random points from the latent space, specifically Gaussian distributed random variables.

The *generate_latent_points()* function implements this, taking the size of the latent space as an argument and the number of points required, and returning them as a batch of input samples for the generator model. The function also returns randomly selected integers in [0,9] inclusively for the 10 class labels in the Fashion-MNIST dataset.

    # generate points in latent space as input for the generator
    def generate_latent_points(latent_dim, n_samples, n_classes=10):
        # generate points in the latent space
        x_input = randn(latent_dim * n_samples)
        # reshape into a batch of inputs for the network
        z_input = x_input.reshape(n_samples, latent_dim)
        # generate labels
        labels = randint(0, n_classes, n_samples)
        return [z_input, labels]
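Note that the upper bound of NumPy's *randint()* is exclusive, so *randint(0, n_classes, n_samples)* yields labels from 0 to 9. The following standard-library sketch mirrors the same sampling without NumPy so the shapes and label range are easy to inspect; the function name and the `seed` argument are assumptions added for illustration.

```python
# Sketch: pure-Python analogue of generate_latent_points() for inspecting shapes.
import random

def latent_points_sketch(latent_dim, n_samples, n_classes=10, seed=None):
    rng = random.Random(seed)
    # n_samples rows of latent_dim standard Gaussian values
    z_input = [[rng.gauss(0.0, 1.0) for _ in range(latent_dim)] for _ in range(n_samples)]
    # integer class labels in [0, n_classes - 1]
    labels = [rng.randrange(n_classes) for _ in range(n_samples)]
    return z_input, labels

z_input, labels = latent_points_sketch(100, 64, seed=1)
print(len(z_input), len(z_input[0]))  # 64 100
print(min(labels) >= 0, max(labels) <= 9)
```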

Next, we need to use the points in the latent space and clothing class labels as input to the generator in order to generate new images.

The *generate_fake_samples()* function below implements this, taking the generator model and size of the latent space as arguments, then generating points in the latent space and using them as input to the generator model.

The function returns the generated images, their corresponding clothing class label, and their discriminator class label, specifically class=0 to indicate they are fake or generated.

    # use the generator to generate n fake examples, with class labels
    def generate_fake_samples(generator, latent_dim, n_samples):
        # generate points in latent space
        z_input, labels_input = generate_latent_points(latent_dim, n_samples)
        # predict outputs
        images = generator.predict([z_input, labels_input])
        # create class labels
        y = zeros((n_samples, 1))
        return [images, labels_input], y

There are no reliable ways to determine when to stop training a GAN; instead, images can be subjectively inspected in order to choose a final model.

Therefore, we can periodically generate a sample of images using the generator model and save the generator model to file for later use. The *summarize_performance()* function below implements this, generating 100 images, plotting them, and saving the plot and the generator to file with a filename that includes the training “*step*” number.

    # generate samples and save as a plot and save the model
    def summarize_performance(step, g_model, latent_dim, n_samples=100):
        # prepare fake examples
        [X, _], _ = generate_fake_samples(g_model, latent_dim, n_samples)
        # scale from [-1,1] to [0,1]
        X = (X + 1) / 2.0
        # plot images
        for i in range(100):
            # define subplot
            pyplot.subplot(10, 10, 1 + i)
            # turn off axis
            pyplot.axis('off')
            # plot raw pixel data
            pyplot.imshow(X[i, :, :, 0], cmap='gray_r')
        # save plot to file
        filename1 = 'generated_plot_%04d.png' % (step+1)
        pyplot.savefig(filename1)
        pyplot.close()
        # save the generator model
        filename2 = 'model_%04d.h5' % (step+1)
        g_model.save(filename2)
        print('>Saved: %s and %s' % (filename1, filename2))

We are now ready to fit the GAN models.

The model is fit for 100 training epochs, which is arbitrary, as the model begins generating plausible items of clothing after perhaps 20 epochs. A batch size of 64 samples is used, and each training epoch involves 60,000/64, or about 937, batches of real and fake samples and updates to the model. The *summarize_performance()* function is called every 10 epochs, or every 937 × 10 = 9,370 training steps.
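The training schedule arithmetic can be checked with a few lines of Python. This sketch reuses the same integer calculations and filename pattern as the tutorial's *train()* and *summarize_performance()* functions; the variable names `checkpoint_steps` and `model_files` are introduced here for illustration.

```python
# Sketch: steps per epoch, total steps, and the checkpoints saved during training.
n_train, n_batch, n_epochs = 60000, 64, 100

# batches per epoch, truncated to an integer as in train()
bat_per_epo = int(n_train / n_batch)
# total number of training steps
n_steps = bat_per_epo * n_epochs
# number of real (and fake) samples per discriminator update
half_batch = int(n_batch / 2)

# steps at which summarize_performance() is called (every 10 epochs)
checkpoint_steps = [s for s in range(1, n_steps + 1) if s % (bat_per_epo * 10) == 0]
# filenames of the saved generator models
model_files = ['model_%04d.h5' % s for s in checkpoint_steps]

print(bat_per_epo, n_steps, half_batch)   # 937 93700 32
print(len(checkpoint_steps))              # 10
print(model_files[0], model_files[-1])    # model_9370.h5 model_93700.h5
```

The last filename, model_93700.h5, is the model saved at the end of the run.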

For a given training step, first the discriminator model is updated for a half batch of real samples, then a half batch of fake samples, together forming one batch of weight updates. The generator is then updated via the combined GAN model. Importantly, the class label is set to 1, or real, for the fake samples. This has the effect of updating the generator toward getting better at generating real samples on the next batch.

Note, the discriminator and composite model return three loss values from the call to the *train_on_batch()* function. The first value is the sum of the loss values and can be ignored, whereas the second value is the loss for the real/fake output layer and the third value is the loss for the clothing label classification.
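To make the three reported values concrete, the sketch below computes both losses for a single example in plain Python and sums them. The predicted probabilities are made-up illustrative values, and the helper functions are simplified stand-ins: Keras additionally clips probabilities and averages over the batch.

```python
# Sketch: the total loss is the sum of the real/fake loss and the class label loss.
import math

def binary_crossentropy(y_true, p_real):
    """Loss for the real/fake output, given the predicted probability of 'real'."""
    return -(y_true * math.log(p_real) + (1 - y_true) * math.log(1 - p_real))

def sparse_categorical_crossentropy(label, class_probs):
    """Loss for the class output, given an integer label and predicted probabilities."""
    return -math.log(class_probs[label])

# made-up predictions for one real image of class 7 (illustrative values only)
p_real = 0.8
class_probs = [0.05] * 10
class_probs[7] = 0.55

loss_real_fake = binary_crossentropy(1, p_real)
loss_class = sparse_categorical_crossentropy(7, class_probs)
total = loss_real_fake + loss_class
print('total=%.3f real/fake=%.3f class=%.3f' % (total, loss_real_fake, loss_class))
```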

The *train()* function below implements this, taking the defined models, dataset, and size of the latent dimension as arguments and parameterizing the number of epochs and batch size with default arguments. The generator model is saved every 10 epochs via *summarize_performance()*, including at the end of training.

    # train the generator and discriminator
    def train(g_model, d_model, gan_model, dataset, latent_dim, n_epochs=100, n_batch=64):
        # calculate the number of batches per training epoch
        bat_per_epo = int(dataset[0].shape[0] / n_batch)
        # calculate the number of training iterations
        n_steps = bat_per_epo * n_epochs
        # calculate the size of half a batch of samples
        half_batch = int(n_batch / 2)
        # manually enumerate training steps
        for i in range(n_steps):
            # get randomly selected 'real' samples
            [X_real, labels_real], y_real = generate_real_samples(dataset, half_batch)
            # update discriminator model weights
            _,d_r1,d_r2 = d_model.train_on_batch(X_real, [y_real, labels_real])
            # generate 'fake' examples
            [X_fake, labels_fake], y_fake = generate_fake_samples(g_model, latent_dim, half_batch)
            # update discriminator model weights
            _,d_f,d_f2 = d_model.train_on_batch(X_fake, [y_fake, labels_fake])
            # prepare points in latent space as input for the generator
            [z_input, z_labels] = generate_latent_points(latent_dim, n_batch)
            # create inverted labels for the fake samples
            y_gan = ones((n_batch, 1))
            # update the generator via the discriminator's error
            _,g_1,g_2 = gan_model.train_on_batch([z_input, z_labels], [y_gan, z_labels])
            # summarize loss on this batch
            print('>%d, dr[%.3f,%.3f], df[%.3f,%.3f], g[%.3f,%.3f]' % (i+1, d_r1,d_r2, d_f,d_f2, g_1,g_2))
            # evaluate the model performance every 10 epochs
            if (i+1) % (bat_per_epo * 10) == 0:
                summarize_performance(i, g_model, latent_dim)

We can then define the size of the latent space, define all three models, and train them on the loaded fashion MNIST dataset.

    # size of the latent space
    latent_dim = 100
    # create the discriminator
    discriminator = define_discriminator()
    # create the generator
    generator = define_generator(latent_dim)
    # create the gan
    gan_model = define_gan(generator, discriminator)
    # load image data
    dataset = load_real_samples()
    # train model
    train(generator, discriminator, gan_model, dataset, latent_dim)

Tying all of this together, the complete example is listed below.

    # example of fitting an auxiliary classifier gan (ac-gan) on fashion mnist
    from numpy import zeros
    from numpy import ones
    from numpy import expand_dims
    from numpy.random import randn
    from numpy.random import randint
    from keras.datasets.fashion_mnist import load_data
    from keras.optimizers import Adam
    from keras.models import Model
    from keras.layers import Input
    from keras.layers import Dense
    from keras.layers import Reshape
    from keras.layers import Flatten
    from keras.layers import Conv2D
    from keras.layers import Conv2DTranspose
    from keras.layers import LeakyReLU
    from keras.layers import BatchNormalization
    from keras.layers import Dropout
    from keras.layers import Embedding
    from keras.layers import Activation
    from keras.layers import Concatenate
    from keras.initializers import RandomNormal
    from matplotlib import pyplot

    # define the standalone discriminator model
    def define_discriminator(in_shape=(28,28,1), n_classes=10):
        # weight initialization
        init = RandomNormal(stddev=0.02)
        # image input
        in_image = Input(shape=in_shape)
        # downsample to 14x14
        fe = Conv2D(32, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(in_image)
        fe = LeakyReLU(alpha=0.2)(fe)
        fe = Dropout(0.5)(fe)
        # normal
        fe = Conv2D(64, (3,3), padding='same', kernel_initializer=init)(fe)
        fe = BatchNormalization()(fe)
        fe = LeakyReLU(alpha=0.2)(fe)
        fe = Dropout(0.5)(fe)
        # downsample to 7x7
        fe = Conv2D(128, (3,3), strides=(2,2), padding='same', kernel_initializer=init)(fe)
        fe = BatchNormalization()(fe)
        fe = LeakyReLU(alpha=0.2)(fe)
        fe = Dropout(0.5)(fe)
        # normal
        fe = Conv2D(256, (3,3), padding='same', kernel_initializer=init)(fe)
        fe = BatchNormalization()(fe)
        fe = LeakyReLU(alpha=0.2)(fe)
        fe = Dropout(0.5)(fe)
        # flatten feature maps
        fe = Flatten()(fe)
        # real/fake output
        out1 = Dense(1, activation='sigmoid')(fe)
        # class label output
        out2 = Dense(n_classes, activation='softmax')(fe)
        # define model
        model = Model(in_image, [out1, out2])
        # compile model
        opt = Adam(lr=0.0002, beta_1=0.5)
        model.compile(loss=['binary_crossentropy', 'sparse_categorical_crossentropy'], optimizer=opt)
        return model

    # define the standalone generator model
    def define_generator(latent_dim, n_classes=10):
        # weight initialization
        init = RandomNormal(stddev=0.02)
        # label input
        in_label = Input(shape=(1,))
        # embedding for categorical input
        li = Embedding(n_classes, 50)(in_label)
        # linear multiplication
        n_nodes = 7 * 7
        li = Dense(n_nodes, kernel_initializer=init)(li)
        # reshape to additional channel
        li = Reshape((7, 7, 1))(li)
        # image generator input
        in_lat = Input(shape=(latent_dim,))
        # foundation for 7x7 image
        n_nodes = 384 * 7 * 7
        gen = Dense(n_nodes, kernel_initializer=init)(in_lat)
        gen = Activation('relu')(gen)
        gen = Reshape((7, 7, 384))(gen)
        # merge image gen and label input
        merge = Concatenate()([gen, li])
        # upsample to 14x14
        gen = Conv2DTranspose(192, (5,5), strides=(2,2), padding='same', kernel_initializer=init)(merge)
        gen = BatchNormalization()(gen)
        gen = Activation('relu')(gen)
        # upsample to 28x28
        gen = Conv2DTranspose(1, (5,5), strides=(2,2), padding='same', kernel_initializer=init)(gen)
        out_layer = Activation('tanh')(gen)
        # define model
        model = Model([in_lat, in_label], out_layer)
        return model

    # define the combined generator and discriminator model, for updating the generator
    def define_gan(g_model, d_model):
        # make weights in the discriminator not trainable
        d_model.trainable = False
        # connect the outputs of the generator to the inputs of the discriminator
        gan_output = d_model(g_model.output)
        # define gan model as taking noise and label and outputting real/fake and label outputs
        model = Model(g_model.input, gan_output)
        # compile model
        opt = Adam(lr=0.0002, beta_1=0.5)
        model.compile(loss=['binary_crossentropy', 'sparse_categorical_crossentropy'], optimizer=opt)
        return model

    # load images
    def load_real_samples():
        # load dataset
        (trainX, trainy), (_, _) = load_data()
        # expand to 3d, e.g. add channels
        X = expand_dims(trainX, axis=-1)
        # convert from ints to floats
        X = X.astype('float32')
        # scale from [0,255] to [-1,1]
        X = (X - 127.5) / 127.5
        print(X.shape, trainy.shape)
        return [X, trainy]

    # select real samples
    def generate_real_samples(dataset, n_samples):
        # split into images and labels
        images, labels = dataset
        # choose random instances
        ix = randint(0, images.shape[0], n_samples)
        # select images and labels
        X, labels = images[ix], labels[ix]
        # generate class labels
        y = ones((n_samples, 1))
        return [X, labels], y

    # generate points in latent space as input for the generator
    def generate_latent_points(latent_dim, n_samples, n_classes=10):
        # generate points in the latent space
        x_input = randn(latent_dim * n_samples)
        # reshape into a batch of inputs for the network
        z_input = x_input.reshape(n_samples, latent_dim)
        # generate labels
        labels = randint(0, n_classes, n_samples)
        return [z_input, labels]

    # use the generator to generate n fake examples, with class labels
    def generate_fake_samples(generator, latent_dim, n_samples):
        # generate points in latent space
        z_input, labels_input = generate_latent_points(latent_dim, n_samples)
        # predict outputs
        images = generator.predict([z_input, labels_input])
        # create class labels
        y = zeros((n_samples, 1))
        return [images, labels_input], y

    # generate samples and save as a plot and save the model
    def summarize_performance(step, g_model, latent_dim, n_samples=100):
        # prepare fake examples
        [X, _], _ = generate_fake_samples(g_model, latent_dim, n_samples)
        # scale from [-1,1] to [0,1]
        X = (X + 1) / 2.0
        # plot images
        for i in range(100):
            # define subplot
            pyplot.subplot(10, 10, 1 + i)
            # turn off axis
            pyplot.axis('off')
            # plot raw pixel data
            pyplot.imshow(X[i, :, :, 0], cmap='gray_r')
        # save plot to file
        filename1 = 'generated_plot_%04d.png' % (step+1)
        pyplot.savefig(filename1)
        pyplot.close()
        # save the generator model
        filename2 = 'model_%04d.h5' % (step+1)
        g_model.save(filename2)
        print('>Saved: %s and %s' % (filename1, filename2))

    # train the generator and discriminator
    def train(g_model, d_model, gan_model, dataset, latent_dim, n_epochs=100, n_batch=64):
        # calculate the number of batches per training epoch
        bat_per_epo = int(dataset[0].shape[0] / n_batch)
        # calculate the number of training iterations
        n_steps = bat_per_epo * n_epochs
        # calculate the size of half a batch of samples
        half_batch = int(n_batch / 2)
        # manually enumerate training steps
        for i in range(n_steps):
            # get randomly selected 'real' samples
            [X_real, labels_real], y_real = generate_real_samples(dataset, half_batch)
            # update discriminator model weights
            _,d_r1,d_r2 = d_model.train_on_batch(X_real, [y_real, labels_real])
            # generate 'fake' examples
            [X_fake, labels_fake], y_fake = generate_fake_samples(g_model, latent_dim, half_batch)
            # update discriminator model weights
            _,d_f,d_f2 = d_model.train_on_batch(X_fake, [y_fake, labels_fake])
            # prepare points in latent space as input for the generator
            [z_input, z_labels] = generate_latent_points(latent_dim, n_batch)
            # create inverted labels for the fake samples
            y_gan = ones((n_batch, 1))
            # update the generator via the discriminator's error
            _,g_1,g_2 = gan_model.train_on_batch([z_input, z_labels], [y_gan, z_labels])
            # summarize loss on this batch
            print('>%d, dr[%.3f,%.3f], df[%.3f,%.3f], g[%.3f,%.3f]' % (i+1, d_r1,d_r2, d_f,d_f2, g_1,g_2))
            # evaluate the model performance every 10 epochs
            if (i+1) % (bat_per_epo * 10) == 0:
                summarize_performance(i, g_model, latent_dim)

    # size of the latent space
    latent_dim = 100
    # create the discriminator
    discriminator = define_discriminator()
    # create the generator
    generator = define_generator(latent_dim)
    # create the gan
    gan_model = define_gan(generator, discriminator)
    # load image data
    dataset = load_real_samples()
    # train model
    train(generator, discriminator, gan_model, dataset, latent_dim)

Running the example may take some time, and GPU hardware is recommended, but not required.

**Note**: Given the stochastic nature of the training algorithm, your results may vary. Try running the example a few times.

The loss is reported each training iteration, including the real/fake and class loss for the discriminator on real examples (dr), the discriminator on fake examples (df), and the generator updated via the composite model when generating images (g).

    >1, dr[0.934,2.967], df[1.310,3.006], g[0.878,3.368]
    >2, dr[0.711,2.836], df[0.939,3.262], g[0.947,2.751]
    >3, dr[0.649,2.980], df[1.001,3.147], g[0.844,3.226]
    >4, dr[0.732,3.435], df[0.823,3.715], g[1.048,3.292]
    >5, dr[0.860,3.076], df[0.591,2.799], g[1.123,3.313]
    ...

A total of 10 plots of generated images are created and 10 models are saved over the run, one every 10 epochs.

Plots of generated clothing from the first checkpoint, after 10 epochs, already look plausible.

The images remain reliable throughout the training process.

## How to Generate Items of Clothing With the AC-GAN

In this section, we can load a saved model and use it to generate new items of clothing that plausibly could have come from the Fashion-MNIST dataset.

The AC-GAN technically does not conditionally generate images based on the class label, at least not in the same way as the conditional GAN.

AC-GANs learn a representation for z that is independent of class label.

— Conditional Image Synthesis With Auxiliary Classifier GANs, 2016.

Nevertheless, if used in this way, the generated images mostly match the class label.

The example below loads the model from the end of the run (any saved model would do), and generates 100 examples of class 7 (sneaker).

    # example of loading the generator model and generating images
    from math import sqrt
    from numpy import asarray
    from numpy.random import randn
    from keras.models import load_model
    from matplotlib import pyplot

    # generate points in latent space as input for the generator
    def generate_latent_points(latent_dim, n_samples, n_class):
        # generate points in the latent space
        x_input = randn(latent_dim * n_samples)
        # reshape into a batch of inputs for the network
        z_input = x_input.reshape(n_samples, latent_dim)
        # generate labels
        labels = asarray([n_class for _ in range(n_samples)])
        return [z_input, labels]

    # create and show a plot of generated images
    def save_plot(examples, n_examples):
        # side length of the square grid; subplot() expects integers
        n = int(sqrt(n_examples))
        # plot images
        for i in range(n_examples):
            # define subplot
            pyplot.subplot(n, n, 1 + i)
            # turn off axis
            pyplot.axis('off')
            # plot raw pixel data
            pyplot.imshow(examples[i, :, :, 0], cmap='gray_r')
        pyplot.show()

    # load model
    model = load_model('model_93700.h5')
    latent_dim = 100
    n_examples = 100 # must be a square
    n_class = 7 # sneaker
    # generate images
    latent_points, labels = generate_latent_points(latent_dim, n_examples, n_class)
    # generate images
    X = model.predict([latent_points, labels])
    # scale from [-1,1] to [0,1]
    X = (X + 1) / 2.0
    # plot the result
    save_plot(X, n_examples)

Running the example, in this case, generates 100 very plausible photos of sneakers.
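The plotting code assumes that the number of examples is a perfect square. If you change *n_examples*, a small guard like the sketch below can catch invalid values before plotting. The helper name `grid_side` is hypothetical, and *math.isqrt()* requires Python 3.8 or later.

```python
# Sketch: validate that the number of examples fills a square grid.
import math

def grid_side(n_examples):
    """Return the side length of a square grid holding exactly n_examples plots."""
    side = math.isqrt(n_examples)
    if side * side != n_examples:
        raise ValueError('n_examples must be a perfect square, got %d' % n_examples)
    return side

print(grid_side(100))  # 10
print(grid_side(49))   # 7
```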

It may be fun to experiment with other class values.

For example, below are 100 generated coats (n_class = 4). Most of the images are coats, although there are a few pants in there, showing that the latent space is partially, but not completely, class-conditional.
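When experimenting with other class values, it helps to map the integer labels to names. The lookup below follows the standard Fashion-MNIST label order; the constant and function names are introduced here for illustration.

```python
# Sketch: look up the Fashion-MNIST class name for an integer label.
FASHION_MNIST_CLASSES = [
    'T-shirt/top',  # 0
    'Trouser',      # 1
    'Pullover',     # 2
    'Dress',        # 3
    'Coat',         # 4
    'Sandal',       # 5
    'Shirt',        # 6
    'Sneaker',      # 7
    'Bag',          # 8
    'Ankle boot',   # 9
]

def class_name(n_class):
    """Return the clothing class name for an integer label in [0,9]."""
    if not 0 <= n_class < len(FASHION_MNIST_CLASSES):
        raise ValueError('n_class must be in [0,9]')
    return FASHION_MNIST_CLASSES[n_class]

print(class_name(7))  # Sneaker
print(class_name(4))  # Coat
```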

## Extensions

This section lists some ideas for extending the tutorial that you may wish to explore.

- **Generate Images**. Generate images for each clothing class and compare results across different saved models (e.g. epoch 10, 20, etc.).
- **Alternate Configuration**. Update the configuration of the generator, discriminator, or both models to have more or less capacity and compare results.
- **CIFAR-10 Dataset**. Update the example to train on the CIFAR-10 dataset and use the model configuration described in the appendix of the paper.

If you explore any of these extensions, I’d love to know.

Post your findings in the comments below.

## Further Reading

This section provides more resources on the topic if you are looking to go deeper.

### Papers

- Conditional Image Synthesis With Auxiliary Classifier GANs, 2016.
- Conditional Image Synthesis With Auxiliary Classifier GANs, Reviewer Comments.
- Conditional Image Synthesis with Auxiliary Classifier GANs, NIPS 2016, YouTube.

### API

- Keras Datasets API.
- Keras Sequential Model API
- Keras Convolutional Layers API
- How can I “freeze” Keras layers?
- MatplotLib API
- NumPy Random sampling (numpy.random) API
- NumPy Array manipulation routines

### Articles

- How to Train a GAN? Tips and tricks to make GANs work
- Fashion-MNIST Project, GitHub.
- AC-GAN, Keras GAN Project.
- AC-GAN, Keras Example.

## Summary

In this tutorial, you discovered how to develop an auxiliary classifier generative adversarial network for generating photographs of clothing.

Specifically, you learned:

- The auxiliary classifier GAN is a type of conditional GAN that requires that the discriminator predict the class label of a given image.
- How to develop generator, discriminator, and composite models for the AC-GAN.
- How to train, evaluate, and use an AC-GAN to generate photographs of clothing from the Fashion-MNIST dataset.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

The post How to Develop an Auxiliary Classifier GAN (AC-GAN) From Scratch with Keras appeared first on Machine Learning Mastery.