How to Train a Progressive Growing GAN in Keras for Synthesizing Faces

Generative adversarial networks, or GANs, are effective at generating high-quality synthetic images.

A limitation of GANs is that they are only capable of generating relatively small images, such as 64×64 pixels.

The Progressive Growing GAN is an extension to the GAN training procedure that involves training a GAN to generate very small images, such as 4×4, and incrementally increasing the size of the generated images to 8×8, 16×16, until the desired output size is met. This has allowed the progressive GAN to generate photorealistic synthetic faces with 1024×1024 pixel resolution.

The key innovation of the progressive growing GAN is the two-phase training procedure that involves the fading-in of new blocks to support higher-resolution images followed by fine-tuning.

In this tutorial, you will discover how to implement and train a progressive growing generative adversarial network for generating celebrity faces.

After completing this tutorial, you will know:

  • How to prepare the celebrity faces dataset for training a progressive growing GAN model.
  • How to define and train the progressive growing GAN on the celebrity faces dataset.
  • How to load saved generator models and use them for generating ad hoc synthetic celebrity faces.


Let’s get started.


How to Train a Progressive Growing GAN in Keras for Synthesizing Faces.
Photo by Alessandro Caproni, some rights reserved.

Tutorial Overview

This tutorial is divided into five parts; they are:

  1. What Is the Progressive Growing GAN
  2. How to Prepare the Celebrity Faces Dataset
  3. How to Develop Progressive Growing GAN Models
  4. How to Train Progressive Growing GAN Models
  5. How to Synthesize Images With a Progressive Growing GAN Model

What Is the Progressive Growing GAN

GANs are effective at generating crisp synthetic images, although they are typically limited in the size of the images that can be generated.

The Progressive Growing GAN is an extension to the GAN that allows the training of generator models capable of generating large high-quality images, such as photorealistic faces with the size 1024×1024 pixels. It was described in the 2017 paper by Tero Karras, et al. from Nvidia titled “Progressive Growing of GANs for Improved Quality, Stability, and Variation.”

The key innovation of the Progressive Growing GAN is the incremental increase in the size of images output by the generator, starting with a 4×4 pixel image and doubling to 8×8, 16×16, and so on until the desired output resolution.

This is achieved by a training procedure that involves periods of fine-tuning the model with a given output resolution, and periods of slowly phasing in a new model with a larger resolution. All layers remain trainable during the training process, including existing layers when new layers are added.
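The alternation between fade-in and fine-tuning periods can be sketched as a simple schedule. This is an illustrative sketch only: the function name growth_schedule and the phase labels are made up for this example and are not taken from the paper or from Keras.

```python
# sketch: enumerate the training phases of progressive growing
def growth_schedule(start=4, target=128):
	# the base resolution is only fine-tuned, there is nothing to fade in
	schedule = [(start, 'tune')]
	size = start * 2
	while size <= target:
		# phase in the new, larger block, then fine-tune it
		schedule.append((size, 'fade-in'))
		schedule.append((size, 'tune'))
		size *= 2
	return schedule

# list the phases needed to grow from 4x4 to 16x16
for size, phase in growth_schedule(4, 16):
	print('%dx%d %s' % (size, size, phase))
```

Each resolution after the first contributes two training periods, which is why growing to large images takes many more training runs than the number of resolutions alone suggests.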

Progressive Growing GAN involves using a generator and discriminator model with the same general structure and starting with very small images. During training, new blocks of convolutional layers are systematically added to both the generator model and the discriminator models.

The incremental addition of the layers allows the models to effectively learn coarse-level detail and later learn ever-finer detail, both on the generator and discriminator sides.

This incremental nature allows the training to first discover large-scale structure of the image distribution and then shift attention to increasingly finer-scale detail, instead of having to learn all scales simultaneously.

The next step is to select a dataset to use for developing a Progressive Growing GAN.


How to Prepare the Celebrity Faces Dataset

In this tutorial, we will use the Large-scale Celebrity Faces Attributes Dataset, referred to as CelebA.

This dataset was developed and published by Ziwei Liu, et al. for their 2015 paper titled “Deep Learning Face Attributes in the Wild.”

The dataset provides about 200,000 photographs of celebrity faces along with annotations for what appears in given photos, such as glasses, face shape, hats, hair type, etc. As part of the dataset, the authors provide a version of each photo centered on the face and cropped to the portrait with varying sizes around 150 pixels wide and 200 pixels tall. We will use this as the basis for developing our GAN model.

The dataset can be easily downloaded from the Kaggle webpage. Note: this may require an account with Kaggle.

Specifically, download the file “img_align_celeba.zip”, which is about 1.3 gigabytes. To do this, click on the filename on the Kaggle website and then click the download icon.

The download might take a while depending on the speed of your internet connection.

After downloading, unzip the archive.

This will create a new directory named “img_align_celeba” that contains all of the images with filenames like 202599.jpg and 202598.jpg.

When working with a GAN, it is easier to model a dataset if all of the images are small and square in shape.

Further, as we are only interested in the face in each photo and not the background, we can perform face detection and extract only the face before resizing the result to a fixed size.

There are many ways to perform face detection. In this case, we will use a pre-trained Multi-Task Cascaded Convolutional Neural Network, or MTCNN. This is a state-of-the-art deep learning model for face detection, described in the 2016 paper titled “Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks.”

We will use the implementation provided by Iván de Paz Centeno in the ipazc/mtcnn project. This can also be installed via pip as follows:

sudo pip install mtcnn

We can confirm that the library was installed correctly by importing the library and printing the version; for example:

# confirm mtcnn was installed correctly
import mtcnn
# print version
print(mtcnn.__version__)

Running the example prints the current version of the library.

0.0.8

The MTCNN model is very easy to use.

First, an instance of the MTCNN model is created, then the detect_faces() function can be called passing in the pixel data for one image.

The result is a list of detected faces, each with a bounding box defined in pixel offset values.

...
# prepare model
model = MTCNN()
# detect face in the image
faces = model.detect_faces(pixels)
# extract details of the face
x1, y1, width, height = faces[0]['box']
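Taking faces[0] keeps whichever detection happens to be listed first. As a minimal sketch of an alternative, the detection with the highest confidence could be kept instead. This assumes each detection is a dict with 'box' and 'confidence' keys, as returned by the mtcnn library; the sample detections and the helper name most_confident_box are hypothetical.

```python
# hypothetical detections in the format returned by detect_faces()
detections = [
	{'box': [30, 40, 60, 80], 'confidence': 0.91},
	{'box': [5, 10, 20, 25], 'confidence': 0.99},
]

# return the bounding box of the most confident detection, or None
def most_confident_box(faces):
	if len(faces) == 0:
		return None
	best = max(faces, key=lambda face: face['confidence'])
	return tuple(best['box'])

print(most_confident_box(detections))
```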

Although the progressive growing GAN supports the synthesis of large images, such as 1024×1024, this requires enormous resources, such as a single top of the line GPU training the model for a month.

Instead, we will reduce the size of the generated images to 128×128 which will, in turn, allow us to train a reasonable model on a GPU in a few hours and still discover how the progressive growing model can be implemented, trained, and used.

As such, we can develop a function to load a file and extract the face from the photo, and then resize the extracted face pixels to a predefined size. In this case, we will use the square shape of 128×128 pixels.

The load_image() function below will load a given photo file name as a NumPy array of pixels.

# load an image as an rgb numpy array
def load_image(filename):
	# load image from file
	image = Image.open(filename)
	# convert to RGB, if needed
	image = image.convert('RGB')
	# convert to array
	pixels = asarray(image)
	return pixels

The extract_face() function below takes the MTCNN model and pixel values for a single photograph as arguments and returns a 128x128x3 array of pixel values with just the face, or None if no face was detected (which can happen rarely).

# extract the face from a loaded image and resize
def extract_face(model, pixels, required_size=(128, 128)):
	# detect face in the image
	faces = model.detect_faces(pixels)
	# skip cases where we could not detect a face
	if len(faces) == 0:
		return None
	# extract details of the face
	x1, y1, width, height = faces[0]['box']
	# force detected pixel values to be positive (bug fix)
	x1, y1 = abs(x1), abs(y1)
	# convert into coordinates
	x2, y2 = x1 + width, y1 + height
	# retrieve face pixels
	face_pixels = pixels[y1:y2, x1:x2]
	# resize pixels to the model size
	image = Image.fromarray(face_pixels)
	image = image.resize(required_size)
	face_array = asarray(image)
	return face_array
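The abs() call in extract_face() guards against the small negative offsets that the detector can report for faces near the image edge. As an alternative sketch (not what this tutorial uses), the box could instead be clamped to the image bounds. The helper name clamp_box and the example image size are hypothetical.

```python
# clamp a (x, y, width, height) box to the bounds of an image
def clamp_box(box, image_width, image_height):
	x1, y1, width, height = box
	# convert to corner coordinates before clamping
	x2, y2 = x1 + width, y1 + height
	x1, y1 = max(0, x1), max(0, y1)
	x2, y2 = min(image_width, x2), min(image_height, y2)
	return x1, y1, x2, y2

# a box that starts slightly left of a 100x120 image
print(clamp_box((-3, 10, 50, 60), 100, 120))
```

Unlike abs(), clamping keeps the right and bottom edges where the detector placed them, so the crop does not shift when an offset is negative.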

The load_faces() function below enumerates all photograph files in a directory and extracts and resizes the face from each and returns a NumPy array of faces.

We limit the total number of faces loaded via the n_faces argument, as we don’t need them all.

# load images and extract faces for all images in a directory
def load_faces(directory, n_faces):
	# prepare model
	model = MTCNN()
	faces = list()
	# enumerate files
	for filename in listdir(directory):
		# load the image
		pixels = load_image(directory + filename)
		# get face
		face = extract_face(model, pixels)
		if face is None:
			continue
		# store
		faces.append(face)
		print(len(faces), face.shape)
		# stop once we have enough
		if len(faces) >= n_faces:
			break
	return asarray(faces)

Tying this together, the complete example of preparing a dataset of celebrity faces for training a GAN model is listed below.

In this case, we increase the total number of loaded faces to 50,000 to provide a good training dataset for our GAN model.

# example of extracting and resizing faces into a new dataset
from os import listdir
from numpy import asarray
from numpy import savez_compressed
from PIL import Image
from mtcnn.mtcnn import MTCNN
from matplotlib import pyplot

# load an image as an rgb numpy array
def load_image(filename):
	# load image from file
	image = Image.open(filename)
	# convert to RGB, if needed
	image = image.convert('RGB')
	# convert to array
	pixels = asarray(image)
	return pixels

# extract the face from a loaded image and resize
def extract_face(model, pixels, required_size=(128, 128)):
	# detect face in the image
	faces = model.detect_faces(pixels)
	# skip cases where we could not detect a face
	if len(faces) == 0:
		return None
	# extract details of the face
	x1, y1, width, height = faces[0]['box']
	# force detected pixel values to be positive (bug fix)
	x1, y1 = abs(x1), abs(y1)
	# convert into coordinates
	x2, y2 = x1 + width, y1 + height
	# retrieve face pixels
	face_pixels = pixels[y1:y2, x1:x2]
	# resize pixels to the model size
	image = Image.fromarray(face_pixels)
	image = image.resize(required_size)
	face_array = asarray(image)
	return face_array

# load images and extract faces for all images in a directory
def load_faces(directory, n_faces):
	# prepare model
	model = MTCNN()
	faces = list()
	# enumerate files
	for filename in listdir(directory):
		# load the image
		pixels = load_image(directory + filename)
		# get face
		face = extract_face(model, pixels)
		if face is None:
			continue
		# store
		faces.append(face)
		print(len(faces), face.shape)
		# stop once we have enough
		if len(faces) >= n_faces:
			break
	return asarray(faces)

# directory that contains all images
directory = 'img_align_celeba/'
# load and extract all faces
all_faces = load_faces(directory, 50000)
print('Loaded: ', all_faces.shape)
# save in compressed format
savez_compressed('img_align_celeba_128.npz', all_faces)

Running the example may take a few minutes given the larger number of faces to be loaded.

At the end of the run, the array of extracted and resized faces is saved as a compressed NumPy array with the filename ‘img_align_celeba_128.npz’.

The prepared dataset can then be loaded any time, as follows.

# load the prepared dataset
from numpy import load
# load the face dataset
data = load('img_align_celeba_128.npz')
faces = data['arr_0']
print('Loaded: ', faces.shape)

Loading the dataset summarizes the shape of the array, showing 50K images with the size of 128×128 pixels and three color channels.

Loaded: (50000, 128, 128, 3)
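As a quick sanity check on that shape, the in-memory size of the array can be estimated with simple arithmetic. This sketch assumes the faces are stored as 8-bit unsigned integers (one byte per value), which is what converting an RGB PIL image to a NumPy array produces; the variable names are only for illustration.

```python
# estimate the memory needed to hold the loaded faces
n_images, height, width, channels = 50000, 128, 128, 3
# one byte per value for 8-bit pixels
n_bytes = n_images * height * width * channels
print('uint8: %d bytes (%.2f GiB)' % (n_bytes, n_bytes / 1024.0 ** 3))
# converting to 32-bit floats uses four bytes per value
n_bytes_float = n_bytes * 4
print('float32: %.2f GiB' % (n_bytes_float / 1024.0 ** 3))
```

This is worth keeping in mind before converting the whole dataset to floating point values for training.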

We can elaborate on this example and plot the first 100 faces in the dataset as a 10×10 grid. The complete example is listed below.

# load the prepared dataset
from numpy import load
from matplotlib import pyplot

# plot a list of loaded faces
def plot_faces(faces, n):
	for i in range(n * n):
		# define subplot
		pyplot.subplot(n, n, 1 + i)
		# turn off axis
		pyplot.axis('off')
		# plot raw pixel data
		pyplot.imshow(faces[i].astype('uint8'))
	pyplot.show()

# load the face dataset
data = load('img_align_celeba_128.npz')
faces = data['arr_0']
print('Loaded: ', faces.shape)
plot_faces(faces, 10)

Running the example loads the dataset and creates a plot of the first 100 images.

We can see that each image only contains the face and all faces have the same square shape. Our goal is to generate new faces with the same general properties.


Plot of 100 Celebrity Faces in a 10×10 Grid

We are now ready to develop a GAN model to generate faces using this dataset.

How to Develop Progressive Growing GAN Models

There are many ways to implement the progressive growing GAN models.

In this tutorial, we will develop and implement each phase of growth as a separate Keras model and each model will share the same layers and weights.

This approach allows for the convenient training of each model, just like a normal Keras model, although it requires a slightly complicated model construction process to ensure that the layers are reused correctly.

First, we will define some custom layers required in the definition of the generator and discriminator models, then proceed to define functions to create and grow the discriminator and generator models themselves.

Progressive Growing Custom Layers

There are three custom layers required to implement the progressive growing generative adversarial network.

They are the layers:

  • WeightedSum: Used to control the weighted sum of the old and new layers during a growth phase.
  • MinibatchStdev: Used to summarize statistics for a batch of images in the discriminator.
  • PixelNormalization: Used to normalize activation maps in the generator model.

Additionally, a weight constraint is used in the paper referred to as “equalized learning rate”. This too would need to be implemented as a custom layer. In the interest of brevity, we won’t use equalized learning rate in this tutorial and instead we use a simple max norm weight constraint.

WeightedSum Layer

The WeightedSum layer is a merge layer that combines the activations from two input layers, such as two input paths in a discriminator or two output paths in a generator model. It uses a variable called alpha that controls how much to weight the first and second inputs.

It is used during the growth phase of training when the model is in transition from one image size to a new image size with double the width and height (quadruple the area), such as from 4×4 to 8×8 pixels.

During the growth phase, the alpha parameter is linearly scaled from 0.0 at the beginning to 1.0 at the end, allowing the output of the layer to transition from giving full weight to the old layers (first input) to giving full weight to the new layers (second input).

  • weighted sum = ((1.0 - alpha) * input1) + (alpha * input2)

The WeightedSum class is defined below as an extension to the Add merge layer.

# weighted sum output
class WeightedSum(Add):
	# init with default value
	def __init__(self, alpha=0.0, **kwargs):
		super(WeightedSum, self).__init__(**kwargs)
		self.alpha = backend.variable(alpha, name='ws_alpha')

	# output a weighted sum of inputs
	def _merge_function(self, inputs):
		# only supports a weighted sum of two inputs
		assert (len(inputs) == 2)
		# ((1-a) * input1) + (a * input2)
		output = ((1.0 - self.alpha) * inputs[0]) + (self.alpha * inputs[1])
		return output
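The fade-in arithmetic can be checked with plain numbers. The sketch below assumes alpha is increased linearly with the training step. The helper names linear_alpha and weighted_sum, and the step / (n_steps - 1) formula, are illustrative assumptions rather than part of the WeightedSum layer.

```python
# linearly scale alpha from 0.0 (first step) to 1.0 (last step)
def linear_alpha(step, n_steps):
	return step / float(n_steps - 1)

# blend an old and a new value using the weighted sum formula
def weighted_sum(old, new, alpha):
	return ((1.0 - alpha) * old) + (alpha * new)

# blend two scalar activations over a five-step fade-in
for step in range(5):
	alpha = linear_alpha(step, 5)
	print(step, alpha, weighted_sum(10.0, 20.0, alpha))
```

At the first step the output equals the old pathway and at the last step it equals the new pathway, with an even blend in between.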

MinibatchStdev

The mini-batch standard deviation layer, or MinibatchStdev, is only used in the output block of the discriminator model.

The objective of the layer is to provide a statistical summary of the batch of activations. The discriminator can then learn to better detect batches of fake samples from batches of real samples. This, in turn, encourages the generator that is trained via the discriminator to create batches of samples with realistic batch statistics.

It is implemented as calculating the standard deviation for each pixel value in the activation maps across the batch, calculating the average of this value, and then creating a new activation map (one channel) that is appended to the list of activation maps provided as input.

The MinibatchStdev layer is defined below.

# mini-batch standard deviation layer
class MinibatchStdev(Layer):
	# initialize the layer
	def __init__(self, **kwargs):
		super(MinibatchStdev, self).__init__(**kwargs)

	# perform the operation
	def call(self, inputs):
		# calculate the mean value for each pixel across the batch
		mean = backend.mean(inputs, axis=0, keepdims=True)
		# calculate the squared differences between pixel values and mean
		squ_diffs = backend.square(inputs - mean)
		# calculate the average of the squared differences (variance)
		mean_sq_diff = backend.mean(squ_diffs, axis=0, keepdims=True)
		# add a small value to avoid a blow-up when we calculate stdev
		mean_sq_diff += 1e-8
		# square root of the variance (stdev)
		stdev = backend.sqrt(mean_sq_diff)
		# calculate the mean standard deviation across each pixel coord
		mean_pix = backend.mean(stdev, keepdims=True)
		# scale this up to be the size of one input feature map for each sample
		shape = backend.shape(inputs)
		output = backend.tile(mean_pix, (shape[0], shape[1], shape[2], 1))
		# concatenate with the output
		combined = backend.concatenate([inputs, output], axis=-1)
		return combined

	# define the output shape of the layer
	def compute_output_shape(self, input_shape):
		# create a copy of the input shape as a list
		input_shape = list(input_shape)
		# add one to the channel dimension (assume channels-last)
		input_shape[-1] += 1
		# convert list to a tuple
		return tuple(input_shape)
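To make the statistic concrete, here is a plain-Python sketch of the same calculation on a tiny batch where each sample has been flattened to a list of activations. It is a simplification for illustration: the helper name minibatch_stdev is hypothetical, and it returns the single summary value rather than tiling it into an extra feature map.

```python
from math import sqrt

# mean over positions of the per-position standard deviation across the batch
def minibatch_stdev(batch, eps=1e-8):
	n_samples = len(batch)
	n_values = len(batch[0])
	stdevs = []
	for j in range(n_values):
		# gather the same position from every sample in the batch
		column = [sample[j] for sample in batch]
		mean = sum(column) / n_samples
		variance = sum((v - mean) ** 2 for v in column) / n_samples
		# small constant avoids taking the square root of exactly zero
		stdevs.append(sqrt(variance + eps))
	return sum(stdevs) / n_values

# a batch of identical samples versus a batch with variety
identical = [[1.0, 2.0], [1.0, 2.0]]
varied = [[0.0, 2.0], [2.0, 6.0]]
print(minibatch_stdev(identical))
print(minibatch_stdev(varied))
```

A batch of near-identical samples produces a value close to zero, which gives the discriminator a direct signal that the batch lacks variety.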

PixelNormalization

The generator and discriminator models don’t use batch normalization like other GAN models; instead, each pixel in the activation maps is normalized to unit length.

This is a variation of local response normalization and is referred to in the paper as pixelwise feature vector normalization. Also, unlike other GAN models, normalization is only used in the generator model, not the discriminator.

This is a type of activity regularization and could be implemented as an activity constraint, although it is easily implemented as a new layer that scales the activations of the prior layer.

The PixelNormalization class below implements this and can be used after each Convolution layer in the generator, but before any activation function.

# pixel-wise feature vector normalization layer
class PixelNormalization(Layer):
	# initialize the layer
	def __init__(self, **kwargs):
		super(PixelNormalization, self).__init__(**kwargs)

	# perform the operation
	def call(self, inputs):
		# calculate square pixel values
		values = inputs**2.0
		# calculate the mean of the squared values across channels
		mean_values = backend.mean(values, axis=-1, keepdims=True)
		# ensure the mean is not zero
		mean_values += 1.0e-8
		# calculate the sqrt of the mean squared value (root mean square)
		l2 = backend.sqrt(mean_values)
		# normalize values by the root mean square
		normalized = inputs / l2
		return normalized

	# define the output shape of the layer
	def compute_output_shape(self, input_shape):
		return input_shape
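The same calculation can be followed by hand for a single pixel. This plain-Python sketch normalizes one feature vector (the values of one pixel across all channels); the helper name pixel_norm is hypothetical. Note that dividing by the root of the mean squared value leaves the vector with a mean squared value of about 1.0.

```python
from math import sqrt

# normalize the channel values of a single pixel
def pixel_norm(vector, eps=1e-8):
	# mean of the squared channel values
	mean_sq = sum(v * v for v in vector) / len(vector)
	# root mean square, with a small constant to avoid dividing by zero
	scale = sqrt(mean_sq + eps)
	return [v / scale for v in vector]

# a pixel with two channels
out = pixel_norm([3.0, 4.0])
print(out)
print(sum(v * v for v in out) / len(out))
```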

We now have all of the custom layers required and can define our models.

Progressive Growing Discriminator Model

The discriminator model is defined as a deep convolutional neural network that expects a 4×4 color image as input and predicts whether it is real or fake.

The first hidden layer is a 1×1 convolutional layer. The output block involves a MinibatchStdev layer, 3×3 and 4×4 convolutional layers, and a fully connected layer that outputs a prediction. Leaky ReLU activation functions are used after all convolutional layers, and the output layer uses a linear activation function.

This model is trained for a normal interval, and then the model undergoes a growth phase to 8×8. This involves adding a block of two 3×3 convolutional layers and an average pooling downsample layer. The input image passes through the new block with a new 1×1 convolutional hidden layer. The input image is also passed through a downsample layer and through the old 1×1 convolutional hidden layer. The output of the old 1×1 convolution layer and the new block are then combined via a WeightedSum layer.

After an interval of training transitioning the WeightedSum’s alpha parameter from 0.0 (all old) to 1.0 (all new), another training phase is run to tune the new model with the old layer and pathway removed.

This process repeats until the desired image size is met, in our case, 128×128 pixel images.
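The downsampling step used on the old pathway can be illustrated without Keras. This plain-Python sketch mimics 2×2 average pooling with a stride of 2 on a single-channel image, which matches the default pool size of the AveragePooling2D layer. The helper name is hypothetical and the function assumes an even height and width.

```python
# halve the width and height of a single-channel image by averaging 2x2 blocks
def average_pool_2x2(image):
	pooled = []
	for r in range(0, len(image), 2):
		row = []
		for c in range(0, len(image[0]), 2):
			# sum the four values in the block and take their mean
			total = image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]
			row.append(total / 4.0)
		pooled.append(row)
	return pooled

# a 4x4 image is reduced to 2x2
image = [
	[1, 3, 5, 7],
	[1, 3, 5, 7],
	[2, 2, 0, 0],
	[2, 2, 4, 4],
]
print(average_pool_2x2(image))
```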

We can achieve this with two functions. The define_discriminator() function defines the base model that accepts 4×4 images. The add_discriminator_block() function takes a model and creates two new models: a growth version with two pathways combined by the WeightedSum layer, and a second version with the same layers and weights but without the old 1×1 layer and the WeightedSum layer. The define_discriminator() function can then call the add_discriminator_block() function as many times as is needed to create the models up to the desired level of growth.

All layers are initialized with small Gaussian random numbers with a standard deviation of 0.02, which is common for GAN models. A maxnorm weight constraint is used with a value of 1.0, instead of the more elaborate ‘equalized learning rate’ weight constraint used in the paper.
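To see what a max norm constraint of 1.0 does, the sketch below rescales a single weight vector in plain Python. It follows the general approach of clipping the vector norm and rescaling the weights accordingly, but it is a simplified one-vector illustration with a hypothetical helper name, not the Keras implementation.

```python
from math import sqrt

# rescale a weight vector so that its norm does not exceed max_value
def max_norm_vector(weights, max_value=1.0, eps=1e-7):
	norm = sqrt(sum(w * w for w in weights))
	# the norm we want: unchanged if small enough, otherwise capped
	desired = min(norm, max_value)
	return [w * desired / (eps + norm) for w in weights]

# a vector with norm 5.0 is shrunk, one with norm 0.5 is left almost untouched
large = max_norm_vector([3.0, 4.0])
small = max_norm_vector([0.3, 0.4])
print(large)
print(small)
```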

The paper defines a number of filters that increases with the depth of the model from 16 to 32, 64, all the way up to 512. This requires projection of the number of feature maps during the growth phase so that the weighted sum can be calculated correctly. To avoid this complication, we fix the number of filters to be the same in all layers.

Each model is compiled and will be fit. In this case, we will use Wasserstein loss (or WGAN loss) and the Adam version of stochastic gradient descent configured as is specified in the paper. The authors of the paper explored both WGAN-GP loss and least squares loss and found that the former performed slightly better. Nevertheless, we will use Wasserstein loss as it greatly simplifies the implementation.

First, we must define the loss function as the average predicted value multiplied by the target value. The target value will be 1 for real images and -1 for fake images. This means that weight updates will seek to increase the divide between real and fake images.

# calculate wasserstein loss
def wasserstein_loss(y_true, y_pred):
	return backend.mean(y_true * y_pred)
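The behavior of this loss is easy to check with a few numbers. The plain-Python sketch below computes the same mean of target times prediction on lists, using the targets described above (1 for real, -1 for fake). The function name wasserstein_loss_plain and the example scores are hypothetical.

```python
# mean of target multiplied by predicted score
def wasserstein_loss_plain(y_true, y_pred):
	return sum(t * p for t, p in zip(y_true, y_pred)) / len(y_true)

# scores from a critic that separates real from fake
loss_real = wasserstein_loss_plain([1, 1], [-2.0, -1.0])
loss_fake = wasserstein_loss_plain([-1, -1], [1.0, 3.0])
# scores from a critic that cannot tell them apart
loss_same = wasserstein_loss_plain([1, -1], [0.5, 0.5])
print(loss_real, loss_fake, loss_same)
```

With these targets, the loss falls as the scores for real and fake images move further apart in opposite directions, and it is zero when both receive the same score.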

The functions for defining and creating the growth versions of the discriminator models are listed below.

We make careful use of the functional API and knowledge of the model structure to create the two models for each growth phase. The growth phase also always doubles the expected input shape.

# add a discriminator block
def add_discriminator_block(old_model, n_input_layers=3):
	# weight initialization
	init = RandomNormal(stddev=0.02)
	# weight constraint
	const = max_norm(1.0)
	# get shape of existing model
	in_shape = list(old_model.input.shape)
	# define new input shape as double the size
	input_shape = (in_shape[-2].value*2, in_shape[-2].value*2, in_shape[-1].value)
	in_image = Input(shape=input_shape)
	# define new input processing layer
	d = Conv2D(128, (1,1), padding='same', kernel_initializer=init, kernel_constraint=const)(in_image)
	d = LeakyReLU(alpha=0.2)(d)
	# define new block
	d = Conv2D(128, (3,3), padding='same', kernel_initializer=init, kernel_constraint=const)(d)
	d = LeakyReLU(alpha=0.2)(d)
	d = Conv2D(128, (3,3), padding='same', kernel_initializer=init, kernel_constraint=const)(d)
	d = LeakyReLU(alpha=0.2)(d)
	d = AveragePooling2D()(d)
	block_new = d
	# skip the input, 1x1 and activation for the old model
	for i in range(n_input_layers, len(old_model.layers)):
		d = old_model.layers[i](d)
	# define straight-through model
	model1 = Model(in_image, d)
	# compile model (epsilon of 1e-8 as specified in the paper)
	model1.compile(loss=wasserstein_loss, optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=1e-8))
	# downsample the new larger image
	downsample = AveragePooling2D()(in_image)
	# connect old input processing to downsampled new input
	block_old = old_model.layers[1](downsample)
	block_old = old_model.layers[2](block_old)
	# fade in output of old model input layer with new input
	d = WeightedSum()([block_old, block_new])
	# skip the input, 1x1 and activation for the old model
	for i in range(n_input_layers, len(old_model.layers)):
		d = old_model.layers[i](d)
	# define fade-in model
	model2 = Model(in_image, d)
	# compile model
	model2.compile(loss=wasserstein_loss, optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=1e-8))
	return [model1, model2]

# define the discriminator models for each image resolution
def define_discriminator(n_blocks, input_shape=(4,4,3)):
	# weight initialization
	init = RandomNormal(stddev=0.02)
	# weight constraint
	const = max_norm(1.0)
	model_list = list()
	# base model input
	in_image = Input(shape=input_shape)
	# conv 1x1
	d = Conv2D(128, (1,1), padding='same', kernel_initializer=init, kernel_constraint=const)(in_image)
	d = LeakyReLU(alpha=0.2)(d)
	# conv 3x3 (output block)
	d = MinibatchStdev()(d)
	d = Conv2D(128, (3,3), padding='same', kernel_initializer=init, kernel_constraint=const)(d)
	d = LeakyReLU(alpha=0.2)(d)
	# conv 4x4
	d = Conv2D(128, (4,4), padding='same', kernel_initializer=init, kernel_constraint=const)(d)
	d = LeakyReLU(alpha=0.2)(d)
	# dense output layer
	d = Flatten()(d)
	out_class = Dense(1)(d)
	# define model
	model = Model(in_image, out_class)
	# compile model (epsilon of 1e-8 as specified in the paper)
	model.compile(loss=wasserstein_loss, optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=1e-8))
	# store model
	model_list.append([model, model])
	# create submodels
	for i in range(1, n_blocks):
		# get prior model without the fade-in
		old_model = model_list[i - 1][0]
		# create new model for next resolution
		models = add_discriminator_block(old_model)
		# store model
		model_list.append(models)
	return model_list

The define_discriminator() function is called by specifying the number of blocks to create.

We will create 6 blocks, which will create 6 pairs of models that expect the input image sizes of 4×4, 8×8, 16×16, 32×32, 64×64, 128×128.

The function returns a list where each element in the list contains two models. The first model is the ‘normal model’ or straight-through model, and the second is the version of the model that includes the old 1×1 and new block with the weighted sum, used for the transition or growth phase of training.
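The relationship between the number of blocks and the image size is simple doubling from the 4×4 base. The sketch below works it out in both directions; the helper names block_sizes and blocks_for_size are hypothetical and are not part of the model code.

```python
# image size handled by each block, starting from the base size
def block_sizes(n_blocks, base=4):
	return [base * 2 ** i for i in range(n_blocks)]

# number of blocks needed to reach a target image size
def blocks_for_size(target, base=4):
	n_blocks, size = 1, base
	while size < target:
		size *= 2
		n_blocks += 1
	if size != target:
		raise ValueError('target must be the base size times a power of two')
	return n_blocks

print(block_sizes(6))
print(blocks_for_size(128))
print(blocks_for_size(1024))
```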

Progressive Growing Generator Model

The generator model takes a random point from the latent space as input and generates a synthetic image.

The generator models are defined in the same way as the discriminator models.

Specifically, a base model for generating 4×4 images is defined and growth versions of the model are created for the larger image output sizes.

The main difference is that during the growth phase, the output of the model is the output of the WeightedSum layer. The growth phase version of the model involves first adding a nearest neighbor upsampling layer; this is then connected to the new block with the new output layer and to the old output layer. The old and new output layers are then combined via a WeightedSum output layer.
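Nearest neighbor upsampling simply repeats each value. The plain-Python sketch below doubles the width and height of a single-channel feature map, mirroring what the UpSampling2D layer does with its default settings; the helper name is hypothetical.

```python
# double the width and height of a single-channel map by repeating values
def upsample_nearest_2x(image):
	upsampled = []
	for row in image:
		wide = []
		for value in row:
			# repeat each value horizontally
			wide.extend([value, value])
		# repeat each widened row vertically
		upsampled.append(wide)
		upsampled.append(list(wide))
	return upsampled

# a 2x2 map becomes 4x4
print(upsample_nearest_2x([[1, 2], [3, 4]]))
```

Because no new information is created, the old output layer applied to the upsampled activations produces a blocky enlargement of the previous image, which the new block then learns to refine.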

The base model has an input block defined with a fully connected layer with a sufficient number of activations to create a given number of 4×4 feature maps. This is followed by 4×4 and 3×3 convolution layers and a 1×1 output layer that generates color images. New blocks are added with an upsample layer and two 3×3 convolutional layers.

The LeakyReLU activation function is used and the PixelNormalization layer is used after each convolutional layer. A linear activation function is used in the output layer, instead of the more common tanh function, yet real images are still scaled to the range [-1,1], which is common for most GAN models.
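The scaling of real images to [-1,1] can be sketched as follows, assuming 8-bit pixel values in [0,255]. The helper names are hypothetical. Because the output layer is linear, generated values are not guaranteed to stay within [-1,1], so this sketch clips them before mapping back to [0,1] for display; that clipping step is an assumption of the example.

```python
# scale an 8-bit pixel value from [0,255] to [-1,1]
def scale_pixel(value):
	return (value - 127.5) / 127.5

# map a generated value back to [0,1] for plotting
def to_display_range(value):
	# a linear output layer may overshoot, so clip first
	clipped = max(-1.0, min(1.0, value))
	return (clipped + 1.0) / 2.0

print(scale_pixel(0), scale_pixel(255))
print(to_display_range(-1.0), to_display_range(0.0), to_display_range(1.3))
```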

The paper defines the number of feature maps decreasing with the depth of the model from 512 to 16. As with the discriminator, the difference in the number of feature maps across blocks introduces a challenge for the WeightedSum, so for simplicity, we fix all layers to have the same number of filters.

Also like the discriminator model, weights are initialized with Gaussian random numbers with a standard deviation of 0.02 and the maxnorm weight constraint is used with a value of 1.0, instead of the equalized learning rate weight constraint used in the paper.

The functions for defining and growing the generator models are defined below.

# add a generator block
def add_generator_block(old_model):
	# weight initialization
	init = RandomNormal(stddev=0.02)
	# weight constraint
	const = max_norm(1.0)
	# get the end of the last block
	block_end = old_model.layers[-2].output
	# upsample, and define new block
	upsampling = UpSampling2D()(block_end)
	g = Conv2D(128, (3,3), padding='same', kernel_initializer=init, kernel_constraint=const)(upsampling)
	g = PixelNormalization()(g)
	g = LeakyReLU(alpha=0.2)(g)
	g = Conv2D(128, (3,3), padding='same', kernel_initializer=init, kernel_constraint=const)(g)
	g = PixelNormalization()(g)
	g = LeakyReLU(alpha=0.2)(g)
	# add new output layer
	out_image = Conv2D(3, (1,1), padding='same', kernel_initializer=init, kernel_constraint=const)(g)
	# define model
	model1 = Model(old_model.input, out_image)
	# get the output layer from old model
	out_old = old_model.layers[-1]
	# connect the upsampling to the old output layer
	out_image2 = out_old(upsampling)
	# define new output image as the weighted sum of the old and new models
	merged = WeightedSum()([out_image2, out_image])
	# define model
	model2 = Model(old_model.input, merged)
	return [model1, model2]

# define generator models
def define_generator(latent_dim, n_blocks, in_dim=4):
	# weight initialization
	init = RandomNormal(stddev=0.02)
	# weight constraint
	const = max_norm(1.0)
	model_list = list()
	# base model latent input
	in_latent = Input(shape=(latent_dim,))
	# linear scale up to activation maps
	g = Dense(128 * in_dim * in_dim, kernel_initializer=init, kernel_constraint=const)(in_latent)
	g = Reshape((in_dim, in_dim, 128))(g)
	# conv 4x4, input block
	g = Conv2D(128, (3,3), padding='same', kernel_initializer=init, kernel_constraint=const)(g)
	g = PixelNormalization()(g)
	g = LeakyReLU(alpha=0.2)(g)
	# conv 3x3
	g = Conv2D(128, (3,3), padding='same', kernel_initializer=init, kernel_constraint=const)(g)
	g = PixelNormalization()(g)
	g = LeakyReLU(alpha=0.2)(g)
	# conv 1x1, output block
	out_image = Conv2D(3, (1,1), padding='same', kernel_initializer=init, kernel_constraint=const)(g)
	# define model
	model = Model(in_latent, out_image)
	# store model
	model_list.append([model, model])
	# create submodels
	for i in range(1, n_blocks):
		# get prior model without the fade-in
		old_model = model_list[i - 1][0]
		# create new model for next resolution
		models = add_generator_block(old_model)
		# store model
		model_list.append(models)
	return model_list

Calling the define_generator() function requires that the size of the latent space be defined.

Like the discriminator, we will set the n_blocks argument to 6 to create six pairs of models.

The function returns a list of models where each item in the list contains the normal or straight-through version of each generator and the growth version for phasing in the new block at the larger output image size.
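As a quick check on the structure of the returned list, the sketch below computes the output resolution at each index. It assumes, as in define_generator(), a 4×4 base model and a doubling of width and height with each added block. The helper name level_resolutions() is hypothetical and not part of the tutorial code.

```python
# hypothetical helper: output image size for each entry in the model list
def level_resolutions(n_blocks, in_dim=4):
    # each added block upsamples by 2x, so the size doubles per level
    return [in_dim * (2 ** i) for i in range(n_blocks)]

sizes = level_resolutions(6)
for i, size in enumerate(sizes):
    # index 0 of each pair is the straight-through model, index 1 the fade-in model
    print('g_models[%d] -> %dx%d' % (i, size, size))
```

With n_blocks set to 6, the last entry in the list corresponds to the 128×128 models.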

Composite Models for Training the Generators

The generator models are not compiled as they are not trained directly.

Instead, the generator models are trained via the discriminator models using Wasserstein loss.

This involves presenting generated images to the discriminator as real images and calculating the loss that is then used to update the generator models.

A given generator model must be paired with a given discriminator model both in terms of the same image size (e.g. 4×4 or 8×8) and in terms of the same phase of training, such as growth phase (introducing the new block) or fine-tuning phase (normal or straight-through).

We can achieve this by creating a new model for each pair of models that stacks the generator on top of the discriminator so that the synthetic image feeds directly into the discriminator model to be deemed real or fake. This composite model can then be used to train the generator via the discriminator. The weights of the discriminator are marked as not trainable (only in this model) to ensure they are not changed while the generator is updated with deliberately mislabeled images.

As such, we can create pairs of composite models, e.g. six pairs for the six levels of image growth, where each pair consists of a composite model for the normal or straight-through model, and the growth version of the model.

The define_composite() function implements this and is defined below.

# define composite models for training generators via discriminators
def define_composite(discriminators, generators):
    model_list = list()
    # create composite models
    for i in range(len(discriminators)):
        g_models, d_models = generators[i], discriminators[i]
        # straight-through model
        d_models[0].trainable = False
        model1 = Sequential()
        model1.add(g_models[0])
        model1.add(d_models[0])
        model1.compile(loss=wasserstein_loss, optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
        # fade-in model
        d_models[1].trainable = False
        model2 = Sequential()
        model2.add(g_models[1])
        model2.add(d_models[1])
        model2.compile(loss=wasserstein_loss, optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
        # store
        model_list.append([model1, model2])
    return model_list

Now that we have seen how to define the generator and discriminator models, let’s look at how we can fit these models on the celebrity faces dataset.

How to Train Progressive Growing GAN Models

First, we need to define some convenience functions for working with samples of data.

The load_real_samples() function below loads our prepared celebrity faces dataset, then converts the pixels to floating point values and scales them to the range [-1,1], common to most GAN implementations.

# load dataset
def load_real_samples(filename):
    # load dataset
    data = load(filename)
    # extract numpy array
    X = data['arr_0']
    # convert from ints to floats
    X = X.astype('float32')
    # scale from [0,255] to [-1,1]
    X = (X - 127.5) / 127.5
    return X
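The scaling in load_real_samples() and its inverse can be checked by hand. The sketch below uses plain Python floats rather than a NumPy array, and the function names scale_pixel() and unscale_pixel() are illustrative only.

```python
# map a pixel value from [0,255] to [-1,1], as load_real_samples() does
def scale_pixel(value):
    return (value - 127.5) / 127.5

# inverse mapping from [-1,1] back to [0,255], e.g. for viewing images
def unscale_pixel(value):
    return value * 127.5 + 127.5

for pixel in [0, 64, 255]:
    scaled = scale_pixel(pixel)
    print(pixel, '->', round(scaled, 3), '->', unscale_pixel(scaled))
```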

Next, we need to be able to retrieve a random sample of images used to update the discriminator.

The generate_real_samples() function below implements this, returning a random sample of images from the loaded dataset and their corresponding target value of class=1 to indicate that the images are real.

# select real samples
def generate_real_samples(dataset, n_samples):
    # choose random instances
    ix = randint(0, dataset.shape[0], n_samples)
    # select images
    X = dataset[ix]
    # generate class labels
    y = ones((n_samples, 1))
    return X, y

Next, we need a sample of latent points used to create synthetic images with the generator model.

The generate_latent_points() function below implements this, returning a batch of latent points with the required dimensionality.

# generate points in latent space as input for the generator
def generate_latent_points(latent_dim, n_samples):
    # generate points in the latent space
    x_input = randn(latent_dim * n_samples)
    # reshape into a batch of inputs for the network
    x_input = x_input.reshape(n_samples, latent_dim)
    return x_input

The latent points can be used as input to the generator to create a batch of synthetic images.

This is required to update the discriminator model. It is also required to update the generator model via the discriminator model with the composite models defined in the previous section.

The generate_fake_samples() function below takes a generator model and generates and returns a batch of synthetic images and the corresponding target for the discriminator of class=-1 to indicate that the images are fake. The generate_latent_points() function is called to create the required batch worth of random latent points.

# use the generator to generate n fake examples, with class labels
def generate_fake_samples(generator, latent_dim, n_samples):
    # generate points in latent space
    x_input = generate_latent_points(latent_dim, n_samples)
    # predict outputs
    X = generator.predict(x_input)
    # create class labels
    y = -ones((n_samples, 1))
    return X, y
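To see how these target values interact with the Wasserstein loss used in this tutorial, mean(y_true * y_pred), the sketch below evaluates it in plain Python. The critic scores are invented for illustration, and wasserstein_loss_py() is a stand-in for the Keras backend version, not part of the tutorial code.

```python
# plain-Python stand-in for the tutorial's wasserstein_loss()
def wasserstein_loss_py(y_true, y_pred):
    products = [t * p for t, p in zip(y_true, y_pred)]
    return sum(products) / len(products)

# made-up critic scores for a half batch of four images
scores_real = [-0.8, -0.5, -0.9, -0.6]
scores_fake = [0.7, 0.4, 0.9, 0.6]

# discriminator update: real images use target 1, fake images use target -1
d1 = wasserstein_loss_py([1, 1, 1, 1], scores_real)
d2 = wasserstein_loss_py([-1, -1, -1, -1], scores_fake)
# generator update via the composite model: fakes are labeled as real (1)
g = wasserstein_loss_py([1, 1, 1, 1], scores_fake)
print('d1=%.3f, d2=%.3f, g=%.3f' % (d1, d2, g))
```

With this labeling, the discriminator lowers its loss by scoring real images low and fake images high, while the generator lowers its loss by pushing the scores of its images down toward those of real images.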

Training the models occurs in two phases: a fade-in phase that involves the transition from a lower-resolution to a higher-resolution image, and the normal phase that involves the fine-tuning of the models at a given higher resolution image.

During the fade-in, the alpha value of the WeightedSum layers in the discriminator and generator models at a given level must transition linearly from 0.0 to 1.0 based on the training step. The update_fadein() function below implements this; given a list of models (such as the generator, discriminator, and composite model), the function locates the WeightedSum layer in each and sets the value for the alpha attribute based on the current training step number.

Importantly, this alpha attribute is not a constant but is instead defined as a changeable variable in the WeightedSum class, and its value can be changed using the Keras backend set_value() function.

This is a clumsy but effective approach to changing the alpha values. Perhaps a cleaner implementation would involve a Keras Callback and is left as an exercise for the reader.

# update the alpha value on each instance of WeightedSum
def update_fadein(models, step, n_steps):
    # calculate current alpha (linear from 0 to 1)
    alpha = step / float(n_steps - 1)
    # update the alpha for each model
    for model in models:
        for layer in model.layers:
            if isinstance(layer, WeightedSum):
                backend.set_value(layer.alpha, alpha)
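The linear schedule and the blend it controls can be traced without Keras. The sketch below reuses the alpha formula from update_fadein() and the weighted sum from the WeightedSum layer on single made-up values. The function names fadein_alpha() and weighted_sum() are illustrative and not part of the tutorial code.

```python
# alpha used by update_fadein() at a given training step
def fadein_alpha(step, n_steps):
    return step / float(n_steps - 1)

# weighted sum computed by the WeightedSum layer, for a single value
def weighted_sum(alpha, old_value, new_value):
    return ((1.0 - alpha) * old_value) + (alpha * new_value)

n_steps = 5
for step in range(n_steps):
    alpha = fadein_alpha(step, n_steps)
    # the old path outputs 0.2 and the new block outputs 0.8 (made-up values)
    print(step, alpha, weighted_sum(alpha, 0.2, 0.8))
```

At the first step only the old path contributes, and at the last step only the new block does. Note that the formula divides by n_steps - 1, so it assumes a phase of at least two training steps.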

Next, we can define the procedure for training the models for a given training phase.

A training phase takes one generator, discriminator, and composite model and updates them on the dataset for a given number of training epochs. The training phase may be a fade-in transition to a higher resolution, in which case update_fadein() must be called on each iteration, or it may be a normal tuning phase, in which case there are no WeightedSum layers present.

The train_epochs() function below implements the training of the discriminator and generator models for a single training phase.

A single training iteration involves first selecting a half batch of real images from the dataset and generating a half batch of fake images from the current state of the generator model. These samples are then used to update the discriminator model.

Next, the generator model is updated via the discriminator with the composite model, indicating that the generated images are, in fact, real, and updating generator weights in an effort to better fool the discriminator.

A summary of model performance is printed at the end of each training iteration, summarizing the loss of the discriminator on the real (d1) and fake (d2) images and the loss of the generator (g).

# train a generator and discriminator
def train_epochs(g_model, d_model, gan_model, dataset, n_epochs, n_batch, fadein=False):
    # note: latent_dim is read from the global scope, not passed as an argument
    # calculate the number of batches per training epoch
    bat_per_epo = int(dataset.shape[0] / n_batch)
    # calculate the number of training iterations
    n_steps = bat_per_epo * n_epochs
    # calculate the size of half a batch of samples
    half_batch = int(n_batch / 2)
    # manually enumerate training steps
    for i in range(n_steps):
        # update alpha for all WeightedSum layers when fading in new blocks
        if fadein:
            update_fadein([g_model, d_model, gan_model], i, n_steps)
        # prepare real and fake samples
        X_real, y_real = generate_real_samples(dataset, half_batch)
        X_fake, y_fake = generate_fake_samples(g_model, latent_dim, half_batch)
        # update discriminator model
        d_loss1 = d_model.train_on_batch(X_real, y_real)
        d_loss2 = d_model.train_on_batch(X_fake, y_fake)
        # update the generator via the discriminator's error
        z_input = generate_latent_points(latent_dim, n_batch)
        y_real2 = ones((n_batch, 1))
        g_loss = gan_model.train_on_batch(z_input, y_real2)
        # summarize loss on this batch
        print('>%d, d1=%.3f, d2=%.3f g=%.3f' % (i+1, d_loss1, d_loss2, g_loss))
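The step arithmetic at the top of train_epochs() is easy to verify in isolation. The sketch below repeats the same three calculations for one phase, assuming a dataset of 50,000 images, 5 epochs, and a batch size of 16. The helper name phase_steps() is hypothetical.

```python
# reproduce the step arithmetic from train_epochs() for one phase
def phase_steps(n_images, n_epochs, n_batch):
    bat_per_epo = int(n_images / n_batch)
    n_steps = bat_per_epo * n_epochs
    half_batch = int(n_batch / 2)
    return bat_per_epo, n_steps, half_batch

# assumed values: 50,000 images, 5 epochs, batch size 16
bat_per_epo, n_steps, half_batch = phase_steps(50000, 5, 16)
print(bat_per_epo, n_steps, half_batch)
```

Under these assumptions each training step updates the discriminator on 8 real and 8 fake images, and the generator on a full batch of 16 latent points.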

Next, we need to call the train_epochs() function for each training phase.

This involves first scaling the training dataset to the required pixel dimensions, such as 4×4 or 8×8. The scale_dataset() function below implements this, taking the dataset and returning a scaled version.

These scaled versions of the dataset could be pre-computed and loaded instead of re-scaled on each run. This might be a nice extension if you intend to run the example many times.

# scale images to preferred size
def scale_dataset(images, new_shape):
    images_list = list()
    for image in images:
        # resize with nearest neighbor interpolation
        new_image = resize(image, new_shape, 0)
        # store
        images_list.append(new_image)
    return asarray(images_list)

After each training run, we also need to save a plot of generated images and the current state of the generator model.

This is useful so that at the end of the run we can see the progression of the capability and quality of the model, and load and use a generator model at any point during the training process. A generator model could be used to create ad hoc images, or used as the starting point for continued training.

The summarize_performance() function below implements this, given a status string such as “faded” or “tuned”, a generator model, and the size of the latent space. The function creates a unique name for the state of the system from the output image size and the status string, such as “008x008-faded”, then creates a plot of 25 generated images and saves the plot and the generator model to file using that name.

# generate samples and save as a plot and save the model
def summarize_performance(status, g_model, latent_dim, n_samples=25):
    # devise name
    gen_shape = g_model.output_shape
    name = '%03dx%03d-%s' % (gen_shape[1], gen_shape[2], status)
    # generate images
    X, _ = generate_fake_samples(g_model, latent_dim, n_samples)
    # normalize pixel values to the range [0,1]
    X = (X - X.min()) / (X.max() - X.min())
    # plot generated images
    square = int(sqrt(n_samples))
    for i in range(n_samples):
        pyplot.subplot(square, square, 1 + i)
        pyplot.axis('off')
        pyplot.imshow(X[i])
    # save plot to file
    filename1 = 'plot_%s.png' % (name)
    pyplot.savefig(filename1)
    pyplot.close()
    # save the generator model
    filename2 = 'model_%s.h5' % (name)
    g_model.save(filename2)
    print('>Saved: %s and %s' % (filename1, filename2))
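The naming scheme can be tried on its own. The sketch below applies the same format strings as summarize_performance() to an output shape tuple laid out the way Keras reports it (batch, height, width, channels). The helper name output_names() is hypothetical.

```python
# build the file names used by summarize_performance() for a given output shape
def output_names(output_shape, status):
    name = '%03dx%03d-%s' % (output_shape[1], output_shape[2], status)
    return 'plot_%s.png' % name, 'model_%s.h5' % name

# Keras reports output_shape as (batch, height, width, channels)
print(output_names((None, 8, 8, 3), 'faded'))
print(output_names((None, 128, 128, 3), 'tuned'))
```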

The train() function below pulls this together, taking the lists of defined models as input as well as the list of batch sizes and the number of training epochs for the normal and fade-in phases at each level of growth for the model.

The first generator and discriminator model for 4×4 images are fit by calling train_epochs() and saved by calling summarize_performance().

Then the steps of growth are enumerated, involving first scaling the image dataset to the preferred size, training and saving the fade-in model for the new image size, then training and saving the normal or fine-tuned model for the new image size.

# train the generator and discriminator
def train(g_models, d_models, gan_models, dataset, latent_dim, e_norm, e_fadein, n_batch):
    # fit the baseline model
    g_normal, d_normal, gan_normal = g_models[0][0], d_models[0][0], gan_models[0][0]
    # scale dataset to appropriate size
    gen_shape = g_normal.output_shape
    scaled_data = scale_dataset(dataset, gen_shape[1:])
    print('Scaled Data', scaled_data.shape)
    # train normal or straight-through models
    train_epochs(g_normal, d_normal, gan_normal, scaled_data, e_norm[0], n_batch[0])
    summarize_performance('tuned', g_normal, latent_dim)
    # process each level of growth
    for i in range(1, len(g_models)):
        # retrieve models for this level of growth
        [g_normal, g_fadein] = g_models[i]
        [d_normal, d_fadein] = d_models[i]
        [gan_normal, gan_fadein] = gan_models[i]
        # scale dataset to appropriate size
        gen_shape = g_normal.output_shape
        scaled_data = scale_dataset(dataset, gen_shape[1:])
        print('Scaled Data', scaled_data.shape)
        # train fade-in models for next level of growth
        train_epochs(g_fadein, d_fadein, gan_fadein, scaled_data, e_fadein[i], n_batch[i], True)
        summarize_performance('faded', g_normal, latent_dim)
        # train normal or straight-through models
        train_epochs(g_normal, d_normal, gan_normal, scaled_data, e_norm[i], n_batch[i])
        summarize_performance('tuned', g_normal, latent_dim)
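The order in which train() visits the phases can be listed without building any models. The sketch below mirrors its control flow, assuming a 4×4 base resolution that doubles at each level. The helper name training_schedule() is hypothetical.

```python
# list the (resolution, status) phases in the order train() runs them
def training_schedule(n_blocks, in_dim=4):
    # the base model is only tuned, never faded in
    schedule = [(in_dim, 'tuned')]
    for i in range(1, n_blocks):
        size = in_dim * (2 ** i)
        schedule.append((size, 'faded'))
        schedule.append((size, 'tuned'))
    return schedule

for size, status in training_schedule(6):
    print('%03dx%03d-%s' % (size, size, status))
```

For six levels of growth this gives eleven phases in total: one tuning phase at 4×4, then a fade-in and a tuning phase at each of the five larger sizes.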

We can then define the configuration, models, and call train() to start the training process.

The paper recommends using a batch size of 16 for images sized between 4×4 and 128×128 before reducing the size. It also recommends training each phase for about 800K images. The paper also recommends a latent space of 512 dimensions.

The models are defined with six levels of growth to meet the 128×128 pixel size of our dataset. We also shrink the latent space accordingly to 100 dimensions.

Instead of keeping the batch size and number of epochs constant, we vary them to speed up the training process, using larger batch sizes for early training phases and smaller batch sizes for later training phases for fine-tuning and stability. Additionally, fewer training epochs are used for the smaller models and more epochs for the larger models.

The choice of batch sizes and training epochs is somewhat arbitrary and you may want to experiment with different values and review their effects.

# number of growth phases, e.g. 6 == [4, 8, 16, 32, 64, 128]
n_blocks = 6
# size of the latent space
latent_dim = 100
# define discriminator models
d_models = define_discriminator(n_blocks)
# define generator models
g_models = define_generator(latent_dim, n_blocks)
# define composite models
gan_models = define_composite(d_models, g_models)
# load image data
dataset = load_real_samples('img_align_celeba_128.npz')
print('Loaded', dataset.shape)
# train model
n_batch = [16, 16, 16, 8, 4, 4]
# 10 epochs == 500K images per training phase
n_epochs = [5, 8, 8, 10, 10, 10]
train(g_models, d_models, gan_models, dataset, latent_dim, n_epochs, n_epochs, n_batch)
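To get a feel for what this configuration implies, the sketch below tabulates the number of training steps and the nominal number of images per phase (epochs multiplied by dataset size, following the accounting in the code comment above) at each level of growth. It assumes the dataset holds 50,000 images.

```python
# assumed dataset size of 50,000 prepared faces
n_images = 50000
n_batch = [16, 16, 16, 8, 4, 4]
n_epochs = [5, 8, 8, 10, 10, 10]

# training steps and nominal images per phase at each level of growth
summary = []
for level, (batch, epochs) in enumerate(zip(n_batch, n_epochs)):
    size = 4 * (2 ** level)
    steps = int(n_images / batch) * epochs
    nominal_images = n_images * epochs
    summary.append((size, steps, nominal_images))
    print('%dx%d: steps=%d, images=%dK' % (size, size, steps, nominal_images // 1000))
```

Under this assumption every phase stays below the roughly 800K images per phase recommended in the paper, and the smaller batch sizes at 64×64 and 128×128 mean many more weight updates per epoch.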

We can tie all of this together.

The complete example of training a progressive growing generative adversarial network on the celebrity faces dataset is listed below.

# example of progressive growing gan on celebrity faces dataset
from math import sqrt
from numpy import load
from numpy import asarray
from numpy import zeros
from numpy import ones
from numpy.random import randn
from numpy.random import randint
from skimage.transform import resize
from keras.optimizers import Adam
from keras.models import Sequential
from keras.models import Model
from keras.layers import Input
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import Reshape
from keras.layers import Conv2D
from keras.layers import UpSampling2D
from keras.layers import AveragePooling2D
from keras.layers import LeakyReLU
from keras.layers import Layer
from keras.layers import Add
from keras.constraints import max_norm
from keras.initializers import RandomNormal
from keras import backend
from matplotlib import pyplot

# pixel-wise feature vector normalization layer
class PixelNormalization(Layer):
    # initialize the layer
    def __init__(self, **kwargs):
        super(PixelNormalization, self).__init__(**kwargs)

    # perform the operation
    def call(self, inputs):
        # calculate square pixel values
        values = inputs**2.0
        # calculate the mean pixel values
        mean_values = backend.mean(values, axis=-1, keepdims=True)
        # ensure the mean is not zero
        mean_values += 1.0e-8
        # calculate the sqrt of the mean squared value (L2 norm)
        l2 = backend.sqrt(mean_values)
        # normalize values by the l2 norm
        normalized = inputs / l2
        return normalized

    # define the output shape of the layer
    def compute_output_shape(self, input_shape):
        return input_shape

# mini-batch standard deviation layer
class MinibatchStdev(Layer):
    # initialize the layer
    def __init__(self, **kwargs):
        super(MinibatchStdev, self).__init__(**kwargs)

    # perform the operation
    def call(self, inputs):
        # calculate the mean value for each pixel across the batch
        mean = backend.mean(inputs, axis=0, keepdims=True)
        # calculate the squared differences between pixel values and mean
        squ_diffs = backend.square(inputs - mean)
        # calculate the average of the squared differences (variance)
        mean_sq_diff = backend.mean(squ_diffs, axis=0, keepdims=True)
        # add a small value to avoid a blow-up when we calculate stdev
        mean_sq_diff += 1e-8
        # square root of the variance (stdev)
        stdev = backend.sqrt(mean_sq_diff)
        # calculate the mean standard deviation across each pixel coord
        mean_pix = backend.mean(stdev, keepdims=True)
        # scale this up to be the size of one input feature map for each sample
        shape = backend.shape(inputs)
        output = backend.tile(mean_pix, (shape[0], shape[1], shape[2], 1))
        # concatenate with the output
        combined = backend.concatenate([inputs, output], axis=-1)
        return combined

    # define the output shape of the layer
    def compute_output_shape(self, input_shape):
        # create a copy of the input shape as a list
        input_shape = list(input_shape)
        # add one to the channel dimension (assume channels-last)
        input_shape[-1] += 1
        # convert list to a tuple
        return tuple(input_shape)

# weighted sum output
class WeightedSum(Add):
    # init with default value
    def __init__(self, alpha=0.0, **kwargs):
        super(WeightedSum, self).__init__(**kwargs)
        self.alpha = backend.variable(alpha, name='ws_alpha')

    # output a weighted sum of inputs
    def _merge_function(self, inputs):
        # only supports a weighted sum of two inputs
        assert (len(inputs) == 2)
        # ((1-a) * input1) + (a * input2)
        output = ((1.0 - self.alpha) * inputs[0]) + (self.alpha * inputs[1])
        return output

# calculate wasserstein loss
def wasserstein_loss(y_true, y_pred):
    return backend.mean(y_true * y_pred)

# add a discriminator block
def add_discriminator_block(old_model, n_input_layers=3):
    # weight initialization
    init = RandomNormal(stddev=0.02)
    # weight constraint
    const = max_norm(1.0)
    # get shape of existing model
    in_shape = list(old_model.input.shape)
    # define new input shape as double the size
    input_shape = (in_shape[-2].value*2, in_shape[-2].value*2, in_shape[-1].value)
    in_image = Input(shape=input_shape)
    # define new input processing layer
    d = Conv2D(128, (1,1), padding='same', kernel_initializer=init, kernel_constraint=const)(in_image)
    d = LeakyReLU(alpha=0.2)(d)
    # define new block
    d = Conv2D(128, (3,3), padding='same', kernel_initializer=init, kernel_constraint=const)(d)
    d = LeakyReLU(alpha=0.2)(d)
    d = Conv2D(128, (3,3), padding='same', kernel_initializer=init, kernel_constraint=const)(d)
    d = LeakyReLU(alpha=0.2)(d)
    d = AveragePooling2D()(d)
    block_new = d
    # skip the input, 1x1 and activation for the old model
    for i in range(n_input_layers, len(old_model.layers)):
        d = old_model.layers[i](d)
    # define straight-through model
    model1 = Model(in_image, d)
    # compile model
    model1.compile(loss=wasserstein_loss, optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
    # downsample the new larger image
    downsample = AveragePooling2D()(in_image)
    # connect old input processing to downsampled new input
    block_old = old_model.layers[1](downsample)
    block_old = old_model.layers[2](block_old)
    # fade in output of old model input layer with new input
    d = WeightedSum()([block_old, block_new])
    # skip the input, 1x1 and activation for the old model
    for i in range(n_input_layers, len(old_model.layers)):
        d = old_model.layers[i](d)
    # define fade-in model
    model2 = Model(in_image, d)
    # compile model
    model2.compile(loss=wasserstein_loss, optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
    return [model1, model2]

# define the discriminator models for each image resolution
def define_discriminator(n_blocks, input_shape=(4,4,3)):
    # weight initialization
    init = RandomNormal(stddev=0.02)
    # weight constraint
    const = max_norm(1.0)
    model_list = list()
    # base model input
    in_image = Input(shape=input_shape)
    # conv 1x1
    d = Conv2D(128, (1,1), padding='same', kernel_initializer=init, kernel_constraint=const)(in_image)
    d = LeakyReLU(alpha=0.2)(d)
    # conv 3x3 (output block)
    d = MinibatchStdev()(d)
    d = Conv2D(128, (3,3), padding='same', kernel_initializer=init, kernel_constraint=const)(d)
    d = LeakyReLU(alpha=0.2)(d)
    # conv 4x4
    d = Conv2D(128, (4,4), padding='same', kernel_initializer=init, kernel_constraint=const)(d)
    d = LeakyReLU(alpha=0.2)(d)
    # dense output layer
    d = Flatten()(d)
    out_class = Dense(1)(d)
    # define model
    model = Model(in_image, out_class)
    # compile model
    model.compile(loss=wasserstein_loss, optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
    # store model
    model_list.append([model, model])
    # create submodels
    for i in range(1, n_blocks):
        # get prior model without the fade-in
        old_model = model_list[i - 1][0]
        # create new model for next resolution
        models = add_discriminator_block(old_model)
        # store model
        model_list.append(models)
    return model_list

# add a generator block
def add_generator_block(old_model):
    # weight initialization
    init = RandomNormal(stddev=0.02)
    # weight constraint
    const = max_norm(1.0)
    # get the end of the last block
    block_end = old_model.layers[-2].output
    # upsample, and define new block
    upsampling = UpSampling2D()(block_end)
    g = Conv2D(128, (3,3), padding='same', kernel_initializer=init, kernel_constraint=const)(upsampling)
    g = PixelNormalization()(g)
    g = LeakyReLU(alpha=0.2)(g)
    g = Conv2D(128, (3,3), padding='same', kernel_initializer=init, kernel_constraint=const)(g)
    g = PixelNormalization()(g)
    g = LeakyReLU(alpha=0.2)(g)
    # add new output layer
    out_image = Conv2D(3, (1,1), padding='same', kernel_initializer=init, kernel_constraint=const)(g)
    # define model
    model1 = Model(old_model.input, out_image)
    # get the output layer from old model
    out_old = old_model.layers[-1]
    # connect the upsampling to the old output layer
    out_image2 = out_old(upsampling)
    # define new output image as the weighted sum of the old and new models
    merged = WeightedSum()([out_image2, out_image])
    # define model
    model2 = Model(old_model.input, merged)
    return [model1, model2]

# define generator models
def define_generator(latent_dim, n_blocks, in_dim=4):
    # weight initialization
    init = RandomNormal(stddev=0.02)
    # weight constraint
    const = max_norm(1.0)
    model_list = list()
    # base model latent input
    in_latent = Input(shape=(latent_dim,))
    # linear scale up to activation maps
    g = Dense(128 * in_dim * in_dim, kernel_initializer=init, kernel_constraint=const)(in_latent)
    g = Reshape((in_dim, in_dim, 128))(g)
    # conv 4x4, input block
    g = Conv2D(128, (3,3), padding='same', kernel_initializer=init, kernel_constraint=const)(g)
    g = PixelNormalization()(g)
    g = LeakyReLU(alpha=0.2)(g)
    # conv 3x3
    g = Conv2D(128, (3,3), padding='same', kernel_initializer=init, kernel_constraint=const)(g)
    g = PixelNormalization()(g)
    g = LeakyReLU(alpha=0.2)(g)
    # conv 1x1, output block
    out_image = Conv2D(3, (1,1), padding='same', kernel_initializer=init, kernel_constraint=const)(g)
    # define model
    model = Model(in_latent, out_image)
    # store model
    model_list.append([model, model])
    # create submodels
    for i in range(1, n_blocks):
        # get prior model without the fade-in
        old_model = model_list[i - 1][0]
        # create new model for next resolution
        models = add_generator_block(old_model)
        # store model
        model_list.append(models)
    return model_list

# define composite models for training generators via discriminators
def define_composite(discriminators, generators):
    model_list = list()
    # create composite models
    for i in range(len(discriminators)):
        g_models, d_models = generators[i], discriminators[i]
        # straight-through model
        d_models[0].trainable = False
        model1 = Sequential()
        model1.add(g_models[0])
        model1.add(d_models[0])
        model1.compile(loss=wasserstein_loss, optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
        # fade-in model
        d_models[1].trainable = False
        model2 = Sequential()
        model2.add(g_models[1])
        model2.add(d_models[1])
        model2.compile(loss=wasserstein_loss, optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
        # store
        model_list.append([model1, model2])
    return model_list

# load dataset
def load_real_samples(filename):
    # load dataset
    data = load(filename)
    # extract numpy array
    X = data['arr_0']
    # convert from ints to floats
    X = X.astype('float32')
    # scale from [0,255] to [-1,1]
    X = (X - 127.5) / 127.5
    return X

# select real samples
def generate_real_samples(dataset, n_samples):
    # choose random instances
    ix = randint(0, dataset.shape[0], n_samples)
    # select images
    X = dataset[ix]
    # generate class labels
    y = ones((n_samples, 1))
    return X, y

# generate points in latent space as input for the generator
def generate_latent_points(latent_dim, n_samples):
    # generate points in the latent space
    x_input = randn(latent_dim * n_samples)
    # reshape into a batch of inputs for the network
    x_input = x_input.reshape(n_samples, latent_dim)
    return x_input

# use the generator to generate n fake examples, with class labels
def generate_fake_samples(generator, latent_dim, n_samples):
    # generate points in latent space
    x_input = generate_latent_points(latent_dim, n_samples)
    # predict outputs
    X = generator.predict(x_input)
    # create class labels
    y = -ones((n_samples, 1))
    return X, y

# update the alpha value on each instance of WeightedSum
def update_fadein(models, step, n_steps):
    # calculate current alpha (linear from 0 to 1)
    alpha = step / float(n_steps - 1)
    # update the alpha for each model
    for model in models:
        for layer in model.layers:
            if isinstance(layer, WeightedSum):
                backend.set_value(layer.alpha, alpha)

# train a generator and discriminator
def train_epochs(g_model, d_model, gan_model, dataset, n_epochs, n_batch, fadein=False):
    # note: latent_dim is read from the global scope, not passed as an argument
    # calculate the number of batches per training epoch
    bat_per_epo = int(dataset.shape[0] / n_batch)
    # calculate the number of training iterations
    n_steps = bat_per_epo * n_epochs
    # calculate the size of half a batch of samples
    half_batch = int(n_batch / 2)
    # manually enumerate training steps
    for i in range(n_steps):
        # update alpha for all WeightedSum layers when fading in new blocks
        if fadein:
            update_fadein([g_model, d_model, gan_model], i, n_steps)
        # prepare real and fake samples
        X_real, y_real = generate_real_samples(dataset, half_batch)
        X_fake, y_fake = generate_fake_samples(g_model, latent_dim, half_batch)
        # update discriminator model
        d_loss1 = d_model.train_on_batch(X_real, y_real)
        d_loss2 = d_model.train_on_batch(X_fake, y_fake)
        # update the generator via the discriminator's error
        z_input = generate_latent_points(latent_dim, n_batch)
        y_real2 = ones((n_batch, 1))
        g_loss = gan_model.train_on_batch(z_input, y_real2)
        # summarize loss on this batch
        print('>%d, d1=%.3f, d2=%.3f g=%.3f' % (i+1, d_loss1, d_loss2, g_loss))

# scale images to preferred size
def scale_dataset(images, new_shape):
    images_list = list()
    for image in images:
        # resize with nearest neighbor interpolation
        new_image = resize(image, new_shape, 0)
        # store
        images_list.append(new_image)
    return asarray(images_list)

# generate samples and save as a plot and save the model
def summarize_performance(status, g_model, latent_dim, n_samples=25):
    # devise name
    gen_shape = g_model.output_shape
    name = '%03dx%03d-%s' % (gen_shape[1], gen_shape[2], status)
    # generate images
    X, _ = generate_fake_samples(g_model, latent_dim, n_samples)
    # normalize pixel values to the range [0,1]
    X = (X - X.min()) / (X.max() - X.min())
    # plot generated images
    square = int(sqrt(n_samples))
    for i in range(n_samples):
        pyplot.subplot(square, square, 1 + i)
        pyplot.axis('off')
        pyplot.imshow(X[i])
    # save plot to file
    filename1 = 'plot_%s.png' % (name)
    pyplot.savefig(filename1)
    pyplot.close()
    # save the generator model
    filename2 = 'model_%s.h5' % (name)
    g_model.save(filename2)
    print('>Saved: %s and %s' % (filename1, filename2))

# train the generator and discriminator
def train(g_models, d_models, gan_models, dataset, latent_dim, e_norm, e_fadein, n_batch):
    # fit the baseline model
    g_normal, d_normal, gan_normal = g_models[0][0], d_models[0][0], gan_models[0][0]
    # scale dataset to appropriate size
    gen_shape = g_normal.output_shape
    scaled_data = scale_dataset(dataset, gen_shape[1:])
    print('Scaled Data', scaled_data.shape)
    # train normal or straight-through models
    train_epochs(g_normal, d_normal, gan_normal, scaled_data, e_norm[0], n_batch[0])
    summarize_performance('tuned', g_normal, latent_dim)
    # process each level of growth
    for i in range(1, len(g_models)):
        # retrieve models for this level of growth
        [g_normal, g_fadein] = g_models[i]
        [d_normal, d_fadein] = d_models[i]
        [gan_normal, gan_fadein] = gan_models[i]
        # scale dataset to appropriate size
        gen_shape = g_normal.output_shape
        scaled_data = scale_dataset(dataset, gen_shape[1:])
        print('Scaled Data', scaled_data.shape)
        # train fade-in models for next level of growth
        train_epochs(g_fadein, d_fadein, gan_fadein, scaled_data, e_fadein[i], n_batch[i], True)
        summarize_performance('faded', g_normal, latent_dim)
        # train normal or straight-through models
        train_epochs(g_normal, d_normal, gan_normal, scaled_data, e_norm[i], n_batch[i])
        summarize_performance('tuned', g_normal, latent_dim)

# number of growth phases, e.g. 6 == [4, 8, 16, 32, 64, 128]
n_blocks = 6
# size of the latent space
latent_dim = 100
# define discriminator models
d_models = define_discriminator(n_blocks)
# define generator models
g_models = define_generator(latent_dim, n_blocks)
# define composite models
gan_models = define_composite(d_models, g_models)
# load image data
dataset = load_real_samples('img_align_celeba_128.npz')
print('Loaded', dataset.shape)
# train model
n_batch = [16, 16, 16, 8, 4, 4]
# 10 epochs == 500K images per training phase
n_epochs = [5, 8, 8, 10, 10, 10]
train(g_models, d_models, gan_models, dataset, latent_dim, n_epochs, n_epochs, n_batch)

Note: The example can be run on the CPU, although a GPU is recommended.

Running the example may take a number of hours to complete on modern GPU hardware.

Note: Your specific results will vary given the stochastic nature of the learning algorithm. Consider running the example a few times.

If loss values during the training iterations go to zero or very large/small numbers, this may be an example of a failure mode and may require a restart of the training process.
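One way to catch such a failure early is to inspect a window of recent loss values. The sketch below is a rough heuristic only: the function name looks_like_failure(), the window size, and both thresholds are assumptions chosen for illustration, not values from the paper or from Keras, and Wasserstein losses can legitimately hover near zero for short stretches.

```python
# hypothetical check: flag a run whose recent losses have collapsed or exploded
def looks_like_failure(losses, window=50, tiny=1e-4, huge=1e3):
    recent = losses[-window:]
    if len(recent) < window:
        # not enough history to judge yet
        return False
    all_tiny = all(abs(v) < tiny for v in recent)
    any_huge = any(abs(v) > huge for v in recent)
    return all_tiny or any_huge

# made-up loss histories for illustration
healthy = [0.5 + 0.01 * (i % 7) for i in range(100)]
collapsed = [0.5] * 50 + [0.0] * 50
exploded = [0.5] * 99 + [5000.0]
print(looks_like_failure(healthy), looks_like_failure(collapsed), looks_like_failure(exploded))
```

A check like this could be applied to the d1, d2, and g values collected during training to decide whether a restart is worth considering.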

Running the example first reports the successful loading of the prepared dataset and the scaling of the dataset to the first image size, then reports the loss of each model for each step of the training process.

Loaded (50000, 128, 128, 3)
Scaled Data (50000, 4, 4, 3)
>1, d1=0.993, d2=0.001 g=0.951
>2, d1=0.861, d2=0.118 g=0.982
>3, d1=0.829, d2=0.126 g=0.875
>4, d1=0.774, d2=0.202 g=0.912
>5, d1=0.687, d2=0.035 g=0.911
...
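If the output is captured to a file, the loss values can be pulled back out for plotting or inspection. The sketch below parses lines in the format printed by train_epochs() using a regular expression. The names LINE_PATTERN and parse_loss_line() are hypothetical, and the pattern only covers plain decimal values, so lines containing nan or inf would be skipped.

```python
import re

# pattern for lines such as '>1, d1=0.993, d2=0.001 g=0.951'
LINE_PATTERN = re.compile(r'^>(\d+), d1=(-?\d+\.\d+), d2=(-?\d+\.\d+) g=(-?\d+\.\d+)$')

def parse_loss_line(line):
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        # not a loss line, e.g. 'Loaded ...' or 'Scaled Data ...'
        return None
    step, d1, d2, g = match.groups()
    return int(step), float(d1), float(d2), float(g)

log = [
    'Loaded (50000, 128, 128, 3)',
    '>1, d1=0.993, d2=0.001 g=0.951',
    '>2, d1=0.861, d2=0.118 g=0.982',
]
rows = [row for row in map(parse_loss_line, log) if row is not None]
print(rows)
```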

Plots of generated images and the generator model are saved after each fade-in training phase with filenames like:

  • plot_008x008-faded.png
  • model_008x008-faded.h5

Plots and models are also saved after each tuning phase, with filenames like:

  • plot_008x008-tuned.png
  • model_008x008-tuned.h5
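The naming pattern can be reproduced with a small helper, which is handy when looking up the files for a given resolution. This is only a sketch: the function name make_names() is hypothetical, and the zero-padded format is inferred from the filenames listed above rather than taken from the training code.

```python
# build the plot and model filenames for a given resolution and phase
# (hypothetical helper; the zero-padded pattern is inferred from the filenames above)
def make_names(width, height, status):
	# e.g. 8, 8, 'faded' -> '008x008-faded'
	name = '%03dx%03d-%s' % (width, height, status)
	return 'plot_%s.png' % name, 'model_%s.h5' % name

# filenames for each phase of the 8x8 level
for status in ['faded', 'tuned']:
	print(make_names(8, 8, status))
```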

Reviewing plots of the generated images at each point helps to see the progression both in the size of supported images and their quality before and after the tuning phase.

For example, below is a sample of images generated after the first 4×4 training phase (plot_004x004-tuned.png). At this point, we cannot see much at all.

Synthetic Celebrity Faces at 4×4 Resolution Generated by the Progressive Growing GAN

Reviewing generated images after the fade-in training phase for 8×8 images shows more structure (plot_008x008-faded.png). The images are blocky but we can see faces.

Synthetic Celebrity Faces at 8×8 Resolution After Fade-In Generated by the Progressive Growing GAN

Next, we can contrast the generated images for 16×16 after the fade-in training phase (plot_016x016-faded.png) and after the tuning training phase (plot_016x016-tuned.png).

We can see that the images are clearly faces, and the fine-tuning phase appears to improve the coloring or tone of the faces and perhaps the structure.

Synthetic Celebrity Faces at 16×16 Resolution After Fade-In Generated by the Progressive Growing GAN

Synthetic Celebrity Faces at 16×16 Resolution After Tuning Generated by the Progressive Growing GAN

Finally, we can review generated faces after tuning for the remaining 32×32, 64×64, and 128×128 resolutions. We can see that with each step in resolution, the image quality improves, allowing the model to fill in more structure and detail.

Although not perfect, the generated images show that the progressive growing GAN is capable not only of generating plausible human faces at different resolutions, but also of building upon what was learned at lower resolutions to generate plausible faces at higher resolutions.

Synthetic Celebrity Faces at 32×32 Resolution After Tuning Generated by the Progressive Growing GAN

Synthetic Celebrity Faces at 64×64 Resolution After Tuning Generated by the Progressive Growing GAN

Synthetic Celebrity Faces at 128×128 Resolution After Tuning Generated by the Progressive Growing GAN

Now that we have seen how the generator models can be fit, next we can see how we might load and use a saved generator model.

How to Synthesize Images With a Progressive Growing GAN Model

In this section, we will explore how to load a generator model and use it to generate synthetic images on demand.

The saved Keras models can be loaded via the load_model() function.

Because the generator models use custom layers, we must specify how to load the custom layers. This is achieved by providing a dict to the load_model() function that maps each of the custom layer names to the appropriate class.

...
# load model
cust = {'PixelNormalization': PixelNormalization, 'MinibatchStdev': MinibatchStdev, 'WeightedSum': WeightedSum}
model = load_model('model_016x016-tuned.h5', cust)

We can then use the generate_latent_points() function from the previous section to generate points in latent space as input for the generator model.

...
# size of the latent space
latent_dim = 100
# number of images to generate
n_images = 25
# generate points in latent space
latent_points = generate_latent_points(latent_dim, n_images)
# generate images
X = model.predict(latent_points)

We can then plot the results by first scaling the pixel values to the range [0,1] and plotting each image, in this case in a square grid pattern.

# create a plot of generated images
def plot_generated(images, n_images):
	# plot images
	square = int(sqrt(n_images))
	# normalize pixel values to the range [0,1]
	images = (images - images.min()) / (images.max() - images.min())
	for i in range(n_images):
		# define subplot
		pyplot.subplot(square, square, 1 + i)
		# turn off axis
		pyplot.axis('off')
		# plot raw pixel data
		pyplot.imshow(images[i])
	pyplot.show()

Tying this together, the complete example of loading a saved progressive growing GAN generator model and using it to generate new faces is listed below.

In this case, we demonstrate loading the tuned model for generating 16×16 faces.

# example of loading the generator model and generating images
from math import sqrt
from numpy import asarray
from numpy.random import randn
from numpy.random import randint
from keras.layers import Layer
from keras.layers import Add
from keras import backend
from keras.models import load_model
from matplotlib import pyplot

# pixel-wise feature vector normalization layer
class PixelNormalization(Layer):
	# initialize the layer
	def __init__(self, **kwargs):
		super(PixelNormalization, self).__init__(**kwargs)

	# perform the operation
	def call(self, inputs):
		# calculate square pixel values
		values = inputs**2.0
		# calculate the mean pixel values
		mean_values = backend.mean(values, axis=-1, keepdims=True)
		# ensure the mean is not zero
		mean_values += 1.0e-8
		# calculate the sqrt of the mean squared value (L2 norm)
		l2 = backend.sqrt(mean_values)
		# normalize values by the l2 norm
		normalized = inputs / l2
		return normalized

	# define the output shape of the layer
	def compute_output_shape(self, input_shape):
		return input_shape

# mini-batch standard deviation layer
class MinibatchStdev(Layer):
	# initialize the layer
	def __init__(self, **kwargs):
		super(MinibatchStdev, self).__init__(**kwargs)

	# perform the operation
	def call(self, inputs):
		# calculate the mean value for each pixel across the batch
		mean = backend.mean(inputs, axis=0, keepdims=True)
		# calculate the squared differences between pixel values and mean
		squ_diffs = backend.square(inputs - mean)
		# calculate the average of the squared differences (variance)
		mean_sq_diff = backend.mean(squ_diffs, axis=0, keepdims=True)
		# add a small value to avoid a blow-up when we calculate stdev
		mean_sq_diff += 1e-8
		# square root of the variance (stdev)
		stdev = backend.sqrt(mean_sq_diff)
		# calculate the mean standard deviation across each pixel coord
		mean_pix = backend.mean(stdev, keepdims=True)
		# scale this up to be the size of one input feature map for each sample
		shape = backend.shape(inputs)
		output = backend.tile(mean_pix, (shape[0], shape[1], shape[2], 1))
		# concatenate with the output
		combined = backend.concatenate([inputs, output], axis=-1)
		return combined

	# define the output shape of the layer
	def compute_output_shape(self, input_shape):
		# create a copy of the input shape as a list
		input_shape = list(input_shape)
		# add one to the channel dimension (assume channels-last)
		input_shape[-1] += 1
		# convert list to a tuple
		return tuple(input_shape)

# weighted sum output
class WeightedSum(Add):
	# init with default value
	def __init__(self, alpha=0.0, **kwargs):
		super(WeightedSum, self).__init__(**kwargs)
		self.alpha = backend.variable(alpha, name='ws_alpha')

	# output a weighted sum of inputs
	def _merge_function(self, inputs):
		# only supports a weighted sum of two inputs
		assert (len(inputs) == 2)
		# ((1-a) * input1) + (a * input2)
		output = ((1.0 - self.alpha) * inputs[0]) + (self.alpha * inputs[1])
		return output

# generate points in latent space as input for the generator
def generate_latent_points(latent_dim, n_samples):
	# generate points in the latent space
	x_input = randn(latent_dim * n_samples)
	# reshape into a batch of inputs for the network
	z_input = x_input.reshape(n_samples, latent_dim)
	return z_input

# create a plot of generated images
def plot_generated(images, n_images):
	# plot images
	square = int(sqrt(n_images))
	# normalize pixel values to the range [0,1]
	images = (images - images.min()) / (images.max() - images.min())
	for i in range(n_images):
		# define subplot
		pyplot.subplot(square, square, 1 + i)
		# turn off axis
		pyplot.axis('off')
		# plot raw pixel data
		pyplot.imshow(images[i])
	pyplot.show()

# load model
cust = {'PixelNormalization': PixelNormalization, 'MinibatchStdev': MinibatchStdev, 'WeightedSum': WeightedSum}
model = load_model('model_016x016-tuned.h5', cust)
# size of the latent space
latent_dim = 100
# number of images to generate
n_images = 25
# generate points in latent space
latent_points = generate_latent_points(latent_dim, n_images)
# generate images
X = model.predict(latent_points)
# plot the result
plot_generated(X, n_images)

Running the example loads the model and generates 25 faces that are plotted in a 5×5 grid.
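The 5×5 layout follows directly from the number of images requested, because plot_generated() takes the integer square root of n_images as the side length of the grid. The sketch below shows that arithmetic on its own; the helper names grid_side() and grid_capacity() are hypothetical and are not part of the example.

```python
from math import sqrt

# side length of the square grid used by plot_generated()
def grid_side(n_images):
	return int(sqrt(n_images))

# number of images that actually fit in that square grid
def grid_capacity(n_images):
	side = grid_side(n_images)
	return side * side

# 25 images fill a 5x5 grid exactly
print(grid_side(25), grid_capacity(25))
# 30 images would still give a 5x5 grid with room for only 25
print(grid_side(30), grid_capacity(30))
```

Choosing a perfect square for n_images, such as 16, 25, or 36, keeps the number of images and the number of grid positions equal.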

Plot of 25 Synthetic Faces with 16×16 Resolution Generated With a Final Progressive Growing GAN Model

We can then change the filename to a different model, such as the tuned model for generating 128×128 faces.

...
model = load_model('model_128x128-tuned.h5', cust)

Re-running the example generates a plot of higher-resolution synthetic faces.

Plot of 25 Synthetic Faces With 128×128 Resolution Generated With a Final Progressive Growing GAN Model

Extensions

This section lists some ideas for extending the tutorial that you may wish to explore.

  • Change Alpha via Callback. Update the example to use a Keras callback to update the alpha value for the WeightedSum layers during fade-in training.
  • Pre-Scale Dataset. Update the example to pre-scale each dataset and save each version to file to be loaded when needed during training.
  • Equalized Learning Rate. Update the example to implement the equalized learning rate weight scaling method described in the paper.
  • Progression in Number of Filters. Update the example to decrease the number of filters with depth in the generator and increase the number of filters with depth in the discriminator to match the configuration in the paper.
  • Larger Image Size. Update the example to generate larger image sizes, such as 512×512.

If you explore any of these extensions, I’d love to know.
Post your findings in the comments below.

Summary

In this tutorial, you discovered how to implement and train a progressive growing generative adversarial network for generating celebrity faces.

Specifically, you learned:

  • How to prepare the celebrity faces dataset for training a progressive growing GAN model.
  • How to define and train the progressive growing GAN on the celebrity faces dataset.
  • How to load saved generator models and use them for generating ad hoc synthetic celebrity faces.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

The post How to Train a Progressive Growing GAN in Keras for Synthesizing Faces appeared first on Machine Learning Mastery.

How to Implement Progressive Growing GAN Models in Keras

The progressive growing generative adversarial network is an approach for training a deep convolutional neural network model for generating synthetic images.

It is an extension of the more traditional GAN architecture that involves incrementally growing the size of the generated image during training, starting with a very small image, such as 4×4 pixels. This allows the stable training and growth of GAN models capable of generating very large high-quality images, such as images of synthetic celebrity faces with the size of 1024×1024 pixels.

In this tutorial, you will discover how to develop progressive growing generative adversarial network models from scratch with Keras.

After completing this tutorial, you will know:

  • How to develop pre-defined discriminator and generator models at each level of output image growth.
  • How to define composite models for training the generator models via the discriminator models.
  • How to cycle the training of fade-in and normal versions of the models at each level of output image growth.

Discover how to develop DCGANs, conditional GANs, Pix2Pix, CycleGANs, and more with Keras in my new GANs book, with 29 step-by-step tutorials and full source code.

Let’s get started.

How to Implement Progressive Growing GAN Models in Keras
Photo by Diogo Santos Silva, some rights reserved.

Tutorial Overview

This tutorial is divided into five parts; they are:

  1. What Is the Progressive Growing GAN Architecture?
  2. How to Implement the Progressive Growing GAN Discriminator Model
  3. How to Implement the Progressive Growing GAN Generator Model
  4. How to Implement Composite Models for Updating the Generator
  5. How to Train Discriminator and Generator Models

What Is the Progressive Growing GAN Architecture?

GANs are effective at generating crisp synthetic images, although they are typically limited in the size of the images that can be generated.

The Progressive Growing GAN is an extension to the GAN that allows the training of generator models capable of outputting large high-quality images, such as photorealistic faces with the size 1024×1024 pixels. It was described in the 2017 paper by Tero Karras, et al. from Nvidia titled “Progressive Growing of GANs for Improved Quality, Stability, and Variation.”

The key innovation of the Progressive Growing GAN is the incremental increase in the size of images output by the generator, starting with a 4×4 pixel image and doubling to 8×8, 16×16, and so on until the desired output resolution is reached.

Our primary contribution is a training methodology for GANs where we start with low-resolution images, and then progressively increase the resolution by adding layers to the networks.

Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017.

This is achieved by a training procedure that involves periods of fine-tuning the model with a given output resolution, and periods of slowly phasing in a new model with a larger resolution.

When doubling the resolution of the generator (G) and discriminator (D) we fade in the new layers smoothly

Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017.

All layers remain trainable during the training process, including existing layers when new layers are added.

All existing layers in both networks remain trainable throughout the training process.

Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017.

Progressive Growing GAN involves using a generator and discriminator model with the same general structure and starting with very small images. During training, new blocks of convolutional layers are systematically added to both the generator model and the discriminator models.

Example of Progressively Adding Layers to Generator and Discriminator Models.
Taken from: Progressive Growing of GANs for Improved Quality, Stability, and Variation.

The incremental addition of the layers allows the models to effectively learn coarse-level detail and later learn ever finer detail, both on the generator and discriminator side.

This incremental nature allows the training to first discover the large-scale structure of the image distribution and then shift attention to increasingly finer-scale detail, instead of having to learn all scales simultaneously.

Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017.

The model architecture is complex and cannot be implemented directly.

In this tutorial, we will focus on how the progressive growing GAN can be implemented using the Keras deep learning library.

We will step through how each of the discriminator and generator models can be defined, how the generator can be trained via the discriminator model, and how each model can be updated during the training process.

These implementation details will provide the basis for developing a progressive growing GAN for your own applications.

How to Implement the Progressive Growing GAN Discriminator Model

The discriminator model is given images as input and must classify them as either real (from the dataset) or fake (generated).

During the training process, the discriminator must grow to support images with ever-increasing size, starting with 4×4 pixel color images and doubling to 8×8, 16×16, 32×32, and so on.
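The doubling sequence is easy to enumerate for a given number of growth levels. The following is a minimal sketch, assuming a 4×4 starting size; the function name growth_sizes() is hypothetical and used only for illustration.

```python
# list the image widths supported at each growth level
# (assumes a 4x4 starting size that doubles with every added block)
def growth_sizes(n_blocks, start=4):
	return [start * (2 ** i) for i in range(n_blocks)]

# three levels of growth
print(growth_sizes(3))
# six levels of growth
print(growth_sizes(6))
```

Three levels give widths of 4, 8, and 16 pixels, and six levels end at 128 pixels.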

This is achieved by inserting a new input layer to support the larger input image followed by a new block of layers. The output of this new block is then downsampled. Additionally, the new image is also downsampled directly and passed through the old input processing layer before it is combined with the output of the new block.

During the transition from a lower resolution to a higher resolution, e.g. 16×16 to 32×32, the discriminator model will have two input pathways as follows:

  • [32×32 Image] -> [fromRGB Conv] -> [NewBlock] -> [Downsample] ->
  • [32×32 Image] -> [Downsample] -> [fromRGB Conv] ->

The output of the new block that is downsampled and the output of the old input processing layer are combined using a weighted average, where the weighting is controlled by a new hyperparameter called alpha. The weighted sum is calculated as follows:

  • Output = ((1 - alpha) * fromRGB) + (alpha * NewBlock)

The weighted average of the two pathways is then fed into the rest of the existing model.
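To make the weighted sum concrete, the sketch below applies the formula to two short lists of made-up activation values using plain Python rather than Keras tensors. The values and the function name weighted_sum() are illustrative assumptions.

```python
# weighted sum of two equally sized lists of activations
# old: output of the old input processing layer (fromRGB)
# new: output of the new block after downsampling
def weighted_sum(old, new, alpha):
	assert len(old) == len(new)
	return [((1.0 - alpha) * o) + (alpha * n) for o, n in zip(old, new)]

# made-up activation values for the two pathways
old = [2.0, 4.0]
new = [10.0, 0.0]
# alpha=0 returns the old pathway, alpha=1 returns the new pathway
for alpha in [0.0, 0.5, 1.0]:
	print(alpha, weighted_sum(old, new, alpha))
```

With alpha=0.5 the result is the simple average of the two pathways, in this case [6.0, 2.0].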

Initially, the weighting is completely biased towards the old input processing layer (alpha=0) and is linearly increased over training iterations so that the new block is given more weight until eventually, the output is entirely the product of the new block (alpha=1). At this time, the old pathway can be removed.
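A linear schedule of this kind can be sketched in a few lines. This assumes alpha is recalculated once per training step and that the phase has at least two steps; the function name fadein_alpha() is hypothetical.

```python
# linear schedule for alpha over a fade-in phase
# (sketch: assumes one update per training step and n_steps >= 2)
def fadein_alpha(step, n_steps):
	# step counts from 0 to n_steps-1, so alpha runs from 0.0 to 1.0
	return step / float(n_steps - 1)

# alpha values for a short five-step fade-in
n_steps = 5
print([fadein_alpha(step, n_steps) for step in range(n_steps)])
```

For five steps, the schedule produces 0.0, 0.25, 0.5, 0.75, and 1.0.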

This can be summarized with the following figure taken from the paper showing a model before growing (a), during the phase-in of the larger resolution (b), and the model after the phase-in (c).

Figure Showing the Growing of the Discriminator Model, Before (a) During (b) and After (c) the Phase-In of a High Resolution.
Taken from: Progressive Growing of GANs for Improved Quality, Stability, and Variation.

The fromRGB layers are implemented as a 1×1 convolutional layer. A block is comprised of two convolutional layers with 3×3 sized filters and the leaky ReLU activation function with a slope of 0.2, followed by a downsampling layer. Average pooling is used for downsampling, which is unlike most other GAN models that downsample using strided convolutional layers.
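The two simplest operations in a block, 2×2 average pooling and the leaky ReLU, can be illustrated without Keras. The sketch below works on a single-channel image stored as a list of rows and assumes even image dimensions; it is meant only to show the arithmetic, not how the Keras layers are implemented.

```python
# 2x2 average pooling on a single-channel image stored as a list of rows
# (assumes the width and height are both even)
def average_pool_2x2(image):
	pooled = []
	for r in range(0, len(image), 2):
		row = []
		for c in range(0, len(image[r]), 2):
			# sum the four pixels in the 2x2 window
			total = image[r][c] + image[r][c+1] + image[r+1][c] + image[r+1][c+1]
			row.append(total / 4.0)
		pooled.append(row)
	return pooled

# leaky ReLU: pass positive values, scale negative values by the slope
def leaky_relu(x, slope=0.2):
	return x if x > 0 else slope * x

# made-up 4x4 image
image = [
	[1.0, 3.0, 0.0, 0.0],
	[5.0, 7.0, 0.0, 4.0],
	[2.0, 2.0, 8.0, 8.0],
	[2.0, 2.0, 8.0, 8.0]]
# the 4x4 image is halved to 2x2
print(average_pool_2x2(image))
print(leaky_relu(3.0), leaky_relu(-5.0))
```

Each output pixel is the mean of a 2×2 window, so the width and height are halved while the number of channels is unchanged.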

The output of the model involves two convolutional layers with 3×3 and 4×4 sized filters and Leaky ReLU activation, followed by a fully connected layer that outputs the single value prediction. The model uses a linear activation function instead of a sigmoid activation function like other discriminator models and is trained directly either by Wasserstein loss (specifically WGAN-GP) or least squares loss; we will use the latter in this tutorial. Model weights are initialized using He Gaussian (he_normal), which is very similar to the method used in the paper.
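Because the output layer is linear, the discriminator produces unbounded scores, and the least squares loss is simply the mean squared error between those scores and a target value. The sketch below shows the calculation in plain Python. The scores are made up, and the target values of 1 for real and -1 for fake images are an assumption for illustration.

```python
# least squares (mean squared error) loss on unbounded discriminator scores
def least_squares_loss(scores, target):
	return sum((s - target) ** 2 for s in scores) / len(scores)

# made-up scores from a linear output layer
real_scores = [0.9, 1.2, 0.7]
fake_scores = [-0.8, -1.1, 0.1]
# assumed targets: 1 for real images and -1 for fake images
print(least_squares_loss(real_scores, 1.0))
print(least_squares_loss(fake_scores, -1.0))
```

Scores close to the target contribute little to the loss, whereas a score far from the target, such as the 0.1 assigned to a fake image here, is penalized quadratically.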

The model uses a custom layer called minibatch standard deviation at the beginning of the output block, and instead of batch normalization, each layer uses local response normalization, referred to as pixel-wise normalization in the paper. We will leave out the minibatch standard deviation layer and use batch normalization in this tutorial for brevity.

One approach to implementing the progressive growing GAN would be to manually expand a model on demand during training. Another approach is to pre-define all of the models prior to training and carefully use the Keras functional API to ensure that layers are shared across the models and continue training.

I believe the latter approach might be easier and is the approach we will use in this tutorial.

First, we must define a custom layer that we can use when fading in a new higher-resolution input image and block. This new layer must take two sets of activation maps with the same dimensions (width, height, channels) and add them together using a weighted sum.

We can implement this as a new layer called WeightedSum that extends the Add merge layer and uses a hyperparameter 'alpha' to control the contribution of each input. This new class is defined below. The layer assumes only two inputs: the first for the output of the old or existing layers and the second for the newly added layers. The new hyperparameter is defined as a backend variable, meaning that we can change it at any time by changing the value of the variable.

# weighted sum output
class WeightedSum(Add):
	# init with default value
	def __init__(self, alpha=0.0, **kwargs):
		super(WeightedSum, self).__init__(**kwargs)
		self.alpha = backend.variable(alpha, name='ws_alpha')

	# output a weighted sum of inputs
	def _merge_function(self, inputs):
		# only supports a weighted sum of two inputs
		assert (len(inputs) == 2)
		# ((1-a) * input1) + (a * input2)
		output = ((1.0 - self.alpha) * inputs[0]) + (self.alpha * inputs[1])
		return output

The discriminator model is by far more complex than the generator to grow because we have to change the model input, so let’s step through this slowly.

Firstly, we can define a discriminator model that takes a 4×4 color image as input and outputs a prediction of whether the image is real or fake. The model is comprised of a 1×1 input processing layer (fromRGB) and an output block.

...
# base model input
in_image = Input(shape=(4,4,3))
# conv 1x1
g = Conv2D(64, (1,1), padding='same', kernel_initializer='he_normal')(in_image)
g = LeakyReLU(alpha=0.2)(g)
# conv 3x3 (output block)
g = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(g)
g = BatchNormalization()(g)
g = LeakyReLU(alpha=0.2)(g)
# conv 4x4
g = Conv2D(128, (4,4), padding='same', kernel_initializer='he_normal')(g)
g = BatchNormalization()(g)
g = LeakyReLU(alpha=0.2)(g)
# dense output layer
g = Flatten()(g)
out_class = Dense(1)(g)
# define model
model = Model(in_image, out_class)
# compile model
model.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))

Next, we need to define a new model that handles the intermediate stage between this model and a new discriminator model that takes 8×8 color images as input.

The existing input processing layer must receive a downsampled version of the new 8×8 image. A new input processing layer must be defined that takes the 8×8 input image and passes it through a new block of two convolutional layers and a downsampling layer. The output of the new block after downsampling and the output of the old input processing layer must be added together using a weighted sum via our new WeightedSum layer, and the result must then pass through the same output block (two convolutional layers and the output layer).

Given the first defined model and our knowledge about this model (e.g. the number of layers in the input processing layer is 2 for the Conv2D and LeakyReLU), we can construct this new intermediate or fade-in model using layer indexes from the old model.
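The layer indexes can be worked out by writing down the layers of the 4×4 base model in order. The sketch below uses plain strings as stand-ins for the Keras layers, so the names are labels for illustration only.

```python
# layer order of the 4x4 base discriminator, written out as names only
layers = ['Input', 'Conv2D 1x1', 'LeakyReLU',
	'Conv2D 3x3', 'BatchNormalization', 'LeakyReLU',
	'Conv2D 4x4', 'BatchNormalization', 'LeakyReLU',
	'Flatten', 'Dense']
# the input layer plus the two input processing layers
n_input_layers = 3
# old input processing layers, reconnected to the downsampled image
input_processing = layers[1:n_input_layers]
# output block, reused unchanged by the larger model
output_block = layers[n_input_layers:]
print(input_processing)
print(len(output_block))
```

Indexes 1 and 2 are the old input processing layers, and everything from index 3 onward is the output block that the larger model reuses.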

...
old_model = model
# get shape of existing model
in_shape = list(old_model.input.shape)
# define new input shape as double the size
input_shape = (in_shape[-2].value*2, in_shape[-2].value*2, in_shape[-1].value)
in_image = Input(shape=input_shape)
# define new input processing layer
g = Conv2D(64, (1,1), padding='same', kernel_initializer='he_normal')(in_image)
g = LeakyReLU(alpha=0.2)(g)
# define new block
g = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(g)
g = BatchNormalization()(g)
g = LeakyReLU(alpha=0.2)(g)
g = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(g)
g = BatchNormalization()(g)
g = LeakyReLU(alpha=0.2)(g)
g = AveragePooling2D()(g)
# downsample the new larger image
downsample = AveragePooling2D()(in_image)
# connect old input processing to downsampled new input
block_old = old_model.layers[1](downsample)
block_old = old_model.layers[2](block_old)
# fade in output of old model input layer with new input
g = WeightedSum()([block_old, g])
# skip the input, 1x1 and activation for the old model
for i in range(3, len(old_model.layers)):
	g = old_model.layers[i](g)
# define fade-in model
model = Model(in_image, g)
# compile model
model.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))

So far, so good.

We also need a version of the same model with the same layers without the fade-in of the input from the old model’s input processing layers.

This straight-through version is required for training before we fade-in the next doubling of the input image size.

We can update the above example to create two versions of the model. First, the straight-through version as it is simpler, then the version used for the fade-in that reuses the layers from the new block and the output layers of the old model.

The add_discriminator_block() function below implements this. It takes the old model as an argument, defines the number of input layers as a default argument (3), and returns a list of the two defined models (straight-through and fade-in).

To ensure that the WeightedSum layer works correctly, we have fixed all convolutional layers to always have 64 filters, and in turn, output 64 feature maps. If there is a mismatch between the old model’s input processing layer and the new block’s output in terms of the number of feature maps (channels), then the weighted sum will fail.
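The shape requirement can be checked by hand before building the models. The sketch below compares (width, height, channels) tuples for the two pathways when growing from 4×4 to 8×8; both helper functions are hypothetical and written only to illustrate the constraint.

```python
# check that two activation shapes (width, height, channels) can be merged
def can_merge(shape_old, shape_new):
	return tuple(shape_old) == tuple(shape_new)

# shape of the new block output: 2x2 average pooling halves the input size
def new_block_shape(in_size, n_filters):
	return (in_size // 2, in_size // 2, n_filters)

# old input processing layer applied to the downsampled 8x8 image
old_shape = (4, 4, 64)
# matching number of filters
print(can_merge(old_shape, new_block_shape(8, 64)))
# mismatched number of filters
print(can_merge(old_shape, new_block_shape(8, 128)))
```

With 64 filters in the new block the shapes agree, whereas 128 filters would produce a channel mismatch.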

# add a discriminator block
def add_discriminator_block(old_model, n_input_layers=3):
	# get shape of existing model
	in_shape = list(old_model.input.shape)
	# define new input shape as double the size
	input_shape = (in_shape[-2].value*2, in_shape[-2].value*2, in_shape[-1].value)
	in_image = Input(shape=input_shape)
	# define new input processing layer
	d = Conv2D(64, (1,1), padding='same', kernel_initializer='he_normal')(in_image)
	d = LeakyReLU(alpha=0.2)(d)
	# define new block
	d = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)
	d = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)
	d = AveragePooling2D()(d)
	block_new = d
	# skip the input, 1x1 and activation for the old model
	for i in range(n_input_layers, len(old_model.layers)):
		d = old_model.layers[i](d)
	# define straight-through model
	model1 = Model(in_image, d)
	# compile model
	model1.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
	# downsample the new larger image
	downsample = AveragePooling2D()(in_image)
	# connect old input processing to downsampled new input
	block_old = old_model.layers[1](downsample)
	block_old = old_model.layers[2](block_old)
	# fade in output of old model input layer with new input
	d = WeightedSum()([block_old, block_new])
	# skip the input, 1x1 and activation for the old model
	for i in range(n_input_layers, len(old_model.layers)):
		d = old_model.layers[i](d)
	# define fade-in model
	model2 = Model(in_image, d)
	# compile model
	model2.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
	return [model1, model2]

It is not an elegant function as we have some repetition, but it is readable and will get the job done.

We can then call this function again and again as we double the size of input images. Importantly, the function expects the straight-through version of the prior model as input.

The example below defines a new function called define_discriminator() that defines our base model, which expects a 4×4 color image as input, then repeatedly adds blocks to create new versions of the discriminator model, each expecting images with double the width and height (quadruple the area) of the previous version.

# define the discriminator models for each image resolution
def define_discriminator(n_blocks, input_shape=(4,4,3)):
	model_list = list()
	# base model input
	in_image = Input(shape=input_shape)
	# conv 1x1
	d = Conv2D(64, (1,1), padding='same', kernel_initializer='he_normal')(in_image)
	d = LeakyReLU(alpha=0.2)(d)
	# conv 3x3 (output block)
	d = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)
	# conv 4x4
	d = Conv2D(128, (4,4), padding='same', kernel_initializer='he_normal')(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)
	# dense output layer
	d = Flatten()(d)
	out_class = Dense(1)(d)
	# define model
	model = Model(in_image, out_class)
	# compile model
	model.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
	# store model
	model_list.append([model, model])
	# create submodels
	for i in range(1, n_blocks):
		# get prior model without the fade-in
		old_model = model_list[i - 1][0]
		# create new model for next resolution
		models = add_discriminator_block(old_model)
		# store model
		model_list.append(models)
	return model_list

This function will return a list of models, where each item in the list is a two-element list that contains first the straight-through version of the model at that resolution, and second the fade-in version of the model for that resolution.
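The structure of the returned list can be pictured with placeholders. In the sketch below, strings stand in for the Keras models that define_discriminator(3) would return, so that the indexing can be seen without building any networks.

```python
# stand-in for the list returned by define_discriminator(3)
# (strings are placeholders for the Keras models)
model_list = [
	['4x4 straight', '4x4 straight'],
	['8x8 straight', '8x8 fade-in'],
	['16x16 straight', '16x16 fade-in']]
# index of the last growth level
level = len(model_list) - 1
# first item is the straight-through model, second is the fade-in model
straight_through, fadein = model_list[level]
print(straight_through)
print(fadein)
```

Note that the first entry holds the same base model twice, because the 4×4 model has no fade-in version.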

We can tie all of this together and define a new “discriminator model” that will grow from 4×4, through to 8×8, and finally to 16×16. This is achieved by setting the n_blocks argument to 3 when calling the define_discriminator() function, which creates three sets of models.

The complete example is listed below.

# example of defining discriminator models for the progressive growing gan
from keras.optimizers import Adam
from keras.models import Model
from keras.layers import Input
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import Conv2D
from keras.layers import AveragePooling2D
from keras.layers import LeakyReLU
from keras.layers import BatchNormalization
from keras.layers import Add
from keras.utils.vis_utils import plot_model
from keras import backend

# weighted sum output
class WeightedSum(Add):
	# init with default value
	def __init__(self, alpha=0.0, **kwargs):
		super(WeightedSum, self).__init__(**kwargs)
		self.alpha = backend.variable(alpha, name='ws_alpha')

	# output a weighted sum of inputs
	def _merge_function(self, inputs):
		# only supports a weighted sum of two inputs
		assert (len(inputs) == 2)
		# ((1-a) * input1) + (a * input2)
		output = ((1.0 - self.alpha) * inputs[0]) + (self.alpha * inputs[1])
		return output

# add a discriminator block
def add_discriminator_block(old_model, n_input_layers=3):
	# get shape of existing model
	in_shape = list(old_model.input.shape)
	# define new input shape as double the size
	input_shape = (in_shape[-2].value*2, in_shape[-2].value*2, in_shape[-1].value)
	in_image = Input(shape=input_shape)
	# define new input processing layer
	d = Conv2D(64, (1,1), padding='same', kernel_initializer='he_normal')(in_image)
	d = LeakyReLU(alpha=0.2)(d)
	# define new block
	d = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)
	d = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)
	d = AveragePooling2D()(d)
	block_new = d
	# skip the input, 1x1 and activation for the old model
	for i in range(n_input_layers, len(old_model.layers)):
		d = old_model.layers[i](d)
	# define straight-through model
	model1 = Model(in_image, d)
	# compile model
	model1.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
	# downsample the new larger image
	downsample = AveragePooling2D()(in_image)
	# connect old input processing to downsampled new input
	block_old = old_model.layers[1](downsample)
	block_old = old_model.layers[2](block_old)
	# fade in output of old model input layer with new input
	d = WeightedSum()([block_old, block_new])
	# skip the input, 1x1 and activation for the old model
	for i in range(n_input_layers, len(old_model.layers)):
		d = old_model.layers[i](d)
	# define fade-in model
	model2 = Model(in_image, d)
	# compile model
	model2.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
	return [model1, model2]

# define the discriminator models for each image resolution
def define_discriminator(n_blocks, input_shape=(4,4,3)):
	model_list = list()
	# base model input
	in_image = Input(shape=input_shape)
	# conv 1x1
	d = Conv2D(64, (1,1), padding='same', kernel_initializer='he_normal')(in_image)
	d = LeakyReLU(alpha=0.2)(d)
	# conv 3x3 (output block)
	d = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)
	# conv 4x4
	d = Conv2D(128, (4,4), padding='same', kernel_initializer='he_normal')(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)
	# dense output layer
	d = Flatten()(d)
	out_class = Dense(1)(d)
	# define model
	model = Model(in_image, out_class)
	# compile model
	model.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
	# store model
	model_list.append([model, model])
	# create submodels
	for i in range(1, n_blocks):
		# get prior model without the fade-in
		old_model = model_list[i - 1][0]
		# create new model for next resolution
		models = add_discriminator_block(old_model)
		# store model
		model_list.append(models)
	return model_list

# define models
discriminators = define_discriminator(3)
# spot check
m = discriminators[2][1]
m.summary()
plot_model(m, to_file='discriminator_plot.png', show_shapes=True, show_layer_names=True)

Running the example first summarizes the fade-in version of the third model showing the 16×16 color image inputs and the single value output.

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_3 (InputLayer)            (None, 16, 16, 3)    0
__________________________________________________________________________________________________
conv2d_7 (Conv2D)               (None, 16, 16, 64)   256         input_3[0][0]
__________________________________________________________________________________________________
leaky_re_lu_7 (LeakyReLU)       (None, 16, 16, 64)   0           conv2d_7[0][0]
__________________________________________________________________________________________________
conv2d_8 (Conv2D)               (None, 16, 16, 64)   36928       leaky_re_lu_7[0][0]
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, 16, 16, 64)   256         conv2d_8[0][0]
__________________________________________________________________________________________________
leaky_re_lu_8 (LeakyReLU)       (None, 16, 16, 64)   0           batch_normalization_5[0][0]
__________________________________________________________________________________________________
conv2d_9 (Conv2D)               (None, 16, 16, 64)   36928       leaky_re_lu_8[0][0]
__________________________________________________________________________________________________
average_pooling2d_4 (AveragePoo (None, 8, 8, 3)      0           input_3[0][0]
__________________________________________________________________________________________________
batch_normalization_6 (BatchNor (None, 16, 16, 64)   256         conv2d_9[0][0]
__________________________________________________________________________________________________
conv2d_4 (Conv2D)               (None, 8, 8, 64)     256         average_pooling2d_4[0][0]
__________________________________________________________________________________________________
leaky_re_lu_9 (LeakyReLU)       (None, 16, 16, 64)   0           batch_normalization_6[0][0]
__________________________________________________________________________________________________
leaky_re_lu_4 (LeakyReLU)       (None, 8, 8, 64)     0           conv2d_4[1][0]
__________________________________________________________________________________________________
average_pooling2d_3 (AveragePoo (None, 8, 8, 64)     0           leaky_re_lu_9[0][0]
__________________________________________________________________________________________________
weighted_sum_2 (WeightedSum)    (None, 8, 8, 64)     0           leaky_re_lu_4[1][0]
                                                                 average_pooling2d_3[0][0]
__________________________________________________________________________________________________
conv2d_5 (Conv2D)               (None, 8, 8, 64)     36928       weighted_sum_2[0][0]
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, 8, 8, 64)     256         conv2d_5[2][0]
__________________________________________________________________________________________________
leaky_re_lu_5 (LeakyReLU)       (None, 8, 8, 64)     0           batch_normalization_3[2][0]
__________________________________________________________________________________________________
conv2d_6 (Conv2D)               (None, 8, 8, 64)     36928       leaky_re_lu_5[2][0]
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 8, 8, 64)     256         conv2d_6[2][0]
__________________________________________________________________________________________________
leaky_re_lu_6 (LeakyReLU)       (None, 8, 8, 64)     0           batch_normalization_4[2][0]
__________________________________________________________________________________________________
average_pooling2d_1 (AveragePoo (None, 4, 4, 64)     0           leaky_re_lu_6[2][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 4, 4, 128)    73856       average_pooling2d_1[2][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 4, 4, 128)    512         conv2d_2[4][0]
__________________________________________________________________________________________________
leaky_re_lu_2 (LeakyReLU)       (None, 4, 4, 128)    0           batch_normalization_1[4][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D)               (None, 4, 4, 128)    262272      leaky_re_lu_2[4][0]
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 4, 4, 128)    512         conv2d_3[4][0]
__________________________________________________________________________________________________
leaky_re_lu_3 (LeakyReLU)       (None, 4, 4, 128)    0           batch_normalization_2[4][0]
__________________________________________________________________________________________________
flatten_1 (Flatten)             (None, 2048)         0           leaky_re_lu_3[4][0]
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 1)            2049        flatten_1[4][0]
==================================================================================================
Total params: 488,449
Trainable params: 487,425
Non-trainable params: 1,024
__________________________________________________________________________________________________
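The parameter counts in the summary can be checked by hand. The sketch below assumes the usual counting rules: a convolution has one weight per kernel element per input channel per filter plus one bias per filter, and batch normalization keeps four values per channel, two of which are not trainable. The helper names are illustrative and are not part of Keras.

```python
# Illustrative helpers (not part of Keras) for reproducing the parameter
# counts reported for the 16x16 fade-in discriminator.

def conv2d_params(kernel, in_channels, filters):
    # kernel*kernel*in_channels weights per filter, plus one bias per filter
    return kernel * kernel * in_channels * filters + filters

def batchnorm_params(channels):
    # gamma and beta (trainable), moving mean and variance (non-trainable)
    return 4 * channels

def dense_params(n_inputs, n_outputs):
    # one weight per input per output, plus one bias per output
    return n_inputs * n_outputs + n_outputs

counts = [
    conv2d_params(1, 3, 64),       # conv2d_7: new 1x1 input layer (256)
    conv2d_params(3, 64, 64),      # conv2d_8 (36928)
    conv2d_params(3, 64, 64),      # conv2d_9 (36928)
    conv2d_params(1, 3, 64),       # conv2d_4: old 1x1 input layer (256)
    conv2d_params(3, 64, 64),      # conv2d_5 (36928)
    conv2d_params(3, 64, 64),      # conv2d_6 (36928)
    conv2d_params(3, 64, 128),     # conv2d_2 (73856)
    conv2d_params(4, 128, 128),    # conv2d_3 (262272)
    dense_params(4 * 4 * 128, 1),  # dense_1 (2049)
    4 * batchnorm_params(64),      # batch_normalization_3 to _6 (256 each)
    2 * batchnorm_params(128),     # batch_normalization_1 and _2 (512 each)
]
total = sum(counts)
# two non-trainable values per channel in every batch normalization layer
non_trainable = 2 * (4 * 64 + 2 * 128)
print(total, total - non_trainable, non_trainable)
# prints: 488449 487425 1024
```

The WeightedSum layer contributes no parameters here because its alpha value is stored as a backend variable rather than registered as a layer weight.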

A plot of the same fade-in version of the model is created and saved to file.

Note: creating this plot assumes that the pygraphviz and pydot libraries are installed. If this is a problem, comment out the import statement and call to plot_model().

The plot shows the 16×16 input image that is downsampled and passed through the 8×8 input processing layers from the prior model (left). It also shows the addition of the new block (right) and the weighted average that combines both streams of input, before using the existing model layers to continue processing and outputting a prediction.

Plot of the Fade-In Discriminator Model For the Progressive Growing GAN Transitioning From 8×8 to 16×16 Input Images
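To make the left-hand path of the plot concrete, the following NumPy sketch mimics what a 2×2 average pooling with a stride of 2 does to a 16×16 color image. It is only an illustration of the downsampling arithmetic under that assumption, not the Keras layer itself, and the function name is made up for this example.

```python
import numpy as np

def average_pool_2x2(image):
    # image has shape (height, width, channels) with even height and width
    h, w, c = image.shape
    # group pixels into non-overlapping 2x2 blocks, then average each block
    blocks = image.reshape(h // 2, 2, w // 2, 2, c)
    return blocks.mean(axis=(1, 3))

# a dummy 16x16 color image
image = np.arange(16 * 16 * 3, dtype='float32').reshape(16, 16, 3)
small = average_pool_2x2(image)
print(small.shape)  # (8, 8, 3)
```

Each output pixel is the mean of a 2×2 patch of the input, so the old 8×8 input layers can be reused on the larger image during the fade-in.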

Now that we have seen how we can define the discriminator models, let’s look at how we can define the generator models.

How to Implement the Progressive Growing GAN Generator Model

The generator models for the progressive growing GAN are easier to implement in Keras than the discriminator models.

This is because each fade-in requires only a minor change to the output of the model.

Increasing the resolution of the generator involves first upsampling the output of the end of the last block. This is then connected to the new block and a new output layer for an image that is double the height and width dimensions or quadruple the area. During the phase-in, the upsampling is also connected to the output layer from the old model and the output from both output layers is merged using a weighted average.

After the phase-in is complete, the old output layer is removed.

This can be summarized with the following figure, taken from the paper, showing a model before growing (a), during the phase-in of the larger resolution (b), and after the phase-in (c).

Figure Showing the Growing of the Generator Model, Before (a), During (b), and After (c) the Phase-In of a High Resolution.
Taken from: Progressive Growing of GANs for Improved Quality, Stability, and Variation.

The toRGB layer is a convolutional layer with three 1×1 filters, sufficient to output a color image.
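A 1×1 convolution is a per-pixel linear map across channels, which is why three such filters are enough to turn a stack of feature maps into an RGB image. The NumPy sketch below illustrates this on random data. The array names and the 64-channel input are assumptions chosen to match the 16×16 block used later in this section, and this is not the Keras implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical 16x16 feature maps with 64 channels
features = rng.normal(size=(16, 16, 64))
# three 1x1 filters: one weight per input channel per output channel
weights = rng.normal(size=(64, 3))
bias = np.zeros(3)

# apply the same 64 -> 3 linear map independently at every pixel
rgb = features @ weights + bias
n_params = weights.size + bias.size
print(rgb.shape, n_params)  # (16, 16, 3) 195
```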

The model takes a point in the latent space as input, e.g. a 100-element or 512-element vector as described in the paper. This is scaled up to provide the basis for 4×4 activation maps, followed by a convolutional layer with 4×4 filters and another with 3×3 filters. Like the discriminator, LeakyReLU activations are used, as is pixel normalization, which we will substitute with batch normalization for brevity.

A block involves an upsample layer followed by two convolutional layers with 3×3 filters. Upsampling is achieved using a nearest neighbor method (e.g. duplicating input rows and columns) via an UpSampling2D layer instead of the more common transpose convolutional layer.
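As a small illustration of nearest neighbor upsampling, the NumPy sketch below doubles the rows and columns of a tiny array by duplication. It mirrors the idea only and is not the Keras layer; the function name is made up for this example.

```python
import numpy as np

def upsample_nearest_2x(image):
    # duplicate every row, then duplicate every column
    return image.repeat(2, axis=0).repeat(2, axis=1)

x = np.array([[1, 2],
              [3, 4]])
print(upsample_nearest_2x(x))
# [[1 1 2 2]
#  [1 1 2 2]
#  [3 3 4 4]
#  [3 3 4 4]]
```

No weights are learned in this step; the two convolutional layers that follow the upsampling are what refine the enlarged activation maps.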

We can define the baseline model that will take a point in latent space as input and output a 4×4 color image as follows:

...
# base model latent input
in_latent = Input(shape=(100,))
# linear scale up to activation maps
g = Dense(128 * 4 * 4, kernel_initializer='he_normal')(in_latent)
g = Reshape((4, 4, 128))(g)
# conv 3x3, input block (4x4 filters in the paper)
g = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(g)
g = BatchNormalization()(g)
g = LeakyReLU(alpha=0.2)(g)
# conv 3x3
g = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(g)
g = BatchNormalization()(g)
g = LeakyReLU(alpha=0.2)(g)
# conv 1x1, output block
out_image = Conv2D(3, (1,1), padding='same', kernel_initializer='he_normal')(g)
# define model
model = Model(in_latent, out_image)

Next, we need to define a version of the model that uses all of the same input layers but adds a new block (an upsample layer and two convolutional layers) and a new output layer (a 1×1 convolutional layer).

This would be the model after the phase-in to the new output resolution. This can be achieved by using our own knowledge of the baseline model: the end of the last block is the second-to-last layer, e.g. the layer at index -2 in the model’s list of layers.
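The indexing can be illustrated with a plain Python list standing in for the model's list of layers. The layer names below are made up for illustration; only their order mirrors the baseline generator.

```python
# Stand-in for the layer list of the baseline generator; names are hypothetical
layer_names = [
    'input', 'dense', 'reshape',
    'conv_a', 'batchnorm_a', 'leakyrelu_a',
    'conv_b', 'batchnorm_b', 'leakyrelu_b',
    'to_rgb',
]
# the last block ends at the final activation, just before the output layer
block_end = layer_names[-2]
# the old output layer is the last entry
old_output = layer_names[-1]
print(block_end, old_output)  # leakyrelu_b to_rgb
```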

The new model with the addition of a new block and output layer is defined as follows:

...
old_model = model
# get the end of the last block
block_end = old_model.layers[-2].output
# upsample, and define new block
upsampling = UpSampling2D()(block_end)
g = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(upsampling)
g = BatchNormalization()(g)
g = LeakyReLU(alpha=0.2)(g)
g = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(g)
g = BatchNormalization()(g)
g = LeakyReLU(alpha=0.2)(g)
# add new output layer
out_image = Conv2D(3, (1,1), padding='same', kernel_initializer='he_normal')(g)
# define model
model = Model(old_model.input, out_image)

That is pretty straightforward; we have chopped off the old output layer at the end of the last block and grafted on a new block and output layer.

Now we need a version of this new model to use during the fade-in.

This involves connecting the old output layer to the new upsampling layer at the start of the new block and using an instance of our WeightedSum layer defined in the previous section to combine the output of the old and new output layers.

...
# get the output layer from old model
out_old = old_model.layers[-1]
# connect the upsampling to the old output layer
out_image2 = out_old(upsampling)
# define new output image as the weighted sum of the old and new models
merged = WeightedSum()([out_image2, out_image])
# define model
model2 = Model(old_model.input, merged)
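To see what the weighted sum computes during the fade-in, the NumPy sketch below blends two dummy images with the same formula used by the WeightedSum layer. It assumes that alpha is increased linearly from 0 to 1 over a hypothetical number of updates (n_steps); both the schedule length and the dummy images are illustrative.

```python
import numpy as np

def weighted_sum(old, new, alpha):
    # ((1-a) * input1) + (a * input2), as in the WeightedSum layer
    return (1.0 - alpha) * old + alpha * new

# dummy outputs from the old and new output layers
old_image = np.zeros((8, 8, 3))
new_image = np.ones((8, 8, 3))

n_steps = 5  # hypothetical number of fade-in updates
for step in range(n_steps):
    # linear schedule from 0.0 (old output only) to 1.0 (new output only)
    alpha = step / (n_steps - 1)
    blended = weighted_sum(old_image, new_image, alpha)
    print('alpha=%.2f mean pixel=%.2f' % (alpha, blended.mean()))
```

With alpha at 0.0 the merged output is exactly the old model's upsampled output, and with alpha at 1.0 it is exactly the output of the new block.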

We can combine the definition of these two operations into a function named add_generator_block(), defined below, that will expand a given model and return both the new generator model with the added block (model1) and a version of the model with the fading in of the new block with the old output layer (model2).

# add a generator block
def add_generator_block(old_model):
	# get the end of the last block
	block_end = old_model.layers[-2].output
	# upsample, and define new block
	upsampling = UpSampling2D()(block_end)
	g = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(upsampling)
	g = BatchNormalization()(g)
	g = LeakyReLU(alpha=0.2)(g)
	g = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(g)
	g = BatchNormalization()(g)
	g = LeakyReLU(alpha=0.2)(g)
	# add new output layer
	out_image = Conv2D(3, (1,1), padding='same', kernel_initializer='he_normal')(g)
	# define model
	model1 = Model(old_model.input, out_image)
	# get the output layer from old model
	out_old = old_model.layers[-1]
	# connect the upsampling to the old output layer
	out_image2 = out_old(upsampling)
	# define new output image as the weighted sum of the old and new models
	merged = WeightedSum()([out_image2, out_image])
	# define model
	model2 = Model(old_model.input, merged)
	return [model1, model2]

We can then call this function with our baseline model to create models with one added block and continue to call it with subsequent models to keep adding blocks.

The define_generator() function below implements this, taking the size of the latent space and number of blocks to add (models to create).

The baseline model is defined as outputting a color image with the shape 4×4, controlled by the default argument in_dim.

# define generator models
def define_generator(latent_dim, n_blocks, in_dim=4):
	model_list = list()
	# base model latent input
	in_latent = Input(shape=(latent_dim,))
	# linear scale up to activation maps
	g = Dense(128 * in_dim * in_dim, kernel_initializer='he_normal')(in_latent)
	g = Reshape((in_dim, in_dim, 128))(g)
	# conv 3x3, input block (4x4 filters in the paper)
	g = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(g)
	g = BatchNormalization()(g)
	g = LeakyReLU(alpha=0.2)(g)
	# conv 3x3
	g = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(g)
	g = BatchNormalization()(g)
	g = LeakyReLU(alpha=0.2)(g)
	# conv 1x1, output block
	out_image = Conv2D(3, (1,1), padding='same', kernel_initializer='he_normal')(g)
	# define model
	model = Model(in_latent, out_image)
	# store model
	model_list.append([model, model])
	# create submodels
	for i in range(1, n_blocks):
		# get prior model without the fade-in
		old_model = model_list[i - 1][0]
		# create new model for next resolution
		models = add_generator_block(old_model)
		# store model
		model_list.append(models)
	return model_list

We can tie all of this together and define a baseline generator and the addition of two blocks, so three models in total, where a straight-through and fade-in version of each model is defined.

The complete example is listed below.

# example of defining generator models for the progressive growing gan
from keras.models import Model
from keras.layers import Input
from keras.layers import Dense
from keras.layers import Reshape
from keras.layers import Conv2D
from keras.layers import UpSampling2D
from keras.layers import LeakyReLU
from keras.layers import BatchNormalization
from keras.layers import Add
from keras.utils.vis_utils import plot_model
from keras import backend

# weighted sum output
class WeightedSum(Add):
	# init with default value
	def __init__(self, alpha=0.0, **kwargs):
		super(WeightedSum, self).__init__(**kwargs)
		self.alpha = backend.variable(alpha, name='ws_alpha')

	# output a weighted sum of inputs
	def _merge_function(self, inputs):
		# only supports a weighted sum of two inputs
		assert (len(inputs) == 2)
		# ((1-a) * input1) + (a * input2)
		output = ((1.0 - self.alpha) * inputs[0]) + (self.alpha * inputs[1])
		return output

# add a generator block
def add_generator_block(old_model):
	# get the end of the last block
	block_end = old_model.layers[-2].output
	# upsample, and define new block
	upsampling = UpSampling2D()(block_end)
	g = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(upsampling)
	g = BatchNormalization()(g)
	g = LeakyReLU(alpha=0.2)(g)
	g = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(g)
	g = BatchNormalization()(g)
	g = LeakyReLU(alpha=0.2)(g)
	# add new output layer
	out_image = Conv2D(3, (1,1), padding='same', kernel_initializer='he_normal')(g)
	# define model
	model1 = Model(old_model.input, out_image)
	# get the output layer from old model
	out_old = old_model.layers[-1]
	# connect the upsampling to the old output layer
	out_image2 = out_old(upsampling)
	# define new output image as the weighted sum of the old and new models
	merged = WeightedSum()([out_image2, out_image])
	# define model
	model2 = Model(old_model.input, merged)
	return [model1, model2]

# define generator models
def define_generator(latent_dim, n_blocks, in_dim=4):
	model_list = list()
	# base model latent input
	in_latent = Input(shape=(latent_dim,))
	# linear scale up to activation maps
	g = Dense(128 * in_dim * in_dim, kernel_initializer='he_normal')(in_latent)
	g = Reshape((in_dim, in_dim, 128))(g)
	# conv 3x3, input block (4x4 filters in the paper)
	g = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(g)
	g = BatchNormalization()(g)
	g = LeakyReLU(alpha=0.2)(g)
	# conv 3x3
	g = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(g)
	g = BatchNormalization()(g)
	g = LeakyReLU(alpha=0.2)(g)
	# conv 1x1, output block
	out_image = Conv2D(3, (1,1), padding='same', kernel_initializer='he_normal')(g)
	# define model
	model = Model(in_latent, out_image)
	# store model
	model_list.append([model, model])
	# create submodels
	for i in range(1, n_blocks):
		# get prior model without the fade-in
		old_model = model_list[i - 1][0]
		# create new model for next resolution
		models = add_generator_block(old_model)
		# store model
		model_list.append(models)
	return model_list

# define models
generators = define_generator(100, 3)
# spot check
m = generators[2][1]
m.summary()
plot_model(m, to_file='generator_plot.png', show_shapes=True, show_layer_names=True)

The example summarizes the fade-in version of the last model.

Running the example first summarizes a linear list of the layers in the model. We can see that the last model takes a point from the latent space and outputs a 16×16 image.

This matches our expectations, as the baseline model outputs a 4×4 image, adding one block increases this to 8×8, and adding one more block increases this to 16×16.

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            (None, 100)          0
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 2048)         206848      input_1[0][0]
__________________________________________________________________________________________________
reshape_1 (Reshape)             (None, 4, 4, 128)    0           dense_1[0][0]
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 4, 4, 128)    147584      reshape_1[0][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 4, 4, 128)    512         conv2d_1[0][0]
__________________________________________________________________________________________________
leaky_re_lu_1 (LeakyReLU)       (None, 4, 4, 128)    0           batch_normalization_1[0][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 4, 4, 128)    147584      leaky_re_lu_1[0][0]
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 4, 4, 128)    512         conv2d_2[0][0]
__________________________________________________________________________________________________
leaky_re_lu_2 (LeakyReLU)       (None, 4, 4, 128)    0           batch_normalization_2[0][0]
__________________________________________________________________________________________________
up_sampling2d_1 (UpSampling2D)  (None, 8, 8, 128)    0           leaky_re_lu_2[0][0]
__________________________________________________________________________________________________
conv2d_4 (Conv2D)               (None, 8, 8, 64)     73792       up_sampling2d_1[0][0]
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, 8, 8, 64)     256         conv2d_4[0][0]
__________________________________________________________________________________________________
leaky_re_lu_3 (LeakyReLU)       (None, 8, 8, 64)     0           batch_normalization_3[0][0]
__________________________________________________________________________________________________
conv2d_5 (Conv2D)               (None, 8, 8, 64)     36928       leaky_re_lu_3[0][0]
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 8, 8, 64)     256         conv2d_5[0][0]
__________________________________________________________________________________________________
leaky_re_lu_4 (LeakyReLU)       (None, 8, 8, 64)     0           batch_normalization_4[0][0]
__________________________________________________________________________________________________
up_sampling2d_2 (UpSampling2D)  (None, 16, 16, 64)   0           leaky_re_lu_4[0][0]
__________________________________________________________________________________________________
conv2d_7 (Conv2D)               (None, 16, 16, 64)   36928       up_sampling2d_2[0][0]
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, 16, 16, 64)   256         conv2d_7[0][0]
__________________________________________________________________________________________________
leaky_re_lu_5 (LeakyReLU)       (None, 16, 16, 64)   0           batch_normalization_5[0][0]
__________________________________________________________________________________________________
conv2d_8 (Conv2D)               (None, 16, 16, 64)   36928       leaky_re_lu_5[0][0]
__________________________________________________________________________________________________
batch_normalization_6 (BatchNor (None, 16, 16, 64)   256         conv2d_8[0][0]
__________________________________________________________________________________________________
leaky_re_lu_6 (LeakyReLU)       (None, 16, 16, 64)   0           batch_normalization_6[0][0]
__________________________________________________________________________________________________
conv2d_6 (Conv2D)               multiple             195         up_sampling2d_2[0][0]
__________________________________________________________________________________________________
conv2d_9 (Conv2D)               (None, 16, 16, 3)    195         leaky_re_lu_6[0][0]
__________________________________________________________________________________________________
weighted_sum_2 (WeightedSum)    (None, 16, 16, 3)    0           conv2d_6[1][0]
                                                                 conv2d_9[0][0]
==================================================================================================
Total params: 689,030
Trainable params: 688,006
Non-trainable params: 1,024
__________________________________________________________________________________________________
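The output sizes follow directly from doubling the base resolution once per added block. The helper below is a hypothetical convenience function, not part of the tutorial code, that lists the output size of each model for a given number of blocks.

```python
def output_sizes(n_blocks, in_dim=4):
    # the base model outputs in_dim x in_dim; each added block doubles it
    return [in_dim * 2 ** i for i in range(n_blocks)]

print(output_sizes(3))  # [4, 8, 16]
print(output_sizes(6))  # [4, 8, 16, 32, 64, 128]
```

Under the same doubling rule, reaching 128×128 images from a 4×4 base would take six models in total.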

A plot of the same fade-in version of the model is created and saved to file.

Note: creating this plot assumes that the pygraphviz and pydot libraries are installed. If this is a problem, comment out the import statement and call to plot_model().

We can see that the output from the last block passes through an UpSampling2D layer and then feeds both the added block with its new output layer and the old output layer, before the two outputs are merged via a weighted sum into the final output.

Plot of the Fade-In Generator Model For the Progressive Growing GAN Transitioning From 8×8 to 16×16 Output Images

Now that we have seen how to define the generator models, we can review how the generator models may be updated via the discriminator models.

How to Implement Composite Models for Updating the Generator

The discriminator models are trained directly with real and fake images as input and a target value of 0 for fake and 1 for real.

The generator models are not trained directly; instead, they are trained indirectly via the discriminator models, just like a normal GAN model.
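The targets can be made concrete with a small NumPy sketch of the mean squared error loss used when compiling the models. The discriminator scores below are invented numbers, and labeling the fake images as real (1) when updating the generator is the usual GAN convention assumed here.

```python
import numpy as np

def mse(y_true, y_pred):
    # mean squared error, matching loss='mse'
    return float(np.mean((y_true - y_pred) ** 2))

# hypothetical discriminator scores for 4 real and 4 fake images
pred_real = np.array([0.9, 0.8, 1.1, 0.7])
pred_fake = np.array([0.2, 0.1, 0.4, -0.1])

# discriminator update: real images target 1, fake images target 0
d_loss_real = mse(np.ones(4), pred_real)
d_loss_fake = mse(np.zeros(4), pred_fake)
# generator update via the discriminator: fake images are labeled 1
g_loss = mse(np.ones(4), pred_fake)
print(d_loss_real, d_loss_fake, g_loss)
```

The generator loss is large when the discriminator confidently scores the generated images near 0, which is what pushes the generator toward more convincing images.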

We can create a composite model for each level of growth of the model, e.g. pair 4×4 generators and 4×4 discriminators. We can also pair the straight-through models together, and the fade-in models together.

For example, we can retrieve the generator and discriminator models for a given level of growth.

...
g_models, d_models = generators[0], discriminators[0]

Then we can use them to create a composite model for training the straight-through generator, where the output of the generator is fed directly to the discriminator in order to classify.

# straight-through model
d_models[0].trainable = False
model1 = Sequential()
model1.add(g_models[0])
model1.add(d_models[0])
model1.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))

And do the same for the composite model for the fade-in generator.

# fade-in model
d_models[1].trainable = False
model2 = Sequential()
model2.add(g_models[1])
model2.add(d_models[1])
model2.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))

The function below, named define_composite(), automates this; given a list of defined discriminator and generator models, it will create an appropriate composite model for training each generator model.

# define composite models for training generators via discriminators
def define_composite(discriminators, generators):
	model_list = list()
	# create composite models
	for i in range(len(discriminators)):
		g_models, d_models = generators[i], discriminators[i]
		# straight-through model
		d_models[0].trainable = False
		model1 = Sequential()
		model1.add(g_models[0])
		model1.add(d_models[0])
		model1.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
		# fade-in model
		d_models[1].trainable = False
		model2 = Sequential()
		model2.add(g_models[1])
		model2.add(d_models[1])
		model2.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
		# store
		model_list.append([model1, model2])
	return model_list

Tying this together with the definition of the discriminator and generator models above, the complete example of defining all models at each pre-defined level of growth is listed below.

# example of defining composite models for the progressive growing gan
from keras.optimizers import Adam
from keras.models import Sequential
from keras.models import Model
from keras.layers import Input
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import Reshape
from keras.layers import Conv2D
from keras.layers import UpSampling2D
from keras.layers import AveragePooling2D
from keras.layers import LeakyReLU
from keras.layers import BatchNormalization
from keras.layers import Add
from keras.utils.vis_utils import plot_model
from keras import backend

# weighted sum output
class WeightedSum(Add):
	# init with default value
	def __init__(self, alpha=0.0, **kwargs):
		super(WeightedSum, self).__init__(**kwargs)
		self.alpha = backend.variable(alpha, name='ws_alpha')

	# output a weighted sum of inputs
	def _merge_function(self, inputs):
		# only supports a weighted sum of two inputs
		assert (len(inputs) == 2)
		# ((1-a) * input1) + (a * input2)
		output = ((1.0 - self.alpha) * inputs[0]) + (self.alpha * inputs[1])
		return output

# add a discriminator block
def add_discriminator_block(old_model, n_input_layers=3):
	# get shape of existing model
	in_shape = list(old_model.input.shape)
	# define new input shape as double the size
	input_shape = (in_shape[-2].value*2, in_shape[-2].value*2, in_shape[-1].value)
	in_image = Input(shape=input_shape)
	# define new input processing layer
	d = Conv2D(64, (1,1), padding='same', kernel_initializer='he_normal')(in_image)
	d = LeakyReLU(alpha=0.2)(d)
	# define new block
	d = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)
	d = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)
	d = AveragePooling2D()(d)
	block_new = d
	# skip the input, 1x1 and activation for the old model
	for i in range(n_input_layers, len(old_model.layers)):
		d = old_model.layers[i](d)
	# define straight-through model
	model1 = Model(in_image, d)
	# compile model
	model1.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
	# downsample the new larger image
	downsample = AveragePooling2D()(in_image)
	# connect old input processing to downsampled new input
	block_old = old_model.layers[1](downsample)
	block_old = old_model.layers[2](block_old)
	# fade in output of old model input layer with new input
	d = WeightedSum()([block_old, block_new])
	# skip the input, 1x1 and activation for the old model
	for i in range(n_input_layers, len(old_model.layers)):
		d = old_model.layers[i](d)
	# define fade-in model
	model2 = Model(in_image, d)
	# compile model
	model2.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
	return [model1, model2]

# define the discriminator models for each image resolution
def define_discriminator(n_blocks, input_shape=(4,4,3)):
	model_list = list()
	# base model input
	in_image = Input(shape=input_shape)
	# conv 1x1
	d = Conv2D(64, (1,1), padding='same', kernel_initializer='he_normal')(in_image)
	d = LeakyReLU(alpha=0.2)(d)
	# conv 3x3 (output block)
	d = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)
	# conv 4x4
	d = Conv2D(128, (4,4), padding='same', kernel_initializer='he_normal')(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)
	# dense output layer
	d = Flatten()(d)
	out_class = Dense(1)(d)
	# define model
	model = Model(in_image, out_class)
	# compile model
	model.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
	# store model
	model_list.append([model, model])
	# create submodels
	for i in range(1, n_blocks):
		# get prior model without the fade-in
		old_model = model_list[i - 1][0]
		# create new model for next resolution
		models = add_discriminator_block(old_model)
		# store model
		model_list.append(models)
	return model_list

# add a generator block
def add_generator_block(old_model):
	# get the end of the last block
	block_end = old_model.layers[-2].output
	# upsample, and define new block
	upsampling = UpSampling2D()(block_end)
	g = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(upsampling)
	g = BatchNormalization()(g)
	g = LeakyReLU(alpha=0.2)(g)
	g = Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal')(g)
	g = BatchNormalization()(g)
	g = LeakyReLU(alpha=0.2)(g)
	# add new output layer
	out_image = Conv2D(3, (1,1), padding='same', kernel_initializer='he_normal')(g)
	# define model
	model1 = Model(old_model.input, out_image)
	# get the output layer from old model
	out_old = old_model.layers[-1]
	# connect the upsampling to the old output layer
	out_image2 = out_old(upsampling)
	# define new output image as the weighted sum of the old and new models
	merged = WeightedSum()([out_image2, out_image])
	# define model
	model2 = Model(old_model.input, merged)
	return [model1, model2]

# define generator models
def define_generator(latent_dim, n_blocks, in_dim=4):
	model_list = list()
	# base model latent input
	in_latent = Input(shape=(latent_dim,))
	# linear scale up to activation maps
	g = Dense(128 * in_dim * in_dim, kernel_initializer='he_normal')(in_latent)
	g = Reshape((in_dim, in_dim, 128))(g)
	# conv 3x3, input block (4x4 filters in the paper)
	g = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(g)
	g = BatchNormalization()(g)
	g = LeakyReLU(alpha=0.2)(g)
	# conv 3x3
	g = Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal')(g)
	g = BatchNormalization()(g)
	g = LeakyReLU(alpha=0.2)(g)
	# conv 1x1, output block
	out_image = Conv2D(3, (1,1), padding='same', kernel_initializer='he_normal')(g)
	# define model
	model = Model(in_latent, out_image)
	# store model
	model_list.append([model, model])
	# create submodels
	for i in range(1, n_blocks):
		# get prior model without the fade-in
		old_model = model_list[i - 1][0]
		# create new model for next resolution
		models = add_generator_block(old_model)
		# store model
		model_list.append(models)
	return model_list

# define composite models for training generators via discriminators
def define_composite(discriminators, generators):
	model_list = list()
	# create composite models
	for i in range(len(discriminators)):
		g_models, d_models = generators[i], discriminators[i]
		# straight-through model
		d_models[0].trainable = False
		model1 = Sequential()
		model1.add(g_models[0])
		model1.add(d_models[0])
		model1.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
		# fade-in model
		d_models[1].trainable = False
		model2 = Sequential()
		model2.add(g_models[1])
		model2.add(d_models[1])
		model2.compile(loss='mse', optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
		# store
		model_list.append([model1, model2])
	return model_list

# define models
discriminators = define_discriminator(3)
# define models
generators = define_generator(100, 3)
# define composite models
composite = define_composite(discriminators, generators)

Now that we know how to define all of the models, we can review how the models might be updated during training.

How to Train Discriminator and Generator Models

Pre-defining the generator, discriminator, and composite models was the hard part; training the models is straightforward and much like training any other GAN.

Importantly, in each training iteration the alpha variable in each WeightedSum layer must be set to a new value. This must be set for the layer in both the generator and discriminator models and allows for the smooth linear transition from the old model layers to the new model layers, e.g. alpha values set from 0 to 1 over a fixed number of training iterations.

The update_fadein() function below implements this and will loop through a list of models and set the alpha value on each based on the current step in a given number of training steps. You may be able to implement this more elegantly using a callback.

# update the alpha value on each instance of WeightedSum
def update_fadein(models, step, n_steps):
	# calculate current alpha (linear from 0 to 1)
	alpha = step / float(n_steps - 1)
	# update the alpha for each model
	for model in models:
		for layer in model.layers:
			if isinstance(layer, WeightedSum):
				backend.set_value(layer.alpha, alpha)
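
To make the schedule concrete, the sketch below computes the same linear ramp outside of Keras. The helper name fadein_alpha is hypothetical, and the guard for a single-step phase is an assumption that is not part of update_fadein().

```python
# minimal sketch of the linear fade-in schedule used by update_fadein()
# fadein_alpha is a hypothetical helper name, not part of the tutorial code

def fadein_alpha(step, n_steps):
	# assumption: a single-step phase has nothing to interpolate, so use the new block fully
	if n_steps <= 1:
		return 1.0
	# linear ramp: 0.0 at the first step, 1.0 at the last step
	return step / float(n_steps - 1)

# alpha values for a five-step fade-in phase
alphas = [fadein_alpha(i, 5) for i in range(5)]
print(alphas)
```

With five steps, alpha moves in equal increments of 0.25, so the new block has no influence on the first step and full influence on the last.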

We can define a generic function for training a given generator, discriminator, and composite model for a given number of training epochs.

The train_epochs() function below implements this where first the discriminator model is updated on real and fake images, then the generator model is updated, and the process is repeated for the required number of training iterations based on the dataset size and the number of epochs.

This function calls helper functions for retrieving a batch of real images via generate_real_samples(), generating a batch of fake samples with the generator generate_fake_samples(), and generating a sample of points in latent space generate_latent_points(). You can define these functions yourself quite trivially.

from numpy import ones

# train a generator and discriminator
# note: latent_dim is read from the enclosing (module) scope
def train_epochs(g_model, d_model, gan_model, dataset, n_epochs, n_batch, fadein=False):
	# calculate the number of batches per training epoch
	bat_per_epo = int(dataset.shape[0] / n_batch)
	# calculate the number of training iterations
	n_steps = bat_per_epo * n_epochs
	# calculate the size of half a batch of samples
	half_batch = int(n_batch / 2)
	# manually enumerate training steps
	for i in range(n_steps):
		# update alpha for all WeightedSum layers when fading in new blocks
		if fadein:
			update_fadein([g_model, d_model, gan_model], i, n_steps)
		# prepare real and fake samples
		X_real, y_real = generate_real_samples(dataset, half_batch)
		X_fake, y_fake = generate_fake_samples(g_model, latent_dim, half_batch)
		# update discriminator model
		d_loss1 = d_model.train_on_batch(X_real, y_real)
		d_loss2 = d_model.train_on_batch(X_fake, y_fake)
		# update the generator via the discriminator's error
		z_input = generate_latent_points(latent_dim, n_batch)
		y_real2 = ones((n_batch, 1))
		g_loss = gan_model.train_on_batch(z_input, y_real2)
		# summarize loss on this batch
		print('>%d, d1=%.3f, d2=%.3f g=%.3f' % (i+1, d_loss1, d_loss2, g_loss))
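
The three helper functions are not defined in this tutorial. The sketch below is one possible version, assuming the dataset is a NumPy array of images and assuming class labels of 1 for real and 0 for fake images to pair with the mse loss; other labeling schemes are possible.

```python
# minimal sketch of the helper functions called by train_epochs()
# assumption: labels of 1 for real and 0 for fake images, used with the mse loss
from numpy import ones
from numpy import zeros
from numpy.random import randn
from numpy.random import randint

# select a random batch of real images and label them as real
def generate_real_samples(dataset, n_samples):
	# choose random image indexes (with replacement)
	ix = randint(0, dataset.shape[0], n_samples)
	X = dataset[ix]
	y = ones((n_samples, 1))
	return X, y

# sample points from a standard Gaussian latent space
def generate_latent_points(latent_dim, n_samples):
	x_input = randn(latent_dim * n_samples)
	# reshape into one row of latent_dim values per sample
	return x_input.reshape(n_samples, latent_dim)

# use the generator to create images and label them as fake
def generate_fake_samples(generator, latent_dim, n_samples):
	x_input = generate_latent_points(latent_dim, n_samples)
	X = generator.predict(x_input)
	y = zeros((n_samples, 1))
	return X, y
```

Any object with a predict() method that maps latent points to images can be passed as the generator, which makes the helpers easy to check with a stand-in before training a real model.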

The images must be scaled to the size of each model. If the images are in-memory, we can define a simple scale_dataset() function to scale the loaded images.

In this case, we are using the skimage.transform.resize function from the scikit-image library to resize the NumPy array of pixels to the required size and use nearest neighbor interpolation.

from numpy import asarray
from skimage.transform import resize

# scale images to preferred size
def scale_dataset(images, new_shape):
	images_list = list()
	for image in images:
		# resize with nearest neighbor interpolation (order 0)
		new_image = resize(image, new_shape, 0)
		# store
		images_list.append(new_image)
	return asarray(images_list)

First, the baseline model must be fit for a given number of training epochs, e.g. the model that outputs 4×4 sized images.

This will require that the loaded images be scaled to the required size defined by the shape of the generator model's output layer.

# fit the baseline model
g_normal, d_normal, gan_normal = g_models[0][0], d_models[0][0], gan_models[0][0]
# scale dataset to appropriate size
gen_shape = g_normal.output_shape
scaled_data = scale_dataset(dataset, gen_shape[1:])
print('Scaled Data', scaled_data.shape)
# train normal or straight-through models
train_epochs(g_normal, d_normal, gan_normal, scaled_data, e_norm, n_batch)

We can then process each level of growth, e.g. the first being 8×8.

This involves first retrieving the models, scaling the data to the appropriate size, then fitting the fade-in model followed by training the straight-through version of the model for fine tuning.

We can repeat this for each level of growth in a loop.

# process each level of growth
for i in range(1, len(g_models)):
	# retrieve models for this level of growth
	[g_normal, g_fadein] = g_models[i]
	[d_normal, d_fadein] = d_models[i]
	[gan_normal, gan_fadein] = gan_models[i]
	# scale dataset to appropriate size
	gen_shape = g_normal.output_shape
	scaled_data = scale_dataset(dataset, gen_shape[1:])
	print('Scaled Data', scaled_data.shape)
	# train fade-in models for next level of growth
	train_epochs(g_fadein, d_fadein, gan_fadein, scaled_data, e_fadein, n_batch, True)
	# train normal or straight-through models
	train_epochs(g_normal, d_normal, gan_normal, scaled_data, e_norm, n_batch)

We can tie this together and define a function called train() to train the progressive growing GAN models.

# train the generator and discriminator
def train(g_models, d_models, gan_models, dataset, latent_dim, e_norm, e_fadein, n_batch):
	# fit the baseline model
	g_normal, d_normal, gan_normal = g_models[0][0], d_models[0][0], gan_models[0][0]
	# scale dataset to appropriate size
	gen_shape = g_normal.output_shape
	scaled_data = scale_dataset(dataset, gen_shape[1:])
	print('Scaled Data', scaled_data.shape)
	# train normal or straight-through models
	train_epochs(g_normal, d_normal, gan_normal, scaled_data, e_norm, n_batch)
	# process each level of growth
	for i in range(1, len(g_models)):
		# retrieve models for this level of growth
		[g_normal, g_fadein] = g_models[i]
		[d_normal, d_fadein] = d_models[i]
		[gan_normal, gan_fadein] = gan_models[i]
		# scale dataset to appropriate size
		gen_shape = g_normal.output_shape
		scaled_data = scale_dataset(dataset, gen_shape[1:])
		print('Scaled Data', scaled_data.shape)
		# train fade-in models for next level of growth
		train_epochs(g_fadein, d_fadein, gan_fadein, scaled_data, e_fadein, n_batch, True)
		# train normal or straight-through models
		train_epochs(g_normal, d_normal, gan_normal, scaled_data, e_norm, n_batch)

The number of epochs for the normal phase is defined by the e_norm argument and the number of epochs during the fade-in phase is defined by the e_fadein argument.

The number of epochs must be specified based on the size of the image dataset and the same number of epochs can be used for each phase, as was used in the paper.

We start with 4×4 resolution and train the networks until we have shown the discriminator 800k real images in total. We then alternate between two phases: fade in the first 3-layer block during the next 800k images, stabilize the networks for 800k images, fade in the next 3-layer block during 800k images, etc.

Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017.
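
As a rough illustration of how an "images shown" budget like the one quoted above translates into epochs, the sketch below converts 800,000 images per phase into epochs and training steps. The helper name, the 50,000 image dataset size, and the batch size of 16 are assumed example values, not figures from the paper or this tutorial.

```python
# minimal sketch: convert an 'images shown' budget into epochs and training steps
# epochs_for_image_budget is a hypothetical helper name
from math import ceil

def epochs_for_image_budget(n_images_shown, dataset_size):
	# round up so that at least n_images_shown images are seen in the phase
	return int(ceil(n_images_shown / float(dataset_size)))

# assumed example values: a 50,000 image dataset and a batch size of 16
dataset_size, n_batch = 50000, 16
n_epochs = epochs_for_image_budget(800000, dataset_size)
# same step calculation as used in train_epochs()
bat_per_epo = int(dataset_size / n_batch)
n_steps = bat_per_epo * n_epochs
print(n_epochs, n_steps)
```

The resulting epoch count could be passed as both e_norm and e_fadein if the same budget is wanted for each phase.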

We can then define our models as we did in the previous section, then call the training function.

# number of growth phases, e.g. 3 = 16x16 images
n_blocks = 3
# size of the latent space
latent_dim = 100
# define models
d_models = define_discriminator(n_blocks)
# define models
g_models = define_generator(latent_dim, n_blocks)
# define composite models
gan_models = define_composite(d_models, g_models)
# load image data
dataset = load_real_samples()
# train model
train(g_models, d_models, gan_models, dataset, latent_dim, 100, 100, 16)
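
The relationship between the number of blocks and the output image size can be checked with a short calculation. This is a minimal sketch; output_size is a hypothetical helper name, and it assumes the 4×4 base model and the doubling per block used by the models defined earlier.

```python
# minimal sketch: output image size for a given number of blocks
# output_size is a hypothetical helper name

def output_size(n_blocks, in_dim=4):
	# the base model outputs in_dim x in_dim images
	# each additional block doubles the width and height
	return in_dim * 2 ** (n_blocks - 1)

# report the output width (and height) for a few block counts
for n in [1, 3, 6, 9]:
	print(n, output_size(n))
```

Under these assumptions, three blocks give 16×16 images and nine blocks would be needed to reach 1024×1024.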

Summary

In this tutorial, you discovered how to develop progressive growing generative adversarial network models from scratch with Keras.

Specifically, you learned:

  • How to develop pre-defined discriminator and generator models at each level of output image growth.
  • How to define composite models for training the generator models via the discriminator models.
  • How to cycle the training of fade-in version and normal versions of models at each level of output image growth.

The post How to Implement Progressive Growing GAN Models in Keras appeared first on Machine Learning Mastery.

A Gentle Introduction to the Progressive Growing GAN

Progressive Growing GAN is an extension to the GAN training process that allows for the stable training of generator models that can output large high-quality images.

It involves starting with a very small image and incrementally adding blocks of layers that increase the output size of the generator model and the input size of the discriminator model until the desired image size is achieved.

This approach has proven effective at generating high-quality synthetic faces that are startlingly realistic.

In this post, you will discover the progressive growing generative adversarial network for generating large images.

After reading this post, you will know:

  • GANs are effective at generating sharp images, although they are limited to small image sizes because of model stability.
  • Progressive growing GAN is a stable approach to training GAN models to generate large high-quality images that involves incrementally increasing the size of the model during training.
  • Progressive growing GAN models are capable of generating photorealistic synthetic faces and objects at high resolution that are remarkably realistic.

Let’s get started.

A Gentle Introduction to Progressive Growing Generative Adversarial Networks
Photo by Sandrine Néel, some rights reserved.

Overview

This tutorial is divided into five parts; they are:

  1. GANs Are Generally Limited to Small Images
  2. Generate Large Images by Progressively Adding Layers
  3. How to Progressively Grow a GAN
  4. Images Generated by the Progressive Growing GAN
  5. How to Configure Progressive Growing GAN Models

GANs Are Generally Limited to Small Images

Generative Adversarial Networks, or GANs for short, are an effective approach for training deep convolutional neural network models for generating synthetic images.

Training a GAN model involves two models: a generator used to output synthetic images, and a discriminator model used to classify images as real or fake, which is used to train the generator model. The two models are trained together in an adversarial manner, seeking an equilibrium.

Compared to other approaches, they are both fast and result in crisp images.

A problem with GANs is that they are limited to small image sizes, typically a few hundred pixels square at most and often less than 100×100 pixels.

GANs produce sharp images, albeit only in fairly small resolutions and with somewhat limited variation, and the training continues to be unstable despite recent progress.

Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017.

Generating high-resolution images is believed to be challenging for GAN models as the generator must learn how to output both large structure and fine details at the same time.

The high resolution makes any issues in the fine detail of generated images easy to spot for the discriminator and the training process fails.

The generation of high-resolution images is difficult because higher resolution makes it easier to tell the generated images apart from training images …

Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017.

Large images, such as 1024-pixel square images, also require significantly more memory, which is in relatively limited supply on modern GPU hardware compared to main memory.

As such, the batch size that defines the number of images used to update model weights each training iteration must be reduced to ensure that the large images fit into memory. This, in turn, introduces further instability into the training process.

Large resolutions also necessitate using smaller minibatches due to memory constraints, further compromising training stability.

Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017.

Additionally, the training of GAN models remains unstable, even in the presence of a suite of empirical techniques designed to improve the stability of the model training process.

Generate Large Images by Progressively Adding Layers

A solution to the problem of training stable GAN models for larger images is to progressively increase the number of layers during the training process.

This approach is called Progressive Growing GAN, Progressive GAN, or PGGAN for short.

The approach was proposed by Tero Karras, et al. from Nvidia in the 2017 paper titled “Progressive Growing of GANs for Improved Quality, Stability, and Variation” and presented at the 2018 ICLR conference.

Our primary contribution is a training methodology for GANs where we start with low-resolution images, and then progressively increase the resolution by adding layers to the networks.

Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017.

Progressive Growing GAN involves using a generator and discriminator model with the same general structure and starting with very small images, such as 4×4 pixels.

During training, new blocks of convolutional layers are systematically added to both the generator model and the discriminator models.

Example of Progressively Adding Layers to Generator and Discriminator Models.
Taken from: Progressive Growing of GANs for Improved Quality, Stability, and Variation.

The incremental addition of the layers allows the models to effectively learn coarse-level detail and later learn ever finer detail, both on the generator and discriminator side.

This incremental nature allows the training to first discover large-scale structure of the image distribution and then shift attention to increasingly finer scale detail, instead of having to learn all scales simultaneously.

Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017.

This approach allows the generation of large high-quality images, such as 1024×1024 photorealistic faces of celebrities that do not exist.

How to Progressively Grow a GAN

Progressive Growing GAN requires that the capacity of both the generator and discriminator model be expanded by adding layers during the training process.

This is much like the greedy layer-wise training process that was common for developing deep learning neural networks prior to the development of ReLU and Batch Normalization.

Unlike greedy layer-wise pretraining, progressive growing GAN involves adding blocks of layers and phasing in the addition of the blocks of layers rather than adding them directly.

When new layers are added to the networks, we fade them in smoothly […] This avoids sudden shocks to the already well-trained, smaller-resolution layers.

Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017.

Further, all layers remain trainable during the training process, including existing layers when new layers are added.

All existing layers in both networks remain trainable throughout the training process.

Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017.

The phasing in of a new block of layers involves using a skip connection to connect the new block to the input of the discriminator or output of the generator and adding it to the existing input or output layer with a weighting. The weighting controls the influence of the new block and is achieved using a parameter alpha (a) that starts at zero or a very small number and linearly increases to 1.0 over training iterations.

This is demonstrated in the figure below, taken from the paper.

It shows a generator that outputs a 16×16 image and a discriminator that takes a 16×16 pixel image. The models are grown to the size of 32×32.

Example of Phasing in the Addition of New Layers to the Generator and Discriminator Models.
Taken from: Progressive Growing of GANs for Improved Quality, Stability, and Variation.

Let’s take a closer look at how to progressively add layers to the generator and discriminator when going from 16×16 to 32×32 pixels.

Growing the Generator

For the generator, this involves adding a new block of convolutional layers that outputs a 32×32 image.

The output of this new layer is combined with the output of the 16×16 layer that is upsampled using nearest neighbor interpolation to 32×32. This is different from many GAN generators that use a transpose convolutional layer.

… doubling […] the image resolution using nearest neighbor filtering

Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017.

The contribution of the upsampled 16×16 layer is weighted by (1 – alpha), whereas the contribution of the new 32×32 layer is weighted by alpha.

Alpha is small initially, giving the most weight to the scaled-up version of the 16×16 image, but it gradually transitions to giving more weight, and then all weight, to the new 32×32 output layers over training iterations.

During the transition we treat the layers that operate on the higher resolution like a residual block, whose weight alpha increases linearly from 0 to 1.

Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017.
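
The weighting described above can be illustrated with a few lines of NumPy. This is a minimal sketch, not the paper's implementation: the function names are hypothetical and random arrays stand in for the outputs of the existing 16×16 layers and the new 32×32 block.

```python
# minimal NumPy sketch of fading in a new generator output
# function names are hypothetical; random arrays stand in for real layer outputs
import numpy as np

# nearest neighbor upsampling: repeat every pixel twice along height and width
def upsample_nearest(images):
	return images.repeat(2, axis=1).repeat(2, axis=2)

# weighted sum of the upsampled old output and the new block output
def fade_in(old_small, new_large, alpha):
	return (1.0 - alpha) * upsample_nearest(old_small) + alpha * new_large

rng = np.random.default_rng(1)
# batch of two 16x16 RGB outputs from the existing layers
old_out = rng.random((2, 16, 16, 3))
# batch of two 32x32 RGB outputs from the new block
new_out = rng.random((2, 32, 32, 3))
for alpha in [0.0, 0.5, 1.0]:
	print(alpha, fade_in(old_out, new_out, alpha).shape)
```

With alpha at 0.0 the result is exactly the upsampled 16×16 image, and with alpha at 1.0 it is exactly the output of the new block.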

Growing the Discriminator

For the discriminator, this involves adding a new block of convolutional layers for the input of the model to support image sizes with 32×32 pixels.

The input image is downsampled to 16×16 using average pooling so that it can pass through the existing 16×16 convolutional layers. The output of the new 32×32 block of layers is also downsampled using average pooling so that it can be provided as input to the existing 16×16 block. This is different from most GAN models that use a 2×2 stride in the convolutional layers to downsample.

… halving the image resolution using […] average pooling

Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017.

The two downsampled versions of the input are combined in a weighted manner, starting with a full weighting to the downsampled raw input and linearly transitioning to a full weighting for the interpreted output of the new input layer block.
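
The same idea on the discriminator side can be sketched in NumPy. This is only an illustration under stated assumptions: the function names are hypothetical, a fixed random 1×1 projection stands in for the existing 16×16 input layer, a random array stands in for the feature maps of the new 32×32 block, and the 64 feature maps are an assumed example value.

```python
# minimal NumPy sketch of fading in a new discriminator input block
# function names are hypothetical; random arrays stand in for real layers
import numpy as np

# 2x2 average pooling for channels-last image batches
def average_pool_2x2(x):
	n, h, w, c = x.shape
	# group pixels into 2x2 tiles, then average within each tile
	return x.reshape(n, h // 2, 2, w // 2, 2, c).mean(axis=(2, 4))

# weighted sum of the old input path and the new block path
def fade_in(old_path, new_path, alpha):
	return (1.0 - alpha) * old_path + alpha * new_path

rng = np.random.default_rng(1)
# batch of two 32x32 RGB input images
images = rng.random((2, 32, 32, 3))
# old path: downsample the raw image, then apply a stand-in 1x1 projection to 64 maps
projection = rng.random((3, 64))
old_path = average_pool_2x2(images) @ projection
# new path: stand-in for the new block's 32x32 feature maps, downsampled to 16x16
new_path = average_pool_2x2(rng.random((2, 32, 32, 64)))
print(fade_in(old_path, new_path, 0.5).shape)
```

Both paths end at 16×16 with the same number of feature maps, which is what allows them to be blended before being passed to the existing layers.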

Images Generated by the Progressive Growing GAN

In this section, we can review some of the impressive results achieved with the Progressive Growing GAN described in the paper.

Many example images are provided in the appendix of the paper and I recommend reviewing it. A YouTube video was also created summarizing the impressive results of the model.

Synthetic Photographs of Celebrity Faces

Perhaps the most impressive accomplishment of the Progressive Growing GAN is the generation of large 1024×1024 pixel photorealistic generated faces.

The model was trained on a high-quality version of the celebrity faces dataset, called CELEBA-HQ. As such, the faces look familiar as they contain elements of many real celebrity faces, although none of the people actually exist.

Example of Photorealistic Generated Faces Using Progressive Growing GAN.
Taken from: Progressive Growing of GANs for Improved Quality, Stability, and Variation.

Interestingly, the model required to generate the faces was trained on 8 GPUs for 4 days, perhaps out of the range of most developers.

We trained the network on 8 Tesla V100 GPUs for 4 days, after which we no longer observed qualitative differences between the results of consecutive training iterations. Our implementation used an adaptive minibatch size depending on the current output resolution so that the available memory budget was optimally utilized.

Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017.

Synthetic Photographs of Objects

The model was also demonstrated on generating 256×256-pixel photorealistic synthetic objects from the LSUN dataset, such as bikes, buses, and churches.

Example of Photorealistic Generated Objects Using Progressive Growing GAN.
Taken from: Progressive Growing of GANs for Improved Quality, Stability, and Variation.

How to Configure Progressive Growing GAN Models

The paper describes the configuration details of the model used to generate the 1024×1024 synthetic photographs of celebrity faces.

Specifically, the details are provided in Appendix A.

Although we may not be interested or have the resources to develop such a large model, the configuration details may be useful when implementing a Progressive Growing GAN.

Both the discriminator and generator models were grown using blocks of convolutional layers, each using a specific number of filters with the size 3×3 and the LeakyReLU activation layer with the slope of 0.2. Upsampling was achieved via nearest neighbor sampling and downsampling was achieved using average pooling.

Both networks consist mainly of replicated 3-layer blocks that we introduce one by one during the course of the training. […] We use leaky ReLU with leakiness 0.2 in all layers of both networks, except for the last layer that uses linear activation.

Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017.

The generator used a 512-element latent vector of Gaussian random variables. It also used an output layer with 1×1-sized filters and a linear activation function, instead of the more common hyperbolic tangent activation function (tanh). The discriminator also used an output layer with 1×1-sized filters and a linear activation function.

The Wasserstein GAN loss was used with the gradient penalty, so-called WGAN-GP as described in the 2017 paper titled “Improved Training of Wasserstein GANs.” The least squares loss was tested and showed good results, but not as good as WGAN-GP.

The models start with a 4×4 input image and grow until they reach the 1024×1024 target.

Tables were provided that list the number of layers and number of filters used in each layer for the generator and discriminator models, reproduced below.

Tables Showing Generator and Discriminator Configuration for the Progressive Growing GAN.
Taken from: Progressive Growing of GANs for Improved Quality, Stability, and Variation.

Batch normalization is not used; instead, two other techniques are added: minibatch standard deviation and pixel-wise normalization.

The standard deviation of activations across images in the mini-batch is added as a new channel prior to the last block of convolutional layers in the discriminator model. This is referred to as “Minibatch standard deviation.”

We inject the across-minibatch standard deviation as an additional feature map at 4×4 resolution toward the end of the discriminator

Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017.
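
A minimal NumPy sketch of the idea is given below. It assumes channels-last activations and reduces the per-location standard deviations to a single average value that is replicated as one constant feature map; the function name and the small eps constant are assumptions for illustration.

```python
# minimal NumPy sketch of a minibatch standard deviation feature map
# minibatch_stddev is a hypothetical function name; eps is an assumed stabilizer
import numpy as np

def minibatch_stddev(x, eps=1e-8):
	# standard deviation of every activation across the images in the batch
	std = np.sqrt(x.var(axis=0) + eps)
	# average over all positions and feature maps to get a single value
	mean_std = std.mean()
	# replicate the value as one extra constant feature map per image
	n, h, w, _ = x.shape
	extra = np.full((n, h, w, 1), mean_std)
	return np.concatenate([x, extra], axis=-1)

rng = np.random.default_rng(1)
# batch of four 4x4 activations with 8 feature maps
activations = rng.random((4, 4, 4, 8))
print(minibatch_stddev(activations).shape)
```

A batch with little variation between images produces a small value in the extra feature map, which gives the discriminator a simple signal about the diversity of a batch.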

A pixel-wise normalization is performed in the generator after each convolutional layer that normalizes each pixel value in the activation map across the channels to a unit length. This is a type of activation constraint that is more generally referred to as “local response normalization.”
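
The normalization can be sketched in NumPy as follows. This version divides each pixel's feature vector by its root mean square across the channels, so the mean squared value per pixel is about 1 afterwards; the function name and the eps constant are assumptions for illustration.

```python
# minimal NumPy sketch of pixel-wise normalization across channels
# pixel_norm is a hypothetical function name; eps is an assumed stabilizer
import numpy as np

def pixel_norm(x, eps=1e-8):
	# mean of squared activations across the channel axis, per pixel
	mean_square = np.mean(x ** 2, axis=-1, keepdims=True)
	# rescale every pixel's feature vector by its root mean square
	return x / np.sqrt(mean_square + eps)

rng = np.random.default_rng(1)
# batch of two 8x8 activations with 16 feature maps
activations = rng.random((2, 8, 8, 16)) + 0.1
normalized = pixel_norm(activations)
# the mean squared value across channels is close to 1.0 at every pixel
print(np.mean(normalized ** 2, axis=-1).round(3).min())
```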

The bias for all layers is initialized as zero and model weights are initialized as a random Gaussian rescaled using the He weight initialization method.

We initialize all bias parameters to zero and all weights according to the normal distribution with unit variance. However, we scale the weights with a layer-specific constant at runtime …

Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017.
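
The runtime scaling can be illustrated with NumPy. This is a sketch under stated assumptions: weights are stored as unit-variance Gaussian values and multiplied by the He constant, taken here as the square root of 2 divided by the number of inputs to the layer, each time they are used. The helper name and the kernel shape are example choices.

```python
# minimal NumPy sketch of runtime weight scaling with the He constant
# he_constant is a hypothetical helper; the kernel shape is an assumed example
import numpy as np

def he_constant(fan_in):
	# per-layer scale from He initialization: sqrt(2 / number of inputs)
	return np.sqrt(2.0 / fan_in)

rng = np.random.default_rng(1)
# 3x3 kernel with 128 input and 128 output feature maps
kernel_shape = (3, 3, 128, 128)
# stored weights: Gaussian with unit variance
weights = rng.standard_normal(kernel_shape)
# number of inputs feeding each output unit
fan_in = kernel_shape[0] * kernel_shape[1] * kernel_shape[2]
# weights as used in the forward pass: rescaled at runtime
scaled = weights * he_constant(fan_in)
print(weights.std(), scaled.std())
```

The stored weights keep a standard deviation near 1.0, while the scaled weights used in the forward pass have a standard deviation near the He constant for the layer.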

The models are optimized using the Adam version of stochastic gradient descent with a small learning rate and low momentum.

We train the networks using Adam with α = 0.001, β1 = 0, β2 = 0.99, and ε = 10^−8.

Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017.

Image generation uses a weighted average of prior models rather than a given model snapshot, much like a horizontal ensemble.

… visualizing generator output at any given point during the training, we use an exponential running average for the weights of the generator with decay 0.999

Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017.
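
An exponential running average of weights can be sketched with NumPy as shown below. The function name is hypothetical, and lists of arrays stand in for the generator's weights; the decay of 0.999 is the value quoted above.

```python
# minimal NumPy sketch of an exponential running average of model weights
# update_average is a hypothetical function name; arrays stand in for real weights
import numpy as np

def update_average(avg_weights, new_weights, decay=0.999):
	# blend each averaged array with the matching array of current weights
	return [decay * a + (1.0 - decay) * w for a, w in zip(avg_weights, new_weights)]

# start the average at zero and hold the 'trained' weights at one for illustration
average = [np.zeros(3)]
current = [np.ones(3)]
for _ in range(1000):
	average = update_average(average, current)
print(average[0])
```

With a Keras model, the lists could be taken from get_weights() after each training update, and the averaged values written into a separate copy of the generator with set_weights() before generating images.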

Summary

In this post, you discovered the progressive growing generative adversarial network for generating large images.

Specifically, you learned:

  • GANs are effective at generating sharp images, although they are limited to small image sizes because of model stability.
  • Progressive growing GAN is a stable approach to training GAN models to generate large high-quality images that involves incrementally increasing the size of the model during training.
  • Progressive growing GAN models are capable of generating photorealistic synthetic faces and objects at high resolution that are remarkably realistic.

The post A Gentle Introduction to the Progressive Growing GAN appeared first on Machine Learning Mastery.
