On the Structures of Representation for the Robustness of Semantic Segmentation to Input Corruption

Introduction

I wanted to provide some additional resources beyond the paper and code linked above to help reproduce our results. Below is mostly the same code provided in the repository—things like prints, form renderings for Google Colab, and rendered tqdm progress bars have been removed. Please be sure to expand the “show code >” buttons to look at the code that generated the corresponding outputs/results.

General Imports and Visualization Methods

Here we import the required libraries and build out the machinery for visualizing the results of this work. Of note in colorize_voc_label is the assignment of the color white to classes above 20; this accounts for 255 being used to indicate the “ignore” class. visualize assumes its arguments are torch.Tensor instances produced by our models.

show code
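As a reference, here is a minimal sketch of the colorization idea described above. The colormap follows the standard PASCAL VOC bit-interleaving scheme, and everything above the 20 object classes (including the 255 “ignore” id) is rendered white. The function name mirrors the post's colorize_voc_label, but the repository implementation may differ in detail.

```python
import numpy as np

def voc_colormap(n=256):
    # Standard PASCAL VOC colormap: each class index maps to an RGB
    # triple built by interleaving the bits of the index.
    cmap = np.zeros((n, 3), dtype=np.uint8)
    for i in range(n):
        r = g = b = 0
        c = i
        for j in range(8):
            r |= ((c >> 0) & 1) << (7 - j)
            g |= ((c >> 1) & 1) << (7 - j)
            b |= ((c >> 2) & 1) << (7 - j)
            c >>= 3
        cmap[i] = [r, g, b]
    return cmap

def colorize_voc_label(label):
    # label: (H, W) integer array of class ids; ids above 20
    # (notably the 255 "ignore" id) are rendered white.
    cmap = voc_colormap()
    cmap[21:] = 255
    return cmap[label]
```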

Segmentation Transforms

Now we build out the joint transform used during training and validation that is compatible with torchvision.datasets.VOCSegmentation. This requires the joint_transforms module provided in the above linked code.

show code
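The core idea of a joint transform is that each random parameter is sampled once and applied identically to the image and its label mask, so the two stay aligned. The repository's joint_transforms module operates on PIL images; the numpy sketch below only illustrates the pattern, with hypothetical class names.

```python
import random
import numpy as np

class JointCompose:
    """Chain transforms that act on an (image, target) pair together."""
    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, image, target):
        for t in self.transforms:
            image, target = t(image, target)
        return image, target

class JointRandomHorizontalFlip:
    """Flip the image and its segmentation mask with one coin toss."""
    def __init__(self, p=0.5):
        self.p = p

    def __call__(self, image, target):
        if random.random() < self.p:
            image = np.flip(image, axis=1).copy()   # flip the width axis
            target = np.flip(target, axis=1).copy()
        return image, target
```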

VOC2012

Now to put our visualize method to work, we can look at the first image and label pair in the VOC2012 training set.

show code

png

SBD

Next, we want to extend the VOC2012 training set with the “train_noval” subset of SBDataset, as provided by torchvision, for a total of 7087 training examples. Notice, as we visualize an example from SBDataset, that it differs from VOC2012 in that the transitions from foreground to background are not bordered by the “ignore” class.

show code

png
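Combining the two datasets reduces to torch's ConcatDataset, which chains any indexable datasets. The torchvision constructor calls in the comment reflect my reading of the torchvision API (image_set, mode, and transforms arguments) and are not executed here; the example count matches the post's total of 7087.

```python
from torch.utils.data import ConcatDataset

# Assumed torchvision usage (names per torchvision.datasets; not run here):
#   voc = VOCSegmentation(root, image_set="train", transforms=joint_transform)
#   sbd = SBDataset(root, image_set="train_noval", mode="segmentation",
#                   transforms=joint_transform)
#   train_set = ConcatDataset([voc, sbd])  # 1464 + 5623 = 7087 examples

# ConcatDataset simply chains indexable datasets; a toy demonstration:
train_set = ConcatDataset([[("img0", "lbl0")],
                           [("img1", "lbl1"), ("img2", "lbl2")]])
assert len(train_set) == 3
```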

Dataloader

Now to bring the datasets together for use in training and validation. If you will be implementing this yourself, note that we used a batch size of 20, which may exceed the memory available on your GPU.

show code
Steps per Epoch: 354
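The reported 354 steps per epoch is consistent with 7087 examples at a batch size of 20 when the final incomplete batch is discarded (drop_last=True is an assumption on our part):

```python
num_train = 7087      # VOC2012 train (1464) + SBD train_noval (5623)
batch_size = 20
# With drop_last=True the incomplete final batch is discarded:
steps_per_epoch = num_train // batch_size
print(steps_per_epoch)  # 354
```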

Corruptions

Corruptions to Transform

To use the machinery in PyTorch, we wrapped the corruption methods from the imagenet-c package in the standard torchvision transform interface. ImageNet-C is part of a collection of work on studying the impacts of corruptions by Dan Hendrycks, Thomas Dietterich, and others. More details can be found on the ImageNet-C Repository.

show code
{
'gaussian': 0, 'shot': 1, 
'impulse': 2, 'defocus': 3, 
'glass': 4, 'motion': 5, 
'zoom': 6, 'snow': 7, 
'frost': 8, 'fog': 9, 
'brightness': 10, 'contrast': 11, 
'elastic': 12, 'pixelate': 13, 
'jpeg': 14
}
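A wrapper of this kind can be sketched as below. I assume a corruption function in the style of the imagenet_c package's corrupt(), i.e. one that takes and returns a uint8 HxWx3 numpy array and accepts a severity keyword (for example a functools.partial with the corruption number bound); the class name is hypothetical.

```python
import numpy as np
from PIL import Image

class CorruptTransform:
    """Wrap an imagenet-c style corruption function in the standard
    torchvision transform interface (PIL image in, PIL image out)."""
    def __init__(self, corrupt_fn, severity=1):
        # corrupt_fn: assumed to map a uint8 HxWx3 array to another,
        # accepting a `severity` keyword argument.
        self.corrupt_fn = corrupt_fn
        self.severity = severity

    def __call__(self, img):
        arr = np.asarray(img)
        out = self.corrupt_fn(arr, severity=self.severity)
        return Image.fromarray(np.uint8(out))
```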

VOC Corruptions 4,5,6,7

The Glass, Motion, Zoom, and Snow corruptions take a long time per iteration, so we can gain efficiency by preprocessing them at all severity levels. To do so, use the provided script dump_voc_c.py with the desired corruption number and severity.

show code
motion_blur @ 4

png

Visualize Corruptions

Here’s what the different corruption levels look like for a subset of the corruptions.

show code

png

Metrics

Running Confusion Matrix

This allows us to compute metrics batch by batch during training, or in aggregate across the entire validation set.

show code
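The running pattern can be sketched as follows: accumulate a KxK confusion matrix across batches (dropping “ignore” pixels) and derive mean IoU from it at any point. This is a sketch of the idea; the repository's version may differ in detail.

```python
import numpy as np

class RunningConfusionMatrix:
    """Accumulate a KxK confusion matrix over batches; report mean IoU."""
    def __init__(self, num_classes, ignore_index=255):
        self.K = num_classes
        self.ignore_index = ignore_index
        self.mat = np.zeros((num_classes, num_classes), dtype=np.int64)

    def update(self, target, pred):
        t = np.asarray(target).reshape(-1).astype(np.int64)
        p = np.asarray(pred).reshape(-1).astype(np.int64)
        keep = t != self.ignore_index          # drop "ignore" pixels
        idx = t[keep] * self.K + p[keep]       # joint (true, pred) index
        self.mat += np.bincount(idx, minlength=self.K ** 2).reshape(self.K, self.K)

    def miou(self):
        inter = np.diag(self.mat)
        union = self.mat.sum(0) + self.mat.sum(1) - inter
        valid = union > 0                      # only classes actually observed
        return (inter[valid] / union[valid]).mean()
```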

Experiment

Trainer

Here is a configurable implementation of a semantic segmentation experiment.

show code
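The full Trainer is configurable, but at its core a semantic segmentation epoch reduces to pixel-wise cross entropy with the VOC ignore index of 255. A stripped-down sketch with hypothetical names, not the repository's Trainer:

```python
import torch

def train_one_epoch(model, loader, optimizer, device="cpu"):
    """One epoch of pixel-wise cross-entropy training; VOC's 255
    "ignore" label is excluded from the loss."""
    criterion = torch.nn.CrossEntropyLoss(ignore_index=255)
    model.train()
    total = 0.0
    for images, targets in loader:
        images, targets = images.to(device), targets.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), targets)  # (N,C,H,W) vs (N,H,W)
        loss.backward()
        optimizer.step()
        total += loss.item()
    return total / max(len(loader), 1)
```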

Models

Below we prepare three versions of DeepLabV3+ with a ResNet50 backbone: vanilla, Implicit Background Estimation (IBE), and Sigmoid Cross Entropy Implicit Background Estimation (SCrIBE). Since each model starts from an ImageNet-pretrained ResNet50, the appropriate layers are replaced and wrapped in an nn.Module. We then train each model using the previously introduced configurable experiment.
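The head-replacement step can be illustrated generically: keep the pretrained backbone, swap the classification layers for a dense 21-class (VOC) prediction head, and upsample back to the input resolution. Names and shapes below are illustrative only, not the repository's exact DeepLabV3+ architecture.

```python
import torch
import torch.nn as nn

class SegHead(nn.Module):
    """Toy stand-in for replacing a classification head with a dense
    per-pixel prediction head on a pretrained backbone."""
    def __init__(self, backbone, in_channels, num_classes=21):
        super().__init__()
        self.backbone = backbone
        self.classifier = nn.Conv2d(in_channels, num_classes, kernel_size=1)

    def forward(self, x):
        feats = self.backbone(x)          # (N, C, h, w) feature map
        logits = self.classifier(feats)   # (N, num_classes, h, w)
        # Upsample predictions back to the input resolution.
        return nn.functional.interpolate(
            logits, size=x.shape[-2:], mode="bilinear", align_corners=False)
```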

DeepLabV3+

show code

DeepLabV3+IBE

show code

DeepLabV3+SCrIBE

show code

Representation Metrics

Running Logit Tracker

Much like the Running Confusion Matrix, we will also track the logits or pre-softmax model outputs over a run of batched iterations for later analysis.

show code
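One way to sketch such a tracker: flatten each batch's (N, C, H, W) logit tensor to per-pixel rows and keep a random subsample per batch so memory stays bounded. The class name and the subsampling are my own illustration, not necessarily the repository's approach.

```python
import numpy as np

class RunningLogitTracker:
    """Collect per-pixel logits (pre-softmax outputs) across batches,
    subsampling pixels per batch to bound memory."""
    def __init__(self, max_pixels_per_batch=1024, seed=0):
        self.rng = np.random.default_rng(seed)
        self.max_pixels = max_pixels_per_batch
        self.chunks = []

    def update(self, logits):
        # logits: (N, C, H, W) -> rows of shape (C,), one per pixel
        c = logits.shape[1]
        flat = np.moveaxis(np.asarray(logits), 1, -1).reshape(-1, c)
        take = min(self.max_pixels, flat.shape[0])
        idx = self.rng.choice(flat.shape[0], size=take, replace=False)
        self.chunks.append(flat[idx])

    def logits(self):
        return np.concatenate(self.chunks, axis=0)
```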

Run over all Corruptions and Levels

Here we measure the performance of each model for each corruption at each level. This also takes a while, but has some progress saving built in.

show code
Restarting from 14@5
Case 2: Skipping 14@0
Case 2: Skipping 14@1
Case 2: Skipping 14@2
Case 2: Skipping 14@3
Case 2: Skipping 14@4

Run Validation

show code

Dimensionality Analysis

Explained Variance

show code

png
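For reference, the explained-variance computation on a matrix of collected logits reduces to a PCA-style spectrum: center the rows, take singular values, and normalize the squared values into variance fractions. This is a generic sketch, not the repository's exact analysis code.

```python
import numpy as np

def explained_variance(X):
    """Fraction of total variance captured by each principal component
    of the data matrix X (n_samples x n_features)."""
    Xc = X - X.mean(axis=0, keepdims=True)
    # Singular values relate to component variances: var_i ~ s_i ** 2
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s ** 2
    return var / var.sum()
```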

Structural Analysis

show code

png

png

png

show code

png

Qualitative Result Visualizations

Make a list of images from Validation set

show code

Render 100 of them starting at some index

show code

png

Generate Results to Visualize

Here we pick one image from the group above and collect outputs from all models and corruptions at 3 severity levels for visualization. The crop top and crop bottom settings allow trimming the very tall figure.

show code

Visualize the collection

This is the visualization code used to generate a figure in the paper.

show code

png

Performance Comparison Plot

show code

png

Example Videos

Load the video

Here we provide the code used to produce the introduction demo from the presentation video.

show code

Prepare the corrupting transform

show code

Generate the demo video

Here we divide the frames equally amongst the corruptions and process them through SCrIBE and the baseline models.

show code
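Dividing a frame sequence evenly among corruptions amounts to contiguous equal segments, with the last segment absorbing any remainder. A small sketch (hypothetical helper, not the repository's code):

```python
def assign_corruptions(num_frames, corruptions):
    """Split frame indices into equal contiguous segments, one per
    corruption; the final segment absorbs the remainder."""
    seg = num_frames // len(corruptions)
    plan = []
    for i, c in enumerate(corruptions):
        start = i * seg
        end = num_frames if i == len(corruptions) - 1 else (i + 1) * seg
        plan.append((c, start, end))
    return plan
```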

Animate

This takes a while. I am sure there is a faster way…

show code
(270, 480, 3) (270, 480, 3)

png

show code

Thank you for making it to the bottom of this post. I hope you will feel more comfortable reproducing our work. Please feel free to contact me with any questions or comments.

Charlie Lehman
PhD Student

My research interests include robustness and explainability of deep vision models.
