CapsNets vs ConvNets in Radio Galaxy Morphology classification

Vesna Lukic
Mar 15, 2021

In the previous post we looked at classifying sources from the Radio Galaxy Zoo into compact sources and different classes of extended sources based on the number of components.

We found that the high overall classification metrics were largely a result of most sources falling in the simplest morphology class of compact/single-extended sources, and that the metrics became progressively poorer as the number of components in the radio source increased.

A drawback of conventional convolutional neural networks is the use of pooling. Although it reduces the number of parameters in the network and makes it faster and easier to train, it results in some information loss, as the way the features relate to each other globally tends to degrade.

This effect is expected to have a greater impact on sources with a larger number of components.

One can imagine that if we took the top right part of the FRII-type source shown on the left, rotated it and put it back, the morphology is no longer that of an FRII; it is perhaps a hybrid or two independent sources.

However, a conventional convolutional network might be fooled into thinking the source is still an FRII. This is the type of error that can be introduced through the pooling operation.
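The effect can be illustrated with a toy example. The sketch below is not taken from the paper: it applies a hypothetical non-overlapping 2x2 max-pooling helper (`max_pool_2x2`) to two small made-up arrays whose bright pixels sit in different positions within each pooling window, and shows that the pooled outputs are identical.

```python
import numpy as np

def max_pool_2x2(image):
    """Non-overlapping 2x2 max pooling on a 2D array with even side lengths."""
    h, w = image.shape
    # Split the image into (h/2, w/2) blocks of 2x2 pixels, then take each block's maximum.
    blocks = image.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

# Two toy "feature maps" with the same bright pixels arranged differently.
a = np.array([
    [9, 0, 0, 0],
    [0, 0, 0, 7],
    [0, 0, 0, 0],
    [0, 5, 0, 0],
])
b = np.array([
    [0, 0, 0, 7],
    [0, 9, 0, 0],
    [0, 5, 0, 0],
    [0, 0, 0, 0],
])

pooled_a = max_pool_2x2(a)
pooled_b = max_pool_2x2(b)

print(pooled_a)
print(np.array_equal(pooled_a, pooled_b))  # the spatial arrangement inside each window is lost
```

After pooling, the network can no longer tell the two arrangements apart, which is the kind of positional information that matters for multi-component sources.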

A way to address this issue is to use Capsule networks, first introduced by Sabour et al. 2017.

They were designed to preserve the hierarchical relationships between features in images. This is achieved through the use of routing, where if the features detected by higher-level capsules agree with those of lower-level capsules, the connection between them is strengthened, whereas if they don't agree the connection is weakened.
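The routing idea can be sketched in a few lines of NumPy. This is a minimal illustration of routing-by-agreement as described by Sabour et al. 2017, not the code used in our experiments; the array `u_hat` (the predictions each lower-level capsule makes for each higher-level capsule), the toy values and the function names are assumptions made for the example.

```python
import numpy as np

def squash(s, eps=1e-9):
    """Shrink vectors so their length lies in [0, 1) while keeping their direction."""
    sq_norm = np.sum(s ** 2, axis=-1, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def route(u_hat, n_iters=3):
    """Routing-by-agreement. u_hat has shape (n_lower, n_higher, dim)."""
    n_lower, n_higher, _ = u_hat.shape
    logits = np.zeros((n_lower, n_higher))
    for _ in range(n_iters):
        # Coupling coefficients: softmax over the higher-level capsules.
        exp = np.exp(logits - logits.max(axis=1, keepdims=True))
        coupling = exp / exp.sum(axis=1, keepdims=True)
        # Weighted sum of predictions, then squash to get higher-level outputs.
        s = np.einsum("ij,ijd->jd", coupling, u_hat)
        v = squash(s)
        # Strengthen connections whose prediction agrees with the output.
        logits = logits + np.einsum("ijd,jd->ij", u_hat, v)
    return v, coupling

# Toy predictions: all three lower capsules agree about higher capsule 0,
# but disagree about higher capsule 1.
u_hat = np.array([
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.0], [-1.0, 0.0]],
    [[1.0, 0.0], [0.0, 1.0]],
])
v, coupling = route(u_hat)
print(coupling)                   # most weight is routed to higher capsule 0
print(np.linalg.norm(v, axis=1))  # capsule 0 ends up with the longer output vector
```

In this toy case the connections to the capsule that receives consistent predictions are strengthened over the iterations, while the connections to the capsule with conflicting predictions are weakened.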

Capsule networks attempt to extract all the possible variations that can be made to an image, such that the label stays the same. The example to the right shows variations such as changes in scale, thickness and skew, explored for particular digits in the MNIST dataset.

To compare the two approaches on radio data, in Lukic et al. 2019 we used images from the LOFAR Two-metre Sky Survey (LoTSS).

Shown to the left is the LOFAR core, found in the Netherlands.

Image credit: "The LOFAR core in the Netherlands" by onsalarymdobservatorium is licensed under CC BY-NC-SA 2.0.

We additionally had cross-identification information of the LOFAR sources with their corresponding optical counterparts. This can help to see which radio components belong to which radio sources.

Examples of radio sources cross-identified with their optical counterparts. Image credit: Williams et al. 2018

We had just under 3000 images that had been classified into Unresolved, FRI and FRII type sources, both in the original image format (FITS) and as a 'cleaned' dataset, where emission below 4 times the rms noise (4rms), as well as unassociated emission, had been removed.

Showing the original FITS images to the left, as well as the corresponding 4rms images to the right
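The thresholding part of the cleaning step can be sketched as follows. This is an illustrative assumption rather than the pipeline used for the paper: the helper `clip_below_threshold` is hypothetical, the rms value is taken as given, clipped pixels are set to zero, and the removal of unassociated emission (which relies on the cross-identification information) is not covered.

```python
import numpy as np

def clip_below_threshold(image, rms, n_sigma=4.0, fill_value=0.0):
    """Return a copy of `image` with pixels below n_sigma * rms set to fill_value."""
    image = np.asarray(image, dtype=float)
    return np.where(image >= n_sigma * rms, image, fill_value)

# Toy 3x3 cutout with an assumed rms noise of 0.2, giving a threshold of 0.8.
toy = np.array([
    [0.1, 0.3, 0.2],
    [0.2, 5.0, 0.9],
    [0.1, 0.7, 0.2],
])
cleaned = clip_below_threshold(toy, rms=0.2)
print(cleaned)  # only the pixels at 5.0 and 0.9 survive
```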

More images were generated with the use of image augmentation, such that we had just under 16,000 images at our disposal in total.
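As a rough illustration of what augmentation can look like, the sketch below generates rotated and flipped copies of an image. The choice of 90-degree rotations and mirror flips is an assumption made for the example, picked because such transformations do not change the morphology class of a source; it is not a description of the exact transformations applied in the paper.

```python
import numpy as np

def augment(image):
    """Return the 8 rotated/flipped variants of a 2D image (including the original)."""
    variants = []
    for k in range(4):
        rotated = np.rot90(image, k)          # rotate by k * 90 degrees
        variants.append(rotated)
        variants.append(np.fliplr(rotated))   # mirror image of the rotated copy
    return variants

toy = np.arange(9).reshape(3, 3)
augmented = augment(toy)
print(len(augmented))  # 8 variants, each carrying the same label as the original
```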

The labels were generated automatically by looking at the ratio of the distance from the core of the radio galaxy to the brightest part, compared to the extent of the emission.

FRII image example. Credit: Beatriz Mingo and the LOFAR team

If the distance from the core to the brightest part was less than half of the extent of the emission, the source was labelled an FRI; otherwise it was labelled an FRII. By this rule, the example to the left was classified as an FRII.
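Written as code, the labelling rule looks roughly like this. The function and argument names are hypothetical, and the sketch assumes both quantities are measured in the same units for a source already known to be extended (unresolved sources are not handled by this rule).

```python
def fr_label(core_to_peak_distance, emission_extent):
    """Label an extended source as FRI or FRII from the core-to-peak distance ratio."""
    if emission_extent <= 0:
        raise ValueError("emission_extent must be positive")
    ratio = core_to_peak_distance / emission_extent
    # Brightest part closer to the core than half the extent: FRI, otherwise FRII.
    return "FRI" if ratio < 0.5 else "FRII"

print(fr_label(2.0, 10.0))  # brightest region near the core
print(fr_label(8.0, 10.0))  # brightest region near the edge
```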

We experimented with different architectures of Convolutional as well as Capsule networks, using both the original FITS images and the 4rms ones, with approximately 80% of the images used for training and 20% for validation and testing.

Shown to the right are the two Convolutional architectures explored, having 4 and 8 convolutional layers respectively. Both had two dense layers after the final pooling layer.

We also explored different architectures of Capsule Networks: a default one (shown below), one where the complexity of the decoder and the weight of the loss function were increased, and one where the number of filters and the stride were increased.

Default CapsNet architecture shown, with an example input image and features detected

The source (represented in the input image) in the figure above is most likely an FRI with a bit of unassociated emission to the top right. If we look at some of the features detected by the PrimaryCaps layer, we can see that the unassociated emission tends to be preserved in CapsNet.

We found that all the ConvNet architectures outperformed the CapsNet ones (>93% compared to ~90% in classification metrics, respectively). We also applied transfer learning, which involved the use of a network pre-trained on a problem with many classes and many thousands of images per class, and found that the performance was not significantly better than that of the ConvNets trained from scratch.

Both the CapsNet and ConvNet architectures achieved optimal results when the 4rms 'cleaned' dataset was used, meaning that neither architecture could be made completely robust to noise and unassociated radio emission.

It is instructive to look at the image reconstructions made by the CapsNet decoder, where it attempts to reconstruct the original image using the encoded information, based on what it has learnt.

The first and third rows show the original images; the second and fourth rows show the corresponding reconstructions. We can see that CapsNet has failed to identify individual components of radio sources; this is particularly evident in the FRII examples. The CapsNet makes its classification based on the fuzziness of the reconstructed sphere in the centre, where unresolved (compact) sources tend to be the most point-like reconstructions, FRIIs tend to be the most diffuse, and FRIs are somewhere in between.

The classification metrics still remain relatively high (~90%) despite the inaccurate reconstructions.

If we also look at the features detected by the ConvNet-8 for the same input image that we showed for the CapsNet, we can see that the unassociated emission tends to be filtered out by the fourth convolutional layer.

Overall, it appears that the CapsNet architectures do not cope with unassociated features and noise as successfully as the ConvNet architectures. It is likely that a higher number of original training images is necessary for CapsNet to learn to correctly characterise the individual components of emission belonging to radio sources, as opposed to noise.

Additionally, the pooling operation may be advantageous for the dataset at hand, as it appears to remove unassociated emission and noise, and may also provide more degrees of freedom for the morphology differences that occur within radio galaxy classes.

In the next blog post, I will cover source-finding with Convolutional networks.
