# Deep learning-based single image super-resolution for low-field MR brain images

### Background

Given a low-resolution (LR) 2D image $$\mathbf{y}$$, our goal is to recover its high-resolution (HR) counterpart $$\mathbf{x}$$. The relation between $$\mathbf{x}$$ and $$\mathbf{y}$$ can be modeled as follows:

$$\mathbf{y} = \boldsymbol{\mathscr{F}}_{LR}^{-1}\mathbf{D}\boldsymbol{\mathscr{F}}_{HR}\mathbf{x} + \mathbf{n}, \tag{1}$$

where $$\boldsymbol{\mathscr{F}}_{LR}^{-1}$$ is the inverse FFT operator applied in the LR regime, $$\mathbf{D}$$ is an operator that selects only the low-frequency components in k-space, which in our case yields a matrix of size $$64\times 64$$, $$\boldsymbol{\mathscr{F}}_{HR}$$ is the FFT operator in the HR regime ($$128\times 128$$), and $$\mathbf{n}$$ is an (unknown) noise vector. The goal of super-resolution is to find an approximate inverse of the operator $$\boldsymbol{\mathscr{F}}_{LR}^{-1}\mathbf{D}\boldsymbol{\mathscr{F}}_{HR}$$. Note that the standard super-resolution problem is usually posed differently: the HR image is assumed to undergo blurring and downsampling, resulting in an LR image. However, Eq. (1) follows the low-resolution MRI acquisition process more closely.
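The forward model in Eq. (1) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `forward_model`, the `noise_std` parameter, and the intensity rescaling are assumptions made here. Noise is added in k-space before the inverse FFT, which is equivalent to an image-domain noise term $$\mathbf{n}$$ because the inverse FFT is linear.

```python
import numpy as np

def forward_model(x, lr_size=64, noise_std=0.0, rng=None):
    """Map a square HR image x to an LR image via k-space cropping (Eq. 1)."""
    hr_size = x.shape[0]
    # F_HR: 2D FFT of the HR image, shifted so that DC sits in the center.
    k_hr = np.fft.fftshift(np.fft.fft2(x))
    # D: keep only the central lr_size x lr_size block (low frequencies).
    start = (hr_size - lr_size) // 2
    k_lr = k_hr[start:start + lr_size, start:start + lr_size]
    # Optional complex Gaussian noise in k-space (noise level is an assumption).
    if noise_std > 0:
        rng = np.random.default_rng() if rng is None else rng
        noise = rng.standard_normal(k_lr.shape) + 1j * rng.standard_normal(k_lr.shape)
        k_lr = k_lr + noise_std * noise
    # F_LR^{-1}: inverse FFT on the smaller grid.
    y = np.fft.ifft2(np.fft.ifftshift(k_lr))
    # Assumed normalization: keep image intensities on the same scale as x.
    return y * (lr_size / hr_size) ** 2

hr = np.ones((128, 128))
lr = forward_model(hr)
print(lr.shape)  # (64, 64)
```

The result is complex-valued; taking `np.abs` of it gives a magnitude image.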

### Convolutional Neural Network

We chose a convolutional neural network with the SRDenseNet architecture for our application. This choice was motivated by the good performance of SRDenseNet combined with its manageable number of parameters. Since there is a large body of literature on deep learning-based methods for super-resolution, other networks may also be applicable [24, 26, 27]. SRDenseNet was introduced by Tong et al. [25]. It consists of blocks of densely connected convolutional layers (“dense blocks”). In each dense block, consistent with the DenseNet architecture [33], each convolutional layer receives as input the concatenated outputs of all preceding convolutional layers, as shown in Fig. 1. Reusing feature maps in this way avoids learning redundant features; instead, each layer is forced to learn additional information. As in the original article, we use 8 dense blocks of 8 convolutional layers each, where each convolutional layer produces 16 feature maps, so that each dense block produces 128 feature maps. The kernel size in each convolutional layer is 3×3. After the final dense block, a bottleneck layer with convolutional kernels of size 1×1 reduces the number of feature maps to 256, followed by a transposed convolutional layer (often called a deconvolution layer) that upsamples the image to HR space. In this work the upsampling factor is 2, so we use a single transposed convolutional layer with a stride of 2, as opposed to the 2 transposed convolutional layers in the original SRDenseNet, which was designed for an upsampling factor of 4. Finally, another convolutional layer with a 3×3 kernel reduces the output to a single channel. All layers except the final convolutional layer use a nonlinear Rectified Linear Unit (ReLU) activation function.
Additionally, skip connections feed the output of each dense block to each of the following dense blocks, according to the SRDenseNet_All architecture described in the original article [25]. The complete architecture, which has 1,910,689 trainable parameters, is shown in Fig. 2.
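The feature-map arithmetic in this description can be checked with a short script. This is only a bookkeeping sketch under stated assumptions: the wiring inside a block (first layer reads the block input, later layers read the concatenated outputs of the earlier layers), the hypothetical value `block_in=128`, and the "same"-style padding of the transposed convolution are choices made here for illustration. It does not attempt to reproduce the total of 1,910,689 parameters.

```python
def conv_params(c_in, c_out, kernel=3):
    """Weights plus biases of one 2D convolutional layer."""
    return kernel * kernel * c_in * c_out + c_out

def dense_block_layout(block_in, num_layers=8, growth=16, kernel=3):
    """Return per-layer (input channels, parameters) and the block output channels.

    Assumed wiring: the first layer reads the block input; each later layer
    reads the concatenated outputs of all earlier layers in the block; the
    block output is the concatenation of all layer outputs.
    """
    layers = []
    produced = 0  # feature maps produced so far inside the block
    for i in range(num_layers):
        c_in = block_in if i == 0 else produced
        layers.append((c_in, conv_params(c_in, growth, kernel)))
        produced += growth
    return layers, produced

def upsampled_size(size, stride=2, num_layers=1):
    """Spatial size after transposed convolutions, assuming 'same'-style padding."""
    return size * stride ** num_layers

layers, block_out = dense_block_layout(block_in=128)  # block_in is hypothetical
print(block_out)                         # 128
print([c_in for c_in, _ in layers])      # [128, 16, 32, 48, 64, 80, 96, 112]
print(upsampled_size(64))                # 128
print(upsampled_size(64, num_layers=2))  # 256
```

With 8 layers of 16 feature maps each, the block output has 8 × 16 = 128 feature maps, and a single stride-2 transposed convolution takes a 64 × 64 input to 128 × 128.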

### Data set and training

In this work, we focused on 2D images, but the approach can be extended to 3D. We generated a training and validation set using 2D images obtained from the publicly available NYU fastMRI Initiative database (fastmri.med.nyu.edu) [34, 35]. As such, NYU fastMRI investigators provided data but did not participate in the analysis or writing of this manuscript. A list of NYU fastMRI investigators, subject to updates, can be found at the aforementioned website. The primary goal of fastMRI is to test whether machine learning can aid in the reconstruction of medical images. The database consists of slices of T1-weighted, T2-weighted and FLAIR (fluid-attenuated inversion recovery) images acquired using 1.5 T and 3 T MRI scanners. By training on such a variety of MR brain images, the resulting network should also be applicable to images acquired using different sequences, without the need to retrain the network each time the parameters change. We note that, even if we plan to apply the trained network only to, for example, T1-weighted low-field MR images, it would still be wise to train the network on high-field MR images acquired using different types of sequences, which makes it adaptable to different types of inputs. The reason is that relaxation times vary with field strength, and therefore a T1-weighted image acquired using a low-field scanner may look different from one acquired using a high-field scanner.

One parameter to be careful with, however, is the image size. We use input and output images of size $$64\times 64$$ and $$128\times 128$$, respectively. Due to the purely convolutional nature of the network, it is possible to use images of different sizes as input, and the network should be able to cope with small deviations in size. However, it is unlikely to generalize to images that differ significantly in size from the images in the training set.

The images in the database have different sizes. As we are interested in HR images of $$128\times 128$$ pixels, all images were resized to $$128\times 128$$ pixels. This was done by using an FFT to convert the images to k-space data, selecting the central part of k-space, and then applying an inverse FFT, as in Eq. (1). After that, we downsampled these HR images to LR images of $$64\times 64$$ pixels, again using Eq. (1): we use an FFT to convert the image to k-space, select the central part of k-space (of size $$64\times 64$$), and apply an inverse FFT to obtain an LR image. To obtain noisy LR images, we added complex Gaussian noise in k-space, with the noise level varying from one image to another. We used a range of noise levels consistent with the low-field MR images we have seen in practice. This step is necessary for the convolutional neural network to generalize to images acquired using a low-field MRI scanner, which, due to the weaker magnetic field, produces signals with a relatively low SNR [36]. In this way, 29,059 and 17,292 image pairs were obtained from the training and validation sets provided in the dataset, respectively. We assigned 10,000 of the 17,292 image pairs in the validation set to our own validation set and the remaining 7,292 to our test set. Some examples of image pairs in the training set are shown in Fig. 3. We note that the data were split at the patient level, so no data leakage occurred.
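A patient-level split of the kind mentioned above can be sketched as follows. The record layout (`patient_id`, `slice_id`), the function name `split_by_patient`, and the greedy assignment are hypothetical; the source does not describe how the split was implemented.

```python
import random
from collections import defaultdict

def split_by_patient(records, target_first, seed=0):
    """Split (patient_id, slice_id) records into two subsets without patient overlap.

    Whole patients are assigned to the first subset until it holds at least
    target_first slices; all remaining patients go to the second subset.
    """
    by_patient = defaultdict(list)
    for patient_id, slice_id in records:
        by_patient[patient_id].append(slice_id)

    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)  # reproducible patient order

    first, second = [], []
    for patient_id in patients:
        subset = first if len(first) < target_first else second
        subset.extend((patient_id, s) for s in by_patient[patient_id])
    return first, second

# Toy data: 10 patients with 5 slices each.
records = [(p, s) for p in range(10) for s in range(5)]
val, test = split_by_patient(records, target_first=22)
print(len(val), len(test))  # 25 25
```

Because whole patients are assigned, the first subset can slightly overshoot the requested size, which is the price of keeping every patient's slices together.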

Since SRDenseNet is a purely convolutional neural network, it is possible to train on patches instead of full images, which requires less memory during training. Using patches also allows us to generate more training samples. We therefore used the HR-LR image pairs to create 190,000 patch pairs for network training and 10,000 patch pairs for validation, where the HR patches and their corresponding LR patches have sizes of $$32\times 32$$ pixels and $$16\times 16$$ pixels, respectively.
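Extracting aligned patch pairs could look like the sketch below. The random choice of patch location and the function name `sample_patch_pair` are assumptions; the source does not say how patch positions were selected. The key point is that the HR patch corner is the LR patch corner multiplied by the upsampling factor, so both patches cover the same region.

```python
import numpy as np

def sample_patch_pair(hr, lr, hr_patch=32, scale=2, rng=None):
    """Return one aligned (HR patch, LR patch) pair from an HR-LR image pair."""
    rng = np.random.default_rng() if rng is None else rng
    lr_patch = hr_patch // scale
    # Pick the top-left corner on the LR grid (upper bound is exclusive).
    i = rng.integers(0, lr.shape[0] - lr_patch + 1)
    j = rng.integers(0, lr.shape[1] - lr_patch + 1)
    lr_p = lr[i:i + lr_patch, j:j + lr_patch]
    # The matching HR corner is the LR corner scaled by the upsampling factor.
    hr_p = hr[scale * i:scale * i + hr_patch, scale * j:scale * j + hr_patch]
    return hr_p, lr_p

rng = np.random.default_rng(0)
hr_image = rng.random((128, 128))
lr_image = rng.random((64, 64))  # stand-in for the downsampled image
hr_p, lr_p = sample_patch_pair(hr_image, lr_image, rng=rng)
print(hr_p.shape, lr_p.shape)  # (32, 32) (16, 16)
```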

The convolutional neural network was implemented in TensorFlow [37]. The Adam optimizer [38] with a learning rate of $$10^{-3}$$ was used to minimize the root mean square error loss between the network output and the reference HR image patches. In addition, we investigated two other loss functions: the $$\ell_1$$ loss and the HFEN (high-frequency error norm) loss [39]. However, upon visual inspection of the resulting images, we found that the root mean square error loss outperformed the others. We trained with a batch size of 20 for 74 epochs, since this number of epochs corresponded to the smallest value of the validation loss. Training was performed on a GeForce Titan X GPU (12 GB) and took around 5 hours.
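For reference, two of the loss functions named above can be written in NumPy as follows. This is a plain illustration of the definitions, not the training code: the function names are hypothetical, and averaging the $$\ell_1$$ loss over pixels (rather than summing) is an assumption. The HFEN loss is omitted because it additionally requires a Laplacian-of-Gaussian filter.

```python
import numpy as np

def rmse_loss(pred, target):
    """Root mean square error between prediction and reference."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))

def l1_loss(pred, target):
    """Mean absolute error (l1 loss averaged over pixels)."""
    return float(np.mean(np.abs(pred - target)))

pred = np.array([0.0, 0.0])
target = np.array([3.0, 4.0])
print(round(rmse_loss(pred, target), 4))  # 3.5355
print(l1_loss(pred, target))              # 3.5
```

In this toy example the root mean square error is slightly larger than the $$\ell_1$$ value because squaring weights the larger of the two errors more heavily.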

### Acquisition of low-field MRI images

Two in vivo three-dimensional brain scans of two healthy volunteers were acquired using the low-field MRI scanner described by O’Reilly et al. [9]. We use different (2D) slices of the resulting 3D images as network input. Both experiments were performed using a turbo spin echo sequence. For the first experiment, the following parameters were used: FoV (field of view) $$224\times 224\times 175$$ $$\text{mm}^3$$, voxel size $$1.75 \times 1.75 \times 3.5$$ $$\text{mm}^3$$, $$T_R$$/$$T_E$$ (repetition time/echo time) = 500 ms/20 ms, echo train length 4, acquisition bandwidth 20 kHz, no signal averaging, cylindrical k-space coverage. The second experiment was performed with a different set of parameters: FoV $$180\times 240 \times 180$$ $$\text{mm}^3$$, voxel size $$1.5 \times 1.5 \times 3$$ $$\text{mm}^3$$, $$T_R$$/$$T_E$$ = 400 ms/20 ms, echo train length 5, acquisition bandwidth 20 kHz, no signal averaging. All methods were performed in accordance with the relevant guidelines and regulations.
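As a quick sanity check on these parameters, the number of voxels along each axis follows from dividing the FoV by the voxel size. The snippet below assumes exactly that relation and reads the second triple given for the second experiment as its voxel size; the helper name `matrix_size` is hypothetical.

```python
def matrix_size(fov_mm, voxel_mm):
    """Number of voxels per axis, assuming matrix size = FoV / voxel size."""
    return tuple(round(f / v) for f, v in zip(fov_mm, voxel_mm))

print(matrix_size((224, 224, 175), (1.75, 1.75, 3.5)))  # (128, 128, 50)
print(matrix_size((180, 240, 180), (1.5, 1.5, 3)))      # (120, 160, 60)
```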