Light-weight MobileNet for Fast Detection of COVID-19

Machine learning models based on Convolutional Neural Networks (CNNs) can be used effectively for the detection and recognition of objects, such as the virus that causes Corona Virus Disease 19 (COVID-19). In particular, MobileNet and the Single Shot multi-box Detector (SSD) have recently been proposed as machine learning models for object detection. However, deploying such architectures on embedded devices remains challenging because of their limited computational power. Another problem is that the accuracy of the model may decrease, depending on the number of parameters and layers involved. This paper proposes a light-weight MobileNet (LMN) architecture that improves the accuracy of the model while using fewer layers and less computation time than the existing models. Through experiments, we show that the proposed LMN model can be used effectively for the detection of the COVID-19 virus. By replacing the standard CNN layers with separable convolutional layers, the proposed LMN achieves an accuracy of 98% with a file size of 27.8 Mbits.


INTRODUCTION
Corona Virus Disease 19 (COVID-19) has been a global pandemic since early 2020 [1]. Many methods have been developed to diagnose patients who may be infected with the virus, such as swab tests and rapid tests. With the help of embedded devices, the virus can be detected much faster. Furthermore, a variety of object detection methods can be used to obtain faster and more accurate results. Convolutional Neural Networks (CNNs) have become popular since 2012, and one well-known CNN model is MobileNet [2]. Other CNN architectures have also been proposed, such as SqueezeNet [3] and Inception [4]. To achieve higher accuracy, such models generally need to be deeper and more complex. MobileNet does not use standard convolutions. Instead, it uses depth-wise separable convolutions, which have a small number of parameters, and in this work MobileNet is used as a classifier. For object detection, the Single Shot multi-box Detector (SSD) [5] has been widely used because of the simplicity of its architecture. Therefore, the combination of MobileNet and SSD may be used to improve the accuracy of object detection for COVID-19, as shown by Thin MobileNet [6] with SSD. However, the conventional combination of MobileNet or Thin MobileNet with SSD tends to increase the model size and computation cost, which makes the object detection model less efficient to deploy on embedded devices. Based on these observations, in this paper we propose a Lightweight MobileNet (LMN) architecture that increases the model accuracy and reduces the model size. The proposed LMN model is based on ReLU [7] and 2D separable convolution. These techniques make the model up to four times smaller than the traditional combination of MobileNet and SSD, and the accuracy of object detection does not drop as the network size is reduced.
The proposed LMN model can achieve more than 90% accuracy for object detection of COVID-19 viruses. To address the small size of the dataset available for COVID-19 object detection, we employ transfer learning from the CIFAR-10 [8] dataset to the COVID-19 dataset. The CIFAR-10 dataset consists of ten classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. The CIFAR-10 dataset helps to fill the knowledge domain gap for the LMN model, and the transfer learning from CIFAR-10 yields a highly accurate COVID-19 object detection model. By reducing the inference time for object detection, the proposed LMN model can be used to identify the COVID-19 virus or its variants effectively. The main contributions of this paper can be summarized as follows:
• We propose a small and efficient model that can run on an embedded device. The accuracy of our model is about 89% on the CIFAR-10 dataset with ten classes, which is used to kick-start the training;
• Our model can detect and classify COVID-19 viruses based on cell images. The proposed LMN model can also be used as a pre-trained model for other types of viruses.
In the experiments, we use a dataset with three kinds of virus [9]: 1) COVID-19, 2) Rotavirus, and 3) Influenza. The proposed model can run smoothly on small embedded devices, such as a Raspberry Pi or a mobile phone. It can help medical staff to check samples faster and with higher accuracy, even on small computing devices.
This paper is organized as follows. In the Related Works Section, we discuss the related works, including MobileNet, Thin MobileNet, Transfer Learning, and SSD. The Proposed LMN Architecture Section describes the proposed LMN architecture. The Experimentation And Performance Analysis Section discusses the experiments and compares the performance of the proposed LMN model with the existing models. In the Conclusion Section, we conclude this paper.

RELATED WORKS A. MobileNet
MobileNet [2] is characterized by its use of depth-wise separable convolutions. MobileNet introduces two hyper-parameters: 1) a resolution multiplier that decreases the resolution of the images, and 2) a width multiplier that slims the network. Together with the depth-wise separable convolutions, these two hyper-parameters can reduce the computation cost by eight to nine times, and the model size becomes much smaller. It is noted that the original MobileNet consists of 28 layers. Batch Normalization [10] and ReLU are applied after each convolutional layer. In addition, MobileNet has an Average Pooling layer, a Fully Connected layer, and a Softmax classifier. Table 1 shows the architecture of MobileNet with all layers for classification. For object detection, MobileNet gives good accuracy in combination with SSD: MobileNet acts as the classifier backbone, and SSD serves as the object detection model. The MobileNet architecture consists of 2D convolutional layers and 2D depth-wise separable convolutional layers. Two strides are used: Stride 1 (s1) and Stride 2 (s2). Stride represents the element-wise shift displacement of a kernel over an input along a particular axis [11]. With Stride 1 the filter moves one pixel at a time, and with Stride 2 it moves two pixels at a time. The model uses images with a resolution of 224x224 pixels, which is the only input resolution that the first 2D convolutional layer accepts. The spatial resolution is then reduced layer by layer until it reaches 1x1 at the last layer.

B. Thin MobileNet
Thin MobileNet [6] is a fork of MobileNet. It uses separable convolutions rather than depth-wise separable convolutional layers. It also replaces the ReLU non-linear unit with the Drop-Activation function, and it introduces Random Erasing [12] to increase the accuracy during training. Random Erasing is a data augmentation technique: the system selects a random area of an image, erases the pixels in that area, and substitutes them with random values.
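The Random Erasing step can be illustrated with a minimal sketch. It is a simplification written for this section and is not the implementation used in [6] or [12]: the function name `random_erase`, the `area_fraction` parameter, and the choice of a single roughly square rectangle are all assumptions, and the image is represented as a plain list of rows of pixel intensities.

```python
import random


def random_erase(image, area_fraction=0.2, rng=None):
    """Return (erased_copy, rectangle) for a grayscale image.

    `image` is a list of rows, each a list of intensities in 0..255.
    One rectangle covering roughly `area_fraction` of the image is
    overwritten with random values; the input image is left untouched.
    """
    rng = rng or random.Random()
    height, width = len(image), len(image[0])

    # Assumption: a roughly square region; published variants also
    # randomise the aspect ratio and apply the erasing with a probability.
    erase_h = max(1, round(height * area_fraction ** 0.5))
    erase_w = max(1, round(width * area_fraction ** 0.5))

    # Choose the top-left corner so that the rectangle stays inside the image.
    top = rng.randint(0, height - erase_h)
    left = rng.randint(0, width - erase_w)

    erased = [row[:] for row in image]  # copy so the original is preserved
    for y in range(top, top + erase_h):
        for x in range(left, left + erase_w):
            erased[y][x] = rng.randint(0, 255)
    return erased, (top, left, erase_h, erase_w)
```

Passing a seeded `random.Random` instance as `rng` makes the erased region reproducible, which is convenient when inspecting augmented samples.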
Figure 1 shows the difference between MobileNet and Thin MobileNet, in which Thin MobileNet has a simpler block. Thin MobileNet cannot be combined with SSD networks, because Random Erasing makes it harder for the training to reach high accuracy.
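To make the cost argument for depth-wise separable convolutions in subsection A concrete, the sketch below counts multiply-accumulate operations for one layer, following the cost expressions given in the MobileNet paper [2]. The function names are hypothetical, `alpha` and `rho` stand for the width and resolution multipliers, and the layer shape in the usage note (3x3 kernel, 512 input and output channels, 14x14 feature map) is an illustrative assumption rather than a layer taken from Table 1.

```python
def standard_conv_macs(kernel, in_ch, out_ch, fmap):
    """Multiply-accumulates of a standard convolution on a square feature map."""
    return kernel * kernel * in_ch * out_ch * fmap * fmap


def separable_conv_macs(kernel, in_ch, out_ch, fmap, alpha=1.0, rho=1.0):
    """Multiply-accumulates of a depth-wise separable convolution.

    alpha (width multiplier) thins the channels and rho (resolution
    multiplier) shrinks the feature map before the cost is computed.
    """
    m = int(alpha * in_ch)
    n = int(alpha * out_ch)
    f = int(rho * fmap)
    depthwise = kernel * kernel * m * f * f  # one k x k filter per channel
    pointwise = m * n * f * f                # 1x1 convolution mixing channels
    return depthwise + pointwise
```

For the assumed layer, `standard_conv_macs(3, 512, 512, 14) / separable_conv_macs(3, 512, 512, 14)` is about 8.8, which falls in the eight-to-nine-times range quoted above. Lowering `alpha` or `rho` below 1 reduces the cost further.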

C. Transfer Learning for Visual Categorization
Transfer Learning for Visual Categorization [13] distinguishes three types of knowledge transfer: 1) source domain features; 2) source domain features and the corresponding labels; and 3) parameters of the learned source domain models. For COVID-19 object detection, we use knowledge from the CIFAR-10 dataset as the first step, because the model has no prior knowledge from which to identify the COVID-19 dataset. Transfer learning has two classes of knowledge transfer: 1) instance-based transfer learning, which re-weights training samples from the source domain for use in the target domain; and 2) feature-based transfer learning, which uses a common feature representation to transfer information from the source domain to the target domain.
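The third type of knowledge transfer, reusing the parameters of a learned source model, can be sketched as follows. This is only an illustration under stated assumptions: models are represented as dictionaries that map hypothetical layer names to flat lists of weights, and the layer prefix `classifier` is an assumed name for the task-specific output layer, which is not copied because the source and target tasks have different classes.

```python
def transfer_parameters(source_weights, target_weights, skip_prefixes=("classifier",)):
    """Copy compatible parameters from a source model into a target model.

    Both arguments map layer names to flat lists of weights. Layers whose
    name starts with one of `skip_prefixes` keep the target's own values.
    Returns the new target weights and the names of the layers copied.
    """
    transferred = dict(target_weights)
    copied = []
    for name, value in source_weights.items():
        if name.startswith(tuple(skip_prefixes)):
            continue  # task-specific head: keep the target initialisation
        # Copy only layers that exist in the target with the same size.
        if name in target_weights and len(target_weights[name]) == len(value):
            transferred[name] = list(value)
            copied.append(name)
    return transferred, copied
```

In this sketch a ten-class CIFAR-10 head would be skipped, while the convolutional layers learned on CIFAR-10 would initialise a three-class virus model.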

D. Single Shot multi-box Detector (SSD)
The Single Shot multi-box Detector (SSD) [5] is an object detector that uses a single deep neural network to predict bounding boxes. It achieves good accuracy for object detection on the VOC2007 dataset [14]. SSD improves speed because it eliminates bounding box proposals and the subsequent feature resampling stage. The core role of SSD is to predict category scores and box offsets. Auxiliary structures are added to the network to provide detections with the following key features: 1) multi-scale feature maps for detection; 2) convolutional predictors for detection; and 3) default boxes and aspect ratios. SSD uses Very Deep Convolutional Networks for Large-Scale Image Recognition (VGG-16) [15] as the base feature extractor, as shown in Figure 2.
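The third feature, default boxes and aspect ratios, can be sketched in a few lines. The scale rule and the width and height formulas follow the SSD paper [5], where scales are spaced linearly between a minimum and a maximum and a box with scale s and aspect ratio a has width s·sqrt(a) and height s/sqrt(a). The function names, the reduced set of aspect ratios, and the omission of the extra box for aspect ratio 1 are simplifying assumptions.

```python
import math


def default_box_scales(num_maps, s_min=0.2, s_max=0.9):
    """Linearly spaced box scales, one per feature map (num_maps >= 2)."""
    return [s_min + (s_max - s_min) * k / (num_maps - 1) for k in range(num_maps)]


def default_boxes(fmap_size, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Default boxes (cx, cy, w, h) in relative coordinates for one feature map."""
    boxes = []
    for row in range(fmap_size):
        for col in range(fmap_size):
            # Box centres sit in the middle of each feature map cell.
            cx = (col + 0.5) / fmap_size
            cy = (row + 0.5) / fmap_size
            for ar in aspect_ratios:
                width = scale * math.sqrt(ar)
                height = scale / math.sqrt(ar)
                boxes.append((cx, cy, width, height))
    return boxes
```

A 3x3 feature map with three aspect ratios yields 27 default boxes. The detector then predicts category scores and four offsets for each of them.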

E. MobileNet with SSD
MobileNet with SSD [16] combines object detection and recognition tasks and can be implemented on an autonomous mobile robot with a camera. Figure 3 shows the combination of MobileNet and SSD. This combination helps to increase the accuracy of object detection. MobileNet extracts the features from the input images, which are resized from their original size to 300x300 pixels, and SSD is used as the extra feature layers for object detection. This combination achieves good accuracy, but it still tends to produce a large file. Drop-Activation and Random Erasing cannot run properly on a combination of Thin MobileNet and SSD. Thus, we change the SSD 2D convolution layers into 2D separable convolutions. This modification makes the SSD model much smaller without decreasing the accuracy. We use TensorFlow [17] and the Keras framework [18] to build the LMN model, and we choose Adam [19] as the optimizer.

PROPOSED LMN ARCHITECTURE A. LMN Architecture Overview
In the proposed LMN architecture, the training process consists of three stages, as shown in Figure 4. Stage 1 (LMN without SSD) trains on the dataset for classification. Stage 2 (Transfer Learning) is a bridge between LMN without SSD and LMN with SSD. Stage 3 (LMN with SSD) trains on the dataset for the object detection model. Based on Figure 4, the four steps to produce an object detection model can be summarized as follows.
1) LMN without SSD is trained for classification using the CIFAR-10 dataset.
2) The first transfer learning is divided into two parts:
a) The result is passed to LMN with SSD to produce the VOC object detection model.
b) The result is passed to LMN without SSD to produce the virus classification model.
3) These two models are used in the second transfer learning stage.
4) The resulting model is fed into LMN with SSD to produce the virus object detection model.
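The dependencies between these four steps can be written down as a small graph, which makes explicit that the two intermediate models can be trained in either order but that both must exist before the final detection model is trained. The sketch below assumes Python 3.9 or later for the standard-library `graphlib` module. The stage names are hypothetical labels and do not come from Figure 4.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages whose trained weights it starts from.
# The stage names are hypothetical labels for the four steps listed above.
STAGE_DEPENDENCIES = {
    "cifar10_classification": set(),                       # step 1
    "voc_detection": {"cifar10_classification"},           # step 2a
    "virus_classification": {"cifar10_classification"},    # step 2b
    "virus_detection": {"voc_detection", "virus_classification"},  # steps 3-4
}


def training_order(dependencies=STAGE_DEPENDENCIES):
    """Return one valid order in which the stages can be trained."""
    return list(TopologicalSorter(dependencies).static_order())
```

Any order returned by `training_order()` starts with the CIFAR-10 classification stage and ends with the virus detection stage.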

B. LMN without SSD for Object Classification
The LMN without SSD is used for classification. Table 3 shows the 8 layers of the proposed LMN architecture for classification. In contrast, the existing MobileNet architecture employs a total of 28 layers, as shown in Table 2. Figure 6 shows the training process of LMN without SSD for object classification. For virus classification, we only need to perform a single transfer learning step from the CIFAR-10 training dataset. Through this transfer learning process, the LMN without SSD produces a virus classification model. This LMN model is used again in the LMN with SSD training stage. This training process on LMN without SSD helps to increase the accuracy of the LMN model.
Figure 6. Training process on LMN without SSD

C. LMN with SSD for Object Detection
For object detection, we employ the combination of LMN with SSD. The VOC dataset is used to initialize the training, and the COVID-19 dataset is used to produce a model for object detection of COVID-19 viruses. We use the LMN structure described in the previous section, whereas the SSD network is modified by keeping the first 2D convolution layer and replacing the remaining 2D convolutional layers with 2D separable convolutions. This modification reduces the file size after LMN is combined with the SSD network. Figure 7 shows the combination of LMN with SSD as a streamlined neural network, in which LMN acts as a classifier that speeds up the training process and identifies the object. Its results are fed to the SSD network for the object detection process, which produces the output on the image.
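The effect of this modification on the parameter count can be estimated with a short sketch. The counting assumes 3x3 kernels and a single bias vector per layer, which is how a 2D separable convolution is commonly counted in Keras-style frameworks. The function names and the channel sizes in the usage note are hypothetical and are not taken from Figure 7.

```python
def conv_params(kernel, in_ch, out_ch):
    """Parameters of a standard 2D convolution (weights plus biases)."""
    return kernel * kernel * in_ch * out_ch + out_ch


def separable_conv_params(kernel, in_ch, out_ch):
    """Parameters of a 2D separable convolution with one bias vector."""
    depthwise = kernel * kernel * in_ch   # one k x k filter per input channel
    pointwise = in_ch * out_ch + out_ch   # 1x1 convolution plus biases
    return depthwise + pointwise


def plan_ssd_layers(channel_pairs, kernel=3):
    """Keep the first layer as a standard convolution, make the rest separable.

    `channel_pairs` is a list of (input channels, output channels) tuples.
    Returns a list of (layer type, parameter count) tuples.
    """
    plan = []
    for index, (in_ch, out_ch) in enumerate(channel_pairs):
        if index == 0:
            plan.append(("conv2d", conv_params(kernel, in_ch, out_ch)))
        else:
            plan.append(("separable_conv2d", separable_conv_params(kernel, in_ch, out_ch)))
    return plan
```

For the assumed channel pairs `[(256, 512), (512, 256), (256, 256)]`, the second layer needs 135,936 parameters as a separable convolution, compared with 1,179,904 as a standard convolution.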

EXPERIMENTATION AND PERFORMANCE ANALYSIS A. Experimentations for COVID-19 Virus Classification and Detection
Based on the proposed LMN architecture, we performed two kinds of experiments for COVID-19 viruses: classification and detection. Virus classification is performed after the transfer learning stage, which provides cross-domain knowledge transfer to compensate for the limited dataset. The CIFAR-10 dataset is very helpful for continuing the training by way of transfer learning. After the LMN model has obtained enough knowledge to perform classification, we feed the virus dataset into the LMN network to classify three kinds of viruses: influenza, rotavirus, and COVID-19. We augment each picture with rotations of 90° and 180° and with flipping, and we train with Adam as the optimizer and the default learning rate. Figure 8 shows the virus images that we use as the dataset, in which we choose only three classes to test how well our model identifies the three types of viruses with a small dataset.
Figure 9 shows the object (virus) detection results for the existing MobileNet and the proposed LMN architectures. In the figure, we see that the proposed LMN architecture gives higher accuracy for COVID-19 object detection than the existing MobileNet architecture.
For performance analysis, we first trained the model from scratch, without the transfer learning technique, on the CIFAR-10 dataset, which consists of ten classes with 50,000 training images and 10,000 test images. We trained the model using Adam as the optimizer for 200 epochs of 1,563 steps each. Figure 10 shows the accuracy and loss after the LMN model has been trained. From the figure, we see that the proposed LMN architecture gives an accuracy of 86%, whereas the existing MobileNet architecture provides an accuracy of 84.3%. We can also see that the proposed LMN architecture provides lower losses than the existing MobileNet architecture. It seems that this performance benefit comes from the carefully designed LMN and SSD with a small file size.
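Two details of this setup can be illustrated with a short sketch: the rotation and flip augmentation, and the number of steps per epoch. The helper names are hypothetical, images are represented as lists of rows, and the flip is assumed to be a horizontal mirror. The batch size of 32 in the usage note is also an assumption. It is not stated in the text, but it is consistent with 1,563 steps over 50,000 training images.

```python
import math


def rotate_90(image):
    """Rotate a row-major image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def rotate_180(image):
    """Rotate a row-major image 180 degrees."""
    return [row[::-1] for row in image[::-1]]


def flip_horizontal(image):
    """Mirror a row-major image left to right."""
    return [row[::-1] for row in image]


def augment(image):
    """Return the original image followed by its three augmented variants."""
    return [image, rotate_90(image), rotate_180(image), flip_horizontal(image)]


def steps_per_epoch(num_images, batch_size):
    """Number of batches needed to see every image once."""
    return math.ceil(num_images / batch_size)
```

With the assumed batch size, `steps_per_epoch(50000, 32)` gives 1563, and `augment` turns every virus image into four training samples.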
2) LMN Architectures without SSD using CIFAR-10 Dataset
We now analyze the performance of the proposed LMN without SSD on the COVID-19 dataset. In this section, the LMN architecture is trained on the COVID-19 dataset by using the transfer learning method. Table 6 compares the performance of MobileNet and LMN on the COVID-19 dataset. Note that Thin MobileNet cannot be applied to the COVID-19 dataset, since transfer learning fails with it. Figure 11 shows the accuracy and loss after the LMN with SSD model has been trained. From the table and figure, we see that the proposed LMN architecture gives an accuracy of 98%, whereas the existing MobileNet architecture provides an accuracy of 94%. Overall, the proposed LMN architecture provides higher accuracy and lower losses, with smaller file sizes, than the existing MobileNet architecture.
Figure 11. Validation accuracy and loss for LMN with SSD: (a) validation accuracy, (b) validation loss
3) LMN Architecture with SSD using COVID-19 Dataset
In this section, we analyze the object detection performance of the proposed LMN architecture with SSD networks by using the COVID-19 dataset. It is noted that the transfer learning model has a large impact on the object detection training accuracy.

CONCLUSION
In this paper, we proposed a light-weight MobileNet (LMN) architecture for the detection of COVID-19 viruses, which improves the accuracy of the machine learning model while keeping the file size small. Unlike the existing MobileNet and SSD schemes, the proposed architecture replaces the standard CNN layers with separable convolutional layers. From the experiments, we see that the proposed LMN achieves higher accuracy with a smaller file size and fewer parameters than the existing MobileNet and Thin MobileNet architectures. The file size of LMN with SSD is as small as 6 Mb, with 1,442,004 parameters. The proposed LMN model also supports the transfer learning method, which is useful for training a model on a small dataset. For future study, the proposed LMN model needs to be implemented on real embedded devices.
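As a rough plausibility check on the reported size, the parameter count can be converted into a storage estimate. The sketch assumes that each parameter is stored as a 32-bit float (4 bytes), that "Mb" denotes megabytes, and that file format overhead is negligible. None of these assumptions is stated in the paper, and the function name is hypothetical.

```python
def model_size_megabytes(num_parameters, bytes_per_parameter=4):
    """Approximate weight storage in megabytes (1 MB = 1,000,000 bytes)."""
    return num_parameters * bytes_per_parameter / 1_000_000
```

Under these assumptions, `model_size_megabytes(1_442_004)` is about 5.77, which is in line with the figure of roughly 6 Mb reported for LMN with SSD.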