OBJECT DETECTION ON SIIM COVID DATASET

kaushal talapady
Aug 22, 2021

COVID-19 has ravaged the world for the past two years, causing many deaths worldwide along with billions of dollars in financial damage.

The present testing mechanism for COVID, the polymerase chain reaction (PCR) test, is quite effective but has two major drawbacks: the time taken to obtain a result and its cost.

So SIIM organized a Kaggle competition to improve the testing process using X-rays. You can find more information at the link below.

https://www.kaggle.com/c/siim-covid19-detection

PROBLEM STATEMENT

The problem we need to solve is to detect whether there is a COVID infection (each study falls into one of 4 classes) and also to locate the infection in the X-ray images.

In this blog we are going to focus on locating the infection.

We can treat this as an object detection problem, since we want to locate a particular pattern in the image.

Data Exploration

The dataset we are given has roughly 6000 images across 4 classes, all in DICOM format, which we need to convert to either PNG or JPG. I have chosen to convert them to JPG.

import numpy as np
import pydicom
from pydicom.pixel_data_handlers.util import apply_voi_lut


def read_xray(path, voi_lut=True, fix_monochrome=True):
    # Original from: https://www.kaggle.com/raddar/convert-dicom-to-np-array-the-correct-way
    # dcmread is the current name for the older pydicom.read_file
    dicom = pydicom.dcmread(path)

    # VOI LUT (if available by DICOM device) is used to transform raw DICOM data to
    # "human-friendly" view
    if voi_lut:
        data = apply_voi_lut(dicom.pixel_array, dicom)
    else:
        data = dicom.pixel_array

    # depending on this value, X-ray may look inverted - fix that:
    if fix_monochrome and dicom.PhotometricInterpretation == "MONOCHROME1":
        data = np.amax(data) - data

    # rescale to the 0-255 range and convert to 8-bit
    data = data - np.min(data)
    data = data / np.max(data)
    data = (data * 255).astype(np.uint8)

    return data

The above code processes the DICOM image so that it can then be saved as a .jpg file.

Along with the images, the dataset also has two CSV files that give information about the images.

The first CSV file holds the study-level data: the class of each study, as shown in the image below.

The second is the image-level CSV, which contains information about the objects (bounding boxes) and their types.

Machine Learning Problem

The Kaggle problem involves both classification and object detection. My approach is to use EfficientNet for classification, combined with different object detection algorithms.

Models Used

YOLO V5

YOLO is one of the most popular object detection algorithms because of its simplicity: the architecture is a single neural network that predicts both the bounding boxes and the classes, and the entire image is fed to the network at once instead of patch by patch. Together these make YOLO fast while remaining accurate.

The above diagram shows the architecture of YOLO with an input image size of 256.

For this project I have used the YOLOv5 model from Ultralytics; you can find it at the link below: https://github.com/ultralytics/yolov5

The first step in training the YOLO model is to convert the bbox annotations to the YOLO format. A YOLO box is written as X, Y, W, H, where X and Y are the coordinates of the box center and W and H are the box width and height. Another thing to remember is that all the values are normalized: X and W are divided by the image width, and Y and H by the image height, so all four lie in the range [0, 1]. This is the primary difference from the COCO format, which stores the top-left corner plus the width and height in absolute pixels.
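
As a rough sketch of this conversion, assuming the boxes start out as pixel coordinates [x_min, y_min, x_max, y_max] (the function names here are illustrative and are not the helpers used in my notebook):

```python
def xyxy_to_yolo(box, img_w, img_h):
    """Convert a pixel box [x_min, y_min, x_max, y_max] to YOLO format.

    Returns [x_center, y_center, width, height], each normalized by the image size.
    """
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return [x_center, y_center, width, height]


def yolo_to_xyxy(box, img_w, img_h):
    """Inverse conversion: normalized YOLO box back to pixel corners."""
    x_center, y_center, width, height = box
    x_min = (x_center - width / 2) * img_w
    y_min = (y_center - height / 2) * img_h
    x_max = (x_center + width / 2) * img_w
    y_max = (y_center + height / 2) * img_h
    return [x_min, y_min, x_max, y_max]


print(xyxy_to_yolo([100, 200, 300, 400], 1000, 800))  # [0.2, 0.375, 0.2, 0.25]
```

The inverse function is handy later, when the model's predictions have to be turned back into pixel boxes for a submission.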

Once the conversion to the YOLO format is done, we split the data into train and test sets and create the directories in the layout YOLOv5 expects.

import os
from shutil import copyfile

import pandas as pd
import yaml
from tqdm import tqdm

# kf (presumably a scikit-learn KFold), train_csv_org, new_meta and the bbox
# helpers get_bbox, scale_bbox and get_yolo_format_bbox are defined elsewhere
# in the notebook.
i = 0
for train_split, test_split in kf.split(train_csv_org):
    # Only fold 1 is trained here; the other folds are skipped.
    if i != 1:
        i = i + 1
        continue
    # .copy() avoids pandas' SettingWithCopyWarning when adding the split column
    train = train_csv_org.iloc[train_split].copy()
    test = train_csv_org.iloc[test_split].copy()
    train['split'] = 'train'
    test['split'] = 'test'
    train_csv = pd.concat([train, test], ignore_index=True)
    joined_meta = pd.merge(train_csv, new_meta, on='id')

    # Create the per-fold directory layout and fetch YOLOv5
    fold = 'fold_' + str(i)
    os.mkdir(fold)
    os.chdir(fold)
    for sub in ('images/train', 'images/valid', 'labels/train', 'labels/valid'):
        os.makedirs('data/' + sub)
    os.system('git clone https://github.com/ultralytics/yolov5')
    os.chdir('yolov5')
    os.system('pip install -qr requirements.txt')
    os.chdir('..')

    # Copy each image into the train or valid folder
    print(os.getcwd())
    for k in range(len(train_csv)):
        row = train_csv.iloc[k]
        if row['split'] == 'train':
            copyfile('../train/' + row['id'] + '.jpg', 'data/images/train/' + row['id'] + '.jpg')
        else:
            copyfile('../train/' + row['id'] + '.jpg', 'data/images/valid/' + row['id'] + '.jpg')

    # Dataset description read by train.py
    data_yaml = dict(
        train='../data/images/train',
        val='../data/images/valid',
        nc=2,
        names=['none', 'opacity']
    )
    print(os.getcwd())
    with open('yolov5/data/data.yaml', 'w') as outfile:
        yaml.dump(data_yaml, outfile, default_flow_style=True)

    # Write one YOLO label file per image that contains opacities
    IMG_SIZE = 680
    for j in tqdm(range(len(joined_meta))):
        row = joined_meta.loc[j]
        if row.split == 'train':
            file_name = f'data/labels/train/{row.id}.txt'
        else:
            file_name = f'data/labels/valid/{row.id}.txt'

        if row.label.split()[0] == 'opacity':
            # Get bboxes
            bboxes = get_bbox(row)
            # Scale bounding boxes
            scale_bboxes = scale_bbox(row, bboxes)
            # Format for YOLOv5
            yolo_bboxes = get_yolo_format_bbox(IMG_SIZE, IMG_SIZE, scale_bboxes)
            with open(file_name, 'w') as f:
                for bbox in yolo_bboxes:
                    bbox = [1] + bbox  # class index 1 = 'opacity'
                    bbox = [str(v) for v in bbox]
                    f.write(' '.join(bbox))
                    f.write('\n')

    # Train from inside the yolov5 folder (the "!" line is notebook shell syntax)
    os.chdir('yolov5')
    print(os.getcwd())
    !python train.py --img 704 \
        --batch 20 \
        --epochs 20 \
        --data data.yaml \
        --weights yolov5s.pt \
        --save_period 1 \
        --project kaggle
    i = i + 1
    os.chdir('..')
    os.chdir('..')

The above code shows the entire process: creating the train and test image directories with their annotations, and training with train.py from the YOLOv5 repository.

Faster R-CNN using Detectron2

Detectron2 is a framework from Facebook AI Research that provides various models, such as the R-CNN family, for image-related problems like object detection and segmentation.

We can select the required model from the Detectron2 model zoo, where the models come pretrained on the COCO dataset.

I have used Faster R-CNN from the framework for the object detection task in the Kaggle problem.

I will discuss the implementation of the model, but first I would like to give a brief introduction to the R-CNN models.

R-CNN was one of the earliest CNN-based models to have great success in object detection.

The R-CNN model has 2 stages: first, candidate regions are generated using the selective search algorithm; each region is then passed through a CNN, whose features are used to predict the object class and the bbox coordinates.

Fast R-CNN is an upgrade on the previous one: instead of passing each region proposal through the CNN separately, we pass the whole image through the CNN once and extract the proposed regions from the resulting feature map.

Finally, Faster R-CNN improves on Fast R-CNN by replacing the selective search algorithm with a region proposal network.

from detectron2.structures import BoxMode
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.data import DatasetCatalog, MetadataCatalog
from detectron2.engine import DefaultPredictor, DefaultTrainer, launch
from detectron2.evaluation import COCOEvaluator
from detectron2.utils.visualizer import ColorMode
from detectron2.utils.logger import setup_logger
from detectron2.utils.visualizer import Visualizer

The first step in building the R-CNN with the Detectron2 framework is to import the classes required to load and train it. For example, model_zoo is used to load the model and DatasetCatalog is used to register the training data.

After importing the required classes, we need to set up the training data. This is done by generating a list of dictionaries, one per image, containing information such as the image file name, the objects and their bbox parameters.

This is shown in the code below.

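def get_dict(xyz):
    # xyz is unused; rows are read from the global `train` DataFrame.
    # get_size is defined elsewhere and presumably returns the original
    # image width and height.
    resized_height = 670
    resized_width = 670
    dataset_dicts = []
    for i in range(len(train)):
        row = train.iloc[i]
        image_id = row['id'].split('_')[0]
        record = {}
        record['file_name'] = 'train/' + image_id + '.jpg'
        record['image_id'] = image_id
        x, y = get_size(image_id)
        record['width'] = resized_width
        record['height'] = resized_height
        # Ratios that map original pixel coordinates onto the resized image
        h_ratio = resized_height / y
        w_ratio = resized_width / x
        # Reset per image so images without opacities do not inherit
        # boxes from the previous row
        objs = []
        if row['label'].split()[0] == 'opacity':
            # Each chunk after 'opacity' holds: confidence, x_min, y_min, x_max, y_max
            for j in row['label'].split('opacity'):
                l = j.split()
                if len(l) > 1:
                    l = l[1:]  # drop the confidence value
                    l = [float(k) for k in l]
                    l[0] = l[0] * w_ratio
                    l[2] = l[2] * w_ratio
                    l[1] = l[1] * h_ratio
                    l[3] = l[3] * h_ratio
                    obj = {
                        "bbox": l,
                        "bbox_mode": BoxMode.XYXY_ABS,
                        "category_id": 0,
                    }
                    objs.append(obj)
        record["annotations"] = objs
        dataset_dicts.append(record)
    return dataset_dicts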
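
To make the label parsing in the function above easier to follow, here is a small standalone sketch. It assumes each image-level label is a string of repeated "opacity <confidence> x_min y_min x_max y_max" groups, and that images without boxes carry a label starting with "none". parse_opacity_boxes is a hypothetical helper of my own, not part of Detectron2.

```python
def parse_opacity_boxes(label, orig_w, orig_h, new_w, new_h):
    """Parse an image-level label string and rescale its boxes.

    Returns a list of [x_min, y_min, x_max, y_max] boxes in the coordinates
    of the resized (new_w x new_h) image.
    """
    tokens = label.split()
    w_ratio = new_w / orig_w
    h_ratio = new_h / orig_h
    boxes = []
    # Each detection takes six tokens: class name, confidence, four coordinates
    for start in range(0, len(tokens) - 5, 6):
        if tokens[start] != "opacity":
            continue
        x_min, y_min, x_max, y_max = (float(t) for t in tokens[start + 2:start + 6])
        boxes.append([x_min * w_ratio, y_min * h_ratio,
                      x_max * w_ratio, y_max * h_ratio])
    return boxes


label = "opacity 1 100 200 300 400 opacity 1 500 600 700 800"
print(parse_opacity_boxes(label, orig_w=1000, orig_h=2000, new_w=500, new_h=500))
```

Because the boxes come out as absolute corner coordinates, they match the BoxMode.XYXY_ABS mode used in the annotation dictionaries.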

After generating the list of dicts, the dataset is set up. The model is then trained after loading the config file and weights of the chosen model, as shown in the code below.

cfg = get_cfg()
config_name = "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"
#config_name = "COCO-Detection/faster_rcnn_X_101_32x8d_FPN_3x.yaml"
#config_name = "COCO-Detection/faster_rcnn_R_101_C4_3x.yaml"

cfg.merge_from_file(model_zoo.get_config_file(config_name))

cfg.DATASETS.TRAIN = ("train",)  # trailing comma makes this a tuple, not a string
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 2
#cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(config_name)
cfg.MODEL.WEIGHTS="../input/1siim-covid19-detectron2-weights/output/model_final.pth"


cfg.SOLVER.IMS_PER_BATCH = 4
cfg.SOLVER.BASE_LR = 0.00025

cfg.SOLVER.WARMUP_ITERS = 1000
cfg.SOLVER.MAX_ITER = 1000 #adjust up if val mAP is still rising, adjust down if overfit
#cfg.SOLVER.STEPS = (100, 500) # must be less than MAX_ITER
#cfg.SOLVER.GAMMA = 0.05


cfg.SOLVER.CHECKPOINT_PERIOD = 100000 # Small value=Frequent save need a lot of storage.
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1


os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)


#Training using custom trainer defined above
#trainer = AugTrainer(cfg)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()

EfficientDet

EfficientDet is an object detection model created by Google Brain, which outperformed YOLOv3 at its initial release.

EfficientDet uses EfficientNet as the backbone of the model, along with a special type of feature pyramid network called BiFPN.

The major innovations in the efficientdet are

  1. BiFPN
  2. A new compound scaling technique

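A feature pyramid network (FPN) is an important technique for detecting objects at different scales in computer vision. BiFPN improves on it further by letting features flow both top-down and bottom-up and by learning a weight for each input when fusing them.

The above diagram shows different feature network designs, such as PANet, NAS-FPN and BiFPN.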
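
One idea behind BiFPN is that the feature maps being merged should not all count equally, so each input gets a learnable weight. Below is a minimal pure-Python sketch of the "fast normalized fusion" rule described in the EfficientDet paper, applied to flat lists instead of real feature maps. The function name and the toy inputs are my own; the real implementation lives inside the effdet library and works on tensors.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse equally sized feature lists with one scalar weight per input.

    Each weight is passed through ReLU so it cannot be negative, then the
    weighted sum is divided by the sum of the weights plus a small eps.
    """
    # ReLU keeps every weight non-negative
    relu_weights = [max(0.0, w) for w in weights]
    total = sum(relu_weights) + eps
    length = len(features[0])
    return [
        sum(w * feat[k] for w, feat in zip(relu_weights, features)) / total
        for k in range(length)
    ]


# Two toy "feature maps" of the same size, weighted equally
fused = fast_normalized_fusion([[1.0, 2.0], [3.0, 4.0]], weights=[1.0, 1.0])
print(fused)
```

With equal weights the result is close to a plain average; during training the network can shift the weights so that the more useful input dominates.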

The architecture of the entire network is shown below

The first part is the backbone, followed by the BiFPN and then the prediction part (class and box).

The implementation is done in PyTorch; snippets of it are shown below.

import torch
from effdet import get_efficientdet_config, EfficientDet, DetBenchTrain
from effdet.efficientdet import HeadNet

device = 'cuda'


def get_net():
    config = get_efficientdet_config('tf_efficientdet_d6')

    # img_size is set elsewhere in the notebook
    config.image_size = [img_size, img_size]
    config.norm_kwargs = dict(eps=.001, momentum=.01)

    # Build the network and load the pretrained weights
    net = EfficientDet(config, pretrained_backbone=False)
    checkpoint = torch.load('/kaggle/input/efficientdet-init-weights/efficientdet_d6-51cb0132.pth')
    net.load_state_dict(checkpoint)

    # we have only one class - opacity
    net.reset_head(num_classes=1)
    net.class_net = HeadNet(config, num_outputs=config.num_classes)

    return DetBenchTrain(net, config)

Cascade R-CNN using MMDetection

Just like Detectron2, MMDetection is a framework that offers a model zoo from which we can choose different models.

More information is available in the MMDetection GitHub repository.

The process of training an MMDetection model can be split into the following steps

  1. Select the config file of the required model from the model zoo and load the weights
  2. Set up the data configuration for training, such as whether the data is in COCO, Pascal VOC or some other format
  3. Give the paths of the annotation files for both train and test
  4. Set up training parameters such as the learning rate and number of epochs
  5. Train the model

Now that I have briefly discussed MMDetection, we will move on to Cascade R-CNN.

The major innovation of Cascade R-CNN is this: instead of iteratively applying the same head at inference to refine the bbox, as was done previously, it cascades multiple prediction heads one after the other, each trained for a specific IoU threshold.
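
To make the IoU threshold idea concrete, here is a small sketch that computes the IoU of two boxes and lists which cascade stages would count a proposal as a positive training example. The thresholds 0.5, 0.6 and 0.7 are the ones used in the Cascade R-CNN paper and are assumed here; the function names are illustrative only and are not MMDetection APIs.

```python
def iou(box_a, box_b):
    """Intersection over union of two [x_min, y_min, x_max, y_max] boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero when the boxes are disjoint
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Assumed stage thresholds, taken from the Cascade R-CNN paper
CASCADE_IOU_THRESHOLDS = (0.5, 0.6, 0.7)


def positive_stages(proposal, ground_truth, thresholds=CASCADE_IOU_THRESHOLDS):
    """Return the stage thresholds at which the proposal counts as a positive."""
    overlap = iou(proposal, ground_truth)
    return [t for t in thresholds if overlap >= t]


# IoU here is 80 / 120, about 0.67, so only the first two stages accept it
print(positive_stages([0, 0, 10, 10], [2, 0, 12, 10]))  # [0.5, 0.6]
```

Each later stage sees boxes that the previous stage has already refined, which is why it can afford a stricter threshold.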

From the experiments we can clearly see that its mAP is much better than that of Faster R-CNN.

The first step in implementing Cascade R-CNN is to load the model's config file and weights and to define the object classes.

from mmcv import Config
cfg = Config.fromfile('mmdetection/configs/cascade_rcnn/cascade_rcnn_x101_64x4d_fpn_20e_coco.py')
cfg.dataset_type = 'CocoDataset'
cfg.classes = ("Covid_Abnormality",)
for head in cfg.model.roi_head.bbox_head:
    head.num_classes = 1

Then set the paths of the train, validation and test images and annotations.

import os

job_folder = 'working_dir'
os.makedirs(job_folder, exist_ok=True)
cfg.dataset_type = 'CocoDataset'
cfg.data.train.type = 'CocoDataset'
cfg.data.train.img_prefix = 'train'
cfg.data.train.classes = cfg.classes
cfg.data.train.ann_file = '../input/covid-json-dataset/json/coco_768x768/train_annotations_fold0_768x768.json'
cfg.work_dir = job_folder
cfg.data.val.img_prefix = 'train' # Prefix of image path
cfg.data.val.classes = cfg.classes
cfg.data.val.ann_file = f'../input/covid-json-dataset/json/coco_768x768/val_annotations_fold0_768x768.json'
cfg.data.val.type='CocoDataset'

cfg.data.test.img_prefix = 'train' # Prefix of image path
cfg.data.test.classes = cfg.classes
cfg.data.test.ann_file = f'../input/covid-json-dataset/json/coco_768x768/val_annotations_fold0_768x768.json'
cfg.data.test.type='CocoDataset'

Set up the training parameters

cfg.runner.max_epochs = 12
cfg.total_epochs = 12
cfg.optimizer.lr = 0.02 / 8
cfg.lr_config.warmup = None
cfg.log_config.interval = 600

Then build the model, initialize its weights and train it

from mmdet.apis import train_detector
from mmdet.datasets import build_dataset
from mmdet.models import build_detector

model = build_detector(cfg.model)
datasets = [build_dataset(cfg.data.train)]
model.init_weights()
cfg.checkpoint_config.interval = 1
cfg.data.samples_per_gpu = 4  # Batch size per GPU
cfg.data.workers_per_gpu = 2  # Data-loading workers per GPU
train_detector(model, datasets[0], cfg, distributed=False, validate=True)

Modeling process

For the classification task of the Kaggle problem I have used EfficientNet (effnet), since it gives strong accuracy on many image classification tasks.

The effnet is used alongside each of the object detection models I have discussed above.

The first model I submitted was EfficientNet with YOLOv5, which produced a score of 0.598; after some experimentation the score improved to 0.601.

Second, I tried Faster R-CNN along with effnet, but the score was only around 0.562.

Next I tried EfficientDet, which did improve on that score but was still well below YOLO.

Finally I used Cascade R-CNN with effnet, with which I achieved my best score of 0.614.

This score is in the top 30% of participants at the end of the competition.

The models and scores are given above

Future improvements

The scores could be improved further by applying image transforms such as histogram equalization or CLAHE.
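
As an illustration of what histogram equalization does, here is a minimal pure-Python sketch on a flat list of 8-bit gray values. It is only meant to show the idea; in practice a library routine such as OpenCV's cv2.equalizeHist or cv2.createCLAHE would be the natural choice, and I have not measured the effect of either on the score.

```python
def equalize_histogram(pixels, levels=256):
    """Spread the gray values of a flat list of ints in [0, levels) over the full range."""
    if not pixels:
        return []

    # Count how often each gray value occurs
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution: cdf[v] = number of pixels with value <= v
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    cdf_min = min(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:
        # Constant image: there is nothing to spread out
        return list(pixels)

    # Map each value so that the cumulative distribution becomes roughly linear
    scale = (levels - 1) / (total - cdf_min)
    return [round((cdf[p] - cdf_min) * scale) for p in pixels]


# A low-contrast strip of pixels is stretched over the whole 0-255 range
print(equalize_histogram([10, 10, 20, 30, 40]))  # [0, 0, 85, 170, 255]
```

CLAHE applies the same idea tile by tile and clips the histogram first, which limits how much noise gets amplified in flat regions of the X-ray.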

We can also try pretraining the model on similar data to improve performance.

The training files are available at the GitHub link:

https://github.com/kaushaltalapady/SIIM-covid
