Yolo training my dataset

Config:

# running my model on Windows

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.09 Driver Version: 430.09 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 960M Off | 00000000:01:00.0 Off | N/A |
| N/A 52C P0 N/A / N/A | 0MiB / 2004MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

Preliminary: YOLO running-time test:

// cpu yolov3
xxxdeMacBook-Pro:darknet xxx$ ./darknet detect cfg/yolov3.cfg yolov3.weights data/dog.jpg
layer filters size input output
0 conv 32 3 x 3 / 1 608 x 608 x 3 -> 608 x 608 x 32 0.639 BFLOPs
...
106 yolo
Loading weights from yolov3.weights...Done!
data/dog.jpg: Predicted in 18.616094 seconds.
bicycle: 99%
truck: 92%
dog: 100%
// cpu yolov2
xxxdeMacBook-Pro:darknet xxx$ ./darknet detect cfg/yolov2.cfg yolov2.weights data/dog.jpg
layer filters size input output
0 conv 32 3 x 3 / 1 608 x 608 x 3 -> 608 x 608 x 32 0.639 BFLOPs
...
31 detection
mask_scale: Using default '1.000000'
Loading weights from yolov2.weights...Done!
data/dog.jpg: Predicted in 8.660725 seconds.
dog: 82%
truck: 64%
bicycle: 85%
// gpu yolov2, on another computer (I didn't test the cpu version there)
// at least 100x faster than the cpu runs above
mei@mei-luo:~/Desktop/cv/darknet$ ./darknet detect cfg/yolov2.cfg yolov2.weights
layer filters size input output
0 conv 32 3 x 3 / 1 608 x 608 x 3 -> 608 x 608 x 32 0.639 BFLOPs
...
31 detection
mask_scale: Using default '1.000000'
Loading weights from yolov2.weights...Done!
Enter Image Path: data/person.jpg
data/person.jpg: Predicted in 0.122470 seconds.
horse: 82%
dog: 86%
person: 86%

1. Dataset

The VOC dataset directory structure is as follows:

VOC2019
├── Annotations
├── ImageSets
│   ├── Layout
│   ├── Main
│   └── Segmentation
└── JPEGImages

1.1 Annotations

Stores the XML files in VOC format. Each XML file corresponds to one image and records the location and class of every labeled object in it; it is usually named the same as the corresponding image.

1.2 ImageSets

Within ImageSets we only need the Main folder, which holds plain text files, usually train.txt, test.txt, etc. Each line of these files is the name of an image to train or test on (no suffix, no path).
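These Main lists can be generated automatically. Below is a minimal sketch (`write_train_list` is a hypothetical helper, not one of the original scripts) that writes every JPEG basename under JPEGImages into train.txt:

```python
import os

def write_train_list(voc_root):
    """Write ImageSets/Main/train.txt listing every JPEG basename (no suffix, no path)."""
    jpeg_dir = os.path.join(voc_root, 'JPEGImages')
    main_dir = os.path.join(voc_root, 'ImageSets', 'Main')
    os.makedirs(main_dir, exist_ok=True)
    # One basename per line, sorted for a stable order.
    names = sorted(os.path.splitext(f)[0]
                   for f in os.listdir(jpeg_dir)
                   if f.lower().endswith('.jpg'))
    with open(os.path.join(main_dir, 'train.txt'), 'w') as fh:
        fh.write('\n'.join(names) + '\n')
```

This assumes all images sit directly in JPEGImages (see section 1.4, step 4).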

1.3 JPEGImages

The JPEGImages folder holds the original images, named according to a uniform rule.

Take the instant-noodle dataset I collected this time as an example.

Sources: Baidu, Jingdong (JD.com), and photos I took myself.

JPEGImages
├── class_LR
│   ├── LR00001.jpg
│   └── ...
├── class_XL
│   ├── XL00001.jpg
│   └── ...
├── rename_voc.py
├── resize_pic.py
└── voc_spider.py

The image naming format is <label_name><num(len=5)>.jpg, e.g. LR00001.jpg.

1.3.1 voc_spider.py

It can be run directly; the crawled images are stored in the current directory, named in the same <label_name><num(len=5)>.jpg format, for example LR00001.jpg.

1.3.2 resize_pic.py

Used to shrink the images. Because the network normalizes its input to a square by default, you only need to enter a single number and the script resizes every image under the directory (recursively). The target size should match the width/height in the yolo .cfg file:

# yolov2.cfg
[net]
# Testing
batch=1
subdivisions=1
...
width=608   # !!! must match the resized image size
height=608  # !!! must match the resized image size
...
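I don't reproduce the original resize_pic.py here, but a minimal sketch of the idea (recursively resize every JPEG to a square, using Pillow; the function name and details are my assumption) could look like:

```python
import os
from PIL import Image

def resize_all(root, side):
    """Recursively resize every .jpg under root to side x side, overwriting in place."""
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith('.jpg'):
                continue
            path = os.path.join(dirpath, name)
            # Squash to a square; aspect ratio is not preserved.
            Image.open(path).resize((side, side)).save(path)
```

For the cfg above you would call `resize_all('.', 608)`.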

1.3.3 rename_voc.py

Looks for the folders named class_<label_name> and renames the files inside them sequentially. This is mainly to process the photos I took myself in a uniform way.
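A sketch of what rename_voc.py presumably does (the function name and exact behavior are my guess):

```python
import os
import re

def rename_class_folder(folder):
    """Rename every file in a class_<label> folder to <label><5-digit>.jpg, in sorted order."""
    label = re.sub(r'^class_', '', os.path.basename(folder))
    # Build the file list first, then rename, so renames don't disturb iteration.
    for i, name in enumerate(sorted(os.listdir(folder)), start=1):
        dst = '%s%05d.jpg' % (label, i)
        if name != dst:
            os.rename(os.path.join(folder, name),
                      os.path.join(folder, dst))
```

For example, `rename_class_folder('JPEGImages/class_LR')` produces LR00001.jpg, LR00002.jpg, and so on.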

1.4 Conclusion

Recommended steps for VOC dataset construction:

  1. Create a folder tree with the structure above. Naming it VOC+year is recommended, so that the provided voc_label.py can be used directly.
  2. Create a folder for each class under JPEGImages. This only keeps the classes apart while collecting; it is optional.
  3. Use the scripts above to collect the images.
  4. Finally, move all images straight into JPEGImages and drop the per-class subfolders. It is best to do this before labeling! Otherwise every generated XML file records the old per-class path and you have to rewrite the paths afterwards; not hard, just a few extra steps.

2. Marking the target regions in the images

Use the tool labelImg from https://github.com/tzutalin/labelImg.

# install labelImg on Mac OS X 10.14
pip3 install pyqt5 lxml # Install qt and lxml by pip

make qt5py3
python3 labelImg.py
python3 labelImg.py [IMAGE_PATH] [PRE-DEFINED CLASS FILE]

Standard format of the *.xml files in VOC2019/Annotations/:

<annotation>
    <folder>JPEGImages</folder>
    <filename>LR00001.jpg</filename>
    <path>/home/mei/Desktop/cv/darknet/scripts/VOCdevkit/NOODLE2019/JPEGImages/LR00001.jpg</path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>608</width>
        <height>608</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>luxiang</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>393</xmin>
            <ymin>304</ymin>
            <xmax>583</xmax>
            <ymax>465</ymax>
        </bndbox>
    </object>
</annotation>
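As a quick sanity check (not part of the original workflow), an annotation like the one above can be parsed with Python's xml.etree.ElementTree; `read_boxes` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

def read_boxes(xml_text):
    """Return [(class_name, xmin, ymin, xmax, ymax), ...] from a VOC annotation string."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter('object'):
        name = obj.find('name').text
        bb = obj.find('bndbox')
        boxes.append((name,) + tuple(int(bb.find(tag).text)
                                     for tag in ('xmin', 'ymin', 'xmax', 'ymax')))
    return boxes
```

Running it over all XML files is an easy way to catch the zero-width/height problem described below before training.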

Some parts you may need to modify:

  1. folder

    <folder> should be the directory the image sits in when the model finally runs. The commands below run under Ubuntu.

    find -name '*.xml' |xargs perl -pi -e 's|<folder>class_LR|<folder>JPEGImages|g'
  2. filename

    labelImg may export the filename without its suffix.

    find -name '*.xml' |xargs perl -pi -e 's|</filename>|.jpg</filename>|g'
  3. path

    <path> is the absolute path of the image on the machine that runs the model (I needed to port the dataset to another computer).

    find -name '*.xml' |xargs perl -pi -e 's|<path>$(your_original_pic_path)|<path>$(your_final_pic_path)|g'
  4. width, height

    When using labelImg under Windows, the exported width and height are sometimes 0. If the images were already resized before labeling, the following two lines fix the zero values.

    find -name '*.xml' |xargs perl -pi -e 's|0</width>|608</width>|g'
    find -name '*.xml' |xargs perl -pi -e 's|0</height>|608</height>|g'

3. Training YOLO on VOC

3.1 Generate Labels for VOC

You should have installed Darknet and made sure it runs successfully on your computer. Create a directory named VOCdevkit under darknet/scripts and copy your VOC2019 folder into it.

Now we need to generate the label files that Darknet uses. Darknet wants a .txt file for each image with a line for each ground truth object in the image that looks like:

<object-class> <x> <y> <width> <height>

Where x, y, width, and height are relative to the image’s width and height. To generate these files we will run the voc_label.py script in Darknet’s scripts/ directory.
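The per-box conversion that voc_label.py performs boils down to the following (this mirrors the convert() function in the upstream script; minor offset details may differ in your copy):

```python
def convert(size, box):
    """Map a VOC box (xmin, xmax, ymin, ymax) to YOLO's normalized (x, y, w, h).

    size is the image (width, height); x, y are the box center, all in [0, 1]."""
    dw, dh = 1.0 / size[0], 1.0 / size[1]
    x = (box[0] + box[1]) / 2.0 * dw
    y = (box[2] + box[3]) / 2.0 * dh
    w = (box[1] - box[0]) * dw
    h = (box[3] - box[2]) * dh
    return (x, y, w, h)
```

With the bndbox from the example annotation, `convert((608, 608), (393, 583, 304, 465))` gives w = 190/608 = 0.3125.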

First we should modify voc_label.py, since we are not using the original VOC dataset.

The main changes are the dataset name and the class information. Mine is VOC2019, all samples are used for training, and there are two target classes, so it is set as follows:

...

#sets=[('2012', 'train'), ('2012', 'val'), ('2007', 'train'), ('2007', 'val'), ('2007', 'test')]

#classes = ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]

sets=[('2019', 'train')]
classes = ["luxiang", "xiangla"]

...

def convert_annotation(year, image_id):
    in_file = open('VOCdevkit/VOC%s/Annotations/%s.xml'%(year, image_id))
    # (if you don't use the name 'VOC', modify this)
    out_file = open('VOCdevkit/VOC%s/labels/%s.txt'%(year, image_id), 'w')
    # (as above)
    ...

for year, image_set in sets:
    if not os.path.exists('VOCdevkit/VOC%s/labels/'%(year)):
        os.makedirs('VOCdevkit/VOC%s/labels/'%(year))
    image_ids = open('VOCdevkit/VOC%s/ImageSets/Main/%s.txt'%(year, image_set)).read().strip().split()
    list_file = open('%s_%s.txt'%(year, image_set), 'w')
    for image_id in image_ids:
        list_file.write('%s/VOCdevkit/VOC%s/JPEGImages/%s.jpg\n'%(wd, year, image_id))
        convert_annotation(year, image_id)
    list_file.close()

After modifying it, run python voc_label.py in that directory; it generates a labels folder inside scripts/VOCdevkit/VOC2019:

VOC2019
└── labels
    ├── LR00000.txt
    ├── LR00091.txt
    └── ...

At the same time, a 2019_train.txt file is generated under scripts/, containing the absolute paths of all training samples.

❗️❗️❗️ Note: a train.txt file may also be generated under the darknet/ folder at this point. If it exists, check whether it is empty; if it is, copy the contents of 2019_train.txt into train.txt.

3.2 Modify Cfg for Pascal Data

Now go to your Darknet directory. We have to change the cfg/voc.data config file to point to your data:

classes = 2
train = <path-to-voc>/train.txt
# valid = <path-to-voc>/2007_test.txt
names = data/voc.names
backup = backup

You should replace <path-to-voc> with the directory where you put the VOC data.
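Note that data/voc.names must also exist: it lists one class name per line, in the same order as the classes list in voc_label.py. For this dataset it would be:

```
luxiang
xiangla
```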

3.3 Download Pretrained Convolutional Weights

For training we use convolutional weights that are pre-trained on Imagenet, taken from the darknet53 model. You can download just the weights for the convolutional layers (76 MB):

wget https://pjreddie.com/media/files/darknet53.conv.74

3.4 Train The Model

Now we can train! Run the command:

./darknet detector train cfg/voc.data cfg/yolov3-voc.cfg darknet53.conv.74

If you want to use multiple gpus run:

./darknet detector train cfg/coco.data cfg/yolov3.cfg darknet53.conv.74 -gpus 0,1,2,3

If you want to stop and restart training from a checkpoint:

./darknet detector train cfg/coco.data cfg/yolov3.cfg backup/yolov3.backup -gpus 0,1,2,3

Reference:

  1. YOLO
  2. Tutorial from CSDN