This article is based on the great tutorial here on how to train and detect custom objects with Tensorflow. I also referred to the official documentation here and here for building Tensorflow models locally. This was my first custom detection project and I faced some hiccups along the way, so this article logs and shares my findings in the hope that they help other beginners like me. In the end, I managed to train a Tensorflow model to detect Batsumaru, a character from Sanrio.
This is how the detection will look.
The tools
- Windows 10 Pro 64
- Tensorflow, originally 1.7.1 and later upgraded to 1.12.0 (I will explain why later)
- Python 3.5.4
- LabelImg for image labeling
- PyCharm IDE
Steps and Pitfalls
These are some of the mistakes I made and other discoveries from following the guide. I will not repeat the steps in the original guide, only the parts where I had to deviate from the walkthrough and figure things out by myself.
- The training and testing images have to be RGB. This is because Tensorflow expects a certain number of channels in the image, and this particular model expects 3 channels (RGB). This discussion helped point me in the right direction. (I should have read the guide more carefully!)
- Preparing the tools at tensorflow/models/research. This step was particularly rough for me since I was totally new to Tensorflow. At first I didn't understand that we need a separate set of scripts to generate a custom model, located here. If we just want to run detection with an already generated model, we only need the core library here. Since we will be generating our own model, we need both.
- When generating the TFRecord files, I encountered the error 'cannot import name string_int_label_map_pb2'. This means the proto files in tensorflow/models/research/object_detection/protos have not been compiled yet.
- When trying to compile the proto files, I found that the Windows version of protoc does not support wildcard characters, so we cannot compile 'all proto files in a folder' in one go and instead have to compile them one by one. The workaround is to loop over the files from the Windows command prompt, for example: for /f %G in ('dir /b object_detection\protos\*.proto') do D:\Libraries\vcpkg\installed\x64-windows\tools\protoc object_detection\protos\%G --python_out=.
- Subnote: when compiling with protoc, if you get 'no input' errors, you probably need an additional space, as mentioned here.
- When running object_detection/model_main.py, I encountered the error 'from pycocotools import coco \ ModuleNotFoundError: No module named 'pycocotools''. I needed to install cocoapi, but it is not officially supported on Windows! Luckily I found a port to Windows here. You can follow the discussion on the Protoc repository too.
- When running object_detection/model_main.py, I encountered the error: non_max_suppression tensorflow unexpected keyword score_threshold. This is caused by Tensorflow versions below 1.9 not supporting 'score_threshold', and it is why I upgraded my Tensorflow to v1.12.0. After updating the Tensorflow version we must also use the matching CUDA and cuDNN libraries; I found that CUDA 10 and cuDNN 7.4 work for me. Also, don't forget to update CUDA_HOME in the environment settings.
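On the RGB point above: a quick way to batch-convert a folder of images is with Pillow. This is just a sketch under my own assumptions; the folder names and the choice of JPEG output are hypothetical, not from the original guide.

```python
import os
from PIL import Image  # Pillow

def convert_folder_to_rgb(src_dir, dst_dir):
    """Re-save every image in src_dir as a 3-channel RGB JPEG in dst_dir."""
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        path = os.path.join(src_dir, name)
        try:
            img = Image.open(path)
        except OSError:
            continue  # skip files Pillow cannot read
        rgb = img.convert("RGB")  # drops alpha, expands grayscale to 3 channels
        base, _ = os.path.splitext(name)
        rgb.save(os.path.join(dst_dir, base + ".jpg"))
```

Running this once over the training and testing folders before labeling saves debugging the channel error later.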
The actual command I used to build the model:
py .\object_detection\model_main.py --pipeline_config_path=D:\Workspace\TF_TrainDetect\training\models\model\ssd_mobilenet_v1_pets.config --model_dir=D:\Workspace\TF_TrainDetect\training\output --num_train_steps=50000 --sample_1_of_n_eval_examples=1 --alsologtostderr
Some additional notes when running the above command:
- Make sure the number of training steps (num_train_steps) is appropriate for your needs. My training run of 50000 steps took almost 18 hours to complete.
- model_dir is the folder where the generated models will be saved.
The actual command I used to export the model:
py .\object_detection\export_inference_graph.py --input_type=image_tensor --pipeline_config_path=D:\Workspace\TF_TrainDetect\training\models\model\ssd_mobilenet_v1_pets.config --trained_checkpoint_prefix=D:\Workspace\TF_TrainDetect\training\output\model.ckpt-50000 --output_directory=D:\Workspace\TF_TrainDetect\training\exported_model
Some additional notes when running the above command:
- For trained_checkpoint_prefix, I chose the final checkpoint at 50000 steps, but this can be changed to any checkpoint deemed good enough for detection.
- output_directory is where the frozen_inference_graph.pb file will be generated. This frozen graph is the final model to be used in the detection phase.
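If you don't want to hard-code the step number in trained_checkpoint_prefix, you can scan the model_dir for the highest-numbered checkpoint. Here is a small helper sketch; it only assumes Tensorflow's standard model.ckpt-<step>.index file naming, and the function name is my own.

```python
import os
import re

def latest_checkpoint_prefix(model_dir):
    """Return the model.ckpt-<step> prefix with the highest step, or None."""
    pattern = re.compile(r"^(model\.ckpt-(\d+))\.index$")
    best_step, best_prefix = -1, None
    for name in os.listdir(model_dir):
        m = pattern.match(name)
        if m and int(m.group(2)) > best_step:
            best_step = int(m.group(2))
            best_prefix = os.path.join(model_dir, m.group(1))
    return best_prefix
```

The returned prefix can then be passed straight to --trained_checkpoint_prefix.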
Detection Phase
For the detection test, I searched for videos containing the character and used those as input. The detection results were inaccurate when other characters were present or when the character appeared too small in the frame. This is due to the low number of images used in training, and also because most of those images were too similar to each other. Improving the training image set would definitely boost the detection accuracy.
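One cheap way to suppress spurious boxes like these is to raise the score cutoff before drawing the results. Below is a sketch of that post-filtering step, assuming outputs shaped like the object detection API's parallel boxes/scores/classes lists; the function and variable names are my own, and 0.6 is just an example threshold.

```python
def filter_detections(boxes, scores, classes, min_score=0.6):
    """Keep only the detections whose confidence score reaches min_score."""
    kept = [(b, s, c) for b, s, c in zip(boxes, scores, classes) if s >= min_score]
    if not kept:
        return [], [], []
    kept_boxes, kept_scores, kept_classes = zip(*kept)
    return list(kept_boxes), list(kept_scores), list(kept_classes)
```

This only hides low-confidence mistakes; the real fix is still a larger, more varied training set.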
The detection test codes can be found here.
The /training folder is not included due to file size restrictions on Github; if any steps are unclear because of this, please let me know.
All the best!