
Pitfalls during Training and Object Detection with TensorFlow for Absolute Beginners

This article is based on the great tutorial here on how to train and detect custom objects with TensorFlow. I also referred to the official documentation here and here for running TensorFlow model building locally. This was my first custom detection project, and I hit some hiccups along the way; this article logs and shares my findings so they can help other beginners like me. In the end, I managed to train a TensorFlow model to detect Batsumaru, a character from Sanrio.

This is what the detection looks like.


The tools

  1. Windows 10 Pro 64
  2. TensorFlow, originally 1.7.1 and later upgraded to 1.12.0 (I will explain why below)
  3. Python 3.5.4
  4. LabelImg for image labeling
  5. PyCharm IDE

Steps and Pitfalls

Below are the mistakes I made and other discoveries while following the guide. I will not repeat the steps from the original guide, only the parts where I had to deviate from the walkthrough and figure things out myself.
  1. The training and testing images have to be RGB. This is because TensorFlow expects a certain number of channels in the image, and this particular model expects 3 channels (RGB). This discussion helped point me in the right direction. (I should have read the guide more carefully!)
  2. Preparing the tools at tensorflow/models/research. This step was particularly rough for me since I was totally new to TensorFlow. I didn't understand at first that we need another set of scripts, located here, to generate a custom model. If we just want to run detection on an already-generated model, we only need the core library here. Since we will be generating our own model, we need both.
  3. When generating the TFRecord files, I encountered the error 'cannot import name string_int_label_map_pb2'. This means we need to compile the .proto files in tensorflow/models/research/object_detection/protos.
  4. When trying to compile the .proto files, I found that the Windows version of protoc does not support wildcard characters, so we cannot compile all the proto files in a folder at once and instead have to compile them one by one. The workaround is to use a Windows shell (cmd) loop, such as: for /f %G in ('dir /b object_detection\protos\*.proto') do D:\Libraries\vcpkg\installed\x64-windows\tools\protoc object_detection\protos\%G --python_out=.
    1. Subnote: when compiling with protoc, if you get 'no input' errors, you probably need an additional space, as mentioned here.
  5. When running object_detection/model_main.py, I encountered the error 'from pycocotools import coco \ ModuleNotFoundError: No module named 'pycocotools''. I needed to install cocoapi, but it is not officially supported on Windows! Luckily I found a Windows port here. You can follow the discussion on the Protoc repository too.
  6. When running object_detection/model_main.py, I encountered the error: non_max_suppression tensorflow unexpected keyword score_threshold. This is caused by TensorFlow versions below 1.9 not supporting 'score_threshold', and it is why I upgraded my TensorFlow to v1.12.0. After updating TensorFlow we must also use the matching CUDA and cuDNN libraries; I found that CUDA 10 and cuDNN 7.4 work for me. Also, don't forget to update CUDA_HOME in the environment settings.
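The RGB requirement from step 1 can be checked up front, before an expensive training run. Here is a minimal, dependency-free sketch of my own (in practice you would more likely just use Pillow's Image.open(path).convert("RGB") to fix the images): it reads a PNG file's IHDR header to report its color type.

```python
# PNG color types, from the PNG specification's IHDR chunk:
# 0 = grayscale, 2 = truecolor (RGB), 3 = palette,
# 4 = grayscale + alpha, 6 = RGB + alpha (RGBA)
PNG_COLOR_TYPES = {0: "grayscale", 2: "RGB", 3: "palette",
                   4: "grayscale+alpha", 6: "RGBA"}

def png_color_type(data: bytes) -> str:
    """Report the color type of a PNG byte stream by reading its IHDR chunk."""
    if data[:8] != b"\x89PNG\r\n\x1a\n":
        raise ValueError("not a PNG file")
    # Layout: 8-byte signature, 4-byte chunk length, b'IHDR',
    # then width(4) + height(4) + bit depth(1) + color type(1) + ...
    return PNG_COLOR_TYPES.get(data[25], "unknown")
```

Images reported as anything other than RGB (e.g. RGBA screenshots or grayscale scans) are exactly the ones that trip up this model's 3-channel input.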
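For missing-dependency errors like the pycocotools one in step 5, it helps to verify that every required module is importable before kicking off a long run. A small stdlib-only sketch (the helper name and the example module list are mine, not from the tutorial):

```python
import importlib.util

def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]
```

For example, calling missing_modules(["tensorflow", "pycocotools", "PIL"]) before running model_main.py surfaces every missing package at once instead of one ModuleNotFoundError at a time.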
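The score_threshold issue in step 6 is purely a version gate: the keyword appeared in TensorFlow 1.9. A sketch of the check I wish I had run first (the helper name is mine; in practice you would compare against tf.__version__):

```python
def supports_score_threshold(version: str) -> bool:
    """non_max_suppression gained the score_threshold keyword in TF 1.9."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (1, 9)
```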
The actual command I used to build the model:

py .\object_detection\model_main.py --pipeline_config_path=D:\Workspace\TF_TrainDetect\training\models\model\ssd_mobilenet_v1_pets.config --model_dir=D:\Workspace\TF_TrainDetect\training\output --num_train_steps=50000 --sample_1_of_n_eval_examples=1 --alsologtostderr

Some additional notes when running the above command:
  1. Make sure the number of training steps (num_train_steps) is appropriate for your needs. My 50,000-step run took almost 18 hours to complete.
  2. model_dir is the folder where the generated models will be saved.
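To put num_train_steps in perspective: each step consumes one batch of images, so you can estimate how many full passes over your dataset a step count represents. A back-of-the-envelope sketch of my own (the batch size of 24 is the value in the stock ssd_mobilenet_v1_pets.config if left unchanged, and the image count of 200 is purely illustrative):

```python
def epochs_covered(num_steps: int, batch_size: int, num_images: int) -> float:
    """Approximate passes over the training set: total images seen / dataset size."""
    return num_steps * batch_size / num_images
```

With, say, 200 training images, epochs_covered(50000, 24, 200) works out to 6,000 passes over the data, which suggests a smaller step count or an earlier checkpoint may be perfectly adequate.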

The actual command I used to export the model:
py .\object_detection\export_inference_graph.py --input_type=image_tensor --pipeline_config_path=D:\Workspace\TF_TrainDetect\training\models\model\ssd_mobilenet_v1_pets.config --trained_checkpoint_prefix=D:\Workspace\TF_TrainDetect\training\output\model.ckpt-50000 --output_directory=D:\Workspace\TF_TrainDetect\training\exported_model

Some additional notes when running the above command:
  1. For trained_checkpoint_prefix, I chose the final checkpoint at 50,000 steps; you can change this to any checkpoint that is deemed good enough for the detection.
  2. output_directory is where the frozen_inference_graph.pb file will be generated. This frozen graph is the final model to be used in the detection phase.
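Since model_dir accumulates numbered model.ckpt-N files, picking the trained_checkpoint_prefix can be scripted rather than typed by hand. A small sketch of my own (TensorFlow also provides tf.train.latest_checkpoint for this; the function below just parses a directory listing):

```python
import re

def latest_checkpoint_prefix(filenames):
    """Pick the 'model.ckpt-N' prefix with the largest step N from a model_dir listing."""
    steps = [int(m.group(1))
             for f in filenames
             if (m := re.fullmatch(r"model\.ckpt-(\d+)\.index", f))]
    return "model.ckpt-%d" % max(steps) if steps else None
```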

Detection Phase

For the detection test, I searched for videos containing the character and used those as input. The detection results were inaccurate when other characters were present and when the character appeared too small in the frame. This is due to the low number of images used in training, and also because most of those images are too similar to each other. Improving the training images would definitely boost detection accuracy.

The detection test codes can be found here.
The /training folder is not included due to file size restrictions on GitHub; if any steps are unclear because of this, please let me know.

All the best!
