Object detection with Google Colab and Tensorflow

This is just a memo of the challenges I faced when training a model on Google Colab while following a great tutorial here.

Mind the versions

TensorFlow is currently at version 2.2.0, but most tutorials still use the contrib package, which was removed in 2.x, and I know of no easy way to update the code to drop that dependency. So my best bet was to downgrade TensorFlow to 1.x. Google Colab only offers a choice between 1.x and 2.x and does not let us specify the exact version, so I ended up with version 1.15.2.

Even with the command: %tensorflow_version 1.15.0
I ended up with: 1.15.2
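
Since the exact version cannot be pinned, it can help to fail early if the runtime is not on 1.x at all. The following is a minimal sketch of such a check; the helper names major_version and require_tf1 are hypothetical and not part of TensorFlow or Colab, and the version string is hard-coded here so the snippet runs without TensorFlow installed.

```python
def major_version(version):
    """Return the major component of a dotted version string, e.g. '1.15.2' -> 1."""
    return int(version.split(".")[0])


def require_tf1(version):
    """Raise if the given TensorFlow version string is not a 1.x release."""
    if major_version(version) != 1:
        raise RuntimeError("Expected TensorFlow 1.x but found " + version)
    return version


# In Colab you would pass the real value instead of a literal:
#   import tensorflow as tf
#   require_tf1(tf.__version__)
checked = require_tf1("1.15.2")
```

Running this at the top of the notebook, before any contrib import, makes a wrong runtime obvious right away instead of halfway through the tutorial.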

Another pitfall was the version of numpy. Installing numpy gives us version 1.18.3, but for some reason training with that version fails with the error:

TypeError: 'numpy.float64' object cannot be interpreted as an integer

Downgrading numpy to version 1.17.4 solved this for me.
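
The cause is not confirmed here. One plausible explanation, which is an assumption and has not been checked against the tutorial's code, is that some library passes a float where an integer count is required (for example as the num argument of np.linspace). NumPy 1.18 rejects that with a TypeError, while 1.17 only warned. The sketch below shows the same "cannot be interpreted as an integer" rule in plain Python and the usual fix of converting to int first; to_count is a hypothetical helper name.

```python
import operator


def to_count(value):
    """Round a float-valued count and convert it to a real int."""
    return int(round(value))


# operator.index enforces the "must be an integer" rule that the error message refers to.
try:
    operator.index(9.0)
    float_accepted = True
except TypeError:
    float_accepted = False

# Number of steps from 0.5 to 0.95 in increments of 0.05, as a proper int.
steps = to_count((0.95 - 0.5) / 0.05) + 1
```

If that is indeed the cause, patching the offending call in this way would be an alternative to downgrading numpy, at the cost of editing third-party code.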

It seems we don't need ngrok for tensorboard

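With the command: %load_ext tensorboard
we can load the TensorBoard extension and then view it directly inside Google Colab with %tensorboard --logdir pointed at the training output directory.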

The generated TF Record files are of version 1.6

Another error I encountered was:

Signature mismatch. Keys must be dtype <dtype: 'float32'>, got <dtype: 'string'>


This is resolved by adding dct_method as follows:

Modify the xml_to_csv method (if needed)

In the xml_to_csv method there is this line:
value = (root.find('filename').text + '.jpg',

If you use labelimg to label your own image dataset, the filename in the XML may already include the extension (.jpg). Check your XML, and if the extension is already there, remove the + '.jpg' part from the code.
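
Instead of editing the line by hand for each dataset, the extension can be appended only when it is missing. This is a minimal sketch and not the tutorial's code: image_filename is a hypothetical helper, and the two XML strings are made-up stand-ins for labelimg annotation files.

```python
import os
import xml.etree.ElementTree as ET


def image_filename(root, default_ext=".jpg"):
    """Return the <filename> text, appending default_ext only if it has no extension."""
    name = root.find("filename").text.strip()
    _, ext = os.path.splitext(name)
    # Note: a name containing a dot (e.g. "img.v2") is treated as already having an extension.
    return name if ext else name + default_ext


# Hypothetical annotations: one with the extension in <filename>, one without.
with_ext = ET.fromstring("<annotation><filename>cat_001.jpg</filename></annotation>")
without_ext = ET.fromstring("<annotation><filename>cat_001</filename></annotation>")
```

Both sample annotations resolve to cat_001.jpg, so the same script works whether or not labelimg stored the extension.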

Afterword

Machine learning is very time consuming, especially the data preparation step. I didn't get good enough detection with my own dataset, so I went with a published open dataset, which has thousands more samples than I had. If I had to do the data gathering and labeling myself, there is no way I could finish any project in a reasonable time.

Furthermore, the training process is also very time consuming, even on high-powered PCs. If you make a wrong step or miss a parameter, redoing the process can take a very long time.

I hope these two problems can be addressed in the near future. They are what is keeping machine learning from becoming more widely used.
