Object detection with Google Colab and TensorFlow

This is just a memo of the challenges I faced when training a model on Google Colab while following a great tutorial here.

Mind the versions

TensorFlow is currently at version 2.2.0, but most tutorials still use the contrib package, which was removed in TensorFlow 2.x, and there is no known easy way to update the code to remove the dependency on contrib. So my best bet was to downgrade TensorFlow to 1.x. Since Google Colab only offers a choice between 1.x and 2.x and we cannot specify the exact version, I ended up with version 1.15.2.

Even with the command: %tensorflow_version 1.15.0
I ended up with: 1.15.2

Another pitfall was the version of numpy. Installing numpy gives us version 1.18.3, but for some reason this generates the error:

TypeError: 'numpy.float64' object cannot be interpreted as an integer

Downgrading numpy to version 1.17.4 solved this for me.
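
A quick check at the top of the notebook can catch a version drift before training starts. The following is only a sketch: version_matches and find_mismatches are hypothetical helper names of my own, not part of TensorFlow or numpy, and the pins are simply the versions mentioned above.

```python
def version_matches(installed, wanted):
    """Return True if `installed` starts with the components of `wanted`.

    Only the components given in `wanted` are compared, so "1.15.2"
    matches the pin "1.15" while "2.2.0" does not.
    """
    installed_parts = installed.split(".")
    wanted_parts = wanted.split(".")
    return installed_parts[:len(wanted_parts)] == wanted_parts


# Pins taken from the notes above; adjust them to your own setup.
EXPECTED = {"tensorflow": "1.15", "numpy": "1.17.4"}


def find_mismatches(installed_versions, expected=EXPECTED):
    """Return {package: (installed, wanted)} for every pin that is not met."""
    return {
        name: (installed_versions.get(name), wanted)
        for name, wanted in expected.items()
        if not version_matches(installed_versions.get(name) or "", wanted)
    }
```

In Colab, after importing tensorflow as tf and numpy as np, calling find_mismatches({"tensorflow": tf.__version__, "numpy": np.__version__}) should return an empty dict when both pins are satisfied. numpy itself can be pinned with !pip install numpy==1.17.4, after which the runtime usually needs a restart.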

It seems we don't need ngrok for tensorboard

With the command: %load_ext tensorboard
we can view TensorBoard directly in Google Colab (launch it afterwards with %tensorboard --logdir pointing at the training output directory).

The generated TF Record files are of version 1.6

Another error I encountered was:

Signature mismatch. Keys must be dtype <dtype: 'float32'>, got <dtype: 'string'>


This is resolved by adding dct_method as follows:

Modify the xml_to_csv method (may be needed)

In the xml_to_csv method there is this line:
value = (root.find('filename').text + '.jpg',

If you use labelImg to label your own image dataset, the filename stored in the XML may already include the extension (.jpg). Check your XML, and if the extension is already there, remove the + '.jpg' part from the code.
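
Instead of editing the line by hand for each dataset, the extension can be added only when it is missing. This is a minimal sketch, not the tutorial's code: image_filename and IMAGE_EXTENSIONS are hypothetical names, and the two XML snippets are made-up examples containing only the filename tag.

```python
import xml.etree.ElementTree as ET

# Extensions treated as "already present"; extend the tuple for other formats.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def image_filename(root, default_ext=".jpg"):
    """Return the <filename> text, adding default_ext only when no image extension is present."""
    name = root.find("filename").text.strip()
    if name.lower().endswith(IMAGE_EXTENSIONS):
        return name
    return name + default_ext


# Two minimal annotations, one with and one without the extension.
with_ext = ET.fromstring("<annotation><filename>cat_01.jpg</filename></annotation>")
without_ext = ET.fromstring("<annotation><filename>cat_02</filename></annotation>")

print(image_filename(with_ext))     # cat_01.jpg
print(image_filename(without_ext))  # cat_02.jpg
```

In xml_to_csv, the line quoted above would then start with value = (image_filename(root), and work for both kinds of XML.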

Afterword

Machine learning is very time consuming, especially the data preparation step. I didn't get good enough detection with my own dataset, so I went for a published open dataset. It has thousands more samples than I had. If I had to do the data gathering and labeling myself, there is no way I could finish any project in a reasonable time.

Furthermore, the training process is also very time consuming, even with high-powered PCs. If you take a wrong step or miss a parameter, redoing the process can cost a lot of time.

I hope these two problems can be addressed in the near future. They are what is preventing machine learning from becoming more common among the masses.
