Reproducibility project for beginners — Deep Orchards: Integrating the Deep Orchards fruit dataset with Faster R-CNN
Terms like deep learning and machine learning have become quite popular these days, and the field has seen several advancements over the last couple of years. Watching so many fields slowly adopt deep learning and machine learning techniques to solve problems encouraged me and my partner Xin Tan to take up machine learning and deep learning courses in our Masters. While courses give you a basic understanding of a subject, projects have always been an interesting way of learning new concepts, so we took up a challenging project: integrating the Deep Orchards dataset [link] with the Faster R-CNN network and reproducing the research paper [link]. This blog takes you through the very basic steps to follow if you are a newbie in deep learning.
Understanding the dataset
Understanding the dataset is the most important step in integrating it with any model. Questions you should ask yourself while inspecting the dataset: How is the dataset organized? What is its tree structure? Is it meant for supervised or unsupervised learning? Does it have annotations? How were the annotations for the images made? In what format are the annotations available to you? Can you match the annotations to the images? How are the image files named? How are the annotation files named?
The best way to answer all of these questions is to write a simple script that helps you explore your dataset. [Fair warning!!!] A grasp of basic Python will help here. We personally like Jupyter notebooks and have been using them for quite some time, so if you are a beginner we would recommend a Jupyter notebook for easy experimentation.
The following is a script that can be used to visualize the Deep Orchards dataset.
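The original notebook is not embedded here, but a minimal sketch of such a visualization script might look like the following. It assumes the rectangle annotations live in a CSV with columns `#item, x, y, dx, dy, label`, where `(x, y)` is the top-left corner and `(dx, dy)` the box size — check this against your own copy of the dataset before relying on it.

```python
import csv
import io

from PIL import Image, ImageDraw

def load_boxes(csv_text):
    """Parse annotation CSV rows (assumed columns: #item, x, y, dx, dy,
    label) into (x1, y1, x2, y2) bounding boxes."""
    boxes = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        x, y = float(row["x"]), float(row["y"])
        dx, dy = float(row["dx"]), float(row["dy"])
        boxes.append((x, y, x + dx, y + dy))
    return boxes

def draw_boxes(image, boxes, color=(255, 0, 0)):
    """Draw each (x1, y1, x2, y2) box on a copy of the image."""
    out = image.copy()
    draw = ImageDraw.Draw(out)
    for box in boxes:
        draw.rectangle([int(c) for c in box], outline=color, width=2)
    return out

# Example on a synthetic image, since the dataset itself is not bundled here:
ann = "#item,x,y,dx,dy,label\n0,10,20,30,40,almond\n"
img = Image.new("RGB", (100, 100), (0, 0, 0))
vis = draw_boxes(img, load_boxes(ann))
```

Dropping this into a notebook cell and pointing it at real image/annotation pairs is enough to eyeball whether the boxes line up with the fruit.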
Following are some example images with their annotations for all three datasets.
While visualizing data we understood a few aspects of the dataset.
- The images are of varying sizes.
- The annotations for apples are circles rather than boxes.
- An image may contain more than one bounding box, or none at all.
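If you do want to use the apple set, the circle annotations have to be converted to the axis-aligned boxes Faster R-CNN expects. A minimal sketch, assuming each circle is stored as a centre (cx, cy) and radius r:

```python
def circle_to_box(cx, cy, r):
    """Tightest axis-aligned box (x1, y1, x2, y2) enclosing a circle
    annotation given as centre (cx, cy) and radius r."""
    return (cx - r, cy - r, cx + r, cy + r)

# One apple annotated as a circle of radius 12 centred at (50, 60):
box = circle_to_box(50, 60, 12)
```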
Since it was our first time replicating the network, we made it a little easier on ourselves and went forward with the almonds dataset, training the Faster R-CNN network for binary classification.
Understanding the Network
The Deep Orchards paper gives a brief description of the network used for its experiments but does not deep dive into how the network was trained. To understand R-CNN better we followed the blog posts below. Referring to blogs might seem easy at first, but we would recommend working through the complete network with pencil and paper. It is also highly recommended to work out the size of every intermediate tensor, as it will help you track down issues in your code later.
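As an example of tracking tensor sizes: the spatial side length after any conv or pooling layer follows one formula, and chaining it through the backbone tells you the feature-map size the RPN will see. The sketch below assumes the standard VGG16 layout (3×3 convs with padding 1 preserve size; the four 2×2 max-pools before the RPN give an effective stride of 16):

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a conv/pool layer:
    floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

def vgg16_feature_size(size):
    """Side length of the conv5_3 feature map fed to the RPN.
    VGG16's 3x3/padding-1 convs preserve size, so only the four
    2x2/stride-2 max-pools before the RPN change it."""
    for _ in range(4):
        size = conv_out(size, kernel=2, stride=2)
    return size

# A 600-pixel side (the short-side size Faster R-CNN rescales to)
# becomes a 37-cell feature map: 600 -> 300 -> 150 -> 75 -> 37.
```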
- Faster R-CNN Explained: "Faster R-CNN has two networks: region proposal network (RPN) for generating region proposals and a network using these…"
- Demystifying Region Proposal Network (RPN): "Object detection is an important field of computer vision. Before the emergence of deep learning, traditional methods…"
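To make the RPN ideas in those posts concrete, here is a small sketch of the anchor generation step, using the three scales and three aspect ratios from the original Faster R-CNN paper (the reference implementation rounds some intermediate values, so its anchors can differ from these by a pixel):

```python
def generate_anchors(base_size=16, ratios=(0.5, 1.0, 2.0), scales=(8, 16, 32)):
    """Reference anchors centred on one feature-map cell: one
    (x1, y1, x2, y2) box per (ratio, scale) pair, 9 in total."""
    cx = cy = (base_size - 1) / 2.0
    anchors = []
    for ratio in ratios:
        for scale in scales:
            area = float(base_size * scale) ** 2
            w = (area / ratio) ** 0.5      # w * h = area, h / w = ratio
            h = w * ratio
            anchors.append((cx - (w - 1) / 2, cy - (h - 1) / 2,
                            cx + (w - 1) / 2, cy + (h - 1) / 2))
    return anchors

anchors = generate_anchors()
```

At train time these nine boxes are replicated at every feature-map cell (shifted by the stride), which is where the thousands of proposals come from.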
After understanding the network it might seem simple enough to code from scratch, but it is better to refer to a pre-existing implementation. Reading deep learning code written by professional programmers also teaches you how to structure your own code. For our project we referred to the following repository.
This repo was built two years ago when there was no PyTorch detection implementation that could achieve reasonable…
To begin with, we started training the model with its own dataset (VOC2007) and encountered a NaN error. We learned that the error arises when bounding boxes drift outside the image boundary during training. We also noticed that the code expected annotations in XML format, while our dataset was in CSV format. To make the dataset compatible with the program we wrote one more script to convert our CSV annotations to XML.
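Our conversion script is not reproduced here, but a minimal sketch of a CSV-to-Pascal-VOC conversion looks like the following. It emits only the fields detection code typically reads; the tag names follow the VOC annotation format, while the `(label, x1, y1, x2, y2)` box layout is our assumption.

```python
import xml.etree.ElementTree as ET

def boxes_to_voc_xml(filename, width, height, boxes):
    """Build a minimal Pascal VOC style annotation for one image.

    `boxes` is a list of (label, x1, y1, x2, y2) tuples. Your repo of
    choice may expect additional fields (pose, truncated, etc.).
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"
    for label, x1, y1, x2, y2 in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        ET.SubElement(obj, "difficult").text = "0"
        bbox = ET.SubElement(obj, "bndbox")
        for tag, val in zip(("xmin", "ymin", "xmax", "ymax"), (x1, y1, x2, y2)):
            ET.SubElement(bbox, tag).text = str(int(val))
    return ET.tostring(root, encoding="unicode")

xml_text = boxes_to_voc_xml("almond_0001.png", 308, 202,
                            [("almond", 10, 20, 40, 60)])
```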
Make sure you compare your data in XML format against the original CSV. We still received the same NaN error on our loss. By now we were sure the error was caused not by the code but by the dataset, so we removed every image whose bounding boxes extended outside the image.
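The check amounts to something like the following sketch (we dropped whole images, but the same predicate works per box; a gentler alternative is to clip boxes to the image instead of discarding them):

```python
def box_inside_image(box, width, height):
    """True if the (x1, y1, x2, y2) box lies fully inside a
    width x height image and has positive area."""
    x1, y1, x2, y2 = box
    return 0 <= x1 < x2 <= width and 0 <= y1 < y2 <= height

def drop_bad_boxes(boxes, width, height):
    """Keep only the boxes that stay inside the image; out-of-bounds
    boxes were what triggered the NaN losses for us."""
    return [b for b in boxes if box_inside_image(b, width, height)]
```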
Note: While working with the code you might face some CUDA/PyTorch compatibility issues if you are setting things up for the first time.
Note: The Deep Orchards paper suggests using ZFNet or VGG16 for the convolutional section, but for some reason the VGG16 model did not work as expected for us, so we integrated the ResNet backbone that was available with the code and trained with it to get more stable results.
Finally!!!!! After some NaN errors on the loss function and CUDA/PyTorch errors, we were able to train the model for 10,000 iterations and used TensorBoard to have a look at the loss. TensorBoard, which was available with the code, is a great tool for visualizing the cross-entropy and regression losses of the RPN.
Following are the example test results we achieved from the network.
We can observe from the test results that the bounding boxes are not exact, but close to it. The difference comes down to training time: we trained our model for 10,000 iterations, whereas the model in the paper was trained for about seven times as many (10,000 × 7 = 70,000 iterations) to get the better results shown there.
Following is the output just after the RPN layer in the network.
We are yet to reach the final results, but this is a good start.
Following is a cumulative link of our complete program integrated with the dataset. [link]