GSoC WINE - Deep Learning with Caffe & OpenCV Proposal

Yida Wang wangyida37 at
Sat Mar 14 20:47:07 CDT 2015

Posted by Yida Wang

    Passion for OpenCV and Other Open Source Communities

Hello, I am Yida Wang, a first-year master's student in pattern recognition at the PRIS Lab in the SICE school of BUPT. I have had good experience with OpenCV and several other open source communities.

I won the 1st prize in the Scilab Open Source Contest 2014 and was named Excellent Developer in the CSDN Summer of Code 2014 in the BladeRF community. All certificates can be provided if necessary.

As for the WINE project, I have developed a PCANet structure that automatically matches open source figures with Microsoft figures, so that an editor on the Linux platform can pick the correct figure when one is copied from Microsoft Office.

This time I think it would be convenient to use Caffe and OpenCV for deep learning and other CV tasks in pattern recognition applications.

I also own an Nvidia K40 GPU donated by Nvidia for research, so I think I can extend the package for GPU computation!

    Motivation for Proposal

      Framework in Computer Vision - Deep Learning Project

I have been concentrating on deep learning coding with OpenCV and other projects such as Caffe and cuDNN for some time. My project has two parts: the 1st is a CNN structure that eliminates the BP (back propagation) algorithm and is far less time consuming than a traditional CNN, and the 2nd is a complete deep learning structure for image recognition based on Caffe.

      New Idea for a Fast CNN

My idea is based on the thriving pattern recognition method called ‘Deep Neural Networks’, which Google used last year to build an impressive recognition system. As deep neural networks perform very well in pattern recognition, there are already useful tools such as Caffe implemented for research. Some of that code could be modified and embedded in OpenCV for the development of DNN.

But at the same time, the problem lies in the dependence on hardware such as powerful GPUs, which are not easy for ordinary people to obtain. The time-consuming process is mainly caused by 2 points: the randomness of the convolutional kernels and the very large number of back propagation steps.

I have been studying a powerful single-direction (forward-only) fast CNN called PCANet for research and commercial use.

Here are some benefits of such a structure:

Such a structure does not use the BP algorithm found in DNNs, which costs many operations.

It has some tiny PCA filters instead. The filters' shapes resemble those seen in the training results of a normal CNN.

My structure could be used in the future to initialize a CNN and so reduce its training time.
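
To make the "PCA filters instead of BP" point concrete, here is a minimal sketch of how such filters could be learned from image patches without any gradient step. It is an illustration under stated assumptions, not the CellNet code: the function names are hypothetical, images are plain 2-D lists of numbers, and a simple power iteration with deflation stands in for a proper eigen solver.

```python
import math
import random


def extract_patches(image, k):
    """Return every k x k patch of a 2-D list image, flattened and mean-removed."""
    rows, cols = len(image), len(image[0])
    patches = []
    for r in range(rows - k + 1):
        for c in range(cols - k + 1):
            patch = [image[r + i][c + j] for i in range(k) for j in range(k)]
            mean = sum(patch) / len(patch)
            patches.append([v - mean for v in patch])
    return patches


def covariance(patches):
    """Return the d x d matrix (1/n) * sum(p p^T) over all patch vectors p."""
    n, d = len(patches), len(patches[0])
    cov = [[0.0] * d for _ in range(d)]
    for p in patches:
        for i in range(d):
            for j in range(d):
                cov[i][j] += p[i] * p[j] / n
    return cov


def matvec(mat, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in mat]


def top_eigenvectors(mat, count, iters=500, seed=0):
    """Approximate the leading eigenvectors by power iteration with deflation."""
    rng = random.Random(seed)
    mat = [row[:] for row in mat]  # work on a copy
    d = len(mat)
    vectors = []
    for _ in range(count):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = matvec(mat, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract
                break
            v = [x / norm for x in w]
        eigenvalue = sum(a * b for a, b in zip(v, matvec(mat, v)))
        vectors.append(v)
        # Deflate: remove the component just found before the next pass.
        for i in range(d):
            for j in range(d):
                mat[i][j] -= eigenvalue * v[i] * v[j]
    return vectors


def learn_pca_filters(images, k, num_filters):
    """Learn num_filters k x k filters as principal directions of the patches."""
    patches = []
    for image in images:
        patches.extend(extract_patches(image, k))
    vectors = top_eigenvectors(covariance(patches), num_filters)
    return [[vec[i * k:(i + 1) * k] for i in range(k)] for vec in vectors]
```

Calling `learn_pca_filters(images, 3, 2)` on a list of grayscale images returns two 3 x 3 filters. The whole procedure is one pass over the patches plus an eigen decomposition, which is where the saving over iterative back propagation would come from.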


      Combination of Caffe and PCANet in WINE

The ‘blob’ structure defined in Caffe is clear for a layer-based CNN, and the convolution kernel and bias data of a layer could be replaced by PCANet filters for initialization. So the modified structure combining Caffe and PCANet could be powerful in both speed of convergence and recognition stability, with the help of OpenCV.
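
As a rough illustration of that initialization, the sketch below packs a set of learned k x k filters into a flat buffer using the row-major (num, channels, height, width) layout that a Caffe blob uses for convolution weights. The helper names are hypothetical and are not Caffe API. Setting the bias to zero is an assumption of this sketch, since PCA filters carry no bias term, and only a single input channel is handled.

```python
def blob_offset(shape, n, c, h, w):
    """Flat index of element (n, c, h, w) in a row-major 4-D blob."""
    num, channels, height, width = shape
    return ((n * channels + c) * height + h) * width + w


def pack_filters_into_blob(filters):
    """Pack single-channel k x k filters into blob-style weight and bias buffers.

    Returns (shape, weights, bias) where shape is (num_filters, 1, k, k).
    """
    num = len(filters)
    k = len(filters[0])
    shape = (num, 1, k, k)
    weights = [0.0] * (num * k * k)
    for n, filt in enumerate(filters):
        for h in range(k):
            for w in range(k):
                weights[blob_offset(shape, n, 0, h, w)] = float(filt[h][w])
    bias = [0.0] * num  # assumption: start from zero bias
    return shape, weights, bias
```

The resulting buffers would then be copied into the weight and bias blobs of the first convolution layer before training starts, so that training begins from the PCA filters rather than from random values.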

So my idea consists of 2 stages: implementing PCANet and modifying Caffe for OpenCV.

    The 1st direction: Fast-CNN (CellNet)

My 1st idea is an implementation of PCANet, which learns differently structured local filters from different databases while keeping the stability and generality of a general CNN.

Here is the general model of PCANet:
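
For readers without the figure, the feature extraction path can be approximated in a few lines. This is a simplified sketch based on the published PCANet design (filter responses, binary hashing, histogram), not the CellNet implementation: it uses a single filter stage and one histogram over the whole image instead of block-wise histograms, and all function names are hypothetical.

```python
def filter_valid(image, filt):
    """Slide a k x k filter over a 2-D list image (no padding, no flipping)."""
    k = len(filt)
    rows, cols = len(image), len(image[0])
    return [
        [
            sum(image[r + i][c + j] * filt[i][j] for i in range(k) for j in range(k))
            for c in range(cols - k + 1)
        ]
        for r in range(rows - k + 1)
    ]


def binary_hash(maps):
    """Merge L response maps into one integer map: bit l is set where map l > 0."""
    rows, cols = len(maps[0]), len(maps[0][0])
    return [
        [sum(1 << l for l, m in enumerate(maps) if m[r][c] > 0) for c in range(cols)]
        for r in range(rows)
    ]


def histogram(code_map, num_filters):
    """Count how often each of the 2**num_filters hash codes occurs."""
    bins = [0] * (1 << num_filters)
    for row in code_map:
        for code in row:
            bins[code] += 1
    return bins


def pcanet_features(image, filters):
    """Filter responses -> binary hashing -> histogram feature vector."""
    maps = [filter_valid(image, f) for f in filters]
    return histogram(binary_hash(maps), len(filters))
```

The histogram is the feature vector that would be passed to a classifier. No step in this path requires back propagation.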

I have used this algorithm on one of the most difficult face databases, ‘LFW’, and achieved a 92% recognition rate with this algorithm alone.

My friend Qian Hong recently guided me in helping him solve a general figure recognition problem: reducing the human labor needed to select, from an open source figure database, the figure most similar to a given official Microsoft figure. I used PCANet with particular parameters for this problem and got the results partly shown below:

Target database

Open source database

The matching results show the accuracy of PCANet. The figures are all extracted from the raw databases, and there are many matches that look alike in shape. My algorithm can select the best substitute among them.

Now I have modified the structure for general visual object classification problems and joined the detection, feature extraction and classification steps into one extended neural network, both for my study and for the project itself.
By porting my previous Matlab code for CellNet to C++ with the help of OpenCV (especially for matrix operations), I have recently been able to extract some raw features from a photo through the network. The performance is still as good as I mentioned a few weeks ago, but I am concerned about another confusion related to what you have told me.

    The 2nd direction: Modifying Caffe with OpenCV in WINE

In fact, besides studying this artificial architecture, I am also researching normal neural networks, which leads to the 2nd part I mentioned above. Deep learning is powerful indeed; it could exceed human capabilities in the future thanks to Big Data and fast 2-D computation (including GPUs). But the complex back-and-forth process will remain costly even in the future. So I am still searching for a better solution between one-way and two-way networks.

I noticed that OpenCV has posted that it would be cool if OpenCV could load and run deep networks trained with popular DNN packages like Caffe or Torch. I am glad to hear that because I am also doing some experiments on CNNs with Caffe.

Then there are some main tasks to be done in the future:

Modify Caffe's data structures with OpenCV, especially the ‘blob’ structure defined in Caffe. The layers really do appear clearer with the help of ‘blob’, but the basic element could be more flexible with the OpenCV data structures.

Some prerequisites, including BLAS and Boost, should be simplified. We may need only some basic dependencies to form a CNN.

Whether to include a GPU computation path is still under consideration because time is short.


As described above in detail, my idea is composed of a fast DNN, which is almost complete, and a Caffe-based DNN. I attached the paper describing the fast DNN, called “PCANet”. The source code is published on GitHub under the name CellNet at the URL:

As for the Caffe-based DNN, I think it is much easier to develop a CPU version first rather than a GPU version. I am worried that there might not be enough time to apply OpenCV to Caffe without bugs, though Caffe already uses OpenCV in some places.

I wonder if the idea is meaningful enough, especially the 1st part?

[Attachments scrubbed by the list archive: three PNG images (wpsXhOezI.png, wps51bojG.png, wpsMfL73D.png).]

More information about the wine-devel mailing list