Posts

Credit Card OCR with OpenCV

The problem is recognizing the text information on a credit card:

1. Capturing a card image.
2. Localization and alignment: find the different text regions on the image. Edge detection over the entire image with the Sobel operator, then Hough line transformation.
3. Segmentation: split the text information into separate digits. Compute the gradient image -> open operation -> filter candidates by aspect ratio (ratio of width to height); a minimal sketch of this step follows below.
4. Digit recognition: Convolutional Neural Network or Template Matching.

Challenges: illumination changes, wear, and low contrast. The hardest part is the card number segmentation.
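A minimal OpenCV sketch of the segmentation step (step 3), assuming a roughly aligned, pre-cropped card image; the file name, kernel size, and aspect-ratio thresholds are illustrative assumptions, not part of the pipeline above:

```python
import cv2
import numpy as np

# Hypothetical file name; assumes a roughly aligned, pre-cropped card image.
gray = cv2.imread("card.png", cv2.IMREAD_GRAYSCALE)

# Gradient image via the Sobel operator (x direction picks up digit strokes).
grad = np.absolute(cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3))
grad = (255 * grad / (grad.max() + 1e-6)).astype("uint8")

# Morphological open to suppress small noise; a close (shown here too) is
# commonly added to merge digit strokes into one blob per 4-digit group.
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (9, 3))
grad = cv2.morphologyEx(grad, cv2.MORPH_OPEN, kernel)
grad = cv2.morphologyEx(grad, cv2.MORPH_CLOSE, kernel)
thresh = cv2.threshold(grad, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]

# Keep candidate boxes by aspect ratio (width / height); thresholds are guesses.
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)  # OpenCV 4 return signature
groups = [cv2.boundingRect(c) for c in contours]
groups = sorted((x, y, w, h) for (x, y, w, h) in groups
                if 2.5 < w / float(h) < 4.0)  # left-to-right digit groups
```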

Train Neural Network with Noisy Labels

Label-smoothing regularization: add noise to the labels, i.e. soften the hard 0/1 target distribution, as a regularization method so that the model makes less confident predictions. https://www.zhihu.com/question/61971817 https://www.robots.ox.ac.uk/~vgg/rg/papers/reinception.pdf
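A minimal sketch of the idea, assuming K classes and a smoothing factor epsilon (both names are illustrative): each one-hot target is mixed with the uniform distribution, so the true class gets 1 - epsilon + epsilon/K and every other class gets epsilon/K.

```python
import numpy as np

def smooth_labels(one_hot, epsilon=0.1):
    """Replace hard 0/1 targets with a softened distribution:
    mix the one-hot vector with the uniform distribution over K classes."""
    k = one_hot.shape[-1]
    return one_hot * (1.0 - epsilon) + epsilon / k

# Example: 3-class one-hot target for class 1
y = np.array([[0.0, 1.0, 0.0]])
print(smooth_labels(y))  # [[0.0333..., 0.9333..., 0.0333...]]
```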

Research Questions

Please introduce your research quickly? I am a Ph.D. student at the University of South Carolina. This September I passed my dissertation defense, so I can definitely graduate this December. My research areas are computer vision and machine learning, especially deep learning recently. Our lab focuses on improving facial behavior recognition using machine learning methods. Facial behavior recognition includes facial expression and facial action unit recognition. If you want to hear more about facial action units, I can explain later. The problem is: given a facial image, the learned model should tell which expressions or facial action units exist in that image. In the first three years of my Ph.D. program, we mainly extracted human-designed features such as LBP and HOG, then used AdaBoost or SVM for classification. In the recent three years, we have used deep learning methods, especially convolutional neural networks. We main

Lasso (L1 penalty) vs Ridge (L2 penalty)

Ridge and Lasso are forms of regularized linear regression. The regularization can also be interpreted as a prior in a maximum a posteriori (MAP) estimation method. Ridge and Lasso use two different penalty functions: Ridge uses the L2 norm, the sum of the squares of the coefficients, while Lasso uses the L1 norm, the sum of the absolute values of the coefficients. Ridge (L2) regression cannot zero coefficients out, so it shrinks all coefficients toward zero but keeps every variable in the model, whereas Lasso (L1) does both parameter shrinkage and variable selection automatically, because it zeros out the coefficients of collinear variables; this means it can help select variables out of the given n variables while performing the regression. We will continue with the difference between the L1 and L2 norms. While practicing machine learning, you may have come upon a choice between L1 and L2. Usually the two decisions are: 1) L1-norm vs L2-norm loss function; and 2) L1-regularization vs L2-regularization.
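A minimal sketch contrasting the two behaviors, using scikit-learn's Lasso and Ridge on a toy design with one nearly collinear feature; the alpha values and data shapes are illustrative:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.RandomState(0)
X = rng.randn(100, 3)
X[:, 2] = X[:, 0] + 0.01 * rng.randn(100)       # feature 2 nearly collinear with feature 0
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.randn(100)  # feature 2 carries no extra information

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

print("ridge:", ridge.coef_)  # all three coefficients shrunk but nonzero
print("lasso:", lasso.coef_)  # one of the collinear coefficients typically driven to exactly 0
```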

Recurrent Neural Network vs Recursive Neural Network

A recurrent neural network basically unfolds over time. It is used for sequential inputs where time is the main factor differentiating the elements of the sequence. For example, consider a recurrent neural network used for language modelling, unfolded over time: at each time step, in addition to the user's input at that step, it also accepts the output of the hidden layer that was computed at the previous time step. A recursive neural network is more like a hierarchical network, where there is really no time aspect to the input sequence but the input has to be processed hierarchically, in a tree fashion. For example, a recursive network can learn the parse tree of a sentence by recursively combining the outputs of operations performed on smaller chunks of the text. Recurrent NNs are in fact recursive neural networks with a particular linear-chain structure, which is good at handling the linear structure of time-ordered input; a small sketch of both follows below.
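A minimal numpy sketch of the structural difference, assuming a shared combine function with weight matrix W and illustrative shapes (all names are hypothetical): the recurrent net folds one step at a time along the sequence, while the recursive net folds over a tree.

```python
import numpy as np

W = np.random.randn(4, 8)  # shared weights: maps two stacked 4-d children to a 4-d parent
U = np.random.randn(4, 4)  # recurrent input-to-hidden weights (illustrative shape)

def combine(left, right):
    """Merge two child vectors into one parent vector (shared across the tree)."""
    return np.tanh(W @ np.concatenate([left, right]))

def recurrent(inputs):
    """Recurrent net: a recursive net over the degenerate linear-chain 'tree'."""
    h = np.zeros(4)
    for x in inputs:
        h = combine(h, U @ x)  # previous hidden state combined with current input
    return h

def recursive(tree):
    """Recursive net: tree is either a leaf vector or a (left, right) pair."""
    if isinstance(tree, tuple):
        return combine(recursive(tree[0]), recursive(tree[1]))
    return tree

xs = [np.random.randn(4) for _ in range(3)]
print(recurrent(xs))                       # chain: (((h0, x0), x1), x2)
print(recursive(((xs[0], xs[1]), xs[2])))  # tree: (x0 x1) combined first
```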

Autoencoders vs Sparse Coding

Sparse Coding

Sparse coding minimizes the objective

$L_{sc} = \underbrace{\|WH - X\|_2^2}_{\text{reconstruction term}} + \underbrace{\lambda \|H\|_1}_{\text{sparsity term}}$

where $W$ is a matrix of bases, $H$ is a matrix of codes, and $X$ is a matrix of the data we wish to represent. $\lambda$ implements a trade-off between sparsity and reconstruction. Note that if we are given $H$, estimation of $W$ is easy via least squares.

Autoencoders

Autoencoders are a family of unsupervised neural networks. There are quite a lot of them, e.g. deep autoencoders or those having different regularisation tricks attached, e.g. denoising, contractive, sparse. There even exist probabilistic ones, such as generative stochastic networks or the variational autoencoder. Their most abstract form is

$D(d(e(x; \theta_r); \theta_d), x)$

but we will go along with a much simpler one for now:

$L_{ae} = \|W \sigma(W^T X) - X\|^2$

where $\sigma$ is a nonlinear function such as the logistic sigmoid $\sigma(x) = \frac{1}{1 + \exp(-x)}$.

Difference: 1 W
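A minimal numpy sketch of the two objectives above, with illustrative shapes (16-dim data, 8 bases/hidden units). The structural difference is visible in the code: in sparse coding the codes $H$ are free variables to be optimized per datum, whereas in the autoencoder the codes are computed by the parametric encoder $\sigma(W^T X)$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sparse_coding_loss(W, H, X, lam=0.1):
    """L_sc = ||W H - X||_2^2 + lambda * ||H||_1, with H a free variable."""
    recon = np.sum((W @ H - X) ** 2)    # reconstruction term
    sparsity = lam * np.sum(np.abs(H))  # sparsity term
    return recon + sparsity

def autoencoder_loss(W, X):
    """L_ae = ||W sigma(W^T X) - X||^2 with tied weights W."""
    H = sigmoid(W.T @ X)  # codes are a parametric function of X, not free variables
    return np.sum((W @ H - X) ** 2)

# Illustrative shapes: 16-dim data, 8 bases/hidden units, 100 samples.
rng = np.random.RandomState(0)
X = rng.randn(16, 100)
W = rng.randn(16, 8)
H = rng.randn(8, 100)
print(sparse_coding_loss(W, H, X))
print(autoencoder_loss(W, X))
```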