The large memory and computation requirements of current Convolutional Neural Networks (CNNs) impede their deployment on embedded systems. A bitwise CNN, in which the weights and activations of a convolution layer are either -1 or 1, offers a promising way to compress the model and speed up inference. However, current bitwise CNNs suffer an accuracy drop. To alleviate this, we propose a shortcut that propagates the real-valued information already computed in the 1-bit neural network. By further enhancing the training techniques for binary networks, we achieve 56.4% accuracy on the ImageNet dataset, much higher than XNOR-Net, with even fewer real-valued parameters.
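The idea of adding a real-valued shortcut around a 1-bit layer can be illustrated with a minimal sketch. It is not the project's implementation: it uses a small fully connected layer instead of a convolution, and the names `sign`, `binary_dot` and `binary_layer_with_shortcut` are hypothetical. It assumes the layer has as many outputs as inputs so that the shortcut can be added element-wise.

```python
def sign(x):
    # Binarize a real value to -1 or +1 (0 maps to +1 by convention here).
    return 1 if x >= 0 else -1


def binary_dot(a_bits, w_bits):
    # Dot product of two {-1, +1} vectors.
    # Equivalent to XNOR + popcount: 2 * (number of matching signs) - length.
    matches = sum(1 for a, w in zip(a_bits, w_bits) if a == w)
    return 2 * matches - len(a_bits)


def binary_layer_with_shortcut(x, weights):
    # x: real-valued input activations.
    # weights: one real-valued weight row per output (assumed square here).
    a_bits = [sign(v) for v in x]
    out = []
    for i, row in enumerate(weights):
        w_bits = [sign(w) for w in row]
        # The 1-bit result is added to the real-valued input, so the
        # real-valued information is carried past the binarization step.
        out.append(binary_dot(a_bits, w_bits) + x[i])
    return out


x = [0.5, -1.2, 0.3]
weights = [[0.1, -0.2, 0.3], [-0.5, -0.4, 0.9], [0.7, 0.2, -0.1]]
y = binary_layer_with_shortcut(x, weights)
```

Without the shortcut, the output of each layer would take only a few integer values; adding `x[i]` keeps the finer real-valued detail available to the next layer at the cost of one addition per output.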
Filming human motion with a camera drone is very challenging because the camera operator must manipulate a remote controller and design the desired image composition in real time. To help inexperienced flyers capture cinematic videos, we propose several methods for automatic aerial filming from three perspectives: learning-based, heuristic-based and interaction-based.
Retinal fundus images provide rich information about pathological changes, which can be used to diagnose eye-related diseases such as macular degeneration, diabetic retinopathy and glaucoma. Among the various features in fundus images, retinal vessel features play a crucial role in diagnosis. Taking diabetic retinopathy as an example, microaneurysms, one fundamental symptom, generally appear along retinal vessels. Extracting retinal vessel features requires an accurate segmentation of the retinal blood vessels. However, manual annotation by a human observer is time-consuming. Automated retinal vessel segmentation has been widely studied for decades, yet it remains a challenging task, especially for thin vessels. In addition, because of inter-observer variability, a better evaluation metric is highly desirable.
Aerial surveillance and monitoring demand real-time, robust motion detection from a moving camera. Most existing techniques for drones involve sending video data streams back to a ground station equipped with a high-end desktop computer or server. These methods share one major drawback: data transmission is subject to considerable delay and possible corruption. Onboard computation can not only overcome the data corruption problem but also increase the range of motion. Unfortunately, because of limited weight-bearing capacity, equipping drones with high-performance computing hardware is not feasible. Therefore, a motion detection system that achieves real-time performance and high accuracy on drones with limited computing power is highly desirable.
LDB (Local Difference Binary) is a highly efficient, robust and distinctive binary descriptor. The distinctiveness and robustness of LDB are achieved in three steps. First, LDB captures the internal patterns of each image patch through a set of binary tests, each of which compares the average intensity Iavg and the first-order gradients, dx and dy, of a pair of image grids within the patch (as shown in (a) and (b)). Second, LDB employs a multiple gridding strategy to capture the structure at different spatial granularities (as shown in (c)). Coarse-level grids cancel out high-frequency noise, while fine-level grids capture detailed local patterns, thus enhancing distinctiveness. Third, LDB selects a subset of highly variant and distinctive bits and concatenates them to form a compact and unique LDB descriptor.
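The first two steps (binary tests on Iavg, dx and dy, and multiple gridding) can be sketched as follows. This is a simplified illustration, not the reference LDB implementation. The function names are hypothetical, the gradients are approximated here as the difference between the mean intensities of the two halves of a cell (an assumption), and the third step, bit selection, is omitted because it requires training data.

```python
from itertools import combinations


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def cell_features(patch, r0, r1, c0, c1):
    # Average intensity and first-order gradients of one grid cell.
    # Assumption: dx (dy) is the mean of the right (bottom) half minus
    # the mean of the left (top) half. Cells must be at least 2x2 pixels.
    rm, cm = (r0 + r1) // 2, (c0 + c1) // 2
    rows, cols = range(r0, r1), range(c0, c1)
    avg = mean(patch[r][c] for r in rows for c in cols)
    dx = (mean(patch[r][c] for r in rows for c in range(cm, c1))
          - mean(patch[r][c] for r in rows for c in range(c0, cm)))
    dy = (mean(patch[r][c] for r in range(rm, r1) for c in cols)
          - mean(patch[r][c] for r in range(r0, rm) for c in cols))
    return (avg, dx, dy)


def ldb_bits(patch, grid_sizes=(2, 3)):
    # Multiple gridding: repeat the binary tests for each n x n partition.
    size = len(patch)  # square patch assumed
    bits = []
    for n in grid_sizes:
        edges = [i * size // n for i in range(n + 1)]
        cells = [cell_features(patch, edges[i], edges[i + 1],
                               edges[j], edges[j + 1])
                 for i in range(n) for j in range(n)]
        # One binary test per feature (Iavg, dx, dy) for every pair of cells.
        for fa, fb in combinations(cells, 2):
            bits.extend(1 if a > b else 0 for a, b in zip(fa, fb))
    return bits


# 6x6 patch with a horizontal intensity ramp: pixel value equals column index.
ramp = [[c for c in range(6)] for _ in range(6)]
descriptor = ldb_bits(ramp)
```

With a 2x2 and a 3x3 grid, the sketch produces 3 bits for each of the 6 + 36 cell pairs, that is 126 bits before any selection; a bit selection stage would then keep only a subset of them.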
Although recent progress in deep neural networks has led to the development of learnable local feature descriptors, there is no explicit way to estimate the necessary size of the network. Because a local feature is represented in a low-dimensional space, the network should have a more compact structure. However, the small networks required for local feature descriptor learning may be sensitive to initial conditions and learning parameters, and are more likely to become trapped in local minima. To address this problem, we introduce an adaptive pruning Siamese architecture based on neuron activation to learn local feature descriptors, making the network more computationally efficient while improving the recognition rate over more complex networks.
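As a rough illustration of activation-based pruning, the sketch below ranks the neurons of one layer by their mean absolute activation over a set of samples and keeps only the most active ones. The scoring rule, the `keep_ratio` parameter and all function names are assumptions made for this example; the abstract does not specify the actual pruning criterion or how it adapts during training.

```python
def mean_abs_activation(activations):
    # activations: list of samples, each a list with one output per neuron.
    n_samples = len(activations)
    n_neurons = len(activations[0])
    return [sum(abs(sample[j]) for sample in activations) / n_samples
            for j in range(n_neurons)]


def select_neurons(activations, keep_ratio=0.5):
    # Keep the most active fraction of neurons (hypothetical criterion).
    scores = mean_abs_activation(activations)
    k = max(1, round(keep_ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def prune_layer(weights, biases, keep):
    # weights: one row per neuron; drop the rows of the pruned neurons.
    return [weights[j] for j in keep], [biases[j] for j in keep]


# Two samples observed at a layer with four neurons.
activations = [[0.9, 0.0, 0.4, 0.01],
               [1.1, 0.0, 0.6, 0.02]]
keep = select_neurons(activations, keep_ratio=0.5)
weights, biases = prune_layer([[1, 2], [3, 4], [5, 6], [7, 8]],
                              [0.1, 0.2, 0.3, 0.4], keep)
```

In a Siamese setting both branches share their weights, so a neuron pruned in one branch is removed from the other as well; the next layer's input weights would also need the corresponding columns removed, which this sketch leaves out.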
Real-time Facial Information Extraction, Analysis and Utilization
Obtaining desired semantic information from facial images plays a significant role in applications such as face recognition, head pose estimation and real-time animation. This project investigates how to improve the state-of-the-art techniques (e.g. face detection and face alignment) in terms of accuracy, time efficiency and robustness to perturbation. We also explore novel solutions to emerging problems and applications on mobile platforms.