Aerial filming of action scenes with drones is difficult because the operator must understand a dynamic scene while controlling the drone and the camera simultaneously. Existing systems let the user manually specify shots and guide the drone to capture footage, but none employs aesthetic objectives to automate aerial filming of action scenes. Moreover, these drone cinematography systems depend on external motion capture systems to perceive human action, which limits them to indoor environments. In this paper, we propose "ACT", an autonomous cinematography system on the drone platform, to address these challenges.
Retinal fundus images provide rich information about pathological changes, which can be used to diagnose eye-related diseases such as macular degeneration, diabetic retinopathy, and glaucoma. Among the various features in fundus images, retinal vessel features play a crucial role in diagnosis. Taking diabetic retinopathy as an example, microaneurysms, a fundamental symptom, generally appear along retinal vessels. Extracting retinal vessel features therefore requires accurate segmentation of the retinal blood vessels. However, manual annotation by a human observer is time-consuming. Automated retinal vessel segmentation has been studied widely for decades, yet it remains challenging, especially for thin vessels. In addition, because of inter-observer variability, a better evaluation metric is in high demand.
Aerial surveillance and monitoring demand real-time, robust motion detection from a moving camera. Most existing techniques for drones stream video data back to a ground station equipped with a high-end desktop computer or server. These methods share one major drawback: data transmission is subject to considerable delay and possible corruption. Onboard computation not only avoids the corruption problem but also increases the operating range. Unfortunately, drones' limited weight-bearing capacity makes it infeasible to equip them with high-performance computing hardware. Therefore, a motion detection system that achieves real-time performance and high accuracy on drones with limited computing power is highly desirable.
LDB (Local Difference Binary) is a highly efficient, robust, and distinctive binary descriptor. Its distinctiveness and robustness are achieved in three steps. First, LDB captures the internal patterns of each image patch through a set of binary tests, each of which compares the average intensity Iavg and the first-order gradients, dx and dy, of a pair of image grids within the patch (as shown in (a) and (b)). Second, LDB employs a multiple-gridding strategy to capture structure at different spatial granularities (as shown in (c)): coarse-level grids cancel out high-frequency noise, while fine-level grids capture detailed local patterns, enhancing distinctiveness. Third, LDB selects a subset of highly variant and distinctive bits and concatenates them to form a compact and unique LDB descriptor.
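The binary tests in the first step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the grid layout, the gradient approximation (half-cell mean differences), and the `ldb_bits` helper name are all assumptions for the sake of the example.

```python
import numpy as np

def ldb_bits(patch, n=3):
    """Sketch of LDB-style binary tests on one grayscale patch.

    The patch is divided into an n x n grid; for each grid cell we compute
    the average intensity Iavg and crude first-order gradients dx, dy, then
    compare every pair of cells on each of the three values to produce bits.
    """
    patch = patch.astype(np.float64)
    h, w = patch.shape
    gh, gw = h // n, w // n
    feats = []  # one (Iavg, dx, dy) triple per grid cell
    for i in range(n):
        for j in range(n):
            cell = patch[i * gh:(i + 1) * gh, j * gw:(j + 1) * gw]
            iavg = cell.mean()
            # first-order gradients approximated by half-cell mean differences
            dx = cell[:, gw // 2:].mean() - cell[:, :gw // 2].mean()
            dy = cell[gh // 2:, :].mean() - cell[:gh // 2, :].mean()
            feats.append((iavg, dx, dy))
    bits = []
    for a in range(len(feats)):
        for b in range(a + 1, len(feats)):
            for k in range(3):  # compare Iavg, dx, dy of the grid pair
                bits.append(1 if feats[a][k] > feats[b][k] else 0)
    return np.array(bits, dtype=np.uint8)
```

With a single 3x3 gridding this yields C(9,2) x 3 = 108 bits per patch; the multiple-gridding step would concatenate bit strings from several grid sizes, and the final step would keep only the most variant bits.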
Although recent progress in deep neural networks has enabled learnable local feature descriptors, there is no principled way to estimate the necessary size of a network. In particular, a local feature is represented in a low-dimensional space, so the network should have a correspondingly compact structure. The small networks required for descriptor learning can be sensitive to initial conditions and learning parameters, and are more likely to become trapped in local minima. To address this problem, we introduce an adaptive-pruning Siamese architecture based on neuron activation for learning local feature descriptors, making the network more computationally efficient while improving the recognition rate over more complex networks.
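Activation-based pruning of a hidden layer can be sketched as below. This is only an illustration of the general idea, assuming a plain fully connected layer with ReLU and a mean-activation criterion; the paper's adaptive criterion and network structure may differ, and `prune_low_activation` is a hypothetical helper.

```python
import numpy as np

def prune_low_activation(W1, b1, W2, X, keep_ratio=0.5):
    """Remove hidden neurons with the lowest mean ReLU activation.

    W1: (in, hidden) weights into the hidden layer
    b1: (hidden,) biases of the hidden layer
    W2: (hidden, out) weights out of the hidden layer
    X:  (batch, in) calibration inputs used to measure activations
    """
    acts = np.maximum(0.0, X @ W1 + b1)      # (batch, hidden) ReLU activations
    score = acts.mean(axis=0)                # mean activation per neuron
    k = max(1, int(keep_ratio * W1.shape[1]))
    keep = np.sort(np.argsort(score)[-k:])   # indices of the k most active neurons
    # shrink both the incoming and outgoing weights consistently
    return W1[:, keep], b1[keep], W2[keep, :]
```

In a Siamese setting the same pruned weights would be shared by both branches, so pruning one copy of the network prunes the whole descriptor model.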
Real-time Facial Information Extraction, Analysis and Utilization
Obtaining desired semantic information from facial images plays a significant role in applications such as face recognition, head pose estimation, and real-time animation. This project investigates how to improve state-of-the-art techniques (e.g., face detection and face alignment) in terms of accuracy, time efficiency, and robustness to perturbations. We also explore novel solutions to emerging problems and applications on mobile platforms.
Energy Efficient Bitwise Network Design for Vision Tasks
The large memory and computation requirements of current Convolutional Neural Networks (CNNs) impede their deployment in embedded systems. A bitwise CNN, in which the weights and activations of a convolution layer are constrained to -1 or +1, offers a promising way to compress the model and speed up inference. However, current bitwise CNNs suffer an accuracy drop when the network goes deep for complex tasks. To alleviate this, we designed a network structure that preserves a small amount of real-valued computation, striking a balance between memory usage, efficiency, and accuracy.
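The speedup of bitwise CNNs comes from replacing multiply-accumulate with bit operations: encoding -1 as 0 and +1 as 1, the dot product of two binary vectors equals 2 x popcount(XNOR) - n. A minimal sketch of this standard trick (not this project's specific network design):

```python
import numpy as np

def binarize(x):
    """Map real values to {-1, +1}; zeros are sent to +1."""
    return np.where(x >= 0, 1, -1).astype(np.int8)

def xnor_dot(a_bits, b_bits):
    """Dot product of two {-1, +1} vectors via XNOR + popcount.

    With -1 encoded as 0 and +1 as 1, each matching bit contributes +1
    and each mismatching bit -1, so dot = 2 * popcount(XNOR) - n.
    """
    a = a_bits > 0
    b = b_bits > 0
    n = a.size
    matches = np.count_nonzero(a == b)  # popcount of XNOR(a, b)
    return 2 * matches - n
```

On real hardware the boolean vectors are packed into machine words, so one XNOR plus one popcount instruction replaces up to 64 floating-point multiply-adds; keeping a few real-valued layers, as described above, recovers accuracy on deeper networks.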