Please use this identifier to cite or link to this item:
http://dspace.dtu.ac.in:8080/jspui/handle/repository/17481
Title: IDENTIFICATION OF HUMAN ACTIONS IN VIDEO SEQUENCES
Authors: DHIMAN, CHHAVI
Keywords: HUMAN ACTIONS; VIDEO SEQUENCES; IDENTIFICATION SYSTEM; DEPTH DATA; SKELETON DATA
Issue Date: Nov-2019
Series/Report no.: TD-4860
Abstract: Intelligent identification of human actions in videos is an active research area of computer vision, covering a wide range of applications such as Ambient Assisted Living (AAL) [1], healthcare of elderly people [2], intelligent video surveillance systems [3], human-computer interfaces (HCI) [4] [5], sports [6], event analysis, robotics [7], intrusion detection systems [8], content-based video analysis [9], and multimedia semantic annotation and indexing [10]. With the advance of technology and the growing demands of society, systems based on automatic video sequence analysis have become the need of the hour, and their application in real life is helping to raise the standards of safety and security in society.

The performance of an intelligent human action identification system depends greatly on the type of input fed to the system and on the features extracted from that input. Feature design plays an important role in understanding the actions in videos. However, environmental conditions such as lighting, cluttered backgrounds, partial or complete occlusion, crowded scenes, differing camera viewpoints, and the size, shape, appearance, and complexity of human actions adversely affect the extraction of discriminative features. Over the years, such challenges have pushed researchers to explore new dimensions of the solution: from vision-based to sensor-based surveillance systems, from 2D to 3D data, and toward the integration of multiple features. Various algorithms [11] [12] [13] [14] have been developed with these challenging scenarios in mind, and several real-time depth- and skeleton-based fall detection systems [15] [16] [17] [18] have been built considering the affordable range of the common user and the practical challenges involved in video analysis.
In addition, deep architectures [19] have also made their way into the computer vision field and are used for automatic assessment of Parkinson's disease, AAL applications, and much more. This thesis therefore investigates human action identification methods based on both two-dimensional (RGB) and three-dimensional (depth and skeleton) data, using both traditional handcrafted features and deep features. The human action identification objective is divided into three steps:

▪ The first step is human silhouette extraction. A different extraction method is used for each type of input:
i. For RGB video sequences, entropy-based texture segmentation separates the human silhouette from the background.
ii. For depth images, the human silhouette is extracted by global thresholding.
iii. For skeleton data, joining the 3D skeleton coordinates generates the human pose for each frame.

▪ The second step is feature extraction and representation, using both traditional and deep learning models. Features are extracted with four different approaches, depending on the combination of inputs:
i. For RGB video sequences, the feature vector combines a global Spatial Distribution Gradient (SDG) representation with Difference of Gaussian (DoG) based STIPs, which are scale, rotation, and translation invariant.
ii. For depth and skeleton data, a robust feature vector is computed from an 𝓡-transform and Zernike moments based human pose description, which is robust to translation, rotation, and scale variations.
iii. For RGB and depth data, the motion dynamics of an action are represented as Dynamic Image (DI) based CNN features, geometrical view-invariant details of human poses are captured as deep HPM based features, and temporal information is then learned with an LSTM model.
iv. For skeleton data, part-wise spatio-temporal CNN–RIAC Network based 3D human action features are defined.

▪ The third step is classification of human actions. K-NN, SVM, and HMM classifiers are used for the traditional handcrafted features; weighted, max, average, and multiply late fusion strategies are used for the deep learning models.

The performance of each proposed action identification model is tested on various publicly available datasets and compared with earlier state-of-the-art algorithms. In addition, a novel Abnormal Human Action (AbHA) dataset was recorded while developing an automatic abnormal human action identification framework targeting elderly healthcare, and has been made publicly available. Finally, the research work is concluded, and future research directions as well as possible future applications are highlighted and discussed in detail.
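The global thresholding used for depth silhouettes in the first step can be sketched as keeping only pixels inside an assumed subject depth range; the near/far limits and the toy depth map below are illustrative assumptions, not values from the thesis.

```python
import numpy as np

def extract_silhouette(depth_frame, near=500, far=4000):
    # Keep pixels whose depth (assumed here to be in millimetres) falls
    # inside the subject range; zero readings are typical sensor dropouts.
    mask = (depth_frame > near) & (depth_frame < far)
    return mask.astype(np.uint8)

# Toy 4x4 depth map: a subject at ~1500 mm against a 5000 mm background.
frame = np.full((4, 4), 5000)
frame[1:3, 1:3] = 1500
sil = extract_silhouette(frame)   # binary silhouette mask
```

In practice the two thresholds would be tuned per scene or replaced by an adaptive scheme; this sketch only shows the shape of the operation.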
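For the 𝓡-transform pose descriptor in the second step, a minimal sketch, assuming the standard definition R(θ) = Σ_ρ g(ρ, θ)² with g the Radon projection of the binary silhouette; the rotation-based Radon approximation and the toy silhouette are this sketch's own choices, not the thesis implementation.

```python
import numpy as np
from scipy.ndimage import rotate

def r_transform(silhouette, n_angles=180):
    # R(theta) = sum over rho of g(rho, theta)^2, where g(., theta) is the
    # Radon projection of the silhouette, approximated here by rotating the
    # image and summing columns.
    out = np.empty(n_angles)
    for i, theta in enumerate(np.linspace(0.0, 180.0, n_angles, endpoint=False)):
        rotated = rotate(silhouette.astype(float), theta, reshape=False, order=1)
        projection = rotated.sum(axis=0)       # g(rho, theta)
        out[i] = (projection ** 2).sum()
    # Normalising by the maximum gives scale invariance; an in-plane
    # rotation of the pose becomes a circular shift of R.
    return out / out.max()

# Toy binary silhouette: a small rectangle in a 32x32 frame.
sil = np.zeros((32, 32))
sil[10:22, 14:18] = 1.0
desc = r_transform(sil)
```

The squared-projection form makes the descriptor invariant to translation of the silhouette within the frame, which is why it pairs naturally with Zernike moments for pose description.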
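The weighted, max, average, and multiply late fusion strategies named in the third step can be sketched over per-stream class scores; the two softmax vectors below are made-up examples, not outputs of the thesis models.

```python
import numpy as np

def late_fuse(score_list, strategy="average", weights=None):
    # score_list: per-stream class-probability vectors (e.g. softmax outputs).
    scores = np.stack(score_list)              # shape: (n_streams, n_classes)
    if strategy == "average":
        fused = scores.mean(axis=0)
    elif strategy == "max":
        fused = scores.max(axis=0)             # element-wise max over streams
    elif strategy == "multiply":
        fused = scores.prod(axis=0)
    elif strategy == "weighted":
        w = np.asarray(weights)[:, None]       # one weight per stream
        fused = (w * scores).sum(axis=0)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return int(fused.argmax())                 # predicted class index

rgb   = np.array([0.2, 0.7, 0.1])   # hypothetical RGB-stream softmax
depth = np.array([0.1, 0.3, 0.6])   # hypothetical depth-stream softmax
```

For example, average, max, and multiply fusion all pick class 1 here, while weighting the depth stream more heavily (weights = [0.3, 0.7]) flips the decision to class 2, which is the behaviour such strategies are compared on.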
URI: http://dspace.dtu.ac.in:8080/jspui/handle/repository/17481
Appears in Collections: Ph.D. Electronics & Communication Engineering
Files in This Item:
File | Description | Size | Format
---|---|---|---
ChhaviDhiman_Thesis_2k16phdec07.pdf | | 4.26 MB | Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.