Abstract

To improve the accuracy and timeliness of folk dance movement recognition, this paper proposes an improved sensor-based MCM-SVM model for recognizing the lower limb human motion of ethnic dance in rural areas. A support vector machine (SVM) identifies the current action, and a Markov chain model (MCM) optimizes the recognition result. Experimental results show that the improved model achieves a higher recognition rate than the SVM algorithm alone across different dance moves: the average recognition rate exceeds 93%, and the average recognition time is about 0.6 ms, which verifies the effectiveness of the proposed model. The model can provide practical guidance for the design and construction of future dance movement recognition systems.

1. Introduction

Applying modern information technology to the protection and inheritance of traditional cultural resources gives the dissemination of national art and culture a dynamic, technological character. Ethnic dance is a distinctive feature of Chinese culture and the main art form in rural China. Over time, however, many ethnic dance moves have come to the verge of being lost. The recording, protection, and inheritance of this intangible cultural heritage have therefore become an important research topic.

Motion capture is an accurate method of recording three-dimensional human motion, and it can be used to comprehensively record and protect dynamic arts such as ethnic dance [1]. Human action recognition has emerged as an active research field in recent years. With the development of pattern recognition and artificial intelligence, a growing number of universities, research institutes, and companies have invested in this field. As an important branch of pattern recognition and artificial intelligence, human action recognition has significant applications and broad prospects in human-computer interaction, medical health, sports analysis, intelligent monitoring, and even homeland security. By research content, human action recognition can be divided into two categories: static pose recognition and dynamic process recognition [2]. Static pose recognition classifies and distinguishes static objects and can be subdivided into hand shape recognition and posture recognition. Dynamic process recognition deals with dynamic objects and can be subdivided into gesture recognition, facial expression recognition, gait recognition, motion recognition, and so on.

Recognition of the lower limb human motion of ethnic dance belongs to dynamic process recognition. It mainly involves recognizing motion information such as the lower limb motion mode, spatial position, joint angles, and angular acceleration. At present, both the accuracy and the timeliness of human action recognition need further improvement. In lower limb exoskeletons, for example, low recognition accuracy makes the exoskeleton's movement inconsistent with the movement the wearer intends, and poor timeliness makes the device consistently lag behind the body [3]. Human motion recognition based on wearable sensors, which analyzes and processes body motion information to identify the body's motion state, is one of the important research directions.

Lower limb human motion recognition is mainly applied in motion analysis, medical health, and bionic robotics, but research on its application to dance is scarce. Taking the inheritance of ethnic dance in rural China as its background, this research explores a sensor-based method for recognizing the lower limb human motion of ethnic dance to help fill this gap. To recognize these actions, the support vector machine (SVM) algorithm identifies the current action, and the Markov chain model (MCM) optimizes the recognition result.

2. Literature Review

The psychologist Johansson pioneered the study of human movement. He recorded the movement of the human body by attaching high-brightness reflective markers to its key parts [4]. His experiments showed that people can distinguish types of human movement by observing only the movement time series of these key parts. This work laid the foundation for the study of human movement and strongly influenced later research.

With the development of computers and sensors, research on human motion recognition has become diversified and high-precision. To achieve more accurate motion recognition, Peng et al. added physiological signals to the motion capture data: the physiological signal features were used directly as classifier inputs, while topic distributions were extracted from the sensor signals with a topic model and then fed into the classifier. This method recognized complex behaviors well [5]. Because human movements are highly complex and diverse in style, Wang et al. focused on the differences in motion between elderly and young people; they extracted common time-domain and frequency-domain features from motion capture signals and built recognition models with commonly used machine learning algorithms [6]. Gibson et al. introduced and evaluated a remote health monitoring system that detects and diagnoses falls from acceleration sensor data [7]. Because little prior work had addressed combined classifiers, one contribution of their study is to combine multiple classifiers with different properties and to improve on single-classifier systems through majority voting.

Threshold-based fall detection methods decide whether a fall has occurred by checking whether the acceleration peak, valley, or another characteristic value reaches a predetermined threshold. Such methods can detect when a fall occurs, but their false alarm rate is an important issue. Thakkar and Pareek reviewed machine learning (ML) and deep learning (DL) techniques for human activity recognition (HAR) from 2011 to 2019 and summarized the advantages and disadvantages of action representation, dimensionality reduction, and action analysis methods [8]. Dhiman and Vishwakarma proposed a view-invariant human action recognition framework that integrates two important action cues: motion and shape temporal dynamics (STD) [9]. Ludl et al. introduced a modular simulation framework for training and validating algorithms in various human-centered scenarios. Their experiments show that, using motion capture data and 3D avatars, a recurrent neural network trained only on simulated data achieves almost perfect results when classifying human actions in real data [10]. Du and Mukaidani discussed a two-stream human action recognition method based on a linear dynamic system, proposed a two-stream deep feature extraction framework based on a pretrained convolutional neural network, and verified its effectiveness [11]. Sedmidubsky and Zezula proposed an evaluation procedure for 3D human action recognition that efficiently determines the best combination [12].

3. Methodology

3.1. Data Collection Scheme of Lower Limb Human Motion of Ethnic Dance

Lower limb movement is produced by skeletal muscle contraction, which generates the driving force that moves the joints [13]. Muscle contraction drives the ankle, knee, and hip joints, which moves the bones and ultimately completes the corresponding gait movement. Joint motion is mainly rotational. The ankle and hip joints each have three rotational motion modes, of which flexion/extension has the largest range of motion, whereas the knee joint can perform only flexion/extension. Joint movement during a specific action also produces a corresponding acceleration of the bones [14]. The joint angles and the accelerations of specific body segments can therefore be used to analyze human motion.

To recognize actions, the motion information must first be converted into electrical signals that a computer can process. Physical interaction signals are used to analyze the lower limb human motion of ethnic dance. The goal is to identify human movement intentions accurately with as few sensors as possible: fewer sensors improve the wearer's comfort and reduce the amount of computation, which shortens processing time [15]. Because the human body is symmetric, only the movement of one leg was analyzed in this study; the approach can be extended to both legs.

In the experiment, angle encoders were placed at the ankle, knee, and hip joints to detect joint angle signals in the sagittal plane. Each joint angle is measured with an Angtron-RE-38 series rotary angle encoder, which has a range of 0 to 360° and a measurement error below 0.1°. A MODEL4630 acceleration sensor, with a range of ±10 g and a nonlinearity of ±0.1% FSO, measures sagittal-plane acceleration on the underside of the thigh. Its two sensing axes lie in the sagittal plane: one points forward, perpendicular to the femur, and the other points upward along the thigh. To determine the start and end of each gait action, a pressure sensor was placed under the hindfoot (heel region) of the sole, and its signal was used to segment the movement data into complete gait cycles. The FlexiForce A401 thin-film pressure sensor has an effective sensing area 25 mm in diameter, a measuring range of 45 kg, a measurement error of ±3%, and a response time below 5 ms. It meets the requirements of this experiment, is comfortable and convenient to wear, and has little effect on normal movement.

The sensors are connected to a data acquisition card, which transfers the collected data to the computer for analysis and processing.

3.2. Signal Noise Reduction Method

The raw signals contain noise. The acceleration signal is the noisiest and shows pronounced glitches in its plots [16]. This noise strongly affects feature extraction and the final accuracy of dance action recognition, so it must be filtered out as far as possible without destroying the original motion information in the signal.

The human body acceleration signal fluctuates within a certain range and is subject to strong random interference, so this article uses moving average filtering to remove the signal noise.

The moving average filter works as follows. Consecutive samples are treated as a queue of fixed length N. After each new measurement, the oldest sample is removed from the head of the queue, the remaining samples move forward, and the new sample is appended at the tail. The arithmetic mean of the updated queue is the filtered output for that measurement.

Denote the original signal samples by $x(k)$ and the filtered samples by $y(k)$, where $k$ is the sample index. The two are related by

$$y(k) = \frac{1}{N}\sum_{i=0}^{N-1} x(k-i),$$

where $N$ is the number of adjacent sampling points averaged.
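The queue-based update described above can be sketched in Python. The authors worked in MATLAB, so this is an illustrative re-expression rather than their code; the function name `moving_average` and the start-up convention (averaging over fewer than N samples until the queue fills) are assumptions.

```python
from collections import deque


def moving_average(samples, N=5):
    """Queue-based moving-average filter (illustrative sketch).

    Each new sample is appended to a fixed-length queue; deque(maxlen=N)
    drops the oldest sample automatically once N samples are held, and
    the output is the arithmetic mean of the samples currently queued.
    While fewer than N samples have arrived, the mean is taken over the
    samples available so far (an assumed start-up convention).
    """
    window = deque(maxlen=N)
    filtered = []
    for s in samples:
        window.append(s)
        filtered.append(sum(window) / len(window))
    return filtered
```

Once the queue is full, each output equals the formula above. A causal moving average of length N delays the signal by (N - 1)/2 samples, so with N = 5 at the 50 Hz sampling rate used in this study the delay is 2 samples, or 40 ms; this is one reason to keep N small.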

3.3. Signal Segmentation Processing

The human motion signal collected by the sensors is a sequence of data ordered in time, that is, a typical time series. To identify body motion quickly from these data, they must first be segmented into data segments that each contain one motion. The segmentation result has a significant impact on both the accuracy and the real-time performance of action recognition.

Given the characteristics of this experiment, the important point segmentation method is used. Each resulting segment contains exactly one complete gait cycle: it holds all of the information for that gait movement without redundant data from after the movement ends, which makes it well suited to real-time analysis of dance movements.
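The paper does not spell out how the important points are located. One plausible reading, given that the pressure sensor under the sole marks the start and end of each gait action, is to treat each rise of the pressure signal above a threshold as the start of a cycle. The sketch below implements that assumption; the function name `segment_by_pressure` and the `threshold` parameter are hypothetical.

```python
import numpy as np


def segment_by_pressure(pressure, threshold):
    """Split a recording into gait cycles using the sole pressure signal.

    Assumption (not specified in the paper): a cycle begins at each rising
    edge, where the pressure goes from <= threshold to > threshold, and ends
    just before the next rising edge. Returns half-open (start, end) index
    pairs, so data[start:end] is one complete cycle. Partial data before
    the first or after the last rising edge is discarded.
    """
    above = np.asarray(pressure, dtype=float) > threshold
    # Index k is a rising edge if sample k-1 is not above and sample k is above.
    onsets = np.flatnonzero(~above[:-1] & above[1:]) + 1
    return [(int(s), int(e)) for s, e in zip(onsets[:-1], onsets[1:])]
```

Each (start, end) pair can then slice all sensor channels at once, for example `channels[:, start:end]` for a hypothetical (6, T) array holding the three angle, two acceleration, and one pressure signals.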

3.4. Action Feature Extraction Method

The motion information obtained from the sensors contains a large volume of data, and analyzing body motion directly from the raw signals is time-consuming and inefficient. To process the sensor data efficiently, characteristic features are extracted from the signals and used in place of the signals themselves. Feature extraction can thus also be viewed as compression and dimensionality reduction of the original signal. While reducing the dimensionality of the time series, the extracted features must preserve as much of the important information in the signal as possible. In behavior representation, the posture of the subject carries spatial information, while motion information is expressed across both time and space, so temporal dynamics are essential for representing behavior. This article therefore uses temporal feature extraction for the human motion information.

4. The Lower Limb Human Motion Recognition Model of Ethnic Dance

4.1. Data Collection Experimental Results of Lower Limb Human Motion of Ethnic Dance

To obtain objective and reasonable data, 6 folk dance performers from rural areas, aged 29 to 49 and with no lower limb disease, were recruited. The 3 men were 170 to 185 cm tall, and the 3 women were 155 to 172 cm tall.

Before the experiment, the participants were instructed, and their gait speed and other parameters were standardized so that the movement data would be collected under normal gait. All participants were familiar with the procedures and precautions of the entire test. After the sensors were fixed to each participant, the movement information of five ethnic dance actions from rural areas was recorded: flexion, extension, abduction, external rotation, and internal rotation.

During the experiment, participants rested as needed to prevent fatigue from degrading the motion data. Obviously abnormal signals were removed by inspecting the signal plots. For each participant, 120 sets of combined data were obtained per gait, giving a total of 3,600 sets for the 5 gaits (6 participants × 5 gaits × 120 sets). Each set contains 3 angle signals, 2 acceleration signals, and 1 pressure signal. The sensor sampling frequency was set to 50 Hz. Data were processed on a Lenovo PC with a 2.5 GHz processor and 8 GB of memory, using MATLAB R2016b.

Taking the knee joint angle signal as an example, Figure 1 shows this signal for the five lower limb motions of ethnic dance. The figure shows clear differences in the joint angle signal across the 5 actions, so these signals can be used to identify lower limb actions.

4.2. Experimental Results of Signal Noise Reduction

Real-time performance is very important, so the moving average window length N should be as small as possible while still achieving adequate noise reduction. To find an appropriate value, comparative experiments were run with different values of N, using one sagittal axis of the thigh acceleration signal as an example. When N was too small, the processed signal was not smooth enough; when N was too large, the original characteristics of the signal became indistinct. Considering signal characteristics such as smoothness and peaks, the moving average filter performed best with N = 5, so N is set to 5 in this paper.

4.3. Feature Extraction and Optimization of Lower Limb Human Motion Signal of Ethnic Dance
4.3.1. Feature Standardization

The aim is to achieve high recognition accuracy with little computation. Time-domain features, which offer good real-time performance and are easy to analyze and process, are therefore selected for the sensor data: mean, median, variance, skewness, maximum, and minimum.
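A minimal sketch of the six time-domain features for one segmented signal, and of the 30-dimensional vector formed from the 5 non-pressure channels (3 joint angles and 2 acceleration axes). The population (biased) forms of variance and skewness and the function names are assumptions, since the paper does not state which estimators it uses.

```python
import numpy as np

FEATURE_NAMES = ["mean", "median", "variance", "skewness", "max", "min"]


def time_domain_features(segment):
    """Six time-domain features of one signal segment (one gait cycle)."""
    x = np.asarray(segment, dtype=float)
    mu = x.mean()
    var = x.var()  # population variance (ddof=0), an assumed choice
    std = np.sqrt(var)
    # Population skewness: third central moment divided by std**3.
    skew = np.mean((x - mu) ** 3) / std**3 if std > 0 else 0.0
    return np.array([mu, np.median(x), var, skew, x.max(), x.min()])


def feature_vector(channels):
    """Concatenate features of the 5 non-pressure channels: 5 x 6 = 30 values."""
    return np.concatenate([time_domain_features(c) for c in channels])
```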

Because the signals collected by different types of sensors have different units, and several kinds of features are extracted from each signal, the extracted features differ greatly in magnitude. If these features were used directly for classification, features with large values would be amplified into dominant factors, and features with small values would be suppressed into secondary ones. Features with good discriminative power but small values would then fail to contribute, which would degrade overall classification performance. To solve this problem, the features are standardized to a common range. In this article, each type of feature is standardized separately by feature category:

$$x' = \frac{x - \mu}{\sigma},$$

where $x$ is the original feature value, $x'$ is the standardized value, $\mu$ is the mean of this type of feature, and $\sigma$ is its standard deviation.
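Standardization can be sketched as follows. Here each of the 30 feature columns is standardized with the mean and standard deviation computed on the training set, and those same statistics are reused for validation and test data. Reading "by feature category" as "per feature column", reusing the training statistics, and the function names are all assumptions.

```python
import numpy as np


def fit_standardizer(train_features):
    """Per-column mean and standard deviation of the training feature matrix."""
    X = np.asarray(train_features, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against constant features (avoid division by zero)
    return mu, sigma


def standardize(features, mu, sigma):
    """Apply x' = (x - mu) / sigma column by column."""
    return (np.asarray(features, dtype=float) - mu) / sigma
```

Fitting the statistics only on training data keeps information from the validation and test sets out of the model.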

4.3.2. Feature Dimensionality Reduction

This article uses multiple sensors to detect dance moves. Excluding the pressure sensor, there are 5 signals in total, and 6 features are extracted from each, giving 30 features. This is still too many, and the information expressed by different features is likely to overlap. A higher feature dimension not only increases computation substantially but also retains redundant information, which reduces classification performance. To obtain features better suited to classification, the feature vectors are further reduced in dimension.

There are two main approaches to feature dimensionality reduction. The first is feature selection, whose key issue is how to assess the importance of features and choose among them. The second is feature combination, which transforms and reorganizes the original features into new ones, so that each new feature is a function of the original features. This article uses the second approach.

Principal component analysis (PCA) is used to combine the feature vectors and reduce their dimensionality. PCA is a widely used and efficient dimensionality reduction method. Following the principle of maximizing variance, it uses the covariance matrix to convert many original features into a small number of representative composite features, thereby reducing the dimension of the feature vector. The composite features are generally required to retain more than 85% of the information (variance) in the original features, and they are mutually uncorrelated, which avoids overlapping information.

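Arrange the standardized features of the training set into a matrix $X$, with one column per signal feature and one row per collected sample. $X$ therefore has as many rows as there are samples and 30 columns.

From this matrix, compute the covariance matrix of the standardized features,

$$R = \frac{1}{m-1}X^{\mathsf T}X,$$

where $m$ is the number of samples, together with its eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_{30}$ and the corresponding unit eigenvectors $u_1, u_2, \ldots, u_{30}$.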

Sorted in descending order, the first five eigenvalues are 9.0624, 6.6358, 6.0065, 3.1528, and 1.0865. Following the principal component selection criterion, this article keeps the first five principal components, whose cumulative contribution rate is

$$\eta = \frac{\sum_{i=1}^{5}\lambda_i}{\sum_{i=1}^{30}\lambda_i}.$$

Table 1 lists the eigenvectors corresponding to these eigenvalues, computed from the collected data. The column vector $u_i$ corresponds to the $i$-th eigenvalue $\lambda_i$; each column contains the coefficients of one principal component, and each row gives the weight of one original sensor-signal feature in that component. Substituting the eigenvector values from Table 1 into $F_i = X u_i$ $(i = 1, \ldots, 5)$, where $X$ is the standardized feature matrix, yields the 5 principal components.

The contribution rates of the five principal components are 30.38%, 22.19%, 20.08%, 10.54%, and 3.67%, and Figure 2 shows them as a histogram. PCA thus compresses the 30 features into 5 composite features that retain 86.86% of the information in the original features.
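The PCA steps above (covariance matrix, eigendecomposition, contribution rates, and choosing the smallest number of components that reaches the 85% threshold) can be sketched with NumPy. The function name `pca_reduce` and the `target` parameter are illustrative assumptions, and the sketch does not reproduce Table 1.

```python
import numpy as np


def pca_reduce(X_std, target=0.85):
    """PCA on a standardized matrix (rows = samples, columns = features).

    Returns the eigenvector matrix U of the kept components (one column per
    component), their contribution rates, and the cumulative contribution.
    """
    R = np.cov(X_std, rowvar=False)          # covariance matrix of the features
    eigvals, eigvecs = np.linalg.eigh(R)     # eigh: R is symmetric; ascending order
    order = np.argsort(eigvals)[::-1]        # re-sort to descending eigenvalues
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    contrib = eigvals / eigvals.sum()        # contribution rate of each component
    cumulative = np.cumsum(contrib)
    # Smallest k whose cumulative contribution reaches the target.
    k = int(np.searchsorted(cumulative, target)) + 1
    k = min(k, len(eigvals))
    U = eigvecs[:, :k]
    return U, contrib[:k], cumulative[k - 1]
```

The composite features are then `F = X_std @ U`, one column per principal component, matching $F_i = X u_i$.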

To evaluate the classification performance of the resulting features, a comparative experiment was conducted on the features before and after optimization. Features extracted from the same training and validation signals were used to train and test the action recognition model with an SVM using a radial basis function kernel. Table 2 shows the results: the feature extraction scheme used in this paper is clearly superior in both recognition accuracy and efficiency.

Figure 3 shows the overall data processing flow for ethnic dance movement recognition.

4.4. The Lower Limb Human Motion Recognition Model of Ethnic Dance
4.4.1. SVM Model

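SVM is a powerful model that performs well on a wide variety of datasets. It allows complex decision boundaries even when the data have only a few features, and it performs well on both low-dimensional and high-dimensional data, which meets the needs of this design. The SVM model was therefore selected to classify the five lower limb human motions of ethnic dance. SVM is a binary classifier. Suppose there are $n$ training samples $(x_i, y_i)$, $i = 1, \ldots, n$, where $x_i$ is a sample point and its category label is

$$y_i \in \{+1, -1\}.$$

For positive samples,

$$w^{\mathsf T}x_i + b \ge +1, \qquad y_i = +1.$$

For negative samples,

$$w^{\mathsf T}x_i + b \le -1, \qquad y_i = -1.$$

Combining the two,

$$y_i\left(w^{\mathsf T}x_i + b\right) \ge 1, \qquad i = 1, \ldots, n,$$

where $w$ is the normal vector of the separating hyperplane and $b$ is its bias.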

In practice, however, most samples are not cleanly separable, and the two classes may overlap slightly. To handle this, slack variables and a penalty factor are introduced for samples $(x_i, y_i)$ that violate the margin constraint $y_i(w^{\mathsf T}x_i + b) \ge 1$ of the separating hyperplane $w^{\mathsf T}x + b = 0$, giving the following optimization problem:

$$\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{n}\xi_i \quad \text{s.t.}\quad y_i\left(w^{\mathsf T}x_i + b\right) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\ \ i = 1, \ldots, n.$$

Here $\xi_i$ is the slack variable of sample $i$, which allows a small amount of misclassification, and $C > 0$ is the penalty factor, a manually set parameter that penalizes samples violating the constraint. A larger $C$ penalizes violations more heavily, while a smaller $C$ tolerates more violations in exchange for a wider margin. Introducing slack variables and the penalty factor further improves the practical value of SVM.

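Because SVM is a binary classifier, the five-class problem is decomposed in a one-versus-one manner into 5 × 4 / 2 = 10 SVM subclassifiers, one for each pair of actions. Figure 4 shows the training and use process of the SVM recognition model.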
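With 5 actions, a one-versus-one scheme needs one binary subclassifier per pair of actions, giving the 10 subclassifiers mentioned above; each action appears in 4 of those pairs, so it can receive at most 4 votes. The voting step is sketched below with a hypothetical `pairwise_decide` callback standing in for the trained binary SVMs.

```python
from itertools import combinations

# Labels 1..5: flexion, extension, abduction, external rotation, internal rotation
CLASSES = [1, 2, 3, 4, 5]

# One binary subclassifier per unordered pair of classes: C(5, 2) = 10
PAIRS = list(combinations(CLASSES, 2))


def ovo_votes(pairwise_decide, x):
    """Count one-versus-one votes for sample x.

    pairwise_decide(i, j, x) is a hypothetical stand-in for the trained
    binary SVM separating classes i and j; it returns the winning label.
    """
    votes = {c: 0 for c in CLASSES}
    for i, j in PAIRS:
        votes[pairwise_decide(i, j, x)] += 1
    return votes
```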

The SVM action recognition model is built on the MATLAB platform. Using SVM requires first choosing a kernel function, but there is still no unified rule for this choice. To select an appropriate kernel, the same training and validation data were used to compare the performance of different kernel functions in the SVM model, and Table 3 shows the results. For the action recognition model in this paper, the radial basis function kernel achieves the highest recognition rate and also performs well in recognition efficiency, so it was chosen for the final SVM model.

After the structure and kernel function of the SVM recognition model are determined, the database containing the training and validation sets is used to train and optimize the model. First, the 5-dimensional feature vectors extracted from the training data are input into the SVM model to optimize its parameters and improve classification performance. After training, the validation data are used to test the recognition accuracy of the SVM model. The main steps of the algorithm are as follows:
(1) Import the feature database.
(2) Select the radial basis function kernel.
(3) Find the optimal penalty factor C and kernel parameter γ.
(4) Train the recognition model on the training data to determine its parameters.
(5) Test the SVM recognition model.
(6) Plot the classification results.
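Steps (2) to (5) can be sketched in Python with scikit-learn, whose `SVC` class handles multiclass problems with a one-versus-one scheme internally. This is not the authors' MATLAB code: the synthetic data from `make_classification` stand in for the 5-dimensional PCA features, and the candidate values for C and gamma (scikit-learn's name for γ) are arbitrary illustrations.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Hypothetical stand-in for the feature database: 5 features, 5 action classes.
X, y = make_classification(n_samples=500, n_features=5, n_informative=5,
                           n_redundant=0, n_classes=5, n_clusters_per_class=1,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Step (2): radial basis function kernel. Step (3): grid search over C and gamma.
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)

# Step (4): train with cross-validation; the best model is refit on all training data.
search.fit(X_train, y_train)

# Step (5): test the tuned model on held-out data.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

Cross-validated grid search is one common way to carry out step (3); the paper does not say which search procedure it used.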

After these steps, training of the SVM action recognition model is complete, and the test-group motion data are input into the model to verify its final recognition performance. The model achieves an average recognition rate of 85.6% for the five actions of flexion, extension, abduction, external rotation, and internal rotation (labeled 1 to 5). Table 4 shows the recognition rate of each action.

4.4.2. MCM

Some misrecognition occurs in lower limb movement recognition, especially between similar movements. However, transitions between lower limb movements follow specific rules, and this regularity can be exploited to improve recognition accuracy. A Markov chain model (MCM) is used for this purpose.

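Let the 5 gait movements flexion, extension, abduction, external rotation, and internal rotation be labeled 1 to 5. Denote the probability of each action at the previous moment by

$$P(t-1) = \left[p_1(t-1),\ p_2(t-1),\ \ldots,\ p_5(t-1)\right].$$

The probability of each action at the current moment is

$$P(t) = \left[p_1(t),\ p_2(t),\ \ldots,\ p_5(t)\right].$$

The transition probabilities between gait actions form the matrix

$$A = \left[a_{ij}\right]_{5 \times 5}, \qquad a_{ij} = P\left(\text{action } j \text{ at } t \mid \text{action } i \text{ at } t-1\right), \qquad \sum_{j=1}^{5} a_{ij} = 1.$$

The predicted probability of each action at the current moment is therefore

$$\hat{P}(t) = P(t-1)\,A, \qquad \hat{p}_j(t) = \sum_{i=1}^{5} p_i(t-1)\,a_{ij}.$$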

The current action probabilities that MCM predicts from the previous ones are not accurate enough to be used directly as the recognition result. This article instead uses them to refine the SVM recognition results and eliminate some obvious misrecognitions, which improves recognition accuracy.
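The Markov chain step can be sketched with NumPy. The paper does not say how the transition matrix is obtained; here it is assumed to be estimated by counting action-to-action transitions in labeled training sequences, with optional Laplace smoothing (`alpha`) so that unseen transitions keep a small nonzero probability. Actions are indexed 0 to 4 in code rather than 1 to 5, and the function names are hypothetical.

```python
import numpy as np

N_ACTIONS = 5


def estimate_transition_matrix(label_sequences, n_actions=N_ACTIONS, alpha=1.0):
    """Row-stochastic matrix A with A[i, j] ~ P(action j at t | action i at t-1).

    Assumed estimator: transition counts from training label sequences
    (labels 0..n_actions-1) plus Laplace smoothing alpha, normalized per row.
    """
    counts = np.full((n_actions, n_actions), float(alpha))
    for seq in label_sequences:
        for prev, cur in zip(seq[:-1], seq[1:]):
            counts[prev, cur] += 1
    return counts / counts.sum(axis=1, keepdims=True)


def predict_next(p_prev, A):
    """Row-vector convention: p_hat(t) = p(t-1) A."""
    return np.asarray(p_prev, dtype=float) @ A
```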

4.5. Model Optimization

The processed SVM votes for each current action are multiplied by the MCM-predicted probability of that action to correct the SVM vote counts, and the action with the largest resulting value is taken as the current action. The processed vote counts for the gait actions from the SVM recognition model are

$$V = \left[v_1,\ v_2,\ \ldots,\ v_5\right].$$

The current action probabilities predicted by MCM are

$$\hat{P}(t) = \left[\hat{p}_1(t),\ \hat{p}_2(t),\ \ldots,\ \hat{p}_5(t)\right].$$

The fused values are

$$S = \left[s_1,\ s_2,\ \ldots,\ s_5\right],$$

with elements

$$s_i = v_i\,\hat{p}_i(t), \qquad i = 1, \ldots, 5,$$

and the recognized action is the one with the largest $s_i$.

To maximize recognition accuracy, a condition is set for when MCM is used to refine the SVM result. If the SVM gives some action 4 votes (the maximum possible, since each action takes part in 4 of the 10 subclassifiers), that action is output directly without MCM processing. If no action receives 4 votes, the SVM recognition result is refined with MCM before the final decision is made. This refines the cases in which SVM recognition is less reliable without affecting the cases in which it is reliable, and so improves the overall recognition accuracy of lower limb human motion. Figure 5 shows the optimized action recognition process.
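The fusion and gating rule described above can be sketched as follows. The vote vector would come from the one-versus-one SVM and `p_pred` from the Markov chain prediction. Using raw vote counts (the paper's "processed" votes may involve a normalization it does not describe), indexing actions 0 to 4, and breaking ties by the lowest index (NumPy's `argmax` behavior) are assumptions; the function name is hypothetical.

```python
import numpy as np

MAX_VOTES = 4  # each of the 5 actions takes part in 4 of the 10 subclassifiers


def fuse_and_decide(votes, p_pred):
    """Return the 0-based index of the recognized action.

    votes:  SVM vote counts for the 5 actions.
    p_pred: MCM-predicted probabilities for the 5 actions.
    """
    votes = np.asarray(votes, dtype=float)
    # Gate: an action that won all its pairwise contests is output directly.
    if votes.max() == MAX_VOTES:
        return int(np.argmax(votes))
    # Otherwise fuse element-wise, s_i = v_i * p_hat_i, and take the largest.
    scores = votes * np.asarray(p_pred, dtype=float)
    return int(np.argmax(scores))
```

In this example the SVM votes tie between actions 0 and 1, and the Markov prior breaks the tie: `fuse_and_decide([3, 3, 2, 1, 1], [0.1, 0.6, 0.1, 0.1, 0.1])` selects action 1.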

With this optimization scheme in place, the same experimental data were used to test the effect of MCM on the SVM results. The 5-dimensional feature vectors were again classified by the SVM model, and its recognition results were refined with MCM. The average recognition accuracy rose from 85.6% to 93.4%. Table 5 compares the accuracy of the two recognition schemes and shows that the MCM-optimized model is substantially more accurate.

5. Result and Discussion

This paper presents a sensor-based method for recognizing the lower limb human motion of ethnic dance in rural areas. After the temporal features of the lower limb motion are extracted, the SVM algorithm identifies the current action, and MCM optimizes the recognition result. Three angle sensors detect the angle signals of the ankle, knee, and hip joints; an acceleration sensor is placed on the underside of the thigh; and a pressure sensor is placed under the hindfoot region of the sole. A computationally light moving average filter removes signal noise, time-domain features are extracted from each signal, and principal component analysis reduces the dimension of the feature vector. The SVM model recognizes the five lower limb motions of ethnic dance with an average accuracy of 85.6%. After its results are optimized with MCM, the average recognition accuracy rises to 93.4%.

This paper is exploratory research that proposes a recognition scheme for the lower limb human motion of ethnic dance in rural areas. In future work, the number of sensors could be moderately increased to improve recognition accuracy, provided this does not excessively increase the amount of computation.

Data Availability

The dataset can be accessed upon request.

Conflicts of Interest

The author declares no conflict of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the Guangxi University Humanities and Social Science Key Research Base “Research on Art and Culture Construction of New Rural Communities in Guangxi—Taking Southeast Guangxi as an Example.” The project number is 2020YJJD0009.