Abstract

In the contemporary world, lack of physical fitness contributes to several diseases, such as heart problems, obesity, high blood pressure, and diabetes. Anxiety and depression are also major health concerns today. Regular exercise is essential for a healthy lifestyle because it helps prevent and manage such conditions. Proper physical training is needed to meet specific fitness goals and obtain real benefits. However, fitness coaching facilities are expensive and trained professionals are scarce, so many people exercise on their own. They often get injured because of improper posture and do not get the expected results, which discourages them from exercising regularly. We propose a virtual physical trainer that assists users during exercise and provides real-time feedback on exercise quality. During exercise, the user's video is recorded and given as input to the pose estimation and machine learning algorithms. After the results are compared with the reference exercise models, corrections are sent to the user instantly as audio.

When a machine is programmed to think like a human and exhibit traits of the human mind such as learning and problem solving, it is referred to as Artificial Intelligence (AI). The scope of AI has expanded into multiple industries, such as healthcare, transport, and security, making it one of the fastest growing sectors in technology. The main challenges in implementing AI include the high computational power needed to run AI algorithms and the scarcity of data.

The inclusion of AI in the fitness industry can transform the user's experience while exercising. In a fast-paced and hectic lifestyle, finding time to visit a gym or hiring a personal trainer is difficult. It is also expensive and not affordable for everyone. At the same time, not exercising regularly affects people's health negatively. Including AI in fitness is a possible solution to this problem. A smart vision-enabled fitness application integrated with AI can help users achieve their fitness goals. The user gets a one-to-one experience in which the personalized AI trainer gives real-time feedback to correct exercise posture, thereby giving better results.

Several other modules can be incorporated to build a completely efficient system.

In this paper, we propose a new AI model that can assist fitness freaks during their exercise. We also discuss the procedure followed to achieve the proposed design.

Many researchers have contributed technology-based solutions in the area of health care because of their remarkable benefits. The growth, adoption, and convergence of innovative technologies such as cloud computing and machine learning are confirmed by recent reports released by information industry experts (Han et al., 2017; Lin et al., 2014).

1.1 Fitness and e-Health Care

An early step towards using Artificial Intelligence in fitness was taken by Toshev and Szegedy (2014), who used a deep neural network to improve pose detection, finding the location of each joint in the human body using regression on CNN features. Their model did not provide any results on real-time data.

Bogo et al. (2016) estimated 3D pose, as well as 3D mesh shape, using only single RGB images. However, its accuracy is lower than that of the OpenPose model, which was introduced later. Cao et al. (2017) used Part Affinity Fields to estimate the poses of multiple people in a scene in real time without the need to identify individual people. Cao et al. (2017) have open-sourced their work as a project called OpenPose, which we utilize for Pose Trainer.

Bourdev and Malik (2009) and Nadeem et al. (2020) presented a new methodology for human action detection and recognition using a combination of Linear Discriminant Analysis (LDA) and Artificial Neural Network (ANN). Human body parts are detected from the human silhouette and these body parts are used to generate multidimensional features with the help of linear discriminant analysis.

Han et al. (2017) proposed a model that captures the user's posture during rehabilitation exercise after injury. They designed a Deep Neural Network (DNN) model for the posture correction algorithm. The network is composed of fully-connected layers and takes as input a super vector that concatenates the set of skeleton vectors of the user (Duffner & Garcia, 2013). It identifies the incorrectly positioned body part from the user's incorrect posture.

Nagarkoti et al. (2019) proposed a system to detect minute errors in limb positions during exercise. A CNN model is trained on the COCO dataset (Lin et al., 2014) for human pose estimation, and further fine-tuning on data targeted towards physical workouts yields much better accuracy for body-part detection.

1.2 Moving Towards Smart Fitness

The AI models proposed so far do not provide real-time results. They offer no proper way to handle and generalize parameters such as the height of the person or the camera angle. They also provide no mechanism to track the performance of the user. In this paper, we propose an AI model that addresses these limitations.

This paper proposes a smart cloud-based solution for real-time fitness monitoring. Figure 1 shows the overall architecture of the proposed model.

Figure 1. The Overall Architecture Diagram of the Proposed Model

2.1 Client System Layer

On the user's end, a mobile device or a webcam is required to capture the entire exercise session. The user must grant the fitness application access to the camera. Initially, the user is expected to sign in, at which point details such as gender, age, and location are collected. The facial data of the user are also collected and stored in the database so that the performance history can be fetched every time the user logs in. The user can select the preferred exercise and start exercising with the camera on. The video is captured and sent to the pose estimation and machine learning algorithms for processing.

2.2 Cloud System Layer

The recorded video is separated into frames. Each frame is sent as input to the pose estimation algorithm and the pose coordinates are extracted. The coordinates are collected in an array and stored in JavaScript Object Notation (JSON) format. The data is then normalized and cleaned. Each coordinate value x is normalized by subtracting the mean \mu and dividing by the standard deviation \sigma:

$$x' = \frac{x - \mu}{\sigma} \tag{1}$$

A smoothing algorithm is used to remove the noise. From the poses, a continuous-time series with the coordinate values of each pose is produced.

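The user's time series is then compared with the reference time series using dynamic time warping. The process is carried out in five steps (Kyaagba, 2018).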

Step 1: Divide the two series into an equal number of points.

Step 2: Calculate the Euclidean distance between the first point in the first series and every point in the second series. Store the minimum distance calculated (this is the 'time warp' stage).

Step 3: Move to the second point and repeat Step 2. Continue point by point, repeating Step 2 until all points are exhausted.

Step 4: Repeat Steps 2 and 3 with the second series as the reference.

Step 5: Add up all the stored minimum distances. The sum is a measure of similarity between the two series, with smaller values indicating more similar series.
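The five steps above can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in the model: the function names, the evenly spaced resampling in Step 1, and the default of 50 points are assumptions, and each point is taken to be a tuple of coordinates. Note that this procedure is a simplification; the classical formulation of dynamic time warping (Müller, 2007) additionally requires the alignment to preserve time order.

```python
import math


def resample(series, n_points):
    """Step 1: pick n_points evenly spaced samples from a series."""
    if n_points < 2 or len(series) < 2:
        raise ValueError("need at least two points")
    last = len(series) - 1
    return [series[round(i * last / (n_points - 1))] for i in range(n_points)]


def min_distances(reference, other):
    """Steps 2-3: for every point of `reference`, keep the smallest
    Euclidean distance to any point of `other`."""
    return [min(math.dist(p, q) for q in other) for p in reference]


def five_step_similarity(series_a, series_b, n_points=50):
    """Steps 1-5: sum of the minimum distances in both directions.
    Smaller values mean the two series are more similar."""
    a = resample(series_a, n_points)
    b = resample(series_b, n_points)
    # Step 4 repeats Steps 2-3 with the second series as the reference.
    return sum(min_distances(a, b)) + sum(min_distances(b, a))
```

Two identical series give a score of zero, and the score grows as the user's movement drifts away from the reference.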

The ML algorithm calculates the difference for each pose. Based on this difference, real-time feedback is sent as output to the user. The user can then correct the posture to improve the effectiveness of the exercise. A feedback report is also sent to the trainer.

3. Procedure

3.1 Pose Estimation

The pose estimation algorithm OpenPose is used in the model to extract the user's exercise poses from the video input. Figure 2 shows the working architecture of the OpenPose model.

Figure 2. OpenPose Model Extracting Poses (Parley Labs, 2020)

The user's poses are extracted from individual frames using the OpenPose algorithm. A JSON file containing the pose values is generated as output. The values in the JSON file for a sample image are overlaid on the input image, as shown in Figure 3.

Figure 3. The Output from the Pose Extraction Model

The pseudo-code for pose extraction with OpenPose is as follows:

Begin
    Import the OpenPose library
    Get the user's input video
    For each frame in the video
        Give the frame to OpenPose and extract the poses
        Append the result to an array
    Export the array to the next module
End
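The loop in the pseudo-code can be sketched in Python as shown below. To keep the example self-contained, the pose estimator is passed in as a function; `estimate_pose`, `fake_estimator`, and the record layout are hypothetical stand-ins and not part of the OpenPose API. In a real system the estimator would wrap the actual OpenPose call and the frames would come from the decoded video.

```python
import json


def extract_poses(frames, estimate_pose):
    """Run the estimator on every frame and collect the keypoints in order."""
    poses = []
    for index, frame in enumerate(frames):
        # estimate_pose is a hypothetical stand-in for the OpenPose call.
        keypoints = estimate_pose(frame)
        poses.append({
            "frame": index,
            "keypoints": [[float(value) for value in point] for point in keypoints],
        })
    return poses


def export_poses(poses):
    """Serialize the collected poses as JSON for the next module."""
    return json.dumps(poses)


def fake_estimator(frame):
    """Placeholder estimator used only to exercise the loop."""
    return [(frame * 1.0, frame * 2.0)]


if __name__ == "__main__":
    print(export_poses(extract_poses([1, 2], fake_estimator)))
```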

3.2 Data Normalisation and Cleaning

Figure 4 shows the overall flow of the normalization module. The data is normalized using Equation (1) and then cleaned using the smoothing algorithm. Figure 5 shows the plotted data of the estimated pose before normalization. Figure 6 shows the smoothed data of the estimated pose after normalization.

Figure 4. Data Normalization Module

Figure 5. Data Before Normalization and Cleaning

Figure 6. Data After Normalization and Cleaning
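The normalization and cleaning stage can be illustrated with a short Python sketch. The normalization follows Equation (1). The paper does not name the smoothing algorithm, so the centered moving average below is only an assumed example, as are the function names, the window size, and the use of the population standard deviation.

```python
import statistics


def z_score(values):
    """Normalize values as (x - mean) / standard deviation, per Equation (1)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (assumed)
    if std == 0:
        # A constant series carries no variation; map it to zeros.
        return [0.0 for _ in values]
    return [(value - mean) / std for value in values]


def moving_average(values, window=3):
    """Smooth a series with a centered moving average (assumed smoother).
    The window shrinks at both ends so the output keeps the input length."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(statistics.fmean(values[lo:hi]))
    return smoothed
```

Normalizing before comparison reduces the influence of factors such as the user's height and distance from the camera, while smoothing suppresses frame-to-frame jitter in the detected keypoints.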

3.3 Converting Data into Time Series

A continuous time series is produced from the coordinate values of each pose. This time series represents an exercise and is used to evaluate it. Figure 7 shows an example of time series generation for an extracted pose.

Figure 7. Time Series Generation for Each Extracted Pose
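As an illustration of this conversion, the sketch below regroups per-frame keypoints into one series per joint. The joint names and the assumption that every frame lists its (x, y) keypoints in the same joint order are hypothetical choices made for the example.

```python
def poses_to_time_series(poses, joint_names):
    """Regroup per-frame keypoints into one (x, y) series per joint.

    `poses` holds one entry per frame; each entry lists the (x, y)
    coordinates of the joints in the order given by `joint_names`.
    """
    series = {name: [] for name in joint_names}
    for keypoints in poses:
        if len(keypoints) != len(joint_names):
            raise ValueError("every frame must contain one keypoint per joint")
        for name, (x, y) in zip(joint_names, keypoints):
            series[name].append((x, y))
    return series


if __name__ == "__main__":
    frames = [[(0.1, 0.2), (0.3, 0.4)], [(0.5, 0.6), (0.7, 0.8)]]
    print(poses_to_time_series(frames, ["elbow", "wrist"]))
```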

3.4 Evaluating Exercise Using ML

The model is pre-trained on the dataset and holds the ideal values for the correct poses. Dynamic time warping (Buchanan & Fitzgibbon, 2006; Kyaagba, 2018; Müller, 2007) is then performed on the time series values. The output of dynamic time warping is fed into the ML model, which calculates the difference for each pose.
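For reference, the classical dynamic-programming formulation of dynamic time warping (Müller, 2007) can be sketched as below. This is a textbook version for one-dimensional series with absolute difference as the local cost; the function name and these choices are assumptions for illustration, not a description of the module used in the model.

```python
import math


def dtw_distance(series_a, series_b):
    """Classical DTW distance between two 1-D series.

    cost[i][j] is the cheapest cumulative cost of aligning the first i
    points of series_a with the first j points of series_b.
    """
    n, m = len(series_a), len(series_b)
    if n == 0 or m == 0:
        raise ValueError("both series must be non-empty")
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(series_a[i - 1] - series_b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]
```

Because the alignment may stretch or compress time, a repetition performed more slowly than the reference can still obtain a small distance, which is why the technique suits exercise comparison.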

4. Results and Discussion

The real-time feedback is generated by comparing the time series with the pre-trained models. The machine learning algorithm generates the feedback in text format, based on the values produced by the dynamic time warping module. This feedback is converted to audio and delivered to the user. Using this feedback, the user can adjust the exercise according to the guidance provided by the virtual AI trainer. Figure 8 shows how the real-time feedback produced by the proposed model would appear. The proposed model can correctly predict exercise accuracy for trained exercises, even if there is a slight error in posture.

Figure 8. Feedback Generation (Zou et al., 2019)
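To make the feedback step concrete, the sketch below turns per-joint distances into text messages. It is a simple rule-based stand-in, not the machine learning algorithm of the proposed model: the function name, the threshold value, the joint names, and the message wording are all hypothetical.

```python
def build_feedback(joint_distances, threshold=1.0):
    """Return one message per joint whose distance from the reference
    exceeds `threshold`, ordered from largest to smallest deviation."""
    flagged = sorted(
        (
            (distance, joint)
            for joint, distance in joint_distances.items()
            if distance > threshold
        ),
        reverse=True,
    )
    if not flagged:
        return ["Good form, keep going."]
    return [
        f"Adjust your {joint}: deviation {distance:.2f} from the reference."
        for distance, joint in flagged
    ]


if __name__ == "__main__":
    for message in build_feedback({"left elbow": 2.5, "right knee": 0.4}):
        print(message)
```

The resulting text messages could then be passed to a text-to-speech component to produce the audio feedback.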

Many people exercise, but because of improper poses they do not get the expected results and may even injure themselves. We have proposed an automated trainer that assists the user by providing meaningful real-time feedback on the quality of the exercise. The proposed system can be extended to multiple exercises, and the accuracy of the feedback can be improved.

Conclusion

The rapid development of knowledge in the health domain reveals that several health disorders are due to lack of fitness. It is vital to incorporate new technologies to enhance the user experience and assist people during exercise. This work proposed a real-time feedback system with Smart Vision that uses machine learning and cloud technology to enable the user to train in the correct position and improve the quality of each movement, giving better results in less time. In addition, the model offers users real-time quality feedback in the form of voice messages, messaging and communication with gym coaches, alarms for incorrect postures, and reporting and assistance from fitness centers. Pose estimation using OpenPose was included in this work for detecting human postures. The approach is cost-effective, as it reduces the need for a gym coach or an in-person trainer. The model can be very useful for users exercising at home because of its convenience and professional guidance.

References

[1]. Bogo, F., Kanazawa, A., Lassner, C., Gehler, P., Romero, J., & Black, M. J. (2016, October). Keep it SMPL: Automatic estimation of 3D human pose and shape from a single image. In European Conference on Computer Vision (pp. 561-578). Cham: Springer. https://doi.org/10.1007/978-3-319-46454-1_34

[2]. Bourdev, L., & Malik, J. (2009, September). Poselets: Body part detectors trained using 3D human pose annotations. In 2009 IEEE 12th International Conference on Computer Vision (pp. 1365-1372). IEEE. https://doi.org/10.1109/ICCV.2009.5459303

[3]. Buchanan, A., & Fitzgibbon, A. (2006, June). Interactive feature tracking using kd trees and dynamic programming. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06) (Vol. 1, pp. 626-633). IEEE. https://doi.org/10.1109/CVPR.2006.158

[4]. Cao, Z., Simon, T., Wei, S. E., & Sheikh, Y. (2017). Realtime multi-person 2D pose estimation using part affinity fields. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 7291-7299).

[5]. Duffner, S., & Garcia, C. (2013). PixelTrack: A fast adaptive algorithm for tracking non-rigid objects. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2480-2487). https://doi.org/10.1109/ICCV.2013.308

[6]. Han, S. H., Kim, H. G., & Choi, H. J. (2017, February). Rehabilitation posture correction using deep neural network. In 2017 IEEE International Conference on Big Data and Smart Computing (BigComp) (pp. 400-402). IEEE. https://doi.org/10.1109/BIGCOMP.2017.7881743

[7]. Kyaagba, S. (2018, September 7). Dynamic Time Warping with Time Series. Medium. Retrieved from https://medium.com/@shachiakyaagba_41915/dynamic-timewarping-with-time-series-1f5c05fb8950

[8]. Lin, T. Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., ... & Zitnick, C. L. (2014, September). Microsoft COCO: Common objects in context. In European Conference on Computer Vision (pp. 740-755). Cham: Springer. https://doi.org/10.1007/978-3-319-10602-1_48

[9]. Müller, M. (2007). Dynamic time warping. In Information Retrieval for Music and Motion (pp. 69-84). Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74048-3_4

[10]. Nadeem, A., Jalal, A., & Kim, K. (2020, February). Human actions tracking and recognition based on body parts detection via artificial neural network. In 2020 3rd International Conference on Advancements in Computational Sciences (ICACS) (pp. 1-6). IEEE. https://doi.org/10.1109/ICACS47775.2020.9055951

[11]. Nagarkoti, A., Teotia, R., Mahale, A. K., & Das, P. K. (2019, July). Realtime indoor workout analysis using machine learning & computer vision. In 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) (pp. 1440-1443). IEEE. https://doi.org/10.1109/EMBC.2019.8856547

[12]. Parley Labs (2020, January 6). Exploration: Pose Estimation with OpenPose and PoseNet - Parley Labs. Medium. Retrieved from https://parleylabs.medium.com/exploration-pose-estimation-with-openpose-and-posenet-parley-labs-d7f21b541774

[13]. Toshev, A., & Szegedy, C. (2014). DeepPose: Human pose estimation via deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1653-1660). https://doi.org/10.1109/CVPR.2014.214

[14]. Zou, J., Li, B., Wang, L., Li, Y., Li, X., Lei, R., & Sun, S. (2018, November). Intelligent fitness trainer system based on human pose estimation. In International Conference on Signal and Information Processing, Networking and Computers (pp. 593-599). Springer, Singapore. https://doi.org/10.1007/978-981-13-7123-3_69

Smart Vision-Enabled Real-Time Monitoring For Fitness Freaks With A Feedback System