Due to the coronavirus disease (COVID-19) pandemic, education has become heavily dependent on digital platforms, and recent advances in technology have made a tremendous amount of video content available. Because of this huge volume of video content, content-based information retrieval has become increasingly important. Video content retrieval, like information retrieval in general, requires pre-processing such as indexing, key-frame selection, and, most importantly, accurate detection of video shots. This allows video information to be stored in a manner that permits easy access. Video processing plays a vital role in many large applications, which must perform various manipulations on video streams (on frames, or shots). High-definition video takes a lot of memory to store, so compression techniques are in great demand. Object tracking and object identification are also areas where considerable research has taken place and is still in progress.
In the current scenario, with the rapid development of multimedia technology, the amount of video data available every day is enormous and growing at a high rate. Video is the most consumed data type on the Internet, on platforms such as YouTube, Vimeo, Dailymotion, and Yahoo Video, and on social networking sites like Facebook, Twitter, and Instagram. Searching for videos can be text-based or content-based. The text-based approach is more time-consuming and populates the database with a lot of data; content-based retrieval is therefore the more efficient process and gives more appropriate results. The explosive growth in video content leads to the problem of content management. The retrieval of a video frame from a large database is also possible in video processing, and can be seen as the next step beyond Content-Based Image Retrieval (CBIR). A video is formed from frames, and a group of consecutive frames makes up a shot. The identification of shots is a research area that has proved very useful for many real-time applications. Shot Boundary Detection (SBD) is also necessary for retrieving a desired frame from a video database.
Shot Boundary Detection (SBD) is required for automatic video indexing and browsing, and can be used for a variety of purposes, including video database indexing and video compression. The basic building block of video is the frame. Figure 1 depicts the structure of a video. Frame sequences are indexed by frame number, and the frames obtained by cutting the video are all the same size. Usually, 25 to 30 frames are captured every second. A video shot is a collection of connected frames captured by a single camera at one time; a video is typically created by stitching shots together. A scene in a video may consist of one or more shots that tell a specific story. Shot Boundary Detection is founded on recognizing the visual disparity caused by transitions: a discrepancy between two frames typically appears at a shot change. This dissimilarity manifests in ways that fall into two categories: abrupt (a hard cut) and gradual (dissolve, fade-in, fade-out, wipe). An abrupt change occurs within a single frame.
Figure 1. Structure of the Video
The shot is a series of interrelated, consecutive frames taken continuously by a single camera and representing continuous action in time and space. The shots are considered primitives for higher-level content analysis, indexing, and classification.
Abdulhussain et al. (2018) describe the automatic detection of shot boundaries in a video stream, called "shot boundary detection." It deals with detecting transitions between shots in digital video for temporal segmentation, which is required for content-based purposes such as video content analysis, video browsing, and video retrieval. Here, interrelated consecutive frames are captured continuously by a single camera, giving continuous action in time and space. Shots are efficient for indexing and can serve higher-level content analysis and classification (Hannane et al., 2016; Janwe & Bhoyar, 2013). Video shot boundary detection is the process of identifying the transition between two successive shots; the shot boundary is essentially the connection point between one shot and another. Shot boundary detection deals with two types of shot transitions (Bi et al., 2018).
An object-based shot detection method was proposed by Heng and Ngan (2001). In that paper, a time-stamp transfer technique across multiple frames was proposed as a means of discovering information. The method handled gradual changes effectively and outperformed more conventional algorithms in several ways.
Lu and Shi (2013) developed a video shot boundary detection approach based on segment selection and Singular Value Decomposition (SVD) with pattern matching. Adaptive thresholds were used to locate shot boundaries and determine the length of gradual transitions, while the majority of non-boundary frames were eliminated at the same time.
Figure 2 shows an abrupt transition: a sudden change in which one frame belongs to the first shot and the next frame belongs to the second shot (Zheng & Zhang, 2016). In this transition, the shot change is clearly visible. It is also known as a "hard cut" or "simple cut."
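As a minimal illustration of this idea (not taken from any of the cited papers), a hard cut can be flagged wherever the mean absolute intensity difference between consecutive frames exceeds a fixed threshold. The frame data and the threshold value below are illustrative assumptions:

```python
import numpy as np

def detect_cuts(frames, threshold=30.0):
    """Flag an abrupt cut wherever the mean absolute intensity
    difference between consecutive frames exceeds the threshold."""
    cuts = []
    for i in range(1, len(frames)):
        diff = np.mean(np.abs(frames[i].astype(float) - frames[i - 1].astype(float)))
        if diff > threshold:
            cuts.append(i)  # the cut lies between frame i-1 and frame i
    return cuts

# Synthetic example: shot A (dark frames) followed by shot B (bright frames)
shot_a = [np.full((64, 64), 20, dtype=np.uint8) for _ in range(5)]
shot_b = [np.full((64, 64), 200, dtype=np.uint8) for _ in range(5)]
print(detect_cuts(shot_a + shot_b))  # → [5]
```

A fixed threshold like this is exactly what the adaptive-threshold discussion later in the section improves upon.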
Figure 2. Abrupt Transition
Figure 3 shows a gradual transition, in which the change in visual content takes place slowly and continues over several frames (Janwe & Bhoyar, 2013; Wu & Xu, 2014). Fade-in, fade-out, wipe, and dissolve are some of the types of gradual transition shown in Figure 4.
Figure 3. Gradual Transitions
Video segmentation, setting the length of shots, and feature extraction are the tasks that need to be implemented here to achieve the desired goal (Karpagavalli et al., 2020). It is classified into two criteria,
To achieve accurate video segmentation results, appropriate thresholds must be selected. Shot transition thresholds vary widely across different types of video sources; for example, a cartoon's threshold is usually higher than a teleplay's. Dissolve transition detection, on the other hand, is a challenging problem in the shot boundary detection area (Yi et al., 2012).
2.2.1 Adaptive Threshold Technique
The threshold selection should be based on the frame difference. In practice, the extracted frame features can be histograms, edges, motions, etc. Apart from shot transitions and camera motion, frame differences are usually caused by three factors,
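One common way to make the threshold adapt to the frame-difference signal (a sketch, not the specific method of any cited paper) is to compare each difference value against the mean plus a multiple of the standard deviation of its neighbours in a sliding window. The window size and the multiplier `k` below are illustrative assumptions:

```python
import numpy as np

def adaptive_cut_detection(diffs, window=5, k=3.0):
    """Flag frame i as a cut candidate when its difference value exceeds
    mean + k*std of the surrounding window (excluding i itself).
    Note: degenerates when the neighbourhood is perfectly constant."""
    cuts = []
    n = len(diffs)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        neigh = np.concatenate([diffs[lo:i], diffs[i + 1:hi]])
        if len(neigh) and diffs[i] > neigh.mean() + k * neigh.std():
            cuts.append(i)
    return cuts

# Illustrative frame-difference signal with one spike at index 4
diffs = np.array([1.0, 1.0, 2.0, 1.0, 50.0, 1.0, 2.0, 1.0])
print(adaptive_cut_detection(diffs))  # → [4]
```

Unlike a single global threshold, this tolerates sources (cartoons, teleplays) whose baseline frame differences sit at very different levels.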
2.2.2 Detection of Dissolve Type Transition
Gradual transition detection, especially dissolve transition detection, is a difficult problem in the video shot boundary detection area. An integrated algorithm is proposed that combines two threshold techniques, transparency computation, and a Canny edge operator.
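The two-threshold part of such schemes is usually a variant of the classic twin-comparison approach: a low threshold marks the possible start of a gradual transition, and the transition is confirmed if the accumulated difference reaches a high threshold. The sketch below shows only this two-threshold idea, not the full proposed algorithm; `t_low`, `t_high`, and the difference values are illustrative assumptions:

```python
def twin_comparison(diffs, t_low=2.0, t_high=20.0):
    """Twin-comparison sketch: a gradual transition starts when the
    frame difference exceeds t_low; it is confirmed if the accumulated
    difference reaches t_high before the per-frame difference drops
    back below t_low. Returns (start, end) frame-index pairs."""
    transitions, start, acc = [], None, 0.0
    for i, d in enumerate(diffs):
        if start is None:
            if d >= t_low:
                start, acc = i, d
        elif d >= t_low:
            acc += d
        else:
            if acc >= t_high:
                transitions.append((start, i - 1))
            start, acc = None, 0.0
    if start is not None and acc >= t_high:
        transitions.append((start, len(diffs) - 1))
    return transitions

# Small within-shot differences, then a dissolve spanning frames 2..9
diffs = [0.5, 0.4, 3, 3, 3, 3, 3, 3, 3, 3, 0.5, 0.3]
print(twin_comparison(diffs))  # → [(2, 9)]
```

The transparency-computation and Canny steps mentioned above would then be used to verify that a confirmed candidate really is a dissolve rather than fast motion.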
The proposed system implements several basic steps: pre-processing the frames of a video stream; applying precise matching techniques and comparing several of them so as to choose the most efficient one for the preliminary tasks; selecting either thresholding or a classifier to find matched frames; analyzing frames from the video database using the most suitable tool(s); detecting shot boundaries; and evaluating the algorithm. Figure 5 shows a general flowchart of the steps carried out to achieve the goal.
Figure 5. Flowchart of Proposed System
Shot Boundary Detection techniques extract one or more features from a video frame or from a subset of it, referred to as a "Region of Interest" (ROI). An algorithm can then detect shot changes from these features using various techniques. Almost all shot change detection techniques reduce the video domain's high dimensionality by extracting a small number of features from one or more regions of interest in each video frame. Among these features are,
The average grayscale luminance of a Region of Interest (ROI) is the most basic property that can be used to describe it. A more reliable option is to use one or more statistics (for example, averages) of the values in an appropriate color space, such as HSV.
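For a grayscale-luminance feature over an RGB frame, a common choice (an illustrative one, not mandated by the text) is the ITU-R BT.601 weighting:

```python
import numpy as np

def mean_luminance(frame_rgb):
    """Mean grayscale luminance of a frame (or ROI) using the
    ITU-R BT.601 weights; a single-number ROI descriptor."""
    r = frame_rgb[..., 0].astype(float)
    g = frame_rgb[..., 1].astype(float)
    b = frame_rgb[..., 2].astype(float)
    return 0.299 * r.mean() + 0.587 * g.mean() + 0.114 * b.mean()

white = np.full((8, 8, 3), 255, dtype=np.uint8)
print(mean_luminance(white))  # ≈ 255.0
```

Passing a slice of the frame instead of the whole array gives the same statistic for any rectangular ROI.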
Grayscale or color histograms are more detailed ROI features than luminance or color statistics. Their advantages are that they are highly discriminating, simple to compute, and largely unaffected by translational, rotational, and zooming camera motion; for these reasons they are commonly used.
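A typical histogram-based dissimilarity measure (one of several in use; the bin count here is an illustrative choice) is the L1 distance between normalized grayscale histograms:

```python
import numpy as np

def hist_diff(f1, f2, bins=64):
    """L1 distance between normalized grayscale histograms of two
    frames: near 0 within a shot, large across a cut (max 2.0)."""
    h1, _ = np.histogram(f1, bins=bins, range=(0, 256))
    h2, _ = np.histogram(f2, bins=bins, range=(0, 256))
    h1 = h1 / h1.sum()
    h2 = h2 / h2.sum()
    return np.abs(h1 - h2).sum()

dark = np.full((16, 16), 20, dtype=np.uint8)
bright = np.full((16, 16), 200, dtype=np.uint8)
print(hist_diff(dark, dark), hist_diff(dark, bright))  # → 0.0 2.0
```

Because the histogram discards pixel positions, this measure stays small under the camera translation, rotation, and zoom mentioned above.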
The edge information of an image is a natural choice for describing an ROI. Its benefit is that it corresponds closely to how humans perceive a scene visually and is sufficiently invariant to changes in illumination and to various types of motion. The key drawbacks are its computational expense, noise sensitivity, and high dimensionality without post-processing.
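A common edge-based measure in the SBD literature is the edge change ratio: the fraction of edge pixels that enter or exit between two frames. The sketch below uses a crude finite-difference edge map as a stand-in for a real detector such as Canny; the gradient threshold is an illustrative assumption:

```python
import numpy as np

def edge_map(frame, thresh=30.0):
    """Crude binary edge map from finite-difference gradient magnitude
    (a stand-in for a proper edge detector such as Canny)."""
    gy, gx = np.gradient(frame.astype(float))
    return np.hypot(gx, gy) > thresh

def edge_change_ratio(prev, curr, thresh=30.0):
    """Fraction of edge pixels that appear or disappear between two
    frames; this rises sharply at shot boundaries."""
    e1, e2 = edge_map(prev, thresh), edge_map(curr, thresh)
    entering = np.logical_and(e2, ~e1).sum()
    exiting = np.logical_and(e1, ~e2).sum()
    total = max(e1.sum(), e2.sum(), 1)
    return max(entering, exiting) / total

frame = np.zeros((32, 32))
frame[:, 16:] = 100.0  # one vertical edge
print(edge_change_ratio(frame, frame))  # → 0.0
```

The post-processing cost the text warns about shows up here as the extra morphology and edge registration a production implementation would need.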
Transform coefficients, such as those of the Discrete Fourier Transform (DFT), Discrete Cosine Transform (DCT), and wavelet transforms, are traditional ways to characterize the picture data in an ROI. An added benefit of DCT coefficients is that Moving Picture Experts Group (MPEG)-encoded video streams and files already contain them.
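For reference, the 2-D DCT-II that MPEG applies to 8x8 blocks can be built from the orthonormal DCT basis matrix; a few low-frequency coefficients per block are enough to serve as a compact frame feature. This is a minimal sketch of the transform itself:

```python
import numpy as np

def dct2(block):
    """2-D DCT-II of a square block via the orthonormal DCT basis
    matrix (the transform used on 8x8 blocks in JPEG/MPEG)."""
    n = block.shape[0]
    k = np.arange(n)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # orthonormal scaling of the DC row
    return c @ block @ c.T

# A constant 8x8 block has all its energy in the DC coefficient (n * value)
d = dct2(np.ones((8, 8)))
print(round(d[0, 0], 6))  # → 8.0
```

Since compressed-domain methods read these coefficients straight from the bitstream, they avoid full decoding, which is the benefit noted above.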
Motion is occasionally used as a feature for detecting shot transitions, but it is typically paired with others because, on its own, it can be highly discontinuous within a shot (when motion changes quickly), and it is obviously useless when there is no motion in the video.
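A motion feature is usually derived from block-matching motion vectors. The exhaustive block-matching sketch below (block size and search range are illustrative assumptions) reduces the vectors to a single mean-magnitude value per frame pair:

```python
import numpy as np

def block_motion(prev, curr, block=8, search=4):
    """Exhaustive block matching: for each block in `curr`, find the
    best-matching displaced block in `prev` (minimum sum of absolute
    differences) and return the mean motion-vector magnitude."""
    h, w = curr.shape
    mags = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            ref = curr[y:y + block, x:x + block].astype(float)
            best, bv = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy <= h - block and 0 <= xx <= w - block:
                        cand = prev[yy:yy + block, xx:xx + block].astype(float)
                        err = np.abs(ref - cand).sum()
                        if best is None or err < best:
                            best, bv = err, (dy, dx)
            mags.append(np.hypot(*bv))
    return float(np.mean(mags))

# Identical frames with distinct pixel values → zero estimated motion
prev = np.add.outer(5.0 * np.arange(32), np.arange(32))
print(block_motion(prev, prev))  # → 0.0
```

The discontinuity problem mentioned above appears here as sudden jumps in the returned magnitude whenever on-screen motion changes abruptly within a single shot.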
Measuring quality, whether of products or processes, requires gathering and analyzing data, typically expressed in terms of measures and metrics. The main goal of measurement is to bring a task under control so that it can be managed. Three measures are used to assess the quality of a Shot Boundary Detection (SBD) algorithm, as follows,
V = C / (C + M) (1)

where C = number of correctly detected cuts (correct hits) and M = number of undetected cuts (missed hits).

P = C / (C + F) (2)

where F = number of falsely detected cuts (false hits).

F1 = 2 * P * V / (P + V) (3)

where V = recall and P = precision.
All these measures are mathematical values ranging between 0 and 1; the basic rule is that the higher the value, the better the algorithm's performance.
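Equations (1)-(3) can be computed directly from the detected and ground-truth cut positions; the positions below are made-up example values:

```python
def sbd_metrics(detected, ground_truth):
    """Recall (V), precision (P), and F1 for detected cut positions
    against ground truth, following Eqs. (1)-(3)."""
    detected, ground_truth = set(detected), set(ground_truth)
    c = len(detected & ground_truth)   # correct hits
    m = len(ground_truth - detected)   # missed hits
    f = len(detected - ground_truth)   # false hits
    v = c / (c + m) if c + m else 0.0  # recall, Eq. (1)
    p = c / (c + f) if c + f else 0.0  # precision, Eq. (2)
    f1 = 2 * p * v / (p + v) if p + v else 0.0  # Eq. (3)
    return v, p, f1

# Two correct hits, one miss (70), one false hit (90)
v, p, f1 = sbd_metrics(detected=[10, 50, 90], ground_truth=[10, 50, 70])
print(v, p, f1)
```

With two of three true cuts found and one false alarm, all three measures here come out to 2/3, illustrating how F1 balances recall and precision.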
Figure 6 shows cut detection, where 1 denotes a hit, i.e., a hard cut that is correctly detected; 2 denotes a missed hit, i.e., a soft cut (dissolve) that was not detected; and 3 denotes a false hit, i.e., a single soft cut that is falsely interpreted as two different hard cuts.
Figure 6. Cut Detection
Table 1 summarizes the performance of video shot boundary detection using different techniques (Yong et al., 2002).
Despite the large number of works on shot boundary detection, many issues remain unresolved and require additional research. For a large database, this is the most essential method for extracting the desired video content. Moreover, well-ordered and effective management of video documents depends on the availability of indexes, and Shot Boundary Detection (SBD) makes the indexing task possible. It also helps in video browsing.