References
 [1]. Babri, H. A., & Tong, Y. (1996, June). Deep
feedforward networks: application to pattern recognition.
In Proceedings of International Conference on Neural
Networks (ICNN'96), 3, 1422-1426. https://doi.org/10.1109/ICNN.1996.549108
 [2]. Baum, E. B. (1988). On the capabilities of multilayer
perceptrons. Journal of Complexity, 4(3), 193-215. https://doi.org/10.1016/0885-064X(88)90020-9
 [3]. Baum, L. E., & Petrie, T. (1966). Statistical inference for
probabilistic functions of finite state Markov chains. The
Annals of Mathematical Statistics, 37(6), 1554-1563.
 [4]. Bayes, T. (1763). LII. An essay towards solving a
problem in the doctrine of chances: By the late Rev. Mr.
Bayes, FRS communicated by Mr. Price, in a letter to John
Canton, A.M.F.R.S. Philosophical transactions of the Royal
Society of London, 53(53), 370-418. https://doi.org/10.1098/rstl.1763.0053
 [5]. Bellis, M. (2021). The History of Photography: Pinholes
and Polaroids to Digital Images. Retrieved from https://www.thoughtco.com/history-of-photography-and-the-camera-1992331
 [6]. Block, H. D. (1970). A review of “perceptrons: An
introduction to computational geometry”. Information
and Control, 17(5), 501-522. https://doi.org/10.1016/S0019-9958(70)90409-2
 [8]. Boutaba, R., Salahuddin, M. A., Limam, N., Ayoubi, S.,
Shahriar, N., Estrada-Solano, F., & Caicedo, O. M. (2018).
A comprehensive survey on machine learning for
networking: evolution, applications and research
opportunities. Journal of Internet Services and
Applications, 9(1), 1-99. https://doi.org/10.1186/s13174-018-0087-2
 [9]. Breiman, L., Friedman, J., Stone, C. J., & Olshen, R. A.
(1984). Classification and Regression Trees. CRC press,
Boca Raton, Florida.
 [10]. Buczak, A. L. (2005). U.S. Patent No. 6,922,680.
Washington, DC: U.S. Patent and Trademark Office.
 [11]. Cai, Z., & Vasconcelos, N. (2018). Cascade R-CNN:
Delving into high quality object detection. In Proceedings
of the IEEE Conference on Computer Vision and Pattern
Recognition (pp. 6154-6162).
 [12]. Chao, Y. W., Wang, Z., Mihalcea, R., & Deng, J.
(2015). Mining semantic affordances of visual object
categories. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition (pp. 4259-4267).
 [13]. Chow, C. K. (1957). An optimum character
recognition system using decision functions. IRE
Transactions on Electronic Computers, EC-6(4), 247-254.
https://doi.org/10.1109/TEC.1957.5222035
 [14]. Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2015,
June). Gated feedback recurrent neural networks. In
International Conference on Machine Learning (pp. 2067-2075). PMLR.
 [15]. Cook, W. A. (1989). Case Grammar Theory.
Georgetown University Press.
 [16]. Cortes, C., & Vapnik, V. (1995). Support-vector
networks. Machine Learning, 20(3), 273-297.
 [17]. Cox, D. R. (1958). The regression analysis of binary
sequences. Journal of the Royal Statistical Society: Series
B (Methodological), 20(2), 215-232. https://doi.org/10.1111/j.2517-6161.1958.tb00292.x
 [18]. Dalal, N., & Triggs, B. (2005, June). Histograms of
oriented gradients for human detection. In 2005 IEEE
Computer Society Conference on Computer Vision and
Pattern Recognition (CVPR'05), 1, 886-893. https://doi.org/10.1109/CVPR.2005.177
 [19]. Evgeniou, T., & Pontil, M. (1999, July). Support vector
machines: Theory and applications. In Advanced Course
on Artificial Intelligence (pp. 249-257). Springer, Berlin,
Heidelberg. https://doi.org/10.1007/3-540-44673-7_12
 [20]. Felzenszwalb, P. F., Girshick, R. B., McAllester, D., & Ramanan, D. (2010). Object detection with discriminatively
trained part-based models. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 32(9), 1627-1645.
https://doi.org/10.1109/TPAMI.2009.167
 [21]. Fix, E., & Hodges, J. L. (1951). Discriminatory Analysis,
Nonparametric Discrimination: Consistency Properties.
Technical Report 4, USAF School of Aviation Medicine,
Randolph Field.
 [22]. Floyd, R. W., & Beigel, R. (1994). The Language of
Machines: An Introduction to Computability and Formal
Languages. Computer Science Press, New York.
 [23]. Gagniuc, P. A. (2017). Markov Chains: From Theory
to Implementation and Experimentation. John Wiley &
Sons, Inc, (pp. 256).
 [24]. Gallant, S. I. (1990). Perceptron-based learning
algorithms. IEEE Transactions on Neural Networks, 1(2),
179-191.
 [25]. Gan, C., Yang, T., & Gong, B. (2016). Learning
attributes equals multi-source domain generalization. In
Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition (pp. 87-97).
 [26]. Girshick, R. (2015). Fast R-CNN. In Proceedings of the
IEEE International Conference on Computer Vision
(pp. 1440-1448). https://doi.org/10.1109/ICCV.2015.169
 [27]. Girshick, R., Donahue, J., Darrell, T., & Malik, J.
(2014). Rich feature hierarchies for accurate object
detection and semantic segmentation. In Proceedings of
the IEEE Conference on Computer Vision and Pattern
Recognition (pp. 580-587).
 [29]. Goldberger, A. S. (2004). Econometric computing
by hand. Journal of Economic and Social Measurement,
29(1-3), 115-117.
 [30]. Graves, A., Mohamed, A. R., & Hinton, G. (2013,
May). Speech recognition with deep recurrent neural
networks. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 6645-6649). IEEE. https://doi.org/10.1109/ICASSP.2013.6638947
 [31]. He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017).
Mask R-CNN. In Proceedings of the IEEE International
Conference on Computer Vision (pp. 2961-2969).
 [33]. Hebb, D. O. (1949). The Organization of Behavior: A
Neuropsychological Theory. John Wiley and Sons, Inc.,
New York, 1-337.
 [35]. Hebb, D. O. (2002). The Organization of Behavior: A
Neuropsychological Theory. Psychology Press, (pp. 378).
https://doi.org/10.4324/9781410612403
 [36]. Herz, A., Sulzer, B., Kühn, R., & Van Hemmen, J. L.
(1988). The Hebb rule: Storing static and dynamic objects
in an associative neural network. EPL (Europhysics Letters),
7(7), 663. https://doi.org/10.1209/0295-5075/7/7/016
 [38]. Hill, S. (2013). A Complete History of the Camera
Phone. Retrieved from https://www.digitaltrends.com/mobile/camera-phone-history/
 [39]. Hochreiter, S., & Schmidhuber, J. (1997). Long short-term
memory. Neural Computation, 9(8), 1735-1780.
https://doi.org/10.1162/neco.1997.9.8.1735
 [40]. Hodgkin, A. L., & Huxley, A. F. (1952). A quantitative
description of membrane current and its application to
conduction and excitation in nerve. The Journal of
Physiology, 117(4), 500-544. https://doi.org/10.1113/jphysiol.1952.sp004764
 [41]. Jaynes, E. T. (1957a). Information theory and
statistical mechanics. Physical Review, 106(4), 620.
https://doi.org/10.1103/PhysRev.106.620
 [42]. Jaynes, E. T. (1957b). Information theory and
statistical mechanics. Physical Review, 108(2), 171. https://doi.org/10.1103/PhysRev.108.171
 [43]. Jordan, M. I., & Rumelhart, D. E. (1992). Forward
models: Supervised learning with a distal teacher.
Cognitive Science, 16(3), 307-354. https://doi.org/10.1016/0364-0213(92)90036-T
 [44]. Kotsiantis, S. B., Zaharakis, I., & Pintelas, P. (2007).
Supervised machine learning: A review of classification
techniques. Emerging Artificial Intelligence Applications
in Computer Engineering, 160(1), 3-24.
 [45]. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012).
ImageNet classification with deep convolutional neural
networks. Advances in Neural Information Processing
Systems, 1, 1097-1105.
 [46]. Kubat, M. (1999). Neural networks: a comprehensive
foundation by Simon Haykin, Macmillan, 1994, ISBN 0-02-352781-7.
The Knowledge Engineering Review, 13(4), 409-412.
 [47]. Lampert, C. H., Nickisch, H., & Harmeling, S. (2009,
June). Learning to detect unseen object classes by
between-class attribute transfer. In 2009 IEEE Conference
on Computer Vision and Pattern Recognition (pp. 951-958). IEEE. https://doi.org/10.1109/CVPR.2009.5206594
 [48]. LeCun, Y., & Bengio, Y. (1995). Convolutional
networks for images, speech, and time series. The
Handbook of Brain Theory and Neural Networks,
3361(10), 1-14.
 [49]. LeCun, Y., Boser, B., Denker, J., Henderson, D.,
Howard, R., Hubbard, W., & Jackel, L. (1989). Handwritten
digit recognition with a back-propagation network.
Advances in Neural Information Processing Systems, 2.
 [50]. Legendre, A. (1805). Nouvelles Méthodes Pour La
Détermination Des Orbites Des Comètes. Nineteenth
Century Collections Online (NCCO): Science,
Technology, and Medicine: 1780-1925.
 [51]. Li, L. J., & Fei-Fei, L. (2007, October). What, where
and who? Classifying events by scene and object
recognition. In 2007 IEEE 11th International Conference on
Computer Vision (pp. 1-8). IEEE. https://doi.org/10.1109/ICCV.2007.4408872
 [52]. Liu, B. (2011). Supervised learning. Web Data Mining,
63-132. https://doi.org/10.1007/978-3-642-19460-3_3
 [53]. Turing, A. M. (1950). Computing machinery and
intelligence. Mind, 59(236), 433-460.
 [54]. MacQueen, J. (1967). Some methods for
classification and analysis of multivariate observations.
Berkeley Symposium on Mathematical Statistics and
Probability, 5(1), 281-297.
 [55]. Maji, S., Bourdev, L., & Malik, J. (2011, June). Action
recognition from a distributed representation of pose and
appearance. In CVPR 2011 (pp. 3177-3184). IEEE. https://doi.org/10.1109/CVPR.2011.5995631
 [56]. Marchesi, M., Orlandi, G., Piazza, F., Pollonara, L., &
Uncini, A. (1990, June). Multi-layer perceptrons with
discrete weights. In 1990 IJCNN International Joint
Conference on Neural Networks (pp. 623-630). IEEE.
https://doi.org/10.1109/IJCNN.1990.137772
 [57]. Maron, M. E. (1961). Automatic indexing: an
experimental inquiry. Journal of the ACM (JACM), 8(3),
404-417.
 [58]. McCulloch, W. S., & Pitts, W. (1943). A logical
calculus of the ideas immanent in nervous activity. The
Bulletin of Mathematical Biophysics, 5(4), 115-133.
https://doi.org/10.1007/BF02478259
 [59]. Miller, B. L. (1967). Finite Stage Continuous Time
Markov Decision Processes with an Infinite Planning
Horizon. RAND Corporation, Research foundation in Santa
Monica, California. (pp. 26).
 [60]. Ng, A., & Jordan, M. (2001). On discriminative vs.
generative classifiers: A comparison of logistic regression
and naive Bayes. Advances in Neural Information
Processing Systems, 14, (pp. 8).
 [61]. Novikoff, A. B. (1962). On Convergence Proofs on
Perceptrons. Symposium on the mathematical theory of
automata, 615-622.
 [63]. O'Regan, G. (2013). Marvin Minsky. Giants of
Computing, 193-195. https://doi.org/10.1007/978-1-4471-5340-5_41
 [64]. Ono, K., & Kimura, M. (1969). Optimal control of
Markov processes. Transactions of the Society of
Instrument and Control Engineers, 5(3), 273-278. https://doi.org/10.9746/sicetr1965.5.273
 [65]. Oord, A. V. D., Dieleman, S., Zen, H., Simonyan, K.,
Vinyals, O., Graves, A., ... & Kavukcuoglu, K. (2016).
WaveNet: A generative model for raw audio. arXiv preprint
arXiv:1609.03499. https://doi.org/10.48550/arXiv.1609.03499
 [66]. Park, J. K., Chen, Y. H., & Simons, D. B. (1979). Cluster
analysis based on density estimates and its application to
Landsat imagery (Doctoral dissertation, Colorado State
University. Libraries).
 [67]. Parzen, E. (1962). On estimation of a probability
density function and mode. The Annals of Mathematical
Statistics, 33(3), 1065-1076.
 [68]. Pegler-Gordon, A. (2006). Seeing images in history.
Perspectives on History, 44, 28-31.
 [69]. Raudys, Š. (1998). Evolution and generalization of a
single neurone: I. Single-layer perceptron as seven
statistical classifiers. Neural Networks, 11(2), 283-296.
https://doi.org/10.1016/S0893-6080(97)00135-4
 [70]. Ren, S., He, K., Girshick, R., & Sun, J. (2016). Faster R-CNN:
Towards real-time object detection with region
proposal networks. IEEE Transactions on Pattern Analysis
and Machine Intelligence, 39(6), 1137-1149. https://doi.org/10.1109/TPAMI.2016.2577031
 [71]. Riedmiller, M., & Braun, H. (1992). Rprop: A fast
adaptive learning algorithm. In Proc. of ISCIS VII,
Universitat.
 [72]. Rosenblatt, F. (1957). The Perceptron - A Perceiving
and Recognizing Automaton. Cornell Aeronautical
Laboratory, New York.
 [73]. Rosenblatt, F. (1958). The perceptron: a probabilistic
model for information storage and organization in the
brain. Psychological Review, 65(6), 386. https://psycnet.apa.org/doi/10.1037/h0042519
 [74]. Rosenblatt, F. (1960). Perceptron simulation
experiments. Proceedings of the IRE, 48(3), 301-309. https://doi.org/10.1109/JRPROC.1960.287598
 [75]. Rosenblatt, M. (1956). Remarks on some
nonparametric estimates of a density function. The
Annals of Mathematical Statistics, 27(3), 832-837.
 [76]. Rumelhart, D. E., Hinton, G. E., & Williams, R. J.
(1986). Learning representations by back-propagating errors. Nature, 323(6088), 533-536. https://doi.org/10.1038/323533a0
 [77]. Samuel, A. L. (1959). Some studies in machine
learning using the game of checkers. IBM Journal of
Research and Development, 3(3), 210-229. https://doi.org/10.1147/rd.33.0210
 [79]. Samuel, A. L. (1967). Some studies in machine
learning using the game of checkers. II—Recent progress.
IBM Journal of Research and Development, 11(6), 601-617. https://doi.org/10.1147/RD.116.0601
 [80]. Schuster, M., & Paliwal, K. K. (1997). Bidirectional
recurrent neural networks. IEEE Transactions on Signal
Processing, 45(11), 2673-2681. https://doi.org/10.1109/78.650093
 [81]. Schwiening, C. J. (2012). A brief historical
perspective: Hodgkin and Huxley. The Journal of
Physiology, 590(11), 2571-2575. https://doi.org/10.1113/jphysiol.2012.230458
 [82]. Simonyan, K., & Zisserman, A. (2014). Very deep
convolutional networks for large-scale image
recognition. arXiv preprint arXiv:1409.1556. https://doi.org/10.48550/arXiv.1409.1556
 [83]. Stanfill, C., & Waltz, D. (1986). Toward memory-based
reasoning. Communications of the ACM, 29(12),
1213-1228. https://doi.org/10.1145/7902.7906
 [84]. Steinhaus, H. (1956). Sur la division des corps
matériels en parties. Bulletin de l'Académie Polonaise des
Sciences, 4(12), 801-804.
 [85]. Stigler, S. M. (1981). Gauss and the invention of least
squares. The Annals of Statistics, 9(3), 465-474.
 [86]. Stratonovich, R. L. (1965). Conditional Markov
processes. Non-Linear Transformations of Stochastic
Processes, 427-453. https://doi.org/10.1016/B978-1-4832-3230-0.50041-9
 [87]. Sugano, Y., & Bulling, A. (2016). Seeing with humans:
Gaze-assisted neural image captioning. arXiv preprint
arXiv:1608.05203. https://doi.org/10.48550/arXiv.1608.05203
 [88]. Sutton, R. S., & Barto, A. G. (2018). Reinforcement
Learning: An Introduction. The MIT Press, Cambridge. 1-526.
 [89]. Svozil, D., Kvasnicka, V., & Pospichal, J. (1997).
Introduction to multi-layer feed-forward neural networks.
Chemometrics and Intelligent Laboratory Systems, 39(1),
43-62. https://doi.org/10.1016/S0169-7439(97)00061-0
 [90]. Tesauro, G. (2007). Reinforcement learning in
autonomic computing: A manifesto and case studies.
IEEE Internet Computing, 11(1), 22-30. https://doi.org/10.1109/MIC.2007.21
 [91]. Turing, A. M. (1980). Computing Machinery and
Intelligence. Creative Computing, 6(1), 44-53.
 [93]. United States Patent and Trademark Office. (n.d.).
Retrieved from https://www.uspto.gov/
 [94]. Venugopalan, S., Anne Hendricks, L., Rohrbach, M.,
Mooney, R., Darrell, T., & Saenko, K. (2017). Captioning
images with diverse objects. In Proceedings of the IEEE
Conference on Computer Vision and Pattern
Recognition (pp. 5753-5761).
 [95]. Wang, C., Yang, H., Bartz, C., & Meinel, C. (2016,
October). Image captioning with deep bidirectional
LSTMs. In Proceedings of the 24th ACM International
Conference on Multimedia (pp. 988-997). https://doi.org/10.1145/2964284.2964299
 [96]. Wang, P. S., Liu, Y., Guo, Y. X., Sun, C. Y., & Tong, X.
(2017). O-CNN: Octree-based convolutional neural
networks for 3d shape analysis. ACM Transactions on
Graphics (TOG), 36(4), 1-11. https://doi.org/10.1145/3072959.3073608
 [97]. Wang, X., Girshick, R., Gupta, A., & He, K. (2018).
Non-local neural networks. In Proceedings of the IEEE
Conference on Computer Vision and Pattern
Recognition (pp. 7794-7803).
 [98]. Weaver, W. (1955). Translation. In Machine Translation
of Languages (pp. 15-23).
 [99]. Williams, R. J., & Zipser, D. (1989). A learning algorithm for continually running fully recurrent neural
networks. Neural Computation, 1(2), 270-280. https://doi.org/10.1162/neco.1989.1.2.270
 [100]. Woods, W. A. (1969). Augmented Transition
Networks for Natural Language Analysis. Harvard
University, Cambridge.