DED: Drift Principle in Educational Evolved Data
Abstract
Clustering is one of the prominent tasks in discovering hidden patterns in data streams. It refers to the process of assigning newly arrived data to segmentation patterns that change continuously and dynamically. This article presents a stream mining algorithm that clusters a data stream with a focus on its evolution and concept drift. Concept drift is caused by changes in the data distribution over time. Even though concept drift is expected to be present in data streams, explicit drift detection is rarely performed in stream clustering algorithms. The relationship between concept drift and the occurrence of physical events is studied by applying the algorithm to an education data stream. The study uses Viber education data streams produced by Viber groups in our Computer Science Department. The results show that the proposed algorithm outperforms existing ones on the purity, entropy, and sum of squared error measures. The experiments led to the conclusion that concept drift accompanied by a change in the number of clusters and outliers indicates a significant education event. This kind of online monitoring and its results can be used in education systems in various ways, such as presenting the capabilities of participants.
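For readers unfamiliar with the three evaluation measures named in the abstract, the following is a minimal Python sketch of their common textbook definitions. It is not the authors' implementation: the function names, the base-2 unnormalized form of entropy, and the use of Euclidean distance to the cluster centroid for the sum of squared error are all assumptions made for illustration.

```python
import math
from collections import Counter


def _group(values, cluster_ids):
    """Group values by the cluster id assigned to each of them."""
    clusters = {}
    for value, cid in zip(values, cluster_ids):
        clusters.setdefault(cid, []).append(value)
    return clusters


def purity(labels_true, labels_pred):
    """Fraction of points that carry the majority class of their cluster (higher is better)."""
    clusters = _group(labels_true, labels_pred)
    majority = sum(Counter(m).most_common(1)[0][1] for m in clusters.values())
    return majority / len(labels_true)


def entropy(labels_true, labels_pred):
    """Size-weighted mean of the class entropy inside each cluster (lower is better).

    Assumption: base-2 logarithm, no normalization by the number of classes.
    """
    clusters = _group(labels_true, labels_pred)
    n = len(labels_true)
    total = 0.0
    for members in clusters.values():
        size = len(members)
        h = -sum((c / size) * math.log2(c / size) for c in Counter(members).values())
        total += (size / n) * h
    return total


def sse(points, labels_pred):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    clusters = _group(points, labels_pred)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum((m[d] - centroid[d]) ** 2 for m in members for d in range(dim))
    return total
```

In a streaming setting these measures would typically be recomputed per window or per time horizon rather than once over the whole data set; how the article windows its evaluation is not stated in the abstract.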
This work is licensed under a Creative Commons Attribution 4.0 International License.
Tikrit Journal of Pure Science is licensed under the Creative Commons Attribution 4.0 International License, which allows users to copy, create extracts, abstracts, and new works from the article, alter and revise the article, and make commercial use of the article (including reuse and/or resale of the article by commercial entities), provided the user gives appropriate credit (with a link to the formal publication through the relevant DOI), provides a link to the license, indicates if changes were made, and does not represent the licensor as endorsing the use made of the work. The authors hold the copyright for their published work on the Tikrit J. Pure Sci. website, while Tikrit J. Pure Sci. is responsible for appropriate citation of their work, which is released under CC-BY-4.0, enabling the unrestricted use, distribution, and reproduction of an article in any medium, provided that the original work is properly cited.