DED: Drift Principle in Educational Evolved Data


Ammar Thaher Yaseen Al Abd Alazeez

Abstract

Clustering data streams is one of the prominent tasks for discovering hidden patterns in data streams. It refers to the process of assigning newly arriving data to segmentation patterns that change continuously and dynamically. This article presents a stream mining algorithm that clusters a data stream with a focus on its evolution and concept drift. Concept drift is caused by changes in the data distribution over time, and even though it is expected to be present in data streams, explicit drift detection is rarely performed in stream clustering algorithms. The relationship between concept drift and the occurrence of physical events has been studied by applying the algorithm to an education data stream. Viber education data streams produced by Viber groups in our Computer Science Department are used to conduct this study. The results show that the proposed algorithm outperforms existing ones in purity, entropy, and sum of squared error (SSE). The experiments lead to the conclusion that concept drift accompanied by a change in the number of clusters and outliers indicates a significant educational event. This kind of online monitoring and its results can be used in education systems in various ways, such as presenting the capabilities of participants.
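The abstract compares algorithms using purity, entropy, and sum of squared error. The sketch below computes these three measures using common textbook definitions, assuming they are evaluated over a batch (for example, one window of the stream) whose points carry ground-truth class labels. The article does not state its exact normalization here. In particular, it does not say whether entropy is divided by the log of the number of classes, so this version reports unnormalized entropy in bits. The function names are hypothetical and do not come from the article.

```python
import math
from collections import Counter, defaultdict


def group_by_cluster(clusters, labels):
    """Map each cluster id to the ground-truth labels of its points."""
    groups = defaultdict(list)
    for c, y in zip(clusters, labels):
        groups[c].append(y)
    return groups


def purity(clusters, labels):
    """Fraction of points that belong to the majority class of their cluster (higher is better)."""
    groups = group_by_cluster(clusters, labels)
    majority_total = sum(Counter(ys).most_common(1)[0][1] for ys in groups.values())
    return majority_total / len(labels)


def entropy(clusters, labels):
    """Size-weighted average of per-cluster class entropy in bits (lower is better).

    Assumption: unnormalized; some definitions divide by log2(number of classes).
    """
    groups = group_by_cluster(clusters, labels)
    n = len(labels)
    total = 0.0
    for ys in groups.values():
        n_j = len(ys)
        h = -sum((k / n_j) * math.log2(k / n_j) for k in Counter(ys).values())
        total += (n_j / n) * h
    return total


def sse(points, clusters):
    """Sum of squared Euclidean distances from each point to its cluster centroid (lower is better)."""
    members = defaultdict(list)
    for p, c in zip(points, clusters):
        members[c].append(p)
    total = 0.0
    for pts in members.values():
        dim = len(pts[0])
        centroid = [sum(p[d] for p in pts) / len(pts) for d in range(dim)]
        total += sum(sum((p[d] - centroid[d]) ** 2 for d in range(dim)) for p in pts)
    return total


if __name__ == "__main__":
    # Toy window: two clusters of two points each; cluster 1 mixes classes "a" and "b".
    pts = [(0, 0), (0, 2), (10, 10), (10, 12)]
    assigned = [0, 0, 1, 1]
    truth = ["a", "a", "b", "a"]
    print(purity(assigned, truth), entropy(assigned, truth), sse(pts, assigned))
```

On this toy window, purity is 0.75 (3 of 4 points are in their cluster's majority class). Entropy is 0.5 bits (the pure cluster contributes 0 and the evenly mixed cluster contributes 1 bit, each with weight 0.5). SSE is 4.0. Purity and entropy require class labels, while SSE only needs the points and their cluster assignments.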

Article Details

How to Cite
Ammar Thaher Yaseen Al Abd Alazeez. (2022). DED: Drift Principle in Educational Evolved Data. Tikrit Journal of Pure Science, 26(2), 118–125. https://doi.org/10.25130/tjps.v26i2.128
