TEXT MINING DATA FROM STUDENTS TO REVEAL MEANINGFUL INFORMATION FOR EDUCATORS
Abstract
Academic institutions adopt a range of advising tools to serve different objectives. Past research has used both numeric and text data to predict students’ performance, and numerous studies have sought to identify students’ learning strategies and profiles. These learning strategies and academic profiles have in turn informed the advising process. This research proposes an approach that supplements these activities by text mining students’ essays to better understand student profiles across courses (subjects). Text analysis was performed on 99 essays written by undergraduate students in three different courses. The essays and their terms were projected into a 20-dimensional vector space, and the 20 dimensions were used as independent variables in a regression analysis to predict a student’s final grade in a course. Further analyses were performed on the dimensions found to be statistically significant. This preliminary study demonstrates a novel approach to extracting meaningful information by text mining student essays, with the aim of developing an advising tool that educators can use.
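The pipeline summarized in the abstract (project essays into a low-dimensional vector space, then regress final grades on the resulting dimensions) can be illustrated with a minimal sketch. The abstract does not state how the projection was computed; the sketch assumes a truncated singular value decomposition of a raw term-by-document count matrix, which is one common choice. The toy essays, the grades, the helper names, and the use of 2 dimensions instead of 20 are all hypothetical and serve only to keep the example small.

```python
import numpy as np

def term_document_matrix(docs):
    """Build a raw term-count matrix (terms x documents) from whitespace tokens."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            X[index[w], j] += 1.0
    return X, vocab

def project_documents(X, k):
    """Truncated SVD: return document coordinates (n_docs x k) in the reduced space."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Scale each retained right singular vector by its singular value.
    return (Vt[:k, :] * s[:k, None]).T

def fit_linear_regression(Z, y):
    """Ordinary least squares with an intercept; coefficients are intercept first."""
    A = np.column_stack([np.ones(len(Z)), Z])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

# Hypothetical toy data (NOT from the study): six short "essays" and final grades.
essays = [
    "i plan my study time and review notes weekly",
    "i review notes and plan study sessions with friends",
    "i enjoy group projects and class discussion",
    "group discussion helps me learn from friends",
    "i study alone and review notes before exams",
    "class projects and discussion keep me engaged",
]
grades = np.array([88.0, 85.0, 74.0, 70.0, 90.0, 72.0])

X, vocab = term_document_matrix(essays)
Z = project_documents(X, k=2)            # the study used 20 dimensions
coef = fit_linear_regression(Z, grades)  # intercept plus one weight per dimension
predicted = np.column_stack([np.ones(len(Z)), Z]) @ coef
```

In a setting like the one described, each column of `Z` would play the role of one independent variable, and only the dimensions with statistically significant coefficients would be examined further. Significance testing is not shown here.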
educational data mining, post-secondary education, student learning, advising faculty, text mining, natural language processing