EDM Data Enrichment

Educational Data Mining Data Enrichment

Though Educational Data Mining (EDM) has been around for a long while, making the most of the data has been a challenge.

This article explores how to enrich data sources to enable better mining and gain deeper insights.

 

Cheat Sheet

EDM Enriching Data with a snapshot of a GA4 dashboard

Ok, so you don’t want to read the whole article… you are busy with your EDM Data Enrichment.  What are the primary takeaways?

  1. Integration of tools such as GA4 can add a variety of enriching data
  2. GeoIP databases, such as the free GeoLite edition, enrich data with location by IP address
  3. Consider timezone differences and recompute timestamps into each student's local day for Follow the Sun / Follow the Moon operations
  4. Browscap using the User Agent String of the browser adds Browser Capability and device insight
  5. Active Directory / LDAP groups in a large organisation help analytics better understand the audience
  6. Result Based Mapping can help to drive justification / financial backing of specific focus areas
  7. Implied Duration and Exit Duration are ways of being able to add interactivity times albeit with caveats

 

Background

Research

There is no doubting the impact of IT systems on education over the last 20+ years.  As an author and avid student of numerous educational tools, I see the design of reporting and insight gathering as a fundamental building block that should be baked in from the outset, but is that true for all tools?

In their sprawling review of research articles, “Educational Data mining: a systematic review of research and emerging trends” (2020), Du, X., Yang, J., Shelton, B. highlight the lack of “detailed process of the examined data collections”, i.e. how do you get down to the juicy bits?

What becomes even more interesting when reviewing the literature is the focus on educational institutions, such as in “A Systematic Review on Educational Data Mining” (2016), Dutt, A., Akmar Ismail, M., Herawan, T.

 

Educational Data Mining

From the papers that are publicly available, the majority focus on applying supervised and unsupervised DM techniques (association rule learning, clustering, classification, regression, etc.) to an available set of data.  Within the research, applications to global business are scant, often due to the sensitivity of those applications.

Educational data is readily combined with outcomes such as grade / result.  For many educational systems, though, this rich vein of data doesn’t exist, because activity data and scores are kept in separate systems.  At some point, extraction of the data to a warehouse with some ETL (Extract, Transform, Load) work is going to be required.  Someone somewhere in the mechanism is applying data mining skills.

The question then expands to … if the audience of the data mining efforts is the teachers / lecturers / managers, what insights can be brought that would be helpful?

 

EDM Data Enrichment

The Who, What, Where, Why, When and How of it all

What

The first and most common piece of data is the what.  What was being looked at?

The first expansion of this is “What was someone doing when they came across your materials?”  JavaScript click trackers using # can provide extended insight into how people arrived and where they came from.  Tools such as GA4 show what insights Google can add about the clicks that brought people to your materials.
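As a rough illustration of the idea, the sketch below pulls click context out of a logged URL.  It assumes the tracker records the full URL including the part after the #, and that this part carries key=value tags.  The tag name `src`, the function name `click_context` and the example URLs are all hypothetical and not taken from any particular tracker.

```python
from urllib.parse import parse_qs, urlsplit


def click_context(logged_url: str, referrer: str = "") -> dict:
    """Extract the page, a source tag and the referring host from one click.

    Assumes (hypothetically) that the fragment after '#' holds
    key=value pairs such as 'src=newsletter&pos=footer'.
    """
    parts = urlsplit(logged_url)
    tags = parse_qs(parts.fragment)
    return {
        "page": parts.path,
        "source": tags.get("src", [""])[0],
        "referrer_host": urlsplit(referrer).netloc,
    }


context = click_context(
    "https://example.org/lessons/intro#src=newsletter&pos=footer",
    referrer="https://news.example.com/issue/12",
)
print(context["source"])         # newsletter
print(context["referrer_host"])  # news.example.com
```

The browser does not send the fragment to the server, which is why a JavaScript tracker is needed to capture it in the first place.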

 

Who

Most educational systems provide, at the very least, the ability to track the student.  Using a tool like Confluence, MediaWiki or Moodle, you have reports with student information baked in.

If your system has no way to track users, we lose a lot of rich data, but there is still a lot more to identify.

 

Where

If the student is connecting remotely then there’s a good chance an IP address is going to be available.  Moodle, for example, definitely records this information, and so comes our first enrichment, whether we have a who or not.

GeoIP databases, such as the free GeoLite edition, provide geolocation based on IP address.  Consequently, that starts to give us geographic profiles of our student base.

Furthermore, even if you have students onsite, you can split the IP addresses according to your internal and external network setups.  This provides insight on “where” in the organisation your student is.
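A minimal sketch of that internal / external split, using only Python's standard `ipaddress` module.  The subnet ranges and site labels are made up for illustration; you would substitute whatever your own network team publishes.

```python
import ipaddress

# Hypothetical internal ranges; replace with your organisation's real subnets.
SITE_NETWORKS = {
    "site A": ipaddress.ip_network("10.10.0.0/16"),
    "site B": ipaddress.ip_network("10.20.0.0/16"),
    "VPN": ipaddress.ip_network("172.16.0.0/12"),
}


def classify_ip(raw_ip: str) -> str:
    """Return the internal site label for an address, or 'external'."""
    address = ipaddress.ip_address(raw_ip)
    for label, network in SITE_NETWORKS.items():
        if address in network:
            return label
    return "external"


print(classify_ip("10.10.4.7"))  # site A
print(classify_ip("8.8.8.8"))    # external
```

The resulting label can be stored as one more column next to the IP address, so the rest of the analysis never has to deal with raw addresses.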

 

When

The next logical step of EDM Data Enrichment is adjustment for Follow the Sun or Follow the Moon students.  As platforms make themselves available 24 / 365, is a student really studying in the middle of the night, or in the middle of their day?

Working in a global multinational and performing analysis such as this adds a layer of complexity.  The simplest solution is to add an extra column to your data, “Adjusted date time”, recomputed into each student's local time so that your analysis is consistent across all time regions.
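One way to sketch the “Adjusted date time” column is shown below.  It assumes events are stored in UTC and that you already know each student's zone, for example from the location enrichment.  The function names and the 08:00 to 18:00 working window are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def adjusted_datetime(event_utc: datetime, student_zone: tzinfo) -> datetime:
    """Convert a UTC event time to the student's local wall-clock time."""
    if event_utc.tzinfo is None:
        # Assumption: naive timestamps in the log are already UTC.
        event_utc = event_utc.replace(tzinfo=timezone.utc)
    return event_utc.astimezone(student_zone)


def is_working_hours(local_time: datetime, start_hour: int = 8, end_hour: int = 18) -> bool:
    """True if the local time falls inside a (hypothetical) normal working day."""
    return start_hour <= local_time.hour < end_hour


event = datetime(2024, 3, 5, 2, 30, tzinfo=timezone.utc)
local = adjusted_datetime(event, timezone(timedelta(hours=8)))
print(local.hour, is_working_hours(local))  # 10 True
```

A fixed offset keeps the example self-contained.  For zones with daylight saving, a `zoneinfo.ZoneInfo` object can be passed as `student_zone` instead.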

 

Why

Why did it work or fail? Should we bother investing in this project? Why!?  Result Based Mapping can help to drive justification / financial backing of specific focus areas.  In an educational setting the obvious metric is grades.  In a business setting that metric can be Average Handling Time, CSR (Customer Satisfaction Rating) or any combination of these.  Combining these metrics with learning data as a part of Educational Data Mining can help to continue and increase resourcing of educational based projects.
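As a simple illustration of mapping learning activity to a result, the sketch below compares an outcome metric between learners above and below an engagement threshold.  The data, the learner identifiers and the threshold are invented for the example, and a real analysis would need far more care before claiming cause and effect.

```python
from statistics import mean

# Hypothetical sample data: lessons completed and Average Handling Time (minutes).
lessons_completed = {"learner_a": 5, "learner_b": 1, "learner_c": 4, "learner_d": 0}
handling_time = {"learner_a": 6.0, "learner_b": 9.5, "learner_c": 7.0, "learner_d": 10.5}


def mean_outcome_by_engagement(activity: dict, outcome: dict, threshold: int) -> dict:
    """Average the outcome metric for learners at/above and below the threshold."""
    engaged, other = [], []
    for person, count in activity.items():
        if person not in outcome:
            continue  # no result recorded for this learner, so skip them
        (engaged if count >= threshold else other).append(outcome[person])
    return {
        "engaged": mean(engaged) if engaged else None,
        "other": mean(other) if other else None,
    }


print(mean_outcome_by_engagement(lessons_completed, handling_time, threshold=3))
# {'engaged': 6.5, 'other': 10.0}
```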

 

How

Snapshot of Enriching Data from YouTube Studio

Interestingly, “How long” is quite a tricky metric to compute if you’re not the author of the original material.  YouTube Studio provides amazing analytics and insights to help educators understand how people are interacting with their materials.  How do you measure that without being the author?

Implied Duration and Exit Duration are ways of being able to add interactivity times, albeit with caveats.  If a user has multiple clicks within a lesson (item 1, item 2, item 3), you can measure the time between interactions.  The final action has to be excluded from the analysis, because there is no later click to measure against.  From this you can compute an implied duration per item and an estimated total visit duration, sacrificing the time of the last element.

The reason we call it implied is because you can never be sure the user is actively interacting with the material even if they are at the desk.
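The implied duration calculation can be sketched as follows, assuming each visit is available as a list of (item, timestamp) clicks.  The function name and the sample visit are hypothetical.

```python
from datetime import datetime


def implied_durations(clicks: list) -> list:
    """Return (item, seconds) for every click except the last one in a visit.

    The final click is dropped because there is no later interaction
    to measure its duration against.
    """
    ordered = sorted(clicks, key=lambda click: click[1])
    durations = []
    for (item, start), (_, next_start) in zip(ordered, ordered[1:]):
        durations.append((item, (next_start - start).total_seconds()))
    return durations


visit = [
    ("item 1", datetime(2024, 3, 5, 9, 0, 0)),
    ("item 2", datetime(2024, 3, 5, 9, 4, 0)),
    ("item 3", datetime(2024, 3, 5, 9, 9, 30)),
]
per_item = implied_durations(visit)
print(per_item)                                # [('item 1', 240.0), ('item 2', 330.0)]
print(sum(seconds for _, seconds in per_item))  # 570.0
```

In practice you may also want to cap very long gaps, since a 40 minute pause between two clicks is more likely a coffee break than 40 minutes of reading.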

 

Conclusion

Working in a global multinational, I get to see and interact with a number of these approaches in the development of supporting materials and their dissemination.

Tableau, for example, helps to visualise the results of the research in a way that enables a much wider audience to understand the insights.

Equally within WhereWeLearn, we have baked in these capabilities and enjoy a regular stream of visitors who we are always working with to grow and develop.

If you’d like to chat to me about Data Mining, Education, EDM Data Enrichment or explore similar topics, please do get in touch.
