The volume of data we generate grows at a staggering rate each year. As we spend more time on connected devices, and as devices themselves become connected to each other (the Internet of Things), the opportunity to make use of this data is irresistible. Sectors as diverse as insurance, transportation, financial services, and healthcare are investing heavily in software and processes to capture, organize, and make sense of big data.
But what about higher education? What type of data constitutes “big data” in higher education? What are the sources, current and potential, of this data and what concerns does it raise?
A significant number of people in the field are working to find ways to leverage big data to serve the instructional, administrative, and promotional operations in colleges and universities. Areas of opportunity for big data include:
- More quickly and accurately identify students who require different types of support in order to succeed in their studies;
- Create new insights into which colleges and universities are having the greatest impact on student learning outcomes and work prospects;
- Develop deep profiles of students to enable institutions to automatically personalize their learning process; and
- More accurately target promotional materials to prospective students to recruit the ideal mix of learners.
Using Existing Data
Much of the data required to realize these and other opportunities is already available. The main buckets of data include:
- Administration: Many student interactions with colleges and universities are mediated by technology; for example, student responses to recruitment campaigns, course selections, and alumni activities. As more of these activities move online, the ability to capture the information, analyze it, and make use of it grows.
- Course Data: In traditional classroom education, information about student learning has been limited to grades and the memories of individual instructors, with little of this shared systematically. But as more teaching and learning moves online, whether as part of on-campus, hybrid, or fully online courses, the “click stream” of each student’s engagement with the curriculum, other students, and the instructor becomes potentially valuable information that can inform better decision-making and resource allocation.
- External Data: Useful data can be found among organizations outside of higher education. Government sources, like the Integrated Postsecondary Education Data System (IPEDS) in the United States and the United Kingdom Department for Education, can be effectively combined with data from within the institution. For online education information, reports from Babson Survey Research Group (North America), JISC (UK), or commercial research firms like Eduventures or Huron can be especially valuable.
Data from the Student Lifecycle
There are opportunities to use data strategically by tracking students throughout their entire academic lifecycle, including such factors as:
√ Student recruitment and institution selection
√ Student demographic information captured as part of admissions and registration processes, including age, previous education, and gender
√ Grades from previous educational experiences
√ Student work experience
√ Program and course selection
√ Housing choices
√ Student performance in courses, including, but not limited to, final grades
√ Student retention in courses
√ Graduation rates and time to completion
√ Use of student loans
√ Workforce patterns after graduation
Data from Courses
Data generated within courses, particularly online courses, has been available for several years, often captured by the learning management system.
More ambitious forms of data collection and analysis are now available for courses that put the power of big data and analytics to greater use. These applications use finely tuned algorithms - typically refined through use with tens of thousands of learners - that measure student mastery of the material. This information can be reported to the instructor in the form of a dashboard report and used to automatically adjust the curriculum presented to students, based on their progress and needs - as interpreted by the algorithm.
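To make the idea concrete, here is a minimal sketch in Python of how such a feedback loop might work. It is an illustration only, not the method of any particular product: the class name `MasteryTracker`, the weighting of 0.3, the starting estimate of 0.5, and the review and advance thresholds are all assumptions chosen for readability. Commercial applications rely on far more sophisticated models tuned on data from many learners.

```python
from dataclasses import dataclass, field


@dataclass
class MasteryTracker:
    """Keeps a running mastery estimate (0.0 to 1.0) for each skill.

    All default values below are illustrative assumptions,
    not parameters taken from any real adaptive learning system.
    """
    learning_rate: float = 0.3    # assumed weight given to the newest answer
    start_estimate: float = 0.5   # assumed estimate before any evidence
    estimates: dict = field(default_factory=dict)

    def record(self, skill: str, correct: bool) -> float:
        """Move the estimate toward 1.0 (correct) or 0.0 (incorrect)."""
        previous = self.estimates.get(skill, self.start_estimate)
        observed = 1.0 if correct else 0.0
        updated = previous + self.learning_rate * (observed - previous)
        self.estimates[skill] = updated
        return updated

    def next_activity(self, skill: str,
                      review_below: float = 0.3,
                      advance_above: float = 0.8) -> str:
        """Pick what to present next, based on the current estimate."""
        estimate = self.estimates.get(skill, self.start_estimate)
        if estimate < review_below:
            return "review"
        if estimate > advance_above:
            return "advance"
        return "practice"

    def dashboard(self) -> list:
        """Skills sorted weakest first, as an instructor report might show."""
        return sorted(self.estimates.items(), key=lambda item: item[1])


if __name__ == "__main__":
    tracker = MasteryTracker()
    for answer in (True, True, True):
        tracker.record("fractions", answer)
    for answer in (False, False):
        tracker.record("decimals", answer)
    print(tracker.next_activity("fractions"))
    print(tracker.next_activity("decimals"))
    print(tracker.dashboard())
```

Even in a sketch this small, the design choices are human ones: someone decides how much a single answer should count and where the thresholds sit, and the software simply applies those decisions at scale.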
The full potential of big data within courses will be realized when it is used across institutions. Data sets from different institutions could compare, for example, how institutions or systems of institutions with different funding models, policy systems, and organizational structures operate, integrate and support students, and design and deliver varied learning experiences.
Five Reasons for Caution
While these and other applications hold great promise, experts developing big data uses for higher education are acutely aware of the significant challenges they face, including:
- “Success” in education is often more difficult to measure than in other fields where big data has been used effectively. Not only are there multiple ideas of what constitutes student success, but it can be very challenging to isolate the factors leading to success or failure. Was it the student’s cognitive abilities, study habits, level of persistence, or the approach to teaching and learning taken by the educator? Data can measure the impact of factors, but identifying which factors are relevant is left to the humans programming the software. And it’s not uncommon for data scientists to focus on the factors that technology can measure most easily, paying less attention to others. The trickiness of leaning on big data is especially evident in adaptive learning, which calls on the software to make calculations as to what the student knows, doesn’t know, and why.
- Information about student learning, as with many types of information, can be misused. If, for example, data suggests a student - given her background and previous level of academic performance - would likely not graduate, a college or university may be tempted not to enrol her so as to maximize the “quality” of its graduates and its tuition revenue, and to minimize the demand on student support services. (Analogy: data used by health insurance companies to identify people with a higher than average likelihood of getting seriously ill.) For obvious reasons, decisions about how to use this information need to be well-considered and transparent.
- Care must be taken to ensure that the data collected is kept private. If the data is shared with other parties, this must be done in accordance with strict guidelines. This issue has been raised over the last few years most often with respect to commercial vendors, who have a large role in data and analytics. A Fordham Law School study found 95 percent of US school districts used cloud services to manage student performance information, but fewer than seven percent of the district vendor contracts restricted the sale or marketing of student information by vendors.
- Institutions and educators are frequently reluctant to share data. For institutions, the hesitation may stem from a concern the information will be used by regulators, funding bodies, and others to manage the institution. Maintaining a sufficient distance between the college or university and external bodies - government, business, etc. - has long been a preoccupation of institutional leadership, often for good reasons. So, there’s little precedent or motivation to share data to inform educational research.
- Few institutions are prepared to systematically capture, analyze and make use of big data as of 2016, according to United Kingdom-based think-tank Policy Connect. New skills, occupational categories, and processes are necessary to make full use of big data’s potential. Without strong leadership or the infrastructure to manage the process, valuable information may lie dormant.
A System-Wide Approach is Required
There’s a certain degree of inevitability to the growth of big data in higher education. The volume of data continues to climb and new software is making it easier to capture and make sense of it. Over the next few years, much of the activity will develop organically as individual educators and researchers find ways to connect available data sets from inside and outside the institution. But significant progress requires a system-wide, strategic approach to ensure the measurements are sufficiently consistent across institutions and proper guidelines are developed to protect students, faculty and other stakeholders, while still leveraging big data to serve students.