Assessment: What are we even doing?
Assessment is a difficult, time-consuming, expensive process. But we need it to:
- Show that people have the skills and qualifications they claim to have
- Inform decisions about grades, placements and promotion
- Evaluate what we’re doing in education and adapt teaching strategies
- Help students evaluate their own progress
Assessment is traditionally divided into “formative assessment,” through which we provide helpful feedback to a person as they’re learning, and “summative assessment,” which evaluates a student’s progress at the end of a program or course against standards we have determined. In practice, all assessments serve both functions, and more.
Assessment informs pedagogy. Learning design often begins with the assessment, perhaps mapping the skills and competencies required to perform real-world tasks to learning objectives, then identifying relevant evidence or indicators to show these objectives have been satisfied.
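To make this mapping concrete, here is a minimal sketch of that backward-design process: start from an authentic task, map it to objectives, then attach observable indicators of success. All of the names, tasks and indicators here are hypothetical illustrations, not drawn from any real framework.

```python
# Hypothetical sketch of assessment-first learning design: a real-world task
# is mapped to learning objectives, each with observable evidence indicators.
from dataclasses import dataclass, field

@dataclass
class Objective:
    description: str                                 # what the learner should be able to do
    indicators: list = field(default_factory=list)   # observable evidence of success

@dataclass
class Assessment:
    task: str                                        # the authentic, real-world task
    objectives: list = field(default_factory=list)

    def evidence_checklist(self):
        """Flatten the map into the indicators an assessor would look for."""
        return [ind for obj in self.objectives for ind in obj.indicators]

# Illustrative example (all content invented for the sketch)
design = Assessment(
    task="Write an incident report after a simulated network outage",
    objectives=[
        Objective("Diagnose the root cause",
                  ["names the failed component", "cites supporting logs"]),
        Objective("Communicate findings clearly",
                  ["summary under 200 words", "states remediation steps"]),
    ],
)

print(design.evidence_checklist())
```

The point of the structure is that the assessment instrument (the checklist) is derived from the task, not the other way around.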
Assessment needs to be:
- Authentic and based in actual practice, not merely theoretical or artificial
- Aligned with genuine employer, workplace or personal needs
- Future-oriented and constructive, and more than just evidence of memorization
- Reflexive, enabling students to gain understanding and make their own judgements about their learning
- Based on actionable feedback showing how learning can be improved
- Holistic, encompassing an entire domain rather than merely easily measurable parts of it
Each of these poses a challenge for assessment. Tests, assignments and programs must measure each of these accurately, so evidence of success reflects actual success. They must also measure each consistently to ensure objectivity and fairness across a diverse range of students and learning environments.
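Consistency between scorers can itself be measured. One standard metric in assessment research is quadratic weighted kappa, which compares two raters (human or machine) while penalizing large disagreements more than small ones. A minimal pure-Python sketch:

```python
# Quadratic weighted kappa: agreement between two raters on an ordinal scale,
# corrected for chance agreement. 1.0 = perfect agreement, ~0 = chance level.
def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    n = max_score - min_score + 1
    # Observed agreement matrix
    observed = [[0.0] * n for _ in range(n)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_score][b - min_score] += 1
    # Marginal score distributions for each rater
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n)) for j in range(n)]
    total = len(rater_a)
    num = den = 0.0
    for i in range(n):
        for j in range(n):
            weight = ((i - j) ** 2) / ((n - 1) ** 2)   # quadratic penalty
            expected = hist_a[i] * hist_b[j] / total   # chance agreement
            num += weight * observed[i][j]
            den += weight * expected
    # Note: raters must use more than one score level, or den is zero.
    return 1.0 - num / den

print(quadratic_weighted_kappa([1, 2, 3, 4], [1, 2, 3, 4], 1, 4))  # 1.0
```

A system whose scores track human raters with a high kappa is consistent in this measurable sense; whether it is measuring the right thing is the separate, harder question raised above.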
Assessment necessarily requires judgement and inference on the part of the observer. We hope students pass tests for the right reasons, but we are often unsure. In high-performance or high-risk occupations, we require a candidate to undertake an internship, apprenticeship or period of supervised practice before we feel confident they can be certified.
Assessment is irreducibly complex. No wonder so many people feel there is a gap between what schools and post-secondary institutions teach and what people need to succeed in the workplace and in life. Sometimes vital skills are missing. Sometimes the skills develop only with actual experience. And always there is the unwritten or tacit knowledge found in any domain.
AI assessments are already here
To date, most of the tasks described in the previous section are performed by humans. Instructional designers, teachers, professors and teaching assistants identify the learning objectives, design the assessment, carry out the test or assignment or case study, and assess the results.
But what if there was a better way?
There already is in many areas of study. Artificial intelligence has been used to grade tests and essays for several years now. Consider that:
- Generative AI tools have been shown to outperform human authors of argumentative essays
- The Texas Education Agency is using AI to score the written portion of standardized tests administered to students in third grade and up
- Tools like Gradescope (recently acquired by TurnItIn) and E-rater use machine learning algorithms to analyze essays and provide scores
- The best algorithms have been shown to be as accurate as humans
- They’re also more consistent, though some have been shown to reflect the biases already present in our current system
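Systems like E-rater work, broadly, by extracting linguistic features from an essay and mapping them to a holistic score calibrated against human raters. The following is a deliberately toy sketch of that idea, using two crude features and hand-set weights; real systems use hundreds of features and statistically trained models.

```python
# Toy feature-based essay scorer: extract simple linguistic features and map
# them to a 1..6 holistic score. Features and weights here are invented for
# illustration only; they are not E-rater's actual features or weights.
import re

def extract_features(essay: str):
    words = re.findall(r"[a-zA-Z']+", essay.lower())
    length = len(words)                                   # essay length in words
    diversity = len(set(words)) / length if length else 0.0  # vocabulary variety
    return [length, diversity]

def score(essay: str, weights=(0.01, 2.0), bias=1.0, max_score=6):
    """Linear model over features, clamped to the score scale."""
    feats = extract_features(essay)
    raw = bias + sum(w * f for w, f in zip(weights, feats))
    return max(1, min(max_score, round(raw)))

print(score("The quick brown fox jumps over the lazy dog."))
```

Even this toy version shows why such systems are consistent (the same essay always gets the same score) and why they can inherit bias: whatever correlates with the training raters' judgements, fair or not, gets baked into the weights.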
AI is also being widely used for skills assessment outside colleges and universities:
- Employers are using AI to screen applicants’ resumes
- They’re also using AI tools like HireVue and Sapia to conduct and assess video interviews
- Services like Monster.com, LinkedIn and ZipRecruiter match employment opportunities with a pool of applicants
- AI tools like Skillfill and Glider are being used for workforce skills assessment
- See for example Microsoft’s AI-based engineer assessment tool
- AI is being used for performance reviews
Are the tools perfect? No, of course not. In the UK, an AI-based tool used to assess A-Levels during the pandemic was found to be biased. A similar complaint was lodged against LinkedIn’s job matching tool. For now, at least, human supervision of AI assessment is strongly recommended.
An AI tool for assessing AI assessments has even been developed.
But it won’t be long before we trust the computer more than we trust the human, as we already do with tools like calculators and scales. Data-supported AI will employ smart grading to ensure factual accuracy. Human work will be perceived as flawed compared to the much better work produced by the machine; this is called “discourse of altering authority.” AI skills assessment will be just one more thing AI does, along with diagnosing diseases, predicting job markets, monitoring traffic, helping guide financial investment strategies and inventing new products.
We don’t need badges
Certificates and credentials are the shorthand we use to assert that a person has a range of associated skills. A technical degree encompasses multiple topics. Even a history degree includes skills in critical thinking, coordination, social perceptiveness, active listening and complex problem solving.
It’s hard to know what a student has learned just by looking at their degree. And a lot of skills don’t need years-long programs of study to learn. Initiatives such as badges and micro-credentials were developed to fill in the gaps. While intended to be “stackable” in order to lead to certificates and degrees, they could also capture the small skills and competencies not covered in a course-wide assessment like an essay or exam.
It’s a lot of work to develop and assess micro-credentials. Maybe we don’t need this shorthand at all. Maybe we can just evaluate skills and competencies directly. This is one of the promises of AI-based assessment.
AI can perform real-time analysis and assessment. It doesn’t require that a person stop what they’re doing and place themselves into a testing environment. An assessment can be based on key metrics such as productivity, quality of work, and even social and emotional factors — things that simply can’t be evaluated in a course-based test or assignment.
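One way continuous assessment could work is as a running estimate that updates as evidence arrives, rather than a one-off test score. The sketch below uses an exponentially weighted average; the metric names and weighting constant are hypothetical choices for illustration.

```python
# Sketch of continuous assessment: maintain a running estimate per competency,
# updated as each new piece of evidence is observed. The smoothing factor and
# metric names are illustrative assumptions, not any real system's design.
class RunningAssessment:
    def __init__(self, alpha=0.2):
        self.alpha = alpha      # how quickly recent evidence dominates older evidence
        self.estimates = {}     # metric name -> exponentially weighted average

    def observe(self, metric: str, value: float):
        prev = self.estimates.get(metric, value)   # first observation seeds the estimate
        self.estimates[metric] = (1 - self.alpha) * prev + self.alpha * value

    def report(self):
        return dict(self.estimates)

tracker = RunningAssessment()
for quality in [0.6, 0.7, 0.9]:    # e.g., rated quality of successive work products
    tracker.observe("quality_of_work", quality)
print(tracker.report())
```

Note the contrast with a test: no single observation decides the outcome, and the estimate is always current, which is exactly what makes such systems both attractive and, from a privacy standpoint, worrying.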
As documents such as the Global Micro-Credential Schema Mapping report attest, the project of mapping skills and credentials to workplace needs and job requirements generally is an enormous one. It becomes especially complex when the differing needs of workplaces in different countries, working in different environments, under different regulatory regimes, are considered.
It is very likely that as AI tools provide more and more detailed assessments, we will find that the skills we need don’t resemble the skills we are taught today. Just as AI can detect new proteins or create new materials, it will be able to identify new skills and competencies.
Obviously, there are key personal and ethical considerations to take into account here. Even if the AI assessment is perfect and fair, people will resist being assessed by machines, and will demand human oversight, informed consent and the ability to opt out. Ongoing surveillance would lead to a significant infringement on personal privacy; questions have already been raised about the use of technologies such as video cameras and keyloggers.
But even with stringent protections in place there is a wealth of material for AI to assess. Any work that depends on performance in public, from acting and dancing to sports to policing and firefighting, can be assessed. Management in workplaces and environments where security is a major consideration will be able to collect and analyze data. Work that is output focused, such as art and design, software programming, engineering and construction, can be the focus of product assessment.
People volunteer even more data than that. From social media to personal portfolios to ongoing gaming and simulations, people present their skills and abilities (and also their failings and shortcomings) in a variety of ways. Today, vast amounts of this data are being used to train AI. Tomorrow, it will be used to assess the newest contributions from the next generation.
This data is already being collected. From there it’s a very small step for it to be used for performance assessment, in ways that are much more subtle and accurate than we know about today.
A fair assessment
It would be very hard to doubt the assessment of an AI that worked closely with a student. And such assessments would be comparable to the attestation of a doctor working with an intern and the master recommending an apprentice.
Why? Compare the AI’s assessment with the results of a test or assignment.
- The AI would know precisely what the student was trying to do and the degree of success
- The AI would be evaluating the whole performance, and not merely a set of artificially designed performance indicators
- The AI would have a nuanced understanding of how the student’s performance matched or was different from an expert’s performance
At a certain point, we will be less inclined to believe the results of a human-designed and evaluated assessment than we would an AI recommendation. At such a point, questions about the fairness of assessment will be turned upside down.
Today, a major criticism of artificial intelligence, and especially generative AI, is that it is biased. It represents, we read, the interests, prejudices and biases of the people who create the algorithms and the people who provided the training data: mostly white, well-educated male upper- and middle-class people living in western societies. These, in turn, amplify and codify discrimination.
This is not a new concern in testing and assessment. Evaluations and measures as varied as IQ tests and the SAT examinations have been questioned for the same sort of biases. Educational curricula generally have been questioned as being Eurocentric and patriarchal. Meanwhile, allegations of personal and individual bias abound.
At a certain point, precisely because of efforts to make it less biased, and precisely because standards and evidence are based on actual practice, people will begin to recognize automated grading as the fairer and less biased of the two. AI will be used to promote equity rather than to undermine it. Just as video replay and automated strike detection remove human error from sports, so also algorithmic processes will remove similar sorts of errors from assessment.
At that point, attention will turn to questions of fairness in the application of AI in assessment. We see these debates in the workplace already as employees resist surveillance, keylogging and other forms of performance monitoring. There’s a tension between the desire of employees to be assessed on factors such as effort and outcome, and of managers to assess on the basis of process and compliance. Similar tensions will appear in assessment generally. Students, for example, will ask why it matters whether they used AI to accomplish a task, while instructors will want to know that a certain step-by-step procedure was followed.
We’ll know that AI-based assessment is accurate — more accurate than any human-based assessment could be. The concern will be how pervasive and fair it is. People will want the right to be able to practise and develop a skill in private before allowing AI to pass judgement. And they’ll want to be able to control who is allowed to see their assessments and who is not.
Fact check
The Strategies of Formative Assessment and Assessment for Learning in Online and Blended Learning in Higher Education: A Systematic Literature Review - https://osf.io/preprints/edarxiv/vy2zn
Xia, Q., Weng, X., Ouyang, F., Lin, T. J., & Chiu, T. K. (2024). A scoping review on how generative artificial intelligence transforms assessment in higher education. International Journal of Educational Technology in Higher Education, 21(1), 40.
Assessments today are basically language tests - https://donaldclarkplanb.blogspot.com/2024/06/this-blind-trial-paper-raises-some.html
A real-world test of artificial intelligence infiltration of a university examinations system: a “Turing Test” case study - “we found that 94% of AI submissions were undetected. The grades awarded to AI submissions were on average half a grade boundary higher than that achieved by real students” - https://osf.io/preprints/psyarxiv/n854h
Authentic Assessments. https://www.learningscientists.org/blog/2024/7/12/digest-175
The AI tech aiming to identify future Olympians - https://www.bbc.com/news/articles/cmj2jkppvx3o
As a recent publication from the Universitat Oberta de Catalunya points out, artificial intelligence remains an opportunity (or an excuse) to transform assessment, curriculum, teaching, personalization and teaching competencies. https://pontydysgu.eu/2024/08/genai-and-assessment/
The future of assessment: embracing AI and EdTech - Jisc
Olly Newton, JISC, 2024/08/20
This is a surface-level discussion of the use of AI in assessment. "While there are concerns about AI's role in education, particularly regarding fairness and the potential for misuse," writes Olly Newton, "there is also a significant opportunity for AI to enhance the assessment process." One example I would adduce is Bolton College's work in formative assessment.
Bolton FirstPass - The FirstPass pilot study offers a compelling case for the adoption of AI-powered formative assessment tools. https://www.ncfe.org.uk/help-shape-the-future-of-learning-and-assessment/aif-pilots/bolton-college/
Embracing Generative AI in Assessments: A Guided Approach - https://nationalcentreforai.jiscinvolve.org/wp/2024/08/14/embracing-generative-ai-in-assessments-a-guided-approach/
----
Why is assessment important? - https://www.csusb.edu/student-research/student-success-graduation-retention/why-assessment-important and https://www.edutopia.org/assessment-guide-importance and https://ecampusontario.pressbooks.pub/nccoursedevelopmentguidealpha/chapter/3-4-the-importance-of-assessing-students/
Design begins with assessment - https://www.learnworlds.com/learning-goals-objectives/
Formative vs summative - https://www.cmu.edu/teaching/assessment/basics/formative-summative.html
Assessment needs to be - https://www.researchgate.net/profile/Helen-Bound/publication/344221789_The_Six_Principles_of_Learning_Design/links/5f5d72fb4585154dbbce107f/The-Six-Principles-of-Learning-Design.pdf
Accuracy, consistency - https://files.eric.ed.gov/fulltext/ED502868.pdf
Internships, etc. https://cewilcanada.ca/CEWIL/CEWIL/About-Us/Work-Integrated-Learning.aspx
Gap between school and workplace - https://www.thehrdirector.com/business-news/gen-z/seven-ten-34s-say-school-university-didnt-prepare-workplace/
Tacit knowledge - https://info.aiim.org/aiim-blog/tacit-knowledge-vs-explicit-knowledge
AI writes better argumentative essays - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10616290/
Texas AI use - https://www.edsurge.com/news/2024-05-03-is-it-fair-and-accurate-for-ai-to-grade-standardized-tests
Gradescope - https://www.turnitin.com/videos/meet-gradescope-3 and E-rater - https://www.ets.org/erater.html via https://drphilippahardman.substack.com/p/a-brief-history-of-ai-in-education
Accuracy of AI assessments - https://the-learning-agency-lab.com/wp-content/uploads/2023/08/TLA-Lab_Whitepaper_24-Aug-kf.pdf
AI grading more consistent - https://citejournal.org/volume-8/issue-4-08/english-language-arts/automated-essay-scoring-versus-human-scoring-a-correlational-study/
AI resume screening - https://www.linkedin.com/pulse/ai-resume-screening-hacking-recruitment-process-affinda/
HireVue - https://www.hirevue.com/
Sapia - https://sapia.ai/
AI job interviews - https://www.washingtonpost.com/technology/2023/03/27/ai-assessed-job-interview/
Monster.com, LinkedIn and ZipRecruiter - https://www.cloudthat.com/resources/blog/discover-your-dream-job-by-exploring-the-secrets-of-generative-ai-in-2023
Workforce skills assessment - https://www.icf.com/insights/analytics/how-ai-is-transforming-the-way-we-assess-workforce-skills
Skillfill - https://skillfill.ai/en/
Glider - https://glider.ai/
Engineer skills assessment - https://learn.microsoft.com/en-us/assessments/33a8d18b-7299-4808-95eb-ec1ac1eca4d9/
AI performance reviews - https://engagedly.com/blog/use-of-artificial-intelligence-in-performance-reviews/ and https://lattice.com/library/using-ai-to-write-performance-reviews-everything-you-need-to-know
AI bias in marking A-levels - https://www.theguardian.com/education/2020/aug/17/inbuilt-biases-and-the-problem-of-algorithms
LinkedIn biased matching - https://www.technologyreview.com/2021/06/23/1026825/linkedin-ai-bias-ziprecruiter-monster-artificial-intelligence/
AI assessment tool - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10217990/
Smart grading - https://www.sciencedirect.com/science/article/pii/S2215016123005277
Nipissing - https://www.nipissingu.ca/sites/default/files/2020-07/cha-the-value-of-a-history-degree.pdf
‘Discourse of altering authority’ - https://link.springer.com/article/10.1007/s10734-024-01288-w
Badge in soft skills - https://www.ucsc-extension.edu/resources/career-services/earn-a-badge-in-soft-skills/
----
What are we measuring?
Global Micro-Credential Schema Mapping report - https://credentialengine.org/wp-content/uploads/2024/08/Global-Micro-Credential-Schema-Mapping_-A-Vital-Step-Towards-Interoperability-and-Mobility-Aug-2024-Final-Version.pdf (19-page PDF). The intent is to provide clarity, global transferability, and guidance for decision makers. It introduces a Data Ecosystem Schema Mapper (DESM).
AI detects new proteins - https://www.asbmb.org/asbmb-today/science/020324/ai-proteins-with-exceptional-binding-strength
Tracking key metrics such as productivity, quality of work, and even emotional intelligence - https://psico-smart.com/en/blogs/blog-the-impact-of-artificial-intelligence-and-machine-learning-on-performance-evaluation-tools-11788
Personal and ethical considerations - https://www.priv.gc.ca/en/privacy-topics/employers-and-employees/02_05_d_17/
Problems with video surveillance in the workplace - https://www.legalline.ca/legal-answers/cameras-and-audio-tapes-in-the-workplace/
Keylogging - https://toronto-employmentlawyer.com/blog/severance/the-laws-regarding-employee-monitoring-software/
AI-based security assessment - https://www.paloaltonetworks.com/unit42/assess/ai-security-assessment risk assessment https://www.isaca.org/resources/news-and-trends/industry-news/2023/can-ai-be-used-for-risk-assessments and cybersecurity https://www.sangfor.com/blog/cybersecurity/role-of-artificial-intelligence-ai-in-threat-detection
AI assessment of policing - https://www.policechiefmagazine.org/navigating-future-ai-chatgpt/
AI social media monitoring - https://zapier.com/blog/best-ai-social-media-management/
AI bias - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11032492/
Bias in IQ tests - https://nrcgt.uconn.edu/wp-content/uploads/sites/953/2015/04/rm04204.pdf
Bias in SATs - https://www.scirp.org/journal/paperinformation?paperid=70682
Bias in curriculum - https://www.nas.org/blogs/statement/is_the_curriculum_biased
Grading bias - https://tlconestoga.ca/grading-bias-making-marking-more-equitable-for-all-learners/
How to Use AI to Promote Equity Now 10 Minute Teacher Podcast with Cool Cat Teacher - https://www.coolcatteacher.com/how-to-use-ai-to-promote-equity/