The Influence of Parametrised Tasks on Learners’ Judgement Accuracy – A Secondary Analysis

EARLI 2025 - August 26th, 2025 - Graz, Austria

Theresa Walesch, Carolin Baumann, Samuel Merk, Anja Prinz-Weiß

Karlsruhe University of Education, Germany

Relevance

Parametrised tasks are tasks with varying parameters (e.g., Michael, 2021)

  • a non-parametrised task can only be repeated in identical form

  • a parametrised task can be repeated and yields a new task each time, because new parameter values are drawn (a toy sketch follows below)
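
As a toy illustration (not the study materials), a parametrised task can be thought of as a generator that draws fresh parameter values on every call; all names here are hypothetical:

    # Toy sketch of a parametrised task: each call draws new parameters,
    # so "repeating" the task yields a new variant (names hypothetical).
    make_task <- function() {
      a <- sample(2:9, 1)
      b <- sample(2:9, 1)
      list(prompt = sprintf("What is %d * %d?", a, b), solution = a * b)
    }
    make_task()  # e.g., prompt "What is 7 * 4?", solution 28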

Self-regulated learning (SRL): cyclical model by Zimmerman (2000)

Judgments

Performance judgment (e.g., Schraw, 2009)

Judgment Accuracy



Absolute Accuracy (Maki et al., 2005)

Bias (Schraw, 2009)
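
For reference, one common formulation of the two indices (following Schraw, 2009), where c_i is the judgment and p_i the performance on item i of n items; this is a sketch of the measures, not necessarily the exact operationalisation used here:

    \[
      \text{Absolute accuracy} = \frac{1}{n}\sum_{i=1}^{n} (c_i - p_i)^2,
      \qquad
      \text{Bias} = \frac{1}{n}\sum_{i=1}^{n} (c_i - p_i)
    \]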

Hypothesis



Students’ judgments are more accurate, i.e., show less overestimation or less underestimation, after working on parametrised tasks than after working on non-parametrised tasks.

Method

Design

  • experimental field study with pre-service teachers
  • within-person design

Participants



  • N = 174 pre-service teachers
  • M_age = 21.5 years (SD = 3.26)
  • 78.7% female

Results

Bayesian multilevel model (Bürkner, 2017)



  • absolute accuracy ~ type_of_task + (1 | person) (see the sketch below)
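
A minimal sketch of this model in brms, assuming a data frame d with columns abs_accuracy, type_of_task, and person (all names and the Gaussian likelihood are assumptions, not the authors' exact code):

    library(brms)

    # Minimal sketch of the reported multilevel model.
    fit <- brm(
      abs_accuracy ~ type_of_task + (1 | person),  # random intercept per person
      data   = d,
      family = gaussian()
    )
    summary(fit)  # posterior summary, incl. the type_of_task coefficient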


  • The estimated effect of condition was small and uncertain (β = 0.01, 95% credible interval [−0.02, 0.04]).

That is, there was no evidence of a difference between the two types of tasks.

Exploratory analysis

Discussion

Why did we not find any differences?



Underestimation

This pattern occurs less often than overestimation.

Implications



More research on the effects of parametrised tasks on learners’ judgments

  • e.g., other domains, between-subjects designs, underlying cognitive processes

Focus on underestimation

  • under-researched

Scan QR code for my slides



contact: Theresa Walesch (they/them) theresa.walesch@ph-karlsruhe.de

Summary of findings

Appendix

Overestimation

(Self-assessment > Performance)

Underestimation

(Self-assessment < Performance)

Missingness

Definitions of missingness mechanisms (illustrated by the toy simulation below)

  • MAR (missing at random): missingness depends on something observed, but not on anything unobserved (Schafer & Graham, 2002)
  • MCAR (missing completely at random): missingness is independent of both observed and unobserved variables (Schafer & Graham, 2002)
  • MNAR (missing not at random): missingness is related to the missing values themselves (Graham, 2009)
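
A toy simulation contrasting the three mechanisms (all names hypothetical; y is the variable that may go missing):

    set.seed(1)
    n <- 1000
    x <- rnorm(n)        # an observed covariate
    y <- x + rnorm(n)    # the variable that may go missing
    # MCAR: missingness is independent of both x and y
    y_mcar <- ifelse(runif(n) < 0.2, NA, y)
    # MAR: missingness depends only on the observed x
    y_mar  <- ifelse(runif(n) < plogis(x), NA, y)
    # MNAR: missingness depends on y itself
    y_mnar <- ifelse(runif(n) < plogis(y), NA, y)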

Imputation during modelling

  • mi() function of the brms package (Bürkner, 2024)
  • one-step imputation: missing values are imputed while the model is being fitted
  • the model formula specifies which variables are included in the imputation (see the sketch below)
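
A minimal sketch of this approach, following the brms mi() syntax (Bürkner, 2024); "judgment", a continuous covariate with missing values, and all other names are hypothetical:

    library(brms)

    # Hedged sketch of one-step imputation with mi() terms: the missing
    # values of "judgment" are imputed while the outcome model is fitted.
    bform <- bf(abs_accuracy ~ type_of_task + mi(judgment) + (1 | person)) +
      bf(judgment | mi() ~ type_of_task + (1 | person)) +
      set_rescor(FALSE)  # no residual correlation between the two responses

    fit_mi <- brm(bform, data = d)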

References

Adesope, O. O., Trevisan, D. A., & Sundararajan, N. (2017). Rethinking the use of tests: A meta-analysis of practice testing. Review of Educational Research, 87(3), 659–701. https://doi.org/10.3102/0034654316689306
Azevedo, R., & Cromley, J. G. (2004). Does training on self-regulated learning facilitate students’ learning with hypermedia? Journal of Educational Psychology, 96(3), 523–535. https://doi.org/10.1037/0022-0663.96.3.523
Baars, M., Wijnia, L., de Bruin, A., & Paas, F. (2020). The relation between students’ effort and monitoring judgments during learning: A meta-analysis. Educational Psychology Review, 32(4), 979–1002. https://doi.org/10.1007/s10648-020-09569-3
Bürkner, P.-C. (2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80(1), 1–28. https://doi.org/10.18637/jss.v080.i01
Bürkner, P.-C. (2024). Handle missing values with brms [R package vignette].
Choi, H., Jovanovic, J., Poquet, O., Brooks, C., Joksimović, S., & Williams, J. J. (2023). The benefit of reflection prompts for encouraging learning with hints in an online programming course. The Internet and Higher Education, 58, 100903. https://doi.org/10.1016/j.iheduc.2023.100903
David, L., Biwer, F., Baars, M., Wijnia, L., Paas, F., & De Bruin, A. (2024). The relation between perceived mental effort, monitoring judgments, and learning outcomes: A meta-analysis. Educational Psychology Review, 36(3), 66. https://doi.org/10.1007/s10648-024-09903-z
de Bruin, A. B. H., Biwer, F., Hui, L., Onan, E., David, L., & Wiradhany, W. (2023). Worth the effort: The start and stick to desirable difficulties (S2D2) framework. Educational Psychology Review, 35(2), 41. https://doi.org/10.1007/s10648-023-09766-w
Dunlosky, J., & Rawson, K. A. (2012). Overconfidence produces underachievement: Inaccurate self evaluations undermine students’ learning and retention. Learning and Instruction, 22(4), 271–280. https://doi.org/10.1016/j.learninstruc.2011.08.003
Golke, S., Steininger, T., & Wittwer, J. (2022). What makes learners overestimate their text comprehension? The impact of learner characteristics on judgment bias. Educational Psychology Review, 34(4), 2405–2450. https://doi.org/10.1007/s10648-022-09687-0
Graham, J. W. (2009). Missing data analysis: Making it work in the real world. Annual Review of Psychology, 60, 549–576.
Händel, M., de Bruin, A. B. H., & Dresel, M. (2020). Individual differences in local and global metacognitive judgments. Metacognition and Learning, 15(1), 51–75. https://doi.org/10.1007/s11409-020-09220-0
Hoch, E., Fleig, K., & Scheiter, K. (2023). Can monitoring prompts help to reduce a confidence bias when learning with multimedia? Zeitschrift für Entwicklungspsychologie und Pädagogische Psychologie, 55(2-3), 77–90. https://doi.org/10.1026/0049-8637/a000279
Koriat, A., Sheffer, L., & Ma’ayan, H. (2002). Comparing objective and subjective learning curves: Judgments of learning exhibit increased underconfidence with practice. Journal of Experimental Psychology: General, 131(2), 147–162. https://doi.org/10.1037/0096-3445.131.2.147
Maki, R. H., Shields, M., Wheeler, A. E., & Zacchilli, T. L. (2005). Individual differences in absolute and relative metacomprehension accuracy. Journal of Educational Psychology, 97(4), 723–731. https://doi.org/10.1037/0022-0663.97.4.723
McIntee, S.-E., Goulet-Pelletier, J.-C., Williot, A., Deck-Léger, E., Lalande, D., Cantinotti, M., & Cousineau, D. (2022). (Mal)Adaptive cognitions as predictors of statistics anxiety. Statistics Education Research Journal, 21(1), 5–5. https://doi.org/10.52041/serj.v21i1.364
Michael, B. (2021). E-Assessment: automatische Generierung parametrisierter Aufgaben für mathematische Assessments in E-Learning-Systemen [E-assessment: Automatic generation of parametrised tasks for mathematical assessments in e-learning systems] [Doctoral dissertation, Technische Universität Ilmenau]. https://doi.org/10.22032/DBT.49387
Panadero, E., Brown, G. T. L., & Strijbos, J.-W. (2016). The future of student self-assessment: A review of known unknowns and potential directions. Educational Psychology Review, 28(4), 803–830. https://doi.org/10.1007/s10648-015-9350-2
Prinz, A., Golke, S., & Wittwer, J. (2020). How accurately can learners discriminate their comprehension of texts? A comprehensive meta-analysis on relative metacomprehension accuracy and influencing factors. Educational Research Review, 31, 100358. https://doi.org/10.1016/j.edurev.2020.100358
Sarac, S., & Tarhan, B. (2009). Calibration of comprehension and performance in L2 reading. International Electronic Journal of Elementary Education, 2(1), 167–179.
Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7(2), 147–177. https://doi.org/10.1037/1082-989X.7.2.147
Schraw, G. J. (2009). A conceptual analysis of five measures of metacognitive monitoring. Metacognition and Learning, 4(1), 33–45. https://doi.org/10.1007/s11409-008-9031-3
Seufert, T. (2018). The interplay between self-regulation in learning and cognitive load. Educational Research Review, 24, 116–129. https://doi.org/10.1016/j.edurev.2018.03.004
Seufert, T. (2020). Building bridges between self-regulation and cognitive load—an invitation for a broad and differentiated attempt. Educational Psychology Review, 32(4), 1151–1162. https://doi.org/10.1007/s10648-020-09574-6
Trassi, A. P., Leonard, S. J., Rodrigues, L. D., Rodas, J. A., & Santos, F. H. (2022). Mediating factors of statistics anxiety in university students: A systematic review and meta-analysis. Annals of the New York Academy of Sciences, 1512(1), 76–97. https://doi.org/10.1111/nyas.14746
Zimmerman, B. J. (2000). Attaining self-regulation: A social cognitive perspective. In M. Boekaerts, P. R. Pintrich, & M. Zeidner (Eds.), Handbook of self-regulation (pp. 13–39). Academic Press. https://doi.org/10.1016/B978-012109890-2/50031-7