Rater variability studies in the context of music performance assessment treat rater effects as static characteristics of raters, assumed to operate similarly across every assessed performance. The purpose of this study was to investigate expert raters' (N = 13) differential severity/leniency as a dynamic process, in which rater effects change over time. In particular, we sought to examine group-level and individual-level variability using a class of rater effects referred to as differential rater functioning over time (DRIFT), defined as changes in rater performance in relation to a parameter of time. Three classes of Multifaceted Rasch (MFR) models were specified to explore systematic changes in raters' interpretation of a 4-point rating scale structure across a 5-day rating session: (a) a time-static model, (b) a rater-by-time interaction model, and (c) a partial credit model for time points. Results indicated significant differences in severity/leniency across time, both for the group of raters as a whole and for some individual raters. Overall, raters showed a general trend of decreasing severity over the 5-day rating session. Interaction analyses suggested that differential severity/leniency existed for the raters as a group and for 9 of the 13 individual raters. Of the 65 potential pairwise interaction terms examined between raters and days (13 raters × 5 days), 21 (32.31%) were statistically significant: 10 interactions systematically underestimated the performances and 11 systematically overestimated them. Implications for improving the fairness of ratings in music assessment contexts are discussed.
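The three model classes can be sketched in generic MFR (adjacent-category logit) notation. This is an illustrative formulation under assumed symbols, not necessarily the exact specification used in the study. Assume $P_{nitk}$ is the probability that performance $n$ receives category $k$ (rather than $k-1$) from rater $i$ on day $t$; $\theta_n$ is the performance's location, $\lambda_i$ the rater's severity, $\eta_t$ the day effect, $(\lambda\eta)_{it}$ the rater-by-day interaction, and $\tau_k$ (or $\tau_{tk}$) the threshold between adjacent categories of the 4-point scale. The time-static model is assumed here to omit the time facet entirely.

```latex
% (a) Time-static model (assumed: no time facet)
\ln\!\left(\frac{P_{nik}}{P_{ni(k-1)}}\right) = \theta_n - \lambda_i - \tau_k

% (b) Rater-by-time interaction model
\ln\!\left(\frac{P_{nitk}}{P_{nit(k-1)}}\right) = \theta_n - \lambda_i - \eta_t - (\lambda\eta)_{it} - \tau_k

% (c) Partial credit model for time points: thresholds vary by day t
\ln\!\left(\frac{P_{nitk}}{P_{nit(k-1)}}\right) = \theta_n - \lambda_i - \eta_t - \tau_{tk}
```

Under this sign convention, a significantly positive $(\lambda\eta)_{it}$ would mean rater $i$ was more severe than expected on day $t$ (underestimating the performances), and a significantly negative one would mean the rater was more lenient (overestimating them). With 13 raters and 5 days, model (b) contains $13 \times 5 = 65$ such interaction terms.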

