Student Evaluations Show Bias Against Female Professors

Posted September 15, 2022

Despite earning more than half of all doctoral degrees conferred in the U.S., women are significantly underrepresented in faculty positions at colleges and universities. This is particularly true in tenure-track and tenured positions, with women making up just over a third of all full professors. Women are also less likely to receive tenure or be promoted to full professor, a situation known as the academic “leaky pipeline,” where women’s representation continues to decline the further they advance in their careers. In male-dominated fields, like economics, the statistics are more drastic: women represent only 17.5% of economics professors but earn 35% of economics graduate degrees.

While various reasons have been suggested as to why women still trail men in academic position and prestige despite increasing levels of educational attainment, one factor may play a surprisingly big role: teaching evaluations.

In a recent study, Whitney Buser, senior academic professional and associate director of Academic Programs in the School of Economics at Georgia Tech, explores the nature and causes of gender bias in student evaluations of teaching (SETs). By drawing on social role theory to inform their hypotheses, Buser and her co-authors investigated whether bias exists at the outset of the semester and whether backlash after grading exacerbates it. Their study, “Evaluation of Women in Economics: Evidence of Gender Bias Following Behavioral Role Violations,” was published in the Springer journal Sex Roles.

“We know from the literature that female instructors fare worse in student evaluations, but with nearly all research on SETs done from end-of-semester evaluations, it’s hard to pinpoint how, when, and why gender bias arises, and how much exists. That was the goal of our study,” Buser said.

Role Expectations and Gender

According to social role theory, gender inequity arises from cultural beliefs and expectations about women and men. Women are overrepresented in low-status caretaking roles, which shapes beliefs and expectations about them being communal — helpful, kind, and concerned with others. Men, however, are overrepresented in high-status provider roles, which reinforces beliefs and expectations about men being ambitious, authoritative, and competent.

According to role congruity theory, individuals who fail to meet society’s expectations, whether by role or by behavior, face negative consequences, often in the form of backlash. Buser hypothesized that students would perceive grade feedback from female faculty more harshly than from male faculty because role congruity leads them to expect communality in women, and that this backlash would be apparent in their SETs.

The Experiment and a New Survey

Universities use different methods for conducting teaching evaluations. To allow for direct comparisons across institutions, the researchers created their own standard survey for the study. Participants included nearly 1,200 undergraduate students, all of whom were enrolled in a Principles of Economics course. The students were taught by seven faculty members at five institutions.

The survey comprised criteria used in previous studies to detect gender bias. Students were asked to evaluate their instructors across seven areas using a 5-point scale ranging from “Strongly Disagree” to “Strongly Agree.”

The first three questions were gender neutral. Students were asked whether they (1) would recommend the course, (2) would recommend the instructor, and (3) found their instructor interesting. Next, they were asked whether they found their instructor to be (4) knowledgeable and (5) challenging, both of which are widely seen as male-like qualities. The final two criteria asked students to evaluate how (6) approachable and (7) caring their instructors are, qualities usually associated with women.

The anonymous surveys were conducted twice. The first survey was administered on the second day of class (Time 1) to assess participants’ early impressions. The second survey (Time 2) was given the day after students received their grades on the first exam, to see how impressions changed after they were given instructor feedback. 


On the second day of class (Time 1), female instructors were rated significantly lower than male instructors on all three gender-neutral criteria – recommend course, recommend instructor, and interesting – and on the male-leaning criterion of challenging. No significant difference between male and female instructors was observed for the communal qualities of caring and approachable, with women ranking only slightly higher.

The results showed that between Time 1 and Time 2, male instructors’ ratings improved on every trait. At Time 2, female instructors were still rated significantly lower than male instructors on all three gender-neutral qualities and both male-leaning qualities. Overall, female instructors’ ratings stayed largely flat between Time 1 and Time 2, except that they were rated as significantly less interesting at Time 2. At Time 2, students even rated their male instructors as slightly more caring and approachable than their female counterparts, a reversal from Time 1.

“The gender discrepancy between Time 1 and Time 2 was really driven by male instructors’ evaluations improving over time. This finding indicates that students view male instructors more favorably as time goes on, which was not at all the case for the women,” Buser said. “It was clear that exam grades made the evaluations split apart, even though there was no significant difference in exam grades between female and male instructors. As we predicted, this difference indicated a clear backlash against female faculty.”


In economics, usually only statistically significant differences are considered worth writing about. But in this study, there is reason to care about differences that are not statistically significant, because even small gaps in scores are often used to make crucial decisions in practice.

For example, when department chairs and administrators look at teaching evaluations in hiring, they might have two candidates with similar scores that differ only in the decimal places. They could choose to interview or hire the candidate with slightly higher teaching scores without realizing that the decision is gender-biased, Buser said.

Universities currently have few formal ways of taking SET gender bias into account when it comes to performance evaluation, promotion, and tenure. Addressing the issue could help universities retain female faculty and work towards repairing the leaky pipeline.

“We hope this work will highlight the presence of gender bias and encourage the development of more objective teaching evaluation tools that take this dynamic into account,” Buser said. “Eliminating or reducing gender bias in teaching evaluations could have an enormous impact on women and their ability to thrive in academia.”


Citation: Buser, W., Batz-Barbarich, C.L. & Hayter, J.K. Evaluation of Women in Economics: Evidence of Gender Bias Following Behavioral Role Violations. Sex Roles 86, 695–710 (2022).


Funding: This research was not funded in any way.

Writer: Catherine Barzler

Media Contact: Catherine Barzler


The Georgia Institute of Technology, or Georgia Tech, is a top 15 public research university developing leaders who advance technology and improve the human condition. The Institute offers business, computing, design, engineering, liberal arts, and sciences degrees. Its nearly 44,000 students, representing 50 states and 149 countries, study at the main campus in Atlanta, at campuses in France and China, and through distance and online learning. As a leading technological university, Georgia Tech is an engine of economic development for Georgia, the Southeast, and the nation, conducting more than $1 billion in research annually for government, industry, and society.

Contact For More Information

Catherine Barzler, Senior Research Writer/Editor