Featured Topics

Featured series.

A series of random questions answered by Harvard experts.

Explore the Gazette

Read the latest.

Michael Sandel (clockwise from top left), Moshe Halbertal, and Sari Nusseibeh

‘What we really need is just to sit together and face each other’

Jared Kushner.

Jared Kushner speaks from Middle East experience in Kennedy School forum

Steve Kornacki.

Reading tea leaves with Steve Kornacki

Mahzarin Banaji opened the symposium on Tuesday by recounting the “implicit association” experiments she had done at Yale and at Harvard. The final talk is today at 9 a.m.

Kris Snibbe/Harvard Staff Photographer

Turning a light on our implicit biases

Brett Milano

Harvard Correspondent

Social psychologist details research at University-wide faculty seminar

Few people would readily admit that they’re biased when it comes to race, gender, age, class, or nationality. But virtually all of us have such biases, even if we aren’t consciously aware of them, according to Mahzarin Banaji, Cabot Professor of Social Ethics in the Department of Psychology, who studies implicit biases. The trick is figuring out what they are so that we can interfere with their influence on our behavior.

Banaji was the featured speaker at an online seminar Tuesday, “Blindspot: Hidden Biases of Good People,” which was also the title of Banaji’s 2013 book, written with Anthony Greenwald. The presentation was part of Harvard’s first-ever University-wide faculty seminar.

“Precipitated in part by the national reckoning over race, in the wake of George Floyd, Breonna Taylor and others, the phrase ‘implicit bias’ has almost become a household word,” said moderator Judith Singer, Harvard’s senior vice provost for faculty development and diversity. Owing to the high interest on campus, Banaji was slated to present her talk on three different occasions, with the final one at 9 a.m. Thursday.

Banaji opened on Tuesday by recounting the “implicit association” experiments she had done at Yale and at Harvard. The assumptions underlying the research on implicit bias derive from well-established theories of learning and memory and the empirical results are derived from tasks that have their roots in experimental psychology and neuroscience. Banaji’s first experiments found, not surprisingly, that New Englanders associated good things with the Red Sox and bad things with the Yankees.

She then went further by replacing the sports teams with gay and straight, thin and fat, and Black and white. The responses were sometimes surprising: Shown a group of white and Asian faces, a test group at Yale associated the former more with American symbols though all the images were of U.S. citizens. In a further study, the faces of American-born celebrities of Asian descent were associated as less American than those of white celebrities who were in fact European. “This shows how discrepant our implicit bias is from even factual information,” she said.

How can an institution that is almost 400 years old not reveal a history of biases, Banaji said, citing President Charles Eliot’s words on Dexter Gate: “Depart to serve better thy country and thy kind” and asking the audience to think about what he may have meant by the last two words.

She cited Harvard’s current admission strategy of seeking geographic and economic diversity as examples of clear progress — if, as she said, “we are truly interested in bringing the best to Harvard.” She added, “We take these actions consciously, not because they are easy but  because they are in our interest and in the interest of society.”

Moving beyond racial issues, Banaji suggested that we sometimes see only what we believe we should see. To illustrate she showed a video clip of a basketball game and asked the audience to count the number of passes between players. Then the psychologist pointed out that something else had occurred in the video — a woman with an umbrella had walked through — but most watchers failed to register it. “You watch the video with a set of expectations, one of which is that a woman with an umbrella will not walk through a basketball game. When the data contradicts an expectation, the data doesn’t always win.”

Expectations, based on experience, may create associations such as “Valley Girl Uptalk” is the equivalent of “not too bright.” But when a quirky way of speaking spreads to a large number of young people from certain generations,  it stops being a useful guide. And yet, Banaji said, she has been caught in her dismissal of a great idea presented in uptalk.  Banaji stressed that the appropriate course of action is not to ask the person to change the way she speaks but rather for her and other decision makers to know that using language and accents to judge ideas is something people at their own peril.

Banaji closed the talk with a personal story that showed how subtler biases work: She’d once turned down an interview because she had issues with the magazine for which the journalist worked.

The writer accepted this and mentioned she’d been at Yale when Banaji taught there. The professor then surprised herself by agreeing to the interview based on this fragment of shared history that ought not to have influenced her. She urged her colleagues to think about positive actions, such as helping that perpetuate the status quo.

“You and I don’t discriminate the way our ancestors did,” she said. “We don’t go around hurting people who are not members of our own group. We do it in a very civilized way: We discriminate by who we help. The question we should be asking is, ‘Where is my help landing? Is it landing on the most deserved, or just on the one I shared a ZIP code with for four years?’”

To subscribe to short educational modules that help to combat implicit biases, visit outsmartinghumanminds.org .

You might like

Israeli, Palestinian philosophers exchange tales of personal loss, find points of contention, agreement over war

Jared Kushner.

Ex-Trump adviser criticizes Biden’s pre-Oct. 7 engagement in the region, defends Israel’s actions in Gaza

Steve Kornacki.

NBC, MSNBC analyst sees Democratic potential, but lots of slack, uncertainty ahead for both parties

So what exactly makes Taylor Swift so great?

Experts weigh in on pop superstar's cultural and financial impact as her tours and albums continue to break records.

Looking to rewind the aging clock

Harvard researchers create model that better measures biological age, distinguishes between harmful and adaptive changes during life

What do we do with our loneliness?

‘Harvard Thinking’ podcast explores condition with experts Jeremy Nobel and Milena Batanova

  • Research article
  • Open access
  • Published: 16 May 2019

Interventions designed to reduce implicit prejudices and implicit stereotypes in real world contexts: a systematic review

  • Chloë FitzGerald 1 ,
  • Angela Martin 2 ,
  • Delphine Berner 1 &
  • Samia Hurst 1  

BMC Psychology volume  7 , Article number:  29 ( 2019 ) Cite this article

122k Accesses

202 Citations

495 Altmetric

Metrics details

Implicit biases are present in the general population and among professionals in various domains, where they can lead to discrimination. Many interventions are used to reduce implicit bias. However, uncertainties remain as to their effectiveness.

We conducted a systematic review by searching ERIC, PUBMED and PSYCHINFO for peer-reviewed studies conducted on adults between May 2005 and April 2015, testing interventions designed to reduce implicit bias, with results measured using the Implicit Association Test (IAT) or sufficiently similar methods.

30 articles were identified as eligible. Some techniques, such as engaging with others’ perspective, appear unfruitful, at least in short term implicit bias reduction, while other techniques, such as exposure to counterstereotypical exemplars, are more promising. Robust data is lacking for many of these interventions.

Conclusions

Caution is thus advised when it comes to programs aiming at reducing biases. This does not weaken the case for implementing widespread structural and institutional changes that are multiply justified.

Peer Review reports

A standard description of implicit biases is that they are unconscious and/or automatic mental associations made between the members of a social group (or individuals who share a particular characteristic) and one or more attributes (implicit stereotype) or a negative evaluation (implicit prejudice). Implicit prejudices are distinguished from implicit stereotypes in psychology: an implicit prejudice is supposedly a ‘hotter’ generic positive or negative feeling associated with a category, e.g. pleasant/white; an implicit stereotype involves a more belief-like association between a concept that is still valenced, but has fuller descriptive content, and a category, e.g. mentally agile/white. Although the distinction between implicit stereotypes and implicit prejudices is not as clear or necessarily as useful as much of the psychological literature assumes [ 1 ], it is important to track the distinction when analysing empirical findings because it can affect the results substantially. For example, Sabin and colleagues found that paediatricians demonstrated a weak implicit anti-black race prejudice (Cohen’s d = 0.41), but a moderate effect of implicit stereotyping, in which a white patient was more likely associated with medical compliance than a black patient (Cohen’s d = 0.60) [ 2 ].

The term implicit bias is typically used to refer to both implicit stereotypes and implicit prejudices and aims to capture what is most troubling for professionals: the possibility of biased judgement and of the resulting biased behaviour. Psychologists often define bias broadly; for instance, as ‘the negative evaluation of one group and its members relative to another’ [ 3 ]. However, on an alternative definition of bias, not all negative evaluations of groups would count as implicit biases because they are not troubling for our equity concerns. For instance, I might have a negative feeling associated with fans of heavy metal music – a negative implicit prejudice towards them. However, the fans of heavy metal music, as far as we are aware, are not a disadvantaged group, thus this implicit prejudice would not count as an implicit bias on this alternative definition. We thus stipulate that an implicit association (prejudice or stereotype) counts as implicit bias for our purposes only when it is likely to have a negative impact on an already disadvantaged group; e.g. if someone has an implicit stereotype associating young girls with dolls and caring behaviour, this would count as an implicit bias. It does not fit the psychologists’ definition above because it is not a negative evaluation per se, but it is an association that creates a certain image of girls and femininity that can prevent them from excelling in areas that are traditionally considered ‘masculine’ such as mathematics [ 4 ], and in which they already suffer discrimination. An example of an implicit prejudice that counts as a bias on our definition would be an association between negative feelings and homosexual couples - a negative implicit prejudice. This could disadvantage a group that already suffers discrimination and it thus qualifies as an implicit bias.

There has been much recent interest in studying the effects of implicit bias have on behaviour, particularly when that may lead to discrimination in significant areas of life, such as health care, law enforcement, employment, criminal justice, and education. Differing outcomes correlated with race, gender, sexual orientation, nationality, socio-economic status, or age, in these areas are likely to be partly the result of implicit biases, rather than or in addition to explicit prejudice or stereotyping. Given this fact, society has an interest in finding ways to reduce levels of implicit biases among the general population and among professionals who work in these areas in particular.

There is currently a growing awareness of implicit biases, particularly in the English-speaking world, and increasing attempts to counter them in professional settings. However, we found a lack of systematic evaluation of the evidence for the effectiveness of different interventions to reduce implicit bias.

In contrast to the recent study conducted by Forscher et al. [ 5 ], which used a technique new to psychology called network meta-analysis, and examined the effectiveness of procedures to change implicit bias, our focus was solely on the reduction of implicit social prejudice and implicit stereotypes, and only on those interventions that would be applicable in real world contexts and that were tested using the most widely employed implicit measure, the Implicit Association Test (IAT) and similar measures. Forscher et al.’s scope was wider because they investigated all changes in implicit biases of all kinds, admitted studies employing a variety of implicit measures, and did not restrict types of intervention.

Despite an unclear evidence base for their usefulness, interventions and training sessions to reduce implicit bias are being offered in the English-speaking world. Our review was partly prompted by this fact. Interventions that are not designed based on empirical evidence have the potential to do more harm than good. For instance, when people are told to avoid implicit stereotyping it can actually increase their biases [ 6 , 7 ]. Ineffective training sessions may give participants and companies false confidence when in fact the training has had no ameliorative effect. False confidence in this area is particularly problematic because there is evidence that being asked to reflect on instances where one has behaved in an unbiased manner actually increases implicit bias, while reflecting on presumed failures to be unbiased reduces it [ 8 ].

We conducted a systematic review of studies measuring the effects of interventions to reduce implicit biases in adults as measured by the IAT. Interventions had to be fairly easily applicable to real life scenarios, such as workplace or healthcare settings. We concentrated solely on implicit biases because interventions that target explicit biases may leave implicit prejudices and stereotypes intact. Given the wide variety of interventions tested using different methods, a systematic review was more apt than a meta-analysis. This variety in the literature is what prompted Forscher et al. to use a novel form of meta-analysis, called ‘network meta-analysis’, which had never previously been used in psychology.

To this date, the most broadly recognized measure of implicit biases is the IAT. The IAT is usually administered as a computerized task where participants must categorize negatively and positively valenced words together with either images or words, e.g. white faces and black faces for a Race IAT. The tests must be performed as quickly as possible. The relative speed of association of black faces with positively-valenced words (and white faces and negatively-valenced words) is used as an indication of the level of anti-black bias [ 9 ].

Since its creation, the IAT has been subject to analysis and criticism as a measuring tool in the academic world [ 5 , 10 , 11 ] and, more recently, in the wider media [ 12 , 13 ], where its utility as a predictor of real-world behaviour is questioned. Some valid criticisms of the IAT are against unwise uses of it or against interpretations of results obtained with it, rather than against the measure itself. Caution about how to use and interpret the IAT has been advised by its own creators, such as Brian Nosek, who in 2012 warned against using it as a tool to predict individual behaviour, for example [ 14 ]. The fact that it is does not have a high test-retest reliability in the same individual is widely known among researchers who use it. For that reason, it is not useful as a tool to label individuals e.g. as ‘an implicit sexist’ or to predict their individual behaviour. However, the creators of the IAT frequently use it as a tool to compare levels of implicit prejudice/implicit stereotype in different populations and see how this correlates with differences in behaviour [ 15 ].

The results of the IAT are highly context specific, as much research shows [ 16 ]. That does not mean that it has no validity or no connection to behaviour, just that we need more research to better understand exactly what it is measuring and how that relates to behavioural outcomes. Challenges are to be expected when trying to measure a construct that is outside conscious awareness. The connection between all measures of psychological attitudes and behaviour is complex, as is the case with self-report questionnaires, designed to measure explicit attitudes. In fact, implicit attitude tests partly came about as a result of the ineffectiveness of self-report measures to predict behaviour. Even if the most extreme criticisms of the IAT were true and the constructs it measured had very little effect on behaviour, we would expect a virtuous person who finds discrimination based on race abhorrent to be disturbed to discover that she automatically associates a historically oppressed race that still suffers discrimination with negative qualities. Professionals with integrity should thus be concerned to eliminate psychological associations that belie their moral principles.

Our research question was: which interventions have been shown to reduce implicit bias in adults? ERIC, PUBMED, PSYCHINFO were searched for peer reviewed studies published in English between May 2005 and April 2015. Our full search strategies are included in the Additional file  1 .

Study eligibility

Studies were included if they were written in English, participants were either all adults (over 18) or the average age was over 18, and they were published in peer-reviewed journals. We excluded minors because we were interested in interventions that would be applicable in workplaces, thus on adults. The intervention had to be a controlled intentional process conducted with participants in an experimental setting, with the aim of reducing an implicit prejudice or implicit stereotype. We limited our research to social stereotypes and prejudices against people, as opposed to animals, inanimate objects, etc. Prejudices and stereotypes had to involve pre-existing associations thus excluding novel associations. They also had to be against a specific target thus excluding more generalized ‘outgroup prejudice’. An outgroup, in contrast to an ingroup, is any group to which a person does not feel that she belongs, a ‘they’ as opposed to a ‘we’. [ 17 ]

In an optimal experimental design, an implicit pre-test and post-test would be conducted on the same subjects in addition to the inclusion of a control group. However, since this is rarely found in the literature, we included articles where the effect was measured in comparison to a control group with similar characteristics. An advantage of a design using only a control group is that it eliminates any concern about a training effect occurring in participants between performing the IAT pre- and post-test.

The effect of the intervention had to be measured using a valid implicit measure before and after the intervention. In order for results to be comparable, we only included studies employing the most frequently used measure, the IAT, or a measure derived from or conceptually similar to it, such as the SC-IAT (Single Category Implicit Association Test), GNAT (Go/No-go Association Task, BIAT (Brief Implicit Association Test). Paper-based or computer versions of these tests were permitted. The IAT is the most widely used measure, and thus the most criticized and tested measure. We needed to select one implicit measure because different measures, such as affective priming, potentially measure different psychological constructs.

The intervention had to be applicable to real-world contexts and thus of a length and kind that enabled it to be easily implemented in different areas where implicit bias is a potential problem (e.g. medicine, general education, police force, legal professions and judiciary, human resources). The ease of implementation criterion is a matter of judgment, but comparisons can be made with similar types of training, such as sexual harassment training. If the intervention could be adapted to make a programme of similar length to that of current trainings typically provided in these areas, it was deemed suitable. This criterion ruled out observations drawn from natural settings that could potentially be used to develop interventions (e.g. correlations between increased contact with the outgroup and reduced bias). Many articles were excluded on this basis. It also ruled out long-term interventions involving considerable time and emotional commitment from participants. For instance, if an intervention had involved weekly attendance at a course over the course of a year (not simply changes in students’ curricula), we would have excluded it. As it happens, no interventions needed to be excluded for this reason.

We also excluded interventions that were too invasive in a person’s private life or over a person’s bodily autonomy, such as forcing people to make new friends, drink alcohol at work to reduce biases, or direct brain stimulation. There remains a grey zone when it comes to invasiveness that is open to cultural difference (e.g. whether being touched by a person of the outgroup is considered invasive).

The effectiveness of the intervention in reducing levels of implicit bias had to be initially tested within a maximum of one month from the intervention. This did not rule out further testing after this initial test. Since we were interested in interventions that reduce bias, we excluded interventions undertaken with the aim of increasing an implicit prejudice or stereotype.

Study selection

The study selection process is illustrated in Fig.  1 . Three reviewers, Angela Martin (AM), Chloë FitzGerald (CF) and Samia Hurst (SM), reviewed the 1931 titles resulting from the database searches. At least two of the three independently screened each title. Screening involved proposing the rejection of titles if there was a clear indication that the study did not fulfil our inclusion criteria. The titles that were agreed by both reviewers, or in case of uncertainty, by all three reviewers, after discussion to be ineligible according to the inclusion criteria were discarded (1600) and the abstracts of the remaining 331 articles were independently screened by at least two of the three reviewers. Abstracts that were agreed by both reviewers to be ineligible according to the inclusion criteria were discarded (169). When the ineligible abstracts were discarded, the remaining 162 articles were read and independently screened by at least two of the reviewers. After discussion, their decision on whether the article should be included was recorded and reviewed by the third reviewer who had not initially screened the article. SH reviewed the statistical analyses in the remaining 32 studies, which resulted in 2 articles being discarded due to lack of information about the statistical methods used. The final number of eligible articles was 30. However, one of the included articles [ 18 ] was in fact a competition organized to test different interventions created by different authors and thus involved 18 different interventions tested several times. Footnote 1

figure 1

Data collection process

We based our inclusion criteria on the published results. If the data and methods used to calculate the results were not available in the article, we did not attempt to contact the authors to obtain this information. CF and AM independently extracted the data from the articles and each reviewed the other’s data when extraction was complete. All disagreements with the information extracted were resolved through discussion.

Identified studies

As shown in Table  1 , there are a total of 30 eligible articles. We have included the 18 interventions designed by different authors as part of a competition, all described in a single article [ 18 ], as separate entries to aid comprehension of the table, thus making a total of 47 different interventions tested. When there are slightly different eligible studies within one article, they are listed separately in the table only when the modifications produced a result that was different from the original study (in terms of being effective or ineffective at reducing bias).

We divided the interventions into 8 categories based on their psychological features. We used as our starting point modified versions of the 6 categories that had been created by the authors of the competition article of 17 interventions [ 18 ] and added two new categories. There are many different ways in which interventions can potentially be classified and we chose to base our categories on the ones already used in the competition article to facilitate discussion within the discipline. These categories are neither exhaustive nor completely exclusive. Our categories of intervention are:

Engaging with others’ perspective, consciousness-raising or imagining contact with outgroup – participants either imagine how the outgroup thinks and feels, are made aware of the way the outgroup is marginalised or given new information about the outgroup, or imagine having contact with the outgroup.

Identifying the self with the outgroup – participants perform tasks that lessen barriers between themselves and the outgroup.

Exposure to counterstereotypical exemplars – participants are exposed to exemplars that contradict the stereotype of the outgroup.

Appeals to egalitarian values – participants are encouraged to activate egalitarian goals or think about multiculturalism, co-operation or tolerance.

Evaluative conditioning – participants perform tasks to strengthen counterstereotypical associations.

Inducing emotion –emotions or moods are induced in participants

Intentional strategies to overcome biases – participants are instructed to implement strategies to override or suppress their biases.

Drugs – participants take a drug.

Effective interventions were those that showed a reduction in bias in the same individuals after the intervention in a pre−/post-test design, or in the group who underwent the intervention in a control group design. According to our criteria, the post-test had to be completed within a maximum of 1 month from the original intervention, but this did not rule out further tests at later dates.

The most effective categories were: intentional strategies to overcome biases (all 3 interventions were effective); exposure to counterstereotypical exemplars (7 out of 8 interventions had at least one effective instance); identifying the self with the outgroup (6 interventions out of 7 had at least one effective instance); evaluative conditioning (5 out of 5 interventions had at least one effective instance); and inducing emotion (3 out of 4 interventions were effective). The sole study in our drugs category was effective. The appeals to egalitarian values category had 4 interventions that were effective and 4 that were not. The largest category was engaging with others’ perspective, with 11 interventions, but a mere 4 of these were effective.

The number of studies in each category is small, thus strong conclusions cannot be drawn from these results. Patterns indicating clearly which methods were more successful as interventions were not visible. There is an indication that some directions may prove unfruitful, at least in short term bias reduction, such as engaging with others’ perspective, while exposure to counterstereotypical exemplars seems to be the most promising form of intervention, at least in the short term.

The country where studies were conducted was overwhelming the United States – US - (35 interventions), which explains why black/white race was the most examined bias in our review (34 interventions). There were 3 interventions aimed at Middle-Eastern/white bias and one each targeting Latino/white, Arab-Muslim/black and Asian/Anglo bias. Aside from race bias, 3 interventions were tested on weight bias, 2 on sexuality bias, 2 on religion bias, 1 on age bias and 1 on gender bias. 4 interventions were conducted in the United Kingdom (UK), 2 in Australia, 1 in Spain, 1 in the Netherlands, and 4 interventions were conducted in several different countries (including Belgium, Taiwan, Hungary, Italy, Pakistan and New Zealand). There was no clear pattern concerning whether some types of bias were more susceptible to interventions than others, given that the vast majority of articles in our review investigated black/white racial bias.

A majority of studies looked at implicit prejudice. However, 5 articles looked at implicit stereotypes as well as implicit prejudices in their interventions and 3 articles looked only at implicit stereotypes. Of these, only 3 interventions were effective at reducing stereotyping. The stereotypes investigated were the following: fat/lazy versus thin/motivated (3 articles); Dutch/high status versus ethnic minority/low status; Dutch/leader versus ethnic minority/leader (SC-IAT); men/leader versus women/supporter; men/science versus women/humanities; Spanish/active versus Moroccan/restful; white/mental versus black/physical.

Limitations

Of specific studies.

Although we judged all the studies in our review of sufficient quality to be included, the quality of the study design and statistical analysis employed varied greatly. One recurrent problem was the fact that there was often a lack of a proper statistical methods section and statistical tests used were instead reported in the results [ 26 , 28 , 38 ], or even in a footnote [ 46 ]. Some studies described their statistical methods only minimally [ 19 , 25 , 29 , 31 , 32 , 33 ].

The paucity of empirically demonstrated effective interventions to reduce implicit bias and the pressure towards publishing positive results [ 48 ] is likely to tempt researchers to analyse data in a way that leads to positive results. The lack of statistical description suggests a risk of this.

An intervention tested by one study, rather than reducing implicit bias, actually increased it [ 34 ]. White participants who performed an intervention where they were embodied by a black avatar displayed greater implicit race bias than those who were embodied by a white avatar.

Of the field

Due to the interdisciplinarity of the subject and variety of fields from which articles proceeded (social psychology, medical ethics, health psychology, neuroscience, education, death studies, LGBT studies, gerontology, counselling, mental health, professional ethics, religious studies, disability studies, obesity studies) there was a lack of uniformity in the way that studies were described. In many cases, neither the titles nor the abstracts were very precise. They sometimes omitted to mention whether they tested implicit or explicit attitudes, a crucial piece of information e.g. [ 25 , 41 ]. The distinction between implicit prejudice and implicit stereotype, which is important in the psychological literature, was also often blurred so that stereotype was cited in the title when the method described using an IAT to test implicit prejudice e.g. [ 41 ]. Methods and measures used were frequently omitted from the abstract, requiring the reader to read the article in full to gain this knowledge e.g. [ 31 ].

Many interventions were tested only on undergraduate psychology students, who are unlikely to be representative of the general population [ 49 ].

As is true in many areas, more replication studies are needed to confirm results. For example, two studies in our review tested a similar intervention, involving participants being embodied by a black avatar; while one found that the intervention actually increased implicit racial prejudice [ 34 ], the other found that it reduced it [ 38 ]. There were important differences between these two studies and the latter was not a replication study. All the interventions that are found to be effective in one study need to be replicated to provide confirmation.

There were some problems related to the indexing of articles: the keywords in PSYCHINFO and PUBMED in this field have changed frequently over the last few years because implicit bias is an emerging field of interest and study. Thus, indexing in databases was somewhat inconsistent making it difficult to capture all relevant articles with keywords. The fact that our search terms differed from those used by Forscher et al. [ 5 ], and that these differences were not all accounted for by differences in research question and inclusion criteria, is a sign of the problematic variations in terminology in the field.

The effects of interventions tend to be tested only over the short term. There were no longitudinal studies in our review. Even if short-term changes in biases are efficient, these changes will not be useful at providing practical solutions to discrimination unless they persist in the long term.

There is a risk that the sorts of stereotypes being studied are likely to be those that people are most aware of, and that stereotypes that are equally or more pernicious may be less visible and thus not be tested for. For instance, social class stereotypes can be hard to identify, especially given that they are not always clearly linked to economic status and that they may vary greatly from culture to culture. Furthermore, the sort of intervention tested is likely to be limited in scope to those that people think will be effective. For example, one philosopher has argued that many researchers are biased against certain effective techniques for reducing biases partly because they seem too mechanical [ 50 ]. The fact that such limited results have been found in the search for effective interventions may be caused by biases in researchers’ thinking.

While there are well-establish general publication biases in favour of positive publications, [ 48 ] we did not find this in our study as many published null results.

While several interventions aimed at reducing implicit biases had at least one instance of demonstrated effectiveness, the sample size was small and we were not able to identify reliable interventions for practical use. Thus, currently the evidence does not indicate a clear path to follow in bias reduction. Intentional strategies to overcome biases, evaluative conditioning, identifying the self with the outgroup, and exposure to counterstereotypical exemplars are categories that merit further research. Furthermore, caution is advised, as our review reveals that many interventions are ineffective; their use at present cannot be described as evidence-based.

As the authors of the competition study point out, the interventions that were successful in their competition had some features in common in reducing black/white race bias: the interventions that linked white people with negativity and black people with positivity were more successful than the ones that only linked black people with positivity; interventions where participants were highly involved, which means that they strongly identified with people in the scenarios that were used, were also successful [ 18 ]. Our category of identifying the self with the outgroup, which included several effective studies, includes this feature of high involvement.

There are similarities between our results and those from the recent network meta-analysis on change in implicit bias conducted by Forscher et al.: they found that procedures that associated sets of concepts, invoked goals or motivations, or taxed people’s mental resources produced the largest positive changes in implicit bias [ 5 ]; two of the categories that were most effective in our review, evaluative conditioning and counterstereotypical exemplars, involve associating sets of concepts, and interventions invoking goals or motivations would be included in our intentional strategies category, which also included effective interventions. Any confirmation between our review and that of Forscher et al. is of note, especially given that we used different search terms, research questions, and inclusion criteria. Forscher et al. also found that studies measuring interventions with the IAT rather than other implicit measures tended to produce larger changes in implicit bias. Overall, they found great variance in the effects of the interventions, which supports our conclusion that current interventions are unreliable. We do not yet know why interventions work in some circumstances and not in others and thus more fine-grained research is needed examining which factors cause an intervention to be effective.

So far, there has been very little research examining long-term changes in implicit attitudes and their effects on behaviour; the recent criticisms of the IAT mentioned in our introduction highlight this. Rather than invalidating the measure, they serve to show which directions future research with the IAT should go. In fact, in a follow-up study conducted by the same researchers as the competition study included in our review, interventions that had been demonstrated to be effective immediately were tested after delays of hours and days and none were found to be effective over these extended time periods [ 51 ].

To some extent, the ineffectiveness of interventions after a longer time period is to be expected. Implicit biases have been partly formed through repeated exposure to associations: their very presence hints at their being not only generated but also maintained by culture. Any counter-actions, even if effective immediately, would then themselves be rapidly countered since participants remain part of their culture from which they receive constant inputs. To tackle this, interventions may need to be repeated frequently or somehow be constructed so that they create durable changes in the habits of participants. More in-depth interventions where participants follow a whole course or interact frequently with the outgroup have been successful [ 51 , 52 , 53 ].

Unfortunately, this suggests that interventions of the type most desired by institutions to implement in training, i.e. short, one-shot sessions that can be completed and the requisite diversity boxes ticked, may simply be non-existent. If change is really to be produced, a commitment to more in-depth training is necessary.

In conducting the review, we were aware that interventions to reduce implicit biases were not sufficient to reduce prejudice in the public in general and in professionals in different fields on the long-term. These interventions should only form part of a bigger picture that addresses structural issues, social biases and may include more intensive training that aims to change the culture and society outside institutions in addition to within them [ 54 ]. Programmes in education to address the formation of stereotypes from much earlier on would be one way to effect longer term changes. In terms of addressing workplace culture, it may be worth reflecting on how culture changes are effected in institutions in other instances, such as in the case of medical error management in health care establishments. Affirmative action programmes that increase the numbers of women and minorities in leadership positions is one example of a policy with the potential to change the cultural inputs that foment implicit bias within a workplace.

Another approach that could be effective is to focus on reducing the impact of implicit bias on behaviour rather than reducing the bias itself. Organisational policies and procedures that are designed to increase equity will have an impact on all kinds of bias, including implicit bias. For example, collecting data that monitors equity, such as gender pay gaps, and addressing disparities, or reducing discretion in decision-making.

The majority of studies in our review only looked at effects of interventions on implicit prejudice, without investigating related implicit stereotypes. The lack of investigation into implicit stereotypes is troubling. Implicit prejudice is a measure of generic positive or negative implicit feelings, but it is likely that many behaviours that lead to micro-discriminations and inequalities are linked to specific and fine-grained stereotypes. This is particularly the case with gender stereotypes, as bias towards women is not typically linked to a generic negative feeling towards women, but towards women occupying certain roles that are not stereotypically ‘feminine’. For instance, one study found that only the implicit stereotype linking men with high status occupational roles and women with low status occupational roles predicted implicit and explicit prejudice towards women in authority. Other implicit stereotypes, linking women/home and men/career, or women/supportive and men/agential, lacked this predictive effect [ 55 ]. Only 8 of the articles in our review examined implicit stereotypes, but one of these found that an intervention that was effective at reducing implicit black/white race prejudice was not effective at reducing the implicit stereotype black/physical vs. white/mental [ 39 ]. Hence, it is not only important in the case of gender to investigate the effects of interventions on stereotypes as well as prejudice. The vast majority of studies on race prejudice seem to assume that it is the blanket positive/negative comparison of whites/blacks that needs to be addressed, but it could be the case that interventions will be more effective if they tackle more specific stereotypes.

A possible limitation of the review is that we included interventions that targeted different outgroups, and one may wonder whether interventions tested on one group are really applicable/effective to biases towards other groups. Indeed, if intervention X reduces the bias in group Y, it is by no means certain that same intervention is efficient to reduce bias against group Z. Implicit bias may well be a heterogeneous phenomenon [ 56 ]. On the other hand, an inefficient intervention X on group P may be efficient if tested for some other group or bias. Nonetheless, it is interesting to compare the types of intervention that are tested on different biases and to collect the evidence for interventions against different biases in one place. Often, researchers in a field interested in a particular bias, such as health professionals researching obesity, limit themselves to reading the literature on that bias and from their specific field and thus may overlook much evidence that could be relevant to their research. Furthermore, it may be that different biases require different types of intervention, but this can only be seen clearly if the different literatures are compared.

Current data do not allow the identification of reliably effective interventions to reduce implicit biases. As our systematic review reveals, many interventions have no effect, or may even increase implicit biases. Caution is thus advised when it comes to programs aiming at reducing biases. Much more investigation into the long term effects of possible interventions is needed. The most problematic fine-grained implicit stereotypes need to be identified and a range of specifically-tailored interventions need to be designed to combat the whole gamut of prejudices that are problematic in our societies, not only targeting black/white race prejudice. More research needs to be conducted examining the conditions under which interventions will work and the factors that make them fail.

The fact that there is scarce evidence for particular bias-reducing techniques does not weaken the case for implementing widespread structural and institutional changes that are likely to reduce implicit biases, but that are justified for multiple reasons.

Our advice for future studies in this area can be summarized as follows:

Investigate the effect of interventions on implicit stereotypes as well as implicit prejudices

Use large sample sizes

Pre-register study designs

Use key words and titles that will span disciplines

Include all relevant study parameters in the title and abstract

Include all statistical analyses and data when publishing

Include all the details of the study method

Investigate the long term effects of interventions

Investigate the effects of institutional/organizational changes on implicit biases

Test interventions on a wide range of real workforces outside universities

The title of the study lists 17 interventions, but the authors included a comparison condition, which makes a total of 18 interventions tested for our purposes.

Abbreviations

Angela Martin

Brief Implicit Association Test

Chloë FitzGerald

Go/No-go Association Task

Implicit Association Test

Single Category Implicit Association Test

Samia Hurst

United Kingdom

United States

Madva A, Brownstein M. Stereotypes, prejudice, and the taxonomy of the implicit social mind. Noûs. 2018;52(3):611–44.

Article   Google Scholar  

Sabin JA, Rivara FP, Greenwald AG. Physician implicit attitudes and stereotypes about race and quality of medical care. Med Care. 2008;46(7):678–85.

Article   PubMed   Google Scholar  

Blair IV, Steiner JF, Havranek EP. Unconscious (implicit) bias and health disparities: where do we go from here? Perm J. 2011;15(2):71.

PubMed   PubMed Central   Google Scholar  

Ambady N, Shih M, Kim A, Pittinsky TL. Stereotype susceptibility in children: effects of identity activation on quantitative performance. Psychol Sci. 2001;12(5):385–90.

Forscher PS, Lai CK, Axt J, Ebersole CR, Herman M, Devine PG, et al. A Meta-Analysis of Procedures to Change Implicit Measures. 2016. https://doi.org/10.31234/osf.io/dv8tu .

Payne BK, Lambert AJ, Jacoby LL. Best laid plans: effects of goals on accessibility bias and cognitive control in race-based misperceptions of weapons. J Exp Soc Psychol. 2002;38(4):384–96.

Galinsky AD, Moskowitz GB. Perspective-taking: decreasing stereotype expression, stereotype accessibility, and in-group favoritism. J Pers Soc Psychol. 2000;78(4):708.

Moskowitz GB, Li P. Egalitarian goals trigger stereotype inhibition: a proactive form of stereotype control. J Exp Soc Psychol. 2011;47(1):103–16.

Greenwald AG, McGhee DE, Schwartz JL. Measuring individual differences in implicit cognition: the implicit association test. J Pers Soc Psychol. 1998;74(6):1464.

Oswald FL, Mitchell G, Blanton H, Jaccard J, Tetlock PE. Predicting ethnic and racial discrimination: a metaanalysis of IAT criterion studies. J Pers Soc Psychol. 2013;105(2):171.

De Houwer J. What are implicit measures and why are we using them. Handb Implicit Cogn Addict. 2006:11–28.

Bartlett T. Can we really measure implicit bias? Maybe not. Chron High Educ. 2017.

Singal J. Psychology’s favorite tool for measuring racism isn’t up to the job. N Y Mag. 2017.

Nosek BA, Riskind RG. Policy implications of implicit social cognition. Soc Issues Policy Rev. 2012;6(1):113–47.

Greenwald AG, Banaji MR, Nosek BA. Statistically small effects of the implicit association test can have societally large effects; 2015.

Book   Google Scholar  

Blair IV. The malleability of automatic stereotypes and prejudice. Pers Soc Psychol Rev. 2002;6(3):242–61.

Tajfel H. Experiments in intergroup discrimination. Sci Am. 1970;223(5):96–103.

Lai CK, Marini M, Lehr SA, Cerruti C, Shin J-EL, Joy-Gaba JA, et al. Reducing implicit racial preferences: I. a comparative investigation of 17 interventions. J Exp Psychol Gen. 2014;143(4):1765.

Dermody N, Jones MK, Cumming SR. The failure of imagined contact in reducing explicit and implicit out-group prejudice toward male homosexuals. Curr Psychol. 2013;32(3):261–74.

Turner RN, Crisp RJ. Imagining intergroup contact reduces implicit prejudice. Br J Soc Psychol. 2010;49(1):129–42.

Rukavina PB, Li W, Shen B, Sun H. A service learning based project to change implicit and explicit bias toward obese individuals in kinesiology pre-professionals. Obes Facts. 2010;3(2):117–26.

Article   PubMed   PubMed Central   Google Scholar  

Swift JA, Tischler V, Markham S, Gunning I, Glazebrook C, Beer C, et al. Are anti-stigma films a useful strategy for reducing weight bias among trainee healthcare professionals? Results of a pilot randomized control trial. Obes Facts. 2013;6(1):91–102.

Devine PG, Forscher PS, Austin AJ, Cox WT. Long-term reduction in implicit race bias: A prejudice habitbreaking intervention. J Exp Soc Psychol. 2012;48(6):1267–78.

O’Brien KS, Puhl RM, Latner JD, Mir AS, Hunter JA. Reducing Anti-Fat Prejudice in Preservice Health Students: A Randomized Trial. Obesity. 2010;18(11):2138–44.

Castillo J-LÁ, Camara CP, Eguizábal AJ. Prejudice reduction in university programs for older adults. Educ Gerontol. 2011;37(2):164–90.

Park J, Felix K, Lee G. Implicit attitudes toward Arab-Muslims and the moderating effects of social information. Basic Appl Soc Psychol. 2007;29(1):35–45.

Joy-Gaba JA, Nosek BA. The surprisingly limited malleability of implicit racial evaluations. Soc Psychol. 2010; [cited 2016 Jul 14]; Available from: http://econtent.hogrefe.com/doi/full/10.1027/1864-9335/a000020 .

McGrane JA, White FA. Differences in Anglo and Asian Australians’ explicit and implicit prejudice and the attenuation of their implicit in-group bias. Asian J Soc Psychol. 2007;10(3):204–10.

Columb C, Plant EA. Revisiting the Obama effect: Exposure to Obama reduces implicit prejudice. J Exp Soc Psychol. 2011;47(2):499–501.

Blincoe S, Harris MJ. Prejudice reduction in white students: Comparing three conceptual approaches. J Divers High Educ. 2009;2(4):232.

Clobert M, Saroglou V, Hwang K-K. Buddhist concepts as implicitly reducing prejudice and increasing prosociality. Pers Soc Psychol Bull. 2015;41(4):513–25.

Castillo LG, Brossart DF, Reyes CJ, Conoley CW, Phoummarath MJ. The influence of multicultural training on perceived multicultural counseling competencies and implicit racial prejudice. J Multicult Couns Dev. 2007;35(4):243–55.

Brannon TN, Walton GM. Enacting cultural interests: How intergroup contact reduces prejudice by sparking interest in an out-group’s culture. Psychol Sci. 2013;24(10):1947–57.

Groom V, Bailenson JN, Nass C. The influence of racial embodiment on racial bias in immersive virtual environments. Soc Influ. 2009;4(3):231–48.

Gündemir S, Homan AC, de Dreu CK, van Vugt M. Think leader, think white? Capturing and weakening an implicit pro-white leadership bias. PLoS One. 2014;9(1):e83915.

Hall NR, Crisp RJ, Suen M. Reducing implicit prejudice by blurring intergroup boundaries. Basic Appl Soc Psychol. 2009;31(3):244–54.

Maister L, Sebanz N, Knoblich G, Tsakiris M. Experiencing ownership over a dark-skinned body reduces implicit racial bias. Cognition. 2013;128(2):170–8.

Peck TC, Seinfeld S, Aglioti SM, Slater M. Putting yourself in the skin of a black avatar reduces implicit racial bias. Conscious Cogn. 2013;22(3):779–87.

Woodcock A, Monteith MJ. Forging links with the self to combat implicit bias. Group Process Intergroup Relat. 2013;16(4):445–61.

Calanchini J, Gonsalkorale K, Sherman JW, Klauer KC. Counter-prejudicial training reduces activation of biased associations and enhances response monitoring. Eur J Soc Psychol. 2013;43(5):321–5.

French AR, Franz TM, Phelan LL, Blaine BE. Reducing Muslim/Arab stereotypes through evaluative conditioning. J Soc Psychol. 2013;153(1):6–9.

Kawakami K, Phills CE, Steele JR, Dovidio JF. (Close) distance makes the heart grow fonder: Improving implicit racial attitudes and interracial interactions through approach behaviors. J Pers Soc Psychol. 2007;92(6):957.

Huntsinger JR, Sinclair S, Clore GL. Affective regulation of implicitly measured stereotypes and attitudes: Automatic and controlled processes. J Exp Soc Psychol. 2009;45(3):560–6.

Huntsinger JR, Sinclair S, Dunn E, Clore GL. Affective regulation of stereotype activation: It’s the (accessible) thought that counts. Pers Soc Psychol Bull. 2010;36(4):564–77.

Lai CK, Haidt J, Nosek BA. Moral elevation reduces prejudice against gay men. Cognit Emot. 2014;28(5):781–94.

Wallaert M, Ward A, Mann T. Explicit Control of Implicit Responses. Soc Psychol. 2010.

Terbeck S, Kahane G, McTavish S, Savulescu J, Cowen PJ, Hewstone M. Propranolol reduces implicit negative racial bias. Psychopharmacology (Berl). 2012;222(3):419–24.

Simmons JP, Nelson LD, Simonsohn U. False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol Sci. 2011;22(11):1359–66.

Henrich J, Heine SJ, Norenzayan A. Most people are not WEIRD. Nature. 2010;466(7302):29.

Madva A. Biased against Debiasing: On the Role of (Institutionally Sponsored) Self-Transformation in the Struggle against Prejuice. Open Access J Philos. 2017;4.

Lai CK, Skinner AL, Cooley E, Murrar S, Brauer M, Devos T, et al. Reducing implicit racial preferences: II. Intervention effectiveness across time. J Exp Psychol Gen. 2016;145(8):1001.

Rudman LA, Ashmore RD, Gary ML. “Unlearning” automatic biases: the malleability of implicit prejudice and stereotypes. J Pers Soc Psychol. 2001;81(5):856.

Shook NJ, Fazio RH. Interracial roommate relationships: An experimental field test of the contact hypothesis. Psychol Sci. 2008;19(7):717–23.

Russell CA. Questions of Race in Bioethics: Deceit, Disregard, Disparity, and the Work of Decentering. Philos Compass. 2016;11(1):43–55.

Rudman LA, Kilianski SE. Implicit and explicit attitudes toward female authority. Pers Soc Psychol Bull. 2000;26(11):1315–28.

Holroyd J, Sweetman J. The Heterogeneity of Implicit Bias. In: Brownstein M, Saul J, editors. Implicit Bias and Philosophy, Volume 1: Metaphysics and Epistemology: Oxford University Press; 2016. p. 80–103.

Download references

Acknowledgments

We are very grateful to Tobias Brosch for his advice in the planning stage of the review and to Janice Sabin and Jules Holroyd for extremely helpful comments on the manuscript, particularly their suggestions about the importance of focussing on organisational policy to promote equity. We would also like to thank the librarians from the University of Geneva Medical School library and the Psychology section of the Humanities library for their kind help with the initial keyword searches.

The systematic review was funded by a grant from the Swiss National Science Foundation, number 32003B_149407. The funding body approved the proposal for the systematic review as part of a larger project. After approval, they were not involved in the design of the study, nor the collection, analysis and interpretation of data, nor in writing the manuscript.

Availability of data and materials

Our full search strategies for each database is available in Additional file 1 so that the search can be accurately reproduced.

Author information

Authors and affiliations.

iEH2 (Institute for Ethics, History and the Humanities), Faculty of Medicine, University of Geneva, Geneva, Switzerland

Chloë FitzGerald, Delphine Berner & Samia Hurst

Department of Philosophy, University of Fribourg, Fribourg, Switzerland

You can also search for this author in PubMed   Google Scholar

Contributions

AM initially researched the suitable databases, performed the searches and organized the reviewing of the titles with supervision from CF and SH. AM, CF and SH reviewed the titles as described in the Methods section and SH reviewed the statistical sections. Data was extracted by AM and CF and Table 1 was drafted from this information by DB. DB contributed to the selection of categories of intervention and prompted further discussion regarding the presentation and organization of data. CF drafted the manuscript with major contributions from AM and input from SH. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Chloë FitzGerald .

Ethics declarations

Ethics approval and consent to participate.

Not applicable

Consent for publication

Competing interests.

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional file

Additional file 1:.

Full search strategies. (DOCX 15 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Cite this article.

FitzGerald, C., Martin, A., Berner, D. et al. Interventions designed to reduce implicit prejudices and implicit stereotypes in real world contexts: a systematic review. BMC Psychol 7 , 29 (2019). https://doi.org/10.1186/s40359-019-0299-7

Download citation

Received : 24 December 2018

Accepted : 03 April 2019

Published : 16 May 2019

DOI : https://doi.org/10.1186/s40359-019-0299-7

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

  • Implicit prejudice
  • Implicit stereotype
  • Implicit bias
  • Unconscious bias
  • Interventions
  • Professional ethics

BMC Psychology

ISSN: 2050-7283

research articles with bias

Have a language expert improve your writing

Run a free plagiarism check in 10 minutes, generate accurate citations for free.

  • Knowledge Base
  • Research bias

Types of Bias in Research | Definition & Examples

Research bias results from any deviation from the truth, causing distorted results and wrong conclusions. Bias can occur at any phase of your research, including during data collection , data analysis , interpretation, or publication. Research bias can occur in both qualitative and quantitative research .

Understanding research bias is important for several reasons.

  • Bias exists in all research, across research designs , and is difficult to eliminate.
  • Bias can occur at any stage of the research process .
  • Bias impacts the validity and reliability of your findings, leading to misinterpretation of data.

It is almost impossible to conduct a study without some degree of research bias. It’s crucial for you to be aware of the potential types of bias, so you can minimize them.

For example, the success rate of the program will likely be affected if participants start to drop out ( attrition ). Participants who become disillusioned due to not losing weight may drop out, while those who succeed in losing weight are more likely to continue. This in turn may bias the findings towards more favorable results.  

Table of contents

Information bias, interviewer bias.

  • Publication bias

Researcher bias

Response bias.

Selection bias

Cognitive bias

How to avoid bias in research

Other types of research bias, frequently asked questions about research bias.

Information bias , also called measurement bias, arises when key study variables are inaccurately measured or classified. Information bias occurs during the data collection step and is common in research studies that involve self-reporting and retrospective data collection. It can also result from poor interviewing techniques or differing levels of recall from participants.

The main types of information bias are:

  • Recall bias
  • Observer bias

Performance bias

Regression to the mean (rtm).

Over a period of four weeks, you ask students to keep a journal, noting how much time they spent on their smartphones along with any symptoms like muscle twitches, aches, or fatigue.

Recall bias is a type of information bias. It occurs when respondents are asked to recall events in the past and is common in studies that involve self-reporting.

As a rule of thumb, infrequent events (e.g., buying a house or a car) will be memorable for longer periods of time than routine events (e.g., daily use of public transportation). You can reduce recall bias by running a pilot survey and carefully testing recall periods. If possible, test both shorter and longer periods, checking for differences in recall.

  • A group of children who have been diagnosed, called the case group
  • A group of children who have not been diagnosed, called the control group

Since the parents are being asked to recall what their children generally ate over a period of several years, there is high potential for recall bias in the case group.

The best way to reduce recall bias is by ensuring your control group will have similar levels of recall bias to your case group. Parents of children who have childhood cancer, which is a serious health problem, are likely to be quite concerned about what may have contributed to the cancer.

Thus, if asked by researchers, these parents are likely to think very hard about what their child ate or did not eat in their first years of life. Parents of children with other serious health problems (aside from cancer) are also likely to be quite concerned about any diet-related question that researchers ask about.

Observer bias is the tendency of research participants to see what they expect or want to see, rather than what is actually occurring. Observer bias can affect the results in observationa l and experimental studies, where subjective judgment (such as assessing a medical image) or measurement (such as rounding blood pressure readings up or down) is part of the d ata collection process.

Observer bias leads to over- or underestimation of true values, which in turn compromise the validity of your findings. You can reduce observer bias by using double-blinded  and single-blinded research methods.

Based on discussions you had with other researchers before starting your observations , you are inclined to think that medical staff tend to simply call each other when they need specific patient details or have questions about treatments.

At the end of the observation period, you compare notes with your colleague. Your conclusion was that medical staff tend to favor phone calls when seeking information, while your colleague noted down that medical staff mostly rely on face-to-face discussions. Seeing that your expectations may have influenced your observations, you and your colleague decide to conduct semi-structured interviews with medical staff to clarify the observed events. Note: Observer bias and actor–observer bias are not the same thing.

Performance bias is unequal care between study groups. Performance bias occurs mainly in medical research experiments, if participants have knowledge of the planned intervention, therapy, or drug trial before it begins.

Studies about nutrition, exercise outcomes, or surgical interventions are very susceptible to this type of bias. It can be minimized by using blinding , which prevents participants and/or researchers from knowing who is in the control or treatment groups. If blinding is not possible, then using objective outcomes (such as hospital admission data) is the best approach.

When the subjects of an experimental study change or improve their behavior because they are aware they are being studied, this is called the Hawthorne effect (or observer effect). Similarly, the John Henry effect occurs when members of a control group are aware they are being compared to the experimental group. This causes them to alter their behavior in an effort to compensate for their perceived disadvantage.

Regression to the mean (RTM) is a statistical phenomenon that refers to the fact that a variable that shows an extreme value on its first measurement will tend to be closer to the center of its distribution on a second measurement.

Medical research is particularly sensitive to RTM. Here, interventions aimed at a group or a characteristic that is very different from the average (e.g., people with high blood pressure) will appear to be successful because of the regression to the mean. This can lead researchers to misinterpret results, describing a specific intervention as causal when the change in the extreme groups would have happened anyway.

In general, among people with depression, certain physical and mental characteristics have been observed to deviate from the population mean .

This could lead you to think that the intervention was effective when those treated showed improvement on measured post-treatment indicators, such as reduced severity of depressive episodes.

However, given that such characteristics deviate more from the population mean in people with depression than in people without depression, this improvement could be attributed to RTM.

Interviewer bias stems from the person conducting the research study. It can result from the way they ask questions or react to responses, but also from any aspect of their identity, such as their sex, ethnicity, social class, or perceived attractiveness.

Interviewer bias distorts responses, especially when the characteristics relate in some way to the research topic. Interviewer bias can also affect the interviewer’s ability to establish rapport with the interviewees, causing them to feel less comfortable giving their honest opinions about sensitive or personal topics.

Participant: “I like to solve puzzles, or sometimes do some gardening.”

You: “I love gardening, too!”

In this case, seeing your enthusiastic reaction could lead the participant to talk more about gardening.

Establishing trust between you and your interviewees is crucial in order to ensure that they feel comfortable opening up and revealing their true thoughts and feelings. At the same time, being overly empathetic can influence the responses of your interviewees, as seen above.

Publication bias occurs when the decision to publish research findings is based on their nature or the direction of their results. Studies reporting results that are perceived as positive, statistically significant , or favoring the study hypotheses are more likely to be published due to publication bias.

Publication bias is related to data dredging (also called p -hacking ), where statistical tests on a set of data are run until something statistically significant happens. As academic journals tend to prefer publishing statistically significant results, this can pressure researchers to only submit statistically significant results. P -hacking can also involve excluding participants or stopping data collection once a p value of 0.05 is reached. However, this leads to false positive results and an overrepresentation of positive results in published academic literature.

Researcher bias occurs when the researcher’s beliefs or expectations influence the research design or data collection process. Researcher bias can be deliberate (such as claiming that an intervention worked even if it didn’t) or unconscious (such as letting personal feelings, stereotypes, or assumptions influence research questions ).

The unconscious form of researcher bias is associated with the Pygmalion effect (or Rosenthal effect ), where the researcher’s high expectations (e.g., that patients assigned to a treatment group will succeed) lead to better performance and better outcomes.

Researcher bias is also sometimes called experimenter bias, but it applies to all types of investigative projects, rather than only to experimental designs .

  • Good question: What are your views on alcohol consumption among your peers?
  • Bad question: Do you think it’s okay for young people to drink so much?

Response bias is a general term used to describe a number of different situations where respondents tend to provide inaccurate or false answers to self-report questions, such as those asked on surveys or in structured interviews .

This happens because when people are asked a question (e.g., during an interview ), they integrate multiple sources of information to generate their responses. Because of that, any aspect of a research study may potentially bias a respondent. Examples include the phrasing of questions in surveys, how participants perceive the researcher, or the desire of the participant to please the researcher and to provide socially desirable responses.

Response bias also occurs in experimental medical research. When outcomes are based on patients’ reports, a placebo effect can occur. Here, patients report an improvement despite having received a placebo, not an active medical treatment.

While interviewing a student, you ask them:

“Do you think it’s okay to cheat on an exam?”

Common types of response bias are:

Acquiescence bias

Demand characteristics.

  • Social desirability bias

Courtesy bias

  • Question-order bias

Extreme responding

Acquiescence bias is the tendency of respondents to agree with a statement when faced with binary response options like “agree/disagree,” “yes/no,” or “true/false.” Acquiescence is sometimes referred to as “yea-saying.”

This type of bias occurs either due to the participant’s personality (i.e., some people are more likely to agree with statements than disagree, regardless of their content) or because participants perceive the researcher as an expert and are more inclined to agree with the statements presented to them.

Q: Are you a social person?

People who are inclined to agree with statements presented to them are at risk of selecting the first option, even if it isn’t fully supported by their lived experiences.

In order to control for acquiescence, consider tweaking your phrasing to encourage respondents to make a choice truly based on their preferences. Here’s an example:

Q: What would you prefer?

  • A quiet night in
  • A night out with friends

Demand characteristics are cues that could reveal the research agenda to participants, risking a change in their behaviors or views. Ensuring that participants are not aware of the research objectives is the best way to avoid this type of bias.

On each occasion, patients reported their pain as being less than prior to the operation. While at face value this seems to suggest that the operation does indeed lead to less pain, there is a demand characteristic at play. During the interviews, the researcher would unconsciously frown whenever patients reported more post-op pain. This increased the risk of patients figuring out that the researcher was hoping that the operation would have an advantageous effect.

Social desirability bias is the tendency of participants to give responses that they believe will be viewed favorably by the researcher or other participants. It often affects studies that focus on sensitive topics, such as alcohol consumption or sexual behavior.

You are conducting face-to-face semi-structured interviews with a number of employees from different departments. When asked whether they would be interested in a smoking cessation program, there was widespread enthusiasm for the idea.

Note that while social desirability and demand characteristics may sound similar, there is a key difference between them. Social desirability is about conforming to social norms, while demand characteristics revolve around the purpose of the research.

Courtesy bias stems from a reluctance to give negative feedback, so as to be polite to the person asking the question. Small-group interviewing where participants relate in some way to each other (e.g., a student, a teacher, and a dean) is especially prone to this type of bias.

Question order bias

Question order bias occurs when the order in which interview questions are asked influences the way the respondent interprets and evaluates them. This occurs especially when previous questions provide context for subsequent questions.

When answering subsequent questions, respondents may orient their answers to previous questions (called a halo effect ), which can lead to systematic distortion of the responses.

Extreme responding is the tendency of a respondent to answer in the extreme, choosing the lowest or highest response available, even if that is not their true opinion. Extreme responding is common in surveys using Likert scales , and it distorts people’s true attitudes and opinions.

Disposition towards the survey can be a source of extreme responding, as well as cultural components. For example, people coming from collectivist cultures tend to exhibit extreme responses in terms of agreement, while respondents indifferent to the questions asked may exhibit extreme responses in terms of disagreement.

Selection bias is a general term describing situations where bias is introduced into the research from factors affecting the study population.

Common types of selection bias are:

Sampling or ascertainment bias

  • Attrition bias
  • Self-selection (or volunteer) bias
  • Survivorship bias
  • Nonresponse bias
  • Undercoverage bias

Sampling bias occurs when your sample (the individuals, groups, or data you obtain for your research) is selected in a way that is not representative of the population you are analyzing. Sampling bias threatens the external validity of your findings and influences the generalizability of your results.

The easiest way to prevent sampling bias is to use a probability sampling method . This way, each member of the population you are studying has an equal chance of being included in your sample.

Sampling bias is often referred to as ascertainment bias in the medical field.

Attrition bias occurs when participants who drop out of a study systematically differ from those who remain in the study. Attrition bias is especially problematic in randomized controlled trials for medical research because participants who do not like the experience or have unwanted side effects can drop out and affect your results.

You can minimize attrition bias by offering incentives for participants to complete the study (e.g., a gift card if they successfully attend every session). It’s also a good practice to recruit more participants than you need, or minimize the number of follow-up sessions or questions.

You provide a treatment group with weekly one-hour sessions over a two-month period, while a control group attends sessions on an unrelated topic. You complete five waves of data collection to compare outcomes: a pretest survey, three surveys during the program, and a posttest survey.

Self-selection or volunteer bias

Self-selection bias (also called volunteer bias ) occurs when individuals who volunteer for a study have particular characteristics that matter for the purposes of the study.

Volunteer bias leads to biased data, as the respondents who choose to participate will not represent your entire target population. You can avoid this type of bias by using random assignment —i.e., placing participants in a control group or a treatment group after they have volunteered to participate in the study.

Closely related to volunteer bias is nonresponse bias , which occurs when a research subject declines to participate in a particular study or drops out before the study’s completion.

Considering that the hospital is located in an affluent part of the city, volunteers are more likely to have a higher socioeconomic standing, higher education, and better nutrition than the general population.

Survivorship bias occurs when you do not evaluate your data set in its entirety: for example, by only analyzing the patients who survived a clinical trial.

This strongly increases the likelihood that you draw (incorrect) conclusions based upon those who have passed some sort of selection process—focusing on “survivors” and forgetting those who went through a similar process and did not survive.

Note that “survival” does not always mean that participants died! Rather, it signifies that participants did not successfully complete the intervention.

However, most college dropouts do not become billionaires. In fact, there are many more aspiring entrepreneurs who dropped out of college to start companies and failed than succeeded.

Nonresponse bias occurs when those who do not respond to a survey or research project are different from those who do in ways that are critical to the goals of the research. This is very common in survey research, when participants are unable or unwilling to participate due to factors like lack of the necessary skills, lack of time, or guilt or shame related to the topic.

You can mitigate nonresponse bias by offering the survey in different formats (e.g., an online survey, but also a paper version sent via post), ensuring confidentiality , and sending them reminders to complete the survey.

You notice that your surveys were conducted during business hours, when the working-age residents were less likely to be home.

Undercoverage bias occurs when you only sample from a subset of the population you are interested in. Online surveys can be particularly susceptible to undercoverage bias. Despite being more cost-effective than other methods, they can introduce undercoverage bias as a result of excluding people who do not use the internet.

Cognitive bias refers to a set of predictable (i.e., nonrandom) errors in thinking that arise from our limited ability to process information objectively. Rather, our judgment is influenced by our values, memories, and other personal traits. These create “ mental shortcuts” that help us process information intuitively and decide faster. However, cognitive bias can also cause us to misunderstand or misinterpret situations, information, or other people.

Because of cognitive bias, people often perceive events to be more predictable after they happen.

Although there is no general agreement on how many types of cognitive bias exist, some common types are:

  • Anchoring bias  
  • Framing effect  
  • Actor-observer bias
  • Availability heuristic (or availability bias)
  • Confirmation bias  
  • Halo effect
  • The Baader-Meinhof phenomenon  

Anchoring bias

Anchoring bias is people’s tendency to fixate on the first piece of information they receive, especially when it concerns numbers. This piece of information becomes a reference point or anchor. Because of that, people base all subsequent decisions on this anchor. For example, initial offers have a stronger influence on the outcome of negotiations than subsequent ones.

  • Framing effect

Framing effect refers to our tendency to decide based on how the information about the decision is presented to us. In other words, our response depends on whether the option is presented in a negative or positive light, e.g., gain or loss, reward or punishment, etc. This means that the same information can be more or less attractive depending on the wording or what features are highlighted.

Actor–observer bias

Actor–observer bias occurs when you attribute the behavior of others to internal factors, like skill or personality, but attribute your own behavior to external or situational factors.

In other words, when you are the actor in a situation, you are more likely to link events to external factors, such as your surroundings or environment. However, when you are observing the behavior of others, you are more likely to associate behavior with their personality, nature, or temperament.

One interviewee recalls a morning when it was raining heavily. They were rushing to drop off their kids at school in order to get to work on time. As they were driving down the highway, another car cut them off as they were trying to merge. They tell you how frustrated they felt and exclaim that the other driver must have been a very rude person.

At another point, the same interviewee recalls that they did something similar: accidentally cutting off another driver while trying to take the correct exit. However, this time, the interviewee claimed that they always drive very carefully, blaming their mistake on poor visibility due to the rain.

  • Availability heuristic

Availability heuristic (or availability bias) describes the tendency to evaluate a topic using the information we can quickly recall to our mind, i.e., that is available to us. However, this is not necessarily the best information, rather it’s the most vivid or recent. Even so, due to this mental shortcut, we tend to think that what we can recall must be right and ignore any other information.

  • Confirmation bias

Confirmation bias is the tendency to seek out information in a way that supports our existing beliefs while also rejecting any information that contradicts those beliefs. Confirmation bias is often unintentional but still results in skewed results and poor decision-making.

Let’s say you grew up with a parent in the military. Chances are that you have a lot of complex emotions around overseas deployments. This can lead you to over-emphasize findings that “prove” that your lived experience is the case for most families, neglecting other explanations and experiences.

The halo effect refers to situations whereby our general impression about a person, a brand, or a product is shaped by a single trait. It happens, for instance, when we automatically make positive assumptions about people based on something positive we notice, while in reality, we know little about them.

The Baader-Meinhof phenomenon

The Baader-Meinhof phenomenon (or frequency illusion) occurs when something that you recently learned seems to appear “everywhere” soon after it was first brought to your attention. However, this is not the case. What has increased is your awareness of something, such as a new word or an old song you never knew existed, not their frequency.

While very difficult to eliminate entirely, research bias can be mitigated through proper study design and implementation. Here are some tips to keep in mind as you get started.

  • Clearly explain in your methodology section how your research design will help you meet the research objectives and why this is the most appropriate research design.
  • In quantitative studies , make sure that you use probability sampling to select the participants. If you’re running an experiment, make sure you use random assignment to assign your control and treatment groups.
  • Account for participants who withdraw or are lost to follow-up during the study. If they are withdrawing for a particular reason, it could bias your results. This applies especially to longer-term or longitudinal studies .
  • Use triangulation to enhance the validity and credibility of your findings.
  • Phrase your survey or interview questions in a neutral, non-judgmental tone. Be very careful that your questions do not steer your participants in any particular direction.
  • Consider using a reflexive journal. Here, you can log the details of each interview , paying special attention to any influence you may have had on participants. You can include these in your final analysis.
  • Baader–Meinhof phenomenon
  • Sampling bias
  • Ascertainment bias
  • Self-selection bias
  • Hawthorne effect
  • Omitted variable bias
  • Pygmalion effect
  • Placebo effect

Research bias affects the validity and reliability of your research findings , leading to false conclusions and a misinterpretation of the truth. This can have serious implications in areas like medical research where, for example, a new form of treatment may be evaluated.

Observer bias occurs when the researcher’s assumptions, views, or preconceptions influence what they see and record in a study, while actor–observer bias refers to situations where respondents attribute internal factors (e.g., bad character) to justify other’s behavior and external factors (difficult circumstances) to justify the same behavior in themselves.

Response bias is a general term used to describe a number of different conditions or factors that cue respondents to provide inaccurate or false answers during surveys or interviews. These factors range from the interviewer’s perceived social position or appearance to the the phrasing of questions in surveys.

Nonresponse bias occurs when the people who complete a survey are different from those who did not, in ways that are relevant to the research topic. Nonresponse can happen because people are either not willing or not able to participate.

Is this article helpful?

Other students also liked.

  • Attrition Bias | Examples, Explanation, Prevention
  • Observer Bias | Definition, Examples, Prevention
  • What Is Social Desirability Bias? | Definition & Examples

More interesting articles

  • Demand Characteristics | Definition, Examples & Control
  • Hostile Attribution Bias | Definition & Examples
  • Regression to the Mean | Definition & Examples
  • Representativeness Heuristic | Example & Definition
  • Sampling Bias and How to Avoid It | Types & Examples
  • Self-Fulfilling Prophecy | Definition & Examples
  • The Availability Heuristic | Example & Definition
  • The Baader–Meinhof Phenomenon Explained
  • What Is a Ceiling Effect? | Definition & Examples
  • What Is Actor-Observer Bias? | Definition & Examples
  • What Is Affinity Bias? | Definition & Examples
  • What Is Anchoring Bias? | Definition & Examples
  • What Is Ascertainment Bias? | Definition & Examples
  • What Is Belief Bias? | Definition & Examples
  • What Is Bias for Action? | Definition & Examples
  • What Is Cognitive Bias? | Definition, Types, & Examples
  • What Is Confirmation Bias? | Definition & Examples
  • What Is Conformity Bias? | Definition & Examples
  • What Is Correspondence Bias? | Definition & Example
  • What Is Explicit Bias? | Definition & Examples
  • What Is Generalizability? | Definition & Examples
  • What Is Hindsight Bias? | Definition & Examples
  • What Is Implicit Bias? | Definition & Examples
  • What Is Information Bias? | Definition & Examples
  • What Is Ingroup Bias? | Definition & Examples
  • What Is Negativity Bias? | Definition & Examples
  • What Is Nonresponse Bias? | Definition & Example
  • What Is Normalcy Bias? | Definition & Example
  • What Is Omitted Variable Bias? | Definition & Examples
  • What Is Optimism Bias? | Definition & Examples
  • What Is Outgroup Bias? | Definition & Examples
  • What Is Overconfidence Bias? | Definition & Examples
  • What Is Perception Bias? | Definition & Examples
  • What Is Primacy Bias? | Definition & Example
  • What Is Publication Bias? | Definition & Examples
  • What Is Recall Bias? | Definition & Examples
  • What Is Recency Bias? | Definition & Examples
  • What Is Response Bias? | Definition & Examples
  • What Is Selection Bias? | Definition & Examples
  • What Is Self-Selection Bias? | Definition & Example
  • What Is Self-Serving Bias? | Definition & Example
  • What Is Status Quo Bias? | Definition & Examples
  • What Is Survivorship Bias? | Definition & Examples
  • What Is the Affect Heuristic? | Example & Definition
  • What Is the Egocentric Bias? | Definition & Examples
  • What Is the Framing Effect? | Definition & Examples
  • What Is the Halo Effect? | Definition & Examples
  • What Is the Hawthorne Effect? | Definition & Examples
  • What Is the Placebo Effect? | Definition & Examples
  • What Is the Pygmalion Effect? | Definition & Examples
  • What Is Unconscious Bias? | Definition & Examples
  • What Is Undercoverage Bias? | Definition & Example
  • What Is Vividness Bias? | Definition & Examples
  • Research article
  • Open access
  • Published: 01 June 2020

Publication and related biases in health services research: a systematic review of empirical evidence

  • Abimbola A. Ayorinde 1 ,
  • Iestyn Williams 2 ,
  • Russell Mannion 2 ,
  • Fujian Song 3 ,
  • Magdalena Skrybant 4 ,
  • Richard J. Lilford 4 &
  • Yen-Fu Chen   ORCID: orcid.org/0000-0002-9446-2761 1  

BMC Medical Research Methodology volume  20 , Article number:  137 ( 2020 ) Cite this article

6901 Accesses

12 Citations

16 Altmetric

Metrics details

Publication and related biases (including publication bias, time-lag bias, outcome reporting bias and p-hacking) have been well documented in clinical research, but relatively little is known about their presence and extent in health services research (HSR). This paper aims to systematically review evidence concerning publication and related bias in quantitative HSR.

Databases including MEDLINE, EMBASE, HMIC, CINAHL, Web of Science, Health Systems Evidence, Cochrane EPOC Review Group and several websites were searched to July 2018. Information was obtained from: (1) Methodological studies that set out to investigate publication and related biases in HSR; (2) Systematic reviews of HSR topics which examined such biases as part of the review process. Relevant information was extracted from included studies by one reviewer and checked by another. Studies were appraised according to commonly accepted scientific principles due to lack of suitable checklists. Data were synthesised narratively.

After screening 6155 citations, four methodological studies investigating publication bias in HSR and 184 systematic reviews of HSR topics (including three comparing published with unpublished evidence) were examined. Evidence suggestive of publication bias was reported in some of the methodological studies, but evidence presented was very weak, limited in both quality and scope. Reliable data on outcome reporting bias and p-hacking were scant. HSR systematic reviews in which published literature was compared with unpublished evidence found significant differences in the estimated intervention effects or association in some but not all cases.

Conclusions

Methodological research on publication and related biases in HSR is sparse. Evidence from available literature suggests that such biases may exist in HSR but their scale and impact are difficult to estimate for various reasons discussed in this paper.

Systematic review registration

PROSPERO 2016 CRD42016052333.

Peer Review reports

Publication bias occurs when the publication, non-publication or late publication of research findings is influenced by the direction or strength of the results, and consequently the findings that are published or published early may differ systematically from those that remain unpublished or for which publication is delayed [ 1 , 2 ]. Other related biases, however, may occur between the generation of research evidence and its eventual publication. These include: p-hacking, which involves repeated analyses using different methods or subsets of data until statistically significant results are obtained [ 3 ]; and outcome reporting bias, whereby among those examined, only favourable outcomes are reported [ 4 ]. For brevity, we use the term “publication and related bias” in this paper to encompass these various types of biases (Fig.  1 ).

figure 1

Publication related biases and other biases at various stages of research

Publication bias is a major concern in health care as biased evidence available to decision makers may lead to suboptimal decisions that a) negatively impact on the care and the health of patients and b) lead to an inefficient and inequitable allocation of scarce resources. This problem has been documented extensively in the clinical research literature [ 2 , 4 , 5 ], and several high-profile cases of non-publication of studies showing unfavourable results have led to the introduction of mandatory prospective registration of clinical trials [ 6 ]. By comparison, publication bias appears to have received scant attention in health services research (HSR). A recent methodological study of Cochrane reviews of HSR topics found that less than one in 10 of the reviews explicitly assessed publication bias [ 7 ].

However, it is unlikely that HSR is immune from publication and related biases, and these problems may be anticipated on theoretical grounds. In contrast with clinical research, where mandatory registration of all studies involving human subjects has long been advocated through the declaration of Helsinki [ 8 ] and publication of results of commercial trials are increasingly enforced by regulatory bodies, the registration and regulation of HSR studies are much more variable. In addition, studies in HSR often examine a large number of factors (independent variables, mediating variables, contextual variables and outcome variables) along a long service delivery causal chain [ 9 ]. The scope for ‘data dredging’ associated with use of multiple subsets of data and analytical techniques is substantial [ 10 ]. Furthermore, there is a grey area between research and non-research, particularly in the evaluation of quality improvement projects [ 11 ], which are usually initiated under a service imperative rather than to produce generalizable knowledge. In these settings there are fewer checks against the motivation that may arise post hoc to selectively publish “newsworthy” findings from evaluations showing promising results.

The first step towards improving our understanding of publication and related biases in HSR, which is the main aim of this review, is to systematically examine the existing literature. We anticipated that we might find two broad types of literature: (1) methodological research that set out with the prime purpose of investigating publication and related bias in HSR; (2) systematic reviews of substantive HSR topics but in which the authors had investigated the possibility of publication and related biases as part of the methodology used to explore the validity of their findings.

We adopted the definition of HSR used by the United Kingdom’s National Institute for Health Research Health Services & Delivery Research (NIHR HS & DR) Programme: “research to produce evidence on the quality, accessibility and organisation of health services”, including evaluation of how healthcare organizations might improve the delivery of services. The definition is deliberately broad in recognition of the many associated disciplines and methodologies, and is compatible with other definitions of HSR such as those offered by the Agency for Healthcare Research and Quality (AHRQ). We were aware that publication bias may arise in qualitative research [ 12 ], but as the mechanisms and manifestations are likely to be very different, we focused on publication bias related to quantitative research in this review. The protocol for this systematic review was pre-registered in the PROSPERO International prospective register of systematic reviews (2016:CRD42016052333). We followed the PRISMA statement [ 13 ] for undertaking and reporting this review where applicable (see Additional file 1 for the PRISMA checklist).

Inclusion criteria

Included studies needed to be concerned with HSR related topics based on the NIHR HS & DR Programme’s definition described above. The types of study included were either:

(1) methodological studies that set out to investigate data dredging/p-hacking, outcome reporting bias or publication bias by one or more of: a) tracking a cohort of studies from inception or from a pre-publication stage such as conference presentation to publication (or not); b) surveying researchers about their experiences related to research publication; c) investigating statistical techniques to prevent, detect or mitigate the above biases;

(2) systematic reviews of substantive HSR topics that provided empirical evidence concerning publication and related biases. Such evidence could take various forms such as comparing findings in published vs. grey literature; statistical analyses (e.g. funnel plots and Egger’s test); and assessment of selective outcome reporting within individual studies included in the reviews.

Exclusion criteria

Articles were excluded if they assessed publication and related biases in subject areas other than HSR (e.g. basic sciences; clinical and public health research) or publication bias purely in relation to qualitative research. Biases in the dissemination of evidence following research publication, such as citation bias and media attention bias, were not included since they can be alleviated by systematic search [ 2 ]. Studies of bias relating to study design (such as recall bias) were also excluded. No language restriction was applied.

Search strategy

We used a judicious combination of information sources and searching methods to ensure that our coverage of the relevant HSR literature was as comprehensive as possible. MEDLINE (1946 to 16 March 2017), EMBASE (1947 to 16 March 2017), Health Management Information Consortium (HMIC, 1979 to January 2017), CINAHL (1981 to 17 March 2017), and Web of Science (all years) were searched using indexed terms and text words related to HSR [ 14 ], combined with search terms relating to publication bias. In April 2017 we searched HSR-specific databases including Health Systems Evidence (HSE) and the Cochrane Effective Practice and Organisation of Care (EPOC) Review Group using publication bias related terms. The search strategy for MEDLINE is provided in Appendix 1 (see Additional file  2 ).

For the included studies, we used forward and backward citation searches (using Google Scholar/PubMed and manual check of reference lists) to identify additional studies that had not been captured in the electronic database searches. We searched the webpages of major organizations related to HSR, including the Institute for Healthcare Improvement (USA), The AHRQ (USA), and the Research and Development (RAND) Corporation (USA), Health Foundation (UK), King’s Fund (UK) (last searched on 20th September 2017). We also searched the UK NIHR HSDR Programme website and the US HSRProj (Health Services Research Projects in Progress) database for previously commissioned and ongoing studies (last searched on 20th February 2018). All the searches were updated between 30th July and 2nd August 2018 in order to identify any new relevant methodological studies. Members of the project steering and management committees were consulted to identify any additional studies.

Citations retrieved were imported and de-duplicated in the EndNote software, and were screened for relevance based on titles and abstracts. Full-text publications were retrieved for potentially relevant records and articles were included/excluded based on the selection criteria described above. The screening and study selection were carried out by two reviewers independently, with any disagreement resolved by discussion with the wider research team.

Data extraction

Methodological studies.

For the included methodological studies set out to examine publication and related biases, a data extraction form was designed to collect the following information: citation details; methods of selecting study sample; characteristics of study sample; methods of investigating publication and related biases; key findings; limitations; and conclusions. Data extraction was conducted by one reviewer and checked by another reviewer.

Systematic reviews of substantive topics of HSDR

For systematic reviews that directly compared published literature with grey literature/unpublished studies, the following data were collected by one reviewer and checked by another: the topic being examined; methods used to identify grey literature and unpublished studies; findings of comparisons between published and grey/unpublished literature; limitations and conclusions. A separate data extraction form was used to collect data from the remaining HSR systematic reviews. Information concerning techniques used to investigate publication bias and outcome reporting bias was extracted along with findings of these investigations. Due to the large number of identified HSR systematic reviews falling into this category, the data extraction was carried out only by a single reviewer.

Risk of bias assessment

No single risk of bias assessment tool could capture the dimensions of quality for the types of methodological studies included [ 2 ]. We therefore critically appraised individual methodological studies and systematic reviews directly comparing published vs unpublished evidence on the basis of adherence to commonly accepted scientific principles, including: representativeness of published/unpublished HSR studies being examined or health services researchers being surveyed; rigour in data collection and analysis; and whether attention was paid to factors that could confound the association between study findings and publication status. Each study was read by at least two reviewers and any methodological issues identified are presented as commentary alongside study findings in the results section. No quality assessment was carried out for the remaining HSR systematic reviews, as we were only interested in their findings in relation to publication and related biases rather than the effects or associations examined in these reviews per se. We anticipated that it would not be feasible to use quantitative methods (such as funnel plots) for evaluating potential publication bias across studies due to heterogeneous methods and measures adopted to assess publication bias in the methodological studies included in this review.

Data synthesis and presentation

As included studies used diverse approaches and measures to investigate publication and related biases, meta-analyses could not be performed. Findings were therefore presented narratively [ 15 ].

Literature search and selection

The initial searches of the electronic databases yielded 6155 references, which were screened on the basis of titles/abstracts. The full-text for 422 of them and six additional articles identified from other sources were then retrieved and assessed (Fig.  2 ). Two hundred and forty articles did not meet the inclusion criteria primarily because no empirical evidence on publication and related biases was reported or the subject areas lay outside the domain of HSR as described above. An updated search yielded 1328 new records but no relevant methodological studies were identified.

figure 2

Flow diagram showing study selection process

We found four methodological studies that set out with the primary purpose of investigating publication and related biases in HSR [ 16 , 17 , 18 , 19 ]. We identified 184 systematic reviews of HSR topics where the authors of reviews looked for evidence of publication and related biases. Three of these 184 systematic reviews provided direct evidence on publication bias by comparing findings of published articles with those of grey literature and unpublished studies [ 20 , 21 , 22 ]. The remaining 181 review provided only indirect evidence on publication and related biases (Fig. 2 ).

Methodological studies setting out to investigate publication and related biases

The characteristics of the four included methodological studies are presented in Table  1 . Three studies [ 16 , 17 , 19 ] explored the presence or absence of publication bias in health informatics research. The remaining study [ 18 ] focused on p-hacking or reporting bias that may arise when authors of research papers compete by reporting ‘more extreme and spectacular results’ in order to optimize chances of journal publication. A brief summary of each of the studies is provided below.

Only one study was an inception cohort study, which tracked individual research projects from their start. Such a study provides direct evidence of publication bias [ 19 ]. This study assessed publication bias in clinical trials of electronic health records registered with ClinicalTrials.gov during 2000–8 and reported that results from 76% (47/62) of completed trials were subsequently published. Of the published studies, 74% (35/47) reported predominantly positive results, 21% (10/47) reported neutral results (no effect) and 4% (2/47) reported negative/harmful results. Data were available from investigators for seven of the 15 unpublished trials: four reported neutral results and three reported positive results. Based on these data, the authors concluded that trials with positive results are more likely to be published than those with null results, although we noticed that this finding was not statistically significant (see Table 1 ). The authors cautioned that few trials were registered in the early years of ClinicalTrials.gov and those registered may be more likely to publish their findings and thus systematically different from those not registered. They further noted that the registered data were often unreliable during that period.

The second study reported a pilot survey of academics in order to assess rates of non-publication in IT evaluation studies and reasons for any non-publication [ 16 ]. The survey asked what information systems the respondents had evaluated in the past 3 years, whether the results of the evaluation(s) were published, and if not published, the reasons behind the non-publication. The findings show that approximately 50% of the identified evaluation studies were published in peer reviewed journals, proceedings or books. Of the remaining studies, some were published in internal reports and/or local publications (such as masters’ theses and local conferences) and approximately one third were unpublished at the time of the survey. The reasons cited for non-publication included: “results not of interest for others”; “publication in preparation”; “no time for publication”; “limited scientific quality of study”; “political or legal reasons”, and “study only conducted for internal use”. The main limitation of this study is a low response rate with only 118 of 722 (18.8%) targeted participants providing valid responses.

The third methodological study used three different approaches to assess publication bias in health informatics [ 17 ]. However, for one of the approaches (statistical analyses of publication bias/small study effects) the authors were unable to find enough studies which reported findings using the same outcome measures; while the remaining two approaches adopted in this study (i.e. examining percentage of HSR evaluation studies reporting positive results and percentage of HSR reviews reaching positive conclusion) provided little information on publication bias since there is no estimate of what the “unbiased” proportion of positive findings should be for HSR evaluation studies and reviews (Table 1 ).

The fourth methodological study included in this review examined quantitative estimates of income elasticity of health care and price elasticity of prescription drugs reported in the published literature [ 18 ]. Using funnel plots and meta-regressions the authors identified a positive correlation between effect sizes and the standard errors of income/price elasticity estimates, which suggested potential publication bias [ 18 ]. In addition, they found an independent association between effect size and journal impact factor, indicating that given similar standard errors (which reflect sample sizes), studies reporting larger effect sizes (i.e. more striking findings) were more likely to be published in ‘high-impact’ journals. As other confounding factors could not be ruled out for these observed associations and no unpublished studies were examined, the evidence is suggestive rather than conclusive.

Systematic reviews of HSR topics providing evidence on publication and related bias

We identified 184 systematic reviews of HSR topics in which empirical evidence on publication and related bias was reported. Three of these reviews provided direct evidence on publication bias by comparing evidence from studies published in academic journals with those from grey literature or unpublished studies [ 20 , 21 , 22 ]. These reviews are described in detail in the next sub-section. The remaining 181 reviews only provided indirect evidence and are summarised briefly in the subsequent sub-section and in Appendix 2 (see Additional file  2 ).

HSR systematic reviews comparing published and grey/unpublished evidence

Three HSR systematic reviews made such comparisons [ 20 , 21 , 22 ]. The topics of these reviews and their findings are summarised in Table  2 . The first review evaluated the effectiveness of mass mailings for increasing the utilization of influenza vaccine [ 22 ], focusing on evidence from controlled trials. The authors found one published study reporting statistically significant intervention effects, but additionally identified five unpublished studies through a Medicare quality improvement project database. All the unpublished studies reported clinically trivial intervention effects (no effect or an increase of less than two percentage point in uptake). This case illustrated the practical implications of publication bias: the authors highlighted that further mass mailing interventions were being considered by service planners on the basis of results from the first published study when they presented the review findings.

The second review compared the grey literature [ 20 ] with published literature [ 23 ] on the effectiveness and cost-effectiveness of strategies to improve immunization coverage in developing countries, and found that the quality and nature of evidence differed between these two sources of evidence, and that the recommendations about the most cost-effective interventions would differ between the two reviews (Table 2 ).

The third review assessed nine associations between various measures of organisational culture, organisational climate and nurse’s job satisfaction [ 21 ]. The author included both published literature and doctoral dissertations in the review, and statistically significant differences in the pooled estimates between these two types of literature were found in three of the nine associations (Table 2 ).

Findings from other systematic reviews of HSR topics

Of the 181 remaining systematic reviews, 100 examined potential publication bias across studies included in the reviews using funnel plots and related techniques, and 108 attempted to assess outcome reporting bias within individual included studies, generally as part of the risk of bias assessment. The methods used in these reviews and key findings in relation to publication bias and outcome reporting bias are summarised in Appendix 2 (see Additional file  2 ). Fifty-one of the 100 reviews which attempted to assess publication bias showed some evidence of its existence (through the assumption that observed small study effects were caused by publication bias).

For the assessment of outcome reporting bias, reviewers frequently reported difficulties in judging outcome reporting bias due to the absence of a published protocol for the included studies. For instance, a Cochrane review of the effectiveness of interventions to enhance medication adherence included 182 RCTs and judged eight and 32 RCTs to be of high and low risk for outcome reporting bias respectively, but the remaining 142 RCTs were judged to be of unclear risk, primarily due to unavailability of protocols [ 24 ]. In the absence of a protocol, some reviewers assessed outcome reporting bias by comparing outcomes specified in the methods to those presented in the results section, or made subjective judgements on the extent to which all important outcomes were reported. However, the validity of such approaches remains unclear. All but one of the reviews that assessed outcome reporting bias used either the Cochrane risk of bias tool (the checklist developed by the Cochrane Collaboration for assessing internal validity of individual RCTs) or bespoke tools derived from this. The remaining review - of the effectiveness of interventions for hypertension care in the community - undertook a sensitivity analysis to explore the influence of studies that otherwise met the inclusion criteria except for not providing sufficient data on relevant outcomes [ 25 ]. This was achieved by imputing zero effects (with average standard deviations) for the studies with missing outcomes (40 to 49% of potentially eligible studies), including them in the meta-analysis and recalculating the pooled effect. They found that the pooled effect was considerably reduced although still statistically significant [ 25 ]. These reviews illustrate the challenges of assessing outcome reporting bias in HSR and in identifying its potential consequences.

Delay in publication arising from the direction or strength of the study findings, referred to as time lag bias, was assessed in one of the reviews which evaluated the effectiveness of interventions for increasing the uptake of mammography in low and middle income countries [ 26 ]. The authors classified the time lag from end of intervention to the publication date into ≤4 years and > 4 years and reported that studies published within 4 years showed stronger association between intervention and mammography uptake (risk differences: 0.10, 95% CI 0.08, 0.12) when compared to studies published more than 4 years after completion (0.08, 95% CI 0.04, 0.11). However, the difference between the two subgroups was very small and not statistically significant (F ratio = 2.94, p  = 0.10), and it was not clear whether this analysis and the cut-off time lag for defining the subgroups were specified a priori.

This systematic review examined current empirical evidence on publication and related biases in HSR. Very few methodological studies that directly investigated these issues were found. Nonetheless, a small number of available studies focusing on publication bias suggested its existence: findings of studies were not always reported/published; those published were often with positive results, and were sometimes of different nature, which could impact upon their applicability and relevance for different users of the evidence. There was also evidence suggesting that studies reporting larger effect sizes were more likely to be published in high impact journals. However, there are methodological weaknesses behind these pieces of evidence, which does not allow a firm conclusion to be drawn.

Reasons for non-publication of HSR findings described in the only survey we found appear to be similar to those of clinical research [ 27 ]. Lack of time and interest from the part of the researcher appears to be a major factor, which could exacerbate when the study findings are uninteresting. Also of note are comments such as “not of interest for others” and “only meant for internal use”. These not only illustrate context-sensitive nature of evidence for HSR, but also highlight issues arising from the hazy boundary between research and non-research for many evaluations undertaken in healthcare organizations, such as quality improvement projects and service audits. As promising findings are likely to motivate publication of these quality improvement projects, caution is required in interpreting and particularly in generalizing their findings. Another reason given for non-publication in HSR is “political and legal reasons”. Publication bias and restriction of access to data arising from conflict of interest is well documented in clinical research [ 2 ] and one might expect similar issues in HSR. We did not identify methodological research specifically related to the impact of conflict of interest on publication of findings in HSR, although anecdotal evidence of financial arrangement influencing editorial process exists [ 28 ], and there are debates concerning public’s accessibility of information related to health services and policy [ 29 ].

It is currently difficult to gauge the true scale and impact of publication and related biases given the sparse high quality evidence. Among the four methodological studies identified in this review, only one was an inception cohort study that provided direct evidence. This paucity of evidence is in stark contrast with a methodological review assessing publication bias and outcome reporting bias in clinical research, in which 20 inception cohort studies of RCTs were found [ 4 ]. The difference between these two fields is likely to be in part attributable to the less frequent use of RCTs in HSR and lack of requirement for study registration. The lesser reliance on RCTs and lack of study registration present a major methodological challenge in studying publication bias in HSR as there is no reliable way to identify studies that have been conducted but not subsequently published.

The lack of prospective study registration poses further challenges in assessing outcome reporting bias, which could be a greater concern for HSR than clinical research given the more exploratory approaches to examining a larger number of variables and associations in HSR. Empirical evidence on selective outcome reporting has primarily been obtained from RCTs as study protocols are made available in the trial registration process [ 4 ]. Calls for prospective registration of study protocols of observational studies have been made [ 30 ] and repositories of quality improvement projects are emerging [ 31 ]. HSR and quality improvement communities will need to consider and evaluate the feasibility and values of adopting these practices.

Statistical techniques such as funnel plots and regression methods are commonly used in HSR systematic reviews to identify potential publication bias, as in clinical research. Assumptions (e.g. any observed small study effects are caused by publication bias) and conditions (e.g. at least 10 studies measuring the same effect) related to the appropriate use of these techniques hold true for HSR, but heterogeneity commonly found among HSR studies resulting from the inherent complexity and variability of service delivery interventions and their interaction with contextual factors [ 32 , 33 ] may further influence the validity of funnel plots and related methods [ 34 ], and findings from these methods should be treated with caution [ 35 ].

In addition to the conventional methods discussed above, new methods such as p-curves for detecting p-hacking have emerged in recent years [ 36 , 37 ]. P-curves have been tested in various scientific disciplines [ 3 , 38 , 39 ], although no studies that we examined in the field of HSR have used this technique. The validity and usefulness of p-curves are subject to debate and accumulation of further empirical evidence [ 40 , 41 , 42 , 43 ].

Given the limitations of statistical methods, search of grey literature and contacting stakeholders to unearth unpublished studies remain an important means of mitigating publication bias, although this is often resource intensive and does not completely eliminate the risk. The finding from Batt et al. (2004) described above highlighted that published and grey literature could differ in their geographical coverage and nature of evidence [ 20 ]. This has important implications given the context-sensitive nature of HSR.

The limited evidence that we found does not allow us to estimate precisely the scale and impact of publication and related biases in HSR. It may be argued that publication bias may not be as prevalent in HSR as in clinical research because of the complexity of health systems which makes it often necessary to investigate the associations between a large number of variables along the service delivery causal pathway. As a result, HSR studies may be less likely to have completely null results or to depend for their contribution on single outcomes. Conversely, this heterogeneity and complexity may increase the scope for p-hacking and outcome reporting bias in HSR, which are even more difficult to prevent and detect.

A major challenge for this review was to delineate a boundary between HSR and other health/medical research. We used a broad range of search terms and identified a large number of studies, many of which were subsequently excluded after screening. We have used the definition of HSR provided by the UK NIHR and therefore our review may not have covered some areas of HSR if defined more broadly. We combined publication bias related terms with HSR related terms in our searches. As a result, we might not have captured some HSR related studies which have investigated publication and related bias but which did not mention them in their titles, abstracts or indexed terms. This is most likely to occur for systematic reviews of substantive HSR topics, in which funnel plot and related methods might have been deployed as a routine procedure to examine potential publication bias. Nevertheless, it is well known that statistical techniques such as funnel plot and related tests have low statistical power, and publication bias is just one of the many potential reasons behind ‘small study effects’ which these methods actually detect [ 34 ]. Findings from these systematic reviews are therefore of limited value in terms of confirming or refuting the existence of publication bias. Despite the limitation related to the search strategy, we identified and briefly examined more than 180 systematic reviews as shown in Appendix 2 in the supplementary file , but except for the small number of systematic reviews highlighted in the Results section, very little conclusion in relation to publication bias could be drawn from these reviews.

A further limitation of this study is that we have focused on publication and related biases related to quantitative studies and have not covered qualitative research, which plays an important role in HSR. It is also worth noting that three of the four included studies relate to the specific sub-field of health informatics which places limits on the extent to which our conclusions can be generalised to other subfields of HSR. Lastly, although we attempted to search several databases as well as grey literature, the possibility that evidence included in this review is subject to publication and related bias cannot be ruled out.

There is a paucity of empirical evidence and methodological literature addressing the issue of publication and related biases in HSR. While the available evidence suggests the presence of publication bias in this field, its magnitude and impact is yet to be fully explored and understood. Further research evaluating the existence of publication and related biases in HSR, what factors contribute towards their occurrence, their impact and the range of potential strategies to mitigate them, is therefore warranted.

Availability of data and materials

All data generated and/or analysed during this review are included within this article and its additional files. This systematic review was part of a large project investigating publication and related bias in HSR. The full technical report for the project will be published in the UK National Institute for Health Research (NIHR) Journals Library: https://www.journalslibrary.nihr.ac.uk/programmes/hsdr/157106/#/

Abbreviations

Agency for Healthcare Research and Quality

Effective Practice and Organisation of Care

Health Systems Evidence

Health Services Research

National Institute for Health Research Health Services & Delivery Research Programme

Randomised controlled trials

Hopewell S, Clarke M, Stewart L, Tierney J. Time to publication for results of clinical trials. Cochrane Database Syst Rev. 2007;2:MR000011.

Song F, Parekh S, Hooper L, Loke YK, Ryder J, Sutton AJ, Hing C, Kwok CS, Pang C, Harvey I. Dissemination and publication of research findings: an updated review of related biases. Health Technol Assess. 2010;14(8):1–193.

Head ML, Holman L, Lanfear R, Kahn AT, Jennions MD. The extent and consequences of p-hacking in science. PLoS Biol. 2015;13(3):e1002106.

Article   Google Scholar  

Dwan K, Gamble C, Williamson PR, Kirkham JJ. Systematic review of the empirical evidence of study publication bias and outcome reporting bias - an updated review. PLoS One. 2013;8(7):e66844.

Kicinski M, Springate DA, Kontopantelis E. Publication bias in meta-analyses from the Cochrane database of systematic reviews. Stat Med. 2015;34(20):2781–93.

Gulmezoglu AM, Pang T, Horton R, Dickersin K. WHO facilitates international collaboration in setting standards for clinical trial registration. Lancet. 2005;365(9474):1829–31.

Li X, Zheng Y, Chen T-L, Yang K-H, Zhang Z-J. The reporting characteristics and methodological quality of Cochrane reviews about health policy research. Health Policy. 2015;119(4):503–10.

Article   CAS   Google Scholar  

The World Medical Association. WMA declaration of Helsinki - ethical principles for medical research involving human subjects. In: Current policies. The World Medical Association; 2013.  https://www.wma.net/policies-post/wma-declaration-of-helsinki-ethical-principles-for-medical-research-involving-human-subjects/ . Accessed 26 Apr 2020.

Lilford RJ, Chilton PJ, Hemming K, Girling AJ, Taylor CA, Barach P. Evaluating policy and service interventions: framework to guide selection and interpretation of study end points. BMJ. 2010;341:c4413.

Gelman A, Loken E. The garden of forking paths: why multiple comparisons can be a problem, even when there is no “fishing expedition” or “p-hacking” and the research hypothesis was posited ahead of time (2013). http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf . Accessed 25 July 2018.

Google Scholar  

Smith R. Quality improvement reports: a new kind of article. They should allow authors to describe improvement projects so others can learn. BMJ. 2000;321(7274):1428.

Toews I, Glenton C, Lewin S, Berg RC, Noyes J, Booth A, Marusic A, Malicki M, Munthe-Kaas HM, Meerpohl JJ. Extent, awareness and perception of dissemination bias in qualitative research: an explorative survey. PLoS One. 2016;11(8):e0159290.

Liberati A, Altman DG, Tetzlaff J. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate healthcare interventions: explanation and elaboration. BMJ. 2009;339:b2700.

Wilczynski NL, Haynes RB, Lavis JN, Ramkissoonsingh R, Arnold-Oatley AE, The HSRHT. Optimal search strategies for detecting health services research studies in MEDLINE. CMAJ. 2004;171(10):1179–85.

Mays N, Pope C, Popay J. Systematically reviewing qualitative and quantitative evidence to inform management and policy-making in the health field. J Health Serv Res Policy. 2005;10(Suppl 1):6–20.

Ammenwerth E, de Keizer N. A viewpoint on evidence-based health informatics, based on a pilot survey on evaluation studies in health care informatics. JAMIA. 2007;14(3):368–71.

PubMed   Google Scholar  

Machan C, Ammenwerth E, Bodner T. Publication bias in medical informatics evaluation research: is it an issue or not? Stud Health Technol Inform. 2006;124:957–62.

Costa-Font J, McGuire A, Stanley T. Publication selection in health policy research: the winner's curse hypothesis. Health Policy. 2013;109(1):78–87.

Vawdrey DK, Hripcsak G. Publication bias in clinical trials of electronic health records. J Biomed Inform. 2013;46(1):139–41.

Batt K, Fox-Rushby JA, Castillo-Riquelme M. The costs, effects and cost-effectiveness of strategies to increase coverage of routine immunizations in low- and middle-income countries: systematic review of the grey literature. Bull World Health Organ. 2004;82(9):689–96.

PubMed   PubMed Central   Google Scholar  

Fang Y. A meta-analysis of relationships between organizational culture, organizational climate, and nurse work outcomes (PhD thesis). Baltimore: University of Maryland; 2007.

Maglione MA, Stone EG, Shekelle PG. Mass mailings have little effect on utilization of influenza vaccine among Medicare beneficiaries. Am J Prev Med. 2002;23(1):43–6.

Pegurri E, Fox-Rushby JA, Damian W. The effects and costs of expanding the coverage of immunisation services in developing countries: a systematic literature review. Vaccine. 2005;23(13):1624–35.

Nieuwlaat R, Wilczynski N, Navarro T, Hobson N, Jeffery R, Keepanasseril A, Agoritsas T, Mistry N, Iorio A, Jack S, et al. Interventions for enhancing medication adherence. Cochrane Database Syst Rev. 2014;11:CD000011.

Lu Z, Cao S, Chai Y, Liang Y, Bachmann M, Suhrcke M, Song F. Effectiveness of interventions for hypertension care in the community--a meta-analysis of controlled studies in China. BMC Health Serv Res. 2012;12:216.

Gardner MP, Adams A, Jeffreys M. Interventions to increase the uptake of mammography amongst low income women: a systematic review and meta-analysis. PLoS One. 2013;8(2):e55574.

Song F, Loke Y, Hooper L. Why are medical and health-related studies not being published? A systematic review of reasons given by investigators. PLoS One. 2014;9(10):e110418.

Homedes N, Ugalde A. Are private interests clouding the peer-review process of the WHO bulletin? A case study. Account Res. 2016;23(5):309–17.

Dyer C. Information commissioner condemns health secretary for failing to publish risk register. BMJ. 2012;344:e3480.

Swaen GMH, Urlings MJE, Zeegers MP. Outcome reporting bias in observational epidemiology studies on phthalates. Ann Epidemiol. 2016;26(8):597–599.e594.

Bytautas JP, Gheihman G, Dobrow MJ. A scoping review of online repositories of quality improvement projects, interventions and initiatives in healthcare. BMJ Qual Safety. 2017;26(4):296–303.

Long KM, McDermott F, Meadows GN. Being pragmatic about healthcare complexity: our experiences applying complexity theory and pragmatism to health services research. BMC Med. 2018;16(1):94.

Greenhalgh T, Papoutsi C. Studying complexity in health services research: desperately seeking an overdue paradigm shift. BMC Med. 2018;16(1):95.

Sterne JAC, Sutton AJ, Ioannidis JPA, Terrin N, Jones DR, Lau J, Carpenter J, Rücker G, Harbord RM, Schmid CH, et al. Recommendations for examining and interpreting funnel plot asymmetry in meta-analyses of randomised controlled trials. BMJ. 2011;343:d4002.

Lau J, Ioannidis JPA, Terrin N, Schmid CH, Olkin I. The case of the misleading funnel plot. BMJ. 2006;333(7568):597–600.

Simonsohn U, Nelson LD, Simmons JP. P-curve: a key to the file-drawer. J Exp Psychol Gen. 2014;143(2):534–47.

Simonsohn U, Nelson LD, Simmons JP. P-curve and effect size: correcting for publication Bias using only significant results. Perspect Psychol Sci. 2014;9(6):666–81.

Carbine KA, Larson MJ. Quantifying the presence of evidential value and selective reporting in food-related inhibitory control training: a p-curve analysis. Health Psychol Rev. 2019;13(3):318–43.

Carbine KA, Lindsey HM, Rodeback RE, Larson MJ. Quantifying evidential value and selective reporting in recent and 10-year past psychophysiological literature: a pre-registered P-curve analysis. Int J Psychophysiol. 2019;142:33–49.

Bishop DV, Thompson PA. Problems in using p-curve analysis and text-mining to detect rate of p-hacking and evidential value. PeerJ. 2016;4:e1715.

Bruns SB, Ioannidis JPA. P-curve and p-hacking in observational research. PLoS One. 2016;11(2):e0149144.

Simonsohn U, Simmons JP, Nelson LD. Better P-curves: making P-curve analysis more robust to errors, fraud, and ambitious P-hacking, a reply to Ulrich and Miller (2015). J Exp Psychol Gen. 2015;144(6):1146–52.

Ulrich R, Miller J. Some properties of p-curves, with an application to gradual publication bias. Psychol Methods. 2018;23(3):546–60.

Download references

Acknowledgements

We are grateful for the advice and guidance provided by members of the Study Steering Committee for the project.

This project is funded by the UK NIHR Health Services and Delivery Research Programme (project number 15/71/06). The authors are required to notify the funder prior to the publication of study findings, but the funder does not otherwise have any roles in the preparation of the manuscript and the decision to submit and publish it. MS and RJL are also supported by the NIHR Applied Research Collaboration (ARC) West Midlands. The views and opinions expressed therein are those of the authors and do not necessarily reflect those of the HS&DR Programme, NIHR, National Health Services or the Department of Health.

Author information

Authors and affiliations.

Warwick Centre for Applied Health Research & Delivery, Division of Health Sciences, Warwick Medical School, University of Warwick, Coventry, UK

Abimbola A. Ayorinde & Yen-Fu Chen

Health Services Management Centre, School of Social Policy, University of Birmingham, Birmingham, UK

Iestyn Williams & Russell Mannion

Norwich Medical School, University of East Anglia, Norwich, UK

Fujian Song

Institute of Applied Health Research, University of Birmingham, Birmingham, UK

Magdalena Skrybant & Richard J. Lilford

You can also search for this author in PubMed   Google Scholar

Contributions

YFC and RJL conceptualised the study. AAA and YFC contributed to all stages of the review and drafted the paper. IW, RM, FS, MS, RJL were involved in planning the study, advised on the conduct of the review and interpretation of the findings. All authors reviewed and helped revising drafts of this paper and approved its submission.

Corresponding author

Correspondence to Yen-Fu Chen .

Ethics declarations

Ethics approval and consent to participate.

Not applicable.

Consent for publication

Competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1..

PRISMA checklist.

Additional file 2.

Appendices.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Cite this article.

Ayorinde, A.A., Williams, I., Mannion, R. et al. Publication and related biases in health services research: a systematic review of empirical evidence. BMC Med Res Methodol 20 , 137 (2020). https://doi.org/10.1186/s12874-020-01010-1

Download citation

Received : 28 January 2019

Accepted : 07 May 2020

Published : 01 June 2020

DOI : https://doi.org/10.1186/s12874-020-01010-1

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

  • Publication bias
  • Outcome reporting bias
  • Dissemination bias
  • Grey literature
  • Research publication
  • Research registration
  • Health services research
  • Systematic review
  • Research methodology
  • Funnel plots

BMC Medical Research Methodology

ISSN: 1471-2288

research articles with bias

Bias in research

Affiliation.

  • 1 University Department of Chemistry, University Hospital Center "Sestre Milosrdnice", Zagreb, Croatia. [email protected]
  • PMID: 23457761
  • PMCID: PMC3900086
  • DOI: 10.11613/bm.2013.003

By writing scientific articles we communicate science among colleagues and peers. By doing this, it is our responsibility to adhere to some basic principles like transparency and accuracy. Authors, journal editors and reviewers need to be concerned about the quality of the work submitted for publication and ensure that only studies which have been designed, conducted and reported in a transparent way, honestly and without any deviation from the truth get to be published. Any such trend or deviation from the truth in data collection, analysis, interpretation and publication is called bias. Bias in research can occur either intentionally or unintentionally. Bias causes false conclusions and is potentially misleading. Therefore, it is immoral and unethical to conduct biased research. Every scientist should thus be aware of all potential sources of bias and undertake all possible actions to reduce or minimize the deviation from the truth. This article describes some basic issues related to bias in research.

  • Biomedical Research / standards*
  • Publication Bias*
  • Research Design / standards*

Scientists Develop New Theory to Understand Why Our Perception is Biased

Researchers examined decades of data to create a unifying theory to explain biases in perception.

The University of Texas at Austin

How humans perceive the world around them is a complex dance between input from the various senses, how the brain encodes that information and how it all interacts with previous experiences. What we perceive often is systematically different from reality, leading to what is known as perceptual bias. 

Neuroscientists’ work to understand perceptual bias involves exploring how the brain processes information about different stimuli, including color, movement, size and the number and orientation of objects. Based on decades of experimental data, several insights have emerged about common perception biases, but they sometimes contradict each other. 

Researchers at The University of Texas at Austin and Saarland University in Germany have created a new central theory of perceptual biases that combines decades of data and unifies even contradictory phenomena into a model that can go so far as to predict the biases of individuals. 

The theory is outlined in a paper out this week in Nature Neuroscience .

“Perception is not simply understanding the environment around us as it is, but about how our brains reconstruct the environment around us,” said Xue-Xin Wei, assistant professor of neuroscience and psychology at UT Austin. “This theory allows us to understand how humans see the world and predict how they will see it, as well as how they may behave.”

One example of perceptual bias is that a slightly tilted bar is often perceived as more tilted than it is. People perceive objects at a distance as being smaller than they are. People may perceive colors differently based on the color of objects nearby or colors they were shown previously. 

Bias in perception traces its roots to many sources: previous experiences, irrelevant sensory information (often called sensory noise), how frequently something is observed in the environment and even how our brain penalizes errors in our estimations. The emerging theory accounts for all of these.

“This work has implications not only for basic scientific understanding of perception, but also for mental disorders, as people with certain types of psychiatric conditions have been reported to exhibit different perceptual biases,” Wei said. 

The work is also relevant to social science, in particular neuroeconomics, a field that studies how we make economic decisions. For example, the theory could be applied to better understand how humans perceive the value of an item, for example, when it comes to product design. Understanding how humans perceive the item itself, the biases that may be present and how the brain calculates the value of the item and the behaviors it leads to could influence the price of goods and how products are marketed. 

“We were mainly dealing with the perception of simple stimuli such as color and magnitude in this study, but the same principles can be applied to more complex variables.” Wei said. “For example, we can use a similar approach to study how we perceive emotions, such as happiness or sadness.”

Given the complexities of biases in perceptual decisions, some may wonder: Is there a path forward to reduce or get rid of these biases? 

“According to the theory, the best way to reduce the biases in perceptual decisions is to reduce the noise, or in other words, to gather more data before the decision is made. More information, less biases,” Wei said. 

Michael Hahn of Saarland University in Germany was also an author of the paper. Funding for the research was provided by UT Austin, and computing for the project was provided by Saarland University. 

  • Share on Facebook
  • Share on LinkedIn
  • Show and hide the URL copier tool.
  • Share via email
  • Neuroscience

UT Austin’s Top Research Stories of 2023

December 14, 2023 • by Staff Writer

Announcements

College Announces Newest Inductees to Hall of Honor

August 24, 2023 • by Emily Engelbart

Dopamine Controls Movement, Not Just Rewards

August 3, 2023 • by Staff Writer

Cart

  • SUGGESTED TOPICS
  • The Magazine
  • Newsletters
  • Managing Yourself
  • Managing Teams
  • Work-life Balance
  • The Big Idea
  • Data & Visuals
  • Reading Lists
  • Case Selections
  • HBR Learning
  • Topic Feeds
  • Account Settings
  • Email Preferences

People Probably Like You More Than You Think

  • Erica Boothby,
  • Gus Cooney,
  • Adam Mastroianni,
  • Andrew Reece,
  • Gillian Sandstrom

research articles with bias

A research-backed argument for why we underestimate the impression we make on others.

Do people understand the impressions they make on others or do their anxieties lead them to assume the worst? Across nearly 10 years of research and tens of thousands of observations, the authors have come to this answer: people underestimate how much others like them, and this bias has important implications for how people work together.

Initial conversations can have an outsized impact on how relationships develop over time. Naturally, people often dwell on the impressions they might have made the minute they finish speaking with someone for the first time: “Did they like me or were they just being polite?” “Was my pitch funny or offensive?” “Are they deep in thought or deeply bored?”

  • Erica Boothby is a postdoctoral researcher at the Wharton School at the University of Pennsylvania, where she teaches negotiations. Her research broadly focuses on social connection and the psychological barriers that inhibit connection, with consequences for people’s personal and professional lives. Prior to arriving at the Wharton School, Erica completed her PhD at Yale University and worked at Cornell University’s Behavioral Economics and Decision Research Center.
  • GC Gus Cooney is a social psychologist who studies conversation and social interaction. He teaches Negotiations at the Wharton School.
  • AM Adam Mastroianni is a social psychologist and research scholar at Northwestern University. He teaches negotiations to MBAs and executives and writes the popular science newsletter  Experimental History .
  • AR Andrew Reece is a behavioral data scientist at BetterUp.
  • Gillian Sandstrom is a senior lecturer in psychology at the University of Sussex. Her research focuses on how to make difficult conversations a little easier (e.g., talking about cancer, miscarriage, bereavement) and how to encourage people to talk to strangers.

Partner Center

Identifying Bias

  • What is Bias?
  • Domain names
  • Politics and the Media
  • Research Help

Ask a Librarian

About this guide.

Identifying bias can be tricky because it is not clearly stated. Bias can exist on a spectrum of political ideology, religious views, financial influence, misinformation, and more. All sources should be evaluated for potential bias -- from a tweeted link to a scholarly article. This guide shows different types of bias you might encounter and gives strategies for how to identify biased sources.

Defining Bias

Find the source, find the source.

Find the source of the information you're evaluating. Ask yourself the following questions:

  • Who owns/produces the source?
  • Who advertises in the source? Are the advertisements appropriate for the source?
  • Is there a political slant in the content?
  • Does the content contain all the facts or at least present both sides of an argument fairly?
  • What type of language is being used? Does the author use strong language or hyperbole?
  • Do they back up their argument with factual evidence? Can you see where they got their evidence through links or citations?

To find the answer to these questions, you need to read the text carefully and you may have to do some background/fact-checking research to help determine if the source is reliable or biased.

If you notice the following, the source may be biased:

  • Heavily opinionated or one-sided
  • Relies on unsupported or unsubstantiated claims
  • Presents highly selected facts that lean to a certain outcome
  • Pretends to present facts, but offers only opinion
  • Uses extreme or inappropriate language
  • Tries to persuade you to think a certain way with no regard for factual evidence
  • The author is unidentifiable, lacks expertise, or writes on unrelated topics
  • Is entertainment-based or a form of parody or satire
  • Tries to sell you something in disguise

Types of Bias (Click to expand each section)

Sensationalism, other keywords.

There are some keywords you should keep in mind when you're evaluating for bias:

  • Agenda , n. -- the underlying intentions or motives of a particular person or group
  • Hyperbole , n. -- exaggerated statements or claims not meant to be taken literally
  • Objective , adj. -- (of a person or their judgment) not influenced by personal feelings or opinions in considering and representing facts
  • Parody , n. -- an imitation of the style of a particular writer, artist, or genre with deliberate exaggeration for comic effect
  • Satire , n. -- The use of humor, irony, exaggeration, or ridicule to expose and criticize people's folly and vice
  • Subjective , adj. -- based on or influenced by personal feelings, tastes, or opinions

Helpful Guides

  • Identifying False & Misleading News by Renee Ettinger Last Updated Dec 22, 2023 813 views this year
  • Evaluating Sources of Information by Renee Ettinger Last Updated Jan 16, 2024 1057 views this year
  • Types of Sources by Sarah Bakken Last Updated Dec 22, 2023 397 views this year
  • Scholarly Sources by Sarah Bakken Last Updated Jan 16, 2024 309 views this year

"Definition of Bias in US English." English Oxford Living Dictionaries , OxfordUP, 2019, en.oxforddictionaries.com/definition/us/bias. Accessed 13 May 2019.

"Definition of Agenda in English." English Oxford Living Dictionaries , Oxford UP, 2019, en.oxforddictionaries.com/definition/agenda. Accessed 13 May 2019.

"Definition of Hyperbole in English." English Oxford Living Dictionaries , Oxford UP, 2019, en.oxforddictionaries.com/definition/hyperbole. Accessed 13 May 2019.

"Definition of Objective in English." English Oxford Living Dictionaries , Oxford UP, 2019, en.oxforddictionaries.com/definition/objective. Accessed 13 May 2019.

"Definition of Parody in English." English Oxford Living Dictionaries , Oxford UP, 2019, en.oxforddictionaries.com/definition/parody. Accessed 13 May 2019.

"Definition of Satire in English." English Oxford Living Dictionaries , Oxford UP, 2019, en.oxforddictionaries.com/definition/satire. Accessed 13 May 2019.

"Definition of Subjective in English." English Oxford Living Dictionaries , Oxford UP, 2019, en.oxforddictionaries.com/definition/subjective. Accessed 13 May 2019.

  • Next: Domain names >>
  • Last Updated: Jan 16, 2024 1:29 PM
  • URL: https://libguides.uwgb.edu/bias

Association of nonpharmacological interventions for cognitive function in older adults with mild cognitive impairment: a systematic review and network meta-analysis

  • Published: 06 January 2023
  • Volume 35 , pages 463–478, ( 2023 )

Cite this article

  • Xueyan Liu   ORCID: orcid.org/0000-0001-6228-2822 1 ,
  • Guangpeng Wang   ORCID: orcid.org/0000-0003-1442-0789 2 &
  • Yingjuan Cao   ORCID: orcid.org/0000-0002-3063-304X 3  

1757 Accesses

5 Citations

2 Altmetric

Explore all metrics

Understanding the effectiveness of nonpharmacological interventions to improve cognitive function in older adults with MCI and identifying the best intervention may help inform ideas for future RCT studies and clinical decision-making.

The main focus of this study was to assess the comparative effectiveness of nonpharmacological interventions on cognitive function in older adults with MCI and to rank the interventions.

RCT studies until September 2022 were searched from six databases, including PubMed, the Cochrane Library, Embase, Web of Science, PsycINFO and CINAHL. The risk of bias in eligible trials was evaluated using the Cochrane Risk of Bias tool. Both pairwise and network meta-analyses were used, and pooled effect sizes were reported using SMD and the corresponding 95% confidence intervals.

A total of 28 RCT studies were included in this study, pooling 18 categories of nonpharmacological interventions. MBE (mind–body exercise) (SMD (standard mean difference): 0.24, 95% CI: 0.08–0.41, P  = 0.004), DTE (dual-task exercise) (SMD: 0.61, 95% CI: 0.09–1.13, P  = 0.02), PE (physical exercise) (SMD: 0.58, 95% CI: 0.04–1.12, P  = 0.03) may be effective in improving cognitive function in older adults with MCI. Acupressure + CT (cognitive training) was the top-ranked intervention among all interventions. No greater benefits of MA (mindful awareness) on cognitive function were found.

Conclusions

Overall, nonpharmacological interventions significantly improved cognitive function in older adults with MCI. Acupressure + CT(cognitive training) was the most effective intervention for managing cognitive impairment. Future studies with high quality and large sample size RCT studies are needed to confirm our results.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price includes VAT (Russian Federation)

Instant access to the full article PDF.

Rent this article via DeepDyve

Institutional subscriptions

research articles with bias

Salinas-Rodríguez A, Palazuelos-González R, Rivera-Almaraz A et al (2021) Longitudinal association of sarcopenia and mild cognitive impairment among older Mexican adults. J Cachexia Sarcopenia Muscle 12:1848–1859. https://doi.org/10.1002/jcsm.12787

Article   PubMed   PubMed Central   Google Scholar  

Anderson ND (2019) State of the science on mild cognitive impairment (MCI). CNS Spectr 24:78–87. https://doi.org/10.1017/s1092852918001347

Article   PubMed   Google Scholar  

Dunne RA, Aarsland D, O’Brien JT et al (2021) Mild cognitive impairment: the Manchester consensus. Age Ageing 50:72–80. https://doi.org/10.1093/ageing/afaa228

McWhirter L, Ritchie C, Stone J et al (2020) Functional cognitive disorders: a systematic review. Lancet Psychiatry 7:191–207. https://doi.org/10.1016/s2215-0366(19)30405-5

Gómez-Ramírez J, Ávila-Villanueva M, Fernández-Blázquez M (2020) Selecting the most important self-assessed features for predicting conversion to mild cognitive impairment with random forest and permutation-based methods. Sci Rep 10:20630. https://doi.org/10.1038/s41598-020-77296-4

Article   CAS   PubMed   PubMed Central   Google Scholar  

Burks HB, des Bordes JKA, Chadha R et al (2021) Quality of life assessment in older adults with dementia: a systematic review. Dement Geriatr Cogn Disord 50:103–110. https://doi.org/10.1159/000515317

WHO Guidelines Approved by the Guidelines Review Committee (2019) Risk reduction of cognitive decline and dementia: WHO guidelines. World Health Organization, Geneva

Google Scholar  

Marotta N, Calafiore D, Curci C, et al (2022) Integrating virtual reality and exergaming in cognitive rehabilitation of patients with Parkinson disease: a systematic review of randomized controlled trials. Eur J Phys Rehabil Med. https://doi.org/10.23736/s1973-9087.22.07643-2

Gates NJ, Vernooij RW, Di Nisio M et al (2019) Computerised cognitive training for preventing dementia in people with mild cognitive impairment. Cochrane Database Syst Rev 3:Cd012279. https://doi.org/10.1002/14651858.CD012279.pub2

Law CK, Lam FM, Chung RC et al (2020) Physical exercise attenuates cognitive decline and reduces behavioural problems in people with mild cognitive impairment and dementia: a systematic review. J Physiother 66:9–18. https://doi.org/10.1016/j.jphys.2019.11.014

Biazus-Sehn LF, Schuch FB, Firth J et al (2020) Effects of physical exercise on cognitive function of older adults with mild cognitive impairment: a systematic review and meta-analysis. Arch Gerontol Geriatr 89:104048. https://doi.org/10.1016/j.archger.2020.104048

da Costa BR, Juni P (2014) Systematic reviews and meta-analyses of randomized trials: principles and pitfalls. Eur Heart J 35:3336–3345. https://doi.org/10.1093/eurheartj/ehu424

Dias S, Caldwell DM (2019) Network meta-analysis explained. Arch Dis Child Fetal Neonatal Ed 104:F8–F12. https://doi.org/10.1136/archdischild-2018-315224

Gong LL, Tao FY (2021) The effect of biopsychosocial holistic care models on the cognitive function and quality of life of elderly patients with mild cognitive impairment: a randomized trial. Ann Palliat Med 10:5600–5609. https://doi.org/10.21037/apm-21-966

Lam LC, Chan WC, Leung T et al (2015) Would older adults with mild cognitive impairment adhere to and benefit from a structured lifestyle activity intervention to enhance cognition?: a cluster randomized controlled trial. PLoS ONE 10:e0118173. https://doi.org/10.1371/journal.pone.0118173

Lam LCW, Chau RCM, Wong BML et al (2012) A 1-year randomized controlled trial comparing mind body exercise (Tai Chi) with stretching and toning exercise on cognitive function in older chinese adults at risk of cognitive decline. J Am Med Directors Assoc 13:568.e15-568.e20. https://doi.org/10.1016/j.jamda.2012.03.008

Article   Google Scholar  

Li L, Liu M, Zeng H et al (2021) Multi-component exercise training improves the physical and cognitive function of the elderly with mild cognitive impairment: a six-month randomized controlled trial. Ann Palliat Med 10:8919–8929

Liu CL, Cheng FY, Wei MJ et al (2022) Effects of exergaming-based tai chi on cognitive function and dual-task gait performance in older adults with mild cognitive impairment: a randomized control trial. Front Aging Neurosci 14:761053. https://doi.org/10.3389/fnagi.2022.761053

Sun J, Zeng H, Pan L et al (2021) Acupressure and cognitive training can improve cognitive functions of older adults with mild cognitive impairment: a randomized controlled trial. Front Psychol 12:726083. https://doi.org/10.3389/fpsyg.2021.726083

Sun Y, Li Y, Wang J et al (2021) Effectiveness of smartphone-based mindfulness training on maternal perinatal depression: randomized controlled trial. J Med Internet Res 23:e23410. https://doi.org/10.2196/23410

Wang L, Wu B, Tao H et al (2020) Effects and mediating mechanisms of a structured limbs-exercise program on general cognitive function in older adults with mild cognitive impairment: a randomized controlled trial. Int J Nurs Stud 110:103706. https://doi.org/10.1016/j.ijnurstu.2020.103706

Xu Z, Sun W, Zhang D et al (2021) Comparative effectiveness of non-pharmacological interventions for depressive symptoms in mild cognitive impairment: systematic review with network meta-analysis. Aging Ment Health. https://doi.org/10.1080/13607863.2021.1998356

Xue B, Xiao A, Luo X et al (2021) The effect of a game training intervention on cognitive functioning and depression symptoms in the elderly with mild cognitive impairment: A randomized controlled trial. Int J Methods Psychiatr Res 30:e1887. https://doi.org/10.1002/mpr.1887

Yang QH, Lyu X, Lin QR et al (2022) Effects of a multicomponent intervention to slow mild cognitive impairment progression: a randomized controlled trial. Int J Nurs Stud 125:104110. https://doi.org/10.1016/j.ijnurstu.2021.104110

Young KW, Ng P, Kwok T et al (2017) The effects of holistic health group interventions on improving the cognitive ability of persons with mild cognitive impairment: a randomized controlled trial. Clin Interv Aging 12:1543–1552. https://doi.org/10.2147/cia.S142109

Yu L, Liu F, Nie P et al (2021) Systematic review and meta-analysis of randomized controlled trials assessing the impact of Baduanjin exercise on cognition and memory in patients with mild cognitive impairment. Clin Rehabil 35:492–505. https://doi.org/10.1177/0269215520969661

Bae S, Lee S, Lee S et al (2019) The effect of a multicomponent intervention to promote community activity on cognitive function in older adults with mild cognitive impairment: a randomized controlled trial. Complement Ther Med 42:164–169. https://doi.org/10.1016/j.ctim.2018.11.011

Doi T, Verghese J, Makizako H et al (2017) Effects of cognitive leisure activity on cognition in mild cognitive impairment: results of a randomized controlled trial. J Am Med Dir Assoc 18:686–691. https://doi.org/10.1016/j.jamda.2017.02.013

Suzuki T, Shimada H, Makizako H et al (2013) A randomized controlled trial of multicomponent exercise in older adults with mild cognitive impairment. PLoS ONE 8:e61483. https://doi.org/10.1371/journal.pone.0061483

Suzuki T, Shimada H, Makizako H et al (2012) Effects of multicomponent exercise on cognitive function in older adults with amnestic mild cognitive impairment: a randomized controlled trial. BMC Neurol 12:128. https://doi.org/10.1186/1471-2377-12-128

Choi W, Lee S (2019) Virtual Kayak paddling exercise improves postural balance, muscle performance, and cognitive function in older adults with mild cognitive impairment: a randomized controlled trial. J Aging Phys Act. https://doi.org/10.1123/jap2018-0020

Hong SG, Kim JH, Jun TW (2018) Effects of 12-week resistance exercise on electroencephalogram patterns and cognitive function in the elderly with mild cognitive impairment: a randomized controlled trial. Clin J Sport Med 28:500–508. https://doi.org/10.1097/jsm.0000000000000476

Park H, Park JH, Na HR et al (2019) Combined intervention of physical activity, aerobic exercise, and cognitive exercise intervention to prevent cognitive decline for patients with mild cognitive impairment: a randomized controlled clinical study. J Clin Med 8:940. https://doi.org/10.3390/jcm8070940

Carcelén-Fraile MDC, Llera-DelaTorre AM, Aibar-Almazán A et al (2022) Cognitive stimulation as alternative treatment to improve psychological disorders in patients with mild cognitive impairment. J Clin Med 11:3947. https://doi.org/10.3390/jcm11143947

Gómez-Soria I, Brandín-de la Cruz N, Cuenca Zaldívar JN et al (2022) Effectiveness of personalized cognitive stimulation in older adults with mild possible cognitive impairment: a 12-month follow-up cognitive stimulation in mild cognitive impairment. Clin Gerontol 45:878–890. https://doi.org/10.1080/07317115.2021.1937764

de Oliveira Silva F, Ferreira JV, Plácido J et al (2019) Three months of multimodal training contributes to mobility and executive function in elderly individuals with mild cognitive impairment, but not in those with Alzheimer’s disease: a randomized controlled trial. Maturitas 126:28–33. https://doi.org/10.1016/j.maturitas.2019.04.217

Langoni CDS, Resende TL, Barcellos AB et al (2019) Effect of exercise on cognition, conditioning, muscle endurance, and balance in older adults with mild cognitive impairment: a randomized controlled trial. J Geriatr Phys Ther 42:E15–E22. https://doi.org/10.1519/jpt.0000000000000191

Li F, Harmer P, Fitzgerald K et al (2022) A cognitively enhanced online Tai Ji Quan training intervention for community-dwelling older adults with mild cognitive impairment: a feasibility trial. BMC Geriatr 22:76. https://doi.org/10.1186/s12877-021-02747-0

Masika GM, Yu DSF, Li PWC (2021) Can visual art therapy be implemented with illiterate older adults with mild cognitive impairment? A pilot mixed-method randomized controlled trial. J Geriatr Psychiatry Neurol 34:76–86. https://doi.org/10.1177/0891988720901789

Ghafoor U, Lee JH, Hong KS et al (2019) Effects of acupuncture therapy on MCI patients using functional near-infrared spectroscopy. Front Aging Neurosci 11:237. https://doi.org/10.3389/fnagi.2019.00237

Bademli K, Lok N, Canbaz M et al (2019) Effects of Physical Activity Program on cognitive function and sleep quality in elderly with mild cognitive impairment: a randomized controlled trial. Perspect Psychiatr Care 55:401–408. https://doi.org/10.1111/ppc.12324

Xu Z, Zhang D, Lee ATC et al (2020) A pilot feasibility randomized controlled trial on combining mind-body physical exercise, cognitive training, and nurse-led risk factor modification to reduce cognitive decline among older adults with mild cognitive impairment in primary care. PeerJ 8:e9845. https://doi.org/10.7717/peerj.9845

Doi T, Verghese J, Makizako H et al (2017) Effects of cognitive leisure activity on cognition in mild cognitive impairment: results of a randomized controlled trial. J Am Med Directors Assoc 18:686–691. https://doi.org/10.1016/j.jamda.2017.02.013

Lam LC, Chau RC, Wong BM et al (2011) Interim follow-up of a randomized controlled trial comparing Chinese style mind body (Tai Chi) and stretching exercises on cognitive function in subjects at risk of progressive cognitive decline. Int J Geriatr Psychiatry 26:733–740. https://doi.org/10.1002/gps.2602

Lam LC, Chau RC, Wong BM et al (2012) A 1-year randomized controlled trial comparing mind body exercise (Tai Chi) with stretching and toning exercise on cognitive function in older Chinese adults at risk of cognitive decline. J Am Med Dir Assoc 13:568.e515–520. https://doi.org/10.1016/j.jamda.2012.03.008

Seitidis G, Nikolakopoulos S, Hennessy EA et al (2022) Network meta-analysis techniques for synthesizing prevention science evidence. Prev Sci 23:415–424. https://doi.org/10.1007/s11121-021-01289-6

Article   CAS   PubMed   Google Scholar  

Zeng H, Liu M, Wang P et al (2016) The effects of acupressure training on sleep quality and cognitive function of older adults: a 1-year randomized controlled trial. Res Nurs Health 39:328–336. https://doi.org/10.1002/nur.21738

Chen IH, Yeh TP, Yeh YC et al (2019) Effects of acupressure on sleep quality and psychological distress in nursing home residents: a randomized controlled trial. J Am Med Dir Assoc 20:822–829. https://doi.org/10.1016/j.jamda.2019.01.003

Tseng YT, Chen IH, Lee PH et al (2021) Effects of auricular acupressure on depression and anxiety in older adult residents of long-term care institutions: a randomized clinical trial. Geriatr Nurs 42:205–212. https://doi.org/10.1016/j.gerinurse.2020.08.003

Zokaei N, MacKellar C, Čepukaitytė G et al (2017) Cognitive training in the elderly: bottlenecks and new avenues. J Cogn Neurosci 29:1473–1482. https://doi.org/10.1162/jocn_a_01080

McFadden KL, Healy KM, Dettmann ML et al (2011) Acupressure as a non-pharmacological intervention for traumatic brain injury (TBI). J Neurotrauma 28:21–34. https://doi.org/10.1089/neu.2010.1515

Deng X, Teng J, Nong X et al (2021) Characteristics of TCM constitution and related biomarkers for mild cognitive impairment. Neuropsychiatr Dis Treat 17:1115–1124. https://doi.org/10.2147/ndt.S290692

Smith CA, Collins CT, Levett KM et al (2020) Acupuncture or acupressure for pain management during labour. Cochrane Database Syst Rev 2:Cd009232. https://doi.org/10.1002/14651858.CD009232.pub2

McEwen BS, Sapolsky RM (1995) Stress and cognitive function. Curr Opin Neurobiol 5:205–216. https://doi.org/10.1016/0959-4388(95)80028-x

Law R, Clow A (2020) Stress, the cortisol awakening response and cognitive function. Int Rev Neurobiol 150:187–217. https://doi.org/10.1016/bs.irn.2020.01.001

Gates NJ, Rutjes AW, Di Nisio M et al (2020) Computerised cognitive training for 12 or more weeks for maintaining cognitive function in cognitively healthy people in late life. Cochrane Database Syst Rev 2:Cd012277. https://doi.org/10.1002/14651858.CD012277.pub3

Bahar-Fuchs A, Martyr A, Goh AM et al (2019) Cognitive training for people with mild to moderate dementia. Cochrane Database Syst Rev 3:cd013069. https://doi.org/10.1002/14651858.CD013069.pub2

Li BY, He NY, Qiao Y et al (2019) Computerized cognitive training for Chinese mild cognitive impairment patients: a neuropsychological and fMRI study. Neuroimage Clin 22:101691. https://doi.org/10.1016/j.nicl.2019.101691

Loetscher T (2021) Cognitive training interventions for dementia and mild cognitive impairment in Parkinson’s disease—a cochrane review summary with commentary. NeuroRehabilitation 48:385–387. https://doi.org/10.3233/nre-218001

Sala G, Gobet F (2019) Cognitive training does not enhance general cognition. Trends Cogn Sci 23:9–20. https://doi.org/10.1016/j.tics.2018.10.004

Kang JM, Kim N, Lee SY et al (2021) Effect of cognitive training in fully immersive virtual reality on visuospatial function and frontal-occipital functional connectivity in predementia: randomized controlled trial. J Med Internet Res 23:e24526. https://doi.org/10.2196/24526

Ge S, Zhu Z, Wu B et al (2018) Technology-based cognitive training and rehabilitation interventions for individuals with mild cognitive impairment: a systematic review. BMC Geriatr 18:213. https://doi.org/10.1186/s12877-018-0893-1

Xia R, Wan M, Lin H et al (2022) Effects of mind-body exercise Baduanjin on cognition in community-dwelling older people with mild cognitive impairment: a randomized controlled trial. Neuropsychol Rehabil. https://doi.org/10.1080/09602011.2022.2099909

Hölzel BK, Lazar SW, Gard T et al (2011) how does mindfulness meditation work? Proposing mechanisms of action from a conceptual and neural perspective. Perspect Psychol Sci 6:537–559. https://doi.org/10.1177/1745691611419671

Khine WWT, Voong ML, Ng TKS et al (2020) Mental awareness improved mild cognitive impairment and modulated gut microbiome. Aging (Albany NY) 12:24371–24393. https://doi.org/10.18632/aging.202277

Berk L, van Boxtel M, van Os J (2017) Can mindfulness-based interventions influence cognitive functioning in older adults? A review and considerations for future research. Aging Ment Health 21:1113–1120. https://doi.org/10.1080/13607863.2016.1247423

Whitfield T, Barnhofer T, Acabchuk R et al (2022) The effect of mindfulness-based programs on cognitive function in adults: a systematic review and meta-analysis. Neuropsychol Rev 32:677–702. https://doi.org/10.1007/s11065-021-09519-y

Download references

Author information

Authors and affiliations.

School of Nursing and Rehabilitation, Cheeloo College of Medicine, Shandong University, 44 Wenhuaxi Road, Lixia District, Jinan, Shandong Province, China

Xiangya School of Nursing, Central South University, 172 Tongzipo Road, Yuelu District, Changsha, Hunan Province, China

Guangpeng Wang

Department of Nursing, Qilu Hospital, Shandong University, 107 Wenhuaxi Road, Lixia District, Jinan, Shandong Province, China

Yingjuan Cao

You can also search for this author in PubMed   Google Scholar

Corresponding author

Correspondence to Yingjuan Cao .

Ethics declarations

Conflict of interest.

No conflict of interest to declare.

Ethical approval

This article does not contain any studies with human participants or animals performed by the author.

Informed Consent

For this type of study, formal consent is not required.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 1700 KB)

Rights and permissions.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Liu, X., Wang, G. & Cao, Y. Association of nonpharmacological interventions for cognitive function in older adults with mild cognitive impairment: a systematic review and network meta-analysis. Aging Clin Exp Res 35 , 463–478 (2023). https://doi.org/10.1007/s40520-022-02333-3

Download citation

Received : 09 November 2022

Accepted : 19 December 2022

Published : 06 January 2023

Issue Date : March 2023

DOI : https://doi.org/10.1007/s40520-022-02333-3

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

  • Mild cognitive impairment
  • Nonpharmacological interventions
  • Systematic review
  • Network meta-analysis
  • Find a journal
  • Publish with us
  • Track your research

Thank you for visiting nature.com. You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

  • View all journals
  • My Account Login
  • Explore content
  • About the journal
  • Publish with us
  • Sign up for alerts
  • Open access
  • Published: 14 February 2024

Online images amplify gender bias

  • Douglas Guilbeault   ORCID: orcid.org/0000-0002-0177-3027 1 ,
  • Solène Delecourt 1 ,
  • Tasker Hull 2 ,
  • Bhargav Srinivasa Desikan 3 ,
  • Mark Chu 4 &
  • Ethan Nadler   ORCID: orcid.org/0000-0002-1182-3825 5  

Nature ( 2024 ) Cite this article

15k Accesses

487 Altmetric

Metrics details

  • Communication
  • Human behaviour

Each year, people spend less time reading and more time viewing images 1 , which are proliferating online 2 , 3 , 4 . Images from platforms such as Google and Wikipedia are downloaded by millions every day 2 , 5 , 6 , and millions more are interacting through social media, such as Instagram and TikTok, that primarily consist of exchanging visual content. In parallel, news agencies and digital advertisers are increasingly capturing attention online through the use of images 7 , 8 , which people process more quickly, implicitly and memorably than text 9 , 10 , 11 , 12 . Here we show that the rise of images online significantly exacerbates gender bias, both in its statistical prevalence and its psychological impact. We examine the gender associations of 3,495 social categories (such as ‘nurse’ or ‘banker’) in more than one million images from Google, Wikipedia and Internet Movie Database (IMDb), and in billions of words from these platforms. We find that gender bias is consistently more prevalent in images than text for both female- and male-typed categories. We also show that the documented underrepresentation of women online 13 , 14 , 15 , 16 , 17 , 18 is substantially worse in images than in text, public opinion and US census data. Finally, we conducted a nationally representative, preregistered experiment that shows that googling for images rather than textual descriptions of occupations amplifies gender bias in participants’ beliefs. Addressing the societal effect of this large-scale shift towards visual communication will be essential for developing a fair and inclusive future for the internet.

Images increasingly pervade the information we consume and communicate daily. The number of images in online search engines has leapt from thousands to billions in just two decades 2 . Every day, millions of people view and download images from platforms such as Google and Wikipedia 5 , 6 , and millions more are socializing through hyper-visual platforms such as Instagram, Snapchat and TikTok, which are based predominantly on the exchange of images. This growing trend is widely recognized by the tech and venture capital industries 3 , 4 , as well as by news agencies and advertisers who are now relying more heavily on images to attract people’s attention online 7 , 8 . This trend is also reflected by changes in the habits of the average American. A longitudinal survey from the American Academy of the Arts and Sciences shows that the amount of time Americans spend reading text is steadily declining 1 , whereas the time they spend producing and viewing images continues to rise 2 , 4 . What consequences does this unprecedented shift towards visual content have on how we ‘see’ the world? At the dawn of photography, Frederick Douglass—esteemed writer and civil rights leader—forewarned of the potential for images to reinforce social biases at large, arguing in his 1861 lecture ‘Pictures and Progress’ that “the great cheapness and universality of pictures must exert a powerful though silent influence on the ideas and sentiment of present and future generations” 19 . Since Douglass’ time, the internet has made it only cheaper and easier to circulate images on a massive scale 3 , 4 , potentially intensifying the impact of their silent influence. In this study, we explore the impact of online images on the large-scale spread of gender bias.

Despite the swelling proliferation of online images, most quantitative research into online gender bias focuses on text 13 , 15 , 20 , 21 , 22 . Only a few recent studies examine gender bias in a small sample of Google images 16 , 17 , 18 , without comparing the prevalence of gender bias and its psychological impact across images and text. Yet numerous psychological studies suggest that images may provide an especially potent medium for the transmission of gender bias. Research into the ‘picture superiority effect’ shows that images are often more memorable and emotionally evocative than text 9 , 10 , 23 , and may implicitly underlie the comprehension of text itself 11 , 12 , 24 , 25 . Images also differ from text in the salience with which they present demographic information. A textual description of a person can easily minimize gender bias by leveraging gender-neutral terminology or by omitting references to gender. For example, the sentence ‘The doctor administered the test’ makes no mention of the doctor’s gender. By contrast, an image of a doctor directly transmits demographic cues that elicit perceptions of the doctor’s gender. In this way, images strengthen the salience of gender in the representation of social categories. These intrinsic differences between images and text point to the prediction that online images amplify gender bias, both in its statistical prevalence and in its psychological impact on internet users.

Comparing gender bias in images and text

In this study, we developed computational and experimental techniques for comparing gender bias and its psychological impact across massive online corpora of images and texts. Our main analyses compared images and text data from the world’s most popular search engine, Google. Our findings were replicated using more than half a million images and billions of words from Wikipedia and Internet Movie Database (IMDb) 26 , 27 , 28 (Extended Data Figs. 1 and 2 ; see Supplementary Information sections A.1.1 and A.1.2 for details). We implemented our model at scale by examining the gender biases in images and texts associated with all 3,495 social categories drawn from WordNet, a canonical database of categories in the English language 29 . These categories include occupations—such as doctor, lawyer and carpenter—and generic social roles, such as neighbour, friend and colleague.

To measure gender bias in online images, we automatically retrieved the top 100 images from Google corresponding to each social category in Google Images (Extended Data Fig. 3 ; see ‘Data collection procedure for online images’ in Methods ). Collecting 100 images for 3,495 categories yielded 349,500 images. In the Supplementary Information , we report analyses showing that our results held when we increased the number of images collected for each category, and when we used gender-specific Google searches for each category (for example, female doctor), which yielded an extra 491,169 images (Supplementary Figs. 1 and 2 ). The scale of our image dataset is orders of magnitude larger than prior studies of gender bias in Google Images, which have typically examined 50 occupations or fewer, using only a few thousand images in total 16 , 17 , 18 . Each search was implemented from a fresh Google account with no prior history to avoid the uncontrolled effects of Google’s recommendation algorithm, which customizes results based on browsing history 30 . Searches were run by ten distinct data servers in New York City. All image data were collected in August 2020. Our results were replicated when collecting Google images using Internet Protocols from five further locations around the world: Amsterdam (the Netherlands), Bangalore (India), Frankfurt (Germany), Singapore (Singapore) and Toronto (Canada) (Supplementary Figs. 3 and 4 ).

To identify the gender of faces in each image, we hired a team of 6,392 human coders from Amazon Mechanical Turk (MTurk). The gender of each face was determined by identifying the majority (modal) gender classification selected by three unique coders who labelled faces as ‘female’, ‘male’ or ‘non-binary’ (2% of classification judgements indicated ‘non-binary’; these were excluded from our analyses). Our focus is not on how people self-identify in terms of gender. Rather, we focus on the gender that internet users perceive in online images. We replicated our findings using a canonical image dataset 28 of 72,214 celebrities depicted across IMDb and Wikipedia (511,946 images), where each image is associated with the self-identified gender of the person depicted (Extended Data Fig. 2 and Supplementary Information section A.1.2). All coders were fluent English speakers based in the USA, and our results are robust to controlling for coder demographics and the rate of intercoder agreement (Supplementary Tables 1 and 2 ; see ‘Demographics of human coders’ in Methods ). Coders reached unanimous agreement in their gender classifications for 91% of images. A standard chance-corrected measure of classification agreement (Gwet’s Agreement Coefficient, AC) indicates satisfactory intercoder reliability in our sample (Gwet’s AC1 = 0.48). For each category, we calculated the gender balance of the faces in its top 100 Google Image search results. We normalized this measure such that −1 indicates 100% female representation, 0 indicates perfect gender balance (50%/50%) and 1 indicates 100% male representation.

To measure gender bias in online texts, we leveraged word embedding models that construct a high-dimensional vector space based on the co-occurrence of words (for example, whether two words appear in the same sentence), such that words with similar meanings are closer in this vector space. Harnessing recent advances in natural language processing 22 , 31 , we identified a gender dimension in word embedding models that captures the extent to which each category co-occurs with textual references to either women or men. This method allows us to position each category along a −1 (female) to 1 (male) axis, such that categories closer to −1 are more commonly associated with women and those closer to 1 are more commonly associated with men (see ‘Constructing a gender dimension in word embedding space’ in Methods ). We focus here on applying this method to the canonical word2vec model 32 trained on the 2013 Google News corpus consisting of more than 100 billion words. Our results hold when comparing against our own word2vec model trained on a more recent sample of online news published between 2021 and 2023 (Extended Data Fig. 4 ). We also replicated our findings when comparing online images with a range of word embedding models, including Global Vectors for Word Representation (GloVe), Bidirectional Encoder Representations from Transformers (BERT), FastText, ConceptNet and Generative Pre-trained Transformer 3 (GPT-3), which vary in their dimensionality, their data sources (including Twitter and a random sample of the web) and the time period during which their training data were collected, ranging from 2013 to 2023 (Supplementary Table 3 and Supplementary Fig. 5 ).

Both our image-based and text-based measures capture the frequency with which each social category co-occurs with representations of each gender, along a −1 (female) to 1 (male) continuum, where 0 indicates equal association with each gender. To maximize the correspondence between our image-based and text-based measures, we apply minimum–maximum normalization to our text-based measure, so that −1 and 1 represent the most female and male categories, respectively, according to each method (results are robust to alternative normalization procedures; Supplementary Fig. 6 ). We were able to associate 2,986 social categories in WordNet with word embeddings in the Google News corpus, so we focus our comparisons on these categories (our image results are robust to including all 3,495 categories; Supplementary Fig. 7 ).

Using these measures, we quantify gender bias as a form of statistical bias along three dimensions. First, we examine the extent to which social categories are associated with a specific gender in images and texts. Second, we examine the extent to which women are represented, compared with men, across all social categories in images and texts. Third, we compare the gender associations in our image and text data with the empirical representation of women and men in public opinion and US census data on occupations. This allows us to test not only whether gender bias is statistically stronger in images than texts, but also whether this bias reflects a distorted representation of the empirical distribution of women and men in society.

Gender bias is stronger in images

To begin, we confirm that the gender associations for each social category are highly correlated across online images (Google Images) and texts (Google News) ( P  < 0.0001, r  = 0.5, Fig. 1a , Pearson correlation, two-tailed, n  = 2,986 categories), indicating shared patterns of gender representation across these sources. Yet the gender associations in images from Google Images are statistically more extreme than those in texts from Google News. Figure 1b shows that the magnitude of gender bias is significantly stronger in images than text for both female-skewed ( P  < 0.0001) and male-skewed categories ( P  < 0.0001) (Wilcoxon signed-rank test, n  = 2,986 categories, two-tailed). This result holds when comparing only categories for which the gender associations agree across images, texts and human judgements (Extended Data Fig. 5 ). Figure 1c highlights this gap by showing the gender associations in these images and texts for an illustrative sample of occupations.

figure 1

a , The correlation between gender associations in images from Google Images and texts from Google News for all social categories ( n  = 2,986), organized by deciles. Our image-based measure captures the frequency of female and male faces associated with each category in Google Images (−1 means 100% female; 1 means 100% male). Our text-based measure captures the frequency at which each category is associated with men or women in the Google News corpus (−1 means 100% female; 1 means 100% male; measure is minimum–maximum normalized; ‘Constructing a gender dimension in word embedding space’). Data are shown as mean values, and error bars represent 95% confidence intervals. *** P  = 2.2 × 10 − 16 (Pearson correlation, two-tailed). b , The strength of gender association in these online images and texts for all categories ( n  = 2,986), split into whether these categories are female- or male-skewed according to each measure separately. Box plots show interquartile range (IQR) ±1.5 × IQR. c , The gender associations for a sample of occupations according to these online images and texts; this sample was manually selected to highlight the kinds of social categories and gender biases examined.

Source Data

Yet we also find that, on average, women are underrepresented in images, compared with texts (Fig. 2 ). Figure 2a shows that texts from Google News exhibit a relatively weak bias towards male representation (average bias ( µ ) = 0.03, P  < 0.0001), whereas this male bias is more than four times stronger in images from Google Images ( µ  = 0.14, P  < 0.0001), marking a highly significant increase (mean difference = 0.11, P  < 0.0001) (Wilcoxon signed-rank test, two-tailed, n  = 2,986 categories). According to Google News, 56% of categories are male-skewed, whereas 62% are male-skewed according to Google Images ( P  < 0.0001, proportion test, two-tailed, n  = 2,986 categories). The underrepresentation of women is accentuated when using a deep learning algorithm to classify gender in these online images (Supplementary Figs. 8 – 10 ). This inequality even persists when searching explicitly for ‘female’ and ‘male’ images of each category in Google (Supplementary Figs. 1 and 2 ).

figure 2

a , The distribution of gender associations for social categories ( n  = 2,986) in images from Google Images and texts from Google News. The image-based measure captures the frequency of female and male faces associated with each category in Google Image search results (−1 means 100% female; 1 means 100% male); the text-based measure captures the frequency with which each category is associated with men or women in the Google News corpus (−1 means 100% female; 1 means 100% male associations). The solid lines indicate the average gender association according to text (green) and images (purple). b , The correlation of gender associations, paired at the category level ( n  = 2,986), as measured by these online images and texts, as well as by internet users’ ( n  = 2,500) judgements of each category. Human coders indicated their beliefs about the gender representation of each category by moving a slider along the same −1 (female) to 1 (male) scale (horizontal axis shows the average human judgement across evenly spaced bins). Data points show mean values for each bin, and error bands show 95% confidence intervals for the fitted curve defined by a locally estimated scatterplot smoothing (LOESS)-smoothed regression (span = 0.75). c , The gender association of all matched occupations ( n  = 685) according to (1) textual patterns in Google News (green), (2) the empirical distribution of gender in the 2019 US census Bureau of Labor Statistics (grey) and (3) Google Images (purple). Data are shown as mean values and error bars show 95% confidence intervals calculated using a Student’s t -test (two-tailed).

Our findings continue to hold when controlling for (1) linguistic features of categories, such as ambiguity, word frequency and gender connotation (for example, uncle) (Supplementary Figs. 11 and 12 and Supplementary Table 4 ); (2) the method for constructing the gender dimension in embedding space (Supplementary Figs. 6 and 13 – 15 ); (3) the frequency at which each category is searched in Google Images across the USA (Supplementary Figs. 16 and 17 and Supplementary Table 5 ); (4) the number of faces (Supplementary Fig. 18 ) and images (Supplementary Fig. 19 ) associated with each category, and the number of categories examined (Supplementary Fig. 7 ); (5) the ranking of images in Google search results (Supplementary Fig. 19 and Supplementary Table 7 ); (6) whether faces are automatically cropped from images before they are classified by human annotators (Supplementary Fig. 20 ) or a deep learning classifier (Supplementary Fig. 9 ); (7) whether images repeat in and across searches (Supplementary Table 8 ); (8) the number of faces associated with each Google search (Supplementary Table 7 ); and (9) whether images contain photographed or animated people (Supplementary Table 8 ).

Although these analyses support our prediction that online gender bias is more prevalent in images than texts, an open question is whether online images present a biased representation of the empirical distribution of gender in society. Next, we show that online images exhibit significantly stronger gender bias than public opinion and 2019 US census data on occupations.

To compare our results with public opinion, we hired a separate panel of 2,500 coders from MTurk who used the same −1 (female) to 1 (male) scale to provide their opinions about the gender they most associate with each category in our dataset (see ‘Collecting human judgements of social categories’ in Methods ). Although both our image and text measures are highly predictive of gender associations in public opinion, Fig. 2b shows that texts significantly underestimate male bias in public opinion (by −0.084 on average, P  < 0.001), whereas images significantly overestimate it (by 0.025 on average, P  < 0.001) (Wilcoxon signed-rank test, two-tailed, n  = 2,986 categories).

We also compare our measures with the frequency of genders across occupations according to the 2019 census by the US Bureau of Labor Statistics ( n  = 685 occupations could be matched between our data and the census). Figure 2c shows that, according to texts from Google News, the gender association of these occupations is neutral ( µ  = 0, P  = 0.65) and significantly less male than the census (census µ  = 0.08, P  < 0.001) and Google Images (images µ  = 0.15, P  < 0.001) (Wilcoxon signed-rank test, two-tailed, n  = 685 occupations). By contrast, although these occupations are male-skewed in both the census and Google Images, the same occupations are significantly more biased towards male representation in Google Images (mean difference = 0.07, P  < 0.001, Wilcoxon signed-rank test, two-tailed, n  = 685 occupations). Comparing images and texts separately for female- and male-typed occupations reinforces these findings (Supplementary Fig. 21 ).

Testing psychological effects of images

What consequences do these biases in online images have on internet users? Here we report the results of a preregistered experiment designed to test the impact of online images on gender bias in people’s beliefs (‘Data availability’). In this experiment, we recruited a nationally representative sample of US participants from the online platform Prolific ( n  = 450), who were tasked with using Google to search for descriptions of occupations relating to science, technology and the arts (Extended Data Fig. 6 ; see ‘Participant pool’ in Methods ). A total of 423 participants completed the task. Each participant used Google to retrieve descriptions of 22 randomly selected occupations from a set of 54 (see ‘Participant experience’ in Methods ). Participants were randomized into either (1) the Text condition, in which participants used Google News to search for and upload textual descriptions of these occupations, or (2) the Image condition, in which participants used Google Images to search for and upload images of occupations. After uploading the description for each occupation, each participant was asked to rate which gender they most associate with the occupation being described, using a −1 (female) to 1 (male) scale. To evaluate these experimental effects, participants were also randomized into the Control condition that used the same task design, except that participants used Google to search for and upload either images or textual descriptions of basic, unrelated categories (for example, apple and guitar) before rating the gender they associate with each occupation. In the Supplementary Information , we report the results of an extra condition in which a separate randomized group of participants were tasked with searching for textual descriptions using the generic Google search bar rather than the Google News search bar; altering the search bar had no effect on the outcomes (Supplementary Fig. 22 ). Across all conditions, our main outcome variable of interest is the absolute strength of participants’ gender associations for each occupation.

After completing the search task for all occupations, participants undertook an implicit association test (IAT) 33 , a standard method in psychology for detecting implicit biases (see ‘Measuring implicit bias using the IAT’ in Methods ). We adopted an IAT designed to detect the implicit bias towards associating women with liberal arts and men with science (Extended Data Figs. 7 and 8 ), because prior work demonstrates the ability of this IAT to predict human judgements and behaviours 34 , 35 relating to a consequential pattern of inequality in industry and academic institutions 36 , 37 . We administered the IAT to participants immediately after the experiment, and 3 days later. Participants’ implicit bias was measured using the standard IAT D score 33 ; positive D scores indicate that participants are faster at associating women with liberal arts and men with science. We acknowledge important continuing debate about the reliability of the IAT 38 , 39 , 40 . Our specific choice of IAT is supported by (1) prior work demonstrating its stable results across decades 34 and (2) a separate preregistered study we conducted that yielded consistent results with a similar design ( Methods ). We note, however, that the distribution of participants’ implicit bias scores was less stable across our preregistered studies than the distribution of participants’ explicit bias scores. Given these considerations, we view our implicit bias results as suggestive and emphasize our measure of participants’ explicit bias as our primary and most robust outcome of interest.

We begin by examining the extent of gender bias in the descriptions participants uploaded. A team of annotators labelled each textual description as female, male or neutral on the basis of whether it used female or male pronouns or names to describe the occupation (for example, a description referring to a ‘doctor’ as ‘he’ would be coded as ‘male’); textual descriptions were identified as neutral if they did not ascribe a particular gender to the occupation. Similarly, a team of annotators labelled the gender of the focal face in each uploaded image as female, male or neutral; images were coded as neutral if they contained no face or an undecipherable face. Then, for each occupation, we calculated the gender balance of the descriptions provided by participants by computing the average gender association across all descriptions. This approach compares gender associations across images and texts without relying on word embedding models, while also ensuring that the images and texts being compared were collected by users during the same time period.

Images amplify explicit gender bias

Consistent with our observational results, Fig. 3a shows that the descriptions participants uploaded were significantly more gendered in the Image condition than in the Text condition (mean difference = 0.42, P  < 0.0001, Wilcoxon signed-rank test, two-tailed). Figure 3b shows that exposure to more gendered stimuli in the Image condition led participants to report significantly stronger explicit gender associations than those in the Text (mean difference = 0.06, P  < 0.001) and Control (mean difference = 0.06, P  < 0.001) conditions, whereas there was no significant difference between those in the Text and Control conditions (mean difference = 0.001, P  = 0.56) (Wilcoxon signed-rank test, two-tailed; Wilcoxon equivalence test, P  < 0.05 for all bounds greater than or equal to |0.11|, n  = 54 occupations). For example, participants in the Text condition rated the category ‘model’ as female-skewed ( µ  = −0.32), but the female-skew of this rating nearly doubled in its intensity among participants in the Image condition ( µ  = −0.62). These findings hold when controlling for the number of online sources that participants encountered, the amount of time they spent evaluating descriptions and participants’ gender (Supplementary Fig. 23 and Supplementary Tables 9 – 11 ). Notably, the gender associations in participants’ uploads and self-reported beliefs are highly correlated with the gender associations detected for the same occupations in our observational analyses of Google Images and textual data from Google News (Extended Data Fig. 9 ).

figure 3

a – f , Participants ( n  = 423) from a nationally representative sample were randomized to one of the following: the ‘Image’ condition, in which they googled for images of occupations; the ‘Text’ condition, in which they googled for textual descriptions of occupations from Google News; or the ‘Control’ condition, in which they googled for either image-based or text-based descriptions of random categories (for example, ‘apple’) unrelated to occupations. The green, purple and dotted vertical lines indicate the mean results for the Text, Image and Control conditions, respectively. a , The average absolute strength of the gender associations in participants’ uploads for each occupation ( n  = 54; averaged at the occupation level) in both the Text and Image conditions (not applicable to the Control condition). b , The average absolute strength of the gender associations that participants reported for each occupation ( n  = 54; averaged at the occupation level) in each condition. c , The linear correlation between the average gender association of the descriptions that participants uploaded and the average gender association they explicitly reported for each occupation, coloured by condition. *** P  = 2.2 × 10 −16 (Pearson correlation, two-tailed). d , The correlation between the average strength of the gender association of the descriptions that participants uploaded and the average strength of the gender association they explicitly reported for each occupation, coloured by condition. *** P  = 6.2 × 10 −11 . e , The implicit gender bias ( D score) that participants ( n  = 405) exhibited in each condition. f , The correlation between the strength of participants’ self-reported gender associations for each occupation and their implicit bias ( D score) towards associating women with liberal arts and men with science ( n  = 9,167 observations across all participants). Error bars show 95% confidence intervals calculated using a Student’s t -test (two-tailed).

Images prime gender bias more strongly

These findings suggest that exposure to gendered descriptions in the Image condition more strongly primed participants’ explicit gender ratings of occupations. This priming mechanism is supported by Fig. 3c , which shows a high correlation between the gender associations in the descriptions that participants uploaded and the gender associations in their own explicit gender ratings across occupations ( r  = 0.79, P  < 0.0001), and by Fig. 3d , which shows a strong correlation between the absolute strength of gender associations in participants’ uploads and the absolute strength of the average gender associations they explicitly reported across occupations ( r  = 0.56, P  < 0.0001) (Pearson correlation, two-tailed, n  = 54 occupations). These results hold across occupations for both the Image and the Text conditions (Supplementary Table 12 ).

We found further evidence suggesting that images differ from text not only in the prevalence of gender bias they contain, but also in their ability to prime gender bias in people’s beliefs, holding prevalence constant (Extended Data Fig. 10 ). Participants who uploaded gendered images explicitly reported significantly stronger gender bias ( µ  = 0.41) than those who uploaded gendered textual descriptions of the same occupations ( µ  = 0.35; mean difference = 0.06, P  < 0.0001, t  = 4.58, Student’s t -test, two-tailed, n  = 54 occupations). This holds even when controlling for the amount of gender bias in the distribution of images and texts to which participants were exposed (Supplementary Table 13 ). Thus, even when gender was salient in both text and images, exposure to images led to stronger bias in people’s self-reported beliefs about the gender of occupations.

Images amplify implicit gender bias

Finally, we report suggestive results indicating that extended exposure to online images may have also amplified participants’ implicit gender bias. Participants across all conditions exhibited significant implicit bias towards associating men with science and women with liberal arts ( P  < 0.0001 in all conditions, Wilcoxon signed-rank test, two-tailed, n  = 423). Yet Fig. 3e shows that participants in the Image condition exhibited stronger implicit bias. There was no significant difference between participants’ implicit bias in the Text and Control conditions (mean difference = 0.06, P  = 0.24, Wilcoxon rank-sum test, two-tailed; Wilcoxon equivalence test, P  < 0.05 for all bounds greater than or equal to |0.13|). However, participants in the Image condition exhibited significantly stronger implicit bias than those in the Control condition ( P  = 0.005) (mean difference = 0.11, Wilcoxon rank-sum test, two-tailed). The difference in implicit bias between the Image and Text conditions did not reach conventional statistical significance (mean difference = 0.05, P  = 0.09, Wilcoxon rank-sum test, two-tailed; Wilcoxon equivalence test, P  < 0.05 for bounds greater than or equal to |0.14|). Across conditions, we find a clear correlation between the strength of participants’ self-reported gender associations and the strength of their implicit gender bias, both of which are greater in the Image condition (Fig. 3f ; P  < 0.0001, Jonckheere–Terpstra test = 19,382,281, two-tailed); this result is robust to a range of statistical controls (Supplementary Table 14 ). Notably, only participants in the Image condition exhibited significantly stronger implicit bias than control participants 3 days after the experiment (Supplementary Table 15 ), indicating enduring effects.

The rise of images in popular internet culture may come at a critical social cost. We have found that gender bias online is more prevalent and more psychologically potent in images than text. The growing centrality of visual content in our daily information diets may exacerbate gender bias by magnifying its digital presence and deepening its psychological entrenchment. This problem is expected to affect the well-being of, social status of and economic opportunities for not only women, who are systematically underrepresented in online images, but also men in female-typed categories such as care-oriented occupations 41 , 42 .

Our findings are especially alarming given that image-based social media platforms such as Instagram, Snapchat and TikTok are surging in popularity, accelerating the mass production and circulation of images. In parallel, popular search engines such as Google are increasingly incorporating images into their core functionality, for example, by including images as a default part of text-based searches 43 . Perhaps the apex of these developments is the widespread adoption of text-to-image artificial intelligence (AI) models that allow users to automatically generate images by means of textual prompts, further accelerating the production and circulation of images. Current work identifies salient gender and racial biases in the images that these AI models generate 44 , signalling that they may also intensify the large-scale spread of social biases. Consistent with related studies 45 , our work suggests that gender biases in multimodal AI may stem in part from the fact that they are trained on public images from platforms such as Google and Wikipedia, which are rife with gender bias according to our measures.

A promising direction for future research is to investigate the social and algorithmic processes contributing to bias in online images, pertaining not only to gender, but also to race and other demographic dimensions. The Google images we examine stem from various sources, with the most common source being personal blogs, followed by business, news and stock photo websites (Supplementary Fig. 24 ). The gender bias we observe seems to be driven partly by content that internet users choose to display on their blogs, and also by audiences’ preferences for which news to consume or images to purchase. Our supplementary results regarding celebrities on IMDb and Wikipedia (Extended Data Fig. 2 ) reflect extra contributing factors relating to status dynamics and hiring biases in entertainment media. In all cases, the human preference for familiar, prototypical representations of social categories is likely to play a role in perpetuating these biases 46 , 47 . We further anticipate that the study of online bias will benefit from extending our multimodal framework to analyse other modes of communication, such as audio and video, and to compare human and AI-generated content.

To keep pace with the evolving landscape of bias online, it is important for computational social scientists to expand beyond the analysis of textual data to include other content modalities that offer distinct ways of transmitting cultural information. Indeed, decades of research maintain that images lie at the foundation of human cognition 11 , 12 , 25 , 48 and may have provided the first means of human communication and culture 24 , 49 . It is therefore difficult to imagine how the science of human culture can be complete without a multimodal framework. Exploring the implications of an image-centric social reality for the evolution of human cognition and culture is a ripe direction for future research. Our study identifies one of many implications of this cultural shift concerning the amplification of social bias, stemming from the salient way in which images present demographic information when depicting social categories. Addressing the societal impact of this ascending visual culture will be essential in building a fair and inclusive future for the internet, and developing a multimodal approach to computational social science is a crucial step in this direction.

Here we outline the computational and experimental techniques we use to compare gender bias in online images and texts. We begin by describing the methods of data collection and analyses developed for the observational component of our study. Then we detail the study design deployed in our online search experiment. The preregistration for our online experiment is available at https://osf.io/3jhzx . Note that this study is a successful replication of a previous study with a nearly identical design, except the original study did not include a control condition nor several versions of the text condition; the preregistration of the previous study is available at https://osf.io/26kbr .

Observational methods

Data collection procedure for online images.

Our crowdsourcing methodology consisted of four steps (Extended Data Fig. 1 ). First, we gathered all social categories in WordNet, a canonical lexical database of English. WordNet contained 3,495 social categories, including occupations (such as ‘physicist’) and generic social roles (such as ‘colleague’). Second, we collected the images associated with each category from both Google and Wikipedia. Third, we used Python’s OpenCV—a popular open-source deep learning framework—to extract the faces from each image; this algorithm automatically isolates each face and extracts a square including the entire face and minimal surrounding context. Using OpenCV to extract faces helped us to ensure that each face in each image was separately classified in a standardized manner, and to avoid subjective biases in coders’ decisions for which face to focus on and categorize in each image. Fourth, we hired 6,392 human coders from MTurk to classify the gender of the faces. Following earlier work, each face was classified by three unique annotators 16 , 17 , so that the gender of each face (‘male’ or ‘female’) could be identified based on the majority (modal) gender classification across three coders (we also gave coders the option of labelling the gender of faces as ‘non-binary’, but this option was only chosen in 2% of cases, so we excluded these data from our main analyses and recollected all classifications until each face was associated with three unique coders using either the ‘male’ or the ‘female’ label). Although coders were asked to label the gender of the face presented, our measure is agnostic to which features the coders used to determine their gender classifications; they may have used facial features, as well as features relating to the aesthetics of expressed gender such as hair or accessories. Each search was implemented from a fresh Google account with no prior history. Searches were run in August 2020 by ten distinct data servers in New York City. This study was approved by the Institutional Review Board at the University of California, Berkeley, and all participants provided informed consent.

To collect images from Google, we followed earlier work by retrieving the top 100 images that appeared when using each of the 3,495 categories to search for images using the public Google Images search engine 16 , 17 , 18 (Google provides roughly 100 images for its initial search results). To collect images from Wikipedia, we identified the images associated with each social category in the 2021 Wikipedia-based Image Text Dataset (WIT) 27 . WIT maps all images across Wikipedia to textual descriptions on the basis of the title, content and metadata of the active Wikipedia articles in which they appear. WIT contained images associated with 1,523 social categories from WordNet across all English Wikipedia articles (see Supplementary Information section A.1.1 for details on our Wikipedia analysis). The coders identified 18% of images as not containing a human face; these were removed from our analyses. We also asked all annotators to complete an attention check, which involved choosing the correct answer to the common-sense question “What is the opposite of the word ‘down’?” from the following options: ‘Fish’, ‘Up’, ‘Monk’ and ‘Apple’. We removed the data from all annotators who failed an attention check (15%), and we continued collecting classifications until each image was associated with the judgements of three unique coders, all of whom passed the attention check.

Collecting human judgements of social categories

We hired a separate sample of 2,500 human coders from MTurk to complete a survey study in which they were presented with social categories (five categories per task) and asked to evaluate each category by means of the following question (each category was assessed by 20 unique human coders): “Which gender do you most expect to belong to this category?” This was answered as a scalar with a slider ranging from −1 (females) to 1 (males). All MTurkers were prescreened such that only US-based MTurkers who were fluent in English were invited to participate in this task.

Demographics of human coders

The human coders were all adults based in the USA who were fluent in English. Supplementary Table 1 indicates that our main results are robust to controlling for the demographic composition of our human coders. Among our coders, 44.2% identified as female, 50.6% as male and 3.2% as non-binary; the remainder preferred not to disclose. In terms of age, 42.6% identified as being 18–24 years, 22.9% as 25–34, 32.5% as 35–54, 1.6% as 55–74 and less than 1% as more than 75. In terms of race, 46.8% identified as Caucasian, 11.6% as African American, 17% as Asian, 9% as Hispanic and 10.3% as Native American; the remainder identified as either mixed race or preferred not to disclose. In terms of political ideology, 37.2% identified as conservative, 33.8% as liberal, 20.3% as independent and 3.9% as other; the remainder preferred not to disclose. In terms of annual income, 14.3% reported making less than US$10,000, 33.4% reported US$10,000–50,000, 22.7% reported US$50,000–75,000, 14.9% reported US$75,000–100,000, 10.5% reported US$100,000–150,000, 2.8% reported US$150,000–250,000 and less than 1% reported more than US$250,000; the remainder preferred not to disclose. In terms of the highest level of education acquired by each annotator, 2.7% selected ‘Below High School’, 17.5% selected ‘High School’, 29.2% selected ‘Technical/Community College’, 34.5% selected ‘Undergraduate degree’, 14.8% selected ‘Master’s degree’ and less than 1% selected ‘Doctorate degree’; the remainder preferred not to disclose.

Constructing a gender dimension in word embedding space

Our method for measuring gender associations in text relies on the fact that word embedding models use the frequency of co-occurrence among words in text (for example, whether they occur in the same sentence) to position words in an n -dimensional space, such that words that co-occur together more frequently are represented as closer together in this n -dimensional space. The ‘embedding’ for a given word refers to the specific position of this word in the n -dimensional space constructed by the model. The cosine distance between word embeddings in this vector space provides a robust measure of semantic similarity that is widely used to unpack the cultural meanings associated with categories 13 , 22 , 31 . To construct a gender dimension in word embedding space, we adopt the methodology recently developed by Kozlowski et al. 22 . In their paper, Kozlowski et al. 22 construct a gender dimension in embedding space along which different categories can be positioned (for example, their analysis focuses on types of sport). They start by identifying two clustered regions in word embedding space corresponding to traditional representations of females and males, respectively. Specifically, the female cluster consists of the words ‘woman’, ‘her’, ‘she’, ‘female’ and ‘girl’, and the male cluster consists of the words ‘man’, ‘his’, ‘he’, ‘male’ and ‘boy’. Then, for each of the 3,495 social categories in WordNet, we calculated the average cosine distance between this category and both the female and the male clusters. Each category, therefore, was associated with two numbers: its cosine distance with the female cluster (averaged across its cosine distance with each term in the female cluster), and its cosine distance with the male cluster (averaged across its cosine distance with each term in the male cluster). Taking the difference between a category’s cosine distance with the female and male clusters allowed each category to be positioned along a −1 (female) to 1 (male) scale in embedding space. The category ‘aunt’, for instance, falls close to −1 along this scale, whereas the category ‘uncle’ falls close to 1 along this scale. Of the categories in WordNet, 2,986 of them were associated with embeddings in the 300-dimensional word2vec model of Google News, and could therefore be positioned along this scale. All of our results are robust to using different terms to construct the poles of this gender dimension (Supplementary Fig. 18 ). However, our main analyses use the same gender clusters as ref. 22 .

To compute distances between the vectors of social categories represented by bigrams (such as ‘professional dancer’), we used the Phrases class in the Gensim Python package, which provided a prebuilt function for identifying and calculating distances for bigram embeddings. This method works by identifying an n -dimensional vector of middle positions between the vectors corresponding separately to each word in the bigram (for example, ‘professional’ and ‘dancer’). This technique then treats this middle vector as the singular vector corresponding to the bigram ‘professional dancer’ and is thereby used to calculate distances from other category vectors. This same method was applied to the construction of embeddings for all bigram categories in all models.

To maximize the similarity between our text-based and image-based measures of gender association, we adopted the following three techniques. First, we normalized our textual measure of gender associations using minimum–maximum normalization, which ensured that a compatible range of values was covered by both our text-based and image-based measures of gender association. This is helpful because the distribution of gender associations for the image-based measure stretched to both ends of the −1 to 1 continuum as a result of certain categories being associated with 100% female faces or 100% male faces. By contrast, although the textual measure described above contains a −1 (female) to 1 (male) scale, the most female category in our WordNet sample has a gender association of −0.42 (‘chairwoman’), and the most male category has a gender association of 0.33 (‘guy’). Normalization ensures that the distribution of gender associations in the image- and text-based measures both equally cover the −1 to 1 continuum, so that paired comparisons between these scales (matched at the category level) can directly examine the relative ranking of a category’s gender association in each measure. Minimum–maximum normalization is given by the following equation:

where x i represents the gender association of category x i ([−1,1]), x min represents the category with the lowest gender score, x max represents the category with the highest gender score, and \(\widetilde{{x}_{i}}\) represents the normalized gender association of category x i . To preserve the −1 to 1 scale in applying minimum–maximum normalization, we applied this procedure separately for male-skewed categories (that is, all categories with a gender association above 0), such that x min represents the least male of the male categories and x max represents the most male of the male categories. We applied this same procedure to the female-skewed categories, except that, because the female scale is −1 to 0, x min represents the most female of the female categories and x max represents the least female. For this reason, after the 0–1 female scale was constructed, we multiplied the female scores by −1 so that −1 represented the most female of the female categories and 0 represented the least. We then appended the female-normalized (−1 to 0) and male-normalized (0 to 1) scales. Both the male and female scales before normalization contained categories with values within four decimal points of zero (| x |  <  0.0001), such that this normalization technique had no effect of arbitrarily pushing certain categories towards 0. Instead, the above technique has the advantage of stretching out the text-based measure of gender association to ensure that a substantial fraction of categories reach all the way to the −1 female region and all the way to the 1 male region of the continuum, similar to the distribution of values for the image-based measure.

Experimental methods

Participant pool.

For this experiment, a nationally representative sample of participants ( n  = 600) was recruited from the popular crowdsourcing platform Prolific, which provides a vetted panel of high-quality human participants for online research. No statistical methods were used to determine this sample size. A total of 575 participants completed the task, exhibiting an attrition rate of 4.2%. We only examine data from participants who completed the experiment. Our main results report the outcomes associated with the Image, Text and Control conditions ( n  = 423); in the Supplementary Information , we report the results of an extra version of the Text condition involving the generic Google search bar ( n  = 150; Supplementary Fig. 26 ). We only examine data from participants who completed the task. To recruit a nationally representative sample, we used Prolific’s prescreening functionality designed to provide a nationally representative sample of the USA along the dimensions of sex, age and ethnicity. Participants were invited to partake in the study only if they were based in the USA, fluent English speakers and aged more than 18 years. A total of 50.8% of participants were female (no participants identified as non-binary). All participants provided informed consent before participating. This experiment was run on 5 March 2022.

Participant experience

Extended Data Fig. 2 presents a schematic of the full experimental design. This experiment was approved by the Institutional Review Board at the University of California, Berkeley. In this experiment, participants were randomized to one of four conditions: (1) the Image condition (in which they used the Google Image search engine to retrieve images of occupations), (2) the Google News Text condition (in which they used the Google News search engine, that is, news.google.com, to retrieve textual descriptions of occupations), (3) the Google Neutral Text condition (in which they used the generic Google search engine, that is, google.com, to retrieve textual descriptions of occupations) and (4) the Control condition (in which they were asked at random to use either Google Images or the neutral (standard) Google search engine to retrieve descriptions of random, non-gendered categories, such as ‘apple’). Note that, in the main text, we report the experimental results comparing the Image, Control and Google News Text conditions; we present the results concerning the Google Neutral Text condition as a robustness test in the Supplementary Information (Supplementary Fig. 26 ).

After uploading a description for a given occupation, participants used a −1 (female) to 1 (male) scale to indicate which gender they most associate with this occupation. In this way, the scale participants used to indicate their gender associations was identical to the scale we used to measure gender associations in our observational analyses of online images and text. In the control condition, participants were asked to indicate which gender they associate with a given randomly selected occupation after uploading a description for an unrelated category. Participants in all conditions completed this sequence for 22 unique occupations (randomly sampled from a broader set of 54 occupations). These occupations were selected to include occupations from science, technology, engineering and mathematics, and the liberal arts. Each occupation that was used as a stimulus could also be associated with our observational data concerning the gender associations measured in images from Google Images and the texts of Google News. Here is the full preregistered list of occupations used as stimuli: immunologist, mathematician, harpist, painter, piano player, aeronautical engineer, applied scientist, geneticist, astrophysicist, professional dancer, fashion model, graphic designer, hygienist, educator, intelligence analyst, logician, intelligence agent, financial analyst, chief executive officer, clarinetist, chiropractor, computer expert, intellectual, climatologist, systems analyst, programmer, poet, astronaut, professor, automotive engineer, cardiologist, neurobiologist, English professor, number theorist, marine engineer, bookkeeper, dietician, model, trained nurse, cosmetic surgeon, fashion designer, nurse practitioner, art teacher, singer, interior decorator, media consultant, art student, dressmaker, English teacher, literary agent, social worker, screen actor, editor-in-chief, schoolteacher. The set of occupations that participants evaluated was identical across conditions.

Once each participant completed this task for 22 occupations, they were then asked to complete an IAT designed to measure the implicit bias towards associating men with science and women with liberal arts 33 , 34 , 35 , 38 . The IAT was identical across conditions (‘Measuring implicit bias using the IAT’). In total, the experiment took participants approximately 35 minutes to complete. Participants were compensated at the rate of US $15 per hour for their participation.

Measuring implicit bias using the IAT

The IAT in our experiment was designed using the iatgen tool 33 ( https://iatgen.wordpress.com/ ). The IAT is a psychological research tool for measuring mental associations between target pairs (for example, different races or genders) and a category dimension (for example, positive–negative, science–liberal arts). Rather than measuring what people explicitly believe through self-report, the IAT measures what people mentally associate and how quickly they make these associations. The IAT has the following design (description borrowed from iatgen) 33 : “The IAT consists of seven ‘blocks’ (sets of trials). In each trial, participants see a stimulus word on the screen. Stimuli represent ‘targets’ (for example, insects and flowers) or the category (for example, pleasant–unpleasant). When stimuli appear, the participant ‘sorts’ the stimulus as rapidly as possible by pressing with either their left or right hand on the keyboard (in iatgen, the ‘E’ and ‘I’ keys). The sides with which one should press are indicated in the upper left and right corners of the screen. The response speed is measured in milliseconds.” For example, in some sections of our study, a participant might press with the left hand for all male + science stimuli and with their right hand for all female + liberal arts stimuli.

The theory behind the IAT is that the participant will be fast at sorting in a manner that is consistent with one’s latent associations, which is expected to lead to greater cognitive fluency in one’s intuitive reactions. For example, the expectation is that someone will be faster when sorting flowers + pleasant stimuli with one hand and insects + unpleasant with the other, as this is (most likely) consistent with people’s implicit mental associations (example borrowed from iatgen). Yet, when the category pairings are flipped, people should have to engage in cognitive work to override their mental associations, and the task should be slower. The degree to which one is faster in one section or the other is a measure of one’s implicit bias.

In our study, the target pairs we used were ‘male’ and ‘female’ (corresponding to gender), and the category dimension referred to science–liberal arts. To construct the IAT, we followed the design used by Rezaei 38 . For the male words in the pairs, we used the following terms: man, boy, father, male, grandpa, husband, son, uncle. For the female words in the pairs, we used the following terms: woman, girl, mother, female, grandma, wife, daughter, aunt. For the science category, we used the following words: biology, physics, chemistry, math, geology, astronomy, engineering, medicine, computing, artificial intelligence, statistics. For the liberal arts category, we used the following words: philosophy, humanities, arts, literature, English, music, history, poetry, fashion, film. Extended Data Figs. 3 – 6 illustrate the four main IAT blocks that participants completed (as per standard IAT design, participants were also shown blocks 2, 3 and 4, with the left–right arrangement of targets reversed). Participants completed seven blocks in total, sequentially. The IAT instructions for Extended Data Fig. 3 state, “Place your left and right index fingers on the E and I keys. At the top of the screen are 2 categories. In the task, words and/or images appear in the middle of the screen. When the word/image belongs to the category on the left, press the E key as fast as you can. When it belongs to the category on the right, press the I key as fast as you can. If you make an error, a red X will appear. Correct errors by hitting the other key. Please try to go as fast as you can while making as few errors as possible. When you are ready, please press the [Space] bar to begin.” These instructions are repeated throughout all blocks in the task.

To measure implicit bias based on participants’ reaction times during the IAT, we adopted the following standard approach (used by iatgen). We combined the scores across all four blocks (blocks 3, 4, 6 and 7 in iatgen). Some participants are also faster than others, adding statistical ‘noise’ as a result of variance in overall reaction times. Thus, instead of comparing within-person differences in raw latencies, this difference is standardized at the participant level, dividing the within-person difference by a ‘pooled’ standard deviation. This pooled standard deviation uses the standard deviation of what are called the practice and critical blocks combined. This yields a D score. In iatgen, a positive D value indicates association in the form of target A + positive, target B + negative, which in our case is male + science, female + liberal arts), whereas a negative D value indicates the opposite bias (target A + negative, target B + positive, which in our case is male + liberal arts, female + science), and a zero score indicates no bias.

Our main experimental results evaluate the relationship between the participants’ explicit and implicit gender associations and the strength of gender associations in the Google images and textual descriptions they encountered during the search task. The strength of participants’ explicit gender associations is calculated as the absolute value of the number they input using the −1 (female) to 1 (male) scale after each occupation they classified (Extended Data Fig. 2 ). Participants’ implicit bias is measured by the D score of their results on the IAT designed to detect associations between men and science and women and liberal arts. To measure the strength of gender associations in the Google images that participants encountered, we calculated the gender parity of the faces uploaded across all participants who classified a given occupation. For example, we identified the responses of all participants who provided image search results for the occupation ‘geneticist’, and we constructed the same gender dimensions as described in the main text, such that −1 represents 100% female faces, 0 represents 50% female (male) faces and 1 represents 100% male faces. To identify the gender of the faces of the images that participants uploaded, we recruited a separate panel of MTurk workers ( n  = 500) who classified each face (there were 3,300 images in total). Each face was classified by two unique MTurkers; if they disagreed in their gender assignment, a third MTurk worker was hired to provide a response, and the gender identified by the majority was selected. We adopted an analogous approach to annotating the gender of the textual descriptions that participants uploaded in the text condition. These annotators identified whether each textual or visual description uploaded by participants was female (1), neutral (0) or male (1). Each textual description was coded as male, female or neutral on the basis of whether it used male or female pronouns or names to describe the occupation (for example, referred to a ‘doctor’ as ‘he’); textual descriptions were identified as neutral if they did not ascribe a particular gender to the occupation described. We were then able to calculate the same measure of gender balance in the textual descriptions uploaded for each occupation as we applied in our image analysis.

Reporting summary

Further information on research design is available in the  Nature Portfolio Reporting Summary linked to this article.

Data availability

All data collected for this study are publicly available at https://github.com/drguilbe/ImgVSText . Preregistration for experiment is available at https://osf.io/3jhzx .  Source data are provided with this paper.

Code availability

All code underlying this study is publicly available at https://github.com/drguilbe/ImgVSText .

Time spent reading. American Academy of the Arts and Sciences https://www.amacad.org/humanities-indicators/public-life/time-spent-reading (2019).

Zhang, L. & Rui, Y. Image search—from thousands to billions in 20 years. ACM Trans. Multimedia Comput. Commun. Appl. 9 , 1–20 (2013).

CAS   Google Scholar  

Edwards, J. Planet selfie: we’re now posting a staggering 1.8 billion photos every day. Business Insider https://www.businessinsider.com/were-now-posting-a-staggering-18-billion-photos-to-social-media-every-day-2014-5 (2014).

Meeker, M. Internet Trends 2019 . Technical report (Bond Capital, 2019).

Noble, S. U. Algorithms of Oppression (New York Univ. Press, 2018).

Erickson, K., Rodriguez Perez, F. & Rodriguez Perez, J. What is the commons worth? Estimating the value of Wikimedia imagery by observing downstream use. In Proc. 14th International Symposium on Open Collaboration , 1–6 (ACM, 2018).

Li, Y. & Xie, Y. Is a picture worth a thousand words? An empirical study of image content and social media engagement. J. Mark. Res. 57 , 1–19 (2020).

Article   CAS   Google Scholar  

Collier, J. R., Dunaway, J. & Stroud, N. J. Pathways to deeper news engagement: factors influencing click behaviors on news sites. J. Comput.-Mediat. Commun. 26 , 265–283 (2021).

Article   Google Scholar  

Shepard, R. N. Recognition memory for words, sentences, and pictures. J. Verbal Learn. Verbal Behav. 6 , 156–163 (1967).

Hockley, W. E. The picture superiority effect in associative recognition. Mem. Cognit. 36 , 1351–1359 (2008).

Article   PubMed   Google Scholar  

Kosslyn, S. M. & Moulton, S. T. in Handbook of Imagination and Mental Simulation (eds Markman, K. D. & Klein, W. M. P.) 35–51 (Psychology Press, 2012).

Bergen, B. K. Louder Than Words: The New Science of how the Mind Makes Meaning (Basic Books, 2012).

Caliskan, A., Bryson, J. J. & Narayanan, A. Semantics derived automatically from language corpora contain human-like biases. Science 356 , 183–186 (2017).

Article   CAS   PubMed   ADS   Google Scholar  

Bailey, A. H., Williams, A. & Cimpian, A. Based on billions of words on the internet, people = men. Sci. Adv. 8 , eabm2463 (2022).

Charlesworth, T. E. S., Caliskan, A. & Banaji, M. R. Historical representations of social groups across 200 years of word embeddings from google books. Proc. Natl Acad. Sci. 119 , e2121798119 (2022).

Article   CAS   PubMed   PubMed Central   Google Scholar  

Munson, S. A., Kay, M. & Matuszek, C. Unequal representation and gender stereotypes in image search results for occupations. In CHI ’15: Proc. 33rd Annual ACM Conference on Human Factors in Computing Systems 3819–3828 (ACM, 2015).

Metaxa, D., Gan, M. A., Goh, S., Hancock, J. & Landay, J. A. An image of society: gender and racial representation and impact in image search results for occupations. Proc. ACM Hum. Comput. Interact. 5 , 1–23 (2021).

Vlasceanu, M. & Amodio, D. M. Propagation of societal gender inequality by internet search algorithms. Proc. Natl Acad. Sci. USA 119 , e2204529119 (2022).

Stauffer, J., Trodd, Z., Bernier, C.-M., Gates, H. L. Jr & Morris, K. B. Jr. Picturing Frederick Douglass: An Illustrated Biography of the Nineteenth Century’s Most Photographed American 1st edn, 361–362 (Liveright, 2015).

Garg, N., Schiebinger, L., Jurafsky, D. & Zou, J. Word embeddings quantify 100 years of gender and ethnic stereotypes. Proc. Natl Acad. Sci. USA 115 , E3635–E3644 (2018).

Article   CAS   PubMed   PubMed Central   ADS   Google Scholar  

Jones, J. J., Amin, M. R., Kim, J. & Skiena, S. Stereotypical gender associations in language have decreased over time. Sociol. Sci. 7 , 1–35 (2020).

Kozlowski, A. C., Taddy, M. & Evans, J. A. The geometry of culture: analyzing the meanings of class through word embeddings. Am. Sociol. Rev. 84 , 905–949 (2019).

Kensinger, E. A. & Schacter, D. L. Processing emotional pictures and words: effects of valence and arousal. Cogn. Affect. Behav. Neurosci. 6 , 110–126 (2006).

Barsalou, L. W. Grounded cognition. Annu. Rev. Psychol. 59 , 617–645 (2008).

Fernandino, L., Tong, J.-Q., Conant, L. L., Humphries, C. J. & Binder, J. R. Decoding the information structure underlying the neural representation of concepts. Proc. Natl Acad. Sci. 119 , e2108091119 (2022).

Pennington, J., Socher, R. & Manning, C. D. GloVe: global vectors for word representation. In Proc. 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (eds Moschitti, A. et al.) 1532–1543 (Association for Computational Linguistics, 2014).

Srinivasan, K., Raman, K., Chen, J., Bendersky, M. & Najork, M. WIT: Wikipedia-based image text dataset for multimodal multilingual machine learning. In Proc. 44th International ACM SIGIR Conference on Research and Development in Information Retrieval 2443–2449 (ACM, 2021).

Rothe, R., Timofte, R. & Gool, L. V. Deep expectation of real and apparent age from a single image without facial landmarks. Int. J. Comput. Vision 126 , 144–157 (2018).

Article   MathSciNet   Google Scholar  

Devitt, A. & Vogel, C. The topology of WordNet: some metrics. In Proc. 2nd International Wordnet Conference, GWC 2004 (eds Sojka, P. et al.) 106–111 (Masaryk University, 2004).

Jing, Y. & Baluja, S. Pagerank for product image search. In Proc. 17th International Conference on World Wide Web 307–316 (ACM, 2008).

Grand, G., Blank, I. A., Pereira, F. & Fedorenko, E. Semantic projection recovers rich human knowledge of multiple object features from word embeddings. Nat. Hum. Behav. 6 , 975–987 (2022).

Article   PubMed   PubMed Central   Google Scholar  

Lilleberg, J., Zhu, Y. & Zhang, Y. Support vector machines and word2vec for text classification with semantic features. In 2015 IEEE 14th International Conference on Cognitive Informatics & Cognitive Computing (ICCI* CC) 136–140 (IEEE, 2015).

Carpenter, T. P. et al. Survey-software implicit association tests: a methodological and empirical analysis. Behav. Res. Methods 51 , 2194–2208 (2019).

Charlesworth, T. E. S. & Banaji, M. R. Patterns of implicit and explicit stereotypes iii: longterm change in gender stereotypes. Soc. Psychol. Pers. Sci. 13 , 14–26 (2022).

Smeding, A. Women in science, technology, engineering, and mathematics (stem): an investigation of their implicit gender stereotypes and stereotypes’ connectedness to math performance. Sex Roles 67 , 617–629 (2012).

Nosek, B. A. et al. National differences in gender – science stereotypes predict national sex differences in science and math achievement. Proc. Natl Acad. Sci. USA 106 , 10593–10597 (2009).

Miller, D. I., Eagly, A. H. & Linn, M. C. Women’s representation in science predicts national gender-science stereotypes: evidence from 66 nations. J. Educ. Psychol. 107 , 631 (2015).

Rezaei, A. R. Validity and reliability of the iat: measuring gender and ethnic stereotypes. Comput. Hum. Behav. 27 , 1937–1941 (2011).

Gawronski, B., Ledgerwood, A. & Eastwick, P. W. Implicit bias ≠ bias on implicit measures. Psychol. Inq. 33 , 139–155 (2022).

Melnikoff, D. E. & Kurdi, B. What implicit measures of bias can do. Psychol. Inq. 33 , 185–192 (2022).

Croft, A., Schmader, T. & Block, K. An underexamined inequality: cultural and psychological barriers to men’s engagement with communal roles. Pers. Soc. Psychol. Rev. 19 , 343–370 (2015).

Block, K., Croft, A., De Souza, L. & Schmader, T. Do people care if men don’t care about caring? The asymmetry in support for changing gender roles. J. Exp. Soc. Psychol. 83 , 112–131 (2019).

Visual elements gallery of Google search. Google https://developers.google.com/search/docs/appearance/visual-elements-gallery#text-result-image (last accessed 10 September 2023).

Bianchi, F. et al. Easily accessible text-to-image generation amplifies demographic stereotypes at large scale. In Proc. 2023 ACM Conference on Fairness, Accountability, and Transparency 1493–1504 (ACM, 2023).

Wolfe, R., Yang, Y., Howe, B. & Caliskan, A. Contrastive language-vision ai models pretrained on web-scraped multimodal data exhibit sexual objectification bias. In Proc. 2023 ACM Conference on Fairness, Accountability, and Transparency 1174–1185 (ACM, 2023).

Winkielman, P., Halberstadt, J., Fazendeiro, T. & Catty, S. Prototypes are attractive because they are easy on the mind. Psychol. Sci. 17 , 799–806 (2006).

Kovacs, B. et al. Concepts and Categories: Foundations for Sociological and Cultural Analysis (Columbia Univ. Press, 2019).

DiMaggio, P. Culture and cognition. Ann. Rev. Sociol. 23 , 263–287 (1997).

McNeill, D. How Language Began: Gesture and Speech in Human Evolution (Cambridge Univ. Press, 2012).

Download references

Acknowledgements

We acknowledge members of the MRL (Macro Research Lunch) seminar at the Haas School of Business and the Computational Culture lab, as well as Nina Guilbeault, Nicholas Guilbeault, P. Reginato and R. Lo Sardo for helpful feedback on this project. We thank S. Nanniyur for assistance with data collection. This project was funded by grants from the Fisher Center for Business Analytics; the Center for Equity, Gender and Leadership; and the Barbara and Gerson Bakar Fellowship, each awarded to D.G., and by a grant from the Cora Jane Flood Endowment Fund awarded to S.D., all through the University of California, Berkeley.

Author information

Authors and affiliations.

Haas School of Business, University of California, Berkeley, Berkeley, CA, USA

Douglas Guilbeault & Solène Delecourt

Psiphon Inc., Toronto, Ontario, Canada

Tasker Hull

Institute For Public Policy Research, London, UK

Bhargav Srinivasa Desikan

School of the Arts, Columbia University, New York, NY, USA

Department of Physics, University of Southern California, Los Angeles, CA, USA

Ethan Nadler

You can also search for this author in PubMed   Google Scholar

Contributions

D.G. designed the project. D.G., E.N. and B.S.D. analysed the data. D.G. and S.D. wrote the manuscript. D.G., S.D., T.H., B.S.D., E.N. and M.C. collected the data. D.G., T.H., E.N. and B.S.D. developed the algorithmic methods.

Corresponding author

Correspondence to Douglas Guilbeault .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Peer review

Peer review information.

Nature thanks Yoav Bar-Anan and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended data fig. 1 replication using text and image data from wikipedia..

(A) The absolute strength of gender associations with each category according to textual and visual data from Wikipedia, shown for pre-trained word embedding models of the same Wikipedia dataset trained with different dimensionality 26 . We restrict our analysis to only those categories that (i) were associated with at least 10 faces in the English Wikipedia Image Text (WIT) dataset 27 , and (ii) were also present in pre-trained word embedding models of Wikipedia text data 26 . This yielded 495 categories. Results equally replicate if we examine all 1,244 categories that could be matched across these image and text sources (see section A.1.1 in the supplementary appendix). The image-based measure captures the frequency of male and female faces associated with Wikipedia articles on each category (−1 means 100% female; 1 means 100% male); the text-based measure captures the frequency at which each category is associated with male or female terms in Wikipedia articles (−1 means 100% female; 1 means 100% male associations). Panel A shows that the absolute strength of gender associations is significantly higher in images ( µ  = 0.33), as compared to word embedding models of Wikipedia, regardless of their dimensionality ( p  < 0.0001, Wilcoxon signed-rank test, two-tailed; all word embedding models exhibit an average strength of gender association below 0.1). Panel B shows that 80% of categories are male-skewed according to Wikipedia images ( p  < 0.0001, proportion test, n  = 495, two-tailed), whereas word embedding models of Wikipedia with different dimensionality show, respectively, 57% (50D), 59% (100D), 57.6% (200D), and 54% (300D) of categories as male, all of which present a significantly weaker male-skew than Wikipedia images at the p  < 0.0001 level (proportion test, two-tailed). Error bars show 95% confidence intervals calculating using a single-sample proportion test.

Extended Data Fig. 2 Replication using Wikipedia and IMDb images depicting celebrities, where each image is associated with the self-identified gender of the person depicted.

Comparing the gender associations with the social category “celebrity” across four datasets: the Rothe et al. 28 IMDb-Wikipedia Face Image dataset containing 28 (for which the self-reported gender of each face is known), the 2019 census, the FastText word embedding model of gender associations in Wikipedia, and the GloVe word embedding model of gender associations in Wikipedia (see section A.1.2 of the supplementary appendix for details on this analysis). The Rothe et al. 28 dataset contains 511,946 images, where 452,261 IMDb images depict 19,091 celebrities, and 59,685 Wikipedia images depict 58,904 celebrities. Panel A displays the gender associations of “celebrity” across these datasets (−1 means 100% female associations; 1 means 100% male associations; 0 means equally male and female associations). The gender association of “celebrity” is −0.05 according to the FastText model and −0.08 according to the GloVe model (both weakly female-skewed). Meanwhile, the census indicates that 49% of celebrities are women, resulting in a gender association of 0.02 that fails to be significantly skewed toward a particular gender ( p  = 0.54, Student’s t -test, two-sample, two-tailed). By contrast, the gender association of “celebrity” is 0.57 (0.16) according to Wikipedia (IMDb) images, marking a strong male-skew. Panel B shows the fraction of male faces identified in the IMDb-Wikipedia Face Image dataset, shown separately for Wikipedia and IMDb. 79% (58%) of celebrities depicted over Wikipedia (IMDb) are male, exhibiting a strong male bias in both sources ( p  = 2.2 × 10 −16 , Proportion test, two-sample, two-tailed, for both sources). Bars show the proportion of male faces, and error bars show 95% confidence intervals calculated with a single-sample proportion test (two-tailed).

Extended Data Fig. 3 A schematic diagram of our main data collection methodology.

(A) First, we identify all social categories in the lexical ontology Wordnet; (B) second, we use each social category from Wordnet as a search term and automatically collect the top 100 images associated with each search term in Google Images; then (C) we use the OpenCV machine learning application to automatically crop the faces from each of the images collected from Google Images; and finally (D) we automatically upload each extracted face to Amazon’s Mechanical Turk where it is classified by three unique human coders. The gender classification of each face is identified as the modal (majority) classification across these three coders. If a coder labeled the gender of a face as non-binary, we remove this classification and hire an additional coder to provide a classification to disambiguate. This allowed all faces in our dataset to be assigned a binary gender classification as “female” or “male”.

Extended Data Fig. 4 Replication comparing gender associations in Google Images to those in a word2vec model trained on a recent sample of online news, published between 2021 and 2023.

This recent sample of online news consists of 2,717,000 randomly sampled news articles published in English across various topics between January 2021 and August 2023. These articles were scraped from the following sources: 1,000,000 articles from the BBC; 500,000 from the Huffington Post; 480,000 from CNBC; 400,000 from Bloomberg; 160,000 articles Time Magazine; 150,000 from Techcrunch; and 27,000 from CNN. (A) Displaying the strong, positive correlation between the gender associations of all social categories in WordNet according to the 2013 Google News word2vec model and our word2vec model retrained using this recent online news data ( r  = 0.79, p  = 2.2 × 10 −16 , Pearson correlation, two-tailed). The trend line reflects the linear correlation between these models’ gender associations, with error bands displaying 95% confidence intervals. Each data point corresponds to a distinct category. 2,992 social categories could be matched across these models. (B) Displaying the absolute strength of gender associations across these same social categories according to each word2vec model as compared to our sample of Google Images collected in 2021. There is no significant difference in the strength of gender associations between the 2013 word2vec model ( µ  = 0.22) and our 2023 retrained word2vec model ( µ  = 0.22) ( p  = 0.14, Student’s t -test, two-tailed). However, the strength of gender associations in the Google Image data ( µ  = 0.39) is significantly higher than that of both word2vec models at the p  < 0.0001 level (Student’s t -test, two-tailed). See section A.1.8 of the supplementary appendix for further details on this analysis.

Extended Data Fig. 5 Robustness of results to examining only categories that have consistent gender associations across datasets.

This figure shows the strength of gender association across modalities for categories consistently identified as either female- or male-skewed by images from Google images, texts from Google News, and crowdsourced human judgments ( n  = 1,472 categories). See “Collecting Human Judgments of Social Categories” in the Methods section for details; this analysis leverages the average gender rating for each category averaged across 20 unique human coders (2,500 coders in total). 1,281 categories were consistently identified as male-skewed and 191 categories were consistently identified as female-skewed across these sources. The female (male) categories shown along the vertical axis are those that were associated with women (men) in images, text, and human judgments. The horizontal axis displays the gender associations for the same female (male) skewed categories according to images from Google Images and texts from Google News. Box plots show the interquartile range (IQR) +/− 1.5 X IQR. The strength of gender bias is significantly higher in images than text for female-skewed ( p  = 0.005, t = 2.84, MD = 0.05) and male-skewed ( p  = 2.2 × 10 −16 , t = 27.93, MD = 0.22) categories (Student’s t -test, two-sample, two-tailed, paired at the category level). **, p  < 0.01; ***, p  < 0.001.

Extended Data Fig. 6 Schematic representation of the experimental design.

A nationally representative US sample of participants ( n  = 600) were randomized into one of four conditions: (i) the Google image condition, (ii) the Google News text condition, (iii) the Google Neutral text condition, and (iv) the control condition (in which they were asked at random to use either Google Images or the main Google search engine to retrieve either visual or textual descriptions of random, non-gendered categories, such as guitar or apple ). After uploading a description for a given occupation, participants used a −1 (female) to 1 (male) scale to indicate which gender they most associate with this occupation. In the control condition, participants were asked to indicate which gender they associate with a randomly selected occupation after uploading a description for an unrelated category. Participants in all conditions completed this sequence for 22 unique occupations (randomly sampled from a broader set of 54 occupations). Once each participant completed this task for 22 occupations, they were asked to complete an Implicit Association test (IAT) designed to measure the implicit bias toward associating men with science and women with liberal arts.

Extended Data Fig. 7 Implicit Association Test (IAT) block 1 (A) and 2 (B).

Left panels indicate instructions. Right panels present an example of a word that participants need to assigned to the correct side of the screen. There were seven blocks in total. The instructions displayed are also written in text in the “Measuring Implicit Bias using the IAT” of the Methods.

Extended Data Fig. 8 Implicit Association Test (IAT) block 3 (A) and 4 (B).

Extended data fig. 9 demonstrating that gender associations in our observational data strongly predict gender associations in our experimental data..

The relationship between the gender associations in our observational sample of the top 100 Google images per occupation and (A) the gender associations in the Google images uploaded by participants in our search experiment (***, p  = 1.8 × 10 −10 ; r  = 0.73, Pearson correlation, two-tailed), as well as (B) participants’ self-reported gender associations for each occupation, which they provided after uploading an image from Google (***, p  = 1.5 × 10 −11 ; r  = 0.76, Pearson correlation, two-tailed). The relationship between the gender associations in our observational word embedding measures of Google News and (C) the gender associations in the textual descriptions from Google News uploaded by participants (***, p  = 2.5 × 10 −5 ; r  = 0.54, Pearson correlation, two-tailed), as well as (D) participants’ self-reported gender associations for each occupation, which they provided after uploading a textual description from Google News (***, p  = 7.6 × 10 −13 ; r  = 0.8, Pearson correlation, two-tailed). All results in all panels are averaged at the occupation level, such that 54 data points (occupations) are shown. In all panels: data points show mean values; lines show a standard OLS model fit to the scattered points using only the variables along the vertical and horizontal axis; error bands show 95% confidence intervals. ***, p  < 0.001 (Pearson correlation, two-tailed, for all panels). See section A.2.4 of the supplementary appendix for further details on this analysis.

Extended Data Fig. 10 Gendered images prime gender bias more strongly than gendered texts, holding constant the occupation being described by these images and texts.

Figure displays the average absolute strength of the gender associations that participants reported for each occupation in each condition, while restricting this analysis to only those descriptions that were explicitly gendered as either male or female in the text and image condition. The green (purple) vertical lines indicate average effects for the text (image) condition. n = 2,775 image descriptions; n = 706 text descriptions. Participants in the image condition exhibited significantly stronger biases in the gender associations they reported for occupations ( p  = 5.09 × 10 −6 , t  = 4.58, MD = 0.06, Student’s t -test, two-sample, two-tailed), even when participants’ in the image and text condition both uploaded a gendered description of the same occupation; this result holds when using a linear regression to control for the specific gender and the specific occupation associated with the uploaded description ( β  = 0.05, SE = 0.01, p  = 2.08 × 10 −5 ). These findings indicate that even when gender is salient in both text and image, exposure to images leads to stronger biases in people’s beliefs. See section A.2.6 of the supplementary appendix for further details on this analysis.

Supplementary information

Supplementary information, reporting summary, peer review file, source data, source data fig. 1, source data fig. 2, source data fig. 3, rights and permissions.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

Reprints and permissions

About this article

Cite this article.

Guilbeault, D., Delecourt, S., Hull, T. et al. Online images amplify gender bias. Nature (2024). https://doi.org/10.1038/s41586-024-07068-x

Download citation

Received : 26 October 2021

Accepted : 14 January 2024

Published : 14 February 2024

DOI : https://doi.org/10.1038/s41586-024-07068-x

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

By submitting a comment you agree to abide by our Terms and Community Guidelines . If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.

Quick links

  • Explore articles by subject
  • Guide to authors
  • Editorial policies

Sign up for the Nature Briefing newsletter — what matters in science, free to your inbox daily.

research articles with bias

U.S. flag

An official website of the United States government

The .gov means it’s official. Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

The site is secure. The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

  • Publications
  • Account settings
  • Advanced Search
  • Journal List
  • BMC Psychol

Logo of bmcpsychol

Interventions designed to reduce implicit prejudices and implicit stereotypes in real world contexts: a systematic review

Chloë fitzgerald.

1 iEH2 (Institute for Ethics, History and the Humanities), Faculty of Medicine, University of Geneva, Geneva, Switzerland

Angela Martin

2 Department of Philosophy, University of Fribourg, Fribourg, Switzerland

Delphine Berner

Samia hurst, associated data.

Our full search strategies for each database is available in Additional file 1 so that the search can be accurately reproduced.

Implicit biases are present in the general population and among professionals in various domains, where they can lead to discrimination. Many interventions are used to reduce implicit bias. However, uncertainties remain as to their effectiveness.

We conducted a systematic review by searching ERIC, PUBMED and PSYCHINFO for peer-reviewed studies conducted on adults between May 2005 and April 2015, testing interventions designed to reduce implicit bias, with results measured using the Implicit Association Test (IAT) or sufficiently similar methods.

30 articles were identified as eligible. Some techniques, such as engaging with others’ perspective, appear unfruitful, at least in short term implicit bias reduction, while other techniques, such as exposure to counterstereotypical exemplars, are more promising. Robust data is lacking for many of these interventions.

Conclusions

Caution is thus advised when it comes to programs aiming at reducing biases. This does not weaken the case for implementing widespread structural and institutional changes that are multiply justified.

Electronic supplementary material

The online version of this article (10.1186/s40359-019-0299-7) contains supplementary material, which is available to authorized users.

A standard description of implicit biases is that they are unconscious and/or automatic mental associations made between the members of a social group (or individuals who share a particular characteristic) and one or more attributes (implicit stereotype) or a negative evaluation (implicit prejudice). Implicit prejudices are distinguished from implicit stereotypes in psychology: an implicit prejudice is supposedly a ‘hotter’ generic positive or negative feeling associated with a category, e.g. pleasant/white; an implicit stereotype involves a more belief-like association between a concept that is still valenced, but has fuller descriptive content, and a category, e.g. mentally agile/white. Although the distinction between implicit stereotypes and implicit prejudices is not as clear or necessarily as useful as much of the psychological literature assumes [ 1 ], it is important to track the distinction when analysing empirical findings because it can affect the results substantially. For example, Sabin and colleagues found that paediatricians demonstrated a weak implicit anti-black race prejudice (Cohen’s d = 0.41), but a moderate effect of implicit stereotyping, in which a white patient was more likely associated with medical compliance than a black patient (Cohen’s d = 0.60) [ 2 ].

The term implicit bias is typically used to refer to both implicit stereotypes and implicit prejudices and aims to capture what is most troubling for professionals: the possibility of biased judgement and of the resulting biased behaviour. Psychologists often define bias broadly; for instance, as ‘the negative evaluation of one group and its members relative to another’ [ 3 ]. However, on an alternative definition of bias, not all negative evaluations of groups would count as implicit biases because they are not troubling for our equity concerns. For instance, I might have a negative feeling associated with fans of heavy metal music – a negative implicit prejudice towards them. However, the fans of heavy metal music, as far as we are aware, are not a disadvantaged group, thus this implicit prejudice would not count as an implicit bias on this alternative definition. We thus stipulate that an implicit association (prejudice or stereotype) counts as implicit bias for our purposes only when it is likely to have a negative impact on an already disadvantaged group; e.g. if someone has an implicit stereotype associating young girls with dolls and caring behaviour, this would count as an implicit bias. It does not fit the psychologists’ definition above because it is not a negative evaluation per se, but it is an association that creates a certain image of girls and femininity that can prevent them from excelling in areas that are traditionally considered ‘masculine’ such as mathematics [ 4 ], and in which they already suffer discrimination. An example of an implicit prejudice that counts as a bias on our definition would be an association between negative feelings and homosexual couples - a negative implicit prejudice. This could disadvantage a group that already suffers discrimination and it thus qualifies as an implicit bias.

There has been much recent interest in studying the effects of implicit bias have on behaviour, particularly when that may lead to discrimination in significant areas of life, such as health care, law enforcement, employment, criminal justice, and education. Differing outcomes correlated with race, gender, sexual orientation, nationality, socio-economic status, or age, in these areas are likely to be partly the result of implicit biases, rather than or in addition to explicit prejudice or stereotyping. Given this fact, society has an interest in finding ways to reduce levels of implicit biases among the general population and among professionals who work in these areas in particular.

There is currently a growing awareness of implicit biases, particularly in the English-speaking world, and increasing attempts to counter them in professional settings. However, we found a lack of systematic evaluation of the evidence for the effectiveness of different interventions to reduce implicit bias.

In contrast to the recent study conducted by Forscher et al. [ 5 ], which used a technique new to psychology called network meta-analysis, and examined the effectiveness of procedures to change implicit bias, our focus was solely on the reduction of implicit social prejudice and implicit stereotypes, and only on those interventions that would be applicable in real world contexts and that were tested using the most widely employed implicit measure, the Implicit Association Test (IAT) and similar measures. Forscher et al.’s scope was wider because they investigated all changes in implicit biases of all kinds, admitted studies employing a variety of implicit measures, and did not restrict types of intervention.

Despite an unclear evidence base for their usefulness, interventions and training sessions to reduce implicit bias are being offered in the English-speaking world. Our review was partly prompted by this fact. Interventions that are not designed based on empirical evidence have the potential to do more harm than good. For instance, when people are told to avoid implicit stereotyping it can actually increase their biases [ 6 , 7 ]. Ineffective training sessions may give participants and companies false confidence when in fact the training has had no ameliorative effect. False confidence in this area is particularly problematic because there is evidence that being asked to reflect on instances where one has behaved in an unbiased manner actually increases implicit bias, while reflecting on presumed failures to be unbiased reduces it [ 8 ].

We conducted a systematic review of studies measuring the effects of interventions to reduce implicit biases in adults as measured by the IAT. Interventions had to be fairly easily applicable to real life scenarios, such as workplace or healthcare settings. We concentrated solely on implicit biases because interventions that target explicit biases may leave implicit prejudices and stereotypes intact. Given the wide variety of interventions tested using different methods, a systematic review was more apt than a meta-analysis. This variety in the literature is what prompted Forscher et al. to use a novel form of meta-analysis, called ‘network meta-analysis’, which had never previously been used in psychology.

To this date, the most broadly recognized measure of implicit biases is the IAT. The IAT is usually administered as a computerized task where participants must categorize negatively and positively valenced words together with either images or words, e.g. white faces and black faces for a Race IAT. The tests must be performed as quickly as possible. The relative speed of association of black faces with positively-valenced words (and white faces and negatively-valenced words) is used as an indication of the level of anti-black bias [ 9 ].

Since its creation, the IAT has been subject to analysis and criticism as a measuring tool in the academic world [ 5 , 10 , 11 ] and, more recently, in the wider media [ 12 , 13 ], where its utility as a predictor of real-world behaviour is questioned. Some valid criticisms of the IAT are against unwise uses of it or against interpretations of results obtained with it, rather than against the measure itself. Caution about how to use and interpret the IAT has been advised by its own creators, such as Brian Nosek, who in 2012 warned against using it as a tool to predict individual behaviour, for example [ 14 ]. The fact that it is does not have a high test-retest reliability in the same individual is widely known among researchers who use it. For that reason, it is not useful as a tool to label individuals e.g. as ‘an implicit sexist’ or to predict their individual behaviour. However, the creators of the IAT frequently use it as a tool to compare levels of implicit prejudice/implicit stereotype in different populations and see how this correlates with differences in behaviour [ 15 ].

The results of the IAT are highly context specific, as much research shows [ 16 ]. That does not mean that it has no validity or no connection to behaviour, just that we need more research to better understand exactly what it is measuring and how that relates to behavioural outcomes. Challenges are to be expected when trying to measure a construct that is outside conscious awareness. The connection between all measures of psychological attitudes and behaviour is complex, as is the case with self-report questionnaires, designed to measure explicit attitudes. In fact, implicit attitude tests partly came about as a result of the ineffectiveness of self-report measures to predict behaviour. Even if the most extreme criticisms of the IAT were true and the constructs it measured had very little effect on behaviour, we would expect a virtuous person who finds discrimination based on race abhorrent to be disturbed to discover that she automatically associates a historically oppressed race that still suffers discrimination with negative qualities. Professionals with integrity should thus be concerned to eliminate psychological associations that belie their moral principles.

Our research question was: which interventions have been shown to reduce implicit bias in adults? ERIC, PUBMED, PSYCHINFO were searched for peer reviewed studies published in English between May 2005 and April 2015. Our full search strategies are included in the Additional file  1 .

Study eligibility

Studies were included if they were written in English, participants were either all adults (over 18) or the average age was over 18, and they were published in peer-reviewed journals. We excluded minors because we were interested in interventions that would be applicable in workplaces, thus on adults. The intervention had to be a controlled intentional process conducted with participants in an experimental setting, with the aim of reducing an implicit prejudice or implicit stereotype. We limited our research to social stereotypes and prejudices against people, as opposed to animals, inanimate objects, etc. Prejudices and stereotypes had to involve pre-existing associations thus excluding novel associations. They also had to be against a specific target thus excluding more generalized ‘outgroup prejudice’. An outgroup, in contrast to an ingroup, is any group to which a person does not feel that she belongs, a ‘they’ as opposed to a ‘we’. [ 17 ]

In an optimal experimental design, an implicit pre-test and post-test would be conducted on the same subjects in addition to the inclusion of a control group. However, since this is rarely found in the literature, we included articles where the effect was measured in comparison to a control group with similar characteristics. An advantage of a design using only a control group is that it eliminates any concern about a training effect occurring in participants between performing the IAT pre- and post-test.

The effect of the intervention had to be measured using a valid implicit measure before and after the intervention. In order for results to be comparable, we only included studies employing the most frequently used measure, the IAT, or a measure derived from or conceptually similar to it, such as the SC-IAT (Single Category Implicit Association Test), GNAT (Go/No-go Association Task, BIAT (Brief Implicit Association Test). Paper-based or computer versions of these tests were permitted. The IAT is the most widely used measure, and thus the most criticized and tested measure. We needed to select one implicit measure because different measures, such as affective priming, potentially measure different psychological constructs.

The intervention had to be applicable to real-world contexts and thus of a length and kind that enabled it to be easily implemented in different areas where implicit bias is a potential problem (e.g. medicine, general education, police force, legal professions and judiciary, human resources). The ease of implementation criterion is a matter of judgment, but comparisons can be made with similar types of training, such as sexual harassment training. If the intervention could be adapted to make a programme of similar length to that of current trainings typically provided in these areas, it was deemed suitable. This criterion ruled out observations drawn from natural settings that could potentially be used to develop interventions (e.g. correlations between increased contact with the outgroup and reduced bias). Many articles were excluded on this basis. It also ruled out long-term interventions involving considerable time and emotional commitment from participants. For instance, if an intervention had involved weekly attendance at a course over the course of a year (not simply changes in students’ curricula), we would have excluded it. As it happens, no interventions needed to be excluded for this reason.

We also excluded interventions that were too invasive in a person’s private life or over a person’s bodily autonomy, such as forcing people to make new friends, drink alcohol at work to reduce biases, or direct brain stimulation. There remains a grey zone when it comes to invasiveness that is open to cultural difference (e.g. whether being touched by a person of the outgroup is considered invasive).

The effectiveness of the intervention in reducing levels of implicit bias had to be initially tested within a maximum of one month from the intervention. This did not rule out further testing after this initial test. Since we were interested in interventions that reduce bias, we excluded interventions undertaken with the aim of increasing an implicit prejudice or stereotype.

Study selection

The study selection process is illustrated in Fig.  1 . Three reviewers, Angela Martin (AM), Chloë FitzGerald (CF) and Samia Hurst (SM), reviewed the 1931 titles resulting from the database searches. At least two of the three independently screened each title. Screening involved proposing the rejection of titles if there was a clear indication that the study did not fulfil our inclusion criteria. The titles that were agreed by both reviewers, or in case of uncertainty, by all three reviewers, after discussion to be ineligible according to the inclusion criteria were discarded (1600) and the abstracts of the remaining 331 articles were independently screened by at least two of the three reviewers. Abstracts that were agreed by both reviewers to be ineligible according to the inclusion criteria were discarded (169). When the ineligible abstracts were discarded, the remaining 162 articles were read and independently screened by at least two of the reviewers. After discussion, their decision on whether the article should be included was recorded and reviewed by the third reviewer who had not initially screened the article. SH reviewed the statistical analyses in the remaining 32 studies, which resulted in 2 articles being discarded due to lack of information about the statistical methods used. The final number of eligible articles was 30. However, one of the included articles [ 18 ] was in fact a competition organized to test different interventions created by different authors and thus involved 18 different interventions tested several times. 1

An external file that holds a picture, illustration, etc.
Object name is 40359_2019_299_Fig1_HTML.jpg

Data collection process

We based our inclusion criteria on the published results. If the data and methods used to calculate the results were not available in the article, we did not attempt to contact the authors to obtain this information. CF and AM independently extracted the data from the articles and each reviewed the other’s data when extraction was complete. All disagreements with the information extracted were resolved through discussion.

Identified studies

As shown in Table  1 , there are a total of 30 eligible articles. We have included the 18 interventions designed by different authors as part of a competition, all described in a single article [ 18 ], as separate entries to aid comprehension of the table, thus making a total of 47 different interventions tested. When there are slightly different eligible studies within one article, they are listed separately in the table only when the modifications produced a result that was different from the original study (in terms of being effective or ineffective at reducing bias).

Articles included in systematic review

Titles in bold are interventions from the competition article [ 18 ]

We divided the interventions into 8 categories based on their psychological features. We used as our starting point modified versions of the 6 categories that had been created by the authors of the competition article of 17 interventions [ 18 ] and added two new categories. There are many different ways in which interventions can potentially be classified and we chose to base our categories on the ones already used in the competition article to facilitate discussion within the discipline. These categories are neither exhaustive nor completely exclusive. Our categories of intervention are:

  • Engaging with others’ perspective, consciousness-raising or imagining contact with outgroup – participants either imagine how the outgroup thinks and feels, are made aware of the way the outgroup is marginalised or given new information about the outgroup, or imagine having contact with the outgroup.
  • Identifying the self with the outgroup – participants perform tasks that lessen barriers between themselves and the outgroup.
  • Exposure to counterstereotypical exemplars – participants are exposed to exemplars that contradict the stereotype of the outgroup.
  • Appeals to egalitarian values – participants are encouraged to activate egalitarian goals or think about multiculturalism, co-operation or tolerance.
  • Evaluative conditioning – participants perform tasks to strengthen counterstereotypical associations.
  • Inducing emotion –emotions or moods are induced in participants
  • Intentional strategies to overcome biases – participants are instructed to implement strategies to override or suppress their biases.
  • Drugs – participants take a drug.

Effective interventions were those that showed a reduction in bias in the same individuals after the intervention in a pre−/post-test design, or in the group who underwent the intervention in a control group design. According to our criteria, the post-test had to be completed within a maximum of 1 month from the original intervention, but this did not rule out further tests at later dates.

The most effective categories were: intentional strategies to overcome biases (all 3 interventions were effective); exposure to counterstereotypical exemplars (7 out of 8 interventions had at least one effective instance); identifying the self with the outgroup (6 interventions out of 7 had at least one effective instance); evaluative conditioning (5 out of 5 interventions had at least one effective instance); and inducing emotion (3 out of 4 interventions were effective). The sole study in our drugs category was effective. The appeals to egalitarian values category had 4 interventions that were effective and 4 that were not. The largest category was engaging with others’ perspective, with 11 interventions, but a mere 4 of these were effective.

The number of studies in each category is small, thus strong conclusions cannot be drawn from these results. Patterns indicating clearly which methods were more successful as interventions were not visible. There is an indication that some directions may prove unfruitful, at least in short term bias reduction, such as engaging with others’ perspective, while exposure to counterstereotypical exemplars seems to be the most promising form of intervention, at least in the short term.

The country where studies were conducted was overwhelming the United States – US - (35 interventions), which explains why black/white race was the most examined bias in our review (34 interventions). There were 3 interventions aimed at Middle-Eastern/white bias and one each targeting Latino/white, Arab-Muslim/black and Asian/Anglo bias. Aside from race bias, 3 interventions were tested on weight bias, 2 on sexuality bias, 2 on religion bias, 1 on age bias and 1 on gender bias. 4 interventions were conducted in the United Kingdom (UK), 2 in Australia, 1 in Spain, 1 in the Netherlands, and 4 interventions were conducted in several different countries (including Belgium, Taiwan, Hungary, Italy, Pakistan and New Zealand). There was no clear pattern concerning whether some types of bias were more susceptible to interventions than others, given that the vast majority of articles in our review investigated black/white racial bias.

A majority of studies looked at implicit prejudice. However, 5 articles looked at implicit stereotypes as well as implicit prejudices in their interventions and 3 articles looked only at implicit stereotypes. Of these, only 3 interventions were effective at reducing stereotyping. The stereotypes investigated were the following: fat/lazy versus thin/motivated (3 articles); Dutch/high status versus ethnic minority/low status; Dutch/leader versus ethnic minority/leader (SC-IAT); men/leader versus women/supporter; men/science versus women/humanities; Spanish/active versus Moroccan/restful; white/mental versus black/physical.

Limitations

Of specific studies.

Although we judged all the studies in our review of sufficient quality to be included, the quality of the study design and statistical analysis employed varied greatly. One recurrent problem was the fact that there was often a lack of a proper statistical methods section and statistical tests used were instead reported in the results [ 26 , 28 , 38 ], or even in a footnote [ 46 ]. Some studies described their statistical methods only minimally [ 19 , 25 , 29 , 31 – 33 ].

The paucity of empirically demonstrated effective interventions to reduce implicit bias and the pressure towards publishing positive results [ 48 ] is likely to tempt researchers to analyse data in a way that leads to positive results. The lack of statistical description suggests a risk of this.

An intervention tested by one study, rather than reducing implicit bias, actually increased it [ 34 ]. White participants who performed an intervention where they were embodied by a black avatar displayed greater implicit race bias than those who were embodied by a white avatar.

Of the field

Due to the interdisciplinarity of the subject and variety of fields from which articles proceeded (social psychology, medical ethics, health psychology, neuroscience, education, death studies, LGBT studies, gerontology, counselling, mental health, professional ethics, religious studies, disability studies, obesity studies) there was a lack of uniformity in the way that studies were described. In many cases, neither the titles nor the abstracts were very precise. They sometimes omitted to mention whether they tested implicit or explicit attitudes, a crucial piece of information e.g. [ 25 , 41 ]. The distinction between implicit prejudice and implicit stereotype, which is important in the psychological literature, was also often blurred so that stereotype was cited in the title when the method described using an IAT to test implicit prejudice e.g. [ 41 ]. Methods and measures used were frequently omitted from the abstract, requiring the reader to read the article in full to gain this knowledge e.g. [ 31 ].

Many interventions were tested only on undergraduate psychology students, who are unlikely to be representative of the general population [ 49 ].

As is true in many areas, more replication studies are needed to confirm results. For example, two studies in our review tested a similar intervention, involving participants being embodied by a black avatar; while one found that the intervention actually increased implicit racial prejudice [ 34 ], the other found that it reduced it [ 38 ]. There were important differences between these two studies and the latter was not a replication study. All the interventions that are found to be effective in one study need to be replicated to provide confirmation.

There were some problems related to the indexing of articles: the keywords in PSYCHINFO and PUBMED in this field have changed frequently over the last few years because implicit bias is an emerging field of interest and study. Thus, indexing in databases was somewhat inconsistent making it difficult to capture all relevant articles with keywords. The fact that our search terms differed from those used by Forscher et al. [ 5 ], and that these differences were not all accounted for by differences in research question and inclusion criteria, is a sign of the problematic variations in terminology in the field.

The effects of interventions tend to be tested only over the short term. There were no longitudinal studies in our review. Even if short-term changes in biases are efficient, these changes will not be useful at providing practical solutions to discrimination unless they persist in the long term.

There is a risk that the sorts of stereotypes being studied are likely to be those that people are most aware of, and that stereotypes that are equally or more pernicious may be less visible and thus not be tested for. For instance, social class stereotypes can be hard to identify, especially given that they are not always clearly linked to economic status and that they may vary greatly from culture to culture. Furthermore, the sort of intervention tested is likely to be limited in scope to those that people think will be effective. For example, one philosopher has argued that many researchers are biased against certain effective techniques for reducing biases partly because they seem too mechanical [ 50 ]. The fact that such limited results have been found in the search for effective interventions may be caused by biases in researchers’ thinking.

While there are well-establish general publication biases in favour of positive publications, [ 48 ] we did not find this in our study as many published null results.

While several interventions aimed at reducing implicit biases had at least one instance of demonstrated effectiveness, the sample size was small and we were not able to identify reliable interventions for practical use. Thus, currently the evidence does not indicate a clear path to follow in bias reduction. Intentional strategies to overcome biases, evaluative conditioning, identifying the self with the outgroup, and exposure to counterstereotypical exemplars are categories that merit further research. Furthermore, caution is advised, as our review reveals that many interventions are ineffective; their use at present cannot be described as evidence-based.

As the authors of the competition study point out, the interventions that were successful in their competition had some features in common in reducing black/white race bias: the interventions that linked white people with negativity and black people with positivity were more successful than the ones that only linked black people with positivity; interventions where participants were highly involved, which means that they strongly identified with people in the scenarios that were used, were also successful [ 18 ]. Our category of identifying the self with the outgroup, which included several effective studies, includes this feature of high involvement.

There are similarities between our results and those from the recent network meta-analysis on change in implicit bias conducted by Forscher et al.: they found that procedures that associated sets of concepts, invoked goals or motivations, or taxed people’s mental resources produced the largest positive changes in implicit bias [ 5 ]; two of the categories that were most effective in our review, evaluative conditioning and counterstereotypical exemplars, involve associating sets of concepts, and interventions invoking goals or motivations would be included in our intentional strategies category, which also included effective interventions. Any confirmation between our review and that of Forscher et al. is of note, especially given that we used different search terms, research questions, and inclusion criteria. Forscher et al. also found that studies measuring interventions with the IAT rather than other implicit measures tended to produce larger changes in implicit bias. Overall, they found great variance in the effects of the interventions, which supports our conclusion that current interventions are unreliable. We do not yet know why interventions work in some circumstances and not in others and thus more fine-grained research is needed examining which factors cause an intervention to be effective.

So far, there has been very little research examining long-term changes in implicit attitudes and their effects on behaviour; the recent criticisms of the IAT mentioned in our introduction highlight this. Rather than invalidating the measure, they serve to show which directions future research with the IAT should go. In fact, in a follow-up study conducted by the same researchers as the competition study included in our review, interventions that had been demonstrated to be effective immediately were tested after delays of hours and days and none were found to be effective over these extended time periods [ 51 ].

To some extent, the ineffectiveness of interventions after a longer time period is to be expected. Implicit biases have been partly formed through repeated exposure to associations: their very presence hints at their being not only generated but also maintained by culture. Any counter-actions, even if effective immediately, would then themselves be rapidly countered since participants remain part of their culture from which they receive constant inputs. To tackle this, interventions may need to be repeated frequently or somehow be constructed so that they create durable changes in the habits of participants. More in-depth interventions where participants follow a whole course or interact frequently with the outgroup have been successful [ 51 – 53 ].

Unfortunately, this suggests that interventions of the type most desired by institutions to implement in training, i.e. short, one-shot sessions that can be completed and the requisite diversity boxes ticked, may simply be non-existent. If change is really to be produced, a commitment to more in-depth training is necessary.

In conducting the review, we were aware that interventions to reduce implicit biases were not sufficient to reduce prejudice in the public in general and in professionals in different fields on the long-term. These interventions should only form part of a bigger picture that addresses structural issues, social biases and may include more intensive training that aims to change the culture and society outside institutions in addition to within them [ 54 ]. Programmes in education to address the formation of stereotypes from much earlier on would be one way to effect longer term changes. In terms of addressing workplace culture, it may be worth reflecting on how culture changes are effected in institutions in other instances, such as in the case of medical error management in health care establishments. Affirmative action programmes that increase the numbers of women and minorities in leadership positions is one example of a policy with the potential to change the cultural inputs that foment implicit bias within a workplace.

Another approach that could be effective is to focus on reducing the impact of implicit bias on behaviour rather than reducing the bias itself. Organisational policies and procedures that are designed to increase equity will have an impact on all kinds of bias, including implicit bias. For example, collecting data that monitors equity, such as gender pay gaps, and addressing disparities, or reducing discretion in decision-making.

The majority of studies in our review only looked at effects of interventions on implicit prejudice, without investigating related implicit stereotypes. The lack of investigation into implicit stereotypes is troubling. Implicit prejudice is a measure of generic positive or negative implicit feelings, but it is likely that many behaviours that lead to micro-discriminations and inequalities are linked to specific and fine-grained stereotypes. This is particularly the case with gender stereotypes, as bias towards women is not typically linked to a generic negative feeling towards women, but towards women occupying certain roles that are not stereotypically ‘feminine’. For instance, one study found that only the implicit stereotype linking men with high status occupational roles and women with low status occupational roles predicted implicit and explicit prejudice towards women in authority. Other implicit stereotypes, linking women/home and men/career, or women/supportive and men/agential, lacked this predictive effect [ 55 ]. Only 8 of the articles in our review examined implicit stereotypes, but one of these found that an intervention that was effective at reducing implicit black/white race prejudice was not effective at reducing the implicit stereotype black/physical vs. white/mental [ 39 ]. Hence, it is not only important in the case of gender to investigate the effects of interventions on stereotypes as well as prejudice. The vast majority of studies on race prejudice seem to assume that it is the blanket positive/negative comparison of whites/blacks that needs to be addressed, but it could be the case that interventions will be more effective if they tackle more specific stereotypes.

A possible limitation of the review is that we included interventions that targeted different outgroups, and one may wonder whether interventions tested on one group are really applicable/effective to biases towards other groups. Indeed, if intervention X reduces the bias in group Y, it is by no means certain that same intervention is efficient to reduce bias against group Z. Implicit bias may well be a heterogeneous phenomenon [ 56 ]. On the other hand, an inefficient intervention X on group P may be efficient if tested for some other group or bias. Nonetheless, it is interesting to compare the types of intervention that are tested on different biases and to collect the evidence for interventions against different biases in one place. Often, researchers in a field interested in a particular bias, such as health professionals researching obesity, limit themselves to reading the literature on that bias and from their specific field and thus may overlook much evidence that could be relevant to their research. Furthermore, it may be that different biases require different types of intervention, but this can only be seen clearly if the different literatures are compared.

Current data do not allow the identification of reliably effective interventions to reduce implicit biases. As our systematic review reveals, many interventions have no effect, or may even increase implicit biases. Caution is thus advised when it comes to programs aiming at reducing biases. Much more investigation into the long term effects of possible interventions is needed. The most problematic fine-grained implicit stereotypes need to be identified and a range of specifically-tailored interventions need to be designed to combat the whole gamut of prejudices that are problematic in our societies, not only targeting black/white race prejudice. More research needs to be conducted examining the conditions under which interventions will work and the factors that make them fail.

The fact that there is scarce evidence for particular bias-reducing techniques does not weaken the case for implementing widespread structural and institutional changes that are likely to reduce implicit biases, but that are justified for multiple reasons.

Our advice for future studies in this area can be summarized as follows:

  • Investigate the effect of interventions on implicit stereotypes as well as implicit prejudices
  • Use large sample sizes
  • Pre-register study designs
  • Use key words and titles that will span disciplines
  • Include all relevant study parameters in the title and abstract
  • Include all statistical analyses and data when publishing
  • Include all the details of the study method
  • Investigate the long term effects of interventions
  • Investigate the effects of institutional/organizational changes on implicit biases
  • Test interventions on a wide range of real workforces outside universities

Additional file

Full search strategies. (DOCX 15 kb)

Acknowledgments

We are very grateful to Tobias Brosch for his advice in the planning stage of the review and to Janice Sabin and Jules Holroyd for extremely helpful comments on the manuscript, particularly their suggestions about the importance of focussing on organisational policy to promote equity. We would also like to thank the librarians from the University of Geneva Medical School library and the Psychology section of the Humanities library for their kind help with the initial keyword searches.

The systematic review was funded by a grant from the Swiss National Science Foundation, number 32003B_149407. The funding body approved the proposal for the systematic review as part of a larger project. After approval, they were not involved in the design of the study, nor the collection, analysis and interpretation of data, nor in writing the manuscript.

Availability of data and materials

Abbreviations, authors contributions.

AM initially researched the suitable databases, performed the searches and organized the reviewing of the titles with supervision from CF and SH. AM, CF and SH reviewed the titles as described in the Methods section and SH reviewed the statistical sections. Data was extracted by AM and CF and Table ​ Table1 1 was drafted from this information by DB. DB contributed to the selection of categories of intervention and prompted further discussion regarding the presentation and organization of data. CF drafted the manuscript with major contributions from AM and input from SH. All authors read and approved the final manuscript.

Ethics approval and consent to participate

Not applicable

Consent for publication

Competing interests.

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

1 The title of the study lists 17 interventions, but the authors included a comparison condition, which makes a total of 18 interventions tested for our purposes.

Contributor Information

Chloë FitzGerald, Email: moc.liamg@dlaregztifsnc .

Angela Martin, Email: [email protected] .

Delphine Berner, Email: [email protected] .

Samia Hurst, Email: [email protected] .

FOX News

Harvard professor says ‘all hell broke loose’ when his study found no racial bias in police shootings

A Harvard professor said that "all hell broke loose" and he was forced to go out in public with armed security after he published a study that found no evidence of racial bias in police shootings.  

During a sit-down conversation with Bari Weiss of The Free Press, Harvard Economics Professor Roland Fryer discussed the fallout from a 2016 study he published on racial bias in Houston policing.

The study found that police were more than twice as likely to manhandle, beat or use some other kind of nonfatal force against blacks and Hispanics than against people of other races. However, the data also determined that officers were 23.8 percent less likely to shoot at blacks and 8.5 percent less likely to shoot at Hispanics than they were to shoot at whites.

When Fryer claimed the data showed "no racial differences in officer-involved shootings," he said, "all hell broke loose," and his life was upended.

HOUSE REPUBLICAN SUBPOENAS HARVARD LEADERS FOR 'FAILING TO PRODUCE' SUFFICIENT DOCS IN ANTISEMITISM PROBE

Fryer received the first of many complaints and threats four minutes after publication.

READ ON THE FOX NEWS APP

"You're full of s—t," the sender said.

Fryer said people quickly "lost their minds" and some of his colleagues refused to believe the results after months of asking him not to print the data.

"I had colleagues take me to the side and say, 'Don't publish this. You'll ruin your career,'" Fryer revealed.

The world-renowned economist knew from comments by faculty that he was likely to garner backlash. Fryer admitted that he anticipated the results of the study would be different and would confirm suspicions of racial bias against minorities. When the results found no racial bias, Fryer hired eight new assistants and redid the study. The data came back the same.

After the report was published, Fryer lived under police protection for over a month. He had a seven-day-old daughter at the time and went shopping for diapers.

FETTERMAN SLAMS HARVARD FOR HOSTING PALESTINIAN PROFESSOR WHO BLAMED ISRAEL FOR OCT. 7 HAMAS ATTACKS

"I was going to the grocery store to get diapers with the armed guard. It was crazy. It was really, truly crazy," he said.

Fryer, who became the youngest tenured Black professor at Harvard at age 30, was suspended for two years from the university in 2019 after he allegedly engaged in "unwelcome conduct of a sexual nature. He continues to deny the allegations.

At the time, then-Harvard dean Claudine Gay claimed Fryer's research and conduct with other employees "exhibited a pattern of behavior" that failed to meet expectations within the community.

CLICK TO GET THE FOX NEWS APP

"The totality of these behaviors is a clear violation of institutional norms and a betrayal of the trust," she said.

Gay resigned from her position as Harvard president in early January after widespread plagiarism allegations and criticism of her testimony to Congress, where she failed to fully clarify whether calling for the genocide of Jews violates Harvard's policies against bullying and harassment.

Weiss, referencing Gay in her conversation with Fryer, asked him if he believes in karma.

"I hear it's a motherf---er," he replied.

Harvard did not return Fox News Digital's request for comment, 

Original article source: Harvard professor says ‘all hell broke loose’ when his study found no racial bias in police shootings

Professor of Economics at Harvard University, Roland Fryer speaks during the annual Clinton Global Initiative in New York, New York. ((Photo by Ramin Talaie/Corbis via Getty Images))

IMAGES

  1. 35 Media Bias Examples for Students (2023)

    research articles with bias

  2. Research bias: What it is, Types & Examples

    research articles with bias

  3. 17 Confirmation Bias Examples (2023)

    research articles with bias

  4. Types of Bias in Research.

    research articles with bias

  5. What is Bias?

    research articles with bias

  6. What is Bias?

    research articles with bias

COMMENTS

  1. Identifying and Avoiding Bias in Research

    In research, bias occurs when "systematic error [is] introduced into sampling or testing by selecting or encouraging one outcome or answer over others" 7. Bias can occur at any phase of research, including study design or data collection, as well as in the process of data analysis and publication ( Figure 1 ). Bias is not a dichotomous variable.

  2. Moving towards less biased research

    Bias, perhaps best described as 'any process at any stage of inference which tends to produce results or conclusions that differ systematically from the truth,' can pollute the entire spectrum of research, including its design, analysis, interpretation and reporting. 1 It can taint entire bodies of research as much as it can individual studies. ...

  3. Taking a hard look at our implicit biases

    Few people would readily admit that they're biased when it comes to race, gender, age, class, or nationality. But virtually all of us have such biases, even if we aren't consciously aware of them, according to Mahzarin Banaji, Cabot Professor of Social Ethics in the Department of Psychology, who studies implicit biases.

  4. The good, the bad, and the ugly of implicit bias

    1 , 3 Here we critically explore the impact of such interventions, illuminating the good, the bad, and the ugly of implicit bias and the implications for women in science.

  5. Quantifying and addressing the prevalence and bias of study ...

    39 Citations 62 Altmetric Metrics Abstract Building trust in science and evidence-based decision-making depends heavily on the credibility of studies and their findings. Researchers employ many...

  6. Implicit bias in healthcare: clinical practice, research and decision

    Introduction Bias is the evaluation of something or someone that can be positive or negative, and implicit or unconscious bias is when the person is unaware of their evaluation. 1,2 It is negative implicit bias that is of particular concern within healthcare.

  7. Revisiting Bias in Qualitative Research: Reflections on Its

    What Constitutes Bias in Qualitative Research? Bias—commonly understood to be any influence that provides a distortion in the results of a study ( Polit & Beck, 2014 )—is a term drawn from the quantitative research paradigm.

  8. Best Available Evidence or Truth for the Moment: Bias in Research

    In terms of research, "bias is any trend or deviation from the truth in data collection, data analysis, interpretation and publication which can cause false conclusions" ( Simundic, 2013, p. 12). From this definition it can be determined that bias may occur in any part of the research process.

  9. Reducing bias and improving transparency in medical research: a

    Reducing bias and improving transparency in medical research: a critical overview of the problems, progress and suggested next steps - Stephen H Bradley, Nicholas J DeVito, Kelly E Lloyd, Georgia C Richards, Tanja Rombey, Cole Wayant, Peter J Gill, 2020 Journal of the Royal Society of Medicine 17.3 5-Year Impact Factor: 11.2 JOURNAL HOMEPAGE

  10. Interventions designed to reduce implicit prejudices and implicit

    Implicit biases are present in the general population and among professionals in various domains, where they can lead to discrimination. Many interventions are used to reduce implicit bias. However, uncertainties remain as to their effectiveness. We conducted a systematic review by searching ERIC, PUBMED and PSYCHINFO for peer-reviewed studies conducted on adults between May 2005 and April ...

  11. Types of Bias in Research

    Knowledge Base Research bias Types of Bias in Research | Definition & Examples Research bias results from any deviation from the truth, causing distorted results and wrong conclusions. Bias can occur at any phase of your research, including during data collection, data analysis, interpretation, or publication.

  12. Publication and related biases in health services research: a

    Publication and related biases (including publication bias, time-lag bias, outcome reporting bias and p-hacking) have been well documented in clinical research, but relatively little is known about their presence and extent in health services research (HSR). This paper aims to systematically review evidence concerning publication and related bias in quantitative HSR.

  13. Bias in research

    Bias in research doi: 10.11613/bm.2013.003. University Department of Chemistry, University Hospital Center "Sestre Milosrdnice", Zagreb, Croatia. [email protected] PMID: 23457761 PMCID: PMC3900086 10.11613/bm.2013.003 Abstract By writing scientific articles we communicate science among colleagues and peers.

  14. Bias in research

    The aim of this article is to outline types of 'bias' across research designs, and consider strategies to minimise bias. Evidence-based nursing, defined as the "process by which evidence, nursing theory, and clinical expertise are critically evaluated and considered, in conjunction with patient involvement, to provide the delivery of optimum nursing care,"1 is central to the continued ...

  15. Bias in research

    This article describes some basic issues related to bias in research. Keywords: bias, sampling errors, research design Go to: Introduction Scientific papers are tools for communicating science between colleagues and peers.

  16. Scientists Develop New Theory to Understand Why Our Perception is Biased

    Bias in perception traces its roots to many sources: previous experiences, irrelevant sensory information (often called sensory noise), how frequently something is observed in the environment and even how our brain penalizes errors in our estimations. The emerging theory accounts for all of these. ... Funding for the research was provided by UT ...

  17. Positivity bias in higher education research

    In the analysis that follows, the focus is wholly on positivity bias in higher education research. 3 EVIDENCE 3.1 Methodology. To provide some evidence on positivity bias in higher education research, and to illustrate the possible extent of the problem, a bibliographic approach was selected.

  18. PDF Bias in research

    Bias in research Joanna Smith,1Helen Noble2 The aim of this article is to outline types of 'bias across ' research designs, and consider strategies to minimise bias.

  19. People Probably Like You More Than You Think

    Across nearly 10 years of research and tens of thousands of observations, the authors have come to this answer: people underestimate how much others like them, and this bias has important ...

  20. Confirmation Bias: A Ubiquitous Phenomenon in Many Guises

    Abstract. Confirmation bias, as the term is typically used in the psychological literature, connotes the seeking or interpreting of evidence in ways that are partial to existing beliefs, expectations, or a hypothesis in hand. The author reviews evidence of such a bias in a variety of guises and gives examples of its operation in several ...

  21. What is Bias?

    About this guide Identifying bias can be tricky because it is not clearly stated. Bias can exist on a spectrum of political ideology, religious views, financial influence, misinformation, and more. All sources should be evaluated for potential bias -- from a tweeted link to a scholarly article.

  22. Information bias in health research: definition, pitfalls, and

    Bias can be defined as any systematic error in the design, conduct, or analysis of a study. In health studies, bias can arise from two different sources; the approach adopted for selecting subjects for a study or the approach adopted for collecting or measuring data from a study.

  23. Shared Identity and the Doctor-Patient Relationship

    The research on concordance. ... "Bias is not the same as lack of concordance, but it's related," says Alex Keuroghlian, director of the Division of Education and Training at The Fenway Institute and an HMS associate professor of psychiatry at Massachusetts General Hospital. "We tend to 'otherize' people who don't share our ...

  24. Association of nonpharmacological interventions for cognitive function

    The risk of bias in eligible trials was evaluated using the Cochrane Risk of Bias tool. Both pairwise and network meta-analyses were used, and pooled effect sizes were reported using SMD and the corresponding 95% confidence intervals. A total of 28 RCT studies were included in this study, pooling 18 categories of nonpharmacological ...

  25. Nature

    303 See Other. openresty

  26. Increasing rigor and reducing bias in qualitative research: A document

    Qualitative research methods have traditionally been criticised for lacking rigor, and impressionistic and biased results. Subsequently, as qualitative methods have been increasingly used in social work inquiry, efforts to address these criticisms have also increased.

  27. Interventions designed to reduce implicit prejudices and implicit

    Aside from race bias, 3 interventions were tested on weight bias, 2 on sexuality bias, 2 on religion bias, 1 on age bias and 1 on gender bias. 4 interventions were conducted in the United Kingdom (UK), 2 in Australia, 1 in Spain, 1 in the Netherlands, and 4 interventions were conducted in several different countries (including Belgium, Taiwan ...

  28. Harvard professor says 'all hell broke loose' when his study found no

    A Harvard professor said that "all hell broke loose" and he was forced to go out in public with armed security after he published a study that found no evidence of racial bias in police shootings. ...