Ladies (& Gentlemen), Let’s Kick Some Bias!

Dr. Annabelle Painter, Babylon Health
08 March 2020

How Can We Prevent Gender Bias in Medical AI Technology?

Gender bias in healthcare is a well-recognised issue. From diagnosis to drug development and treatment, the modern healthcare system has been shown to advantage men over women.

The statistics on gender differences in pain management are a poignant example:

  • Women who are in pain wait longer to be prescribed painkillers [1].
  • Pain is more likely to be misdiagnosed as a mental health issue in females [2].
  • There is a delay in diagnosing brain tumours presenting as a headache in women [3].

Historically, women have also been excluded from drug trials [4]. 80% of painkillers have only ever been tested on men, despite the fact that 70% of people with chronic pain are women [5].

Responsibly designed artificial intelligence (AI) and machine learning algorithms have the potential to overcome gender bias in medicine. However, if machine learning methods are implemented without careful thought, they can perpetuate and even accentuate existing biases.

We have already seen evidence of gender bias in the development of AI technology. For example, natural language processing (NLP) has been shown to perform better for male speakers than for female speakers [6]. With the rapid growth of medical AI technology, it is crucial that we identify the risks of gender bias and do all that we can to mitigate them.

How can we develop technology in a way that prevents rather than perpetuates bias? Here are four key principles that can help:

Use Diverse Training Data Sets

Women have been underrepresented in medical trials historically, and yet, they receive treatment based on the conclusions of these studies. In a telling example, when a safety trial for a new ‘female viagra pill’ was conducted in 2015, 92% of the study participants were men [7].

The same pattern could be perpetuated in AI models if we do not ensure that data from women are adequately represented in training sets. Women go through significant hormonal and physiological changes during their lives, including menstrual cycle changes, pregnancy, and menopause. It is important that women at all life stages are represented in data sets to make sure that we can effectively assess women of all ages.
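One way to make this check concrete is to measure how much of the training set each group contributes before any model is trained. The sketch below is illustrative only: the records, the group names, the list of expected groups, and the 15% threshold are all assumptions made up for the example, not values from any real data set.

```python
from collections import Counter

# Hypothetical training records as (sex, life_stage) pairs, invented for illustration.
records = [
    ("female", "premenopausal"),
    ("female", "pregnant"),
    ("female", "postmenopausal"),
    ("male", "adult"),
    ("male", "adult"),
    ("male", "adult"),
    ("male", "adult"),
    ("male", "adult"),
]

# Groups we expect to see in the data (an assumption; adapt to the use case).
expected_groups = [
    ("female", "premenopausal"),
    ("female", "pregnant"),
    ("female", "perimenopausal"),
    ("female", "postmenopausal"),
    ("male", "adult"),
]


def group_shares(rows):
    """Return each observed group's share of the data set (0.0 to 1.0)."""
    counts = Counter(rows)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def underrepresented(rows, groups, min_share):
    """List expected groups whose share is below min_share, including absent ones."""
    shares = group_shares(rows)
    return [g for g in groups if shares.get(g, 0.0) < min_share]


shares = group_shares(records)
flagged = underrepresented(records, expected_groups, min_share=0.15)

for group in expected_groups:
    print(group, f"{shares.get(group, 0.0):.1%}")
print("Below threshold:", flagged)
```

Checking against a list of expected groups, rather than only the groups that happen to appear, means a life stage that is missing from the data altogether is flagged too.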

Look Out for Labelling Bias

Supervised learning is a common AI technique in healthcare. This method uses ‘labelled’ data sets, where the inputs (e.g. the patient’s symptoms) and the target labels (e.g. the disease we want the AI to detect) are already known.

It is important that we think carefully about where these labels come from. Some medical labels are a ‘ground truth’ or ‘gold standard’ (e.g. a laboratory or biopsy-proven diagnosis), but others are based on a doctor’s clinical judgement. If a label has come from a human decision, it could be vulnerable to human cognitive biases. If we assume that these labels are accurate, then we may end up incorporating the cognitive biases of doctors into the model.

Cognitive biases such as availability bias, overconfidence, and confirmation bias are well known to affect doctors’ diagnoses [8]. It is important that these factors are carefully considered when selecting a labelled medical data set to limit the risk of incorporating existing biases into AI models.
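A simple first audit is to record where each label came from and compare, per patient group, how many labels rest on clinical judgement rather than an objective test. The following is a minimal sketch under stated assumptions: the `label_source` field, its values, and the example rows are hypothetical and would need to match however provenance is actually recorded.

```python
from collections import defaultdict

# Hypothetical labelled examples. "label_source" is an assumed field recording
# whether a label came from an objective test or from a clinician's judgement.
examples = [
    {"gender": "female", "label": "anxiety", "label_source": "clinician"},
    {"gender": "female", "label": "migraine", "label_source": "clinician"},
    {"gender": "female", "label": "anxiety", "label_source": "clinician"},
    {"gender": "female", "label": "anaemia", "label_source": "lab"},
    {"gender": "male", "label": "migraine", "label_source": "clinician"},
    {"gender": "male", "label": "anaemia", "label_source": "lab"},
    {"gender": "male", "label": "tumour", "label_source": "biopsy"},
    {"gender": "male", "label": "tumour", "label_source": "biopsy"},
]

# Sources treated as close to ground truth (an assumption for this sketch).
OBJECTIVE_SOURCES = {"lab", "biopsy"}


def judgement_share_by_group(rows, group_key="gender"):
    """Return, per group, the fraction of labels that rest on clinical judgement."""
    totals = defaultdict(int)
    judged = defaultdict(int)
    for row in rows:
        group = row[group_key]
        totals[group] += 1
        if row["label_source"] not in OBJECTIVE_SOURCES:
            judged[group] += 1
    return {group: judged[group] / totals[group] for group in totals}


judgement_shares = judgement_share_by_group(examples)
for group, share in sorted(judgement_shares.items()):
    print(f"{group}: {share:.0%} of labels based on clinical judgement")
```

A large gap between groups does not prove the labels are biased, but it shows where a closer review of the labelling process is most needed.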

Test Technology across Different Patient Groups

Once we have trained a model, it should be tested separately on different demographic groups (based on age, gender, race or other factors), to identify whether any subgroups are being treated differently.

If a particular population group is inadequately represented in the training data set or has been labelled incorrectly, this could lead to misleading results for this group. It is therefore important that subgroups are assessed independently to identify weaknesses in the model.
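As an illustration of assessing subgroups independently, the sketch below computes sensitivity (the share of true disease cases the model detects) for each group and for the data set as a whole. The results are invented for the example, and sensitivity is only one of several metrics that could be compared in this way.

```python
from collections import defaultdict

# Hypothetical test results as (group, true_label, predicted_label),
# where 1 means the disease is present. Invented for illustration.
results = [
    ("female", 1, 1), ("female", 1, 1), ("female", 1, 0), ("female", 1, 0),
    ("female", 0, 0), ("female", 0, 0),
    ("male", 1, 1), ("male", 1, 1), ("male", 1, 1), ("male", 1, 0),
    ("male", 0, 0), ("male", 0, 1),
]


def sensitivity_by_group(rows):
    """Return sensitivity = TP / (TP + FN) for each group with positive cases.

    Groups with no true positive cases are omitted, because sensitivity
    is undefined for them.
    """
    true_pos = defaultdict(int)
    false_neg = defaultdict(int)
    for group, truth, predicted in rows:
        if truth == 1:
            if predicted == 1:
                true_pos[group] += 1
            else:
                false_neg[group] += 1
    groups = set(true_pos) | set(false_neg)
    return {g: true_pos[g] / (true_pos[g] + false_neg[g]) for g in groups}


by_group = sensitivity_by_group(results)
# Pool every row into a single group to get the overall figure.
overall = sensitivity_by_group([("all", t, p) for _, t, p in results])["all"]

print(f"overall: {overall:.1%}")
for group, value in sorted(by_group.items()):
    print(f"{group}: {value:.1%}")
```

In this made-up data the overall figure sits between the two groups, so reporting it alone would hide the fact that the model misses half of the cases in women.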

Involve Women in Technology Development

A gender-balanced team, at all seniority levels, helps to minimise the influence of societal biases and norms on patients’ health outcomes. Gender disparities in the STEM workforce are well documented and are particularly notable within AI [9].

Beyond the male and female gender norms, people with other gender identities, such as non-binary or genderfluid people, must also be considered as part of this discussion.

1. Chen EH, Shofer FS, Dean AJ, Hollander JE, Baxt WG, Robey JL, Sease KL, Mills AM. Gender disparity in analgesic treatment of emergency department patients with acute abdominal pain. Acad Emerg Med. 2008 May;15(5):414-8. doi: 10.1111/j.1553-2712.2008.00100.x. PMID: 18439195
2. Shapiro AP, Teasell RW. Misdiagnosis of chronic pain as hysteria and malingering. Current Review of Pain. 1998;2:19–28.
3. Din NU, Ukoumunne OC, Rubin G, Hamilton W, Carter B, Stapley S, Neal RD. Age and Gender Variations in Cancer Diagnostic Intervals in 15 Cancers: Analysis of Data from the UK Clinical Practice Research Datalink. PLoS One. 2015 May 15;10(5):e0127717. doi: 10.1371/journal.pone.0127717. eCollection 2015. PMID: 25978414
4. Foulkes MA. After inclusion, information and inference: reporting on clinical trials results after 15 years of monitoring inclusion of women. Journal of Women’s Health. 2011 Jun;20(6):829–36. doi: 10.1089/jwh.2010.2527. PMID: 21671773
8. Saposnik G, Redelmeier D, Ruff CC, Tobler PN. Cognitive biases associated with medical decisions: a systematic review. BMC Med Inform Decis Mak. 2016 Nov 3;16(1):138. doi: 10.1186/s12911-016-0377-1. PMID: 27809908; PMCID: PMC5093937