Pride and AI: Fairness in Generative AI
On reducing biases in AI models, the importance of inclusive AI in healthcare, and how to conduct a fairness assessment. Special post for Pride month.
Some of the people close to me are part of the LGBTQ+ community. I support diversity and inclusion, and consider myself a strong ally of the LGBTQ+ community.
June is Pride month. And while we have come very far as a society, at least in some parts of the world, we are still not there. The rise of hate speech in democratic countries is concerning, and about 70 countries around the world still consider consensual same-sex relations a crime. In some of these countries, individuals who engage in same-sex relations may even face the death penalty.
As a side note, it is ridiculous to see LGBTQ+ individuals protesting in support of fundamentalist regimes they know nothing about, under which they would undoubtedly face severe persecution - or even a death sentence, just for being who they are. Sigh.
When will humanity be in a good place on this front? When sexual orientation or gender identity is simply another aspect of a person's background, and no longer an issue at all.
In the rapidly evolving universe of Generative AI, ensuring inclusivity and fairness for all communities is key. As we celebrate Pride Month, it's crucial to recognize that AI models must reflect the diversity and complexity of the human experience, including the LGBTQ+ community. Historically marginalized, the LGBTQ+ community faces unique challenges that could be worsened by biased algorithms.
Inclusive AI in Healthcare
To be inclusive and avoid bias, AI in healthcare has to take into account some special considerations for LGBTQ+ patients. Just a few examples:
Under-representation: If an AI system is trained on data that lacks diversity or representation of certain populations, it may not provide optimal or sufficiently inclusive results for those patients. For LGBTQ+ patients, this means their unique health needs might get overlooked or misinterpreted.
If the training data does not include LGBTQ+-specific health concerns and needs, the AI may not recognize patterns or issues unique to LGBTQ+ individuals, or may fail to understand the nuances of LGBTQ+ health, which could lead to misdiagnoses or inappropriate follow-up recommendations. For example, AI systems should account for transgender patients, for whom the gender noted in the medical record does not necessarily match the body parts mentioned in a radiology report.
Moreover, some AI systems might have been trained on outdated data or may be operating under some heterosexual / cisgender assumptions. Things have changed, and assumptions need updating to reflect current LGBTQ+ realities and diverse experiences.
And there could be additional considerations around patient privacy. Protected health information (PHI) is always sensitive, and systems should always protect patient privacy and confidentiality. But information like sexual orientation or gender identity (SOGI) can be even more sensitive. On one hand, being able to identify individuals who need help can lead to improved health outcomes; on the other hand, if this information falls into the wrong hands, it could have the opposite effect, potentially causing harm. Patient privacy is key in digital healthcare systems; failing to protect it can erode trust and deter individuals from seeking care altogether.
Improving fairness of your AI system
So, here are some best practices for increasing the fairness of your AI system:
Diversify your datasets: AI learns from the patterns in the data it sees. Make sure to include sufficient, proportionally representative data for under-represented populations such as LGBTQ+ patients, so that your AI model is trained on data that is diverse and inclusive enough. This applies to both the training and evaluation datasets your AI models use, and it can help avoid biases. Diversify the sources of your data as well: rather than relying on data from just one hospital, use data from multiple hospitals, ideally from different locations and geographies, especially if your AI is intended to serve the healthcare ecosystem globally. (A simple representation check is sketched after this list.)
Diversify your subject-matter experts: A lot has been said about clinical expert annotators being the real heroes of AI in healthcare, and I agree with all of it. You definitely cannot rely on a single subject-matter expert. Diversify the human annotators and subject-matter experts who enrich your data with ground truth, as well as your Red Teamers and the people who test your model's results. Make sure they come from diverse backgrounds, ethnicities, genders, orientations, geographies and specialties.
More about Red Teaming for Generative AI will be coming up soon in one of my next blog posts.
Perform regular testing: Test your AI system's performance regularly, across different patient demographics, both manually and using automated testing. Ensure that it is not just passing the test overall but also scoring well for each subgroup, as much as possible.
Conduct fairness assessments: To address bias and fairness issues, teams should regularly run Fairness Assessments to ensure the AI system performs equally well across different patient demographics, including LGBTQ+ populations.
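As a rough illustration of the first practice, here is a minimal sketch (in Python, using pandas) of a representation check you could run on a training or evaluation dataset. The column names, groupings and 10% threshold are illustrative assumptions, not a standard.

```python
# Minimal sketch of a dataset representation check.
# Column names ("gender", "age_group") and the 10% threshold are illustrative.
import pandas as pd

def representation_report(df: pd.DataFrame, columns: list[str], min_share: float = 0.10) -> None:
    """Print the share of each subgroup per column and flag under-represented groups."""
    for col in columns:
        shares = df[col].value_counts(normalize=True, dropna=False)
        print(f"\n{col}:")
        for group, share in shares.items():
            flag = "  <-- below minimum share" if share < min_share else ""
            print(f"  {group}: {share:.1%}{flag}")

# Toy example (illustrative data only)
patients = pd.DataFrame({
    "gender": ["female", "male", "female", "nonbinary", "male", "female"],
    "age_group": ["18-39", "40-64", "65+", "18-39", "40-64", "65+"],
})
representation_report(patients, columns=["gender", "age_group"])
```

In a real pipeline you would run such a check on both the training and evaluation sets, and decide per fairness dimension what a reasonable minimum share is.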
How to conduct a Fairness Assessment
Fairness Assessment is part of our company's Responsible AI principles, which we implement for every AI model. A Fairness Assessment is about making sure your AI treats everyone equitably, meaning that your model performs just as well for different types of patients.
A few important steps when conducting a Fairness Assessment:
Define fairness goals: Start by asking what fairness means in your specific context – and that is going to be your measurement. For example, if you are measuring a model's F1 score through precision and recall, or if you are measuring specificity and sensitivity, you want to make sure your model's performance is not lower for specific groups in a statistically significant way.
Define fairness dimensions: What groups are you assessing fairness for? Dimensions often represent demographics, for example gender, age, ethnicity or sexual orientation. But those dimensions need to be relevant to your AI model. For example, some dimensions might be less relevant if the model you are assessing is a medical imaging model, because gender and age aside, humans are quite similar on x-rays, you know.
Representative data: Ensure your fairness assessment dataset is representative of those fairness dimensions. This means gathering data with enough representation for all relevant segments of the population, covering the demographic dimensions you are assessing. For example, if you want to test a model for fairness along the gender and age dimensions, make sure your patient dataset includes enough patients of different genders and of diverse ages.
Detect bias: Use tools and techniques to identify bias in your AI. Common methods include statistical parity analysis and disparate impact analysis, among others. Another way to look at it: make sure that the quality of your model's results, as defined by your goals above, does not show statistically significant differences between groups across your fairness dimensions. For example, that your AI model's F1 score is not more than x points better for male patients than for female patients (a sketch of such a check follows this list).
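To make the last step concrete, here is a minimal sketch of a per-group performance comparison for a binary classification model, using pandas and scikit-learn. The column names (label, prediction, gender) and the maximum F1 gap of 0.05 are illustrative assumptions; a real assessment would also add a statistical significance test (for example, bootstrapped confidence intervals per group) on top of the raw gaps.

```python
# Minimal sketch of a per-group fairness check for a binary classifier.
# Column names and the 0.05 maximum F1 gap are illustrative assumptions.
import pandas as pd
from sklearn.metrics import f1_score, recall_score

def fairness_gaps(df: pd.DataFrame, group_col: str, label_col: str = "label",
                  pred_col: str = "prediction", max_gap: float = 0.05) -> pd.DataFrame:
    """Compute F1 and recall per subgroup and flag groups whose F1 lags the best group."""
    rows = []
    for group, sub in df.groupby(group_col):
        rows.append({
            group_col: group,
            "n": len(sub),
            "f1": f1_score(sub[label_col], sub[pred_col]),
            "recall": recall_score(sub[label_col], sub[pred_col]),
        })
    report = pd.DataFrame(rows)
    report["f1_gap"] = report["f1"].max() - report["f1"]
    report["flagged"] = report["f1_gap"] > max_gap
    return report

# Usage: run fairness_gaps(eval_df, group_col="gender") once per fairness dimension.
```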
Are we there yet?
The evolution of healthcare systems, and of AI in particular, has great potential to reach the unreachable and improve care for the LGBTQ+ population. But there is still a lot to be done to ensure AI systems perform fairly for all potential users. And this needs to be applied across the technology stack.
LGBTQ+ individuals are not underrepresented in the population per se. Unlike in past research studies, where not just the LGBTQ+ population was absent but also women and non-Caucasian men, the data of LGBTQ+ individuals is there, as community members are part of the general population and visit medical facilities. The issue is that in many cases there is no way of knowing that the data belongs to a member of the LGBTQ+ community, since, even with patient consent to store this information, sexual orientation and gender identity data is not always recorded. This creates challenges in analyzing the data to maximize its value.
Mind you, this is more challenging than just adding a column to a database, due to the sensitivity of this information, plus the fact that sexual and gender fluidity might be more than a binary question these days.
HL7 FHIR recently introduced extensions for Sexual Orientation and Gender Identity (SOGI) data that try to address this range. Time to adopt those.
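For illustration, here is a minimal sketch of what recording gender identity on a FHIR Patient resource could look like, expressed as a Python dict. The extension URL and the plain-text CodeableConcept are assumptions for the example; take the exact extension URLs and value sets from the current HL7 FHIR and US Core SOGI guidance.

```python
# Minimal sketch of a FHIR Patient resource carrying gender identity data.
# The extension URL shown here is illustrative; verify it, and the coded
# value sets to use, against the current HL7 FHIR / US Core SOGI guidance.
patient = {
    "resourceType": "Patient",
    "id": "example",
    "gender": "female",  # administrative gender, which may differ from gender identity
    "extension": [
        {
            "url": "http://hl7.org/fhir/StructureDefinition/patient-genderIdentity",
            "valueCodeableConcept": {
                # A production system would use a coded value from the bound value set
                "text": "Identifies as female"
            },
        }
    ],
}
```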
But many digital healthcare systems and tools have not yet enabled documenting sexual orientation and gender identity in a structured way within the medical record, even with patient consent. So how can AI developers make sure their training and evaluation datasets are diverse and inclusive, when there is no information about gender identity or sexual orientation in the patient's medical records?
And how can developers extract gender identity or sexual orientation information that is explicitly mentioned in unstructured text, if some information extraction AI systems and tools still have gaps in extracting this information as part of the patient’s social determinants of health?
Not to mention inference of those attributes, which apparently falls under 'Prohibited AI Systems' according to the recent EU AI Act. Here I am talking only about explicit mentions in unstructured clinical notes.
Healthcare systems and models need to recognize that biological sex and gender identity might not match in the case of transgender people. And when a system is not sure which pronouns to use for a patient (he/she/they), turning to nonbinary language can help avoid mistakes, as in the small sketch below.
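Here is a minimal sketch of that fallback: a helper that only uses gendered pronouns when the recorded pronouns are explicit, and otherwise defaults to they/them. The pronoun keys are an illustrative convention, not a standard vocabulary.

```python
# Minimal sketch: default to nonbinary language when pronouns are unknown.
# The pronoun keys below are an illustrative convention, not a standard.
PRONOUNS = {
    "he/him": ("he", "him", "his"),
    "she/her": ("she", "her", "her"),
    "they/them": ("they", "them", "their"),
}

def pronouns_for(recorded: str | None) -> tuple[str, str, str]:
    """Return (subject, object, possessive) pronouns, defaulting to they/them."""
    return PRONOUNS.get((recorded or "").strip().lower(), PRONOUNS["they/them"])

subject, obj, possessive = pronouns_for(None)
print(f"The patient said {subject} will bring {possessive} referral letter.")
# -> The patient said they will bring their referral letter.
```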
In summary, by prioritizing inclusivity in AI development across the stack, we can foster a more equitable technological future and ensure that AI tools serve every individual, regardless of their sexual orientation or gender identity.
Or, in other words: love is love. It’s time digital healthcare systems, and AI models in particular, get that.
Dedicated to all the people in my life who are a part of the LGBTQ+ community and have the courage to be their true self.
Special thanks to Dr. Roy Zucker and Shiri Barak-Gonen for their review and contribution to this post.