What's a systematic review & meta-analysis?
A primer so you can "do your own research" beyond an internet search
If you think it may be helpful to understand the basics of systematic reviews and meta-analyses, read on. I hope this primer helps you evaluate what you read and hear when a news story, a legislator, or anyone else says, “A systematic review shows…”
A new study, “Legal gender recognition and the health of transgender and gender diverse people: A systematic review and meta-analysis,” was published earlier this month. (I’m one of the authors!) I’ll write about what we learned from the study in another post. In short, the review shows that legal transition is good for health. (Not surprising, but now we have some numbers to back it up!)
Specifically,
Living in a place with favorable laws and policies that allow trans people to change their gender marker and name on official identification documents is good for trans people’s mental and physical health.
Having ID documents with affirmed gender identity and name is good for trans people’s mental and physical health.
What is a systematic review? What’s a meta-analysis? And why care?
Systematic reviews
A systematic review is an extra-rigorous literature review. In a literature review, you use the library and the internet to find what you can on a topic, then write about what you find. For a systematic review, you do that, but more, well, systematically. You write down all your plans and methods, carry them out, then report on them, so that someone else could follow your same steps and replicate the results. Theoretically, anyway.
Systematic reviews, then, are formal summaries of what we know about a topic. So, if 20 scientific, peer-reviewed studies have been published on a question, a systematic review (and meta-analysis, more on that below) combines their results to answer it.
Because they combine several studies, their results are considered stronger and more reliable evidence than any single study’s. It’s like getting the average of all the different scientists’ results. When I need to write about a topic, I first check for systematic reviews on the subject; the authors have done a lot of the work for me!
In biomedicine and public health, that usually means gathering and compiling all the published evidence to answer questions like “What do we know about X’s relationship to Y?” or “Combining all the evidence we have, what impact does A have on B?”
Some systematic reviews related to transgender health include:
Systematic Review: Puberty suppression with GnRH analogues in adolescents with gender incongruity
11 studies from 2009 to 2019 found that puberty blockers improve mental health, prevent undesirable body changes, and do not require routine laboratory testing
Hormone Therapy, Mental Health, and Quality of Life Among Transgender People: A Systematic Review
20 studies showed that gender-affirming hormones increased quality of life and decreased depression and anxiety
Regret after Gender-affirmation Surgery: A Systematic Review and Meta-analysis of Prevalence
27 studies found that surgical regret was less than 1%
Hormonal Treatment in Young People With Gender Dysphoria: A Systematic Review
13 studies published up to 2017 found that puberty blockers and gender-affirming hormones do what young trans people want them to do
If you are not interested in the nitty-gritty details of systematic reviews, skip the Nerdy Stuff section below and go to the Meta-Analysis header.
The nerdy stuff
I didn’t hear about these things until I was getting a master’s in public health, when I learned to read and understand them effectively. During my postdoctoral studies, I took a class from the scholars who developed Cochrane, the experts in systematic reviews, and conducted and published a review myself. Since then, I’ve worked on several systematic reviews and meta-analyses (SRMA), mainly for the World Health Organization, most of which have been published. My focus has often been “reviewing the quality of the evidence,” also known as “risk of bias” assessment.
I’ll use the Legal Gender Recognition review as an example. The World Health Organization requested this study as part of its efforts to publish guidelines on transgender health issues.
In systematic reviews, you carefully define the topic using PICO: Population, Intervention or exposure, Comparison group, and Outcome. In this case, these were as follows:
Population: transgender, nonbinary, and gender diverse people, meaning anyone whose “gender identity differs from their sex assigned at birth.”
Intervention/Exposure 1: laws, policies, and/or administrative procedures aimed at providing legal gender recognition and following international human rights standards
vs. Comparator 1: lack of such laws or policies, or noncompliance with international human rights standards
Intervention/Exposure 2: Having an official ID that matches current gender identity (gender and/or name)
vs. Comparator 2: lack of such ID
Outcomes:
quality of life
mortality and life expectancy
mental health, including substance use
access to and utilization of health services
stigma, discrimination, and violence
physical health
socioeconomic status
Once your PICO is defined, you carefully plan your search terms, usually with a librarian’s help. For example, see the string of search terms just for Population.1 This ensures you cast a wide net to find as many relevant publications as possible. The search terms and date are shared in the publication or an appendix, so that, theoretically, another person could replicate the search and find the same articles.
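To give a feel for what such a search string looks like, here is a made-up illustration written as a small Python snippet. It is not the actual search strategy from the paper (that’s in the footnote); it just shows how synonyms get joined with OR and concepts with AND.

```python
# Hypothetical illustration only; not the actual search strategy from the paper.
# Within a concept, synonyms are joined with OR; across concepts (Population,
# Exposure, etc.), the blocks are joined with AND.

population = (
    '("transgender" OR trans OR "gender diverse" OR nonbinary '
    'OR "gender incongruence" OR "gender dysphoria")'
)
exposure = (
    '("legal gender recognition" OR "gender marker" OR "name change" '
    'OR "identification document*")'
)

query = f"{population} AND {exposure}"
print(query)
```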
Additional Inclusion and Exclusion Criteria are also written down and strictly followed. For example, you may restrict the search to a time period (e.g., HIV studies published after 1996), limit it to English-language materials (or not), exclude conference abstracts (they don't report full studies), etc. But you'll have to explain why. In the HIV example, 1996 is when effective medications for HIV became available, so studies published before and after that date differ significantly.
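As a rough sketch of how such criteria get applied (the rules below are invented examples, not the actual criteria from the legal gender recognition review), the process is essentially a filter run over every record the search returns:

```python
# Hypothetical sketch; these criteria are invented examples, not the actual
# criteria from the legal gender recognition review.

def meets_criteria(record: dict) -> bool:
    """Return True if a record passes these example inclusion/exclusion rules."""
    if record["year"] < 1996:                        # e.g., exclude pre-1996 HIV studies
        return False
    if record["language"] != "English":              # e.g., an English-only restriction
        return False
    if record["type"] == "conference abstract":      # abstracts don't report full studies
        return False
    return True

records = [
    {"title": "Study A", "year": 2001, "language": "English", "type": "journal article"},
    {"title": "Study B", "year": 1994, "language": "English", "type": "journal article"},
    {"title": "Study C", "year": 2010, "language": "English", "type": "conference abstract"},
]
included = [r for r in records if meets_criteria(r)]
print([r["title"] for r in included])  # ['Study A']
```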
Title & Abstract Screening: Two or more researchers look through every reference's title and abstract (summary) and exclude irrelevant ones. When in doubt, they keep a reference in at this stage. This step usually eliminates the vast majority of references. In this study, the systematic search found 2,696 titles from seven databases, then excluded 2,456 by looking at titles and abstracts.
Full-Text Review: Two or more researchers then look at the full text of the remaining publications to see whether they are really relevant and have the required PICO data. This study reviewed 106 full-text articles and eliminated 82. How many articles were excluded, and for what reasons, is reported in the paper.
Data Abstraction: Two or more researchers read the included studies and systematically copy down the relevant information: all the PICO-related info, plus the first author's last name, year of publication, location, sample size, where the funding came from, etc. Of the 106 full-text articles, 24 were included in the example paper. That information is then summarized, usually in a big table in the publication; it's Table 2 in the legal gender recognition paper.
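As a rough sketch of what a data-abstraction record might capture (the field names and the example entry below are my own invention, not the actual abstraction form used in the review):

```python
# Hypothetical sketch of a data-abstraction record; the fields and the example
# entry are invented for illustration, not taken from the actual review.
from dataclasses import dataclass, field

@dataclass
class AbstractedStudy:
    first_author: str            # first author's last name
    year: int                    # year of publication
    location: str                # where the study was conducted
    sample_size: int             # number of participants
    exposure: str                # e.g., having concordant ID documents
    comparator: str              # e.g., not having concordant ID documents
    outcomes: list[str] = field(default_factory=list)  # outcomes reported
    funding_source: str = ""     # who funded the study

example = AbstractedStudy(
    first_author="Doe",          # made-up entry
    year=2020,
    location="Canada",
    sample_size=500,
    exposure="ID documents match gender identity",
    comparator="ID documents do not match",
    outcomes=["psychological distress", "suicidal ideation"],
    funding_source="University grant",
)
print(example)
```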
Risk of Bias Assessment/Quality Review: Not all studies are conducted well; some are more rigorous than others, and some may be biased by who is funding them. To account for that, the researchers use an established rubric to rate how well each study followed the methods appropriate for its study type. For example, the rubric for a randomized controlled trial testing a drug differs from the one for a one-time survey.
Finally, the researchers review all the abstracted info, create a bunch of tables, write it up, review it with all authors, and send it to an editor for peer review.
Scientific peer review means a journal editor, a scientist themselves, decides the manuscript is worth sending to other scientists for review. Then, several scientists in the same field review it, request edits, and recommend it for publication (pending those edits). The authors have a chance to rewrite, perhaps conduct additional analyses, and send it back. The reviewers could also recommend that a study not be published, but the editor gets the final say.
Meta-Analysis
If a systematic review's research question is specific and narrow enough, and enough data exists, a meta-analysis combines, or pools, all the participants' results from all the studies to get one overall summary number.
For example, the study listed above about regret pulled the data from 27 studies and pooled the participants’ outcomes (level of regret or satisfaction with their surgeries). Together, there were 7,928 patients in the 27 studies. The proportion of those 7,928 who expressed regret was less than 1%!
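To make “pooling” concrete, here is a deliberately simplified sketch with made-up numbers. (Real meta-analyses weight each study and typically use random-effects models, via tools such as the metafor package in R or statsmodels in Python, rather than a raw sum like this.)

```python
# Simplified illustration of pooling a proportion across studies.
# Numbers are made up; a real meta-analysis would weight studies and model
# between-study variation instead of just summing counts.

studies = [
    {"name": "Study A", "regrets": 2, "patients": 300},
    {"name": "Study B", "regrets": 5, "patients": 1200},
    {"name": "Study C", "regrets": 1, "patients": 450},
]

total_regrets = sum(s["regrets"] for s in studies)
total_patients = sum(s["patients"] for s in studies)
pooled_proportion = total_regrets / total_patients

print(f"Pooled regret: {pooled_proportion:.2%} of {total_patients:,} patients")
```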
In the legal gender recognition review, we found enough data to conduct meta-analyses for certain outcomes: suicidal ideation, suicide attempts, psychological distress, and tobacco use.
But the included studies didn’t provide enough quantitative information on the other outcomes, like access to health services and testing for HIV, to pool the data. Instead, we synthesized those findings in words. When there is enough qualitative data (words that cannot be summarized in numbers), we can do a meta-synthesis, a formal word-based summary of the data from multiple studies.
A meta-analysis is usually presented visually as a forest plot.
Below is a section of a forest plot from one of my reviews that included 130+ studies. Authors often sort the studies from the smallest (odds ratio 0.83) to the largest (odds ratio 4.90) result, creating this nice cascading visual.
It’s a great way to see where the data falls (each box and line represents one study’s outcome). You can also easily see the overall summary outcome (the diamond at the bottom and the vertical red dotted line).
Below is one of the forest plots from the legal gender recognition study; only three studies had data on past-year suicidal ideation. It’s a whole lot of numbers and abbreviations, but those are for the methodologists. Look at the black diamond in the graph. That’s the summary result as an odds ratio; since it falls to the left of the line at 1, it indicates that the data favors legal gender recognition (LGR) over not having it.
The red boxes indicate the size of the sample in each study; the bigger the box, the bigger the sample size. (We want bigger sample sizes in surveys and intervention studies.) At a glance, you can see that Scheim’s study had the largest sample (from the USTS 2015), and the Bauer study was the smallest. (That’s the same Scheim who is the first author of the LGR paper.) The horizontal lines are the error bars (the confidence intervals); the wide intervals—the longer lines—show uncertainty. So we want nice, narrow, short intervals. In the Scheim study, the error bars are so narrow that the red box covers them!
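If you're curious how a forest plot is put together, here is a toy sketch with invented numbers (it is not the actual plot from either review), just to show what the boxes, the horizontal lines, and the summary diamond represent:

```python
# Toy forest plot with invented numbers, just to show the parts: a marker and a
# horizontal confidence-interval line per study, a dashed line at OR = 1 (no
# difference), and a summary diamond at the bottom.
import matplotlib.pyplot as plt

# (label, odds ratio, lower CI bound, upper CI bound) -- all made up
studies = [
    ("Study A (small sample)", 0.80, 0.40, 1.60),
    ("Study B (medium sample)", 0.65, 0.45, 0.95),
    ("Study C (large sample)", 0.70, 0.60, 0.82),
]
summary = ("Pooled estimate", 0.70, 0.61, 0.80)

rows = studies + [summary]
fig, ax = plt.subplots(figsize=(6, 3))
for i, (label, odds_ratio, lo, hi) in enumerate(reversed(rows)):
    ax.plot([lo, hi], [i, i], color="black")            # confidence interval line
    is_summary = label == "Pooled estimate"
    ax.plot(odds_ratio, i, "D" if is_summary else "s",
            color="black" if is_summary else "red")     # diamond vs. red box

ax.axvline(1.0, linestyle="--", color="gray")           # OR = 1 means "no difference"
ax.set_yticks(range(len(rows)))
ax.set_yticklabels([row[0] for row in reversed(rows)])
ax.set_xscale("log")                                    # odds ratios are usually plotted on a log scale
ax.set_xlabel("Odds ratio (left of 1 favors legal gender recognition)")
plt.tight_layout()
plt.show()
```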
Any questions? What did I forget to explain or include?