MIT: Measuring Media Bias in Major News Outlets With Machine ...


DISCLAIMER: This blog is fully automated, unmonitored, and does not reflect the views of Garett MacGowan. The ML model may produce content that is offensive to some readers.


This blog post was generated with a (potentially) real article title as the only prompt. A link to the original article is below.


Photo by Brooke Cagle on Unsplash

Generated: 9/14/2021

MIT Technology Review has a great article on a paper recently published in IEEE TPDS that compares several news outlets and measures how extensively they cover certain events, in this case events related to climate change. I have a few comments on the methodology and on the main result (quoted below), but I also want to mention a couple of other developments that I find very interesting.

The news media play a central role in shaping our society’s attitudes toward issues, which in turn affects our policy processes. As such, the media are expected to be responsible and accurate when reporting public news events like climate change, an issue that the UN has classified as a global emergency. We introduce MEDIACLA, a large-scale experiment that systematically estimates the bias of the New York Times, Washington Post, Economist and Los Angeles Times in covering climate change. Our key result is that these four media outlets are highly biased with respect to several aspects of the coverage of climate change.

There are several interesting methodological points.

What we are interested in is: Is there a media bias in coverage of a particular issue? Here is how we measure the bias of a daily newspaper:

For each article on climate change (from a total of approximately 60K for the whole sample, as defined in Section 5), we compute the following five items:

(i) the percentage of words containing the word “climate”;

(ii) the mean length of the articles (number of words);

(iii) the proportion of climate-related, national-level events mentioned relative to the number of such events in the sample;

(iv) the proportion of articles citing sources reporting climate-related disasters or events; and

(v) the mean number of citations (based on Scopus) of sources reporting climate-related disasters or events.
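As a rough sketch of how items (i)–(iii) might be computed per outlet — all function names and the toy data below are my own assumptions, not taken from the paper:

```python
def climate_word_share(words):
    """(i) Percentage of words containing the substring 'climate'."""
    hits = sum(1 for w in words if "climate" in w.lower())
    return 100.0 * hits / len(words)

def mean_article_length(articles):
    """(ii) Mean number of words per article."""
    return sum(len(a) for a in articles) / len(articles)

def event_coverage(mentioned_events, sample_events):
    """(iii) Share of the sample's climate-related events the outlet mentions."""
    return len(set(mentioned_events) & set(sample_events)) / len(sample_events)

# Toy example: two short "articles" and a four-event sample.
articles = [["climate", "change", "is", "accelerating"],
            ["flood", "damage", "rises", "with", "warming"]]
print(climate_word_share(articles[0]))   # 25.0 (1 of 4 words)
print(mean_article_length(articles))     # 4.5
print(event_coverage(["flood", "drought"],
                     ["flood", "drought", "wildfire", "heatwave"]))  # 0.5
```

Items (iv) and (v) would need source metadata (citing behavior, Scopus counts) that does not reduce to a one-liner, so they are omitted here.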

...The data were collected from 2009 to 2012. The full results and the methodology are presented in the paper and in this article.

So the number of climate-related word and event mentions in each article is used as the measure of bias? This doesn’t make a lot of sense to me: the word “climate” is obviously a crude, subjective proxy, and people writing about climate change have strong preconceived ideas about its significance, which have to be inferred from context and not just from explicit mentions. You can measure the hypothesis from two perspectives: how many climate-related words appear in an article, and how many people are talking about climate change. Both are somewhat subjective, but for the first measure we need to compare climate-related word and event mentions to the total number of word and event mentions. For the second, it may be better to look at coverage of articles reporting climate change relative to total coverage of national-level events, rather than at how much climate change is talked about (i.e. what counts as a climate-related word or event). I also wonder how many relevant events there actually are, given that weather is important even for climate-change events, and that there may be a lot of noise.
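To make the normalization point concrete with made-up numbers: an outlet that simply publishes more overall can lead on raw climate-word counts while still devoting a smaller share of its words to climate.

```python
# Hypothetical word counts for two outlets (invented for illustration,
# not from the paper).
outlets = {
    "A": {"climate_words": 1200, "total_words": 600_000},
    "B": {"climate_words": 800,  "total_words": 200_000},
}

# Normalized rate: climate words as a fraction of all words published.
rates = {name: c["climate_words"] / c["total_words"] for name, c in outlets.items()}

# Raw counts rank A first (1200 > 800), but the normalized rate ranks B
# first (0.004 > 0.002): B devotes twice the share of its words to climate.
print(rates)
```

This is exactly the ranking flip that raw mention counts hide, which is why comparing mentions to total output matters.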

There’s a lot more to the article, but a couple of other points are interesting.

The paper covers climate change reports (such as the IPCC reports) through its data on disasters and news. The methodology uses these to compute “media mentions” of disasters, not of climate change itself. I wonder whether disaster coverage influences how climate-related events get treated (e.g. there might be more coverage of floods, and floods are clearly climate-related but not disasters in every case). This is not really made clear in the paper; I assume the idea is that the coverage of various disasters (only about 10% of the roughly 60K sampled articles) provides a good proxy for how relevant disasters are covered, which in turn lets us infer how relevant climate-related events are covered. This doesn’t work very well for floods: some events are clearly related to climate change (such as extreme weather) and others clearly are not (such as the impact of the financial crisis on the housing market), but floods sit somewhere in between, and arguably deserve more attention than they get. At least the coverage of non-climate disasters seems fairly consistent across the years, while disaster coverage in general is much more variable.

The method is also interesting in that it uses a much larger sample of news than is typical (for example, the authors draw on 9.7 million news articles for the NYT sample, but only about 60K articles for their coverage of climate change). There are some issues, however. It is hard to separate out the impact of time — a larger sample helps, since there is less variation in coverage — and some of the biases we observe are quite local, not as global as an NYT sample might suggest. There are also concerns about representativeness: I am not sure it is appropriate to use the number of articles referring to climate disasters, since these numbers seem to carry an obvious bias toward reporting disasters (and the sample is not large enough to support a statistical significance test). Finally, for measuring the bias of the media outlets, the sample is not randomly selected, so we cannot really say how it relates to the full media population.
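On the significance-test point: given per-outlet counts of climate-related articles, the standard quick check for whether two coverage rates differ by more than sampling noise is a two-proportion z-test. The counts below are invented for illustration; nothing here comes from the paper.

```python
from math import erf, sqrt

def two_proportion_z(k1, n1, k2, n2):
    """Two-sided z-test for a difference between two coverage proportions."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Normal approximation to the two-sided p-value via the error function.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Invented counts: 620 climate articles out of 9,700 vs. 540 out of 9,500.
z, p = two_proportion_z(620, 9_700, 540, 9_500)
print(round(z, 2), round(p, 4))
```

With samples this size the test is well powered; the author's worry applies to the much smaller disaster subsample, where such a test would be inconclusive.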

The authors also compare media coverage to other data sources — there is a good correlation between the number of mentions in Wikipedia and climate-related articles in the NYT, though not for the Washington Post — and run correlation analyses on various events. This suggests that there are substantial biases in coverage, and that they are not driven by coverage of a handful of irrelevant events but rather by a genuine media bias.
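A minimal sketch of the kind of correlation check described here, using Pearson's r on made-up monthly mention counts (the data are mine, not the paper's):

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up monthly counts: Wikipedia climate-page mentions vs. NYT climate articles.
wiki = [12, 18, 25, 30, 22, 40]
nyt  = [10, 15, 24, 28, 20, 38]
print(round(pearson(wiki, nyt), 3))  # close to 1: the series track each other
```

Note that a high r here only says the two series move together; as the later paragraphs argue, it says nothing about whether either source represents the underlying topics correctly.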

The paper also uses the coverage of these outlets to estimate global media coverage and to identify a media bias, concluding that the outlets studied do not do a good job of covering climate change. The paper’s main result is then an estimate, for each outlet, of the ratio of global coverage (that is, outside the US and Canada) to local coverage, computed from the outlet’s own coverage. From this, the authors derive a global media bias figure (the ratio of global to local coverage).
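As I read it, the global-bias figure reduces to a simple per-outlet ratio; a sketch with invented article counts (the function name and numbers are my own assumptions):

```python
def global_local_ratio(global_articles, local_articles):
    """Ratio of global (outside US/Canada) to local coverage. Values well
    below 1 suggest the outlet under-covers the rest of the world."""
    return global_articles / local_articles

# Invented counts of climate articles per region for one hypothetical outlet.
print(global_local_ratio(300, 1200))  # 0.25: four local stories per global one
```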


The main problem with using word and event mentions as a bias metric is that the counts are clearly skewed — there are probably far fewer references to climate-change-related disasters than to other natural disasters such as earthquakes or floods. Similarly, the NYT sample is dominated by coverage of the New York area, even though climate change might matter more in California. (We can’t use a non-US dataset, both because it would not cover all countries (such as most European countries) and because there is a lot of noise in US media ownership, concentrated in a few national newspapers. This doesn’t affect the NYT sample much: the NYT tends to be the primary US newspaper, alongside USA Today and a major competitor, the Wall Street Journal, which is owned by Murdoch. The Washington Post is not part of this group and has an independent masthead.)

We also have concerns about the representativeness of the sample. The correlation between Wikipedia mention counts and NYT mention counts is only valid for that particular dataset, because Wikipedia and the NYT cover the same topics — and the correlation is only meaningful if those topics are themselves represented correctly (that is, in Wikipedia). Wikipedia is a “compilation” service: it has no formal editorial policy, so you simply see what contributors have made available. And since this sample is mostly US-based, it is dominated by Americans, which tends to mean that people who are passionate about climate-related information and/or politics focus on those topics. The fact that certain topics are underrepresented in Wikipedia doesn’t necessarily tell us anything about bias (otherwise we would have to call every uncovered topic a case of “bias”).

As for the last paragraph, I disagree that all of these news outlets belong in the same category of “the media.” The NYT and WSJ clearly do, with their mastheads on the first two or three pages, and so on. But USA Today may still make sense as a comparison: while it goes beyond being just a newspaper and into being a news organization, it arguably has more influence than the outlets mentioned. So the comparison is still an unfair one, though you don’t really want an “unbiased” measure of coverage bias anyway. But if the US outlets being compared are “mainstream” media — and this is not meant to criticize the paper by casting it as part of a major media conglomerate — then their coverage is relevant for measuring the extent of bias in the broader media coverage.

As very much a non-expert on such matters, my initial concern is simply that there are many ways to “cover up” a lack of bias. For example, one might consider how the words that do appear in NYT articles were edited in (noting where that might change, at least, the amount and/or character of the bias). My impression is that few people actually understand the language and editorial processes of major news outlets. Perhaps my misunderstanding stems from never having worked in that business (although I was hired as a result of reading those news outlets and learning about them).

Garett MacGowan

© Copyright 2023 Garett MacGowan. Design Inspiration