Methodology

By the Project for Excellence in Journalism

The data for this study were collected in two parts. The first consists of data originally collected by other people or organizations, which PEJ then gathered and aggregated. The second, particularly the content analysis, is original work conducted specifically for this report.

For the data aggregated from other researchers, the Project took several steps. First, we tried to determine what data had been collected and by whom for the eight media sectors studied. In many cases this included securing rights to data through license fees or other means. We organized the data into the seven primary areas of interest we wanted to examine: content, audience, economics, ownership, newsroom investment, alternative news outlets and digital trends.

Next, the Project studied the data closely to determine where elements reinforced each other and where there were apparent contradictions or gaps. In doing so, the Project endeavored to determine the value and validity of each data set. That in many cases involved going back to the sources that collected the research in the first place. Where data conflicted, we have included all relevant sources and tried to explain their differences, either in footnotes or in the narratives.

In analyzing the data for each media sector, we sought insight from experts by having at least three outside readers for each sector chapter. Those readers raised questions, offered arguments and questioned data where they saw fit.

All sources are cited in footnotes or within the narrative, and listed alphabetically in a source bibliography. The data used in the report are also available in more complete tabular form online, where users can view the raw material, sort it on their own and make their own charts and graphs. Our goal was not only to organize the available material into a clear narrative, but also to collect all the public data on journalism in one usable place. In many cases, the Project paid for the use of the data.

The methodology for the original content analysis research conducted by PEJ is as follows.

A Year in the News: Methodology

Sampling and Inclusion

The content analysis research in the 2009 State of the News Media Report is the summation of a year’s worth of coding conducted by PEJ. The coding is ongoing throughout the year with weekly findings reported in the News Coverage Index reports.

All coding was conducted in-house by PEJ’s trained staff of researchers.

The 2008 analysis totals 69,942 stories. This consists of 7,350 newspaper stories, 6,539 online stories, 19,796 stories from network television, 21,892 stories on cable news, and 14,365 stories from radio programs.
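
As a quick consistency check, the sector counts above can be summed to confirm the stated total. The short Python sketch below is illustrative only; the variable names are assumptions and not part of PEJ's tooling, and the shares it computes are unweighted (the report's own findings apply the audience weighting discussed later).

```python
# Story counts per sector, as reported in the paragraph above.
stories_by_sector = {
    "newspapers": 7_350,
    "online": 6_539,
    "network_tv": 19_796,
    "cable": 21_892,
    "radio": 14_365,
}

# The sector counts should add up to the stated 2008 total.
total_stories = sum(stories_by_sector.values())

# Unweighted share of the sample contributed by each sector, in percent.
sector_share = {
    sector: round(100 * count / total_stories, 1)
    for sector, count in stories_by_sector.items()
}

print(total_stories)
print(sector_share)
```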

The central focus of study is to analyze a wide swath of American news media to identify what is being covered and not covered—the media’s broad news agenda.

The Universe: What we Studied

Because the landscape is becoming more diverse—in platform, content, style and emphasis—and because media consumption habits are also changing, even varying day to day, the analysis is designed to be broad. Therefore, our sample, based on the advice of our academic team, was designed to include a broad range of outlets—illustrative but not strictly representative of the media universe.

The sample is also a purposive one, selected to meet these criteria rather than to be strictly random. It is a multistage sampling process that cannot be entirely formulaic or numeric because of differences in measuring systems across media. It involves the balancing of several factors including the number of media sectors that offer news, the number of news outlets in any given sector, the amount of news programming in each outlet and the audience reach. In addition to front-end selections, we have also weighted the various sectors on the back end to account for differences in audience. The weighting process is discussed further down in this document.

The mainstream or establishment daily news media in the United States can be broken down into five main sectors. These are:

Network TV news
Newspapers
Online news sites
Cable News
Radio News

Within each media sector, the number of outlets and individual programs vary considerably, as do the number of stories and size of the audience. We began by first identifying the various media sectors, then identifying the news media outlets within each, then the specific news programs and finally the stories within those.

The primary aim was to look at the main news stories of the week across the industry. With that in mind, for outlets and publications where time does not permit coding the entire news content offered each day (three hours of network morning programming, for instance), we coded the lead portion. In other words, we coded the first 30 minutes of the cable news programs, the first 30 minutes of the network morning news programs, the front page of newspapers, etc. This may have skewed the overall universe toward more “serious” stories, but this is also the most likely time period to include coverage of the main, national news events of the day, those that would make up the top stories each week or each month.

Below we describe the selection process and resulting sample for each main sector.

Network News

Sector Reach

Each evening, the three broadcast news shows on ABC, NBC and CBS reach approximately 23 million viewers. The morning news shows on those networks are seen by 13.1 million viewers.1 At the same time, the nightly newscast on PBS reaches roughly 2.4 million viewers each night, according to its internal figures. Because the universe of national broadcast channels is limited to these four channels, it is practical to include all of the networks as part of our sample universe.

Sector Sample

Each of the three major broadcast networks produces two daily national general interest news shows, one in the morning (such as Good Morning America) and one in the evening. Therefore it is practical to include at least part of all these news programs on ABC, CBS, and NBC in our sample. In addition, the Newshour with Jim Lehrer is considered by many to be an alternative nightly news broadcast to those of the three major networks, and because its audience reaches around 3 million viewers each night, we included that program.

Units of Study

For the evening newscasts, the study coded the entire program. For the morning programs, it coded the news segments that appear during the first 30 minutes of the broadcast, including the national news inserts but not local inserts. By selecting this sample of the morning shows, it is possible that we missed some news stories that appear later in the programs. However, through prior PEJ research, we have learned that the morning shows generally move away from the news of the day after the first 30 minutes (save for the top-of-the-hour news insert) and present more human interest and lifestyle stories after that point. The stories that the networks feel are most important will appear during the first 30 minutes and be included in our study.

For PBS Newshour, the second half of the program often differs from the first half but remains news oriented. Thus, beginning on March 31, 2008, we rotated our daily 30 minutes of content between the first and the second half of the show in order to get a closer representation of the program’s overall content. Prior to that date, we used the first half of the program daily.

The resulting network sample was:

Commercial Evening News: Entire 30 minutes of all 3 programs each day (90 minutes)
Commercial Morning News: 1st 30 minutes of all 3 programs each day (90 minutes)
PBS NewsHour: Rotate between 1st and 2nd 30 minutes each day (30 minutes)

This resulted in 3.5 hours of programming each day.

Cable Television

Sector Reach

According to ratings data, the individual programs of the three main cable television news channels, CNN, MSNBC, and Fox News, do not reach as many viewers as those of the broadcast network news shows. During prime time hours, a median average of 3.6 million viewers watch cable news, according to 2008 data, while 2 million viewers watch during daytime hours.2 But ratings data arguably undercount the reach of cable news. Survey data now find that somewhat more people cite cable news as their primary or first source for national and international news than cite broadcast network news.

Sector Sample

The most likely option was to study the three main cable news channels that compete: CNN, MSNBC, and Fox News. These represent the dominant channel of programming from each news-producing cable company. (This means selecting MSNBC as opposed to CNBC, CNN as opposed to CNN Headline News, and MSNBC over Headline News, which now sometimes beats MSNBC in ratings.)

Units of Study

Since these channels provide programming round the clock, with individual programs sometimes reaching fairly small audiences, it was not practical for us to code all of the available shows. There is a great challenge in selecting several times out of the day to serve as a sample of cable news overall.

On the other hand, earlier studies have shown that for much of the day, one cable news program on a channel is indistinguishable to most people from another. If one were to ask a daytime viewer of cable news which program they preferred, the 10 a.m. or the noon, they might look at you in confusion. For blocks of hours at a time, the channels will have programs with generic titles such as “CNN Newsroom,” “Your World Today,” or “Fox News Live.” Our studies have shown that there are four distinct programming sectors to cable: early morning, daytime, early evening and prime time.

Working with academic advisors we weighed various options. A selection based on the most watched programs would result in the O’Reilly Factor (3.8 million viewers a night in November 2008) for Fox, and Larry King (1.7 million viewers a night in November 2008) for CNN. However, some of these shows are not “news” shows per se, but rather their content derives from the host’s opinions and guests on any given day. Separating news and talk proved problematic as well, since it is often difficult to distinguish between the two and several programs offer both in the same hour.

The best option, we concluded, was to draw from two time periods:

1) The daytime period, to demonstrate what on-going or live events are being covered. The study includes two 30 minute segments of daytime programming each day, rotating among the three networks.

2) Early Evening and Prime time (6 PM – 11PM) together as a unit, rather than separating out talk and news or early prime and late prime. Within this five hour period, we included all programming that focuses on general news events of the day. Basically, this removes three programs: Fox’s Greta Van Susteren, which is more narrowly focused on crime, CNN’s Larry King which as often as not is focused on entertainment or personal stories rather than news events and MSNBC’s documentaries program. Because MSNBC’s audience numbers are so much lower than those for Fox or CNN, we also decided to include slightly less of its programming. Even though CNN trails Fox in Nielsen Ratings, its monthly cumulative or “cume” audience figure is higher so the two are sampled equally.

To include the most cable offerings possible each week, the study coded the first 30 minutes of selected programs and rotated them daily. Morning shows were not included because those shows run at the same time for every part of the country, meaning that a broadcast that starts at 7 a.m. on the East Coast begins at 4 a.m. on the West Coast. Those programs appear far too early for much of the country to actually view. This is in contrast with the broadcast morning programs, which are shown on tape delay in different parts of the country, in the manner of other broadcast programs.

This process resulted in the following cable sample:

Daytime:

Rotate, coding two out of three 30 minute daytime slots each day (60 minutes)

Prime Time:

Three 30-minute segments for Fox (90 minutes)
Three 30-minute segments for CNN (90 minutes)
Two 30-minute segments for MSNBC each day (60 minutes)

The Index rotates among all programming from 6 to 11 p.m. that was focused on general news events of the day, excluding CNN’s Larry King Live and Fox’s Greta Van Susteren.

Both CNN and MSNBC made some programming changes during 2008, and our sample included the replacement shows when appropriate.

Time | CNN | Fox | MSNBC
6pm | Situation Room | Special Report w/ B. Hume | Tucker Carlson/Race for the White House/1600 Pennsylvania Avenue
7pm | Lou Dobbs Tonight (was previously on at 6pm) | Fox Report w/ S. Smith | Hardball
8pm | Out in the Open/CNN Election Center/Campbell Brown: No Bias No Bull | The O’Reilly Factor | Countdown w/ K. Olbermann
9pm | -- | Hannity & Colmes | Live w/ Dan Abrams/The Rachel Maddow Show
10pm | Anderson Cooper 360 | -- | --

This resulted in 5 hours of cable programming each day (including daytime).
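
The five-hour figure follows directly from the segment counts listed above. A minimal Python sketch of that arithmetic is shown here; the dictionary keys are illustrative labels, not PEJ terminology.

```python
SEGMENT_MINUTES = 30  # every cable segment in the sample is 30 minutes long

# Number of 30-minute segments coded per day, from the cable sample above.
segments_per_day = {
    "daytime (2 of 3 channels)": 2,
    "Fox prime time": 3,
    "CNN prime time": 3,
    "MSNBC prime time": 2,
}

# Total minutes coded each day, and the same figure in hours.
cable_minutes = sum(n * SEGMENT_MINUTES for n in segments_per_day.values())
cable_hours = cable_minutes / 60

print(cable_minutes, cable_hours)
```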

Newspapers

Sector Reach

Roughly 48 million people buy a newspaper each weekday.3 This number does not include the online audience or the “pass along” rate of newspapers, which some estimate, depending on the paper, to be approximately three times the circulation rate. In addition, specific newspapers, such as the New York Times and Washington Post, have an even greater influence on the national and international news agenda because they serve as sources of news that many other outlets look to in making their own programming and editorial decisions. So while the overall audience for newspapers has declined in recent years, newspapers still play a large and consequential role in setting the overall news agenda that cannot be strictly quantified or justified by circulation data. There is a growing body of data suggesting that the total audience of newspapers, combining their reach in print and online, is growing slightly.

Sector Sample

To create some representation of what national stories are being covered by the 1,450 newspapers around the country, we divided the country’s daily papers into three tiers based on circulation: over 650,000; 100,000 to 650,000; and under 100,000. Within each tier, we selected papers along the following criteria:

First, papers need to be available electronically the day of publication. Three websites, including www.nexis.com, www.newsstand.com, and www.pressdisplay.com, offer same day full-text delivery service. Based on their general same-day availability (excluding non-daily papers, non-U.S. papers, non-English language papers, college papers, and special niche papers) a list of U.S. general interest daily newspapers was constructed. The list included seven papers in Tier 1, 44 papers in Tier 2, and 22 papers in Tier 3.
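
The three circulation tiers described above amount to a simple threshold rule. The sketch below expresses it in Python for illustration; the function name is hypothetical, and the treatment of a paper sitting exactly on a boundary is an assumption, since the text does not say how such cases were handled.

```python
def circulation_tier(circulation: int) -> int:
    """Return the tier (1, 2 or 3) for a daily paper's circulation.

    Assumption: a paper at exactly 650,000 or exactly 100,000 is placed
    in Tier 2; the source text does not specify the boundary handling.
    """
    if circulation > 650_000:
        return 1          # over 650,000
    if circulation >= 100_000:
        return 2          # 100,000 to 650,000
    return 3              # under 100,000


# Hypothetical circulation figures, used only to exercise the function.
for figure in (1_000_000, 250_000, 40_000):
    print(figure, "-> Tier", circulation_tier(figure))
```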

Tier 1: Due to its national prominence and readership, and the desirability of having at least one newspaper that was coded every day without any interruption due to rotation (in the same way the network newscasts are coded) we decided to code the New York Times each day (Sunday through Friday). We then wanted to include a representation from the other large nationally reputed or distributed papers, so each day we coded two out of four of the largest papers, the Washington Post, Los Angeles Times, USA Today, and Wall Street Journal.

Tier 2 and 3: Four newspapers were selected from Tier 2 and Tier 3 respectively. To ensure geographical diversity, each of the four newspapers within Tier 2 and Tier 3 was randomly selected from a different geographic region according to the parameters established by the U.S. Census Bureau, i.e., Northeast Region, Midwest Region, South Region and West Region. An effort was also made to ensure ownership diversity. One selected newspaper was found too difficult to capture during our testing, and it was replaced by another newspaper from the same region within the same circulation category. We rotated two of the four newspapers in Tier 2 and Tier 3 each day.
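
One way to picture a "two out of four each day" rotation is sketched below in Python. This is an assumption-laden illustration: the text does not document the actual order in which papers were rotated, so the fixed alternating order and the function name `papers_for_day` are hypothetical.

```python
def papers_for_day(papers, day_index, per_day=2):
    """Pick `per_day` papers for a given coding day.

    Assumption: papers are taken in a fixed cyclic order, so with four
    papers and two per day each paper is coded every other day. The
    real rotation order used in the study is not stated in the text.
    """
    start = (day_index * per_day) % len(papers)
    return [papers[(start + i) % len(papers)] for i in range(per_day)]


# Tier 2 papers in the sample from April 2008 onward.
tier2 = [
    "Philadelphia Inquirer",
    "Chicago Tribune",
    "Arkansas Democrat Gazette",
    "San Francisco Chronicle",
]

# Selections for four consecutive coding days.
schedule = [papers_for_day(tier2, day) for day in range(4)]
for day, chosen in enumerate(schedule):
    print(day, chosen)
```

Under this scheme the same pair recurs every second coding day; a study wanting to vary which papers appear together could cycle through all six possible pairs instead.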

Tier 2 and Tier 3 papers were changed in April 2008. The former newspapers are included in the sample when appropriate.

This process resulted in the following newspaper sample:

1st Tier

The New York Times
The Washington Post
Los Angeles Times
USA Today
Wall Street Journal

2nd Tier

Philadelphia Inquirer (before April 1, 2008, The Boston Globe)
Chicago Tribune (before April 1, 2008, Star Tribune)
Arkansas Democrat Gazette (before April 1, 2008, Austin American-Statesman)
San Francisco Chronicle (before April 1, 2008, Albuquerque Journal)

3rd Tier

New Hampshire Union-Leader (before April 1, 2008, The Sun Chronicle)
Metro West Daily News (before April 1, 2008, Star Beacon)
The Gazette, Colorado Springs (before April 1, 2008, The Chattanooga Times Free Press)
Modesto Bee (before April 1, 2008, The Bakersfield Californian)

Units of Study

For each of the papers selected, we coded only articles that began on page A1 (including jumps). The argument for this is that the papers have made the decision to feature those stories in that day’s edition. That means we did not code the articles on the inside of the A section, or in any other section. The first argument for ignoring these stories is that they are unnecessary for our Index, which measures only the biggest stories each week. If a story appears on the inside of the paper, but does not make A1 at any point, it would almost certainly not be a big enough story to make the top list of stories we track each week. The weakness of this approach, arguably, is that it undercounts the full agenda of national and international news, in that it neglects stories on the days they were not on Page 1 even though they were on others. While this is perhaps less pertinent in the weekly index, at the end of the year, when trying to assess the full range of what the media covered, those stories that spent time on the inside of the paper but didn’t disappear were undercounted.

Part of the reasoning for excluding those national and international stories that begin inside the front section of the paper is practical. Coding the interior of the papers to round out the sample for year end purposes is an enormous amount of work for relatively minimal gain.

The other argument for forgoing national and international stories that fail to make Page 1 is more conceptual. We were measuring what newspapers emphasize, their top agenda. Given the cost versus the benefit, capturing the front page of more newspapers seemed the better alternative. In the same regard, we were not coding every story that might appear on a website, an even more open-ended task, just the top stories.

The other challenge with newspapers that we did not face with some other media is that we included only stories that are “national” or international. “National” is defined as a story being covered by newspapers from different locations, as opposed to a local story that is only covered in one paper. The only local stories included in the study are those that pertain to a larger national issue: how the war in Iraq is affecting the hometown, for instance, or new job cuts at local industries because of the sliding economy.

This resulted in a newspaper sample of about 25 stories a day.

Online News Sites

Sector Reach

About 37% of Americans regularly go online for news.4 Both online news sites and web blogs are becoming more important in the overall news agenda. Any sample of the modern news culture must include representation of some of the more popular examples of these sectors.

Sector Sample

The online news universe is even more expansive than radio and has seemingly countless sites that could serve as news sources. To get an understanding of online news sources we chose to include several of the most popular news sites in our universe as a sample of the overall online news agenda. We also wanted balance in the type of online news sites, between those that produced their own content and those that aggregated news from throughout the web.

To choose the sites we were to include in our sample, we referred to the list of the top ten news sites with the most unique visitors according to the Nielsen/NetRatings from September 2006. Out of that list of ten sites, we chose five sites that represented a mix: sites that created their own material for their web site (MSNBC.com, CNN.com), popular sites that aggregated material from other web sites (Google News, Yahoo News), and a site, AOL News, that usually uses material from news wire services but also creates some unique material at times.

The sites that were coded were as follows:

Site Unique Audience (000)5
Yahoo News 90,162
MSNBC.com 26,745
CNN.com 24,676
AOL News 18,646
Google News 9,425

Units of Study

For the online news sites, the study captured each site once a day. We rotated the time of day that we captured the websites between 9 am Eastern time and 4 pm Eastern time. Prior to April 28, 2008, we captured the websites only between 9 and 10 am Eastern time. We began rotating the times in order to make sure the timing of our capture did not affect our findings.

For each site we captured, we coded the top five stories, since those have been determined to be the most prominent at that point by the particular news service. As is true with our decision about page A1 in newspapers, if a story is not big enough for the online sites to highlight it in their top five stories, it is likely not a story that will register on our tally of the top stories each week.

This resulted in a sample of 25 stories a day.

Radio

Sector Reach

Radio is a diverse medium that reaches the majority of Americans, 235 million over the course of a week.6 News/Talk is the top or second top category of listening in every region in the country except one, ranking it as the most popular of all measured radio formats.7 Many more Americans get news headlines while listening to other formats as well.

The challenge with coding national radio programs is that much of radio news content is localized, and the number of shows that reach a national audience is only a fraction of the overall programming. On the other hand, our content analysis of radio confirms that news on commercial radio in most cities has been reduced to headlines from local wires and syndicated network feeds, plus talk, much of which is nationally syndicated itself. The exception is in a few major cities where a few all-news commercial radio stations still survive, such as Washington, D.C. (where WTOP is a significant radio news operation).

Sector Sample

The Index includes three different areas of radio news programming.

1. Public Radio Programming: The Index includes 30 minutes of a public radio station’s broadcast of National Public Radio’s morning program, Morning Edition, each day.

NPR produces two hours of Morning Edition each day, which also includes multiple news roundups produced by a different unit of NPR. Member stations may pick any segments within those two hours and mix and match as fits their programming interests. Thus, what airs on a member station is considered a “co-production” of NPR and that member station rather than programming coming directly from NPR. In order to account for this unique relationship, PEJ rotated between coding the first 30 minutes of the first hour and the first 30 minutes of the second hour as aired by WFYI, the member station from which we recorded the show. This gave us a closer representation of the overall content of Morning Edition.

2. Talk Radio: The Index includes some of the most popular national talk shows that are public affairs or news oriented. Since the larger portion of the talk radio audience, and of talk radio hosts, is of a conservative political persuasion, we included more conservative hosts each day than liberal hosts.

The three most popular conservative radio talk shows for 2008 were Rush Limbaugh, Sean Hannity and Michael Savage. We coded each of these shows every other day. We started rotating Rush Limbaugh’s radio program to code him every other day from July 1, 2008. Prior to that, we had been coding Rush Limbaugh every day, Monday through Friday.

Since the politically liberal audience for talk radio is much smaller, we coded only one liberal talk show a day, rotating daily between Ed Schultz and Randi Rhodes, two of the top liberal radio hosts based on national audiences. The Arbitron ratings, according to Talkers Magazine online, for spring 2008 are as follows:

Minimum Weekly Cume (in millions, rounded to the nearest .25, based on Spring ’08 Arbitron reports)8

Rush Limbaugh 14.25
Sean Hannity 13.25
Michael Savage 8.25
Ed Schultz 3.0
Randi Rhodes (1.25 in 2007, went off air in early 2009)

3. Headline Feeds: Hourly news feeds from national radio organizations like CBS and CNN appear on local stations across the country. These feeds usually last approximately 5 minutes at the top of each hour, and are national in that people all over the country get the same information. They frequently supplement local talk and news shows.

To get a representation of these feeds, we coded two national feeds, each twice a day (9 am ET and 5 pm ET). The networks that were included were CBS Radio and ABC Radio. The stations that were used to capture the CBS Radio headlines were primarily WTOP in Washington D.C., and WNIS in New York City. The stations used primarily to capture the ABC Radio headlines were KGO in San Francisco and WABC in New York City.

The stations used to capture each program were selected based on the availability of a solid feed through the station’s web site. We also compared their shows to those of other stations to ensure that the same edition aired on that station as on others carrying the same program.

This resulted in the following sample:

News: 30 minutes of Morning Edition each day

Headlines: Four headline segments each day (2 from ABC Radio and 2 from CBS Radio), about 20 minutes total.

Talk: The first 30 minutes of three talk programs each day: one or two conservative (out of Rush Limbaugh, Sean Hannity and Michael Savage) and one liberal (Ed Schultz or Randi Rhodes).

This resulted in a sample of roughly 2.5 hours of programming a day.

Universe of Outlets

Newspapers (Sun-Fri)

NY Times every day

Coded 2 out of these 4 every day

Wash Post
LA Times
USA Today
Wall Street Journal

Coded 2 out of 4 every day

Philadelphia Inquirer (before April 1, 2008, The Boston Globe)
Chicago Tribune (before April 1, 2008, Star Tribune)
Arkansas Democrat Gazette (before April 1, 2008, Austin American-Statesman)
San Francisco Chronicle (before April 1, 2008, Albuquerque Journal)

Coded 2 out of 4 every day

New Hampshire Union-Leader (before April 1, 2008, The Sun Chronicle)
Metro West Daily News (before April 1, 2008, Star Beacon)
The Gazette, Colorado Springs (before April 1, 2008, The Chattanooga Times Free Press)
Modesto Bee (before April 1, 2008, The Bakersfield Californian)

Web sites (Mon-Fri)

CNN.com
Yahoo News
MSNBC.com
Google News
AOL News

Network TV (Mon-Fri)

Morning shows
ABC – Good Morning America
CBS – Early Show
NBC – Today
Evening news
ABC – World News Tonight
CBS – CBS Evening News
NBC – NBC Nightly News
PBS – Newshour with Jim Lehrer

Cable TV (Mon-Fri)

Daytime (2–2:30 pm) – coded 2 out of 3 every day
CNN
Fox News
MSNBC
Nighttime CNN – coded 3 out of the 4 every day
Lou Dobbs Tonight
Situation Room (6 pm)
Out in the Open/CNN Election Center/Campbell Brown: No Bias No Bull
Anderson Cooper 360
Nighttime Fox News – coded 3 out of the 4 every day
Special Report w/ Brit Hume
Fox Report w/ Shepard Smith
O’Reilly Factor
Hannity & Colmes
Nighttime MSNBC – coded 2 out of the 4 every day
Tucker Carlson (6 pm)/Race for the White House/1600 Pennsylvania Avenue
Hardball (7 pm)
Countdown w/ Keith Olbermann
Live with Dan Abrams/The Rachel Maddow Show

Radio (Mon-Fri)

Headlines every day
ABC Radio headlines at 9am and 5pm
CBS Radio headlines at 9am and 5pm
NPR Morning Edition every day

Talk Radio
Rush Limbaugh every other day, before July 1, 2008 every day

1 out of 2 additional conservatives each day
Sean Hannity
Michael Savage

1 out of 2 liberals each day
Ed Schultz
Randi Rhodes

That resulted in 35 outlets each weekday; on Sundays, only the seven newspapers were included.
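
The weekday figure can be reproduced by tallying the list above. The Python sketch below is illustrative only; the grouping labels are assumptions, and it counts three talk radio programs per day, following the radio sample summary, although the number of conservative hosts coded varied with the rotation.

```python
# Outlets coded on a typical weekday, tallied from the list above.
outlets_per_weekday = {
    # NY Times daily, plus 2 of 4 papers in each of three rotating groups
    "newspapers": 1 + 2 + 2 + 2,
    "web sites": 5,
    # three morning shows plus four evening newscasts (including PBS)
    "network TV": 3 + 4,
    # two daytime slots, then CNN (3), Fox News (3) and MSNBC (2) at night
    "cable TV": 2 + 3 + 3 + 2,
    # two headline networks, Morning Edition, and three talk programs
    # (assumption: three talk shows per day, per the radio sample summary)
    "radio": 2 + 1 + 3,
}

total_outlets = sum(outlets_per_weekday.values())
print(total_outlets)
```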

Universe Procurement and Story Inclusion

Newspapers

For each of the seven newspapers included in our sample, we coded all stories where the beginning of the text of the story appears on the front page of that day’s hard copy edition. If a story only has a picture, caption, or teaser to text inside the edition, we did not include that story in our sample. We coded all stories that appeared on the front page with a national or international focus. Because we were looking at the coverage of national and international news, if a story was about an event that was solely local to the paper’s point of origination, we excluded such a local story from our sample. The sole exception to this rule was when a story with a local focus was tied to a story that we determined to be a “Big Story” – defined as one that has been covered in multiple national-news outlets for more than one news cycle. For example, a story about a local soldier who has come back from the Iraq War has a local angle but is related to a national issue and was important in the context of our study.

We coded the entirety of the text of all the articles we include. If an article included a jump to an inside page in the hard copy edition, we coded all the text including that which makes up the jump.

When possible, we subscribed to the hard copies of the selected newspapers and had them delivered to our Washington, DC office. This was possible for national papers that have same-day delivery methods (New York Times, Washington Post, Wall Street Journal, and USA Today). For these papers, we used the hard copy edition to determine the placement on the front page of the edition, and to get all the text we coded. The one element of the hard-copy stories we were not easily able to achieve with those editions was the word count of each article. For that, we used the LexisNexis computer database to determine the word count for each of the stories coded.

For all of the other papers that we were not able to get hard copies of within the same day of publication, we took advantage of internet resources that have digital copies of the hard copy editions. Pressdisplay.com and Newsstand.com offer services where we could subscribe to digital versions of the hard copy and get them the same day. From these digital versions, we got the text of the relevant articles and also determined the word counts. To get the word counts, we copied the text of the articles (not including captions, titles, bylines, or pull-out quotes) into the Microsoft Word software program and ran the “word count” function to get the final number. When necessary, we went to the paper’s web site in order to find the text of articles that were not available on either of the two web services. Through prior experience and through examination of each individual article, we were able to determine when the text of the article on the web site was the same as it would be in the hard copy of the paper.
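
For readers who want to approximate this word-count step without Microsoft Word, a rough Python equivalent is sketched below. It is not the procedure PEJ used: the `(kind, text)` article structure and the names are hypothetical, the sample text is invented for illustration, and a whitespace-based count may not match Word's counter exactly in every case.

```python
# Parts of an article that the text says were left out of the word count.
EXCLUDED_PARTS = {"caption", "title", "byline", "pull_quote"}


def article_word_count(parts):
    """Count whitespace-separated words in the body text of an article.

    `parts` is a list of (kind, text) pairs; this structure is a
    hypothetical convenience, not a format described in the source.
    """
    return sum(
        len(text.split())
        for kind, text in parts
        if kind not in EXCLUDED_PARTS
    )


# Invented placeholder article, used only to exercise the function.
example_parts = [
    ("title", "Economy Slides Again"),
    ("byline", "By A. Reporter"),
    ("body", "Job cuts spread across the region on Tuesday."),
    ("caption", "Workers leave the plant."),
    ("body", "Officials expect more to follow."),
]

print(article_word_count(example_parts))
```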

Network and Cable Television

For all television programs, we coded the first 30 minutes of the broadcast, regardless of how long the program lasts. As with newspapers, we coded all stories that are news reports that relate to a national or international issue. Therefore, we did not code stories that are part of a local insert into a national show. For example, each half-hour, NBC’s Today Show cuts to a local affiliate which will report local stories and local weather. We did not include those local insert stories.

We also excluded from our sample commercials, promos, and teasers of upcoming stories. We were only interested in the actual reporting that takes place during the broadcasts.

Any story that fit the above criteria and began within the first 30 minutes was included in the study, even if the story finished outside of the 30-minute time period. A three-minute story that began 28 minutes into a program was coded in its entirety, even though the final minute ran after our 30-minute cutoff mark. The exception to this rule was when a television station showed a speech or press conference that ran longer than the 30-minute period (often much longer). In those cases, we cut off the coding at the 30-minute mark in order to prevent that event from unduly impacting our overall data.
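As an illustration only, the 30-minute rule described above could be expressed roughly as follows. This is a minimal sketch, not the software used for the study; the `Segment` class, the `is_live_event` flag and the `coded_seconds` function are hypothetical names introduced here.

```python
from dataclasses import dataclass

CUTOFF_SECONDS = 30 * 60  # the 30-minute coding window


@dataclass
class Segment:
    start: int  # seconds after the start of the program
    end: int    # seconds after the start of the program
    is_live_event: bool = False  # hypothetical flag: speech or press conference


def coded_seconds(segment: Segment) -> int:
    """Return how many seconds of a segment would be coded under the rule."""
    if segment.start >= CUTOFF_SECONDS:
        return 0  # began after the window, so it is not in the sample
    if segment.is_live_event:
        # Speeches and press conferences are cut off at the 30-minute mark.
        return min(segment.end, CUTOFF_SECONDS) - segment.start
    # Ordinary stories are coded in full, even if they run past the cutoff.
    return segment.end - segment.start
```

Under this sketch, a three-minute story beginning 28 minutes into a program contributes all 180 seconds, while a press conference running from minute 20 to minute 50 contributes only the 600 seconds before the cutoff.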

The method of collection of all television programs was the same. PEJ is a subscriber to DirectTV satellite television service, and we have 9 TiVo recording boxes hooked up to the DirectTV signals. Through these TiVo services, we digitally recorded each broadcast and then archived the programs onto DVDs. There is redundancy in our recording method so that each show is recorded on two machines to prevent an error from causing problems in our capture.

In the case that there was a schedule change or a program change, such as when there was a special event or a channel decided to air a special show rather than its normal show, we coded the same 30-minute period that we would have coded otherwise in order to get a realistic account of what news would be available to the consumer who tuned in at that time. However, when a show was preempted for a special event, such as a presidential campaign debate or the State of the Union address, we did not include that period as part of our sample.

Radio

The rules for capturing and selecting stories to code for radio were very similar to television. We coded the first 30 minutes of each show regardless of how long the show lasts. We also excluded local inserts from local affiliates, and continued coding any story that ran past the 30-minute mark.

For each of the radio shows selected, we found national feeds of the show that were available on the web. As with television, we have two computers capturing each show so as to avoid errors if one feed was not working. The actual recording is done using a software program called Replay A/V which captures the digital feeds and creates digital copies of the programs onto our computers. We then archived those programs onto DVDs.

Online

For each of the web sites we included in our sample, we captured and coded the top 5 stories that appeared on the site at the time of capture. Our capture times rotated every day. They occurred either between 9 am and 10 am Eastern time or between 4 pm and 5 pm Eastern time each weekday. For each capture, a coder went to each site using an internet browser and saved the home page and appropriate article pages to our computers, exactly as they appeared in our browsers at the time of the capture. We relied on people rather than a software package to capture sites because some software packages have proved invasive to websites.

As with newspapers, some stories are longer than one web page. In those cases, we included the entire text of the article across as many web pages as the article spanned.

Because each web site is formatted differently, we came up with a standard set of rules to determine which stories were the most prominent on a given home page. We spent a significant amount of time examining various popular news sites and discovered patterns which led us to the best possible rules. First, we ignored all advertisements, audio/visual features, or extra features on the sites that were not news reported stories. We were only interested in the main channels of the web sites where the lead stories of the day were displayed. Second, we determined the top “lead” story. That was the story with the largest font size for its title on the home page. The second most prominent story was the story that had a picture associated with it, if that story was different than the story with the largest title. By considering many sites, we realized that a number of sites put pictures with stories they find particularly interesting, but are clearly not intended to be the most important story of the day. However, we wanted those stories to be in our sample because the reader’s eye is often drawn to them.

Having figured out the first and second most prominent stories, we then relied on two factors to determine the next three most prominent stories. We first considered the size of the headline text and then the height on the home page. Therefore, for determining the third most prominent story, we looked for the story with the largest title font after the top two most prominent stories. If there are several stories with identical font sizes, we determined that the story that is higher up on the page is more prominent. In cases where two articles had the same font size AND the same height on the screen, we chose the article to the left to be the more prominent.
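The prominence rules above can be sketched as a small ranking routine. This is an illustration under stated assumptions, not the procedure's actual implementation, since the captures were made and ranked by human coders. The `Headline` fields and the `top_stories` function are hypothetical. Two cases the text does not address are handled by assumption: when several stories have pictures, the one ranked highest by font size, height and left position is taken, and ties for the lead story are broken the same way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Headline:
    title: str
    font_size: float  # headline font size; larger is more prominent
    top: float        # distance from the top of the page; smaller is higher up
    left: float       # distance from the left edge; smaller is further left
    has_picture: bool = False


def _rank_key(h: Headline):
    # Largest font first, then higher on the page, then further left.
    return (-h.font_size, h.top, h.left)


def top_stories(headlines, n=5):
    """Return up to n headlines in order of prominence."""
    ordered = sorted(headlines, key=_rank_key)
    if not ordered:
        return []
    lead = ordered[0]  # the story with the largest headline font
    picks = [lead]
    # Second slot: a story with a picture, if it is not already the lead.
    pictured = [h for h in ordered if h.has_picture and h is not lead]
    if pictured:
        picks.append(pictured[0])
    # Remaining slots: font size, then height on the page, then left-most.
    for h in ordered:
        if len(picks) >= n:
            break
        if h not in picks:
            picks.append(h)
    return picks[:n]
```

Encoding the three tie-breakers as a single sort key keeps the ordering explicit: Python compares the tuples element by element, so height is consulted only when font sizes are equal, and left position only when both font size and height are equal.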

Coding Procedures and Intercoder Reliability

A coding protocol was designed for this project based on PEJ’s previous related studies. Eighteen variables were coded, including coder ID, date coded, story ID number (these three were generated from the coding software automatically), story date, source, broadcast start time, broadcast story start timecode, headline, story word count, placement/prominence, story format, story describer, big story, sub-storyline, geographic focus, broad story topic, broadcast story ending timecode, campaign mention, and lead newsmaker. Additional variables were added to conduct further analysis of 2008 presidential campaign stories: campaign lead newsmaker, presidential campaign topic, significant presence, and tone.

Variable source includes all the outlets we coded. Variable broadcast start time applies to radio and TV broadcast news and gives the starting time of the program in which the story appears. Broadcast story start timecode is the amount of time after the start of the show at which a story begins, while broadcast story ending timecode is the amount of time after the start of the show at which the story ends. Variable headline determines whether the story is part of a regular news round-up segment. Variable story word count designates the word count of each individual print/online news story. Variable placement/prominence designates where stories are located within a publication, on a website, or within a broadcast. The location reflects the prominence given the stories by the journalists creating and editing the content. Story format measures the type and origin of the text-based and broadcast stories, which designates, at a basic level, whether the news story is a product of original reporting, or drawn from another news source. Story describer is a short description of the content of each story. Big stories are particular topics that occurred often in news media during the time period under study. Sub-storyline applies to stories that fit into some of the long-running big stories or other common storylines, reflecting aspects, features or developments of some big stories. Variable geographic focus concerns the geographic area to which the topic is relevant in relation to the location of the news source. Variable broad story topic determines the type of broad topic categories addressed by a story. Variable campaign mention determines whether the story has any mention at all of a U.S. campaign or election. The lead newsmaker variable determines the person whose actions or statements constitute the main subject matter of the story.

The campaign lead newsmaker variable, similar to the definition of lead newsmaker, designates the person who was the main focus of the election story discussed. The presidential campaign topic variable measures the broad election-related topic or what the campaign story is about “on its face”. The significant presence variable tracks how often specific candidates were a significant presence in a story, but were not the main focus or lead newsmaker of the story. The tone variable measures whether a story’s tone is constructed in a way, via use of quotes, assertions, or innuendo, which results in positive, neutral or negative coverage for the primary figure as it relates to the topic of the story.

The coding team responsible for performing the content analysis is made up of fourteen individuals. The daily coding operation is directed by a content supervisor, two methodologists, a training coordinator and a coding manager. Several of the coders have been trained extensively since the summer of 2006 and most of the coders have more than a year’s worth of coding experience.

Numerous tests of intercoder reliability have been conducted since the inception of the NCI in order to ensure accuracy among all coders.

2008 Intercoder Tests

In 2008, PEJ conducted two phases of major intercoder testing to ensure continuing accuracy among all coders.

The first phase tested for variables that require little to no subjectivity from the coder. We refer to these codes as Housekeeping Variables. The second phase of testing was conducted over a period of six weeks. In this phase we tested for variables that are more complex and require more training and expertise. We call these the Main Variables.

Housekeeping Variables

The first phase of testing measured coder agreement for housekeeping variables and was completed in September 2008. These are variables that are necessary for each story, but involve little inference from each coder.

We used a random sample of 181 stories, representing all five media sectors that we code. This sample represented 14% of the number of the stories we code in an average week.

Each story was coded by two different coders, and 14 coders participated in the study. The sample comprised 26 print stories (13 newspaper, 13 online) and 155 broadcast stories (56 network, 57 cable and 42 radio).

The percent of agreement was as follows:
Story Date: 100%
Source: 96%
Placement: 91%

Print Only Variable:
Story Word Count (+/- 20 words): 98%

Broadcast Only Variables:
Broadcast Start Time: 100%
Story Start Time (+/- 6 seconds): 82%
Story End Time (+/- 6 seconds): 82%
Headline: 99%
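
For readers unfamiliar with the measure, simple percent agreement between two coders, including the tolerance bands applied to word counts and timecodes, can be sketched as below. This is a generic illustration and not the code that produced the figures above; the function name `percent_agreement` and its parameters are hypothetical.

```python
def percent_agreement(pairs, tolerance=0):
    """Share of stories on which two coders agree, as a percentage.

    pairs: list of (coder_a_value, coder_b_value) tuples, one per story.
    tolerance: allowed absolute difference for numeric values
               (for example 20 for word counts, 6 for timecodes in seconds).
    """
    if not pairs:
        raise ValueError("need at least one pair of codes")
    agree = 0
    for a, b in pairs:
        if tolerance:
            # Numeric variables: agreement means being within the band.
            agree += abs(a - b) <= tolerance
        else:
            # Categorical variables: agreement means an exact match.
            agree += a == b
    return 100.0 * agree / len(pairs)
```

For example, two coders whose word counts for four stories were (500, 510), (300, 330), (120, 120) and (800, 781) would agree on three of the four stories at a tolerance of 20 words. Percent agreement does not correct for agreement expected by chance, which is one reason chance-corrected statistics are sometimes reported alongside it.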

Main Variables

The second group of variables we tested, the main variables, involves more training and interpretation. Having already demonstrated that we had a high level of agreement for all of our housekeeping variables, we then had the coders participate in separate tests for these main variables.

For these tests, we selected 103 stories representing each of the five media sectors. This represented 8% of the number of stories coded in any given week. These tests were conducted over the course of six weeks throughout November and December of 2008. Each week, we selected a random sample of stories and asked all coders to code the main variables. In this analysis, we combined the datasets from all six weeks.

All 14 coders participated in these tests.

For main variables, we achieved the following level of agreement:

Format: 94%
Big Story: 87%
Substory: 80%
Geographic Focus: 92%
Broad Topic: 79%

Agreement for lead newsmaker was measured separately. This test was conducted over the course of nine weeks (five weeks in 2008 and four weeks in 2009). Here we selected a total of 144 stories, representing each of the five media sectors. This represented 11% of a week’s sample. We asked coders to recode a randomly selected sample of stories for lead newsmaker.

We achieved the following agreement:

Lead newsmaker: 83%

Campaign Related Variables

A majority of the codes and variables used in campaign related studies came out of the coding protocol created for the NCI. For a story to be considered primarily about the presidential campaign, 50% or more of the time or space of the story had to be devoted to coverage of the campaign. PEJ conducted further analysis of these stories through additional variables: campaign lead newsmaker, significant presence, presidential campaign topic and tone.

The specific levels of agreement for these variables among all 14 coders were as follows:

Campaign lead newsmakers: 92%
Significant presence: 81%
Presidential campaign topic: 83%

A team of five of PEJ’s experienced coders worked with a coding administrator to complete the tone coding for the campaign stories. Of the five coders, all but one had coded for tone in a previous PEJ campaign study. The previous study that PEJ conducted in October 2007, which used the same process for determining tone, had an intercoder agreement rate of 86%.

For tone coding, each of the five coders was trained (or re-trained) on the tone coding methodology and then given the same set of 40 stories to code for tone for each of the four candidates.

The rate of intercoder agreement for tone was 81%.

Testing Details

All the percentages of agreement for the above variables were calculated using a software program available online called PRAM (See “The Content Analysis Guidebook”, by Kimberly A. Neuendorf, Sage Publications, 2002).

Since the inception of the News Coverage Index, as new coders were hired and included in the coding team, they were given extensive training from the training coordinator, content supervisor, and other experienced coders. New coders were not allowed to participate in the weekly coding for the project until they had demonstrated a level of agreement with experienced coders for all variables at an 80% level or higher.

Each coder worked between 20 and 37.5 hours a week in our Washington D.C. office and was trained to work on all the print and broadcast mediums included in the sample. The schedule for each coder varied, but since all of the material included in the Index is archived, the actual coding can be performed at any point during the week.

To achieve diversity in the coding and ensure statistical reliability, generally no one coder coded more than 50% of a particular media sector within one week. Each coder coded at least three mediums each week. In the case of difficult coding decisions about a particular story, the final decision was made by either the coding administrator or a senior member of the PEJ staff.

The physical coding data was entered into a proprietary software program that has been written for this project by Phase II Technology. The software allows coders to enter the data for each variable, and also allows coders to review their work and correct mistakes when needed. The same software package compiles all of the coding data each week and allows us to perform the necessary statistical tests.

Total Media Combined: Creation and Weighting

The basis of measurement for top stories is time in broadcast and cable and words in text-based media. Thus for cable news, for example, we refer to the percent of total seconds that a certain story received. In other words, of all the seconds analyzed in cable news this week, US economy stories accounted for 47% (or 23,359 seconds out of a total of 49,914 seconds). The industry term for this is “newshole,” meaning the space given to news content.
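
The percent-of-newshole calculation in the cable example above amounts to a single division. A minimal sketch, with the hypothetical function name `newshole_share`, where units are seconds for broadcast and words for text-based media:

```python
def newshole_share(story_units, total_units):
    """Percent of a medium's newshole taken up by one story.

    Units are seconds for broadcast and words for text-based media.
    """
    if total_units <= 0:
        raise ValueError("total newshole must be positive")
    return 100.0 * story_units / total_units


# Figures from the cable news example in the text.
share = newshole_share(23_359, 49_914)
print(f"{share:.1f}% of cable newshole")
```

Rounded to the nearest whole percent, this reproduces the 47% reported for US economy stories.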

The main index considers broadcast and print together, identifying the top stories across all media. To do this, words and seconds are merged together to become total newshole. After considering the various options for merging the two, we concluded that the most straightforward and sensible method was to first generate the percent of newshole for each specific medium. This way all media are represented in the same measurement: percent.

Next, we needed to create a method for merging the various percentages. There were several options. We could have run a simple average of all five. We could have averaged all print and all broadcast and then averaged those two. Or we could have applied some kind of weight based on apparent audience.

Because each medium measures its audience differently (ratings per month in television, weekly circulation in newspapers, unique visitors in online), any system based on audience figures raises serious discontinuities. Nonetheless, several of our advisors thought some kind of weight should be applied. Various options were considered, including a combination of different metrics, including actual data alongside supplemental survey data. One consistent measure is that of public opinion surveys. The same question is posed about multiple media. Two such questions are asked regularly by the Pew Research Center for the People and the Press. One asks about “regular usage” and the other asks where people go for “national and international news.”

Before arriving at a method, we tested multiple models:

Model 1: compile percentages for big stories for each of the five media sectors (newspapers, online sites, network TV, cable TV and radio), and then average those five lists into one final list.

Model 2: Divide the media sectors into two groups, text-based media (newspapers, online sites) and broadcast (network TV, cable TV, and radio). Average the lists of percentages between the two groups to get one final list.

Model 3: compile percentages for big stories for each of the five media sectors, and then add the weighted five lists together into one final list. The weights given to each media sector were calculated by averaging data from the three most recent surveys on where people get news about national and international issues, collected by the Pew Research Center for the People and the Press (August 2006, November 2005, and June 2005). First, we take the average response for each medium across the three time periods. Next, we rebalance the average percentages so that the five media sectors in the index (newspapers, Internet, network TV, cable TV, and radio) sum to 100%. Thus, the weight for newspapers would be 0.28, for Internet would be 0.16, for network TV would be 0.18, for cable TV would be 0.26, and for radio would be 0.12.

Model 4: compile percentages for big stories for each of the five media sectors, and then add the weighted five lists together into one final list. The weights assigned to each media sector were generated based on the regular media usage survey data, collected by the Pew Research Center for the People and the Press in their Biennial Media Consumption Survey 2006. Thus, the weight for newspapers would be 0.307, for Internet would be 0.218, for network TV would be 0.165, for cable TV would be 0.201 and for radio would be 0.109.
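
The four models can be compared side by side in a short sketch. This is illustrative only: the function names are hypothetical, the per-sector story percentages in the usage note are invented inputs rather than study data, and Model 2 is read here as averaging within each group first and then averaging the two group means, which is an assumption about the text. The Model 3 weights are the ones stated above.

```python
SECTORS = ["newspapers", "online", "network_tv", "cable_tv", "radio"]

# Model 3 weights as stated in the text.
MODEL3_WEIGHTS = {
    "newspapers": 0.28,
    "online": 0.16,
    "network_tv": 0.18,
    "cable_tv": 0.26,
    "radio": 0.12,
}


def rebalance(raw):
    """Scale averaged survey responses so the sector weights sum to 1."""
    total = sum(raw.values())
    return {sector: value / total for sector, value in raw.items()}


def simple_average(shares):
    """Model 1: unweighted mean of the five sector percentages."""
    return sum(shares[s] for s in SECTORS) / len(SECTORS)


def two_group_average(shares):
    """Model 2 (assumed reading): mean of the text and broadcast group means."""
    text = (shares["newspapers"] + shares["online"]) / 2
    broadcast = (shares["network_tv"] + shares["cable_tv"] + shares["radio"]) / 3
    return (text + broadcast) / 2


def weighted_sum(shares, weights):
    """Models 3 and 4: sum of sector percentages times sector weights."""
    return sum(shares[s] * weights[s] for s in SECTORS)
```

With made-up sector percentages of 40 (newspapers), 50 (online), 50 (network TV), 50 (cable TV) and 20 (radio) for a single story, Model 1 gives 42.0, Model 2 gives 42.5, and Model 3 gives about 43.6. Model 4 uses the same `weighted_sum` with its own set of weights.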

By testing two trial weeks’ data, we found that the lists of top five stories were exactly the same (top stories’ names and their ranks) using all four of these models, although some percentages varied. In the end, the academic and survey analysts on our team felt the best option was Model 3. It has the virtue of tracking the media use for national and international news, which is what the Index studies. Also, the Pew Research Center for the People & the Press asks this question about once every six months, so we can reflect changes in media use. We adopted this model and updated the weights when appropriate.

The weights used for data in Model 3 have been updated twice since the inception of the News Coverage Index.

The first update was on June 16, 2008, based on the three most recent surveys conducted by the Pew Research Center for the People & the Press (September 2007, July 2007, and February 2007).

Thus, the weights used for the Index in 2008 were as follows:

2008 Weights

Newspapers: 0.26
Online: 0.20
Network TV: 0.18
Cable TV: 0.24
Radio: 0.12