

About The Study
By the Project For Excellence In Journalism
Methodology

The data for this study were collected in two parts. The first part consists of data originally gathered by other people or organizations that PEJ then collected and aggregated. The second part, particularly the content analysis, is original work conducted specifically for this report.

For the data aggregated from other researchers, the Project took several steps. First, we tried to determine what data had been collected and by whom for the eight media sectors studied. In many cases this included securing rights to data through license fees or other means. We organized the data into the seven primary areas of interest we wanted to examine: content, audience, economics, ownership, newsroom investment, alternative news outlets and digital trends.

Next, the Project studied the data closely to determine where elements reinforced each other and where there were apparent contradictions or gaps. In doing so, the Project endeavored to determine the value and validity of each data set. That in many cases involved going back to the sources that collected the research in the first place. Where data conflicted, we have included all relevant sources and tried to explain their differences, either in footnotes or in the narratives.

In analyzing the data for each media sector, we sought insight from experts by having at least three outside readers for each sector chapter. Those readers raised questions, offered arguments and questioned data where they saw fit.

All sources are cited in footnotes or within the narrative, and listed alphabetically in a source bibliography. The data used in the report are also available in more complete tabular form online, where users can view the raw material, sort it on their own and make their own charts and graphs. Our goal was not only to organize the available material into a clear narrative but also to collect all the public data on journalism in one usable place. In many cases, the Project paid for the use of the data.

In addition, PEJ conducted original research in a number of special reports and features. The methodologies for each can be found below. You can scroll through them all or click to go directly to the report of interest.
Survey of Economic Attitudes
Analysis of Nielsen data
Who Owns the News Media
A Year in the News Content Analysis
Citizen media study

Survey of Economic Attitudes

This survey was conducted jointly with the Pew Internet and American Life Project. The first portion of the survey, about media consumption, was released March 1 as Understanding the Participatory News Consumer. Here we release, for the first time, the findings on public attitudes toward online economics.

This report is based on the findings of a daily tracking survey on Americans’ use of the Internet. The results in this report are based on data from telephone interviews conducted in English by Princeton Survey Research Associates International between December 28, 2009, and January 19, 2010, among a sample of 2,259 adults, age 18 and older. For results based on the total sample, one can say with 95% confidence that the error attributable to sampling and other random effects is plus or minus 2.3 percentage points. For results based on Internet users (n=1,675), the margin of sampling error is plus or minus 2.7 percentage points. In addition to sampling error, question wording and practical difficulties in conducting telephone surveys may introduce some error or bias into the findings of opinion polls.
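As a rough check on the margins quoted above, the sketch below computes the textbook margin of error for a proportion at 95% confidence. It assumes simple random sampling and the most conservative proportion of 0.5; the function name and those assumptions are ours, not PSRAI's. The baseline figures it produces are somewhat smaller than the reported 2.3 and 2.7 points, which would be consistent with an allowance for the sample design and weighting, though the report does not spell out that calculation.

```python
import math

def srs_margin_of_error(n, p=0.5, z=1.96):
    """Half-width of a 95% confidence interval for a proportion.

    Assumes simple random sampling (no design effect), which is an
    assumption of this sketch rather than the survey's actual method.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Sample sizes taken from the text; results are in percentage points.
total_sample = srs_margin_of_error(2259) * 100
internet_users = srs_margin_of_error(1675) * 100

print(f"Total sample baseline: +/- {total_sample:.2f} points")
print(f"Internet users baseline: +/- {internet_users:.2f} points")
```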

A combination of landline and cellular random digit dial (RDD) samples was used to represent all adults in the continental United States who have access to either a landline or cellular telephone. Both samples were provided by Survey Sampling International according to PSRAI specifications. Numbers for the landline sample were selected with probabilities in proportion to their share of listed telephone households from active blocks (area code + exchange + two-digit block number) that contained three or more residential directory listings. The cellular sample was not list-assisted, but was drawn through systematic sampling from dedicated wireless 100-blocks and shared service 100-blocks with no directory-listed landline numbers.

A new sample was released daily and was kept in the field for at least five days. The sample was released in replicates, which are representative subsamples of the larger population. This ensures that complete call procedures were followed for the entire sample. At least seven attempts were made to complete an interview at each sampled telephone number. The calls were staggered over times of day and days of the week to maximize the chances of making contact with a potential respondent. Each number received at least one daytime call in an attempt to find someone available.

For the landline sample, half of the time interviewers first asked to speak with the youngest adult male currently at home, and if no male was available, interviewers asked to speak with the youngest female. In the other half, the caller reversed the procedure. For the cellular sample, interviews were conducted with the person who answered the phone. Interviewers verified that the person was an adult and in a safe place (for example, not driving a car) before administering the survey. Cellular sample respondents were offered a post-paid cash incentive for their participation. All interviews completed on any given day were considered to be the final sample for that day.

Weighting is generally used in survey analysis to compensate for sample designs and patterns of non-response that might bias results. A two-stage weighting procedure was used to weight this dual-frame sample. The first stage weight is the product of two adjustments made to the data – a Probability of Selection Adjustment (PSA) and a Phone Use Adjustment (PUA). The PSA corrects for the fact that respondents in the landline sample have different probabilities of being sampled depending on how many adults live in the household. The PUA corrects for the overlapping landline and cellular sample frames.

The second stage of weighting balances sample demographics to population parameters. The sample is balanced to match national population parameters for sex, age, education, race, Hispanic origin, region (U.S. Census definitions), population density and telephone usage. The basic weighting parameters came from a special analysis of the Census Bureau’s 2009 Annual Social and Economic Supplement that included all households in the continental U.S. The population density parameter was derived from Census 2000 data. The cellphone usage parameter came from an analysis of the January-June 2009 National Health Interview Survey. Weighting was accomplished using Sample Balancing, a special iterative sample-weighting program that simultaneously balances the distributions of all variables using a statistical technique called the Deming Algorithm. Weights were trimmed to prevent individual interviews from having too much influence on the final results. The use of these weights in statistical analysis ensures that the demographic characteristics of the sample closely approximate the demographic characteristics of the national population.
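The Sample Balancing program itself is not reproduced in this report. As an illustration of the general idea behind the second stage, the sketch below implements generic raking (iterative proportional fitting), the technique usually associated with Deming's name. The toy respondents, the two variables and the target shares are all hypothetical, and weight trimming is not shown.

```python
def rake(rows, targets, max_iter=100, tol=1e-9):
    """Iterative proportional fitting: adjust weights until the weighted
    share of every category matches its target share."""
    weights = [1.0] * len(rows)
    for _ in range(max_iter):
        largest_adjustment = 0.0
        for variable, shares in targets.items():
            total = sum(weights)
            # Current weighted total for each category of this variable.
            current = {}
            for row, w in zip(rows, weights):
                current[row[variable]] = current.get(row[variable], 0.0) + w
            # Scale each respondent so the category hits its target share.
            for i, row in enumerate(rows):
                factor = shares[row[variable]] * total / current[row[variable]]
                weights[i] *= factor
                largest_adjustment = max(largest_adjustment, abs(factor - 1.0))
        if largest_adjustment < tol:
            break  # every variable already matches its targets
    return weights

def weighted_share(rows, weights, variable, category):
    """Weighted share of respondents falling in one category."""
    hit = sum(w for row, w in zip(rows, weights) if row[variable] == category)
    return hit / sum(weights)

# Hypothetical toy sample and population targets (not the study's data).
respondents = [
    {"sex": "F", "age": "18-49"},
    {"sex": "F", "age": "18-49"},
    {"sex": "F", "age": "50+"},
    {"sex": "F", "age": "50+"},
    {"sex": "M", "age": "18-49"},
    {"sex": "M", "age": "50+"},
]
population_targets = {
    "sex": {"F": 0.5, "M": 0.5},
    "age": {"18-49": 0.6, "50+": 0.4},
}

weights = rake(respondents, population_targets)
print(round(weighted_share(respondents, weights, "sex", "F"), 3))
print(round(weighted_share(respondents, weights, "age", "18-49"), 3))
```

Each pass adjusts one variable at a time, which disturbs the others slightly; repeating the passes is what brings all the distributions into balance together.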

Following is the full disposition of all sampled telephone numbers:

The disposition reports all of the sampled telephone numbers ever dialed from the original telephone number samples. The response rate estimates the fraction of all eligible respondents in the sample that were ultimately interviewed. At PSRAI it is calculated by taking the product of three component rates:

  • Contact rate – the proportion of working numbers where a request for interview was made.
  • Cooperation rate – the proportion of contacted numbers where a consent for interview was at least initially obtained, versus those refused.
  • Completion rate – the proportion of initially cooperating and eligible interviews that were completed.

Thus the response rate for the landline sample was 22 percent. The response rate for the cellular sample was 20 percent.
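The product of the three component rates can be sketched as follows. The component values here are hypothetical, chosen only to show the arithmetic; they are not the rates behind the 22 percent and 20 percent figures.

```python
def response_rate(contact_rate, cooperation_rate, completion_rate):
    """Response rate as the product of the three component rates."""
    return contact_rate * cooperation_rate * completion_rate

# Hypothetical component rates, for illustration only.
example = response_rate(
    contact_rate=0.80,
    cooperation_rate=0.30,
    completion_rate=0.90,
)
print(f"Illustrative response rate: {example:.1%}")
```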

Analysis of Nielsen NetView Data

The analysis of news website traffic and audience behavior is based on Nielsen’s NetView database, which collects usage information on thousands of sites.

Site Criteria

Within the Nielsen database, we worked with Nielsen’s list of News and Information websites. To create our list for analysis, though, we needed to take some additional steps.

The first adjustment had to do with the level of the sites listed. Nielsen organizes sites in its universe on three tiers: parent, brand, and channel.  These three tiers represent a hierarchy for every site.  For example, CNN’s parent is Time Warner. Time Warner.com includes all of Time Warner’s Web properties like Cartoon Network and several sports channels.  CNN Digital network is a brand, the second tier, and includes CNN.com in addition to Sports Illustrated (SI.com) and Time.com, among others.  The final, most narrow level is CNN.com, which Nielsen calls a “channel.”

For every site that we could, we disaggregated the listings to get to the news site itself. This put each site on as equal a footing as possible.

The next step was to cull the list for sites that would not be traditionally considered news sites.  Examples of sites that we removed are Yahoo, Answers and About.com, sites that do not provide news in the traditional sense.

Finally, we calculated the average unique audience from September-November 2009 and selected all sites with an average of at least 500,000 unique visitors per month. There were 199 sites that met that threshold. The list of sites is here.
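The threshold step can be sketched as a simple filter on the three-month average. The site names and audience figures below are hypothetical, not Nielsen data, and the function name is ours.

```python
from statistics import mean

THRESHOLD = 500_000  # minimum average monthly unique visitors

def qualifying_sites(monthly_uniques, threshold=THRESHOLD):
    """Return the sites whose average monthly unique audience
    meets or exceeds the threshold, in alphabetical order."""
    return sorted(
        site
        for site, counts in monthly_uniques.items()
        if mean(counts) >= threshold
    )

# Hypothetical figures for September, October and November.
sample = {
    "example-news-a.com": [620_000, 580_000, 610_000],
    "example-news-b.com": [510_000, 470_000, 490_000],
    "example-news-c.com": [450_000, 520_000, 530_000],
}
print(qualifying_sites(sample))
```

A site can qualify even if it falls below 500,000 in a single month, as long as its three-month average reaches the threshold.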

Analysis
To do this analysis, PEJ researchers first broke down the various sites into categories. The first set of categories had to do with each website’s affiliation. Was it tied to a legacy outlet like print or television, or was it online only? The second groupings were formed around each site’s editorial or topic focus. Was it national and international news, local or specialized around a certain topic and if so which one (science, celebrity, etc.)? Third, we created categories around the nature of its content. Was it mostly reporting that the website produced itself? Was it mostly an aggregator of others’ work? Was it mostly offering commentary as opposed to news reporting? In making these categories researchers examined many of the sites themselves following criteria for determining where they would fit.

Each site was placed into a category within each of the three main areas. Not all fit easily or clearly. But we used consistent criteria to make determinations. For example, MSNBC relies a good deal on wire service copy for much of its content, but it includes its own staff bylines and thus is categorized here as an original content provider. And many local newspapers carry national news but their local coverage is most prominent on the site.

Next, researchers examined each category of sites (as well as the top 20 sites) using nine of Nielsen’s metrics: Unique Audience, Total Sessions, Sessions per Person, Total Minutes, Minutes per Person, Minutes per Session, Total Page Views, Page Views per Session and Web Pages per Person. Data were averaged from September-November 2009. The definitions for the various metrics are as follows:

Unique Audience (Visitors): The number of unique individuals, not including web crawlers, who view a website in a given time frame (normally a month). This is calculated in several ways by different measurement services. The most effective way to measure it is to make users log in to a site every time they use it. Since most sites do not require login, however, the most prevalent way to measure it is through the use of tracking cookies.

Total Sessions (Visits): A session is defined as a continuous series of URL requests, running applications or AOL proprietary online service page requests. Logging off or 30 minutes of computer inactivity ends a session. (This differs slightly from a visit, which considers URL requests only). (From Nielsen Online)

Sessions (Visits) per Person: The average number of sessions a single visitor has at one site in a given time period. This figure is calculated by taking the total number of sessions to a site in a given period and dividing it by the number of unique visitors.

Total Minutes: The total number of minutes spent on a site over the given time period.

Minutes per Person: The average number of minutes a user spends on a site during a given time period. This is calculated by taking the total number of minutes on a site and dividing it by the number of unique visitors. This figure is affected by the unique audience of any site. It does not necessarily reflect how much time an individual user spends on average on a site, but is an approximation of that figure based on total minutes and total unique visitors, both of which are affected by the size of the site.

Minutes per Session (Visit): The average number of minutes a visitor spends per session (visit) on a site during the given time period.

Total Page Views: The number of times a single page of a website is viewed.  This is calculated by the number of times that particular page is loaded from the server; if a user hits refresh on a page, it counts as a page view.  This measurement is complicated by more modern Web technologies such as Ajax that refresh the content on pages automatically, causing page view measurements that tend to be far higher because of their architecture.

Web Pages (Page Views) per Session (Visit): The average number of pages called up on a site during the average session. It is calculated by taking the total number of page views and dividing it by the total number of sessions.

Web Pages (Page Views) per Person: The total number of page views divided by unique audience.  Generally, this is expressed as Web pages per month, but it can be calculated on any time period.
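The per-person and per-session metrics are all ratios of four totals, as the sketch below shows. The function and key names are ours and the monthly totals are hypothetical. Page views per session is computed as page views divided by sessions, so that the result is a count of pages per session.

```python
def derived_metrics(unique_audience, total_sessions, total_minutes, total_page_views):
    """Compute the ratio metrics from the four totals for one site and period."""
    return {
        "sessions_per_person": total_sessions / unique_audience,
        "minutes_per_person": total_minutes / unique_audience,
        "minutes_per_session": total_minutes / total_sessions,
        "page_views_per_session": total_page_views / total_sessions,
        "page_views_per_person": total_page_views / unique_audience,
    }

# Hypothetical monthly totals for a single site.
metrics = derived_metrics(
    unique_audience=1_000_000,
    total_sessions=3_000_000,
    total_minutes=12_000_000,
    total_page_views=9_000_000,
)
for name, value in metrics.items():
    print(f"{name}: {value:.1f}")
```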

The topline of data on the 199 news sites is available here.

Who Owns the News Media

Who Owns the News Media was developed to enhance the information available in the State of the News Media report. The tool uses a tab format to house specific data related to all companies, and to companies within the main sectors of media: newspapers, online, network, cable and local TV, magazines, audio and ethnic media.

The goal of Who Owns the News Media was to create a tool that aggregated comparative information on the companies that own news media properties. We wanted to do this within each media sector as well as more broadly across news media overall. To do this, we took several steps. First, we identified the various U.S.-based companies within each media sector. In some cases, the list is so long that we determined a cutoff point for which companies to include. The newspapers sector, for example, includes all companies with a total weekday circulation of 100,000 or more.

Next, we looked for relevant statistical data that were available for most companies and could be compared from one company to the next. Some data are compared within the media sector and other data, like total revenues, can be compared across all companies.

The full methodology, which explains the process and rules for identifying the companies and statistical data, can be found here.

A Year in the News

The content analysis research in the State of the News Media Report is the summation of a year’s worth of coding conducted by PEJ. The coding is done throughout the year with weekly findings reported in the News Coverage Index, or NCI, reports.

All coding was conducted in-house by PEJ’s trained staff of researchers.
The 2009 analysis totals 68,717 stories. This consists of 7,370 newspaper articles, 7,830 online stories, 19,427 stories from network television, 18,856 stories on cable news, and 15,234 stories from radio programs.

The central focus of study is to analyze a wide swath of American news media to identify what is being covered and not covered — the media’s broad news agenda.
The Year in the News Interactive uses the data from PEJ’s News Coverage Index. It was designed to allow users to explore and answer questions about media coverage in 2009. The data are based on selected variables from PEJ’s News Coverage Index for the year. Learn more about the Index from our methodology.

The Universe: What We Are Studying

Because the landscape is becoming more diverse — in platform, content, style and emphasis — and because media consumption habits are also changing, even varying day to day, the Index is designed to be broad. Therefore, our sample, based on the advice of our academic team, is designed to include a broad range of outlets, illustrative but not strictly representative of the media universe.

The sample is also a purposive one, selected to meet these criteria rather than to be strictly random. It is a multistage sampling process that cannot be entirely formulaic or numeric because of differences in measuring systems across media. It involves the balancing of several factors, including the number of media sectors that offer news, the number of news outlets in any given sector, the amount of news programming in each outlet and the audience reach. In addition to front-end selections, we have also weighted the various sectors on the back end to account for differences in audience. The weighting process is discussed below in this document.

The mainstream or establishment daily news media in the United States can be broken down into five main sectors. These are:

Network TV News
Newspapers
Online News Sites
Cable News
Radio News

Within each media sector, the number of outlets and individual programs vary considerably, as do the number of stories and size of the audience. We began by first identifying the various media sectors, then identifying the news media outlets within each, then the specific news programs and finally the stories within those.

The primary aim of the Index is to look at the main news stories of the week across the industry. With that in mind, for outlets and publications where time does not permit coding the entire news content offered each day (three hours of network morning programming, for instance), we code the lead portion. In other words, we code the first 30 minutes of the cable news programs, the first 30 minutes of the network morning news programs, the front page of newspapers, etc. This may skew the overall universe toward more “serious” stories, but this is also the most likely time period to include coverage of the “main” news events of the day, those that would make up the top stories each week or each month.

Below we describe the selection process and resulting sample for each main sector.

Note: The statistics cited here are the statistics that were accurate at the time of the launch of the Index (January 2007). When available, updated data are included in the footnotes section.

Network News

Sector Reach

Each evening, the three broadcast network news programs on ABC, NBC and CBS reach about 27 million viewers. The morning news shows on those networks are seen by 14.1 million viewers.1 In addition, the nightly newscast on PBS reaches 2.4 million viewers daily, according to its internal figures. Because the universe of national broadcast networks is limited to these four, it is practical to include all of the networks as our sample universe.

Sector Sample

Each of the three major broadcast networks produces two daily national general interest news shows, one in the morning (such as Good Morning America) and one in the evening. It is practical, therefore, to include at least part of all these news programs on ABC, CBS and NBC in our sample. (Magazine-genre programs are not included in the universe, both because most of them, with the exception of Nightline, are not daily and because they are not devoted predictably to covering the news of the day.) At the same time, because the PBS NewsHour is considered by many as an alternative nightly news broadcast to the three major networks, and it reaches a substantial audience, we also include that program.

Units of Study

For the commercial evening newscasts, the study codes the entire program. For the morning programs, it codes the news segments that appear during the first 30 minutes of the broadcast, including the national news inserts but not local inserts. By selecting this sample of the morning shows, it is possible that we will be missing some news stories that appear later in the programs. Through prior PEJ research, however, we have learned that the morning shows generally move away from the news of the day after the first 30 minutes, save for the top-of-the-hour news insert, and present more human interest and lifestyle stories after that point. The stories that the networks feel are most important will appear during the first 30 minutes and will be included in our study.

For PBS NewsHour, where the second half of the program differs from the first half, we began rotating on March 31, 2008, between the first and second half-hours of the show in order to get a closer representation of the program’s overall content.
The resulting network sample is as follows:

Commercial Evening News: Entire 30 minutes of all three programs each day (90 minutes).

Commercial Morning News: First 30 minutes of all three programs each day (90 minutes).

PBS NewsHour: Rotate between first and second 30 minutes each day
The combination of morning and evening news broadcasts results in three and a half hours of programming each day.

Cable Television

Sector Reach

According to ratings data, the individual programs of the three main cable television news channels (CNN, MSNBC and Fox News) do not reach as many viewers as those of the broadcast network news shows. During prime-time hours, 2.7 million viewers watch cable news, while 1.6 million watch during daytime hours.2 But ratings data arguably undercount the reach of cable news. Survey data now find that somewhat more people cite cable news as their primary or first source for national and international news than cite broadcast network news.

Sector Sample

The most likely option was to study CNN, MSNBC and Fox News. These represent the dominant channel of programming from each news-producing cable company. (This means selecting MSNBC as opposed to CNBC, CNN as opposed to CNN Headline News, and MSNBC over Headline News, which now sometimes beats MSNBC in ratings.)

Units of Study

Since these channels provide programming round the clock, with individual programs sometimes reaching fairly small audiences, it is not practical for us to code all of the available shows. On the one hand, there is a great challenge in selecting several times out of the day to serve as a sample of cable news overall.

On the other hand, earlier studies have shown that for much of the day, most people find one cable news program on a channel to be indistinguishable from another. If one were to ask a daytime viewer of cable news which program he or she preferred, the 10 a.m. or the noon, the questioner might get a confused look in response. For blocks of hours at a time, the channels will have programs with generic titles such as CNN Newsroom, Your World Today or Fox News Live. Our studies have shown that there are four distinct programming sectors to cable: early morning, daytime, early evening and prime time.

Working with academic advisers, we weighed various options. A selection based on the most-watched programs would result in the O’Reilly Factor (1.8 million viewers a night) for Fox and Larry King Live (500,000 viewers a night) for CNN.3 Some of these shows, however, are not news programs per se, but rather have content that derives from the host’s opinions and guests on any given day. Separating news and talk also proved problematic because it is often difficult to distinguish between the two categories, and several programs offer both news and talk in the same hour.

The best option, we concluded, was to draw from two time periods:

1) The daytime period, to demonstrate what continuing or live events are being covered. The study includes two 30-minute segments of daytime programming each day, rotating among the three networks.

2) Early evening and prime time (6 p.m.-11 p.m.) together as a unit, rather than separating out talk and news or early prime and late prime. Within this five-hour period, we included all programming that focuses on general news events of the day. Basically, this removes three programs: Fox’s Greta Van Susteren, which is more narrowly focused on crime; CNN’s Larry King, which as often as not is focused on entertainment or personal stories rather than news events; and MSNBC’s documentary program.

Prior to January 1, 2009, three out of four evening cable programs were coded each evening for both Fox News and CNN. For MSNBC, two out of four evening programs were coded each evening. These shows were rotated each weekday. In the past, MSNBC’s ratings were significantly lower than the ratings for Fox News and CNN, and that was the justification for including one fewer of their shows each evening.

At the start of 2009, PEJ made a change to the evening cable sample. While MSNBC’s ratings still trail both Fox News and CNN, the general trend over the last few years has been that the differences have decreased. It becomes harder, therefore, to justify having a different amount of programming for the three stations. Consequently, beginning on January 1, 2009, all three stations have two of their four evening cable shows coded on a nightly basis.

To include the most cable offerings possible each week, the study codes the first 30 minutes of selected programs and rotates them daily. Morning shows were not included because those shows are run at the same time for every part of the country – meaning that a broadcast that starts at 7 a.m. on the East Coast will begin at 4 a.m. on the West Coast. Those programs appear far too early for much of the country to actually view. This is in contrast to the broadcast morning programs, which are shown on tape delay in different parts of the country, in the manner of other broadcast programs.

This process resulted in the following cable sample:

Daytime
Rotate, coding two out of three 30-minute daytime slots each day (60 minutes a day)

Prime Time
Two 30-minute segments for Fox News (60 minutes)
Two 30-minute segments for CNN (60 minutes)
Two 30-minute segments for MSNBC (60 minutes)

The Index rotates among all programming from 6 p.m. to 11 p.m. that was focused on general news events of the day, excluding CNN’s Larry King Live and Fox’s Greta Van Susteren.

CNN, MSNBC and Fox News made some programming changes during 2009, and our sample included the replacement shows when appropriate.

Time | CNN | Fox News | MSNBC
6 p.m. | Situation Room | Special Report with Brit Hume / Special Report with Bret Baier | 1600 Pennsylvania Avenue / The Ed Show
7 p.m. | Lou Dobbs Tonight / CNN Tonight | Fox Report with Shepard Smith | Hardball with Chris Matthews
8 p.m. | Campbell Brown: No Bias, No Bull / CNN Prime Time | The O’Reilly Factor | Countdown with Keith Olbermann
9 p.m. | (none) | Hannity & Colmes / Hannity | The Rachel Maddow Show
10 p.m. | Anderson Cooper 360 | (none) | (none)

This results in four hours of cable programming each day (including daytime).
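The four-hour total follows from counting 30-minute segments, as this small sketch shows. The function name is ours. The pre-2009 total it prints is our own inference from the sample described above (three evening programs each for Fox News and CNN, two for MSNBC) and assumes the daytime sample was the same two segments; the report does not state that figure.

```python
SEGMENT_MINUTES = 30

def daily_cable_minutes(prime_time_segments, daytime_segments=2):
    """Total coded minutes per day, given the number of 30-minute
    prime-time segments per channel plus the daytime segments."""
    total_segments = daytime_segments + sum(prime_time_segments.values())
    return total_segments * SEGMENT_MINUTES

# Sample in effect from January 1, 2009: two evening programs per channel.
from_2009 = daily_cable_minutes({"Fox News": 2, "CNN": 2, "MSNBC": 2})

# Earlier sample (inferred): three each for Fox News and CNN, two for MSNBC.
before_2009 = daily_cable_minutes({"Fox News": 3, "CNN": 3, "MSNBC": 2})

print(f"From 2009: {from_2009 / 60:.1f} hours a day")
print(f"Before 2009 (inferred): {before_2009 / 60:.1f} hours a day")
```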

Newspapers

Sector Reach

About 54 million people buy a newspaper each weekday.4 This number does not include the “pass along” rate of newspapers, which some estimate, depending on the paper, to be approximately three times the circulation rate. In addition, some newspapers, such as the New York Times and Washington Post, have an even greater influence on the national and international news agenda because they serve as sources of news that many other outlets look to in making their own programming and editorial decisions. So while the overall audience for newspapers has declined over recent years, newspapers still play a large and consequential role in setting the news agenda that cannot be strictly quantified or evaluated by circulation data alone. There is a growing body of data to suggest that the total audience of newspapers, combining their reach in print and online, may actually be growing slightly.

Sector Sample

To create a representation of what national stories are being covered by the 1,450 newspapers around the country, we divided the country’s daily papers into three tiers based on circulation: over 650,000; 100,000 to 650,000; and under 100,000. Within each tier, we selected papers along the following criteria:

First, papers need to be available electronically on the day of publication. Three websites — www.nexis.com, www.newsstand.com and www.pressdisplay.com — offer same-day full-text delivery service. Based on their general same-day availability (excluding nondaily papers, non-U.S. papers, non-English language papers, college papers and special niche papers) a list of U.S. general interest daily newspapers was constructed. The original sample list included seven papers in Tier 1, 44 papers in Tier 2 and 22 papers in Tier 3.

Tier 1: Due to its national prominence and readership, and the desirability of having at least one newspaper that was coded every day without any interruption due to rotation (in the same way the network newscasts are coded), we decided to code the New York Times six days a week (Sunday through Friday). We then wanted to include a representation from the other large nationally admired or distributed papers, so each day we code two of these four: the Washington Post, the Los Angeles Times, USA Today and the Wall Street Journal.

Tiers 2 and 3: Four newspapers were selected from Tier 2 and Tier 3 respectively. To ensure geographical diversity, each of the four newspapers within Tier 2 and Tier 3 was randomly selected from a different geographic region according to the parameters established by the U.S. Census Bureau — Northeast Region, Midwest Region, South Region and West Region. An effort was also made to ensure diversity by ownership. We rotate two of the four newspapers in Tier 2 and Tier 3 each day.
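The circulation tiers can be expressed as a small lookup. This is a sketch only: the report does not say how a paper at exactly 650,000 or exactly 100,000 would be classified, so placing both boundary values in Tier 2 is our assumption, and the circulation figures in the example are hypothetical.

```python
def circulation_tier(circulation):
    """Assign a daily paper to a tier by circulation.

    Tier 1: over 650,000; Tier 2: 100,000 to 650,000; Tier 3: under 100,000.
    Treating the boundary values as Tier 2 is an assumption of this sketch.
    """
    if circulation > 650_000:
        return 1
    if circulation >= 100_000:
        return 2
    return 3

# Hypothetical circulation figures, for illustration only.
for circulation in (900_000, 250_000, 40_000):
    print(f"{circulation:,} -> Tier {circulation_tier(circulation)}")
```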

Tier 2 and Tier 3 papers have been changed twice (in 2008 and again in 2009) since the inception of the News Coverage Index. For 2009 our sector sample was:

1st Tier
New York Times
Washington Post
Los Angeles Times
USA Today
Wall Street Journal

2nd Tier
Kansas City Star
Pittsburgh Post-Gazette
San Antonio Express-News
San Jose Mercury News

3rd Tier
Herald News (Massachusetts)
Anniston Star (Alabama)
Spokesman-Review (Washington)
East Valley Tribune  (Arizona)/ Meadville Tribune (Pennsylvania)

Units of Study
For each of the papers selected, we code only articles that begin on Page A1 (including jumps). The argument for this is that the papers have made the decision to feature those stories on that day’s edition. That means we do not code the articles on the inside of the A section or in any other section.

The first argument for ignoring these stories is that they would be unnecessary for our Index, which measures only the biggest stories each week. If a story appears on the inside of the paper, but does not make A1 at any point, it would almost certainly not be a big enough story to make the top list of stories we track each week. The weakness of this approach, arguably, is that it undercounts the full agenda of national and international news in that it neglects those stories that were not on Page A1 on certain days but were on others. While this is perhaps less pertinent in the weekly Index, at the end of the year, when trying to assess the full range of what the media covered, those stories that appeared on the inside of the paper but did not vanish may be undercounted.
Part of the reasoning for excluding those national and international stories that begin inside the front section of the paper is practical. Coding the interior of the papers to round out the sample for year-end purposes would be an enormous amount of work for relatively minimal gain.

The other argument for forgoing national and international stories that fail to make Page A1 is more conceptual. We are measuring what newspapers emphasize, their top agenda. Given the cost versus the benefit, capturing the front page of more newspapers seemed the better alternative. (In the same regard, we do not code every story that might appear on a website, an even more daunting task, but instead code just the top stories.)
The other challenge with newspapers that we did not face with some other media is that we include only stories that are national or international. National is defined as a story being covered by newspapers from different locations, as opposed to a local story that is covered in only one paper. The only local stories included in the study are those that pertain to a larger national issue, such as how the war in Iraq is affecting the hometown or new job cuts at local industries because of the sliding economy.
This results in a newspaper sample of approximately 25 stories a day.
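
The daily newspaper draw (the New York Times every day, plus two of four papers from each of three rotating groups) can be sketched as follows. This is an illustration only: the simple alternation between fixed pairs is an assumption, since the exact rotation schedule is not specified here, and the function name is hypothetical.

```python
# Rotating groups as listed for the 2009 sample.
TIER_1_ROTATING = ["Washington Post", "Los Angeles Times",
                   "USA Today", "Wall Street Journal"]
TIER_2 = ["Kansas City Star", "Pittsburgh Post-Gazette",
          "San Antonio Express-News", "San Jose Mercury News"]
TIER_3 = ["Herald News", "Anniston Star", "Spokesman-Review",
          "East Valley Tribune / Meadville Tribune"]


def papers_for_day(day_index):
    """Return the seven papers coded on a given day (0-based index).

    Assumption: each group of four alternates between its first pair
    and its second pair from one day to the next.
    """
    start = 0 if day_index % 2 == 0 else 2
    papers = ["New York Times"]  # coded every day
    for group in (TIER_1_ROTATING, TIER_2, TIER_3):
        papers.extend(group[start:start + 2])
    return papers


print(papers_for_day(0))
print(papers_for_day(1))
```

Under this assumed schedule, any two consecutive days together cover all 13 papers in the sample.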

Online News Sites

Sector Reach

About 30 million Internet users go online for news each day.5 About 6.8 million people read a blog each day, some of the most popular of which are news oriented.6 Both online news sites and blogs are becoming more important in the overall news agenda. Any sample of the modern news culture must include a representation of some of the more popular online news sources.

Sector Sample

The online news universe is even more expansive than radio and has seemingly countless sites that could serve as news sources. To get an understanding of online news sources, we chose to include several of the most popular news sites in our universe as a sample of the overall online news agenda. We also wanted balance in the type of online news sites, between those that produced their own content and those that aggregated news from throughout the Web.

When the Index was originally launched in 2007, the sample included five prominent Web sites that were tracked each weekday. These sites were Yahoo News, MSNBC.com, CNN.com, AOL News, and Google News. Considering the increased usage of the Internet for news shown in recent surveys conducted by the Pew Research Center for the People & the Press, PEJ decided to expand our Internet content.

The increase in the number of sites included in the NCI took effect on January 1, 2009.
To choose the sites to be included in our expanded online sample, we referred to the lists of the top news sites based on the averages of six months of data (May-October 2008) from two rating services, Nielsen and Hitwise. (Data providing the most-visited news sites ranked by specific Web addresses were not available from Comscore at the time of our sampling.)

First, we found the top general interest news sites ranked by their average unique audience data for six months based on Nielsen NetView monthly data on a subdomain level. Second, we found the top general interest news sites ranked by their average market share for six months based on monthly rankings for top news and media websites provided by Hitwise. We then averaged the ranks of the top sites on these two lists to determine the top 12 general interest news websites.
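
The rank-averaging step can be sketched as follows. This is a minimal illustration rather than the procedure as actually run: the site names and rank values are made up, and both the handling of sites that appear on only one list (they are skipped) and the alphabetical tie-break are assumptions.

```python
def average_ranks(nielsen_ranks, hitwise_ranks, top_n=12):
    """Average each site's rank on two lists and return the top_n sites.

    Both arguments map site name -> rank (1 = most visited).
    Assumption: only sites present on both lists are considered.
    """
    common = set(nielsen_ranks) & set(hitwise_ranks)
    averaged = {
        site: (nielsen_ranks[site] + hitwise_ranks[site]) / 2
        for site in common
    }
    # Sort by average rank; ties are broken alphabetically (an assumption).
    ordered = sorted(averaged, key=lambda site: (averaged[site], site))
    return ordered[:top_n]


# Hypothetical ranks, for illustration only.
nielsen = {"site-a": 1, "site-b": 2, "site-c": 3, "site-d": 4}
hitwise = {"site-a": 2, "site-b": 1, "site-c": 4, "site-e": 3}

top_sites = average_ranks(nielsen, hitwise, top_n=2)
print(top_sites)
```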

The sites included in our current sample are as follows:

Yahoo News
MSNBC.com
CNN.com
NYTimes.com
Google News
AOL News
Foxnews.com
USAToday.com
Washingtonpost.com
ABCNews.com
BBC News (international version)
Reuters.com

Units of Study

For the online news sites, the study captures each site once a day. We rotate the time of day that we capture the websites between 9 a.m. and 10 a.m. Eastern time and 4 p.m. and 5 p.m. Eastern time. For each site capture, we code the top five stories, since those have been determined to be the most prominent at that time by the particular news service. As is true with our decision about Page A1 in newspapers, if a story is not big enough for the online sites to highlight it in their top five stories, it is likely not a story that will register on our tally of the top stories each week.

This results in a sample of 30 stories a day.

Radio
Sector Reach

Radio is a diverse medium that reaches the majority of the people, with 94 percent of Americans 12 years and older listening to traditional radio each week.7 About 16% of radio listeners tune in to news, talk and information radio in an average week, which ranks it as the most popular of all measured radio formats.8 Many more Americans get news from headlines while listening to other formats as well.

The challenge with coding national radio programs is that much of radio news content is localized, and the number of shows that reach a national audience is only a fraction of the overall programming. On the other hand, our content analysis of radio confirms that news on commercial radio in most cities has been reduced to headlines from local wires and syndicated network feeds (plus talk, much of which is nationally syndicated itself). The exception is in a few major cities where a few all-news commercial radio stations still survive, such as Washington, where WTOP is a significant all-news operation.

Sector Sample

The Index includes three different areas of radio news programming.
1. Public Radio: The Index includes 30 minutes of National Public Radio’s (NPR) morning program, Morning Edition, each day.

NPR produces two hours of Morning Edition each day, which also includes multiple news roundups produced by a different unit of NPR. Member stations may pick any segments within those two hours and mix and match as fits their programming interests. Thus, what airs on a member station is considered a “co-production” of NPR and that member station rather than programming coming directly from NPR. In order to account for this unique relationship, PEJ rotates between coding the first 30 minutes of the first hour and the first 30 minutes of the second hour of the member station that we record the show from, WFYI, in Indianapolis. This gives us a closer representation of the overall content of Morning Edition.

2. Talk Radio: The Index includes some of the most popular national talk shows that are public affairs or news-oriented. Since the larger portion of the talk radio audience, and talk radio hosts, are conservatives, we included more conservative hosts than liberals. We code the first 30 minutes of each selected show.

The most popular conservative radio talk shows in 2009 were Rush Limbaugh and Sean Hannity. In early 2009, we coded Michael Savage as the third most popular conservative talk host. Later, due to changing audience data, we switched to Glenn Beck in the latter part of the year. We coded each of these shows every other day.

Since the politically liberal audience for talk radio is much smaller, we only coded one liberal talk show a day. In 2009, we coded Ed Schultz and either Randi Rhodes, Stephanie Miller or Thom Hartmann. These were the top liberal radio hosts based on national audience numbers. The Arbitron ratings, according to Talkers Magazine online, for spring 2006 are as follows:

Minimum Weekly Cume (in millions, rounded to the nearest .25, based on Spring ’06 Arbitron reports) 9
Rush Limbaugh (13.5)
Sean Hannity (12.5)
Michael Savage (8.25)
Glenn Beck**
Ed Schultz (2.25)
Randi Rhodes (1.25)
Alan Colmes (1.25)
Thom Hartmann**
Stephanie Miller†

**Note: The most recent ratings from Talkers Magazine show a marked increase for Glenn Beck and Thom Hartmann. Consequently, these two shows were added to our rotation in October 2009. Numbers from 2006 do not apply.
†Note: We coded Stephanie Miller’s radio show from March 2, 2009 until May 8, 2009, while Randi Rhodes was off the air. Numbers from 2006 do not apply.

3. Headline Feeds: Hourly news feeds from national radio organizations like CBS and CNN appear on local stations across the country. These feeds usually last about five minutes at the top of each hour and are national in the sense that people all over the country get the same information. They frequently supplement local talk and news shows.
To get a representation of these feeds, we code two national feeds, each twice a day (9 a.m. and 5 p.m. Eastern time). The networks coded are CBS Radio and ABC Radio.
The stations used to capture each program are selected based on the availability of a solid feed through the station’s website. We have also compared their shows to those of other stations to ensure that the same edition is aired on that station as on other stations carrying the same program.

This results in the following sample:

News: 30 minutes of NPR’s Morning Edition each day, as broadcast on a selected member station.

Talk: The first 30 minutes of two or three talk programs each day — one or two conservative (out of Rush Limbaugh, Sean Hannity and Michael Savage/ Glenn Beck) and one liberal (Ed Schultz or Randi Rhodes/ Stephanie Miller/ Thom Hartmann).
Headlines: Four headline segments each day (two from ABC Radio and two from CBS Radio), about 20 minutes total.
This results in a sample of about two and a half hours of programming a day.

Universe of Outlets

Each day, then, the NCI includes about 10 hours of broadcast (television and radio), 7 newspapers (with about 25 stories), and 6 news websites (30 stories).
Newspapers (Thirteen in all, Sunday through Friday)

New York Times every day
Code two out of these four every day
Washington Post
Los Angeles Times
USA Today
Wall Street Journal
Code two out of these four every day
Kansas City Star
Pittsburgh Post-Gazette
San Antonio Express-News
San Jose Mercury News
Code two out of these four every day
Herald News (Massachusetts)
Anniston Star (Alabama)
Spokesman-Review (Washington State)
East Valley Tribune (Arizona)/ Meadville Tribune (Pennsylvania)

Websites (Code 6 of 12 each day, Monday through Friday)
Yahoo News
MSNBC.com
CNN.com
NYTimes.com
Google News
AOL News
Foxnews.com
USAToday.com
Washingtonpost.com
ABCNews.com
BBC News (International Version)
Reuters.com

Network TV (Seven in all, Monday through Friday)

Morning shows
ABC – Good Morning America
CBS – Early Show
NBC – Today
Evening news
ABC – World News Tonight
CBS – CBS Evening News
NBC – NBC Nightly News
PBS – NewsHour

Cable TV (Fifteen in all, Monday through Friday)
Daytime (2 p.m. to 2:30 p.m.) – code two out of three every day
CNN
Fox News
MSNBC

Nighttime CNN – code two out of four every day
Situation Room
Lou Dobbs Tonight/ CNN Tonight
Campbell Brown: No Bias, No Bull/CNN Prime Time
Anderson Cooper 360

Nighttime Fox News – code two out of four every day
Special Report with Brit Hume/ Special Report With Bret Baier
Fox Report With Shepard Smith
O’Reilly Factor
Hannity & Colmes/ Hannity

Nighttime MSNBC – code two out of the four every day
1600 Pennsylvania Avenue/ The Ed Show
Hardball with Chris Matthews (7 p.m.)
Countdown With Keith Olbermann
The Rachel Maddow Show

Radio (Eight in all, Monday through Friday)

News and Headlines every day
ABC Radio headlines at 9 a.m. and 5 p.m.
CBS Radio headlines at 9 a.m. and 5 p.m.
NPR Morning Edition every day

Talk Radio
Rush Limbaugh every other day
One out of two additional conservatives every day
Sean Hannity
or
Michael Savage/ Glenn Beck

One out of two liberals every day
Ed Schultz
Or
Randi Rhodes/ Stephanie Miller/ Thom Hartmann
That brings us to 33 or 34 outlets each weekday. Sundays are accounted for with seven newspapers.

Universe Procurement and Story Inclusion

Newspapers

For each of the seven newspapers included in our sample, we code all articles where the beginning of the text of the story appears on the front page of that day’s hard copy edition. If a story only has a picture, caption or teaser to text inside the edition, we do not include that story in our sample. We code all stories that appear on the front page with a national or international focus.

Because we are looking at the coverage of national and international news, if a story is about an event that is solely local to the paper’s point of origin, we exclude such a story from our sample. The only exception to this rule is when an article with a local focus is tied to a story that we have determined to be a “Big Story” – defined as one that has been covered in multiple national news outlets for more than one news cycle. For example, a story about a local soldier who has come back from the war in Iraq has a local angle but is related to a national issue and is important in the context of our study.

We code the entirety of the text of all the articles we include. If an article includes a jump to an inside page in the hard copy edition, we code all the text including what is on the jump.
When possible, we have subscribed to the hard copies of the selected newspapers and have them delivered to our Washington office. This is possible for national papers that have same-day delivery (the New York Times, the Washington Post, the Wall Street Journal and USA Today). For these papers, we use the hard copy edition to determine the placement on the front page of the edition and to get all the text we will code. We use the LexisNexis computer database to determine the word count for each of the stories.

For all of the other papers that we are not able to get hard copies of within the same day of publication, we take advantage of Internet resources that have digital copies of the hard copy editions. Pressdisplay.com and Newsstand.com have subscription services offering same-day access to digital versions of the hard copy. From these digital versions, we obtain the text of the relevant articles and also determine the word counts. To get the word counts, we copy the text of the articles (not including captions, headlines, bylines or pull-out quotes) into the Microsoft Word software program and run the “word count” function to get the final number. When necessary, we go to the paper’s website in order to find the text of articles that are not available on either of the two Web services. Through examination of each individual article, we are able to determine when the text of the article on the website is the same as it would be in the hard copy of the paper.

Network and Cable Television

For all television programs, we code the first 30 minutes of the broadcast (with the exception of the PBS NewsHour), regardless of how long the program lasts. As with newspapers, we code all stories that are news reports that relate to a national or international issue. Therefore, we do not code stories that are part of a local insert into a national show. For example, each half-hour, NBC’s Today Show cuts to a local affiliate, which will report local stories and local weather. We do not include those local insert stories.

We also exclude from our sample commercials, promos and teasers of upcoming stories. We are only interested in the actual reporting that takes place during the broadcasts.

Any story that fits the above criteria and begins within the first 30 minutes is included in the study, even if the story finishes outside of the 30-minute time period. A three-minute story that begins 28 minutes into a program would be coded in its entirety, even though the final minute ran after our 30-minute cutoff mark. The exception to this rule is when a television station is showing a speech or press conference that runs longer than the 30-minute period (often much longer). In those cases, we cut off the coding at the 30-minute mark in order to prevent that event from unduly impacting our overall data.
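
The 30-minute rule can be sketched as follows. This is an illustration only: the function name and the `is_live_event` flag are hypothetical, and treating a story that starts exactly at the 30-minute mark as outside the window is an assumption.

```python
CUTOFF_SECONDS = 30 * 60  # the first 30 minutes of the program


def coded_seconds(start, end, is_live_event=False):
    """Return how many seconds of a story are coded (0 if it is excluded).

    start and end are offsets in seconds from the beginning of the program.
    """
    if start >= CUTOFF_SECONDS:
        return 0  # the story begins after the 30-minute window
    if is_live_event:
        # Speeches and press conferences are cut off at the 30-minute mark.
        end = min(end, CUTOFF_SECONDS)
    return end - start


# A three-minute story that begins 28 minutes in is coded in full.
print(coded_seconds(28 * 60, 31 * 60))
# A long press conference that begins 25 minutes in is cut at 30 minutes.
print(coded_seconds(25 * 60, 90 * 60, is_live_event=True))
```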

The method of collection of all television programs is the same. PEJ subscribes to DirecTV satellite television service and we have nine TiVo recording boxes hooked up to the DirecTV signals. Through these TiVo services, we digitally record each broadcast and then archive the programs onto DVDs. There is redundancy in our recording method so that each show is recorded on two machines in order to avoid problems in our capture that might result from technical error.

Occasionally, outlets deviate from the regularly scheduled news programs. When a show is pre-empted for a special live event, such as a presidential campaign debate or the State of the Union address, we do not include that period as part of our sample.

Radio

The rules for capturing and selecting stories to code for radio are very similar to those for television. We code the first 30 minutes of each show regardless of how long the show lasts. We also exclude local inserts from local affiliates and continue coding any story that runs past the 30-minute mark.

For each of the radio shows selected, we have found national feeds of the show that are available on the Web. As with television, we have two computers capturing each show so as to avoid errors if one feed is not working. The actual recording is done using a software program called Replay A/V, which captures the digital feeds and creates digital copies of the programs onto our computers. We then archive those programs onto DVDs.

Online

For each of the websites we are including in our sample, we capture and code the top five stories that appear on the site at the time of capture. Our capture times rotate on a regular basis. They occur either between 9 a.m. and 10 a.m. Eastern time or between 4 p.m. and 5 p.m. Eastern time each weekday. The captures occur with a coder going to each site using an Internet browser and saving the home page and appropriate article pages to our computers, exactly as they appear in our browsers at the time of the capture. We rely on people rather than a software package to capture sites because some software packages have proved invasive to websites.

With the current rotation of websites along with the rotation of the times of day that we capture the sites, we wanted to make sure that we did not always capture the same sites at the same time (CNN.com always at 9 a.m., for example). We also wanted to ensure that for the websites where we coded another outlet from the same news organization, such as USA Today’s newspaper and the usatoday.com website, we did not code both of those outlets on the same days. In order to address these two concerns, we created a method of rotation in which the capturing times for the website rotate every two days.

This means that the pattern the capture times follow is 9 a.m., 9 a.m., 4 p.m., 4 p.m., 9 a.m., etc.

Here is an example of how the online rotation works:
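
One way to picture the rotation is the sketch below. It assumes, purely for illustration, that the 12 sites are split into two fixed groups of six that are coded on alternating weekdays; the grouping and the function name are hypothetical, and the sketch does not model the constraint about same-organization newspapers.

```python
SITES = [
    "Yahoo News", "MSNBC.com", "CNN.com", "NYTimes.com",
    "Google News", "AOL News", "Foxnews.com", "USAToday.com",
    "Washingtonpost.com", "ABCNews.com", "BBC News", "Reuters.com",
]
GROUPS = (SITES[:6], SITES[6:])   # hypothetical split into two groups of six
WINDOWS = ("9 a.m.", "4 p.m.")    # start of each one-hour capture window


def capture_plan(day_index):
    """Return (capture window, sites) for a 0-based weekday index."""
    window = WINDOWS[(day_index // 2) % 2]  # 9, 9, 4, 4, 9, ...
    group = GROUPS[day_index % 2]           # groups alternate daily
    return window, group


for day in range(6):
    window, group = capture_plan(day)
    print(day, window, ", ".join(group))
```

Under these assumptions, each site alternates between a morning and an afternoon capture on its successive coding days, so no site is always captured at the same time.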

As with newspapers, some stories are longer than one Web page. In those cases, we include the entire text of the article for as many Web pages as the article lasts.
Because each website is formatted differently, we came up with a standard set of rules to determine which stories are the most prominent on a given home page. We spent a significant amount of time examining various popular news sites and discovered patterns that led us to the best possible rules. First, we ignore all advertisements and extra features on the sites that are not reported news stories. We are interested only in the main channels of the websites where the lead stories of the day are displayed. Second, we determine the top “lead” story. That is the story with the largest font size for its headline on the home page. The second-most prominent story is the story that has a picture associated with it, if that story is different from the story with the next largest headline. By considering many sites, we realized that a number of sites associate pictures with stories that they find particularly interesting but are clearly not intended to be the most important story of the day. We do want those stories to be in our sample, however, because the reader’s eye will be drawn to them.

Having figured out the most and second-most prominent stories, we then rely on two factors to determine the next three most prominent stories. We first consider the size of the headline text and then the height on the home page. Therefore, for determining the third most prominent story, we look for the story with the largest headline font after the top two most prominent stories. If there are several stories with identical font sizes in headlines, we determine that the story that is higher up on the page is more prominent. In cases where two articles have the same font size and the same height on the screen, we choose the article to the left to be the more prominent one.
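
These prominence rules can be sketched as a sort followed by one adjustment for the picture story. This is an illustration under stated assumptions, not the coders' actual procedure, which is applied by people rather than software: the `Story` fields are hypothetical, ties for the lead story are broken by height and then left position, and when several stories have pictures the most prominent of them is taken as the second story.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Story:
    title: str
    font_size: float          # headline font size; larger is more prominent
    top: float                # distance from the top of the page; smaller is higher
    left: float               # distance from the left edge; smaller is further left
    has_picture: bool = False


def _prominence_key(story):
    # Largest headline first, then higher on the page, then further left.
    return (-story.font_size, story.top, story.left)


def top_five(stories):
    """Return the five most prominent stories on a home page."""
    ranked = sorted(stories, key=_prominence_key)
    if not ranked:
        return []
    chosen = [ranked[0]]      # lead story: largest headline
    rest = ranked[1:]
    picture_story = next((s for s in rest if s.has_picture), None)
    if picture_story is not None:
        chosen.append(picture_story)  # second story: the one with a picture
        rest = [s for s in rest if s is not picture_story]
    chosen.extend(rest)       # remaining stories by headline size, height, left
    return chosen[:5]


# Hypothetical home page, for illustration only.
page = [
    Story("A", 30, 100, 0),
    Story("B", 18, 200, 0),
    Story("C", 18, 200, 300),
    Story("D", 14, 150, 0, has_picture=True),
    Story("E", 18, 400, 0),
    Story("F", 12, 500, 0),
]
print([s.title for s in top_five(page)])
```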

For the first two years of the NCI, we did not include online news stories that were audio or video features. Starting in 2009, PEJ changed its method of measuring online stories to allow for the inclusion of audio and video stories. See the section below entitled “Inclusion of Online Audio and Video in Index Calculations” for details on how the changes to the Index statistics have been incorporated.

Coding Procedures and Intercoder Reliability

A coding protocol was designed for this project based on PEJ’s previous related studies. Seventeen variables are coded: coder ID, date coded, story ID number (these three are generated from the coding software automatically), story date, source, broadcast start time, broadcast story start timecode, story word count, placement/prominence, story format, story describer, big story, substoryline, geographic focus, broad story topic, lead newsmaker and broadcast story ending timecode.

The source variable takes in all the media outlets we code. The variable for broadcast start time applies to radio and TV broadcast news and gives the starting time of the program in which the story appears. Broadcast story start timecode is the time at which a story begins after the start of the show, while broadcast story ending timecode is the time at which a story ends. The variable for story word count designates the word count of each individual print/online news story. The placement/prominence variable designates where stories are placed within a publication, on a website or within a broadcast. The location reflects the prominence given the stories by the journalists creating and editing the content. Story format measures the type and origin of the text-based and broadcast stories, which designates, at a basic level, whether the news story is a product of original reporting or drawn from another news source. Story describer is a short description of the content of each story. Big stories are particular topics that occurred often in news media during the time period under study. Substoryline applies to stories that fit into some of the long-running big stories, reflecting specific aspects, features or narrower elements of some big stories. The variable for geographic focus refers to the geographic area to which the topic is relevant in relation to the location of the news source.

NOTE: If you are using the year in the news interactive and you create a chart using story geography, please note that one geographic category, local, is not rendered by the chart. In general, the amount of “local” coverage was too small to offer a meaningful number, except in the newspaper category.

The variable for the broad story topic identifies which of the type of broad topic categories is addressed by a story. The variable for lead newsmaker names the person or group that is the central focus of the story.

The coding team responsible for performing the content analysis in 2009 ranged from 16 to 20 people through the year. This includes a coding manager, content and training coordinator, methodologist and content supervisor. Several of the coders have been trained extensively since the summer of 2006 and most of the coders have more than a year’s worth of coding experience.

Numerous tests of intercoder reliability have been conducted since the inception of the NCI in order to ensure accuracy among all coders.

2009 – Early 2010 Intercoder Tests

In 2009, PEJ conducted two phases of major intercoder testing to ensure continuing accuracy among all coders.
The first phase tested for variables that require little to no subjectivity from the coder. We refer to these codes as Housekeeping Variables. The second phase of testing was conducted in the fall of 2009. In this phase we tested for variables that are more complex and require more training and expertise. We call these the Main Variables.

Housekeeping Variables

In summer 2009, we tested intercoder agreement for Housekeeping variables. These are variables that are necessary for each story but involve little inference from each coder.
We used a random sample of 131 stories, representing all five media sectors that we code. This sample represented 10% of the number of the stories we code in an average week.

A total of 15 coders participated in the study. Each coder was asked to recode each of the 131 stories.

A total of 27 print (12 newspaper, 15 online) and 104 broadcast (44 network, 36 cable and 24 radio) stories were sampled.
The percent of agreement was as follows:
Story Date: 99%
Source: 97%
Placement: 94%
Print Only Variable:
Story Word Count (+/- 20 words): 84%
Broadcast Only Variables:
Broadcast Start Time: 98%
Story Start Time (+/- 6 seconds): 97%
Story End Time (+/- 6 seconds): 91%
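
Percent agreement with a tolerance, as used for word counts and timecodes above, can be sketched as follows. This is not the PRAM computation; it is a simple average of pairwise agreement, the function name is hypothetical, and the sample values are made up.

```python
from itertools import combinations


def percent_agreement(codes_by_coder, tolerance=0):
    """Average pairwise percent agreement across coders.

    codes_by_coder maps coder ID -> list of codes, one per story, with
    stories in the same order for every coder. With tolerance > 0, two
    numeric codes agree when they differ by no more than the tolerance;
    with tolerance == 0, codes must match exactly.
    """
    agreements = 0
    comparisons = 0
    for codes_a, codes_b in combinations(codes_by_coder.values(), 2):
        for a, b in zip(codes_a, codes_b):
            if tolerance:
                agreements += abs(a - b) <= tolerance
            else:
                agreements += a == b
            comparisons += 1
    if comparisons == 0:
        raise ValueError("need at least two coders and one story")
    return 100 * agreements / comparisons


# Hypothetical word counts from three coders for three stories.
word_counts = {
    "coder_1": [500, 820, 310],
    "coder_2": [510, 845, 310],
    "coder_3": [495, 815, 330],
}
agreement = percent_agreement(word_counts, tolerance=20)
print(round(agreement, 1))
```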

Main Variables

The second group of variables we tested was referred to as the main variables, and they involve more training and interpretation. Having already demonstrated that we had a high level of agreement for all of our housekeeping variables, we then had the coders participate in separate tests for these main variables.

In the fall of 2009, we conducted intercoder testing for main variables. One hundred and thirty coded stories were randomly selected from all five media sectors (20 newspaper articles, 10 online stories, 36 network stories, 41 cable stories and 23 radio stories). These stories were coded over the course of 10 weeks.

A total of 16 coders participated in this test.

For main variables, we achieved the following level of agreement:
Format: 86%
Big Story: 85%
Substoryline: 83%
Geographic Focus: 89%
Lead Newsmaker: 86%
Lead Newsmaker 2: 90%

For our most complicated variable, Broad Story Topic, we conducted multiple tests in mid to late 2009 and early 2010. The average agreement for Broad Story Topic was 81%.

Testing Details

All the percentages of agreement for the above variables were calculated using a software program available online called PRAM.10

Since the inception of the News Coverage Index, as new coders were hired and included in the coding team, they were given extensive training by the training coordinator, the content supervisor and other experienced coders. New coders were not allowed to participate in the weekly coding for the project until they had demonstrated a level of agreement with experienced coders for all variables at an 80% level or higher.

Each coder works between 20 and 37.5 hours a week in our Washington office and is trained to work on all the print and broadcast media included in the sample. The schedule for each coder varies, but since all of the material included in the Index is archived, the actual coding can be performed at any point during the week.

To achieve diversity in the coding and ensure statistical reliability, generally no one coder codes more than 50% of a particular media sector within one week. Each coder codes at least three media each week. In the case of difficult coding decisions about a particular story, the final decision is made by either the coding administrator or a senior member of the PEJ staff.

The coding data are entered into a proprietary software program that has been written for this project by Phase II Technology. The software allows coders to enter the data for each variable and also allows coders to review their work and correct mistakes when needed. The same software package compiles all of the coding data each week and allows us to perform the necessary statistical tests.

Total Media Combined: Creation and Weighting

The basis of measurement used to determine top stories in broadcast and cable is time, and in text-based media, it is words. Thus for cable news, for example, we refer to the percentage of total seconds that a certain story received. In other words, of all the seconds analyzed in cable news in a week, ground events in Iraq accounted for xx% (or xx seconds out of a total of xxx). The industry term for this is news hole—the space given to news content.
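
The percent-of-news-hole calculation for a single sector can be sketched as follows. The story names and durations are made up for illustration, and the function name is hypothetical.

```python
from collections import defaultdict


def news_hole_percentages(stories):
    """Return each big story's share of a sector's news hole, in percent.

    stories is an iterable of (big_story, size) pairs, where size is
    seconds for broadcast sectors or words for text-based sectors.
    """
    totals = defaultdict(float)
    for big_story, size in stories:
        totals[big_story] += size
    grand_total = sum(totals.values())
    return {name: 100 * amount / grand_total for name, amount in totals.items()}


# Hypothetical week of cable coverage, in seconds.
cable_week = [("Economy", 600), ("Iraq", 300), ("Economy", 300), ("Other", 800)]
shares = news_hole_percentages(cable_week)
print(shares)
```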

The main Index considers broadcast and print together, identifying the top stories across all media. To do this, words and seconds are merged together to become total news hole. After considering the various options for merging the two, the most straightforward and sensible method was to first generate the percent of news hole for each specific medium. This way all media are represented in the same measurement: percent.
Next, we needed to create a method for merging the various percentages. There were several options. Should we run a simple average of all five? Should we average all print and all broadcast and then average those two? Or, should we apply some kind of weight based on apparent audience?

Because each medium measures its audience differently (ratings per month in television, circulation in newspapers, unique visitors in online), any system based on audience figures raises serious issues of discontinuity. Nonetheless, several of our advisers thought some kind of weight should be applied. Various options were considered, including a combination of different metrics, such as audience data alongside supplemental survey data. One consistent measure is that of public opinion surveys. The same question is posed about multiple media. Two such questions are asked regularly by the Pew Research Center for the People & the Press. One asks about “regular usage” and the other asks where people go for “national and international news.”
Before arriving at a method for the launch of the Index in January 2007, we tested multiple models:

Model 1: Compile percentages for big stories for each of the five media sectors (newspapers, online sites, network TV, cable TV and radio), and then average those five lists into one final list.

Model 2: Divide the media sectors into two groups, text-based media (newspapers, online sites) and broadcast (network TV, cable TV and radio). Average the lists of percentages between the two groups to get one final list.

Model 3: Compile percentages for big stories for each of the five media sectors, and then add the weighted five lists together into one final list. The weights given to each media sector were calculated by averaging data from the three most recent surveys on where people get news about national and international issues, collected by the Pew Research Center for the People & the Press (June 2005, November 2005 and August 2006). First, we take the average response for each media sector across the three time periods. Next, we rebalance the average percentages for the five media sectors in the Index (newspapers, online, network TV, cable TV and radio) so that they total 100%. Thus, the weight for newspapers would be 0.28, online would be 0.16, network TV would be 0.18, cable TV would be 0.26, and radio would be 0.12.

Model 4: Compile percentages for big stories for each of the five media sectors, and then add the weighted five lists together into one final list. The weights assigned to each media sector were generated based on the regular media usage survey data, collected by the Pew Research Center for the People & the Press in its Biennial Media Consumption Survey 2006. Thus, the weight for newspapers would be 0.307, online would be 0.218, network TV would be 0.165, cable TV would be 0.201 and radio would be 0.109.
By testing two trial weeks’ data, we found that the lists of top five stories were exactly the same (top stories’ names and their ranks) using all four of these models, although some percentages varied. In the end, the academic and survey analysts on our team felt the best option was Model 3. It has the virtue of tracking the media use for national and international news, which is what the Index studies. Also, the Pew Research Center for the People & the Press asks this question about once every six months so we can reflect changes in media use. We adopted this model and updated the weights when appropriate.
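
The weighted combination can be sketched as follows. The Model 3 weights are the ones given above; the per-sector story percentages and the survey shares passed to `rebalance` are made up for illustration, and the function and key names are hypothetical.

```python
MODEL_3_WEIGHTS = {
    "newspapers": 0.28, "online": 0.16, "network_tv": 0.18,
    "cable_tv": 0.26, "radio": 0.12,
}
# Model 1 is a simple average, i.e. equal weights for the five sectors.
MODEL_1_WEIGHTS = {sector: 0.2 for sector in MODEL_3_WEIGHTS}


def rebalance(shares):
    """Scale survey shares so that the five sector weights sum to 1."""
    total = sum(shares.values())
    return {sector: share / total for sector, share in shares.items()}


def combine(sector_percentages, weights):
    """Weighted sum of each story's percent of news hole across sectors.

    sector_percentages maps sector -> {story: percent of that sector's news hole}.
    """
    combined = {}
    for sector, story_percentages in sector_percentages.items():
        for story, percent in story_percentages.items():
            combined[story] = combined.get(story, 0.0) + weights[sector] * percent
    return combined


# Hypothetical percentages for two stories in one week.
week = {
    "newspapers": {"Economy": 20, "Iraq": 10},
    "online": {"Economy": 25, "Iraq": 5},
    "network_tv": {"Economy": 30, "Iraq": 10},
    "cable_tv": {"Economy": 40, "Iraq": 20},
    "radio": {"Economy": 10, "Iraq": 30},
}
model_3 = combine(week, MODEL_3_WEIGHTS)
model_1 = combine(week, MODEL_1_WEIGHTS)
print(model_3, model_1)
```

In this made-up week the two models rank the stories the same way even though the percentages differ, which mirrors the kind of comparison described above.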

Note: The weights used for data in Model 3 have been updated twice since the inception of the News Coverage Index.

On January 1, 2009, we updated our weights for the year. For 2009, the weights were generated by averaging December 2008 survey data and the last 2007 survey data (September 2007) collected by the Pew Research Center for the People & the Press. Thus, the weights used for the Index in 2009 are as follows:

2009 Weights
Newspapers: 0.25
Online: 0.23
Network TV: 0.16
Cable TV: 0.25
Radio: 0.11

The first update was on June 16, 2008, based on the three most recent surveys conducted by the Pew Research Center for the People & the Press (September 2007, July 2007 and February 2007). The weights used for the Index in 2008 were as follows:

2008 Weights
Newspapers: 0.26
Online: 0.20
Network TV: 0.18
Cable TV: 0.24
Radio: 0.12
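One way to apply the right weights to a given date is a small lookup keyed on each update's effective date. The sketch below is hypothetical: the function name and the schedule layout are invented for illustration, and it assumes the original Model 3 weights applied until June 16, 2008, which the text does not state explicitly. The weight values themselves are those listed above.

```python
from datetime import date

# Original Model 3 weights (assumed here to apply before the first update).
INITIAL_WEIGHTS = {
    "newspapers": 0.28, "online": 0.16, "network_tv": 0.18,
    "cable_tv": 0.26, "radio": 0.12,
}

# (effective date, weights) pairs in chronological order.
WEIGHT_SCHEDULE = [
    (date(2008, 6, 16), {
        "newspapers": 0.26, "online": 0.20, "network_tv": 0.18,
        "cable_tv": 0.24, "radio": 0.12,
    }),
    (date(2009, 1, 1), {
        "newspapers": 0.25, "online": 0.23, "network_tv": 0.16,
        "cable_tv": 0.25, "radio": 0.11,
    }),
]


def weights_for(day):
    """Return the weights in effect on the given date."""
    current = INITIAL_WEIGHTS
    for effective, weights in WEIGHT_SCHEDULE:
        if day >= effective:
            current = weights
    return current
```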

Inclusion of Online Audio and Video in Index Calculations

The decision to include audio and video stories for online beginning in 2009 meant that PEJ needed a method for incorporating two different measures of length (time in seconds versus number of words) within the same media sector.

Prior to this change, the percentage of the news hole for the online sector had been calculated as the percentage of words (in text). Including multimedia elements in our Web sample created a challenge: the news hole calculation now had to incorporate both the length of text stories in words and the length of multimedia stories in seconds. PEJ undertook several tests to come up with a simple, yet accurate, method of creating an equivalent measure.

The process PEJ uses for valuing multimedia stories is to take the length of the multimedia story (in seconds) and multiply it by 4 to get the approximate equivalent of a text story of that number of words. For example, an online video that is 30 seconds in length would be given an equivalent value of 120 words (30 x 4). An online video that is 60 seconds in length would be given an equivalent value of 240 words (60 x 4).

PEJ arrived at this method by first timing how long it takes people to read news stories out loud. After timing five people reading different types of news stories, we found that people read approximately 3 words per second. However, simply multiplying the length of a story by 3 would not accurately reflect the value of a multimedia story.

We then compared the distribution of the length of online text stories to the distribution of the length of online multimedia stories. To make this comparison, we took the distribution of 3500 online text stories over the last six months in the NCI and compared that to the distribution of length for 280 video stories compiled from seven separate Web news sites. The median length of a text story was approximately 600 words, while the median length of a video story was approximately 150 seconds.

Drawing from these comparisons, we determined that multiplying the length by 4 gave a reasonable approximate value to use in comparison to the length of a text story in words. No single multiplier would match exactly since the distribution of the length of Web videos is not linear and because there is no simple way to quantify the value of visuals within multimedia stories along with the text. However, this simple method of multiplication gives us a straightforward way to make approximate equivalents between two measures (seconds and words) that are not otherwise easily compared.
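The conversion described above can be expressed as a short Python sketch. The function names and the story records in the usage note are hypothetical and serve only to illustrate the arithmetic. The multiplier of 4 and the medians (600 words, 150 seconds) come from the text.

```python
WORDS_PER_SECOND_EQUIVALENT = 4  # multiplier described in the text


def word_equivalent(seconds):
    """Approximate word-count equivalent of a multimedia story."""
    return seconds * WORDS_PER_SECOND_EQUIVALENT


def story_length(story):
    """Length in words: text stories as-is, multimedia converted from seconds."""
    if "words" in story:
        return story["words"]
    return word_equivalent(story["seconds"])


def newshole_share(stories, topic):
    """Percentage of the combined online news hole devoted to one topic."""
    total = sum(story_length(s) for s in stories)
    on_topic = sum(story_length(s) for s in stories if s["topic"] == topic)
    return 100.0 * on_topic / total
```

For instance, `word_equivalent(30)` gives 120 and `word_equivalent(60)` gives 240, matching the examples above. The multiplier is also consistent with the ratio of the two medians, since 600 words divided by 150 seconds is 4. With one 600-word text story on one topic and one 150-second video on another, `newshole_share` assigns each topic 50% of the news hole.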

Community Journalism Report

This report is based on academic research conducted over two years. The first year of research was conducted by a multi-university team including Steve Lacy of Michigan State University, Esther Thorson and Margaret Duffy of the University of Missouri and Dan Riffe of the University of North Carolina. The second year of research, released here, was conducted by Esther Thorson, Margaret Duffy, Ken Fleming, Youngah Lee and Mi Rosie Jahng at the University of Missouri School of Journalism. Both years of research were funded through a grant from Pew Charitable Trusts and the Knight Foundation.

The methodology for their analysis of the 60 select sites is as follows:

Forty-five markets were randomly selected from three U.S. city sizes (large: 507,000 to 2.2 million households; medium: 100,000 to 506,000 households; and small: 50,000 to 99,000 households) among the 280 Census-defined (2000) Metropolitan Statistical Areas. No markets smaller than 50,000 households were selected, and Chicago, one of the three largest metropolitan areas, was added for a total of 46 markets.

To qualify for inclusion, each market had to have at least one online site meeting the definition of “citizen journalism.” Given the variety of forms that citizen sites can take, we settled on a four-part definition:

  1. Local service or region definition: The site must identify some specific geographic area it serves. Such information may be found on a home page banner, in a mission statement, in a FAQ section or through some other means of self-identification on the site.
  2. Citizen participation: The site must indicate that a significant portion of content is provided by volunteers or community members, not professional journalists.  Such information may also be found on site locations that provide details about the geographic area served.
  3. Journalism content: At least some of the news and opinions provided must focus on the local geographic area rather than broader national or world areas.
  4. Origination: At least some of the material on the site must be originally produced for the site by citizens who participate. The site may also qualify for inclusion if citizens are aggregating material found in other places that is of relevance and importance to the audience the site serves.

To determine whether a market had one or more citizen journalism sites, and to compile a list of sites in each market, we used three sources that maintain lists of citizen journalism sites: Placeblogger, Knight Citizen News Network and Cyberjournalism.net.

To identify the 60 “superior” sites, we used a variety of methods and sources. First, we conducted keyword searches to identify commentators and professional journalists who had discovered and were discussing sites they considered noteworthy. Next, we reviewed major journalism and citizen journalism sites, organizations and publications. This included Poynter.org, Nieman Reports, the Knight Citizen News Network, Journalism That Matters (JTM), the Reynolds Journalism Institute News Collaboratory, the online journalism awards, the American Journalism Review, the Columbia Journalism Review and the like. We attended numerous meetings and conferences, including those at JTM, the Berkman Center and the Reynolds Journalism Institute. From these searches and consultations, we identified 60 sites regarded as noteworthy. We do not claim to include all of the “best” sites, as different individuals would ascribe high quality based on different criteria. In addition, most citizen sites are dynamic and may change their focus or procedures often. However, we believe that experts would generally agree that the 60 sites selected represent high levels of journalistic quality and sophistication.

A codebook defining all the variables used in both story content and site level coding for all the sites selected was developed.  Coders were extensively trained.  Agreement for all of the variables reached 67% or higher. One equaled 67%, two were between 70% and 75%, four were between 76% and 80%, three were between 81% and 85%, and the rest were greater than 95%. The five coders coded 707 story content postings, after achieving an overall reliability of 94.5%.

Three coders coded all of the site-level postings using a total of 41 variables. Agreement reached 72% or higher. Two were at 72%, three were between 77% and 79%, four were between 83% and 85%, twelve were at 86%, five were between 87% and 89%, and the rest were 90% or greater. The three coders coded 363 site-level postings after achieving an overall reliability of 91.45%. The Scott’s Pi formula was used in the calculation.
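For readers unfamiliar with the statistic, Scott's Pi compares observed agreement between coders with the agreement expected by chance, where chance is estimated from the pooled category proportions of both coders. The sketch below shows the standard two-coder form. The function name and the sample codes in the usage note are hypothetical, and the report does not say how the formula was applied across three or five coders, so this should not be read as the researchers' exact procedure.

```python
from collections import Counter


def scotts_pi(coder_a, coder_b):
    """Scott's Pi for two coders' category labels on the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must rate the same non-empty item list.")
    n = len(coder_a)

    # Observed agreement: share of items given the same code by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

    # Expected agreement: sum of squared pooled category proportions.
    pooled = Counter(coder_a) + Counter(coder_b)
    expected = sum((count / (2 * n)) ** 2 for count in pooled.values())

    if expected == 1.0:
        # Only one category was ever used. Returning 1.0 here is a
        # convention chosen for this sketch, since Pi is undefined.
        return 1.0
    return (observed - expected) / (1 - expected)
```

With codes `["x", "x", "y", "y"]` from one coder and `["x", "x", "y", "x"]` from another, observed agreement is 0.75, expected agreement is 0.53125, and Pi is about 0.467. This shows how the chance correction can pull the statistic well below the raw percentage of agreement.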

The method for the follow-up surveys with site owners is explained as follows:

To gain more insight about citizen journalist/blogger site operators, we attempted to find phone numbers for all 205 citizen sites. Surprisingly, many of the sites offered no way to either identify the site owner or get in touch with him or her. We were told that many of these owners intentionally hide their contact information because they fear negative input from the public. Twenty-six phone numbers were found by contacting the site owners by e-mail or an online form. Three numbers were found through white pages based on the name of the blogger/owner found on the site. We identified phone numbers for 129 site owners. Of these, 91 site owners were willing to be interviewed, a response rate of 71%.


Footnotes

1. Data from 2009 indicate that the three network evening newscasts reach about 22 million viewers and the three morning newscasts average about 12.7 million people daily. Nielsen Media Research, used under license.

2. Data from 2009 indicate that 3.9 million viewers watch cable news during prime time hours and over 2 million watch during daytime hours. PEJ Analysis of Nielsen Media Research, used under license.

3. For August 2009, the O’Reilly Factor averaged 3.4 million viewers a night while Larry King Live averaged 1.1 million viewers. Nielsen Media Research on Media Bistro.com.

4. For 2008, circulation numbers indicate that 48 million people buy a newspaper each weekday. 2008 Editor and Publisher Yearbook Data.

5. According to the December 2007 survey, 27% of adults go online for news each day. Pew Internet, December 2007 survey. A more recent survey shows that 37% of Americans regularly go online for news. Pew Research Center for the People and the Press, July 2008 survey.

6. Pew Internet, 2005 survey.

7. By the spring of 2007, 93% of the population 12 and older listened to radio on a weekly basis. Arbitron ratings, Spring 2007. Radio reaches 235 million Americans over the course of a week. Arbitron ratings, March 2008.

8. Arbitron, “Radio Today: How Americans Listen to Radio, 2007 Edition,” April 13, 2007. March 2008 data show that News/Talk is the top or second-leading category of listening in every region of the country except one, ranking it as the most popular of all measured radio formats. Arbitron ratings, March 2008.

9. Current ratings data available at Talkers Magazine online.

10. Kimberly A. Neuendorf, “The Content Analysis Guidebook,” Sage Publications, 2002.