


The data for this study were collected in two parts. Much of the study is based on research conducted originally by other people or organizations. Other research, particularly the content analysis, is original work conducted specifically for this report.

For the data aggregated from other researchers, the Project took several steps. First, we tried to determine what data had been collected and by whom for the eight media sectors studied. We organized the data into the seven primary areas of interest we wanted to examine: content, audience, economics, ownership, newsroom investment, alternative news outlets and public attitudes. For all data ultimately used, the Project sought and gained permission for their use.

Next, the Project studied the data closely to determine where elements reinforced each other and where there were apparent contradictions or gaps. In doing so, the Project endeavored to determine the value and validity of each data set. That in many cases involved going back to the sources who collected the research in the first place. Where data conflicted, we have included all relevant sources and tried to explain their differences, either in footnotes or in the narratives.

In analyzing the data for each media sector, we sought insight from experts by having at least three outside readers for each sector chapter. Those readers raised questions, offered arguments and questioned data where they saw fit.

All sources are cited in footnotes or within the narrative, and listed alphabetically in a source bibliography. The data used in the report are also available in more complete tabular form online, where users can view the raw material, sort it on their own and make their own charts and graphs. Our goal was not only to organize the available material into a clear narrative, but to also collect all the public data on journalism in one usable place. In many cases, the Project paid for the use of the data.

For the original content analysis research conducted by the Project, the methodology follows.

Web Site Analysis Methodology

As the Internet continues to change the news industry and the methods of production, circulation and consumption, it is ever more critical to understand the emerging trends and news outlets available online. Citizens must make daily choices about which sites to visit for various kinds of news, but it is largely up to them to figure out which site best fits their needs at the moment. And in many instances they may be making those choices without fully understanding why.

The content analysis element of the 2007 Annual Report on the State of the News Media was designed to try to sort through the many different kinds of sites that offer news information. What do some sites emphasize over others? Are there common tendencies? The creation of the study and the analysis of the findings were a multi-step process.

Sample Design and Web Site Capture

To assess the range of news Web sites available, we selected 38 different Web sites that provide such information. The sites were initially drawn from the seven media sectors that PEJ analyzes in each annual report:

* Newspaper (9 sites from a mix of national, regional and local papers)
* Cable news (3 sites)
* Network news (3 sites, commercial and public; NBC’s online identity is merged with that of MSNBC)
* Local TV (2 sites)
* Radio (2 sites, one national network and one local)
* Weekly news magazine (3 sites)
* Online-only news sites (10 sites, ranging from aggregators to citizen-based sites to online magazines)
* Online blogs (4 sites)

In addition, we included one foreign broadcast site (BBC News) and the site of one wire service. (Because of the language barrier, ethnic, non-English-language Web sites were not included in the study.)

The result was the following list of sites:

Sites Studied

* ABC News
* BBC News
* Benicia News
* Boston Phoenix
* CBS News
* Chicago Sun-Times
* Crooks and Liars
* Daily Kos
* Des Moines Register
* Fox News
* Global Voices
* King5 TV
* Los Angeles Times
* Little Green Footballs
* Michelle Malkin
* AOL News
* Google News
* Yahoo News
* New York Post
* New York Times
* PBS NewsHour
* San Francisco Bay Guardian
* Time Magazine
* USA Today
* Washington Post
* The Week Magazine
* WTOP Radio

Web sites were captured by a team of professional content coders. At each download, coders made an electronic copy and a printed hard copy of the homepage for each site as well as of the top five news stories. Prominence was determined as follows:

The biggest headline at the top of the screen is the most prominent story. It may or may not have an image associated with it. The second-most-prominent story is one that is attached to an image at the top of the screen, if that is a different story from the most prominent one. If there is no image at the top of the screen (or two significant stories are attached to the same image), refer to the next-largest headline. To determine the next-most-prominent stories, refer first to the size of the headlines and then to the placement (height) on the screen. If two stories have the same font size and sit at the same height on the screen, give the story on the left more prominence.
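The tie-break rules above amount to sorting stories by headline size, then screen height, then left-to-right position. A minimal sketch in Python, assuming each captured story is tagged with an invented headline font size and screen coordinates (the image-based rule is omitted for simplicity):

```python
# Hypothetical illustration of the prominence tie-break rules.
# font_size: headline size in points; y: vertical position (smaller
# = higher on screen); x: horizontal position (smaller = further left).

def prominence_order(stories):
    """Rank stories: larger headline first, then higher on the
    screen, then further left."""
    return sorted(stories, key=lambda s: (-s["font_size"], s["y"], s["x"]))

stories = [
    {"id": "a", "font_size": 18, "y": 120, "x": 300},
    {"id": "b", "font_size": 24, "y": 40,  "x": 10},   # biggest headline
    {"id": "c", "font_size": 18, "y": 120, "x": 10},   # same size and height as "a", further left
]
top_five = prominence_order(stories)[:5]
print([s["id"] for s in top_five])  # → ['b', 'c', 'a']
```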

Stories were defined as follows:

* Headlines that linked to a landing page within the Web site rather than to a specific news report were omitted, as were links to landing pages of other Web sites.
* Links to specific stories on other Web sites were included, as were video and audio stories.

Capture Timing

Web sites were initially studied from September 18 through October 6, 2006. For that initial review, each site was captured and coded four different times. For two captures, the research team coded for the entire set of variables, both the homepage analysis and the variables related to the content of news stories. The other two rounds of capture were coded only for the variables relating to the content of the lead stories.

Each site was then studied again during the week of February 12-16, 2007, and coded separately. Results for the two time periods were compared. In cases where features had changed, we closely examined the site again to confirm the change or correct inconsistencies. Final analyses were based on the confirmed February site scores.

Coding Scheme and Procedure

To create the coding scheme, we first worked to identify the different kinds of features available online — everything from contacting the author to quickly finding just what you want to receiving your news free — and how they could be measured. After several weeks of exploratory research, we identified 63 different quantitative measures and developed those into a working codebook (see list of primary variables below).

Coding was performed at the PEJ by a team of seven professional in-house coders, overseen by a senior researcher and a methodologist. Coders were trained on a standardized codebook that contained a dictionary of coding variables, operational definitions, measurement scales and detailed instructions and examples. The codebook was divided into two sections. The first was based on an inventory of the Web site’s homepage. That was performed three separate times: twice in September 2006 and once in February 2007. The second component involved coding the content of news stories themselves. We included the top five stories for the variables related to the content of the news and took the average score for each variable.
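The story-level averaging described above can be sketched briefly; the variable names and the individual story scores here are invented for illustration:

```python
# Hypothetical example: each of the top five stories is coded on a
# variable, and the site's score for that variable is the mean of
# the five story scores.
story_scores = {
    "source_breadth": [3, 2, 3, 1, 2],  # invented codes for five stories
    "byline_use":     [1, 1, 2, 1, 3],
}

site_scores = {var: sum(codes) / len(codes) for var, codes in story_scores.items()}
print(site_scores)  # → {'source_breadth': 2.2, 'byline_use': 1.6}
```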

Before coding began, coders were trained on the codebook. Excel coding sheets were designed and used consistently throughout the process. Meetings were held throughout to discuss questions, and where necessary additional captures took place to verify findings.

Coders followed a series of standardized rules for coding and quantifying Web site traits. Three variables deserve specific mention:

1. Multimedia components on the homepage: Coders counted all content items, defined as links to all material other than landing pages or indexes of some sort. Included were narrative text, still photos, interactive graphics, video, audio, live streams, live Q&A’s, polls, user-based blogs, podcast content and slide shows. Next, the coders tallied the total number of content items on the page as well as the totals for each media form and entered the percentages for each into the database.

2. Advertisements: In counting advertisements on the homepage, coders included all ads, from obvious banners and flash advertisements to the smaller single-link sponsors of a site. Self-promotional ads were also included in the total. The idea of this variable was to estimate the economic agenda of a given site based on the amount of advertising on the homepage. Advertisements on internal pages were not included in the tally. Because of day-to-day variance in the total number of homepage ads, the final figure was either the average based on all the visits to a site or, in cases where a site redesign had clearly occurred, the latest use of ads.

3. Bylines: Blog posts required special rules for the byline variable. In counting bylines, researchers scored an entry posted by the blog host (John Amato on Crooks and Liars, for example) at the top of the scale. If the blog entry was posted by a regular contributor or staff member, the “story” scored a “2.” And if the blog entry was posted by an outside contributor, was not bylined, or consisted primarily of outside material (an entry, for instance, that simply said, “Read this,” followed by an excerpt from another source), the post received a score of “3,” the lowest on the scale of original stories.
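The three counting and scoring rules above can be sketched together. This is a hypothetical illustration: the item lists, visit counts and category labels are invented, and the value 1 for host-posted blog entries is an assumption (the text gives only the 2 and 3 codes explicitly):

```python
from collections import Counter

# 1. Multimedia: tally homepage content items by media form and
#    compute the share of each form (items invented for illustration).
items = ["text", "text", "photo", "video", "text", "poll"]
counts = Counter(items)
shares = {form: 100 * n / len(items) for form, n in counts.items()}
print(shares["text"])  # → 50.0 (3 of 6 items are narrative text)

# 2. Advertisements: average the ad counts over all visits, unless a
#    redesign clearly occurred, in which case use the latest count.
def final_ad_count(visit_counts, redesigned=False):
    if redesigned:
        return visit_counts[-1]                        # latest use of ads
    return sum(visit_counts) / len(visit_counts)       # average over visits

print(final_ad_count([12, 14, 13]))       # → 13.0
print(final_ad_count([12, 14, 5], True))  # → 5

# 3. Bylines on blogs: 2 for a regular contributor or staff member,
#    3 (the lowest) for outside or unbylined material; 1 for a
#    host-posted entry is assumed, not stated in the codebook excerpt.
BYLINE_SCORES = {"host": 1, "staff": 2, "outside": 3}
print(BYLINE_SCORES["outside"])  # → 3
```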


In analyzing the data, we were able to group variables into six different areas of Web emphasis: User Customization, User Participation, Multimedia Use, Content Branding and Originality, Depth of Content and Revenue Streams.

Customization includes

* Homepage customization (allows user to tailor page)
* Search options (simple or advanced search)
* RSS feeds — options and prominence
* Podcasts — options and prominence
* Mobile phone delivery options

Participation includes

* Users’ contribution to content
* Scheduled, live discussions
* Ability to:
o e-mail author
o post comments
o rate the article/post
o take a poll
* List of most-viewed stories
* List of most-e-mailed stories
* List of most-linked-to stories

Multimedia includes

Percent of homepage content devoted to:

* Narrative
* Photos/non-interactive graphics
* Video
* Audio
* Live stream
* User blog
* Live Q & A
* Slide show
* Poll
* Interactive graphic

Editorial Branding includes

* Breadth of sources
* Editorial process
* Use of bylines
* Direction of story links (internal or external)

Story Depth includes

* Frequency of updates
* Use of related story links
* Use of archive links

Revenue Streams includes

* Registration requirements
* Fee-based content
* Archive fees
* Number of homepage ads (self-promotional and external)

Codes within each variable were translated into a numerical rating from low to high for that particular feature. Then PEJ research analysts produced an Excel template to tally the scores (summing the variables) for each site within the six categories. Thus for each of the six categories, each site had a final score. The range of scores was then divided into four quartiles and sites were marked according to which quartile they fell into.
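The scoring step might look like the following sketch. The site names and scores are invented, and the report does not specify exactly how scores falling on quartile boundaries were binned, so that detail is an assumption:

```python
# Hypothetical sketch: sum of variable scores per category is assumed
# already computed per site; the range of scores is then divided into
# four equal-width quartiles.

def quartile(score, low, high):
    """Return 1-4 for where score falls in the range [low, high]."""
    span = (high - low) / 4 or 1            # avoid zero-width bins
    return min(4, int((score - low) / span) + 1)

site_scores = {"Site A": 21, "Site B": 9, "Site C": 15, "Site D": 27}
low, high = min(site_scores.values()), max(site_scores.values())
for site, score in site_scores.items():
    print(site, quartile(score, low, high))
# Site A 3 / Site B 1 / Site C 2 / Site D 4
```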