UGA Tobacco Documents Project


Plot Significant Differences Between Document Groupings

This tool lets you determine whether particular words or phrases are more prominent in one group of documents than in the document corpus as a whole. You can select groups according to decade, industry source, and target audience. Output includes graphs of the raw frequencies and relative frequencies of the terms you select, as well as z-scores to help you judge whether a group's difference from the corpus norm is statistically significant. Follow the blue hyperlinks to the Glossary page for additional information.
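To make the relationship between raw frequency, relative frequency, and z-score concrete, here is a minimal Python sketch. It is not the server's actual script: the page does not say which formula it uses, so the sketch assumes a one-sample proportion test that compares the term's share of tokens in one group against its share in the whole corpus. The function names and the counts are hypothetical.

```python
import math


def relative_frequency(count, total_tokens):
    """Share of all tokens accounted for by the term."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return count / total_tokens


def z_score(group_count, group_total, corpus_count, corpus_total):
    """z-score of the group's relative frequency against the corpus norm.

    Assumption: a one-sample proportion test, with the corpus-wide
    relative frequency treated as the expected proportion.
    """
    p_corpus = relative_frequency(corpus_count, corpus_total)
    p_group = relative_frequency(group_count, group_total)
    # Standard error of a proportion for a sample of group_total tokens.
    std_error = math.sqrt(p_corpus * (1 - p_corpus) / group_total)
    if std_error == 0:
        raise ValueError("term never (or always) occurs in the corpus")
    return (p_group - p_corpus) / std_error


# Hypothetical counts: the term appears 30 times in a 10,000-token group
# and 100 times in a 100,000-token corpus.
z = z_score(group_count=30, group_total=10_000,
            corpus_count=100, corpus_total=100_000)
print(f"z = {z:.2f}")
```

Under this assumption, an absolute z-score above roughly 1.96 corresponds to a difference that is significant at the conventional 5% level (two-sided).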
1. List Terms to Examine:



Enter your terms here. Because of the statistic used, you must choose a single term type: either words or collocations. Collocations are denoted by hyphenation: tobacco-smoke, lung-cancer. If you mix the two types, the computer will limit the analysis to the first term type it encounters.
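The term-type rule above can be sketched in Python as follows. This is only an illustration of the described behavior, not the server's code: it assumes terms are separated by whitespace, and the function names `term_type` and `select_terms` are hypothetical.

```python
def term_type(term):
    """Classify a term: hyphenated terms are collocations, others are words."""
    return "collocation" if "-" in term else "word"


def select_terms(raw_input):
    """Keep only terms that match the type of the first term entered.

    Assumption: terms are separated by whitespace.
    """
    terms = raw_input.split()
    if not terms:
        return None, []
    first_type = term_type(terms[0])
    kept = [t for t in terms if term_type(t) == first_type]
    return first_type, kept


# The first term is a collocation, so the single word "cancer" is dropped.
chosen_type, chosen_terms = select_terms("tobacco-smoke cancer lung-cancer")
print(chosen_type, chosen_terms)
```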
2. Select Data Groups to Examine:

Grouped by Decades
Grouped by Half Decades
Grouped by Shifted Decades
Grouped by Industry Source
Industry-Internal vs Industry-External Audiences
Named Audiences vs Unnamed Audiences
Bliley and Undated Documents

NIH-NCI Tobacco-Documents Project at the University of Georgia (Grant # 1 RO1 CA87490). The scripts run by this server are the invention of Clayton Darwin using Python. All graphical displays are created on the fly using ChartDirector© from ASE.