Regression Models for Categorical Dependent Variables Using Stata, Third Edition, by J. Scott Long and Jeremy Freese, is an essential reference for those who use Stata to fit and interpret regression models for categorical data. Although regression models for categorical dependent variables are common, few texts explain how to interpret such models; this text decisively fills the void.
The third edition is divided into two parts. Part I begins with an excellent introduction to Stata and follows with general treatments of the estimation, testing, fitting, and interpretation of models for categorical dependent variables. The book is thus accessible to new users of Stata and those who are new to categorical data analysis. Part II is devoted to a comprehensive treatment of estimation and interpretation for binary, ordinal, nominal, and count outcomes.
Readers familiar with previous editions will find many changes in the third edition. An entire chapter is now devoted to interpretation of regression models using predictions. This concept is explored in greater depth in Part II. The authors also discuss how recent improvements to Stata (factor variables, marginal effects with margins, and plotting predictions with marginsplot) facilitate analysis of categorical data.
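As a rough illustration of the three Stata features named above, the sketch below fits a logit model with factor-variable notation, computes predictions with margins, and plots them with marginsplot. It is not taken from the book: it assumes Stata's shipped lbw example dataset (loaded with webuse) rather than the authors' datasets, and the model specification is chosen only for illustration.

```stata
* Illustrative sketch only: uses Stata's lbw example data, not the book's datasets
webuse lbw, clear

* Factor-variable notation: i. marks a categorical predictor, c. a continuous one;
* ## includes both main effects and their interaction
logit low i.smoke##c.age lwt i.race

* Predicted probabilities for each smoking status across a range of ages,
* averaging over the observed values of the other regressors
margins smoke, at(age=(15(5)40))

* Plot the predictions that margins just computed
marginsplot
```

Because the interaction is declared with factor-variable notation, margins knows that smoke and age enter the model jointly and adjusts the predictions accordingly.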
The authors advocate a variety of new methods that use predictions to interpret the effect of variables in regression models. Readers will find all discussion of statistical concepts firmly grounded in concrete examples. All the examples, datasets, and author-written commands are available on the authors’ website, so readers can easily replicate the examples with Stata.
Examples in the new edition also illustrate changes to the authors’ popular SPost commands after a recent rewrite inspired by the authors’ evolving views on interpretation. Readers will note that SPost now takes full advantage of the power of the margins command and the flexibility of factor-variable notation. Long and Freese also provide a suite of new commands, including mchange, mtable, and mgen. These commands complement margins, aiding model interpretation, hypothesis testing, and model diagnostics. They offer the syntactic conveniences that Stata users expect, such as including powers or interactions of covariates in regression models and working seamlessly with complex survey data. The authors also discuss how to use these commands to estimate marginal effects, either averaged over the sample or evaluated at fixed values of the regressors.
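The distinction between marginal effects averaged over the sample and those evaluated at fixed values can be sketched with the official margins command alone. The example is an assumption-laden illustration, not the authors' code: it uses Stata's shipped lbw dataset, an arbitrary model, and arbitrary representative values. The SPost13 commands mchange, mtable, and mgen are not shown because they must be installed separately.

```stata
* Illustrative sketch only: uses Stata's lbw example data, not the book's datasets
webuse lbw, clear
logit low age lwt i.race i.smoke

* Average marginal effects (AMEs): the effect is computed for every
* observation at its observed values, then averaged over the sample
margins, dydx(*)

* Marginal effects at the means (MEMs): all regressors are held at
* their sample means before the effect is computed
margins, dydx(*) atmeans

* Marginal effect at representative values (MER): regressors are fixed
* at values chosen by the analyst (the values here are arbitrary)
margins, dydx(smoke) at(age=25 lwt=130 race=1)
```

For a factor variable such as i.smoke, dydx() reports the discrete change in the predicted probability between levels rather than a derivative.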
The third edition of Regression Models for Categorical Dependent Variables Using Stata continues to provide the same high-quality, practical tutorials of previous editions. It also offers significant improvements over previous editions—new content, updated information about Stata, and updates to the authors’ own commands. This book should be on the bookshelf of every applied researcher analyzing categorical data and is an invaluable learning resource for students and others who are new to categorical data analysis.
© Copyright 1996–2023 StataCorp LLC
Table of contents (PDF)
List of figures
Preface (PDF)
1.2 Which models are considered?
1.3 Whom is this book for?
1.4 How is the book organized?
1.5 The SPost software
1.5.2 Installing SPost13
Installing SPost13 using search
Installing SPost13 using net install
1.5.3 Uninstalling SPost13
1.6 Sample do-files and datasets
1.6.2 Using spex to load data and run examples
1.7 Getting help with SPost
1.7.2 Getting help from the authors
1.8 Where can I learn more about the models?
2.2 Abbreviations
2.3 Getting help
2.3.2 PDF manuals
2.3.3 Error messages
2.3.4 Asking for help
2.3.5 Other resources
2.4 The working directory
2.5 Stata file types
2.6 Saving output to log files
2.7 Using and saving datasets
2.7.2 Data in other formats
2.7.3 Entering data by hand
2.8 Size limitations on datasets
2.9 Do-files
2.9.2 Long lines
2.9.3 Stopping a do-file while it is running
2.9.4 Creating do-files
2.9.5 Recommended structure for do-files
2.10 Using Stata for serious data analysis
2.11 Syntax of Stata commands
2.11.2 Variable lists
2.11.3 if and in qualifiers
2.11.4 Options
2.12 Managing data
2.12.2 Getting information about variables
2.12.3 Missing values
2.12.4 Selecting observations
2.12.5 Selecting variables
2.13 Creating new variables
2.13.2 The replace command
2.13.3 The recode command
2.14 Labeling variables and values
2.14.2 Value labels
2.14.3 The notes command
2.15 Global and local macros
2.16 Loops using foreach and forvalues
2.17 Graphics
2.18 A brief tutorial
2.19 A do-file template
2.20 Conclusion
3.1 Estimation
3.1.2 ML and sample size
3.1.3 Problems in obtaining ML estimates
3.1.4 Syntax of estimation commands
3.1.5 Variable lists
Specifying interaction and polynomials
More on factor-variable notation
3.1.6 Specifying the estimation sample
Information about missing values
Postestimation commands and the estimation sample
3.1.7 Weights and survey data
3.1.8 Options for regression models
3.1.9 Robust standard errors
3.1.10 Reading the estimation output
3.1.11 Storing estimation results
3.1.12 Reformatting output with estimates table
3.2 Testing
3.2.2 Wald and likelihood-ratio tests
3.2.3 Wald tests with test and testparm
3.2.4 LR tests with lrtest
3.3 Measures of fit
3.3.2 Methods and formulas used by fitstat
3.3.3 Example of fitstat
3.4 estat postestimation commands
3.5 Conclusion
4.2 Approaches to interpretation
4.2.2 Method of interpretation using parameters
4.2.3 Stata and SPost commands for interpretation
4.3 Predictions for each observation
4.4 Predictions at specified values
4.4.2 Using margins for predictions
Making multiple predictions
Predictions for groups defined by levels of categorical variables
4.4.3 (Advanced) Nondefault predictions using margins
The expression() option
4.4.4 Tables of predictions using mtable
(Advanced) Combining and formatting tables using mtable
4.5 Marginal effects: Changes in predictions
4.5.2 Marginal effects using mtable
4.5.3 Posting predictions and using mlincom
4.5.4 Marginal effects using mchange
4.6 Plotting predictions
4.6.2 Plotting predictions using mgen
4.7 Interpretation of parameters
4.7.2 Standardized coefficients
4.7.3 Factor and percentage change coefficients
4.8 Next steps
5.1 The statistical model
5.1.2 A nonlinear probability model
5.2 Estimation using logit and probit commands
5.2.2 Comparing logit and probit
5.2.3 (Advanced) Observations predicted perfectly
5.3 Hypothesis testing
5.3.2 Testing multiple coefficients
5.3.3 Comparing LR and Wald tests
5.4 Predicted probabilities, residuals, and influential observations
5.4.2 Residuals and influential observations using predict
5.4.3 Least likely observations
5.5 Measures of fit
5.5.2 Pseudo-R²’s
5.5.3 (Advanced) Hosmer–Lemeshow statistic
5.6 Other commands for binary outcomes
5.7 Conclusion
6.1 Interpretation using regression coefficients
6.1.2 (Advanced) Interpretation using y*
6.2 Marginal effects: Changes in probabilities
6.2.2 Summary measures of change
AMEs
Standard errors of marginal effects
6.2.3 Should you use the AME, the MEM, or the MER?
6.2.4 Examples of marginal effects
AMEs for factor variables
Summary table of AMEs
Marginal effects for subgroups
MEMs and MERs
Marginal effects with powers and interactions
6.2.5 The distribution of marginal effects
6.2.6 (Advanced) Algorithm for computing the distribution of effects
6.3 Ideal types
6.3.2 Comparing ideal types with statistical tests
6.3.3 (Advanced) Using macros to test differences between ideal types
6.3.4 Marginal effects for ideal types
6.4 Tables of predicted probabilities
6.5 Second differences comparing marginal effects
6.6 Graphing predicted probabilities
6.6.2 Using mgen with the graph command
6.6.3 Graphing multiple predictions
6.6.4 Overlapping confidence intervals
6.6.5 Adding power terms and plotting predictions
6.6.6 (Advanced) Graphs with local means
6.7 Conclusion
7.1 The statistical model
7.1.2 A nonlinear probability model
7.2 Estimation using ologit and oprobit
7.2.2 Predicting perfectly
7.3 Hypothesis testing
7.3.2 Testing multiple coefficients
7.4 Measures of fit using fitstat
7.5 (Advanced) Converting to a different parameterization
7.6 The parallel regression assumption
7.6.2 Testing the parallel regression assumption using brant
7.6.3 Caveat regarding the parallel regression assumption
7.7 Overview of interpretation
7.8 Interpreting transformed coefficients
7.8.2 Odds ratios
7.9 Interpretations based on predicted probabilities
7.10 Predicted probabilities with predict
7.11 Marginal effects
7.11.2 Marginal effects for a quick overview
7.12 Predicted probabilities for ideal types
7.13 Tables of predicted probabilities
7.14 Plotting predicted probabilities
7.15 Probability plots and marginal effects
7.16 Less common models for ordinal outcomes
7.16.2 The generalized ordered logit model
7.16.3 (Advanced) Predictions without using factor-variable notation
7.16.4 The sequential logit model
7.17 Conclusion
8.1 The multinomial logit model
8.2 Estimation using the mlogit command
Options
8.2.2 Selecting different base outcomes
8.2.3 Predicting perfectly
8.3 Hypothesis testing
8.3.2 Testing the effects of the independent variables
8.3.3 Tests for combining alternatives
8.4 Independence of irrelevant alternatives
8.4.2 Small-Hsiao test of IIA
8.5 Measures of fit
8.6 Overview of interpretation
8.7 Predicted probabilities with predict
8.8 Marginal effects
8.9 Tables of predicted probabilities
8.9.2 (Advanced) Predictions using local means and subsamples
8.10 Graphing predicted probabilities
8.11 Odds ratios
8.11.2 Plotting odds ratios
8.12 (Advanced) Additional models for nominal outcomes
8.12.2 Conditional logit model
8.12.3 Multinomial probit model with IIA
8.12.4 Alternative-specific multinomial probit
8.12.5 Rank-ordered logit model
8.13 Conclusion
9.1 The Poisson distribution
9.1.2 Comparing observed and predicted counts with mgen
9.2 The Poisson regression model
9.2.1 Estimation using poisson
9.2.2 Factor and percentage changes in E(y | x)
9.2.3 Marginal effects on E(y | x)
9.2.4 Interpretation using predicted probabilities
Treating a count independent variable as a factor variable
Predicted probabilities using mgen
9.2.5 Comparing observed and predicted counts to evaluate model specification
9.2.6 (Advanced) Exposure time
9.3 The negative binomial regression model
9.3.1 Estimation using nbreg
9.3.2 Example of NBRM
9.3.3 Testing for overdispersion
9.3.4 Comparing the PRM and NBRM using estimates table
9.3.5 Robust standard errors
9.3.6 Interpretation using E(y | x)
9.3.7 Interpretation using predicted probabilities
9.4 Models for truncated counts
9.4.1 Estimation using tpoisson and tnbreg
9.4.2 Interpretation using E(y | x)
9.4.3 Predictions in the estimation sample
9.4.4 Interpretation using predicted rates and probabilities
9.5 (Advanced) The hurdle regression model
9.5.2 Predictions in the sample
9.5.3 Predictions at user-specified values
9.5.4 Warning regarding sample specification
9.6 Zero-inflated count models
9.6.2 Example of zero-inflated models
9.6.3 Interpretation of coefficients
9.6.4 Interpretation of predicted probabilities
Plotting predicted probabilities with mgen
9.7 Comparisons among count models
9.7.2 Tests to compare count models
9.7.3 Using countfit to compare count models
9.8 Conclusion
Subject index (PDF)