Christopher F. Baum’s *An Introduction to Stata Programming, Second Edition* is a great reference for anyone who wants to learn Stata programming.

Baum assumes readers have some familiarity with Stata, but readers who are new to programming will find the book accessible. He begins by introducing programming concepts and basic tools. More advanced tools, such as structures, pointers, and Mata-based likelihood-function evaluators, are introduced gradually throughout the book alongside examples.

This new edition reflects some of the most important statistical tools added since Stata 10. Of note are factor variables and operators; the computation of marginal effects, marginal means, and predictive margins using **margins**; the use of **gmm** to implement generalized method of moments estimation; and the use of **suest** for seemingly unrelated estimation.

As in the previous edition of the book, Baum steps the reader through the three levels of Stata programming. He starts with do-files. Do-files are powerful batch files that support loops and conditional statements, and they are ideal for automating your workflow and guaranteeing the reproducibility of your work.

He then delves into ado-files, which are used to extend Stata by creating new commands that share the syntax and behavior of official commands. Baum gives an example of how to write a command to calculate percentiles and the range of a variable, complete with documentation and certification.

After introducing the fundamentals of command development, Baum shows users how to apply these concepts to write their own custom estimation commands by using Stata’s built-in numerical maximum-likelihood estimation routine, **ml**; its built-in nonlinear least-squares routines, **nl** and **nlsur**; and its built-in generalized method of moments estimation routine, **gmm**.

Finally, he introduces Mata, Stata’s matrix programming language. Mata programs are integrated into ado-files to build a custom estimation routine that is optimized for speed and numerical stability. Baum briefly discusses how ado-file programming concepts relate to Mata functions and objects. He also explains some of the advantages of using Mata for certain programming tasks.

Throughout the book, Baum introduces each topic by providing its background and importance, presents common uses and examples, and then concludes with larger, more applied examples that he refers to as “cookbook recipes.”

Many of the examples are of particular interest because they arose from frequently asked questions from Stata users. If you want to understand basic Stata programming or want to write your own routines and commands using advanced Stata tools, Baum’s book is a great reference.

© Copyright 1996–2023 StataCorp LLC

**Table of figures**

**List of tables**

**Preface** (PDF)

**Acknowledgments**

**Notation and typography**

Ado-file programming

Mata programming for ado-files

1.1 Plan of the book

1.2 Installing the necessary software

2.1 Introduction

2.2 Navigational and organizational issues

2.2.2 Locating important directories: sysdir and adopath

2.2.3 Organization of do-files, ado-files, and data files

2.3 Editing Stata do- and ado-files

2.4 Data types

2.4.2 Date and time handling

2.4.3 Time-series operators

2.4.4 Factor variables and operators

2.5 Handling errors: The capture command

2.6 Protecting the data in memory: The preserve and restore commands

2.7 Getting your data into Stata

2.7.1 Inputting and importing data

Free format versus fixed format

The import delimited command

Accessing data stored in spreadsheets

Fixed-format data files

2.7.2 Importing data from other package formats

2.8 Guidelines for Stata do-file programming style

2.8.2 Enhancing speed and efficiency

2.9 How to seek help for Stata programming

3.1 Introduction

3.2 Some general programming details

3.2.2 The numlist

3.2.3 The if exp and in range qualifiers

3.2.4 Missing-data handling

3.2.5 String-to-numeric conversion and vice versa

Working with quoted strings

3.3 Functions for the generate command

3.3.2 The cond() function

3.3.3 Recoding discrete and continuous variables

3.4 Functions for the egen command

egen functions from the user community

3.5 Computation for by-groups

3.6 Local macros

3.7 Global macros

3.8 Extended macro functions and macro list functions

3.9 Scalars

3.10 Matrices

4.2 Computing summary statistics over groups

4.3 Computing the extreme values of a sequence

4.4 Computing the length of spells

4.5 Summarizing group characteristics over observations

4.6 Using global macros to set up your environment

4.7 List manipulation with extended macro functions

4.8 Using creturn values to document your work

5.1 Introduction

5.2 Data validation: The assert, count, and duplicates commands

5.3 Reusing computed results: The return and ereturn commands

5.4 Storing, saving, and using estimated results

5.5 Reorganizing datasets with the reshape command

5.6 Combining datasets

5.7 Combining datasets with the append command

5.8 Combining datasets with the merge command

5.8.2 The dangers of many-to-many merges

5.9 Other data management commands

5.9.2 The cross command

5.9.3 The stack command

5.9.4 The separate command

5.9.5 The joinby command

5.9.6 The xpose command

6.1 Efficiently defining group characteristics and subsets

6.2 Applying reshape repeatedly

6.3 Handling time-series data effectively

6.4 reshape to perform rowwise computation

6.5 Adding computed statistics to presentation-quality tables

6.6 Presenting marginal effects rather than coefficients

6.7 Generating time-series data at a lower frequency

6.8 Using suest and gsem to compare estimates from nonoverlapping samples

6.9 Using reshape to produce forecasts from a VAR or VECM

6.10 Working with IRF files

7.1 Introduction

7.2 Prefix commands

7.2.2 The statsby prefix

7.2.3 The xi prefix and factor-variable notation

7.2.4 The rolling prefix

7.2.5 The simulate and permute prefixes

7.2.6 The bootstrap and jackknife prefixes

7.2.7 Other prefix commands

7.3 The forvalues and foreach commands

8.2 Calculating moving-window summary statistics

8.2.2 Calculating moving-window correlations

8.3 Computing monthly statistics from daily data

8.4 Requiring at least n observations per panel unit

8.5 Counting the number of distinct values per individual

8.6 Importing multiple spreadsheet pages

9.1 Introduction

9.2 Storing results in Stata matrices

9.3 The post and postfile commands

9.4 Output: The export delimited, outfile, and file commands

9.5 Automating estimation output

9.6 Automating graphics

9.7 Characteristics

10.2 Computing marginal effects for graphical presentation

10.3 Automating the production of LaTeX tables

10.4 Extracting data from graph files’ sersets

10.5 Constructing continuous price and returns series

11.1 Introduction

11.2 The structure of a Stata program

11.3 The program statement

11.4 The syntax and return statements

11.5 Implementing program options

11.6 Including a subset of observations

11.7 Generalizing the command to handle multiple variables

11.8 Making commands byable

11.9 Documenting your program

11.10 egen function programs

11.11 Writing an e-class program

11.12 Certifying your program

11.13 Programs for ml, nl, nlsur

11.13.1 Writing an ml-based command

11.13.2 Programs for the nl and nlsur commands

11.14 Programs for gmm

11.15 Programs for the simulate, bootstrap, and jackknife prefixes

11.16 Guidelines for Stata ado-file programming style

11.16.2 Helpful Stata features

11.16.3 Respect for datasets

11.16.4 Speed and efficiency

11.16.5 Reminders

11.16.6 Style in the large

11.16.7 Use the best tools

12.2 Generalization of egen function pct9010() to support all pairs of quantiles

12.3 Constructing a certification script

12.4 Using the ml command to estimate means and variances

12.5 Applying inequality constraints in ml estimation

12.6 Generating a dataset containing the longest spell

12.7 Using suest on a fixed-effects model

13.1 Mata: First principles

13.2 Mata fundamentals

13.2.2 Relational and logical operators

13.2.3 Subscripts

13.2.4 Populating matrix elements

13.2.5 Mata loop commands

13.2.6 Conditional statements

13.3 Mata’s st_ interface functions

13.3.2 Access to locals, globals, scalars, and matrices

13.3.3 Access to Stata variables’ attributes

13.4 Calling Mata with a single command line

13.5 Components of a Mata function

13.5.2 Variables

13.5.3 Stored results

13.6 Calling Mata functions

13.7 Example: st_ interface function usage

13.8 Example: Matrix operations

13.9 Mata-based likelihood function evaluators

13.10 Creating arrays of temporary objects with pointers

13.11 Structures

13.12 Additional Mata features

13.12.2 Associative arrays in Mata functions

13.12.3 Compiling Mata functions

13.12.4 Building and maintaining an object library

13.12.5 A useful collection of Mata routines

14.2 Shuffling the elements of a string variable

14.3 Firm-level correlations with multiple indices with Mata

14.4 Passing a function to a Mata function

14.5 Using subviews in Mata

14.6 Storing and retrieving country-level data with Mata structures

14.7 Locating nearest neighbors with Mata

14.8 Using a permutation vector to reorder results

14.9 Producing LaTeX tables from svy results

14.10 Computing marginal effects for quantile regression

14.11 Computing the seemingly unrelated regression estimator

14.12 A GMM-CUE estimator using Mata’s optimize() function

**References**

**Author index** (PDF)

**Subject index** (PDF)
