The Mata Book: A Book for Serious Programmers and Those Who Want to Be

The Mata Book: A Book for Serious Programmers and Those Who Want to Be is the book that Stata programmers have been waiting for. Mata is a serious programming language for developing small- and large-scale projects and for adding features to Stata. What makes Mata serious is that it provides structures, classes, and pointers along with matrix capabilities. The book is serious in that it both covers those advanced features and teaches them. The reader is assumed to have some programming experience, but only some. That experience could be with Stata’s ado-language or with Python, Java, C++, Fortran, or similar languages. As the book says, “being serious is a matter of attitude, not current skill level or knowledge”.

 

© Copyright 1996–2023 StataCorp LLC

Acknowledgment

 

1 Introduction
1.2 What is Mata?
1.3 What is covered in this book
1.4 How to download the files for this book

 

2 The mechanics of using Mata
2.1 Introduction
2.2 Mata code appearing in do-files
2.3 Mata code appearing in ado-files
2.4 Mata code to be exposed publicly

 

3 A programmer’s tour of Mata

3.1 Preliminaries

3.1.1 Results of expressions are displayed when not stored
3.1.2 Assignment
3.1.3 Multiple assignment

3.2 Real, complex, and string values

3.2.1 Real values
3.2.2 Complex values
3.2.3 String values (ASCII, Unicode, and binary)

3.3 Scalars, vectors, and matrices

3.3.1 Functions rows(), cols(), and length()
3.3.2 Function I()
3.3.3 Function J()
3.3.4 Row-join and column-join operators
3.3.5 Null vectors and null matrices

3.4 Mata’s advanced features

3.4.1 Variable types
3.4.2 Structures
3.4.3 Classes
3.4.4 Pointers

3.5 Notes for programmers

3.5.1 How programmers use Mata’s interactive mode
3.5.2 What happens when code has errors
3.5.3 The _error() abort function

 

4 Mata’s programming statements
4.1 The structure of Mata programs
4.2 The program body

4.2.1 Expressions
4.2.2 Conditional execution statement
4.2.3 Looping statements

4.2.3.1 while
4.2.3.2 for
4.2.3.3 do while
4.2.3.4 continue and break

4.2.4 goto
4.2.5 return

4.2.5.1 Functions returning values
4.2.5.2 Functions returning void

 

5 Mata’s expressions
5.1 More surprises
5.2 Numeric and string literals

5.2.1 Numeric literals

5.2.1.1 Base-10 notation
5.2.1.2 Base-2 notation

5.2.2 Complex literals
5.2.3 String literals

5.3 Assignment operator
5.4 Operator precedence
5.5 Arithmetic operators
5.6 Increment and decrement operators
5.7 Logical operators
5.8 (Understand this ? skip : read) Ternary conditional operator
5.9 Matrix row and column join and range operators

5.9.1 Row and column join
5.9.2 Comma operator is overloaded
5.9.3 Row and column count vectors

5.10 Colon operators for vectors and matrices
5.11 Vector and matrix subscripting

5.11.1 Element subscripting
5.11.2 List subscripting
5.11.3 Permutation vectors

5.11.3.1 Use to sort data
5.11.3.2 Use in advanced mathematical programming

5.11.4 Submatrix subscripting

5.12 Pointer and address operators
5.13 Cast-to-void operator

 

6 Mata’s variable types
6.1 Overview
6.2 The forty variable types

6.2.1 Default initialization
6.2.2 Default eltype, orgtype, and therefore, variable type
6.2.3 Partial types
6.2.4 A forty-first type for returned values from functions

6.3 Appropriate use of transmorphic

6.3.1 Use transmorphic for arguments of overloaded functions
6.3.2 Use transmorphic for output arguments

6.3.2.1 Use transmorphic for passthru variables

6.3.3 You must declare structures and classes if not passthru
6.3.4 How to declare pointers

 

7 Mata’s strict option and Mata’s pragmas
7.1 Overview
7.2 Turning matastrict on and off
7.3 The messages that matastrict produces, and suppressing them

 

8 Mata’s function arguments
8.1 Introduction
8.2 Functions can change the contents of the caller’s arguments

8.2.1 How to document arguments that are changed
8.2.2 How to write functions that do not unnecessarily change arguments

8.3 How to write functions that allow a varying number of arguments
8.4 How to write functions that have multiple syntaxes

 

9 Programming example: n_choose_k() three ways
9.1 Overview
9.2 Developing n_choose_k()
9.3 n_choose_k() packaged as a do-file

9.3.1 How I packaged the code: n_choose_k.do
9.3.2 How I could have packaged the code

9.3.2.1 n_choose_k.mata
9.3.2.2 test_n_choose_k.do

9.3.3 Certification files

9.4 n_choose_k() packaged as an ado-file

9.4.1 Writing Stata code to call Mata functions
9.4.2 nchooseki.ado
9.4.3 test_nchooseki.do
9.4.4 Mata code inside of ado-files is private

9.5 n_choose_k() packaged as a Mata library routine

9.5.1 Your approved source directory

9.5.1.1 make_lmatabook.do
9.5.1.2 test.do
9.5.1.3 hello.mata
9.5.1.4 n_choose_k.mata
9.5.1.5 test_n_choose_k.do

9.5.2 Building and rebuilding libraries
9.5.3 Deleting libraries

 

10 Mata’s structures
10.1 Overview
10.2 You must define structures before using them
10.3 Structure jargon
10.4 Adding variables to structures
10.5 Structures containing other structures
10.6 Surprising things you can do with structures
10.7 Do not omit the word scalar in structure declarations
10.8 Structure vectors and matrices and use of the constructor function
10.9 Use of transmorphic with structures
10.10 Structure pointers

 

11 Programming example: Linear regression
11.1 Introduction
11.2 Self-threading code
11.3 Linear-regression system lr*() version 1

11.3.1 lr*() in action
11.3.2 The calculations to be programmed
11.3.3 lr*() version-1 code listing
11.3.4 Discussion of the lr*() version-1 code

11.3.4.1 Getting started
11.3.4.2 Assume subroutines
11.3.4.3 Learn about Mata’s built-in subroutines
11.3.4.4 Use of built-in subroutine cross()
11.3.4.5 Use more subroutines

11.4 Linear-regression system lr*() version 2

11.4.1 The deviation from mean formulas
11.4.2 The lr*() version-2 code
11.4.3 lr*() version-2 code listing
11.4.4 Other improvements you could make

11.5 Closeout of lr*() version 2

11.5.1 Certification
11.5.2 Adding lr*() to the lmatabook.mlib library

 

12 Mata’s classes

12.1 Overview

12.1.1 Classes contain member variables
12.1.2 Classes contain member functions
12.1.3 Member functions occult external functions
12.1.4 Members—variables and functions—can be private
12.1.5 Classes can inherit from other classes

12.1.5.1 Privacy versus protection
12.1.5.2 Subclass functions occult superclass functions
12.1.5.3 Multiple inheritance
12.1.5.4 And more

12.2 Class creation and deletion
12.3 The this prefix
12.4 Should all member variables be private?
12.5 Classes with no member variables
12.6 Inheritance

12.6.1 Virtual functions
12.6.2 Final functions
12.6.3 Polymorphisms
12.6.4 When to use inheritance

12.7 Pointers to class instances

 

13 Programming example: Linear regression 2
13.1 Introduction
13.2 LinReg in use
13.3 LinReg version-1 code
13.4 Adding OPG and robust variance estimates to LinReg

13.4.1 Aside on numerical accuracy: Order of addition
13.4.2 Aside on numerical accuracy: Symmetric matrices
13.4.3 Finishing the code

13.5 LinReg version-2 code
13.6 Certifying LinReg version 2
13.7 Adding LinReg version 2 to the lmatabook.mlib library

 

14 Better variable types
14.1 Overview
14.2 Stata’s macros
14.3 Using macros to create new types
14.4 Macroed types you might use

14.4.1 The boolean type
14.4.2 The Code type
14.4.3 Filehandle
14.4.4 Idiosyncratic types, such as Filenames
14.4.5 Macroed types for structures
14.4.6 Macroed types for classes
14.4.7 Macroed types to avoid name conflicts

 

15 Programming constants
15.1 Problem and solution
15.2 How to define constants
15.3 How to use constants
15.4 Where to place constant definitions

 

16 Mata’s associative arrays
16.1 Introduction
16.2 Using class AssociativeArray
16.3 Finding out more about AssociativeArray

 

17 Programming example: Sparse matrices
17.1 Introduction
17.2 The idea
17.3 Design

17.3.1 Producing a design from an idea
17.3.2 The design goes bad
17.3.3 Fixing the design

17.3.3.1 Sketches of R_*x*() and S_*x*() subroutines
17.3.3.2 Sketches of class’s multiplication functions

17.3.4 Design summary
17.3.5 Design shortcomings

17.4 Code
17.5 Certification script

 

18 Programming example: Sparse matrices, continued
18.1 Introduction
18.2 Making overall timings

18.2.1 Timing T1, Mata R=RR
18.2.2 Timing T2, SpMat R=RR
18.2.3 Timing T3, SpMat R=SR
18.2.4 Timing T4, SpMat R=RS
18.2.5 Timing T5, SpMat R=SS
18.2.6 Call a function once before timing
18.2.7 Summary

18.3 Making detailed timings

18.3.1 Mata’s timer() function
18.3.2 Make a copy of the code to be timed
18.3.3 Make a do-file to run the example to be timed
18.3.4 Add calls to timer_on() and timer_off() to the code
18.3.5 Analyze timing results

18.4 Developing better algorithms

18.4.1 Developing a new idea
18.4.2 Aside

18.4.2.1 Features of associative arrays
18.4.2.2 Advanced use of pointers

18.5 Converting the new idea into code sketches

18.5.0.3 Converting the idea into a sketch of R_SxS()
18.5.0.4 Sketching subroutine cols_of_row()

18.5.1 Converting sketches into completed code

18.5.1.1 Double-bang comments and messages
18.5.1.2 // NotReached comments
18.5.1.3 Back to converting sketches

18.5.2 Measuring performance

18.6 Cleaning up

18.6.1 Finishing R_SxS() and cols_of_row()
18.6.2 Running certification

18.7 Continuing development

 

19 The Mata Reference Manual

 

A Writing Mata code to add new commands to Stata
A.1 Overview
A.2 Ways to structure code
A.3 Accessing Stata’s data from Mata
A.4 Handling errors
A.5 Making the calculation and displaying results
A.6 Returning results
A.7 The Stata interface functions

A.7.1 Accessing Stata’s data
A.7.2 Modifying Stata’s data
A.7.3 Accessing and modifying Stata’s metadata
A.7.4 Changing Stata’s dataset
A.7.5 Accessing and modifying Stata macros, scalars, matrices
A.7.6 Executing Stata commands from Mata
A.7.7 Other Stata interface functions

 

B Mata’s storage type for complex numbers
B.1 Complex values
B.2 Complex values and literals
B.3 Complex scalars, vectors, and matrices
B.4 Real, complex, and numeric eltypes
B.5 Functions Re(), Im(), and C()
B.6 Function eltype()

 

C How Mata differs from C and C++
C.1 Introduction
C.2 Treatment of semicolons
C.3 Nested comments
C.4 Argument passing
C.5 Strings are not arrays of characters
C.6 Pointers

C.6.1 Pointers to existing objects
C.6.2 Pointers to new objects, allocation of memory
C.6.3 The size and even type of the object may change
C.6.4 Pointers to new objects, freeing of memory
C.6.5 Pointers to subscripted values
C.6.6 Pointer arithmetic is not allowed

C.7 Lack of switch/case statements
C.8 Mata code aborts with error when C would crash

 

D Three-dimensional arrays (advanced use of pointers)
D.1 Introduction
D.2 Creating three-dimensional arrays

 

References

 


Author: William W. Gould
ISBN-13: 978-1-59718-263-8
Copyright: 2018
e-Book version available
