Stata links
FILE MANAGEMENT
Gentzkow and Shapiro (2014) “Code and Data for the Social Sciences: A Practitioner’s Guide.” - I strongly recommend reading this before embarking on your very first empirical research project. The guide introduces many useful data management concepts developed in computer science, which will save you a great deal of time over the increasingly long journey of conducting a piece of empirical research in economics. The most important are Chapters 2, 4 and 5, which help you organize your data files and your many Stata do files (no kidding: by the time you publish your empirical paper, you will have tons of Stata code).
TUTORIALS
Essam and Hughes (2016) Stata Cheat Sheets --- All the important Stata commands at a glance. (HT: Marc Bellemare)
Lembcke (2009) “Introduction to Stata” and “Advanced Stata Topics” --- These are the Stata course lecture notes for PhD students at the Department of Economics, LSE. Since 2004, each year’s course instructor has updated and expanded them. I took the course in 2004, but the current version of the lecture notes covers much more than what I learned in the course. You will learn a lot from them. In particular, “Advanced Stata Topics” touches on how to write and publish your own Stata programme, maximum likelihood estimation in Stata, and how to use Mata (Stata’s matrix programming language), topics that are usually not covered in a Stata course for economists.
Using Stata to Analyze Survey Data by Nicholas Minot (IFPRI): This is an excellent introduction to Stata specifically tailored for would-be development economists.
Maybe useful:
A. Colin Cameron and P. K. Trivedi Microeconometrics: Methods and Applications
Germán Rodriguez “Stata Tutorial” Princeton University
Phil Bardsley, Kim Chantala, and Dan Blanchette 'Stata Tutorial' University of North Carolina at Chapel Hill
Stata Starter Kit by UCLA Academic Technology Service
INTRODUCTION
What Stata can/can't do by A. Colin Cameron (Dept. of Economics, University of California, Davis)
ADO FILES
To install an ado file, type 'ssc install xxx' (where xxx should be replaced with the name of the ado file) in your Stata interactive session.
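As a minimal sketch of the installation step above: estout is used here only as an example package name (it comes up again later in this list), and an internet connection is assumed.

```stata
* Install a user-written package from the SSC archive (estout is just an example)
ssc install estout, replace

* Confirm that Stata can now find a command from the package
which esttab

* List the user-written packages currently installed
ado dir
```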
DO FILES
Making do-files is essential because it allows other researchers to replicate your empirical analysis. It has increasingly become the norm among empirical researchers to post online the Stata do-files used to produce the results in published papers. Here are some websites on how to write do-files.
Michael S. Hill (2015) 'In Stata coding, Style is the Essential: A brief commentary on do-file style'
Stata Tutorial by Carolina Population Center, University of North Carolina
Stata section of Guide to Genetic Analysis by Centre for Integrated Genomic Medical Research (Links to example do-files are dead, but it contains some information on editor software.)
Using external text editors to write do files by Friedrich Huebler
RA Manual Notes on Writing Code, by Matthew Gentzkow and Jesse M. Shapiro (2012), offer the best practices in computer programming that are useful for writing Stata do files (and scripts for other software).
Stata help for timer: A useful command if you run a do file containing a command that takes a very long time to execute (e.g. a regression with a lot of fixed effects).
If you use Stata/MP on cluster computing facilities, see Stata Help: statamp.
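The timer command mentioned above can be wrapped around any slow step. The following is only a sketch using the auto dataset that ships with Stata; the regression is a stand-in for whatever long-running command your own do file contains.

```stata
sysuse auto, clear

* Reset timer 1, then start it just before the slow command
timer clear 1
timer on 1

* Stand-in for a long-running estimation
quietly regress price mpg weight i.rep78

* Stop the timer and report the elapsed time
timer off 1
timer list 1
display "Elapsed seconds: " r(t1)
```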
READING FILES
Every data analysis begins with opening a data file. First, look at this website for the jargon used for data formats. (The description of rectangular files is wrong, though.)
Stata Help infiling: Official guide on which command to use for reading different types of data.
Excel
Excel files can finally be imported by a Stata command: import excel.
For earlier versions of Stata to read an Excel file, follow this blog entry. Make sure to use the forward slash (/) rather than the backslash (\) for the path name. It should then work.
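Here is a small round-trip sketch of import excel. The file name auto_example.xlsx is made up for illustration; the example first writes that file from the built-in auto dataset so that it is self-contained. It assumes a Stata version that has import excel (Stata 12 or later).

```stata
* Create a small Excel file to read back (hypothetical file name)
sysuse auto, clear
export excel using "auto_example.xlsx", firstrow(variables) replace

* Read it back; firstrow treats the first row as variable names
import excel using "auto_example.xlsx", firstrow clear
describe
```

For a file elsewhere on disk, write the path with forward slashes, as noted above.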
Stata
There is a useful ado program named USE10 which allows you to read the Stata version 10 data with Stata version 9. Type “ssc install use10” to install it.
SPSS
To read SPSS data files, use the usespss ado. (HT: David McKenzie.)
CSV
If each data entry is separated by a comma (called the CSV format), use INSHEET.
If your data includes an identification number with more than 7 digits, make sure you include the double option to the insheet command. Read Stata Help for data_type for details.
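A minimal sketch of reading a CSV file with insheet and the double option follows. The file auto_example.csv is a made-up name, written here from the built-in auto dataset so the example runs on its own. In Stata 13 and later, import delimited supersedes insheet, although insheet still works.

```stata
* Write a small CSV file to read back (hypothetical file name)
sysuse auto, clear
keep make price mpg
outsheet using "auto_example.csv", comma replace

* comma:  fields are separated by commas
* names:  the first row holds variable names
* double: store floating-point values as double rather than float,
*         which avoids precision loss in long identification numbers
insheet using "auto_example.csv", comma names double clear
describe
```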
Tab-delimited
If the separator is a tab or a space, use INFILE.
Fixed format
If the data file is in the fixed format (no separator between data entries; entries are identified by column numbers), it's trickier. There are three cases:
(1) If it's a flat file (each single line represents one observation), see Stata: How to Write a Dictionary Program to Read Raw Data by the Electronic Data Service (EDS) at Columbia University;
(2) If it's a rectangular file (the fixed number of lines represent one observation), see 'Example of a Program to Read Data with Multiple Records/Case' at the bottom of Stata: How to Write a Dictionary Program to Read Raw Data by the Electronic Data Service (EDS) at Columbia University;
(3) If it's a hierarchical file (a flexible number of lines represent one observation such as World Fertility Surveys), see Stata: How to Read Hierarchical Files in Stata by the Electronic Data Service (EDS) at Columbia University.
From scratch
To create a dataset from scratch, first type “drop _all” and then type “set obs #” where # is the number of observations in this new dataset. Then create variables by the generate command etc. For a small dataset, you can use the INPUT command to directly enter the data.
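The two approaches just described can be sketched as follows. The variable names and values are invented purely for illustration.

```stata
* Approach 1: an empty dataset with 100 observations, then generate variables
drop _all
set obs 100
generate id = _n
set seed 12345
generate x = rnormal()
summarize x

* Approach 2: type a small dataset in directly with input
clear
input id str10 name score
1 "Ana" 3.5
2 "Ben" 4.0
end
list
```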
Multiple files in the same directory
EDITING DATA STRUCTURE
Before starting to edit data itself, you need to edit the structure of data files: reshape, append, and merge.
RESHAPE: Whenever you use the datasets downloaded from World Development Indicators, you need to do this.
Using Stata's RESHAPE command, by Amy Yuen at Electronic Data Center of Emory University General Libraries
APPEND/MERGE: Good empirical research often relies on the use of two or more completely different datasets. So you need to append or merge different datasets before starting analysis.
ISID: When you want to merge two datasets which do not share a common unique identifier but do share the same variables (e.g. birth date, birth region), the ISID command lets you check whether a certain set of variables uniquely identifies observations. See Stata Help on ISID.
Stata Tutorial Part 4: Manipulating Files, by Syracuse University Library
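To make reshape, isid, and merge concrete, here is a small sketch with invented data laid out like a World Development Indicators download (one row per country, one column per year). The country codes, values, and region names are all hypothetical, and the merge m:1 syntax assumes Stata 11 or later.

```stata
* Wide data: one row per country, one GDP column per year
clear
input str3 country gdp2010 gdp2011
"AAA" 100 110
"BBB" 90 95
end

* Reshape to long: one row per country-year
reshape long gdp, i(country) j(year)

* isid exits with an error unless country and year jointly identify observations
isid country year

* Build a second, country-level dataset and save it to a temporary file
preserve
clear
input str3 country str6 region
"AAA" "North"
"BBB" "South"
end
isid country
tempfile regions
save `regions'
restore

* Many country-year rows match one region row per country
merge m:1 country using `regions'
tabulate _merge
```

Checking the key with isid on both sides before merging is cheap, and it catches duplicate identifiers before they silently multiply rows.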
DATA PROCESSING
How to create dummy variables by Stata FAQs
Create a new dataset by hand by Carolina Population Centre, University of North Carolina
List of math functions by Stata Help - can be used in combination with the generate command to edit variables.
List of operators by Stata Help
Date variables by Data and Statistical Services, Princeton University --- This webpage tells you how to convert date variables into different formats (e.g. convert the variables of year, month, and day into one date variable etc.).
To categorize observations by percentile bins, use the command xtile. See this Statalist message.
UNIQUE: Stata module to report number of unique values in variable(s) --- Sometimes this ado command is useful. For example, you may want to know whether a particular variable takes more than one value for each group of observations. Type “ssc install unique” to install the ado file and then type “help unique” for details.
REGEXM: useful if you want to identify observations whose string variable contains a particular set of letters.
Loop over all values of a particular variable: the lesser-known command LEVELSOF returns the list of all values of the specified variable in r(levels), and its local() option also stores that list in a local macro.
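The xtile, regexm, and levelsof items above can be tried out on the auto dataset that ships with Stata. This is only a sketch; the new variable names (price_q, is_vw) are made up for the example.

```stata
sysuse auto, clear

* xtile: assign each car to a quartile bin of price
xtile price_q = price, nquantiles(4)
tabulate price_q

* regexm: flag observations whose make starts with "VW"
generate byte is_vw = regexm(make, "^VW")
list make if is_vw

* levelsof: collect the distinct values of rep78, then loop over them
levelsof rep78, local(levels)
foreach l of local levels {
    quietly summarize price if rep78 == `l'
    display "rep78 = `l': mean price = " %9.1f r(mean)
}
```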
SUMMARY STATISTICS
ESTPOST - This is part of the ESTOUT ado file package, automating the process of creating a table of summary statistics. Highly recommended.
Section 6 (pages 33-43) of Using Stata for Survey Data Analysis by Nick Minot at IFPRI --- Very useful, especially if you are analyzing household survey data.
How to conduct a t-test for survey data, by UCLA Academic Technology Service --- Useful if each observation in your data needs to be weighted according to the sampling method. See also how to use the SVY command.
Generating Regression and Summary Statistics Tables in Stata: A Checklist and Code, by Matthew Groh (May, 2014) --- Provides an example do file that uses the MAT2TXT Stata module.
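A minimal sketch of the ESTPOST route to a summary statistics table follows. It assumes the estout package is already installed (ssc install estout) and uses the built-in auto dataset.

```stata
sysuse auto, clear

* Post summary statistics to e() so that esttab can tabulate them
estpost summarize price mpg weight

* One row per variable; columns for mean, standard deviation, min, and max
esttab, cells("mean sd min max") nomtitle nonumber
```

Adding using "filename.tex" (or .csv, .rtf) to the esttab line writes the table to a file instead of the Results window.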
ESTIMATIONS
Stata Textbook Examples: Econometric Analysis of Cross Section and Panel Data by Jeffrey M. Wooldridge, by UCLA Academic Technology Service --- First, find an example of the estimation method you want to use in Wooldridge's graduate econometrics textbook. Then go to this webpage to see which Stata command performs that estimation.
Beyond simple OLS estimation by UCLA Academic Technology Service - robust estimation, clustering, quantile regression, linear hypothesis testing, errors-in-variables regression (eivreg), censored/truncated data, SUR, multivariate regression, etc.
Fixed effects estimation
The XTREG command with the FE option (i.e. fixed effects estimation) was modified in Stata 10 and Stata 11. See what’s new in Stata 10 (items 4, 5, and 7 in particular) and in Stata 11 (the fourth bullet point in particular).
Fixed Effects Estimation (xtreg command with fe option) by Stata FAQ - explains why there is a constant term in the estimation result table.
Differences among within, between, and overall R-squared obtained by the xtreg, fe command by Justin Smith (15 August 2006)
R squared in Fixed Effects Estimation by Stata FAQ - explains why reported R squared is different between xtreg, fe and areg. See also this note by Indiana University Information Technology Services. Theoretical background can be found in Hayashi's Econometrics textbook (page 333-4), for example. (This issue seems to be outdated with the xtreg command improved by Stata version 10 or higher.)
If you notice that the areg command and the xtreg command with the fe option produce different clustered standard errors from each other, read this.
Prais-Winsten panel regression: use the XTPCSE command. Examples include Rohlfs et al (2010).
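To see the xtreg, fe versus areg comparison discussed above for yourself, here is a sketch using the nlswork example dataset from Stata's website (webuse needs an internet connection). The choice of regressors is arbitrary.

```stata
webuse nlswork, clear

* Declare the panel structure: person identifier and year
xtset idcode year

* Fixed effects via xtreg, with standard errors clustered by person
xtreg ln_wage age tenure, fe vce(cluster idcode)

* The same model via areg, absorbing the person fixed effects
areg ln_wage age tenure, absorb(idcode) vce(cluster idcode)
```

The slope coefficients should agree across the two commands; compare the reported standard errors and R-squared values to see the differences that the links above explain.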
Weighted least squares estimation
Weighted Least Squares when the variance of the error term is known by Stata Help
Choosing the Correct Weight Syntax by UNC Carolina Population Center - if you wonder what pweight, fweight, aweight, and iweight really mean.
Weighted Least Squares Regression by UCLA Academic Technology Service (See Deaton (1997) The Analysis of Household Surveys, pp.67-73, for the use of weighted least squares in the context of survey design.)
probit, logit, and other nonlinear regressions
MARGINS: a command introduced in version 11 (with MARGINSPLOT added in version 12), to report the average value of the predicted dependent variable at each specified value of the regressors (if I understand correctly). Useful for interpreting estimated coefficients from nonlinear regressions, as explained by SSCC at University of Wisconsin-Madison.
INTEFF: this is an ado package to correctly estimate the magnitude and standard errors of the effect of an interaction term in nonlinear models such as probit and logit. See Ai and Norton (2003) for details. This command, however, does not work if there are quite a few dummy variables as regressors. It seems the MARGINS and MARGINSPLOT commands supersede INTEFF.
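A short sketch of MARGINS after a probit, using the built-in auto dataset; the model itself is arbitrary and only serves to show the syntax.

```stata
sysuse auto, clear

* Probit of foreign origin on mileage and weight
probit foreign mpg weight

* Average marginal effects of each regressor on the predicted probability
margins, dydx(mpg weight)

* Average predicted probability at selected values of mpg
margins, at(mpg = (15(5)35))

* Plot the predictions from the last margins command (Stata 12 or later)
marginsplot
```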
Event study
How to conduct an event study estimation with Stata by Data and Statistical Services, Princeton University
Attrition bias
Lee's (2009) treatment effect bounds. In the case of attrition bias, this method is now the industry standard. You can now easily do it in Stata with the leebounds command.
Standard errors
Bootstrapping: See Lecture 4 (pages 6-8) in Programming in Stata, RLAB Data Service, London School of Economics.
X_OLS: Timothy Conley's standard error correction for spatial correlation. This is the standard way of calculating standard errors in the literature when outcomes and regressors are spatially correlated.
Douglas Miller’s Stata code page contains a Stata do file to execute Cameron, Gelbach, and Miller (2008)’s Wild Bootstrap standard error clustering method, which is increasingly popular among applied microeconometric researchers when the number of clusters is small.
Matching estimation
CEM: Coarsened Exact Matching, by Iacus, King, and Porro (2008), for creating a control group whose observables are balanced against the treated group ex ante. Used by Azoulay, Zivin, and Wang (2010).
Matching Estimators ado file by Abadie, Drukker, Herr, and Imbens
Synth by Abadie, Diamond, and Hainmueller --- A method to estimate the treatment effect from observational data when only one unit is treated.
Pair-wise Mahalanobis matching with an optimal greedy algorithm: See page 209 of Bruhn and McKenzie (2009). This article’s replication data file (click “Download Data Set” on this webpage) contains Stata code for this matching method.
AFTER EACH REGRESSION IS RUN...
How to interpret output tables that appear after executing estimation commands such as summarize, regress, logistic, etc. by UCLA Academic Technology Service
reformat ado-file, by Sealed Envelope Ltd. - This ado-file is useful when you have tons of fixed effects (e.g. country dummies) and are interested in the coefficients on these dummies.
Stata Class 3, by Stas Kolenikov, Duke University - introduces commands after estimation for plotting residuals etc.
From version 10, you can save estimation results to disk with the command estimates save. As a result, it is no longer necessary to install the ESTSAVE ado.
parmest ado-file allows you to create a Stata data file of coefficient estimates along with t-values and p-values. By default, Stata does not store t-values and p-values after regressions. This ado-file is useful if you need to use t-values and/or p-values after each regression is run.
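The estimates save workflow mentioned above might look like the sketch below. The file name price_model is invented; Stata adds the .ster extension and writes the file to the current working directory.

```stata
sysuse auto, clear
regress price mpg weight

* Write the estimation results to price_model.ster
estimates save price_model, replace

* Start from a clean slate, as you would in a later session
clear all

* Load the saved results back and redisplay the regression table
estimates use price_model
estimates replay
```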
REPORT ESTIMATION RESULTS
ESTOUT - A great ado-file package to create a table of regression results either in the text file format, in the HTML format, or in the TeX format! It's more versatile than OUTREG2 (see below). It is slightly complicated, but it's worth paying the fixed cost of learning how to use it. To minimize the fixed cost, take the following steps:
To install the package, see here.
First, learn how to use ESTSTO by reading this.
Then, learn how to use ESTTAB by reading this.
Only for fancier things to do, you need to learn ESTOUT (the more flexible version of ESTTAB) and ESTADD (the more flexible version of ESTSTO's ADDSCALARS option).
With the ESTOUT package, you can easily create a summary statistics table!
The ESTOUT package also allows you to include 'YES' or 'NO' to indicate whether a certain set of fixed effects are controlled for (a standard practice in labor economics type research). See this document.
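The steps above can be sketched as follows, including a 'YES'/'NO' row for fixed effects. This assumes the estout package is installed. The scalar name rep_fe and the row labels are made up for the example, and this is just one way to produce such a row, not necessarily the one in the linked document.

```stata
sysuse auto, clear
eststo clear

* Model 1: no fixed effects; record that fact in e() before storing
quietly regress price mpg
estadd local rep_fe "NO"
eststo m1

* Model 2: repair-record dummies as fixed effects
quietly regress price mpg i.rep78
estadd local rep_fe "YES"
eststo m2

* Show only the coefficient on mpg, plus the indicator row and sample size
esttab m1 m2, se keep(mpg) ///
    stats(rep_fe N, labels("Repair-record FE" "Observations"))
```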
Generating Regression and Summary Statistics Tables in Stata: A Checklist and Code, by Matthew Groh (May, 2014) --- If you prefer creating regression tables in the Excel format.
TABOUT - Seems to be a very useful ado for automating the process of creating any kind of table formatted to appear in an academic paper. Example Stata do files mentioned in this tutorial can be downloaded at the author’s website.
OUTREG2.ado - An improved version of OUTREG.ado (see below). It's less versatile than ESTOUT, but it's more flexible in producing a TeX file. One problem is that, after fixed effects estimation (areg or xtreg, fe), the nocons option does not work.
How to use outreg.ado, by Kellogg Research Computing, Northwestern University - probably the most useful explanation of the outreg ado file, including a PDF of the outreg help file. When you use the addstat option to report more than 10 statistics, outreg does not work properly. A solution can be found here (Statalist archives). (If you want to further convert the resulting Excel file into LaTeX format, download EXCEL2LATEX here and extract the downloaded zip file into 'C:\Documents and Settings\username\Application Data\Microsoft\AddIns' (where 'username' is your own username). Then open Excel, click 'Tools - Add-Ins...', and check the box for Excel2Latex. A new small icon will appear in the toolbar. Select the table you want to convert and click the icon to create a TeX file of your table.)
How to report multinomial logit regression results with OUTREG, by Statalist
GRAPHICS
Online Tutorial for Making Graphs by Stata Corp. - An excellent website in the sense that you can choose the visual image (rather than picking the words like “bar graphs”, “scatter plots”, etc.) to learn how to make various types of graph.
How to make various types of graph (follow the links below the heading 'Graphics') by UCLA Academic Technology Service - Useful if you want to make twoway graphs.
BY option for GRAPH command by Stata Help - this is how to make graphs for each category (e.g. country by country).
BINSCATTER - A Stata package written by Michael Stepner, which allows you to create a scatter plot from (literally) millions of observations, by grouping observations into several intervals of the x variable and plotting the average value of the y variable for each group. (HT: David Seim)
Nonparametric regression curve in a scatter plot - search for 'nonparametric'.
Draw kernel density functions for each group in the same graph by UCLA Academic Technology Service
Guide to creating PNG images with Stata by Friedrich Huebler
How to create animated graphics using Stata, by Chuck Huber.
How to create a map from Stata by Friedrich Huebler
Drawing social networks in Stata with Netplot by Rense Corten --- if you are analyzing social network data.
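A few of the graphing items above (the by option, PNG export, and BINSCATTER) can be tried on the built-in auto dataset. This is a sketch only: the output file name is made up, PNG export may not be available in console versions of Stata on Unix, and binscatter must first be installed with ssc install binscatter.

```stata
sysuse auto, clear

* One scatter plot per category of foreign (domestic vs. foreign cars)
twoway scatter price mpg, by(foreign)

* Save the current graph as a PNG image (hypothetical file name)
graph export "price_mpg_by_origin.png", replace

* Binned scatter plot: average price within 10 equal-sized bins of mpg
binscatter price mpg, nquantiles(10)
```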
PROGRAMMING
Programming in Stata, RLAB Data Service, London School of Economics: these are lecture notes for a Stata course at Department of Economics, LSE. Lectures 3 to 5 deal with how to make your own program with Stata (macro, looping, ado-file, etc.). Very useful.
How to display variable labels: See this Statalist message by Nick Cox on 27 May, 2010.
The CAPTURE command is useful when executing a do file, especially when you want to conduct different data processing steps depending on whether there is an error (which can be expressed as “if _rc != 0” in the Stata code). See the paragraphs below the heading “If as a Way to Control Program Flow” in this webpage.
How do I run Stata in batch mode? (Stata FAQ): if you want to run a do file without launching Stata interactively in Unix
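The CAPTURE pattern described above can be sketched like this. The variable name income is hypothetical and is deliberately absent from the auto dataset, so the error branch runs.

```stata
sysuse auto, clear

* capture suppresses the error and stores the return code in _rc
capture confirm variable income

* _rc is 0 on success and nonzero if the command failed
if _rc != 0 {
    display "income not found; creating an empty placeholder"
    generate income = .
}
else {
    display "income already exists"
}
```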
TROUBLESHOOTING
If you always type “set memory 900m” after launching Stata because you use a large dataset, read this.
If you run Stata on Windows and encounter an error message 'op. sys. refuses to provide memory, r(909)', you may want to consider ditching Windows. Here's why.
If you encounter an error message 'insufficient disk space, r(699)', see this Stata FAQ article.
If you encounter a warning message “Warning: variance matrix is nonsymmetric or highly singular”, see this post in Statalist by Jeff Pitblado of Stata Corp.
If you encounter an error message “could not rename c:\ado\plus\stata.trk to c:\ado\plus\backup.trk r(699);” when you try to install an ado file by the “ssc install” command, read pages 47-48 of Lembcke (2009) “Introduction to Stata”. Unfortunately, this method does not change the Stata setting permanently. Every time you use an ado file, you have to do this.
FROM STATA TO OTHER SOFTWARE
Export tables to Excel, written by Kevin Crow on The Stata Blog.
How to transform dta file into csv file, by UCLA Academic Technology Service. If data contains many decimal places, make sure to use the format command before the outsheet command so that Stata won’t randomly round up values. If you don’t need the top row containing variable names, use the noname option.
Order command by Stata Help - if you want to change the order of variables in the table you create from the Stata dataset.
How to edit Stata graphs in Microsoft Word, by Stata FAQ
Stata tools for Latex, by UCLA Academic Technology Service - for those of you who write empirical papers with LaTeX.
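The CSV export advice above (format before outsheet, reorder with order, drop the header row) can be sketched as follows. The output file name is invented for illustration.

```stata
sysuse auto, clear

* Put the variables in the order wanted in the output file
order make foreign price gear_ratio

* Fix the display format so that gear_ratio is written with two decimals
format gear_ratio %9.2f

* comma: comma-separated output; nonames: omit the header row of variable names
outsheet make foreign price gear_ratio using "auto_export.csv", comma nonames replace
```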
TEXTBOOK EXAMPLES
Stata commands for examples in Wooldridge's graduate level textbook Econometric Analysis of Cross Section and Panel Data, by UCLA Academic Technology Service
Stata commands for examples in Wooldridge's undergrad level textbook Introductory Econometrics: A Modern Approach, by Boston College Academic Technology Support
Stata commands for Greene's textbook Econometric Analysis (4th ed.), by UCLA Academic Technology Service
Accessible readings behind Stata commands
IVREG2
Murray, Michael P. (2006) 'Avoiding Invalid Instruments and Coping with Weak Instruments,' Journal of Economic Perspectives, 20(4), p. 128.
CLUSTER option for REGRESS
Deaton (1997) The Analysis of Household Surveys, pp.74-77.
Bertrand et al. (2004) 'How Much Should We Trust Differences-in-differences Estimates?,' Quarterly Journal of Economics, vol.119, p.271.
KDENSITY
Deaton (1997) The Analysis of Household Surveys, pp.171-76.
The following websites may or may not be useful (I haven't checked them yet):
Tips for using Stata 10, by Survey Design and Analysis Services Pty Ltd
Useful Links by Kellogg Research Computing, Northwestern University
Stata materials by Stas Kolenikov, Duke University - includes very graphically well-presented Stata course notes.