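Command | Description | Example | Notes |
pwd | Show the current working directory | pwd | |
dir | List the files in the current directory | dir | |
mkdir | Create a new folder | mkdir d:/mydata | |
cd | Change the working directory | cd d:/mydata | |
append | Stack datasets vertically (add observations) | append using math | |
merge | Join datasets horizontally (add variables) | merge 1:1 keyvar using filename | keyvar and filename are placeholders |
xpose | Transpose the dataset | xpose, clear | |
cap | Short for capture: suppress output and keep the do-file running whether or not the command fails | cap command | Unlike quietly, an error does not stop execution |
qui | Short for quietly: suppress output, but stop on error | qui command | |
duplicates | Handle duplicate observations | duplicates report (report duplicates); duplicates list (list duplicates); duplicates drop (drop duplicates) | |
bys | Short for bysort (sort, then by): run a command by group | | |
sort | Sort in ascending order | | |
gsort | Sort in descending order | | Prefix the variable with a minus sign, e.g. gsort -var |
gen | Create a new variable from an expression | gen newvar=var/10 | |
egen | Create a new variable with an egen function | egen newvar=mean(var) | |
xtile, nq(n) | Split a variable into n quantile groups | | |
recode | Reassign values | | Concise |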
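
The difference between cap and qui noted in the table can be seen in a short sketch. It assumes the auto dataset shipped with Stata; nosuchvar is a hypothetical variable name chosen because it does not exist in that dataset.

```stata
sysuse auto, clear

* capture swallows the error and stores its return code in _rc
capture confirm variable nosuchvar   // nosuchvar: hypothetical, does not exist
display _rc                          // nonzero, but the do-file keeps running

* quietly hides the output; an error here would stop the do-file
quietly summarize price
display r(mean)
```

A common pattern is to test _rc right after a captured command and branch on it.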
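- *Using log files
- log using c:\logfile //SMCL format (the default)
- log using c:\logfile.log,text //plain-text format
- log using c:\logfile.log,text replace //overwrite an existing log file
- log using c:\logfile.log,text append //append to the end of an existing log file
- log off //suspend logging temporarily
- log on //resume logging; the counterpart of log off
- log close //close the log completely; the counterpart of log using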
-
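
The log commands above are often combined as follows. This is a minimal sketch, not the author's own workflow; mylog.log is a hypothetical file name written to the working directory.

```stata
* Close any log left open by an earlier run; capture ignores the error if none is open
capture log close

log using "mylog.log", text replace   // hypothetical file name
sysuse auto, clear
summarize price
log close
```

Starting with capture log close makes the do-file safe to rerun after an interrupted session.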
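- *Other commonly used community-contributed commands; before first use, install each with "ssc install pkgname"
- *Export regression results
- outreg2 depvar indepvar1 indepvar2 using regfile,replace seeout
- *Export logged output, using Pearson correlation coefficients as the example
- logout,save(my) excel replace:pwcorr lwt bwt,sig star(0.05)
- *Export an OLS regression table
- esttab using test.rtf
-
- cd //show the current working directory
- cd "d:/CHFS_tracking" //set the working directory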
-
- by foreign : count if rep78 > 4
- describe
- codebook foreign
- list foreign price mpg rep78 in 1/5
- browse foreign price mpg rep78 in 1/5
-
- generate epi001=1 if foreign==1
- replace epi001=0 if foreign==0
-
-
- ttest price,by(foreign) //two-sample t test (independent groups)
- logit foreign rep78 price
-
- *----------------------------------------*
- *========= Describing variables ==========*
- *----------------------------------------*
-
- sysuse auto,clear
-
- tabstat price,by(foreign) stat(mean sd min max) //summary statistics by group
-
-
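- *summarize: describe continuous variables
- sysuse auto //load the auto dataset shipped with Stata
- summarize price
- summarize mpg
- summarize mpg,detail //show additional statistics such as percentiles, skewness, and kurtosis
- summarize,separator(8) //summarize all variables, with a separator line after every 8 variables
-
- *tabulate: describe categorical variables
- tabulate foreign
- tab1 rep78 foreign //one-way table for each listed variable
- tabulate rep78 foreign //two-way table of rep78 by foreign
- tab2 rep78 foreign //two-way tables for every possible pair of the listed variables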
-
- *----------------------------------------*
- *========= Loops ==========*
- *----------------------------------------*
- *----------forvalues loops
- *Loop with a conditional (if/else)
- forvalues x=1/9 {
- if mod(`x',2) {
- display "`x' is odd"
- }
- else {
- display "`x' is even"
- }
- }
- *Loop with if and continue: skip the rest of the current iteration
- forvalues x=1/9 {
- if mod(`x',2) {
- display "`x' is odd"
- continue
- }
- display "`x' is even"
- }
- *Loop that ends early (if with continue, break)
- forvalues x=1/9 {
- if mod(`x',2)==0 {
- display "The first even number is `x'"
- continue,break
- }
- }
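- *-----foreach loops: both "in" and "of" work, but "of" is more efficient and is recommended
- *Loop over a variable list (names may be abbreviated to their first letters; t* matches variables starting with t)
- foreach var of varlist pri-rep t* {
- quietly summarize `var'
- summarize `var' if `var' > r(mean)
- }
- *Loop over new variable names (the list names variables to be created)
- foreach var of newlist z1-z4 {
- generate `var' = runiform()
- }
- *Loop over numbers (the list is a numlist)
- foreach num of numlist 1/4 8 103 {
- display `num'
- }
- *Note: the three foreach forms above are less robust; the macro-based forms below are recommended
- *foreach of local: the list is stored in a local macro
- local grains "rice wheat flax"
- foreach x of local grains {
- display "`x'"
- }
- *foreach of global: the list is stored in a global macro
- global money "dollar lira pound"
- foreach y of global money {
- display "`y'"
- }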
-
-
- *-------while loops------------*
- *continue, break exits the loop that contains it
- local i = 1
- while `i' < 4 {
- if mod(`i',2)==0 {
- display "The first even number is `i'"
- continue,break
- }
- display "i is `i'"
- local i = `i' + 1
- }
- *exit leaves the current program or do-file, not just the loop
- local i = 1
- while `i' < 4 {
- if mod(`i',2)==0 {
- display "The first even number is `i'"
- exit
- }
- display "i is `i'"
- local i = `i' + 1
- }
- *In nested loops, continue, break leaves only the innermost loop that contains it; Stata has no bare break statement for loops
- local i = 1
- while `i' < 4 {
- if mod(`i',2)==0 {
- display "The first even number is `i'"
- continue,break
- }
- display "i is `i'"
- local i = `i' + 1
- }
- *If the counter simply increments or decrements, the update can go in the loop condition:
- local i = 0
- while (`i++') < 4 {
- if mod(`i',2) == 0 {
- display "The first even number is `i'"
- continue,break
- }
- display "i is `i'"
- }
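
How continue, break behaves inside nested loops can be checked with a small sketch. The loop bounds here are arbitrary illustrative values.

```stata
forvalues i = 1/3 {
    forvalues j = 1/3 {
        if `j' == 2 {
            continue, break   // leaves the inner j loop only
        }
        display "i = `i', j = `j'"
    }
    * execution resumes here, and the outer i loop carries on
}
```

Each value of i still gets its own pass, which shows that only the inner loop was exited.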
- *----------------------------------------*
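- *========= Building tables ==========*
- *----------------------------------------*
- *-------------tabulate---------------*
- *tabulate: two-way frequency tables, usually abbreviated tab
- *Options: column = relative frequency within each column; row = relative frequency within each row; cell = relative frequency of each cell
- *expected = expected frequency of each cell; nofreq = suppress frequencies; nolabel = show numeric values instead of value labels
- webuse citytemp2
- tabulate region agecat,row column expected chi2
- *------------- table------------------*
- *table: tables of summary statistics
- *Note: contents(clist) sets the statistics shown, up to 5 of them, and each statistic must be followed by a variable name, e.g. c(mean varname)
- *Note: contents() is the syntax of Stata 16 and earlier; on Stata 17 or later, prefix these commands with "version 16:"
- *One-way table
- webuse auto2
- table rep78,c(n mpg mean mpg sd mpg median mpg) format(%9.2f) //by rep78: count, mean, standard deviation, and median of mpg, shown with two decimal places
- *Two-way table with centered cells and row and column totals
- table rep78 foreign,c(mean mpg) format(%9.2f) center row col
- *Three-way table
- *sc (scolumn) adds supercolumn totals: in the example below, besides the other and white supercolumns, a total supercolumn is added
- *[fw=pop] applies frequency weights
- webuse byssin
- table workplace smokes race [fw=pop],c(mean prob) format(%9.3f) sc col row
- *Higher-dimensional tables, using the by() option
- webuse byssin1
- table workplace smokes race [fw=pop],by(sex) c(mean prob) format(%9.3f) sc col row
- *-------------tabstat--------------------*
- *tabstat: compact tables of summary statistics
- *statistics(statname) sets the statistics shown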
- tabstat price weight mpg rep78 ,by(foreign) stat(mean sd min max) long format
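
The table examples above use the contents() syntax of Stata 16 and earlier. As a hedged sketch for readers on Stata 17 or later, the one-way table can be written with statistic() options, or the older syntax can be run under version control. It uses sysuse auto rather than webuse auto2 so that it needs no internet access.

```stata
sysuse auto, clear

* Stata 17+ syntax: one statistic() option per statistic
table rep78, statistic(count mpg) statistic(mean mpg) ///
    statistic(sd mpg) statistic(median mpg)            ///
    nformat(%9.2f mean sd median)

* On Stata 17+, the older syntax still runs under version control
version 16: table rep78, contents(n mpg mean mpg sd mpg median mpg) format(%9.2f)
```

The second command does not run on Stata 16 or earlier prefixed this way only if the installed version is older than 16; on those releases, drop the prefix.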
- *----------------------------------------------*
- *==== Creating datasets and variables of summary statistics =====*
- *---------------------------------------------*
- *-------------collapse--------------------*
- *collapse: replace the data in memory with a dataset of summary statistics
- *Syntax:
- *collapse (stat) varlist
- *collapse (stat) target_var=varname
- *(stat) names the statistic computed for the variables that follow it
- webuse college,clear
- list
- collapse (mean) gpa hour (median) medgpa=gpa medhour=hour [fw=number],by(year)
- list
- *Note: gpa and hour now hold the means of the original gpa and hour
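
Because collapse replaces the data in memory, it can help to wrap it in preserve and restore, or to use egen when the group statistic should sit alongside the original observations. A minimal sketch, assuming the auto dataset; mean_price and med_price are illustrative variable names.

```stata
sysuse auto, clear

* Option 1: collapse inside preserve/restore, so the original data come back
preserve
collapse (mean) mean_price=price (median) med_price=price, by(foreign)
list
restore

* Option 2: keep every observation and add the group mean as a new variable
bysort foreign: egen mean_price = mean(price)
list make foreign price mean_price in 1/5
```

The first form suits exporting a summary table; the second suits later modeling that needs the statistic on each row.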
- *-------------contract--------------------*
- *contract: create a dataset of frequencies and percentages
- *Reduces the raw data to one row per combination of values, with its frequency
- webuse auto2,clear
- list rep78 foreign
- contract rep78 foreign
- list
- expand _freq //expand back to one row per original observation
- *-------------statsby--------------------*
- *statsby: collect statistics by group
- webuse auto2,clear
- statsby,by(foreign):regress mpg gear turn
- list
- *Keep only the coefficient on gear (gear_ratio)
- webuse auto2,clear
- statsby gear=_b[gear],by(foreign):regress mpg gear turn
- list
- *----------------------------------------------*
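- *==== Exporting regression tables to Excel =====*
- *---------------------------------------------*
- *Method 1: more complete; also exports the F statistic and other summary results
- *Note: matrix(a,names) and the using clause below are Stata 13 syntax; Stata 14 and later expect "putexcel A8 = matrix(a), names" after putexcel set
- sysuse auto,clear
- regress price turn gear
- putexcel set "C:\results.xlsx",sheet("regress results")
- putexcel F1 = ("Number of obs") G1 = (e(N))
- putexcel F2 = ("F") G2 = (e(F))
- putexcel F3 = ("Prob > F") G3 = (Ftail(e(df_m),e(df_r),e(F)))
- putexcel F4 = ("R-squared") G4 = (e(r2))
- putexcel F5 = ("Adj R-squared") G5 = (e(r2_a))
- putexcel F6 = ("Root MSE") G6 = (e(rmse))
- matrix a = r(table)'
- matrix a = a[.,1..6] //choose the columns to export; here only the first six columns of the regression table
- putexcel A8=matrix(a,names)
- *Method 2: exports only the coefficient table; simpler, for when only the regression results are needed
- putexcel B3 = matrix(r(table)',names) using "C:\results1.xlsx"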
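
For Stata 14 or later, the same export might look like the sketch below, with one cell per putexcel call. The file name results.xlsx is hypothetical and is written to the working directory; the cell layout follows the example above.

```stata
sysuse auto, clear
regress price turn gear_ratio

* results.xlsx is a hypothetical file name
putexcel set "results.xlsx", sheet("regress results") replace

putexcel F1 = "Number of obs"
putexcel G1 = e(N)
putexcel F2 = "F"
putexcel G2 = e(F)
putexcel F3 = "Prob > F"
putexcel G3 = Ftail(e(df_m), e(df_r), e(F))

* Coefficient table: transpose r(table) and keep its first six columns
matrix a = r(table)'
matrix a = a[., 1..6]
putexcel A8 = matrix(a), names
```

Here names is an option after the comma rather than an argument inside matrix().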
- *----------------------------------------------*
- *==== reshape =====*
- *---------------------------------------------*
- webuse reshape3,clear
- reshape long inc@r ue,i(id) j(year)
- *----------------------------------------------*
- *==== Recoding variables =====*
- *---------------------------------------------*
- *recode x2 (1 2 = 1)(3 = 2)(4/9 = 3) //overwrites the original variable
- recode x2 (1 2 = 1)(3 = 2)(4/9 = 3),prefix(rec) //store the result in a new variable named with the prefix rec plus the original name
- recode x2 (1 2 = 1)(3 = 2)(4/9 = 3)(nonmissing = 9),prefix(rec) //nonmissing catches all other nonmissing values
- recode x2 (1 2 = 1 Below)(3 = 2 Average)(4/9 = 3 Above),prefix(rec2) label(reclab) //attach value labels to the new variable
- recode x2 (1 2 = 1)(3 = 2)(4/9 = 3)(10/max = 4),prefix(rec3) //a rule needs a target value; the original omits it, so 4 is an assumed placeholder
- encode gender,gen(sex) //string to labeled numeric; codes start at 1 in alphabetical order, e.g. female = 1, male = 2
- decode sex,gen(gender1) //labeled numeric back to string
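
The codes that encode assigns can be inspected directly. The sketch below builds a three-row toy dataset (the values are made up for illustration) and lists the numeric codes next to the strings.

```stata
clear
input str6 gender
"male"
"female"
"male"
end

encode gender, generate(sex)
list gender sex, nolabel   // shows the underlying numeric codes
label list sex             // shows the code-to-label mapping encode created
```

If a 0/1 indicator is needed, it has to be created explicitly, for example with generate and a logical expression.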
- *----------------------------------------------*
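- *==== Recoding missing values =====*
- *---------------------------------------------*
- *The default system missing value is "."
- *Value labels do not change automatically when values are recoded; adjust them by hand where needed
- *mvdecode: change numeric codes to missing values
- mvdecode rep78 ,mv(998=.\999=.a) //rep78 = 998 becomes system missing "."; rep78 = 999 becomes extended missing ".a"
- mvdecode _all,mv(998=.\999=.a) //apply the same rule to all variables
- *mvencode: change missing values to numeric codes
- mvencode rep78 if foreign == 0,mv(998) //where foreign = 0, recode missing rep78 to 998
- mvencode rep78 if foreign == 1,mv(999) //where foreign = 1, recode missing rep78 to 999
- mvencode _all,mv(.=999\.a=998\.b=997\else=996)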
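
A round trip between a numeric sentinel and an extended missing value can be sketched as follows. The auto dataset has no 999 codes, so the sketch first creates them; that first step exists only to set up the illustration.

```stata
sysuse auto, clear

* Set-up for illustration only: turn existing missing values into a 999 sentinel
replace rep78 = 999 if missing(rep78)

mvdecode rep78, mv(999 = .a)   // 999 becomes the extended missing value .a
count if rep78 == .a

mvencode rep78, mv(.a = 999)   // and back again
count if rep78 == 999
```

Both counts should agree, since the second command reverses the first.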
- *----------------------------------------------*
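- *==== Converting between string and numeric variables =====*
- *---------------------------------------------*
- *String to numeric: destring and real()
- destring foreign , gen(foreignd)
- gen foreignd = real(foreign) //alternative to the line above; use one or the other
- destring foreign,replace
- destring foreign,replace force //force: values that cannot be converted become missing
- destring foreign,gen(foreignd) ignore("x") //strip the character x before converting; the remaining digits are kept
- destring foreign,gen(foreignd) ignore(" ") //strip spaces
- destring trunk weight length turn,gen(trunkd weightd lengthd turnd) ignore("$,%") percent //strip several characters at once; percentages are stored as proportions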
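
The effect of ignore() and percent is easier to see on a tiny made-up dataset. The variable names and values below are illustrative only.

```stata
clear
input str8 pricestr str6 sharestr
"1,200" "12.5%"
"950"   "7%"
end

* Remove the thousands separator, then convert
destring pricestr, generate(price) ignore(",")

* percent strips the % sign and divides by 100
destring sharestr, generate(share) percent

list
```

Note that ignore() removes the listed characters from each value; it does not set those values to missing.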
- *----------------------------------------------*
- *==== Appending datasets (vertical) =====*
- *---------------------------------------------*
- sysuse auto,clear
- keep if foreign == 0
- keep make price mpg rep78 headroom foreign
- save domestic
- sysuse auto,clear
- keep if foreign == 1
- save foreign
- use domestic,clear
- append using foreign,gen(_append) //_append = 0 for observations from the master dataset, 1 for those from the first using dataset
- append using foreign,gen(_append) keep(make price mpg rep78 headroom foreign) //alternative: append only the listed variables
- *----------------------------------------------*
- *==== Merging datasets (horizontal) =====*
- *---------------------------------------------*
- *1:1
- webuse autosize
- merge 1:1 make using http://www.stata-press.com/data/r13/autoexpense
- merge 1:1 make using http://www.stata-press.com/data/r13/autoexpense,keep(match) //keep only the matched observations
- *1:m
- webuse overlap2,clear
- merge 1:m id using http://www.stata-press.com/data/r13/overlap1 //if both datasets contain the same variable, the master dataset's values are kept
- merge 1:m id using http://www.stata-press.com/data/r13/overlap1,update
- merge 1:m id using http://www.stata-press.com/data/r13/overlap1,update replace
- *m:1
- webuse overlap1,clear
- merge m:1 id using http://www.stata-press.com/data/r13/overlap2
- merge m:1 id using http://www.stata-press.com/data/r13/overlap2,update
- merge m:1 id using http://www.stata-press.com/data/r13/overlap2,update replace
- *1:1 sequential merge: no key variable; observations are matched by observation number
- webuse sforce,clear
- merge 1:1 _n using http://www.stata-press.com/data/r13/dollars
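- *Note: when the two datasets share variable names in a 1:m or m:1 merge:
- *by default, the master dataset's values are kept in the merged data
- *with update: for matched observations, missing values in the master are filled from the using dataset
- *with update replace: for matched observations, master values are replaced by the using dataset's nonmissing values
- *Before merging, cf can compare the variables of the master and using datasets; see help cf
- cf _all using http://www.stata-press.com/data/r13/autoexpense,all
- isid id //check whether id uniquely identifies the observations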
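
A typical way to combine these checks with a 1:1 merge is sketched below, using the same example datasets as above. It needs internet access to reach the Stata Press files.

```stata
webuse autosize, clear
isid make                  // stops with an error if make is not a unique key

merge 1:1 make using http://www.stata-press.com/data/r13/autoexpense
tabulate _merge            // 1 = master only, 2 = using only, 3 = matched

keep if _merge == 3        // same result as the keep(match) option
drop _merge
```

Dropping _merge at the end matters because a later merge would otherwise fail when it tries to create the variable again.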
- *----------------------------------------------*
- *==== Cross-joining datasets within groups =====*
- *---------------------------------------------*
- *joinby: forms all pairwise combinations of observations within each group, which gives a true m:m join; see help joinby
- use "D:\黄静\child.dta",clear
- describe
- list
- webuse parent,clear
- save "D:\黄静\parent.dta"
- use "D:\黄静\parent.dta",clear
- describe
- list
- sort family_id
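- joinby family_id using child //keep only the matched observations
- joinby family_id using child,unmatched(both) //also keep unmatched observations from both the master and using datasets
- joinby family_id using child,unmatched(master) //also keep unmatched observations from the master dataset
- joinby family_id using child,unmatched(using) //also keep unmatched observations from the using dataset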
- describe
- list
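
Since the child and parent files above live on a local drive, a self-contained sketch may be easier to run. The two toy datasets below are made up for illustration, and a tempfile stands in for the saved child dataset.

```stata
* Using dataset: children, saved to a temporary file
clear
input family_id str6 child
1 "Ann"
1 "Bob"
2 "Cid"
end
tempfile kids
save `kids'

* Master dataset: parents
clear
input family_id str6 parent
1 "Dee"
1 "Eli"
3 "Fay"
end

* Within family 1, every parent is paired with every child
joinby family_id using `kids', unmatched(both)
sort family_id
list, sepby(family_id)
```

With unmatched(both), families that appear in only one dataset are kept, and the _merge variable records where each row came from.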