
[Essentials] Python and STATA Date/Time Conversion Problems

Source: CSDN | 2023-04-06 10:00:01

Common Problems and Solutions

Contents:

1. Working with time variables
2. Rebuilding and setting indexes
3. Sorting data
4. Data type conversion
5. Exception handling: skipping exceptions and continuing
6. Batch conversion between CSV and Excel
7. File operations
8. Reading and writing data
9. Missing values: np.nan, None, NaT
10. Producing regression tables with Python
11. Reshaping specially structured data
12. Joint tests of estimated coefficients (Wald tests via statsmodels)
13. Python and STATA date/time conversion
14. A Hausman test for panel data in Python
15. Fama-MacBeth regressions, rolling regressions, and more
16. Event studies



Python, one of today's mainstream programming languages, has enthusiasts all over the world. If you are new to research, your advisor may well ask you to use it for some task. Getting started is easy enough: just pick one of the many textbooks on the market. Using it flexibly is another matter entirely, and the flexibility of Python often drives humanities students to despair. The notes below collect the points my students most often ask about when I teach this material. They are not exhaustive and are still being updated.

1. Working with time variables

The main time-related modules are time, datetime, and calendar. Python represents time in three main ways:

- Timestamp: a 10-digit integer part plus a fractional part, e.g. 1551153156.6358607
- Tuple (struct_time): a tuple of 9 elements, e.g. (tm_year=2011, tm_mon=9, tm_mday=28, tm_hour=10, tm_min=0, tm_sec=0, tm_wday=2, tm_yday=271, tm_isdst=-1)
- Formatted string: e.g. '2019-02-26 12:45:46'

The time module converts between timestamps and formatted strings with the tuple (struct_time) as the pivot; the datetime module does the same with datetime instances as the pivot.

```python
import time

# 1. timestamp -> time tuple
time1 = time.time()              # the timestamp
tuple = time.gmtime(time1)       # UTC time tuple
print(tuple)
# time.struct_time(tm_year=2021, tm_mon=11, tm_mday=3, tm_hour=16, tm_min=9, tm_sec=20, tm_wday=2, tm_yday=307, tm_isdst=0)
tuple1 = time.localtime(time1)   # UTC+8 time tuple
print(tuple1)
# time.struct_time(tm_year=2021, tm_mon=11, tm_mday=4, tm_hour=0, tm_min=9, tm_sec=20, tm_wday=3, tm_yday=308, tm_isdst=0)

# 2. time tuple -> timestamp
tuple2 = time.localtime()
time2 = time.mktime(tuple2)
print(time2)
# 1635955760.0

# 3. time tuple -> string
tuple = time.localtime()
strTime = time.strftime("%Y-%m-%d %H:%M:%S", tuple)
print(strTime)
# 2021-11-04 00:09:20
strTime1 = time.strftime("%Y{y}%m{m}%d{d} %H{h}%M{m1}%S{s}", tuple).format(
    y="年", m="月", d="日", h="時", m1="分", s="秒")
print(strTime1)
# 2021年11月04日 00時11分49秒

# 4. string -> time tuple
tupleTime = time.strptime("2018年01月05日", "%Y{y}%m{m}%d{d}".format(y="年", m="月", d="日"))
print(tupleTime)
# time.struct_time(tm_year=2018, tm_mon=1, tm_mday=5, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=4, tm_yday=5, tm_isdst=-1)
```

References: time tuples (time.struct_time) and time formatting; Python timestamps; time tuples and dates in Python; Python 3 dates and times; Python date and time in detail.

In practice, data usually live in a DataFrame, and the date variable may exist either as the index or as an ordinary column, in forms such as 2021-11-10, 20211110, 2021/11/10, or 2021-11-10 00:00:00.

First, in the IPython console enter df.dtypes (assuming the data object is called df) or df.info() to see the count and the dtype of each variable.

If date is a string it shows as object; if numeric, as int64 or float64; if a datetime, as datetime64[ns].
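A minimal sketch of what those three dtypes look like (the values are invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "date_str": ["20211110", "20211111"],                      # string -> object
    "date_int": [20211110, 20211111],                          # numeric -> int64
    "date_dt": pd.to_datetime(["2021-11-10", "2021-11-11"]),   # datetime64[ns]
})
print(df.dtypes)
```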

a. If the date variable is the index (typical for data pulled through an API, in which case it is usually already datetime64[ns]), extract year, month, and day like this:

```python
df['year'] = df.index.year
df['month'] = df.index.month
df['day'] = df.index.day
```

b. If date is a numeric variable (e.g. 20101203, shown as int64 or float64):

```python
df["year"] = df["date"].apply(lambda x: int(str(x)[:4]))   # year, numeric
df["year2"] = df["date"].apply(lambda x: str(x)[:4])       # year2, string
df["month"] = df["date"].apply(lambda x: int(str(x)[4:6]))
df["day"] = df["date"].apply(lambda x: int(str(x)[6:]))
```

c. If date is a string variable (e.g. "20101203", shown as object). Method 1:

```python
import pandas as pd
from datetime import datetime

df["date2"] = pd.to_datetime(df["date"])          # string -> datetime64[ns]
df["year"] = pd.to_datetime(df["date"]).dt.year   # year, numeric
df["month"] = pd.to_datetime(df["date"]).dt.month
df["day"] = pd.to_datetime(df["date"]).dt.day
```

Method 2:

```python
df["year"] = df["date"].apply(lambda x: int(x[:4]))   # year, numeric
df["month"] = df["date"].apply(lambda x: int(x[4:6]))
df["day"] = df["date"].apply(lambda x: int(x[6:]))
```

Method 3: use astype

```python
df['date'] = df['date'].astype("datetime64[ns]")
```

Method 4: use datetime.strptime (for strings such as '2010-12-03'):

```python
df['date'] = df['date'].apply(lambda x: datetime.strptime(x, '%Y-%m-%d'))
```

To convert dates to monthly (M) or quarterly (Q) periods:

```python
df['period'] = df['date'].dt.to_period('M')
```

d. If date is already a datetime, shown as datetime64[ns]:

```python
df["year"] = pd.to_datetime(df["date"]).dt.year   # year, numeric
df["month"] = pd.to_datetime(df["date"]).dt.month
df["day"] = pd.to_datetime(df["date"]).dt.day
```

e. If date is an Excel date that Python read in as a number (a 5-digit serial value), convert the serial number back to a date.

If instead the value is a long Unix timestamp:

```python
df['gtime'] = pd.to_datetime(df['gtime'], unit='s')   # the epoch defaults to 1970-01-01
```

or:

```python
# interpret time_stamp as a count of seconds (unit='s') starting from 2018-07-01
df['time_stamp'] = pd.to_datetime(df['time_stamp'], unit='s', origin=pd.Timestamp('2018-07-01'))
```

The example below walks through Excel time data read into pandas (the dates arrive as what look like 5-digit floats, which is rather confusing the first time you see it):

```python
# The column fin46 was a date in Excel but arrives in Python as a number,
# with many missing values. The conversion function below cannot handle NaN
# directly, so fill NaN with 0 first.
df_final["fin462"] = df_final["fin46"].fillna(0)

from datetime import datetime

# Excel serial day -> datetime (Excel's day 0 is 1899-12-30)
def date_change(stamp):
    delta = pd.Timedelta(str(stamp) + "D")
    real_time = pd.to_datetime("1899-12-30") + delta
    return real_time

# apply the function; values come out like 2021-11-16 00:00:00
df_final["fin463"] = df_final["fin462"].apply(date_change)

# convert the date to monthly and quarterly periods
# (do this while fin463 is still datetime64, before stripping the time part)
df_final["period"] = df_final["fin463"].dt.to_period("M")
df_final["period2"] = df_final["fin463"].dt.to_period("Q")

# drop the hour-minute-second part
df_final["fin463"] = df_final["fin463"].dt.date
# or, for a single value:
df_final["fin4633"] = df_final["fin463"][0].strftime("%Y-%m-%d")

# set the dates that were originally missing back to null
df_final.loc[df_final["fin462"] == 0, "fin463"] = None

# string -> datetime64[ns]
df_final["date2"] = pd.to_datetime(df_final["fin463"])

# drop the intermediate columns, no longer needed
del df_final["fin462"]
del df_final["fin463"]

# generate three columns: year, month, day
df_final["year"] = pd.to_datetime(df_final["date2"]).dt.year   # numeric
df_final["month"] = pd.to_datetime(df_final["date2"]).dt.month
df_final["day"] = pd.to_datetime(df_final["date2"]).dt.day
```

References:
- https://blog.csdn.net/weixin_41261833/article/details/104839119
- https://vimsky.com/examples/usage/python-pandas-period-strftime.html
- https://vimsky.com/zh-tw/examples/usage/python-pandas-datetimeindex-strftime.html
- https://www.cnblogs.com/shadow1/p/10951979.html
- https://www.cnblogs.com/nxf-rabbit75/p/10660317.html
- https://blog.csdn.net/u010591976/article/details/104253489?ivk_sa=1024320u

From one of the referenced posts, converting a single Excel serial number held as a string:

```python
import pandas as pd

a = "44042"

def date(stamp):
    delta = pd.Timedelta(str(stamp) + "D")
    real_time = pd.to_datetime("1899-12-30") + delta
    return real_time

print(str(date(a)).split(" ")[0])   # keep only the date part
```

Original post: https://blog.csdn.net/qq_57173722/article/details/121030038

```python
io = "中國數(shù)據(jù).xlsx"
dt = pd.read_excel(io, sheet_name=0)
dt1 = pd.read_excel(io, sheet_name=1)

def date(para):
    if type(para) == int:
        delta = pd.Timedelta(str(int(para)) + "days")
        time = pd.to_datetime("1899-12-30") + delta
        return time
    else:
        return para

dt["日期"] = dt["日期"].apply(date)
dt
```

The same display problem comes up whenever pandas reads an Excel file and the dates turn into numbers; a pandas-only fix:

```python
import pandas as pd

data = pd.read_excel("文件路徑")
data["發(fā)貨日期"] = data["發(fā)貨日期"].fillna(method="ffill")   # forward-fill because of merged cells

def date(para):
    delta = pd.Timedelta(str(int(para)) + "days")
    time = pd.to_datetime("1899-12-30") + delta
    return time

data["發(fā)貨日期"] = data["發(fā)貨日期"].apply(date)
data
```

Computing the interval in days between two dates:

```python
import time

def demo(day1, day2):
    time_array1 = time.strptime(day1, "%Y-%m-%d")
    timestamp_day1 = int(time.mktime(time_array1))
    time_array2 = time.strptime(day2, "%Y-%m-%d")
    timestamp_day2 = int(time.mktime(time_array2))
    result = (timestamp_day2 - timestamp_day1) // 60 // 60 // 24
    return result

day1 = "2018-07-09"
day2 = "2020-09-26"
day_diff = demo(day1, day2)
print("Days between the two dates: {}".format(day_diff))

# Or, counting days from year 1 by hand:
def is_leap_year(year):
    if (year % 4 == 0 and year % 100 != 0) or year % 400 == 0:
        return 1
    else:
        return 0

def get_days(year, month, day):
    days = 0
    month_days = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    if is_leap_year(year):
        month_days[1] = 29
    for i in range(1, year):
        year_day = 365
        if is_leap_year(i):
            year_day = 366
        days += year_day
    for m in range(month - 1):
        days += month_days[m]
    days += day
    return days

def get_result(start_time, end_time):
    res = end_time - start_time
    return res

year1, month1, day1 = 2018, 7, 9
year2, month2, day2 = 2020, 9, 26
days1 = get_days(year1, month1, day1)
days2 = get_days(year2, month2, day2)
day_diff = get_result(days1, days2)
print("Days between the two dates: {}".format(day_diff))
```

2. Rebuilding and setting indexes

Keyword: reset_index

reset_index restores the default integer index:

```python
df.reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='')
```

The most common form is df.reset_index(drop=False, inplace=True).

a. If the DataFrame has date both as the index and as an ordinary column, rebuild the index with df.reset_index(drop=True, inplace=True); otherwise pandas cannot insert a second date column. The old index is discarded, the existing date column remains an ordinary variable, and you are then free to set a new index.

b. If the index name does not clash with any column: df.reset_index(inplace=True) turns the index into an ordinary column, while df.reset_index(drop=True, inplace=True) deletes the old index outright.

When rebuilding an index it is usually best not to drop the original right away; you can always delete it later with df.drop("date", axis=1, inplace=True) or del df["date"].

Keyword: set_index

A DataFrame's set_index method sets a single or composite index:

```python
DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)
```

The most common form is df.set_index("code", drop=False, inplace=True).
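A minimal round-trip sketch of the two keywords on toy data, which also shows why case (a) needs drop=True:

```python
import pandas as pd

df = pd.DataFrame({"code": ["000001", "000002"], "price": [10.5, 20.3]})
df.set_index("code", drop=False, inplace=True)   # "code" is now both the index and a column
# reset_index(drop=False) would fail here ("cannot insert code, already exists"),
# so discard the index copy; the ordinary "code" column survives
df.reset_index(drop=True, inplace=True)
print(df)
```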

3. Sorting data

Sort a pandas.DataFrame or pandas.Series with the sort_values() and sort_index() methods.

a. Sort by values with sort_values() (pass inplace=True to change the original object):

```python
df_s = df.sort_values('state')                                     # one key
df_s = df.sort_values('state', ascending=False)                    # one key, descending
df_s = df.sort_values(['age', 'state'], ascending=[True, False])   # several keys, each direction set separately
df_d.sort_values(by=1, axis=1, ascending=False, inplace=True)      # sort columns, keyed on the row labeled 1
```

b. Sort by index (row or column labels) with sort_index()

As with sort_values(), the default is ascending; set ascending=False for descending: df_s = df.sort_index(ascending=False)

Sorting by column names (axis=1) is rarely useful: df.sort_index(axis=1, ascending=False, inplace=True)

c. Selecting from a MultiIndex

loc indexes by label and iloc by position; iloc is unaffected by the level structure and selects purely by position in the data.

```python
s.loc['張三']
s.loc['張三', '期中']
s.loc[:, '期中']
s.iloc[0]
df.loc['張三'].loc['期中']
df.loc[('張三', '期中')]
```

Sorting a MultiIndex: sort rows with sort_index(); its level argument picks which level to sort by (0 for the first level, 1 for the second): df.sort_index(level=0, ascending=False).
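A self-contained sketch of the selections and the level sort above (the names and scores are invented):

```python
import pandas as pd

idx = pd.MultiIndex.from_product([["張三", "李四"], ["期中", "期末"]])
s = pd.Series([90, 85, 78, 88], index=idx)

print(s.loc["張三"])             # all rows for 張三
print(s.loc["張三", "期中"])     # one cell, addressed by both levels
print(s.loc[:, "期中"])          # the 期中 score for everyone
print(s.iloc[0])                 # first row, purely by position
print(s.sort_index(level=0, ascending=False))   # sort by the first level, descending
```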

4. Data type conversion

In real data work the common containers are numpy arrays, lists, tuples, dicts, pandas Series, and pandas DataFrames. Borrowed code often fails simply because it expects a different input type from the one you feed it: a different object has different attributes and methods, so of course the program cannot run. Understand the code first, then convert your own data into the type it requires.

a. DataFrame to list

```python
import numpy as np
import pandas as pd

data_file = 'd:/Users/qiguan/Downloads/data.xlsx'   # path to the data file
data = pd.read_excel(data_file, index_col=None)     # read the file
data_array = np.array(data)                         # DataFrame -> numpy array first
data_list = data_array.tolist()                     # then array -> list
```

b. DataFrame to Series, and changing the dtype of a column

b1. Build a Series from a DataFrame with pd.Series:

```python
import pandas as pd
df_series = pd.Series(df['Value'].values, index=df['Date'])
```

b2. Change a column's dtype with astype (note the np prefix):

```python
import numpy as np
df['列名'] = df['列名'].astype(np.int64)
```

c. DataFrame to numpy array

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# c1. via the values attribute
np1 = df.values      # type(np1) is numpy.ndarray

# c2. via np.array
np2 = np.array(df)   # type(np2) is numpy.ndarray; both give the same result
```

5. Exception handling: skipping exceptions and continuing

Python handles exceptions with try...except..., which keeps a program running when something unexpected happens. The drawback is that some data are silently lost along with the chance to correct them; to fix the underlying problem you need the program to report what the error actually was.

Often a single exception stops the whole run, and there are more failure modes than you can anticipate. So how do you skip an exception raised inside a loop and keep going? For example, when moving a file that may not exist, the program should skip the error and continue. One workable pattern:

```python
import pandas as pd

dates = range(20161010, 20161114)
pieces = []
for date in dates:
    try:
        data = pd.read_csv("A_stock/overview-push-%d/stock overview.csv" % date,
                           encoding="gbk")
        pieces.append(data)
    except Exception as e:
        pass   # an error occurred, but skip this iteration and move on
    continue   # runs whether or not an exception occurred; with nothing after it,
               # pass and continue are interchangeable here, but they differ once
               # statements follow the continue
data = pd.concat(pieces)
```

A more complete version:

```python
try:
    # x = 1/0    # this one would be caught by `except Exception` below
    print("========>1")
    name           # raises NameError
    print("==========>2")
    l = [1, 2, 3]
    # l[100]       # would raise IndexError
    print("==========>3")
    d = {}
    d["name"]      # would raise KeyError
    print("=============>4")
except NameError as e:
    print("----------->", e)
    print(1)
except IndexError as e:
    print("----------->", e)
    print(2)
except KeyError as e:
    print("----------->", e)
    print(3)
except Exception as e:
    # an exception of some other type still gets handled here
    print("generic handler", 333)
else:
    print("runs only when the monitored block raises no exception")
finally:
    print("runs whether or not an exception occurred")
print("execution continues from here")
```

Combining try/except with pass and continue:

(1) To skip a failing iteration of a for loop: when an iteration errors, the except block runs and continue moves on to the next iteration:

```python
x = 5
for i in range(x):
    try:
        i += 1
        print(i)
    except:
        pass   # could equally be continue here
    print("kk:", i)
```

(2) It can also be written as below: on error the except block runs and pass ignores it; the only difference is that execution always reaches the continue line, error or not:

```python
x = 5
for i in range(x):
    try:
        i += 1
        print(i)
    except:
        pass
    continue
```

Once statements follow the continue, the situation changes:

```python
x = 5
for i in range(x):
    try:
        i += 1
        print(i)
    except:
        pass
    continue
    print("kk:", i)   # never reached: the continue above always fires first
```

Compare:

```python
x = 5
for i in range(x):
    try:
        i += 1
        print(i)
    except:
        continue        # on error, skip the print below
    print("kk:", i)
```

(3) Another use is to abort the whole loop at the first error by putting the loop inside the try:

```python
try:
    for i in range(x):
        i += 1
        print(i)
except:
    pass
```

pass and continue combined with conditionals:

```python
var = 10
while var > 0:
    var = var - 1
    if var == 5:
        pass
    print("current value:", var)
print("Good bye!")
```

```python
var = 10   # second example
while var > 0:
    var = var - 1
    if var == 5:
        continue   # skip the print when var == 5
    print("current value:", var)
print("Good bye!")
```

In short, continue still ends the current iteration and pass still does nothing; but when the two are combined with conditionals and loops, watch the block structure carefully, because small differences in placement change the behavior.

6. Batch conversion between CSV and Excel

In everyday work we constantly handle both CSV and Excel files, and the two usually cannot be swapped directly: CSV is a plain-text, comma-separated format that stores any text but no formatting, formulas, or macros, so conversion normally means manually saving each file in the other format. That is fine for a handful of files, but with many files the repeated "Save As" becomes tedious, and edits are easily lost when a file is not re-saved in time.

6.1 Excel to CSV: (1) list the Excel files in the input directory; (2) read each file's contents; (3) write it out as CSV.

```python
import os
import pandas as pd

# convert every Excel file in in_path into a CSV file in out_path
def excel_to_csv(in_path, out_path):
    # all .xlsx / .xls files in the input directory
    file_list = [file for file in os.listdir(in_path)
                 if os.path.splitext(file)[1][1:] in ("xlsx", "xls")]
    for file in file_list:
        # read the file's data
        df = pd.read_excel(os.path.join(in_path, file), index_col=0)
        # write the CSV without the index
        df.to_csv(os.path.join(out_path, os.path.splitext(file)[0] + ".csv"), index=False)
    print("conversion finished")
```

6.2 CSV to Excel: (1) list the CSV files in the input directory; (2) read each file's contents; (3) write it out as Excel.

```python
# convert every CSV file in in_path into an Excel file in out_path
def csv_to_excel(in_path, out_path):
    # all .csv files in the input directory
    file_list = [file for file in os.listdir(in_path)
                 if os.path.splitext(file)[1][1:] == "csv"]
    for file in file_list:
        # read the CSV data
        df = pd.read_csv(os.path.join(in_path, file), index_col=0)
        # write the Excel file without the index
        df.to_excel(os.path.join(out_path, os.path.splitext(file)[0] + ".xlsx"), index=False)
    print("conversion finished")
# run it yourself and inspect the result
```
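A usage sketch; the directory names here are placeholders for your own paths:

```python
excel_to_csv("d:/data/excel_in", "d:/data/csv_out")
csv_to_excel("d:/data/csv_in", "d:/data/excel_out")
```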

7. File operations

References: several tutorials on Python file operations (reading, writing, and creating files).
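A minimal sketch of the basics those references cover (the file name is a placeholder):

```python
# create a file, append to it, then read it back line by line
with open("notes.txt", "w", encoding="utf-8") as f:
    f.write("first line\n")
with open("notes.txt", "a", encoding="utf-8") as f:
    f.write("second line\n")
with open("notes.txt", "r", encoding="utf-8") as f:
    for line in f:
        print(line.rstrip())
```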

8. Reading and writing data

References: a round-up of third-party libraries for reading and writing Excel files.
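For the everyday case pandas alone is enough; a minimal sketch (the file names are placeholders):

```python
import pandas as pd

df = pd.read_excel("input.xlsx", sheet_name=0)   # reading .xlsx requires the openpyxl package
df.to_excel("output.xlsx", index=False)
df.to_csv("output.csv", index=False)
```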

9. Missing values: np.nan, None, NaT

References: https://blog.csdn.net/sinat_26811377/article/details/103216680 https://blog.csdn.net/laicikankna/article/details/106881864
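A quick sketch of how the three missing-value markers behave in pandas:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, None])                   # None is coerced to NaN in a float Series
t = pd.to_datetime(pd.Series(["2021-11-10", None]))  # missing datetimes become NaT
print(s.isna())           # True for both missing entries
print(t)                  # the second value prints as NaT
print(np.nan == np.nan)   # False: NaN never equals itself, so test with isna(), not ==
```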

10. Producing regression tables with Python

- https://github.com/divinites/stargazer-1 (uses the Stargazer module)
- https://lost-stats.github.io/Presentation/Tables/Regression_Tables.html
- https://qiita.com/huda_1629/items/d47d023936003ccf3fd1
- https://cloud.tencent.com/developer/news/273282 (tidy regression output with Python; unfortunately the code shared via Baidu net-disk is gone)
- https://cloud.tencent.com/developer/article/1739458 (worth a look)
- https://github.com/YangShuangjie/summary3 (used to run, but its dependencies have since been upgraded and it no longer works; an update from some capable soul would be welcome)

```python
from stargazer.stargazer import Stargazer
model1_stargazer = Stargazer([model1])
model1_stargazer
```

```python
import os
os.chdir("d:\\")   # directory for the HTML output

import pandas as pd
from sklearn import datasets
import statsmodels.api as sm
from stargazer.stargazer import Stargazer
from IPython.core.display import HTML

diabetes = datasets.load_diabetes()
df = pd.DataFrame(diabetes.data)
df.columns = ["Age", "Sex", "BMI", "ABP", "S1", "S2", "S3", "S4", "S5", "S6"]
df["target"] = diabetes.target

est = sm.OLS(endog=df["target"], exog=sm.add_constant(df[df.columns[0:4]])).fit()
est2 = sm.OLS(endog=df["target"], exog=sm.add_constant(df[df.columns[0:6]])).fit()
est3 = sm.OLS(endog=df["target"], exog=sm.add_constant(df[df.columns[[2, 4]]])).fit()
est4 = sm.OLS(endog=df["target"], exog=sm.add_constant(df[df.columns[3:6]])).fit()

stargazer = Stargazer([est, est2, est3, est4])
HTML(stargazer.render_html())   # display the table in the notebook
# stargazer.render_html()       # or show the raw HTML source

# alternatively, save the table and polish it in Excel:
stargazer_tab = Stargazer([est, est2, est3, est4])
stargazer_tab
open("regression.html", "w").write(stargazer_tab.render_html())  # for html
```

"

\n

Dependent variable:target

(1)(2)(3)(4)

ABP416.674***397.583***670.157***

(69.495)(70.870)(71.217)

Age37.24124.704

(64.117)(65.411)

BMI787.179***789.742***921.169*

(65.424)(66.887)(64.409)

S1197.852113.167202.038

(143.812)(64.409)(158.070)

S2-169.251-23.728

(142.744)(156.064)

Sex-106.578-82.862

(62.125)(64.851)

const152.133***152.133***152.133***152.133***

(2.853)(2.853)(2.967)(3.277)

Observations442442442442

R20.4000.4030.3490.207

Adjusted R20.3950.3950.3460.201

Residual Std. Error59.976 (df=437)59.982 (df=435)62.367 (df=439)68.901 (df=438)

F Statistic72.913***(df=4; 437)48.915***(df=6; 435)117.417***(df=2; 439)38.032(df=3; 438)

Note:\n p<0.1;\n p<0.05;\n ***p<0.01\n

The code above runs cleanly in a notebook; the small blemishes in the table as pasted here come from CSDN's rendering, not from the code.

https://github.com/YangShuangjie/summary3

```python
# Load the data and fit
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# ols
dat = sm.datasets.get_rdataset("Guerry", "HistData").data
res_ols = smf.ols("Lottery ~ Literacy + np.log(Pop1831)", data=dat).fit()

# glm
data = sm.datasets.scotland.load()
data.exog = sm.add_constant(data.exog)
gamma_model = sm.GLM(data.endog, data.exog, family=sm.families.Gamma())
res_glm = gamma_model.fit()

# gee
data = sm.datasets.get_rdataset("epil", package="MASS").data
fam = sm.families.Poisson()
ind = sm.cov_struct.Exchangeable()
mod = smf.gee("y ~ age + trt + base", "subject", data, cov_struct=ind, family=fam)
res_gee = mod.fit()

# logit
spector_data = sm.datasets.spector.load()
spector_data.exog = sm.add_constant(spector_data.exog)
logit_mod = sm.Logit(spector_data.endog, spector_data.exog)
res_logit = logit_mod.fit()

# load panel data and fit the models
from linearmodels.datasets import wage_panel
data = wage_panel.load()
year = pd.Categorical(data.year)
data = data.set_index(["nr", "year"])
data["year"] = year

from linearmodels.panel import PooledOLS
exog_vars = ["black", "hisp", "exper", "expersq", "married", "educ", "union", "year"]
exog = sm.add_constant(data[exog_vars])
mod = PooledOLS(data.lwage, exog)
res_pooled = mod.fit()

from linearmodels.panel import PanelOLS
exog_vars = ["expersq", "union", "married"]
exog = sm.add_constant(data[exog_vars])
mod = PanelOLS(data.lwage, exog, entity_effects=True, time_effects=True)
res_fe_re = mod.fit()

from linearmodels.panel import FirstDifferenceOLS
exog_vars = ["exper", "expersq", "union", "married"]
exog = data[exog_vars]
mod = FirstDifferenceOLS(data.lwage, exog)
res_fd = mod.fit()

exog_vars = ["black", "hisp", "exper", "expersq", "married", "educ", "union"]
exog = sm.add_constant(data[exog_vars])
mod = PooledOLS(data.lwage, exog)
res_robust = mod.fit(cov_type="robust")
res_clust_entity = mod.fit(cov_type="clustered", cluster_entity=True)
res_clust_entity_time = mod.fit(cov_type="clustered", cluster_entity=True, cluster_time=True)

# Looking at summary3's source, the author added to_excel and to_csv helpers.
# The package no longer runs with current dependencies.
from summary3 import summary_col
summary_col([res_ols])
print(summary_col([res_ols, res_glm, res_gee, res_logit], more_info=["df_model", "scale"]))
print(summary_col([res_fe_re, res_fd, res_robust, res_clust_entity, res_clust_entity_time],
                  regressor_order=["black"], show="se", title="Panel Results Summary Table"))
summary_col([res_glm, res_logit]).to_excel()
summary_col([res_clust_entity, res_fd]).to_csv("your path\\filename.csv")
```

```python
# -*- coding: utf-8 -*-
"""
Created on Sun Dec  5 22:10:44 2021
@author: Administrator
"""
import pandas as pd
import numpy as np
from statsmodels.datasets import grunfeld

data = grunfeld.load_pandas().data
data.year = data.year.astype(np.int64)
# MultiIndex, entity - time
data = data.set_index(["firm", "year"])

from linearmodels import PanelOLS
mod = PanelOLS(data.invest, data[["value", "capital"]])
fe_res_VS = mod.fit(cov_type="clustered", cluster_entity=True)

pd.set_option("precision", 4)
pd.options.display.float_format = "{:,.4f}".format

Reg_Output_FAmount = pd.DataFrame()

# 1) coefficients, p-values, and standard errors for the first specification
Table1 = pd.DataFrame(fe_res_VS.params)
Table1["id"] = np.arange(len(Table1))               # create a numerical index
Table1 = Table1.reset_index().set_index(keys="id")  # set it as the new index
Table1 = Table1.rename(columns={"index": "parameter", "parameter": "coefficient 1"})

P1 = pd.DataFrame(fe_res_VS.pvalues)
P1["id"] = np.arange(len(P1))
P1 = P1.reset_index().set_index(keys="id")
P1 = P1.rename(columns={"index": "parameter"})
Table1 = pd.merge(Table1, P1, on="parameter")
Table1["significance 1"] = np.where(Table1["pvalue"] <= 0.01, "***",
                           np.where(Table1["pvalue"] <= 0.05, "**",
                           np.where(Table1["pvalue"] <= 0.1, "*", "")))
Table1.rename(columns={"pvalue": "pvalue 1"}, inplace=True)

SE1 = pd.DataFrame(fe_res_VS.std_errors)
SE1["id"] = np.arange(len(SE1))
SE1 = SE1.reset_index().set_index(keys="id")
SE1 = SE1.rename(columns={"index": "parameter", "std_error": "coefficient 1"})
SE1["parameter"] = SE1["parameter"].astype(str) + "_SE"
SE1["significance 1"] = ""
SE1 = SE1.round(4)
SE1["coefficient 1"] = "(" + SE1["coefficient 1"].astype(str) + ")"
Table1 = Table1.append(SE1)
Table1 = Table1.sort_values("parameter")
Table1.replace(np.nan, "", inplace=True)
del P1
del SE1

fe_res_CVS = mod.fit(cov_type="clustered", cluster_entity=True)

# 2) the same construction for the second specification
Table2 = pd.DataFrame(fe_res_CVS.params)
Table2["id"] = np.arange(len(Table2))
Table2 = Table2.reset_index().set_index(keys="id")
Table2 = Table2.rename(columns={"index": "parameter", "parameter": "coefficient 2"})

P2 = pd.DataFrame(fe_res_CVS.pvalues)
P2["id"] = np.arange(len(P2))
P2 = P2.reset_index().set_index(keys="id")
P2 = P2.rename(columns={"index": "parameter"})
Table2 = pd.merge(Table2, P2, on="parameter")
Table2["significance 2"] = np.where(Table2["pvalue"] <= 0.01, "***",
                           np.where(Table2["pvalue"] <= 0.05, "**",
                           np.where(Table2["pvalue"] <= 0.1, "*", "")))
Table2.rename(columns={"pvalue": "pvalue 2"}, inplace=True)

SE2 = pd.DataFrame(fe_res_CVS.std_errors)
SE2["id"] = np.arange(len(SE2))
SE2 = SE2.reset_index().set_index(keys="id")
SE2 = SE2.rename(columns={"index": "parameter", "std_error": "coefficient 2"})
SE2["parameter"] = SE2["parameter"].astype(str) + "_SE"
SE2["significance 2"] = ""
SE2 = SE2.round(4)
SE2["coefficient 2"] = "(" + SE2["coefficient 2"].astype(str) + ")"
Table2 = Table2.append(SE2)
Table2 = Table2.sort_values("parameter")
Table2.replace(np.nan, "", inplace=True)
del P2
del SE2

# merge the two columns and append summary statistics
cols = ["parameter", "pvalue 1", "significance 1", "pvalue 2", "significance 2"]
Reg_Output_FAmount = pd.merge(Table1, Table2, on="parameter", how="outer")
Reg_Output_FAmount = Reg_Output_FAmount.append(pd.DataFrame(
    np.array([["observ.", fe_res_VS.nobs, "", fe_res_CVS.nobs, ""]]),
    columns=cols), ignore_index=True)
Reg_Output_FAmount = Reg_Output_FAmount.append(pd.DataFrame(
    np.array([["Rsquared", "{:.4f}".format(fe_res_VS.rsquared), "",
               "{:.4f}".format(fe_res_CVS.rsquared), ""]]),
    columns=cols), ignore_index=True)
Reg_Output_FAmount = Reg_Output_FAmount.append(pd.DataFrame(
    np.array([["Model type", fe_res_VS.name, "", fe_res_CVS.name, ""]]),
    columns=cols), ignore_index=True)
Reg_Output_FAmount = Reg_Output_FAmount.append(pd.DataFrame(
    np.array([["DV", fe_res_VS.model.dependent.vars[0], "",
               fe_res_CVS.model.dependent.vars[0], ""]]),
    columns=cols), ignore_index=True)
Reg_Output_FAmount.fillna("", inplace=True)
Reg_Output_FAmount.to_html()
open("d://ss5.html", "w").write(Reg_Output_FAmount.to_html())  # for html
```

11. Reshaping specially structured data

https://www.cnpython.com/qa/560110

```python
import numpy as np
from scipy.stats import norm, gaussian_kde
import matplotlib.pyplot as plt
import pandas as pd
from linearmodels.panel.data import PanelData
from linearmodels.panel import PanelOLS, PooledOLS, RandomEffects, compare
from collections import OrderedDict
import wooldridge
from statsmodels.formula.api import ols
import warnings
warnings.filterwarnings("ignore")

wagepan = wooldridge.data("wagepan")
wooldridge.data("wagepan", description=True)

np.random.seed(123)
df = pd.DataFrame(np.random.randint(1, 10, (4, 12)),
                  index=["ID 1", "ID 2", "ID 3", "ID 4"])
df.columns = ["Q1", "Q2", "Q3", "Q4"] * 3
df.columns = [
    df.columns.to_series().groupby(level=0).cumcount().map({0: "X", 1: "Y", 2: "Z"}),
    df.columns,
]
df2 = df.stack().rename_axis(["ID", "T"]).reset_index()
```

https://stackoverflow.com/questions/32835498/pandas-python-describe-formatting-output

This is the typical conversion of long, panel-like data into wide format.

```python
# export descriptive statistics and a correlation table; straightforward
df = df_final
temp = df.groupby("firm")["fin0"].describe().reset_index()
aaa = temp.stack(level=-1)   # or unstack() / unstack(level=-1); level can be -1 or 0
temp = df.corr().reset_index()

# simulated data
d = {"id": [1, 1, 1, 2, 2], "Month": [1, 2, 3, 1, 3],
     "Value": [12, 23, 15, 45, 34], "Cost": [124, 214, 1234, 1324, 234]}
df = pd.DataFrame(d)
df2 = pd.pivot_table(df,
                     values=["Value", "Cost"],
                     index=["id"],
                     columns=["Month"],
                     aggfunc=np.sum,
                     fill_value=0)
df2.columns = [s1 + str(s2) for (s1, s2) in df2.columns.tolist()]
# alternative flattening that keeps a separator (apply to the MultiIndex, not after the line above):
# df2.columns = [" ".join(map(str, col)).strip() for col in df2.columns.values]
df2.reset_index(inplace=True)
df2.index.name = None
```

12. Joint tests of estimated coefficients (Wald tests via statsmodels)

Tests that several coefficients are jointly equal, that they are jointly zero, that the difference between any two coefficients is zero, and the like can all be run as Wald tests.
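Before the full worked example below, a minimal sketch on simulated data (the variable names y, x1, x2 are invented):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 3)), columns=["y", "x1", "x2"])
res = smf.ols("y ~ x1 + x2", data=df).fit()

print(res.wald_test("x1 = 0, x2 = 0"))  # joint test: both coefficients zero
print(res.wald_test("x1 = x2"))         # equality of two coefficients
print(res.t_test("x1 - x2 = 0"))        # the pairwise difference with its standard error
```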

https://andrewpwheeler.com/2021/06/18/wald-tests-via-statsmodels-python/ (author's page: https://github.com/apwheele/). Often interventions are general and may be expected to reduce multiple crime types, e.g. hot spots policing may reduce both violent crimes and property crimes. But we do not know for sure, so it makes sense to fit models to check if that is the case.

For crimes that are more or less prevalent, this is a case in which fitting Poisson/Negative Binomial models makes a lot of sense, since the treatment effect is in terms of rate modifiers. The Cross Validated post shows an example in R. In the past I have shown how to stack models and do these tests in Stata, or use seemingly unrelated regression in Stata for generalized linear models. Here I will show an example in python using data from my dissertation on stacking models and doing Wald tests.

The above link to github has the CSV file and metadata to follow along. Here I just do some upfront data prep. The data are crime counts at intersections/street segments in DC, across several different crime types and various aspects of the built environment.

```python
# python code to stack models and estimate wald tests
import pandas as pd
import numpy as np
import statsmodels.formula.api as smf
import patsy
import itertools

# Use dissertation data for multiple crimes
# https://github.com/apwheele/ResearchDesign/tree/master/Week02_PresentingResearch
data = pd.read_csv(r"d://DC_Crime_MicroPlaces.csv", index_col="MarID")

# only keep a few independent variables to make it simpler
crime = ["OffN1", "OffN3", "OffN5", "OffN7", "OffN8", "OffN9"]  # dropping very low crime counts
x = ["CFS1", "CFS2"]  # 311 calls for service
data = data[crime + x].copy()
data.reset_index(inplace=True)

# Stack the data into long format, so each crime is a new row
data_long = pd.wide_to_long(data, "OffN", i="MarID", j="OffCat").reset_index()
data_long.sort_values(by=["MarID", "OffN"], inplace=True)

# Fit a model with clustered standard errors
covp = {"groups": data_long["MarID"], "df_correction": True}
nb_mod = smf.negativebinomial("OffN ~ C(OffCat) + CFS1:C(OffCat)",
                              data_long).fit(cov_type="cluster", cov_kwds=covp)
print(nb_mod.summary())

# Conduct a Wald test for equality of multiple coefficients
x_vars = nb_mod.summary2().tables[1].index
wald_str = " = ".join(list(x_vars[6:-1]))
print(wald_str)
wald_test = nb_mod.wald_test(wald_str)  # joint test
print(wald_test)

# Or can do test all jointly equal 0
nb_mod.wald_test_terms()

# To replicate what wald_test_terms is doing yourself
all_zero = [x + "= 0" for x in x_vars[6:-1]]
nb_mod.wald_test(",".join(all_zero))

# Pairwise contrasts of coefficients
# To get the actual difference in coefficients
wald_li = []
for a, b in itertools.combinations(x_vars[6:-1], 2):
    wald_li.append(a + " - " + b + " = 0")

wald_dif = " , ".join(wald_li)
dif = nb_mod.t_test(wald_dif)
print(dif)

# c's correspond to the wald_li list
res_contrast = dif.summary_frame()
res_contrast["Test"] = wald_li
res_contrast.set_index("Test", inplace=True)
print(res_contrast)

# Nicer function to print out the actual tests it interprets as
# ends up being 1 - 3, 3 - 5, etc.
def nice_lab_tests(test_str, mod):
    # Getting exogenous variables
    x_vars = mod.summary2().tables[1].index
    # Patsy getting design matrix and constraint from string
    di = patsy.DesignInfo(x_vars)
    const_mat = di.linear_constraint(test_str)
    r_mat = const_mat.coefs
    c_mat = list(const_mat.constants)
    # Loop over the tests, get non-zero indices
    # Build the interpreted tests
    lab = []
    for i, e in enumerate(c_mat):
        lm = r_mat[i, :]  # single row of R matrix
        nz = np.nonzero(lm)[0].tolist()  # only need non-zero
        c_vals = lm[nz].tolist()
        v_labs = x_vars[nz].tolist()
        fin_str = ""
        in_val = 0
        for c, v in zip(c_vals, v_labs):
            # 1 and -1 drop values and only use +/-
            if c == 1:
                if in_val == 0:
                    fin_str += v
                else:
                    fin_str += " + " + v
            elif c == -1:
                if in_val == 0:
                    fin_str += "-" + v
                else:
                    fin_str += " - " + v
            else:
                if in_val == 0:
                    fin_str += str(c) + "*" + v
                else:
                    if c > 0:
                        sg = " + "
                    else:
                        sg = " - "
                    fin_str += sg + str(np.abs(c)) + "*" + v
            in_val += 1
        fin_str += " = " + str(e[0])  # set equality at end
        lab.append(fin_str)
    return lab

# joint tests over arbitrary sets of coefficients:
# Wald string for equality across coefficients from earlier
lab_tests = nice_lab_tests(wald_str, nb_mod)
print(lab_tests)
dif4 = nb_mod.t_test(lab_tests)
print(dif4)

# Additional test to show how the nice_lab_tests function works
str2 = "CFS2:C(OffCat)[1] = 3, CFS2:C(OffCat)[7] = CFS2:C(OffCat)[5]"
nice_lab_tests(str2, nb_mod)
dif3 = nb_mod.t_test(nice_lab_tests(str2, nb_mod))
print(dif3)
```

13. Python and STATA date/time conversion

Reference: Stata time formats.

Python datetimes often arrive in Stata looking like this:

```
datetime              year
09sep1943 00:00:00    1943
```

After export the variable displays as a %tc double. To recover year, month, and so on, first display it as a daily date:

```stata
gen eventdate = dofc(datetime)   // dofc converts a %tc clock value to a daily date
format eventdate %td
```

Then the year, month, and the rest, all computed from the daily date:

```stata
gen year = yofd(dofc(datetime))        // year
gen day  = doy(dofc(datetime))         // day within the year

gen halfyear = hofd(dofc(datetime))    // half-year
format halfyear %thCCYY!hh

gen quarter = qofd(dofc(datetime))     // quarter
format quarter %tqCCYY!qq

gen month = mofd(dofc(datetime))       // month
format month %tmCCYY-NN
// or: format month %tmCY_N

gen week = wofd(dofc(datetime))        // week
format week %twCCYY!www
```
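In the other direction, a pandas datetime64 column can be written straight to a .dta file and shows up in Stata as %tc; a minimal sketch (the file name is a placeholder):

```python
import pandas as pd

df = pd.DataFrame({"datetime": pd.to_datetime(["1943-09-09"]), "year": [1943]})
# datetime64[ns] columns are written as Stata %tc by default
df.to_stata("dates.dta", write_index=False)
```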

14. A Hausman test for panel data in Python

Reference: https://py4etrics.github.io/17_Panel.html (a Japanese scholar's teaching site)

```python
# First use the PanelOLS module for entity and time fixed-effects estimation
formula_fe = "lwage ~ married + union + expersq \
              + d81 + d82 + d83 + d84 + d85 + d86 + d87 + EntityEffects"
mod_fe = PanelOLS.from_formula(formula_fe, data=wagepan)
est1 = mod_fe.fit()
print(est1)

# There are many flavors of Wald test; what we need here is the joint test
# that all coefficients equal 0
restriction = "married = union = expersq = d81 = d82 = d83 = d84 = d85 = d86 = d87 = 0"
est1.wald_test(formula=restriction)

# Next, a Hausman test to decide between random effects and fixed effects.
# Note: the null hypothesis of this test favors the random-effects model.

# Step 1: fit the fixed-effects model
formula_fe = "lwage ~ married + union + expersq \
              + d81 + d82 + d83 + d84 + d85 + d86 + d87 + EntityEffects"
mod_fe = PanelOLS.from_formula(formula_fe, data=wagepan)
fe_res = mod_fe.fit()
print(fe_res)

# Step 2: fit the random-effects model
formula_re = "lwage ~ married + union + expersq \
              + d81 + d82 + d83 + d84 + d85 + d86 + d87"
model_re = RandomEffects.from_formula(formula_re, data=wagepan)
re_res = model_re.fit()
print(re_res)

# Step 3: code the textbook formula by hand, since Python currently has
# no ready-made module for this
import numpy.linalg as la
from scipy import stats
import numpy as np

def hausman(fe, re):
    b = fe.params
    B = re.params
    v_b = fe.cov
    v_B = re.cov
    df = b[np.abs(b) < 1e8].size
    chi2 = np.dot((b - B).T, la.inv(v_b - v_B).dot(b - B))
    pval = stats.chi2.sf(chi2, df)
    return chi2, df, pval

# Step 4: feed the results from steps 1 and 2 into the function
hausman_results = hausman(re_res, fe_res)
print("chi-Squared: " + str(hausman_results[0]))
print("degrees of freedom: " + str(hausman_results[1]))
print("p-Value: " + str(hausman_results[2]))

# Step 5: if the p-Value is below 10%, the fixed-effects model is preferred.
# (This is only an exercise: we report fixed-effects results regardless of
# the p-Value. Econometrics is a long road; theory and software skills are
# both indispensable.)
```

15. Fama-MacBeth regressions, rolling regressions, and more

- https://fin-library.readthedocs.io/en/latest/ (BYU FIN 585 library documentation; several useful packages)
- https://pypi.org/project/finance-byu/#files
- Other write-ups of the algorithm: https://www.cnblogs.com/jungsee/p/8448156.html (my old homepage); https://www.kevinsheppard.com/teaching/python/notes/notebooks/example-fama-macbeth/ (the data are also on this professor's homepage); https://randlow.github.io/posts/finance-economics/asset-pricing-regression/

Related discussion: https://stackoverflow.com/questions/24074481/fama-macbeth-regression-in-python-pandas-or-statsmodels

https://www.e-learn.cn/topic/1620028

```python
from finance_byu.fama_macbeth import fama_macbeth, fama_macbeth_parallel, fm_summary, fama_macbeth_numba
import pandas as pd
import time
import numpy as np

n_jobs = 5
n_firms = 1.0e2
n_periods = 1.0e2

def firm(fid):
    f = np.random.random((int(n_periods), 4))
    f = pd.DataFrame(f)
    f["period"] = f.index
    f["firmid"] = fid
    return f

df = [firm(i) for i in range(int(n_firms))]
df = pd.concat(df).rename(columns={0: "ret", 1: "exmkt", 2: "smb", 3: "hml"})
df.head()

result = fama_macbeth(df, "period", "ret", ["exmkt", "smb", "hml"], intercept=True)
result.head()
fm_summary(result)

%timeit fama_macbeth(df, "period", "ret", ["exmkt", "smb", "hml"], intercept=True)
%timeit fama_macbeth_parallel(df, "period", "ret", ["exmkt", "smb", "hml"], intercept=True)
%timeit fama_macbeth_numba(df, "period", "ret", ["exmkt", "smb", "hml"], intercept=True)

from finance_byu.statistics import GRS

n_periods = 1.0e2
df = pd.DataFrame(np.random.random((int(n_periods), 6)))
df = df.rename(columns={0: "port1", 1: "port2", 2: "port3", 3: "exmkt", 4: "smb", 5: "hml"})
grsstat, pval, tbl = GRS(df, ["port1", "port2", "port3"], ["exmkt", "smb", "hml"])
grsstat
pval

from tabulate import tabulate
print(tabulate(tbl.render(), tablefmt="github", headers=tbl.render().columns))

from finance_byu.summarize import summary
df = pd.DataFrame(np.random.random((100, 3)) / 10., columns=["Port 1", "Port 2", "Port 3"])
summary(df)
```

```python
# -*- coding: utf-8 -*-
from numpy import mat, cov, mean, hstack, multiply, sqrt, diag, \
    squeeze, ones, array, vstack, kron, zeros, eye, savez_compressed
from numpy.linalg import inv
from scipy.stats import chi2
from pandas import read_csv
import statsmodels.api as sm
import os

os.chdir("d://")
data = read_csv("FamaFrench.csv")

# Split using named columns, and iloc for larger blocks
dates = data["date"].values
factors = data[["VWMe", "SMB", "HML"]].values
riskfree = data["RF"].values
portfolios = data.iloc[:, 5:].values

# Use mat for easier linear algebra
factors = mat(factors)
riskfree = mat(riskfree)
portfolios = mat(portfolios)

# Shape information
T, K = factors.shape
T, N = portfolios.shape

# Reshape rf and compute excess returns
riskfree.shape = T, 1
excessReturns = portfolios - riskfree

# Time series regressions
X = sm.add_constant(factors)
ts_res = sm.OLS(excessReturns, X).fit()
alpha = ts_res.params[0]
beta = ts_res.params[1:]
avgExcessReturns = mean(excessReturns, 0)

# Cross-section regression
cs_res = sm.OLS(avgExcessReturns.T, beta.T).fit()
riskPremia = cs_res.params

# Moment conditions
X = sm.add_constant(factors)
p = vstack((alpha, beta))
epsilon = excessReturns - X @ p
moments1 = kron(epsilon, ones((1, K + 1)))
moments1 = multiply(moments1, kron(ones((1, N)), X))
u = excessReturns - riskPremia[None, :] @ beta
moments2 = u * beta.T

# Score covariance
S = mat(cov(hstack((moments1, moments2)).T))

# Jacobian
G = mat(zeros((N * K + N + K, N * K + N + K)))
SigmaX = (X.T @ X) / T
G[:N * K + N, :N * K + N] = kron(eye(N), SigmaX)
G[N * K + N:, N * K + N:] = -beta @ beta.T
for i in range(N):
    temp = zeros((K, K + 1))
    values = mean(u[:, i]) - multiply(beta[:, i], riskPremia)
    temp[:, 1:] = diag(values)
    G[N * K + N:, i * (K + 1):(i + 1) * (K + 1)] = temp

vcv = inv(G.T) * S * inv(G) / T

vcvAlpha = vcv[0:N * K + N:4, 0:N * K + N:4]
J = alpha @ inv(vcvAlpha) @ alpha.T
J = J[0, 0]
Jpval = 1 - chi2(25).cdf(J)

vcvRiskPremia = vcv[N * K + N:, N * K + N:]
annualizedRP = 12 * riskPremia
arp = list(squeeze(annualizedRP))
arpSE = list(sqrt(12 * diag(vcvRiskPremia)))
print("        Annualized Risk Premia")
print("           Market       SMB        HML")
print("--------------------------------------")
print("Premia     {0:0.4f}    {1:0.4f}     {2:0.4f}".format(arp[0], arp[1], arp[2]))
print("Std. Err.  {0:0.4f}    {1:0.4f}     {2:0.4f}".format(arpSE[0], arpSE[1], arpSE[2]))
print("\n\n")
print("J-test:   {:0.4f}".format(J))
print("P-value:   {:0.4f}".format(Jpval))

i = 0
betaSE = []
for j in range(5):
    for k in range(5):
        a = alpha[i]
        b = beta[:, i]
        variances = diag(vcv[(K + 1) * i:(K + 1) * (i + 1), (K + 1) * i:(K + 1) * (i + 1)])
        betaSE.append(sqrt(variances))
        s = sqrt(variances)
        c = hstack((a, b))
        t = c / s
        print("Size: {:}, Value:{:}   Alpha   Beta(VWM)   Beta(SMB)   Beta(HML)".format(j + 1, k + 1))
        print("Coefficients: {:>10,.4f}  {:>10,.4f}  {:>10,.4f}  {:>10,.4f}".format(a, b[0], b[1], b[2]))
        print("Std Err.      {:>10,.4f}  {:>10,.4f}  {:>10,.4f}  {:>10,.4f}".format(s[0], s[1], s[2], s[3]))
        print("T-stat        {:>10,.4f}  {:>10,.4f}  {:>10,.4f}  {:>10,.4f}".format(t[0], t[1], t[2], t[3]))
        print("")
        i += 1

betaSE = array(betaSE)
savez_compressed("fama-macbeth-results", alpha=alpha, beta=beta,
                 betaSE=betaSE, arpSE=arpSE, arp=arp, J=J, Jpval=Jpval)

from numpy import savez
savez("fama-macBeth-results.npz", arp=arp, beta=beta, arpSE=arpSE,
      betaSE=betaSE, J=J, Jpval=Jpval)
```

16. Event studies

https://www.lianxh.cn/news/90de95e42e8ff.html http://fmwww.bc.edu/repec/bocode/e/ https://zhuanlan.zhihu.com/p/348125854

Preparation: the eventstudy2 command requires the user-written programs moremata, nearmrg, distinct, _gprod, rmse, and parallel:

```stata
ssc install moremata
ssc install nearmrg
ssc install distinct
ssc install _gprod
ssc install rmse
ssc install parallel
```

Note: moremata sometimes fails to install this way; in that case install it manually. Download moremata.zip, unzip it to a folder of your choice (say E:\moremata), and run:

net install moremata, from(E:\moremata)

eventstudy2 runs event studies and lets the user specify several models established in the finance literature, such as raw returns, the market model, multifactor models, and buy-and-hold abnormal returns (BHAR). Estimation and event windows can be chosen freely, and cumulative (average) abnormal (buy-and-hold) returns can be computed for up to 10 windows. Its output currently reports a set of test statistics, all implemented in the Mata programming language:

```stata
eventstudy2 Security_id Date using Security_returns if Ea > 0.05, ///
    ret(Return) car1LB(-1) car1UB(1) mod(FM) marketfile(Factor_returns) ///
    mar(MKT) idmar(Market_reference)   // market model; this call needs three datasets
```

A Python implementation using tushare data: build a merged stock/market return series, fit the CAPM market model over an estimation window, and compute abnormal and cumulative abnormal returns around each event date.

```python
# NB: the middle of the original paste was garbled; this is the recoverable
# version. The original also demonstrated the single-stock case (美的集團(tuán)
# 000333.SZ against the Shenzhen composite index 399107.SZ) with the same steps.
import pandas as pd
import tushare as ts
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

token = "paste the API token from the preparation step here"
pro = ts.pro_api(token)
plt.style.use("fivethirtyeight")   # plotting style

# event list: firm codes and event dates, keeping the last event per firm
df0 = pd.read_excel("eventdates.xlsx").drop_duplicates(["code"], keep="last")
lcode = list(df0["code"])
event = list(df0["date"])

# daily returns of the benchmark index (the all-market medical index, 000991.SH)
dfzs = pro.index_basic(market="SSE")   # index list, used to look up the code
dfyy = pro.index_daily(ts_code="000991.SH", start_date="20190101", end_date="20210630")
df_index = dfyy[["trade_date", "pct_chg"]]
df_index["trade_date"] = pd.to_datetime(df_index["trade_date"])
df_index = df_index.sort_values("trade_date")
df_index = df_index.reset_index().drop("index", axis=1)   # reset to 0,1,2,... and drop the old index
df_index["pct_chg"] = df_index["pct_chg"] / 100           # percent -> decimal

def getdf_final(cd):
    """Merge one stock's daily returns with the index returns on trade_date."""
    d2 = pro.daily(ts_code=cd, start_date="20190101", end_date="20210630")
    d0 = d2[["trade_date", "pct_chg"]]
    d0.columns = ["trade_date", "return"]
    d0["trade_date"] = pd.to_datetime(d0["trade_date"])
    d0 = d0.sort_values("trade_date")
    d0 = d0.reset_index().drop("index", axis=1)
    d0["return"] = d0["return"] / 100
    d1 = d0[(d0["trade_date"] >= "2014-01-01") & (d0["trade_date"] <= "2021-06-30")].copy()
    # left-merge on the stock's dates; dates missing from df_index get NaN
    df_final0 = d1.merge(df_index, on="trade_date", how="left")
    return df_final0

def get_OLS(X, y, pre_X):
    """
    Expected returns from the CAPM market model, via linear regression.
    X, y: index and stock daily returns over the estimation window;
    pre_X: index returns over the event window, pushed through the fitted line.
    """
    linear_m = LinearRegression().fit(X, y)
    # r_2 = linear_m.score(X, y)   # the closer to 1, the better the fit
    # print(f"y = {round(linear_m.intercept_, 3)} + {round(linear_m.coef_[0], 3)}Rm + e")
    return linear_m.predict(pre_X)

def get_data(event):
    """Abnormal returns in a [-3, +3] window around one event date."""
    q = df_final[df_final["trade_date"] == event].index[0] - 3
    h = df_final[df_final["trade_date"] == event].index[0] + 3
    target = df_final.loc[q:h].copy()            # event window
    estimate_df = df_final.loc[q - 110:q - 11]   # estimation window
    X = estimate_df[["pct_chg"]]                 # index returns, estimation window
    y = estimate_df["return"]                    # stock returns, estimation window
    predict_X = target[["pct_chg"]]              # index returns, event window
    target["E(Rt)"] = get_OLS(X, y, predict_X)          # expected returns
    target["ARt"] = target["return"] - target["E(Rt)"]  # abnormal = actual - expected
    target["CARt"] = target["ARt"].cumsum()             # cumulative abnormal return
    target = target.reset_index().drop("index", axis=1)
    return target

car = pd.DataFrame()
ar_car = pd.DataFrame()
for i in range(len(lcode)):
    df_final = getdf_final(lcode[i])
    car0 = get_data(event[i])
    car0["firm"] = lcode[i]
    car0["event_date"] = event[i]
    car0["interval_days"] = car0["trade_date"] - car0["event_date"]
    car0["interval"] = pd.Series(range(-3, 4))
    ar_car = ar_car.append(car0)
    car[lcode[i]] = car0.loc[0:6]["CARt"]
    plt.rcParams["savefig.dpi"] = 600   # image resolution
    plt.rcParams["figure.dpi"] = 600
    car0.set_index("interval")[["ARt", "CARt"]].plot()
```

================== Still being updated ==================

