Parsing Bloomberg Excel/CSV with a Pandas DataFrame (new user)

First, please forgive my ignorance. This is my first Python program.

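I retrieve Bloomberg data through the Excel API. As is typical, the first row holds a ticker in every fourth column, and the second row holds the labels Date, PX_LAST, [empty column], Date, PX_LAST, and so on. The remaining rows hold the dates and last prices.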

    EHFI38 Index BBGID,          , , EHFI139 Index BBGID,          , ...
    Date              , PX_LAST  , , Date               , PX_LAST  , ...
    1999-12-31        , 100.0000 , , 1999-12-31         , 100.0000 , ...
    2000-01-31        , 100.1518 , , 2000-01-31         , 98.6526  , ...
    ...

It seems the right data structure would be a DataFrame with the dates as the index and the ticker strings as the column names.

     , Date      , EHFI38 Index BBGID, EHFI139 Index BBGID, EHFI139 Index BBGID, EHFI84 Index BBGID, ...
    0, 1999-12-31, 100.0000          , 100.0000           , 100.0000           , 100.0000          , ...
    1, 2000-01-31, 100.1518          , 98.6526            , 98.6526            , 104.7575          , ...
    ...
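To make the target concrete, here is a minimal sketch of that structure built by hand from the first two tickers in the sample values above. The variable name `target` is only illustrative, and the dates are assumed to parse as ISO strings.

```python
import pandas as pd

# Sample values copied from the layout above; only two tickers are shown.
dates = pd.to_datetime(["1999-12-31", "2000-01-31"])

# Dates as the index, one column per ticker.
target = pd.DataFrame(
    {
        "EHFI38 Index BBGID": [100.0000, 100.1518],
        "EHFI139 Index BBGID": [100.0000, 98.6526],
    },
    index=pd.Index(dates, name="Date"),
)

print(target)
```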

I wrote the code below. It seems to work when I step through it, but I am sure I am not doing it well, and I would like to learn how to do it better. TIA

    # IMPORT
    import pandas as pd
    import numpy as np
    import datetime

    # READ IN CSV FILES
    # EHFI38 Index BBGID,          , , EHFI139 Index BBGID,          , ...
    # Date              , PX_LAST  , , Date               , PX_LAST  , ...
    # 1999-12-31        , 100.0000 , , 1999-12-31         , 100.0000 , ...
    # 2000-01-31        , 100.1518 , , 2000-01-31         , 98.6526  , ...
    # ...
    px = pd.read_csv('Book1.csv', sep=',', parse_dates=True)

    # REMOVE EMPTY COLUMNS
    px = px.dropna(axis=1, how='all')

    # CONVERT TO ARRAYS
    M = np.array(px)
    C = np.array(px.columns)

    # FIX UNNAMED COLUMNS IN C (COPY EACH TICKER OVER ITS PX_LAST COLUMN)
    for i in np.arange(len(C) // 2) * 2:
        C[i+1] = C[i]

    # CONVERT EXCEL DATES FUNCTION (THANKS JOHN MACHIN)
    def xl2pydate(xldate, datemode):
        # datemode: 0 for 1900-based, 1 for 1904-based
        return (
            datetime.datetime(1899, 12, 30)
            + datetime.timedelta(days=xldate + 1462 * datemode)
        )

    # CONVERT DATES THE UGLY WAY
    # LOOP THROUGH 1,2, ... last row
    for i in np.arange(len(M) - 1) + 1:
        # LOOP THROUGH 0,2, ... last column-1
        for j in np.arange(len(M.T) // 2) * 2:
            # CONVERT DATE & STORE
            if isinstance(M[i,j], str) and M[i,j].isdigit():
                M[i,j] = xl2pydate(int(M[i,j]), 0)
            else:
                M[i,j] = np.nan

    # RECOMBINE IN A DATAFRAME
    df = pd.DataFrame(M[1:,:], columns=[C, M[0,:]])

    # MERGE DATES
    #  , Date      , EHFI38 Index BBGID, EHFI139 Index BBGID, EHFI139 Index BBGID, EHFI84 Index BBGID, ...
    # 0, 1999-12-31, 100.0000          , 100.0000           , 100.0000           , 100.0000          , ...
    # 1, 2000-01-31, 100.1518          , 98.6526            , 98.6526            , 104.7575          , ...
    # ...
    # LOOP 0,2,...,len-1
    for i in np.arange(len(df.T) // 2) * 2:
        # GET A DATE, LAST_PX FOR A SINGLE TICKER
        b = df[df.columns[i:(i+2)]]
        # CHANGE COLUMN NAMES TO DATE, [TICKER]
        b.columns = [df.columns[i][1], df.columns[i][0]]
        # COMBINE
        if i == 0:
            a = b
        else:
            a = pd.merge(a.dropna(), b.dropna(), on='Date', how='outer')
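For comparison, one possibly tidier way to reach the same result is sketched below. It is not a definitive implementation: it reads an in-memory stand-in for Book1.csv (the `sample` text is hypothetical, built from the sample rows above), assumes the dates arrive as ISO strings, and uses illustrative names such as `raw`, `columns`, and `prices_df`. Each (Date, PX_LAST) pair is turned into a date-indexed Series, and `pd.concat` aligns them on the union of dates.

```python
import io

import pandas as pd

# Hypothetical stand-in for Book1.csv, using the sample rows from the post.
sample = io.StringIO(
    "EHFI38 Index BBGID,,,EHFI139 Index BBGID,,\n"
    "Date,PX_LAST,,Date,PX_LAST,\n"
    "1999-12-31,100.0000,,1999-12-31,100.0000,\n"
    "2000-01-31,100.1518,,2000-01-31,98.6526,\n"
)

# Read everything as data (no header row), then drop the empty spacer columns.
raw = pd.read_csv(sample, header=None)
raw = raw.dropna(axis=1, how="all")

columns = []
# After dropping spacers, columns come in (Date, PX_LAST) pairs.
for pos in range(0, raw.shape[1], 2):
    ticker = raw.iloc[0, pos]                      # ticker sits above the Date column
    pair = raw.iloc[2:, pos:pos + 2].dropna()      # skip the two header rows
    dates = pd.to_datetime(pair.iloc[:, 0])        # assumes ISO date strings
    prices = pd.to_numeric(pair.iloc[:, 1])
    columns.append(
        pd.Series(prices.to_numpy(), index=pd.DatetimeIndex(dates), name=ticker)
    )

# Outer-align all tickers on the union of their dates.
prices_df = pd.concat(columns, axis=1).sort_index()
prices_df.index.name = "Date"

print(prices_df)
```

If the file holds Excel serial numbers rather than date strings, as the `xl2pydate` helper suggests, the date line could instead use `pd.to_datetime(pd.to_numeric(pair.iloc[:, 0]), unit="D", origin="1899-12-30")` for 1900-based workbooks.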