
Efficient Way To Do Pandas Operation And Skip Row

There must be a simple way to do this, but I'm missing it. First, imagine the situation in Excel: I have a column of percent changes. (assume column A) In the next column (B), I w

Solution 1:

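IIUC, you can skip the first row of column A of df_source by selecting all rows except the first with loc (ix has been removed from current pandas):

# make the column float so the fractional values fit without an upcast
df_target["A"] = df_target["A"].astype(float)
df_target.loc[1:, "A"] = df_source.loc[1:, "A"] + 1
print(df_target)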
             A
0  1000.000000
1     0.988898
2     0.986142
3     1.009979
4     1.005165
5     1.101116
6     0.992312
7     0.962890
8     1.051340
9     1.009750

Or maybe you mean this:

import pandas as pd
import numpy as np

df_source = pd.DataFrame(np.random.normal(0, .05, 10), index=range(10), columns=['A'])
print(df_source)
          A
0  0.039965
1  0.060821
2 -0.079238
3 -0.129932
4  0.002196
5 -0.003721
6 -0.008358
7  0.014104
8 -0.022905
9  0.014793

df_target = pd.DataFrame(index=df_source.index)
# all A set to 1000
df_target["A"] = 1000  # initialize target array to start at 1000
print(df_target)
      A
0  1000
1  1000
2  1000
3  1000
4  1000
5  1000
6  1000
7  1000
8  1000
9  1000

# shift(-1) pulls the next row's change up into the current row
df_target["A"] = (1 + df_source["A"].shift(-1)) * df_target["A"]
print(df_target)
             A
0  1060.820882
1   920.761946
2   870.067878
3  1002.195555
4   996.279287
5   991.641909
6  1014.104402
7   977.094961
8  1014.793488
9          NaN
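
To see why the last row ends up as NaN, here is a small sketch with hand-picked values (the numbers and the name pct are made up for illustration, not taken from the question). shift(-1) moves every value up one row, so row i receives the change from row i+1 and the final row has nothing to pull from.

```python
import pandas as pd

# Hypothetical percent changes, chosen so the arithmetic is easy to follow
pct = pd.Series([0.10, 0.20, -0.50], name="A")

# shift(-1) moves each value up one row; the last row has no successor
shifted = pct.shift(-1)

# Multiply a constant starting value by (1 + the next row's change)
result = (1 + shifted) * 1000
print(result)
```

If the trailing NaN is unwanted, it can be dropped or filled afterwards, depending on what the last row is supposed to mean.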

EDIT:

Maybe you need cumsum:

df_target["B"] = 2
df_target["C"] = df_target["B"].cumsum()

df_target["D"] = df_target["B"] + df_target.index
print(df_target)
             A  B   C   D
0  1041.003000  2   2   2
1  1013.817000  2   4   3
2   948.853000  2   6   4
3  1031.692000  2   8   5
4   970.875000  2  10   6
5  1011.095000  2  12   7
6  1053.472000  2  14   8
7   903.765000  2  16   9
8  1010.546000  2  18  10
9     0.010546  2  20  11

Solution 2:

I think I understand your problem. In cases like this, I usually find it easier to build a list and add it to the existing dataframe as a new column. You could, of course, create a Series instance first and do the calculations on that instead.

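new_series = [0] * len(dataframe["A"])
new_series[0] = 1000
# start=1 so that i is the row position of k and i-1 is the previous row
for i, k in enumerate(dataframe["A"].iloc[1:], start=1):
    new_series[i] = (1 + k) * new_series[i - 1]

dataframe["B"] = pd.Series(new_series, index=dataframe.index)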

Note that ix was deprecated in pandas 0.20 and removed in 1.0; use loc (label-based) or iloc (position-based) instead.
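
For reference, a sketch of how the "all rows but the first" selection looks with iloc (position-based) and loc (label-based). The frame and its index labels here are made up so that positions and labels differ.

```python
import pandas as pd

# Hypothetical frame with non-default labels to make the difference visible
df = pd.DataFrame({"A": [0.01, 0.02, 0.03]}, index=[10, 20, 30])

# iloc is position-based: skip the first row regardless of its label
by_position = df["A"].iloc[1:]

# loc is label-based and includes both endpoints of the slice
by_label = df.loc[20:, "A"]

print(by_position.equals(by_label))
```

With the default 0..n-1 index used in the question, loc[1:] and iloc[1:] happen to select the same rows, which is why either works there.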

After rethinking the problem, you can use lambda expressions as elements in your dataframe:

dataframe["B"] = [lambda row: (1 + dataframe["A"].iloc[row]) * dataframe["B"].iloc[row - 1]] * len(dataframe["A"])
# Above: initialize "B" with a lambda expression repeated so it is as long as "A"

dataframe.loc[dataframe.index[0], "B"] = 1000
# tolist() takes a snapshot of the lambdas; start=1 keeps i aligned with the row position
for i, k in enumerate(dataframe["B"].iloc[1:].tolist(), start=1):
    dataframe.loc[dataframe.index[i], "B"] = k(row=i)

I am trying to think of a way around using a for loop for this problem, but I can't figure out where to grab a row number from.
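
If the goal is the compounding recurrence used in the loop above, B[i] = (1 + A[i]) * B[i-1] with B[0] = 1000 (an assumption about what the question intends, since the question text is cut off), one loop-free option is cumprod. A minimal sketch with made-up values:

```python
import pandas as pd

# Hypothetical percent changes; the first row is ignored, as in the loop above
dataframe = pd.DataFrame({"A": [0.05, 0.10, -0.50, 0.20]})

# Growth factor per row; the first factor is forced to 1 so row 0 keeps the start value
factors = 1 + dataframe["A"]
factors.iloc[0] = 1.0

# Running product of the factors, scaled by the starting value of 1000
dataframe["B"] = 1000 * factors.cumprod()
print(dataframe)
```

This works because unrolling the recurrence gives B[i] = 1000 * (1 + A[1]) * ... * (1 + A[i]), which is exactly a running product.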

Hope this helps.
