Efficient Way To Do Pandas Operation And Skip Row
There must be a simple way to do this, but I'm missing it. First, imagine the situation in Excel: I have a column of percent changes. (assume column A) In the next column (B), I w
Solution 1:
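IIUC, you can skip the first row of column A of df_source by selecting every row except the first with loc (ix was removed in pandas 1.0):
# make the column float so fractional values can be assigned into it
df_target["A"] = df_target["A"].astype(float)
df_target.loc[1:, "A"] = df_source.loc[1:, "A"] + 1
print(df_target)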
A
0 1000.000000
1 0.988898
2 0.986142
3 1.009979
4 1.005165
5 1.101116
6 0.992312
7 0.962890
8 1.051340
9 1.009750
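As a side note, `.ix` is not available in pandas 1.0 and later, so on a current install the same "skip the first row" selection needs `.iloc` (position-based) or `.loc` (label-based). The following is a minimal sketch with made-up values; the non-default index is chosen only to show why the position-based form is the safer way to say "everything but the first row".

```python
import pandas as pd

# Illustrative data (not from the question); index labels deliberately start at 10.
df_source = pd.DataFrame({"A": [0.04, -0.01, 0.02]}, index=[10, 11, 12])
df_target = pd.DataFrame({"A": [1000.0, 1000.0, 1000.0]}, index=df_source.index)

# Position-based: every row except the first, whatever the labels are.
rest = df_source["A"].iloc[1:] + 1

# Assign through a single .loc call on the frame to avoid chained assignment.
df_target.loc[rest.index, "A"] = rest

print(df_target)
```

The first row of `df_target` keeps its starting value, and only the remaining rows are overwritten.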
Or maybe you mean:
import pandas as pd
import numpy as np

df_source = pd.DataFrame(np.random.normal(0, .05, 10), index=range(10), columns=['A'])
print(df_source)
A
0 0.039965
1 0.060821
2 -0.079238
3 -0.129932
4 0.002196
5 -0.003721
6 -0.008358
7 0.014104
8 -0.022905
9 0.014793
df_target = pd.DataFrame(index=df_source.index)
# all A set to 1000
df_target["A"] = 1000  # initialize target array to start at 1000
print(df_target)
A
0 1000
1 1000
2 1000
3 1000
4 1000
5 1000
6 1000
7 1000
8 1000
9 1000
df_target["A"] = (1 + df_source["A"].shift(-1)) * df_target["A"]
print(df_target)
A
0 1060.820882
1 920.761946
2 870.067878
3 1002.195555
4 996.279287
5 991.641909
6 1014.104402
7 977.094961
8 1014.793488
9 NaN
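The NaN in the last row comes from `shift(-1)`, which moves every value up by one row and leaves nothing for the final position. A small sketch with made-up numbers shows both directions, in case `shift(1)` is closer to what you want:

```python
import pandas as pd

s = pd.Series([0.1, 0.2, 0.3])

# shift(-1) pulls each value up one row; the last row has no successor and becomes NaN.
up = s.shift(-1)

# shift(1) pushes each value down one row; the first row has no predecessor and becomes NaN.
down = s.shift(1)

print(pd.DataFrame({"s": s, "shift(-1)": up, "shift(1)": down}))
```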
EDIT: Maybe you need cumsum:
df_target["B"] = 2
df_target["C"] = df_target["B"].cumsum()
df_target["D"] = df_target["B"] + df_target.index
print(df_target)
A B C D
0 1041.003000 2 2 2
1 1013.817000 2 4 3
2 948.853000 2 6 4
3 1031.692000 2 8 5
4 970.875000 2 10 6
5 1011.095000 2 12 7
6 1053.472000 2 14 8
7 903.765000 2 16 9
8 1010.546000 2 18 10
9 0.010546 2 20 11
Solution 2:
I think I understand your problem. In cases like this, I usually find it easier to build a list and then add it to the existing dataframe as a new column. You could, of course, create a Series instance first and do the calculations on that instead.
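new_series = [0] * len(dataframe["A"])
new_series[0] = 1000
# start=1 so that i lines up with the row being filled (row 0 is skipped)
for i, k in enumerate(dataframe["A"].iloc[1:], start=1):
    new_series[i] = (1 + k) * new_series[i - 1]
dataframe["B"] = pd.Series(new_series, index=dataframe.index)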
Note: ix was deprecated in pandas 0.20 and removed in pandas 1.0; use loc (label-based) or iloc (position-based) instead.
After rethinking the problem: you can also use lambda expressions as elements in your dataframe (this assumes the default 0..n-1 integer index):
dataframe["B"] = [lambda row: (1 + dataframe["A"].loc[row]) * dataframe["B"].loc[row - 1]] * len(dataframe["A"])
# Above: initialize "B" with a lambda expression repeated so that it is as long as "A"
dataframe.loc[0, "B"] = 1000
for i, k in enumerate(dataframe["B"].loc[1:], start=1):
    dataframe.loc[i, "B"] = k(row=i)
I am trying to think of a way to avoid the for loop for this problem, but I can't figure out where to grab a row number from.
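If the recurrence is the one used in the loop above, i.e. each value of B equals (1 + A) times the previous B, with B starting at 1000, then one possible way to drop the loop is a cumulative product. This is a sketch under that assumption, not something stated in the answers; the sample values and the `start` variable are illustrative.

```python
import pandas as pd

# Illustrative percent changes; row 0 is the row to skip.
dataframe = pd.DataFrame({"A": [0.00, 0.10, -0.50, 0.20]})
start = 1000.0  # assumed starting value, as in the loop version

# Growth factor per row; force the first row to 1 so it leaves `start` unchanged.
factors = 1 + dataframe["A"]
factors.iloc[0] = 1.0

# B[i] = start * product of factors[1..i], which equals (1 + A[i]) * B[i-1].
dataframe["B"] = start * factors.cumprod()

print(dataframe)
```

Because `cumprod` carries the running product down the column, no row number is needed, and the first row is skipped simply by giving it a factor of 1.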
Hope this helps.