Python Pandas - Find Missing Rows, And Then Duplicate Another Row With Modifications
I have a data source where each row is uniquely defined by two columns. However, some rows are missing and need to be inserted with some information from the dataframe. So with th
Solution 1:
Not sure if this is what you are after, but you can use the complete function from pyjanitor to expose the missing combinations. At the moment you have to install the latest development version from GitHub:
# install latest dev version
# pip install git+https://github.com/ericmjl/pyjanitor.git
import janitor
df.complete(["A", "B"])
   A   B    C
0  1  10    A
1  1  20    D
2  2  10    B
3  2  20  NaN
4  3  10    C
5  3  20    E
Using pandas only, we can take the unique values of columns 'A' and 'B', build a new MultiIndex from their product, then reindex the dataframe:
import pandas as pd
new_index = pd.MultiIndex.from_product([df.A.unique(), df.B.unique()],
                                       names=["A", "B"])
new_index
MultiIndex([(1, 10),
            (1, 20),
            (2, 10),
            (2, 20),
            (3, 10),
            (3, 20)],
           names=['A', 'B'])
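If you only want to know which combinations are missing, before inserting anything, one option is to compare the full product index with the pairs already in the data. This is a minimal sketch: the sample dataframe is an assumption reconstructed from the outputs shown in this answer, and the names existing and missing are illustrative only.

```python
import pandas as pd

# Assumed sample data, reconstructed from the outputs shown in this answer.
df = pd.DataFrame(
    {
        "A": [1, 1, 2, 3, 3],
        "B": [10, 20, 10, 10, 20],
        "C": ["A", "D", "B", "C", "E"],
    }
)

# Every (A, B) pair that should exist.
new_index = pd.MultiIndex.from_product(
    [df["A"].unique(), df["B"].unique()], names=["A", "B"]
)

# The (A, B) pairs actually present in the data.
existing = pd.MultiIndex.from_frame(df[["A", "B"]])

# Set difference: expected pairs that have no row yet.
missing = new_index.difference(existing)
print(list(missing))
```

With the assumed data, the only pair reported is A=2, B=20, which is the row that reindexing fills in.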
Now, set index, reindex and reset index:
df.set_index(["A", "B"]).reindex(new_index).reset_index()
   A   B    C
0  1  10    A
1  1  20    D
2  2  10    B
3  2  20  NaN
4  3  10    C
5  3  20    E
You can also fill the null value:
df.set_index(["A", "B"]).reindex(new_index, fill_value=0).reset_index()
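Note that fill_value in reindex is a single value applied to every column. If the dataframe has several value columns that need different defaults, a possible alternative is to reindex first and then call fillna with a dictionary. The sketch below assumes the sample data reconstructed from the outputs in this answer; the extra numeric column "D" and the "missing" placeholder are hypothetical and only there to show per-column fills.

```python
import pandas as pd

# Assumed sample data; column "D" is hypothetical, added to show per-column fills.
df = pd.DataFrame(
    {
        "A": [1, 1, 2, 3, 3],
        "B": [10, 20, 10, 10, 20],
        "C": ["A", "D", "B", "C", "E"],
        "D": [1.5, 2.5, 3.5, 4.5, 5.5],
    }
)

new_index = pd.MultiIndex.from_product(
    [df["A"].unique(), df["B"].unique()], names=["A", "B"]
)

filled = (
    df.set_index(["A", "B"])
    .reindex(new_index)                      # inserted rows get NaN in C and D
    .fillna({"C": "missing", "D": 0.0})      # a different default for each column
    .reset_index()
)
print(filled)
```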
With complete, fill_value must be passed as a dictionary mapping column names to fill values (or you could just use fillna instead and not worry about a dictionary):
df.complete(["A", "B"], fill_value={"C": 0}) # or df.complete(["A", "B"]).fillna(0)
   A   B  C
0  1  10  A
1  1  20  D
2  2  10  B
3  2  20  0
4  3  10  C
5  3  20  E
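The title also asks about duplicating another row with modifications. The question text is cut off, so the exact rule is not known. The sketch below is only one guess: it assumes the inserted row should copy C from the first existing row with the same A, and then tags the copied value with a "_copy" suffix. The sample data is reconstructed from the outputs above, and both the copy rule and the suffix are hypothetical.

```python
import pandas as pd

# Assumed sample data, reconstructed from the outputs shown in this answer.
df = pd.DataFrame(
    {
        "A": [1, 1, 2, 3, 3],
        "B": [10, 20, 10, 10, 20],
        "C": ["A", "D", "B", "C", "E"],
    }
)

new_index = pd.MultiIndex.from_product(
    [df["A"].unique(), df["B"].unique()], names=["A", "B"]
)
full = df.set_index(["A", "B"]).reindex(new_index).reset_index()

# Rows inserted by reindex are the ones where C is NaN (true for this data,
# because the original C column has no missing values).
inserted = full["C"].isna()

# Hypothetical rule: take C from the first existing row with the same A.
donor = full.groupby("A")["C"].transform("first")
full["C"] = full["C"].fillna(donor)

# Hypothetical modification: mark the copied values so they stand out.
full.loc[inserted, "C"] = full.loc[inserted, "C"] + "_copy"
print(full)
```

Swap the groupby key or the modification step to match whatever rule your data actually needs.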