
Python Pandas - Find Missing Rows, And Then Duplicate Another Row With Modifications

I have a data source where each row is uniquely defined by two columns. However, some rows are missing, and they need to be inserted using some information from the dataframe. So with th…

Solution 1:

Not sure if this is what you are after, but you can use the complete function from pyjanitor to expose the missing combinations. At the time of writing, you have to install the latest development version from GitHub:

# install latest dev version
# pip install git+https://github.com/ericmjl/pyjanitor.git

import janitor  # importing janitor registers its methods, such as .complete(), on DataFrames
df.complete(["A", "B"])

    A   B   C
0   1   10  A
1   1   20  D
2   2   10  B
3   2   20  NaN
4   3   10  C
5   3   20  E
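If installing the development version is not an option, the same idea can be sketched with a cross join in plain pandas. This is a swapped-in technique, not pyjanitor's implementation: it assumes pandas >= 1.2 (for how="cross"), and the sample frame is reconstructed from the output above because the question's original data is cut off.

```python
import pandas as pd

# Sample frame reconstructed from the output above; the (2, 20) pair is missing.
df = pd.DataFrame(
    {"A": [1, 1, 2, 3, 3], "B": [10, 20, 10, 10, 20], "C": ["A", "D", "B", "C", "E"]}
)

# Every (A, B) combination: cross join of the distinct values of each key column.
grid = df[["A"]].drop_duplicates().merge(df[["B"]].drop_duplicates(), how="cross")

# Left-join the original rows back on; combinations absent from df get NaN in C.
completed = grid.merge(df, on=["A", "B"], how="left")
print(completed)
```

A left merge keeps the row order of the grid, so the result lines up with the complete output shown above.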

Using only pandas, we can take the unique values of columns 'A' and 'B', build a new MultiIndex from their Cartesian product, then reindex the dataframe:

import pandas as pd

# Cartesian product of the distinct values of the two key columns
new_index = pd.MultiIndex.from_product([df.A.unique(), df.B.unique()],
                                        names=["A", "B"])
new_index

MultiIndex([(1, 10),
            (1, 20),
            (2, 10),
            (2, 20),
            (3, 10),
            (3, 20)],
           names=['A', 'B'])
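Since the question also asks how to find the missing rows, the same grid can list just the absent key pairs by taking a set difference against the pairs that already occur. A minimal sketch, again using a sample frame reconstructed from the outputs shown here:

```python
import pandas as pd

# Sample frame reconstructed from the outputs shown; (2, 20) is absent.
df = pd.DataFrame(
    {"A": [1, 1, 2, 3, 3], "B": [10, 20, 10, 10, 20], "C": ["A", "D", "B", "C", "E"]}
)

# Full grid of key combinations.
new_index = pd.MultiIndex.from_product([df.A.unique(), df.B.unique()], names=["A", "B"])

# Key pairs that actually occur in the data.
existing = pd.MultiIndex.from_frame(df[["A", "B"]])

# Combinations that are in the grid but not in the data.
missing = new_index.difference(existing)
print(missing.tolist())
```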

Now set the index, reindex, and reset the index:

df.set_index(["A", "B"]).reindex(new_index).reset_index()

    A   B   C
0   1   10  A
1   1   20  D
2   2   10  B
3   2   20  NaN
4   3   10  C
5   3   20  E
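The set_index / reindex / reset_index chain generalizes to any number of key columns. The helper below is a hypothetical name used only for illustration; it assumes, as the question states, that the key combinations are unique, because reindex raises a ValueError on duplicate labels.

```python
import pandas as pd


def complete_keys(frame: pd.DataFrame, keys: list) -> pd.DataFrame:
    """Hypothetical helper: add a row for every combination of the unique key values."""
    # One array of distinct values per key column, in order of first appearance.
    levels = [frame[k].unique() for k in keys]
    full_index = pd.MultiIndex.from_product(levels, names=keys)
    # Rows for combinations not present in `frame` are filled with NaN.
    return frame.set_index(keys).reindex(full_index).reset_index()


# Sample frame reconstructed from the output above.
df = pd.DataFrame(
    {"A": [1, 1, 2, 3, 3], "B": [10, 20, 10, 10, 20], "C": ["A", "D", "B", "C", "E"]}
)
out = complete_keys(df, ["A", "B"])
print(out)
```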

You can also fill the null values while reindexing by passing fill_value:

df.set_index(["A", "B"]).reindex(new_index, fill_value=0).reset_index()
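One difference worth knowing: reindex's fill_value only applies to the rows that reindex creates, while fillna afterwards replaces every NaN, including ones that were already in the data. The frame below is a hypothetical variant of the sample in which the existing (3, 20) row already has a NaN in C; fillna also accepts a column-to-value dictionary.

```python
import numpy as np
import pandas as pd

# Hypothetical variant: (2, 20) is missing and the existing (3, 20) row has NaN in C.
df = pd.DataFrame(
    {"A": [1, 1, 2, 3, 3], "B": [10, 20, 10, 10, 20], "C": ["A", "D", "B", "C", np.nan]}
)
new_index = pd.MultiIndex.from_product([df.A.unique(), df.B.unique()], names=["A", "B"])

# fill_value only fills the newly created (2, 20) row; the old NaN is kept.
via_reindex = df.set_index(["A", "B"]).reindex(new_index, fill_value=0).reset_index()

# fillna fills every NaN in C, old or new; the dict limits it to column C.
via_fillna = df.set_index(["A", "B"]).reindex(new_index).reset_index().fillna({"C": 0})

print(via_reindex)
print(via_fillna)
```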

With complete, fill_value has to be passed as a dictionary mapping column names to fill values (alternatively, call fillna afterwards and skip the dictionary):

df.complete(["A", "B"], fill_value={"C": 0}) # or df.complete(["A", "B"]).fillna(0)

    A   B   C
0   1   10  A
1   1   20  D
2   2   10  B
3   2   20  0
4   3   10  C
5   3   20  E
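The question's exact rule for building the new rows is cut off above, so the following is only a sketch under an assumed rule: each inserted row copies C from another row with the same A (the previous row in that group if there is one, otherwise the next), and its own A and B keys are the "modification". It also flags which rows were inserted. The sample frame is reconstructed from the outputs shown.

```python
import pandas as pd

# Sample frame reconstructed from the outputs shown; (2, 20) is missing.
df = pd.DataFrame(
    {"A": [1, 1, 2, 3, 3], "B": [10, 20, 10, 10, 20], "C": ["A", "D", "B", "C", "E"]}
)
new_index = pd.MultiIndex.from_product([df.A.unique(), df.B.unique()], names=["A", "B"])
out = df.set_index(["A", "B"]).reindex(new_index).reset_index()

# Mark rows whose (A, B) pair did not exist in the original data.
out["inserted"] = pd.MultiIndex.from_frame(out[["A", "B"]]).isin(
    pd.MultiIndex.from_frame(df[["A", "B"]])
) == False

# Assumed rule: copy C from a neighbouring row in the same A group.
out["C"] = out.groupby("A")["C"].transform(lambda s: s.ffill().bfill())
print(out)
```

Swap the transform for whatever rule the real data needs; the reindex step that exposes the missing rows stays the same.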
