I have a pandas.DataFrame of the form:
index df df1
0 0 111
1 1 111
2 2 111
3 3 111
4 0 111
5 2 111
6 3 111
7 0 111
8 2 111
9 3 111
10 0 111
11 1 111
12 2 111
13 3 111
14 0 111
15 1 111
16 2 111
17 3 111
18 1 111
19 2 111
20 3 111
I want column df to repeat the cycle 0, 1, 2, 3, but some values are missing from the data. I'm trying to fill the gaps by inserting the missing rows, with 0 in df1. Here is my expected result:
index df df1
0 0 111
1 1 111
2 2 111
3 3 111
4 0 111
5 1 0
6 2 111
7 3 111
8 0 111
9 1 0
10 2 111
11 3 111
12 0 111
13 1 111
14 2 111
15 3 111
16 0 111
17 1 111
18 2 111
19 3 111
20 0 0
21 1 111
22 2 111
23 3 111
How can I achieve this?
And what should I do if my input looks like the one below?
index df1 df2
0 0 111
1 1 111
2 2 111
3 3 111
4 0 111
5 3 111
6 1 111
7 2 111
Here is my expected result:
index df1 df2
0 0 111
1 1 111
2 2 111
3 3 111
4 0 111
5 1 0
6 2 0
7 3 111
8 0 0
9 1 111
10 2 111
11 3 0
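To make the question reproducible, the sample data can be built as DataFrames. This is a sketch: it assumes "index" is the DataFrame index rather than a column, and the names `expected` and `df_b` are my own.

```python
import pandas as pd

# First example: column "df" should cycle 0, 1, 2, 3, but some values are missing
df = pd.DataFrame({
    'df': [0, 1, 2, 3, 0, 2, 3, 0, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 1, 2, 3],
    'df1': [111] * 21,
}).rename_axis('index')

# Expected result for the first example: the inserted rows have df1 = 0
expected = pd.DataFrame({
    'df': [0, 1, 2, 3] * 6,
    'df1': [111] * 24,
}).rename_axis('index')
expected.loc[[5, 9, 20], 'df1'] = 0

# Second example, which uses the column names "df1" and "df2"
df_b = pd.DataFrame({
    'df1': [0, 1, 2, 3, 0, 3, 1, 2],
    'df2': [111] * 8,
}).rename_axis('index')
```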
Using @Mozway's idea, combined with some helper functions from pyjanitor, the missing values can be made explicit and then filled. Again, this is just another option:
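# pip install pyjanitor
import pandas as pd
import janitor  # registers .complete() and .select_columns() on DataFrame

(df.assign(temp=df.df.diff().le(0).cumsum())  # group id: a new cycle starts when df stops increasing
   .complete('df', 'temp')                    # pyjanitor: add rows for missing (df, temp) combinations
   .fillna(0)                                 # inserted rows get 0 (df1 becomes float)
   # relevant if you care about the order
   .sort_values('temp', kind='mergesort')     # stable sort keeps the existing df order within each group
   .select_columns('df*')                     # pyjanitor; or .drop(columns='temp')
)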
df df1
0 0 111.0
6 1 111.0
12 2 111.0
18 3 111.0
1 0 111.0
7 1 0.0
13 2 111.0
19 3 111.0
2 0 111.0
8 1 0.0
14 2 111.0
20 3 111.0
3 0 111.0
9 1 111.0
15 2 111.0
21 3 111.0
4 0 111.0
10 1 111.0
16 2 111.0
22 3 111.0
5 0 0.0
11 1 111.0
17 2 111.0
23 3 111.0
You can set a custom grouping to detect when the increasing numbers in "df" reset to a lower (or equal) value.
Then reindex using the product of the unique groups and all expected values of "df" (here range(4)).
Finally, rework the output with a combination of fillna/reset_index/rename_axis:
# uncomment below if "index" is a column rather than the index
# df = df.set_index('index')
# find positions where "df" resets and make groups
groups = df['df'].diff().le(0).cumsum()
(df.set_index([groups, 'df'], drop=True)  # set custom groups and "df" as index
   .reindex(pd.MultiIndex.from_product([groups.unique(),  # reindex with all
                                        range(4),         # combinations
                                        ], names=['group', 'df']))
   .fillna(0)                 # set missing values as zero
   .astype(int)               # NaN made the columns float; restore int (fillna's downcast= is deprecated)
   .reset_index('df')         # all below to restore a range index
   .reset_index(drop=True)
   .rename_axis('index')
)
output:
df df1
index
0 0 111
1 1 111
2 2 111
3 3 111
4 0 111
5 1 0
6 2 111
7 3 111
8 0 111
9 1 0
10 2 111
11 3 111
12 0 111
13 1 111
14 2 111
15 3 111
16 0 111
17 1 111
18 2 111
19 3 111
20 0 0
21 1 111
22 2 111
23 3 111
output on second example (after replacing "df" with "df1" throughout the code):
df1 df2
index
0 0 111
1 1 111
2 2 111
3 3 111
4 0 111
5 1 0
6 2 0
7 3 111
8 0 0
9 1 111
10 2 111
11 3 0
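To see what the grouping step produces, here is a small sketch on the first sample input. It prints the group labels and the (group, df) pairs that the reindex adds as zero rows; `full`, `present` and `missing` are illustrative names, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'df': [0, 1, 2, 3, 0, 2, 3, 0, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 1, 2, 3],
    'df1': [111] * 21,
})

# diff() is <= 0 wherever the sequence drops back (3 -> 0, 3 -> 1, ...).
# The first diff is NaN and NaN <= 0 is False, so the first row is group 0.
groups = df['df'].diff().le(0).cumsum()
print(groups.tolist())
# [0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5]

# Every (group, df) combination the reindex asks for...
full = pd.MultiIndex.from_product([groups.unique(), range(4)], names=['group', 'df'])
# ...versus the combinations actually present in the data
present = pd.MultiIndex.from_arrays([groups, df['df']], names=['group', 'df'])
missing = full.difference(present)
print(sorted((int(g), int(v)) for g, v in missing))
# [(1, 1), (2, 1), (5, 0)]
```

Those three missing pairs are exactly the rows that appear with df1 = 0 in the output.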
You can define groups on the increasing runs of column df, then use .unstack() and .stack(), as follows:
group = df['df'].le(df['df'].shift()).cumsum() # new group if column `df` <= `df` last entry
df_out = (df.set_index([group, 'df']) # set `group` and column `df` as index
.unstack(fill_value=0) # unstack `df` and fill missing entry of `df` in [0,1,2,3] as 0 for `df1`
.stack() # stack back to original shape
.droplevel(0) # drop `group` from index
.reset_index() # restore `df` from index back to data column
)
Result:
print(df_out)
df df1
0 0 111
1 1 111
2 2 111
3 3 111
4 0 111
5 1 0
6 2 111
7 3 111
8 0 111
9 1 0
10 2 111
11 3 111
12 0 111
13 1 111
14 2 111
15 3 111
16 0 111
17 1 111
18 2 111
19 3 111
20 0 0
21 1 111
22 2 111
23 3 111
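The key step is the intermediate wide table produced by `.unstack(fill_value=0)`: one row per group and one column per value of `df`, so a value missing from a group shows up as a 0 cell. A minimal sketch on the first sample input (renaming the group Series to `group` is my own choice, so that the two index levels get distinct names):

```python
import pandas as pd

df = pd.DataFrame({
    'df': [0, 1, 2, 3, 0, 2, 3, 0, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 1, 2, 3],
    'df1': [111] * 21,
})

# Same grouping as in the answer; a new group starts when df <= previous df
group = df['df'].le(df['df'].shift()).cumsum().rename('group')

# One row per group, one column per observed value of `df`; absent cells become 0
wide = df.set_index([group, 'df'])['df1'].unstack(fill_value=0)
print(wide)
```

Groups 1 and 2 get a 0 in column 1 and group 5 gets a 0 in column 0; `.stack()` then turns each of the 6 x 4 cells back into a row, giving the 24 rows shown. Note that `.unstack()` only creates columns for values that occur somewhere in the data.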
For the edited input, use similar code:
group = df['df1'].le(df['df1'].shift()).cumsum()
df_out2 = (df.set_index([group, 'df1'])
.unstack(fill_value=0)
.stack()
.droplevel(0)
.reset_index()
)
Result:
print(df_out2)
df1 df2
0 0 111
1 1 111
2 2 111
3 3 111
4 0 111
5 1 0
6 2 0
7 3 111
8 0 0
9 1 111
10 2 111
11 3 0
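Since the two inputs differ only in the column name, the steps can be wrapped in a function. This is a sketch: `fill_cycle` and its `values` parameter are hypothetical names, and it swaps `.unstack()`/`.stack()` for a `reindex` on the full (group, value) product, as in the reindex answer above. Using `fill_value=0` in `reindex` keeps the integer dtype, and a value is added even if it never appears anywhere in the data.

```python
import pandas as pd


def fill_cycle(df, col, values=range(4)):
    """Hypothetical helper: make every run of `col` cover all of `values`.

    A new run starts whenever `col` does not increase; inserted rows get 0
    in every other column.
    """
    group = df[col].le(df[col].shift()).cumsum().rename('group')
    full = pd.MultiIndex.from_product([group.unique(), values],
                                      names=['group', col])
    return (df.set_index([group, col])
              .reindex(full, fill_value=0)  # add missing (group, value) rows as 0
              .reset_index(col)             # move `col` back to a column
              .reset_index(drop=True)       # discard the helper group level
              .rename_axis('index'))


# First example from the question
df = pd.DataFrame({
    'df': [0, 1, 2, 3, 0, 2, 3, 0, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 1, 2, 3],
    'df1': [111] * 21,
})
print(fill_cycle(df, 'df'))

# Second example, with columns "df1" and "df2"
df_b = pd.DataFrame({'df1': [0, 1, 2, 3, 0, 3, 1, 2], 'df2': [111] * 8})
print(fill_cycle(df_b, 'df1'))
```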