
Pandas dataframe creating multiple rows at once via .loc

I can create a new row in a DataFrame using .loc:

>>> df = pd.DataFrame({'a':[10, 20], 'b':[100,200]}, index='1 2'.split())
>>> df
    a    b
1  10  100
2  20  200
>>> df.loc[3, 'a'] = 30
>>> df
      a      b
1  10.0  100.0
2  20.0  200.0
3  30.0    NaN

But how can I create more than one row using the same method?

>>> df.loc[[4, 5], 'a'] = [40, 50]
...
KeyError: '[4 5] not in index'

I'm familiar with .append(), but I'm looking for a way that does NOT require building each new row as a Series before appending it to df.

Desired input:

>>> df.loc[[4, 5], 'a'] = [40, 50]

Desired output:

      a      b
1  10.0  100.0
2  20.0  200.0
3  30.0    NaN
4  40.0    NaN
5  50.0    NaN

where the last two rows are newly added.

Zhang18 asked May 18 '17 03:05


1 Answer

Admittedly, this is a very late answer, but I have had to deal with a similar problem and think my solution might be helpful to others as well.

After recreating your data (step 1), it is basically a two-step approach: first extend the index so the new labels exist, then assign the values.

  1. Recreate data:

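    import pandas as pd

    # Same data as in the question. The labels '1' and '2' are strings,
    # while the label 3 added via .loc below is an integer.
    df = pd.DataFrame({'a':[10, 20], 'b':[100,200]}, index='1 2'.split())
    df.loc[3, 'a'] = 30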
    
  2. Extend the df.index using .reindex:

    idx = list(df.index)
    new_rows = list(map(str, range(4, 6)))  # easier to extend than new_rows = ["4", "5"]
    idx.extend(new_rows)
    df = df.reindex(index=idx)
    
  3. Set the values using .loc:

    df.loc[new_rows, "a"] = [40, 50]
    

    giving you

    >>> df
          a      b
    1  10.0  100.0
    2  20.0  200.0
    3  30.0    NaN
    4  40.0    NaN
    5  50.0    NaN
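
If you need this more than once, steps 2 and 3 can be wrapped in a small helper. This is only a minimal sketch: `add_rows` is a hypothetical name of my own, not a pandas function, and it assumes that the existing index is unique and that the labels you pass in are not in it yet.

```python
import pandas as pd


def add_rows(df, labels, column, values):
    """Return a copy of df with new index labels appended and one column filled.

    Hypothetical helper, not part of pandas. Assumes df.index is unique
    and that none of `labels` is already present in it.
    """
    labels = list(labels)
    # Step 2: reindex keeps the existing rows and adds all-NaN rows
    # for the new labels.
    out = df.reindex(index=list(df.index) + labels)
    # Step 3: the labels now exist, so list-based .loc assignment works.
    out.loc[labels, column] = values
    return out


df = pd.DataFrame({'a': [10, 20], 'b': [100, 200]}, index='1 2'.split())
df = add_rows(df, ['4', '5'], 'a', [40, 50])
```

Because `reindex` returns a new object, the helper does not modify the DataFrame you pass in, so you have to assign the result back as in the last line.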
    
apitsch answered Oct 19 '22 07:10