Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python - make dataframe's columns consistent with list elements

From what I've read, it's easy to add and delete columns from a DataFrame, but I was wondering if there's already a method to do what I'm trying to achieve, in order to avoid reinventing the wheel.

Suppose I have the DataFrame x:

   a  b   c
0  1  5   8
1  2  6   9
2  3  7  10

I want to verify whether the column names correspond solely to the elements contained in a list l. Case there are less elements in l than columns in x, I want the missing columns to be deleted.

For instance, if l = ["a", "b"], x would become:

   a   c
0  1   8
1  2   9
2  3  10

On the other hand, if there are more elements in l than columns in x, I want to create new, correspondingly named columns, with all the values on that column being set as 0.

For instance, if l = ["a", "b", "c", "d"], x would become:

   a  b   c  d
0  1  5   8  0
1  2  6   9  0
2  3  7  10  0

I could do a loop to check consistency between column names in x and elements in l, but is there anything more efficient than that?

like image 601
Filipe Avatar asked Dec 08 '25 08:12

Filipe


2 Answers

just use (addition of .astype(np.int) thanks to @Bill if needed. Note that this converts the whole dataframe to ints):

df.loc[:, l].fillna(0).astype(np.int)

Case 1:

l = ["a", "b"]
df.loc[:, l].fillna(0).astype(np.int)

    a   b
0   1   5
1   2   6
2   3   7

Case 2:

l = ["a", "b", "c", "d"]
df.loc[:, l].fillna(0).astype(np.int)

    a   b   c   d
0   1   5   8   0
1   2   6   9   0
2   3   7   10  0
like image 190
Quickbeam2k1 Avatar answered Dec 09 '25 22:12

Quickbeam2k1


I think pd.concat might be a way to achieve.

In [47]: import pandas as pd

In [48]: data = {
    ...: 'a': [1, 2, 3],
    ...: 'b': [5, 6, 7],
    ...: 'c': [8, 9, 10]
    ...: }

In [49]: x = pd.DataFrame(data)

In [50]: x
Out[50]: 
   a  b   c
0  1  5   8
1  2  6   9
2  3  7  10

In [51]: l = ["a", "b"]

In [52]: x[l]
Out[52]: 
   a  b
0  1  5
1  2  6
2  3  7

In [53]: l = ["a", "b", "c", "d"]

In [55]: y = pd.DataFrame(columns=l)

In [56]: y
Out[56]: 
Empty DataFrame
Columns: [a, b, c, d]
Index: []

In [57]: pd.concat((x, y))
Out[57]: 
     a    b     c    d
0  1.0  5.0   8.0  NaN
1  2.0  6.0   9.0  NaN
2  3.0  7.0  10.0  NaN

In [58]: pd.concat((x, y)).fillna(0)
Out[58]: 
     a    b     c  d
0  1.0  5.0   8.0  0
1  2.0  6.0   9.0  0
2  3.0  7.0  10.0  0
like image 34
Bill Avatar answered Dec 09 '25 21:12

Bill