Drop columns whose names contain certain strings while reading data in Python

I'm reading .txt files in a directory and want to drop the columns whose names contain certain strings.

for file in glob.iglob(files + '.txt', recursive=True):
    
    cols = list(pd.read_csv(file, nrows =1))
    
    df=pd.read_csv(file,header=0, skiprows=0, skipfooter=0, usecols =[i for i in cols if i.str.contains['TRIVIAL|EASY']==False])

When I do this, I get:

df=pd.read_csv(file,header=0, skiprows=0, skipfooter=0, usecols =[i for i in cols if i.str.contains['PASS']==True])

AttributeError: 'str' object has no attribute 'str'

I could not figure out which part I need to fix.

select columns based on columns names containing a specific string in pandas

drop column based on a string condition

AttributeError: 'str' object has no attribute 'str'

Drop multiple columns that end with certain string in Pandas

Alexander asked Oct 19 '25 01:10


2 Answers

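You can avoid reading the header separately by passing a callable to usecols. pandas calls it with each column name and keeps only the columns for which it returns True. Here the callable checks that neither 'EASY' nor 'TRIVIAL' appears in the column name.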

import pandas as pd

exclu = ['EASY', 'TRIVIAL']  # Any substring in this list excludes a column
# read_csv calls usecols with each column name and keeps those returning True
usecols = lambda x: not any(substr in x for substr in exclu)

df = pd.read_csv('test.csv', usecols=usecols)

print(df)
   HARD  MEDIUM
0     2       4
1     6       8
2     1       1

Sample Data: test.csv

TRIVIAL,HARD,EASYfoo,MEDIUM
1,2,3,4
5,6,7,8
1,1,1,1
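
To connect this back to the directory loop in the question, here is a minimal sketch under a few assumptions: the names keep_column, read_filtered and read_all are hypothetical helpers, the glob pattern is whatever you already pass to glob.iglob, and the sample data is read from an in-memory buffer so the snippet runs without files on disk.

```python
import glob
import io

import pandas as pd

EXCLUDED = ("TRIVIAL", "EASY")  # substrings taken from the question


def keep_column(name):
    """Return True for column names containing none of the excluded substrings."""
    return not any(substr in name for substr in EXCLUDED)


def read_filtered(source):
    """Read one CSV source, letting read_csv filter columns via the callable."""
    return pd.read_csv(source, usecols=keep_column)


def read_all(pattern):
    """Hypothetical helper: map each file matching the glob pattern to its frame."""
    return {path: read_filtered(path)
            for path in glob.iglob(pattern, recursive=True)}


# Demo on the sample data above, using an in-memory buffer instead of a file.
sample = io.StringIO("TRIVIAL,HARD,EASYfoo,MEDIUM\n1,2,3,4\n5,6,7,8\n1,1,1,1\n")
df = read_filtered(sample)
print(list(df.columns))  # ['HARD', 'MEDIUM']
```

Because the filtering happens inside read_csv, each file is opened only once and the unwanted columns are never loaded.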

ALollz answered Oct 21 '25 16:10


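There are a few issues in your code. First, cols is a plain list of column names, so each i is an ordinary Python string, which has no .str accessor; .str.contains works on a pandas Series or Index, not on individual strings or lists. That is what raises the AttributeError. Second, contains has to be called with parentheses, not indexed with square brackets.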

Using regex:

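import re
import pandas as pd

# Read the header plus a single data row, just to get the column names
cols = pd.read_csv(file, nrows=1)

# Keep the columns whose names match neither 'TRIVIAL' nor 'EASY'
cols_to_use = [i for i in cols.columns if not re.search('TRIVIAL|EASY', i)]

df = pd.read_csv(file, header=0, skiprows=0, skipfooter=0, usecols=cols_to_use)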
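
If you would rather keep the .str.contains approach from the question, it does work on the column Index (rather than on each string). The following is a sketch, not a drop-in fix: it assumes the same sample data as the other answer, read from an in-memory string so it runs on its own, and uses nrows=0 to read only the header.

```python
import io

import pandas as pd

# Sample data assumed for illustration; replace io.StringIO(...) with your file path.
sample = "TRIVIAL,HARD,EASYfoo,MEDIUM\n1,2,3,4\n5,6,7,8\n1,1,1,1\n"

# nrows=0 parses only the header row, which gives the column names as an Index.
header = pd.read_csv(io.StringIO(sample), nrows=0).columns

# Index.str.contains treats the pattern as a regex by default; ~ inverts the mask.
mask = ~header.str.contains("TRIVIAL|EASY")
cols_to_use = header[mask].tolist()

df = pd.read_csv(io.StringIO(sample), usecols=cols_to_use)
print(cols_to_use)  # ['HARD', 'MEDIUM']
```

This still reads the header twice per file, so the single-pass callable is likely the better fit when there are many files.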

Umar.H answered Oct 21 '25 16:10