
Remove specific characters in a DataFrame and CSV file

I have a CSV file that I process with a pandas DataFrame.

1) The column called left is supposed to contain only numbers:

  df.iloc[:, 4]
    0       2492
    1       2448
    2       2410
    3       2382
    4       2358
    5       2310
    6       2260
    7       2208
    8       2166
    9       2134
    10       198
    11       198
    12       239
    13       239
    14       243
    15       241
    16       239
    17       394
    18       396
    19       396
    20       396
    21       396
    22       396
    23       396
    24       396
Name: bottom, dtype: object

However, further down in the CSV file I noticed values like 396] or [456. How can I remove all the [ and ] characters in this column?

2) In another column:

df1.iloc[:, 0]
0       'm'
1       'i'
2       'i'
3       'l'
4       'm'
5       'u'
6       'i'
7       'l'
8       'i'
9       'l'
10      '.'
11      '3'
12      'A'
13      'M'
14      'S'
15      'U'
16      'N'
17      'A'
18      'D'
19      'R'
20      'E'
21      'S'
22      'S'
23      'E'
Name: char, dtype: object

I also noticed that some rows contain ['E' or ]'S' rather than 'E' and 'S'. How can I remove the [ and ] there?

3) I have the following data (a list of lists):

df = [['c', 88, 118, 2872, 2902], [], ['g', 8, 98, 287, 202]]

I want to remove all the empty [] entries. The result I'm looking for is:

df = [['c', 88, 118, 2872, 2902], ['g', 8, 98, 287, 202]]
asked Mar 05 '26 12:03 by vincent


1 Answer

I think you can use replace with an empty string if you need to remove the brackets from the values in all columns:

df = df.replace([r'\[', r'\]'], ['', ''], regex=True)

Sample:

import pandas as pd

df = pd.DataFrame({'char': ['[E', 'S]', '[E']})
print(df)
  char
0   [E
1   S]
2   [E

# raw strings avoid invalid-escape warnings for \[ and \]
df = df.replace([r'\[', r'\]'], ['', ''], regex=True)
print(df)
  char
0    E
1    S
2    E
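
Since the data comes from a CSV file, the same replace can be applied right after loading. This is only a minimal sketch: the CSV text below is a made-up stand-in for the real file, every column is read as a string (an assumption, so that values like 396] are kept as text), and a single character class is used in place of the two separate patterns:

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for the asker's file.
raw = "char,left\n[E,396]\nS],[456\nE,2492\n"

# Read every column as a string so the bracketed values stay as text.
df = pd.read_csv(io.StringIO(raw), dtype=str)

# One character class matches either '[' or ']'.
df = df.replace(r'[\[\]]', '', regex=True)

# Write the cleaned table back out as CSV text.
cleaned = df.to_csv(index=False)
print(cleaned)
```

With a real file, the io.StringIO(raw) argument would be the file path, and to_csv would be given an output path.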

If you need to replace in only one column:

df['char'] = df['char'].replace([r'\[', r'\]'], ['', ''], regex=True)
print(df)
  char
0    E
1    S
2    E
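
For the numeric column from the question, the values stay strings after the brackets are removed, so a conversion step is probably wanted as well. A sketch under assumptions: the sample values below are invented to mimic the question's data, and the .str.replace accessor is used here instead of Series.replace:

```python
import pandas as pd

# Hypothetical sample mimicking the question: numbers stored as strings,
# some with a stray bracket attached.
df = pd.DataFrame({'left': ['2492', '396]', '[456'],
                   'char': ["'m'", "['E'", "]'S'"]})

# Remove any '[' or ']' character, then convert the column to a numeric dtype.
df['left'] = pd.to_numeric(df['left'].str.replace(r'[\[\]]', '', regex=True))

# The same pattern removes the brackets in the character column; the quotes are kept.
df['char'] = df['char'].str.replace(r'[\[\]]', '', regex=True)
print(df)
```

If some values could still be non-numeric after cleaning, pd.to_numeric accepts errors='coerce' to turn them into NaN instead of raising.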

To remove empty lists, use a list comprehension:

L = [['c', 88, 118, 2872, 2902], [], ['g', 8, 98, 287, 202]]

# an empty list is falsy, so `if x` keeps only the non-empty sublists
L1 = [x for x in L if x]
print(L1)
[['c', 88, 118, 2872, 2902], ['g', 8, 98, 287, 202]]

And to remove rows that are entirely NaN, use dropna:

df = pd.DataFrame([['c', 88, 118, 2872, 2902], [], ['g', 8, 98, 287, 202]])
print(df)
      0     1      2       3       4
0     c  88.0  118.0  2872.0  2902.0
1  None   NaN    NaN     NaN     NaN
2     g   8.0   98.0   287.0   202.0

print(df.dropna(how='all'))
   0     1      2       3       4
0  c  88.0  118.0  2872.0  2902.0
2  g   8.0   98.0   287.0   202.0
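
Note that after dropna the index keeps its gap (0, 2) and the numeric columns are still floats, because the NaN row forced that dtype. If that matters, something like the following could tidy it up. This is a sketch that assumes columns 1 to 4 hold whole numbers once the empty row is gone; the name clean is only illustrative:

```python
import pandas as pd

df = pd.DataFrame([['c', 88, 118, 2872, 2902], [], ['g', 8, 98, 287, 202]])

# Drop the all-NaN row, then renumber the remaining rows from 0.
clean = df.dropna(how='all').reset_index(drop=True)

# With the NaN row gone, columns 1-4 can be cast back to integers.
clean = clean.astype({1: int, 2: int, 3: int, 4: int})
print(clean)
```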
answered Mar 08 '26 03:03 by jezrael


