
How would I remove multiple values within one column in Python

Tags:

python

pandas

I have a DataFrame named df that looks like this:

Size       ID      File
500 TB     A       200 TB 
200 TB     B       100 TB
600 TB     C       300 TB

Each numeric value and its unit text, which is always 'TB', are stored together in one column. How can I remove the 'TB' text from both columns to get this desired output:

Size       ID      File
500        A       200 
200        B       100 
600        C       300 

This is what I am doing:

import numpy as np
import pandas as pd

df = df[df[","] > ,] 

I am still researching this. Any insight will be helpful.

Lynn asked Dec 23 '22 16:12



1 Answer

  • Apply str.split to the columns with pandas.DataFrame.apply, then select the first element of each list that .split creates with .str[0].
  • This works as long as the pattern in the sample is consistent, i.e. the unwanted text always comes after a space.
  • Using .apply this way applies the lambda function to every column, so every column must hold strings (the .str accessor raises an AttributeError on a numeric column).
    • If ID has values containing spaces, this solution will truncate them as well; avoid that by using .apply only on the columns that need fixing.
      • df[['Size', 'File']] = df[['Size', 'File']].apply(lambda x: x.str.split(' ').str[0])
    • If only one column needs fixing, .apply isn't required.
      • df['Size'] = df['Size'].str.split(' ').str[0]
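A minimal sketch of the ID caveat above, using hypothetical ID values that contain spaces (the question's sample has none): applying the split to every column truncates ID, while restricting .apply to Size and File leaves it intact.

```python
import pandas as pd

# Hypothetical data: unlike the question's sample, ID values contain spaces
df = pd.DataFrame({'Size': ['500 TB', '200 TB', '600 TB'],
                   'ID': ['A 1', 'B 2', 'C 3'],
                   'File': ['200 TB ', '100 TB', '300 TB']})

# Applying to every column also cuts ID down to the text before the first space
everything = df.apply(lambda x: x.str.split(' ').str[0])

# Restricting .apply to the columns that hold 'TB' values leaves ID untouched
cols = ['Size', 'File']
df[cols] = df[cols].apply(lambda x: x.str.split(' ').str[0])

print(everything['ID'].tolist())  # ['A', 'B', 'C']
print(df['ID'].tolist())          # ['A 1', 'B 2', 'C 3']
print(df['File'].tolist())        # ['200', '100', '300']
```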
import pandas as pd

# test dataframe
df = pd.DataFrame({'Size': ['500 TB', '200 TB', '600 TB'], 'ID': ['A', 'B', 'C'], 'File': ['200 TB ', '100 TB', '300 TB']})

# display(df)
#      Size ID     File
# 0  500 TB  A  200 TB 
# 1  200 TB  B   100 TB
# 2  600 TB  C   300 TB

# apply str.split and then select the value at index [0]
df = df.apply(lambda x: x.str.split(' ').str[0])

# display(df)
#   Size ID File
# 0  500  A  200
# 1  200  B  100
# 2  600  C  300
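Note that .str.split returns strings, so the cleaned Size and File values are still text such as '500', not numbers. If numeric columns are needed, one option (not part of the answer above) is to pass the cleaned columns through pd.to_numeric. This sketch assumes every cleaned value is a valid integer.

```python
import pandas as pd

df = pd.DataFrame({'Size': ['500 TB', '200 TB', '600 TB'],
                   'ID': ['A', 'B', 'C'],
                   'File': ['200 TB ', '100 TB', '300 TB']})

cols = ['Size', 'File']

# Same split-and-take-first-element step as above; the results are still strings
df[cols] = df[cols].apply(lambda x: x.str.split(' ').str[0])

# Convert each cleaned column to a numeric dtype; with the default errors='raise',
# any value that cannot be parsed raises a ValueError instead of passing silently
df[cols] = df[cols].apply(pd.to_numeric)

print(df)
print(df.dtypes)
```

If some values might not parse, pd.to_numeric(..., errors='coerce') turns them into NaN instead of raising.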
Trenton McKinney answered Dec 25 '22 06:12
