I have a DataFrame named df that looks like this:
Size ID File
500 TB A 200 TB
200 TB B 100 TB
600 TB C 300 TB
Each numerical value and its unit text, which is always 'TB', are in the same column. How would I transform this and remove the 'TB' text from both columns to get the desired output:
Size ID File
500 A 200
200 B 100
600 C 300
This is what I am doing:
import numpy as np
import pandas as pd
df = df[df[","] > ,]
I am still researching this. Any insight will be helpful.
Apply str.split to the columns with pandas.DataFrame.apply, and then select the first element of each list created by .split with .str[0]. Used this way, .apply applies the lambda function to every column.
If ID has values containing spaces, this solution will truncate them as well. That can be avoided by applying the split only to the columns that need fixing:
df[['Size', 'File']] = df[['Size', 'File']].apply(lambda x: x.str.split(' ').str[0])
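To check that restricting the split to Size and File really leaves ID alone, here is a small sketch with hypothetical test data in which every ID contains a space (the data is invented for illustration, not taken from the question):

```python
import pandas as pd

# hypothetical test data: each ID contains a space, which the
# all-columns version would cut down to its first word
df = pd.DataFrame({'Size': ['500 TB', '200 TB', '600 TB'],
                   'ID': ['A 1', 'B 2', 'C 3'],
                   'File': ['200 TB', '100 TB', '300 TB']})

# split only the columns that carry a unit; ID is not touched
cols = ['Size', 'File']
df[cols] = df[cols].apply(lambda x: x.str.split(' ').str[0])

print(df)
```

ID keeps its full values ('A 1', 'B 2', 'C 3'), while Size and File are reduced to the number strings.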
If only one column needs fixing, .apply isn't required:
df['Size'] = df['Size'].str.split(' ').str[0]
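Note that .str[0] returns strings such as '500', not numbers. A minimal sketch for the single-column case that also converts the result with pd.to_numeric, assuming numeric values are wanted (the question's desired output does not say either way):

```python
import pandas as pd

df = pd.DataFrame({'Size': ['500 TB', '200 TB', '600 TB']})

# one column, so the Series .str accessor is enough (no .apply);
# split() with no argument splits on any run of whitespace
df['Size'] = df['Size'].str.split().str[0]

# the split leaves strings like '500'; convert them to numbers
df['Size'] = pd.to_numeric(df['Size'])

print(df['Size'].tolist())  # [500, 200, 600]
```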
import pandas as pd

# test dataframe
df = pd.DataFrame({'Size': ['500 TB', '200 TB', '600 TB'], 'ID': ['A', 'B', 'C'], 'File': ['200 TB ', '100 TB', '300 TB']})

# display(df)
#      Size ID     File
# 0  500 TB  A   200 TB
# 1  200 TB  B   100 TB
# 2  600 TB  C   300 TB

# apply str.split and then select the value at index [0]
df = df.apply(lambda x: x.str.split(' ').str[0])

# display(df)
#   Size ID File
# 0  500  A  200
# 1  200  B  100
# 2  600  C  300
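A different technique, not part of the original answer: strip the unit with a regular expression via DataFrame.replace and cast the result to int in the same step. This sketch assumes, as the question states, that the unit is always 'TB' at the end of the value:

```python
import pandas as pd

df = pd.DataFrame({'Size': ['500 TB', '200 TB', '600 TB'],
                   'ID': ['A', 'B', 'C'],
                   'File': ['200 TB ', '100 TB', '300 TB']})

cols = ['Size', 'File']
# remove optional whitespace, the literal 'TB', and any trailing whitespace,
# then convert the remaining digits to integers
df[cols] = df[cols].replace(r'\s*TB\s*$', '', regex=True).astype(int)

print(df)
```

Because the pattern is anchored to the end of the string, values in ID are unaffected even if they contain spaces, and the trailing space in '200 TB ' is handled by the final \s*.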