I have a two-column data in a text file, eg as follows.
Balkrishna Industries Ltd. Auto Ancillaries 3.54
Aurobindo Pharma Ltd. Pharmaceuticals 3.36
NIIT Technologies Ltd. Software 3.31
Sonata Software Ltd. Software 3.21
When I tried to read this in Pandas, I get an error as the space is a delimiter, and the company names are not restricted to a single column. How do I modify my code to separate this data into two columns, one for the name and one for the number?
import numpy as np
import pandas as pd
data = pd.read_csv('file.txt', sep=" ", header=None)
data.columns = ["Name", "Fraction"]
print(data)
Using Regex Lookbehind & Lookahead sep="(?<=\w) (?=\d)"
Ex:
import pandas as pd
df = pd.read_csv(filename, sep="(?<=\w) (?=\d)", names=["Name", "Fraction"])
print(df)
Output:
Name Fraction
0 Balkrishna Industries Ltd. Auto Ancillaries 3.54
1 Aurobindo Pharma Ltd. Pharmaceuticals 3.36
2 NIIT Technologies Ltd. Software 3.31
3 Sonata Software Ltd. Software 3.21
Another approach, read the file in as one column (use a sep character that doesn't exist in the file - such as |).
Then use Series.str.rsplit, with n=1 and expand=True arguments, to split the string, on whitespace, from the right, with only 1 partition, returned as a DataFrame with 2 columns:
df = pd.read_csv('file.txt', sep='|', header=None)
df = df[0].str.rsplit(' ', n=1, expand=True)
df.columns = ["Name", "Fraction"]
[out]
Name Fraction
0 Balkrishna Industries Ltd. Auto Ancillaries 3.54
1 Aurobindo Pharma Ltd. Pharmaceuticals 3.36
2 NIIT Technologies Ltd. Software 3.31
3 Sonata Software Ltd. Software 3.21
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With