I am using pandas.read_csv to read a whitespace delimited file. The file has a variable number of whitespace characters in front of every line (the numbers are right-aligned). When I read this file, it creates a column of NaN. Why does this happen, and what is the best way to prevent it?
Example:
Text file:
9.0 3.3 4.0
32.3 44.3 5.1
7.2 1.1 0.9
Command:
import pandas as pd
pd.read_csv("test.txt",delim_whitespace=True,header=None)
Output:
0 1 2 3
0 NaN 9.0 3.3 4.0
1 NaN 32.3 44.3 5.1
2 NaN 7.2 1.1 0.9
FWIW I tend to use \s+
instead, and it doesn't suffer the same problem:
>>> pd.read_csv("wspace.csv", header=None, delim_whitespace=True)
0 1 2 3
0 NaN 9.0 3.3 4.0
1 NaN 32.3 44.3 5.1
2 NaN 7.2 1.1 0.9
>>> pd.read_csv("wspace.csv", header=None, sep=r"\s+")
0 1 2
0 9.0 3.3 4.0
1 32.3 44.3 5.1
2 7.2 1.1 0.9
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With