Read data (.dat file) with Pandas

Tags: python, pandas, dataframe

How do I read the following two-column data (from a .dat file) with Pandas?

TIME                      XGSM
2004 006 01 00 01 37 600  1
2004 006 01 00 02 32 800  5
2004 006 01 00 03 28 000  8
2004 006 01 00 04 23 200  11
2004 006 01 00 05 18 400  17

Column separator is (at least) 2 spaces.

I tried

import pandas as pd

df = pd.read_table("test.dat", sep=r"\s+", usecols=['TIME', 'XGSM'])
print(df)

But it prints

TIME  XGSM
2004  6
2004  6
2004  6
2004  6
2004  6

asked Dec 07 '16 19:12

KcFnMi

3 Answers

You can pass the column positions to the usecols parameter:

import pandas as pd
from io import StringIO

temp=u"""TIME             XGSM
2004 006 01 00 01 37 600  1
2004 006 01 00 02 32 800  5
2004 006 01 00 03 28 000  8
2004 006 01 00 04 23 200  11
2004 006 01 00 05 18 400  17"""
# after testing, replace StringIO(temp) with the filename
df = pd.read_csv(StringIO(temp),
                 sep=r"\s+",
                 skiprows=1,       # skip the original two-name header row
                 usecols=[0, 7],   # first and last whitespace-separated fields
                 names=['TIME', 'XGSM'])

print(df)
   TIME  XGSM
0  2004     1
1  2004     5
2  2004     8
3  2004    11
4  2004    17

Edit:

You can also use a regex separator that matches two or more spaces. Add engine='python' to avoid this warning:

ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.

import pandas as pd
from io import StringIO

temp=u"""TIME              XGSM
2004 006 01 00 01 37 600   1
2004 006 01 00 02 32 800   5
2004 006 01 00 03 28 000   8
2004 006 01 00 04 23 200   11
2004 006 01 00 05 18 400   17"""
# after testing, replace StringIO(temp) with the filename
df = pd.read_csv(StringIO(temp), sep=r'\s{2,}', engine='python')

print(df)
                       TIME  XGSM
0  2004 006 01 00 01 37 600     1
1  2004 006 01 00 02 32 800     5
2  2004 006 01 00 03 28 000     8
3  2004 006 01 00 04 23 200    11
4  2004 006 01 00 05 18 400    17
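
To see why the choice of separator matters, here is a small sketch that applies both regexes to one data row with Python's re module. The variable names are only for illustration; pandas does its own parsing internally, so treat this as a way to check the patterns, not as what read_csv literally runs.

```python
import re

# One data row from the question: single spaces inside the timestamp,
# two spaces before the XGSM value.
line = "2004 006 01 00 01 37 600  1"

# Any run of whitespace is a separator: the timestamp is split apart.
any_whitespace = re.split(r"\s+", line)
print(any_whitespace)
# ['2004', '006', '01', '00', '01', '37', '600', '1']

# Only runs of two or more whitespace characters are separators:
# the timestamp stays in one piece.
two_or_more = re.split(r"\s{2,}", line)
print(two_or_more)
# ['2004 006 01 00 01 37 600', '1']
```

With \s+ each row has 8 fields while the header has only 2 names, which is why the columns do not line up the way the question expects.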


answered Oct 23 '22 19:10

jezrael

You could also try pd.read_fwf() (reads a table of fixed-width formatted lines into a DataFrame):

import pandas as pd
from io import StringIO

pd.read_fwf(StringIO("""TIME                      XGSM
2004 006 01 00 01 37 600  1
2004 006 01 00 02 32 800  5
2004 006 01 00 03 28 000  8
2004 006 01 00 04 23 200  11
2004 006 01 00 05 18 400  17"""), usecols = ["TIME", "XGSM"])

#   TIME    XGSM
#0  2004    1
#1  2004    5
#2  2004    8
#3  2004    11
#4  2004    17
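
If you want the whole timestamp kept in the TIME column, one option is to give read_fwf explicit column boundaries through its colspecs parameter. This is only a sketch: the boundaries (0, 24) and (26, 30) are an assumption read off the sample data, and the header is padded so that XGSM starts at character 26. Adjust both for your real file.

```python
from io import StringIO

import pandas as pd

# Sample data rebuilt so the alignment is explicit: the timestamp fills
# characters 0-23, followed by two spaces, and XGSM starts at character 26.
header = "TIME" + " " * 22 + "XGSM"
rows = [
    "2004 006 01 00 01 37 600  1",
    "2004 006 01 00 02 32 800  5",
    "2004 006 01 00 03 28 000  8",
    "2004 006 01 00 04 23 200  11",
    "2004 006 01 00 05 18 400  17",
]
data = "\n".join([header] + rows)

# Half-open [start, end) character ranges for each column (assumed widths).
colspecs = [(0, 24), (26, 30)]

# After testing, replace StringIO(data) with the filename.
df = pd.read_fwf(StringIO(data), colspecs=colspecs)
print(df)
```

Here TIME comes back as a string column holding the full timestamp, and XGSM as integers.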

answered Oct 23 '22 19:10

Psidom

I ran into the same problem when importing a file with a lot of whitespace. I solved it with

pd.read_fwf(file_name)

If your file is fixed-width text, read_fwf may be the solution, and you can pass the file name directly without needing StringIO.
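
As a rough illustration of reading straight from a path, the sketch below writes the sample data to a temporary file (the name test.dat and the temporary directory are just for the demo) and reads it back with read_fwf. Note that with the default inferred column boundaries, the single spaces inside the timestamp are treated as column breaks too, so TIME only holds the year, as in the answer above.

```python
import os
import tempfile

import pandas as pd

lines = [
    "TIME" + " " * 22 + "XGSM",
    "2004 006 01 00 01 37 600  1",
    "2004 006 01 00 02 32 800  5",
    "2004 006 01 00 03 28 000  8",
    "2004 006 01 00 04 23 200  11",
    "2004 006 01 00 05 18 400  17",
]

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name, created only for this demo.
    path = os.path.join(tmp_dir, "test.dat")
    with open(path, "w") as fh:
        fh.write("\n".join(lines) + "\n")

    # Pass the path directly; no StringIO is needed for a real file.
    df = pd.read_fwf(path)

# Keep only the two named columns; the in-between columns have no header.
result = df[["TIME", "XGSM"]]
print(result)
```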

answered Oct 23 '22 17:10

Suraj
