Reading and doing calculations from a .dat file in Python

Tags:

python

csv

I need to read a .dat file in Python that has 12 columns and millions of rows. For my calculation I need to divide columns 2, 3 and 4 by column 1. Do I need to delete the unwanted columns before loading the file? If not, how do I select just those columns and have Python do the math?

An example of the .dat file is data.dat.

I am new to Python, so some guidance on opening the file, reading it, and doing the calculation would be appreciated.

Here is the code I am using as a starting point, based on your suggestion:

from sys import argv

import pandas as pd



script, filename = argv

txt = open(filename)

print "Here's your file %r:" % filename
print txt.read()

def your_func(row):
    return row['x-momentum'] / row['mass']

columns_to_keep = ['mass', 'x-momentum']
dataframe = pd.read_csv('~/Pictures', delimiter="," , usecols=columns_to_keep)
dataframe['new_column'] = dataframe.apply(your_func, axis=1)

And this is the error I get when I run it:

Traceback (most recent call last):
  File "flash.py", line 18, in <module>
    dataframe = pd.read_csv('~/Pictures', delimiter="," , usecols=columns_to_keep)
  File "/home/trina/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 529, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/trina/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 295, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/trina/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 612, in __init__
    self._make_engine(self.engine)
  File "/home/trina/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 747, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/home/trina/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 1119, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "pandas/parser.pyx", line 518, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:5030)
ValueError: No columns to parse from file


asked Jun 21 '16 23:06

bhjghjh

2 Answers

After looking at your flash.dat file, it's clear you need to do a little clean up before you process it. The following code converts it to a CSV file:

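import csv

# Read flash.dat into a list of rows, splitting each line on whitespace
with open("./flash.dat") as dat_file:
    datContent = [line.strip().split() for line in dat_file]

# Write the rows out as a new CSV file
# ("wb" is the Python 2 mode; on Python 3 use open("./flash.csv", "w", newline=""))
with open("./flash.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerows(datContent)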
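
With millions of rows, it may help to convert the file one line at a time instead of holding every row in a list first. Here is a minimal sketch of that idea, written for Python 3. The helper name dat_to_csv and the tiny sample file are made up for illustration; the real flash.dat is assumed to be whitespace-separated.

```python
import csv
import os
import tempfile


def dat_to_csv(dat_path, csv_path):
    """Stream a whitespace-separated file to CSV, one line at a time."""
    with open(dat_path) as src, open(csv_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for line in src:
            fields = line.split()
            if fields:  # skip blank lines
                writer.writerow(fields)


# Demo on a small, made-up sample file (a stand-in for flash.dat).
workdir = tempfile.mkdtemp()
dat_path = os.path.join(workdir, "sample.dat")
csv_path = os.path.join(workdir, "sample.csv")

with open(dat_path, "w") as f:
    f.write("#time   mass  x-momentum\n0.0  2.0   4.0\n\n1.0  4.0   2.0\n")

dat_to_csv(dat_path, csv_path)

with open(csv_path) as f:
    converted = f.read()
print(converted)
```

Only one line is in memory at any moment, so the size of the input file should not matter much for this step.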

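Now use pandas to compute the new column.

import pandas as pd

def your_func(row):
    # Divide one row's x-momentum by its mass
    return row['x-momentum'] / row['mass']

# Load only the columns needed for the calculation
columns_to_keep = ['#time', 'x-momentum', 'mass']
dataframe = pd.read_csv("./flash.csv", usecols=columns_to_keep)

# Apply the function row by row (axis=1) and store the result
dataframe['new_column'] = dataframe.apply(your_func, axis=1)

print(dataframe)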
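
As a possible alternative, pandas can probably read the whitespace-separated file directly and divide whole columns at once, which tends to be much faster than a row-by-row apply on millions of rows. This is only a sketch under assumptions: the in-memory sample stands in for flash.dat, and the column names y-momentum and z-momentum are guesses (only #time, mass and x-momentum are known from the file).

```python
import io

import pandas as pd

# Made-up stand-in for flash.dat: whitespace-separated with a header line.
# "y-momentum" and "z-momentum" are assumed names, not taken from the real file.
sample = io.StringIO(
    "#time mass x-momentum y-momentum z-momentum\n"
    "0.0 2.0 4.0 6.0 8.0\n"
    "1.0 4.0 2.0 4.0 8.0\n"
)

momenta = ["x-momentum", "y-momentum", "z-momentum"]

# sep=r"\s+" splits on any run of whitespace, so no CSV conversion is needed;
# usecols loads only the listed columns.
df = pd.read_csv(sample, sep=r"\s+", usecols=["mass"] + momenta)

# Divide every momentum column by the mass column, row by row, in one call.
velocities = df[momenta].div(df["mass"], axis=0)
velocities.columns = ["x-velocity", "y-velocity", "z-velocity"]
print(velocities)
```

For the real file, replace sample with the path to the .dat file and adjust the column names to match its header.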


answered Nov 06 '22 05:11

Bill

train = pd.read_csv("Path", sep=" ::", header=None)

This loads the .dat file into a DataFrame. Set sep to whatever delimiter your file actually uses.

train.columns = ["A", "B", "C"]  # one name per column in the .dat file

After that you can work with it just like a CSV file.
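
If the file is too large to load comfortably in one go, the same idea can be combined with the chunksize option of read_csv. The sketch below makes several assumptions: the data is whitespace-separated rather than " ::"-separated, it has no header line, and the column names A to D and the tiny sample are invented for illustration.

```python
import io

import pandas as pd

# Made-up headerless sample; the first column is the divisor.
sample = io.StringIO(
    "2.0 4.0 6.0 8.0\n"
    "4.0 2.0 4.0 8.0\n"
    "5.0 10.0 15.0 20.0\n"
)
names = ["A", "B", "C", "D"]  # hypothetical names, one per column

pieces = []
# chunksize makes read_csv yield DataFrames of at most that many rows.
for chunk in pd.read_csv(sample, sep=r"\s+", header=None, names=names, chunksize=2):
    # Divide columns B, C and D by column A within this chunk.
    pieces.append(chunk[["B", "C", "D"]].div(chunk["A"], axis=0))

result = pd.concat(pieces)
print(result)
```

A realistic chunksize would be far larger than 2; the small value here just shows that the result is the same when the file is read in several pieces.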

answered Nov 06 '22 03:11

Nisarg Bhatt
