I have a problem reading a CSV (or txt) file with the pandas module. Because numpy's loadtxt function takes too much time, I decided to use pandas' read_csv instead.
I want to make a numpy array from a txt file that has four space-separated columns and a very large number of rows (like 256^3; in this example, it is 64^3).
The problem is that, for some reason, pandas' read_csv always seems to skip the first line (first row) of the csv (txt) file, resulting in one less row of data.
Here is the code:
    from __future__ import division
    import numpy as np
    import pandas as pd

    ngridx = 4
    ngridy = 4
    ngridz = 4
    size = ngridx*ngridy*ngridz

    # Columns 0-2 hold the grid indices, column 3 holds random values.
    f = np.zeros((size, 4))
    a = np.arange(size)
    f[:, 0] = np.floor_divide(a, ngridy*ngridz)
    f[:, 1] = np.fmod(np.floor_divide(a, ngridz), ngridy)
    f[:, 2] = np.fmod(a, ngridz)
    f[:, 3] = np.random.rand(size)
    print(f[0])

    # Write the array to a text file, then read it back with pandas.
    np.savetxt('Testarray.txt', f, fmt='%6.16f')
    g = pd.read_csv('Testarray.txt', delimiter=' ').values
    print(g[0])
    print(len(g[:, 3]))
The f[0] and g[0] displayed in the output should match, but they don't, indicating that pandas is skipping the first line of Testarray.txt. Also, the loaded array g is one row shorter than the array f.
I need help.
Thanks in advance.
Step 1: To read rows in Python, first load the CSV file into a file object with open().
Step 2: Create a reader object by passing that file object to csv.reader().
Step 3: Loop over the reader object with a for loop to get each row.
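The three steps above can be sketched with the standard-library csv module. The file name sample.txt and its contents are made up for illustration, and the space delimiter is an assumption chosen to match the file in the question.

```python
import csv
import os
import tempfile

rows = []
with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample file, written here so the example is self-contained.
    path = os.path.join(tmp, "sample.txt")
    with open(path, "w") as fh:
        fh.write("0 0 0 0.5\n0 0 1 0.25\n")

    # Step 1: open the file (newline="" is what the csv docs recommend).
    with open(path, newline="") as fh:
        # Step 2: wrap the file object in a reader (space-delimited here).
        reader = csv.reader(fh, delimiter=" ")
        # Step 3: iterate; each row comes back as a list of strings.
        for row in reader:
            rows.append(row)

print(rows[0])
print(len(rows))
```

Note that csv.reader treats every line as data, so nothing is consumed as a header, and the values stay strings until you convert them.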
You can use df.head(N) to get the first N rows of a pandas DataFrame. Alternatively, you can pass a negative number, df.head(-N), to get all the rows except the last N.
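A quick illustration of both forms of head(), using a small made-up DataFrame (the column name x and its values are only for demonstration):

```python
import pandas as pd

# Hypothetical five-row frame.
df = pd.DataFrame({"x": [10, 20, 30, 40, 50]})

first_two = df.head(2)          # the first 2 rows
all_but_last_two = df.head(-2)  # every row except the last 2

print(first_two)
print(all_but_last_two)
```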
By default, pd.read_csv uses header=0 (when the names parameter is also not specified), which means the first (i.e. 0th-indexed) line is interpreted as column names.
If your data has no header, then use

    pd.read_csv(..., header=None)
For example,
    import io
    import sys
    import pandas as pd

    if sys.version_info.major == 3:
        # Python3
        StringIO = io.StringIO
    else:
        # Python2
        StringIO = io.BytesIO

    text = '''\
    1 2 3
    4 5 6
    '''

    print(pd.read_csv(StringIO(text), sep=' '))
Without header, the first line, 1 2 3, sets the column names:

       1  2  3
    0  4  5  6
With header=None, the first line is treated as data:

    print(pd.read_csv(StringIO(text), sep=' ', header=None))

prints

       0  1  2
    0  1  2  3
    1  4  5  6
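Applied to the situation in the question, a minimal sketch might look like the following. The 8x4 array is a small stand-in for the real data, and the file is written to a temporary directory (an assumption made only to keep the example self-contained); the format string and the .values conversion are taken from the question.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Small stand-in for the question's array: 8 rows, 4 columns.
f = np.arange(32, dtype=float).reshape(8, 4)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Testarray.txt")
    np.savetxt(path, f, fmt="%6.16f")  # same format string as in the question

    # Default behaviour: the first line is consumed as column names.
    g_default = pd.read_csv(path, sep=" ").values

    # header=None keeps the first line as data.
    g = pd.read_csv(path, sep=" ", header=None).values

print(g_default.shape)    # one row short
print(g.shape)            # same shape as f
print(np.allclose(f, g))  # contents match
```

With header=None, pandas assigns integer column labels (0, 1, 2, 3), which do not matter once the frame is converted to a numpy array.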