I have a file of tab-separated values where the first half of the file has 3 columns and N rows and the second half has 2 columns and M rows. I need to read such a file into two separate arrays: one of shape N×3 and one of shape M×2.
Example:
6.7900209022264466 -3.8259897286289504 13.563976248832137
1.5334543760683907 12.723711617874176 1.5148291755004299
2.4282763900233522 9.1305022788201136 -3.1003673775485394
-6.5344717544805586E-002 -12.487743380186622 2.6928902187606480
8.9067951331740804 13.403331728374390 -0.58045132774289632
-11.842481592786449 -5.7083783211328551 1.9526760053685255
-10.240286781275808 13.204312088815593 4.4856524683466175
-4.6690658488407504 -6.2809313597959449 7.4378900284937082
-9.5874077836478282 -8.6799071183782903 -1.8203838010218165
0.62588896716878051 -5.4614995295716540 11.166650096421838
0 4173
0 1998
0 611
0 8606
1 6912
1 9671
1 7993
1 8513
2 5556
2 4422
2 3047
I cannot simply use loadtxt() to read such a file, because this results in the error ValueError: Wrong number of columns at line .... Is there a way to use loadtxt() or some similar function to read such a file? I would like to avoid using readlines() and split() and then converting to float, because this would make the code slower (I think...) and longer. I have also tried pandas.read_csv(), but I need an array as output.
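(If N is known in advance, one idea I have considered is two loadtxt() calls with skiprows/max_rows, sketched below; max_rows requires NumPy 1.16+, and this reads the file twice.)

import numpy as np

# Hypothetical: N is the known number of rows in the 3-column block.
N = 10
a = np.loadtxt("filename.txt", max_rows=N)             # shape (N, 3), floats
b = np.loadtxt("filename.txt", skiprows=N, dtype=int)  # shape (M, 2), ints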
Update:
For now, following hpaulj's suggestion, I'm doing it like this, using readlines() and split():
with open(filename,"r") as f:
all_data=[x.split() for x in f.readlines()]
a=array([map(float,x) for x in all_data[:N]])
b=array([map(int,x) for x in all_data[N+1:]])
It is actually pretty fast, but I would still like to know if someone knows a faster (and maybe simpler) method.
I would recommend using pandas.read_csv() and then obtaining the numpy array using the .values attribute of the DataFrame (see the documentation).
import pandas as pd

# Whitespace-separated, no header row; the short 2-column rows are padded with NaN.
df = pd.read_csv("filename.txt", sep=r"\s+", header=None)
array_values = df.values
Right now, if you just use .values, you will get nan for the missing values in the 2-column rows. You can determine N and M by checking which row indices contain nan.
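A minimal sketch of that split, assuming the layout from the question (the third field is missing exactly in the 2-column half):

import pandas as pd

df = pd.read_csv("filename.txt", sep=r"\s+", header=None)
mask = df[2].isna()                      # True for rows from the 2-column half
a = df[~mask].values                     # (N, 3) float array
b = df[mask][[0, 1]].values.astype(int)  # (M, 2) int array
N, M = len(a), len(b)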