I am reading a whitespace-separated dataset from a file. I need to store every column except the last one in the array data, and the last column in the array target.
How should I proceed?
This is what I have so far:
with open(filename) as f:
    data = f.readlines()
Or should I be reading line by line?
PS: The columns also have different data types.
Edit: Sample Data
faban 1 0 0.288 withspy
faban 2 0 0.243 withoutspy
simulated 1 0 0.159 withoutspy
faban 1 1 0.189 withoutspy
This would work:
data = []
target = []
with open('faban.txt') as fobj:
    for line in fobj:
        row = line.split()
        data.append(row[:-1])
        target.append(row[-1])
Now:
>>> data
[['faban', '1', '0', '0.288'],
 ['faban', '2', '0', '0.243'],
 ['simulated', '1', '0', '0.159'],
 ['faban', '1', '1', '0.189']]
>>> target
['withspy', 'withoutspy', 'withoutspy', 'withoutspy']
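Since the question mentions that the columns have different types, the same loop can convert each field as it is read. This is only a sketch: the converters list is an assumption inferred from the sample data (string, int, int, float), and io.StringIO stands in for the real file so the example runs on its own.

```python
import io

# Stand-in for the real file; the rows are the sample data from the question.
sample = io.StringIO(
    "faban 1 0 0.288 withspy\n"
    "faban 2 0 0.243 withoutspy\n"
    "simulated 1 0 0.159 withoutspy\n"
    "faban 1 1 0.189 withoutspy\n"
)

# Assumed types of the feature columns, inferred from the sample rows.
converters = [str, int, int, float]

data = []
target = []
for line in sample:
    row = line.split()
    if not row:  # skip blank lines, where row[-1] would raise IndexError
        continue
    # Convert every field except the last with its matching converter.
    data.append([convert(field) for convert, field in zip(converters, row[:-1])])
    target.append(row[-1])

print(data[0])    # ['faban', 1, 0, 0.288]
print(target[0])  # withspy
```

With a real file, replace sample with the file object from open(...). Reading line by line like this also avoids holding the whole file in memory the way readlines() does.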
I think numpy has a clean, easy solution here.
>>> import numpy as np
>>> data, target = np.array_split(np.loadtxt('file', dtype=str), [-1], axis=1)
results in:
>>> data.tolist()
[['faban', '1', '0', '0.288'],
 ['faban', '2', '0', '0.243'],
 ['simulated', '1', '0', '0.159'],
 ['faban', '1', '1', '0.189']]
>>> target.flatten().tolist()
['withspy', 'withoutspy', 'withoutspy', 'withoutspy']
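Note that with dtype=str every value stays a string. If numeric features are needed, the numeric columns can be cast afterwards. The sketch below assumes columns 1 to 3 are numeric, as in the sample data, and uses io.StringIO in place of the real file.

```python
import io

import numpy as np

# Stand-in for the real file; the rows are the sample data from the question.
sample = io.StringIO(
    "faban 1 0 0.288 withspy\n"
    "faban 2 0 0.243 withoutspy\n"
    "simulated 1 0 0.159 withoutspy\n"
    "faban 1 1 0.189 withoutspy\n"
)

rows = np.loadtxt(sample, dtype=str)

# Split off the last column: data is rows[:, :-1], target is rows[:, -1:].
data, target = np.array_split(rows, [-1], axis=1)

# Assumption: columns 1-3 hold numbers, so they can be cast to float.
numeric = data[:, 1:].astype(float)
labels = target.ravel()  # flatten the (n, 1) column into a 1-D array

print(numeric.shape)  # (4, 3)
print(labels[0])      # withspy
```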
You could do that with pandas: use read_table to read the data, iloc to select the columns, values to get the underlying NumPy array from the DataFrame, and tolist to convert that array to a list:
import pandas as pd
# sep=r'\s+' splits on runs of whitespace (delim_whitespace=True is deprecated since pandas 2.2)
df = pd.read_table('path_to_your_file', sep=r'\s+', header=None)
print(df)
           0  1  2      3           4
0      faban  1  0  0.288     withspy
1      faban  2  0  0.243  withoutspy
2  simulated  1  0  0.159  withoutspy
3      faban  1  1  0.189  withoutspy
data = df.iloc[:,:-1].values.tolist()
target = df.iloc[:,-1].tolist()
print(data)
[['faban', 1, 0, 0.28800000000000003],
 ['faban', 2, 0, 0.243],
 ['simulated', 1, 0, 0.159],
 ['faban', 1, 1, 0.18899999999999997]]
print(target)
['withspy', 'withoutspy', 'withoutspy', 'withoutspy']
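Because the columns have different types, it may be worth keeping the features as a DataFrame instead of converting to nested lists, since pandas infers a dtype per column. A minimal sketch, assuming the sample data from the question and using io.StringIO in place of the real path:

```python
import io

import pandas as pd

# Stand-in for the real file; the rows are the sample data from the question.
sample = io.StringIO(
    "faban 1 0 0.288 withspy\n"
    "faban 2 0 0.243 withoutspy\n"
    "simulated 1 0 0.159 withoutspy\n"
    "faban 1 1 0.189 withoutspy\n"
)

# sep=r"\s+" treats any run of whitespace as one delimiter.
df = pd.read_csv(sample, sep=r"\s+", header=None)

features = df.iloc[:, :-1]  # every column except the last
target = df.iloc[:, -1]     # the last column, as a Series

# Each column keeps its own inferred dtype (text, integer, float).
print(features.dtypes)
print(target.tolist())
```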