I am reading a dataset (separated by whitespace) from a file. I need to store all columns apart from the last one in the array data, and the last column in the array target.
Can you guide me on how to proceed?
That's what I have so far:
with open(filename) as f:
    data = f.readlines()
Or should I be reading line by line?
PS: The columns also have different data types.
Edit: Sample Data
faban 1 0 0.288 withspy
faban 2 0 0.243 withoutspy
simulated 1 0 0.159 withoutspy
faban 1 1 0.189 withoutspy
This would work:
data = []
target = []
with open('faban.txt') as fobj:
    for line in fobj:
        row = line.split()          # split on any whitespace
        data.append(row[:-1])       # every column except the last
        target.append(row[-1])      # the last column only
Now:
>>> data
[['faban', '1', '0', '0.288'],
['faban', '2', '0', '0.243'],
['simulated', '1', '0', '0.159'],
['faban', '1', '1', '0.189']]
>>> target
['withspy', 'withoutspy', 'withoutspy', 'withoutspy']
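The question also notes that the columns have different types, while the lists above hold only strings. One option is to convert each field right after splitting. This is a minimal sketch that assumes, from the four sample rows alone, that the feature columns are str, int, int, float; `CONVERTERS` and `parse_rows` are hypothetical names, and the inline sample stands in for the file.

```python
# Assumed column types, inferred from the sample rows only (hypothetical).
CONVERTERS = (str, int, int, float)


def parse_rows(lines):
    """Split whitespace-separated lines into typed feature rows and labels."""
    data, target = [], []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        *features, label = fields
        # Apply each column's converter to the matching field.
        data.append([conv(value) for conv, value in zip(CONVERTERS, features)])
        target.append(label)
    return data, target


sample = """faban 1 0 0.288 withspy
faban 2 0 0.243 withoutspy
simulated 1 0 0.159 withoutspy
faban 1 1 0.189 withoutspy
"""

data, target = parse_rows(sample.splitlines())
print(data[0])  # ['faban', 1, 0, 0.288]
print(target)
```

With a real file, `parse_rows` can take the open file object directly, e.g. `with open('faban.txt') as fobj: data, target = parse_rows(fobj)`. Note that `zip` stops at the shorter sequence, so a row with extra columns would be silently truncated rather than rejected.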
I think NumPy has a clean, easy solution here.
>>> import numpy as np
>>> data, target = np.array_split(np.loadtxt('file', dtype=str), [-1], axis=1)
results in:
>>> data.tolist()
[['faban', '1', '0', '0.288'],
['faban', '2', '0', '0.243'],
['simulated', '1', '0', '0.159'],
['faban', '1', '1', '0.189']]
>>> target.flatten().tolist()
['withspy', 'withoutspy', 'withoutspy', 'withoutspy']
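`np.loadtxt` with `dtype=str` keeps every value as a string. If typed columns are wanted, `np.genfromtxt` with `dtype=None` guesses a type per column and returns a structured array instead. A sketch, with the inline sample standing in for the file; the field names `f0` to `f4` are the defaults genfromtxt assigns when no names are given.

```python
import io

import numpy as np

# Stand-in for the real file.
sample = io.StringIO(
    "faban 1 0 0.288 withspy\n"
    "faban 2 0 0.243 withoutspy\n"
    "simulated 1 0 0.159 withoutspy\n"
    "faban 1 1 0.189 withoutspy\n"
)

# dtype=None: infer a type for each column; encoding=None: text columns
# come back as str rather than bytes. The default delimiter is whitespace.
table = np.genfromtxt(sample, dtype=None, encoding=None)

names = table.dtype.names             # default field names f0 .. f4
data = table[list(names[:-1])]        # all fields except the last
target = table[names[-1]]             # the last field only

print(table.dtype)
print(target.tolist())  # ['withspy', 'withoutspy', 'withoutspy', 'withoutspy']
```

`data` is still a structured array; `data.tolist()` turns it into a list of tuples if plain Python objects are needed.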
You could do that with pandas, using read_table to read your data, iloc to subset it, values to get the underlying NumPy array from the DataFrame, and the tolist method to convert the result to a list:
import pandas as pd
# sep=r'\s+' splits on runs of whitespace (delim_whitespace=True is deprecated in newer pandas)
df = pd.read_table('path_to_your_file', sep=r'\s+', header=None)
print(df)
           0  1  2      3           4
0      faban  1  0  0.288     withspy
1      faban  2  0  0.243  withoutspy
2  simulated  1  0  0.159  withoutspy
3      faban  1  1  0.189  withoutspy
data = df.iloc[:,:-1].values.tolist()
target = df.iloc[:,-1].tolist()
print(data)
[['faban', 1, 0, 0.28800000000000003],
['faban', 2, 0, 0.243],
['simulated', 1, 0, 0.159],
['faban', 1, 1, 0.18899999999999997]]
print(target)
['withspy', 'withoutspy', 'withoutspy', 'withoutspy']
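Passing column names makes the split easier to read: the label column can be dropped or selected by name instead of by position, and `df.dtypes` shows the type pandas inferred for each column. A sketch assuming hypothetical column names (source, run, flag, score, label), with the inline sample standing in for the file; `read_table(..., sep=r'\s+')` behaves the same as `read_csv` here.

```python
import io

import pandas as pd

# Stand-in for the real file.
sample = io.StringIO(
    "faban 1 0 0.288 withspy\n"
    "faban 2 0 0.243 withoutspy\n"
    "simulated 1 0 0.159 withoutspy\n"
    "faban 1 1 0.189 withoutspy\n"
)

# Hypothetical column labels; the file itself has no header row.
columns = ["source", "run", "flag", "score", "label"]

# sep=r"\s+" splits on any run of whitespace.
df = pd.read_csv(sample, sep=r"\s+", header=None, names=columns)
print(df.dtypes)  # per-column types inferred by pandas

data = df.drop(columns="label").values.tolist()  # every column but the label
target = df["label"].tolist()                    # the label column only
print(target)
```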