 

Python: How to read file and store certain columns in array

I am reading a whitespace-separated dataset from a file. I need to store every column except the last one in the array data, and the last column in the array target.

How should I proceed?

This is what I have so far:

with open(filename) as f:
    data = f.readlines()

Or should I be reading line by line?

PS: The columns have different data types.

Edit: Sample Data

faban       1   0   0.288   withspy
faban       2   0   0.243   withoutspy
simulated   1   0   0.159   withoutspy
faban       1   1   0.189   withoutspy
SaadH asked Jan 04 '16 07:01




3 Answers

This would work:

data = []
target = []
with open('faban.txt') as fobj:
    for line in fobj:
        row = line.split()
        data.append(row[:-1])
        target.append(row[-1])

Now:

>>> data
[['faban', '1', '0', '0.288'],
 ['faban', '2', '0', '0.243'],
 ['simulated', '1', '0', '0.159'],
 ['faban', '1', '1', '0.189']]

>>> target
['withspy', 'withoutspy', 'withoutspy', 'withoutspy']
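
Every field above is still a string. Since the question mentions that the columns have different types, one way to convert them while reading is sketched below. The CONVERTERS tuple and the read_columns helper are illustrative names, and the column types are an assumption based on the sample data.

```python
# Assumed column types for the sample data: name, int, int, float, label.
CONVERTERS = (str, int, int, float, str)


def read_columns(lines, converters=CONVERTERS):
    """Split whitespace-separated lines into (data, target) with typed values."""
    data, target = [], []
    for line in lines:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        # Apply the converter for each column to the matching field.
        row = [convert(field) for convert, field in zip(converters, fields)]
        data.append(row[:-1])   # every column except the last
        target.append(row[-1])  # the last column
    return data, target


# In-memory stand-in for the lines of the file.
sample = [
    "faban       1   0   0.288   withspy\n",
    "faban       2   0   0.243   withoutspy\n",
    "simulated   1   0   0.159   withoutspy\n",
    "faban       1   1   0.189   withoutspy\n",
]
data, target = read_columns(sample)
```

Because an open file yields its lines when iterated, read_columns(fobj) can be called directly inside the with open(...) block.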
Mike Müller answered Oct 19 '22 02:10


I think numpy has a clean, easy solution here.

>>> import numpy as np
>>> data, target = np.array_split(np.loadtxt('file', dtype=str), [-1], axis=1)

results in:

>>> data.tolist()
[['faban', '1', '0', '0.288'], 
 ['faban', '2', '0', '0.243'], 
 ['simulated', '1', '0', '0.159'], 
 ['faban', '1', '1', '0.189']]
>>> target.flatten().tolist()
['withspy', 'withoutspy', 'withoutspy', 'withoutspy']
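
The [-1] passed to np.array_split is a split index, so with axis=1 the call amounts to slicing off the last column. A minimal sketch of the slicing form follows; it reads the sample rows from an in-memory buffer, and the sample string is only a stand-in for the file.

```python
import io

import numpy as np

# Stand-in for the contents of the data file.
sample = """\
faban       1   0   0.288   withspy
faban       2   0   0.243   withoutspy
simulated   1   0   0.159   withoutspy
faban       1   1   0.189   withoutspy
"""

# dtype=str keeps every field as text, since the columns mix words and numbers.
arr = np.loadtxt(io.StringIO(sample), dtype=str)

data = arr[:, :-1]   # all columns except the last, shape (4, 4)
target = arr[:, -1]  # last column as a 1-D array, shape (4,)
```

Indexing with -1 returns target as a one-dimensional array, so no flatten() call is needed in this form.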
timgeb answered Oct 19 '22 03:10


You could do that with pandas: use read_table to read the data, iloc to select columns, values to get the underlying NumPy array from the DataFrame, and tolist to convert that array to a list:

import pandas as pd
# sep=r'\s+' splits on any run of whitespace (delim_whitespace=True is deprecated since pandas 2.2)
df = pd.read_table('path_to_your_file', sep=r'\s+', header=None)
print(df)
           0  1  2      3           4
0      faban  1  0  0.288     withspy
1      faban  2  0  0.243  withoutspy
2  simulated  1  0  0.159  withoutspy
3      faban  1  1  0.189  withoutspy


data = df.iloc[:,:-1].values.tolist()
target = df.iloc[:,-1].tolist()

print(data)
[['faban', 1, 0, 0.28800000000000003],
 ['faban', 2, 0, 0.243],
 ['simulated', 1, 0, 0.159],
 ['faban', 1, 1, 0.18899999999999997]]

print(target)
['withspy', 'withoutspy', 'withoutspy', 'withoutspy']
Anton Protopopov answered Oct 19 '22 03:10