Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Reading tab delimited csv into numpy array with different data types

I have a tab delimited csv dataset as following:

1       2       3       4       5       6       [0, 1, 2, 3, 4, 5]
3       1       2       6       4       5       [2, 0, 1, 5, 3, 4]
7       8       9       10      11      6       [0, 1, 2, 3, 4, 5]
10      11      9       8       7       6       [3, 4, 2, 1, 0, 5]
12      13      4       14      15      6       [0, 1, 2, 3, 4, 5]
13      4       14      12      15      6       [1, 2, 3, 0, 4, 5]
16      17      18      19      20      6       [0, 1, 2, 3, 4, 5]
6       18      20      17      16      19      [5, 2, 4, 1, 0, 3]
7       21      22      23      24      6       [0, 1, 2, 3, 4, 5]
23      6       21      7       22      24      [3, 5, 1, 0, 2, 4]
25      7       21      22      23      6       [0, 1, 2, 3, 4, 5]
6       21      7       22      25      23      [5, 2, 1, 3, 0, 4]
16      26      3       27      28      6       [0, 1, 2, 3, 4, 5]
26      6       27      3       28      16      [1, 5, 3, 2, 4, 0]
7       29      24      30      31      6       [0, 1, 2, 3, 4, 5]
30      24      6       7       29      31      [3, 2, 5, 0, 1, 4]
32      33      13      34      35      36      [0, 1, 2, 3, 4, 5]
34      32      36      35      13      33      [3, 0, 5, 4, 2, 1]
7       37      38      39      40      6       [0, 1, 2, 3, 4, 5]
39      38      40      6       37      7       [3, 2, 4, 5, 1, 0]
7       41      42      43      44      6       [0, 1, 2, 3, 4, 5]
41      6       44      43      42      7       [1, 5, 4, 3, 2, 0]
7       45      46      47      48      6       [0, 1, 2, 3, 4, 5]
6       47      45      7       46      48      [5, 3, 1, 0, 2, 4]
49      2       50      51      52      6       [0, 1, 2, 3, 4, 5]

When I want to import such csv file into a numpy array as following;

dataset = numpy.loadtxt('dataset/demo_dataset.csv', delimiter='\t', dtype='str')

I obtain a numpy array with (25,) shape.

I want to import this csv file into two numpy arrays, called X and Y.

X will include the first 6 columns, and Y will include last column as list values, not str.

How can I manage it?

like image 332
yusuf Avatar asked Mar 17 '26 06:03

yusuf


1 Answers

I managed to achieve this only via a custom method:

import numpy

with open('dataset/demo_dataset.csv', 'r') as fin:
    lines = fin.readlines()
    # remove '\n' characters
    clean_lines = [l.strip('\n') for l in lines]
    # split on tab so that we get lists from strings
    A = [cl.split('\t') for cl in clean_lines]
    # get lists of ints instead of lists of strings
    X = [map(int, row[0:6]) for row in A]
    # last column in Y
    Y = [row[6] for row in A]

    # convert string to int values
    for i in xrange(len(Y)):
        Y[i] = map(int, Y[i].strip('[]').split(','))
like image 114
stelios Avatar answered Mar 18 '26 18:03

stelios