
How to read data from text file into array with Python

Tags: python, list

I'm having a bit of trouble with some data stored in a text file that I want to use for regression analysis in Python.

The data are stored in a format that looks like this:

2104,3,399900 1600,3,329900 2400,3,369000 ....

I need to do some analysis, such as finding the mean: (2104+1600+...)/number of data points

I think the appropriate step is to store the data in an array, but I have no idea how to do that. I can think of two ways. The first is to set up 3 arrays like this:

a=[2104 1600 2400 ...]
b=[3 3 3 ...]
c=[399900 329900 369000 ...]

The second way is to store each record as its own array:

a=[2104 3 399900], b=[1600 3 329900] and so on. 

Which one is better?

Also, how do I write code that stores the data in an array? I'm thinking of something like this:

with open("file.txt", "r") as ins:
    array = []
    elt.strip(',."\'?!*:') for line in ins:
        array.append(line)

Is that correct?

poonck1 asked Feb 22 '17 14:02



3 Answers

You could use:

with open('data.txt') as data:
    # Split on any whitespace, so records on several lines are handled too.
    substrings = data.read().split()
    # Convert each "a,b,c" record into a list of three integers.
    values = [list(map(int, substring.split(','))) for substring in substrings]
    # Mean of the first column.
    average = sum(a for a, b, c in values) / len(values)
    print(average)

With this data.txt:

2104,3,399900 1600,3,329900 2400,3,369000
2105,3,399900 1601,3,329900 2401,3,369000

It outputs (Python 3):

2035.1666666666667
Eric Duminil answered Oct 02 '22 16:10


Using pandas and numpy you can get the data into an array as follows:

In [35]: import pandas as pd

In [36]: from io import StringIO

In [37]: data = "2104,3,399900 1600,3,329900 2400,3,369000"

In [38]: d = pd.read_csv(StringIO(data), sep=',| ', header=None, index_col=None, engine="python")

In [39]: d.values.reshape(3, d.shape[1] // 3)
Out[39]: 
array([[  2104,      3, 399900],
       [  1600,      3, 329900],
       [  2400,      3, 369000]])
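The same flatten-then-reshape idea can also be sketched with only the standard library. This is a rough illustration that assumes every record has exactly three integer fields; the names `tokens` and `rows` are made up for the example.

```python
import re

# Sample text in the same format as the question
# (assumption: exactly three integer fields per record).
data = "2104,3,399900 1600,3,329900 2400,3,369000"

# Split on commas and any whitespace to get one flat list of integers.
tokens = [int(tok) for tok in re.split(r"[,\s]+", data.strip())]

# Regroup the flat list into rows of three, mirroring the reshape step.
rows = [tokens[i:i + 3] for i in range(0, len(tokens), 3)]

print(rows)
```

If a record had a missing field, the grouping would silently shift, so a check such as `len(tokens) % 3 == 0` may be worth adding.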
languitar answered Oct 02 '22 16:10


Instead of having multiple arrays a, b, c... you could store your data as an array of arrays (a two-dimensional array). For example:

[[2104,3,399900],
 [1600,3,329900],
 [2400,3,369000]...]

This way you don't have to deal with dynamically naming your arrays. How you store your data, i.e. as 3 arrays of length n or as n arrays of length 3, is up to you. I would prefer the second way. To read the data into your array you can then use the split() method, which splits a string into a list. So in your case:

with open("file.txt", "r") as ins:
    tmp = ins.read().split(" ")
    array = [i.split(",") for i in tmp]

>>> array
[['2104', '3', '399900'], ['1600', '3', '329900'], ['2400', '3', '369000']]
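If the column-wise layout (three lists of length n) is needed later, it can be derived from the row-wise one. A minimal sketch, assuming the values have already been converted to integers; the names `a`, `b`, and `c` follow the question, and `rows` is just an illustrative name:

```python
# Row-wise layout: one inner list per record (values taken from the question).
rows = [[2104, 3, 399900], [1600, 3, 329900], [2400, 3, 369000]]

# zip(*rows) pairs up the i-th element of every row, i.e. it transposes the data.
a, b, c = (list(column) for column in zip(*rows))

print(a)  # [2104, 1600, 2400]
print(b)  # [3, 3, 3]
print(c)  # [399900, 329900, 369000]

# Transposing again restores the row-wise layout.
restored = [list(row) for row in zip(a, b, c)]
print(restored == rows)  # True
```

Because either layout can be produced from the other in one line, the choice mostly comes down to which one the later analysis code reads more naturally with.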

Edit: To find the mean e.g. for the first element in each list you could do the following:

arraymean = sum([int(i[0]) for i in array]) / len(array)

Where the 0 in i[0] specifies the first element in each list. Note that this code uses list comprehension, which you can learn more about in this post if you want to.

Also this code stores the values in the array as strings, hence the cast to int in the part to get the mean. If you want to store the data as int directly just edit the part in the file reading section:

array = [[int(j) for j in i.split(",")] for i in tmp]
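Putting the pieces together, here is a rough end-to-end sketch. It assumes the file contains only whitespace-separated records of three comma-separated integers. The temporary file and the helper name `read_rows` are placeholders for illustration. Calling `split()` with no argument splits on any whitespace, so line breaks and a trailing newline are handled as well.

```python
import os
import tempfile
from statistics import mean


def read_rows(path):
    """Return the records in `path` as a list of lists of ints."""
    with open(path, "r") as ins:
        # split() without arguments splits on spaces and newlines alike.
        records = ins.read().split()
    return [[int(field) for field in record.split(",")] for record in records]


# Write a small sample file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("2104,3,399900 1600,3,329900\n2400,3,369000\n")
    sample_path = tmp.name

rows = read_rows(sample_path)
os.remove(sample_path)

# Mean of each of the three columns, picked out by index.
column_means = [mean(row[i] for row in rows) for i in range(3)]
print(column_means)
```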
Leon Z. answered Oct 02 '22 15:10