How to read data from text file into array with Python

Tags:

python

list

I am having some trouble with data stored in a text file that I want to use for regression analysis in Python.

The data are stored in a format that looks like this:

2104,3,399900 1600,3,329900 2400,3,369000 ....

I need to do some analysis, such as finding the mean: (2104+1600+...)/number of data points

I think the appropriate step is to store the data in an array, but I have no idea how to do it. I can think of two ways. The first is to set up 3 arrays, like this:

a=[2104 1600 2400 ...] b=[3 3 3 ...] c=[399900 329900 369000 ...]

The second way is to store each record as its own array:

a=[2104 3 399900], b=[1600 3 329900] and so on.

Which one is better?

Also, how do I write code that stores the data in an array? I am thinking of something like this:

with open("file.txt", "r") as ins:
    array = []
    for line in ins:
        array.append(line)

Is that correct?

asked Feb 22 '17 14:02

3 Answers

You could use:

with open('data.txt') as data:
    # split() with no argument splits on spaces and newlines alike
    substrings = data.read().split()
    # one list of three ints per record
    values = [list(map(int, substring.split(','))) for substring in substrings]
    # mean of the first field across all records
    average = sum(a for a, b, c in values) / len(values)
    print(average)

With this data.txt:

2104,3,399900 1600,3,329900 2400,3,369000
2105,3,399900 1601,3,329900 2401,3,369000

It outputs:

2035.1666666666667
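
If the first layout from the question (one array per field) is what you want, the row-wise data can be transposed with zip. This is only a minimal sketch: it assumes every record has exactly three comma-separated integers, and the sample text is inlined (the variable name text is made up for illustration) so it runs without a file.

```python
# Sample text in the same format as data.txt, inlined so the sketch runs on its own.
text = "2104,3,399900 1600,3,329900 2400,3,369000"

# Row-wise layout: one list of ints per record.
rows = [[int(field) for field in record.split(",")] for record in text.split()]

# zip(*rows) transposes the rows, giving one tuple per column
# (the a, b, c arrays from the question).
a, b, c = zip(*rows)

mean_a = sum(a) / len(a)
print(a)  # → (2104, 1600, 2400)
print(mean_a)
```

Either layout can be derived from the other this way, so the choice mostly depends on whether you work with whole records or whole columns more often.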

answered Oct 02 '22 16:10

languitar

Instead of having multiple arrays a, b, c, ... you could store your data as an array of arrays (a two-dimensional array). For example:

[[2104,3,399900],
 [1600,3,329900],
 [2400,3,369000]...]

This way you don't have to deal with dynamically naming your arrays. How you store your data, i.e. as 3 arrays of length n or as n arrays of length 3, is up to you; I would prefer the second way. To read the data into your array, use the string method split(), which splits your input into a list. So in your case:

with open("file.txt", "r") as ins:
    # split() with no argument splits on any whitespace, so newlines are handled too
    tmp = ins.read().split()
    # split each record into its comma-separated fields (still strings)
    array = [i.split(",") for i in tmp]

>>> array
[['2104', '3', '399900'], ['1600', '3', '329900'], ['2400', '3', '369000']]

Edit: To find the mean, e.g. of the first element in each list, you could do the following:

arraymean = sum([int(i[0]) for i in array]) / len(array)

Here the 0 in i[0] selects the first element in each list. Note that this code uses a list comprehension.

Also, this code stores the values in the array as strings, hence the cast to int when computing the mean. If you want to store the data as ints directly, change the list comprehension in the file-reading code to:
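
If you need the mean of more than one column, the index can be made a parameter. A small sketch, where column_mean is a hypothetical helper name (not from any library) and the values are assumed to be strings or ints that int() accepts:

```python
def column_mean(rows, index):
    """Return the mean of the values at position `index` in each row."""
    if not rows:
        raise ValueError("no rows to average")
    # int() converts the string fields; it is a no-op for values that are already ints.
    return sum(int(row[index]) for row in rows) / len(rows)


# Rows as produced by split(","): the values are still strings.
array = [['2104', '3', '399900'], ['1600', '3', '329900'], ['2400', '3', '369000']]

print(column_mean(array, 0))  # mean of the first column
print(column_mean(array, 1))  # → 3.0
```

This relies on Python 3, where / between ints is true division; under Python 2 the divisor would need to be float(len(rows)).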

array = [[int(j) for j in i.split(",")] for i in tmp]
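
Putting the pieces together, the reading and conversion could be wrapped in one function. This is a sketch rather than a definitive version: read_records is a hypothetical name, and the demo writes a temporary file as a stand-in for file.txt so that it runs on its own. It calls split() with no argument so that records separated by newlines are handled as well as those separated by spaces.

```python
import os
import tempfile


def read_records(path):
    """Read whitespace-separated records of comma-separated ints into a list of lists."""
    with open(path, "r") as ins:
        # split() with no argument splits on any whitespace, including newlines,
        # and does not produce an empty string for a trailing newline.
        records = ins.read().split()
    return [[int(field) for field in record.split(",")] for record in records]


# Demo: a temporary file standing in for file.txt (assumed layout, two lines).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp_file:
    tmp_file.write("2104,3,399900 1600,3,329900\n2400,3,369000\n")
    path = tmp_file.name

array = read_records(path)
os.remove(path)
print(array)  # → [[2104, 3, 399900], [1600, 3, 329900], [2400, 3, 369000]]
```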

answered Oct 02 '22 15:10

Leon Z.
