I'm writing a general program to read and plot large amounts of data from .txt files. Each file has a different number of columns. I do know that each file has 8 columns that I'm not interested in, so I can figure out the number of relevant columns that way. How can I read the data and sort each relevant column's data into a separate variable?
This is what I have so far:
datafile = 'plotspecies.txt'
with open(datafile) as file:
    reader = csv.reader(file, delimiter=' ', skipinitialspace=True)
    first_row = next(reader)
    num_cols = len(first_row)
    rows = csv.reader(file, delimiter = ' ', quotechar = '"')
    data = [data for data in rows]
num_species = num_cols - 8
I've seen people say that pandas is good for this sort of thing, but I can't seem to import it. I'd prefer a solution without it.
Pandas is in fact the right solution here. The issue is that robustly handling a file whose underlying structure you aren't certain of involves a lot of edge cases, and trying to shoe-horn it into the csv module is a recipe for headaches (though it can be done).
As for why you can't import pandas: it doesn't come with Python by default. One of the most important things to consider when picking up a language is the ecosystem of packages it gives you access to. Python happens to be one of the best in this respect, so to ignore everything that's not part of standard Python is to ignore the best part of the language.
If you're on Windows, you should start by getting conda set up. This will allow you to seamlessly explore many of the packages available to Python users with little overhead. This includes pandas, which is in fact the right way to handle this problem. See this link for more info on installing conda: http://conda.pydata.org/docs/install/quick.html
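Before installing anything, it can help to confirm whether the interpreter you are running can actually see pandas. The following is a minimal sketch using only the standard library; the helper name is_installed is made up for illustration and is not part of any package.

```python
import importlib.util


def is_installed(package_name):
    """Return True if this interpreter can locate the named top-level package."""
    # find_spec returns None when the package cannot be found on the import path
    return importlib.util.find_spec(package_name) is not None


# csv ships with Python, so this should always be found
print("csv available:", is_installed("csv"))
# pandas is a third-party package, so this depends on your environment
print("pandas available:", is_installed("pandas"))
```

If the second line reports False, pandas is not installed for the interpreter that ran the script, which would explain the failed import.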
Once you've got pandas installed, it's as easy as this:
import pandas
test = pandas.read_csv(<your_file>)
your_Variable = test[<column_header>]
Easy as that.
If you really, really don't want to use things that aren't in core Python, then you can do this with something like what follows, though you haven't given enough detail for an actual solution:
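def col_var(input_file, delimiter):
    # get each line into a variable
    with open(input_file) as f:
        rows = f.read().splitlines()
    # split each row into entries
    split_rows = [row.split(delimiter) for row in rows]
    # Re-orient your list so that each tuple holds one column
    # (list() is needed on Python 3, where zip returns an iterator)
    columns = list(zip(*split_rows))
    return columns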
The least intuitive piece of this is the zip(*split_rows) call, so here's a little example showing you how it works:
>>> test = [[1,2], [3,4]]
>>> list(zip(*test))
[(1, 3), (2, 4)]
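Applied to the question, the same transpose trick might look like the sketch below. It assumes the 8 uninteresting columns come first and that the file is whitespace-delimited, neither of which the question confirms; the name read_columns, the skip_cols parameter and the sample data are all made up for illustration.

```python
import os
import tempfile


def read_columns(path, skip_cols=8):
    """Return the columns of a whitespace-delimited file as lists of strings,
    dropping the first skip_cols columns (assumed to be the unwanted ones)."""
    with open(path) as f:
        # split() with no argument splits on any run of whitespace;
        # blank lines are skipped
        rows = [line.split() for line in f if line.strip()]
    # zip(*rows) transposes rows into columns; note that it stops at the
    # shortest row, so ragged files lose their trailing entries
    columns = [list(col) for col in zip(*rows)]
    return columns[skip_cols:]


# Small made-up file: 8 columns to ignore followed by 2 columns of interest
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("a b c d e f g h 1.0 2.0\n")
    tmp.write("a b c d e f g h 3.0 4.0\n")
    demo_path = tmp.name

species_columns = read_columns(demo_path)
os.remove(demo_path)
print(species_columns)  # [['1.0', '3.0'], ['2.0', '4.0']]
```

Keeping the columns in a single list (species_columns[0], species_columns[1], and so on) sidesteps the need for one named variable per column, which matters when the number of columns differs between files.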
Well, you can use the csv module, provided there is some kind of delimiter within the rows that sets the columns apart.
import csv
file_to_read_from = 'myFile.txt'
#initializing as many lists as the columns you want (not all)
col1, col2, col3 = [], [], []
with open(file_to_read_from, 'r') as file_in:
    reader = csv.reader(file_in, delimiter=';') #might as well be ',', '\t' etc
    for row in reader:
        col1.append(row[0]) # assuming col 1 in the file is one of the 3 you want
        col2.append(row[3]) # assuming col 4 in the file is one of the 3 you want
        col3.append(row[5]) # assuming col 6 in the file is one of the 3 you want
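Since the number of columns differs from file to file, hard-coding col1, col2 and col3 won't scale. A rough sketch of the same csv approach with one list per relevant column, created on the fly, is shown below. It assumes a space delimiter (as in the question's code), that the 8 unwanted columns come first, that the remaining values are numeric, and that lines have no trailing spaces; read_species_columns and the sample data are hypothetical.

```python
import csv
import io


def read_species_columns(file_obj, skip_cols=8, delimiter=' '):
    """Collect every column after the first skip_cols into its own list of floats."""
    reader = csv.reader(file_obj, delimiter=delimiter, skipinitialspace=True)
    columns = []
    for row in reader:
        if not row:
            continue  # csv.reader yields an empty list for blank lines
        wanted = row[skip_cols:]
        if not columns:
            # the first data row decides how many lists are needed
            columns = [[] for _ in wanted]
        for col, value in zip(columns, wanted):
            col.append(float(value))
    return columns


# Made-up sample standing in for a real file: 8 ignored columns, then 2 species
sample = io.StringIO(
    "0 0 0 0 0 0 0 0 1.5 2.5\n"
    "0 0 0 0 0 0 0 0 3.5 4.5\n"
)
species = read_species_columns(sample)
print(species)  # [[1.5, 3.5], [2.5, 4.5]]
```

With a real file, the same function can be called as read_species_columns(file_in) inside a with open(...) block; len(species) then gives the number of relevant columns without computing it separately.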