I have a csv file like:
"B/G/213","B/C/208","WW_cis",,
"B/U/215","B/A/206","WW_cis",,
"B/C/214","B/G/207","WW_cis",,
"B/G/217","B/C/204","WW_cis",,
"B/A/216","B/U/205","WW_cis",,
"B/C/219","B/G/202","WW_cis",,
"B/U/218","B/A/203","WW_cis",,
"B/G/201","B/C/220","WW_cis",,
"B/A/203","B/U/218","WW_cis",,
and I want to read it into something like an array or DataFrame, so that I can compare elements from one column with selected elements from other columns. At first I read it straight into an array using numpy.genfromtxt, but I got strings like '"B/A/203"', with extra " quotes everywhere. I read somewhere that pandas can strip the extra " from strings, so I tried:
class StructureReader(object):
    def __init__(self, filename):
        self.filename=filename
    def read(self):
        self.data=pd.read_csv(StringIO(str("RNA/"+self.filename)), header=None, sep = ",")
        self.data
but I get something like so:
<class 'pandas.core.frame.DataFrame'>
              0
0  RNA/4v6p.csv
How can I get my CSV file into some kind of a data type that would allow me to search through columns and rows?
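You are putting the filename string itself into your DataFrame, i.e. RNA/4v6p.csv is your data at row 0, column 0. StringIO wraps a string so that it behaves like an open file whose contents are that string, so pd.read_csv parses the text "RNA/4v6p.csv" as if it were the CSV data. You need to read the file itself instead, which pd.read_csv does when you give it the path directly. This can be done by removing StringIO(str(...)) in your class: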
import pandas as pd

class StructureReader(object):
    def __init__(self, filename):
        self.filename = filename
    def read(self):
        # Pass the path straight to read_csv so pandas opens the file itself.
        self.data = pd.read_csv("RNA/" + self.filename, header=None, sep=",")
        return self.data
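To see the difference without touching the file system, here is a small sketch that feeds read_csv first the filename string and then some actual CSV text through StringIO. The csv_text value is just a few rows copied from the question and stands in for the real file's contents; the names wrong and right are only for illustration.

```python
from io import StringIO

import pandas as pd

# StringIO makes a file-like object whose *contents* are the given string,
# so read_csv parses the text "RNA/4v6p.csv" itself: one row, one field.
wrong = pd.read_csv(StringIO("RNA/4v6p.csv"), header=None, sep=",")
print(wrong.shape)  # (1, 1)

# Sample rows copied from the question, standing in for the file's contents.
csv_text = (
    '"B/G/213","B/C/208","WW_cis",,\n'
    '"B/U/215","B/A/206","WW_cis",,\n'
    '"B/C/214","B/G/207","WW_cis",,\n'
)
right = pd.read_csv(StringIO(csv_text), header=None, sep=",")
print(right.shape)       # (3, 5)
print(right.iloc[0, 0])  # B/G/213
```

The surrounding double quotes are removed by read_csv's default quote handling, and the two trailing commas on each line show up as two empty columns.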
I would also recommend avoiding the hardcoded parent directory by doing one of the following:
Always passing in a full file path:
import pandas as pd

class StructureReader(object):
    def __init__(self, filepath):
        self.filepath = filepath
    def read(self):
        self.data = pd.read_csv(self.filepath, header=None, sep=",")
        return self.data
Making the directory an __init__() argument:
import pandas as pd

class StructureReader(object):
    def __init__(self, directory, filename):
        self.directory = directory
        self.filename = filename
    def read(self):
        self.data = pd.read_csv(self.directory + "/" + self.filename, header=None, sep=",")
        # or, with import os:
        # self.data = pd.read_csv(os.path.join(self.directory, self.filename), header=None, sep=",")
        return self.data
Making the directory a constant attribute:
import pandas as pd

class StructureReader(object):
    def __init__(self, filename):
        self.directory = "RNA"
        self.filename = filename
    def read(self):
        self.data = pd.read_csv(self.directory + "/" + self.filename, header=None, sep=",")
        # or, with import os:
        # self.data = pd.read_csv(os.path.join(self.directory, self.filename), header=None, sep=",")
        return self.data
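The last two options can also be combined by giving the directory a default value. The following is only a sketch of that idea, not something from the original code: the directory="RNA" default, the filepath attribute and the example.csv file name are all assumptions made for illustration, and the temporary directory exists only so the example runs anywhere.

```python
import os
import tempfile

import pandas as pd


class StructureReader:
    # "directory" defaults to "RNA" purely for illustration (an assumption).
    def __init__(self, filename, directory="RNA"):
        # os.path.join picks the right separator for the current platform.
        self.filepath = os.path.join(directory, filename)
        self.data = None

    def read(self):
        self.data = pd.read_csv(self.filepath, header=None, sep=",")
        return self.data


# Usage with a throwaway directory and a hypothetical file name.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "example.csv"), "w") as handle:
        handle.write('"B/G/213","B/C/208","WW_cis",,\n')
    reader = StructureReader("example.csv", directory=tmp)
    frame = reader.read()

print(frame.shape)  # (1, 5)
```

Callers who keep their files under RNA can still write StructureReader("4v6p.csv"), while tests or other projects can point the reader somewhere else.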
This has nothing to do with reading your data; it is just best-practice commentary on structuring your code (just my $0.02).
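Coming back to the original goal of searching through columns and comparing one column against another, a minimal sketch might look like this. It assumes the data was loaded with header=None, so the columns are labelled with the integers 0, 1, 2, ...; the rows are copied from the question and the names hits and shared are only for illustration.

```python
from io import StringIO

import pandas as pd

# A few rows from the question, read from memory instead of a file.
csv_text = (
    '"B/U/218","B/A/203","WW_cis",,\n'
    '"B/G/201","B/C/220","WW_cis",,\n'
    '"B/A/203","B/U/218","WW_cis",,\n'
)
df = pd.read_csv(StringIO(csv_text), header=None, sep=",")

# Rows whose first column equals a given residue.
hits = df[df[0] == "B/A/203"]
print(hits.index.tolist())  # [2]

# Rows whose first-column value also occurs somewhere in the second column.
shared = df[df[0].isin(df[1])]
print(shared[0].tolist())  # ['B/U/218', 'B/A/203']
```

Boolean masks like df[0] == "B/A/203" and Series.isin cover most "is this value in that column" lookups without explicit loops.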