
How can I insert data from a CSV file into a dataframe using pandas.read_csv?

I have a csv file like:

"B/G/213","B/C/208","WW_cis",,
"B/U/215","B/A/206","WW_cis",,
"B/C/214","B/G/207","WW_cis",,
"B/G/217","B/C/204","WW_cis",,
"B/A/216","B/U/205","WW_cis",,
"B/C/219","B/G/202","WW_cis",,
"B/U/218","B/A/203","WW_cis",,
"B/G/201","B/C/220","WW_cis",,
"B/A/203","B/U/218","WW_cis",,

and I want to read it into something like an array or dataframe, so that I can compare elements from one column to selected elements from other columns. At first, I read it straight into an array using numpy.genfromtxt, but I got strings like '"B/A/203"' with extra quotes " everywhere. I read somewhere that pandas can strip the extra " from strings, so I tried:

class StructureReader(object):
    def __init__(self, filename):
        self.filename=filename
    def read(self):
        self.data=pd.read_csv(StringIO(str("RNA/"+self.filename)), header=None, sep = ",")
        self.data

but I get something like so:

<class 'pandas.core.frame.DataFrame'>
              0
0  RNA/4v6p.csv

How can I get my CSV file into some kind of a data type that would allow me to search through columns and rows?

Leukonoe asked Feb 28 '26 13:02



1 Answer

Data Insert

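You are putting the file name string itself into your DataFrame. StringIO turns the text RNA/4v6p.csv into a file-like object, so read_csv parses that text as if it were the CSV content, and RNA/4v6p.csv ends up as the data at row 0, column 0. You need pandas to open the file and read its contents. This can be done by removing StringIO(str(...)) in your class and passing the path directly: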

import pandas as pd

class StructureReader(object):
    def __init__(self, filename):
        self.filename = filename
    def read(self):
        # Pass the path itself so that pandas opens and parses the file
        self.data = pd.read_csv("RNA/" + self.filename, header=None, sep=",")
        return self.data
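
To illustrate the difference, here is a minimal sketch that needs no file on disk. The SAMPLE string is an assumption: it holds three rows copied from the question and stands in for the real file's contents. The names df, wrong and also_in_col1 are purely illustrative.

```python
from io import StringIO

import pandas as pd

# Three rows copied from the question; they stand in for the contents of the real file.
SAMPLE = (
    '"B/G/213","B/C/208","WW_cis",,\n'
    '"B/U/218","B/A/203","WW_cis",,\n'
    '"B/A/203","B/U/218","WW_cis",,\n'
)

# StringIO is for text that already *is* CSV; read_csv parses it like a file.
# The surrounding double quotes are stripped by the default quoting rules.
df = pd.read_csv(StringIO(SAMPLE), header=None, sep=",")

# The two trailing commas produce two all-empty columns; drop them.
df = df.dropna(axis=1, how="all")

# Wrapping the file name instead reproduces the one-cell frame from the question.
wrong = pd.read_csv(StringIO("RNA/4v6p.csv"), header=None, sep=",")

# Example comparison: which values in column 0 also occur anywhere in column 1?
also_in_col1 = df[0].isin(df[1])
print(df[also_in_col1])
```

With header=None the columns are labelled 0, 1, 2, so df[0] and df[1] select whole columns, and boolean masks such as also_in_col1 select rows.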

Code structure critique

I would also recommend not hardcoding the parent directory inside read(). You can avoid that by:

  1. Always passing in a full file path

    class StructureReader(object):
        def __init__(self, filepath):
            self.filepath = filepath
        def read(self):
            self.data = pd.read_csv(self.filepath, header=None, sep=",")
            return self.data
    
  2. Making the directory an __init__() argument

    class StructureReader(object):
        def __init__(self, directory, filename):
            self.directory = directory
            self.filename = filename
        def read(self):
            self.data = pd.read_csv(self.directory + "/" + self.filename, header=None, sep=",")
            # or import os and self.data = pd.read_csv(os.path.join(self.directory, self.filename), header=None, sep=",")
            return self.data
    
  3. Making the directory a constant attribute

    class StructureReader(object):
        def __init__(self, filename):
            self.directory = "RNA"
            self.filename = filename
        def read(self):
            self.data = pd.read_csv(self.directory + "/" + self.filename, header=None, sep=",")
            # or import os and self.data = pd.read_csv(os.path.join(self.directory, self.filename), header=None, sep=",")
            return self.data
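
As a runnable sketch of the second option, the snippet below builds the path with os.path.join and returns the frame from read(). The temporary directory and the file name demo.csv are assumptions made only so the example runs anywhere; they are not part of the original setup.

```python
import os
import tempfile

import pandas as pd


class StructureReader(object):
    def __init__(self, directory, filename):
        self.directory = directory
        self.filename = filename
        self.data = None

    def read(self):
        # os.path.join picks the right separator for the current platform.
        path = os.path.join(self.directory, self.filename)
        self.data = pd.read_csv(path, header=None, sep=",")
        return self.data


# Hypothetical demo: write one row from the question to a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "demo.csv"), "w") as handle:
        handle.write('"B/G/213","B/C/208","WW_cis",,\n')
    reader = StructureReader(tmp, "demo.csv")
    frame = reader.read()

print(frame)
```

In the original setup the call would look like StructureReader("RNA", "4v6p.csv").read().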
    

This has nothing to do with reading your data; it is just best-practice commentary on structuring your code (just my $0.02).

tmthydvnprt answered Mar 03 '26 03:03



