Select specific CSV columns (Filtering) - Python/pandas

Tags: python, pandas, csv

I have a very large CSV file with 100 columns. To illustrate my problem, I will use a very basic example.

Let's suppose that we have a CSV file.

in  value   d     f
0    975   f01    5
1    976   F      4
2    977   d4     1
3    978   B6     0
4    979   2C     0

I want to select specific columns.

import pandas
data = pandas.read_csv("ThisFile.csv")

To select the first two columns, I used

data.ix[:,:2]

What should I do to select different columns, such as the 2nd and the 4th?

Another way to solve this would be to rewrite the CSV file, but it is a huge file, so I would rather avoid that.

653

asked Mar 14 '14 01:03

user3378649

2 Answers

This selects the second and fourth columns (since Python uses 0-based indexing):

In [272]: df.iloc[:, [1, 3]]
Out[272]: 
   value  f
0    975  5
1    976  4
2    977  1
3    978  0
4    979  0

[5 rows x 2 columns]

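df.ix can select by location or label. df.iloc always selects by location. When indexing by location, use df.iloc to signal your intention more explicitly. It is also a bit faster, since pandas does not have to check whether your index is using labels. Note that df.ix was deprecated in pandas 0.20 and removed in pandas 1.0, so on current versions use df.iloc for positions and df.loc for labels.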
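As a rough illustration of the difference between positional and label-based selection, here is a minimal sketch. It assumes the example file is comma-separated and uses an in-memory string (the hypothetical `CSV_TEXT`) in place of "ThisFile.csv", so it can run without the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for "ThisFile.csv"; comma separators are an assumption.
CSV_TEXT = """in,value,d,f
0,975,f01,5
1,976,F,4
2,977,d4,1
3,978,B6,0
4,979,2C,0
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Select the second and fourth columns by integer position (0-based).
by_position = df.iloc[:, [1, 3]]

# Select the same two columns by their labels.
by_label = df.loc[:, ["value", "f"]]

print(by_position)
print(by_position.equals(by_label))
```

Passing a list of positions to df.iloc (or a list of names to df.loc) keeps the intent explicit, which is the point made above.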

Another possibility is to use the usecols parameter:

data = pandas.read_csv("ThisFile.csv", usecols=[1,3])

This will load only the second and fourth columns into the data DataFrame.
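For a file with many columns, it may help to look at the header first and then pass either positions or names to usecols. The sketch below is only an illustration: it assumes a comma-separated file and uses an in-memory string (the hypothetical `CSV_TEXT`) instead of "ThisFile.csv".

```python
import io

import pandas as pd

# Hypothetical stand-in for "ThisFile.csv"; comma separators are an assumption.
CSV_TEXT = """in,value,d,f
0,975,f01,5
1,976,F,4
2,977,d4,1
3,978,B6,0
4,979,2C,0
"""

# Read only the header row (no data rows) to discover the column names.
header = pd.read_csv(io.StringIO(CSV_TEXT), nrows=0)
print(list(header.columns))

# Load just the second and fourth columns, chosen by position.
by_position = pd.read_csv(io.StringIO(CSV_TEXT), usecols=[1, 3])

# Load the same columns, chosen by name.
by_name = pd.read_csv(io.StringIO(CSV_TEXT), usecols=["value", "f"])

print(by_name)
```

Because the unwanted columns are skipped while parsing, this approach avoids holding all 100 columns in memory and does not require rewriting the CSV file.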

152

answered Sep 24 '22 05:09

unutbu

If you would rather select columns by name, you can use

data[['value','f']]

   value  f
0    975  5
1    976  4
2    977  1
3    978  0
4    979  0
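When the list of wanted names comes from somewhere else and may not match the file exactly, DataFrame.filter(items=...) is one possible alternative, since it keeps only the names that actually exist. This is a minimal sketch under the same assumptions as the question (a comma-separated file, here replaced by the hypothetical in-memory `CSV_TEXT`); the name "not_a_column" is made up for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for "ThisFile.csv"; comma separators are an assumption.
CSV_TEXT = """in,value,d,f
0,975,f01,5
1,976,F,4
2,977,d4,1
3,978,B6,0
4,979,2C,0
"""

data = pd.read_csv(io.StringIO(CSV_TEXT))

# Strict selection: every name in the list must be a real column.
strict = data[["value", "f"]]

# Lenient selection: names that are not columns are dropped from the request.
wanted = ["value", "f", "not_a_column"]  # last entry is deliberately missing
lenient = data.filter(items=wanted)

print(lenient)
```

Whether the strict or the lenient form is preferable depends on whether a missing column should be treated as an error.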

answered Sep 22 '22 05:09

Wai Yip Tung
