Creating new pandas dataframe from certain columns of existing dataframe

I have loaded a CSV file into a pandas DataFrame and want to do some simple manipulations on it. I cannot figure out how to create a new DataFrame from selected columns of my original DataFrame. My attempt:

names = ['A','B','C','D']
dataset = pandas.read_csv('file.csv', names=names)
new_dataset = dataset['A','D']

I would like to create a new DataFrame containing only columns A and D from the original DataFrame.

563

asked Jul 11 '17 13:07

Sjoseph

1 Answer

This is called subsetting: pass a list of column names inside []:

dataset = pandas.read_csv('file.csv', names=names)
new_dataset = dataset[['A','D']]

which is the same as:

new_dataset = dataset.loc[:, ['A','D']]
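
A small sketch comparing the two forms, using an in-memory frame in place of file.csv (the sample values are made up for illustration). It also shows why the attempt in the question fails: dataset['A','D'] passes the tuple ('A', 'D') as a single key, and pandas looks that up as one column label.

```python
import pandas as pd

# Hypothetical sample data standing in for the contents of file.csv.
dataset = pd.DataFrame({'A': [1, 5], 'B': [2, 6], 'C': [3, 7], 'D': [4, 8]})

# A list inside [] selects several columns and returns a DataFrame.
by_brackets = dataset[['A', 'D']]

# .loc with ":" for all rows and a list of column labels selects the same data.
by_loc = dataset.loc[:, ['A', 'D']]

print(by_brackets.equals(by_loc))

# Without the inner list, the key is the tuple ('A', 'D'), which is not a column.
try:
    dataset['A', 'D']
    tuple_lookup_failed = False
except KeyError:
    tuple_lookup_failed = True

print(tuple_lookup_failed)
```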

If you only need the selected columns, pass the usecols parameter to read_csv:

new_dataset = pandas.read_csv('file.csv', names=names, usecols=['A','D'])
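
To try this without a file on disk, here is a minimal sketch that reads from an in-memory string instead of file.csv. The CSV text is invented for illustration and assumes the file has no header row, as implied by the use of names in the question.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for file.csv (no header row).
csv_text = "1,2,3,4\n5,6,7,8\n"
names = ['A', 'B', 'C', 'D']

# Only columns A and D are kept; B and C never reach the DataFrame.
new_dataset = pd.read_csv(io.StringIO(csv_text), names=names, usecols=['A', 'D'])

print(list(new_dataset.columns))
print(new_dataset.shape)
```

This avoids loading the unused columns at all, which can matter for wide files.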

EDIT:

If you use only:

new_dataset = dataset[['A','D']]

and then modify the result, you may get this warning:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

If you modify values in new_dataset later, you will find that the modifications do not propagate back to the original data (dataset), and that pandas emits this warning.

As EdChum pointed out, add copy to remove the warning:

new_dataset = dataset[['A','D']].copy()
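
A short sketch of what the explicit copy gives you, again with made-up sample data in place of file.csv: the new frame can be modified freely and the original stays unchanged.

```python
import pandas as pd

# Hypothetical sample data standing in for the contents of file.csv.
dataset = pd.DataFrame({'A': [1, 5], 'B': [2, 6], 'C': [3, 7], 'D': [4, 8]})

# .copy() makes new_dataset an independent DataFrame.
new_dataset = dataset[['A', 'D']].copy()

# Modify the copy; the original frame is not affected.
new_dataset.loc[0, 'A'] = 100

print(new_dataset.loc[0, 'A'])
print(dataset.loc[0, 'A'])
```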

136

answered Oct 21 '22 15:10

jezrael
