I have a CSV file named params.csv. I opened the IPython qtconsole and created a pandas DataFrame using:
import pandas
paramdata = pandas.read_csv('params.csv', names=paramnames)
where paramnames is a Python list of strings. An example of paramnames (the actual list has 22 entries):
paramnames = ["id",
"fc",
"mc",
"markup",
"asplevel",
"aspreview",
"reviewpd"]
At the IPython prompt, if I type paramdata and press Enter, I do not get the DataFrame with columns and values as shown in the examples on the pandas website. Instead, I get information about the DataFrame:
In[35]: paramdata
Out[35]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 59 entries, 0 to 58
Data columns:
id 59 non-null values
fc 59 non-null values
mc 59 non-null values
markup 59 non-null values
asplevel 59 non-null values
aspreview 59 non-null values
reviewpd 59 non-null values
If I type paramdata['mc'], I do get the values for the mc column, as expected. I have two questions:
(1) In the examples on the pandas website (see, for example, the output of df here: http://pandas.sourceforge.net/indexing.html#additional-column-access), typing the name of the DataFrame displays the actual data. Why am I getting information about the DataFrame, as shown above, instead of the actual data? Do I need to set some output options somewhere?
(2) How do I output all columns in the DataFrame to the screen without having to type their names, i.e., without typing something like paramdata[['id','fc','mc']]?
I am using pandas version 0.8.
Thank you.
The pandas DataFrame.duplicated() method finds duplicate rows in a DataFrame, comparing either all columns or a selected subset of columns. A row counts as a duplicate when its values in the compared columns match those of another row. The method returns a Boolean Series that marks each row as duplicate (True) or unique (False).
To find duplicate rows based on all columns, call DataFrame.duplicated() without the subset argument. It returns a Boolean Series with True at the position of each duplicated row except its first occurrence (the default value of the keep argument is 'first').
To drop duplicate columns from a pandas DataFrame, use df.T.drop_duplicates().T, which removes all columns that hold the same data, regardless of column names.
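The duplicated() and drop_duplicates() calls described above can be sketched as follows. This is a minimal illustration, assuming a reasonably recent pandas; the frames df and df2 and their values are made up for the example.

```python
import pandas as pd

# Hypothetical sample data; the values are invented for illustration.
df = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "fc": [10, 20, 20, 10],
    "mc": [5, 7, 7, 5],
})

# True for each repeated row except its first occurrence (keep='first').
dup_all = df.duplicated()

# Compare only the selected columns when deciding what is a duplicate.
dup_subset = df.duplicated(subset=["fc", "mc"])

# Select the duplicate rows themselves with boolean indexing.
duplicate_rows = df[dup_all]

# Drop columns whose data repeats another column's, regardless of name.
df2 = pd.DataFrame({"a": [1, 2, 3], "b": [1, 2, 3], "c": [4, 5, 6]})
deduped_cols = df2.T.drop_duplicates().T

print(dup_all.tolist())
print(dup_subset.tolist())
print(list(deduped_cols.columns))
```

Here only the third row repeats an earlier row across all columns, while the subset comparison also flags the fourth row because its fc and mc values match the first row's.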
Use:
pandas.set_option('display.max_columns', 7)
This will force Pandas to display the 7 columns you have. Or more generally:
pandas.set_option('display.max_columns', None)
which will force it to display any number of columns.
Explanation: the default for max_columns is 0, which tells pandas to display the table only if all the columns can be squeezed into the width of your console.
Alternatively, you can change the console width (in characters) from the default of 80, e.g.:
pandas.set_option('display.width', 200)
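Putting the options above together, a minimal sketch might look like this. It assumes a pandas version that provides set_option and option_context; the 59-row, 22-column frame of zeros is only a hypothetical stand-in for the real paramdata.

```python
import pandas as pd

# Hypothetical stand-in for paramdata: 59 rows, 22 columns of zeros.
columns = ["col%d" % i for i in range(22)]
paramdata = pd.DataFrame(0, index=range(59), columns=columns)

pd.set_option("display.max_columns", None)  # no limit on the number of columns
pd.set_option("display.width", 200)         # console width in characters

print(paramdata.head())

# option_context applies the same settings only inside the with-block,
# which avoids changing the display options for the whole session.
with pd.option_context("display.max_columns", None, "display.width", 200):
    text = repr(paramdata.head())
```

The option_context form is not part of the answer above; it is shown as an alternative for when the wider display is wanted only for a single printout.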
There is too much data to display on the screen, so a summary is displayed instead.
If you want to output the data anyway (it probably won't fit on a screen and won't look very good):
print(paramdata.values)
converts the dataframe to its numpy-array matrix representation.
paramdata.columns
stores the respective column names and
paramdata.index
stores the respective index (row names).
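A short sketch of these three attributes on a small frame follows. The data is hypothetical and reuses a few column names from the question. The final to_string() call is an addition not mentioned in this answer: it renders the whole frame as text, which may be a more readable way to print every column.

```python
import pandas as pd

# Hypothetical small frame using a few column names from the question.
paramdata = pd.DataFrame({"id": [1, 2], "fc": [0.5, 0.7], "mc": [3.0, 4.0]})

values = paramdata.values        # 2-D NumPy array holding the data
names = list(paramdata.columns)  # column labels
rows = list(paramdata.index)     # row labels

print(values)
print(names)
print(rows)

# to_string() returns the full frame, all rows and columns, as one string.
print(paramdata.to_string())
```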