
What are Python pandas equivalents for R functions like str(), summary(), and head()?

Tags: python, pandas, r

I'm only aware of the describe() function. Are there any other functions similar to str(), summary(), and head()?

megashigger asked Dec 24 '14 12:12




2 Answers

In pandas, the info() method produces output very similar to that of R's str():

    > str(train)
    'data.frame':   891 obs. of  13 variables:
     $ PassengerId: int  1 2 3 4 5 6 7 8 9 10 ...
     $ Survived   : int  0 1 1 1 0 0 0 0 1 1 ...
     $ Pclass     : int  3 1 3 1 3 3 1 3 3 2 ...
     $ Name       : Factor w/ 891 levels "Abbing, Mr. Anthony",..: 109 191 358 277 16 559 520 629 417 581 ...
     $ Sex        : Factor w/ 2 levels "female","male": 2 1 1 1 2 2 2 2 1 1 ...
     $ Age        : num  22 38 26 35 35 NA 54 2 27 14 ...
     $ SibSp      : int  1 1 0 1 0 0 0 3 0 1 ...
     $ Parch      : int  0 0 0 0 0 0 0 1 2 0 ...
     $ Ticket     : Factor w/ 681 levels "110152","110413",..: 524 597 670 50 473 276 86 396 345 133 ...
     $ Fare       : num  7.25 71.28 7.92 53.1 8.05 ...
     $ Cabin      : Factor w/ 148 levels "","A10","A14",..: 1 83 1 57 1 1 131 1 1 1 ...
     $ Embarked   : Factor w/ 4 levels "","C","Q","S": 4 2 4 4 4 3 4 4 4 2 ...
     $ Child      : num  0 0 0 0 0 NA 0 1 0 1 ...

    train.info()
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 891 entries, 0 to 890
    Data columns (total 12 columns):
    PassengerId    891 non-null int64
    Survived       891 non-null int64
    Pclass         891 non-null int64
    Name           891 non-null object
    Sex            891 non-null object
    Age            714 non-null float64
    SibSp          891 non-null int64
    Parch          891 non-null int64
    Ticket         891 non-null object
    Fare           891 non-null float64
    Cabin          204 non-null object
    Embarked       889 non-null object
    dtypes: float64(2), int64(5), object(5)
    memory usage: 83.6+ KB
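For the other two functions in the question, describe() is roughly the counterpart of R's summary() and head() behaves much like R's head(). Here is a minimal sketch of all three calls together. The small `train` frame is made-up toy data standing in for the Titanic set above, not the real file.

```python
import pandas as pd

# Hypothetical toy data; only the column names echo the Titanic frame above.
train = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6],
    "Survived": [0, 1, 1, 1, 0, 0],
    "Sex": ["male", "female", "female", "female", "male", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0, None],
})

# Closest to R's str(): column names, dtypes and non-null counts.
train.info()

# Closest to R's summary(): per-column statistics.
# include="all" adds non-numeric columns (count, unique, top, freq).
summary = train.describe(include="all")
print(summary)

# Closest to R's head(): the first n rows (n defaults to 5).
first_rows = train.head(3)
print(first_rows)
```

Without include="all", describe() only reports on the numeric columns, so the Sex column would be skipped.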
reedcourty answered Sep 20 '22 15:09


The function below gives output similar to R's str(), except that it shows each column's unique values rather than its first few values.

    def rstr(df):
        return df.shape, df.apply(lambda x: [x.unique()])

    print(rstr(iris))

    ((150, 5), sepal_length    [[5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.4, 4.8, 4.3,...
    sepal_width     [[3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 2.9, 3.7,...
    petal_length    [[1.4, 1.3, 1.5, 1.7, 1.6, 1.1, 1.2, 1.0, 1.9,...
    petal_width     [[0.2, 0.4, 0.3, 0.1, 0.5, 0.6, 1.4, 1.5, 1.3,...
    class            [[Iris-setosa, Iris-versicolor, Iris-virginica]]
    dtype: object)
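If you would rather see the first few values of each column, as R's str() does, one possible variant is sketched below. The name `rstr_head` and the three-row `iris` frame are illustrative assumptions, not part of pandas or of the real iris data.

```python
import pandas as pd

def rstr_head(df, n=5):
    """Hypothetical helper: one row per column with dtype, non-null count
    and the first n values, loosely mimicking R's str()."""
    return pd.DataFrame(
        {
            "dtype": [str(t) for t in df.dtypes],
            "non_null": df.notna().sum().tolist(),
            "first_values": [df[col].head(n).tolist() for col in df.columns],
        },
        index=df.columns,
    )

# Made-up toy rows standing in for the iris frame used above.
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7],
    "class": ["Iris-setosa", "Iris-setosa", "Iris-setosa"],
})

print(iris.shape)
result = rstr_head(iris, n=2)
print(result)
```

This sketch assumes the column names are unique, since it looks each column up by name.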
jjurach answered Sep 17 '22 15:09