
Dataframe, keep only one column

I can't find the pandas function that returns a one-column DataFrame from a multi-column DataFrame. I need the exact opposite of drop(['']).

Any ideas?

Baptiste Arnaud asked Aug 17 '17 15:08




2 Answers

You can use the following notation to return a single-column DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.random.randint(1, 100, (10, 5)), columns=list('ABCDE'))

df_out = df[['C']]  # double brackets keep the result a DataFrame

Output (the values vary between runs because the data is random):

    C
0  65
1  48
2   1
3  41
4  85
5  55
6  45
7  10
8  44
9  11

Note: df['C'] returns a Series. You can convert that Series to a DataFrame with the to_frame method, or select with double brackets, df[['C']], to get a DataFrame directly.
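
The difference between the two bracket forms can be checked directly. The snippet below is a minimal sketch, assuming a small hand-made frame (the column values are illustrative and are not the random output shown above). The loc and filter selections are included as further ways to keep only one column; they are not part of the original answer.

```python
import pandas as pd

# Illustrative data; the values are assumptions, not the random output above.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

as_series = df["C"]             # single brackets: pandas Series
as_frame = df[["C"]]            # double brackets: one-column DataFrame
converted = df["C"].to_frame()  # Series converted back to a DataFrame

# Other selections that also keep only column C as a DataFrame
via_loc = df.loc[:, ["C"]]
via_filter = df.filter(items=["C"])

print(type(as_series).__name__)    # Series
print(type(as_frame).__name__)     # DataFrame
print(as_frame.equals(converted))  # True
```

Any of the DataFrame-returning forms works as the "opposite of drop": you name the column to keep rather than the columns to discard.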

Scott Boston answered Oct 18 '22 23:10


For completeness, here is how the drop parameter can be used in R to obtain a one-column data frame from a multi-column one. I also show the same result using the tidyverse (paper).

Start with a minimal example data frame DF:

library(tidyverse)

DF <- data.frame(a = 1:2, b = c("e", "f"))
str(DF)
#> 'data.frame':    2 obs. of  2 variables:
#>  $ a: int  1 2
#>  $ b: chr  "e" "f"

By the way, note that in R versions lower than 4.0, column b would be a factor by default (unless we use stringsAsFactors = FALSE).

The [ operator preserves the original structure, so it returns a data frame (which is itself a list):

DF[1]
#>   a
#> 1 1
#> 2 2

DF['a']
#>   a
#> 1 1
#> 2 2

On the other hand, the [[ operator simplifies the result to the simplest possible structure, which is a vector for a one-column data frame. All three of its forms return the simplified version (a vector):

DF[[1]]
#> [1] 1 2

DF[['a']]
#> [1] 1 2

DF$a
#> [1] 1 2

Finally, using [ with both row and column indices

DF[, 1]
#> [1] 1 2

also returns the simplified version, because the drop parameter is TRUE by default. Setting it to FALSE preserves the structure and returns a one-column data frame:

DF[, 1, drop = FALSE]
#>   a
#> 1 1
#> 2 2

A good explanation of this point can be found in Advanced R by Hadley Wickham (CRC, 2015), section 3.2.1, or section 4.2.5 of the online version of the book (as of June 2021).
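
For readers coming from the pandas side of the question, a roughly analogous distinction exists there: selecting with a scalar simplifies to a Series, while selecting with a list preserves the DataFrame. The sketch below is only a loose pandas counterpart of the R examples, using a hypothetical two-column frame that mirrors DF; the list form plays the role that drop = FALSE plays in R.

```python
import pandas as pd

# Hypothetical pandas counterpart of the R data frame DF above
df = pd.DataFrame({"a": [1, 2], "b": ["e", "f"]})

simplified = df.iloc[:, 0]   # scalar position: Series, like DF[, 1]
preserved = df.iloc[:, [0]]  # list of positions: DataFrame, like DF[, 1, drop = FALSE]

print(type(simplified).__name__)   # Series
print(type(preserved).__name__)    # DataFrame
print(preserved.columns.tolist())  # ['a']
```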

Finally, within the tidyverse (CRAN), selecting one column with select always returns a data frame (or a tibble, if the input is a tibble):

DF %>% 
  select(2)
#>   b
#> 1 e
#> 2 f

DF %>% 
  select("a")
#>   a
#> 1 1
#> 2 2

DF %>% 
  select(a)
#>   a
#> 1 1
#> 2 2

Created on 2021-06-04 by the reprex package (v0.3.0)

josep maria porrà answered Oct 19 '22 00:10