
How to reduce a data frame into a single row with vectors

I have this data frame:

  email       date      user_ipaddress       other data    
1 [email protected] 2020-03-24  177.95.75.230         xxxx
2 [email protected] 2020-04-02  177.139.49.93         yyyy
3 [email protected] 2020-04-02  177.139.49.93         zzzz

and I want to transform it into the shape in which it is going to be stored.

The full problem is a big data frame with many distinct emails, and I want to reduce all the data for each email into a single row, like so:

  email       date      user_ipaddress                       other data    
1 [email protected] 2020-04-02  c('177.95.75.230','177.139.49.93')   c('xxxx','yyyy','zzzz') 

Actually, if someone could help with just the case where there is only one email address, it would save my life, but feel free to help with the whole problem.

Using

ipadreessVec<-Reduce(append,x =df$network_userid) 

I can get my vector c('177.95.75.230','177.139.49.93'), but if I try to assign it to a column with

newdf$network_userid<-a

I get

Error in `$<-.data.frame`(`*tmp*`, network_userid, value = c("20562206-f557-48a3-861b-5d1e18524bbb",  : 
  replacement has 3 rows, data has 1

Any answer that gets me a step further will get my approval, even if it does not solve everything.

fils capo asked Dec 18 '22 13:12


1 Answer

We can create list columns, grouped by 'email' and 'date':

library(dplyr)
DF %>% 
    group_by(email, date) %>%
    summarise_all(list)
# A tibble: 2 x 4
# Groups:   email [1]
#  email     date       user_ipaddress otherdata
#  <chr>     <chr>      <list>         <list>   
#1 [email protected] 2020-03-24 <chr [1]>      <chr [1]>
#2 [email protected] 2020-04-02 <chr [2]>      <chr [2]>

Or, with dplyr 1.0.0 or later (the devel version when this was written), use across with summarise:

DF %>%
   group_by(email, date) %>% 
   summarise(across(everything(), list))
# A tibble: 2 x 4
# Groups:   email [1]
#  email     date       user_ipaddress otherdata
#  <chr>     <chr>      <list>         <list>   
#1 [email protected] 2020-03-24 <chr [1]>      <chr [1]>
#2 [email protected] 2020-04-02 <chr [2]>      <chr [2]>
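
If exactly one row per email is wanted, as in the desired output shown in the question, one option is to group by 'email' only and collapse each remaining column explicitly. This is a minimal sketch, not a tested solution for the full data: it assumes dplyr 1.0.0 or later (for `.groups`), that the latest date should be kept, and that duplicate values should be dropped. The `demo` frame and its placeholder address are hypothetical stand-ins for the real data.

```r
library(dplyr)

# Hypothetical stand-in for the question's data; the email is a placeholder.
demo <- data.frame(
  email = rep("user@example.com", 3),
  date = c("2020-03-24", "2020-04-02", "2020-04-02"),
  user_ipaddress = c("177.95.75.230", "177.139.49.93", "177.139.49.93"),
  otherdata = c("xxxx", "yyyy", "zzzz")
)

one_row <- demo %>%
  group_by(email) %>%
  summarise(
    # ISO-formatted date strings sort chronologically, so max() is the latest
    date = max(date),
    # wrap each vector in list() so it occupies a single cell
    user_ipaddress = list(unique(user_ipaddress)),
    otherdata = list(unique(otherdata)),
    .groups = "drop"
  )

one_row$user_ipaddress[[1]]
```

Each list cell is read back with `[[`, so `one_row$user_ipaddress[[1]]` is an ordinary character vector.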

data

DF <- structure(list(email = c("[email protected]", "[email protected]", "[email protected]"
), date = c("2020-03-24", "2020-04-02", "2020-04-02"),
 user_ipaddress = c("177.95.75.230", 
"177.139.49.93", "177.139.49.93"),
otherdata = c("xxxx", "yyyy", 
"zzzz")), class = "data.frame", row.names = c("1", "2", "3"))
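
As for the error in the question: assigning with `$<-` expects the replacement to have as many elements as the data frame has rows, so a length-3 vector cannot go into a 1-row data frame. Wrapping the vector in `list()` turns it into a single element, which then fits in one cell. A minimal base R sketch, where `newdf`, `ids`, and the placeholder email are hypothetical names used only for illustration:

```r
# Hypothetical one-row data frame; the email is a placeholder.
newdf <- data.frame(email = "user@example.com")

# Three values that should all live in the single row
ids <- c("xxxx", "yyyy", "zzzz")

# newdf$otherdata <- ids        # fails: replacement has 3 rows, data has 1
newdf$otherdata <- list(ids)    # a list of length 1 matches the 1 row

nrow(newdf)
newdf$otherdata[[1]]
```

The column becomes a list column, so its contents are retrieved with `[[` rather than by printing the cell.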
akrun answered Dec 31 '22 23:12