How to use a dictionary for a large data frame in R?

Tags: r

I have read the answers to these questions about creating a dictionary in R:

equivalent of a python dict in R

Is there a dictionary functionality in R

My question is: how can I use this on a large dataset? The data is structured like the following subsample (output of dput):

structure(list(...1 = c("category 1", NA, NA, NA, "total", "category 2", 
NA, NA, NA, "total"), Items = c("product 1", "product 2", "product 3", 
"product 4", NA, "product 1", "product 2", "product 3", "product 4", 
NA), price = c(1, 2, 3, 4, 10, 3, 4, 5, 6, 18)), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

I want the result to look like this:

categoryx: {product1:1, product2:2, product3:3....}

What can I do if there are 1000 categories and the number of products differs from category to category? In the answers to the two questions linked above, the values for each key have to be added manually, and I don't know how to do that for a large dataset.

Or is there another method (other than creating dictionaries) that would let me extract the information for each category easily?

Could someone give me some ideas? Thanks.

Is it possible to get a result like a dictionary (or list) of dictionaries in Python?

For example: dict={category1: {product1:1, product2:2, product3:3....}, category2: {product1:3, product2:4, product3:5....} }

That way I would know each category's index and could use it to extract information from dict. Maybe it would look like a data frame such as this:

            item       price
categoryx   product1   2
            product2   3

so that I could do operations on a specific category?

asked Nov 25 '25 09:11 by ling


1 Answer

You can build a list of hashmap dictionaries, one per category:

dat <-
  structure(
    list(
      ...1 = c("category 1", NA, NA, NA, "total", "category 2",
               NA, NA, NA, "total"),
      Items = c(
        "product 1",
        "product 2",
        "product 3",
        "product 4",
        NA,
        "product 1",
        "product 2",
        "product 3",
        "product 4",
        NA
      ),
      price = c(1, 2, 3, 4, 10, 3, 4, 5, 6, 18)
    ),
    row.names = c(NA,-10L),
    class = c("tbl_df", "tbl", "data.frame")
  )

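library(hashmap)

# Drop the "total" rows (Items is NA there), then fill the category
# label in column 1 downwards so every item row carries its category.
dat_clean <- tidyr::fill(dat[!is.na(dat[["Items"]]), ], 1)

# Split by category and build one hashmap (item name => price) per group.
list_of_dicts <- lapply(split(dat_clean, dat_clean[[1]]), function(d) {
  hashmap(d[["Items"]], d[["price"]])
})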

list_of_dicts
# $`category 1`
# ## (character) => (numeric)  
# ## [product 1] => [+1.000000]
# ## [product 3] => [+3.000000]
# ## [product 4] => [+4.000000]
# ## [product 2] => [+2.000000]
# 
# $`category 2`
# ## (character) => (numeric)  
# ## [product 1] => [+3.000000]
# ## [product 3] => [+5.000000]
# ## [product 4] => [+6.000000]
# ## [product 2] => [+4.000000]


# get totals:
lapply(list_of_dicts, function(dict){
  sum(dict$values())
})
# $`category 1`
# [1] 10
# 
# $`category 2`
# [1] 18
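
If the hashmap package is not available in your setup, the same idea can be sketched in base R only, with a named list of named numeric vectors standing in for the dictionaries. This is a minimal sketch and not part of the approach above: the column name `category` and the object names `items` and `price_by_category` are my own, chosen for illustration (the original first column is called `...1`).

```r
# Rebuild the subsample from the question as a plain data.frame.
# The first column is renamed to "category" here (an assumption for readability).
dat <- data.frame(
  category = c("category 1", NA, NA, NA, "total",
               "category 2", NA, NA, NA, "total"),
  Items = c("product 1", "product 2", "product 3", "product 4", NA,
            "product 1", "product 2", "product 3", "product 4", NA),
  price = c(1, 2, 3, 4, 10, 3, 4, 5, 6, 18),
  stringsAsFactors = FALSE
)

# Drop the "total" rows: they are the rows without an item name.
items <- dat[!is.na(dat$Items), ]

# Fill the category label downwards: every non-NA label starts a new group,
# so the running count of non-NA labels indexes into the vector of labels.
group_id <- cumsum(!is.na(items$category))
items$category <- items$category[!is.na(items$category)][group_id]

# One named numeric vector (item name -> price) per category.
price_by_category <- lapply(
  split(items, items$category),
  function(d) setNames(d$price, d$Items)
)

# Look up one price by category and product name.
price_by_category[["category 2"]][["product 3"]]
# [1] 5

# Totals per category, computed from the items instead of the "total" rows.
sapply(price_by_category, sum)
# category 1 category 2
#         10         18

# The long data frame is often enough on its own for per-category work.
subset(items, category == "category 1")
aggregate(price ~ category, data = items, FUN = sum)
```

Nothing here has to be typed in per category, so it should scale to 1000 categories with different numbers of products. One caveat: lookups by name in a named vector are not hashed, so for very large categories a hash-based structure may be faster.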
answered Nov 27 '25 22:11 by Stéphane Laurent


