
How to expand a POSIXct field in R str()?

Tags:

r

posixct

I am trying to increase the number of elements that str() shows for a custom POSIXct field, where the usual approach (str(DF, list.len=ncol(DF), vec.len=20)) does not work. I request 20 elements here, but it always shows only two ("2017-01-01 08:40:00" "2017-01-01 08:50:00" ...), regardless of the length of the vector (here 3).

Data data.csv

"AAA", "BBB"
1, 01012017-0940+0100
2, 01012017-0950+0100
3, 01012017-0838+0100

Code

library('methods') # setClass

# https://unix.stackexchange.com/a/363290/16920
setClass('iso8601')

# https://stackoverflow.com/questions/5788117/only-read-limited-number-of-columns
setAs("character","iso8601",function(from) strptime(from,format="%d%m%Y-%H%M%z"))

DF <- read.csv(file='data.csv',
        sep=',',
        header=TRUE,
        colClasses=c('numeric','iso8601'),
        strip.white=TRUE)

DF

str(DF, list.len=ncol(DF), vec.len=20)

Output in R 3.3.3

 AAA                 BBB
1  1 2017-01-01 08:40:00
2  2 2017-01-01 08:50:00
3  3 2017-01-01 07:38:00
'data.frame':  3 obs. of  2 variables:
 $ AAA : num  1 2 3
 $ BBB : POSIXlt, format: "2017-01-01 08:40:00" "2017-01-01 08:50:00" ...

Output in R 3.4.0

Same as above, reproducing the same problem.

  AAA                 BBB
1   1 2017-01-01 08:40:00
2   2 2017-01-01 08:50:00
3   3 2017-01-01 07:38:00
'data.frame':   3 obs. of  2 variables:
 $ AAA: num  1 2 3
 $ BBB: POSIXlt, format: "2017-01-01 08:40:00" "2017-01-01 08:50:00" ...

  1. How can you make str(DF, list.len=ncol(DF), vec.len=20) show many elements per variable?

  2. How can you show the number of items per variable in str(DF), i.e. without expanding the values of the variable themselves?

Ruling out terminal width and column count as the cause

What I did:

  1. increased the terminal defaults: width from 80 to 150, and columns from 24 to 38
  2. restarted the terminal
  3. ran Rscript myScript.r
  4. the output was the same again, so terminal width and column count do not seem to be a factor here

Roland's proposal

The code below does not work in all cases, only in a limited number of them, so it should be possible to apply it dynamically.

# Roland's comment
str(DF, list.len=ncol(DF), vec.len=20, width = 100)

R: 3.3.3, 3.4.0 (2017-04-21, backports)
OS: Debian 8.7
Window manager: Gnome 3.14.1

Léo Léopold Hertz 준영 asked May 17 '17 13:05



1 Answer

Proposal width

To get "wider" output, you can change the default width in R's options.

According to options {base} help:

width:

controls the maximum number of columns on a line used in printing vectors, matrices and arrays, and when filling by cat.

Here is an example:
# initial try
str(DF, list.len=ncol(DF), vec.len=20)

it gives:

'data.frame':   3 obs. of  2 variables:
 $ AAA: num  1 2 3
 $ BBB: POSIXlt, format: "2017-01-01 11:40:00" "2017-01-01 11:50:00" ...

Proposal options(width)

And now, with a different width:

# choose an appropriate width
n_cols <- 22 * 20 # 22 columns per quoted POSIXlt string, 20 strings
n_cols <- n_cols + 50 # 50 more columns for the variable description
# actually you can use any sufficiently big number,
# for example n_cols = 1000

# set the new width; options() returns the previous value
op <- options(width = n_cols)
str(DF, list.len=ncol(DF), vec.len=20)
# restore the previous width
options(op)

The result is:

'data.frame':   3 obs. of  2 variables:
 $ AAA: num  1 2 3
 $ BBB: POSIXlt, format: "2017-01-01 11:40:00" "2017-01-01 11:50:00" "2017-01-01 10:38:00"

Roland's width parameter

You can achieve the same with the width parameter of str, just as Roland suggested. Again, you have to provide a large enough value for the output. One POSIXlt string takes 21 characters (19 for the date-time plus two quotes) and is followed by one space, so for 20 strings a width of 440 columns is enough.

Three parameter approach

I have tried it with your example:

DF <- rbind(DF, DF, DF, DF, DF, DF, DF, DF) # 8 copies of 3 rows: nrow = 24

# Calculate string width
string_size <- nchar(as.character(DF[1, 2])) + 3 # string width + two quotes + one space
N <- 20 # number of items
n_cols <- string_size * N

str(DF, list.len=ncol(DF), vec.len=20, width = n_cols)

Output:

'data.frame':   24 obs. of  2 variables:
 $ AAA: num  1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3
 $ BBB: POSIXlt, format: "2017-01-01 11:40:00" "2017-01-01 11:50:00" "2017-01-01 10:38:00" "2017-01-01 11:40:00" "2017-01-01 11:50:00" "2017-01-01 10:38:00" "2017-01-01 11:40:00" "2017-01-01 11:50:00" "2017-01-01 10:38:00" "2017-01-01 11:40:00" "2017-01-01 11:50:00" "2017-01-01 10:38:00" "2017-01-01 11:40:00" "2017-01-01 11:50:00" "2017-01-01 10:38:00" "2017-01-01 11:40:00" "2017-01-01 11:50:00" "2017-01-01 10:38:00" "2017-01-01 11:40:00" "2017-01-01 11:50:00" ...

There are exactly 20 POSIXlt strings.
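
Since the question asks for something that can be applied dynamically, the same calculation could be wrapped in a small helper. This is only a sketch under stated assumptions: the name str_wide and its arguments are made up for illustration, item_width = 22 assumes timestamps formatted like "2017-01-01 11:40:00", and the demo data frame uses a POSIXct column instead of the POSIXlt column from the question.

```r
# Hypothetical helper (not part of base R or utils): call str() with a width
# that scales with the number of items requested per variable.
str_wide <- function(x, n = 20, item_width = 22) {
  # item_width: assumed characters per item, i.e. 19 for the date-time,
  # two quotes and one separating space
  str(x, list.len = ncol(x), vec.len = n, width = item_width * n)
}

# Made-up demo data: 30 rows, timestamps 10 minutes apart
demo <- data.frame(
  AAA = 1:30,
  BBB = as.POSIXct("2017-01-01 08:40:00", tz = "UTC") + 600 * (0:29)
)

str_wide(demo, n = 20)
```

Passing width to str directly, instead of changing options(width), leaves the global options untouched, so nothing has to be restored afterwards.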

Explanation

The limited output comes from the utils:::str.POSIXt method, which is called for POSIXlt objects. The interesting part is the following line:

larg[["vec.len"]] <- min(larg[["vec.len"]], (larg[["width"]] - 
                nchar(larg[["indent.str"]]) - 31)%/%19)

This line computes the number of POSIXlt strings in the output. Roughly speaking, the output will contain no more than vec.len POSIXlt strings, and its length in characters will be no more than width.

Here, larg is a list of arguments passed to str. By default they are: vec.len = 4; width = 80; indent.str = " ".

So, by default, the recomputed vec.len will be min(4, (80 - 1 - 31) %/% 19) = min(4, 2) = 2.
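
The default case can be checked by hand with a small function that repeats the arithmetic of the line quoted above. It is a sketch, not the utils code itself: the name effective_vec_len is invented here, and the defaults are the ones stated above.

```r
# Hypothetical helper mirroring the cap computed in utils:::str.POSIXt
effective_vec_len <- function(vec.len = 4, width = 80, indent.str = " ") {
  min(vec.len, (width - nchar(indent.str) - 31) %/% 19)
}

effective_vec_len()                           # defaults: min(4, 48 %/% 19) = 2
effective_vec_len(vec.len = 20)               # still 2: width = 80 is the limit
effective_vec_len(vec.len = 20, width = 100)  # 3, from (100 - 1 - 31) %/% 19
```

The last call is consistent with width = 100 being enough for the three-row example in the question but not for longer vectors.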

As for the last example, we set vec.len = 20 and width = 440, and our data frame has 24 rows. The recomputed vec.len is min(20, (440 - 1 - 31) %/% 19) = min(20, 21) = 20. So the output of str(DF) contains 20 POSIXlt strings followed by '...', which means that the POSIXlt vector has more than 20 elements.
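
Going the other way, the same formula gives the smallest width that still allows a requested number of strings. A minimal sketch, with min_width_for as a made-up name and indent.str = " " assumed as in the defaults listed above:

```r
# Smallest width w such that (w - nchar(indent.str) - 31) %/% 19 >= n
# (hypothetical helper, derived from the line in utils:::str.POSIXt)
min_width_for <- function(n, indent.str = " ") {
  19 * n + 31 + nchar(indent.str)
}

min_width_for(20)  # 412, so a width of 440 leaves some slack
min_width_for(3)   # 89
```

If the indent string passed down for a data frame column is longer than one character, the required width grows by the same number of characters.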

Istrel answered Sep 21 '22 12:09
