How to count number of spaces just after the date information?

Question

I have unstructured data that look like this:

data <- c("24-March-2017      product 1              color 1",
"March-2017-24              product 2                 color 2",
"2017-24-March  product 3              color 3")

I would like to count the number of spaces between the date and the first character of the product column for each line. As shown in the sample data, the date format can vary. This information will be used to put the data into a structured format.

What is the best way to do this in R? I believe gsub can be used here, but I'm not sure how to apply it so that it counts only the spaces that follow the date on each line.

sinQueso · Accepted Answer

One approach is to use regexpr, which returns information about the first match of a given regular expression. In your case, you are looking for the first run of white space. The following tells you (1) where in each string the first white space starts, and (2) in the match.length attribute, how many white-space characters that run contains:

regexpr("\\s+", data)
# [1] 14 14 14
# attr(,"match.length")
# [1]  6 14  2
# attr(,"useBytes")
# [1] TRUE

You can then use attr to extract the match.length attribute:

attr(regexpr("\\s+", data), "match.length")

EDIT

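As pointed out by @xehpuk, \s+ matches one or more white-space characters. If your date column contained single spaces, that would be a problem, because the match would stop inside the date. Instead you'd need to use \s{2,} (written "\\s{2,}" in an R string), which matches only runs of two or more.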
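
A minimal sketch of that difference, assuming a made-up vector data2 (not from the question) whose dates contain single spaces and whose columns are separated by at least two spaces:

```r
# Hypothetical input: single spaces inside the date, wider gaps between columns
data2 <- c(
  paste0("24 March 2017", strrep(" ", 6), "product 1"),
  paste0("March 2017 24", strrep(" ", 2), "product 2")
)

# "\\s+" stops at the first single space inside the date
attr(regexpr("\\s+", data2), "match.length")
# [1] 1 1

# "\\s{2,}" skips single spaces and matches the first run of two or more
attr(regexpr("\\s{2,}", data2), "match.length")
# [1] 6 2
```

This only helps if the real column gaps are never a single space, which the sample data suggests but does not guarantee.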

Rich Scriven · Answer

You can sub out that section, then take the number of characters.

nchar(sub("\\S+(\\s+).*", "\\1", data))
# [1]  6 14  2

Or this one is kinda fun:

nchar(data) - nchar(sub("\\s+", "", data))
# [1]  6 14  2
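
The subtraction works because sub replaces only the first match, so the drop in length is exactly the width of the first run of white space. Here is a small check of that reasoning. It rebuilds the question's data with strrep so the first gap widths (6, 14, 2) are explicit; the widths of the later gaps are illustrative assumptions:

```r
# Rebuild the sample data; the first gap on each line is 6, 14 and 2 spaces
data <- c(
  paste0("24-March-2017", strrep(" ", 6),  "product 1", strrep(" ", 14), "color 1"),
  paste0("March-2017-24", strrep(" ", 14), "product 2", strrep(" ", 17), "color 2"),
  paste0("2017-24-March", strrep(" ", 2),  "product 3", strrep(" ", 14), "color 3")
)

# sub() removes only the first run of white space on each line
stripped <- sub("\\s+", "", data)
nchar(data) - nchar(stripped)
# [1]  6 14  2

# gsub() removes every run, so the same subtraction counts all white space
nchar(data) - nchar(gsub("\\s+", "", data))
# [1] 22 33 18
```

Using gsub here would therefore give the wrong answer for this question, since it also counts the spaces inside and after the product column.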
