How can I find out the internal code representation of a WINDOWS-1252 character?

Tags:

r

I am processing SPSS data from a questionnaire that must have originated in MS Word. Word automatically changes hyphens into long hyphens, and these get converted into characters that don't display properly, e.g. "-" turns into "ú".

My question: What is the equivalent to utf8ToInt() in the WINDOWS-1252 character set?

utf8ToInt("A")
[1] 65

When I do this with my own data, I get an error:

x <- str_sub(levels(sd$j1)[1], 7, 7)
print(x)
[1] "ú"

utf8ToInt(x)
Error in utf8ToInt(x) : invalid UTF-8 string

However, the contents of x are perfectly usable in grep and gsub expressions.

> Sys.getlocale()
[1] "LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252"

asked Mar 05 '11 16:03

Andrie

2 Answers

If you load the SPSS sav file via read.spss from the foreign package, you can import the data frame with the correct encoding by specifying it in the reencode argument:

read.spss("foo.sav", reencode="CP1252")
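A slightly fuller sketch of that call, assuming the file really is encoded as Windows-1252. The file name "foo.sav" is the placeholder used above, and the column j1 is taken from the question; the file.exists() guard is only there so the snippet does nothing when the placeholder file is absent.

```r
library(foreign)

# "foo.sav" is a placeholder file name; substitute the real path
path <- "foo.sav"

if (file.exists(path)) {
    # reencode = "CP1252" names the encoding to assume for strings in the file
    sd <- read.spss(path, to.data.frame = TRUE, reencode = "CP1252")

    # Inspect the first level of the factor from the question (assumed column j1)
    print(levels(sd$j1)[1])
}
```

If the re-encoding works, the level should print with a proper dash rather than a stray accented character, and no manual conversion should be needed afterwards.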

answered Oct 29 '22 11:10

daroczig

After some head-scratching, a lot of reading of help files, and trial and error, I created two little functions that do what I need. encToInt() converts its input to UTF-8 and returns the integer vector of code points for that string; intToEnc() does the reverse, building a UTF-8 string from the integers and converting it to the target encoding.

# Convert character to integer vector
# Optional encoding specifies encoding of x, defaults to current locale
encToInt <- function(x, encoding=localeToCharset()){
    utf8ToInt(iconv(x, encoding, "UTF-8"))
}

# Convert integer vector to character vector
# Optional encoding specifies encoding of the result, defaults to current locale
intToEnc <- function(x, encoding=localeToCharset()){
    iconv(intToUtf8(x), "UTF-8", encoding)
}

Some examples:

x <- "\xfa"
encToInt(x)
[1] 250

intToEnc(250)
[1] "ú"
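For the long hyphen from the question, a minimal sketch along the same lines might look like this. It assumes the character is the en dash, which Windows-1252 stores as the single byte 0x96, and it passes "CP1252" explicitly instead of relying on the locale. The variable names are made up for illustration.

```r
# Byte 0x96 is the en dash in Windows-1252 (assumed to be the "long hyphen")
x <- "\x96"

# The raw byte value, i.e. the code of the character within Windows-1252
byte_value <- as.integer(charToRaw(x))
byte_value
# [1] 150

# The Unicode code point, obtained by converting to UTF-8 first
code_point <- utf8ToInt(iconv(x, "CP1252", "UTF-8"))
code_point
# [1] 8211

# The unconverted byte is not valid UTF-8, which is why utf8ToInt(x) fails
validUTF8(x)
# [1] FALSE
```

charToRaw() answers the literal question (the byte stored for the character), while the iconv() route gives the Unicode code point, which is usually the more useful value when the goal is to replace the character.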

answered Oct 29 '22 10:10

Andrie
