
Reading PISA data into R - read.table error

Tags:

r

read.table

pisa

I am trying to read data from the PISA 2012 study (http://pisa2012.acer.edu.au/downloads.php) into R using the read.table function. This is the code I tried:

pisa  <- read.table("pisa2012.txt", sep = "")    

Unfortunately, I keep getting the following error message:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  
: line 2 did not have 184 elements    

I have tried to set

header = T

but then I get the following error message:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  
 :line 1 did not have 184 elements

Lastly, this is what the .txt file looks like ...

http://postimg.org/image/4u9lqtxqd/

Thanks for your help!

like image 459
sascha91 Avatar asked Jun 26 '26 18:06

sascha91


2 Answers

You can see from the first line that you'll need some sort of control file to delimit the individual variables. From working with PISA in other environments, I know the first three columns correspond to the ISO three-letter country code (e.g., ALB). What follows are numbers and letters that only make sense once they are separated into their variables. You could use the codebook for this (https://pisa2012.acer.edu.au/downloads/M_stu_codebook.pdf), but doing that by hand for every single variable is a real bear. Why not download in SPSS or SAS and import? Not a 'slick' solution, but without a control file, you'd have a lot of manual work to do.
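To illustrate why a control file of column widths is needed, here is a minimal sketch using base R's read.fwf on a toy file. The three columns, their widths, and the names school_id and student_id are made up for the example and are not taken from the PISA codebook; only the three-letter country code at the start mirrors what is described above.

```r
# Toy fixed-width file: every record is one unbroken string, so
# read.table(sep = "") has nothing to split on.
tmp <- tempfile(fileext = ".txt")
writeLines(c("ALB00010001", "ALB00010002", "ARG00020001"), tmp)

# Hypothetical layout: 3 chars country, 4 chars school, 4 chars student.
widths    <- c(3, 4, 4)
col_names <- c("country", "school_id", "student_id")

# colClasses = "character" keeps the leading zeros in the ID columns.
toy <- read.fwf(tmp, widths = widths, col.names = col_names,
                colClasses = "character")
print(toy)
```

With the real data, the widths and names would have to come from the codebook or from the SAS/SPSS control file rather than being typed by hand.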

like image 109
Leslie Avatar answered Jun 29 '26 17:06

Leslie


I just read the files using the readr package. You will need: the readr package, the TXT file, the SAScii package, and the associated SAS file.

So, let's say you want to read the student file. Then you will need the following files: INT_STU12_DEC03.txt and INT_STU12_DEC03.sas.

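##################### READING STUDENT DATA  ###################
library(readr)   # read_fwf(), fwf_widths()
library(SAScii)  # parse.SAScii()

## Loading the dictionary (variable names and widths) from the SAS script
## (use the name of the .sas file as you saved it)
dic_student <- parse.SAScii(sas_ri = 'INT_STU12_SAS.sas')

## Creating the positions for read_fwf and reading the data
student <- read_fwf(file = 'INT_STU12_DEC03.txt', col_positions = fwf_widths(dic_student$width), progress = TRUE)
colnames(student) <- dic_student$varname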

OBS 1: As I'm using Linux, I needed to delete the first lines from the SAS file and change its encoding to UTF-8.

OBS 2: The deleted lines were:

libname  M_DEC03 "C:\XXX"; 
filename STU "C:\XXX\INT_STU12_DEC03.txt"; 
options nofmterr;

OBS 3: The dataset takes about 1 GB, so you will need enough RAM.
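The clean-up from OBS 1 and OBS 2 can also be done from R instead of by hand. The following is only a sketch on a toy stand-in for the SAS script: the file contents, the pattern used to spot the lines to drop, and the assumed source encoding ("latin1") are all assumptions, so check them against your own copy of the file.

```r
# Toy stand-in for the SAS import script (the real one is much longer).
sas_in <- tempfile(fileext = ".sas")
writeLines(c(
  'libname  M_DEC03 "C:\\XXX";',
  'filename STU "C:\\XXX\\INT_STU12_DEC03.txt";',
  'options nofmterr;',
  'data STU;',
  'input CNT $ 1-3;',
  'run;'
), sas_in)

lines <- readLines(sas_in, warn = FALSE)

# Drop the Windows-specific libname/filename/options statements.
drop <- grepl("^\\s*(libname|filename|options)\\b", lines,
              ignore.case = TRUE, perl = TRUE)

# Re-encode the rest to UTF-8; "latin1" is an assumed source encoding.
cleaned <- iconv(lines[!drop], from = "latin1", to = "UTF-8")

sas_out <- tempfile(fileext = ".sas")
writeLines(cleaned, sas_out, useBytes = TRUE)
```

The cleaned file at sas_out is what would then be passed to parse.SAScii.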

like image 35
Flavio Barros Avatar answered Jun 29 '26 17:06

Flavio Barros


