
Read files with extension .data into R

Tags:

r

dataset

I need to read a data file into R for my assignment. You can download it from the following site.

http://archive.ics.uci.edu/ml/datasets/Acute+Inflammations

The data file has a .data extension, which I have never seen before. I tried read.table and similar functions but could not read it into R properly. Can anyone help me with this, please?

LaTeXFan asked Jan 13 '14 21:01




2 Answers

It's a UTF-16 little-endian file with a byte order mark at the beginning. read.table will fail unless you specify the correct encoding. The file also uses a comma as the decimal mark, so dec="," is needed as well. This works for me on macOS:

read.table("diagnosis.data", fileEncoding="UTF-16", dec=",")

      V1  V2  V3  V4  V5  V6  V7  V8
1   35.5  no yes  no  no  no  no  no
2   35.9  no  no yes yes yes yes  no
3   35.9  no yes  no  no  no  no  no
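
If you want to check the encoding yourself before reading, you can look at the first two bytes of the file: a UTF-16 little-endian byte order mark is FF FE. The sketch below is self-contained, so it first builds a small stand-in file (the temporary path and its single row are made up for illustration, with the row copied from the first line of the output above). With the real download you would set `path` to "diagnosis.data" and skip the first step.

```r
# Build a tiny stand-in for diagnosis.data: UTF-16LE text preceded by a BOM.
# (Hypothetical sample file; the real one comes from the UCI download.)
path <- tempfile(fileext = ".data")
row  <- "35,5\tno\tyes\tno\tno\tno\tno\tno\n"
body <- iconv(row, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
writeBin(c(as.raw(c(0xff, 0xfe)), body), path)

# Inspect the first two bytes: ff fe marks UTF-16 little endian.
bom <- readBin(path, what = "raw", n = 2)
is_utf16le <- identical(bom, as.raw(c(0xff, 0xfe)))

# Read with the matching encoding and a comma as the decimal mark.
dat <- read.table(path, fileEncoding = "UTF-16", dec = ",")
```

If `is_utf16le` is FALSE for your copy of the file, the `fileEncoding` argument probably needs a different value (or can be dropped).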
Mark Heckmann answered Sep 22 '22 03:09


From your link:

The data is in an ASCII file. Attributes are separated by TAB.

Thus you need to use read.table() with sep = "\t".

-- Attribute lines:
For example, '35,9 no no yes yes yes yes no'
Where:
'35,9' Temperature of patient
'no' Occurrence of nausea
'no' Lumbar pain
'yes' Urine pushing (continuous need for urination)
'yes' Micturition pains
'yes' Burning of urethra, itch, swelling of urethra outlet
'yes' decision: Inflammation of urinary bladder
'no' decision: Nephritis of renal pelvis origin

It also looks like the file uses a comma as the decimal mark, so specify dec = "," inside read.table() as well.

It looks like you'll need to put in the column headings manually, though your link defines them.
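
Put together, that might look like the sketch below. The column names are illustrative labels derived from the attribute list quoted above, not names that ship with the file, and the three sample rows are taken from the output shown in the other answer so that the example runs on its own. For the real file, pass the file name instead of `text =` (plus a `fileEncoding` if the file turns out not to be plain ASCII).

```r
# Illustrative column names based on the attribute description
# (an assumption; the file itself has no header row).
cols <- c("temperature", "nausea", "lumbar_pain", "urine_pushing",
          "micturition_pains", "urethra_burning",
          "bladder_inflammation", "nephritis")

# Three tab-separated sample rows with a comma as the decimal mark.
sample_lines <- c("35,5\tno\tyes\tno\tno\tno\tno\tno",
                  "35,9\tno\tno\tyes\tyes\tyes\tyes\tno",
                  "35,9\tno\tyes\tno\tno\tno\tno\tno")

diagnosis <- read.table(text = sample_lines, sep = "\t", dec = ",",
                        col.names = cols, stringsAsFactors = FALSE)

# Optionally turn the yes/no columns into logicals for easier filtering.
diagnosis[-1] <- lapply(diagnosis[-1], function(x) x == "yes")
str(diagnosis)
```

Converting the yes/no columns is optional; leaving them as character (or making them factors) works just as well, depending on what the assignment needs.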

Make sure you see @Gavin Simpson's comment below to clean up other undocumented "features" of this dataset.

Gregor Thomas answered Sep 20 '22 03:09