
How to delete the first row of a dataframe in R?

Tags:

r

dataset

I have a dataset with 11 columns and over 1000 rows. The columns were labeled V1, V2, V11, etc. I replaced the names with something more useful to me using the c() function. I didn't realize that row 1 also contained labels for each column and that my actual data starts on row 2.

Is there a way to delete row 1 and decrement the remaining row numbers?

akz asked Sep 24 '11 20:09




2 Answers

Keep the labels from your original file like this:

df = read.table('data.txt', header = T) 

If you have columns named x and y, you can address them like this:

df$x
df$y

If you'd like to actually delete the first row from a data.frame, you can use negative indices like this:

df = df[-1,] 
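
Note that negative indexing keeps the original row names, so the first remaining row is still labeled 2. Here is a minimal sketch of dropping the row and then renumbering, using a made-up three-row data frame (the column names x and y and the values are placeholders, not the asker's data):

```r
# Hypothetical example data; the names x and y are placeholders
df <- data.frame(x = c(10, 20, 30), y = c("a", "b", "c"))

# Drop the first row; the remaining rows keep their old labels "2" and "3"
df <- df[-1, ]

# Reset the row names so numbering starts at 1 again
rownames(df) <- NULL

print(df)
```

If the data frame has only one column, df[-1, , drop = FALSE] keeps the result a data frame instead of collapsing it to a vector.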

If you'd like to delete a column from a data.frame, you can assign NULL to it:

df$x = NULL 

Here are some simple examples of how to create and manipulate a data.frame in R:

# create a data.frame with 10 rows
> x = rnorm(10)
> y = runif(10)
> df = data.frame( x, y )

# write it to a file
> write.table( df, 'test.txt', row.names = F, quote = F )

# read a data.frame from a file:
> df = read.table( 'test.txt', header = T )

> df$x
 [1] -0.95343778 -0.63098637 -1.30646529  1.38906143  0.51703237 -0.02246754
 [7]  0.20583548  0.21530721  0.69087460  2.30610998
> df$y
 [1] 0.66658148 0.15355851 0.60098886 0.14284576 0.20408723 0.58271061
 [7] 0.05170994 0.83627336 0.76713317 0.95052671

> df$x = x
> df
            y           x
1  0.66658148 -0.95343778
2  0.15355851 -0.63098637
3  0.60098886 -1.30646529
4  0.14284576  1.38906143
5  0.20408723  0.51703237
6  0.58271061 -0.02246754
7  0.05170994  0.20583548
8  0.83627336  0.21530721
9  0.76713317  0.69087460
10 0.95052671  2.30610998

> df[-1,]
            y           x
2  0.15355851 -0.63098637
3  0.60098886 -1.30646529
4  0.14284576  1.38906143
5  0.20408723  0.51703237
6  0.58271061 -0.02246754
7  0.05170994  0.20583548
8  0.83627336  0.21530721
9  0.76713317  0.69087460
10 0.95052671  2.30610998

> df$x = NULL
> df
            y
1  0.66658148
2  0.15355851
3  0.60098886
4  0.14284576
5  0.20408723
6  0.58271061
7  0.05170994
8  0.83627336
9  0.76713317
10 0.95052671
James Thompson answered Oct 14 '22 04:10


You can use negative indexing to remove rows, e.g.:

dat <- dat[-1, ] 

Here is an example:

> dat <- data.frame(A = 1:3, B = 1:3)
> dat[-1, ]
  A B
2 2 2
3 3 3
> dat2 <- dat[-1, ]
> dat2
  A B
2 2 2
3 3 3

That said, you may have more problems than just removing the labels that ended up on row 1. It is more than likely that R has read the data as text and converted the columns to factors (or, in R 4.0.0 and later, left them as character vectors). Check what str(foo), where foo is your data object, says about the data types.
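
As a rough illustration of that check, the sketch below reads a small inline table without header = TRUE, so the label row lands in the data and every column comes back as text. The sample data and column labels are invented, and type.convert() is just one possible way to recover numeric columns once the label row is gone; it is not something the answer itself prescribes:

```r
# Placeholder data: a label line followed by two numeric rows
raw <- "height weight\n1.5 60\n1.8 75"

# Read without header = TRUE: the labels become row 1 of the data
foo <- read.table(text = raw, stringsAsFactors = FALSE)
str(foo)  # columns V1 and V2 are character, not numeric

# Promote row 1 to column names, drop it, and renumber the rows
names(foo) <- unlist(foo[1, ], use.names = FALSE)
foo <- foo[-1, ]
rownames(foo) <- NULL

# Re-guess the column types now that only data values remain
foo <- type.convert(foo, as.is = TRUE)
str(foo)  # height is numeric, weight is integer
```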

It sounds like you just need header = TRUE in your call to read in the data (assuming you read it in via read.table() or one of its wrappers).
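
A minimal sketch of the difference, using inline text in place of a real file (the column labels and values are invented for illustration):

```r
# Invented sample: one label line, then the data
raw <- "height weight\n1.5 60\n1.8 75"

# Without header = TRUE the labels are treated as data:
# three rows, with default column names V1 and V2
no_header <- read.table(text = raw)

# With header = TRUE the first line supplies the column names,
# leaving two rows that are parsed as numbers
with_header <- read.table(text = raw, header = TRUE)

dim(no_header)
names(with_header)
str(with_header)
```

Re-reading the file this way avoids having to delete row 1 and convert the column types afterwards.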

Gavin Simpson answered Oct 14 '22 05:10