The CSV file to be processed does not fit into memory. How can one read ~20K random lines of it to do basic statistics on the resulting data frame?
If the CSV file is extremely large, a fast way to import it into R is the fread() function from the data.table package. In this case the data is returned as a data.table rather than a plain data frame.
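A minimal sketch, assuming the file (here called big.csv, a placeholder name) can be loaded at least once:

library(data.table)

# fread() parses large files quickly and returns a data.table
dt <- fread("big.csv")

# keep ~20K random rows for the basic statistics
sample_dt <- dt[sample(.N, 20000)]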
In RStudio, click on the Workspace tab, then on "Import Dataset" -> "From text file". A file browser will open up; locate the .csv file and click Open. You'll see a dialog that gives you a few options on the import.
To load a .csv file into the current session and work with it, use the read.csv() function in base R. The output is a data frame, with rows numbered by integers starting at 1.
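A minimal base-R sketch, assuming a file named x.csv with a header row:

# read.csv() returns a data frame with rows numbered 1, 2, 3, ...
df <- read.csv("x.csv")
head(df)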
You can also just do it in the terminal with perl.
perl -ne 'print if (rand() < .01)' biglist.txt > subset.txt
This won't necessarily get you exactly 20,000 lines. With rand() < .01 it keeps roughly 1% of the total lines, so to target ~20,000 lines set the threshold to 20000 divided by the file's total line count. It will, however, be really fast, and you'll have both the original file and the subset in your directory. You can then load the smaller file into R however you want, as in the sketch below.
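One caveat when reading the subset back in: the header line of biglist.txt survives the random filter only by chance, so column names may need to be supplied by hand. A sketch, where col1 and col2 are hypothetical column names:

# header = FALSE because the original header line was
# probably dropped by the random filter
subset_df <- read.csv("subset.txt", header = FALSE)
names(subset_df) <- c("col1", "col2")   # hypothetical column names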
Try this, based on examples 6e and 6f on the sqldf GitHub home page:
library(sqldf)
DF <- read.csv.sql("x.csv", sql = "select * from file order by random() limit 20000")
See ?read.csv.sql and pass other arguments as needed based on the particulars of your file.
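For instance, if the file were semicolon-delimited with no header row (a hypothetical layout), the same query could be written as:

DF <- read.csv.sql("x.csv",
                   sql = "select * from file order by random() limit 20000",
                   header = FALSE, sep = ";")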