How to pull out specific rows from two data frames with different dimensions and produce multiple .csv files?

Tags:

r

Data frame one (df1):

df1 <- structure(list(trial_id = c(2022L, 2023L, 2123L, 2184L, 3883L, 
4434L), ctri_number = c("CTRI/2018/02/011794 ", "CTRI/2017/08/009517 ", 
"CTRI/2019/05/019036 ", "CTRI/2017/12/010935 ", "CTRI/2017/09/009746 ", 
"CTRI/2016/06/007055 "), name = c("National Institute of Allergy and Infectious Diseases NIAIDMaryland USA", 
"Jawaharlal Nehru Medical College", "KLEU Ayurveda Pharmacy", 
"Amgen Inc", "Dr Arunkumar", "ALVAS EDUCATION FOUNDATION"), type_of_sponsor = c("' Government funding agency '", 
"' Government medical college '", "' Research institution '", 
"' Pharmaceutical industry-Global '", " Other [Self sponsored] '", 
"' Private hospital/clinic '"), address = c("' USA '", "' Jawaharlal Nehru Medical College, Aligarh Muslim University, Aligarh-202001 '", 
"' KLEU Ayurveda Pharmacy, Khasbhag, Belgaum, Karnataka '", "' One Amgen Center Drive\n\n\nThousand Oaks, CA USA\n\n\n91320 '", 
"' Room no 32 ,Department of Periodontics , Government Dental college , Trivandrum '", 
"' ALVAS EDUCATION FOUNDATION ALVAS COLLEGE OF PHYSIOTHERAPY\n\n\nMoodabidri - 574227\n\n\nSouth Canara District\n\n\nKarnataka '"
)), row.names = c(NA, 6L), class = "data.frame")

Data frame two (df2):

df2 <- structure(list(distinctOrganizations = c("A AMMU", "A and U tibbia college and hospital", 
"A Arumuga kani", "A KIREETI", "AAMIR ZUBAIR SHAIKH", "Aansu Susan Varghese"
)), row.names = c(NA, 6L), class = "data.frame")

For every value in distinctOrganizations (data frame two), I need to pull out the rows of data frame one whose name column matches that value.

Each value should then produce its own .csv file containing only the matching rows.

How can I achieve this?

Possible outcome: a CSV file similar to the one in the image.

The image shows a CSV file that contains only the rows related to AIIMS and its variants. I need a separate CSV file like this for each name.

asked Oct 26 '22 10:10

1 Answer

First of all: your example data don't produce any matches (df2 doesn't contain any of the names that appear in your example df1).
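A quick way to confirm this is sketched below. The name values are copied by hand from the question, and the vector names (df1_names, df2_names, common_names) are only for illustration:

```r
# Names from the question's df1 and df2 (copied by hand so the check is self-contained)
df1_names <- c(
  "National Institute of Allergy and Infectious Diseases NIAIDMaryland USA",
  "Jawaharlal Nehru Medical College", "KLEU Ayurveda Pharmacy",
  "Amgen Inc", "Dr Arunkumar", "ALVAS EDUCATION FOUNDATION"
)
df2_names <- c(
  "A AMMU", "A and U tibbia college and hospital", "A Arumuga kani",
  "A KIREETI", "AAMIR ZUBAIR SHAIKH", "Aansu Susan Varghese"
)

# Exact matches shared by both vectors; this is empty for the example data
common_names <- intersect(df1_names, df2_names)
length(common_names)
```

Note that intersect() only finds exact matches, just like the join used below, so spelling variants of the same organization would not be detected.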

If I got your question right, you could use

library(dplyr)
library(purrr)
library(readr)

df1 %>% 
  inner_join(df2, by = c("name" = "distinctOrganizations")) %>% 
  split(f = .$name) %>% 
  walk(~write_csv(.x, paste0(unique(.x$name), ".csv")))

We use an inner_join to keep only the rows of df1 that have a match in df2.
Then we split the resulting data.frame by name, creating a new data.frame for each (distinct) organization.
Finally, we use purrr's walk function to write one .csv file for each of these organizations. This produces .csv files like Amgen Inc.csv or ALVAS EDUCATION FOUNDATION.csv.
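For comparison, the same three steps can be sketched in base R without extra packages. This is not part of the original answer; the small stand-in data frames and the names matched, per_org and out_dir are assumptions made for illustration, and the files go to a temporary directory rather than the working directory:

```r
# Small stand-in data frames (hypothetical, shaped like the question's data)
df1 <- data.frame(
  trial_id = c(2184L, 3883L, 4434L),
  name = c("Amgen Inc", "Dr Arunkumar", "ALVAS EDUCATION FOUNDATION"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  distinctOrganizations = c("Amgen Inc", "ALVAS EDUCATION FOUNDATION", "A KIREETI"),
  stringsAsFactors = FALSE
)

# Step 1: keep only the rows of df1 whose name appears in df2
matched <- df1[df1$name %in% df2$distinctOrganizations, , drop = FALSE]

# Step 2: one data frame per distinct organization
per_org <- split(matched, matched$name)

# Step 3: write one .csv file per organization into a temporary directory
out_dir <- tempfile("csv_out_")
dir.create(out_dir)
for (org in names(per_org)) {
  write.csv(per_org[[org]], file.path(out_dir, paste0(org, ".csv")), row.names = FALSE)
}

written_files <- list.files(out_dir)
```

The %in% filter behaves like the inner join as long as the values in distinctOrganizations are unique; if df2 contained duplicates, inner_join would repeat the matching rows while %in% would not.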

Note: The address column contains line breaks (\n). Consider removing them, since they can cause trouble in the .csv files and in later processing steps. The type_of_sponsor column also has leading and trailing white space that you may want to remove.
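One possible cleanup, sketched in base R: the helper clean_text is a hypothetical name, the two sample rows only mimic the question's data, and stripping the surrounding single quotes is an assumption about what the cleaned values should look like (in the example data the white space sits inside those quotes, so trimming white space alone would change nothing):

```r
# Hypothetical rows mimicking the quirks in the question's data
df1 <- data.frame(
  type_of_sponsor = c("' Research institution '", "' Pharmaceutical industry-Global '"),
  address = c("' USA '", "' One Amgen Center Drive\n\n\nThousand Oaks, CA USA\n\n\n91320 '"),
  stringsAsFactors = FALSE
)

clean_text <- function(x) {
  # Collapse each run of line breaks into a single ", "
  x <- gsub("[\r\n]+", ", ", x)
  # Assumption: surrounding single quotes and white space are unwanted, so strip them
  gsub("^[[:space:]']+|[[:space:]']+$", "", x)
}

df1$address <- clean_text(df1$address)
df1$type_of_sponsor <- clean_text(df1$type_of_sponsor)
```

Running this before the join and split means the written .csv files contain one physical line per row.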

Data

I modified df2 to get two matches:

df2 <- structure(list(distinctOrganizations = c("Amgen Inc", "A and U tibbia college and hospital", 
"ALVAS EDUCATION FOUNDATION", "A KIREETI", "AAMIR ZUBAIR SHAIKH", 
"Aansu Susan Varghese")), row.names = c(NA, 6L), class = "data.frame")

answered Jan 02 '23 21:01

Martin Gal
