Data frame one.
structure(list(trial_id = c(2022L, 2023L, 2123L, 2184L, 3883L,
4434L), ctri_number = c("CTRI/2018/02/011794 ", "CTRI/2017/08/009517 ",
"CTRI/2019/05/019036 ", "CTRI/2017/12/010935 ", "CTRI/2017/09/009746 ",
"CTRI/2016/06/007055 "), name = c("National Institute of Allergy and Infectious Diseases NIAIDMaryland USA",
"Jawaharlal Nehru Medical College", "KLEU Ayurveda Pharmacy",
"Amgen Inc", "Dr Arunkumar", "ALVAS EDUCATION FOUNDATION"), type_of_sponsor = c("' Government funding agency '",
"' Government medical college '", "' Research institution '",
"' Pharmaceutical industry-Global '", " Other [Self sponsored] '",
"' Private hospital/clinic '"), address = c("' USA '", "' Jawaharlal Nehru Medical College, Aligarh Muslim University, Aligarh-202001 '",
"' KLEU Ayurveda Pharmacy, Khasbhag, Belgaum, Karnataka '", "' One Amgen Center Drive\n\n\nThousand Oaks, CA USA\n\n\n91320 '",
"' Room no 32 ,Department of Periodontics , Government Dental college , Trivandrum '",
"' ALVAS EDUCATION FOUNDATION ALVAS COLLEGE OF PHYSIOTHERAPY\n\n\nMoodabidri - 574227\n\n\nSouth Canara District\n\n\nKarnataka '"
)), row.names = c(NA, 6L), class = "data.frame")
Data frame two.
structure(list(distinctOrganizations = c("A AMMU", "A and U tibbia college and hospital",
"A Arumuga kani", "A KIREETI", "AAMIR ZUBAIR SHAIKH", "Aansu Susan Varghese"
)), row.names = c(NA, 6L), class = "data.frame")
For every value in data frame two (distinctOrganizations), I have to pull out the rows of data frame one whose name column matches that value.
Each matching value should produce its own .csv file.
How can I achieve this?
Possible outcome: a CSV file similar to the image.
First of all: your example data don't match on any rows (df2 doesn't provide any names contained in your example df1).
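A quick way to confirm this before writing any files is to check the overlap between the two columns. The following is a minimal sketch in base R; df1 is rebuilt here with only two of its columns as a reduced stand-in for the posted data, and the names common and has_match are introduced purely for illustration.

```r
# Reduced stand-ins for the question's data: only the columns needed here.
df1 <- data.frame(
  trial_id = c(2022L, 2023L, 2123L, 2184L, 3883L, 4434L),
  name = c("National Institute of Allergy and Infectious Diseases NIAIDMaryland USA",
           "Jawaharlal Nehru Medical College", "KLEU Ayurveda Pharmacy",
           "Amgen Inc", "Dr Arunkumar", "ALVAS EDUCATION FOUNDATION"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  distinctOrganizations = c("A AMMU", "A and U tibbia college and hospital",
                            "A Arumuga kani", "A KIREETI", "AAMIR ZUBAIR SHAIKH",
                            "Aansu Susan Varghese"),
  stringsAsFactors = FALSE
)

# Names present in both data frames; an empty result means no rows will match.
common <- intersect(df1$name, df2$distinctOrganizations)
length(common)

# Logical flag per row of df1: TRUE where the name occurs in df2.
has_match <- df1$name %in% df2$distinctOrganizations
sum(has_match)
```

For the posted sample both counts are 0, which is why no .csv files would be produced from it.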
If I got your question right, you could use
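library(dplyr)  # inner_join, %>%
library(purrr)  # walk
library(readr)  # write_csv

df1 %>%
  # keep only the rows of df1 whose name occurs in df2
  inner_join(df2, by = c("name" = "distinctOrganizations")) %>%
  # one data.frame per distinct name
  split(f = .$name) %>%
  # write each piece to "<name>.csv" in the working directory
  walk(~write_csv(.x, paste0(unique(.x$name), ".csv")))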
This uses inner_join to remove all elements from df1 that don't have a match in df2, split to divide the resulting data.frame by name, creating a new data.frame for each (distinct) organization, and purrr's walk function to write a .csv file for each of these organizations. This produces .csv files like Amgen Inc.csv or ALVAS EDUCATION FOUNDATION.csv.
Note: The address column contains some line breaks (\n). You should consider removing them, because they could cause trouble in your .csv files and in any later steps that work with those files. There are also some white spaces in the type_of_sponsor column (at the beginning and the end) that you may want to remove.
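One possible way to do that cleanup, sketched in base R on two values copied from the question's data. Replacing line breaks with ", " and also stripping the stray single quotes around each value are assumptions about the desired result, and strip_quotes is a hypothetical helper, not part of any package.

```r
# Two values per column, copied from the question's data as stand-ins.
address <- c("' USA '",
             "' One Amgen Center Drive\n\n\nThousand Oaks, CA USA\n\n\n91320 '")
type_of_sponsor <- c("' Government funding agency '",
                     "' Pharmaceutical industry-Global '")

# Collapse each run of line breaks into a single ", " separator (assumed format).
address_clean <- gsub("\n+", ", ", address)

# Hypothetical helper: drop single quotes and spaces at both ends of a string.
strip_quotes <- function(x) gsub("^[' ]+|[' ]+$", "", x)

address_clean <- strip_quotes(address_clean)
type_clean <- strip_quotes(type_of_sponsor)

address_clean
type_clean
```

Inside the pipeline, the same two steps could be applied to the address and type_of_sponsor columns before the files are written.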
I modified df2 to get two matches:
df2 <- structure(list(distinctOrganizations = c("Amgen Inc", "A and U tibbia college and hospital",
"ALVAS EDUCATION FOUNDATION", "A KIREETI", "AAMIR ZUBAIR SHAIKH",
"Aansu Susan Varghese")), row.names = c(NA, 6L), class = "data.frame")
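For comparison, here is a rough base-R sketch of the same filter, split and write steps without dplyr, purrr or readr. It swaps inner_join for an %in% filter, which behaves the same as long as the names in df2 are distinct. The data frames are shortened stand-ins, and out_dir is a hypothetical output folder inside R's temporary directory so that nothing lands in the working directory.

```r
# Shortened stand-ins: three rows of df1 and three names from the modified df2.
df1 <- data.frame(
  trial_id = c(2184L, 3883L, 4434L),
  name = c("Amgen Inc", "Dr Arunkumar", "ALVAS EDUCATION FOUNDATION"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  distinctOrganizations = c("Amgen Inc", "ALVAS EDUCATION FOUNDATION", "A KIREETI"),
  stringsAsFactors = FALSE
)

# Hypothetical output folder, created if it does not exist yet.
out_dir <- file.path(tempdir(), "org_csv")
dir.create(out_dir, showWarnings = FALSE)

# Keep only the rows of df1 whose name occurs in df2.
matched <- df1[df1$name %in% df2$distinctOrganizations, , drop = FALSE]

# One data frame per distinct name, then one .csv file per data frame.
pieces <- split(matched, matched$name)
for (org in names(pieces)) {
  write.csv(pieces[[org]], file.path(out_dir, paste0(org, ".csv")), row.names = FALSE)
}

# Names of the files that were written.
written <- list.files(out_dir)
written
```

With these stand-ins, two files are written, one per matching organization; "Dr Arunkumar" has no counterpart in df2 and is skipped.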