<TABLE cellspacing=1 cellpadding=7 rules=all frame=Box border=1>
<thead>
<TR>
<TD ROWSPAN=2 ALIGN=CENTER VALIGN=CENTER> </TD>
<TD COLSPAN=6 ALIGN=CENTER>1a. My peers make a positive impact my work environment.</TD>
<TD ALIGN=CENTER>Number</TD>
</TR>
<TR>
<TD ALIGN=CENTER>Strongly agree <br> </TD>
<TD ALIGN=CENTER>Generally agree <br> </TD>
<TD ALIGN=CENTER>Neither agree nor<br>disagree</TD>
<TD ALIGN=CENTER>Generally disagree<br> </TD>
<TD ALIGN=CENTER>Strongly disagree<br> </TD>
<TD ALIGN=CENTER>No basis to judge<br> </TD>
<TD ALIGN=CENTER>of Cases</TD>
</TR>
</thead>
<tbody>
<TR>
<TD ALIGN=LEFT VALIGN=TOP> Company-Wide </TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 44.1</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 44.9</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 6.6</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 2.6</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 1.6</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 0.1</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 2,014</TD>
</TR>
<TR>
<TD ALIGN=LEFT VALIGN=TOP> Region 1 </TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 45.6</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 45.2</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 5.7</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 2.1</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 1.4</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 0.1</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 1,699</TD>
</TR>
<TR>
<TD ALIGN=LEFT VALIGN=TOP>Division 1 </TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 52.9</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 39.7</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 4.1</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 2.5</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 0.8</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM>0</TD>
<TD ALIGN=RIGHT VALIGN=BOTTOM> 121</TD>
</TR>
</tbody>
</TABLE>
<hr><A NAME="IDX1"> </A>
I have an HTML file that contains several tables like the one above. I would like to convert them into a single data frame in which each survey question, currently in the table header, appears in a column. The percentage responding at each level would remain in a column, as would the response levels. Not all questions have the same number of response levels (e.g. some are on a five-point scale, others on a nine-point scale). I tried readHTMLTable and then do.call with rbind on the result, but could not obtain the data frame I want because the number of columns is not identical across tables. I welcome any advice on how to proceed. Thanks!
edit:
library(XML)
library(dplyr)
questions <- readHTMLTable(files[8], trim = TRUE, as.data.frame = TRUE, header = TRUE)
data <- bind_rows(questions)
This gives the data frame I want, but because some questions have more response levels than others, the "number of cases" values do not consistently end up in the same column. Is there a way for me to name the last column of each table before merging?
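A minimal sketch of renaming the last column before merging, assuming readHTMLTable() returns a list of data frames whose final column always holds the number of cases. The helper rename_last and the name n_cases are hypothetical, and the toy list below only imitates the real tables (the q1b values are made up):

```r
library(dplyr)

# Hypothetical stand-ins for the list returned by readHTMLTable();
# the column names and the q1b values are made up for illustration.
questions <- list(
  q1a = data.frame(group = c("Company-Wide", "Region 1"),
                   V1 = c("44.1", "45.6"),
                   V2 = c("44.9", "45.2"),
                   V3 = c("2,014", "1,699"),
                   stringsAsFactors = FALSE),
  q1b = data.frame(group = c("Company-Wide", "Region 1"),
                   V1 = c("10.0", "11.0"),
                   V2 = c("20.0", "21.0"),
                   V3 = c("70.0", "68.0"),
                   V4 = c("2,010", "1,690"),
                   stringsAsFactors = FALSE)
)

# Give the last column of a data frame a fixed name.
rename_last <- function(df, new_name = "n_cases") {
  names(df)[ncol(df)] <- new_name
  df
}

# Rename in every table, then stack; .id keeps the list names as a column.
combined <- bind_rows(lapply(questions, rename_last), .id = "question")

# Strip the thousands separator so the counts become numeric.
combined$n_cases <- as.numeric(gsub(",", "", combined$n_cases))
```

Because the counts now share one name, bind_rows() lines them up in a single column and fills the response columns a shorter table lacks with NA.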
You can use the rvest package for this. However, it might be necessary to pay attention to column names containing white space. I used the option fill=TRUE as a quick fix, but maybe this can be done in a better way.
library(rvest)
my_df <- as.data.frame(read_html(text) %>% html_table(fill=TRUE))
> my_df
# X1 X2 X3 X4 X5 X6 X7 X8
#1 1a. My peers make a positive impact my work environment. <NA> <NA> <NA> <NA> <NA> Number
#2 Strongly agree Generally agree Neither agree nordisagree Generally disagree Strongly disagree No basis to judge of Cases <NA>
#3 Company-Wide 44.1 44.9 6.6 2.6 1.6 0.1 2,014
#4 Region 1 45.6 45.2 5.7 2.1 1.4 0.1 1,699
#5 Division 1 52.9 39.7 4.1 2.5 0.8 0 121
Concerning the data, I copy-pasted the HTML code from the OP and assigned it to the variable text with text <- '<TABLE cellspacing=1 cellpadding=7 rules=all frame=...', using single quotation marks.
Some details of the format can be corrected afterwards in a rather simple way:
my_df[2,] <- c("",my_df[2,][-length(my_df)])
#> my_df
# X1 X2 X3 X4 X5 X6 X7 X8
#1 1a. My peers make a positive impact my work environment. <NA> <NA> <NA> <NA> <NA> Number
#2 Strongly agree Generally agree Neither agree nordisagree Generally disagree Strongly disagree No basis to judge of Cases
#3 Company-Wide 44.1 44.9 6.6 2.6 1.6 0.1 2,014
#4 Region 1 45.6 45.2 5.7 2.1 1.4 0.1 1,699
#5 Division 1 52.9 39.7 4.1 2.5 0.8 0 121
Essentially, in this case the entries of the second row should be shifted to the right by one cell.
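To get from the corrected my_df to the shape asked for in the question (one row per group and response level), something along these lines might work. It is only a sketch: to_long is a hypothetical helper, my_df is rebuilt by hand here so the snippet runs on its own, and it assumes row 1 holds the question, row 2 the response levels, and the last column the number of cases.

```r
# Corrected table from above, rebuilt by hand so the sketch is self-contained.
rows <- rbind(
  c("", "1a. My peers make a positive impact my work environment.",
    NA, NA, NA, NA, NA, "Number"),
  c("", "Strongly agree", "Generally agree", "Neither agree nordisagree",
    "Generally disagree", "Strongly disagree", "No basis to judge", "of Cases"),
  c("Company-Wide", "44.1", "44.9", "6.6", "2.6", "1.6", "0.1", "2,014"),
  c("Region 1", "45.6", "45.2", "5.7", "2.1", "1.4", "0.1", "1,699"),
  c("Division 1", "52.9", "39.7", "4.1", "2.5", "0.8", "0", "121")
)
my_df <- as.data.frame(rows, stringsAsFactors = FALSE)
names(my_df) <- paste0("X", 1:8)

# Hypothetical helper: reshape one such table into long format.
to_long <- function(df) {
  last     <- ncol(df)
  question <- df[1, 2]                                   # question text
  levels   <- unlist(df[2, 2:(last - 1)], use.names = FALSE)
  body     <- df[-(1:2), , drop = FALSE]                 # data rows only
  n_cases  <- as.numeric(gsub(",", "", body[[last]]))    # "2,014" -> 2014
  pct      <- body[, 2:(last - 1), drop = FALSE]
  data.frame(
    question = question,
    group    = rep(body[[1]], times = length(levels)),
    response = rep(levels, each = nrow(body)),
    # unlist() walks the columns one by one, matching the rep() order above
    percent  = as.numeric(unlist(pct, use.names = FALSE)),
    n_cases  = rep(n_cases, times = length(levels)),
    stringsAsFactors = FALSE
  )
}

long <- to_long(my_df)
```

Every table reshaped this way has the same five columns regardless of how many response levels the question has, so a list of them can be stacked with do.call(rbind, ...) or bind_rows().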
data
text <- '<TABLE cellspacing=1 cellpadding=7 rules=all frame=Box border=1>\n <thead>\n <TR>\n <TD ROWSPAN=2 ALIGN=CENTER VALIGN=CENTER> </TD>\n <TD COLSPAN=6 ALIGN=CENTER>1a. My peers make a positive impact my work environment.</TD>\n <TD ALIGN=CENTER>Number</TD>\n </TR>\n <TR>\n <TD ALIGN=CENTER>Strongly agree <br> </TD>\n <TD ALIGN=CENTER>Generally agree <br> </TD>\n <TD ALIGN=CENTER>Neither agree nor<br>disagree</TD>\n <TD ALIGN=CENTER>Generally disagree<br> </TD>\n <TD ALIGN=CENTER>Strongly disagree<br> </TD>\n <TD ALIGN=CENTER>No basis to judge<br> </TD>\n <TD ALIGN=CENTER>of Cases</TD>\n </TR>\n </thead>\n <tbody>\n <TR>\n <TD ALIGN=LEFT VALIGN=TOP> Company-Wide </TD>\n <TD ALIGN=RIGHT VALIGN=BOTTOM> 44.1</TD>\n <TD ALIGN=RIGHT VALIGN=BOTTOM> 44.9</TD>\n <TD ALIGN=RIGHT VALIGN=BOTTOM> 6.6</TD>\n <TD ALIGN=RIGHT VALIGN=BOTTOM> 2.6</TD>\n <TD ALIGN=RIGHT VALIGN=BOTTOM> 1.6</TD>\n <TD ALIGN=RIGHT VALIGN=BOTTOM> 0.1</TD>\n <TD ALIGN=RIGHT VALIGN=BOTTOM> 2,014</TD>\n </TR>\n <TR>\n <TD ALIGN=LEFT VALIGN=TOP> Region 1 </TD>\n <TD ALIGN=RIGHT VALIGN=BOTTOM> 45.6</TD>\n <TD ALIGN=RIGHT VALIGN=BOTTOM> 45.2</TD>\n <TD ALIGN=RIGHT VALIGN=BOTTOM> 5.7</TD>\n <TD ALIGN=RIGHT VALIGN=BOTTOM> 2.1</TD>\n <TD ALIGN=RIGHT VALIGN=BOTTOM> 1.4</TD>\n <TD ALIGN=RIGHT VALIGN=BOTTOM> 0.1</TD>\n <TD ALIGN=RIGHT VALIGN=BOTTOM> 1,699</TD>\n </TR>\n <TR>\n <TD ALIGN=LEFT VALIGN=TOP>Division 1 </TD>\n <TD ALIGN=RIGHT VALIGN=BOTTOM> 52.9</TD>\n <TD ALIGN=RIGHT VALIGN=BOTTOM> 39.7</TD>\n <TD ALIGN=RIGHT VALIGN=BOTTOM> 4.1</TD>\n <TD ALIGN=RIGHT VALIGN=BOTTOM> 2.5</TD>\n <TD ALIGN=RIGHT VALIGN=BOTTOM> 0.8</TD>\n <TD ALIGN=RIGHT VALIGN=BOTTOM>0</TD>\n <TD ALIGN=RIGHT VALIGN=BOTTOM> 121</TD>\n </TR>\n </tbody>\n </TABLE>\n <hr><A NAME=\"IDX1\"> </A>'
#> class(text)
#[1] "character"