
xml_find_all returns {xml_nodeset (0)}

Tags:

r

kml

xml2

I recently downloaded the KML file from this map and tried to use the xml2 package to extract information about the campsites (e.g. their geolocation and the facilities around each site), but I ended up with {xml_nodeset (0)}.

Below is the code I used:

library(xml2)
campsites <- read_xml("file_path")
xml_find_all(campsites, ".//Placemark")

Here is the structure of the KML file (you may also try xml_structure(campsites)),

> library(magrittr)
> campsites
{xml_document}
<kml>
[1] <Document>\n<description><![CDATA[powered by <a href="http://www.wordpress.org">WordPress</a> &amp; <a href="https://www.mapsmarker.com">MapsMarker.com</a>]] ...
>
> campsites %>% xml_children %>% xml_children %>% xml_children
{xml_nodeset (55)}
 [1] <IconStyle>\n  <Icon>\n    <href>http://www.mountaineering-lohas.org/wp-content/uploads/leaflet-maps-marker-icons/tents.png</href>\n  </Icon>\n</IconStyle>
 [2] <IconStyle>\n  <Icon>\n    <href>http://www.mountaineering-lohas.org/wp-content/uploads/leaflet-maps-marker-icons/tents-1.png</href>\n  </Icon>\n</IconStyle>
 [3] <IconStyle>\n  <Icon>\n    <href>http://www.mountaineering-lohas.org/wp-content/uploads/leaflet-maps-marker-icons/tents1.png</href>\n  </Icon>\n</IconStyle>
 [4] <name>香港營地 Hong Kong Camp Site</name>
 [5] <Placemark id="marker-1">\n<styleUrl>#tents</styleUrl>\n<name>流水響營地 ( Lau Shui Heung Camp Site )</name>\n<TimeStamp><when>2013-02-21T04:02:29+08: ...
 [6] <Placemark id="marker-2">\n<styleUrl>#tents</styleUrl>\n<name>鶴藪營地(Hok Tau Camp Site)</name>\n<TimeStamp><when>2013-02-21T04:02:18+08:00</when></Tim ...
 [7] <Placemark id="marker-3">\n<styleUrl>#tents</styleUrl>\n<name>涌背營地(Chung Pui Camp Site)</name>\n<TimeStamp><when>2013-02-22T11:02:02+08:00</when></T ...
 [8] <Placemark id="marker-4">\n<styleUrl>#tents</styleUrl>\n<name>東平洲營地 (Tung Ping Chau Campsite)</name>\n<TimeStamp><when>2013-02-22T11:02:39+08:00</ ...
 [9] <Placemark id="marker-5">\n<styleUrl>#tents</styleUrl>\n<name>灣仔南營地(Wan Tsai Peninsula South Campsite)</name>\n<TimeStamp><when>2013-02-22T11:02:2 ...
[10] <Placemark id="marker-6">\n<styleUrl>#tents</styleUrl>\n<name>灣仔西營地 (Wan Tsai Peninsula West Campsite)</name>\n<TimeStamp><when>2013-02-22T11:02:3 ...
...

As you can see, there are nodes named "Placemark". Why can't xml_find_all find them? Did I make a mistake in my code?

Thanks!

pe-perry asked Dec 25 '22 10:12


1 Answer

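It looks like the document declares a few namespaces, including a default one. In XPath, an unprefixed name such as Placemark only matches elements that are in no namespace, which is why your query returns an empty nodeset. If you add the namespace prefix to your XPath, you get the nodeset.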

xml_ns(campsites)
# d1   <-> http://www.opengis.net/kml/2.2
# atom <-> http://www.w3.org/2005/Atom
# gx   <-> http://www.google.com/kml/ext/2.2

xml_find_all(campsites, ".//d1:Placemark", xml_ns(campsites))
# {xml_nodeset (45)}
#  [1] <Placemark id="marker-1">\n<styleUrl>#tents</styleUrl>\n<name>流水響營地 ( La ...
#  [2] <Placemark id="marker-2">\n<styleUrl>#tents</styleUrl>\n<name>鶴藪營地(Hok T ...
#  ...
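
If you want to convince yourself that the namespace is the cause, here is a minimal sketch on a small inline KML string. The two-placemark document and the names "Site A" and "Site B" are made up for illustration and are not taken from your file. It also shows two alternatives that I believe work with xml2: matching on local-name(), and removing the default namespace with xml_ns_strip().

```r
library(xml2)

# Hypothetical two-placemark KML standing in for the real file.
kml_text <- '<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <name>Example</name>
    <Placemark id="marker-1"><name>Site A</name></Placemark>
    <Placemark id="marker-2"><name>Site B</name></Placemark>
  </Document>
</kml>'
doc <- read_xml(kml_text)

# An unprefixed name only matches elements in no namespace: nothing found.
n_bare <- length(xml_find_all(doc, ".//Placemark"))

# Using the d1 prefix that xml_ns() assigns to the default namespace.
n_prefixed <- length(xml_find_all(doc, ".//d1:Placemark", xml_ns(doc)))

# local-name() compares the element name and ignores its namespace.
n_local <- length(xml_find_all(doc, ".//*[local-name() = 'Placemark']"))

# xml_ns_strip() removes default namespaces from the document in place,
# after which the unprefixed query matches.
xml_ns_strip(doc)
n_stripped <- length(xml_find_all(doc, ".//Placemark"))

print(c(bare = n_bare, prefixed = n_prefixed,
        local = n_local, stripped = n_stripped))
```

Keeping the prefix is the least invasive option since it leaves the document untouched. Stripping the namespace is convenient if you are going to write many queries against the same file.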

To get the text inside the CDATA sections, you could use something like

xml_text(xml_find_all(campsites, "//d1:description", xml_ns(campsites))) 
# or "//d1:description/text()"
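
Since you mentioned wanting the geolocation and facilities per site, the same idea can be extended to build one row per placemark. This is only a sketch under assumptions: the inline KML below is invented, and I am assuming each Placemark in your file has a Point/coordinates child in the usual KML "lon,lat[,alt]" form, which I have not checked against the real file.

```r
library(xml2)

# Hypothetical KML; element layout is assumed, not taken from the real file.
kml_text <- '<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark id="marker-1">
      <name>Site A</name>
      <description><![CDATA[<p>Toilets, barbecue pits</p>]]></description>
      <Point><coordinates>114.16,22.50,0</coordinates></Point>
    </Placemark>
    <Placemark id="marker-2">
      <name>Site B</name>
      <description><![CDATA[<p>Water source</p>]]></description>
      <Point><coordinates>114.18,22.51,0</coordinates></Point>
    </Placemark>
  </Document>
</kml>'
doc <- read_xml(kml_text)
ns <- xml_ns(doc)

placemarks <- xml_find_all(doc, ".//d1:Placemark", ns)

# xml_find_first() yields exactly one result per placemark, so the
# vectors below all have the same length as `placemarks`.
coords <- xml_text(xml_find_first(placemarks, "./d1:Point/d1:coordinates", ns))

# Assumes every placemark has coordinates written as "lon,lat[,alt]".
lon_lat <- do.call(rbind, strsplit(trimws(coords), ","))

sites <- data.frame(
  id          = xml_attr(placemarks, "id"),
  name        = xml_text(xml_find_first(placemarks, "./d1:name", ns)),
  description = xml_text(xml_find_first(placemarks, "./d1:description", ns)),
  lon         = as.numeric(lon_lat[, 1]),
  lat         = as.numeric(lon_lat[, 2]),
  stringsAsFactors = FALSE
)
print(sites)
```

The description column still contains the raw HTML from the CDATA section, so you would probably want to parse it separately (for example with read_html() on each string) to pull out the facilities.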
Rorschach answered Jan 31 '23 00:01