R in the Outdoors | R Ladies Seattle
Almost 600 orca encounters ➡ almost 600 separate webpages to scrape!
read_html()
html_element() vs html_elements()
html_text() vs html_text2()
html_attr(), html_table(), and other html_*()
<html>
<body>
<h1 id='first'>A heading</h1>
<p>Some text & <b>some bold text.</b></p>
<img src='myimg.png' width='100' height='100'>
</body>
</html>
Element: start tag, content, end tag (list of elements)
Attribute: optional info about elements (list of attributes)
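The element and attribute definitions above can be tried out offline. This is a minimal sketch, assuming rvest 1.0 or later: minimal_html() stands in for read_html() on a live URL, and the variable names are illustrative only.

```r
library(rvest)

# minimal_html() wraps a fragment in a full document so it can be parsed
# without fetching a live page; read_html() plays the same role for a URL.
page <- minimal_html("
  <h1 id='first'>A heading</h1>
  <p>Some text &amp; <b>some bold text.</b></p>
  <img src='myimg.png' width='100' height='100'>
")

# html_element() returns the first match; html_elements() returns every match.
first_heading <- page |> html_element("h1")
all_children  <- page |> html_elements("body > *")

heading_text <- html_text2(first_heading)        # element content
heading_id   <- html_attr(first_heading, "id")   # attribute value
img_src      <- page |> html_element("img") |> html_attr("src")
```

html_text2() normalizes whitespace roughly the way a browser renders it, which is usually what you want for scraped prose; html_text() returns the raw text.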
rvest::html_*() requires css or xpath arguments to define the element selectors
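As a rough illustration of the two selector styles, the sketch below selects the same elements once with css and once with xpath. The markup and the `intro` class are made up for this example.

```r
library(rvest)

# Hypothetical markup for illustration only.
page <- minimal_html("
  <h1 id='first'>A heading</h1>
  <p class='intro'>Some text</p>
  <p>More text</p>
")

# The same element selected two ways: CSS id selector vs. XPath predicate.
by_css   <- page |> html_element(css = "#first") |> html_text2()
by_xpath <- page |> html_element(xpath = "//h1[@id='first']") |> html_text2()

# A class selector and its XPath counterpart.
intro_css   <- page |> html_elements(css = "p.intro") |> html_text2()
intro_xpath <- page |> html_elements(xpath = "//p[@class='intro']") |> html_text2()
```

CSS selectors are usually shorter; XPath helps when the match depends on text content or on position relative to other nodes.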
Beware of dynamic CSS classes and respect robots.txt files
{orcas}: all together now
Scrape landing pages to get links to encounter webpages
[1] "https://www.whaleresearch.com/2023-66"
[2] "https://www.whaleresearch.com/2023-65"
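A sketch of how such links might be collected. The landing-page markup and the helper name `get_encounter_links()` are hypothetical (the real site's structure is not shown here), and the URL pattern is inferred from the two links above.

```r
library(rvest)

# Hypothetical landing-page markup; the real page may be structured differently.
landing <- minimal_html('
  <a href="https://www.whaleresearch.com/2023-66">Encounter #66</a>
  <a href="https://www.whaleresearch.com/2023-65">Encounter #65</a>
  <a href="https://www.whaleresearch.com/about">About</a>
')

get_encounter_links <- function(page) {
  hrefs <- page |> html_elements("a") |> html_attr("href")
  # Keep only links ending in <year>-<encounter number> (assumed pattern).
  unique(hrefs[grepl("/[0-9]{4}-[0-9]+$", hrefs)])
}

links <- get_encounter_links(landing)
```

On a live site, `landing` would come from read_html() on the landing-page URL instead of minimal_html().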
Extract data from webpage
Don’t overload the site! Use {polite} or Sys.sleep().
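One way to throttle requests with Sys.sleep() is sketched below. `scrape_slowly()` is a hypothetical helper, not part of any package, and the two-second default delay is an arbitrary choice.

```r
# Hypothetical helper: fetch each URL with a pause between requests.
# `fetch` defaults to rvest::read_html; pass another function for offline tests.
scrape_slowly <- function(urls, fetch = rvest::read_html, delay = 2) {
  pages <- vector("list", length(urls))
  for (i in seq_along(urls)) {
    pages[[i]] <- fetch(urls[[i]])
    if (i < length(urls)) Sys.sleep(delay)  # pause before the next request
  }
  pages
}

# With {polite}, bow() reads robots.txt and scrape() honors the crawl delay:
# session <- polite::bow("https://www.whaleresearch.com/", delay = 5)
# page    <- polite::scrape(polite::nod(session, path = "2023-14"))
```

At almost 600 pages, even a two-second delay adds roughly 20 minutes, so it pays to save the raw results to disk rather than re-scrape.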
[1] "link:https://www.whaleresearch.com/2023-14"
[2] "EncDate:08/04/23 "
[3] "EncSeq:1"
[4] "Enc#:14"
[5] "ObservBegin:04:22 PM"
[6] "ObservEnd:05:45 PM"
[7] "Vessel:Mike 1"
[8] "Staff:Mark Malleson"
[9] "Other Observers:Brendon Bissonnette"
[10] "Pods:J, L"
[11] "LocationDescr:East Sooke"
[12] "Start Latitude:48 18.65"
[13] "Start Longitude:123 39.84"
[14] "End Latitude:48 18.04"
[15] "End Longitude:123 44.84"
[16] "EncSummary:"
[17] "While out scanning the western Strait of Juan de Fuca from shore at Sheringham Point, at ~ 1400, Mark received a report that the Southern Resident killer whales who were seen traveling east into rough seas earlier in the day had flipped and were now heading west into calmer water. Mark quickly rushed back to Victoria and met Brendon at the dock, and they departed on Mike 1 at 1529. "
[18] "The encounter began at 1622 when Mark spotted the first pair of whales east of Secretary Island. Brendon identified these whales as L72 and her son, L105. These two were part of a subgroup of L pod encountered off Monterey Bay, California, along with members of K pod on March 18th. Mark and Brendon soon spotted more fins up ahead and to the south, leading them to believe that more than one pod was present. They decided to work their way ahead to the nearest animals, which proved to be another mother-son pair, L91, and L122. From there, the closest and largest group could be seen several hundred meters to the south, so they ventured over, and Brendon identified two mixed groups: one containing J40, J45, L83, L110, and L115, while the other group consisted of the L4s (minus L55, L109 & L118), J35s and J44. Soon, the two groups merged into one large social band with spy hops and tail-slaps. L115 was noted to be particularly animated and breached three times in quick succession."
[19] "After photographing and identifying these animals, Mark and Brendon moved to a small trailing group, which turned out to be J16, J26, and J42. Missing matriline member, J36, was later encountered a couple of hundred meters away alongside J53. "
[20] "Mark and Brendon could see more blows to the northeast and a few animals to the southeast, and the decision was made to move to the lead animals, which were identified as the J19s. They could not see any blows ahead of them and concluded that the rest of the unidentified whales in their survey were likely sprinkled throughout the already-observed whales, simply in smaller units. As Mike 1 ventured eastwards, Brendon spotted two more groupings to the southeast. The first group turned out to be a boisterous pairing of J39 and L109. The next set of whales were the J22s and J37s, trailing slightly behind the two sprouting bulls. "
[21] "Mark decided to take one last look at a lone whale nearby in the hopes that the animal was not previously identified, and luck was on their side as Brendon identified the female as J46. After a quick proof-of-presence photo, they ended the encounter with the westbound whales south of Sooke at 1745 and began making way for Victoria in a waning southeasterly wind."
[22] "NMFS PERMIT: 21238/ DFO SARA 388"
Parse individual encounter data into a data frame
nmfs_permit | link | enc_date | enc_seq | enc_number | observ_begin | observ_end | vessel | staff | other_observers | pods | location_descr | start_latitude | start_longitude | end_latitude | end_longitude | enc_summary |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
21238/ DFO SARA 388 | https://www.whaleresearch.com/2023-14 | 08/04/23 | 1 | 14 | 04:22 PM | 05:45 PM | Mike 1 | Mark Malleson | Brendon Bissonnette | J, L | East Sooke | 48 18.65 | 123 39.84 | 48 18.04 | 123 44.84 | While out scanning the western Strait of Juan de Fuca from shore at Sheringham Point, at ~ 1400, Mark received a report that the Southern Resident killer whales who were seen traveling east into rough seas earlier in the day had flipped and were now heading west into calmer water. Mark quickly rushed back to Victoria and met Brendon at the dock, and they departed on Mike 1 at 1529. The encounter began at 1622 when Mark spotted the first pair of whales east of Secretary Island. Brendon identified these whales as L72 and her son, L105. These two were part of a subgroup of L pod encountered off Monterey Bay, California, along with members of K pod on March 18th. Mark and Brendon soon spotted more fins up ahead and to the south, leading them to believe that more than one pod was present. They decided to work their way ahead to the nearest animals, which proved to be another mother-son pair, L91, and L122. From there, the closest and largest group could be seen several hundred meters to the south, so they ventured over, and Brendon identified two mixed groups: one containing J40, J45, L83, L110, and L115, while the other group consisted of the L4s (minus L55, L109 & L118), J35s and J44. Soon, the two groups merged into one large social band with spy hops and tail-slaps. L115 was noted to be particularly animated and breached three times in quick succession. After photographing and identifying these animals, Mark and Brendon moved to a small trailing group, which turned out to be J16, J26, and J42. Missing matriline member, J36, was later encountered a couple of hundred meters away alongside J53. Mark and Brendon could see more blows to the northeast and a few animals to the southeast, and the decision was made to move to the lead animals, which were identified as the J19s. They could not see any blows ahead of them and concluded that the rest of the unidentified whales in their survey were likely sprinkled throughout the already-observed whales, simply in smaller units. As Mike 1 ventured eastwards, Brendon spotted two more groupings to the southeast. The first group turned out to be a boisterous pairing of J39 and L109. The next set of whales were the J22s and J37s, trailing slightly behind the two sprouting bulls. Mark decided to take one last look at a lone whale nearby in the hopes that the animal was not previously identified, and luck was on their side as Brendon identified the female as J46. After a quick proof-of-presence photo, they ended the encounter with the westbound whales south of Sooke at 1745 and began making way for Victoria in a waning southeasterly wind. |
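A minimal base-R sketch of this parsing step, assuming the scraped text arrives as a character vector of "Label:value" lines like the output shown earlier. `parse_encounter()` and `to_snake()` are hypothetical names, and the label list is copied from that output. Matching against known labels matters because the summary paragraphs can contain colons of their own.

```r
# Hypothetical helper: turn labels such as "Enc#" or "ObservBegin" into snake_case.
to_snake <- function(x) {
  x <- gsub("#", "Number", x, fixed = TRUE)
  x <- gsub("([a-z])([A-Z])", "\\1_\\2", x)
  tolower(gsub(" +", "_", x))
}

parse_encounter <- function(lines) {
  # Field labels as they appear in the scraped text.
  keys <- c("link", "EncDate", "EncSeq", "Enc#", "ObservBegin", "ObservEnd",
            "Vessel", "Staff", "Other Observers", "Pods", "LocationDescr",
            "Start Latitude", "Start Longitude", "End Latitude",
            "End Longitude", "EncSummary", "NMFS PERMIT")
  out <- list()
  current <- NULL
  for (line in lines) {
    hit <- keys[startsWith(line, paste0(keys, ":"))]
    if (length(hit) == 1) {
      # New field: keep everything after the first "Label:".
      current <- hit
      out[[current]] <- trimws(substring(line, nchar(current) + 2))
    } else if (!is.null(current)) {
      # Continuation line (e.g. a summary paragraph): append to the open field.
      out[[current]] <- trimws(paste(out[[current]], trimws(line)))
    }
  }
  names(out) <- to_snake(names(out))
  as.data.frame(out, stringsAsFactors = FALSE)
}

# Shortened sample input based on the encounter above.
raw <- c("link:https://www.whaleresearch.com/2023-14", "EncDate:08/04/23 ",
         "Enc#:14", "ObservBegin:04:22 PM", "Pods:J, L", "EncSummary:",
         "Brendon identified two mixed groups: one containing J40.",
         "They ended the encounter at 1745.",
         "NMFS PERMIT: 21238/ DFO SARA 388")
enc <- parse_encounter(raw)
```

Columns come out in the order the labels appear in the input, so reordering (for example, moving nmfs_permit first) would be a separate step.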
Chain the previous three functions to scrape and combine all desired encounters
nmfs_permit | link | enc_date | enc_seq | enc_number | observ_begin | observ_end | vessel | staff | other_observers | pods | location_descr | start_latitude | start_longitude | end_latitude | end_longitude | enc_summary |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
27038/ DFO SARA 388 | https://www.whaleresearch.com/2023-66 | 07/10/23 | 1 | 66 | 04:20 PM | 07:04 PM | Mike 1 | Mark Malleson | Brendon Bissonnette | Transients | South of Victoria | 48 15.01 | 123 20.72 | 48 12.79 | 123 23.57 | Upon receiving a report of four killer whales quickly heading east in the traffic lanes south of Race Rocks, Mark began to prepare the boat. Brendon joined him at the dock, and they set off on Mike 1 at 1545. By this time, the quartet of whales had surged eastward at 8 knots and merged with a larger group already engrossed in a sea lion predation. The encounter began at 1620 as Mark and Brendon entered the scene, greeted by what seemed to be a dozen or more whales. Brendon identified the matrilines as the T035As, T036As, T038As, along with the T123s, who were the ones previously spotted heading east. This brought the grand tally to seventeen whales. The sea lion, a mid-sized Steller, remarkably held its own despite the relentless onslaught from the whales. By 1500, it was apparent that the whales were deliberately extending the pursuit, taking moments to socialize, rest and regroup before the next onslaught. Brendon observed the lead huntress to be 23-year-old T038A. With assistance from both T036A1 and T123, she orchestrated the majority of the sea lion chase. While this trio pursued the hunt, the juveniles engaged in socialization. T036A hovered approximately a hundred meters away with some of the other youngsters, seemingly at rest, while T123A maintained prolonged dives on the periphery. Perhaps cognizant of the sea lion’s vigor, the true onslaught commenced around 1845. Repeated passes were made at the sea lion, each more intense than the last. As the hunt approached its conclusion, so too did the day itself; Mark noted the dwindling sunlight and was aware that the opportunity to return home with any residual daylight was slipping away. At 1904, the decision was made to conclude the encounter and set course for Victoria after a final pass from T036A & T036A5. Behind, the hunt persisted into its fourth hour… |
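The overall shape of such a pipeline might look like the sketch below. `scrape_encounters()` and its three function arguments are placeholders for the link-scraping, extraction, and parsing steps, and do not reflect the package's actual function names.

```r
# Hypothetical pipeline: get_links, extract_lines and parse_lines stand in
# for the three steps described earlier.
scrape_encounters <- function(landing_urls, get_links, extract_lines,
                              parse_lines, delay = 2) {
  links <- unique(unlist(lapply(landing_urls, get_links)))
  rows <- lapply(seq_along(links), function(i) {
    if (i > 1) Sys.sleep(delay)   # stay polite between requests
    parse_lines(extract_lines(links[[i]]))
  })
  do.call(rbind, rows)            # stack the one-row data frames
}

# Offline usage with stub functions in place of real scrapers.
stub_links   <- function(u) paste0(u, "/", 1:2)
stub_extract <- function(link) link
stub_parse   <- function(x) data.frame(link = x, stringsAsFactors = FALSE)
demo <- scrape_encounters("site", stub_links, stub_extract, stub_parse, delay = 0)
```

rbind() requires every row to have the same columns; dplyr::bind_rows() is a common alternative when some encounters are missing fields.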
Final tidying and packaging data set in data-raw/data_cwr.R
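One tidying step this data is likely to need is converting coordinates such as "48 18.65" (degrees and decimal minutes) to decimal degrees before mapping. The sketch below is an illustration, not the contents of data_cwr.R. `ddm_to_dd()` is a hypothetical helper, and the negative sign on longitude rests on the assumption that the scraped values omit the hemisphere (the survey area lies west of the prime meridian).

```r
# Hypothetical helper: "48 18.65" (degrees, decimal minutes) -> decimal degrees.
ddm_to_dd <- function(x) {
  parts <- strsplit(trimws(x), " +")
  vapply(parts, function(p) as.numeric(p[1]) + as.numeric(p[2]) / 60, numeric(1))
}

enc <- data.frame(
  start_latitude  = "48 18.65",
  start_longitude = "123 39.84",
  stringsAsFactors = FALSE
)

enc$start_lat <- ddm_to_dd(enc$start_latitude)
# Assumption: longitudes are given without a sign and are degrees west.
enc$start_lon <- -ddm_to_dd(enc$start_longitude)

# The result can then be passed to {leaflet}, for example:
# leaflet::leaflet(enc) |>
#   leaflet::addTiles() |>
#   leaflet::addMarkers(lng = ~start_lon, lat = ~start_lat)
```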
Webscraping: {rvest} documentation
Mapping: {leaflet} R documentation
All photos and data belong to the Center for Whale Research, a 501c3 nonprofit organization registered in Washington State.