Web Scraping & Mapping
{orcas} Encounters

R in the Outdoors | R Ladies Seattle

Jadey Ryan | April 20, 2023

Center for Whale Research

L86 and L47 displaying their right saddle patch (Dave Ellifrit, Center for Whale Research © 2020)


Almost 600 orca encounters ➡ almost 600 separate webpages to scrape!

Webscrape with {rvest}

  1. Read webpage with read_html()
  2. Extract specific parts with:
    • html_element() vs html_elements()

    • html_text() vs html_text2()

    • html_attr(), html_table(), and other html_*()

rvest::read_html(
  "https://www.whaleresearch.com/2022encounters") |>
  rvest::html_elements(
    xpath = "//*[starts-with(@id, 'comp-l3369ith')]//a") |>
  rvest::html_attr("href") |> 
  head(2)
[1] "https://www.whaleresearch.com/2023-66"
[2] "https://www.whaleresearch.com/2023-65"
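To make the differences between the helpers listed above concrete, here is a minimal sketch on a small inline page (the HTML is invented for illustration and is not the Center for Whale Research markup). It contrasts html_element() with html_elements() and html_text() with html_text2().

```r
# A tiny inline page, so nothing is downloaded.
page <- rvest::minimal_html('
  <ul>
    <li><a href="/2023-66">Encounter   #66</a></li>
    <li><a href="/2023-65">Encounter<br>#65</a></li>
  </ul>
')

# html_element() returns the first match only; html_elements() returns all.
first_link <- page |> rvest::html_element("a")
all_links  <- page |> rvest::html_elements("a")

# html_attr() pulls one attribute from each matched node.
hrefs <- all_links |> rvest::html_attr("href")

# html_text() returns the raw text; html_text2() collapses runs of
# whitespace and turns <br> into a newline, closer to what a browser shows.
raw_text   <- all_links |> rvest::html_text()
clean_text <- all_links |> rvest::html_text2()
```

For scraped prose such as encounter summaries, html_text2() is usually the more convenient of the two, since the result needs less whitespace cleanup.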

Understand what to extract

<html>
<body>
  <h1 id='first'>A heading</h1>
  <p>Some text &amp; <b>some bold text.</b></p>
  <img src='myimg.png' width='100' height='100'>
</body>
</html>
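As a rough illustration of how selectors map onto this markup, the sketch below parses the same snippet from a string and pulls out the heading, the paragraph text, and the image attributes. The XPath prefix 'fir' is chosen only to mirror the starts-with() pattern used for the encounter links.

```r
html <- "
<html>
<body>
  <h1 id='first'>A heading</h1>
  <p>Some text &amp; <b>some bold text.</b></p>
  <img src='myimg.png' width='100' height='100'>
</body>
</html>"

page <- rvest::read_html(html)

# CSS selector: '#first' targets the element whose id is 'first'.
heading <- page |> rvest::html_element("#first") |> rvest::html_text2()

# XPath alternative: match any element whose id starts with a prefix.
heading_xpath <- page |>
  rvest::html_element(xpath = "//*[starts-with(@id, 'fir')]") |>
  rvest::html_text2()

# Paragraph text; the &amp; entity is decoded to '&'.
paragraph <- page |> rvest::html_element("p") |> rvest::html_text2()

# Attributes always come back as character strings.
img_src   <- page |> rvest::html_element("img") |> rvest::html_attr("src")
img_width <- page |> rvest::html_element("img") |> rvest::html_attr("width")
```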

Tools to identify selectors

rvest::read_html(
  "https://www.whaleresearch.com/2022encounters") |>
  rvest::html_elements(
    xpath = "//*[starts-with(@id, 'comp-l3369ith')]//a") |>
  rvest::html_attr("href") |> 
  head()

{orcas}: all together now

Scrape landing pages to get links to encounter webpages

orcas::get_encounter_links(year = 2023, max_urls = 2)
[1] "https://www.whaleresearch.com/2023-66"
[2] "https://www.whaleresearch.com/2023-65"

Extract data from webpage

Don’t overload the site! Use {polite} or Sys.sleep().

orcas::get_encounter_data("https://www.whaleresearch.com/2023-14")
 [1] "link:https://www.whaleresearch.com/2023-14"
 [2] "EncDate:08/04/23 "
 [3] "EncSeq:1"
 [4] "Enc#:14"
 [5] "ObservBegin:04:22 PM"
 [6] "ObservEnd:05:45 PM"
 [7] "Vessel:Mike 1"
 [8] "Staff:Mark Malleson"
 [9] "Other Observers:Brendon Bissonnette"
[10] "Pods:J, L"
[11] "LocationDescr:East Sooke"
[12] "Start Latitude:48 18.65"
[13] "Start Longitude:123 39.84"
[14] "End Latitude:48 18.04"
[15] "End Longitude:123 44.84"
[16] "EncSummary:"
[17] "While out scanning the western Strait of Juan de Fuca from shore at Sheringham Point, at ~ 1400, Mark received a report that the Southern Resident killer whales who were seen traveling east into rough seas earlier in the day had flipped and were now heading west into calmer water. Mark quickly rushed back to Victoria and met Brendon at the dock, and they departed on Mike 1 at 1529. "
[18] "The encounter began at 1622 when Mark spotted the first pair of whales east of Secretary Island. Brendon identified these whales as L72 and her son, L105. These two were part of a subgroup of L pod encountered off Monterey Bay, California, along with members of K pod on March 18th. Mark and Brendon soon spotted more fins up ahead and to the south, leading them to believe that more than one pod was present. They decided to work their way ahead to the nearest animals, which proved to be another mother-son pair, L91, and L122. From there, the closest and largest group could be seen several hundred meters to the south, so they ventured over, and Brendon identified two mixed groups: one containing J40, J45, L83, L110, and L115, while the other group consisted of the L4s (minus L55, L109 & L118), J35s and J44. Soon, the two groups merged into one large social band with spy hops and tail-slaps. L115 was noted to be particularly animated and breached three times in quick succession."
[19] "After photographing and identifying these animals, Mark and Brendon moved to a small trailing group, which turned out to be J16, J26, and J42. Missing matriline member, J36, was later encountered a couple of hundred meters away alongside J53. "
[20] "Mark and Brendon could see more blows to the northeast and a few animals to the southeast, and the decision was made to move to the lead animals, which were identified as the J19s. They could not see any blows ahead of them and concluded that the rest of the unidentified whales in their survey were likely sprinkled throughout the already-observed whales, simply in smaller units. As Mike 1 ventured eastwards, Brendon spotted two more groupings to the southeast. The first group turned out to be a boisterous pairing of J39 and L109. The next set of whales were the J22s and J37s, trailing slightly behind the two sprouting bulls. "
[21] "Mark decided to take one last look at a lone whale nearby in the hopes that the animal was not previously identified, and luck was on their side as Brendon identified the female as J46. After a quick proof-of-presence photo, they ended the encounter with the westbound whales south of Sooke at 1745 and began making way for Victoria in a waning southeasterly wind."
[22] "NMFS PERMIT: 21238/ DFO SARA 388"
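One way to follow the advice about not overloading the site is to pause between requests. The sketch below uses a hypothetical helper, map_slowly(), built on Sys.sleep(); the helper name and the 5-second default are assumptions, not part of {orcas}. With {polite}, the comparable approach would be polite::bow() once for the host and polite::scrape() per page, which also checks robots.txt.

```r
# Hypothetical helper: apply `f` to each URL, pausing between requests.
map_slowly <- function(urls, f, delay = 5) {
  results <- vector("list", length(urls))
  for (i in seq_along(urls)) {
    results[[i]] <- f(urls[[i]])
    # Wait before the next request, but not after the last one.
    if (i < length(urls)) Sys.sleep(delay)
  }
  results
}

# Example usage (commented out because it requests live pages):
# links <- orcas::get_encounter_links(year = 2023, max_urls = 2)
# pages <- map_slowly(links, rvest::read_html, delay = 5)
```

With almost 600 pages, a 5-second delay adds roughly 50 minutes to a full scrape, which is a reasonable price for a one-off data pull.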

Parse individual encounter data into dataframe

orcas::get_encounter_data("https://www.whaleresearch.com/2023-14") |>
  orcas::parse_encounter()
nmfs_permit link enc_date enc_seq enc_number observ_begin observ_end vessel staff other_observers pods location_descr start_latitude start_longitude end_latitude end_longitude enc_summary
21238/ DFO SARA 388 https://www.whaleresearch.com/2023-14 08/04/23 1 14 04:22 PM 05:45 PM Mike 1 Mark Malleson Brendon Bissonnette J, L East Sooke 48 18.65 123 39.84 48 18.04 123 44.84 While out scanning the western Strait of Juan de Fuca from shore at Sheringham Point, at ~ 1400, Mark received a report that the Southern Resident killer whales who were seen traveling east into rough seas earlier in the day had flipped and were now heading west into calmer water. Mark quickly rushed back to Victoria and met Brendon at the dock, and they departed on Mike 1 at 1529.

The encounter began at 1622 when Mark spotted the first pair of whales east of Secretary Island. Brendon identified these whales as L72 and her son, L105. These two were part of a subgroup of L pod encountered off Monterey Bay, California, along with members of K pod on March 18th. Mark and Brendon soon spotted more fins up ahead and to the south, leading them to believe that more than one pod was present. They decided to work their way ahead to the nearest animals, which proved to be another mother-son pair, L91, and L122. From there, the closest and largest group could be seen several hundred meters to the south, so they ventured over, and Brendon identified two mixed groups: one containing J40, J45, L83, L110, and L115, while the other group consisted of the L4s (minus L55, L109 & L118), J35s and J44. Soon, the two groups merged into one large social band with spy hops and tail-slaps. L115 was noted to be particularly animated and breached three times in quick succession. After photographing and identifying these animals, Mark and Brendon moved to a small trailing group, which turned out to be J16, J26, and J42. Missing matriline member, J36, was later encountered a couple of hundred meters away alongside J53. Mark and Brendon could see more blows to the northeast and a few animals to the southeast, and the decision was made to move to the lead animals, which were identified as the J19s. They could not see any blows ahead of them and concluded that the rest of the unidentified whales in their survey were likely sprinkled throughout the already-observed whales, simply in smaller units. As Mike 1 ventured eastwards, Brendon spotted two more groupings to the southeast. The first group turned out to be a boisterous pairing of J39 and L109. The next set of whales were the J22s and J37s, trailing slightly behind the two sprouting bulls. 
Mark decided to take one last look at a lone whale nearby in the hopes that the animal was not previously identified, and luck was on their side as Brendon identified the female as J46. After a quick proof-of-presence photo, they ended the encounter with the westbound whales south of Sooke at 1745 and began making way for Victoria in a waning southeasterly wind.
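The parsing step can be pictured as turning "Key:value" strings into a one-row data frame with cleaned column names. The following is a minimal base R sketch of that idea under simplifying assumptions: parse_key_value() is a hypothetical helper, not the actual orcas::parse_encounter() implementation, and it ignores the multi-paragraph summary and the "Enc#" field.

```r
# Hypothetical helper, for illustration only.
parse_key_value <- function(x) {
  # Split at the first colon only, so values such as "04:22 PM" or a URL
  # keep their own colons.
  keys   <- sub(":.*$", "", x)
  values <- trimws(sub("^[^:]*:", "", x))

  # "EncSeq" -> "enc_seq", "Start Latitude" -> "start_latitude"
  col_names <- gsub("([a-z])([A-Z])", "\\1_\\2", keys)
  col_names <- tolower(gsub("[^A-Za-z0-9]+", "_", col_names))

  as.data.frame(
    as.list(stats::setNames(values, col_names)),
    stringsAsFactors = FALSE
  )
}

fields <- c(
  "link:https://www.whaleresearch.com/2023-14",
  "EncSeq:1",
  "ObservBegin:04:22 PM",
  "Start Latitude:48 18.65"
)
encounter <- parse_key_value(fields)
```

Every column comes back as character here; converting dates, times, and coordinates to proper types is left to a later tidying step.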

Combine previous 3 functions to scrape and combine all desired encounters

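make_encounter_df <- function(years, max_urls = Inf) {
  # Collect the encounter links for each year and flatten them into one vector
  links <- purrr::map(years, get_encounter_links, max_urls = max_urls) |>
    purrr::list_c()

  # Scrape each encounter page, then parse and row-bind the results
  links |>
    purrr::map(\(x) get_encounter_data(x), .progress = TRUE) |>
    purrr::map_df(\(x) parse_encounter(x))
}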

orcas::make_encounter_df(years = 2023, max_urls = 1)
nmfs_permit link enc_date enc_seq enc_number observ_begin observ_end vessel staff other_observers pods location_descr start_latitude start_longitude end_latitude end_longitude enc_summary
27038/ DFO SARA 388 https://www.whaleresearch.com/2023-66 07/10/23 1 66 04:20 PM 07:04 PM Mike 1 Mark Malleson Brendon Bissonnette Transients South of Victoria 48 15.01 123 20.72 48 12.79 123 23.57 Upon receiving a report of four killer whales quickly heading east in the traffic lanes south of Race Rocks, Mark began to prepare the boat. Brendon joined him at the dock, and they set off on Mike 1 at 1545. By this time, the quartet of whales had surged eastward at 8 knots and merged with a larger group already engrossed in a sea lion predation.

The encounter began at 1620 as Mark and Brendon entered the scene, greeted by what seemed to be a dozen or more whales. Brendon identified the matrilines as the T035As, T036As, T038As, along with the T123s, who were the ones previously spotted heading east. This brought the grand tally to seventeen whales. The sea lion, a mid-sized Steller, remarkably held its own despite the relentless onslaught from the whales. By 1500, it was apparent that the whales were deliberately extending the pursuit, taking moments to socialize, rest and regroup before the next onslaught. Brendon observed the lead huntress to be 23-year-old T038A. With assistance from both T036A1 and T123, she orchestrated the majority of the sea lion chase. While this trio pursued the hunt, the juveniles engaged in socialization. T036A hovered approximately a hundred meters away with some of the other youngsters, seemingly at rest, while T123A maintained prolonged dives on the periphery. Perhaps cognizant of the sea lion’s vigor, the true onslaught commenced around 1845. Repeated passes were made at the sea lion, each more intense than the last. As the hunt approached its conclusion, so too did the day itself; Mark noted the dwindling sunlight and was aware that the opportunity to return home with any residual daylight was slipping away. At 1904, the decision was made to conclude the encounter and set course for Victoria after a final pass from T036A & T036A5. Behind, the hunt persisted into its fourth hour…

Final tidying and packaging data set in data-raw/data_cwr.R
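One tidying step that mapping typically requires is converting coordinates to decimal degrees. The sketch below assumes the scraped values such as "48 18.65" are degrees followed by decimal minutes, and that longitudes are west of Greenwich; ddm_to_decimal() is a hypothetical helper and may differ from what data-raw/data_cwr.R actually does.

```r
# Hypothetical helper: degrees + decimal minutes -> decimal degrees.
ddm_to_decimal <- function(x, west_or_south = FALSE) {
  parts   <- strsplit(trimws(x), "\\s+")
  degrees <- as.numeric(vapply(parts, `[`, character(1), 1))
  minutes <- as.numeric(vapply(parts, `[`, character(1), 2))
  decimal <- degrees + minutes / 60
  # Western longitudes and southern latitudes are negative.
  if (west_or_south) -decimal else decimal
}

start_lat <- ddm_to_decimal("48 18.65")                        # about 48.3108
start_lng <- ddm_to_decimal("123 39.84", west_or_south = TRUE) # -123.664
```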

(Mostly) tidy data set

orcas::make_dt(orcas::cwr_tidy[1:50, ])

Mapping with whale icons!

orcas::make_leaflet(orcas::cwr_tidy[1:2, ])
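For a sense of what a map with custom icons involves, here is a minimal {leaflet} sketch. It is not the orcas::make_leaflet() implementation; the icon URL is a placeholder, and the single point is the East Sooke start position from the 2023-14 example, assumed to be degrees and decimal minutes at a western longitude.

```r
# Placeholder icon URL; swap in a real image file or URL.
whale_icon <- leaflet::makeIcon(
  iconUrl = "https://example.com/whale.png",
  iconWidth = 32,
  iconHeight = 32
)

# One encounter start position, converted to decimal degrees.
encounters <- data.frame(
  lat = 48 + 18.65 / 60,
  lng = -(123 + 39.84 / 60),
  label = "Encounter 14: East Sooke"
)

orca_map <- leaflet::leaflet(encounters) |>
  leaflet::addTiles() |>
  leaflet::addMarkers(
    lng = ~lng, lat = ~lat,
    icon = whale_icon,
    popup = ~label
  )
```

Printing orca_map in an interactive session or a rendered document displays the widget.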

Resources

Webscraping

Mapping

Half breach (Center for Whale Research 2023)

Spyhop (Center for Whale Research 2023)

⭐⭐⭐⭐⭐ personal project

Slides

{orcas} GitHub Repo

linkedin.com/in/jadey-ryan

@jadeynryan

WSDA NRAS Webpage


All photos and data belong to the Center for Whale Research, a 501c3 nonprofit organization registered in Washington State.

L115 Breaching (Center for Whale Research 2023)
