Scraping the Phonetic Transcriptions of English Words from the Yahoo Kimo Dictionary with Python and Selenium


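import time  # for the fixed pause after each page load
from urllib.parse import quote  # URL-encode each word before it goes into the query string

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Path to the chromedriver executable (Windows path shown). On macOS, place this
# script and chromedriver in the user's home folder and point the path there.
driver = webdriver.Chrome(service=Service(r'C:\Users\bfhaha\chromedriver'))

# Words to look up
vocabulary = [
    "follicle",
    "polio",
    "groove",
]

# Mode "w" creates getKK.txt, or empties it if it already exists
with open("getKK.txt", "w", encoding="UTF-8") as f:
    for word in vocabulary:
        driver.get("https://tw.dictionary.search.yahoo.com/search?p=" + quote(word))
        time.sleep(2)  # give the page time to finish loading
        try:
            # The phonetic transcription sits in the first span whose class is exactly " fz-14"
            kk = driver.find_element(By.XPATH, "//span[@class = ' fz-14']").text
            f.write(kk)
        except NoSuchElementException:
            f.write("NULL")  # placeholder when no transcription is found
        f.write("\n")

driver.quit()  # close the browser when done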
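
The fixed two-second pause in the loop is either longer than needed on a fast page or too short on a slow one. A minimal sketch of an alternative is to retry the lookup until it succeeds or a timeout expires. The helper `poll_until` and its parameters are hypothetical names made up for this illustration and are not part of Selenium; Selenium's own `WebDriverWait` serves the same purpose when a browser is available.

```python
import time


def poll_until(lookup, timeout=10.0, interval=0.5, default="NULL"):
    """Call lookup() repeatedly until it returns without raising, or until timeout.

    lookup   -- zero-argument callable, e.g. a lambda wrapping the XPath query
    default  -- value returned when every attempt failed (mirrors the "NULL" placeholder)
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return lookup()
        except Exception:
            # Any error triggers a retry here for brevity; in the scraper this
            # would be narrowed to NoSuchElementException.
            if time.monotonic() >= deadline:
                return default
            time.sleep(interval)
```

In the scraper, the callable passed as `lookup` would be the same XPath query used inside the loop, and the returned value would be written to the file directly, so the `time.sleep(2)` and the `try`/`except` around the lookup would no longer be needed.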

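Some websites serve a pre-loaded page first, for example the ads on YouTube. As a result, an element that is visible when you view the page source may still not be found by Selenium, because the pre-loaded page differs from the page you actually see. In that case, first run kk = driver.execute_script("return document.getElementsByTagName('body')[0].innerHTML;") and inspect the source of the pre-loaded page to decide which element holds the information you need.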
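
Once the body HTML has been captured this way, it still has to be searched for the element that holds the wanted text. The following is a minimal sketch using only the standard library, assuming the captured string is ordinary HTML. The class `SpanCollector`, the function `list_spans`, and the `sample` fragment are made up for illustration; `sample` stands in for the string returned by `execute_script` and its contents are placeholders, not real dictionary output.

```python
from html.parser import HTMLParser


class SpanCollector(HTMLParser):
    """Collect (class attribute, text) for every <span> in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.spans = []   # finished (class, text) pairs, in closing order
        self._stack = []  # currently open spans: [class, list of text pieces]

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self._stack.append([dict(attrs).get("class"), []])

    def handle_data(self, data):
        # Text belongs to every span that is currently open
        for entry in self._stack:
            entry[1].append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._stack:
            cls, pieces = self._stack.pop()
            self.spans.append((cls, "".join(pieces).strip()))


def list_spans(html):
    """Return a list of (class, text) pairs for all spans in the given HTML string."""
    parser = SpanCollector()
    parser.feed(html)
    parser.close()
    return parser.spans


# Hypothetical fragment standing in for the string returned by execute_script
sample = '<div><span class=" fz-14">KK [example]</span><span class="other">n.</span></div>'
for cls, text in list_spans(sample):
    print(repr(cls), text)
```

Printing the class with `repr` makes details such as the leading space in `' fz-14'` visible, which matters because the XPath test `@class = ' fz-14'` compares the whole attribute string.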
