Selenium Web Scraping Tutorial: A 2026 Practical Guide
Selenium is the original browser-automation framework. In 2026, Playwright has eaten most of its mindshare for new projects, but Selenium remains the default in many enterprise scraping codebases and has unmatched grid / cluster tooling. This guide covers Selenium 4, undetected-chromedriver, login flows, and the realistic limits of Selenium against modern anti-bot systems.
Selenium vs Playwright in 2026
Use Selenium if your team has years of existing tooling around it, you operate a Selenium Grid cluster, or you need cross-browser parity (real IE / Safari support, where it still matters). For greenfield scraping projects, Playwright has a faster developer loop, better default stealth, and a cleaner async API. Both can scrape; the choice usually rides on existing infrastructure.
Install (Python)
pip install selenium
# For stealth-ish scraping:
pip install undetected-chromedriver
Since 4.6, Selenium ships its own driver manager (Selenium Manager), so you no longer need to download chromedriver manually or install the older webdriver-manager package.
Basic extraction
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
opts = webdriver.ChromeOptions()
opts.add_argument("--headless=new")
driver = webdriver.Chrome(options=opts)
driver.get("https://example.com/products")
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, ".product-card"))
)
cards = driver.find_elements(By.CSS_SELECTOR, ".product-card")
for c in cards:
    print(c.find_element(By.CSS_SELECTOR, ".title").text,
          c.find_element(By.CSS_SELECTOR, ".price").text)
driver.quit()
Always wait on a selector before extracting: the biggest source of flaky Selenium tests and scrapers is racing the JavaScript render. Use WebDriverWait, not time.sleep.
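WebDriverWait's until also accepts any callable that receives the driver, which helps when the first matching element appears before the full list has rendered. A minimal sketch, reusing the imports above (the 20-card threshold is an illustrative assumption, not something from the target site):
# Wait until a reasonable batch of cards has rendered, not just the first one
WebDriverWait(driver, 10).until(
    lambda d: len(d.find_elements(By.CSS_SELECTOR, ".product-card")) >= 20
)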
Login & cookies
driver.get("https://example.com/login")
driver.find_element(By.ID, "email").send_keys(USER)
driver.find_element(By.ID, "password").send_keys(PASS)
driver.find_element(By.CSS_SELECTOR, "button[type=submit]").click()
WebDriverWait(driver, 15).until(EC.url_contains("/dashboard"))
# Persist cookies for reuse
import json
with open("cookies.json", "w") as f:
    json.dump(driver.get_cookies(), f)
# On the next run, visit the site first: add_cookie only works
# for the currently loaded domain.
driver.get("https://example.com")
for cookie in json.load(open("cookies.json")):
    driver.add_cookie(cookie)
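Putting the two halves together, a common pattern is to try the saved cookies first and only re-run the login flow when the session has expired. A minimal sketch, where is_logged_in and login are hypothetical helpers wrapping the steps above:
import json
import os

def restore_session(driver):
    driver.get("https://example.com")
    if os.path.exists("cookies.json"):
        for cookie in json.load(open("cookies.json")):
            driver.add_cookie(cookie)
        driver.get("https://example.com/dashboard")
    if not is_logged_in(driver):   # hypothetical check, e.g. a dashboard-only element
        login(driver)              # hypothetical wrapper around the login flow above
        with open("cookies.json", "w") as f:
            json.dump(driver.get_cookies(), f)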
opts.add_argument(f"--proxy-server=http://gateway.example.com:7777")
# For authenticated proxies you need a small extension shim or use seleniumwire:
# pip install selenium-wire
from seleniumwire import webdriver as wirewd
driver = wirewd.Chrome(seleniumwire_options={
    "proxy": {
        "http": "http://USER:[email protected]:7777",
        "https": "http://USER:[email protected]:7777",
    }
})
Vanilla Selenium doesn't support proxy auth out of the box; selenium-wire handles it cleanly and also lets you inspect the underlying network requests.
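That request inspection is often the bigger win: many sites fetch their data from a JSON endpoint you can read straight off the wire via selenium-wire's driver.requests. A short sketch (the "/api/" substring filter is an assumption about the target site):
driver.get("https://example.com/products")
for req in driver.requests:
    # req.response is None for requests that never completed
    if req.response and "/api/" in req.url:
        print(req.url, req.response.status_code)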
Stealth: undetected-chromedriver
import undetected_chromedriver as uc
driver = uc.Chrome(headless=False, version_main=None)
undetected-chromedriver patches the most common automation tells: navigator.webdriver, missing plugins, a broken Permissions API. It is not magic; mature anti-bot vendors update detection rules every few weeks, so projects that depend on it need to track upstream patches.
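A quick smoke test for the most basic tell, using standard execute_script (what a patched browser returns here varies by Chrome version, so treat the expected value as an assumption and verify against your own setup):
driver.get("https://example.com")
# Unpatched automation reports True; a patched browser should return None or False
print(driver.execute_script("return navigator.webdriver"))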
Honest limits
Selenium against modern Cloudflare / DataDome / PerimeterX is a moving target. You can win against a stable target for months, then a vendor update breaks the run overnight. Two things help: keep the browser image and stealth library current, and recycle browser instances aggressively (every 50–200 pages) so cookies and behavioral signals don't accumulate against you.
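A minimal sketch of that recycling loop, reusing the uc import from above (RECYCLE_EVERY, urls, and scrape_page are illustrative assumptions):
RECYCLE_EVERY = 100  # tune within the 50-200 page range above

driver = None
for i, url in enumerate(urls):              # urls: your crawl frontier (assumed)
    if i % RECYCLE_EVERY == 0:
        if driver is not None:
            driver.quit()
        driver = uc.Chrome(headless=False)  # fresh profile, fresh cookies
    driver.get(url)
    scrape_page(driver)                     # hypothetical extraction function
if driver is not None:
    driver.quit()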
Past the low five figures of pages per day on defended targets, the maintenance cost of stealth Selenium usually exceeds the cost of a managed scraping API or a managed-data service. If you'd rather skip the cat-and-mouse, see our tools category guide.
Don't want to operate browser fleets?
We deliver structured CSV / JSON on a schedule — proxies, anti-bot, and monitoring included.