Selenium Web Scraping Tutorial: A 2026 Practical Guide
Selenium is the original browser-automation framework. In 2026, Playwright has eaten most of its mindshare for new projects, but Selenium remains the default in many enterprise scraping codebases and has unmatched grid / cluster tooling. This guide covers Selenium 4, undetected-chromedriver, login flows, and the realistic limits of Selenium against modern anti-bot systems.
Selenium vs Playwright in 2026
Use Selenium if your team has years of existing tooling around it, you operate a Selenium Grid cluster, or you need cross-browser parity (real IE / Safari support, where it still matters). For greenfield scraping projects, Playwright has a faster developer loop, better default stealth, and a cleaner async API. Both can scrape; the choice usually rides on existing infrastructure.
Install (Python)
pip install selenium
# For stealth-ish scraping:
pip install undetected-chromedriver
Since 4.6, Selenium ships its own driver manager (Selenium Manager), so you no longer need to download chromedriver manually or install the older webdriver-manager package.
Basic extraction
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
opts = webdriver.ChromeOptions()
opts.add_argument("--headless=new")
driver = webdriver.Chrome(options=opts)
driver.get("https://example.com/products")
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, ".product-card"))
)
cards = driver.find_elements(By.CSS_SELECTOR, ".product-card")
for c in cards:
    print(c.find_element(By.CSS_SELECTOR, ".title").text,
          c.find_element(By.CSS_SELECTOR, ".price").text)
driver.quit()
Always wait on a selector before extracting: the biggest source of flaky Selenium tests and scrapers is racing the JavaScript render. Use WebDriverWait, not time.sleep.
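WebDriverWait's until also accepts any callable that receives the driver, which helps when the first matching element appears before the full list has rendered. A minimal sketch, reusing the imports above (the 20-card threshold is an illustrative assumption, not something from the target site):
# Wait until a reasonable batch of cards has rendered, not just the first one
WebDriverWait(driver, 10).until(
    lambda d: len(d.find_elements(By.CSS_SELECTOR, ".product-card")) >= 20
)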
Login & cookies
driver.get("https://example.com/login")
driver.find_element(By.ID, "email").send_keys(USER)
driver.find_element(By.ID, "password").send_keys(PASS)
driver.find_element(By.CSS_SELECTOR, "button[type=submit]").click()
WebDriverWait(driver, 15).until(EC.url_contains("/dashboard"))
# Persist cookies for reuse
import json
with open("cookies.json", "w") as f:
    json.dump(driver.get_cookies(), f)
# On the next run, visit the site first: add_cookie only works
# for the currently loaded domain.
driver.get("https://example.com")
for cookie in json.load(open("cookies.json")):
    driver.add_cookie(cookie)
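Putting the two halves together, a common pattern is to try the saved cookies first and only re-run the login flow when the session has expired. A minimal sketch, where is_logged_in and login are hypothetical helpers wrapping the steps above:
import json
import os

def restore_session(driver):
    driver.get("https://example.com")
    if os.path.exists("cookies.json"):
        for cookie in json.load(open("cookies.json")):
            driver.add_cookie(cookie)
        driver.get("https://example.com/dashboard")
    if not is_logged_in(driver):   # hypothetical check, e.g. a dashboard-only element
        login(driver)              # hypothetical wrapper around the login flow above
        with open("cookies.json", "w") as f:
            json.dump(driver.get_cookies(), f)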
opts.add_argument(f"--proxy-server=http://gateway.example.com:7777")
# For authenticated proxies you need a small extension shim or use seleniumwire:
# pip install selenium-wire
from seleniumwire import webdriver as wirewd
driver = wirewd.Chrome(seleniumwire_options={
    "proxy": {
        "http": "http://USER:[email protected]:7777",
        "https": "http://USER:[email protected]:7777",
    }
})
Vanilla Selenium doesn't support proxy auth out of the box; selenium-wire handles it cleanly and also lets you inspect the underlying network requests.
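That request inspection is often the bigger win: many sites fetch their data from a JSON endpoint you can read straight off the wire via selenium-wire's driver.requests. A short sketch (the "/api/" substring filter is an assumption about the target site):
driver.get("https://example.com/products")
for req in driver.requests:
    # req.response is None for requests that never completed
    if req.response and "/api/" in req.url:
        print(req.url, req.response.status_code)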
Stealth: undetected-chromedriver
import undetected_chromedriver as uc
driver = uc.Chrome(headless=False, version_main=None)
undetected-chromedriver patches the most common automation tells: navigator.webdriver, missing plugins, a broken Permissions API. It is not magic; mature anti-bot vendors update detection rules every few weeks, so projects that depend on it need to track upstream patches.
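A quick smoke test for the most basic tell, using standard execute_script (what a patched browser returns here varies by Chrome version, so treat the expected value as an assumption and verify against your own setup):
driver.get("https://example.com")
# Unpatched automation reports True; a patched browser should return None or False
print(driver.execute_script("return navigator.webdriver"))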
Honest limits
Selenium against modern Cloudflare / DataDome / PerimeterX is a moving target. You can win against a stable target for months, then a vendor update breaks the run overnight. Two things help: keep the browser image and stealth library current, and recycle browser instances aggressively (every 50–200 pages) so cookies and behavioral signals don't accumulate against you.
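A minimal sketch of that recycling loop, reusing the uc import from above (RECYCLE_EVERY, urls, and scrape_page are illustrative assumptions):
RECYCLE_EVERY = 100  # tune within the 50-200 page range above

driver = None
for i, url in enumerate(urls):              # urls: your crawl frontier (assumed)
    if i % RECYCLE_EVERY == 0:
        if driver is not None:
            driver.quit()
        driver = uc.Chrome(headless=False)  # fresh profile, fresh cookies
    driver.get(url)
    scrape_page(driver)                     # hypothetical extraction function
if driver is not None:
    driver.quit()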
Past the low five figures of pages per day on defended targets, the maintenance cost of stealth Selenium usually exceeds the cost of a managed scraping API or a managed-data service. If you'd rather skip the cat-and-mouse, see our tools category guide.
Don't want to operate browser fleets?
We deliver structured CSV / JSON on a schedule — proxies, anti-bot, and monitoring included.