How to Scrape Indeed and LinkedIn Job Listings
By Marcus Reiner · 2026-03-31 · 10 min read · Engineering
Job listings are among the most-scraped data on the web. Here's a stack that scales to millions of postings per month.
Indeed vs LinkedIn — totally different difficulty
Indeed is moderate: it sits behind Cloudflare, but ISP or residential proxies hold up at decent volume. LinkedIn is among the hardest sites on the web: residential proxies, a headless browser, and slow, throttled request rates are the minimum, and pre-collected datasets are usually cheaper than scraping it yourself.
Indeed stack
Decodo residential proxies + curl_cffi + lxml. Expect a ~95% success rate at 2 req/s. The public listings page is server-rendered, so parse the JSON in the script#__NEXT_DATA__ tag for clean structured data rather than walking the DOM.
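To make the script#__NEXT_DATA__ step concrete, here is a minimal sketch of pulling that JSON out of an already-fetched page. It uses the standard library's html.parser as a stand-in for lxml so it runs without dependencies, and it skips the fetching side (proxies, curl_cffi) entirely. The function name extract_next_data, the sample HTML, and the props.pageProps.jobs path are all assumptions for illustration; the real payload layout will differ and needs to be inspected on a live page.

```python
import json
from html.parser import HTMLParser


class NextDataParser(HTMLParser):
    """Collects the text inside <script id="__NEXT_DATA__">."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.chunks.append(data)


def extract_next_data(html):
    """Return the decoded JSON payload, or None if the tag is missing or invalid."""
    parser = NextDataParser()
    parser.feed(html)
    parser.close()
    raw = "".join(parser.chunks).strip()
    if not raw:
        return None
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None


# Hypothetical page: the key path below is made up for this example.
SAMPLE_HTML = """<html><head>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"pageProps": {"jobs": [{"title": "Data Engineer", "company": "Acme"}]}}}
</script></head><body></body></html>"""

data = extract_next_data(SAMPLE_HTML)
jobs = data["props"]["pageProps"]["jobs"]
```

Returning None instead of raising keeps the caller's fallback path simple: if the tag is absent or the JSON is broken, drop down to HTML parsing for that page.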
LinkedIn stack
Two viable routes:
1. Bright Data LinkedIn Jobs Dataset: pre-collected, refreshed daily, GDPR-aware sourcing. Pay per record (~$0.003 each, or about $3 per 1k).
2. Oxylabs LinkedIn Scraper API: pay per result, with retries built in. ~$1.50 per 1k.
Deduplication
The same job is often posted on five or more boards. Hash on (normalized_title, company, city, first_50_chars_of_description) and drop repeats. This cuts the volume going into your downstream pipeline by ~70%.
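A minimal sketch of that hash, assuming each job is a dict with title, company, city, and description keys (hypothetical field names). The normalization rules here (lowercasing, stripping accents and punctuation, collapsing whitespace) are one reasonable choice, not a prescribed one, and applying them to company, city, and the description prefix as well as the title is an extra assumption on top of the tuple above.

```python
import hashlib
import re
import unicodedata


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return text.strip()


def dedup_key(job):
    """SHA-256 over (title, company, city, first 50 chars of description)."""
    parts = (
        normalize(job["title"]),
        normalize(job["company"]),
        normalize(job["city"]),
        normalize(job["description"])[:50],
    )
    # Unit-separator join so adjacent fields cannot run together.
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


def dedupe(jobs):
    """Keep the first posting seen for each key, preserving input order."""
    seen = set()
    unique = []
    for job in jobs:
        key = dedup_key(job)
        if key not in seen:
            seen.add(key)
            unique.append(job)
    return unique


# Made-up postings: the first two are the same job from two boards.
SAMPLE_JOBS = [
    {"title": "Senior Data Engineer", "company": "Acme, Inc.", "city": "Berlin",
     "description": "We are hiring a data engineer to build pipelines for our analytics platform."},
    {"title": "  senior data engineer ", "company": "ACME Inc", "city": "berlin",
     "description": "We are hiring a Data Engineer to build pipelines for our analytics platform. Apply now."},
    {"title": "Product Manager", "company": "Acme, Inc.", "city": "Berlin",
     "description": "Own the roadmap for our analytics platform."},
]

unique_jobs = dedupe(SAMPLE_JOBS)
```

Taking only a short description prefix tolerates boards that append their own footer text, at the cost of occasionally merging two genuinely different postings that share a boilerplate opening.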
Schema.org JobPosting
Most major boards embed schema.org JobPosting JSON-LD in the page head, inside a script tag with type application/ld+json. Parse that first and fall back to HTML only when it is missing. It stays stable across UI changes and is roughly 10× faster than DOM scraping.
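Here is a sketch of the JSON-LD-first approach, again using only the standard library so it runs as-is. It collects every application/ld+json script, decodes each one, and keeps nodes whose @type is JobPosting, including nodes nested under @graph. The function name find_job_postings and the sample page are illustrative assumptions; title, hiringOrganization, and jobLocation are standard schema.org JobPosting properties, but which ones a given board fills in varies.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self._buffer = []

    def handle_data(self, data):
        if self._inside:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._inside:
            self.blocks.append("".join(self._buffer))
            self._inside = False


def _walk(node):
    """Yield top-level dicts and any dicts listed under @graph."""
    if isinstance(node, list):
        for item in node:
            yield from _walk(item)
    elif isinstance(node, dict):
        yield node
        yield from _walk(node.get("@graph", []))


def find_job_postings(html):
    """Return all JobPosting nodes found in the page's JSON-LD blocks."""
    parser = JsonLdParser()
    parser.feed(html)
    parser.close()
    postings = []
    for raw in parser.blocks:
        try:
            doc = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the page
        for node in _walk(doc):
            types = node.get("@type")
            types = types if isinstance(types, list) else [types]
            if "JobPosting" in types:
                postings.append(node)
    return postings


# Made-up page for illustration.
SAMPLE_HTML = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "JobPosting",
 "title": "Data Engineer",
 "hiringOrganization": {"@type": "Organization", "name": "Acme"},
 "jobLocation": {"@type": "Place",
                 "address": {"@type": "PostalAddress", "addressLocality": "Berlin"}}}
</script></head><body></body></html>"""

postings = find_job_postings(SAMPLE_HTML)
```

An empty list is the signal to fall back to HTML parsing for that page.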
FAQ
Can I republish job listings?
Job posting text is usually copyrighted by the employer. Facts (title, company, location, salary) are not. Most aggregators redirect to the source rather than republishing the full text.