How to Scrape Indeed and LinkedIn Job Listings

By Marcus Reiner · 2026-03-31 · 10 min read · Engineering

Job listings are among the most-scraped data on the web. Here is a stack that can scale to millions of postings per month.

Indeed vs LinkedIn: very different difficulty levels

Indeed is moderately difficult: it sits behind Cloudflare, but ISP or residential proxies hold up at decent volume. LinkedIn is among the hardest sites on the web to scrape. Residential proxies, a headless browser, and slow, throttled request rates are the minimum, and pre-collected datasets are usually cheaper than scraping it yourself.

Indeed stack

Decodo residential proxies + curl_cffi + lxml. Expect a success rate of roughly 95% at 2 req/s. The public listings page is server-rendered; parse the JSON in the script#__NEXT_DATA__ tag for clean structured data.
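As a rough sketch of the parsing step, the snippet below pulls the JSON payload out of a script tag by its id. It assumes the page really does carry a script#__NEXT_DATA__ element as described above. The function name extract_embedded_json is hypothetical, and the standard-library html.parser stands in for lxml so the example runs without extra installs. Fetching the HTML (with curl_cffi through a residential proxy) is left out.

```python
import json
from html.parser import HTMLParser


class _ScriptById(HTMLParser):
    """Collects the text inside the <script> element with a given id."""

    def __init__(self, script_id):
        super().__init__()
        self.script_id = script_id
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == self.script_id:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.chunks.append(data)


def extract_embedded_json(html, script_id="__NEXT_DATA__"):
    """Return the parsed JSON from <script id=script_id>, or None if absent.

    Hypothetical helper: the default id is taken from the article and is
    an assumption about the target page, not a guarantee.
    """
    parser = _ScriptById(script_id)
    parser.feed(html)
    parser.close()
    raw = "".join(parser.chunks).strip()
    return json.loads(raw) if raw else None
```

The shape of the returned dictionary is site-specific and can change without notice, so it is worth treating every key lookup as optional.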

LinkedIn stack

Two viable routes:

1. Bright Data LinkedIn Jobs Dataset: pre-collected, refreshed daily, GDPR-aware sourcing. Pay per record (~$0.003 each, or about $3 per 1k).

2. Oxylabs LinkedIn Scraper API: pay per result, with retries built in. ~$1.50 per 1k.
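To compare the two price points on the same unit, a back-of-the-envelope calculation helps. The sketch below uses only the approximate prices quoted above. The monthly volume is a made-up figure, and real invoices will depend on plan tiers, minimums, and how many of the returned records you actually need.

```python
# Approximate prices from the article, both expressed in USD per 1,000 records.
DATASET_USD_PER_1K = 3.00  # ~$0.003 per record
API_USD_PER_1K = 1.50      # ~$1.50 per 1k results


def monthly_cost(records, usd_per_1k):
    """Return the estimated monthly cost in USD for a given record count."""
    return records / 1000 * usd_per_1k


if __name__ == "__main__":
    volume = 1_000_000  # hypothetical monthly volume, for illustration only
    print(f"dataset: ${monthly_cost(volume, DATASET_USD_PER_1K):,.2f}")
    print(f"api:     ${monthly_cost(volume, API_USD_PER_1K):,.2f}")
```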

Deduplication

The same job is often posted on five or more boards. Hash on (normalized_title, company, city, first_50_chars_of_description) and drop records whose hash you have already seen. This can cut the volume flowing into your downstream pipeline by ~70%.
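A minimal sketch of that fingerprint, under a few assumptions: the article does not say how titles are normalized, so this version case-folds and collapses whitespace, and it applies the same treatment to company, city, and description. The names job_fingerprint and dedupe, and the dictionary keys, are hypothetical.

```python
import hashlib
import re
import unicodedata


def _norm(text):
    """Case-fold, Unicode-normalize, and collapse whitespace (assumed rules)."""
    text = unicodedata.normalize("NFKC", text or "").casefold()
    return re.sub(r"\s+", " ", text).strip()


def job_fingerprint(title, company, city, description):
    """Hash (title, company, city, first 50 chars of description)."""
    parts = (_norm(title), _norm(company), _norm(city), _norm(description)[:50])
    # A unit-separator character keeps field boundaries unambiguous.
    joined = "\x1f".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def dedupe(jobs):
    """Keep the first job seen for each fingerprint, preserving input order."""
    seen = set()
    unique = []
    for job in jobs:
        key = job_fingerprint(
            job["title"], job["company"], job["city"], job["description"]
        )
        if key not in seen:
            seen.add(key)
            unique.append(job)
    return unique
```

At millions of postings the seen set would more likely live in a database unique index or a key-value store than in process memory, but the key construction stays the same.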

Schema.org JobPosting

Most major boards embed a schema.org JobPosting JSON-LD block in the page head. Parse that first and fall back to HTML parsing only when it is missing. The JSON-LD is stable across UI changes and roughly 10× faster to process than DOM scraping.
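The JSON-LD-first approach might look roughly like this. The sketch collects every script block of type application/ld+json, skips any that fail to parse, and returns the first node whose @type is JobPosting, also looking inside @graph arrays. The function name find_job_posting is hypothetical, and the HTML fallback is left to the caller: a None result is the signal to fall back.

```python
import json
from html.parser import HTMLParser


class _JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self._buf = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._inside = True
            self._buf = []

    def handle_data(self, data):
        if self._inside:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._inside:
            self.blocks.append("".join(self._buf))
            self._inside = False


def _iter_nodes(doc):
    """Yield every dict in a JSON-LD document, descending into lists and @graph."""
    if isinstance(doc, list):
        for item in doc:
            yield from _iter_nodes(item)
    elif isinstance(doc, dict):
        yield doc
        yield from _iter_nodes(doc.get("@graph", []))


def _is_job_posting(node):
    node_type = node.get("@type")
    types = node_type if isinstance(node_type, list) else [node_type]
    return "JobPosting" in types


def find_job_posting(html):
    """Return the first JobPosting node found in the page, or None."""
    collector = _JsonLdCollector()
    collector.feed(html)
    collector.close()
    for block in collector.blocks:
        try:
            doc = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed block: ignore it and try the next one
        for node in _iter_nodes(doc):
            if _is_job_posting(node):
                return node
    return None
```

Fields such as title, hiringOrganization, and jobLocation are defined by schema.org, but boards fill them in unevenly, so each one is best read defensively.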

FAQ

Can I republish job listings?

Job posting text is usually copyrighted by the employer. Facts (title, company, location, salary) are not. Most aggregators redirect to the source rather than republishing the full text.
