hsntsn-scraper

.NET console scraper that collects HSN/TSN vehicle data and writes it as CSV to standard output.

Source: http://www.hsn-tsn.de/

CSV output fields:

  • HsnTsn, Hsn, Tsn
  • Brand, VehicleType, Model, OfficialType
  • YearFrom, YearTo
  • PowerPs, PowerKw, DisplacementCcm, FuelType
  • MatchKey
  • SourceQuery, SourceListUrl, SourceDetailUrl
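To inspect one of these fields quickly, a column can be pulled out by its header name. This is only a rough sketch: `csv_col` is a hypothetical helper, not part of the scraper, and it assumes the CSV starts with a header row using the field names above and contains no quoted commas inside fields.

```shell
# Hypothetical helper: print one CSV column selected by header name.
# Assumes a header row and no quoted commas inside fields (a simplification).
csv_col() {
  awk -F',' -v name="$1" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == name) col = i; next }
    col     { print $col }
  ' "$2"
}

# Example (assuming hsntsn.csv exists): list distinct brands
#   csv_col Brand hsntsn.csv | sort -u
```

If the name is not found in the header, the helper prints nothing. For files with quoted fields, a real CSV parser would be the safer choice.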

Usage

Scrape all brand pages:

dotnet run --project src/HsnTsnScraper/HsnTsnScraper.csproj > hsntsn.csv

Scrape directly from the autoampel.de Typklassen pages, skipping the hsn-tsn.de redirect chain:

dotnet run --project src/HsnTsnScraper/HsnTsnScraper.csproj -- --source autoampel > hsntsn.csv

Scrape only specific queries from stdin:

printf "0588\nGolf\n" | dotnet run --project src/HsnTsnScraper/HsnTsnScraper.csproj > hsntsn.csv
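For longer query lists, it can help to keep the queries in a file, one per line as in the printf example, and tidy them before piping. The sketch below is an assumption about a convenient workflow: `clean_queries` is a hypothetical helper, and whether the scraper itself skips blank or repeated queries is not documented here.

```shell
# Hypothetical helper: drop blank lines and repeated queries from stdin,
# keeping the first occurrence of each query in its original order.
clean_queries() {
  awk 'NF && !seen[$0]++'
}

# Example (assuming queries.txt holds one query per line):
#   clean_queries < queries.txt \
#     | dotnet run --project src/HsnTsnScraper/HsnTsnScraper.csproj > hsntsn.csv
```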

Enable detail-page enrichment:

printf "0588\n" | dotnet run --project src/HsnTsnScraper/HsnTsnScraper.csproj -- --include-details

Repair only missing year fields from an existing CSV:

dotnet run --project src/HsnTsnScraper/HsnTsnScraper.csproj -- --repair-years --input-csv hsntsn.csv --output-csv hsntsn.repaired.csv
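To judge whether a repair run is worthwhile, or to compare the file before and after, the rows with missing years can be counted. This is a sketch under assumptions: `count_missing_years` is a hypothetical helper, "missing" is taken to mean an empty YearFrom or YearTo field, and the CSV is assumed to have a header row and no quoted commas.

```shell
# Hypothetical helper: count data rows whose YearFrom or YearTo is empty.
# Columns are located by header name, so their position does not matter.
count_missing_years() {
  awk -F',' '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i == "YearFrom") from = i
        if ($i == "YearTo")   to = i
      }
      next
    }
    from && to && ($from == "" || $to == "") { n++ }
    END { print n + 0 }
  ' "$1"
}

# Example (assuming both files exist):
#   count_missing_years hsntsn.csv
#   count_missing_years hsntsn.repaired.csv
```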

Merge core fields keyed by HsnTsn and write them to PostgreSQL. When both sources provide data, hsn-tsn.de takes priority over autoampel.de:

dotnet run --project src/HsnTsnScraper/HsnTsnScraper.csproj -- --merge-core-db --pg-connection "Host=localhost;Port=5432;Database=hsntsn;Username=hsntsn;Password=hsntsn" --pg-table public.hsntsn_vehicle
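The priority rule can be illustrated on plain CSV files. This is not the scraper's implementation, which writes to PostgreSQL. It is a simplified sketch that assumes row-level priority (the first file to mention a key wins), the key in the first column, and identical headers in every file. `merge_by_key` is a hypothetical name.

```shell
# Hypothetical helper: merge CSV files by their first column.
# Files are given in priority order; for each key the first row seen is kept.
merge_by_key() {
  awk -F',' '
    FNR == 1 && NR != 1 { next }   # skip the header of every file but the first
    !seen[$1]++                    # print a row only the first time its key appears
  ' "$@"
}

# Example (file names are placeholders):
#   merge_by_key from_hsn_tsn.csv from_autoampel.csv > merged.csv
```

A per-field merge, where gaps in the higher-priority row are filled from the lower-priority one, would need more logic than this row-level version.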

You can also pass the connection string via the HSNTSN_PG environment variable:

export HSNTSN_PG="Host=localhost;Port=5432;Database=hsntsn;Username=hsntsn;Password=hsntsn"
dotnet run --project src/HsnTsnScraper/HsnTsnScraper.csproj -- --merge-core-db
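In wrapper scripts, it may be worth checking that the variable is set before starting a merge. A minimal sketch, assuming a POSIX shell; `require_pg_conn` is a hypothetical helper, and how the scraper itself reacts to a missing connection string is not covered here.

```shell
# Hypothetical guard: fail early when HSNTSN_PG is unset or empty.
require_pg_conn() {
  if [ -z "${HSNTSN_PG:-}" ]; then
    echo "HSNTSN_PG is not set" >&2
    return 1
  fi
}

# Example:
#   require_pg_conn && dotnet run --project src/HsnTsnScraper/HsnTsnScraper.csproj -- --merge-core-db
```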

Optional: if you already have a CSV, you can still seed the merge from it by adding --input-csv hsntsn.csv.