hsntsn-scraper
.NET console scraper.
Source: http://www.hsn-tsn.de/
CSV output fields:
HsnTsn,Hsn,Tsn,Brand,VehicleType,Model,OfficialType,YearFrom,YearTo,PowerPs,PowerKw,DisplacementCcm,FuelType,MatchKey,SourceQuery,SourceListUrl,SourceDetailUrl
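To check which columns a generated file actually contains, a small helper along these lines may be useful. It is a sketch that assumes the CSV starts with a header row and that no header name contains a comma; `csv_columns` is a hypothetical helper name, not part of the scraper.

```shell
# Print each column name of a CSV together with its 1-based position.
# Assumption: the first line is a header row with plain comma-separated names.
csv_columns() {
  head -n 1 "$1" | tr -d '\r' |
    awk -F',' '{ for (i = 1; i <= NF; i++) printf "%d %s\n", i, $i }'
}

# Only inspect the output file if a previous run has produced it.
if [ -f hsntsn.csv ]; then
  csv_columns hsntsn.csv
fi
```

The positions are handy when feeding the file to tools such as `cut` or `awk` that address columns by number.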
Usage
Scrape all brand pages:
dotnet run --project src/HsnTsnScraper/HsnTsnScraper.csproj > hsntsn.csv
Scrape directly from Autoampel typklassen pages (no hsn-tsn redirect chain):
dotnet run --project src/HsnTsnScraper/HsnTsnScraper.csproj -- --source autoampel > hsntsn.csv
Scrape only specific queries from stdin:
printf "0588\nGolf\n" | dotnet run --project src/HsnTsnScraper/HsnTsnScraper.csproj > hsntsn.csv
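For longer query lists it can be easier to keep the queries in a file and clean them before piping. The sketch below assumes the scraper treats every stdin line as one query; `queries.txt` and `queries.clean.txt` are hypothetical file names.

```shell
# One query per line, using the same two example queries as above.
# The blank line and the repeated entry are there to show the cleanup step.
cat > queries.txt <<'EOF'
0588
Golf

Golf
EOF

# Drop blank lines and repeated entries while keeping the original order.
awk 'NF && !seen[$0]++' queries.txt > queries.clean.txt

# Then feed the cleaned list to the scraper:
# dotnet run --project src/HsnTsnScraper/HsnTsnScraper.csproj < queries.clean.txt > hsntsn.csv
```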
Enable detail-page enrichment:
printf "0588\n" | dotnet run --project src/HsnTsnScraper/HsnTsnScraper.csproj -- --include-details
Repair only missing year fields from an existing CSV:
dotnet run --project src/HsnTsnScraper/HsnTsnScraper.csproj -- --repair-years --input-csv hsntsn.csv --output-csv hsntsn.repaired.csv
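One way to see what the repair changed is to count rows with an empty year column before and after. This is a rough sketch: it assumes a header row, values without quoted commas, and year columns named `YearFrom` and `YearTo`; `count_empty` is a hypothetical helper.

```shell
# count_empty FILE COLUMN: print the number of data rows whose COLUMN is empty.
# Assumptions: header row present, no quoted fields containing commas.
count_empty() {
  awk -F',' -v col="$2" '
    { sub(/\r$/, "") }                      # tolerate CRLF line endings
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      if (!idx) { print "column not found: " col > "/dev/stderr"; missing = 1; exit 2 }
      next
    }
    $idx == "" { n++ }
    END { if (!missing) print n + 0 }
  ' "$1"
}

# Compare both files if they exist (column name is an assumption, see above).
if [ -f hsntsn.csv ] && [ -f hsntsn.repaired.csv ]; then
  echo "empty YearFrom before: $(count_empty hsntsn.csv YearFrom)"
  echo "empty YearFrom after:  $(count_empty hsntsn.repaired.csv YearFrom)"
fi
```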
Merge core fields by HsnTsn and write to PostgreSQL (priority: hsn-tsn.de then autoampel.de):
dotnet run --project src/HsnTsnScraper/HsnTsnScraper.csproj -- --merge-core-db --pg-connection "Host=localhost;Port=5432;Database=hsntsn;Username=hsntsn;Password=hsntsn" --pg-table public.hsntsn_vehicle
You can also pass the connection via environment variable:
export HSNTSN_PG="Host=localhost;Port=5432;Database=hsntsn;Username=hsntsn;Password=hsntsn"
dotnet run --project src/HsnTsnScraper/HsnTsnScraper.csproj -- --merge-core-db
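When the merge runs from a script, a small guard can stop early if the variable is missing. How the scraper itself reacts to a missing connection string is not described here, so this is only a defensive sketch; `require_pg` is a hypothetical helper name.

```shell
# Fail early with a readable message if HSNTSN_PG is unset or empty.
require_pg() {
  if [ -z "${HSNTSN_PG:-}" ]; then
    echo "HSNTSN_PG is not set; export it or pass --pg-connection" >&2
    return 1
  fi
}

# Example: only start the merge when the variable is present.
# require_pg && dotnet run --project src/HsnTsnScraper/HsnTsnScraper.csproj -- --merge-core-db
```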
Optional: if you already have a CSV, you can still seed from it with --input-csv hsntsn.csv.