# hsntsn-scraper
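A .NET console scraper for German HSN/TSN vehicle key numbers (manufacturer and type codes). Results are written as CSV to standard output.

Source: `http://www.hsn-tsn.de/`

CSV output fields: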
- `HsnTsn`, `Hsn`, `Tsn`
- `Brand`, `VehicleType`, `Model`, `OfficialType`
- `YearFrom`, `YearTo`
- `PowerPs`, `PowerKw`, `DisplacementCcm`, `FuelType`
- `MatchKey`
- `SourceQuery`, `SourceListUrl`, `SourceDetailUrl`
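
For quick checks of the output, the fields above can be pulled out by name instead of by position. The following is a minimal sketch, not part of the scraper: the helper names `column_index` and `column_values` are hypothetical, and it assumes the CSV starts with a header row using the field names listed above, is comma-separated, and contains no quoted commas.

```bash
# Sketch only: assumes a header row with the field names listed above,
# comma-separated values, and no quoted commas inside fields.

# Print the 1-based position of a named column, reading the header from stdin.
column_index() {
  awk -F, -v name="$1" 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == name) print i; exit }'
}

# Print the values of one named column for every data row of a CSV file.
column_values() {
  local file="$1" name="$2" idx
  idx="$(column_index "$name" < "$file")"
  [ -n "$idx" ] || { echo "no such column: $name" >&2; return 1; }
  awk -F, -v idx="$idx" 'NR > 1 { print $idx }' "$file"
}
```

For example, `column_values hsntsn.csv Brand | sort -u` would list the distinct brands in a scraped file, provided the assumptions above hold.
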
## Usage
Scrape all brand pages:
```bash
dotnet run --project src/HsnTsnScraper/HsnTsnScraper.csproj > hsntsn.csv
```
Scrape directly from the Autoampel Typklassen pages, skipping the hsn-tsn.de redirect chain:
```bash
dotnet run --project src/HsnTsnScraper/HsnTsnScraper.csproj -- --source autoampel > hsntsn.csv
```
Scrape only specific queries, read one per line from `stdin`:
```bash
printf "0588\nGolf\n" | dotnet run --project src/HsnTsnScraper/HsnTsnScraper.csproj > hsntsn.csv
```
Enable detail-page enrichment:
```bash
printf "0588\n" | dotnet run --project src/HsnTsnScraper/HsnTsnScraper.csproj -- --include-details
```
Repair only the missing year fields (`YearFrom`, `YearTo`) in an existing CSV:
```bash
dotnet run --project src/HsnTsnScraper/HsnTsnScraper.csproj -- --repair-years --input-csv hsntsn.csv --output-csv hsntsn.repaired.csv
```
Merge core fields keyed by `HsnTsn` and write them to PostgreSQL. Source priority is `hsn-tsn.de` first, then `autoampel.de`:
```bash
dotnet run --project src/HsnTsnScraper/HsnTsnScraper.csproj -- --merge-core-db --pg-connection "Host=localhost;Port=5432;Database=hsntsn;Username=hsntsn;Password=hsntsn" --pg-table public.hsntsn_vehicle
```
You can also pass the connection string via the `HSNTSN_PG` environment variable:
```bash
export HSNTSN_PG="Host=localhost;Port=5432;Database=hsntsn;Username=hsntsn;Password=hsntsn"
dotnet run --project src/HsnTsnScraper/HsnTsnScraper.csproj -- --merge-core-db
```
Optional: if you already have a CSV, you can still seed from it with `--input-csv hsntsn.csv`.
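
The source priority used by `--merge-core-db` can be pictured as a "first source wins" merge by key. The sketch below illustrates that idea only and is not the scraper's actual merge logic, which may work field by field. The function name `merge_by_priority` is hypothetical, and it assumes simple comma-separated files with `HsnTsn` in the first column and no quoted commas.

```bash
# Sketch only: illustrates "first source wins" merging by key.
# Assumes comma-separated files with the HsnTsn key in column 1.
merge_by_priority() {
  # Files are passed in priority order; awk keeps the first row seen per key
  # and drops later rows that repeat a key already printed.
  awk -F, '!seen[$1]++' "$@"
}
```

Called as `merge_by_priority from-hsn-tsn.csv from-autoampel.csv` (placeholder file names), a row from the second file is kept only when its key does not appear in the first.
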