# hsntsn-scraper

.NET console scraper for HSN/TSN vehicle data. By default, results are written to stdout as CSV.

Source: `http://www.hsn-tsn.de/`

CSV output fields:

- `HsnTsn`, `Hsn`, `Tsn`
- `Brand`, `VehicleType`, `Model`, `OfficialType`
- `YearFrom`, `YearTo`
- `PowerPs`, `PowerKw`, `DisplacementCcm`, `FuelType`
- `MatchKey`
- `SourceQuery`, `SourceListUrl`, `SourceDetailUrl`

## Usage

Scrape all brand pages:

```bash
dotnet run --project src/HsnTsnScraper/HsnTsnScraper.csproj > hsntsn.csv
```
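
To sanity-check a run, the first line of the resulting file can be compared against the CSV output fields listed earlier. This is a minimal sketch, assuming the CSV starts with a header row and uses commas as delimiters (neither is confirmed in this README). `csv_columns` is a hypothetical helper, not part of the scraper; call it as `csv_columns hsntsn.csv`.

```bash
# Hypothetical helper: print the column names of a CSV file, one per line.
# Assumes the first line is a header row and fields are comma-separated.
csv_columns() {
  # Strip a possible carriage return, then split the header on commas.
  head -n 1 "$1" | tr -d '\r' | tr ',' '\n'
}
```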

Scrape directly from the Autoampel Typklassen pages, skipping the hsn-tsn.de redirect chain:

```bash
dotnet run --project src/HsnTsnScraper/HsnTsnScraper.csproj -- --source autoampel > hsntsn.csv
```

Scrape only specific queries, read from `stdin` one per line:

```bash
printf "0588\nGolf\n" | dotnet run --project src/HsnTsnScraper/HsnTsnScraper.csproj > hsntsn.csv
```
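
For longer query lists it may be easier to keep the queries in a file, one per line as in the `printf` example. The sketch below assumes that blank lines and duplicate queries are unwanted. `clean_queries` and the file name `queries.txt` are illustrative and not something the scraper provides.

```bash
# Hypothetical helper: drop blank lines and duplicate queries from stdin,
# keeping the first occurrence of each query in its original order.
clean_queries() {
  grep -v '^[[:space:]]*$' | awk '!seen[$0]++'
}

# Example (queries.txt is an assumed file name):
# clean_queries < queries.txt | dotnet run --project src/HsnTsnScraper/HsnTsnScraper.csproj > hsntsn.csv
```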

Enable detail-page enrichment:

```bash
printf "0588\n" | dotnet run --project src/HsnTsnScraper/HsnTsnScraper.csproj -- --include-details
```

Repair only the missing year fields of an existing CSV:

```bash
dotnet run --project src/HsnTsnScraper/HsnTsnScraper.csproj -- --repair-years --input-csv hsntsn.csv --output-csv hsntsn.repaired.csv
```
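
After a repair run, one quick check is that both files have the same number of lines. This sketch assumes the repair only fills in fields and never adds or drops rows, and that no field contains an embedded newline (both are assumptions, not stated in this README). `same_line_count` is a hypothetical helper.

```bash
# Hypothetical helper: succeed if two files have the same number of lines.
same_line_count() {
  # Arithmetic expansion normalises any padding that wc adds on some systems.
  a=$(( $(wc -l < "$1") ))
  b=$(( $(wc -l < "$2") ))
  [ "$a" -eq "$b" ]
}

# Example:
# same_line_count hsntsn.csv hsntsn.repaired.csv && echo "line count unchanged"
```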

Merge core fields by `HsnTsn` and write them to PostgreSQL. When both sources provide a value, `hsn-tsn.de` takes priority over `autoampel.de`:

```bash
dotnet run --project src/HsnTsnScraper/HsnTsnScraper.csproj -- --merge-core-db --pg-connection "Host=localhost;Port=5432;Database=hsntsn;Username=hsntsn;Password=hsntsn" --pg-table public.hsntsn_vehicle
```

You can also pass the connection string via the `HSNTSN_PG` environment variable:

```bash
export HSNTSN_PG="Host=localhost;Port=5432;Database=hsntsn;Username=hsntsn;Password=hsntsn"
dotnet run --project src/HsnTsnScraper/HsnTsnScraper.csproj -- --merge-core-db
```
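
In scripts it can help to fail early when the variable is missing. The sketch below uses the shell's `${VAR:?message}` expansion. `require_pg_env` is a hypothetical wrapper, and how the scraper itself reacts to a missing connection string is not covered here.

```bash
# Hypothetical guard: succeed only if HSNTSN_PG is set and non-empty.
require_pg_env() {
  # The subshell keeps a failed ${VAR:?} expansion from exiting the caller;
  # the error message goes to stderr and the function returns non-zero.
  ( : "${HSNTSN_PG:?HSNTSN_PG must be set to a PostgreSQL connection string}" )
}

# Example:
# require_pg_env && dotnet run --project src/HsnTsnScraper/HsnTsnScraper.csproj -- --merge-core-db
```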

Optional: if you already have a CSV, you can still seed from it with `--input-csv hsntsn.csv`.