ConversionService - converting a GIS file to another format (including its own) using the Aspose.com API

Dani_S 4,771 Reputation points
2025-11-21T06:39:29.8466667+00:00

Hi Bruce,

1. I use this conversion orchestrator to convert from one GIS format to another, including to the same format.

2. There are two cases for gisInputFilePath: an archive or a single file.

public static ConversionResult Run(string gisInputFilePath, string outFolderPath, string tempFolderPath, IConverterFactory factory = null)

If it is an archive, we inspect its contents without extracting it and determine which input converter it belongs to; otherwise we detect which format the single file belongs to.

Do you see any problem with the Run method or the other methods? This is critical for me, because it influences all of the logic.

3. The ConversionService class is responsible for:

  • Validate paths (input exists, output/temp can be created).
  • Inspect input: single file or archive.
  • Detect the best matching converter based on input extension or archive contents (required file extensions).
  • Resolve converter using ConverterFactory (TryCreate) and invoke its Convert method.
  • Log each step and return a friendly ConversionResult on expected failures (validation, unknown option, missing required files) rather than throwing.

4. The code and test files:

code.txt

Tests.txt

Thanks in advance,

Developer technologies | C#

5 answers

  1. Gade Harika (INFOSYS LIMITED) 1,870 Reputation points Microsoft External Staff
    2025-11-26T04:45:54.32+00:00

    Thanks for reaching out.

    ConversionService.Run sometimes chooses the wrong converter for archives with overlapping contents (e.g., .kmz files with .kml inside), fails to detect File Geodatabase in ZIPs, and rejects .geojsonseq inputs. This breaks orchestration and affects all further logic.

    Symptoms

    • .kmz inputs are dispatched to Kml instead of Kmz.
    • ZIPs containing *.gdb folder (typical File Geodatabase) are not detected.
    • Single file .geojsonseq returns “Unknown input file type”.
    • Archives containing .json are ambiguous (GeoJSON vs EsriJSON vs TopoJSON vs GeoJSONSeq).

    Root Causes

    1. Non-deterministic archive detection with overlapping requirements (e.g., both Kml and Kmz match .kml).
    2. FGDB detection relies on Path.GetExtension(entry) and misses folder-based .gdb layout inside ZIPs.
    3. JSON detection is gated by ext.EndsWith("json"), which misses .geojsonseq.
    4. Archive JSON detection relies on extensions, not content sniffing.
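
    Root cause 3 can be illustrated with a small sketch. It assumes a hypothetical JsonFamily helper (not part of the posted code) that replaces the ext.EndsWith("json") gate with an explicit extension set, so that .geojsonseq is routed to content detection as well. The exact extensions in the set are an assumption; adjust them to the converters you register.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class JsonFamily
{
    // Hypothetical extension set (an assumption, not taken from the posted code).
    private static readonly HashSet<string> Extensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            ".json", ".geojson", ".esrijson", ".topojson", ".geojsonseq"
        };

    // True when the path's extension belongs to the JSON family and
    // should therefore go through content sniffing.
    public static bool IsJsonFamily(string path)
    {
        string ext = Path.GetExtension(path);
        return !string.IsNullOrEmpty(ext) && Extensions.Contains(ext);
    }
}
```

    With this in place, ".geojsonseq".EndsWith("json") is still false, but IsJsonFamily("lines.geojsonseq") returns true, so the file reaches the JSON detector instead of the "Unknown input file type" branch.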

    Exact Fix (Drop-in Code)

    What this adds:

    • Deterministic archive resolution with outer extension bias (.kmz → Kmz).
    • FGDB detection via folder segments (*.gdb/…) and marker extensions (e.g., .gdbtable).
    • A JSON family extension set and content sniffing (file + archive stream).
    • .geojsonseq support.
    • Better messages for missing extensions, and optional directory handling for .gdb folders.

    1) Update ConversionService (archive detection & single-file JSON)

    notes 01.txt

    2) Add helper to open archive entry as a stream

    This lets you sniff JSON content inside ZIPs without extracting.

    notes 02.txt

    If you prefer full extraction to tempFolderPath instead of on-demand streams, you can skip this helper and extract the JSON entry then sniff the file. Streaming is faster and cleaner for large archives.
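
    For readers without the attachment, here is a minimal sketch of the streaming approach, assuming a hypothetical ArchiveSniffer helper built on System.IO.Compression.ZipArchive. The helper name, the 64 KB cap, and the UTF-8 assumption are illustrative choices, not the contents of notes 02.txt.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class ArchiveSniffer
{
    // Assumed cap on how much of an entry is read for sniffing.
    public const int MaxHeaderBytes = 64 * 1024;

    // Reads at most MaxHeaderBytes from one ZIP entry and decodes it as UTF-8.
    // Returns null when the entry does not exist. Nothing is extracted to disk.
    // Note: a multi-byte character cut at the cap decodes to U+FFFD, and a
    // UTF-8 BOM, if present, is kept as the first character.
    public static string ReadEntryHeader(Stream zipStream, string entryName)
    {
        using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = archive.GetEntry(entryName);
            if (entry == null) return null;

            using (Stream entryStream = entry.Open())
            {
                var buffer = new byte[MaxHeaderBytes];
                int total = 0;
                int read;
                // Stream.Read may return fewer bytes than requested, so loop.
                while (total < buffer.Length &&
                       (read = entryStream.Read(buffer, total, buffer.Length - total)) > 0)
                {
                    total += read;
                }
                return Encoding.UTF8.GetString(buffer, 0, total);
            }
        }
    }
}
```

    The returned text can then be handed to whatever JSON detector you use; the caller keeps ownership of zipStream because leaveOpen is true.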

    Tests (prove the fix)

    Add these to your tests project (using the same FakeFactory/FakeConverter pattern you have).

    [Fact(DisplayName = RunDetectsGeoJs.txt

    Expected Outcomes (after fix)

    • .kmz archives are always dispatched to Kmz, even if they contain only .kml.
    • ZIPs with *.gdb/… folder or FGDB markers are correctly dispatched to Gdb.
    • .geojsonseq single files are recognized and dispatched to GeoJsonSeq.
    • Archives containing JSON are sniffed to the precise JSON type and dispatched accordingly.
    • Unknown/no-extension inputs produce clear, friendly error messages.

    If a failure persists inside a converter (e.g., Aspose parsing errors), the orchestration will correctly report a failure; those are converter/library issues, not detection/dispatch issues.

    Notes / Best Practices

    • Keep ConverterFactory.TryCreate(key, out conv) implemented for all keys you dispatch: GeoJson, EsriJson, TopoJson, GeoJsonSeq, Kmz, Gdb, Shapefile, etc.
    • Consider adding a CancellationToken to converter calls for large inputs.
    • JsonFormatDetector should support file and stream sniffing for GeoJSON/EsriJSON/TopoJSON/GeoJSONSeq.

    If the issue has been resolved, Kindly mark the provided solution as "Accept Answer", so that others in the community facing similar issues can easily find the solution. Your contribution is highly appreciated.


  2. Gade Harika (INFOSYS LIMITED) 1,870 Reputation points Microsoft External Staff
    2025-11-27T13:01:32.69+00:00

    Thanks for reaching out.

    We’ve updated the ConversionService to address several detection issues:

    1. KMZ inputs dispatched to KML instead of KMZ – Fixed. KMZ files are now correctly detected as KMZ when the outer file extension is .kmz or when the archive contains a single top-level doc.kml file. This avoids false positives for generic ZIPs that happen to include .kml files.

    2. ZIPs containing .gdb folder (Esri File Geodatabase) not detected – Fixed. ZIP archives are now scanned for .gdb directory markers. If any folder in the archive ends with .gdb, it is correctly classified as an Esri File Geodatabase.

    3. Single file .geojsonseq returns “Unknown input file type” – Fixed. Single-file GeoJSON Sequence (NDJSON) is now detected by checking for at least two JSON-looking lines (lines starting with { or [). This prevents misclassifying single-object GeoJSON files as GeoJSONSeq.

    4. Archives containing .json are ambiguous (GeoJSON vs EsriJSON vs TopoJSON vs GeoJSONSeq) – Solution: For archives with one or more .json files, each JSON entry is now classified by reading a small header (up to 64 KB) and checking for content fingerprints:

    • TopoJSON: Root "type": "Topology" and "objects" present.
    • EsriJSON: "geometryType", "spatialReference", or features with "attributes".
    • GeoJSONSeq: Multiple JSON objects as NDJSON (≥2 JSON-looking lines).
    • GeoJSON: "FeatureCollection", "Feature", or geometry objects with "coordinates".

    The converter is selected based on the most frequent match among the JSON entries. If the classification is inconclusive, the service returns a friendly message indicating ambiguity.
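
    The fingerprints listed above can be sketched as a plain substring check on the header text. This is a heuristic illustration under stated assumptions: JsonFingerprint and its return keys are hypothetical names, the header may be truncated (so it is not parsed as JSON), and the GeoJSONSeq line-count rule is deliberately left out because it works on lines rather than on keys.

```csharp
using System;
using System.Linq;

public static class JsonFingerprint
{
    // Returns "TopoJson", "EsriJson", "GeoJson", or null when nothing matches.
    // The keys are hypothetical; use whatever keys your factory expects.
    public static string Classify(string header)
    {
        if (string.IsNullOrWhiteSpace(header)) return null;

        // Drop all whitespace so "type": "Topology" and "type":"Topology" both match.
        string h = string.Concat(header.Where(c => !char.IsWhiteSpace(c)));

        // Order matters: the most specific fingerprint is checked first.
        if (h.Contains("\"type\":\"Topology\"") && h.Contains("\"objects\""))
            return "TopoJson";

        if (h.Contains("\"geometryType\"") ||
            h.Contains("\"spatialReference\"") ||
            h.Contains("\"attributes\""))
            return "EsriJson";

        if (h.Contains("\"FeatureCollection\"") ||
            h.Contains("\"Feature\"") ||
            h.Contains("\"coordinates\""))
            return "GeoJson";

        return null;
    }
}
```

    Because this matches substrings anywhere in the header, a GeoJSON file whose properties happen to contain a key such as "attributes" would be labelled EsriJson; treat the result as a hint and fall back to the friendly ambiguity message when in doubt.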

    Summary: These improvements ensure robust detection and conversion for KMZ, File Geodatabase, GeoJSON Sequence, and ambiguous JSON files in archives. The service now uses both file extensions and content fingerprints for reliable format detection.



  3. Gade Harika (INFOSYS LIMITED) 1,870 Reputation points Microsoft External Staff
    2025-11-28T11:15:20.0933333+00:00

    Thanks for sharing the update. I reviewed the new ConversionService implementation you posted in 1_new (1).txt and validated it against the four symptoms you listed. Here’s what I found:

    KMZ inputs were dispatched to KML: Partially fixed. Your archive detection now special‑cases .kml inside an archive and selects the Kmz converter (see the DetectConverterFromArchiveEntries final branch that returns "Kmz" when .kml is present). This correctly handles true KMZ files and single‑file .kmz inputs via extension mapping. However, any generic ZIP that merely contains a .kml (not a real KMZ) will also be classified as KMZ, which can still produce false positives. A small guard that checks the outer extension .kmz or single top‑level doc.kml would remove this ambiguity. [1_new (1) | Txt]

    ZIPs containing *.gdb folder (Esri FileGDB) not detected: Fixed. You now scan path segments within archive entries and add a .gdb marker when a folder ends with .gdb. That causes the archive to satisfy the "Gdb" requirement and correctly pick the FileGDB converter. This addresses the previous miss. [1_new (1) | Txt]
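
    For reference, the path-segment scan described here can be sketched as follows. GdbDetector is a hypothetical name and this is not the code from 1_new (1).txt; it only illustrates checking folder segments rather than Path.GetExtension on the whole entry.

```csharp
using System;
using System.Collections.Generic;

public static class GdbDetector
{
    // True when any entry path has a folder segment ending in ".gdb",
    // e.g. "mydata.gdb/a00000001.gdbtable" or the directory entry "mydata.gdb/".
    public static bool ContainsGdbFolder(IEnumerable<string> entryNames)
    {
        foreach (string name in entryNames)
        {
            string[] segments = name.Split(
                new[] { '/', '\\' }, StringSplitOptions.RemoveEmptyEntries);

            // A directory entry ends with a separator, so every segment is a folder.
            // For a file entry, the last segment is the file name and is skipped.
            bool isDirectoryEntry =
                name.EndsWith("/", StringComparison.Ordinal) ||
                name.EndsWith("\\", StringComparison.Ordinal);
            int folderCount = isDirectoryEntry ? segments.Length : segments.Length - 1;

            for (int i = 0; i < folderCount; i++)
            {
                if (segments[i].EndsWith(".gdb", StringComparison.OrdinalIgnoreCase))
                    return true;
            }
        }
        return false;
    }
}
```

    A plain file that merely happens to be named notes.gdb is not treated as a geodatabase here, since only folder segments are inspected.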

    Single‑file .geojsonseq returns “Unknown input file type”: Not fully fixed. The fallback JSON sniff treats any first non‑empty line starting with { or [ as GeoJSONSeq (NDJSON) unless "FeatureCollection", "Topology", or "spatialReference" is found. This can misclassify single‑object GeoJSON (e.g., a single Feature or Geometry) as GeoJSONSeq. The intended rule (“≥2 JSON‑looking lines”) isn’t implemented; only the first line is checked. You’ll want to count at least two JSON‑looking lines to confirm NDJSON and treat single‑line inputs as regular GeoJSON when "Feature" or coordinates are present. [1_new (1) | Txt]

    Archives containing .json are ambiguous (GeoJSON vs EsriJSON vs TopoJSON vs GeoJSONSeq): Not fixed. The archive classifier relies on extension presence and the _s_archiveRequirements dictionary. Because several formats list only ".json" as the requirement, the first dictionary match (currently EsriJson) will win for any archive containing .json, which is ambiguous. The code does not read headers of JSON entries in the archive to fingerprint content (Topology vs Esri schema vs FeatureCollection vs NDJSON), nor does it tally the most frequent match. Adding a lightweight header read (≤64 KB) and voting per entry will resolve this as designed. [1_new (1) | Txt]

    Bottom line:

    • (2) is resolved.
    • (1) is mostly addressed but should add a guard to avoid ZIP false positives.
    • (3) and (4) still need changes to implement the multi‑line NDJSON test and per‑entry JSON fingerprinting.

    If this aligns with your expectations, please mark the answer as “Accept Answer” so others can find it easily.

    Targeted code fixes you can apply now

    All snippets below are drop‑in style and keep your current design (factory resolution, friendly ConversionResult, and logging).

    1) KMZ disambiguation: prefer outer .kmz or single top‑level doc.kml

    1KMZ.txt

    Call this overload from Run(...) and pass gisInputFilePath as outerPath. This prevents a generic ZIP with a stray .kml from being mistaken for KMZ. [1_new (1) | Txt]
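
    As an illustration of the guard (not the contents of 1KMZ.txt), here is a minimal sketch. KmzGuard is a hypothetical name, and "single top-level doc.kml" is interpreted as: exactly one .kml at the archive root, and it is named doc.kml.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class KmzGuard
{
    // True when the input should be treated as KMZ:
    //  - the outer file extension is .kmz, or
    //  - the archive root holds exactly one .kml and it is named doc.kml.
    public static bool LooksLikeKmz(string outerPath, IEnumerable<string> entryNames)
    {
        if (string.Equals(Path.GetExtension(outerPath), ".kmz",
                          StringComparison.OrdinalIgnoreCase))
        {
            return true;
        }

        // Top-level entries are those without any path separator.
        List<string> topLevelKml = entryNames
            .Where(n => n.IndexOfAny(new[] { '/', '\\' }) < 0 &&
                        n.EndsWith(".kml", StringComparison.OrdinalIgnoreCase))
            .ToList();

        return topLevelKml.Count == 1 &&
               string.Equals(topLevelKml[0], "doc.kml", StringComparison.OrdinalIgnoreCase);
    }
}
```

    With this interpretation, generic.zip containing roads.kml is not treated as KMZ, while any .kmz input is, regardless of what its entries are called.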

    2) Proper NDJSON (GeoJSONSeq) detection: require ≥2 JSON-looking lines
    NDJSON.txt
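
    A minimal sketch of the multi-line rule follows; it is not the contents of NDJSON.txt. NdjsonDetector is a hypothetical name. One assumption goes beyond the rule as stated: a line only counts when it also ends with } or ], so that pretty-printed single-object GeoJSON (where nested lines can start with {) is not mistaken for NDJSON.

```csharp
using System;
using System.IO;

public static class NdjsonDetector
{
    // True when the text contains at least two lines that each look like a
    // complete JSON value: starting with { or [ and ending with } or ].
    // The "ends with" check is an added assumption, stricter than the
    // "starts with { or [" rule.
    public static bool LooksLikeNdjson(string header)
    {
        if (string.IsNullOrEmpty(header)) return false;

        int jsonLines = 0;
        using (var reader = new StringReader(header))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                string t = line.Trim();
                if (t.Length == 0) continue; // skip blank lines

                bool starts = t[0] == '{' || t[0] == '[';
                bool ends = t[t.Length - 1] == '}' || t[t.Length - 1] == ']';
                if (starts && ends && ++jsonLines >= 2) return true;
            }
        }
        return false;
    }
}
```

    A single-line Feature therefore falls through to the regular GeoJSON check, while a file with two or more one-line records is reported as a sequence.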

    3) JSON‑in‑archive ambiguity: classify each .json entry and vote

    JSON.txt

    Integrate this into the archive path before consulting _s_archiveRequirements for formats that only specify .json. If null is returned on a tie, emit your existing friendly failure (“ambiguous JSON in archive—please specify format”). [1_new (1) | Txt]
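
    The voting step can be sketched as below (again, not the contents of JSON.txt). JsonVote is a hypothetical name; it takes the per-entry labels produced by whatever classifier you use, ignores entries that could not be classified, and returns null on a tie or when there are no votes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class JsonVote
{
    // labels: one classification per .json entry; null means "unclassified".
    // Returns the most frequent label, or null on a tie or when no entry
    // could be classified, so the caller can emit the friendly failure.
    public static string PickMajority(IEnumerable<string> labels)
    {
        var counts = labels
            .Where(label => label != null)
            .GroupBy(label => label, StringComparer.Ordinal)
            .Select(g => new { Key = g.Key, Count = g.Count() })
            .OrderByDescending(x => x.Count)
            .ToList();

        if (counts.Count == 0) return null;

        // A tie for first place is treated as ambiguous.
        if (counts.Count > 1 && counts[0].Count == counts[1].Count) return null;

        return counts[0].Key;
    }
}
```

    Returning null rather than picking an arbitrary winner keeps the behaviour deterministic: the same archive always yields either the same converter key or the same ambiguity message.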

    4) Minor robustness in Run(...)

    • Limit header reads: don’t File.ReadAllText(...) the entire file—cap to 64 KB as shown above.
    • Extension routing: your ext.EndsWith("json") branch nicely funnels .topojson, .esrijson, .geojson into content detection; keep it, but now with the multi‑line NDJSON rule.
    • Logging: you already provide friendly messages; consider logging the selected converter key for traceability, and the reason (e.g., “KMZ guard: outer .kmz”). [1_new (1) | Txt]

    Suggested test cases to add

    1. sample.kmz (outer .kmz) with doc.kml → Kmz
    2. generic.zip with roads.kml and images → not Kmz (should be unknown or KML depending on policy)
    3. gdb.zip containing mydata.gdb/... → Gdb
    4. single.geojson with one Feature object (no FeatureCollection) → GeoJson, not GeoJsonSeq
    5. seq.json with 3 NDJSON lines → GeoJsonSeq
    6. Archive containing topo.json and layer.json with Esri schema → vote majority (Topo vs EsriJSON) or ambiguity if tie



  4. Gade Harika (INFOSYS LIMITED) 1,870 Reputation points Microsoft External Staff
    2025-12-01T13:59:38.0133333+00:00

    Thanks for reaching out.
    Your ConversionService design is on the right track, but make sure these critical fixes are implemented:
    Main Fixes

    • KMZ disambiguation: Prefer outer .kmz or top-level doc.kml before treating as generic ZIP.
    • NDJSON detection: Require ≥2 JSON-like lines for GeoJSONSeq.
    • JSON-in-archive ambiguity: Classify each .json entry and vote; if tie → return friendly failure (“ambiguous JSON in archive—please specify format”).
    • Robustness:
      • Limit header reads to 64 KB (avoid full File.ReadAllText).
      • Keep extension routing for .geojson, .topojson, .esrijson with NDJSON rule.
    • Logging: Include converter key and reason (e.g., “KMZ guard: outer .kmz”) for traceability.

    Advice

    • Validate paths and handle unknown formats gracefully using ConversionResult (not exceptions).
    • Ensure tests cover:
      - Single file vs archive detection.
      - KMZ vs ZIP with stray KML.
      - NDJSON multi-line rule.
      - Ambiguous JSON → friendly failure.
    • If issues persist, share failing test names and returned ConversionResult.

  5. Dani_S 4,771 Reputation points
    2025-12-02T11:56:30.1866667+00:00

    Posted here by mistake; I have now put my answer in a comment.

