Skip to content

Commit a5f4e47

Browse files
committed
Fix Socrata previews returning null
It looks like `row_to_json(t.*)` makes Multicorn think we're requesting no columns from the FDW (this could be related to our Multicorn changes, but it doesn't happen with the LQFDW): ```sql explain SELECT row_to_json(p.*) FROM sg_tmp_67cecbc993a0738c2f815e80169dafd3.some_table p LIMIT 10 Limit (cost=20.00..20.02 rows=10 width=32) -> Foreign Scan on some_table p (cost=20.00..237.38 rows=94954 width=32) Multicorn: Socrata query to data.cityofnewyork.us Multicorn: Socrata dataset ID: 8wbx-tsch Multicorn: Query: Multicorn: Columns: Multicorn: Order: :id ``` This query returns JSON objects full of NULLs. Rewrite the preview query to first get the first 10 rows as a materialized CTE, then use `row_to_json` on the result, which seems to fix the issue: ```sql explain SELECT t.* FROM sg_tmp_67cecbc993a0738c2f815e80169dafd3.some_table t LIMIT 10 Limit (cost=20.00..33620.00 rows=10 width=3360) -> Foreign Scan on some_table t (cost=20.00..319045440.00 rows=94954 width=3360) Multicorn: Socrata query to data.cityofnewyork.us Multicorn: Socrata dataset ID: 8wbx-tsch Multicorn: Query: Multicorn: Columns: `last_date_updated`,`license_type`,`last_time_updated`,`vehicle_license_number`,`wheelchair_accessible`,`vehicle_year`,`permit_license_number`,`order_date`,`certification_date`,`base_address`,`base_name`,`reason`,`hack_up_date`,`expiration_date`,`vehicle_vin_number`,`base_number`,`base_telephone_number`,`dmv_license_plate_number`,`website`,`active`,`base_type`,`veh`,`name` Multicorn: Order: :id explain with p as materialized (SELECT * FROM sg_tmp_67cecbc993a0738c2f815e80169dafd3.some_table LIMIT 10) select row_to_json(p.*) from p CTE Scan on p (cost=33620.00..33620.22 rows=10 width=32) CTE p -> Limit (cost=20.00..33620.00 rows=10 width=3360) -> Foreign Scan on some_table (cost=20.00..319045440.00 rows=94954 width=3360) Multicorn: Socrata query to data.cityofnewyork.us Multicorn: Socrata dataset ID: 8wbx-tsch Multicorn: Query: Multicorn: Columns: `last_date_updated`,`license_type`,`last_time_updated`,`vehicle_license_number`,`wheelchair_accessible`,`vehicle_year`,`permit_license_number`,`order_date`,`certification_date`,`base_address`,`base_name`,`reason`,`hack_up_date`,`expiration_date`,`vehicle_vin_number`,`base_number`,`base_telephone_number`,`dmv_license_plate_number`,`website`,`active`,`base_type`,`veh`,`name` Multicorn: Order: :id ```
1 parent d1e74d6 commit a5f4e47

2 files changed

Lines changed: 8 additions & 3 deletions

File tree

splitgraph/hooks/data_source/fdw.py

Lines changed: 4 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -235,9 +235,10 @@ def _preview_table(self, schema: str, table: str, limit: int = 10) -> List[Dict[
235235
result_json = cast(
236236
List[Dict[str, Any]],
237237
self.engine.run_sql(
238-
SQL("SELECT row_to_json(t.*) FROM {}.{} t LIMIT %s").format(
239-
Identifier(schema), Identifier(table)
240-
),
238+
SQL(
239+
"WITH p AS MATERIALIZED(SELECT * FROM {}.{} LIMIT %s) "
240+
"SELECT row_to_json(p.*) FROM p"
241+
).format(Identifier(schema), Identifier(table)),
241242
(limit,),
242243
return_shape=ResultShape.MANY_ONE,
243244
),

test/splitgraph/ingestion/test_socrata.py

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -423,3 +423,7 @@ def test_socrata_data_source_introspection_smoke(local_engine_empty):
423423
schema, params = result["some_table"]
424424
assert len(schema) > 1
425425
assert params == {"socrata_id": "8wbx-tsch"}
426+
427+
preview_result = data_source.preview(tables=result)
428+
assert len(preview_result["some_table"]) == 10
429+
assert preview_result["some_table"][0][":id"] is not None

0 commit comments

Comments
 (0)