ogr2ogr: translate of DateTime column to shapefile gives an error #11671

Closed
theroggy opened this issue Jan 16, 2025 · 3 comments · Fixed by #11675
@theroggy

theroggy commented Jan 16, 2025

What is the bug?

On the master branch, when a GPKG with a datetime column is converted to a shapefile, an error is thrown:

RuntimeError: Terminating translation prematurely after failed
translation of layer src_lyr (use -skipfailures to skip errors)
May be caused by: WriteArrowBatch() failed
May be caused by: For field field_date, OGR field type is Date whereas Arrow type implies String

When OGR2OGR_USE_ARROW_API is set to NO, the error doesn't occur.
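
As a possible workaround, the option can be scoped to a single translation rather than left set for the rest of the process. This is a minimal sketch, assuming the option is passed through the environment as in the reproduction script in this issue. `temporary_env` is a hypothetical helper, not part of GDAL, and the actual `gdal.VectorTranslate` call is only indicated in a comment:

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(name, value):
    """Set an environment variable for the duration of a with-block (hypothetical helper)."""
    previous = os.environ.get(name)
    os.environ[name] = value
    try:
        yield
    finally:
        # Restore the previous state, including "not set at all".
        if previous is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = previous


with temporary_env("OGR2OGR_USE_ARROW_API", "NO"):
    # The translation would go here, e.g.
    # gdal.VectorTranslate(destNameOrDestDS=output_path, srcDS=input_path)
    inside = os.environ["OGR2OGR_USE_ARROW_API"]
```

After the with-block the variable is back to whatever it was before, so later translations are not affected.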

Remark: for GDAL versions < 3.11 there is also a difference in the result that is written, depending on whether arrow is used. I don't have a strong opinion about this being good or bad (I think the arrow behaviour is probably even an improvement):

  • If arrow is used, the datetime column is saved to the shapefile as a String type column
  • If arrow is not used, the column is written as Date type column, with the disadvantage that all time information is lost
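
The trade-off between the two outputs can be illustrated without GDAL. This is a rough sketch in plain Python, using the timestamp from the reproduction script. The ISO 8601 text shown for the String case is an assumption for illustration and not necessarily the exact format GDAL writes:

```python
from datetime import datetime

# Timestamp used in the reproduction script of this issue.
raw = "2020-05-01T01:02:03.456Z"

# Replace the trailing "Z" so that fromisoformat() also accepts it on Python < 3.11.
value = datetime.fromisoformat(raw.replace("Z", "+00:00"))

# Date column: only the calendar date survives.
as_date = value.date().isoformat()

# String column: the full timestamp is kept, but only as text.
as_string = value.isoformat(timespec="milliseconds")

print(as_date)    # 2020-05-01
print(as_string)  # 2020-05-01T01:02:03.456+00:00
```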

Steps to reproduce the issue

The script below can be run to show two things:

  • if the script is run with GDAL master, the WITH arrow part will give an error
  • if the script is run with GDAL < master,
    • the WITH arrow part will result in a shapefile with
      • a String column for the input DateTime column
      • a Date column for the input Date column
    • the WITHOUT arrow part will result in a shapefile with a Date column for both the input DateTime and Date columns

import os
import tempfile
from pathlib import Path
from osgeo import gdal, ogr

ogr.UseExceptions()

# Create input test file with a datetime field with a date in it
tmp_dir = Path(tempfile.gettempdir())
input_path = tmp_dir / "test.gpkg"
input_path.unlink(missing_ok=True)
src_ds = ogr.GetDriverByName("GPKG").CreateDataSource(input_path)
src_lyr = src_ds.CreateLayer("src_lyr")

field_def = ogr.FieldDefn("field_int", ogr.OFTInteger)
src_lyr.CreateField(field_def)

field_def = ogr.FieldDefn("field_dt", ogr.OFTDateTime)
src_lyr.CreateField(field_def)

field_def = ogr.FieldDefn("field_date", ogr.OFTDate)
src_lyr.CreateField(field_def)

feat_def = src_lyr.GetLayerDefn()
src_feature = ogr.Feature(feat_def)
src_feature.SetField("field_int", 1)

#src_feature.SetField("field_dt", "2020-05-01")
#src_feature.SetField("field_dt", "2020-05-01T00:00:00.000Z")
src_feature.SetField("field_dt", "2020-05-01T01:02:03.456Z")
src_feature.SetField("field_date", "2020-05-01")

src_feature.SetGeometry(ogr.CreateGeometryFromWkt("POINT (1 2)"))
src_feature.SetFID(1)

src_lyr.CreateFeature(src_feature)

src_ds = None

# Translate the file to a new shapefile, without arrow API: datetime becomes date
print("=== output file info, WITHOUT arrow ===")
output_path = tmp_dir / "test_noarrow.shp"
os.environ["OGR2OGR_USE_ARROW_API"] = "NO"
output_ds = gdal.VectorTranslate(destNameOrDestDS=output_path, srcDS=input_path)
output_ds = None

output_ds = gdal.OpenEx(output_path, nOpenFlags=gdal.OF_VECTOR)
output_layer = output_ds.GetLayer()
layer_defn = output_layer.GetLayerDefn()
print(f'{layer_defn.GetFieldDefn(1).GetName()} type: {layer_defn.GetFieldDefn(1).GetTypeName()}')
print(f'{layer_defn.GetFieldDefn(2).GetName()} type: {layer_defn.GetFieldDefn(2).GetTypeName()}')

# Translate the file to a new shapefile, with arrow API: datetime becomes string
print("=== output file info, WITH arrow ===")
output_path = tmp_dir / "test_arrow.shp"
os.environ["OGR2OGR_USE_ARROW_API"] = "YES"
output_ds = gdal.VectorTranslate(destNameOrDestDS=output_path, srcDS=input_path)
output_ds = None

output_ds = gdal.OpenEx(output_path, nOpenFlags=gdal.OF_VECTOR)
output_layer = output_ds.GetLayer()
layer_defn = output_layer.GetLayerDefn()
print(f'{layer_defn.GetFieldDefn(1).GetName()} type: {layer_defn.GetFieldDefn(1).GetTypeName()}')
print(f'{layer_defn.GetFieldDefn(2).GetName()} type: {layer_defn.GetFieldDefn(2).GetTypeName()}')

Versions and provenance

Tested on Windows 11, with GDAL installed using conda.

  • To test GDAL < master, GDAL 3.9.3 installed from conda-forge was used
  • To test GDAL master, GDAL ~3.10.0 installed from gdal-master was used

Additional context

No response

@jratike80

Is it correct by the GeoPackage standard to write date without time "2014-12-04" into a DATETIME field? Shouldn't it be in a DATE field?

@theroggy

theroggy commented Jan 16, 2025

> Is it correct by the GeoPackage standard to write date without time "2014-12-04" into a DATETIME field? Shouldn't it be in a DATE field?

True... probably not... even though it might be useful if it kept working anyway. I replaced the value with "2020-05-01T00:00:00.000Z" in the script above, which gives the same error.

EDIT: for further testing purposes, I also replaced the value with "2020-05-01T01:02:03.456Z". When arrow is disabled, that DateTime is saved as Date in the shapefile as well, so even useful time information (rather than all 0's) seems to be dropped in the "old" code path...

@jratike80
Copy link
Collaborator

I think that DATETIME should always be written as text into a shapefile, because the native data type is not supported. And DATE should go through as DATE because it is supported.
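
The rule proposed here can be written down as a small lookup. This is a sketch only: `shapefile_field_type` is a hypothetical function and the type names are plain strings, so this is not how GDAL implements the mapping:

```python
def shapefile_field_type(source_type):
    """Return the field type the proposal above would use in the shapefile."""
    if source_type == "DateTime":
        # No native datetime type in a shapefile: keep the full value as text.
        return "String"
    # Date, and every other type in this simplified sketch, passes through unchanged.
    return source_type


for name, source_type in [("field_dt", "DateTime"), ("field_date", "Date")]:
    print(f"{name}: {source_type} -> {shapefile_field_type(source_type)}")
```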

rouault self-assigned this Jan 16, 2025
rouault added a commit to rouault/gdal that referenced this issue Jan 16, 2025
theroggy added a commit to geofileops/geofileops that referenced this issue Jan 18, 2025
Add CI config to test on the gdal "nightly" master version.

GDAL will be installed from the gdal-master conda channel:
https://anaconda.org/gdal-master/libgdal-core/files

Activating gdal nightly CI triggered this:
- [x] OSGeo/gdal#11671