
CSV trim-option does not trim spaces at the end of strings #57959

Open
2 tasks done
wvdbee opened this issue Jul 2, 2024 · 4 comments
Labels
Bug, Delimited text data provider

Comments

@wvdbee

wvdbee commented Jul 2, 2024

What is the bug or the crash?

The Trim fields option does not trim string fields.

Steps to reproduce the issue

Settings:

  • Attribute-only CSV (no geometry)
  • All fields are enclosed in double quotes.
  • All fields are separated by a semicolon.

Some fields contain lots of trailing spaces:

[image]

Settings in the Add CSV Layer window:

[image]

Unfortunately, the spaces have not been trimmed:

[image]
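To reproduce the problem without downloading the full dataset, a small file with the same characteristics (every field double-quoted, ';' as delimiter, values padded with trailing spaces) can be generated with Python's standard csv module. This is a sketch: the rows are taken from the excerpt posted later in this thread, while the padding width and the helper name make_sample are assumptions.

```python
import csv
import io

# Rows based on the excerpt quoted later in the thread.
# ASSUMPTION: the padding width of 40 characters is illustrative only.
rows = [
    ["ID", "Perioden", "Gemeentenaam_1"],
    ["4683", "2022JJ00", "Amsterdam".ljust(40)],
    ["4687", "2022JJ00", "Amsterdam".ljust(40)],
]


def make_sample(rows):
    """Return DSV text with every field double-quoted and ';' as delimiter."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=";", quotechar='"',
                        quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


sample = make_sample(rows)
# Save `sample` to a .csv file and load it as a delimited text layer
# with Trim fields enabled to check whether the padding is removed.
print(sample)
```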

Demo dataset:
50120NED_TypedDataSet_25062024_172055.zip

Versions

QGIS version 3.36.3-Maidenhead QGIS code revision 2df9655
Qt version 5.15.13
Python version 3.12.3
GDAL/OGR version 3.9.0
PROJ version 9.4.0
EPSG Registry database version v11.004 (2024-02-24)
GEOS version 3.12.1-CAPI-1.18.1
SQLite version 3.45.1
PDAL version 2.6.3
PostgreSQL client version 16.2
SpatiaLite version 5.1.0
QWT version 6.2.0
QScintilla2 version 2.14.1
OS version Windows 11 Version 2009
       
Active Python plugins
AutomaticBackup-master 1.0
BGTImport 3.18
changeDataSource 3.1
create_layer_from_selected_features 1.2
geo_sim_processing 1.2.0
GroupStats 2.2.7
mmqgis 2021.9.10
pcraster_tools 0.3.0
pdokservicesplugin 5.0.1
precisioncursor4qgis-main 1.1.D
processing_saga_nextgen 1.0.0
qgis_resource_sharing 1.0.0
quick_map_services 0.19.34
SelectWithin 0.4
slyr_community 5.0.0
StreetView 3.2
topo_tijdreis 1.0
db_manager 0.1.20
MetaSearch 0.3.6
processing 2.12.99

Supported QGIS version

  • I'm running a supported QGIS version according to the roadmap.

New profile

Additional context

No response

wvdbee added the Bug label Jul 2, 2024
wvdbee changed the title CSV-trim-option does not trim spaces to the end of strings CSV trim-option does not trim spaces at the end of strings Jul 2, 2024
@pigreco
Contributor

pigreco commented Jul 3, 2024

Using the attached dataset, I can confirm the problem in QGIS 3.34.8 and 3.38.0 as well (OSGeo4W, Windows 11).

@aborruso

aborruso commented Jul 3, 2024

Hi @wvdbee, your CSV is not optimal in one respect: it uses double quotes even where they are not needed.

If you have something like this:

[image]

ID;Perioden;Gemeentenaam_1
4683;2022JJ00;Amsterdam                               
4687;2022JJ00;Amsterdam                               
4691;2022JJ00;Amsterdam                               
4695;2022JJ00;Amsterdam                               
4699;2022JJ00;Amsterdam                               
4703;2022JJ00;Amsterdam                               
4707;2022JJ00;Amsterdam                               

the trimming seems to work:

[image]

It should probably work in your case too, but perhaps the double quotes tell QGIS that those are not "normal" spaces but part of the values.
I don't know how QGIS handles this internally, though; I just wanted to point out the quotation-mark issue.

Best regards

@wvdbee
Author

wvdbee commented Jul 3, 2024

Hello @aborruso

Thank you for your insight, but I don't think you're completely right. A couple of remarks:

First, RFC4180 says that:

  • spaces are considered part of a field;
  • fields may or may not be enclosed in double quotes.

This means that aaa,bbb,ccc is not equal to aaa,bbb ,ccc: the second field contains 3 characters in the first example and 4 characters in the second. So double quotes are not required to include spaces in a field; spaces and quotes are not explicitly linked to each other.
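The first point can be checked with Python's csv module. It is not a strict RFC4180 implementation, but with its default settings (skipinitialspace=False) it likewise keeps spaces as part of a field. parse_line is a hypothetical helper used only for this illustration.

```python
import csv


def parse_line(line, delimiter=","):
    """Parse a single delimited line with Python's csv module (hypothetical helper)."""
    return next(csv.reader([line], delimiter=delimiter))


plain = parse_line("aaa,bbb,ccc")
padded = parse_line("aaa,bbb ,ccc")

print(plain, padded)               # ['aaa', 'bbb', 'ccc'] ['aaa', 'bbb ', 'ccc']
print(len(plain[1]), len(padded[1]))  # 3 4
```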

Second: strictly speaking, the file is a DSV, not a CSV (delimiter-separated rather than comma-separated). DSV leaves the formatting open, and fortunately QGIS supports all kinds of DSV, not only RFC4180-compliant CSV. DSV files can contain all sorts of intended and unintended data, and I assume there are no explicit formatting rules, so you cannot make assumptions about the meaning of quotes in DSV files.
But even if you apply RFC4180 formatting rules to a DSV file, the following two examples contain exactly the same content: "aaa";"bbb ";"ccc" and aaa;bbb ;ccc
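The same kind of check with ';' as delimiter shows that the quoted and unquoted lines parse to identical values, i.e. the quotes do not change the content of the field. Again, Python's csv module is used here as a stand-in for an RFC4180-style parser, not as QGIS's own parser; parse_line is a hypothetical helper.

```python
import csv


def parse_line(line, delimiter=";"):
    """Parse a single ';'-delimited line, treating '"' as the quote character."""
    return next(csv.reader([line], delimiter=delimiter, quotechar='"'))


quoted = parse_line('"aaa";"bbb ";"ccc"')
unquoted = parse_line("aaa;bbb ;ccc")

print(quoted == unquoted)  # True
print(quoted)              # ['aaa', 'bbb ', 'ccc']
```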

Third, and this is my main point: I interpret the Trim fields check box as a post-processing option: whenever a field has leading or trailing spaces, trim them. That means it is up to the user to decide whether the spaces are intended. Back to my example: in the RFC4180-compliant CSV aaa,bbb ,ccc the space is also part of the data. Even then, it is up to the user to decide whether the space is intentional or unwanted, and the Trim fields check box comes in quite useful for removing unwanted spaces. I think it should work for both quoted and unquoted fields.
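The interpretation above, trimming as a post-processing step applied after quotes have been removed, can be sketched with Python's csv module. This is not QGIS's delimited text provider code: read_dsv and its trim_fields parameter are hypothetical stand-ins for the Trim fields check box. Note that str.strip() removes all leading and trailing whitespace (including tabs), which may differ from exactly what QGIS trims.

```python
import csv
import io


def read_dsv(text, delimiter=";", trim_fields=False):
    """Parse DSV text; if trim_fields is set, strip whitespace from every
    field *after* the quotes have been removed (hypothetical post-processing)."""
    rows = []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter, quotechar='"'):
        if trim_fields:
            row = [field.strip() for field in row]
        rows.append(row)
    return rows


# One quoted and one unquoted record, both with trailing spaces.
data = '"4683";"2022JJ00";"Amsterdam   "\n4687;2022JJ00;Amsterdam   \n'

print(read_dsv(data))                    # padding kept in both rows
print(read_dsv(data, trim_fields=True))  # padding removed in both rows
```

Because the trim runs on the already-unquoted values, quoted and unquoted fields are treated the same way, which is the behaviour argued for above.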

By the way, the example data is not mine; it is a download from our national statistics agency. The spaces are unwanted and not intended; I guess it is a flaw in one of their download formats. And yes, I know I can preprocess the CSV/DSV myself, but then what is the purpose of the check box? :-)
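For reference, the preprocessing workaround mentioned above could look like the following sketch. The helper name trim_dsv is hypothetical. Rewriting with csv.QUOTE_MINIMAL also drops the unnecessary quotes, similar to the clean-up described later in this thread.

```python
import csv
import io


def trim_dsv(text, delimiter=";"):
    """Return a copy of DSV text with leading/trailing whitespace stripped
    from every field; quotes are only kept where QUOTE_MINIMAL needs them."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, quoting=csv.QUOTE_MINIMAL,
                        lineterminator="\n")
    for row in csv.reader(io.StringIO(text), delimiter=delimiter, quotechar='"'):
        writer.writerow([field.strip() for field in row])
    return out.getvalue()


raw = '"ID";"Perioden";"Gemeentenaam_1"\n"4683";"2022JJ00";"Amsterdam      "\n'
print(trim_dsv(raw))
```

To apply it to the downloaded file, read it with open(path, encoding="utf-8", newline="") (the UTF-8 encoding is an assumption about the download) and write the result to a new .csv file before loading it in QGIS.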

https://statline.rivm.nl/portal.html?_la=nl&_catalog=RIVM&tableId=50120NED&_theme=93

@aborruso

aborruso commented Jul 3, 2024

Hi @wvdbee

> Thank you for your insight. But I think you're not completely right. Couple of remarks:
>
> RFC4180 says

I don't claim to be completely right, and I did not refer to RFC4180.
And I have nothing against the data from the national statistics agency.

I don't like CSVs with unnecessary double quotes, so I cleaned yours up, saw that it worked better, and reported that to you.

From here on, any QGIS developer reading this thread has some useful information for making a good decision.

agiudiceandrea added the GUI/UX and Delimited text data provider labels and removed the GUI/UX label Jul 4, 2024
Projects
None yet
Development

No branches or pull requests

4 participants