From e620ad3568e1603c2bccb752977798a523997d1a Mon Sep 17 00:00:00 2001 From: Even Rouault Date: Sun, 12 Jan 2025 15:37:40 +0100 Subject: [PATCH 1/4] Add RFC 105 text: Add and use safe path manipulation functions --- doc/source/development/rfc/index.rst | 1 + ...fc105_safe_path_manipulation_functions.rst | 244 ++++++++++++++++++ 2 files changed, 245 insertions(+) create mode 100644 doc/source/development/rfc/rfc105_safe_path_manipulation_functions.rst diff --git a/doc/source/development/rfc/index.rst b/doc/source/development/rfc/index.rst index 7f68c19d9fda..341bad45fad2 100644 --- a/doc/source/development/rfc/index.rst +++ b/doc/source/development/rfc/index.rst @@ -110,3 +110,4 @@ RFC list rfc102_embedded_resources rfc103_schema_open_option rfc104_gdal_cli + rfc105_safe_path_manipulation_functions diff --git a/doc/source/development/rfc/rfc105_safe_path_manipulation_functions.rst b/doc/source/development/rfc/rfc105_safe_path_manipulation_functions.rst new file mode 100644 index 000000000000..91de6b3ca504 --- /dev/null +++ b/doc/source/development/rfc/rfc105_safe_path_manipulation_functions.rst @@ -0,0 +1,244 @@ +.. _rfc-105: + +=================================================================== +RFC 105: Add and use safe path manipulation functions +=================================================================== + +============== ============================================= +Author: Even Rouault +Contact: even.rouault @ spatialys.com +Started: 2025-01-11 +Status: Draft +Target: GDAL 3.11 +============== ============================================= + +Summary +------- + +This RFC adds safe versions, for use by C++ code, of all functions of +cpl_path.cpp (such as CPLGetPath(), CPLGetDirname(), CPLGetBasename(), CPLGetExtension(), +CPLGetFormFilename(), CPLGetFormCIFilename(), etc.), that returns a result stored +in more or less ephemeral storage, to avoid potential security issues related +to their mis-use. It also covers converting most of the code base to the safer +alternatives. + +Motivation +---------- + +Above mentioned functions return a ``const char *`` string, which is owned by +a thread-locale rotating buffer of 10 strings of up to 2048 bytes each, managed +by the `CPLGetStaticResult() `__ +function of port/cpl_path.cpp + +The current functions are very easy to use, allowing to do things like the +following, without caring about freeing temporary buffers: + +.. code-block:: c + + const char *pszDirname = CPLGetDirname(pszFilename); + const char *pszBasename = CPLGetBasename(pszFilename); + const char *pszSidecarFile = CPLFormFilename(pszDirname, pszBasename, "bin"); + + +But this ease of use doesn't come risk-free, given that there is only room for +10 strings for a given thread. So if doing the following: + +.. code-block:: c + + const char* pszExt = CPLGetExtension(mainFile); + for (int i = 0; i < 10; ++i ) + { + do_something(CPLGetExtension(auxiliaryFiles[i])); + } + do_something_else(pszExt); + +when reaching the last line, ``pszExt`` is no longer an alias of +``mainFile``, but of `CPLGetExtension(auxiliaryFiles[0])`. This will in the best +cases "just" cause malfunction, in bad cases, reading outside the bounds of +buffers, and perhaps even worse when unlucky. + +This risk of such mis-use has been identified for long, and the documentation of +those functions states that the result they return is short-lived, and that +it must be stored in a longer-living storage when not immediately used. + +Thus, patterns like the following is the recommended practice: + +.. code-block:: c++ + + const std::string osExt = CPLGetExtension(mainFile); + for (int i = 0; i < 10; ++i ) + { + do_something(std::string(CPLGetExtension(auxiliaryFiles[i])).c_str()); + } + do_something_else(osExt.c_str()); + + +However, experience has shown that it is extremely easy to forget that, and over +the years, this has caused several heap buffer overflows bugs detected by +oss-fuzz, and probably latent ones still uncovered. +The later in date is https://issues.oss-fuzz.com/issues/388868487 , fixed +by https://github.com/OSGeo/gdal/pull/11638 , and shows how subtle those bugs +can be. In that instance, the issue is due to complex chains of calls where we +end up using more than 10 instances of the thread-locale storage. + +Implementation +-------------- + +Rather to react on a case by case when fuzzers found issues, we want a +systematic approach to suppress their root case: + +- Start from the implementation of the CPLXXXX() functions, copy them to + C++ CPLXXXXSafe() functions that return a ``std::string`` instead of a ``const char *``, + and modify them to no longer use thread-locale rotating buffer of strings, + but rely on std::string natural concatenation routines instead + of the convoluted C concatenation logic, resulting in clearer code. + +- make the now superseded CPLXXXX() functions call the Safe versions, and store + the result in the thread-locale rotating buffer of strings. So those functions + will mostly behave as their current implementation + + That is a simple as: + + .. code-block:: c++ + + const char *CPLGetPath(const char *pszFilename) + { + return CPLPathReturnTLSString(CPLGetPathSafe(pszFilename), __FUNCTION__); + } + + +- deprecate the use of the legacy CPLXXXX() functions in C++ code and replace + wherever possible their use by the use of the new CPLXXXXSafe() functions. + +After some preliminary work in https://github.com/OSGeo/gdal/pull/11639, the +status of remaining unsafe calls is: + +:: + + $ grep -r CPLGetBasename frmts ogr | grep -v Safe | wc -l + 66 + $ grep -r CPLGetDirname frmts ogr | grep -v Safe | wc -l + 29 + $ grep -r CPLGetPath frmts ogr | grep -v Safe | wc -l + 43 + $ grep -r CPLGetExtension frmts ogr | grep -v Safe | wc -l + 33 + $ grep -r CPLFormFilename frmts ogr | grep -v Safe | wc -l + 415 + $ grep -r CPLFormCIFilename frmts ogr | grep -v Safe | wc -l + 53 + $ grep -r CPLResetExtension frmts ogr | grep -v Safe | wc -l + 112 + $ grep -r CPLProjectRelativeFilename frmts ogr | grep -v Safe | wc -l + 19 + $ grep -r CPLGenerateTempFilename frmts ogr | grep -v Safe | wc -l + 23 + + +That is after automating a number of replacements for patterns like the following +ones ``some_func(CPLGetBasename(x))``, to be replaced by +``some_func(CPLGetBasenameSafe(x).c_str())``. + +Patterns like ``variable_name = CPLGetBasename(x)`` will however require manual +intervention. If variable_name is a ``std::string`` or ``CPLString``, then +replacing by ``variable_name = CPLGetBasenameSafe(x)`` is appropriate. If +variable_name is a ``const char*``, case-by-case analysis will have to be done, +to either change its type to std::string / CPLString, or create a temporary +std::string, and have variable_name be assigned to std_string_temp.c_str(). + +A Continuous Integration script check will be added to verify that use of the +unsafe functions is not re-added in parts of the code base where they have +been eliminated. + +Although most call sites can benefit from the safe alternatives, we cannot +remove completely the legacy functions, + +- because they are used by C code. For such code, immediately + storing the result with CPLStrdup() is the best alternative (and remembering + to CPLFree()) + +- there is at least one known external code base (MapServer) that use some of + the legacy functions, although it uses them from a C++ source code file, and + thus could eventually migrate to the CPLXXXXSafe() functions when they are + released. + +- even within GDAL, there are situations where we cannot easily use the + safe alternatives. For example other functions or methods returning themselves + a ``const char *``. However, in some situations, particularly for C++ methods, + we can in some cases add a std::string member variable to store the result + of the safe methods and return its C-string. + +Impact on performance +--------------------- + +There might be a theoretical impact on performance due to dynamic memory +allocation done by temporary ``std::string`` allocations, although normally +those uses occur in the identification and open part of drivers, and not in +performance critical loops. As the identification part is still critical, and +its main use if to get the filename extension, we extend the GDALOpenInfo class, +so it stores the filename extension as a member variable, which will save tens +of drivers from the need of calling CPLGetExtensionSafe(), as well as a +convenience `GDALOpenInfo::IsExtensionEqualToCI(const char* pszExtension)` method. +So, all in all, the performance impact of those changes is thought/hoped to be +in the hardly measurable category. + +Out-of-scope (for the candidate implementation related to this RFC) +------------------------------------------------------------------- + +We have exactly the same issue with ``const char* CPLSPrintf(const char* pszFmt, ....)`` +which uses its own thread-locale rotating buffer of 10 strings of size up to +8000 bytes each. + +The safer alternative exists as ``CPLString::Printf()`` and it would be worth +generalizing its use. I don't remember (at least recent) security related issues +related to mis-use, but there have been bugs related to the truncation to 8000 +bytes. + +To keep things manageable in terms of implementation, such replacement will not +be covered by the candidate implementation linked to this RFC, but this could be +a worthwhile goal to pursue besides it. + +Risks +----- + +Due to the amount of changes, there is a non-zero risk of causing regressions, +in particular in drivers with poor coverage of our regression test suite. + +Backward compatibility +---------------------- + +No impact + +Testing +------- + +Existing autotest suite will cover changes. + +Documentation +------------- + +No impact + +Related issues and PRs +---------------------- + +* Bug that triggered this PR: https://github.com/OSGeo/gdal/pull/11638 + +* Candidate implementation (WIP): https://github.com/OSGeo/gdal/pull/11639 + +Funding +------- + +Funded by GDAL Sponsorship Program (GSP). + +Voting history +-------------- + +TODO + + + +.. below is an allow-list for spelling checker. + +.. spelling:word-list:: + fuzzers From 78293c62b6013c0708383f98130720e26a32884c Mon Sep 17 00:00:00 2001 From: Even Rouault Date: Mon, 13 Jan 2025 16:56:13 +0100 Subject: [PATCH 2/4] Apply suggestions from code review MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Laurențiu Nicola --- .../rfc/rfc105_safe_path_manipulation_functions.rst | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/doc/source/development/rfc/rfc105_safe_path_manipulation_functions.rst b/doc/source/development/rfc/rfc105_safe_path_manipulation_functions.rst index 91de6b3ca504..b3bd9df07a14 100644 --- a/doc/source/development/rfc/rfc105_safe_path_manipulation_functions.rst +++ b/doc/source/development/rfc/rfc105_safe_path_manipulation_functions.rst @@ -26,7 +26,7 @@ Motivation ---------- Above mentioned functions return a ``const char *`` string, which is owned by -a thread-locale rotating buffer of 10 strings of up to 2048 bytes each, managed +a thread-local rotating buffer of 10 strings of up to 2048 bytes each, managed by the `CPLGetStaticResult() `__ function of port/cpl_path.cpp @@ -79,7 +79,7 @@ oss-fuzz, and probably latent ones still uncovered. The later in date is https://issues.oss-fuzz.com/issues/388868487 , fixed by https://github.com/OSGeo/gdal/pull/11638 , and shows how subtle those bugs can be. In that instance, the issue is due to complex chains of calls where we -end up using more than 10 instances of the thread-locale storage. +end up using more than 10 instances of the thread-local storage. Implementation -------------- @@ -89,12 +89,12 @@ systematic approach to suppress their root case: - Start from the implementation of the CPLXXXX() functions, copy them to C++ CPLXXXXSafe() functions that return a ``std::string`` instead of a ``const char *``, - and modify them to no longer use thread-locale rotating buffer of strings, + and modify them to no longer use thread-local rotating buffer of strings, but rely on std::string natural concatenation routines instead of the convoluted C concatenation logic, resulting in clearer code. - make the now superseded CPLXXXX() functions call the Safe versions, and store - the result in the thread-locale rotating buffer of strings. So those functions + the result in the thread-local rotating buffer of strings. So those functions will mostly behave as their current implementation That is a simple as: @@ -186,7 +186,7 @@ Out-of-scope (for the candidate implementation related to this RFC) ------------------------------------------------------------------- We have exactly the same issue with ``const char* CPLSPrintf(const char* pszFmt, ....)`` -which uses its own thread-locale rotating buffer of 10 strings of size up to +which uses its own thread-local rotating buffer of 10 strings of size up to 8000 bytes each. The safer alternative exists as ``CPLString::Printf()`` and it would be worth From a5b03bab1b37dc1c9f0c5594c1a7cec9ce341d32 Mon Sep 17 00:00:00 2001 From: Even Rouault Date: Tue, 14 Jan 2025 01:29:51 +0100 Subject: [PATCH 3/4] Update doc/source/development/rfc/rfc105_safe_path_manipulation_functions.rst --- .../development/rfc/rfc105_safe_path_manipulation_functions.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/source/development/rfc/rfc105_safe_path_manipulation_functions.rst b/doc/source/development/rfc/rfc105_safe_path_manipulation_functions.rst index b3bd9df07a14..9f3dedb06d54 100644 --- a/doc/source/development/rfc/rfc105_safe_path_manipulation_functions.rst +++ b/doc/source/development/rfc/rfc105_safe_path_manipulation_functions.rst @@ -17,7 +17,7 @@ Summary This RFC adds safe versions, for use by C++ code, of all functions of cpl_path.cpp (such as CPLGetPath(), CPLGetDirname(), CPLGetBasename(), CPLGetExtension(), -CPLGetFormFilename(), CPLGetFormCIFilename(), etc.), that returns a result stored +CPLFormFilename(), CPLFormCIFilename(), etc.), that returns a result stored in more or less ephemeral storage, to avoid potential security issues related to their mis-use. It also covers converting most of the code base to the safer alternatives. From 154a90dbd0eccfffada1fe68443e519539ba15eb Mon Sep 17 00:00:00 2001 From: Even Rouault Date: Tue, 14 Jan 2025 23:53:53 +0100 Subject: [PATCH 4/4] RFC105 text: update with candidate implementation --- ...fc105_safe_path_manipulation_functions.rst | 55 +++++-------------- 1 file changed, 14 insertions(+), 41 deletions(-) diff --git a/doc/source/development/rfc/rfc105_safe_path_manipulation_functions.rst b/doc/source/development/rfc/rfc105_safe_path_manipulation_functions.rst index 9f3dedb06d54..2b197cde3d6d 100644 --- a/doc/source/development/rfc/rfc105_safe_path_manipulation_functions.rst +++ b/doc/source/development/rfc/rfc105_safe_path_manipulation_functions.rst @@ -108,47 +108,24 @@ systematic approach to suppress their root case: - deprecate the use of the legacy CPLXXXX() functions in C++ code and replace - wherever possible their use by the use of the new CPLXXXXSafe() functions. - -After some preliminary work in https://github.com/OSGeo/gdal/pull/11639, the -status of remaining unsafe calls is: - -:: - - $ grep -r CPLGetBasename frmts ogr | grep -v Safe | wc -l - 66 - $ grep -r CPLGetDirname frmts ogr | grep -v Safe | wc -l - 29 - $ grep -r CPLGetPath frmts ogr | grep -v Safe | wc -l - 43 - $ grep -r CPLGetExtension frmts ogr | grep -v Safe | wc -l - 33 - $ grep -r CPLFormFilename frmts ogr | grep -v Safe | wc -l - 415 - $ grep -r CPLFormCIFilename frmts ogr | grep -v Safe | wc -l - 53 - $ grep -r CPLResetExtension frmts ogr | grep -v Safe | wc -l - 112 - $ grep -r CPLProjectRelativeFilename frmts ogr | grep -v Safe | wc -l - 19 - $ grep -r CPLGenerateTempFilename frmts ogr | grep -v Safe | wc -l - 23 - - -That is after automating a number of replacements for patterns like the following -ones ``some_func(CPLGetBasename(x))``, to be replaced by -``some_func(CPLGetBasenameSafe(x).c_str())``. + their use by the use of the new CPLXXXXSafe() functions. + +Most of the unsafe calls can be automatically replaced by the safer alternatives +for patterns like the following ones ``some_func(CPLGetBasename(x))``, to be replaced by +``some_func(CPLGetBasenameSafe(x).c_str())``, using ``sed``. Patterns like ``variable_name = CPLGetBasename(x)`` will however require manual intervention. If variable_name is a ``std::string`` or ``CPLString``, then replacing by ``variable_name = CPLGetBasenameSafe(x)`` is appropriate. If -variable_name is a ``const char*``, case-by-case analysis will have to be done, +variable_name is a ``const char*``, case-by-case analysis has to be done, to either change its type to std::string / CPLString, or create a temporary std::string, and have variable_name be assigned to std_string_temp.c_str(). -A Continuous Integration script check will be added to verify that use of the -unsafe functions is not re-added in parts of the code base where they have -been eliminated. +All in all, during development of the candidate implementation, several hundreds +manual replacements have been made. + +A ``#define`` based protection will prevent any GDAL C++ code from accidentally +re-using the unsafe functions. Although most call sites can benefit from the safe alternatives, we cannot remove completely the legacy functions, @@ -162,11 +139,6 @@ remove completely the legacy functions, thus could eventually migrate to the CPLXXXXSafe() functions when they are released. -- even within GDAL, there are situations where we cannot easily use the - safe alternatives. For example other functions or methods returning themselves - a ``const char *``. However, in some situations, particularly for C++ methods, - we can in some cases add a std::string member variable to store the result - of the safe methods and return its C-string. Impact on performance --------------------- @@ -201,7 +173,8 @@ a worthwhile goal to pursue besides it. Risks ----- -Due to the amount of changes, there is a non-zero risk of causing regressions, +Due to the amount of changes, in particular for replacements that could not be +automated, there is a non-zero risk of causing regressions, in particular in drivers with poor coverage of our regression test suite. Backward compatibility @@ -224,7 +197,7 @@ Related issues and PRs * Bug that triggered this PR: https://github.com/OSGeo/gdal/pull/11638 -* Candidate implementation (WIP): https://github.com/OSGeo/gdal/pull/11639 +* Candidate implementation: https://github.com/OSGeo/gdal/pull/11639 Funding -------