Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The reset_git_repo function fails when HEAD is detached #530

Closed
behnazh-w opened this issue Oct 26, 2023 · 7 comments
Closed

The reset_git_repo function fails when HEAD is detached #530

behnazh-w opened this issue Oct 26, 2023 · 7 comments
Assignees
Labels
bug Something isn't working git_operations The issues realated to the git operations that Macaron makes on the target repository

Comments

@behnazh-w
Copy link
Member

If a repository is cloned by Macaron and then we checkout a commit which makes the HEAD detached, if any of the files are modified for some reason, the reset_git_repo function fails to stash the changes and the second run of Macaron on that target fails.

Note that supporting commit checkout is crucial for mapping artifacts to commits.

To reproduce the error on the staging branch run:

macaron -o output analyze -purl pkg:maven/org.opentest4j/[email protected]?type=jar --skip-deps
pushd output/git_repos/github_com/ota4j-team/opentest4j/
git checkout 75136304fab712895090c9c4678dc72ccbcb5e21
popd
macaron -o output analyze -purl pkg:maven/org.opentest4j/[email protected]?type=jar --skip-deps
@behnazh-w behnazh-w added the bug Something isn't working label Oct 26, 2023
@tromai
Copy link
Member

tromai commented Oct 27, 2023

The issue can be reproduced in git version 2.25.1

@tromai
Copy link
Member

tromai commented Oct 30, 2023

I have dig a bit into this issues with the repository opentest4j, which correspond to the PURL pkg:maven/org.opentest4j/[email protected]?type=jar mentioned above.

Note: I highly recommend reading through this article first to understand about the background of line endinds in git.

The history of opentest4j

At 96502f5

There is no .gitattribute for setting up line endings conversion.
The eol information for gradlew.bat and gradle-wrapper.jar. For more information on the output of git ls-files, please see here.

> git ls-files gradlew.bat --eol
i/lf    w/lf    attr/                   gradlew.bat
 
> git ls-files gradle/wrapper/gradle-wrapper.jar  --eol
i/-text w/-text attr/                   gradle/wrapper/gradle-wrapper.jar
  • gradle.bat:
    • i/lf : The file gradlew.bat was initially committed into the repository with the line endings lf (which is Unix's line endings) → this is probably because the person who committed in this file use git's line ending conversion feature.
    • w/lf : When I checked out this commit into the working tree, because there is no .gitattribute or any settings for git's line ending conversion feature, there is no line ending conversion happens → gradlew.bat in my working tree still keeps the lf line ending just like when it's committed.
    • attr/ : no setting for line ending conversion is set.
  • gradle/wrapper/gradle-wrapper.jar:
    • i/-text and w/-text: For binary files like images, note that you’ll see -text for both the index and working tree line endings. This means that Git correctly isolated those binary files, leaving them untouched.
    • attr/ : no setting for line ending conversion is set in .gitattribute.

At 6ae5b16 - right after commit 96502f5.

gradlew.bat and gradle/wrapper/gradle-wrapper.jar is again updated with new content. gradlew.bat new's content has CRLF endings.
There is still no .gitattribute in the repository.

> git ls-files gradlew.bat --eol
i/crlf  w/crlf  attr/                   gradlew.bat
 
> git ls-files gradle/wrapper/gradle-wrapper.jar  --eol
i/-text w/-text attr/                   gradle/wrapper/gradle-wrapper.jar
  • gradle.bat:
    • i/crlf : The file gradlew.bat was committed into the repository with the line endings crlf (which is Window's line endings)
    • w/crlf : When I checked out this commit into the working tree, because there is no .gitattribute or any settings for git's line ending conversion feature, there is no line ending conversion happens → gradlew.bat in my working tree will keeps the crlf line ending just like when it's committed.
    • attr/ : no setting for line ending conversion is set.
  • gradle/wrapper/gradle-wrapper.jar :
    • i/-text and w/-text : For binary files like images, note that you’ll see -text for both the index and working tree line endings. This means that Git correctly isolated those binary files, leaving them untouched.
    • attr/ : no setting for line ending conversion is set in .gitattribute

At f2c384f3 - right after commit 6ae5b16.

The .gitattribute file is committed with the following content.

* text eol=lf
*.bat eol=crlf
  • * text eol=lf: for all files, set the text option:
    • Setting the text attribute on a path enables end-of-line conversion on checkin and checkout as described above. Line endings are normalized to LF in the index every time the file is checked in, even if the file was previously added to Git with CRLF line endings.

    • eol=lf when checked out in a local repository, make git convert all line endings into lf .
  • *.bat eol=crlf: for all files that ends with .bat , when checked out into the working tree, git will convert their line endings into lf. This eol=crlf overrides the default eol=lf for bat files only.

Therefore when you check out this commit:

> git ls-files gradlew.bat --eol
i/crlf  w/crlf  attr/text eol=crlf      gradlew.bat

> git ls-files gradle/wrapper/gradle-wrapper.jar  --eol
i/-text w/-text attr/text eol=lf        gradle/wrapper/gradle-wrapper.jar
  • gradle.bat:
    • i/crlf : The file gradlew.bat was committed with line endings crlf in commit 6ae5b16
    • w/crlf : When I checked out this commit into the working tree, because there is no .gitattribute or any settings for git's line ending conversion feature, there is no line ending conversion happens → gradlew.bat in my working tree still keeps the crlf line ending just like when it's committed.
    • attr/text eol=crlf : the text setting is applied for this file, when this file is checked out into the working tree, git will convert all line endings of gradle.bat into crlf.
  • gradle/wrapper/gradle-wrapper.jar :
    • i/-text and w/-text : For binary files like images, note that you’ll see -text for both the index and working tree line endings. This means that Git correctly isolated those binary files, leaving them untouched.
    • attr/text eol=lf : the text setting is applied for this file (however, text should only be used for text files, not binaries, so this is an error on the author side).

Now, if you freshly clone the repository and checkout commit f2c384f3 into the working tree:

> git status
HEAD detached at f2c384f
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   gradle/wrapper/gradle-wrapper.jar
        modified:   gradlew.bat
 
no changes added to commit (use "git add" and/or "git commit -a")

This is the same behavior specified in this Github issue.
I have tested and the same behavior appears in git:

  • 2.25.1
  • 2.39.3

Why does this happen in the first place?

In commit f2c384f3 as I mentioned above, even though the .gitattribute files is added, theses changes won’t take effect immediately for files committed into the repository prior to the commit of .gitattributes. This will mean that gradlew.bat is still stored with line ending crlf in the repository.

From my understanding, the eol for gradlew.bat within the repository is crlf, but because attr/text eol=crlf makes sure that the content of this file when checked out into the working tree, its eol will be converted into lf by git. The converted version with lf eol will be different from the existing crlf versions of gradlew.bat inside the repository → I think this make Git treats them as 2 "versions" because of different eol settings.

This is consistent with the discussion in an existing Stackoverflow thread.

One thing I know exactly sure is that the author that made committed the content of .gitattribute in commit f2c384f3 didn't run git add --renormalize . as recommended here to make sure that all previous versions of gradlew.bat and gradle/wrapper/gradpe-wrapper.jar got normalized according to .gitattribute.

In later commits:

  • 37106d85: gradlew.bat eol in the repository is updated to be lf.
  • 6aa4af84: the content of .gitattribute is modified to correctly assign gradle/wrapper/gradpe-wrapper.jar as a binary and therefore ignored it from git's end of lines conversion.
  • Therefore, if your check out commits after these two commits, the issues will not happen.

How can we address this?

We can run git checkout -f when checking out banches or commits so that it will checkout regardless of the issue mentioned above.
We can safely do this as we have run git stash and git reset beforehand.

@behnazh-w
Copy link
Member Author

behnazh-w commented Oct 30, 2023

@tromai thanks for looking into the issue. It's still unclear to me why these two files remain modified after calling the reset_git_repo function. Do you have any idea?

If we run checkout with the --force option, we essentially discard any local changes:

       -f, --force
           When switching branches, proceed even if the index or the working tree differs from HEAD, and even if there are untracked files in the way. This is
           used to throw away local changes and any untracked files or directories that are in the way.

           When checking out paths from the index, do not fail upon unmerged entries; instead, unmerged entries are ignored.

So, if we do that, we won't need to stash and reset. But it would be good to understand the root cause first.

@tromai
Copy link
Member

tromai commented Oct 31, 2023

It's still unclear to me why these two files remain modified after calling the reset_git_repo function. Do you have any idea?

The root cause of the issue:

  • In the .gitattribute file, gradlew.bat is configured with line ending normalization by git.

    * text eol=lf
    *.bat eol=crlf
    
    • This configuration will ensure that: 1. When gradlew.bat is committed into the repository, it must always be converted to use lf, when this file is checked out into the working tree, it must be converted to crlf again.
  • However, due to mis-configuration:

    • In the repository: gradlew.bat is still stored with crlf.
    • When checked out into the working tree: gradlew.bat will have crlf as the line endings (as expected by the configuration in .gitattribute).
  • Therefore, whenever git try to see if there are any changes to gradlew.bat in the working tree compared to gradlew.bat in the repository (compare to the current HEAD commit), it will try to convert the gradlew.bat in the working tree (with crlf) to lf. This will obviously make it different compared to the gradlew.bat currently in the repository with crlf -> This explains that when running git commands that need to check the state of files in the working tree (e.g git status), gradlew.bat will appear as though it has been changed.

Now let's get back to the reset_git_repo function. This function called within Macaron at the moment will try to reset both the index and the working tree of the repository it's analyzing.
This reset operation can't help change the condition which trigger the issue I describe above:

  • "Resetting the index" doesn't help because gradlew.bat is still stored with crlf still in the repository.
  • "Resetting the working tree" doesn't help because git would then again try to convert gradlew.bat to lf.

So, if we do that, we won't need to stash and reset. But it would be good to understand the root cause first.

Yes this is true. In theory, we wouldn't need to reset at all (given that we have created a stash) before trying to checkout with the --force flag. I could try to remove the resetting operation and test if everything works normally.

Furthermore, for optimization, when Macaron deals with repositories that it caches inside output/git_repos that are not local repositories provided by the users, we could just ignore stashing. This is because those repositories are only touched by Macaron. If the user really wants to do that, they must know what they are doing and we don't need to handle those cases.

@behnazh-w
Copy link
Member Author

Thanks for the clarification.

we wouldn't need to reset at all (given that we have created a stash) before trying to checkout with the --force flag.

I don't think we even need to stash if we use checkout with the --force flag.

@tromai
Copy link
Member

tromai commented Oct 31, 2023

I don't think we even need to stash if we use checkout with the --force flag.

That's correct. The reason the stash command was used is because we were trying to save any local changes (propably made by the users) to the repository we are analyzing. This is mostly specific for the use case where the user wants to provide a local repository path to Macaron. If we decide that we wouldn't be responsible for losing any local changes in the repository the user want to analyze, we could avoid stashing and put some documentation saying that Macaron could potentially remove any local non-committed changes.

@behnazh-w
Copy link
Member Author

behnazh-w commented Oct 31, 2023

Sounds good. We should add a note or warning to the docs for local repo use case.

@tromai tromai added the git_operations The issues realated to the git operations that Macaron makes on the target repository label Oct 31, 2023
tromai added a commit that referenced this issue Nov 3, 2023
@tromai tromai closed this as completed Nov 3, 2023
art1f1c3R pushed a commit that referenced this issue Nov 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working git_operations The issues realated to the git operations that Macaron makes on the target repository
Projects
None yet
Development

When branches are created from issues, their pull requests are automatically linked.

2 participants