Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added logic to fetch README files, documentation, commit messages, an… #2919

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

SahilDhillon21
Copy link
Contributor

@SahilDhillon21 SahilDhillon21 commented Nov 13, 2024

…d issue trackers from repository APIs.

Fixes #2681

  • Updated Project model to include fields for README content, documentation links, commit summaries, and issue tracker counts.
  • Implemented API calls in update_projects.py to gather and save this data to the database.

readme ss

Fixed the migration file issue that I was facing in the previous PR.

I had to remove the line "Authorization": f"token {settings.GITHUB_TOKEN}" from the header as it was giving an error saying 'Unable to fetch repository - 401'. I was unsure how to deal with it so I have removed it for now.

Copy link

sentry-io bot commented Nov 13, 2024

🔍 Existing Issues For Review

Your pull request is modifying functions with the following pre-existing issues:

📄 File: website/management/commands/update_projects.py

Function Unhandled Issue
handle SystemExit: 1 /project/{slug}/
Event Count: 1

Did you find this useful? React with a 👍 or 👎

@SahilDhillon21
Copy link
Contributor Author

@DonnieBLT I believe for the CodeQL test the languages should be python, javascript; instead of being 'python javascript'. Though I am unsure about how to fix this.

@DonnieBLT
Copy link
Collaborator

The reason that the token is there is because it will help with the rate limiting, you can create a token from your github profile settings, page and can you please avoid changes to white space I’m not sure what linter settings you’re using, but maybe if we can standardize them we won’t see the white space changes

@SahilDhillon21
Copy link
Contributor Author

I've added the GitHub token and fixed the formatting. Please review and let me know.

Copy link
Collaborator

@DonnieBLT DonnieBLT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please check the comments?

)

# Set Issue Tracker URL
project.issue_tracker_url = f"https://github.com/{repo_name}/issues"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove this since it's universal

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood.

@SahilDhillon21
Copy link
Contributor Author

I have integrated a basic summary model "facebook/bart-large-cnn". However, due to the large variations in the readme files of the repositories, the summaries aren't too effective. I had thought of pre-processing the content to only pass relevant sections, but even that seems to be difficult since there is no particular structure followed. @DonnieBLT which direction should I look into to improve this? Though openai API is paid, it could do the job really well compared to the generic python models

@DonnieBLT
Copy link
Collaborator

We can use the OpenAI we already have a API key and it is set up in the code

@SahilDhillon21
Copy link
Contributor Author

I have modified the summary and label generation to use Openai's API. Meanwhile, I am working on the UI search functionality and displaying these summaries and labels.

@SahilDhillon21
Copy link
Contributor Author

Hello @DonnieBLT, here's an overview of the search functionality:
projects-search-function.webm
It filters based on the project name, a summary, and ai labels.

Looking forward to your feedback.

readme_content = models.TextField(null=True, blank=True)
documentation_url = models.URLField(null=True, blank=True)
recent_commit_messages = models.TextField(null=True, blank=True)
issue_tracker_url = models.URLField(null=True, blank=True)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need this because it's always /issues

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can these migrations be merged?

openai.api_key = os.getenv("OPENAI_API_KEY")


def ai_summary(text, topics=None):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move this to utils.py please

@@ -22,6 +22,15 @@ <h3>Projects: {{ projects.count }}</h3>
<i class="fas fa-plus-circle"></i> Add Project
</button>
</form>
<form id="search-form" class="search-form">
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make sure the search in the header works and remove this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The search bar in the header functions slightly differently. Upon entering a query and selecting a label, it transitions to another search bar specific to the selected label. This second search bar is more focused and offers only four distinct labels to refine the search (issue, domain, user, label). It doesn't allow you to search for say specific projects or orgs.

This search form filters exclusively on project's description, labels and name. Let me know if you want to remove this and and fix the header search form to include projects and organizations

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you combine them into the one top search?

@@ -53,130 +62,7 @@ <h3>Projects: {{ projects.count }}</h3>
{% endfor %}
</ul>
{% endif %}
<ul class="project-list">
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we keep this in this file please

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I assume we don't want to create a separate template and have all the code here itself? Let us finalize what to do with the search function and I'll make the changes accordingly

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we can combine this into one template and adjust the global search to work as you have it.

openai.api_key = os.getenv("OPENAI_API_KEY")


def generate_labels(readme_content, github_topics):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move this to utils.py too please

project.readme_content = readme_content
readme_text = markdown_to_text(readme_content)
project.ai_summary = ai_summary(readme_text, project.topics)
project.ai_labels = json.loads(generate_labels(readme_text, project.topics))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we just add the labels verbatim from the topics?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found the AI-generated labels to be more accurate and effective, but we can surely use these topics directly. I'll modify it

Copy link
Collaborator

@DonnieBLT DonnieBLT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requesting a few changes, we're almost there

@SahilDhillon21
Copy link
Contributor Author

Have made most of the changes, just need a final heads-up on what to do with the search functionality as it's quite buggy and limited. Should I create a new PR to improve its working if we go that route?

Copy link
Collaborator

@DonnieBLT DonnieBLT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few adjustments request

)

# Check for Documentation URL (homepage)
project.documentation_url = repo_data.get("homepage")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we have a homepage field

@@ -756,6 +756,10 @@ class Project(models.Model):
closed_issues = models.IntegerField(default=0)
size = models.IntegerField(default=0)
commit_count = models.IntegerField(default=0)
readme_content = models.TextField(null=True, blank=True)
documentation_url = models.URLField(null=True, blank=True)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use homepage_url

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -22,6 +22,15 @@ <h3>Projects: {{ projects.count }}</h3>
<i class="fas fa-plus-circle"></i> Add Project
</button>
</form>
<form id="search-form" class="search-form">
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you combine them into the one top search?

@@ -53,130 +62,7 @@ <h3>Projects: {{ projects.count }}</h3>
{% endfor %}
</ul>
{% endif %}
<ul class="project-list">
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we can combine this into one template and adjust the global search to work as you have it.

@SahilDhillon21
Copy link
Contributor Author

I have combined the search bar into one and added all the remaining categories. I'll raise a new pr later to improve the UI of the search results, have kept it basic for now.

@SahilDhillon21
Copy link
Contributor Author

I have merged the migrations, please check

@DonnieBLT
Copy link
Collaborator

Can you please clean this up?

@SahilDhillon21
Copy link
Contributor Author

SahilDhillon21 commented Dec 14, 2024

I have started from the main branch again. Here a normal summary is used
image

The search bar changes are not included here, should I raise a separate pr for that or have those changes here itself?

P.S. I'm not sure how to solve this error for ai_summary:
Error generating summary:

You tried to access openai.ChatCompletion, but this is no longer supported in openai>=1.0.0 - see the README at https://github.com/openai/openai-python for the API.

You can run openai migrate to automatically upgrade your codebase to use the 1.0.0 interface.

Alternatively, you can pin your installation to the old version, e.g. pip install openai==0.28

A detailed migration guide is available here: openai/openai-python#742

Got the error even after updating openai. It could be because I don't have access to a valid api key, please verify

DonnieBLT
DonnieBLT previously approved these changes Dec 18, 2024
@DonnieBLT
Copy link
Collaborator

The OpenAI error is happening with a valid key for the pr analysis too in production. I think maybe updating and using a different function will fix it

@SahilDhillon21
Copy link
Contributor Author

Understood, will resolve it

@SahilDhillon21
Copy link
Contributor Author

The functions are migrated
Please run pip install --upgrade openai
image

  1. As for the test that's failing, it seems to be working when I set a random API key on my local machine, so hopefully it works in production as well
  2. It updates the functions used in pr_analysis as well

@SahilDhillon21
Copy link
Contributor Author

ModuleNotFoundError: No module named 'openai'
Is it because a new openai key is needed to be created? Because the package is available in the docker image

@DonnieBLT
Copy link
Collaborator

I removed it as a test, we can add it back

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature Enhancement: AI-Powered Project Summary and Labeling
2 participants