
This package extracts important keywords from a page to create tags for blogs, news articles, etc. The link to the source repository of the package is given in the README.


zyberg2091/BlogTagger


Blog Tagger

This package extracts keywords from a web page to create tags for blogs, news articles, or any other textual content on the page. These tags highlight the topic of the page, giving a quick overview of the large volume of text embedded in it. Tag generation is an important feature in many areas of IT; for example, Amazon uses tags for customer segmentation.

Prerequisites:

  • Install the packages listed in requirements.txt using pip install -r requirements.txt

  • Load the albert-base-v2 model and tokenizer from the Hugging Face model hub as shown below. You can use a different model, but that may require changes to the source code, so the ALBERT model is recommended.

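    from transformers import TFAutoModel, AutoTokenizer
    # Pretrained weights are downloaded from the model hub on first use
    model = TFAutoModel.from_pretrained('albert-base-v2')
    tokenizer = AutoTokenizer.from_pretrained('albert-base-v2')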

Usage Instructions

  1. Clone the repository to your local system

  2. Collect web data

  • For example

    from web_data import Blog_Data
    data = Blog_Data("https://influencermarketinghub.com/12-best-food-blogs/")  # pass the website URL
    Text_data = data.text_prep(req=['h1', 'h2', 'h3', 'h4', 'p'])  # pass the HTML tags to read text from
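The implementation of `text_prep` is not shown in this README. As a rough illustration of what collecting text from the requested tags could look like, the sketch below uses only the Python standard library. `TagTextExtractor` and `extract_text` are hypothetical names, not part of the package, and the sketch assumes well-formed markup in which every requested tag is closed.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Hypothetical helper: collects the text found inside chosen HTML tags."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.depth = 0      # number of wanted tags currently open
        self.chunks = []    # one cleaned string per outermost wanted element
        self._buffer = []   # raw text pieces of the element being read

    def handle_starttag(self, tag, attrs):
        if tag in self.wanted:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.wanted and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Join the pieces and collapse runs of whitespace
                text = " ".join("".join(self._buffer).split())
                if text:
                    self.chunks.append(text)
                self._buffer = []

    def handle_data(self, data):
        # Keep text only while inside at least one wanted tag
        if self.depth > 0:
            self._buffer.append(data)


def extract_text(html, req=("h1", "h2", "h3", "h4", "p")):
    """Return the text of every element whose tag is listed in req."""
    parser = TagTextExtractor(req)
    parser.feed(html)
    parser.close()
    return parser.chunks


page = "<h1>Best Food Blogs</h1><div>menu</div><p>Pasta and <b>pizza</b> recipes.</p>"
print(extract_text(page))
```

Text inside tags that were not requested (the `div` above) is skipped, while inline markup nested in a requested tag (the `b` inside `p`) is kept as part of that element's text.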

  3. Use the main class Blog_Tagger to generate the top k tags.
  • For example

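    # Replace <int num> with an integer value for maxlen
    tagger = Blog_Tagger(Text_data, maxlen=<int num>)
    tagger.token_embedding_gen(model, tokenizer)  # model and tokenizer from the prerequisites
    top_tokens = tagger.tag_gen(k)  # k: number of tags to return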
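This README does not describe how `tag_gen` ranks tokens. One common way to pick keywords from token embeddings is to score each token by its cosine similarity to the mean embedding of the whole text and keep the k highest-scoring tokens. The sketch below illustrates that idea under this assumption; `cosine` and `top_k_tags` are hypothetical names, and the small vectors stand in for real model embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_tags(tokens, embeddings, k):
    """Rank tokens by similarity to the mean embedding and return the top k.

    Assumption: this ranking rule is illustrative only; the package may
    score tokens differently.
    """
    if not embeddings:
        return []
    dim = len(embeddings[0])
    # Mean embedding acts as a rough summary of the whole text
    centroid = [sum(vec[i] for vec in embeddings) / len(embeddings) for i in range(dim)]
    best = {}
    for tok, vec in zip(tokens, embeddings):
        score = cosine(vec, centroid)
        # A token can occur several times; keep its best score
        if tok not in best or score > best[tok]:
            best[tok] = score
    ranked = sorted(best, key=best.get, reverse=True)
    return ranked[:k]


tokens = ["food", "blog", "the"]
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(top_k_tags(tokens, embeddings, 2))
```

Scoring against the mean embedding favors tokens that sit close to the overall topic of the text, so a token whose vector points away from the rest tends to rank last.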

Source Repository that contains package
