This package extracts keywords from a web page to create tags for blogs, news articles, or any other textual content on the page. The tags highlight the topic of the content, giving a quick overview of large volumes of text embedded in a page. Tag generation is an important feature in many areas of IT; for example, Amazon uses tags for customer segmentation.
-
Install the packages listed in requirements.txt using:
pip install -r requirements.txt
-
Follow the instructions below to use the albert-base-v2 model from the Hugging Face model hub. You can use a different model, but that may require changes to the source code, so the ALBERT model is recommended here.
from transformers import TFAutoModel, AutoTokenizer
model=TFAutoModel.from_pretrained('albert-base-v2')
tokenizer=AutoTokenizer.from_pretrained('albert-base-v2')
-
Clone the repository to your local system
-
Collect web data
-
For example
from web_data import Blog_Data
# pass the website URL
data=Blog_Data("https://influencermarketinghub.com/12-best-food-blogs/")
# pass the HTML tags to extract text from
Text_data=data.text_prep(req=['h1', 'h2', 'h3', 'h4', 'p'])
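As a rough illustration of what collecting text from selected tags involves, the sketch below pulls the text out of a list of HTML tags using only Python's standard library. It is not the repository's `text_prep` implementation: the class `TagTextExtractor` and the function `extract_text` are hypothetical names, and the input is a hard-coded HTML string rather than a fetched page.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collect the text found inside a chosen set of HTML tags (hypothetical helper)."""

    def __init__(self, req):
        super().__init__()
        self.req = set(req)   # tag names to keep, e.g. {'h1', 'p'}
        self.depth = 0        # number of requested tags currently open
        self.chunks = []      # text pieces collected so far

    def handle_starttag(self, tag, attrs):
        if tag in self.req:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.req and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # keep text only while inside at least one requested tag
        if self.depth > 0 and text:
            self.chunks.append(text)


def extract_text(html, req):
    """Return the text inside the requested tags, in document order."""
    parser = TagTextExtractor(req)
    parser.feed(html)
    parser.close()
    return parser.chunks


# made-up sample page; the <div> is skipped because 'div' is not requested
sample = "<html><body><h1>Best Food Blogs</h1><div>menu</div><p>Try new recipes.</p></body></html>"
print(extract_text(sample, ['h1', 'h2', 'h3', 'h4', 'p']))
```

This simplified version assumes that every requested tag is properly closed; real pages often omit closing tags, so a production scraper would more likely rely on a dedicated HTML parsing library.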
- Use the main class Blog_Tagger to generate the top k tags.
-
For example
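tagger=Blog_Tagger(Text_data,maxlen=<int num>)  # replace <int num> with an integer
tagger.token_embedding_gen(model,tokenizer)  # model and tokenizer as loaded from albert-base-v2
top_tokens=tagger.tag_gen(k)  # k is the number of tags to return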
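One common way to turn token embeddings into tags is to score each token by how close its embedding lies to the mean embedding of the whole text, then keep the k highest-scoring tokens. The sketch below shows that idea in plain Python with made-up two-dimensional vectors. It is an assumption about how such a ranking could work, not the method `tag_gen` is confirmed to use, and `cosine` and `top_k_tags` are hypothetical helpers.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_tags(tokens, embeddings, k):
    """Rank tokens by similarity to the mean embedding (hypothetical helper)."""
    dim = len(embeddings[0])
    # mean of all token embeddings, used here as a stand-in for the text's topic
    doc_vec = [sum(vec[d] for vec in embeddings) / len(embeddings) for d in range(dim)]
    best = {}
    for token, vec in zip(tokens, embeddings):
        score = cosine(vec, doc_vec)
        # a token may occur several times; keep its best score
        if token not in best or score > best[token]:
            best[token] = score
    ranked = sorted(best, key=best.get, reverse=True)
    return ranked[:k]


# made-up tokens and embeddings, for illustration only
tokens = ["food", "recipe", "the", "blog"]
embeddings = [[1.0, 0.9], [0.9, 1.0], [-1.0, 0.2], [1.0, 1.0]]
print(top_k_tags(tokens, embeddings, k=2))
```

With real model output, the embeddings would be much higher-dimensional vectors, and stop words and sub-word pieces would usually be filtered out before ranking.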
- Link : original repository