Finding a Candidate List for a Website #26

aswilkinson12 · 2024-04-09T19:01:05Z


I'm working on the Applications track for COMP 545, implementing Weblinx. So far I understand how the algorithm works and what to feed in; the problem I'm encountering is finding a list of candidates for a website. I can get the DOM tree while simulating a website with Playwright, but I don't see how to get the list of candidates to feed into the model. Is there a function or method I may have missed that returns that list of candidates? Thanks!


xhluca · 2024-04-10T00:42:49Z

Maintainer

The function build_records_for_single_turn in modeling.dmr.processing shows how to take a replay and a turn object and format the query and the list of candidate elements:

weblinx/modeling/dmr/processing.py

Lines 302 to 364 in c8986ae

    
def build_records_for_single_turn(
    turn, replay, format_intent_input, uid_key, max_neg=None, only_allow_valid_uid=True
) -> List[dict]:
    """
    This function will build a list of dictionaries, each of which is a record
    for a single turn. Each record has the following keys:
        - query: the dialogue history, used as a query for training the model
        - doc: concise representation of HTML element used as doc for training
        - label: either 0 or 1, indicating whether the document is the target element
        - uid: the unique identifier for an element, must be in the element attributes
        - turn_index: the index of the turn in the replay
        - demo_name: the name of the demonstration

    If `only_allow_valid_uid` is True, then only turns that have a valid uid
    will be included in the output. Otherwise, all turns will be included.
    """
    bboxes_filt = wh.filter_bboxes(
        turn.bboxes,
        viewport_height=turn.viewport_height,
        viewport_width=turn.viewport_width,
    )
    root = lxml.html.fromstring(turn.html)
    root_tree = root.getroottree()
    elements = root.xpath(f"//*[@{uid_key}]")
    elements_filt = [p for p in elements if p.attrib[uid_key] in bboxes_filt]

    has_valid_uid = turn_has_valid_uid(turn, paths=elements, uid_key=uid_key)
    if only_allow_valid_uid and not has_valid_uid:
        return []

    # Now, we can format each of the elements in paths_filt into string
    # and use them as negative samples
    query = format_turn_for_input(replay, turn, format_intent=format_intent_input)
    target_uid = turn.element["attributes"][uid_key] if has_valid_uid else -1

    records_positive = []
    records_negative = []

    for elem in elements_filt:
        bbox = turn.bboxes[elem.attrib[uid_key]]
        elem_dict = represent_element_as_dict(elem, bbox, root_tree)
        elem_str = convert_elem_dict_to_str_legacy(elem_dict)

        record = {
            "query": query,
            "doc": elem_str,
            "uid": elem.attrib[uid_key],
            "demo_name": turn.demo_name,
            "turn_index": turn.index,
            "elem_dict": elem_dict,
        }

        if elem.attrib[uid_key] == target_uid:
            record["label"] = 1
            records_positive.append(record)
        else:
            record["label"] = 0
            records_negative.append(record)

    if max_neg is not None and 0 < max_neg < len(records_negative):
        records_negative = random.sample(records_negative, max_neg)

    return records_positive + records_negative

You might want to simplify it, since you don't need the negatives in real-time use, and you also don't have a target_uid during inference (only for training/eval), so you might want something like this:

def format_relevant_elements_for_single_turn(
    turn, format_intent_input, uid_key="data-webtasks-id"
) -> List[dict]:
    bboxes_filt = wh.filter_bboxes(
        turn.bboxes,
        viewport_height=turn.viewport_height,
        viewport_width=turn.viewport_width,
    )
    root = lxml.html.fromstring(turn.html)
    root_tree = root.getroottree()
    elements = root.xpath(f"//*[@{uid_key}]")
    elements_filt = [p for p in elements if p.attrib[uid_key] in bboxes_filt]

    records = []

    for elem in elements_filt:
        bbox = turn.bboxes[elem.attrib[uid_key]]
        elem_dict = represent_element_as_dict(elem, bbox, root_tree)
        elem_str = convert_elem_dict_to_str_legacy(elem_dict)

        record = {
            "doc": elem_str,
            "uid": elem.attrib[uid_key],
            "turn_index": turn.index,
            "elem_dict": elem_dict,
        }
        records.append(record)

    return records

# Usage (untested, just to illustrate). `replay` and `turn` are your Replay/Turn
# objects; `model` is assumed to be an already loaded DMR sentence-transformers
# model and `batch_size` an int of your choosing.
# Both imported helpers come from weblinx/modeling/dmr/processing.py
from modeling.dmr.processing import build_formatters, format_turn_for_input
from sentence_transformers.util import cos_sim

uid_key = "what-you-injected-into-elements-on-javascript-side"
format_intent_input, _ = build_formatters()
query = format_turn_for_input(replay, turn, format_intent=format_intent_input)
elements_records = format_relevant_elements_for_single_turn(
    turn=turn, format_intent_input=format_intent_input, uid_key=uid_key
)

# now, use dmr to select candidate elements from the list of relevant formatted elements
docs = [r["doc"] for r in elements_records]
encoded = model.encode(
    [query] + docs, batch_size=batch_size, show_progress_bar=False
)
query_vector, doc_vectors = encoded[0], encoded[1:]
scores = cos_sim(query_vector, doc_vectors).cpu().squeeze().tolist()
if isinstance(scores, float):  # a single candidate squeezes down to a bare float
    scores = [scores]

# attach one score to each record (the original looped over an undefined `records`)
for record, score in zip(elements_records, scores):
    record["score"] = score
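For intuition about what the scoring step computes, here is a dependency-free sketch that swaps the sentence-transformers model for hand-written toy vectors and computes cosine similarity in plain Python. The `cosine` and `score_records` helpers, the vectors, and the example records are all made up for illustration; in real use the vectors come from `model.encode` and the similarity from `cos_sim`, as above.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch when a vector is all zeros
    return dot / (norm_a * norm_b)


def score_records(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    records: List[dict],
) -> List[dict]:
    """Attach a 'score' key to each record, one score per document vector."""
    if len(doc_vecs) != len(records):
        raise ValueError("need exactly one document vector per record")
    for record, doc_vec in zip(records, doc_vecs):
        record["score"] = cosine(query_vec, doc_vec)
    return records


# Toy stand-ins for the encoded query and documents (hypothetical values).
query_vec = [1.0, 0.0]
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
records = [{"uid": "a"}, {"uid": "b"}, {"uid": "c"}]
scored = score_records(query_vec, doc_vecs, records)
```

The element whose vector points in the same direction as the query gets the highest score, which is the property the retrieval step relies on.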

This code snippet shows you how to do it:

weblinx/modeling/dmr/eval.py

Lines 74 to 88 in 53b6a23

    
for k, group in tqdm(input_grouped.items(), desc="Computing scores"):
    group = input_grouped[k]
    query = group[0]["query"]
    docs = [r["doc"] for r in group]

    encoded = model.encode(
        [query] + docs, batch_size=batch_size, show_progress_bar=False
    )
    query_vector, doc_vectors = encoded[0], encoded[1:]
    scores = sim_func(query_vector, doc_vectors).cpu().squeeze().tolist()
    if isinstance(scores, float):
        scores = [scores]
    for i, r in enumerate(group):
        r["score"] = scores[i]
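The loop above iterates over records that have already been grouped per turn. How `input_grouped` is built is not shown in this thread, so the sketch below is only an assumption: it groups records by the `demo_name` and `turn_index` keys that each record carries, and it isolates the guard for the single-candidate case, where squeezing a 1x1 result yields a bare float instead of a list. Both helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_records_by_turn(records: List[dict]) -> Dict[Tuple[str, int], List[dict]]:
    """Group records so that each group holds the candidates of one turn."""
    grouped = defaultdict(list)
    for record in records:
        grouped[(record["demo_name"], record["turn_index"])].append(record)
    return dict(grouped)


def as_score_list(scores) -> List[float]:
    """Wrap a bare float (one candidate only) in a list; copy lists as is."""
    if isinstance(scores, float):
        return [scores]
    return list(scores)


# Hypothetical records: two candidates for turn 3, one candidate for turn 4.
example_records = [
    {"demo_name": "demo", "turn_index": 3, "doc": "button"},
    {"demo_name": "demo", "turn_index": 3, "doc": "link"},
    {"demo_name": "demo", "turn_index": 4, "doc": "input"},
]
input_grouped = group_records_by_turn(example_records)
```

For on-the-fly inference there is only one turn at a time, so the grouping collapses to a single group and only the float guard still matters.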

Note that you need to construct the Replay and Turn objects dynamically, which might require some coding effort, since they were originally designed for processing on-disk data that was pre-processed, whereas your goal is to use them for in-memory raw data. The best way to get started is to tinker with the Demonstration/Replay/Turn classes from weblinx and understand how they work, then inherit them in your custom classes to use for inference, along the lines of:

from functools import cached_property

import weblinx as wl

class ReplayDynamic(wl.Replay):
    def __init__(self, *args, html=None, **kwargs):
        # forward whatever arguments wl.Replay expects
        super().__init__(*args, **kwargs)
        # always set the attribute so that `html` never raises AttributeError
        self._html_in_memory = html

    @cached_property
    def html(self):
        return self._html_in_memory
    # modify any other method/property, or add new methods/properties

class DemonstrationDynamic(wl.Demonstration):
    # also make necessary modifications here
    pass

class Turn(wl.Turn):
    # also make necessary modifications here
    pass
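Before subclassing anything, it can help to check what the simplified helper earlier in this answer actually reads from a turn: `html`, `bboxes`, `viewport_height`, `viewport_width` and `index`. The sketch below is a hypothetical stand-in, not part of weblinx: `InMemoryTurn`, `candidate_uids`, the `data-uid` attribute and the bbox dict layout are all assumptions. It also uses the standard-library `HTMLParser` in place of the lxml XPath query so that it runs without extra dependencies, and it skips the viewport filtering done by `wh.filter_bboxes`.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Dict, List


@dataclass
class InMemoryTurn:
    """Hypothetical container exposing the attributes the helper reads."""
    html: str
    bboxes: Dict[str, dict]
    viewport_height: int
    viewport_width: int
    index: int = 0


class _UidCollector(HTMLParser):
    """Collect the value of `uid_key` from every start tag that carries it."""

    def __init__(self, uid_key: str):
        super().__init__()
        # HTMLParser lowercases attribute names, so compare in lowercase
        self.uid_key = uid_key.lower()
        self.uids: List[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == self.uid_key and value is not None:
                self.uids.append(value)


def candidate_uids(turn: InMemoryTurn, uid_key: str) -> List[str]:
    """Return uids found in the HTML that also have a bounding box."""
    collector = _UidCollector(uid_key)
    collector.feed(turn.html)
    collector.close()
    return [uid for uid in collector.uids if uid in turn.bboxes]


# Hypothetical page snapshot: element "c" has no bounding box, so it is dropped.
turn = InMemoryTurn(
    html='<div data-uid="a"><button data-uid="b">Go</button><input data-uid="c"></div>',
    bboxes={
        "a": {"x": 0, "y": 0, "width": 100, "height": 50},
        "b": {"x": 10, "y": 10, "width": 40, "height": 20},
    },
    viewport_height=720,
    viewport_width=1280,
)
uids = candidate_uids(turn, uid_key="data-uid")
```

Once an object like this returns the candidates you expect, the same fields can be wired into the subclasses above.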

19 replies

xhluca May 7, 2024
Maintainer

I am working on adding support for running DMR on the fly alongside an action model. However, it's a WIP and a bit messy atm; releasing it as is would probably be more confusing than helpful, so I will announce it once it works.

For now, I think dmr/processing.py and dmr/eval.py are a good starting point for running DMR dynamically. llama/processing.py would then be used to process the candidate elements output by DMR in dmr/eval.py (instead of saving them as JSON, you just use the dictionary output as is).

georgel May 9, 2024

Looking forward to the DMR on the fly support! Thank you

matbee-eth May 9, 2024

I have the beginnings of support for Weblinx with Playwright. It's not perfect and I'm still hacking away at it. Hacking, as in, with a figurative hatchet.
https://github.com/matbee-eth/WeblinxRePlaywrightBrowser

tupini07 Jun 13, 2024

Hi @xhluca, I'm curious if there is any news on on-the-fly DMR? I'm working on creating something like https://github.com/ddupont808/GPT-4V-Act and was really interested in leveraging the code you've already written to get candidates, so as to minimize the number of options the model has to deal with!

xhluca Jun 13, 2024
Maintainer

@tupini07 You asked just in time! I just (a few mins ago) included the pre-release for our new experimental API designed to simplify the pipeline for webllama model inference: https://github.com/McGill-NLP/webllama/releases/tag/0.1.0pre1

Since it's experimental, it is not guaranteed to work in a way you expect. In fact, it might fail in subtle ways sometimes... And there's no guarantee it will work well on scenarios that differ from training or test sets. So if you find things to improve, please open an issue & PRs are welcome!

Here's what the code should look like now (much simpler than before), as per the docs/README.md:

import webllama.experimental as wa

# We will initialize our processor, which helps us prepare the input for action model
proc = wa.processing.WebTurnProcessor(tokenizer=act_model.tokenizer)

# Step 1: prepare query, run DMR and prepare retrieved candidates
query_dmr = proc.prepare_dmr_query(action_history, state)
elems = proc.prepare_dmr_elements(state=state)
scores = wa.functions.compute_dmr_scores(dmr, query_dmr, elems)
top_cands, cands_uids = wa.functions.get_top_dmr_candidates(elems, scores, proc.top_k)
cands_str = proc.prepare_candidates(top_cands)

# Step 2: format candidates, utterances, state, and previous actions
html = proc.prepare_state_html(state.html, cands_uids=cands_uids)
utterances = proc.prepare_instructor_chat(action_history, state)
prev_actions = proc.prepare_prev_actions(action_history, state)

# Let's use the default system prompt template, but you can also use your own
sys_prompt_template: str = proc.default_system_prompt_template
sys_prompt = sys_prompt_template.format(
    html=html,
    utterances=utterances,
    candidates=cands_str,
    # ...
)
input_chat = proc.convert_to_chat_list(sys_prompt, prev_actions)

# Use your tokenizer to convert the input to string and pass it to the action model
input_str = act_model.tokenizer.apply_chat_template(input_chat, tokenize=False)
output = act_model(input_str, ...)
pred_action = proc.process_action_model_output(output, state.index, elems)
a = wa.classes.Action.from_dict(pred_action)
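For intuition about the top-k step in the pipeline, here is a standalone sketch of picking the highest-scoring candidates. It is not the implementation of `get_top_dmr_candidates`; the `top_k_candidates` helper and the element dicts with a `uid` key are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def top_k_candidates(
    elems: Sequence[dict], scores: Sequence[float], k: int
) -> Tuple[List[dict], List[str]]:
    """Return the k highest-scoring elements and their uids, best first."""
    if len(elems) != len(scores):
        raise ValueError("need exactly one score per element")
    if k <= 0:
        return [], []
    # sorted() is stable, so elements with equal scores keep their input order
    order = sorted(range(len(elems)), key=lambda i: scores[i], reverse=True)[:k]
    top = [elems[i] for i in order]
    return top, [e["uid"] for e in top]


# Hypothetical elements and scores.
example_elems = [{"uid": "a"}, {"uid": "b"}, {"uid": "c"}, {"uid": "d"}]
example_scores = [0.2, 0.9, 0.5, 0.9]
top, top_uids = top_k_candidates(example_elems, example_scores, k=2)
```

Keeping only a handful of candidates is what keeps the action model's prompt short, at the cost of missing the target when the retriever ranks it below the cutoff.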
