<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>PASTA: Multiple Object Tracking</title>
<link href="https://cdnjs.cloudflare.com/ajax/libs/bootstrap/5.3.2/css/bootstrap.min.css" rel="stylesheet">
<link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.5.1/css/all.min.css" rel="stylesheet">
<link rel="apple-touch-icon" sizes="180x180" href="icon/apple-touch-icon.png">
<link rel="icon" type="image/png" sizes="32x32" href="icon/favicon-32x32.png">
<link rel="icon" type="image/png" sizes="16x16" href="icon/favicon-16x16.png">
<link rel="manifest" href="icon/site.webmanifest">
<style>
.btn-paper {
margin: 0 10px;
}
.disabled {
pointer-events: none;
opacity: 0.6;
}
.hero-section {
padding: 4rem 0;
background-color: #f8f9fa;
}
.authors {
color: #666;
font-size: 1.1rem;
}
.table-results {
font-size: 0.9rem;
}
.table-results th {
background-color: #f8f9fa;
}
.best-result {
font-weight: bold;
color: #198754;
}
</style>
</head>
<body>
<!-- Hero Section -->
<div class="hero-section">
<div class="container">
<div class="row justify-content-center text-center">
<div class="col-md-10">
<h1 class="display-4 mb-3">PASTA: Is Multiple Object Tracking a Matter of Specialization?</h1>
<h2 class="h4 mb-4 text-muted">Deep Modules Compositionality Meets Multiple Object Tracking</h2>
<p class="authors mb-4">
Gianluca Mancusi, Mattia Bernardi, Aniello Panariello, Angelo Porrello, Simone Calderara, Rita
Cucchiara
</p>
<div class="buttons mb-4">
<button class="btn btn-primary btn-paper disabled">
<i class="fas fa-file-alt me-2"></i>NeurIPS 2024 Paper
</button>
<a href="https://arxiv.org/abs/2411.00553" class="btn btn-danger btn-paper"><i class="fas fa-archive me-2"></i>arXiv Paper</a>
<button class="btn btn-dark btn-paper disabled">
<i class="fab fa-github me-2"></i>Code (coming soon)
</button>
</div>
</div>
</div>
</div>
</div>
<!-- Main Content -->
<div class="container my-5">
<!-- Abstract -->
<div class="row justify-content-center mb-5">
<div class="col-md-10">
<h3 class="mb-3">Abstract</h3>
<p class="lead">
End-to-end transformer-based trackers have achieved remarkable performance on most human-related
datasets. However, training these trackers in heterogeneous scenarios poses significant challenges,
including negative interference (where the model learns conflicting scene-specific parameters)
and limited domain generalization, which often necessitates expensive fine-tuning to adapt the
models to new domains. In response to these challenges, we introduce the PArameter-efficient
Scenario-specific Tracking Architecture (PASTA), a novel framework that combines Parameter-Efficient
Fine-Tuning (PEFT) and Modular Deep Learning (MDL). Specifically, we define key scenario attributes
(e.g., camera viewpoint, lighting condition) and train specialized PEFT modules for each attribute.
These expert modules are then combined in parameter space, enabling systematic generalization to
new domains without increasing inference time. Extensive experiments on MOTSynth, along with
zero-shot evaluations on MOT17 and PersonPath22, demonstrate that a neural tracker built from
carefully selected modules surpasses its monolithic counterpart.
</p>
</div>
</div>
<!-- Model Overview -->
<div class="row justify-content-center mb-5">
<div class="col-md-10">
<h3 class="mb-3">Model Overview</h3>
<div class="text-center mb-4">
<img src="model_full.png" alt="PASTA Model Architecture" class="img-fluid rounded">
<p class="text-muted mt-2">Figure 1: Overview of the PASTA architecture</p>
</div>
</div>
</div>
<!-- Key Features -->
<div class="row justify-content-center mb-5">
<div class="col-md-10">
<h3 class="mb-3">Key Features</h3>
<div class="row">
<div class="col-md-6 mb-4">
<h4 class="h5">Problem Statement: Domain Shifts</h4>
<p>The limited availability of annotated data often leads end-to-end trackers to overfit to
their training sets, making them vulnerable to domain shifts. With limited data, the model
struggles to generalize, especially when negative interference arises between scenarios with
differing attributes.</p>
<div class="text-center mb-4">
<img src="attributes.png" alt="Scenario attributes across MOTSynth, MOT17, and PersonPath22" class="img-fluid rounded shadow">
<p class="text-muted mt-2">Figure 2: Examples of scenario attributes across MOTSynth, MOT17, and PersonPath22</p>
</div>
</div>
<div class="col-md-6 mb-4">
<h4 class="h5">Solution: Attribute-specific PEFT modules</h4>
<p>We train a parameter-efficient module for each attribute, yielding a set of specialized
experts. During inference, an operator selects the expert modules that match each scenario, enabling
better adaptation to specific tracking conditions.</p>
<div class="text-center mb-4">
<img src="model.png" alt="Overview of the PASTA modular framework" class="img-fluid rounded shadow">
<p class="text-muted mt-2">Figure 3: Overview of our modular framework</p>
</div>
</div>
</div>
</div>
</div>
<!-- Results -->
<div class="row justify-content-center mb-5">
<div class="col-md-10">
<h3 class="mb-3">Experimental Results</h3>
<p>Our experiments on MOTSynth show that reducing negative interference improves association
metrics. Zero-shot evaluations on real-world datasets (MOT17 and PersonPath22; Tab. 1) illustrate the improved
generalization achieved by composing expert modules.</p>
<h4 class="mt-4">Zero-shot Results</h4>
<div class="text-center mb-4">
<img src="zeroshot.png" alt="Zero-shot results on MOT17/PP22" class="img-fluid rounded shadow">
<p class="text-muted mt-2">Table 1: Zero-shot results on the MOT17 and PersonPath22 datasets</p>
</div>
<p>We show that, within an in-domain scenario, composing only the modules selected through expert knowledge yields superior results.<br>
Conversely, under domain shift, leveraging all modules while assigning lower weights to the unselected ones lets the model retain their knowledge rather than discard it (Tab. 2).</p>
<h4 class="mt-4">Ablation Results</h4>
<div class="text-center mb-4">
<img src="ablation.png" alt="Ablation results" class="img-fluid rounded shadow">
<p class="text-muted mt-2">Table 2: Ablation results on the MOTSynth (in-domain) and MOT17 datasets</p>
</div>
</div>
</div>
<!-- Acknowledgements -->
<div class="row justify-content-center">
<div class="col-md-10">
<h3 class="mb-3">Acknowledgements</h3>
<p>This research was supported by the Italian Ministry for University and Research through the PNRR
project ECOSISTER ECS 00000033 CUP E93C22001100001 and by the EU Horizon project "ELIAS - European
Lighthouse of AI for Sustainability" (No. 101120237).</p>
</div>
</div>
</div>
<script src="https://cdnjs.cloudflare.com/ajax/libs/bootstrap/5.3.2/js/bootstrap.bundle.min.js"></script>
</body>
</html>