Comparison with Humans
The next table shows the comparison between the AWS-D model and the HumanPCT50 model, using the s-AUC and s-NSS values for all the videos in the CITIUS-VDB. The HumanPCT50 model represents the mean behaviour of half of the subjects included in the database. It was obtained by randomly selecting half of the fixations of the database subjects at each instant.
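For illustration, a minimal sketch of how such a baseline could be built. The per-frame fixation format, the Gaussian smoothing, and the function name are assumptions for this sketch, not taken from the article:

```python
# Sketch of a HumanPCT50-style baseline: at each frame, keep a random half
# of the subjects' fixations and smooth them into a density map.
# Assumptions: fixations come as per-frame lists of (x, y) pixel coordinates.
import numpy as np
from scipy.ndimage import gaussian_filter

def human_pct50_maps(fixations_per_frame, height, width, sigma=20, rng=None):
    """Return one smoothed fixation map per frame, built from half the fixations."""
    rng = rng or np.random.default_rng()
    maps = []
    for fixations in fixations_per_frame:            # list of (x, y) tuples
        half = rng.choice(len(fixations), size=len(fixations) // 2,
                          replace=False)             # random half of the subjects
        fmap = np.zeros((height, width))
        for i in half:
            x, y = fixations[i]
            fmap[int(y), int(x)] = 1.0               # mark the selected fixation
        maps.append(gaussian_filter(fmap, sigma))    # smooth into a density map
    return maps
```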
Comparison with other Models
We have studied the capability of the AWS-D model to predict human fixations. Here we show the agreement between our model and other state-of-the-art models on different databases. The first table shows the behavior of the AWS-D model, evaluated under two different metrics (s-AUC and s-NSS), on our CITIUS database, which includes 72 videos. The animated GIF makes the good visual agreement between the two metrics (s-AUC and s-NSS) on this database easy to see. This can be summarized as follows: easy videos are easy for almost all the tested algorithms, and both metrics agree; hard videos are hard for almost all the tested algorithms, and both metrics agree on which parts of the videos are more difficult. For the next comparison we have selected the four best-ranked models in the comparison on our CITIUS-VDB, detailed on the right side of this figure.
In the next figure, on each chart, each row represents one of the 72 videos, and the X axis is time (frame number). Red regions on the charts correspond to a good match between the model prediction and the human fixation positions (s-AUC ≈ 1 or large s-NSS), while blue regions indicate a poor match (s-AUC ≈ 0.5 or s-NSS ≈ 0). The algorithms are presented in decreasing order of s-AUC/s-NSS. Each animation frame corresponds to one model, playing in an infinite loop, so the overall proportion of red decreases as the animation progresses. Note that the blue regions at the beginning of almost all the videos are mainly due to the central bias of the subjects, who were fixating at the center of the screen immediately before the video started. The blue regions at the right side of the charts are zero values after the end of the shorter videos, since the videos included in this database have different lengths.
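As a reference for how these per-frame values could be computed, here is a minimal sketch of the shuffled AUC, where fixations borrowed from other videos of the database serve as negatives. The data layout and function name are hypothetical:

```python
# Sketch of a shuffled-AUC (s-AUC) value for one frame: saliency values at the
# true fixations are positives; values at fixations taken from other videos
# are negatives, which discounts the central bias.
import numpy as np
from sklearn.metrics import roc_auc_score

def shuffled_auc(saliency, fixations, other_fixations):
    """s-AUC for one frame: 1 = perfect prediction, 0.5 = chance."""
    pos = np.array([saliency[y, x] for x, y in fixations])        # true fixations
    neg = np.array([saliency[y, x] for x, y in other_fixations])  # shuffled ones
    labels = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
    scores = np.concatenate([pos, neg])
    return roc_auc_score(labels, scores)
```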
The next table shows the comparison of the AWS-D model with other dynamic reference models. All of them were tested using two different metrics: s-AUC and s-NSS. These data have been published in IEEE Transactions on Pattern Analysis and Machine Intelligence (info). Each row of the charts represents one of the videos included in the CITIUS database, while each column corresponds to the evaluated model. Each cell shows the mean value of the metric: the mean s-AUC value in the two upper charts, and the mean s-NSS value in the two lower charts.
The previous charts can be grouped by model by averaging the metric over all the videos, yielding a global mean s-AUC or s-NSS. In the next table, the left figures show the models ordered by increasing ROC value (higher is better). The right figures show the same models applied over two different subsets of the videos, labeled as the synthetic (SC) and natural (RC) categories. The human model, represented with the red bar, gets the best results, as expected. AWS-D occupies the second position, both in the global results and in both categories. The shuffled metrics remove the effect of the central bias, as can be seen in the s-AUC and s-NSS values for both the Gaussian and Random models.
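A minimal sketch of how the shuffled NSS could discount that bias, assuming s-NSS is defined as the NSS at the true fixations minus the NSS at fixations borrowed from other videos. This definition and all names are assumptions of the sketch, not taken from the article:

```python
# Sketch of a shuffled-NSS (s-NSS) value for one frame: the saliency map is
# z-scored, and the mean value at shuffled fixations is subtracted from the
# mean value at true fixations, so a center-biased map scores near zero.
import numpy as np

def shuffled_nss(saliency, fixations, other_fixations):
    """s-NSS for one frame: ~0 = chance, higher is better."""
    z = (saliency - saliency.mean()) / (saliency.std() + 1e-12)  # z-score map
    nss_true = np.mean([z[y, x] for x, y in fixations])          # true fixations
    nss_shuf = np.mean([z[y, x] for x, y in other_fixations])    # shuffled ones
    return nss_true - nss_shuf
```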
Saliency map samples
In the next table we also include the most representative animated GIFs for the CITIUS-VDB videos (the animated GIFs on the web only include one out of every ten frames, to save disk space and minimize unnecessary data processing). The left side of the table shows the original videos, the second column shows the human responses, and the remaining columns present the saliency maps obtained with the tested dynamic saliency models. The models are ordered from left to right in descending global ROC value, as can be checked in the tables of the published IEEE article (info). From top to bottom, the videos are also ordered by the global ROC value of each video. The first table shows the best three videos vs. the worst three among the Natural videos of the CITIUS-VDB recorded with a static camera.
Natural with Static Camera

Original | Human | AWS-D | GBVSm | CIORFM | SUNDAY | Surprise | SEOD | CIOFM | PQFT | DCOF | ESA-D
---|---|---|---|---|---|---|---|---|---|---|---
{…} | {…} | {…} | {…} | {…} | {…} | {…} | {…} | {…} | {…} | {…} | {…}
This second table shows the best three videos vs. the worst three among the Natural videos of the CITIUS-VDB recorded with a dynamic camera.
Natural with Dynamic Camera

Original | Human | AWS-D | GBVSm | CIORFM | SUNDAY | Surprise | SEOD | CIOFM | PQFT | DCOF | ESA-D
---|---|---|---|---|---|---|---|---|---|---|---
{…} | {…} | {…} | {…} | {…} | {…} | {…} | {…} | {…} | {…} | {…} | {…}
This third table shows the best three videos vs. the worst three among the Synthetic videos of the CITIUS-VDB.
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. You can use this dataset in your publications as long as you include a citation to the reference on this page. When including a link to this dataset, please link to this page instead of linking the file directly.