WS-DAN: Weakly Supervised Data Augmentation Network for Fine-Grained Visual Classification

Tags: deep learning, data augmentation, fine-grained classification, paper notes

See Better Before Looking Closer: Weakly Supervised Data Augmentation Network for Fine-Grained Visual Classification
Paper PDF

Abstract

In practice, random data augmentation, such as random image cropping, is inefficient and may introduce uncontrolled background noise. In this paper, they propose the Weakly Supervised Data Augmentation Network (WS-DAN) to exploit the potential of data augmentation. Specifically, for each training image, attention maps are first generated by weakly supervised learning to represent the object's discriminative parts. The image is then augmented under the guidance of these attention maps, via attention cropping and attention dropping. The proposed WS-DAN improves classification accuracy in two ways. In the first stage, the model can "see better", since features of more discriminative parts are extracted. In the second stage, the attention regions provide an accurate location of the object, which lets the model "look closer" at the object and further improves performance.

In summary, the main contributions of this work are:

  1. They propose weakly supervised attention learning to generate attention maps that represent the spatial distribution of the object's discriminative parts, and a BAP module that accumulates the part features into a whole-object feature.
  2. Based on the attention maps, they propose attention-guided data augmentation, including attention cropping and attention dropping, to improve the efficiency of data augmentation. Attention cropping randomly crops and resizes one attended part to enhance local feature representation. Attention dropping randomly erases one attended region from the image to encourage the model to extract features from multiple discriminative parts.
  3. They use the attention maps to accurately locate the whole object and enlarge it, further improving classification accuracy.

Innovations

  1. Bilinear Attention Pooling (BAP)
  2. Attention Regularization
  3. Attention-guided Data Augmentation

Pipeline

[Figure: overview of the WS-DAN training pipeline]
The training process can be divided into two parts: Weakly Supervised Attention Learning and Attention-guided Data Augmentation.

Weakly Supervised Attention Learning

Spatial Representation

Attention maps $A$ are obtained from the feature maps $F$ by a convolutional function $f(\cdot)$, as in Equ 1. Each attention map $A_k$ represents one of the object's parts or visual patterns, such as the head of a bird, the wheel of a car, or the wing of an aircraft. The attention maps are later used to augment the training data.

$$A = f(F) = \bigcup_{k=1}^{M} A_k \tag{1}$$
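The attention-map function $f(\cdot)$ can be sketched in NumPy. Modeling it as a $1\times1$ convolution with a ReLU is an assumption for illustration; the paper only requires some convolutional function:

```python
import numpy as np

def attention_maps(F, W):
    """Sketch of f(.) in Equ 1 as a 1x1 convolution.
    F: (C, H, W) feature maps; W: (M, C) conv weights.
    Returns A of shape (M, H, W), one attention map per object part."""
    A = np.tensordot(W, F, axes=([1], [0]))  # linear mix of channels = 1x1 conv
    return np.maximum(A, 0)                  # ReLU keeps the maps non-negative

F = np.random.rand(8, 4, 4)   # C=8 channels, 4x4 spatial grid
W = np.random.rand(3, 8)      # M=3 attention maps
A = attention_maps(F, W)
print(A.shape)  # (3, 4, 4)
```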

Bilinear Attention Pooling (BAP)

[Figure: Bilinear Attention Pooling]
They propose Bilinear Attention Pooling (BAP) to extract features from the parts represented by the attention maps. The feature maps $F$ are element-wise multiplied by each attention map $A_k$ to generate $M$ part feature maps $F_k$, as shown in Equ 2.

$$F_k = A_k \odot F \quad (k = 1, 2, \ldots, M) \tag{2}$$

Then, a feature extraction function $g(\cdot)$, such as global average pooling (GAP), global max pooling (GMP), or convolution, extracts a discriminative local feature: the $k$-th attention feature $f_k$.

$$f_k = g(F_k) \tag{3}$$

The object's feature is represented by the part feature matrix $P$, obtained by stacking the part features $f_k$:

$$P = \begin{pmatrix} g(a_1 \odot F) \\ g(a_2 \odot F) \\ \vdots \\ g(a_M \odot F) \end{pmatrix} = \begin{pmatrix} f_1 \\ f_2 \\ \vdots \\ f_M \end{pmatrix} \tag{4}$$
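Equ 2-4 can be sketched together in a few lines of NumPy. Choosing GAP for $g(\cdot)$ and the shapes below are illustrative assumptions:

```python
import numpy as np

def bilinear_attention_pooling(F, A):
    """Sketch of BAP. F: (C, H, W) feature maps; A: (M, H, W) attention maps.
    Returns the part feature matrix P of shape (M, C)."""
    parts = []
    for k in range(A.shape[0]):
        Fk = A[k][None, :, :] * F    # element-wise multiply, broadcast over C (Equ 2)
        fk = Fk.mean(axis=(1, 2))    # GAP as g(.), one feature per channel  (Equ 3)
        parts.append(fk)
    return np.stack(parts)           # stack part features into P            (Equ 4)

F = np.random.rand(8, 4, 4)   # C=8
A = np.random.rand(3, 4, 4)   # M=3
P = bilinear_attention_pooling(F, A)
print(P.shape)  # (3, 8)
```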

Attention Regularization

For each fine-grained category, they expect attention map $A_k$ to consistently represent the same $k$-th object part. They penalize the variance of features belonging to the same part, so that each part feature $f_k$ is pulled toward a global feature center $c_k$ and $A_k$ is activated on the same $k$-th part across images. The loss $L_A$ is given in Equ 5.

$$L_A = \sum_{k=1}^{M} \left\| f_k - c_k \right\|_2^2 \tag{5}$$

$c_k$ is initialized to zero and updated by the moving average in Equ 6:

$$c_k \leftarrow c_k + \beta \left( f_k - c_k \right) \tag{6}$$
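A minimal sketch of the regularization loss (Equ 5) and the center update (Equ 6); the value of $\beta$ here is an illustrative assumption:

```python
import numpy as np

def attention_reg_loss(P, centers):
    """L_A = sum_k ||f_k - c_k||_2^2 over the M part features (Equ 5)."""
    return float(((P - centers) ** 2).sum())

def update_centers(P, centers, beta=0.05):
    """c_k <- c_k + beta * (f_k - c_k): EMA pull toward current features (Equ 6)."""
    return centers + beta * (P - centers)

P = np.ones((3, 8))          # M=3 part features of dimension 8
c = np.zeros((3, 8))         # centers initialized to zero
loss = attention_reg_loss(P, c)
print(loss)  # 24.0  (3 * 8 unit-distance terms)
c = update_centers(P, c, beta=0.1)
```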

Attention-guided Data Augmentation

Random image cropping is inefficient: a high percentage of random crops contain background noise, which lowers training efficiency, degrades the quality of the extracted features, and can cancel out the benefits of augmentation. With attention maps as a guide, the cropped images focus on the target.

Augmentation Map

For each training image, they randomly choose one of its attention maps $A_k$ to guide the augmentation process, and min-max normalize it into the $k$-th augmentation map $A_k^*$:

$$A_k^* = \frac{A_k - \min(A_k)}{\max(A_k) - \min(A_k)} \tag{7}$$

Attention Cropping

The crop mask $C_k$ is obtained from $A_k^*$ by setting each element $A_k^*(i,j)$ greater than the threshold $\theta_c$ to 1, and the others to 0, as in Equ 8:

$$C_k(i,j) = \begin{cases} 1, & \text{if } A_k^*(i,j) > \theta_c \\ 0, & \text{otherwise} \end{cases} \tag{8}$$

They then find a bounding box $B_k$ that covers the whole positive region of $C_k$, crop this region from the raw image, and enlarge it as the augmented input.
[Figure: attention cropping]
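Equ 7-8 plus the bounding-box step can be sketched as follows; the threshold value and the toy attention map are illustrative:

```python
import numpy as np

def attention_crop_bbox(Ak, theta_c=0.5):
    """Normalize A_k (Equ 7), threshold into crop mask C_k (Equ 8), and
    return the bounding box (r0, r1, c0, c1) covering the positive region."""
    Ak_norm = (Ak - Ak.min()) / (Ak.max() - Ak.min() + 1e-8)
    Ck = (Ak_norm > theta_c).astype(np.uint8)
    rows, cols = np.nonzero(Ck)              # coordinates of positive cells
    box = (int(rows.min()), int(rows.max()) + 1,
           int(cols.min()), int(cols.max()) + 1)
    return Ck, box

Ak = np.zeros((6, 6))
Ak[2:4, 1:5] = 1.0           # toy "attended" region
Ck, box = attention_crop_bbox(Ak, theta_c=0.5)
print(box)  # (2, 4, 1, 5) -- crop this window from the raw image and enlarge it
```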

Attention Dropping

To encourage the attention maps to represent multiple discriminative object parts, they propose attention dropping. Specifically, the drop mask $D_k$ is obtained by setting each element $A_k^*(i,j)$ greater than the threshold $\theta_d$ to 0, and the others to 1, as shown in Equ 9:

$$D_k(i,j) = \begin{cases} 0, & \text{if } A_k^*(i,j) > \theta_d \\ 1, & \text{otherwise} \end{cases} \tag{9}$$
[Figure: attention dropping]
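Attention dropping (Equ 9) is the complement of cropping: the attended region is erased instead of kept. A minimal sketch, with an illustrative threshold and toy inputs:

```python
import numpy as np

def attention_drop(image, Ak, theta_d=0.5):
    """Build drop mask D_k (Equ 9): 0 where the normalized attention exceeds
    theta_d, 1 elsewhere; multiplying erases the most-attended part."""
    Ak_norm = (Ak - Ak.min()) / (Ak.max() - Ak.min() + 1e-8)
    Dk = (Ak_norm <= theta_d).astype(image.dtype)
    return image * Dk

img = np.ones((4, 4))
Ak = np.zeros((4, 4))
Ak[1:3, 1:3] = 1.0           # toy attended 2x2 region
out = attention_drop(img, Ak, theta_d=0.5)
print(out.sum())  # 12.0 -- the 4 attended pixels have been erased
```

Erasing the strongest part forces the network to find other discriminative parts in later iterations.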

Object Localization and Refinement

At test time, after the model outputs the coarse-stage classification result and the corresponding attention maps for the raw image, the whole object region can be predicted and enlarged, and the same network predicts a fine-grained result on it. The object map $A_m$ that indicates the object's location is computed by Equ 10:

$$A_m = \frac{1}{M} \sum_{k=1}^{M} A_k \tag{10}$$
The final classification result averages the coarse-grained and fine-grained predictions. The detailed coarse-to-fine procedure is described in the algorithm below:
[Figure: coarse-to-fine prediction algorithm]
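The coarse-to-fine step (Equ 10 plus the final averaging) can be sketched as below; the toy attention maps and probability vectors are illustrative:

```python
import numpy as np

def object_map(A):
    """A_m = (1/M) * sum_k A_k (Equ 10): average all attention maps to
    localize the whole object for the fine-grained second pass."""
    return A.mean(axis=0)

def fuse_predictions(p_coarse, p_fine):
    """Final result: average of coarse- and fine-grained class probabilities."""
    return (p_coarse + p_fine) / 2.0

A = np.stack([np.eye(4), np.ones((4, 4))])   # M=2 toy attention maps
Am = object_map(A)                           # locate, then crop+enlarge+re-predict
p = fuse_predictions(np.array([0.9, 0.1]),   # coarse-stage probabilities
                     np.array([0.7, 0.3]))   # fine-stage probabilities
print(p)  # [0.8 0.2]
```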

Experiments

Ablation:

[Table: ablation results]

Comparison with random data augmentation

[Figures and table: comparison with random data augmentation]

Comparison with State-of-the-Art Methods

[Tables: comparison with state-of-the-art methods]

Copyright notice: This is an original article by the blogger, licensed under CC 4.0 BY-SA. Please include the original source link and this notice when reposting.
Original link: https://blog.csdn.net/language_zcx/article/details/105905575
