TGIF here just means the GIF dataset; see https://github.com/fanchenyou/HME-VideoQA/tree/master/gif-qa for how to obtain the feats, vocabulary, and datasets.
No module named 'colorlog'
pip install colorlog
No module named 'block'
pip install block.bootstrap.pytorch
ordinal not in range(128)
Fiddled with UTF encoding settings for ages with no luck; after reverting everything it suddenly ran fine.
AttributeError: Can't get attribute '_init_fn' on <module '__main__' (built-in)>
Seems to be some multiprocessing issue: the DataLoader worker processes presumably cannot pickle the locally defined _init_fn; see the sketch below.
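The usual cause of this (my guess) is that _init_fn is defined inside another function, so DataLoader worker processes (especially on Windows, which spawns them) cannot pickle it. A minimal sketch of the two common workarounds, assuming _init_fn is the worker_init_fn handed to the DataLoader:
import numpy as np
from torch.utils.data import DataLoader
# Workaround 1: define the worker init function at module top level so it can be pickled.
def _init_fn(worker_id):
    np.random.seed(worker_id)  # per-worker seeding; the real body is whatever the repo defines
# Workaround 2: skip worker processes entirely.
# train_dataloader = DataLoader(full_dataset, batch_size=128, num_workers=0)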
Fed up, giving up for now.
——————————————————————————
Even though it won't run, might as well study it anyway.
——————————————————————————
Got it running after all, just a bit slow; time to work out how it works.
+--------------+------------------+
| Parameter | Value |
+==============+==================+
| Ablation | none |options: ['none', 'gcn', 'global', 'local', 'only_local']
+--------------+------------------+
| Batch size | 128 |
+--------------+------------------+
| Birnn | 0 |whether to use a bidirectional RNN
+--------------+------------------+
| Change lr | none |whether to change the learning rate; the default none means no change
+--------------+------------------+
| Checkpoint | Count_4.092.pth |pretrained checkpoint stored in saved_models\MMModel
+--------------+------------------+
| Cycle beta | 0.010 |
+--------------+------------------+
| Dropout | 0.300 |
+--------------+------------------+
| Fusion type | coattn |options: ['none', 'coattn', 'single_visual', 'single_semantic', 'coconcat', 'cosiamese']
+--------------+------------------+
| Gcn layers | 2 |+1 is added internally, so the default number of GCN layers is 3
+--------------+------------------+
| Hidden size | 512 |
+--------------+------------------+
| Lr | 0.000 |printed with three decimals; the actual value is 1e-4 (see the Adam printout below)
+--------------+------------------+
| Lr list | [10, 20, 30, 40] |
+--------------+------------------+
| Max epoch | 100 |
+--------------+------------------+
| Max n videos | 100000 |
+--------------+------------------+
| Model | 7 |
+--------------+------------------+
| Momentum | 0.900 |
+--------------+------------------+
| Num workers | 1 |
+--------------+------------------+
| Prefetch | none |options: [none, nvidia, background], wired to the nvidia_prefetcher and BackgroundGenerator classes respectively
+--------------+------------------+
| Q max length | 35 |
+--------------+------------------+
| Rnn layers | 1 |
+--------------+------------------+
| Save | False |whether to save the model; pass --save for True
+--------------+------------------+
| Save adj | False |whether to save the adjacency matrix; pass --save_adj for True
+--------------+------------------+
| Save path | ./saved_models/ |
+--------------+------------------+
| Server | 1080ti |options: ['780', '1080ti', '1080']
+--------------+------------------+
| Task | Count |options: [Count, Action, FrameQA, Trans]
+--------------+------------------+
| Test | False |False means training; pass --test for testing
+--------------+------------------+
| Tf layers | 1 |
+--------------+------------------+
| Two loss | 0 |
+--------------+------------------+
| V max length | 80 |
+--------------+------------------+
| Val ratio | 0.100 |
+--------------+------------------+
| Weight decay | 0 |
+--------------+------------------+
Additional derived settings
data_path | '/home/jp/data/tgif-qa/data' |only set like this when server is '780'; so what about the default 1080ti? The paths below still depend on it.
feat_dir | data_path+'feats'
vc_dir | data_path+'Vocabulary'
df_dir | data_path+'dataset'
model_name | 'Count' |i.e. the task
pin_memory | False
dataset | 'tgif_qa'
log | './logs'
val_epoch_step | 1
two_loss | False |True if the Two loss value above is > 0, otherwise False
birnn | False |same rule as above, from Birnn
save_model_path | save_path + 'MMModel/'
data_utils' dataset creates two TGIFQA instances: full_dataset (length 26839) and test_dataset (length 3554); the only difference is dataset_name='train' for the former and 'test' for the latter.
torch.utils.data.random_split then splits out a training set of 24156 and a validation set of 2683, and torch.utils.data builds three DataLoader instances: train_dataloader, val_dataloader, test_dataloader, as sketched below.
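A minimal sketch of how this split presumably looks (the TGIFQA constructor arguments and the DataLoader keyword values are my assumptions; the lengths are the ones quoted above):
from torch.utils.data import DataLoader, random_split
full_dataset = TGIFQA(dataset_name='train')   # 26839 samples
test_dataset = TGIFQA(dataset_name='test')    # 3554 samples
val_len = int(len(full_dataset) * 0.1)        # val_ratio = 0.1 -> 2683
train_dataset, val_dataset = random_split(full_dataset, [len(full_dataset) - val_len, val_len])  # 24156 / 2683
train_dataloader = DataLoader(train_dataset, batch_size=128, shuffle=True, num_workers=1, pin_memory=False)
val_dataloader = DataLoader(val_dataset, batch_size=128, shuffle=False)
test_dataloader = DataLoader(test_dataset, batch_size=128, shuffle=False)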
More derived settings
resnet_input_size | 2048
c3d_input_size | 4096
text_embed_size | 300 |train_dataset.dataset.GLOVE_EMBEDDING_SIZE
answer_vocab_size | None
word_matrix | ndarray of shape (2423, 300) |train_dataset.dataset.word_matrix
voc_len | 2423
VOCABULARY_SIZE = train_dataset.dataset.n_words=2423
Since the current task is 'Count', an nn.MSELoss() mean-squared-error criterion is created, and best_val_acc is initialized to -100.
The training model is built from LSTMCrossCycleGCNDropout.
for ii, data in enumerate(train_dataloader):
(At first I couldn't tell what inside train_dataloader actually holds the data and how ii and data get pulled out of it; see the sketch below.)
data is a list of 6 tensors: [128, 80, 2048] float32, [128, 80, 4096] float32, [128] int64, [128, 35] int64, [128] int64, [128] float32
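What enumerate actually yields, roughly: the DataLoader collates the 128 per-sample tuples into one list of batched tensors (that list is data), and ii is just the batch index. A sketch, where the unpacking order is my guess based on the forward_count inputs listed further down:
for ii, data in enumerate(train_dataloader):
    resnet_inputs, c3d_inputs, video_length, sentence_inputs, question_length, answers = data
    # resnet_inputs   [128, 80, 2048] float32
    # c3d_inputs      [128, 80, 4096] float32
    # video_length    [128]           int64
    # sentence_inputs [128, 35]       int64
    # question_length [128]           int64
    # answers         [128]           float32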
Since change_lr is none, the following optimizer is created:
Adam (
Parameter Group 0
amsgrad: False
betas: (0.9, 0.999)
eps: 1e-08
lr: 0.0001
weight_decay: 0
)
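Presumably created with something like the following; the MultiStepLR line is purely my guess about what change_lr and the Lr list are for:
import torch.optim as optim
optimizer = optim.Adam(model.parameters(), lr=1e-4, weight_decay=0)  # matches the printout above
# guess: when change_lr is not 'none', the Lr list [10, 20, 30, 40] would feed a schedule such as
# scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[10, 20, 30, 40], gamma=0.1)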
LSTMCrossCycleGCNDropout(
reads from train_dataloader
#sentence_inputs(batch_size, sentence_len, 1)
#video_inputs(batch_size, frame_num, video_feature)
Since the task is 'Count', forward_count is executed first
Inputs:
resnet_inputs [128, 80, 2048]
c3d_inputs [128, 80, 4096]
video_length 128
sentence_inputs [128, 35]
question_length 128
answers 128
#create all_adj [128,115,115]
#model_block produces out, adj
###Question encoding: inputs
# sentence_inputs [128, 35]
# question_length 128
(sentence_encoder): SentenceEncoderRNN(
(embedding): Embedding(2423, 300, padding_idx=0)
#gives embedded [128,35,300]
(dropout): Dropout(p=0.3, inplace=False)
(upcompress_embedding): Linear(in_features=300, out_features=512, bias=False)
(relu)
#[128,35,300]x[300,512]->[128,35,512]
if variable_lengths:
nn.utils.rnn.pack_padded_sequence
#inputs:
# embedded [128,35,512] (after the upcompress above; the packed data below is 512-dim)
# input_lengths 128
#outputs:
# embedded: a PackedSequence of 4 tensors: data [1269,512], batch_sizes of length 15, and sorted/unsorted indices of length 128
# i.e. 15 time steps with batch sizes 128,128,128,128,128,128,128,128,107,60,35,20,16,6,1
(rnn): GRU(512, 512, batch_first=True, dropout=0.3)
#input: embedded
#outputs:
# output: a PackedSequence with the same data size and batch_sizes as embedded
# hidden: [1, 128, 512] tensor
if variable_lengths:
nn.utils.rnn.pad_packed_sequence
#input: output
#output: output [128, 15, 512]
#——————————————————————————————————————
if self.n_layers > 1 and self.bidirectional:
(compress_output): Linear(in_features=1024, out_features=512, bias=False)
(relu)
(dropout): Dropout(p=0.3, inplace=False)→q_output
(compress_hn_layers_bi): Linear(in_features=1024, out_features=512, bias=False)
(relu)
(dropout): Dropout(p=0.3, inplace=False)→s_hidden
elif self.n_layers > 1:
(compress_hn_layers): Linear(in_features=512, out_features=512, bias=False)
(relu)
(dropout): Dropout(p=0.3, inplace=False)→s_hidden
elif self.bidirectional:
(compress_output): Linear(in_features=1024, out_features=512, bias=False)
(relu)
(dropout): Dropout(p=0.3, inplace=False)→q_output
(compress_hn_bi): Linear(in_features=1024, out_features=512, bias=False)
(relu)
(dropout): Dropout(p=0.3, inplace=False)→s_hidden
#————————————————————————————————
)
#outputs:
# q_output [128, 15, 512], i.e. the output above
# s_hidden [1, 128, 512], i.e. the hidden above, then squeezed to s_last_hidden [128, 512]
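The variable-length handling above is the standard pack, run the GRU, unpack pattern. A self-contained sketch with the same layer sizes (random data, so only the shapes carry over):
import torch
import torch.nn as nn
rnn = nn.GRU(512, 512, batch_first=True)
embedded = torch.randn(128, 35, 512)                       # after embedding + upcompress to 512
lengths = torch.randint(1, 36, (128,))                     # question_length per sample
packed = nn.utils.rnn.pack_padded_sequence(embedded, lengths, batch_first=True, enforce_sorted=False)
packed_out, hidden = rnn(packed)                           # hidden: [1, 128, 512]
q_output, _ = nn.utils.rnn.pad_packed_sequence(packed_out, batch_first=True)
# q_output: [128, max(lengths), 512] (15 in the trace above); padded positions are zeros
s_last_hidden = hidden.squeeze(0)                          # [128, 512]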
###Video encoding
(compress_c3d): WeightDropLinear(in_features=4096, out_features=2048, bias=False)
#c3d_inputs[128, 80, 4096]x[4096,2048]->[128,80,2048]
(relu)
(video_fusion): WeightDropLinear(in_features=4096, out_features=2048, bias=False)
#concatenate the compressed c3d_inputs with resnet_inputs
#[128, 80, 4096]x[4096,2048]->video_inputs[128,80,2048]
(relu)
(video_encoder): VideoEncoderRNN(
#inputs:
# video_inputs [128,80,2048]
# video_length 128
(project): Linear(in_features=2048, out_features=512, bias=False)
#[128,80,2048]x[2048,512]->embedded[128,80,512]
(relu)
(dropout): Dropout(p=0.3, inplace=False)
if variable_lengths:
nn.utils.rnn.pack_padded_sequence
#inputs:
# embedded [128,80,512]
# input_lengths 128
#outputs:
# embedded: a PackedSequence (data [5311,512], batch_sizes of length 80, sorted/unsorted indices of length 128), i.e. 80 time steps
(rnn): GRU(512, 512, batch_first=True, dropout=0.3)
#input: embedded
#outputs:
# output: a PackedSequence with the same data size and batch_sizes as embedded
# hidden: [1, 128, 512] tensor
if variable_lengths:
nn.utils.rnn.pad_packed_sequence
#input: output
#output: output [128, 80, 512]
#——————————————————————————————————————
if self.n_layers > 1 and self.bidirectional:
(compress_output): Linear(in_features=1024, out_features=512, bias=False)
(relu)
(dropout): Dropout(p=0.3, inplace=False)
(compress_hn_layers_bi): Linear(in_features=1024, out_features=512, bias=False)
(relu)
(dropout): Dropout(p=0.3, inplace=False)
elif self.n_layers > 1:
(compress_hn_layers): Linear(in_features=512, out_features=512, bias=False)
(relu)
(dropout): Dropout(p=0.3, inplace=False)
elif self.bidirectional:
(compress_output): Linear(in_features=1024, out_features=512, bias=False)
(relu)
(dropout): Dropout(p=0.3, inplace=False)
(compress_hn_bi): Linear(in_features=1024, out_features=512, bias=False)
(relu)
(dropout): Dropout(p=0.3, inplace=False)
#—————————————————————————————————————
)
#outputs:
# v_output [128, 80, 512], i.e. the output above
# v_hidden [1, 128, 512], i.e. the hidden above, then squeezed to v_last_hidden [128, 512]
if self.ablation != 'local':
###Video-question fusion
if self.tf_layers != 0:
(q_input_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=False)
(v_input_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=False)
#————————————————————————————
###Self-attention
if 'self' in self.fusion_type:
(q_selfattn): SelfAttention(
(padding_mask_k)
#(bs, q_len, v_len)
(padding_mask_q)
#(bs, v_len, q_len)
(encoder_layers): ModuleList(
SelfAttentionLayer(
if attn_mask is None or softmax_mask is None:
(padding_mask_k)
(padding_mask_q)
#three projections (k, q, v)
(linear_k): WeightDropLinear(in_features=512, out_features=512, bias=False)
(linear_q): WeightDropLinear(in_features=512, out_features=512, bias=False)
(linear_v): WeightDropLinear(in_features=512, out_features=512, bias=False)
(softmax): Softmax(dim=-1)
(linear_final): WeightDropLinear(in_features=512, out_features=512, bias=False)
(layer_norm): LayerNorm((512,), eps=1e-05, elementwise_affine=False)
)
)
)
(v_selfattn): SelfAttention(
(padding_mask_k)
(padding_mask_q)
(encoder_layers): ModuleList(
SelfAttentionLayer(
if attn_mask is None or softmax_mask is None:
(padding_mask_k)
(padding_mask_q)
#three projections (k, q, v)
(linear_k): WeightDropLinear(in_features=512, out_features=512, bias=False)
(linear_q): WeightDropLinear(in_features=512, out_features=512, bias=False)
(linear_v): WeightDropLinear(in_features=512, out_features=512, bias=False)
(softmax): Softmax(dim=-1)
(linear_final): WeightDropLinear(in_features=512, out_features=512, bias=False)
(layer_norm): LayerNorm((512,), eps=1e-05, elementwise_affine=False)
)
)
)
#——————————————————————————————————
if 'coattn' in self.fusion_type:
(co_attn): CoAttention(
#inputs: the layer-normed q_output and v_output
(padding_mask_k)
#fake_q[128,15,512]×v_output.T[128,512,80]->attn_mask[128,15,80]bool
(padding_mask_q)
#q_output[128,15,512]×fake_k.T[128,512,80]->softmax_mask[128,15,80]bool
(padding_mask_k)
#fake_q[128,80,512]×q_output.T[128,512,15]->attn_mask_[128,80,15]bool
(padding_mask_q)
#v_output[128,80,512]×fake_k.T[128,512,15]->softmax_mask_[128,80,15]bool
(encoder_layers): ModuleList(
CoAttentionLayer(
#inputs:
#q_output, v_output, attn_mask, softmax_mask, attn_mask_, softmax_mask_
#four projections
(linear_question): WeightDropLinear(in_features=512, out_features=512, bias=False)
(linear_video): WeightDropLinear(in_features=512, out_features=512, bias=False)
(linear_v_question): WeightDropLinear(in_features=512, out_features=512, bias=False)
(linear_v_video): WeightDropLinear(in_features=512, out_features=512, bias=False)
#giving
#question_q [128, 15, 512]
#video_k [128, 80, 512]
#question [128, 15, 512]
#video [128, 80, 512]
#scale=512^(-1/2)
#question_q×video_k.T->attention_qv[128,15,80]
#attention_qv × scale, then masked_fill(attn_mask, -np.inf)
(softmax): Softmax(dim=-1)
#attention_qv then masked_fill(softmax_mask, 0)
#video_k×question_q.T->attention_vq[128,80,15]
#attention_vq × scale, then masked_fill(attn_mask_, -np.inf)
(softmax): Softmax(dim=-1)
#attention_vq then masked_fill(softmax_mask_, 0)
#attention_qv×v_output->output_qv[128,15,512]
(linear_final_qv): WeightDropLinear(in_features=512, out_features=512, bias=False)
(layer_norm_qv): LayerNorm((512,), eps=1e-05, elementwise_affine=False)
#LayerNorm of output_qv + q_output
#attention_vq×q_output->output_vq[128,80,512]
(linear_final_vq): WeightDropLinear(in_features=512, out_features=512, bias=False)
(layer_norm_vq): LayerNorm((512,), eps=1e-05, elementwise_affine=False)
#LayerNorm of output_vq + v_output
)
)
)
#outputs:
# q_output[128,15,512]
# v_output[128,80,512]
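Stripped of the padding masks, weight drop, and final linears, the co-attention layer above is two cross scaled-dot-product attentions with residual LayerNorm, one per direction. A minimal sketch of my reading of the trace (not the repo's exact code):
import torch
import torch.nn as nn
import torch.nn.functional as F
d = 512
q_output = torch.randn(128, 15, d)    # question features
v_output = torch.randn(128, 80, d)    # video features
scale = d ** -0.5
lin_q, lin_v = nn.Linear(d, d, bias=False), nn.Linear(d, d, bias=False)
ln_qv = nn.LayerNorm(d, elementwise_affine=False)
ln_vq = nn.LayerNorm(d, elementwise_affine=False)
question_q, video_k = lin_q(q_output), lin_v(v_output)
attention_qv = F.softmax(question_q @ video_k.transpose(1, 2) * scale, dim=-1)  # [128, 15, 80]
attention_vq = F.softmax(video_k @ question_q.transpose(1, 2) * scale, dim=-1)  # [128, 80, 15]
q_out = ln_qv(attention_qv @ v_output + q_output)   # [128, 15, 512], video-attended question
v_out = ln_vq(attention_vq @ q_output + v_output)   # [128, 80, 512], question-attended video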
###GCN
(adj_learner): AdjLearner(
#concatenate q_output and v_output to get graph_nodes [128,95,512]
(edge_layer_1): Linear(in_features=512, out_features=512, bias=False)
(relu)
(edge_layer_2): Linear(in_features=512, out_features=512, bias=False)
(relu)
#[128,95,512]×[128,512,95]->adj[128,95,95]
)
#separately concatenate q_output and v_output to get q_v_inputs [128,95,512]
(gcn): GCN(
#inputs: q_v_inputs, adj
#output: q_v_output [128,95,512]
(layers): ModuleList(
(0): GraphConvolution(
(weight): Linear(in_features=512, out_features=512, bias=False)
(layer_norm): LayerNorm((512,), eps=1e-05, elementwise_affine=False)
(relu)
(dropout): Dropout(p=0.3, inplace=False)
)
(1): GraphConvolution(
(weight): Linear(in_features=512, out_features=512, bias=False)
(layer_norm): LayerNorm((512,), eps=1e-05, elementwise_affine=False)
(relu)
(dropout): Dropout(p=0.3, inplace=False)
)
(2): GraphConvolution(
(weight): Linear(in_features=512, out_features=512, bias=False)
(layer_norm): LayerNorm((512,), eps=1e-05, elementwise_affine=False)
(relu)
(dropout): Dropout(p=0.3, inplace=False)
)
)
)
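So the adjacency matrix is learned from the node features themselves (an inner product of edge-transformed features), and each GraphConvolution is roughly adj @ (x W) followed by LayerNorm, ReLU and Dropout. A sketch of that reading; any adjacency normalization the real code may apply is omitted here:
import torch
import torch.nn as nn
import torch.nn.functional as F
d = 512
q_output, v_output = torch.randn(128, 15, d), torch.randn(128, 80, d)
graph_nodes = torch.cat([q_output, v_output], dim=1)      # [128, 95, 512]
edge1, edge2 = nn.Linear(d, d, bias=False), nn.Linear(d, d, bias=False)
h = F.relu(edge2(F.relu(edge1(graph_nodes))))
adj = h @ h.transpose(1, 2)                               # [128, 95, 95] learned adjacency
class GraphConvolutionSketch(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.weight = nn.Linear(d, d, bias=False)
        self.layer_norm = nn.LayerNorm(d, elementwise_affine=False)
    def forward(self, x, adj):
        return F.dropout(F.relu(self.layer_norm(adj @ self.weight(x))), p=0.3)
q_v_output = torch.cat([q_output, v_output], dim=1)       # [128, 95, 512]
for layer in [GraphConvolutionSketch(d) for _ in range(3)]:   # gcn_layers 2 (+1) = 3
    q_v_output = layer(q_v_output, adj)                   # stays [128, 95, 512]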
###Attention pooling
(gcn_atten_pool): Sequential(
(0): Linear(in_features=512, out_features=256, bias=True)
#[128,95,512]x[512,256]->[128,95,256]
(1): Tanh()
(2): Linear(in_features=256, out_features=1, bias=True)
#[128,95,256]x[256,1]->[128,95,1]
(3): Softmax(dim=-1)
)
#q_v_output[128,95,512]×local_attn[128,95,1]->[128,95,512]
#then summed over the 95 nodes to local_out [128,512]
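The pool scores each of the 95 graph nodes with a tiny MLP and takes the attention-weighted sum. One thing I noticed: Softmax(dim=-1) as printed acts on a [128, 95, 1] tensor whose last dimension has size 1, which would give all-ones weights; the sketch below uses softmax over the node dimension (dim=1), since that seems to be the intent:
import torch
import torch.nn as nn
q_v_output = torch.randn(128, 95, 512)
gcn_atten_pool = nn.Sequential(
    nn.Linear(512, 256),
    nn.Tanh(),
    nn.Linear(256, 1),
    nn.Softmax(dim=1),   # over the 95 nodes; dim=-1 (size 1) would give uniform weights
)
local_attn = gcn_atten_pool(q_v_output)                 # [128, 95, 1]
local_out = torch.sum(q_v_output * local_attn, dim=1)   # [128, 512]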
if self.ablation != 'global':
###Global fusion
(global_fusion): Block( #Block from the block package
#inputs:
# s_last_hidden[128,512], v_last_hidden[128,512]
(linear0): Linear(in_features=512, out_features=1600, bias=True)
#[128,512]x[512,1600]->[128,1600]
(linear1): Linear(in_features=512, out_features=1600, bias=True)
#[128,512]x[512,1600]->[128,1600]
if self.dropout_input > 0:
(dropout)
(dropout)
(get_chunks)
#input: x0
# self.sizes_list: [80]*20
#outputs:
# x0_chunks: narrowing x0 yields a list of 20 tensors of shape [128,80]
(get_chunks)
# x1_chunks: narrowing x1 yields a list of 20 tensors of shape [128,80]
#iterate over the 20 tensors of x0_chunks and x1_chunks in lockstep
(merge_linears0): ModuleList( #20 layers
(0): Linear(in_features=80, out_features=1200, bias=True)
#[128,80]x[80,1200]->[128,1200]
)
(merge_linears1): ModuleList( #20 layers
(0): Linear(in_features=80, out_features=1200, bias=True)
#[128,80]x[80,1200]->[128,1200], then elementwise product with the merge_linears0 output, still [128,1200]
#then view to [128,15,80]
#then sum over the rank dimension to z [128,80]
#z=sqrt(relu(z))-sqrt(relu(-z)), a signed square root
(normalize)
#the 20 chunks are concatenated to [128,1600]
)
(linear_out): Linear(in_features=1600, out_features=512, bias=True)
#[128,1600]×[1600,512]->global_out[128,512]
)
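What the Block module (from the block.bootstrap.pytorch package) is doing, as far as I can tell from the shapes: each input is projected to 1600-d, split into 20 chunks of 80, each chunk pair goes through its own 80-to-1200 linear, the two results are multiplied elementwise, summed from rank 15 back down to 80, passed through the signed square root and L2 normalization, and the 20 chunks are concatenated back to 1600-d before the output linear. A rough sketch of that idea, not the library's exact code:
import torch
import torch.nn as nn
import torch.nn.functional as F
class BlockFusionSketch(nn.Module):
    """Rough sketch of chunked bilinear (Block) fusion, shapes as in the trace above."""
    def __init__(self, in_dim=512, mm_dim=1600, chunks=20, rank=15, out_dim=512):
        super().__init__()
        self.chunks, self.rank = chunks, rank
        self.chunk_size = mm_dim // chunks                       # 80
        self.linear0 = nn.Linear(in_dim, mm_dim)
        self.linear1 = nn.Linear(in_dim, mm_dim)
        self.merge_linears0 = nn.ModuleList(
            [nn.Linear(self.chunk_size, self.chunk_size * rank) for _ in range(chunks)])  # 80 -> 1200
        self.merge_linears1 = nn.ModuleList(
            [nn.Linear(self.chunk_size, self.chunk_size * rank) for _ in range(chunks)])
        self.linear_out = nn.Linear(mm_dim, out_dim)
    def forward(self, x0, x1):                                   # [128, 512] each
        x0 = self.linear0(x0).split(self.chunk_size, dim=1)      # 20 chunks of [128, 80]
        x1 = self.linear1(x1).split(self.chunk_size, dim=1)
        zs = []
        for c0, c1, m0, m1 in zip(x0, x1, self.merge_linears0, self.merge_linears1):
            z = m0(c0) * m1(c1)                                  # [128, 1200] elementwise product
            z = z.view(-1, self.rank, self.chunk_size).sum(1)    # [128, 15, 80] -> [128, 80]
            z = torch.sqrt(F.relu(z)) - torch.sqrt(F.relu(-z))   # signed square root
            zs.append(F.normalize(z, dim=1))
        return self.linear_out(torch.cat(zs, dim=1))             # [128, 1600] -> [128, 512]
global_out = BlockFusionSketch()(torch.randn(128, 512), torch.randn(128, 512))   # [128, 512]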
if self.ablation != 'local':
(fusion): Block(
#inputs:
# global_out[128,512], local_out[128,512]
(linear0): Linear(in_features=512, out_features=1600, bias=True)
(linear1): Linear(in_features=512, out_features=1600, bias=True)
(merge_linears0): ModuleList(#20 layers
(0): Linear(in_features=80, out_features=1200, bias=True)
)
(merge_linears1): ModuleList(#20 layers
(0): Linear(in_features=80, out_features=1200, bias=True)
)
(linear_out): Linear(in_features=1600, out_features=1, bias=True)
#[128,1600]×[1600,1]->out[128,1]
#outputs:
# out [128,1]
# adj [128, 95, 95]
#adj is copied into all_adj (into the top-left 95×95 block)
#out is clamped to the range 1 to 10 (seems to come out as all 1s early on?)
)
Outputs:
out, predictions, answers, all_adj
The MSE loss is computed between the predictions out and the labels answers
One backward pass per batch, 188 batches in total
Each epoch reports an accuracy, e.g. 18.935%, and the mean training loss over batches, e.g. 5.758
Then it also computes a so-called real loss: a single MSE over all predictions against all labels, e.g. 5.802
Validation and test do the same: the per-batch mean loss and the final accuracy, plus this real loss (why compute a loss that is never backpropagated? Apparently it just serves as a performance metric, see below)
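My reading of that real loss: for the Count task it is just an evaluation metric, one MSE computed over all predictions against all labels with no backprop, and it differs slightly from the mean of per-batch losses because the last batch has fewer than 128 samples. A sketch of the per-epoch bookkeeping under that reading (model, optimizer and train_dataloader as set up above; the model call and the accuracy formula are my guesses):
import torch
import torch.nn as nn
criterion = nn.MSELoss()
all_preds, all_answers, batch_losses = [], [], []
for ii, data in enumerate(train_dataloader):               # 188 batches
    out, predictions, answers, all_adj = model(*data)      # forward_count for the Count task
    loss = criterion(out.squeeze(-1), answers)             # per-batch MSE, the one that is backpropagated
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    batch_losses.append(loss.item())
    all_preds.append(predictions.detach())
    all_answers.append(answers)
mean_loss = sum(batch_losses) / len(batch_losses)                      # e.g. 5.758
preds, labels = torch.cat(all_preds), torch.cat(all_answers)
real_loss = criterion(preds.float(), labels.float()).item()            # e.g. 5.802, metric only
acc = (preds.round() == labels.round()).float().mean().item()          # e.g. 18.935%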