Authors of Notable Papers Share Their Techniques (好文作者面授招) - 20141217

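  • Topic: Local Difference Binary for Ultra-fast and Distinctive Feature Description
  • Speaker: Xin Yang (杨欣), Huazhong University of Science and Technology
  • Host: Xiang Bai (白翔), Huazhong University of Science and Technology
  • Topic: Diverse Sequential Subset Selection for Supervised Video Summarization
  • Speaker: Boqing Gong (宫博庆), University of Southern California
  • Host: Ming-Ming Cheng (程明明), Nankai University
  • Time: Wednesday, December 17, 2014, 9:30-11:00 PM Beijing time (Xin Yang 9:30-10:10, Boqing Gong 10:10-10:50)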

1. Related reading

  • Xin Yang and Tim Cheng, Local Difference Binary for Ultra-fast and Distinctive Feature Description, IEEE TPAMI, 2014. [pdf]
  • Xin Yang, Chong Huang and Tim Cheng, libLDB: A Library for Extracting Ultrafast and Distinctive Binary Feature Description, ACM International Conference on Multimedia (MM), Open Source Software Competition, 2014. [Project page]
  • B. Gong, W. Chao, K. Grauman, and F. Sha. Diverse Sequential Subset Selection for Supervised Video Summarization. NIPS 2014. [pdf]
  • Boqing Gong, Wei-lun Chao, Kristen Grauman, and Fei Sha. Large-Margin Determinantal Point Processes, arXiv:1411.1537. [pdf]

2. Talk materials

  • Diverse Sequential Subset Selection for Supervised Video Summarization. [Slides]
  • Topics: supervised video summarization; large-margin training method for DPP. Abstract: Video summarization is a challenging problem with great application potential. Whereas prior approaches, largely unsupervised in nature, focus on sampling useful frames and assembling them as summaries, we consider video summarization as a supervised subset selection problem. Our idea is to teach the system to learn from human-created summaries how to select informative and diverse subsets, so as to best meet evaluation metrics derived from human-perceived quality. To this end, we propose the sequential determinantal point process (seqDPP), a probabilistic model for diverse sequential subset selection. Our novel seqDPP heeds the inherent sequential structures in video data, thus overcoming the deficiency of the standard DPP, which treats video frames as randomly permutable items. Meanwhile, seqDPP retains the power of modeling diverse subsets, essential for summarization. Our extensive results of summarizing videos from 3 datasets demonstrate the superior performance of our method, compared to not only existing unsupervised methods but also naive applications of the standard DPP model.
  • Local Difference Binary for Ultra-fast and Distinctive Feature Description. [Slides]
  • Abstract: The efficiency, robustness and distinctiveness of a feature descriptor are critical to the user experience and scalability of mobile computer vision apps, e.g. mobile augmented reality (AR). However, existing descriptors are either too computationally expensive to achieve real-time performance on mobile devices such as smartphones or tablets, or not sufficiently robust and distinctive to identify correct matches from a large database. As a result, current mobile AR systems still have only limited capabilities, which greatly restricts their deployment in practice. In this talk, we present a highly efficient, robust and distinctive binary descriptor, called local difference binary (LDB). LDB directly computes a binary string from an image patch using simple intensity and gradient difference tests on pairwise grid cells within the patch. To select an optimized set of grid cell pairs, we densely sample grid cells from an image patch and then leverage a modified AdaBoost algorithm to automatically extract a small set of critical ones, with the goal of maximizing the Hamming distance between mismatches while minimizing it between matches. Experimental results demonstrate that LDB is extremely fast to compute and to match against a large database due to its high robustness and distinctiveness. Compared to state-of-the-art binary descriptors, which are primarily designed for speed, LDB has similar efficiency for descriptor construction, while achieving greater accuracy and faster matching speed when matching over a large database with 2.3M descriptors on mobile devices.
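
The pairwise grid-cell tests described in the LDB abstract can be illustrated with a minimal sketch. This is not the libLDB implementation: the function name `ldb_like_descriptor`, the use of `np.gradient` for per-pixel gradients, the single 3x3 grid, and the use of all cell pairs (instead of the AdaBoost-selected subset) are assumptions made here for illustration only.

```python
import itertools
import numpy as np

def ldb_like_descriptor(patch, grid=3):
    """Binary string from pairwise grid-cell difference tests (illustrative only)."""
    patch = np.asarray(patch, dtype=np.float64)
    # Assumption: simple finite-difference gradients along rows (gy) and columns (gx).
    gy, gx = np.gradient(patch)
    rows = np.array_split(np.arange(patch.shape[0]), grid)
    cols = np.array_split(np.arange(patch.shape[1]), grid)
    cells = []
    for r in rows:
        for c in cols:
            block = np.ix_(r, c)
            # Per-cell summary: mean intensity, mean x-gradient, mean y-gradient.
            cells.append((patch[block].mean(), gx[block].mean(), gy[block].mean()))
    cells = np.array(cells)  # shape (grid * grid, 3)
    bits = []
    # Assumption: every cell pair is used; the paper selects a small subset instead.
    for i, j in itertools.combinations(range(len(cells)), 2):
        bits.extend(cells[i] > cells[j])  # one bit per quantity: 1 if cell i is larger
    return np.array(bits, dtype=np.uint8)

# Usage on a synthetic 24x24 patch (random values stand in for real image data).
rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(24, 24))
descriptor = ldb_like_descriptor(patch)  # 36 cell pairs x 3 tests = 108 bits
```

Because every bit comes from a difference between two cells, adding a constant brightness offset to the whole patch leaves this sketch's output unchanged.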

3. Code and data downloads

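  1. Liming Zhao, Zhejiang University (handle: 赵黎明D浙大, 649518776), 22:26:30: Prof. Yang, do you have experiments showing how the descriptor performs when only intensity is used? Most other features do not use gradient information.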

    • We compared using intensity only against also adding gradient information. For details, see my TVCG 2014 paper: Xin Yang, Tim Cheng, Learning Optimized Local Difference Binaries for Scalable Augmented Reality on Mobile Devices. IEEE Trans. on Visualization and Computer Graphics, June 2014.

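  2. (1) Which keypoint detector was used? (2) Of the several sampling patterns, which one is used in practice? (3) Was the design made with mobile devices in mind? In other words, if all experiments were evaluated on a PC, would the results be similar?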

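    • (1) The ORB feature detector. (2) In our experimental experience, Multiple gridding and Uniform gridding perform best. (3) This work does not specifically account for the hardware limitations of mobile devices. However, another paper of mine, at ACM MM 2012, designs features around how well the algorithm matches mobile hardware. For that work, performance differs greatly between a desktop and a mobile device! Xin Yang, Tim Cheng, Accelerating SURF Detector on Mobile Devices. ACM Multimedia, 2012.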

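  3. Qiang Zhai, 亮风台 (handle: 翟强 I 亮风台, 287187465), 22:19:02: Prof. Yang, will the 2.3M hash table limit the size of the dataset that can be recognized? After extracting 256-bit or 512-bit features, can they be matched with conventional methods?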

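    • (1) In the current experiments our database contains only 228 images with 2000 feature points each, and we use 5 hash tables, so the number of feature points is roughly 2.3M (228 × 2000 × 5 = 2.28M). You can index more images if needed; it just costs more. A hash table may not be the optimal indexing structure on a mobile device, because it involves too many random memory accesses. We are also trying other indexing structures that are better suited to mobile devices. (2) Yes, such features can be matched with conventional methods by computing the Hamming distance.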
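
The Hamming-distance matching mentioned in the reply above can be sketched as a brute-force search over bit-packed descriptors. This is only an illustration under assumptions: the function name `hamming_match` and the random stand-in data are hypothetical, and it is not the hash-table index used in the talk's experiments.

```python
import numpy as np

def hamming_match(queries, database):
    """Brute-force nearest neighbor under Hamming distance.

    Both inputs are uint8 arrays of shape (n, bytes), e.g. 32 bytes for 256 bits.
    """
    # XOR marks the differing bits; unpacking and summing counts them.
    xor = queries[:, None, :] ^ database[None, :, :]
    dist = np.unpackbits(xor, axis=2).sum(axis=2)
    nearest = dist.argmin(axis=1)
    return nearest, dist[np.arange(len(queries)), nearest]

# Usage with random 256-bit descriptors standing in for real features.
rng = np.random.default_rng(0)
database = rng.integers(0, 256, size=(50, 32), dtype=np.uint8)
queries = database[[3, 7]].copy()
queries[1, 0] ^= 1  # flip a single bit in the second query
nearest, distances = hamming_match(queries, database)
```

The pairwise XOR array grows with the product of query and database sizes, so at the 2.3M scale discussed above this exhaustive form would need batching or an index.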

  4. Since feature extraction is a very basic operation, features need to generalize well. Have you tried training on dataset A and testing on dataset B to check generalization ability?

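  5. Xi Li (李玺, 350092629), 22:18:21: I have the following questions. (1) How do features learned this way handle rotation and scale invariance? (2) How well do the features generalize? Could they be too data-dependent? The original BRIEF feature assumes a uniform distribution by default, which lets it cope with a wide range of situations. (3) Regarding the parameter beta: with the traditional exponential loss in AdaBoost, the sample weights should be renormalized, giving a normalized version, and then beta = 1 should suffice. Algorithm 1 in the paper does not appear to normalize. (4) Have the authors compared against current mainstream supervised hashing methods? The paper seems to be, in essence, boosting-based hashing in which the features act as weak classifiers, so this may be worth trying.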

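    • (1) The features are learned entirely on upright patches. For rotation invariance we use the conventional dominant orientation: before extracting the feature, we rotate the patch to its dominant orientation and then compute the feature. In the rotated case, we use a rotated integral image to speed up feature extraction. (2) The features we design mainly target the image variations likely to occur in applications on mobile devices. In object recognition and tracking on mobile devices, the most common image variations are viewpoint change, image blur, lighting changes, rotation and scaling. The training set we selected mainly covers these 5 types of variation, and we pick the features that are most robust to them. (4) We have not yet compared against supervised hashing. Thank you for the suggestion; we will consider trying it.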

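  6. Xi Li, Zhejiang University (handle: 李玺T浙大): I have the following questions. (1) In essence, the paper is also a Bayesian form of graph clustering: it defines some within-subgraph and between-subgraph constraints and adds a temporal Markov smoothness constraint. Could the authors compare against temporally constrained graph clustering? (2) Isn't video summarization inherently subjective? Is the definition of the ground truth unique? (3) One more question: if a video's frames are highly repetitive along the time axis, how is the diversity of the summary guaranteed, and what would the selected frames look like? Have the authors run experiments on this?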
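
As background for the diversity question above, here is a minimal sketch of how a standard (non-sequential) DPP scores subsets, using the L-ensemble form P(Y) = det(L_Y) / det(L + I). It is not the seqDPP model from the talk; the function name `dpp_log_prob` and the three toy "frame" feature vectors are assumptions for illustration.

```python
import numpy as np

def dpp_log_prob(L, subset):
    """Log-probability of a non-empty subset under an L-ensemble DPP."""
    idx = np.asarray(subset, dtype=int)
    L_Y = L[np.ix_(idx, idx)]  # kernel restricted to the chosen items
    _, logdet_Y = np.linalg.slogdet(L_Y)
    _, logdet_Z = np.linalg.slogdet(L + np.eye(len(L)))  # normalizer det(L + I)
    return logdet_Y - logdet_Z

# Toy unit-norm features: frames 0 and 1 are near-duplicates, frame 2 is different.
B = np.array([
    [1.0, 0.0],
    [0.99, np.sqrt(1 - 0.99 ** 2)],
    [0.0, 1.0],
])
L = B @ B.T  # similarity kernel built from the features

near_duplicates = dpp_log_prob(L, [0, 1])
diverse_pair = dpp_log_prob(L, [0, 2])
```

Because the determinant shrinks as the selected rows become more similar, the near-duplicate pair receives a much lower probability than the diverse pair, which is the mechanism that discourages picking repeated frames together.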