Talks by Paper Authors – 20150527

[VALSE Webinar, Sessions 15-16]

Speaker 1: Jianxin Wu (Nanjing University)
Host: Feiping Nie (University of Texas at Arlington)
Title: Feature Selection in Image and Video Recognition [Slides]
Time: 20:00, May 27, 2015 (Beijing Time)
Related papers:
Abstract: Recently, high-dimensional representations such as FV and VLAD have shown excellent accuracy in image and action recognition. The computational and storage costs of these representations, however, have become a serious issue in large-scale applications. In this talk, I will review existing methods that address this issue and introduce MI-based feature selection, a simple yet highly effective method we proposed. The method has been successfully applied to general image and video recognition, as well as to fine-grained categorization.
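For readers unfamiliar with the general idea, the sketch below illustrates the flavour of mutual-information-based feature selection on a high-dimensional representation: score each dimension by its mutual information with the class label and keep only the top-scoring dimensions. This is a minimal illustration using scikit-learn's `mutual_info_classif`, not the specific estimator or selection criterion from the talk; the array shapes and `num_keep` value are made up for the example.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def select_features_by_mi(X, y, num_keep=500):
    """Rank each dimension of X by mutual information with the labels y
    and return the indices of the num_keep most informative dimensions."""
    mi = mutual_info_classif(X, y)      # MI score for every feature dimension
    return np.argsort(mi)[::-1][:num_keep]

# Toy usage: random stand-in for real FV/VLAD features of a few thousand dimensions.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2000))
y = rng.integers(0, 10, size=200)
keep = select_features_by_mi(X, y, num_keep=500)
X_reduced = X[:, keep]                  # a much smaller representation to store and classify
```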
Speaker bio: Jianxin Wu is a professor and doctoral advisor at Nanjing University. He was selected for the national Young Overseas High-Level Talents Recruitment Program (Young Thousand Talents Program) and received support from the NSFC Excellent Young Scientists Fund in 2014. His research focuses on computer vision and machine learning. He has published more than sixty papers in leading international journals such as TPAMI, IJCV, AIJ, JMLR, and TIP, and at leading international conferences such as ICCV, CVPR, ICML, NIPS, IJCAI, INFOCOM, and ICRA. He has served as an area chair or organizing committee member for international conferences including ICCV, ACCV, ACML, PSIVT, and PCM, has given invited talks at ICCV workshops, and has repeatedly served as a senior program committee member, program committee member, or reviewer for IJCAI, ICCV, CVPR, TPAMI, IJCV, and TIP. He received the First Prize of the Ministry of Education Natural Science Award (2005, as the fifth contributor). According to Google Scholar, his papers have been cited more than 3,700 times by researchers from over 60 countries and regions.

Speaker 2: Shuai Zheng (University of Oxford)
Host: Haiyong Zheng (Ocean University of China)
Title: ImageSpirit: Verbal Guided Image Parsing [Slides & Video]
Time: 21:00, May 27, 2015 (Beijing Time)
Related papers:
[1] ImageSpirit: Verbal Guided Image Parsing. Ming-Ming Cheng, Shuai Zheng, Wen-Yan Lin, Vibhav Vineet, Paul Sturgess, Nigel Crook, Niloy Mitra, Philip Torr. ACM Transactions on Graphics (TOG), 2014.
[2] Dense Semantic Image Segmentation with Objects and Attributes. S. Zheng, M.-M. Cheng, J. Warrell, P. Sturgess, V. Vineet, C. Rother, P. Torr. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
Abstract: Humans describe images in terms of nouns and adjectives, while algorithms operate on images represented as sets of pixels. Bridging this gap between how humans would like to access images and their typical representation is the goal of image parsing, which involves assigning object and attribute labels to pixels. In this work we propose treating nouns as object labels and adjectives as visual attribute labels. This allows us to formulate the image parsing problem as one of jointly estimating per-pixel object and attribute labels from a set of training images. We propose an efficient (interactive-time) solution. Using the extracted labels as handles, our system empowers a user to verbally refine the results. This enables hands-free parsing of an image into pixel-wise object/attribute labels that correspond to human semantics. Verbally selecting objects of interest enables a novel and natural interaction modality that could be used to interact with new-generation devices (e.g. smart phones, Google Glass, living room devices). We demonstrate our system on a large number of real-world images of varying complexity. To help understand the trade-off compared to traditional mouse-based interactions, results are reported for both a large-scale quantitative evaluation and a user study. The related work has been published in ACM Transactions on Graphics (TOG) and will be presented at ACM SIGGRAPH 2015.
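To make the verbal interaction modality concrete, here is a hedged sketch (not the paper's joint CRF formulation): assuming per-pixel object and attribute probability maps are already available, a command such as "refine the wooden cabinet" maps the noun to an object label and the adjective to an attribute label, and selects the pixels where both agree. The label sets, threshold, and function names below are illustrative only.

```python
import numpy as np

OBJECTS = ["wall", "floor", "cabinet", "bed"]             # illustrative object label set
ATTRIBUTES = ["wooden", "painted", "glass", "textured"]   # illustrative attribute label set

def pixels_matching(obj_probs, attr_probs, noun, adjective, threshold=0.5):
    """obj_probs: (H, W, n_objects); attr_probs: (H, W, n_attributes).
    Return a boolean mask of pixels likely to show the named object with the named attribute."""
    obj_mask = obj_probs[..., OBJECTS.index(noun)] > threshold
    attr_mask = attr_probs[..., ATTRIBUTES.index(adjective)] > threshold
    return obj_mask & attr_mask      # pixels that are both e.g. "cabinet" and "wooden"

# Toy usage with random stand-in predictions for a 4x4 image; in the real system
# such a mask would seed further refinement of the parsing result.
rng = np.random.default_rng(0)
obj_probs = rng.random((4, 4, len(OBJECTS)))
attr_probs = rng.random((4, 4, len(ATTRIBUTES)))
mask = pixels_matching(obj_probs, attr_probs, "cabinet", "wooden")
```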
Speaker bio: Shuai Zheng (Kyle) is currently a DPhil student in the Torr Vision Group at the University of Oxford, working on computer vision and machine learning with Professor Philip Torr. Before that, he worked with Professor Kaiqi Huang in Professor Tieniu Tan's group at the National Laboratory of Pattern Recognition (NLPR). He obtained an MEng in Pattern Recognition from the Chinese Academy of Sciences and a BEng in Information Engineering from the Beijing Institute of Technology. His research interests include semantic image segmentation, object recognition, probabilistic graphical models, and large-scale deep learning. http://www.robots.ox.ac.uk/~szheng/
