*Below is the final schedule.*

Main Workshop on July 26

Room: 321 AB

1:30 - 1:40 pm   Introduction
1:40 - 2:00 pm   Keynote
           Learning to Segment Moving Objects, by Cordelia Schmid (INRIA)
2:00 - 2:30 pm   Oral Session 1
           Gaze Embeddings for Zero-Shot Image Classification
             by Nour Karessli (Max Planck Institute for Informatics)
           Towards Better Instance-level Recognition
             by Georgia Gkioxari (Facebook AI Research)
2:30 - 2:50 pm   Keynote
           Interferences in Match Kernels, by Naila Murray (Naver Labs/NLE)
2:50 - 4:15 pm   Poster Session and Coffee Break
4:15 - 4:35 pm   Keynote
           Computer Vision for the Blind, by Chieko Asakawa (IBM Research/CMU)
4:35 - 5:05 pm   Oral Session 2
           Dynamic Deep Neural Networks: Optimizing Accuracy-Efficiency Trade-offs by Selective Execution
             by Lanlan Liu (University of Michigan)
           Semi and Weakly Supervised Semantic Segmentation Using Generative Adversarial Network
             by Nasim Souly (University of Central Florida)
5:05 - 5:35 pm   Panel Session
           Chieko Asakawa (IBM Research/CMU)
           Andrea Frome (Clarifai)
           Raia Hadsell (DeepMind)
           Naila Murray (Naver Labs/NLE)
           Cordelia Schmid (INRIA)
           Helge Seetzen (TandemLaunch)
5:35 - 5:45 pm   Closing Remarks

Keynote Talks

Keynote speakers will give technical talks about their research in computer vision.

Chieko Asakawa (IBM Research/CMU)

Title: Computer Vision for the Blind

Abstract: Blind people have long dreamed of a machine that can recognize objects, people and environments: goods in a shop, people around them, or obstacles in a corridor. For many years such machines existed only in science fiction, but thanks to advances in deep learning and computer vision they are now becoming a reality, supplementing and augmenting the missing or weakened abilities of people with visual impairments. In this talk, I will outline a set of necessary technologies, demonstrate our efforts, and cast a vision for near-future deployments that will change people's lives.

Bio: Chieko Asakawa is a blind Japanese computer scientist, known for her work at IBM Research – Tokyo in accessibility. A Netscape browser plug-in which she developed, the IBM Home Page Reader, became the most widely used web-to-speech system available. She is the recipient of numerous industry and government awards. Asakawa was born with normal sight, but after she injured her optic nerve in a swimming accident at age 11, she began losing her sight, and by age 14 she was fully blind. She earned a bachelor's degree in English literature at Otemon Gakuin University in Osaka in 1982, then completed a two-year computer programming course for blind people at Nippon Lighthouse, using an Optacon to translate print to tactile sensation, and joined IBM in 1985. She received a Ph.D. in Engineering from the University of Tokyo in 2004. She is a member of the Association for Computing Machinery (ACM), the Information Processing Society of Japan, and the IBM Academy of Technology. She was inducted into the Women in Technology International (WITI) Hall of Fame in 2003, and both within and outside of IBM she has been actively working to help women engineers pursue technical careers. Chieko was appointed an IBM Fellow, IBM's most prestigious technical honor, in 2009. In 2013, the government of Japan awarded her the Medal of Honor with Purple Ribbon for her outstanding contributions to accessibility research, including the development of the voice browser for the visually impaired.

Naila Murray (Naver Labs/NLE)

Title: Interferences in Match Kernels

Abstract: We consider the design of an image representation that embeds and aggregates a set of local descriptors into a single vector. Popular representations of this kind include the bag-of-visual-words, the Fisher vector and the VLAD. When two such image representations are compared with the dot-product, the image-to-image similarity can be interpreted as a match kernel. In match kernels, one has to deal with interference, i.e. with the fact that even if two descriptors are unrelated, their matching score may contribute to the overall similarity. We formalise this problem and propose two related solutions, both aimed at equalising the individual contributions of the local descriptors in the final representation. These methods modify the aggregation stage by including a set of per-descriptor weights. They differ by the objective function that is optimised to compute those weights. The first is a “democratisation” strategy that aims at equalising the relative importance of each descriptor in the set comparison metric. The second one involves equalising the match of a single descriptor to the aggregated vector. These concurrent methods give a substantial performance boost over standard aggregation methods, as demonstrated by our experiments on standard public image retrieval benchmarks.
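
The weighting idea at the heart of the talk can be sketched in a few lines. Below is a minimal NumPy illustration of a "democratisation"-style iteration that equalises each descriptor's total contribution to the match kernel. It is a sketch of the general strategy, not the speaker's implementation; the function names, iteration count and clamping of negative contributions are our own assumptions.

```python
import numpy as np

def democratic_weights(phi, n_iter=10, eps=1e-8):
    """Compute per-descriptor weights so that each descriptor's total
    contribution to the match kernel is roughly equal (a Sinkhorn-style
    iteration). `phi` is an (n, d) array of embedded local descriptors."""
    K = phi @ phi.T                       # pairwise matching scores
    lam = np.ones(len(phi))
    for _ in range(n_iter):
        # Total weighted contribution of each descriptor to the kernel;
        # clamping negative sums is our simplification for this sketch.
        contrib = lam * (K @ lam)
        lam /= np.sqrt(np.maximum(contrib, eps))
    return lam

def aggregate(phi, lam):
    """Weighted sum-aggregation into a single L2-normalised image vector."""
    v = (lam[:, None] * phi).sum(axis=0)
    return v / (np.linalg.norm(v) + 1e-12)
```

With `lam` fixed to all ones, `aggregate` reduces to plain sum-aggregation, which is precisely where the interference described in the abstract arises.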

Bio: Naila Murray graduated with a PhD in computer science from the Universitat Autònoma de Barcelona. She also holds a master’s degree in computer vision and artificial intelligence from the Universitat Autònoma de Barcelona, and a bachelor’s degree in electrical engineering from Princeton University. Her work in computer vision has involved research into biologically-inspired deep models of visual attention; fine-grained visual recognition, including participation in the winning team of the FGComp 2013 competition; and computational models for visual aesthetic analysis. Naila is a senior scientist and manager of the computer vision group at Xerox Research Centre Europe. Currently, her research focuses on visual search in large databases and human behaviour understanding, particularly for video action recognition.

Cordelia Schmid (INRIA)

Title: Learning to Segment Moving Objects

Abstract: This talk addresses the task of segmenting moving objects in unconstrained videos. We introduce a novel two-stream neural network with an explicit memory module to achieve this. The two streams of the network encode spatial and temporal features in a video sequence, respectively, while the memory module captures the evolution of objects over time. The module that builds a “visual memory” of the video, i.e., a joint representation of all the video frames, is realized with a convolutional recurrent unit learned from a small number of training video sequences. Given video frames as input, our approach first assigns each pixel an object or background label obtained with an encoder-decoder network that takes optical flow as input and is trained on synthetic data. Next, a “visual memory” specific to the video is acquired automatically, without any manually annotated frames. The visual memory is implemented with convolutional gated recurrent units, which allow spatial information to be propagated over time. We evaluate our method extensively on two benchmarks, the DAVIS and Freiburg-Berkeley motion segmentation datasets, and show state-of-the-art results.
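
For readers unfamiliar with the building block, here is a minimal PyTorch sketch of a convolutional gated recurrent unit of the kind the abstract describes as the “visual memory”; it shows how gated convolutions let spatial information propagate over time. This is an illustrative sketch under our own assumptions (kernel size, channel counts, gate layout), not the authors' implementation.

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """Convolutional GRU cell: a recurrent unit whose gates are convolutions,
    so the hidden state h is a spatial map and the memory evolves over time
    without losing spatial layout. (Illustrative sketch only.)"""

    def __init__(self, in_ch: int, hid_ch: int, k: int = 3):
        super().__init__()
        p = k // 2
        # One conv produces both the update (z) and reset (r) gates.
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, k, padding=p)
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, k, padding=p)
        self.hid_ch = hid_ch

    def forward(self, x, h=None):
        if h is None:  # start with an empty memory
            h = x.new_zeros(x.size(0), self.hid_ch, x.size(2), x.size(3))
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], dim=1))).chunk(2, dim=1)
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_tilde  # blend old memory with new evidence

# Hypothetical usage: fold per-frame feature maps (B, 64, H, W) into a memory.
# cell = ConvGRUCell(in_ch=64, hid_ch=64)
# h = None
# for feat in per_frame_features:
#     h = cell(feat, h)  # h summarises the video seen so far
```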

Bio: Cordelia Schmid holds a M.S. degree in Computer Science from the University of Karlsruhe and a Doctorate, also in Computer Science, from the Institut National Polytechnique de Grenoble (INPG). Her doctoral thesis received the best thesis award from INPG in 1996. Dr. Schmid was a post-doctoral research assistant in the Robotics Research Group of Oxford University in 1996--1997. Since 1997 she has held a permanent research position at INRIA Grenoble Rhône-Alpes, where she is a research director and directs an INRIA team. Dr. Schmid is the author of over a hundred technical publications. She has been an Associate Editor for IEEE PAMI (2001--2005) and for IJCV (2004--2012), editor-in-chief of IJCV (2013--present), a program chair of IEEE CVPR 2005 and ECCV 2012, and a general chair of IEEE CVPR 2015 and ECCV 2020. In 2006, 2014 and 2016, she was awarded the Longuet-Higgins prize for fundamental contributions in computer vision that have withstood the test of time. She is a fellow of IEEE. She was awarded an ERC advanced grant in 2013, the Humboldt research award in 2015 and the Inria & French Academy of Science Grand Prix in 2016. She was elected to the German National Academy of Sciences, Leopoldina, in 2017.

Panel

Panelists will answer questions and discuss increasing diversity in computer vision.

Feel free to ask your anonymous questions here.

Chieko Asakawa (IBM Research/CMU)

See bio under Keynote Talks above.

Andrea Frome (Clarifai)

Dr. Andrea Frome earned her Ph.D. in Computer Science and Machine Learning in Jitendra Malik’s lab at UC Berkeley in 2007. Since then, her work in computer vision and machine learning has included leading the visual recognition team within Street View, which is especially known for its work blurring faces and license plates; developing DeViSE, as a member of the Google Brain team, to combine visual recognition with word embeddings, and applying an attention RNN to fine-grained classification; and building systems for Hillary for America at campaign headquarters for identity resolution and for automatically reading canvassing surveys to reduce data entry. In January 2017, she joined Clarifai as Director of Research. Her non-work pursuits include volunteering for flippable.org, studying flying trapeze, and learning Argentine Tango.

Raia Hadsell (DeepMind)

Raia Hadsell, a senior research scientist at DeepMind, has worked on deep learning and robotics problems for over 10 years. Her early research developed the notion of manifold learning using Siamese networks, which has been used extensively for invariant feature learning. After completing a PhD with Yann LeCun, which featured a self-supervised deep learning vision system for a mobile robot, her research continued at Carnegie Mellon’s Robotics Institute and SRI International, and in early 2014 she joined DeepMind in London to study artificial general intelligence. Her current research focuses on the challenge of continual learning for AI agents and robots. While deep RL algorithms are capable of attaining superhuman performance on single tasks, they often cannot transfer that performance to additional tasks, especially if experienced sequentially. She has proposed neural approaches such as policy distillation, progressive nets, and elastic weight consolidation to solve the problem of catastrophic forgetting for agents and robots.

Naila Murray (Naver Labs/NLE)

See bio under Keynote Talks above.

Cordelia Schmid (INRIA)

See bio under Keynote Talks above.

Helge Seetzen (TandemLaunch)

Helge is an award-winning technologist and entrepreneur, and a recognized global authority on technology transfer and display technologies. As General Partner of TandemLaunch, he works with inventors and entrepreneurs to build high-growth technology companies. His past successes include the transformation of raw university IP into fully commercialized LED TV technology, and the sale of his previous company, Brightside Technologies, to Dolby Laboratories after sealing partnerships with several of the largest consumer electronics companies in the world. Helge holds over 80 patents in the fields of display, camera and video technology.

Oral Presentations

The authors of a few accepted abstracts are invited to give oral presentations.

Presenter instructions: Each presentation should be a 12-minute talk followed by 3 minutes of Q&A.


Accepted orals

Presenter Name | Institution | Paper Title
Nour Karessli | Max Planck Institute for Informatics | Gaze Embeddings for Zero-Shot Image Classification
Georgia Gkioxari | Facebook AI Research | Towards Better Instance-level Recognition
Lanlan Liu | University of Michigan | Dynamic Deep Neural Networks: Optimizing Accuracy-Efficiency Trade-offs by Selective Execution
Nasim Souly | University of Central Florida | Semi and Weakly Supervised Semantic Segmentation Using Generative Adversarial Network

Poster Presentations

Authors of all accepted abstracts (with or without travel grant) will present their work in a poster session.

Presenter instructions: All posters should be installed within 10 minutes of the start of the poster session in the afternoon. The poster boards are located in the Kamehameha II room (where the main conference posters were). The physical dimensions of the poster stands are 8 feet wide by 4 feet high. Poster presenters may optionally refer to the CVPR17 poster template for details on how to prepare their posters. Please note your poster number below to find your board.


Accepted abstracts

No | Presenter Name | Institution | Paper Title
5 | Atreyee Sinha | Edgewood College | Pre-trained CNNs for Artistic Style Recognition
7 | | Huazhong University of Science and Technology |
8 | | Orand SA | Automatic clothing labeling of outdoor images
9 | Chia-Yin Tsai | Carnegie Mellon University | The Geometry of First-Returning Photons for Non-Line-of-Sight Imaging
13 | Faezeh Tafazzoli | University of Louisville | Vehicle Make and Model Recognition for Automated Vehicular Surveillance
14 | Fereshteh Sadeghi | University of Washington | Collision Avoidance via Deep RL: Real Single-Image Flight without a Single Real Image
15 | Georgia Gkioxari | Facebook AI Research | Towards Better Instance-level Recognition
17 | Hiranmayi Ranganathan | Arizona State University | Deep Active Learning Framework for Image Classification
18 | Hiteshi Jain | IIT Jodhpur |
19 | Hsiao-Yu Tung | Carnegie Mellon University | Adversarial Inversion: Self-supervision with Adversarial Priors
20 | Huda Alamri | Georgia Institute of Technology | Hierarchical Tree-based Prior for Place Recognition
22 | Hyo Jin Kim | University of North Carolina at Chapel Hill |
27 | Jane Hung | Broad Institute | Applying Faster R-CNN for Object Detection on Malaria Images
28 | Julia Peyre | INRIA | Weakly-supervised learning of visual relations
30 | Kaori Abe | National Institute of Advanced Industrial Science and Technology | Weighted Feature Integration for Person Re-identification
31 | Karla Brkic | University of Zagreb | I Know That Person: Generative Full Body and Face De-Identification of People in Images
32 | | University of Washington | SeGAN: Segmenting and Generating the Invisible
35 | | Ruhr-Universität Bochum | Spoofing Detection via Simultaneous Verification of Audio-Visual Synchronicity and Transcription
37 | Lili Meng | University of British Columbia | Backtracking Regression Forests for Accurate Camera Relocalization
38 | | Stanford University | Deep Grade: A visual approach to grading student programming assignments
43 | | Ghent University | Towards Using Few-shot Learning for Early Glaucoma Diagnosis in Small-Sized Datasets of High-Resolution Images
45 | Nai Chen Chang | Carnegie Mellon University | Low-shot Fine-grained Augmentation with Generative Adversarial Networks
49 | Prabhjot Kaur | Indian Institute of Technology | Significance of Magnetic Resonance Image Details in Sparse Representation based Super Resolution
50 | Priya Goyal | Facebook Inc. | Gated Cross Entropy Loss for Dense Object Detection
51 | Qing He | Samsung SDS | Intent Identification via Action Recognition and Long-term Sequence Association Learning
52 | Qiu Yue | University of Tsukuba | Sensing and recognition of typical indoor family scenes using an RGB-D camera
54 | | IIT Mandi | Object Triggered Egocentric Video Summarization
55 | Shan Su | University of Pennsylvania | Predicting Behaviors of Basketball Players from First Person Videos
56 | Sherin Mathews | University of Delaware | Maximum Correntropy based Dictionary Learning framework for physical activity recognition using wearable sensors
58 | | Peking University | An End-to-End Spatio-Temporal Attention Model for Human Action Recognition from Skeleton Data
59 | Sima Behpour | University of Illinois | Deep Adversarial Object Localization
61 | | IIT Mandi | Unsupervised Segmentation of Cervical Cancer Nuclei via Adaptive Clustering
62 | | IIT Mandi | CNN based Segmentation of Nuclei in PAP-Smear Images with Selective Pre-processing
65 | Sudipta Banerjee | Michigan State University | Generating an Image Phylogeny Tree for Near-Duplicate Iris Images
67 | | Stanford University | Medical image analysis to identify subgroups of patients for personalized treatment
68 | | Georgia Institute of Technology | DiscrimNet: Semi-Supervised Action Recognition from Videos using Generative Adversarial Networks
70 | Vibha Gupta | IIT Mandi | An Integrated Multi-scale Model for Breast Cancer Histopathological Image Classification with Joint Colour-texture Features
71 | | Northeastern University | Multi-camera Multi-object Tracking
72 | Yingxuan Zhu | Huawei R&D USA | Design of an Integrated Emotion Recognition System
73 | Yingxuan Zhu | Huawei R&D USA | Composable Object Recognition and Reconstruction with Human in the Loop
74 | | University of Cambridge | Sparse Bayesian Multi-Task Learning for Subspace Segmentation

Mentoring Dinner on July 25

*By invitation only.*

6:00 - 8:00 pm   Dinner sponsored by NVIDIA

The dinner event is an opportunity to meet other female computer vision researchers. Poster presenters will be matched with senior computer vision researchers to share experience and career advice. Invitees will receive an e-mail and be asked to confirm attendance.

*Note that the dinner takes place the evening before the main workshop day.*


Dinner speakers

Andrea Frome (Clarifai)

See bio in the Panel section above.

Shalini De Mello (NVIDIA)

Shalini De Mello has been a Senior Research Scientist at NVIDIA Research since March 2013. Her research interests are in computer vision and machine learning for human-computer interaction and smart interfaces. Her work includes NVIDIA’s shipping products for hand gesture recognition, face detection and video stabilization, and GPU-optimized libraries for the development of computer vision applications on mobile platforms. She received doctoral and master’s degrees in Electrical and Computer Engineering from the University of Texas at Austin in 2008 and 2004, respectively.

Olga Russakovsky (Princeton University)

Olga Russakovsky is an Assistant Professor of Computer Science at Princeton University. She completed her PhD in Computer Science at Stanford University in August 2015 and her postdoc at the Robotics Institute of Carnegie Mellon University in June 2017. Her research is in computer vision, closely integrated with machine learning and human-computer interaction. Her work has been featured in the New York Times and MIT Technology Review. She served as a Senior Program Committee member for WACV’16 (and will do so for CVPR’18), led the ImageNet Large Scale Visual Recognition Challenge effort for two years, was the Publicity and Press chair at CVPR’16, and organized multiple workshops and tutorials on large-scale recognition at the premier computer vision conferences ICCV’13, ECCV’14, CVPR’15, ICCV’15, CVPR’16, ECCV’16 and CVPR’17. In addition, she was the co-founder and director of the Stanford AI Laboratory’s outreach camp SAILORS (featured in Wired and published in SIGCSE’16), which educates high school girls about AI, and is a co-founder and board member of the AI4ALL foundation, dedicated to cultivating a diverse group of future AI leaders.