*Below is the final schedule.*

Main Workshop on July 26

Room: 321 AB

1:30 - 1:40 pm   Introduction
1:40 - 2:00 pm   Keynote
           Learning to Segment Moving Objects, by Cordelia Schmid (INRIA)
2:00 - 2:30 pm   Oral Session 1
           Gaze Embeddings for Zero-Shot Image Classification
             by Nour Karessli (Max Planck Institute for Informatics)
           Towards Better Instance-level Recognition
             by Georgia Gkioxari (Facebook AI Research)
2:30 - 2:50 pm   Keynote
           Interferences in Match Kernels, by Naila Murray (Naver Labs/NLE)
2:50 - 4:15 pm   Poster Session and Coffee Break
4:15 - 4:35 pm   Keynote
           Computer Vision for the Blind, by Chieko Asakawa (IBM Research/CMU)
4:35 - 5:05 pm   Oral Session 2
           Dynamic Deep Neural Networks: Optimizing Accuracy-Efficiency Trade-offs by Selective Execution
             by Lanlan Liu (University of Michigan)
           Semi and Weakly Supervised Semantic Segmentation Using Generative Adversarial Network
             by Nasim Souly (University of Central Florida)
5:05 - 5:35 pm   Panel Session
           Chieko Asakawa (IBM Research/CMU)
           Andrea Frome (Clarifai)
           Raia Hadsell (DeepMind)
           Naila Murray (Naver Labs/NLE)
           Cordelia Schmid (INRIA)
           Helge Seetzen (TandemLaunch)
5:35 - 5:45 pm   Closing Remarks

Keynote Talks

Keynote speakers will give technical talks about their research in computer vision.

Chieko Asakawa (IBM Research/CMU)

Title: Computer Vision for the Blind

Abstract: Blind people have long dreamed of a machine that can recognize objects, people and environments: goods in a shop, people around them, or obstacles in a corridor. For many years such machines existed only in science fiction, but thanks to advances in deep learning and computer vision they are now becoming a reality, supplementing and augmenting the missing or weakened abilities of people with visual impairments. In this talk, I will outline a set of necessary technologies, demonstrate our efforts, and cast a vision for near-future deployments that will change people's lives.

Bio: Chieko Asakawa is a blind Japanese computer scientist, known for her work at IBM Research – Tokyo in accessibility. A Netscape browser plug-in which she developed, the IBM Home Page Reader, became the most widely used web-to-speech system available. She is the recipient of numerous industry and government awards. Asakawa was born with normal sight, but after she injured her optic nerve in a swimming accident at age 11, she began losing her sight, and by age 14 she was fully blind. She earned a bachelor's degree in English literature at Otemon Gakuin University in Osaka in 1982, then completed a two-year computer programming course for blind people at Nippon Lighthouse, using an Optacon to translate print to tactile sensation, and joined IBM in 1985. She received a Ph.D. in Engineering from the University of Tokyo in 2004. She is a member of the Association for Computing Machinery (ACM), the Information Processing Society of Japan, and the IBM Academy of Technology. She was inducted into the Women in Technology International (WITI) Hall of Fame in 2003, and both within and outside of IBM she has been actively working to help women engineers pursue technical careers. Chieko was appointed an IBM Fellow, IBM's most prestigious technical honor, in 2009. In 2013, the government of Japan awarded her the Medal of Honor with Purple Ribbon for her outstanding contributions to accessibility research, including the development of the voice browser for the visually impaired.

Naila Murray (Naver Labs/NLE)

Title: Interferences in Match Kernels

Abstract: We consider the design of an image representation that embeds and aggregates a set of local descriptors into a single vector. Popular representations of this kind include the bag-of-visual-words, the Fisher vector and the VLAD. When two such image representations are compared with the dot-product, the image-to-image similarity can be interpreted as a match kernel. In match kernels, one has to deal with interference, i.e. with the fact that even if two descriptors are unrelated, their matching score may contribute to the overall similarity. We formalise this problem and propose two related solutions, both aimed at equalising the individual contributions of the local descriptors in the final representation. These methods modify the aggregation stage by including a set of per-descriptor weights. They differ by the objective function that is optimised to compute those weights. The first is a “democratisation” strategy that aims at equalising the relative importance of each descriptor in the set comparison metric. The second one involves equalising the match of a single descriptor to the aggregated vector. These concurrent methods give a substantial performance boost over standard aggregation methods, as demonstrated by our experiments on standard public image retrieval benchmarks.
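
The weighting idea at the heart of the talk can be sketched in a few lines. Below is a minimal NumPy illustration of a "democratisation"-style iteration that equalises each descriptor's total contribution to the match kernel. It is a sketch of the general strategy, not the speaker's implementation; the function names, iteration count and clamping of negative contributions are our own assumptions.

```python
import numpy as np

def democratic_weights(phi, n_iter=10, eps=1e-8):
    """Compute per-descriptor weights so that each descriptor's total
    contribution to the match kernel is roughly equal (a Sinkhorn-style
    iteration). `phi` is an (n, d) array of embedded local descriptors."""
    K = phi @ phi.T                       # pairwise matching scores
    lam = np.ones(len(phi))
    for _ in range(n_iter):
        # Total weighted contribution of each descriptor to the kernel;
        # clamping negative sums is our simplification for this sketch.
        contrib = lam * (K @ lam)
        lam /= np.sqrt(np.maximum(contrib, eps))
    return lam

def aggregate(phi, lam):
    """Weighted sum-aggregation into a single L2-normalised image vector."""
    v = (lam[:, None] * phi).sum(axis=0)
    return v / (np.linalg.norm(v) + 1e-12)
```

With `lam` fixed to all ones, `aggregate` reduces to plain sum-aggregation, which is precisely where the interference described in the abstract arises.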

Bio: Naila Murray graduated with a PhD in computer science from the Universitat Autònoma de Barcelona. She also holds a master’s degree in computer vision and artificial intelligence from the Universitat Autònoma de Barcelona, and a bachelor’s degree in electrical engineering from Princeton University. Her work in computer vision has involved research into biologically-inspired deep models of visual attention; fine-grained visual recognition, including participation in the winning team of the FGComp 2013 competition; and computational models for visual aesthetic analysis. Naila is a senior scientist and manager of the computer vision group at Xerox Research Centre Europe. Currently, her research focuses on visual search in large databases and human behaviour understanding, particularly for video action recognition.

Cordelia Schmid (INRIA)

Title: Learning to Segment Moving Objects

Abstract: This talk addresses the task of segmenting moving objects in unconstrained videos. We introduce a novel two-stream neural network with an explicit memory module to achieve this. The two streams of the network encode spatial and temporal features in a video sequence, respectively, while the memory module captures the evolution of objects over time. The module that builds a “visual memory” of the video, i.e., a joint representation of all the video frames, is realized with a convolutional recurrent unit learned from a small number of training video sequences. Given video frames as input, our approach first assigns each pixel an object or background label obtained with an encoder-decoder network that takes optical flow as input and is trained on synthetic data. Next, a “visual memory” specific to the video is acquired automatically, without any manually annotated frames. The visual memory is implemented with convolutional gated recurrent units, which allow spatial information to be propagated over time. We evaluate our method extensively on two benchmarks, the DAVIS and Freiburg-Berkeley motion segmentation datasets, and show state-of-the-art results.
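
For readers unfamiliar with the building block, here is a minimal PyTorch sketch of a convolutional gated recurrent unit of the kind the abstract describes as the “visual memory”; it shows how gated convolutions let spatial information propagate over time. This is an illustrative sketch under our own assumptions (kernel size, channel counts, gate layout), not the authors' implementation.

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """Convolutional GRU cell: a recurrent unit whose gates are convolutions,
    so the hidden state h is a spatial map and the memory evolves over time
    without losing spatial layout. (Illustrative sketch only.)"""

    def __init__(self, in_ch: int, hid_ch: int, k: int = 3):
        super().__init__()
        p = k // 2
        # One conv produces both the update (z) and reset (r) gates.
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, k, padding=p)
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, k, padding=p)
        self.hid_ch = hid_ch

    def forward(self, x, h=None):
        if h is None:  # start with an empty memory
            h = x.new_zeros(x.size(0), self.hid_ch, x.size(2), x.size(3))
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], dim=1))).chunk(2, dim=1)
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_tilde  # blend old memory with new evidence

# Hypothetical usage: fold per-frame feature maps (B, 64, H, W) into a memory.
# cell = ConvGRUCell(in_ch=64, hid_ch=64)
# h = None
# for feat in per_frame_features:
#     h = cell(feat, h)  # h summarises the video seen so far
```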

Bio: Cordelia Schmid holds a M.S. degree in Computer Science from the University of Karlsruhe and a Doctorate, also in Computer Science, from the Institut National Polytechnique de Grenoble (INPG). Her doctoral thesis received the best thesis award from INPG in 1996. Dr. Schmid was a post-doctoral research assistant in the Robotics Research Group of Oxford University in 1996--1997. Since 1997 she has held a permanent research position at INRIA Grenoble Rhône-Alpes, where she is a research director and directs an INRIA team. Dr. Schmid is the author of over a hundred technical publications. She has been an Associate Editor for IEEE PAMI (2001--2005) and for IJCV (2004--2012), editor-in-chief of IJCV (2013--present), a program chair of IEEE CVPR 2005 and ECCV 2012, and a general chair of IEEE CVPR 2015 and ECCV 2020. In 2006, 2014 and 2016, she was awarded the Longuet-Higgins prize for fundamental contributions in computer vision that have withstood the test of time. She is a fellow of IEEE. She was awarded an ERC advanced grant in 2013, the Humboldt research award in 2015 and the Inria & French Academy of Science Grand Prix in 2016. She was elected to the German National Academy of Sciences, Leopoldina, in 2017.

Panel

Panelists will answer questions and discuss increasing diversity in computer vision.

Feel free to ask your anonymous questions here.

Chieko Asakawa (IBM Research/CMU)

See bio under Keynote Talks above.

Andrea Frome (Clarifai)

Dr. Andrea Frome earned her Ph.D. in Computer Science and Machine Learning in Jitendra Malik’s lab at UC Berkeley in 2007. Since then, her work in computer vision and machine learning has included leading the visual recognition team within Street View, which is especially known for its work blurring faces and license plates; developing DeViSE, as a member of the Google Brain team, to combine visual recognition with word embeddings, and applying an attention RNN to fine-grained classification; and building systems for Hillary for America at campaign headquarters for identity resolution and for automatically reading canvassing surveys to reduce data entry. In January 2017, she joined Clarifai as Director of Research. Her non-work pursuits include volunteering for flippable.org, studying flying trapeze, and learning Argentine Tango.

Raia Hadsell (DeepMind)

Raia Hadsell, a senior research scientist at DeepMind, has worked on deep learning and robotics problems for over 10 years. Her early research developed the notion of manifold learning using Siamese networks, which has been used extensively for invariant feature learning. After completing a PhD with Yann LeCun, which featured a self-supervised deep learning vision system for a mobile robot, her research continued at Carnegie Mellon’s Robotics Institute and SRI International, and in early 2014 she joined DeepMind in London to study artificial general intelligence. Her current research focuses on the challenge of continual learning for AI agents and robots. While deep RL algorithms are capable of attaining superhuman performance on single tasks, they often cannot transfer that performance to additional tasks, especially if experienced sequentially. She has proposed neural approaches such as policy distillation, progressive nets, and elastic weight consolidation to solve the problem of catastrophic forgetting for agents and robots.

Naila Murray (Naver Labs/NLE)

See bio under Keynote Talks above.

Cordelia Schmid (INRIA)

See bio under Keynote Talks above.

Helge Seetzen (TandemLaunch)

Helge is an award-winning technologist and entrepreneur, and a recognized global authority on technology transfer and display technologies. As General Partner of TandemLaunch, he works with inventors and entrepreneurs to build high-growth technology companies. His past successes include the transformation of raw university IP into fully commercialized LED TV technology, and the sale of his previous company, Brightside Technologies, to Dolby Laboratories after sealing partnerships with several of the largest consumer electronics companies in the world. Helge holds over 80 patents in the fields of display, camera and video technology.

Oral Presentations

The authors of a few accepted abstracts are invited to give oral presentations.

Presenter instructions: Each presentation should be a 12-minute talk followed by 3 minutes of Q&A.


Accepted orals

Presenter Name | Institution | Paper Title
Nour Karessli | Max Planck Institute for Informatics | Gaze Embeddings for Zero-Shot Image Classification
Georgia Gkioxari | Facebook AI Research | Towards Better Instance-level Recognition
Lanlan Liu | University of Michigan | Dynamic Deep Neural Networks: Optimizing Accuracy-Efficiency Trade-offs by Selective Execution
Nasim Souly | University of Central Florida | Semi and Weakly Supervised Semantic Segmentation Using Generative Adversarial Network

Poster Presentations

Authors of all accepted abstracts (with or without travel grant) will present their work in a poster session.

Presenter instructions: All posters should be installed within 10 minutes of the start of the poster session in the afternoon. The poster boards are located in the Kamehameha II room (where the main conference posters were). The physical dimensions of the poster stands are 8 feet wide by 4 feet high. Poster presenters may optionally refer to the CVPR17 poster template for details on how to prepare their posters. Please note your poster number below to find your board.


Accepted abstracts

No | Presenter Name | Institution | Paper Title
5 | Atreyee Sinha | Edgewood College | Pre-trained CNNs for Artistic Style Recognition
7 | | Huazhong University of Science and Technology |
8 | | Orand SA | Automatic clothing labeling of outdoor images
9 | Chia-Yin Tsai | Carnegie Mellon University | The Geometry of First-Returning Photons for Non-Line-of-Sight Imaging
13 | Faezeh Tafazzoli | University of Louisville | Vehicle Make and Model Recognition for Automated Vehicular Surveillance
14 | Fereshteh Sadeghi | University of Washington | Collision Avoidance via Deep RL: Real Single-Image Flight without a Single Real Image
15 | Georgia Gkioxari | Facebook AI Research | Towards Better Instance-level Recognition
17 | Hiranmayi Ranganathan | Arizona State University | Deep Active Learning Framework for Image Classification
18 | Hiteshi Jain | IIT Jodhpur |
19 | Hsiao-Yu Tung | Carnegie Mellon University | Adversarial Inversion: Self-supervision with Adversarial Priors
20 | Huda Alamri | Georgia Institute of Technology | Hierarchical Tree-based Prior for Place Recognition
22 | Hyo Jin Kim | University of North Carolina at Chapel Hill |
27 | Jane Hung | Broad Institute | Applying Faster R-CNN for Object Detection on Malaria Images
28 | Julia Peyre | INRIA | Weakly-supervised learning of visual relations
30 | Kaori Abe | National Institute of Advanced Industrial Science and Technology | Weighted Feature Integration for Person Re-identification
31 | Karla Brkic | University of Zagreb | I Know That Person: Generative Full Body and Face De-Identification of People in Images
32 | | University of Washington | SeGAN: Segmenting and Generating the Invisible
35 | | Ruhr-Universität Bochum | Spoofing Detection via Simultaneous Verification of Audio-Visual Synchronicity and Transcription
37 | Lili Meng | University of British Columbia | Backtracking Regression Forests for Accurate Camera Relocalization
38 | | Stanford University | Deep Grade: A visual approach to grading student programming assignments
43 | | Ghent University | Towards Using Few-shot Learning for Early Glaucoma Diagnosis in Small-Sized Datasets of High-Resolution Images
45 | Nai Chen Chang | Carnegie Mellon University | Low-shot Fine-grained Augmentation with Generative Adversarial Networks
49 | Prabhjot Kaur | Indian Institute of Technology | Significance of Magnetic Resonance Image Details in Sparse Representation based Super Resolution
50 | Priya Goyal | Facebook Inc. | Gated Cross Entropy Loss for Dense Object Detection
51 | Qing He | Samsung SDS | Intent Identification via Action Recognition and Long-term Sequence Association Learning
52 | Qiu Yue | University of Tsukuba | Sensing and recognition of typical indoor family scenes using an RGB-D camera
54 | | IIT Mandi | Object Triggered Egocentric Video Summarization
55 | Shan Su | University of Pennsylvania | Predicting Behaviors of Basketball Players from First Person Videos
56 | Sherin Mathews | University of Delaware | Maximum Correntropy based Dictionary Learning framework for physical activity recognition using wearable sensors
58 | | Peking University | An End-to-End Spatio-Temporal Attention Model for Human Action Recognition from Skeleton Data
59 | Sima Behpour | University of Illinois | Deep Adversarial Object Localization
61 | | IIT Mandi | Unsupervised Segmentation of Cervical Cancer Nuclei via Adaptive Clustering
62 | | IIT Mandi | CNN based Segmentation of Nuclei in PAP-Smear Images with Selective Pre-processing
65 | Sudipta Banerjee | Michigan State University | Generating an Image Phylogeny Tree for Near-Duplicate Iris Images
67 | | Stanford University | Medical image analysis to identify subgroups of patients for personalized treatment
68 | | Georgia Institute of Technology | DiscrimNet: Semi-Supervised Action Recognition from Videos using Generative Adversarial Networks
70 | Vibha Gupta | IIT Mandi | An Integrated Multi-scale Model for Breast Cancer Histopathological Image Classification with Joint Colour-texture Features
71 | | Northeastern University | Multi-camera Multi-object Tracking
72 | Yingxuan Zhu | Huawei R&D USA | Design of an Integrated Emotion Recognition System
73 | Yingxuan Zhu | Huawei R&D USA | Composable Object Recognition and Reconstruction with Human in the Loop
74 | | University of Cambridge | Sparse Bayesian Multi-Task Learning for Subspace Segmentation

Mentoring Dinner on July 25

*By invitation only.*

6:00 - 8:00 pm   Dinner sponsored by NVIDIA

The dinner event is an opportunity to meet other female computer vision researchers. Poster presenters will be matched with senior computer vision researchers to share experience and career advice. Invitees will receive an e-mail and be asked to confirm attendance.

*Note that the dinner takes place the evening before the main workshop day.*


Dinner speakers

Andrea Frome (Clarifai)

See bio in the Panel section above.

Shalini De Mello (NVIDIA)

Shalini De Mello has been a Senior Research Scientist at NVIDIA Research since March 2013. Her research interests are in computer vision and machine learning for human-computer interaction and smart interfaces. Her work includes NVIDIA’s shipping products for hand gesture recognition, face detection and video stabilization, and GPU-optimized libraries for the development of computer vision applications on mobile platforms. She received doctoral and master’s degrees in Electrical and Computer Engineering from the University of Texas at Austin in 2008 and 2004, respectively.

Olga Russakovsky (Princeton University)

Olga Russakovsky is an Assistant Professor of Computer Science at Princeton University. She completed her PhD in Computer Science at Stanford University in August 2015 and her postdoc at the Robotics Institute of Carnegie Mellon University in June 2017. Her research is in computer vision, closely integrated with machine learning and human-computer interaction. Her work has been featured in the New York Times and MIT Technology Review. She served as a Senior Program Committee member for WACV’16 (and will do so for CVPR’18), led the ImageNet Large Scale Visual Recognition Challenge effort for two years, was the Publicity and Press chair at CVPR’16, and organized multiple workshops and tutorials on large-scale recognition at the premier computer vision conferences ICCV’13, ECCV’14, CVPR’15, ICCV’15, CVPR’16, ECCV’16 and CVPR’17. In addition, she was the co-founder and director of the Stanford AI Laboratory’s outreach camp SAILORS (featured in Wired and published in SIGCSE’16), which educates high school girls about AI, and is a co-founder and board member of the AI4ALL foundation, dedicated to cultivating a diverse group of future AI leaders.