* The final schedule will be announced here. *

Main Workshop on June 22

Room: 251 - D

8:50 - 9:00    Introduction
9:00 - 9:30    Invited Talk 1 (Jessica Hodgins, Carnegie Mellon University)
9:30 - 10:00   Invited Talk 2 (Laura Leal-Taixé, Technical University of Munich)
10:00 - 11:30  Poster Session and Morning Break
11:30 - 11:50  Oral Session 1
        Learnable PINs: Cross-Modal Embeddings for Person Identity, by Arsha Nagrani (University of Oxford)
11:50 - 12:10  Oral Session 2
        ARC: Adversarial Robust Cuts for Semi-Supervised and Multi-Label Classification, by Sima Behpour (University of Illinois)
12:10 - 13:30  Lunch
13:30 - 14:00  Invited Talk 3 (Octavia Camps, Northeastern University)
14:00 - 14:20  Oral Session 3
        Don’t Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering, by Aishwarya Agrawal (Georgia Tech)
14:20 - 14:50  Oral Session 4
        On the iterative refinement of densely connected representation levels for semantic segmentation, by Arantxa Casanova (MILA)
14:50 - 15:20  Invited Talk 4 (Carol E. Reiley, drive.ai)
15:20 - 15:40  Oral Session 5
        Gradient-free policy architecture search and adaptation, by Sayna Ebrahimi (UC Berkeley)
15:40 - 16:00  Oral Session 6
        Joint Event Detection and Description in Continuous Video Streams, by Huijuan Xu (Boston University)
16:00 - 16:30  Afternoon Break
16:30 - 17:10  Panel
17:10 - 17:20  Closing Remarks

Keynote Talks

Keynote speakers will give technical talks about their research in computer vision.

Octavia I. Camps (Northeastern University)

Title: Dynamics-based Invariants for Video Understanding

Abstract: The power of geometric invariants to provide solutions to computer vision problems has been recognized for a long time. On the other hand, dynamics-based invariants are often overlooked. Yet, visual data come in streams: videos are temporal sequences of frames, images are ordered sequences of rows of pixels and contours are chained sequences of edges. In this talk, I will discuss the key role that systems theory can play in timely extracting and exploiting dynamics-based invariants to capture actionable information that is very sparsely encoded in high dimensional data streams. The central theme of this approach is the use of dynamical models, and their associated invariants, as an information-encoding paradigm. We will show that embedding problems in the conceptual world of dynamical systems makes available a rich, extremely powerful resource base, leading to robust solutions, or, in cases where the underlying problem is intrinsically hard, to computationally tractable approximations with sub optimality certificates. We will illustrate these ideas in the context of several practical applications: crowd-sourcing video, activity recognition, human re-identification and video prediction.
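As a toy illustration of a dynamics-based invariant (a sketch only, not the specific machinery of the talk), the snippet below builds a Hankel matrix from a 1-D tracked trajectory and uses its effective rank as a rough proxy for the order of the linear dynamical system that could have generated the motion; trajectories governed by simple dynamics yield low-rank Hankel matrices, while erratic ones do not.

```python
import numpy as np

def hankel_matrix(traj, rows):
    """Stack delayed copies of a 1-D trajectory into a Hankel matrix."""
    cols = len(traj) - rows + 1
    return np.stack([traj[i:i + cols] for i in range(rows)])

def dynamic_complexity(traj, rows=6, tol=1e-3):
    """Effective rank of the Hankel matrix: roughly the order of a linear
    dynamical system that could explain the observed trajectory."""
    H = hankel_matrix(np.asarray(traj, dtype=float), rows)
    s = np.linalg.svd(H, compute_uv=False)
    return int(np.sum(s / s[0] > tol))

# A constant-velocity trajectory is explained by a low-order model (rank ~2),
# while a noisy, erratic trajectory needs a full-rank Hankel matrix.
smooth = np.linspace(0.0, 1.0, 40)
erratic = np.random.default_rng(0).normal(size=40)
print(dynamic_complexity(smooth), dynamic_complexity(erratic))
```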

Bio: Octavia Camps received a B.S. degree in computer science and a B.S. degree in electrical engineering from the Universidad de la Republica (Uruguay), and an M.S. and a Ph.D. degree in electrical engineering from the University of Washington. Since 2006, she has been a Professor in the Electrical and Computer Engineering Department at Northeastern University. From 1991 to 2006 she was a faculty member in Electrical Engineering and in Computer Science and Engineering at The Pennsylvania State University. Prof. Camps was a visiting researcher at the Computer Science Department at Boston University during Spring 2013, and in 2000 she was a visiting faculty member at the California Institute of Technology and at the University of Southern California. She is an associate editor of Computer Vision and Image Understanding (CVIU). Her main research interests include robust computer vision, image processing, and machine learning.

Jessica K. Hodgins (Carnegie Mellon University / Facebook AI Research)

Title: Capture and Animation of Human Motion

Abstract: In this talk, Jessica Hodgins will present research on constructing controllers for human motion from first principles and on capturing human motion data, and will conclude with recent results on using captured data to learn controllers. Throughout the talk, she will reflect on the lessons learned, both from the research itself and from how the projects came to be.

Bio: Jessica Hodgins is a Professor in the Robotics Institute and Computer Science Department at Carnegie Mellon University. From 2008-2016, she founded and ran research labs for Disney, rising to VP of Research and leading the labs in Pittsburgh and Los Angeles. From 2005-2015, she was Associate Director for Faculty in the Robotics Institute, running the promotion and tenure process and creating a mentoring program for pre-tenure faculty. Prior to moving to Carnegie Mellon in 2000, she was an Associate Professor and Assistant Dean in the College of Computing at Georgia Institute of Technology. She received her Ph.D. in Computer Science from Carnegie Mellon University in 1989. Her research focuses on computer graphics, animation, and robotics with an emphasis on generating and analyzing human motion. She has received an NSF Young Investigator Award, a Packard Fellowship, and a Sloan Fellowship. She was editor-in-chief of ACM Transactions on Graphics from 2000-2002 and ACM SIGGRAPH Papers Chair in 2003. She was an elected director at large on the ACM SIGGRAPH Executive Committee from 2012-2017 and in 2017 she was elected ACM SIGGRAPH President. In 2010, she was awarded the ACM SIGGRAPH Computer Graphics Achievement Award and in 2017 she was awarded the Steven Anson Coons Award for Outstanding Creative Contributions to Computer Graphics.

Laura Leal-Taixé (Technical University of Munich)

Title: CNN vs SIFT-based localization: out with the old?

Abstract: Recently, Deep Learning has achieved such great performance in so many vision tasks that researchers are starting to use it to solve any imaginable task, disregarding previous methods optimised over decades of research work. Image-based localization is one of those problems: a classic task in Computer Vision where, given an image, we are asked to find its camera pose and orientation with respect to a given model, which can have a scale ranging from a city to a small room. Recently, researchers have turned to CNNs for pose estimation, disregarding previous literature and even basic epipolar geometry. In our first work, we put both methods to the test: on the one hand, we clearly show that CNN-based localization still has an incredibly long way to go with respect to SIFT-based methods; on the other hand, we present an indoor dataset in which classic methods suffer due to textureless surfaces and repetitive structures. Afterwards, I will also present our first attempt towards a fully scalable method based on relative pose estimation that allows us to localize a camera in any given scene with a single network (and even use epipolar geometry!). I will show that a marriage between CNNs and classic Computer Vision knowledge is still possible and very much desirable in today's research landscape.
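For readers unfamiliar with the "classic" side of this comparison, here is a minimal sketch of SIFT-based structure localization using standard OpenCV calls (SIFT features, ratio-test matching, PnP with RANSAC). It assumes a prebuilt 3D map whose points carry SIFT descriptors and a known intrinsic matrix K; the function name and arguments are illustrative, not code from the talk.

```python
import numpy as np
import cv2  # assumes an OpenCV build (>= 4.4) that ships SIFT

def localize_query(query_img, map_descriptors, map_points_3d, K):
    """Classic structure-based localization: match query SIFT features against
    a 3D map whose points carry precomputed SIFT descriptors, then recover the
    camera pose with PnP + RANSAC.
    map_descriptors: (N, 128) float32, map_points_3d: (N, 3) float32, K: 3x3."""
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(query_img, None)

    matcher = cv2.BFMatcher(cv2.NORM_L2)
    knn = matcher.knnMatch(descriptors, map_descriptors, k=2)
    good = [pair[0] for pair in knn
            if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance]

    image_pts = np.float32([keypoints[m.queryIdx].pt for m in good])
    object_pts = np.float32([map_points_3d[m.trainIdx] for m in good])

    # rvec/tvec map world points into the camera frame; inliers index `good`
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(object_pts, image_pts, K, None,
                                                 reprojectionError=8.0)
    return ok, rvec, tvec, inliers
```

A CNN-based alternative (e.g., direct pose regression) replaces all of the above with a single forward pass, which is exactly the trade-off the talk examines.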

Bio: Prof. Laura Leal-Taixé leads the Dynamic Vision and Learning group at the Technical University of Munich, Germany. She received her Bachelor and Master degrees in Telecommunications Engineering from the Technical University of Catalonia (UPC), Barcelona. She did her Master's thesis at Northeastern University, Boston, USA, and received her PhD degree (Dr.-Ing.) from the Leibniz University Hannover, Germany. During her PhD she did a one-year visit at the Vision Lab at the University of Michigan, USA. She also spent two years as a postdoc at the Institute of Geodesy and Photogrammetry of ETH Zurich, Switzerland, and one year at the Technical University of Munich. Her research interests are dynamic scene understanding, in particular multiple object tracking and segmentation, as well as machine learning for video analysis.

Carol E. Reiley (drive.ai)

Title: When bias in AI and product design means life or death.

Abstract: A talk about how bias in AI products could impact humanity: the state of where we are and the largest pitfalls throughout the process. The talk will go through a history of AI products, discuss where bias creeps in, and show how to create safer, more delightful products for the world. Since we're at the start of the AI revolution, this lays the groundwork for how to be thoughtful as a company and as a consumer.

Bio: Carol E. Reiley has been at the forefront of robotics and AI for over 15 years and was the youngest member on the IEEE Robotics and Automation board. She has worked on highly regulated products in a variety of applications, including space, underwater, and medical. She did her graduate work at Johns Hopkins University as an NSF Fellow researching how humans and robots interact, worked on the da Vinci system at Intuitive Surgical, and most recently co-founded and was the President of the self-driving car startup drive.ai. She has raised over $77M and built partnerships with Lyft, Grab, and automotive companies. Reiley was selected by Forbes as one of the twenty incredible women advancing artificial intelligence research. She holds over six patents, has over a dozen publications, and was the first female engineer on the cover of MAKE magazine. She has been featured in major publications such as MIT Technology Review, The New York Times, and Harper's Bazaar, and has given several TEDx talks. She is also a published author and is now starting a new startup.

Panel

Panelists will answer questions and discuss increasing diversity in computer vision.

Feel free to ask your anonymous questions here.

Michael Black (Max Planck / Amazon)

Bio: Michael Black received his B.Sc. from the University of British Columbia (1985), his M.S. from Stanford (1989), and his Ph.D. from Yale University (1992). After post-doctoral research at the University of Toronto, he worked at Xerox PARC as a member of research staff and area manager. From 2000 to 2010 he was on the faculty of Brown University in the Department of Computer Science (Assoc. Prof. 2000-2004, Prof. 2004-2010). He is one of the founding directors at the Max Planck Institute for Intelligent Systems in Tübingen, Germany, where he leads the Perceiving Systems department and serves as Managing Director. He is also a Distinguished Amazon Scholar, an Honorarprofessor at the University of Tuebingen, and Adjunct Professor at Brown University. His work has won several awards including the IEEE Computer Society Outstanding Paper Award (1991), Honorable Mention for the Marr Prize (1999 and 2005), the 2010 Koenderink Prize for Fundamental Contributions in Computer Vision, and the 2013 Helmholtz Prize for work that has stood the test of time. He is a foreign member of the Royal Swedish Academy of Sciences. In 2013 he co-founded Body Labs Inc., which was acquired by Amazon in 2017.

Octavia I. Camps (Northeastern University)

Bio: Octavia Camps received a B.S. degree in computer science and a B.S. degree in electrical engineering from the Universidad de la Republica (Uruguay), and an M.S. and a Ph.D. degree in electrical engineering from the University of Washington. Since 2006, she has been a Professor in the Electrical and Computer Engineering Department at Northeastern University. From 1991 to 2006 she was a faculty member in Electrical Engineering and in Computer Science and Engineering at The Pennsylvania State University. Prof. Camps was a visiting researcher at the Computer Science Department at Boston University during Spring 2013, and in 2000 she was a visiting faculty member at the California Institute of Technology and at the University of Southern California. She is an associate editor of Computer Vision and Image Understanding (CVIU). Her main research interests include robust computer vision, image processing, and machine learning.

Dima Damen (University of Bristol)

Bio: Dima Damen is a Senior Lecturer (Associate Professor) in Computer Vision at the University of Bristol, United Kingdom. She received her PhD from the University of Leeds (2009). Dima's research interests are in the automatic understanding of object interactions, actions and activities using static and wearable visual (and depth) sensors. Dima co-chaired BMVC 2013, is an area chair for BMVC (2014-2018), and an associate editor of Pattern Recognition (2017-). She was selected as a Nokia Research collaborator in 2016, and as an Outstanding Reviewer at ICCV17, CVPR13 and CVPR12. She currently supervises 9 PhD students and 3 postdoctoral researchers.

Jessica K. Hodgins (Carnegie Mellon University / Facebook AI Research)

Bio: Jessica Hodgins is a Professor in the Robotics Institute and Computer Science Department at Carnegie Mellon University. From 2008-2016, she founded and ran research labs for Disney, rising to VP of Research and leading the labs in Pittsburgh and Los Angeles. From 2005-2015, she was Associate Director for Faculty in the Robotics Institute, running the promotion and tenure process and creating a mentoring program for pre-tenure faculty. Prior to moving to Carnegie Mellon in 2000, she was an Associate Professor and Assistant Dean in the College of Computing at Georgia Institute of Technology. She received her Ph.D. in Computer Science from Carnegie Mellon University in 1989. Her research focuses on computer graphics, animation, and robotics with an emphasis on generating and analyzing human motion. She has received an NSF Young Investigator Award, a Packard Fellowship, and a Sloan Fellowship. She was editor-in-chief of ACM Transactions on Graphics from 2000-2002 and ACM SIGGRAPH Papers Chair in 2003. She was an elected director at large on the ACM SIGGRAPH Executive Committee from 2012-2017 and in 2017 she was elected ACM SIGGRAPH President. In 2010, she was awarded the ACM SIGGRAPH Computer Graphics Achievement Award and in 2017 she was awarded the Steven Anson Coons Award for Outstanding Creative Contributions to Computer Graphics.

Laura Leal-Taixé (Technical University of Munich)

Bio: Prof. Laura Leal-Taixé leads the Dynamic Vision and Learning group at the Technical University of Munich, Germany. She received her Bachelor and Master degrees in Telecommunications Engineering from the Technical University of Catalonia (UPC), Barcelona. She did her Master's thesis at Northeastern University, Boston, USA, and received her PhD degree (Dr.-Ing.) from the Leibniz University Hannover, Germany. During her PhD she did a one-year visit at the Vision Lab at the University of Michigan, USA. She also spent two years as a postdoc at the Institute of Geodesy and Photogrammetry of ETH Zurich, Switzerland, and one year at the Technical University of Munich. Her research interests are dynamic scene understanding, in particular multiple object tracking and segmentation, as well as machine learning for video analysis.

Xin Lu (Adobe)

Bio: Xin Lu is a researcher, developer, and manager at Adobe. She works on deep learning and its applications in computer vision and image processing. Her recent research interests include deep neural network architecture optimization, neural network pruning, and image generation. Her research work has been deployed across Adobe desktop and mobile products. Prior to joining Adobe, Xin received her Ph.D. from The Pennsylvania State University. Her thesis focused on image aesthetics assessment, emotion recognition and image denoising.

Jitendra Malik (UC Berkeley / Facebook)

Bio: Jitendra Malik is the Arthur J. Chick Professor of EECS at UC Berkeley, and a Director of Research and Site Lead at Facebook AI Research in Menlo Park. He has published widely in computer vision, computational modeling of human vision, and machine learning. Several well-known concepts and algorithms arose in this research, such as anisotropic diffusion, normalized cuts, high dynamic range imaging, shape contexts and R-CNN. Jitendra received the Distinguished Researcher in Computer Vision Award from IEEE, the K.S. Fu Prize from IAPR, and the Allen Newell Award from ACM and AAAI. He has been elected to the National Academy of Sciences, the National Academy of Engineering and the American Academy of Arts and Sciences.

Carol E. Reiley (drive.ai)

Bio: Carol E. Reiley has been at the forefront of robotics and AI for over 15 years and was the youngest member on the IEEE Robotics and Automation board. She has worked on highly regulated products in a variety of applications, including space, underwater, and medical. She did her graduate work at Johns Hopkins University as an NSF Fellow researching how humans and robots interact, worked on the da Vinci system at Intuitive Surgical, and most recently co-founded and was the President of the self-driving car startup drive.ai. She has raised over $77M and built partnerships with Lyft, Grab, and automotive companies. Reiley was selected by Forbes as one of the twenty incredible women advancing artificial intelligence research. She holds over six patents, has over a dozen publications, and was the first female engineer on the cover of MAKE magazine. She has been featured in major publications such as MIT Technology Review, The New York Times, and Harper's Bazaar, and has given several TEDx talks. She is also a published author and is now starting a new startup.

Oral Presentations

Authors of a few accepted papers are invited to give oral presentations.

Presenter instructions: Each presentation should be a 15-minute talk followed by 5 minutes of Q&A.


Accepted orals

Arantxa Casanova (MILA): On the iterative refinement of densely connected representation levels for semantic segmentation

Abstract: State-of-the-art semantic segmentation approaches increase the receptive field of their models by using either a downsampling path composed of poolings/strided convolutions or successive dilated convolutions. However, it is not clear which operation leads to the best results. In this paper, we systematically study the differences introduced by distinct receptive field enlargement methods and their impact on the performance of a novel architecture, called Fully Convolutional DenseResNet (FC-DRN). FC-DRN has a densely connected backbone composed of residual networks. Following standard image segmentation architectures, receptive field enlargement operations that change the representation level are interleaved among residual networks. This allows the model to exploit the benefits of both residual and dense connectivity patterns, namely: gradient flow, iterative refinement of representations, multi-scale feature combination and deep supervision. In order to highlight the potential of our model, we test it on the challenging CamVid urban scene understanding benchmark and make the following observations: 1) downsampling operations outperform dilations when the model is trained from scratch, 2) dilations are useful during the finetuning step of the model, 3) coarser representations require fewer refinement steps, and 4) ResNets (by model construction) are good regularizers, since they can reduce the model capacity when needed. Finally, we compare our architecture to alternative methods and report a state-of-the-art result on the CamVid dataset, with at least half as many parameters.
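To make the comparison concrete, this illustrative PyTorch snippet (not the FC-DRN code) contrasts the two receptive-field-enlargement operations studied in the abstract: a strided convolution that downsamples the feature map versus a dilated convolution that keeps the resolution.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 128, 128)  # (batch, channels, H, W) feature map

# Option 1: enlarge the receptive field by downsampling (strided convolution).
# The spatial resolution is halved, so later layers see a coarser grid.
down = nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1)
print(down(x).shape)     # torch.Size([1, 64, 64, 64])

# Option 2: enlarge the receptive field with a dilated convolution.
# The resolution is preserved; the 3x3 kernel samples a 5x5 neighbourhood.
dilated = nn.Conv2d(64, 64, kernel_size=3, dilation=2, padding=2)
print(dilated(x).shape)  # torch.Size([1, 64, 128, 128])
```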

Huijuan Xu (Boston University): Joint Event Detection and Description in Continuous Video Streams

Abstract: Dense video captioning is a fine-grained video understanding task that involves two sub-problems: localizing distinct events in a long video stream, and generating captions for the localized events. We propose the Joint Event Detection and Description Network (JEDDi-Net), which solves the dense video captioning task in an end-to-end fashion. Our model continuously encodes the input video stream with three-dimensional convolutional layers, proposes variable-length temporal events based on pooled features, and generates their captions. Unlike existing approaches, our event proposal generation and language captioning networks are trained jointly and end-to-end, allowing for improved temporal segmentation. In order to explicitly model temporal relationships between visual events and their captions in a single video, we also propose a two-level hierarchical captioning module that keeps track of context. On the large-scale ActivityNet Captions dataset, JEDDi-Net demonstrates improved results as measured by standard metrics. We also present the first dense captioning results on the TACoS-MultiLevel dataset.

Sima Behpour (University of Illinois): ARC: Adversarial Robust Cuts for Semi-Supervised and Multi-Label Classification

Abstract: Many structured prediction tasks arising in computer vision and natural language processing tractably reduce to making minimum cost cuts in graphs with edge weights learned using maximum margin methods. Unfortunately, the hinge loss used to construct these methods often provides a particularly loose bound on the loss function of interest (e.g., the Hamming loss). We develop Adversarial Robust Cuts (ARC), an approach that poses the learning task as a minimax game between predictor and “label approximator” based on minimum cost graph cuts. Unlike maximum margin methods, this game-theoretic perspective always provides meaningful bounds on the Hamming loss. We conduct multi-label and semi-supervised binary prediction experiments that demonstrate the benefits of our approach.
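For readers less familiar with graph-cut inference, the sketch below shows only the minimum-cost-cut primitive that such methods reduce to: MAP labeling of a binary pairwise model as an s-t minimum cut (here via networkx). It is not the adversarial learning formulation of ARC, and the unary/pairwise costs are toy values.

```python
import networkx as nx  # pip install networkx

def mincut_labeling(unary, pairwise):
    """MAP inference for a binary pairwise model via a minimum s-t cut.
    unary[i] = (cost of label 0, cost of label 1); pairwise[(i, j)] = penalty
    paid when i and j receive different labels (must be >= 0)."""
    G = nx.DiGraph()
    s, t = "source", "sink"
    for i, (c0, c1) in unary.items():
        G.add_edge(s, i, capacity=c1)  # this edge is cut iff i takes label 1
        G.add_edge(i, t, capacity=c0)  # this edge is cut iff i takes label 0
    for (i, j), w in pairwise.items():
        G.add_edge(i, j, capacity=w)
        G.add_edge(j, i, capacity=w)
    cut_value, (source_side, _) = nx.minimum_cut(G, s, t)
    return cut_value, {i: 0 if i in source_side else 1 for i in unary}

# Three nodes in a chain: "a" prefers label 0, "c" prefers label 1, and "b" is
# undecided but coupled to both, so the cut places it with the cheaper side.
unary = {"a": (0.0, 3.0), "b": (1.0, 1.0), "c": (3.0, 0.0)}
pairwise = {("a", "b"): 2.0, ("b", "c"): 2.0}
print(mincut_labeling(unary, pairwise))
```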

Arsha Nagrani (University of Oxford): Learnable PINs: Cross-Modal Embeddings for Person Identity

Abstract: We propose and investigate an identity sensitive joint embedding of face and voice. Such an embedding enables cross-modal retrieval from voice to face and from face to voice. We make the following four contributions: first, we show that the embedding can be learnt from videos of talking faces, without requiring any identity labels, using a form of cross-modal self-supervision; second, we develop a curriculum learning schedule for hard negative mining targeted to this task, that is essential for learning to proceed successfully; third, we demonstrate and evaluate cross-modal retrieval for identities unseen and unheard during training over a number of scenarios and establish a benchmark for this novel task; finally, we show an application of using the joint embedding for automatically retrieving and labelling characters in TV dramas.
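Below is a minimal sketch of the kind of cross-modal objective the abstract describes, with random tensors standing in for the outputs of hypothetical face and voice encoders: matching pairs are pulled together while each sample is pushed away from its hardest in-batch negative. The paper's curriculum additionally controls how aggressively such hard negatives are mined as training proceeds; that schedule is omitted here.

```python
import torch
import torch.nn.functional as F

def cross_modal_contrastive(face_emb, voice_emb, margin=0.6):
    """face_emb, voice_emb: (B, D) embeddings where row i of each tensor comes
    from the same identity (the positive pair). Pull positives together and
    push each sample away from its hardest non-matching partner in the batch."""
    f = F.normalize(face_emb, dim=1)
    v = F.normalize(voice_emb, dim=1)
    dist = torch.cdist(f, v)                              # (B, B) distances
    pos = dist.diag()                                     # matching pairs
    masked = dist + torch.eye(dist.size(0), device=dist.device) * 1e6
    hard_voice = masked.min(dim=1).values                 # hardest voice per face
    hard_face = masked.min(dim=0).values                  # hardest face per voice
    loss = F.relu(pos - hard_voice + margin) + F.relu(pos - hard_face + margin)
    return loss.mean()

# Toy usage: random embeddings in place of real face/voice encoder outputs.
face = torch.randn(8, 128, requires_grad=True)
voice = torch.randn(8, 128, requires_grad=True)
cross_modal_contrastive(face, voice).backward()
```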

Sayna Ebrahimi (UC Berkeley): Gradient-free policy architecture search and adaptation

Abstract: We develop a method for policy architecture search and adaptation via gradient-free optimization which can learn to perform autonomous driving tasks. By learning from both demonstration and environmental reward we develop a model that can learn with relatively few early catastrophic failures. We first learn an architecture of appropriate complexity to perceive aspects of world state relevant to the expert demonstration, and then mitigate the effect of domain-shift during deployment by adapting a policy demonstrated in a source domain to rewards obtained in a target environment. We show that our approach allows safer learning than baseline methods, offering a reduced cumulative crash metric over the agent's lifetime as it learns to drive in a realistic simulated environment.
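As a generic illustration of gradient-free policy search (not the paper's architecture search space or driving simulator), the snippet below runs a cross-entropy-method loop over a flat policy parameter vector, scoring each sample with an episode-return function; the rollout is replaced here by a toy reward with a known optimum.

```python
import numpy as np

def cem_search(evaluate, dim, iters=30, pop=64, elite_frac=0.2, seed=0):
    """Cross-entropy method: sample parameter vectors from a Gaussian, keep the
    elite fraction by (possibly noisy) episode return, refit, and repeat."""
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(dim), np.ones(dim)
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(iters):
        thetas = rng.normal(mean, std, size=(pop, dim))
        returns = np.array([evaluate(th) for th in thetas])
        elite = thetas[np.argsort(returns)[-n_elite:]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3
    return mean

# Toy stand-in for an episode rollout: the return peaks at a known parameter vector.
target = np.array([0.5, -1.0, 2.0])
best = cem_search(lambda th: -np.sum((th - target) ** 2), dim=3)
print(np.round(best, 2))  # approaches [0.5, -1.0, 2.0]
```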

Aishwarya Agrawal (Georgia Tech): Don’t Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering

Abstract: A number of studies have found that today's Visual Question Answering (VQA) models are heavily driven by superficial correlations in the training data and lack sufficient image grounding. To encourage development of models geared towards the latter, we propose a new setting for VQA where for every question type, train and test sets have different prior distributions of answers. Specifically, we present new splits of the VQA v1 and VQA v2 datasets, which we call Visual Question Answering under Changing Priors (VQA-CP v1 and VQA-CP v2 respectively). First, we evaluate several existing VQA models under this new setting and show that their performance degrades significantly compared to the original VQA setting. Second, we propose a novel Grounded Visual Question Answering model (GVQA) that contains inductive biases and restrictions in the architecture specifically designed to prevent the model from 'cheating' by primarily relying on priors in the training data. Specifically, GVQA explicitly disentangles the recognition of visual concepts present in the image from the identification of plausible answer space for a given question, enabling the model to more robustly generalize across different distributions of answers. GVQA is built off an existing VQA model -- Stacked Attention Networks (SAN). Our experiments demonstrate that GVQA significantly outperforms SAN on both VQA-CP v1 and VQA-CP v2 datasets. Interestingly, it also outperforms more powerful VQA models such as Multimodal Compact Bilinear Pooling (MCB) in several cases. GVQA offers strengths complementary to SAN when trained and evaluated on the original VQA v1 and VQA v2 datasets. Finally, GVQA is more transparent and interpretable than existing VQA models.
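To illustrate the changing-priors idea (a toy sketch, not the released VQA-CP construction), the helper below re-splits examples so that, within each question type, train and test favour different answers; a model that simply memorises the training-set answer prior will then fail at test time.

```python
import random
from collections import defaultdict

def changing_prior_split(examples, train_frac=0.8, seed=0):
    """examples: dicts with 'question_type' and 'answer' keys. For each
    (question type, answer) group, send the majority of its examples to a
    randomly chosen side, so per-type answer priors differ between splits."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for ex in examples:
        groups[(ex["question_type"], ex["answer"])].append(ex)

    train, test = [], []
    for key in sorted(groups):
        bucket = groups[key]
        rng.shuffle(bucket)
        k = int(train_frac * len(bucket))
        if rng.random() < 0.5:
            train.extend(bucket[:k]); test.extend(bucket[k:])
        else:
            test.extend(bucket[:k]); train.extend(bucket[k:])
    return train, test
```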

Poster Presentations

Authors of all accepted papers (with or without a travel grant) will present their work in a poster session.

Presenter instructions: All posters should be put up within the first 10 minutes of the poster session, which starts at 10:00. The physical dimensions of the poster stands are 8 feet wide by 4 feet high. Poster presenters may optionally use the CVPR18 poster template; it includes details on how to prepare your poster. You do not need to use this template, but please read the instructions carefully and prepare your posters accordingly. Please note your poster number below to find your board.


Accepted Posters

1. Doris Antensteiner and Svorad Stolc (Austrian Institute of Technology): Variational Depth and Normal Fusion Algorithms for 3D Reconstruction
2. Mengjiao Wang (Imperial College London): A Neuro-Tensorial Approach For Learning Disentangled Representations
3. Ksenia Bittner (German Aerospace Center): Automatic Large-Scale 3D Building Shape Refinement Using Conditional Generative Adversarial Networks
4. Franziska Mueller (MPI Informatics): GANerated Hands for Real-Time 3D Hand Tracking from Monocular RGB
5. Yanran Joyce Wang (Northwestern University): Quick Adaption of Segmentation FCN via Network Modulation
6. Derya Akkaynak (University of Haifa): A Revised Underwater Image Formation Model
7. Dena Bazazian (CVC): Word Spotting in Scene Images based on Character Recognition
8. Tamar Rott Shaham (Technion): Deformation Aware Image Compression
9. Meng Zheng (Rensselaer Polytechnic Institute): RPIfield: A New Dataset for Temporally Evaluating Person Re-Identification
10. Ilke Demir (Facebook): A Holistic Framework for Addressing the World using Machine Learning
11. Bojana Gajic (Computer Vision Center): Cross-domain fashion image retrieval
12. Simone Meyer (ETH Zurich): PhaseNet for Video Frame Interpolation
13. Jingya Liu (City College of New York): Recognizing Elevator Buttons and Labels for Blind Navigation
14. Kuan-Ting Chen (National Taiwan University): Netizen-Style Commenting on Fashion Photos: Dataset and Diversity Measures
15. Ruth Fong (University of Oxford): Net2Vec: Explaining how Concepts are Encoded in Deep Neural Networks
16. Ozge Yalcinkaya (Hacettepe University): I-ME: Iterative Model Evolution for Learning Activities From Weakly Labeled Videos
17. Kanami Yamagishi (Waseda University): Cosmetic Features Extraction by a Single Image Makeup Decomposition
18. Rosaura Vidal Mata (University of Notre Dame): UG^2: a Video Benchmark for Assessing the Impact of Image Restoration and Enhancement on Automatic Visual Recognition
19. Avantika Singh (IIT Mandi): Encapsulating the impact of transfer learning, domain knowledge and training strategies in deep-learning based architecture: A biometric based case study
20. Jadisha Ramirez Cornejo (University of Campinas): Dynamic Facial Expression Recognition through Visual Rhythm and Motion History Image
21. Karla Brkic (University of Zagreb): Keep it Short: Understanding Traffic Scenes From Very Short Representations
22. Hiya Roy (University of Tokyo): Do hashtags help? - Image aesthetics prediction using only hashtags
23. Marcella Cornia (University of Modena and Reggio Emilia): SAM: Pushing the Limits of Saliency Prediction Models
24. Yi Zhu (University of Chinese Academy of Sciences): To Be Focused: Efficient Weakly Supervised Learning via Soft Proposal Network
25. Nezihe Merve Gürel (ETH Zürich): Towards More Accurate Radio Telescope Images
26. Yi Zhu (University of Chinese Academy of Sciences): Learning Peak Response for Weakly Supervised Instance-level Segmentation
27. Mahdieh Poostchi (NIH): Multi-scale Spatially weighted Local Histogram in O(1)
28. Mahdieh Poostchi (NIH): Malaria Parasite Detection and Quantification Using Deep Neural Network
29. Mikayla Timm (University of Massachusetts Amherst): Large-Scale Ecological Analyses of Animals in the Wild using Computer Vision
30. Ekta Gujral (University of California, Riverside): FMVR: Shall I Get My Video Back? Feature Matching based Video Reconstruction
31. Murium Iqbal (Overstock): Discovering Style Trends through Deep Visually Aware Latent Product Embeddings
32. Anis Davoudi (University of Florida): Autonomous detection of disruptions in the intensive care unit using deep mask R-CNN
33. Jing Zhang (Australian National University): Deep Saliency Detection: From Supervised Learning to Unsupervised Learning
34. Akram Bayat (UMass Boston): Multi-Resolution Deep Object Recognition Network
35. Fariba Zohrizadeh (University of Texas at Arlington): Image Segmentation using Sparse Subset Selection
36. Ivona Tautkute (Tooploox): I Know How You Feel: Emotion Recognition with Facial Landmarks
37. Jyoti Islam (Georgia State University): Early Diagnosis of Alzheimer's Disease: A Neuroimaging Study with Deep Learning Architectures
38. Sheila Pinto Caceres (University of Sydney): Activity Recognition under Energy Constraints
39. Vibha Gupta (IIT Mandi): Hybridization of Feature Selection Methods Based on Tournament Design: HEp-2 Cell Image Classification
40. Reham Abobeah (Egypt Japan University of Science and Technology): Wearable RGB Camera-based Navigation System for the Visually Impaired
41. Rajvi Shah (IIIT Hyderabad): View-graph Selection Framework for SfM
42. Arantxa Casanova (MILA): On the iterative refinement of densely connected representation levels for semantic segmentation
43. Huijuan Xu (Boston University): Joint Event Detection and Description in Continuous Video Streams
44. Sima Behpour (University of Illinois): ARC: Adversarial Robust Cuts for Semi-Supervised and Multi-Label Classification
45. Arsha Nagrani (University of Oxford): Learnable PINs: Cross-Modal Embeddings for Person Identity
46. Sayna Ebrahimi (UC Berkeley): Gradient-free policy architecture search and adaptation
47. Aishwarya Agrawal (Georgia Tech): Don’t Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering

Mentoring Dinner on June 21

6:30 - 10:00 pm   Dinner sponsored by Facebook

The dinner event is an opportunity to meet other female computer vision researchers. Poster presenters will be matched with senior computer vision researchers to share experiences and career advice. Invitees will receive an e-mail and be asked to confirm attendance.

*Note that the dinner takes place the evening before the main workshop day.*


Dinner speakers

Timnit Gebru (Microsoft Research)

Bio: Timnit Gebru works in the Fairness Accountability Transparency and Ethics (FATE) group at Microsoft Research, New York. Prior to joining Microsoft Research, she was a PhD student in the Stanford Artificial Intelligence Laboratory, studying computer vision under Fei-Fei Li. Her main research interest is in data mining large-scale, publicly available images to gain sociological insight, and working on computer vision problems that arise as a result, including fine-grained image recognition, scalable annotation of images, and domain adaptation. She is currently studying the ethical considerations underlying any data mining project, and methods of auditing and mitigating bias in sociotechnical systems. The New York Times, MIT Tech Review and others have recently covered her work. As a cofounder of the group Black in AI, she works to both increase diversity in the field and reduce the negative impacts of racial bias in training data used for human-centric machine learning models.

Dima Damen (University of Bristol)

Bio: Dima Damen is a Senior Lecturer (Associate Professor) in Computer Vision at the University of Bristol, United Kingdom. She received her PhD from the University of Leeds (2009). Dima's research interests are in the automatic understanding of object interactions, actions and activities using static and wearable visual (and depth) sensors. Dima co-chaired BMVC 2013, is an area chair for BMVC (2014-2018), and an associate editor of Pattern Recognition (2017-). She was selected as a Nokia Research collaborator in 2016, and as an Outstanding Reviewer at ICCV17, CVPR13 and CVPR12. She currently supervises 9 PhD students and 3 postdoctoral researchers.

Xin Lu (Adobe)

Bio: Xin Lu is a researcher, developer, and manager at Adobe. She works on deep learning and its applications in computer vision and image processing. Her recent research interests include deep neural network architecture optimization, neural network pruning, and image generation. Her research work has been deployed across Adobe desktop and mobile products. Prior to joining Adobe, Xin received her Ph.D. from The Pennsylvania State University. Her thesis focused on image aesthetics assessment, emotion recognition and image denoising.