  • ICML Beijing
    … rank r from O(nr log^2 n) revealed entries, provided that revealed entries are drawn proportionally to the local row and column coherences (closely related to leverage scores) of the underlying matrix. Our results are order-optimal up to logarithmic factors, and extend existing results for nuclear norm minimization, which require strong incoherence conditions on the types of matrices that can be recovered due to assumed uniformly distributed revealed entries. We further provide extensive numerical evidence that a proposed two-phase sampling algorithm can perform nearly as well as local-coherence sampling, and without requiring a priori knowledge of the matrix coherence structure. Finally, we apply our results to quantify how weighted nuclear norm minimization can improve on unweighted minimization given an arbitrary set of sampled entries.

    Benjamin Haeffele, Eric Young and Rene Vidal - "Structured Low-Rank Matrix Factorization: Optimality, Algorithm and Applications to Image Processing" [pdf]: Recently, convex solutions to low-rank matrix factorization problems have received increasing attention in machine learning. However, in many applications the data can display other structures beyond simply being low rank. For example, images and videos present complex spatio-temporal structures, which are largely ignored by current …

    (An illustrative sketch of leverage-score-proportional entry sampling follows this entry.)

    Original URL path: http://icml.cc/2014/icml2014keywords/index/article/nuclear_norm.html (2016-02-18)
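    As a rough illustration of the local-coherence (leverage-score) sampling idea in the first abstract above, the sketch below computes row and column leverage scores of a matrix's top-r singular subspaces and draws revealed entries with probability proportional to their sum. The toy matrix, the sampling budget and the exact weighting are invented for illustration; this is not the authors' two-phase algorithm.

        import numpy as np

        def local_coherences(M, r):
            """Row/column leverage scores of the best rank-r subspaces of M."""
            U, s, Vt = np.linalg.svd(M, full_matrices=False)
            U_r, V_r = U[:, :r], Vt[:r, :].T
            mu = np.sum(U_r ** 2, axis=1)   # row leverage scores (sum to r)
            nu = np.sum(V_r ** 2, axis=1)   # column leverage scores (sum to r)
            return mu, nu

        def sample_entries(M, r, m, rng=np.random.default_rng(0)):
            """Draw m entry indices with probability proportional to mu_i + nu_j,
            i.e. biased toward locally coherent rows and columns."""
            n1, n2 = M.shape
            mu, nu = local_coherences(M, r)
            P = mu[:, None] + nu[None, :]
            P = P / P.sum()
            flat = rng.choice(n1 * n2, size=m, replace=False, p=P.ravel())
            return np.unravel_index(flat, (n1, n2))

        # Toy usage: a rank-2 matrix, reveal roughly 30% of its entries.
        A = np.outer(np.arange(1, 9), np.ones(6)) + np.outer(np.ones(8), np.arange(6))
        rows, cols = sample_entries(A, r=2, m=int(0.3 * A.size))
        print(len(rows), "entries observed")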

  • ICML Beijing
    … communication overhead and allows adaptive load balancing. Our experiments for LDA on Wikipedia and Pubmed show that, relative to the state of the art in distributed MCMC, we reduce compute time from 27 hours to half an hour in order to reach the same perplexity level.

    Issei Sato and Hiroshi Nakagawa - "Approximation Analysis of Stochastic Gradient Langevin Dynamics by using Fokker-Planck Equation and Ito Process" [pdf]: The stochastic gradient Langevin dynamics (SGLD) algorithm is appealing for large-scale Bayesian learning. The SGLD algorithm seamlessly transitions between stochastic optimization and Bayesian posterior sampling. However, solid theoretical results, such as a convergence proof, have not been developed. We theoretically analyze the SGLD algorithm with a constant stepsize in two ways. First, we show by using the Fokker-Planck equation that the probability distribution of random variables generated by the SGLD algorithm converges to the Bayesian posterior. Second, we analyze the convergence of the SGLD algorithm by using the Ito process, which reveals that the SGLD algorithm converges weakly but not strongly. This result indicates that the SGLD algorithm can be an approximation method for posterior averaging.

    Anoop Korattikara, Yutian Chen and Max Welling - "Austerity in …

    (A minimal SGLD update sketch follows this entry.)

    Original URL path: http://icml.cc/2014/icml2014keywords/index/article/gradient_langevin_dynamics.html (2016-02-18)
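    A minimal sketch of the constant-stepsize SGLD update analysed by Sato and Nakagawa above: half a noisy minibatch-gradient step on the log posterior plus injected Gaussian noise. The conjugate-Gaussian toy model, stepsize and minibatch size are arbitrary choices made here purely for illustration.

        import numpy as np

        rng = np.random.default_rng(1)

        # Conjugate toy model: y_i ~ N(theta, 1) with prior theta ~ N(0, 10),
        # so the SGLD output can be sanity-checked against the exact posterior.
        y = rng.normal(2.0, 1.0, size=1000)
        N, eps, n_batch = len(y), 1e-3, 50      # constant stepsize and minibatch size

        def stoch_grad_log_post(theta, batch):
            """Minibatch estimate of the gradient of the log posterior."""
            grad_prior = -theta / 10.0
            grad_lik = (N / len(batch)) * np.sum(batch - theta)  # rescaled likelihood term
            return grad_prior + grad_lik

        theta, samples = 0.0, []
        for t in range(5000):
            batch = rng.choice(y, size=n_batch, replace=False)
            # SGLD update: half a noisy-gradient step plus N(0, eps) injected noise.
            theta += 0.5 * eps * stoch_grad_log_post(theta, batch) + np.sqrt(eps) * rng.normal()
            samples.append(theta)

        # With a constant stepsize the chain only approximates the posterior
        # (weak rather than strong convergence, as the analysis above indicates).
        print("SGLD posterior mean estimate:", np.mean(samples[1000:]))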

  • ICML Beijing
    … are solely based on independence assumptions.

    Chengtao Li, Jun Zhu and Jianfei Chen - "Bayesian Max-margin Multi-Task Learning with Data Augmentation" [pdf]: Both max-margin and Bayesian methods have been extensively studied in multi-task learning, but they have rarely been considered together. We present Bayesian max-margin multi-task learning, which conjoins the two schools of methods, thus allowing the discriminative max-margin methods to enjoy the great flexibility of Bayesian methods in incorporating rich prior information as well as performing nonparametric Bayesian feature learning with the latent dimensionality resolved from data. We develop Gibbs sampling algorithms by exploiting data augmentation to deal with the non-smooth hinge loss. For nonparametric models, our algorithms do not need to make mean-field assumptions or truncated approximations. Empirical results demonstrate performance superior to competitors in both multi-task classification and regression.

    Feiping Nie, Yizhen Huang and Heng Huang - "Linear Time Solver for Primal SVM" [pdf]: Support Vector Machines (SVMs) are among the most popular classification techniques in machine learning, hence designing fast primal SVM algorithms for large-scale datasets has been a hot topic in recent years. This paper presents a new L2-norm regularized primal SVM solver using Augmented Lagrange Multipliers, with linear-time computational cost for Lp-norm loss functions. The most computationally intensive step, which determines the algorithmic complexity of the proposed algorithm, is purely and simply matrix-by-vector multiplication, which can be easily parallelized on a multi-core server. We implement and integrate our algorithm into the interfaces and framework of the well-known LibLinear software toolbox. Experiments show that our algorithm has stable performance and is on average faster than state-of-the-art solvers such as SVMperf, Pegasos and the LibLinear variants that integrate the TRON, PCD and DCD algorithms.

    Jie Wang, Peter Wonka and Jieping Ye - "Scaling SVM and Least Absolute Deviations via Exact Data Reduction" [pdf]: The support vector machine (SVM) is a widely used method for classification. Although many efforts have been devoted to developing efficient solvers, it remains challenging to apply SVM to large-scale problems. A nice property of SVM is that the non-support vectors have no effect on the resulting classifier. Motivated by this observation, we present fast and efficient screening rules to discard non-support vectors by analyzing the dual problem of SVM via variational inequalities (DVI). As a result, the number of data instances to be entered into the optimization can be substantially reduced. Some appealing features of our screening method are: (1) DVI is safe, in the sense that the vectors discarded by DVI are guaranteed to be non-support vectors; (2) the data set needs to be scanned only once to run the screening, and its computational cost is negligible compared to that of solving the SVM problem; (3) DVI is independent of the solvers and can be integrated with any existing efficient solver. We also …

    (A small demonstration of the non-support-vector property follows this entry.)

    Original URL path: http://icml.cc/2014/icml2014keywords/index/article/svms.html (2016-02-18)
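    The screening rules of Wang, Wonka and Ye rest on the fact that discarding non-support vectors leaves the SVM solution unchanged. The sketch below checks that property empirically with scikit-learn's linear SVC; it does not implement the DVI rules themselves, and the synthetic dataset and model settings are arbitrary.

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.svm import SVC

        # Synthetic two-class problem (arbitrary choice, for illustration only).
        X, y = make_classification(n_samples=400, n_features=10, random_state=0)

        # Train a linear SVM on the full data and record its support vectors.
        full = SVC(kernel="linear", C=1.0).fit(X, y)
        sv = full.support_

        # Retrain on the support vectors alone: up to solver tolerance the
        # hyperplane is unchanged, which is what makes safe screening
        # (discarding guaranteed non-support vectors beforehand) possible.
        reduced = SVC(kernel="linear", C=1.0).fit(X[sv], y[sv])

        print("kept", len(sv), "of", len(y), "points")
        print("max |coefficient difference|:",
              np.abs(full.coef_ - reduced.coef_).max())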

  • ICML Beijing
    … both synthetic and real-world datasets, the nonparametric tensor power method compares favorably to the EM algorithm and other spectral algorithms.

    Aaditya Ramdas and Javier Peña - "Margins, Kernels and Non-linear Smoothed Perceptrons" [pdf]: We focus on the problem of finding a non-linear classification function that lies in a Reproducing Kernel Hilbert Space (RKHS), both from the primal point of view (finding a perfect separator when one exists) and the dual point of view (giving a certificate of non-existence), with special focus on generalizations of two classical schemes: the Perceptron (primal) and Von Neumann (dual) algorithms. We cast our problem as one of maximizing the regularized normalized hard margin (rho) in an RKHS and use the Representer Theorem to rephrase it in terms of a Mahalanobis dot-product/semi-norm associated with the kernel's normalized and signed Gram matrix. We derive an accelerated smoothed algorithm with a convergence rate of sqrt(log n)/rho.

    Dino Sejdinovic, Heiko Strathmann, Maria Lomeli Garcia, Christophe Andrieu and Arthur Gretton - "Kernel Adaptive Metropolis-Hastings" [pdf]: A Kernel Adaptive Metropolis-Hastings algorithm is introduced for the purpose of sampling from a target distribution with strongly nonlinear support. The algorithm embeds the trajectory of the Markov chain into a reproducing kernel Hilbert space (RKHS), such that the feature-space covariance of the samples informs the choice of proposal. The procedure is computationally efficient and straightforward to implement, since the RKHS moves can be integrated out analytically: our proposal distribution in the original space is a normal distribution whose mean and covariance depend on where the current sample lies in the support of the target distribution, and it adapts to its local covariance structure. Furthermore, the procedure requires neither gradients nor any other higher-order …

    (A classical kernel-perceptron sketch follows this entry.)

    Original URL path: http://icml.cc/2014/icml2014keywords/index/article/kernel_hilbert_space.html (2016-02-18)
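    The Ramdas and Peña abstract above generalizes the classical Perceptron to an RKHS. Below is a sketch of the plain (non-accelerated, non-smoothed) kernel perceptron, which maintains the separator f(x) = sum_i alpha_i y_i k(x_i, x); the RBF kernel, its bandwidth and the toy data are arbitrary choices for illustration.

        import numpy as np

        def rbf(X, Z, gamma=0.5):
            """Gaussian RBF kernel matrix k(x, z) = exp(-gamma * ||x - z||^2)."""
            d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
            return np.exp(-gamma * d2)

        def kernel_perceptron(X, y, gamma=0.5, epochs=100):
            """Perceptron run implicitly in the RKHS via f(x) = sum_i alpha_i y_i k(x_i, x)."""
            K = rbf(X, X, gamma)
            alpha = np.zeros(len(X))
            for _ in range(epochs):
                mistakes = 0
                for i in range(len(X)):
                    if y[i] * ((alpha * y) @ K[:, i]) <= 0:   # misclassified point
                        alpha[i] += 1.0
                        mistakes += 1
                if mistakes == 0:   # a perfect separator in the RKHS has been found
                    break
            return alpha

        # Toy problem that is not linearly separable in the input space.
        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 2))
        y = np.where((X ** 2).sum(axis=1) < 1.0, 1.0, -1.0)
        alpha = kernel_perceptron(X, y)
        pred = np.sign(rbf(X, X) @ (alpha * y))
        print("training accuracy:", float((pred == y).mean()))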

  • ICML Beijing
    … our classifier may be trained on n examples in O(n^2 log n) time and evaluated on new points in O(log n) time.

    Jacob Gardner, Matt Kusner, Zhixiang, Kilian Weinberger and John Cunningham - "Bayesian Optimization with Inequality Constraints" [pdf]: Bayesian optimization is a powerful framework for minimizing expensive objective functions while using very few function evaluations. It has been successfully applied to a variety of problems, including hyperparameter tuning and experimental design. However, this framework has not been extended to the inequality-constrained optimization setting, particularly the setting in which evaluating feasibility is just as expensive as evaluating the objective. Here we present constrained Bayesian optimization, which places a prior distribution on both the objective and the constraint functions. We evaluate our method on simulated and real data, demonstrating that constrained Bayesian optimization can quickly find optimal and feasible points, even when small feasible regions cause standard methods to fail.

    Anoop Cherian - "Nearest Neighbors Using Compact Sparse Codes" [pdf]: In this paper we propose a novel scheme for approximate nearest neighbor (ANN) retrieval based on dictionary learning and sparse coding. Our key innovation is to build compact codes, dubbed SpANN codes, using the active set of sparse-coded data. These codes are then used to index an inverted file table for fast retrieval. The active sets are often found to be sensitive to small differences among data points, resulting in only near-duplicate retrieval. We show that this sensitivity is related to the coherence of the dictionary, small coherence resulting in better retrieval. To this end, we propose a novel dictionary learning formulation with incoherence constraints and an efficient method to solve it. Experiments are conducted on two state-of-the-art computer vision datasets with 1M …

    (A sketch of a feasibility-weighted acquisition function follows this entry.)

    Original URL path: http://icml.cc/2014/icml2014keywords/index/article/approximate_nearest_neighbor.html (2016-02-18)
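    A minimal sketch of the acquisition idea behind constrained Bayesian optimization as described above: fit independent Gaussian-process surrogates to the objective and the constraint, and weight expected improvement by the probability of feasibility. The 1-D objective f, constraint c, grid and kernel choices are invented for illustration and are not taken from the paper.

        import numpy as np
        from scipy.stats import norm
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF

        rng = np.random.default_rng(0)

        # Invented 1-D problem: minimize f subject to c(x) <= 0 (both "expensive").
        f = lambda x: np.sin(3 * x) + x ** 2
        c = lambda x: 0.5 - np.cos(2 * x)

        # A handful of initial evaluations of both functions.
        X = rng.uniform(-2, 2, size=(6, 1))
        yf, yc = f(X).ravel(), c(X).ravel()

        # Independent GP surrogates for objective and constraint.
        gp_f = GaussianProcessRegressor(kernel=RBF(), normalize_y=True).fit(X, yf)
        gp_c = GaussianProcessRegressor(kernel=RBF(), normalize_y=True).fit(X, yc)

        def constrained_ei(x_grid):
            """Expected improvement over the best feasible value, times P(c <= 0)."""
            mu_f, sd_f = gp_f.predict(x_grid, return_std=True)
            mu_c, sd_c = gp_c.predict(x_grid, return_std=True)
            feasible = yc <= 0
            best = yf[feasible].min() if feasible.any() else yf.min()
            z = (best - mu_f) / np.maximum(sd_f, 1e-9)
            ei = (best - mu_f) * norm.cdf(z) + sd_f * norm.pdf(z)
            prob_feasible = norm.cdf(-mu_c / np.maximum(sd_c, 1e-9))
            return ei * prob_feasible

        grid = np.linspace(-2, 2, 400).reshape(-1, 1)
        x_next = grid[np.argmax(constrained_ei(grid))]
        print("next point to evaluate:", x_next)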

  • ICML Beijing
    … achieve various forms of convergence. In our third contribution, we show how several reinforcement learning methods that use NGA without positive definite metric tensors can be adapted to properly use GeNGA.

    Haitham Bou Ammar, Eric Eaton, Paul Ruvolo and Matthew Taylor - "Online Multi-Task Learning for Policy Gradient Methods" [pdf]: Policy gradient algorithms have shown considerable recent success in solving high-dimensional sequential decision-making tasks, particularly in robotics. However, these methods often require extensive experience in a domain to achieve high performance. To make agents more sample-efficient, we developed a multi-task policy gradient method to learn decision-making tasks consecutively, transferring knowledge between tasks to accelerate learning. Our approach provides robust theoretical guarantees, and we show empirically that it dramatically accelerates learning on a variety of dynamical systems, including an application to quadrotor control.

    Xuezhi Wang, Tzu-Kuo Huang and Jeff Schneider - "Active Transfer Learning under Model Shift" [pdf]: Transfer learning algorithms are used when one has sufficient training data for one supervised learning task (the source task) but only very limited training data for a second task (the target task) that is similar but not identical to the first. These algorithms use varying assumptions about the similarity between the tasks to carry information from the source to the target task. Common assumptions are that only certain specific marginal or conditional distributions have changed, while all else remains the same. Alternatively, if one has only the target task but also has the ability to choose a limited amount of additional training data to collect, then active learning algorithms are used to make choices which will most improve performance on the target task. These algorithms may be combined into active transfer learning, but previous efforts have had …

    (A basic single-task REINFORCE sketch follows this entry.)

    Original URL path: http://icml.cc/2014/icml2014keywords/index/article/domains.html (2016-02-18)
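    The multi-task method above builds on standard policy gradient learning. As background, here is a sketch of the basic single-task likelihood-ratio (REINFORCE) update for a softmax policy on a toy three-armed bandit; the reward values, learning rate and baseline are arbitrary, and nothing here implements the authors' cross-task transfer.

        import numpy as np

        rng = np.random.default_rng(0)

        # Toy one-step decision task: a 3-armed bandit with noisy rewards.
        true_rewards = np.array([0.2, 0.5, 0.9])
        theta = np.zeros(3)             # parameters of a softmax policy

        def softmax(z):
            z = z - z.max()
            return np.exp(z) / np.exp(z).sum()

        alpha, baseline = 0.1, 0.0
        for t in range(2000):
            p = softmax(theta)
            a = rng.choice(3, p=p)
            r = rng.normal(true_rewards[a], 0.1)
            # Likelihood-ratio (REINFORCE) gradient: grad log pi(a) = e_a - p for softmax.
            grad_log_pi = -p
            grad_log_pi[a] += 1.0
            theta += alpha * (r - baseline) * grad_log_pi
            baseline += 0.05 * (r - baseline)   # running-average baseline reduces variance

        print("learned policy probabilities:", np.round(softmax(theta), 3))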

  • ICML Beijing
    … clusters of the objects. We demonstrate how to build a hierarchically clustered factor analysis model with the beta diffusion tree, and how to perform inference over the random tree structures with a Markov chain Monte Carlo algorithm. We conclude with several numerical experiments on missing data problems, with data sets of gene expression arrays, international development statistics and intranational socioeconomic measurements.

    David Barber and Yali Wang - "Gaussian Processes for Bayesian Estimation in Ordinary Differential Equations" [pdf]: Bayesian parameter estimation in coupled ordinary differential equations (ODEs) is challenging due to the high computational cost of numerical integration. In gradient matching, a separate data model is introduced with the property that its gradient can be calculated easily. Parameter estimation is achieved by requiring consistency between the gradients computed from the data model and those specified by the ODE. We propose a Gaussian process model that directly links state-derivative information with system observations, simplifying previous approaches and providing a natural generative model.

    Yarin Gal and Zoubin Ghahramani - "Pitfalls in the use of Parallel Inference for the Dirichlet Process" [pdf]: Recent work done by Lovell, Adams and Mansingka (2012) and Williamson, Dubey and Xing (2013) has suggested an alternative parametrisation for the Dirichlet process in order to derive non-approximate parallel MCMC inference for it, work which has been picked up and implemented in several different fields. In this paper we show that the approach suggested is impractical due to an extremely unbalanced distribution of the data. We characterise the requirements of efficient parallel inference for the Dirichlet process and show that the proposed inference fails most of these requirements, while approximate approaches often satisfy most of them. We present both theoretical and experimental evidence, analysing the load balance for the inference and showing that it is independent of the size of the dataset and the number of nodes available in the parallel implementation. We end with suggestions of alternative paths of research for efficient non-approximate parallel inference for the Dirichlet process.

    Alessio Benavoli, Giorgio Corani, Francesca Mangili, Marco Zaffalon and Fabrizio Ruggeri - "A Bayesian Wilcoxon signed-rank test based on the Dirichlet process" [pdf]: Bayesian methods are ubiquitous in machine learning. Nevertheless, the analysis of empirical results is typically performed by frequentist tests. This implies dealing with null hypothesis significance tests and p-values, even though the shortcomings of such methods are well known. We propose a nonparametric Bayesian version of the Wilcoxon signed-rank test using a Dirichlet process (DP) based prior. We address in two different ways the problem of how to choose the infinite-dimensional parameter that characterizes the DP. The proposed test has all the traditional strengths of the Bayesian approach; for instance, unlike the frequentist tests, it allows verifying the null hypothesis, not only rejecting it, and taking decisions which minimize the expected loss. Moreover, one of the solutions proposed to model the infinite-dimensional parameter of the DP allows isolating instances …

    (A simplified Bayesian-bootstrap sketch of the signed-rank idea follows this entry.)

    Original URL path: http://icml.cc/2014/icml2014keywords/index/article/priors.html (2016-02-18)
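    A loose sketch of the flavour of the Benavoli et al. test above: place a nonparametric prior on the distribution of paired differences and read off the posterior of a Wilcoxon-type statistic P(Z_i + Z_j > 0). For simplicity the sketch uses the Bayesian-bootstrap limit (prior strength going to zero) rather than a full DP prior with its infinite-dimensional parameter, and the toy data are invented.

        import numpy as np

        rng = np.random.default_rng(0)

        def bayesian_signed_rank(z, n_draws=5000, rng=rng):
            """Posterior draws of the statistic P(Z_i + Z_j > 0) under a
            Bayesian-bootstrap prior (vanishing-strength limit of a DP prior)."""
            n = len(z)
            pos = (z[:, None] + z[None, :] > 0).astype(float)  # signs of Walsh averages
            draws = np.empty(n_draws)
            for k in range(n_draws):
                w = rng.dirichlet(np.ones(n))                  # posterior weights on the data
                draws[k] = w @ pos @ w
            return draws

        # Toy data: paired accuracy differences of two classifiers on 30 datasets.
        z = rng.normal(0.03, 0.05, size=30)
        draws = bayesian_signed_rank(z)
        print("posterior P(statistic > 1/2):", float((draws > 0.5).mean()))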

  • ICML Beijing
    … new distance metric learning method outperforms related state-of-the-art methods in a variety of experimental settings to cluster both noiseless and noisy data.

    Farhad Pourkamali Anaraki and Shannon Hughes - "Memory and Computation Efficient PCA via Very Sparse Random Projections" [pdf]: Algorithms that can efficiently recover principal components in very high-dimensional, streaming, and/or distributed data settings have become an important topic in the literature. In this paper, we propose an approach to principal component estimation that utilizes projections onto very sparse random vectors with Bernoulli-generated nonzero entries. Indeed, our approach is simultaneously efficient in memory/storage space, efficient in computation, and produces accurate PC estimates, while also allowing for rigorous theoretical performance analysis. Moreover, one can tune the sparsity of the random vectors deliberately to achieve a desired point on the tradeoffs between memory, computation and accuracy. We rigorously characterize these tradeoffs and provide statistical performance guarantees. In addition to these very sparse random vectors, our analysis also applies to more general random projections. We present experimental results demonstrating that this approach allows for simultaneously achieving a substantial reduction of the computational complexity and memory/storage space, with little loss in accuracy, particularly for very high-dimensional data.

    Danilo Jimenez Rezende, Shakir Mohamed and Daan Wierstra - "Stochastic Backpropagation and Approximate Inference in Deep Generative Models" [pdf]: We marry ideas from deep neural networks and approximate Bayesian inference to derive a generalised class of deep, directed generative models, endowed with a new algorithm for scalable inference and learning. Our algorithm introduces a recognition model to represent an approximate posterior distribution and uses this for optimisation of a variational lower bound. We develop stochastic backpropagation - rules for gradient backpropagation through stochastic variables - and derive an algorithm that allows for …

    (A sketch using very sparse random projections follows this entry.)

    Original URL path: http://icml.cc/2014/icml2014keywords/index/article/covariance_matrix.html (2016-02-18)
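    The PCA abstract above relies on very sparse random projections. The sketch below uses scikit-learn's SparseRandomProjection (entries in {-1, 0, +1}) to compress a high-dimensional dataset before looking at its leading spectrum; it illustrates the kind of projection involved rather than the authors' estimator, and the data dimensions and projection size are arbitrary.

        import numpy as np
        from sklearn.random_projection import SparseRandomProjection

        rng = np.random.default_rng(0)

        # Synthetic high-dimensional data with a few dominant principal directions.
        n, d, k = 500, 2000, 5
        basis = rng.normal(size=(k, d))
        X = rng.normal(size=(n, k)) @ basis + 0.1 * rng.normal(size=(n, d))

        # Very sparse random projection: entries in {-1, 0, +1} with density ~ 1/sqrt(d),
        # so projecting costs far less memory and computation than a dense projection.
        proj = SparseRandomProjection(n_components=200, density="auto",
                                      dense_output=True, random_state=0)
        X_small = np.asarray(proj.fit_transform(X))

        # The leading singular values (hence principal-component energy) are roughly
        # preserved after projection, at a fraction of the storage.
        s_full = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)[:k]
        s_proj = np.linalg.svd(X_small - X_small.mean(axis=0), compute_uv=False)[:k]
        print("top singular values (original): ", np.round(s_full, 1))
        print("top singular values (projected):", np.round(s_proj, 1))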