Workshops

Pre-Conference workshops will be held sequentially on Monday, August 26, 2019, from 8:30 am to 4:30 pm (exact timetable and venue will be announced soon).

Workshop 1. Compositional Data Analysis in Practice
  • Instructor:
    Prof. Michael Greenacre, Universitat Pompeu Fabra, Barcelona.
  • Short description:
    Compositional data are multivariate data subject to a constant-sum constraint, for example sets of nonnegative values that each sum to 1. They are often found in chemistry (samples in bio- and geochemistry), sociology and economics (time and monetary budgets) and marketing (market shares). The key idea is the logratio transformation, which has certain implications for the analysis and for the interpretation of the results. This short course explains the main features of this novel area of statistics, with many illustrations using real data in a variety of contexts.
  • Introductory background:
    Greenacre, M.J. (2018). Compositional Data Analysis in Practice. Chapman & Hall / CRC Press. https://github.com/michaelgreenacre/CODAinPractice
  • Tentative schedule:
    1. Compositional data and the logratio transformation: practical implications and interpretation [50 min]
    2. Total logratio variance and logratio distance; logratio analysis [50 min]
    Break [20 min]
    3. Modeling with logratios; variable selection; software [60 min]
  • Target Audience: Practitioners and researchers working in any domain where compositional data are found, and statisticians who want an introduction to this field of research and application.
  • Facilities Required:
    • Course participants should preferably bring their own laptops.
    • Software: R, open source, with the R package easyCODA installed (which also requires the ca, vegan, boot and ellipse packages).
    • Course Material.   All course materials, including the data and R scripts for the examples, will be made available for course participants.
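
As a small illustration of the logratio transformation mentioned in the short description of Workshop 1, the sketch below computes the centred logratio (CLR) of a composition. It is an independent Python sketch, not course material: the workshop itself uses R and easyCODA, and the function names `closure` and `clr` are chosen here for illustration only. It assumes all parts are strictly positive, since logarithms of zero parts are undefined.

```python
import math


def closure(parts):
    """Rescale strictly positive parts so that they sum to 1."""
    if any(p <= 0 for p in parts):
        raise ValueError("all parts must be strictly positive")
    total = sum(parts)
    return [p / total for p in parts]


def clr(parts):
    """Centred logratio: log of each part minus the mean of the logs."""
    logs = [math.log(p) for p in closure(parts)]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical three-part composition (e.g. shares of a time budget).
composition = [1.0, 2.0, 7.0]
transformed = clr(composition)
```

Two properties are worth noting. The CLR values sum to zero, and rescaling the raw data (for example from proportions to percentages) leaves them unchanged, so only the ratios between parts matter. The difference between two CLR values equals the log of the ratio of the corresponding parts.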

Workshop 2. Symbolic Data Analysis: Parametric multivariate analysis of interval data
  • Instructors:
    Prof. Paula Brito, University of Porto, Portugal
    Prof. Pedro Duarte Silva, Católica Porto Business School, Portugal.
  • Short description:
    Symbolic Data Analysis is concerned with analysing data with intrinsic variability, which is to be taken into account. In Data Mining, Multivariate Data Analysis and classical Statistics, the elements under analysis are generally individual entities for which a single value is recorded for each variable – e.g., individuals, described by age, salary, education level, etc. But when the elements of interest are classes or groups of some kind – the citizens living in given towns; car models, rather than specific vehicles – then there is variability inherent to the data. Symbolic data goes beyond the usual data representation model, considering variables whose observed values for each element are no longer necessarily single real values or categories, but may assume the form of sets, intervals, or, more generally, distributions. In this tutorial we focus on the analysis of interval data, i.e., when the variables’ values are intervals of ℝ, adopting a parametric approach. The proposed modelling allows for multivariate parametric analysis; in particular (M)ANOVA, discriminant analysis, model-based clustering, robust estimation and outlier detection are addressed. The modelling and methods presented are implemented in the R package MAINT.Data, available on CRAN.
  • Introductory background:
    Brito, P. (2014). Symbolic data analysis: another look at the interaction of Data Mining and Statistics. WIREs Data Mining and Knowledge Discovery, 4(4), 281-295.
    Brito, P. & Duarte Silva, A.P. (2012). Modelling interval data with Normal and Skew-Normal distributions. Journal of Applied Statistics, 39(1), 3-20.
  • Tentative schedule:
    1. Introduction to Symbolic Data Analysis: Motivation. Examples. Types of symbolic variables and their representations. Sources of symbolic data: aggregation of microdata [60 min]
    Break [15 min]
    2. Parametric modelling of interval data: Gaussian and Skew-Normal models. (M)ANOVA, Discriminant Analysis, Robust estimation and outlier detection, Model-based clustering [90 min]
    Break [15 min]
    3. Case-studies with R Package MAINT.Data [60 min].
  • Target Audience: The course is aimed at all potential data analysts who need or are interested in analyzing data with variability, e.g. data resulting from the aggregation of individual records into groups of interest, or data that represent abstract entities such as biological species or regions as a whole. This methodology is particularly relevant for Economics and Management studies, Marketing, Social Sciences, Geography and Official Statistics, as well as for data analysis in Biology or Geology.
    It is assumed that the participants have a good background in classical Statistics and Multivariate Data Analysis.
  • Facilities Required:
    • Course participants should bring their own laptops, with R, RStudio, and the R package MAINT.Data installed.
    • Course Material. All course materials, including the data and examples of software used for the case studies, will be made available for course participants.
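
To make the idea of interval data obtained by aggregating microdata (Workshop 2) more concrete, the sketch below groups individual records by town and summarises each variable as a [min, max] interval. It then re-expresses each interval by its midpoint and the logarithm of its range, which is assumed here to be the representation underlying the parametric models discussed in the workshop. This is an independent Python sketch, not the MAINT.Data API: the records are invented toy values, and the function names are hypothetical.

```python
import math
from collections import defaultdict

# Invented toy microdata: (town, age in years, monthly salary).
records = [
    ("A", 25, 1200.0),
    ("A", 40, 2100.0),
    ("A", 58, 1800.0),
    ("B", 31, 1500.0),
    ("B", 47, 3000.0),
]


def aggregate_to_intervals(rows):
    """Summarise each variable within each town as a (min, max) interval."""
    grouped = defaultdict(list)
    for town, age, salary in rows:
        grouped[town].append((age, salary))
    intervals = {}
    for town, values in grouped.items():
        ages = [v[0] for v in values]
        salaries = [v[1] for v in values]
        intervals[town] = {
            "age": (min(ages), max(ages)),
            "salary": (min(salaries), max(salaries)),
        }
    return intervals


def midpoint_logrange(interval):
    """Represent an interval by its midpoint and the log of its range."""
    lower, upper = interval
    if upper <= lower:
        raise ValueError("a degenerate interval has no finite log-range")
    return (lower + upper) / 2, math.log(upper - lower)


intervals = aggregate_to_intervals(records)
age_mid, age_logrange = midpoint_logrange(intervals["A"]["age"])
```

Each town is now a single observation whose variables are intervals rather than single values, so the variability within the town is kept instead of being averaged away.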