Why And How To Use
Ensemble Methods in
Financial Machine
Learning?
Study carried out by the Quantitative Practice
Special thanks to Pierre-Edouard THIERY
JANUARY 2021

Summary
Introduction
1. From A Single Model To Ensemble Methods: Bagging and Boosting
2. The Three Errors Of A Machine Learning Model
3. Why Is It Better To Rely On Bagging In Finance?
Conclusion
References
Note Awalee
Introduction

Machine Learning techniques are gaining currency in finance nowadays; ever more strategies rely on Machine Learning models such as neural networks to detect ever subtler signals. Nonetheless, this rising popularity does not come without shortcomings, the most widespread being so-called "overfitting", where models tend to learn the data by heart and are thus unable to cope with unknown data. In our opinion, using Machine Learning algorithms in finance without a deep understanding of their inner logic is highly risky: promising initial results are often misleading, the real-life implementation proving disappointing for lack of comprehension of what is really happening.

In this paper we focus on a specific category of Machine Learning meta-algorithms: the ensemble methods. The ensemble methods are called meta-algorithms since they provide different ways of combining miscellaneous Machine Learning models in order to build a stronger model. Those techniques are well known for being extremely powerful in many areas; however, we believe it is important to understand what their advantages are from a mathematical point of view, to make sure they are used purposefully when dealing with a financial Machine Learning problem.

First we set forth how ensemble methods work from a general point of view. We then present the three sources of error in Machine Learning models before explaining what the advantages of bagging over boosting are in finance, and how to use bagging efficiently.

1 From A Single Model To Ensemble Methods: Bagging and Boosting

Machine Learning is mainly premised on predictive models. Once devised, a model is trained thanks to available data; its purpose is to predict the output value, also known as the outcome, corresponding to new input data. Formally, we can define a predictive model in the following manner:

Definition 1 (Predictive Model)
A predictive model is defined as an operator M, based on metaparameters denoted M, and on parameters denoted P. It uses a set of inputs, denoted x ∈ R^m, to compute an output, denoted O ∈ R, seen as the predicted value. We can write:

    M(M; P; •) : R^m → R
    x ↦ O = M(M; P; x)

Thus the idea of a predictive model is only to predict a value based on several features which are the inputs. If M is considered to be "the machine", the learning part consists in estimating the parameters P that enable us to use the model. The metaparameters M are chosen, and often optimized, by the user.

For instance, a neural network is a predictive model. The shape of the neural network, i.e. the number of layers and the number of neurons in each layer, as well as the functions within each neuron, forms the metaparameters M. The parameters of the neural network are the weights of each link between two neurons from two consecutive layers. Those parameters are estimated thanks to a training set D: formally, P = P(D). From now on, since the training sets which are used are of the utmost importance, we always write P(D) to clearly mention which training set is used to find the parameters of a given model.

The gist of ensemble methods is fairly simple: we combine several weak models to produce a single output. From now on and for the rest of this paper, the number of models is denoted N.

The ensemble methods can be divided into two main sets: the parallel methods, where the N models are independent, and the sequential methods, where the N models are built progressively.

• A Parallel Method: Bootstrap Aggregating

In this section, we set forth the bootstrap aggregating method, also known as "bagging", which is the most widespread of the parallel methods [1]. From now on, we assume a training set, denoted D, is at our disposal:

Definition 2 (Data Set)
A data set D is a set of couples of the following form

    D = {(x_i, y_i) ∈ R^m × R, 1 ≤ i ≤ n}

where n is the cardinal of D. For the i-th element in the data set, x_i ∈ R^m is called the vector of the m features, and y_i is called the output value.

To carry out the bagging, we construct N models M^j with 1 ≤ j ≤ N. To do so, we consider a generic model M(M; •; •), i.e. a predictive model whose metaparameters are fixed, for instance a neural network with a given shape. In order to get N models, the generic model is trained with N different training sets D_j:

    M^j = M(M; P(D_j); •)

Thus, M^j is now a function which, for every input vector x ∈ R^m, outputs a real value y = M^j(x).

The N models are different since they are not trained on the same training set; it means that the sets of parameters will be different; therefore we will have different output values for the same input vector of features x.

The N training sets are created thanks to the data set D. The size of the training sets is chosen by the user and denoted K, with K < n, otherwise the training sets would necessarily contain redundant information. The K elements of the training set D_j for 1 ≤ j ≤ N are sampled in D with replacement: for 1 ≤ j ≤ N,

    D_j = {(x_u(j,k), y_u(j,k)), 1 ≤ k ≤ K}

where u(j,k) is a uniform random variable in [1, n].
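As a sketch only, the bagging scheme above can be illustrated in Python. All function names are ours, and we stand in a simple least-squares line for the "generic model" instead of a neural network; the structure — N training sets drawn with replacement, N models trained independently — is the point:

```python
import random
import statistics

def make_training_sets(D, N, K, seed=0):
    """Create N bootstrap training sets of size K (K < n), each
    sampled from D with replacement, as in the scheme above."""
    rng = random.Random(seed)
    n = len(D)
    # u(j, k): a uniform draw over the indices of D for each slot
    return [[D[rng.randrange(n)] for _ in range(K)] for _ in range(N)]

def train(D_j):
    """Hypothetical 'weak' generic model: ordinary least squares on
    a single feature, fitted on the training set D_j."""
    xs = [x for x, _ in D_j]
    ys = [y for _, y in D_j]
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    denom = sum((x - x_bar) ** 2 for x in xs) or 1.0
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / denom
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept

def bagged_model(models):
    """Final model Mf for a regression problem: the average of the
    N individual predictions."""
    return lambda x: statistics.fmean(m(x) for m in models)

# Toy data set D = {(x_i, y_i)}: a noisy linear relation y ≈ 2x + 1
rng = random.Random(42)
D = [(float(x), 2.0 * x + 1.0 + rng.gauss(0, 0.1)) for x in range(20)]

models = [train(D_j) for D_j in make_training_sets(D, N=10, K=15)]
Mf = bagged_model(models)
```

Each call to `train` sees a different bootstrap sample, so the N fitted models differ slightly; `Mf` smooths those differences away by averaging.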
Once the N models M^j have been trained, they are combined into the final model Mf. For instance, if we consider a regression problem, meaning that the output value y does not belong to a predetermined finite set of values:

    Mf : R^m → R
    x ↦ (1/N) Σ_{j=1}^{N} M^j(x)

If we consider a classification problem, meaning that the output value y belongs to a finite set of values S, the output value of the final model is determined by a vote of the N models M^j: the outcome which appears the most among the N output values produced by the N models is the outcome of the final model.

Such a model can then be tested on a test set of data, as is usually done for every Machine Learning model.

It is also worth noticing that there are many bagging approaches, which all derive from the general principle presented above. Even though we do not delve into the details, we can for instance mention the so-called "feature bagging", where each one of the N models is trained using only a specific subset of features.

• A Sequential Method: Boosting

Sequential methods consist no longer in using a set of N independent models, but instead a sequence of N models, where the order of the models matters:

    {M^1, ..., M^N}   →   (M^1, ..., M^N)
    a mere set: no order      a sequence

So we have to construct the sequence of the N models, beginning with the first one, which will then sway how the second one is defined, and so on and so forth. In the rest of this section we present some of the principal ideas of boosting.

First, as with bagging, we assume we have a training set made of n elements and denoted D. If we choose to consider a generic model M(M; •; •), we can train a first model:

    M^1 = M(M; P(D); •)

For every element within D we can compute M^1(x_i) and compare it to the outcome y_i.

To devise the second model M^2, we are going to train the generic model M(M; •; •) on a new training set D_1; the new training set derives from D. It contains K < n elements, as will all the subsequent training sets.

We attribute to each element within D a weight depending on how far y_i is from M^1(x_i). The more important the error, the higher the weight associated with the element. We then use those weights to randomly sample D in order to generate the new training set D_1.

    M^2 = M(M; P(D_1); •)

The process is then exactly similar: thanks to the error of the second model, we can compute a new weight for each element within D. Those weights are used to create a new training set: we sample D using the new weights to get D_2, which is then used to train model M^3, and so on.

It is then possible to define the final model Mf as a weighted sum of the N models M^j, where the weight associated with a given model is derived from the error of this model on the data.

We have only presented the main ideas of boosting; the simplest implementation of those guidelines is probably the AdaBoost algorithm [2].

2 The Three Errors Of A Machine Learning Model

A Machine Learning model can suffer from three sources of error: the bias, the variance and the noise. It is important to understand what lies behind those words in order to understand why and how ensemble methods can prove helpful in finance.

The bias is the error spawned by unrealistic assumptions. When the bias is particularly important, it means that the model fails to recognize the important relations between the features and the outputs. The model is said to be underfitted when such a case occurs.

Figure 1 displays a model with an important bias. The dots represent the training data, which obviously do not exhibit a linear relation. If we assume that there is a linear relationship between the features and the outcomes, such a model clearly fails to capture any relation between the former and the latter.

Figure 1: Underfitted model

The variance stems from sensitivity to tiny changes in the training set. When the variance is too high, the model is overfitted on the training set: a "learning-by-heart" situation occurs. This explains why even a small change in the training set can lead to widely different predictions.
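The bias/variance distinction can be made concrete with a small numerical sketch. The toy data and model choices below are ours, not the paper's: a linear fit on quadratic data keeps a large training error no matter what (bias), while a 1-nearest-neighbour model, which literally learns the training set by heart, changes its predictions as soon as the noise in the training set changes (variance):

```python
import random
import statistics

def fit_linear(data):
    """High-bias model: assumes y is linear in x (least squares)."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    xb, yb = statistics.fmean(xs), statistics.fmean(ys)
    denom = sum((x - xb) ** 2 for x in xs) or 1.0
    a = sum((x - xb) * (y - yb) for x, y in zip(xs, ys)) / denom
    b = yb - a * xb
    return lambda x: a * x + b

def fit_1nn(data):
    """High-variance model: 1-nearest-neighbour, i.e. pure
    'learning by heart' of the training set."""
    return lambda x: min(data, key=lambda p: abs(p[0] - x))[1]

def sample(seed, n=30):
    """Noisy quadratic data, y = x^2 + noise: clearly not linear."""
    rng = random.Random(seed)
    return [(x / 3, (x / 3) ** 2 + rng.gauss(0, 0.5)) for x in range(n)]

train_a, train_b = sample(seed=1), sample(seed=2)

# Bias: the linear model cannot represent y = x^2, so its mean
# squared error on its own training set stays large.
lin = fit_linear(train_a)
mse = statistics.fmean((lin(x) - y) ** 2 for x, y in train_a)

# Variance: the 1-NN model reproduces its own training noise, so two
# training sets differing only by noise can give different predictions;
# the rigid linear model barely moves between the two.
gap_1nn = abs(fit_1nn(train_a)(4.5) - fit_1nn(train_b)(4.5))
gap_lin = abs(lin(4.5) - fit_linear(train_b)(4.5))
```

The 1-NN model also returns the training outcome exactly at any training point, which is the "learning-by-heart" situation described above.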