Date post: | 15-May-2023 |


Reviewers' comments:

Reviewer #1 (Remarks to the Author):

Dear authors,

let me (Valerio Lucarini) say that I honestly appreciate the style and goals of your paper (at many

levels), but I am afraid that I have to give a negative evaluation of its content for a couple of

reasons.

The more important reason has to do with the interpretation of the transfer operator in the case of

reducing the system to just one variable - globally averaged surface temperature or sea

temperature.

The transfer operator is "well defined" if the system one is considering obeys the semigroup

property. This means - broadly speaking - that the operator describing the transition of

probabilities between 0 and time t can be written as the product of the evolution operator between

0 and t-\tau and the one between t-\tau and t, for any choice of \tau. This is the case if one works

in the full space of the system, which, in the case of a climate model, has O(10^7) degrees of

freedom. If one considers a reduced phase space, the semigroup property is, unfortunately, lost.

This has to do exactly with the fact that neglecting one or many variables leads to loss of

markovianity in the system. This is carefully explained in Chekroun et al. (2015). For a detailed yet

compact description of these properties, please see section 4.1 in Tantet et al. (2018), where the

specific case of reduced order models in a climate model is discussed, and the criteria for choosing

specific climate variables in order to (partly) reduce this issue are discussed. It is also discussed in

my paper in J. Stat. Phys. you have cited (it is mentioned as a barrier for the method for

computing the response).
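Written out, with $\mathcal{P}_{s,t}$ denoting the operator that carries probability densities from time $s$ to time $t$ (a notation assumed here for illustration, not taken from the manuscript or the cited papers), the semigroup property described above reads:

```latex
\mathcal{P}_{0,t} \;=\; \mathcal{P}_{t-\tau,\,t}\;\mathcal{P}_{0,\,t-\tau}
\qquad \text{for every } 0 \le \tau \le t .
```

In a reduced phase space this factorisation generally fails, so an operator estimated for a lag t cannot be assumed equal to a product of operators estimated for shorter lags.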

Note that the loss of markovianity in a reduced space is the fundamental reason why, if one wants

to construct a parametrisation for unresolved processes (parametrisation is needed when we project

out part of the phase space), one needs to include a memory term to take into account

non-markovianity (Wouters and Lucarini 2012, Vissio and Lucarini 2017).

The second problem is that you patch together results from different models. Unfortunately, this is

ill-defined in probabilistic terms also when computing multi-model averages, let alone when

computing probability transitions. I know it is a common procedure used in many climate papers,

but it is, indeed, wrong.

One may wonder why the results look promising. Well, also using AR(n) fits one often gets good

results in terms of hind-cast. But this does not mean that we have really achieved predictive skill.

Please take these comments as extremely constructive and not dismissive at all. As you can see, I

am working on these issues and would be happy to have an off-line discussion with the authors.

Mickaël D. Chekroun, Honghu Liu, Shouhong Wang, Stochastic Parameterizing Manifolds and

Non-Markovian Reduced Equations, Springer 2015

Jeroen Wouters and Valerio Lucarini, Disentangling multi-level systems: averaging, correlations

and memory, J. Stat. Mech. (2012) P03003

Vissio, G. and Lucarini, V., A proof of concept for scale-adaptive parametrizations: the case of the

Lorenz '96 model. Q.J.R. Meteorol. Soc.. doi:10.1002/qj.3184 (2017)

Alexis Tantet, Valerio Lucarini, Frank Lunkeit, Henk A. Dijkstra, Crisis of the Chaotic Attractor of a

Climate Model: A Transfer Operator Approach, arxiv:1507.02228, Nonlinearity, accepted (2018);

text at https://arxiv.org/pdf/1507.02228.pdf

Editorial Note: Parts of this peer review file have been redacted as comments were included that were not part of the transparent peer review scheme.

Reviewer #2 (Remarks to the Author):

This paper uses a statistical or machine learning simple prediction system to predict global mean

sea surface and near surface temperature. The system is trained with 10 climate models, and then

applied to observations. The evaluation metrics appear state of the art and the paper is generally

well written and certainly noteworthy. It appears longer than necessary though.

Having a friendly competition between statistical and physical model-based prediction systems in

climate research would certainly be useful for climate science. Nevertheless I have a few

comments before I think this is ready for the journal:

a) the perfect model test is certainly very useful and appropriate. However, what really should be

used to evaluate the predictions in the context of a possibly imperfect model with imperfect

forcing, are imperfect model tests. That would be to omit a single model from the database, and

predict its 3 or more ensemble members using the other 9, and also using the multimodel mean

forced response based on the other 9. I think the GISS data are not spatially complete (although

more infilled than HadCRUT) so it would also be nice to consider this in a prediction experiment,

see next point. Without imperfect model predictions it will be harder to ascertain if the present

good performance is partly down to luck given the small number of decadal samples.
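The imperfect model test described in point a) amounts to a leave-one-model-out loop. The following is a minimal sketch of the data split only, assuming the ensemble is stored as a dictionary from model name to a list of member time series; the model names, member counts, and random data are hypothetical placeholders, not the CMIP5 data used by the authors.

```python
import numpy as np

def leave_one_model_out(ensembles):
    """Yield (held_out_name, pseudo_observations, training_members) per model."""
    for held_out, members in ensembles.items():
        # Training set: every member of every other model.
        training = [m for name, ms in ensembles.items() if name != held_out for m in ms]
        yield held_out, members, training

# Hypothetical stand-in: 10 models with 3 members each, 50 annual values per member.
rng = np.random.default_rng(1)
ensembles = {f"model_{k}": [rng.normal(size=50) for _ in range(3)] for k in range(10)}

splits = list(leave_one_model_out(ensembles))
for name, pseudo_obs, training in splits:
    # The prediction system would be trained on `training` and scored on `pseudo_obs`.
    multimodel_mean = np.mean(training, axis=0)  # mean series of the other 9 models
```

Each held-out model plays the role of the observations in turn, so the skill scores are never computed on data that entered the training set.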

b) it is not discussed in the paper how gaps in spatial data are handled. Are the data considered

approximations of the global mean? In a perfect model test (and an imperfect one) it would be good to

test if those gaps matter by omitting data from the models where the observations have gaps (i.e.,

missing gridpoints) and then forming the global mean to see the significance of this issue.

c) while the writing is generally excellent, aspects of it are too uncritical and adversarial. For

example, I don't find it helpful to compare the computer time used between a physically based

climate prediction and this statistical one without mentioning that the model based ones predict

the 4-D climate system, not just the global mean (you do this but much later)! So you put in less

computer time, but you also get out less.

d) Results shown for the hiatus start in 1998 only, if I capture this correctly, although you have

clearly done this for the entire record in another figure. Could you please discuss a bit to what

extent the start point matters here?

Specific points:

abstract: 22 ms will certainly depend on the laptop and is unhelpful here: why not be a bit more

subdued and say that prediction of global mean SST and SAT takes minutes on a laptop? And I

don't see why I would want to predict global mean SAT on my phone. How would that be useful?

(ok fascinating but useful?)

p 2 'can only occur': that's too narrow - better characterization of near term economic and emission

predictions and natural forcing predictions will also help!

text on bottom of page 2 top of page 3 seems to preempt material later in the paper and seems

out of place to me.

p. 3 end of page: 'after removing the part... forcing see method': the methods also contain almost

nothing here. General attribution approaches use multilinear regression too, so this is way not

specific enough. I suspect you have done line fitting and removed the best match but how did you

do this and what forcings go into it? I really don't like it when something gets bumped off to

methods without being explained there!

end of p3: I have never encountered cK and don't think this is a useful addition to units. 0.1 K is

fine! Also, I would think the range of std devs for the instrumental period and the relative location

of the observed residual in them is more helpful than the percent difference, and also I don't quite

get where the 41% comes from - again not a useful metric. This should also be crosslinked to the

AR5 box on the hiatus in chapter 9 which discusses this question (not just for the hiatus).

p 4, end of 2nd paragraph: I am not convinced by the skin temperature explanation here. There

are good papers by Kevin Cowtan explaining the role of SST vs SAT, but of course for SAT - but

here for SST I am unconvinced as skin temperature I think is the uppermost top of the ocean

which is not consistent with ship intake measurement, and the mixed layer should be well mixed

so 10 m is certainly no issue here. Unless you can support this I would reduce this speculation

sharply.

p 4 middle: I don't quite get the discussion on convergence to the total density function. Could you e.g.

show some figures in a supplement or describe this more clearly?

p. 7 top: you haven't retuned before going to observations, right? Maybe worth mentioning. (And if

you have then I would be very sceptical of the paper!)

can you explain the meaning/consequences of reliability 2.3 vs 1?

p. 9 hiatus prediction discussion is a bit uncritical - if I read your predictions correctly, you do

underestimate the amplitude, and that would be consistent with the possibility of further forcings.

Citing a few review papers on the hiatus here would be helpful

p. 11 and discussion of final figure: you are predicting extremes of global SAT and SST not

regional events - that should be made clearer in text and caption. You could relate to what extent

global extreme T typically goes along with regional extremes.

citation 1: you cite this for attribution so citing the respective chapter is more appropriate I think.

Figure 4 and 5 caption: The captions are very similar but poorly separated. E.g. 4 mentions

hatching which only appears in 5, and I don't see the point of a flat reliability field plotted - why is

it so flat? And what value is the pink hue that can't well be attributed to a very shallow

colourscale?

Figure 7 caption: see above clarify anomalous event of global T.


Reviewer #1: Let me (Valerio Lucarini) say that I honestly appreciate the style and goals of your paper (at many levels).

Thank you for this positive and supportive comment.

The more important reason has to do with the interpretation of the transfer operator in the case of reducing the system to just one variable - globally averaged surface temperature or sea temperature.

We feel that there is confusion about the method we used and how the Transfer Operator was applied. This is partially our fault since we do not use it in the usual way. This has now been fully clarified in the text.

The transfer operator is "well defined" if the system one is considering obeys the semigroup property. This means - broadly speaking - that the operator describing the transition of probabilities between 0 and time t can be written as the product of the evolution operator between 0 and t-\tau and the one between t-\tau and t, for any choice of \tau. This is the case if one works in the full space of the system, which, in the case of a climate model, has O(10^7) degrees of freedom. If one considers a reduced phase space, the semigroup property is, unfortunately, lost. This has to do exactly with the fact that neglecting one or many variables leads to loss of markovianity in the system. This is carefully explained in Chekroun et al. (2015). For a detailed yet compact description of these properties, please see section 4.1 in Tantet et al. (2018), where the specific case of reduced order models in a climate model is discussed, and the criteria for choosing specific climate variables in order to (partly) reduce this issue are discussed. It is also discussed in my paper in J. Stat. Phys. you have cited (it is mentioned as a barrier for the method for computing the response).

We fully agree with this description. However, our method differs from what you suggest and from the “more classical” use of Transfer Operators. We do not apply a single transfer operator several times successively to propagate the probability density function forward in time. Indeed, this would require the markovianity of the system, which is lost by considering only the global temperature (reducing the phase space to a single variable). Instead, we apply sequentially different transfer operators computed for different propagation times. This could be simply described as T(1 yr) = M1*T(0), T(2 yr) = M2*T(0), …, T(n yr) = Mn*T(0). However, in contrast to the traditional Markov chain, in our method M2 is different from M1*M1. So, there is no need for markovianity.

Hence our method overcomes the difficulty described in the reviewer’s comment above. Our method is computationally more expensive than the traditional Markov chain (since it requires the computation of transfer operators for each propagation time separately), and hence it slightly defeats the classical purpose of using a transfer operator, but it allows a more accurate description of the evolution of the probability density function, without the need for markovianity and bypassing its constraints. While the traditional Markov chain approach indeed would not be valid for our application (as we tested), through the adjustment we made, the use of Transfer Operators becomes perfectly valid.

This method was already fully described in the method part of the manuscript: “The Transfer Operator is built by evaluating the number of trajectories […] in each state and then evaluating the number of these trajectories ending up in each possible state after a given transition time (τ). The ratio of these two numbers gives the probability for a trajectory in an initial state to end up in a final state after a time τ (Fig. 2). The probability of state transition is repeated for τ = 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10 years, leading to 10 Transfer Operators […].”

This has been further clarified in the new version of the manuscript.


p.22: “It is crucial to note that we do not apply a single Transfer Operator successively (e.g., applying the 1-yr Transfer Operator twice to get a 2-yr prediction). In contrast, we apply a range of Transfer Operators sequentially (i.e., applying the 1-yr Transfer Operator for 1-yr prediction, applying the 2-yr Transfer Operator for 2-yr, and so on). The essential difference with the traditional use of Transfer Operator (i.e., applied successively as a Markovian chain) is that we do not require that the 2-yr Transfer Operator is equal to applying the 1-yr Transfer Operator twice successively. So, we need to, and did, establish all different Transfer Operators for different lags separately and independently. This method indeed allows us to avoid the need for a Markovian chain (and its required properties), which, as a result, is not verified in our reduced single-variable space (Lucarini, 2016), and is probably not valid either. It should be emphasized that our method is closer to a conditional probabilistic prediction (where the condition is based on the current GMT or SST) than to the traditional use of Transfer Operators within a Markovian chain.”
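The counting procedure quoted above (one operator per lag, each estimated independently) can be sketched as follows. This is an illustrative toy, not the authors' PROCAST code: the AR(2) series standing in for model output, the bin edges, and all function and variable names are assumptions made for the example.

```python
import numpy as np

def transfer_operator(trajectories, edges, lag):
    """Estimate M with M[j, i] = P(state j after `lag` steps | state i now)."""
    n_states = len(edges) - 1
    counts = np.zeros((n_states, n_states))
    for series in trajectories:
        # Bin each value into a state index 0..n_states-1 (outliers are clipped).
        states = np.clip(np.digitize(series, edges) - 1, 0, n_states - 1)
        for start, end in zip(states[:-lag], states[lag:]):
            counts[end, start] += 1
    departures = counts.sum(axis=0, keepdims=True)
    # Ratio of arrivals in j to departures from i; unvisited states stay zero.
    return np.divide(counts, departures, out=np.zeros_like(counts), where=departures > 0)

# Toy stand-in for model output: an AR(2) series, which is not Markovian
# when only its current value is used as the state.
rng = np.random.default_rng(0)

def toy_member(n_years=150):
    x = np.zeros(n_years)
    for t in range(2, n_years):
        x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + rng.normal(scale=0.1)
    return x

trajectories = [toy_member() for _ in range(40)]
edges = np.linspace(-0.4, 0.4, 9)  # 8 anomaly bins (hypothetical choice)

# One operator per lag, each estimated independently (tau = 1..10 steps).
operators = {lag: transfer_operator(trajectories, edges, lag) for lag in range(1, 11)}

p0 = np.zeros(8)
p0[5] = 1.0  # start with certainty in bin 5
forecasts = {lag: operators[lag] @ p0 for lag in operators}

# Largest entry-wise gap between the 2-step operator and the squared 1-step one.
markov_gap = np.abs(operators[2] - operators[1] @ operators[1]).max()
print(f"max |M2 - M1*M1| = {markov_gap:.3f}")
```

Each forecast is a conditional distribution given only the current state, which is why no product of operators, and hence no semigroup property, is needed. The printed gap is one simple way to see how far the lag-2 operator departs from the Markov-chain prediction on a given data set.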

Note that the loss of markovianity in a reduced space is the fundamental reason why, if one wants to construct a parametrisation for unresolved processes (parametrisation is needed when we project out part of the phase space), one needs to include a memory term to take into account non-markovianity (Wouters and Lucarini 2012, Vissio and Lucarini 2017).

We understand and agree with the point above. However, as mentioned above, we do not assume markovianity of the system and, as a result, do not need to include a memory term to account for non-markovianity. One could argue that our different use of Transfer Operators implicitly includes such a memory term by relaxing the demand that M(2) = M(1)*M(1). When allowing for this extra degree of freedom, reducing the phase space to global temperature is enough to predict the global temperature with good accuracy (accuracy which surpasses state-of-the-art prediction systems), as demonstrated in our manuscript. This method is at the core of our prediction system, and the good skill is a fundamental result of the paper.

The second problem is that you patch together results from different models. Unfortunately, this is ill-defined in probabilistic terms also when computing multi-model averages, let alone when computing probability transitions. I know it is a common procedure used in many climate papers, but it is, indeed, wrong.

We are not patching together results from different models. We are simply binning the results of all the models, acknowledging that each model is one possible result. We do not feel that applying such a post-processing statistic is ill-defined (albeit affected by the well-known smoothing that any binning implies). We agree that it would be different if we were using a single transfer operator successively. In that case, some probabilities would come from a trajectory from model 1, followed by a trajectory from model 2 (for instance), which would lead to a fundamental inconsistency (there is no model that is actually able to create such a “Frankenstein”-artificial trajectory). But, as mentioned in the response to your first/main comment, we are not doing that.

In our view, the implicit assumption we make, a-posteriori verified by the results, is that each single model has a biased probability distribution. By binning a large enough number of models, the bias in probability distribution is reduced, and now becomes a multi-model averaged bias, which is smaller than any single-model bias, as it only contains the bias that is common to all models.

Please see below how we modified the manuscript regarding Specific Points on the statistical methodology.

One may wonder why the results look promising. Well, also using AR(n) fits one often gets good results in terms of hind-cast. But this does not mean that we have really achieved predictive skill.

Indeed we have demonstrated good prediction skills of GMT and SST (Fig. 5), as well as their evolution (Fig. 6). We hope that the above clarification on our method sheds some light on the reason why our method achieved good skill. We have indeed tested applying a single transfer operator (trained with a 1 year transition time) successively. As expected, we did not get any skill (probably due to the loss of markovianity). However, our method of training transfer operators for each transition time does show significant predictive skill.

Redacted


Reviewer #2: This paper uses a statistical or machine learning simple prediction system to predict global mean sea surface and near surface temperature. The system is trained with 10 climate models, and then applied to observations. The evaluation metrics appear state of the art and the paper is generally well written and certainly noteworthy.

Thank you for this supportive comment.

Having a friendly competition between statistical and physical model-based prediction systems in climate research would certainly be useful for climate science.

We agree with the reviewer on the usefulness of friendly competition. We also would like to stress that our predictive system is “physical model-based” (not statistical). We use statistics to build the probabilistic prediction, but these statistics are based on physical models. We feel this is the fundamental reason for the skill of our system.

a) the perfect model test is certainly very useful and appropriate. However, what really should be used to evaluate the predictions in the context of a possibly imperfect model with imperfect forcing, are imperfect model tests. That would be to omit a single model from the database, and predict its 3 or more ensemble members using the other 9, and also using the multimodel mean forced response based on the other 9.

We have done this set of experiments. The resulting skills are summarized in the Figure below. This shows that the imperfect model approach has skill equivalent to the perfect model approach (with reliability ~1 and a slight decrease of the Coefficient of Determination bounded by 0.01). This further shows the robustness of the method.

Caption: As Figure 4 but for the imperfect model approach. Skill differences (e-f) are imperfect vs perfect approach.

We have modified the text to include this new result.


p.7: “To further test the predictive skill and reliability of PROCAST we have assessed them in an imperfect model approach (i.e., removing outputs of one model from the Transfer Operator computation and using them as pseudo-observations). We find that PROCAST is still able to perform at the same level of accuracy as within the perfect model approach, with a slight decrease of the coefficient of determination of less than 0.01 for all lags and averaging times tested.”

I think the GISS data are not spatially complete (although more infilled than HadCRUT) so it would also be nice to consider this in a prediction experiment, see next point. Without imperfect model predictions it will be harder to ascertain if the present good performance is partly down to luck given the small number of decadal samples.

We have used 136 starting dates and applied hindcasts for 1 to 10-yr averages (which are, in the case of PROCAST, independent). This leaves 1,360 independent decadal hindcasts. We feel that it is a large enough sample to test the robustness of our prediction… In particular, the extremely smooth statistics of our skill in its operational mode (Fig. 5) further suggest the accuracy of the computed skill.

b) it is not discussed in the paper how gaps in spatial data are handled. Are the data considered approximations of the global mean?

Indeed we have treated it as an approximation of the global mean. This is now fully acknowledged in the text.

p.24: “GMT and SST are computed as spatial averages (NASA GISS temperature record for GMT; the NOAA ERSSTv5 record for SST), where spatial gaps in the data are ignored. The percentage of missing data (important before 1958) does not show any impact on the prediction skill. The internal variations in GMT and SST in the observational record are computed as the residual after having removed the part that can be attributed to external forcing.”

In a perfect model test (and an imperfect one) it would be good to test if those gaps matter by omitting data from the models where the observations have gaps (i.e., missing gridpoints) and then forming the global mean to see the significance of this issue.

We agree that it might be problematic. To test it, rather than recomputing the prediction with artificial gaps in the model data, we have plotted the percentage of missing data for the starting year against the hindcast error, within the fully operational framework (Figure below). This clearly shows that missing data is significant until 1958, with missing values from 32% to 15%. After 1958 the good coverage leads to a quite stable level of missing values of less than 2% (Figure below). (Note that we have been extremely conservative in computing the missing values, since a single month of missing values is considered as a missing value for the entire year.) On the other hand, the prediction error does not show a significant difference between pre- and post-1958, with an average error of 8 mK and of 8 mK, and a standard deviation of 9 mK and of 10 mK, respectively (Figure below). Hence no statistical difference can be determined between pre- and post-1958. Finally we plotted the prediction error against the percentage of missing data for each year to evaluate any possible relationship (Figure below). We did not find any clear relation.


Caption: (top) Percentage of missing data from GISS as a function of time. (middle) 1-yr lag prediction error of PROCAST in its fully operational mode. The vertical black line represents year 1958, when missing data becomes weak (less than 2%); thick (thin) red and blue lines represent time-mean (time-mean + one standard deviation) for the pre- and post-1958 periods, respectively. (bottom) 1-yr lag prediction error of PROCAST (in its fully operational mode) as a function of the percentage of missing data from GISS. Red and blue crosses represent time-mean percentage of missing data and of 1-yr lag prediction error.

We have added a comment on this point in the text.

p.24: “GMT and SST are computed as spatial averages (NASA GISS temperature record for GMT; the NOAA ERSSTv5 record for SST), where spatial gaps in the data are ignored. The percentage of missing data (important before 1958) does not show any impact on the prediction skill. The internal variations in GMT and SST in the observational record are computed as the residual after having removed the part that can be attributed to external forcing.”

c) while the writing is generally excellent, aspects of it are too uncritical and adversarial. For example, I don't find it helpful to compare the computer time used between a physically based climate prediction and this statistical one without mentioning that the model based ones predict the 4-D climate system, not just the global mean (you do this but much later)! So you put in less computer time, but you also get out less.

We had a thorough read of the paper and have toned down any claim that was too uncritical or too adversarial.

Regarding the discussion of the relative benefits of DePreSys3 and PROCAST: the discussion about the single variable prediction (PROCAST) and the full state variable (DePreSys3) is done in the sentence directly following computer time efficiency. We do not feel it is so late. This is the only time we compare computer times.

d) Results shown for the hiatus start in 1998 only, if I capture this correctly, although you have clearly done this for the entire record in another figure. Could you please discuss a bit to what extent the start point matters here?


We mention this hiatus as an emblematic example (i.e., post-1998 hiatus). As mentioned by the reviewer, this decadal prediction has been done using each year as the starting year. Obviously, the start date matters for the skill (as shown below and summarized in Fig. 5).

We have tested a starting year in 2002 (see figure below). The Correlation Coefficient is better whereas the R-square is more moderate than when using 1998 as starting year. PROCAST still predicts a cooling of the anomaly partially compensating the forced warming over this period.

Caption: As Figure 6 but for a starting date in 2002.

The dependency of the skill on the starting year is now fully acknowledged in the text.

p.10: “Other starting dates have been tested (such as 2002) and always allow PROCAST to capture the long-term hiatus, but to a lesser degree the exact interannual variation of the decades.”

Specific points:

abstract: 22 ms will certainly depend on the laptop and is unhelpful here: why not be a bit more subdued and say that prediction of global mean SST and SAT takes minutes on a laptop? And I don't see why I would want to predict global mean SAT on my phone. How would that be useful? (ok fascinating but useful?)

We agree and have changed the sentence to be less specific. However “minutes” seems way off our actual efficiency (by a factor of a thousand at least).

p.1: “The extreme numerical efficiency of the method (a few hundredths of a second for a decadal prediction on a laptop)”

And I don't see why I would want to predict global mean SAT on my phone. How would that be useful? (ok fascinating but useful?)

We hope that we will soon be able to do prediction of other variables and timescales, hence “opens the possibility”. It would be quite neat if one could make predictions of what he/she wants by carrying his/her own prediction system in his/her pocket.


p 2 'can only occur': that's too narrow - better characterization of near term economic and emission predictions and natural forcing predictions will also help!

We agree that it was too restrictive. We changed the sentence.

p.2: “[…], further improvement of climate predictions will mainly occur through better, more accurate predictions of the internal variability.”

text on bottom of page 2 top of page 3 seems to preempt material later in the paper and seems out of place to me.

It was a summary of the method and result. It is probably not needed for a letter. We shortened it.

p. 3 end of page: 'after removing the part... forcing see method': the methods also contain almost nothing here. General attribution approaches use multilinear regression too, so this is way not specific enough. I suspect you have done line fitting and removed the best match but how did you do this and what forcings go into it? I really don't like it when something gets bumped off to methods without being explained there!

We agree the method description was too general and superficial and mainly referring to published studies. We have clarified this point in the method section.

p.24-25: “For removing the part attributed to external forcing from the timeseries of SAT and SST a multiple linear regression analysis is performed, following ref [2]. The approach assumes that globally averaged temperature responds linearly, with some lag, to the various forcing agents. After removing the average value (we are only interested in anomalies), we can write: $T(t) = \varepsilon + \sum_{i=1}^{n} a_i F_i(t - l_i)$, where $T$ is the total GMT or SST, $n$ is the number of forcings considered (i.e., three: anthropogenic forcing, solar forcing, and volcanoes), $a_i$ are the regression coefficients, $F_i$ are the forcing timeseries, $l_i$ is the lag by which temperature responds to the forcing, and $\varepsilon$ is the residual. The forcing timeseries are taken from the CMIP5 historical plus RCP4.5 (after 2005 till present) forcing dataset. We refer to ref [2] for more details and figures of how the regression performs.”

end of p3: I have never encountered cK and don't think this is a useful addition to units. 0.1 K is fine!

We agree. We have modified the text.

Also, I would think the range of std devs for the instrumental period and the relative location of the observed residual in them is more helpful than the percent difference, and also I don't quite get where the 41% comes from - again not a useful metric. This should also be crosslinked to the AR5 box on the hiatus in chapter 9 which discusses this question (not just for the hiatus).

We feel that there is confusion about the goal of this sentence. What we mean is that the spread of the total density function is different in the observations than in the model outputs. This difference is small for GMT but important, of the order of 41%, for SST. Hence for SST we applied a renormalization of the density function so that the spread of the modeled values fits the observed one. We have clarified this point:


p.3-4: “The CMIP5 GMT and SST anomalies consist of centered distributions with a standard deviation of the annual mean of 0.1 K and 0.07 K, respectively. For GMT, the modeled standard deviation is slightly weaker than the observed one (0.12 K), but remains in good agreement: less than 9% of relative difference. On the other hand, for SST, the standard deviation of the distribution in the CMIP5 is significantly weaker than in the observations (0.13 K), with a relative difference of 43%. Hence for SST, the modeled distribution is renormalized to fit the standard deviation of the observations.”

p 4, end of 2nd paragraph: I am not convinced by the skin temperature explanation here. There are good papers by Kevin Cowtan explaining the role of SST vs SAT, but of course for SAT - but here for SST I am unconvinced as skin temperature I think is the uppermost top of the ocean which is not consistent with ship intake measurement, and the mixed layer should be well mixed so 10 m is certainly no issue here. Unless you can support this I would reduce this speculation sharply.

We agree that the difference might be more complex than we suggested. We have removed the sentence to avoid confusion.

p 4 middle: I don't quite get the discussion on convergence to the total density function. Could you e.g. show some figures in a supplement or describe this more clearly?

Fig. 3 gives an example of the convergence of the density function toward the “climatological”/total density function. We are now referring more explicitly to Fig. 3 when discussing the asymptotic convergence.

p.4: “This shows that for all possible initial conditions the probability density function slowly converges to the total density distribution with a timescale of ~10 yr (see Fig. 3 for an example of the asymptotic convergence).”

p. 7 top: you haven't retuned before going to observations, right? Maybe worth mentioning. (And if you have then I would be very sceptical of the paper!)

No, there is absolutely no “retuning” done. The exact same statistics are used. This is fully mentioned in the paper now.

p.7: “After having tested PROCAST in a perfect model setting, we now test the exact same system with real observations. (Note that no retuning before going to observations has been applied.)”

can you explain the meaning/consequences of reliability 2.3 vs 1?

This was explained in the method. We have added specific comments in the main text for completeness.

p.8-9: “More specifically, PROCAST reliability for annual GMT is almost perfect with an averaged value for hindcast lags from 1 to 5 yr of 1 (prediction spread is as big as the prediction error on average), whereas it is 2.3 for DePreSys3 (prediction spread is 2.3 times as small as the prediction error on average). This relatively weak reliability of DePreSys3 is a sign of the under-dispersion of the ensemble in comparison with its prediction error and of an over-confident prediction system.”

p. 9 hiatus prediction discussion is a bit uncritical - if I read your predictions correctly, you do underestimate the amplitude, and that would be consistent with the possibility of further forcings. Citing a few review papers on the hiatus here would be helpful.

We generally agree with this comment. The actual amplitude of the hiatus was correctly predicted in a probabilistic sense, staying well within +/- 1 standard deviation of the probabilistic prediction. The goal was not to discuss the hiatus, but to give an example. We have added references and give a more critical view of our results.

p.9-10: “Despite some error in its exact intensity (especially when focusing on the mean prediction) or the details of its annual variations, this shows that an event such as the post-1998 hiatus could not have been missed using PROCAST (especially when acknowledging the predictive spread). In particular our probabilistic forecast framework shows that a decade-long hiatus was always a likely outcome (always well within 1 standard deviation of our prediction), even if not the most likely, especially after 7 yr. Because the amplitude is somewhat lower than observed, it would be consistent if a small part of the hiatus was indeed caused by external forcing, although the main part would be due to internal variability [18-21].”

p. 11 and discussion of final figure: you are predicting extremes of global SAT and SST not regional events - that should be made clearer in text and caption. You could relate to what extent global extreme T typically goes along with regional extremes.

We agree that we are only discussing global prediction, through the two metrics we have used since the beginning of the study. This has been clarified in the text by adding the word “global” where necessary.

We are not aware of any studies relating extreme GMT with regional extremes, except for the subset of ENSO-related GMT change.

citation 1: you cite this for attribution so citing the respective chapter is more appropriate I think.

Done

Figure 4 and 5 caption: The captions are very similar but poorly separated. E.g. 4 mentions hatching which only appears in 5, and I don't see the point of a flat reliability field plotted - why is it so flat? And what value is the pink hue that can't well be attributed to a very shallow colourscale?

We agree that some results of Fig. 4 could have been better explained. We chose to use the same layout and color scales for Fig. 4 and 5 to help comparison (i.e., comparison between the perfect model approach and hindcast in operational mode). The absence of hatching in Fig. 4 is because the model surpasses persistence for all hindcast lags and averaging timescales. The flat pink color is because the Reliability is perfect and so equal to 1 for all hindcast lags and averaging timescales (as expected in a perfect model approach). Both properties were mentioned in the caption: “Note the better skill than persistence (no hatched region) and the good reliability close to 1 for all hindcast lags and averaging times.” We have clarified that in the new version of the caption of Fig. 4.

Caption Fig. 4: “Note the absence of hatched region in a and b denoting the better skill than persistence for all hindcast lags and averaging times. Also, note the flat pink colour in c and d corresponding to a good reliability close to 1 for all hindcast lags and averaging times, as expected in a perfect model approach.”

Figure 7 caption: see above clarify anomalous event of global T.

Done

REVIEWERS' COMMENTS:

Reviewer #1 (Remarks to the Author):

Dear authors,

I wish to say that the quality of your manuscript has substantially improved compared to the

previous version. I encourage publication of the paper but I kindly ask the authors to take into

account the comments below:

1) At page 3 and 4, some information about the limitations of the methodology caused by using a

severely projected phase space should be briefly discussed (the current explanation in the

appendix is ok). Otherwise, the reader would be slightly misled.

2) I really do not understand this sentence

"Here, the climate state is evaluated through the 1-dimensional phase space defined by either

GMT or SST, whereas the state transitions are based on the evolution of the respective metric in

the CMIP5 database"

3) Assumption 4 as discussed in the appendix: the entire IPCC report on extremes (which has its

own limitations) discusses changes in extremes as due to combined changes of the mean and of

the variability of the distribution. This is - at many levels - a grossly simplified view on the

problem. The authors should clarify that their assumption, which neglects dealing with changes in

the natural variability, might lead to low skill in predicting the probability of occurrence of

extremes, which is instead something they focus on also in the abstract.

4) I honestly believe that, when mentioning the problem of markovianity (or lack of) in your

estimates of the transfer operator, and, in fact, to provide support to your approach, you should

refer to the recently published paper by Tantet et al Nonlinearity 2018

http://iopscience.iop.org/article/10.1088/1361-6544/aaaf42/pdf

where this is discussed in detail, when looking at tipping points and analysing decay of

correlations.

5) You say that your way of performing climate prediction in the next few years is very

cheap and efficient. But please do not forget that you are using datasets that have been

produced with incredibly expensive numerical exercises!

All the best,

Valerio Lucarini

Reviewer #2 (Remarks to the Author):

The authors have addressed my comments well and my major questions have been

resolved. I am happy for the ms to go forward, as it is very timely and delaying it longer

would miss opportunities. I have a few suggestions for the authors to decide on:

title: it's interesting, but when I saw the title I wasn't sure it was the paper I was thinking

of! Maybe have at least the word 'predicting' in the title?

abstract: I am still not convinced the world needs mobile predictable global SSTs or

SATs. I think it would be more useful here to point at many impacts of climate change

scaling with global temperature and hence make a link to impacts.

The section about predictability vs reliability ending in the middle of page 6 is a bit

verbose. I recommend cutting it a bit by removing duplications.

p. 10: it's interesting that using the 'best' models doesn't give you better predictions, but

could you here briefly explain what you mean by 'best'? It's not clear immediately.

Reviewer #1:

1) At page 3 and 4, some information about the limitations of the methodology caused by using a severely projected phase space should be briefly discussed (the current explanation in the appendix is ok). Otherwise, the reader would be slightly misled.

Done.

p.4: “In particular, the severe truncation of the phase space to a single variable implies that different climate states with equivalent GMT or SST are all aggregated in the probabilistic approach of the Transfer Operators.”

2) I really do not understand this sentence "Here, the climate state is evaluated through the 1-dimensional phase space defined by either GMT or SST, whereas the state transitions are based on the evolution of the respective metric in the CMIP5 database"

We have rephrased the sentence:

p.3: "Here, the climate state is evaluated through the 1-dimensional phase space defined by either GMT or SST, whereas the state transitions are based on GMT or SST evolutions simulated by climate models from the CMIP5 database (see Methods for further details)."

3) Assumption 4 as discussed in the appendix: the entire IPCC report on extremes (which has its own limitations) discusses changes in extremes as due to combined changes of the mean and of the variability of the distribution. This is - at many levels - a grossly simplified view on the problem. The authors should clarify that their assumption, which neglects dealing with changes in the natural variability, might lead to low skill in predicting the probability of occurrence of extremes, which is instead something they focus on also in the abstract.

We have now mentioned this shortcoming.

p.23-24: “Assumption 4 is also not correct, although generally applied in climate science. The general assumption is that the perturbation implied by global warming is too small to fundamentally change the anomalies, or climate variability (i.e., anomalies and perturbation to the background do not affect each other). This might be problematic when studying extreme events, whose occurrences might change in a changing climate. Hence, this assumption is questionable for certain variables that are subject to order one changes under climate change, such as, e.g. the Atlantic Meridional Overturning Circulation, or extremes in the tail of a distribution. However, there is no indication that the assumption is not approximately valid for GMT, whose variability is for a large part dominated by El Niño-Southern Oscillation and the Interdecadal Pacific Oscillation. So, despite these various assumptions, PROCAST is skillful for interannual GMT and (globally averaged) SST prediction, proving that these assumptions although not strictly true are reasonable in the context of our study, because they apply in most cases of variation in GMT.”

4) I honestly believe that, when mentioning the problem of markovianity (or lack of) in your estimates of the transfer operator, and, in fact, to provide support to your approach, you should refer to the recently published paper by Tantet et al Nonlinearity 2018
http://iopscience.iop.org/article/10.1088/1361-6544/aaaf42/pdf
where this is discussed in detail, when looking at tipping points and analysing decay of correlations.

We have added this reference in the discussion on p.22.

5) You say that your way of performing climate prediction in the next few years is very cheap and efficient. But please do not forget that you are using datasets that have been produced with incredibly expensive numerical exercises!

p.9: “This difference in numerical cost has to be put into perspective, though. PROCAST takes advantage of the freely available CMIP5 database, which is an incredibly expensive numerical exercise of the worldwide climate science community. Also, unlike PROCAST, DePreSys3 is not specifically trained for a single variable prediction, so that the entire climate state is predicted in one forecast. This is obviously beneficial.”

Reviewer #2:

title: it's interesting, but when I saw the title I wasn't sure it was the paper I was thinking of! Maybe have at least the word 'predicting' in the title?

We agree with this comment and have modified the title:

Title: “A Novel Probabilistic Forecast System Predicting Anomalously Warm 2018-2022 Reinforcing the Long-Term Global Warming Trend”

abstract: I am still not convinced the world needs mobile predictable global SSTs or SATs. I think it would be more useful here to point at many impacts of climate change scaling with global temperature and hence make a link to impacts.

We agree that the impact of GMT changes is significant and have added this to the conclusion.

p.15: “This also opens the possibility of giving access to climate forecast and possible subsequent impacts that scale with GMT to a wider scientific community (without the need for super-computer) and to the general public by running a simple application on a personal portable device.”

The section about predictability vs reliability ending in the middle of page 6 is a bit verbose. I recommend cutting it a bit by removing duplications.

We have shortened the text.

p5-6: “To estimate the validity of our probabilistic predictions we use two different measures: the coefficient of determination – R2, which shows the skill of the mean prediction; and the Reliability, which measures the accuracy of the spread in the prediction. These two measures can be mathematically expressed as:

***Eq1a***,***Eq1b***,

where t is time, i is the possibility or state index, o(t) is the observation and x_i are the predicted possibilities with probability p_i(t). The bar denotes an average over time or possibilities depending on the superscript. (Our equation of the Reliability is an extension for non-stationary statistic of the previously suggested definition13.) The coefficient of determination, when multiplied by 100, gives the percentage of variance of the observation explained by the prediction. Since the system is chaotic (there is a degree of uncertainty around the mean prediction), it is expected that the prediction cannot represent the observation perfectly, even if the model represents perfectly reality. Hence the reliability measures the accuracy of this prediction error. When a reliable prediction has large skill (~1) we expect the prediction uncertainty to be small. On the other hand, when a reliable prediction system has low skill (~0) we expect the prediction uncertainty to be as big as the observed variance. In this context, and regardless of its skill, a reliable prediction system always needs to have a Reliability close to 1. Hence, despite that a high value of R2 is preferable for a skillful prediction, the reliability is arguably more important to estimate the usefulness of the prediction system. Indeed a reliable prediction system can be used for probabilistic forecasts and risk assessments, even if it has low skill14,15.”

p.10: it's interesting that using the 'best' models doesn't give you better predictions, but could you here briefly explain what you mean by 'best'? It's not clear immediately.

We have modified the text.

p9: “It is also interesting to note that skill and reliability are improved by the addition of information from more models rather than by selecting a subset of the best models (i.e., models giving the best skill when used alone to train the Transfer Operator).”
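The two measures described in the response on predictability vs reliability can be illustrated numerically. Equations 1a and 1b are not reproduced in this transcript, so the sketch below rests on assumed forms that merely match the verbal description: R2 is taken as one minus the mean squared error of the mean prediction divided by the variance of the observations, and the Reliability as the root-mean-square prediction error divided by the root-mean-square prediction spread (so a value of 2.3 means the spread is 2.3 times smaller than the error). The function names and the toy data are hypothetical and are not taken from PROCAST.

```python
import numpy as np


def forecast_mean_and_variance(x, p):
    """Mean and variance of a discrete probabilistic forecast.

    x : (n_states,) predicted possibilities x_i
    p : (n_times, n_states) probabilities p_i(t); each row sums to 1
    """
    mean = p @ x  # probability-weighted mean at each time
    var = (p * (x[None, :] - mean[:, None]) ** 2).sum(axis=1)
    return mean, var


def coefficient_of_determination(obs, mean):
    """ASSUMED form: 1 - mean squared error / variance of the observations."""
    mse = np.mean((obs - mean) ** 2)
    return 1.0 - mse / np.var(obs)


def reliability(obs, mean, var):
    """ASSUMED form: RMS prediction error / RMS prediction spread."""
    return np.sqrt(np.mean((obs - mean) ** 2) / np.mean(var))


# Toy example: two equally likely states, so the forecast mean is 0
# and the forecast variance is 1 at every time step.
x = np.array([-1.0, 1.0])
p = np.full((4, 2), 0.5)
obs = np.array([1.0, -1.0, 1.0, -1.0])

mean, var = forecast_mean_and_variance(x, p)
print("R2:", coefficient_of_determination(obs, mean))
print("Reliability:", reliability(obs, mean, var))
# Observations 2.3 times larger than the spread: an under-dispersive forecast.
print("Reliability (under-dispersive):", reliability(2.3 * obs, mean, var))
```

In this toy case the mean prediction has no skill, yet the spread matches the error, which is the "low skill but reliable" situation the quoted passage describes; scaling the observations by 2.3 mimics an over-confident system.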
