IEEE COMMUNICATIONS SURVEYS & TUTORIALS, VOL. 10, NO. 4, FOURTH QUARTER 2008

A Survey of Techniques for Internet Traffic Classification using Machine Learning

Thuy T.T. Nguyen and Grenville Armitage

Abstract - The research community has begun looking for IP traffic classification techniques that do not rely on ‘well known’ TCP or UDP port numbers, or interpreting the contents of packet payloads. New work is emerging on the use of statistical traffic characteristics to assist in the identification and classification process. This survey paper looks at emerging research into the application of Machine Learning (ML) techniques to IP traffic classification - an inter-disciplinary blend of IP networking and data mining techniques. We provide context and motivation for the application of ML techniques to IP traffic classification, and review 18 significant works that cover the dominant period from 2004 to early 2007. These works are categorized and reviewed according to their choice of ML strategies and primary contributions to the literature. We also discuss a number of key requirements for the employment of ML-based traffic classifiers in operational IP networks, and qualitatively critique the extent to which the reviewed works meet these requirements. Open issues and challenges in the field are also discussed.

Index Terms - Traffic classification, Internet Protocol, Machine Learning, Real Time, Payload inspection, Flow clustering, Statistical traffic properties.

I. INTRODUCTION

Real-time traffic classification has the potential to solve difficult network management problems for Internet service providers (ISPs) and their equipment vendors.
Network operators need to know what is flowing over their networks promptly so they can react quickly in support of their various business goals. Traffic classification may be a core part of automated intrusion detection systems [1][2][3], used to detect patterns indicative of denial of service attacks, trigger automated re-allocation of network resources for priority customers [4], or identify customer use of network resources that in some way contravenes the operator’s terms of service. More recently, governments are also clarifying ISP obligations with respect to ‘lawful interception’ (LI) of IP data traffic [5]. Just as telephone companies must support interception of telephone usage, ISPs are increasingly subject to government requests for information on network use by particular individuals at particular points in time. IP traffic classification is an integral part of ISP-based LI solutions.

Commonly deployed IP traffic classification techniques have been based around direct inspection of each packet’s contents at some point on the network. Successive IP packets having the same 5-tuple of protocol type, source address:port and destination address:port are considered to belong to a flow whose controlling application we wish to determine. Simple classification infers the controlling application’s identity by assuming that most applications consistently use ‘well known’ TCP or UDP port numbers (visible in the TCP or UDP headers). However, many applications are increasingly using unpredictable (or at least obscure) port numbers [6]. Consequently, more sophisticated classification techniques infer application type by looking for application-specific data (or well-known protocol behavior) within the TCP or UDP payloads [7].

Manuscript received April 23, 2007; revised September 5, 2007. The authors are with the Centre for Advanced Internet Architectures, Swinburne University of Technology, Melbourne, Australia (e-mail: [email protected], [email protected]). Digital Object Identifier 10.1109/SURV.2008.080406.
Unfortunately, the effectiveness of such ‘deep packet inspection’ techniques is diminishing. Such packet inspection relies on two related assumptions:
• Third parties unaffiliated with either source or recipient are able to inspect each IP packet’s payload (i.e. is the payload visible)
• The classifier knows the syntax of each application’s packet payloads (i.e. can the payload be interpreted)

Two emerging challenges undermine the first assumption - customers may use encryption to obfuscate packet contents (including TCP or UDP port numbers), and governments may impose privacy regulations constraining the ability of third parties to lawfully inspect payloads at all. The second assumption imposes a heavy operational load - commercial devices will need repeated updates to stay ahead of regular (or simply gratuitous) changes in every application’s packet payload formats.

The research community has responded by investigating classification schemes capable of inferring application-level usage patterns without deep inspection of packet payloads. Newer approaches classify traffic by recognising statistical patterns in externally observable attributes of the traffic (such as typical packet lengths and inter-packet arrival times). Their ultimate goal is either clustering IP traffic flows into groups that have similar traffic patterns, or classifying one or more applications of interest.

A number of researchers are looking particularly closely at the application of Machine Learning (ML) techniques (a subset of the wider Artificial Intelligence discipline) to IP traffic classification. The application of ML techniques involves a number of steps. First, features are defined by which future unknown IP traffic may be identified and differentiated. Features are attributes of flows calculated over multiple packets (such as maximum or minimum packet lengths in each direction, flow durations or inter-packet arrival times). Then the ML classifier is trained to associate sets of features with known traffic classes (creating rules), and the ML algorithm is applied to classify unknown traffic using the previously learned rules.
Every ML algorithm has a different approach to sorting and prioritising sets of features, which leads to different dynamic behaviors during training and classification. In this paper we provide the rationale for IP traffic classification in IP networks, review the state-of-the-art approaches to traffic classification, and then review and critique emerging ML-based techniques for IP traffic classification.

The rest of this paper is organised as follows. Section II outlines the importance of IP traffic classification in operational networks, introduces a number of metrics for assessing classification accuracy, and discusses the limitations of traditional port- and payload-based classification. Section III provides background information about ML and how it can be applied in IP traffic classification. The section also discusses a number of key requirements for the employment of ML-based classifiers in operational IP networks. Section IV reviews the significant works in this field (predominantly from 2004 to early 2007). The section is wrapped up with a qualitative discussion of the extent to which the reviewed works meet the requirements specified in section III. Section V concludes the paper with some final remarks and suggestions of possible future work.

1553-877X/08/$25.00 © 2008 IEEE

II. APPLICATION CONTEXT FOR MACHINE LEARNING BASED IP TRAFFIC CLASSIFICATION

A. The importance of IP traffic classification

The importance of IP traffic classification may be illustrated by briefly reviewing two important areas - IP quality of service (QoS) schemes, and lawful interception (LI).
In responding to the network congestion problem, a common strategy for network providers is under-utilising (over-provisioning) the link capacity. However, this is not necessarily an economic solution for most ISPs. On the other hand, the development of other QoS solutions such as IntServ [8] or DiffServ [9] has been stymied in part due to the lack of QoS signaling and of an effective service pricing mechanism (as suggested in [10] and [11]). Signaling allows the communication of specific QoS requirements between Internet applications and the network. A pricing mechanism is needed to differentiate customers with different needs and charge for the QoS that they receive. It also acts as a cost recovery mechanism and provides revenue generation for the ISPs to compensate for their efforts in providing QoS and managing resource allocation.

All QoS schemes have some degree of IP traffic classification implicit in their design. DiffServ assumes that edge routers can recognise and differentiate between aggregate classes of traffic in order to set the DiffServ code point (DSCP) on packets entering the network core. IntServ presumes that routers along a path are able to differentiate between finely grained traffic classes (and historically has presumed the use of packet header inspection to achieve this goal). Traffic classification also has the potential to support class-based Internet QoS charging. Furthermore, real-time traffic classification is the core component of emerging QoS-enabled products [12] and automated QoS architectures [4][13].
Traffic classification is also an important solution for the emerging requirement that ISP networks have to provide LI capabilities. Governments typically implement LI at various levels of abstraction. In the telephony world a law enforcement agency may nominate a ‘person of interest’ and issue a warrant for the collection of intercept information. The intercept may be high-level call records (who called who and when) or low-level ‘tapping’ of the audio from actual phone calls in progress. In the ISP space, traffic classification techniques offer the possibility of identifying traffic patterns (which endpoints are exchanging packets and when), and identifying what classes of applications are being used by a ‘person of interest’ at any given point in time. Depending on the particular traffic classification scheme, this information may potentially be obtained without violating any privacy laws covering the TCP or UDP payloads of the ISP customer’s traffic.

Fig. 1. Evaluation Metrics

B. Traffic classification metrics

A key criterion on which to differentiate between classification techniques is predictive accuracy (i.e., how accurately the technique or model makes decisions when presented with previously unseen data). A number of metrics exist with which to express predictive accuracy.

1) Positives, negatives, accuracy, precision and recall: Assume there is a traffic class X in which we are interested, mixed in with a broader set of IP traffic. A traffic classifier is being used to identify (classify) packets (or flows of packets) belonging to class X when presented with a mixture of previously unseen traffic. The classifier is presumed to give one of two outputs - a flow (or packet) is believed to be a member of class X, or it is not.

A common way to characterize a classifier’s accuracy is through metrics known as False Positives, False Negatives, True Positives and True Negatives. These metrics are defined as follows:
• False Negatives (FN): Percentage of members of class X incorrectly classified as not belonging to class X.
• False Positives (FP): Percentage of members of other classes incorrectly classified as belonging to class X.
• True Positives (TP): Percentage of members of class X correctly classified as belonging to class X (equivalent to 100% - FN).
• True Negatives (TN): Percentage of members of other classes correctly classified as not belonging to class X (equivalent to 100% - FP).

Figure 1 illustrates the relationships between FN, FP, TP and TN. A good traffic classifier aims to minimise the False Negatives and False Positives.

Some works make use of Accuracy as an evaluation metric. It is generally defined as the percentage of correctly classified instances among the total number of instances. This definition is used throughout the paper unless otherwise stated.

ML literature often utilises two additional metrics known as Recall and Precision. These metrics are defined as follows:
• Recall: Percentage of members of class X correctly classified as belonging to class X.
• Precision: Percentage of those instances that truly have class X, among all those classified as class X.

If all metrics are considered to range from 0 (bad) to 100% (optimal) it can be seen that Recall is equivalent to TP.

2) Byte and Flow accuracy: When comparing literature on different classification techniques it is also important to note the unit of the author’s chosen metric. Recall, Precision, FN and FP may all be reported as percentages of bytes or flows relative to the traffic being classified. An author’s choice here can significantly alter the meaning of their reported accuracy results.

Most recently published traffic classification studies have focused on flow accuracy - measuring the accuracy with which flows are correctly classified, relative to the number of other flows in the author’s test and/or training dataset(s). However, some recent work has also chosen to express their accuracy calculations in terms of byte accuracy - focusing more on how many bytes are carried by the packets of correctly classified flows, relative to the total number of bytes in the author’s test and/or training dataset(s) (e.g. [14][15]).

Erman et al. in [16] argue that byte accuracy is crucial when evaluating the accuracy of traffic classification algorithms.
They note that the majority of flows on the Internet are small and account for only a small portion of total bytes and packets in the network (mice flows). On the other hand, the majority of the traffic bytes are generated by a small number of large flows (elephant flows). They give an example from a 6-month data trace where the top (largest) 1% of flows account for over 73% of the traffic in terms of bytes. With a threshold to differentiate elephant and mice flows of 3.7MB, the top 0.1% of flows would account for 46% of the traffic (in bytes). Presented with such a dataset, a classifier optimised to identify all but the top 0.1% of the flows could attain a 99.9% flow accuracy but still result in 46% of the bytes in the dataset being misclassified.

Whether flow accuracy or byte accuracy is more important will generally depend on the classifier’s intended use. For example, when classifying traffic for IP QoS purposes it is plausible that identifying every instance of a short lived flow needing QoS (such as a 5 minute, 32Kbit/sec phone call) is as important as identifying long lived flows needing QoS (such as a 30 minute, 256Kbit/sec video conference), with both being far more important to correctly identify than the few flows that represent multi-hour (and/or hundreds of megabytes) peer to peer file sharing sessions. Conversely, an ISP doing analysis of load patterns on their network may well be significantly interested in correctly classifying the applications driving the elephant flows that contribute a disproportionate number of packets across their network.

C. Limitations of packet inspection for traffic classification

Traditional IP traffic classification relies on the inspection of a packet’s TCP or UDP port numbers (port based classification), or the reconstruction of protocol signatures in its payload (payload based classification). Each approach suffers from a number of limitations.
1) Port based IP traffic classification: TCP and UDP provide for the multiplexing of multiple flows between common IP endpoints through the use of port numbers. Historically many applications utilise a ‘well known’ port on their local host as a rendezvous point to which other hosts may initiate communication. A classifier sitting in the middle of a network need only look for TCP SYN packets (the first step in TCP’s three-way handshake during session establishment) to know the server side of a new client-server TCP connection. The application is then inferred by looking up the TCP SYN packet’s target port number in the Internet Assigned Numbers Authority (IANA)’s list of registered ports [17]. UDP uses ports similarly (though without connection establishment or the maintenance of connection state).

However, this approach has limitations. Firstly, some applications may not have their ports registered with IANA (for example, peer to peer applications such as Napster and Kazaa) [18]. An application may use ports other than its well-known ports to avoid operating system access control restrictions (for example, non-privileged users on unix-like systems may be forced to run HTTP servers on ports other than port 80.) Also, in some cases server ports are dynamically allocated as needed. For example, the RealVideo streamer allows the dynamic negotiation of the server port used for the data transfer. This server port is negotiated on an initial TCP connection, which is established using the well-known RealVideo control port [19].

Moore and Papagiannaki [20] observed no better than 70% byte accuracy for port-based classification using the official IANA list. Madhukar and Williamson [21] showed that port-based analysis is unable to identify 30-70% of Internet traffic flows they investigated. Sen et al. [7] reported that the default port accounted for only 30% of the total traffic (in bytes) for the Kazaa P2P protocol.

In some circumstances IP layer encryption may also obfuscate the TCP or UDP header, making it impossible to know the actual port numbers.

2) Payload based IP traffic classification: To avoid total reliance on the semantics of port numbers, many current industry products utilise stateful reconstruction of session and application information from each packet’s content.
Sen et al. [7] showed that payload based classification of P2P traffic (by examining the signatures of the traffic at the application level) could reduce false positives and false negatives to 5% of total bytes for most P2P protocols studied.

Moore and Papagiannaki [20] use a combination of port and payload based techniques to identify network applications. The classification procedure starts with the examination of a flow’s port number. If no well-known port is used, the flow is passed through to the next stage. In the second stage, the first packet is examined to see whether it contains a known signature. If one is not found, then the packet is examined to see whether it contains a well-known protocol. If these tests fail, the protocol signatures in the first KByte of the flow are studied. Flows remaining unclassified after that stage will require inspection of the entire flow payload. Their results show that port information by itself is capable of correctly classifying 69% of the total bytes. Including the information observed in the first KByte of each flow increases the accuracy to almost 79%. Higher accuracy (up to nearly 100%) can only be achieved by investigating the remaining unclassified flows’ entire payload.

Although payload based inspection avoids reliance on fixed port numbers, it imposes significant complexity and processing load on the traffic identification device. It must be kept up-to-date with extensive knowledge of application protocol semantics, and must be powerful enough to perform concurrent analysis of a potentially large number of flows. This approach can be difficult or impossible when dealing with proprietary protocols or encrypted traffic. Furthermore direct analysis of session and application layer content may represent a breach of organisational privacy policies or violation of relevant privacy legislation.

D. Classification based on statistical traffic properties

The preceding techniques are limited by their dependence on the inferred semantics of the information gathered through deep inspection of packet content (payload and port numbers).
Newer approaches rely on traffic’s statistical characteristics to identify the application. An assumption underlying such methods is that traffic at the network layer has statistical properties (such as the distribution of flow duration, flow idle time, packet inter-arrival time and packet lengths) that are unique for certain classes of applications and enable different source applications to be distinguished from each other.

The relationship between the class of traffic and its observed statistical properties has been noted in [22] (where the authors analysed and constructed empirical models of connection characteristics - such as bytes, duration, arrival periodicity - for a number of specific TCP applications), and in [23] (where the authors analysed Internet chat systems by focusing on the characteristics of the traffic in terms of flow duration, packet inter-arrival time and packet size and byte profile). Later work (for example [24][25] and [26]) also observed distinctive traffic characteristics, such as the distributions of packet lengths and packet inter-arrival times, for a number of Internet applications. The results of these works have stimulated new classification techniques based on traffic flow statistical properties. The need to deal with traffic patterns, large datasets and multi-dimensional spaces of flow and packet attributes is one of the reasons for the introduction of ML techniques in this field.

III. BACKGROUND ON MACHINE LEARNING AND THE APPLICATION OF MACHINE LEARNING IN IP TRAFFIC CLASSIFICATION

This section summarises the basic concepts of Machine Learning (ML) and outlines how ML can be applied to IP traffic classification.

A. A review of classification with Machine Learning

ML has historically been known as a collection of powerful techniques for data mining and knowledge discovery, which search for and describe useful structural patterns in data. In 1992 Shi [27] noted ‘One of the defining features of intelligence is the ability to learn.
[...]. Machine learning is the study of making machines acquire new knowledge, new skills, and reorganise existing knowledge.’ A learning machine has the ability to learn automatically from experience and refine and improve its knowledge base. In 1983 Simon noted ‘Learning denotes changes in the system that are adaptive in the sense that they enable the system to do the same task or tasks drawn from the same population more efficiently and more effectively the next time’ [28] and in 2000 Witten and Frank observed ‘Things learn when they change their behavior in a way that makes them perform better in the future’ [29].

ML has a wide range of applications, including search engines, medical diagnosis, text and handwriting recognition, image screening, load forecasting, marketing and sales diagnosis, and so on. A network traffic controller using ML techniques was proposed in 1990, aiming to maximise call completion in a circuit-switched telecommunications network [30]; this was one of the works that marked the point at which ML techniques expanded their application space into the telecommunications networking field. In 1994 ML was first utilised for Internet flow classification in the context of intrusion detection [31]. It is the starting point for much of the work using ML techniques in Internet traffic classification that follows.

1) Input and output of a ML process: Broadly speaking, ML is the process of finding and describing structural patterns in a supplied dataset.

ML takes input in the form of a dataset of instances (also known as examples). An instance refers to an individual, independent example of the dataset. Each instance is characterised by the values of its features (also known as attributes or discriminators) that measure different aspects of the instance. (In the networking field consecutive packets from the same flow might form an instance, while the set of features might include median inter-packet arrival times or standard deviation of packet lengths over a number of consecutive packets in a flow.) The dataset is ultimately presented as a matrix of instances versus features [29].
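
The instance-versus-feature matrix described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a method from any surveyed work: the packet records, the helper name `flow_features` and the choice of exactly two features (median inter-packet arrival time and standard deviation of packet lengths) are made up for the example.

```python
from statistics import median, pstdev

def flow_features(packets):
    """Turn one instance (the packets of a flow) into a feature vector.

    `packets` is a hypothetical list of (timestamp_seconds, length_bytes)
    tuples in arrival order; it must contain at least two packets.
    """
    times = [t for t, _ in packets]
    lengths = [n for _, n in packets]
    # Inter-packet arrival times between consecutive packets.
    gaps = [b - a for a, b in zip(times, times[1:])]
    return [median(gaps), pstdev(lengths)]

# Two made-up instances: a steady small-packet flow and a bursty bulk flow.
instances = [
    [(0.00, 100), (0.02, 100), (0.04, 100), (0.06, 100)],
    [(0.00, 1500), (0.50, 40), (0.51, 1500), (2.00, 40)],
]

# The dataset: one row per instance, one column per feature.
dataset = [flow_features(p) for p in instances]
```

Each row of `dataset` corresponds to one instance and each column to one feature, which is the form in which a learning algorithm would typically consume it.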
The output is the description of the knowledge that has been learnt. How the specific outcome of the learning process is represented (the syntax and semantics) depends largely on the particular ML approach being used.

2) Different types of learning: Witten and Frank [29] define four basic types of learning:
• Classification (or supervised learning)
• Clustering (or unsupervised learning)
• Association
• Numeric prediction

Classification learning involves a machine learning from a set of pre-classified (also called pre-labeled) examples, from which it builds a set of classification rules (a model) to classify unseen examples. Clustering is the grouping of instances that have similar characteristics into clusters, without any prior guidance. In association learning, any association between features is sought. In numeric prediction, the outcome to be predicted is not a discrete class but a numeric quantity.

Most ML techniques used for IP traffic classification focus on the use of supervised and unsupervised learning.

3) Supervised Learning: Supervised learning creates knowledge structures that support the task of classifying new instances into pre-defined classes [32]. The learning machine is provided with a collection of sample instances, pre-classified into classes. Output of the learning process is a classification model that is constructed by examining and generalising from the provided instances.

In effect, supervised learning focuses on modeling the input/output relationships. Its goal is to identify a mapping from input features to an output class. The knowledge learnt (e.g. commonalities among members of the same class and differences between competing ones) can be presented as a flowchart, a decision tree, classification rules, etc., that can be used later to classify a new unseen instance.

There are two major phases (steps) in supervised learning:
• Training: The learning phase that examines the provided data (called the training dataset) and constructs (builds) a classification model.
• Testing (also known as classifying): The model that has been built in the training phase is used to classify new unseen instances.
For example, let $TS$ be a training dataset, that is a set of input/output pairs,

$$TS = \{\langle x_1, y_1 \rangle, \langle x_2, y_2 \rangle, \ldots, \langle x_N, y_N \rangle\}$$

where $x_i$ is the vector of values of the input features corresponding to the $i$-th instance, and $y_i$ is its output class value. (The name supervised learning comes from the fact that the output classes are pre-defined in the training dataset.) The goal of classification can be formulated as follows: From a training dataset $TS$, find a function $f(x)$ of the input features that best predicts the outcome of the output class $y$ for any new unseen values of $x$. The output takes its value in a discrete set $\{y_1, y_2, \ldots, y_M\}$ that consists of all the pre-defined class values. The function $f(x)$ is the core of the classification model.

The model created during training is improved if we simultaneously provide examples of instances that belong to class(es) of interest and instances known to not be members of the class(es) of interest. This will enhance the model’s later ability to identify instances belonging to class(es) of interest.

There exist a number of supervised learning classification algorithms, each differing mainly in the way the classification model is constructed and what optimization algorithm is used to search for a good model. (Examples include the supervised Decision Tree and Naive Bayes classification algorithms [33][29].)

4) Clustering: Classification techniques use pre-defined classes of training instances. In contrast, clustering methods are not provided with this guidance; instead, they discover natural clusters (groups) in the data using internalised heuristics [33].

Clustering focuses on finding patterns in the input data. It clusters instances with similar properties (defined by a specific distance measuring approach, such as Euclidean distance) into groups. The groups that are so identified may be exclusive, so that any instance belongs in only one group; or they may be overlapping, when one instance may fall into several groups; they may also be probabilistic, that is an instance belongs to a group with a certain probability. They may be hierarchical, where there is a division of instances into groups at the top level, and then each of these groups is refined further - even down to the level of individual instances [29].
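
The exclusive, distance-based grouping described above can be illustrated with a small k-means-style loop. This is a minimal sketch under stated assumptions rather than any surveyed work's implementation: the function name `kmeans`, the deterministic seeding of centroids with the first k instances, and the toy two-feature data are all hypothetical choices made for the example.

```python
import math

def kmeans(points, k, max_iter=100):
    """Partition `points` (lists of numbers) into k exclusive clusters."""
    # Assumption for reproducibility: seed centroids with the first k points.
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assign each instance to its nearest centroid (Euclidean distance).
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no instance changed cluster, so the loop has converged
        assignment = new_assignment
        # Move each centroid to the mean of its current members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Made-up instances with two numeric features each.
points = [[1.0, 1.0], [1.2, 0.8], [9.0, 9.0], [8.8, 9.2], [1.1, 1.1]]
labels, centers = kmeans(points, k=2)
```

Note that the resulting clusters carry no application names; mapping a cluster to a traffic class is a separate labeling step.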
There are three basic clustering methods: the classic k-means algorithm, incremental clustering, and the probability-based clustering method. The classic k-means algorithm forms clusters in numeric domains, partitioning instances into disjoint clusters, while incremental clustering generates a hierarchical grouping of instances. The probability-based methods assign instances to classes probabilistically, not deterministically [29].

5) Evaluating supervised learning algorithms: A good ML classifier would optimise Recall and Precision. However, there may be trade-offs between these metrics. To decide which one is more important or should be given higher priority one needs to take into account the cost of making wrong decisions or wrong classifications. The decision depends on a specific application context and one’s commercial and/or operational priorities.

Various tools exist to support this trade-off process. The receiver operating characteristic (ROC) curve provides a way to visualize the trade-offs between TP and FP by plotting the number of TP as a function of the number of FP (both expressed as percentage of the total TP and FP respectively). It has been found useful in analysing how classifiers perform over a range of threshold settings [29]. Another is the Neyman-Pearson criterion [34], which attempts to maximize TP given a fixed threshold on FP [35].

Most of the IP classification work reviewed later in this survey does not address the costs of trading between Recall and Precision.

A challenge when using supervised learning algorithms is that both the training and testing phases must be performed using datasets that have been previously classified (labeled)¹. Ideally one would have a large training set (for optimal learning and creation of models) and a large, yet independent, testing dataset to properly assess the algorithm’s performance. (Testing on the training dataset is broadly misleading. Such testing will usually only show that the constructed model is good at recognising the instances from which it was constructed.)
¹ In this context ‘labeling’ is the process of classifying the members of a dataset using manual (human) inspection or an irrefutable automated process. In contrast to a controlled training and testing environment, operational classifiers do not have access to previously labeled example flows.

In the real world we are often faced with a limited quantity of pre-labeled datasets. A simple procedure (sometimes known as holdout [29]) involves setting aside some part (e.g. two thirds) of the pre-labeled dataset for training and the rest (e.g. one third) for testing.

In practice when only small or limited datasets are available a variant of holdout, called N-fold cross-validation, is most commonly used. The dataset is first split into N approximately equal partitions (or folds). Each partition (1/N) in turn is then used for testing, while the remainder ((N − 1)/N) are used for training. The procedure is repeated N times so that in the end, every instance has been used exactly once for testing. The overall Recall and Precision are calculated from the average of the Recalls and Precisions measured from all N tests. It has been claimed that N = 10 (tenfold cross-validation) provides a good estimate of classification performance [29].

Simply partitioning the full dataset N ways does not guarantee that the same proportion is used for any given class within the dataset. A further step, known as stratification, is usually applied - randomly sampling the dataset in such a way that each class is properly represented in both training and testing datasets. When stratification is used in combination with cross-validation, it is called stratified cross-validation. It is common to use stratified ten-fold cross-validation when only limited pre-labeled datasets are available.
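
The stratified N-fold procedure, together with the averaging of per-fold Recall and Precision, might look roughly as follows. This is a sketch under stated assumptions: the helper names, the round-robin way of stratifying, the toy nearest-mean classifier standing in for a real ML algorithm, and the one-feature dataset are all hypothetical. Recall and Precision are returned as fractions in [0, 1] rather than percentages.

```python
import random
from collections import defaultdict

def stratified_folds(labels, n_folds, seed=0):
    """Return n_folds lists of indices, each with a similar class mix."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's instances round-robin across the folds.
        for pos, i in enumerate(indices):
            folds[pos % n_folds].append(i)
    return folds

def recall_precision(truth, predicted, positive):
    pairs = list(zip(truth, predicted))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision

def train_nearest_mean(xs, ys):
    """Toy stand-in classifier: remember the mean feature value per class."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, y in zip(xs, ys):
        sums[y] += x
        counts[y] += 1
    means = {y: sums[y] / counts[y] for y in sums}
    return lambda x: min(means, key=lambda y: abs(x - means[y]))

def cross_validate(xs, ys, positive, n_folds=10):
    scores = []
    for test_idx in stratified_folds(ys, n_folds):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        model = train_nearest_mean([xs[i] for i in train_idx],
                                   [ys[i] for i in train_idx])
        predicted = [model(xs[i]) for i in test_idx]
        scores.append(recall_precision([ys[i] for i in test_idx],
                                       predicted, positive))
    # Average the per-fold Recalls and Precisions over all N tests.
    avg_recall = sum(r for r, _ in scores) / n_folds
    avg_precision = sum(p for _, p in scores) / n_folds
    return avg_recall, avg_precision

# Made-up one-feature dataset: mean packet length per flow, in bytes.
xs = [80 + i for i in range(20)] + [1200 + i for i in range(20)]
ys = ["game"] * 20 + ["other"] * 20
avg_recall, avg_precision = cross_validate(xs, ys, positive="game")
```

Because the two toy classes are widely separated, every fold is classified perfectly here; real traffic features would overlap, and the per-fold scores would then vary.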
6) Evaluating unsupervised learning algorithms: While Recall and Precision are common metrics to evaluate classification algorithms, evaluating clustering algorithms is more complicated. There are intermediate steps in evaluating the resulting clusters before labeling them or generating rules for future classification. Given a dataset, a clustering algorithm can always generate a division, with its own finding of structure within the data. Different approaches can lead to different clusters, and even for the same algorithm, different parameters or different order of input patterns might alter the final results [36][37].

Therefore, it is important to have effective evaluation standards and criteria to provide the users with a certain level of confidence in results generated by a particular algorithm, or comparisons of different algorithms [38]. Criteria should help to answer useful questions such as how many clusters are hidden in the data, what is the optimal number of clusters [37], whether the resulting clusters are meaningful or just an artifact of the algorithms [38], how one algorithm performs compared to another (how easy it is to use and how fast it is to employ [36]), what is the intra-cluster quality, how good is inter-cluster separation, what is the cost of labeling the clusters and what are the requirements in terms of computation and storage.

Halkidi et al. [37] identify three approaches to investigate cluster validity: external criteria, internal criteria and relative criteria. The first two approaches are based on statistical hypothesis testing. External criteria are based on some pre-specified structure, which is known as prior information on the data, and used as a standard to compare and validate the clustering results [38]. The internal criteria approach evaluates the clustering result of an algorithm by examining the internal structure inherited from the dataset. The relative criteria approach emphasises finding the best clustering scheme that a clustering algorithm can define under certain assumptions and parameters. The basic idea is to evaluate a clustering structure by comparing it to the ones using the same algorithm but with different parameter values [39]. (More details can be found in [37][38][36][29].)
7) Feature selection algorithms: Key to building a ML classifier is identification of the smallest necessary set of features required to achieve one’s accuracy goals - a process known as feature selection.

The quality of the feature set is crucial to the performance of a ML algorithm. Using irrelevant or redundant features often leads to negative impacts on the accuracy of most ML algorithms. It can also make the system more computationally expensive, as the amount of information stored and processed rises with the dimensionality of a feature set used to describe the data instances. Consequently it is desirable to select a subset of features that is small in size yet retains essential and useful information about the classes of interest.

Feature selection algorithms can be broadly classified into filter methods or wrapper methods. Filter method algorithms make an independent assessment based on general characteristics of the data. They rely on a certain metric to rate and select the best subset before the learning commences. The results provided therefore should not be biased toward a particular ML algorithm. Wrapper method algorithms, on the other hand, evaluate the performance of different subsets using the ML algorithm that will ultimately be employed for learning. Their results are therefore biased toward the ML algorithm used. A number of subset search techniques can be used, e.g. Correlation-based Feature Selection (CFS) filter techniques with Greedy, Best-First or Genetic search. (Additional details can be found in [29][40][41][42][43].)

B. The application of ML in IP traffic classification

A number of general ML concepts take a specific meaning when applied to IP traffic classification. For the purpose of subsequent discussion we define the following three terms relating to flows:
• Flow or Uni-directional flow: A series of packets sharing the same five-tuple: source and destination IP addresses, source and destination ports and protocol number.
• Bi-directional flow: A bi-directional flow is a pair of uni-directional flows going in the opposite directions between the same source and destination IP addresses and ports.
• Full-flow: A bi-directional flow captured over its entire lifetime, from the establishment to the end of the communication connection.
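
The flow definitions above translate naturally into lookup keys. The following sketch is illustrative only: the `FlowKey` type, the helper names and the idea of ordering the two endpoints to obtain a direction-independent key are assumptions made for the example, and the addresses are made-up documentation-style values.

```python
from collections import namedtuple

# The five-tuple that identifies a uni-directional flow.
FlowKey = namedtuple("FlowKey", "src_ip dst_ip src_port dst_port protocol")

def reverse(key):
    """Five-tuple of the flow travelling in the opposite direction."""
    return FlowKey(key.dst_ip, key.src_ip, key.dst_port, key.src_port,
                   key.protocol)

def bidirectional_key(key):
    """Canonical key shared by both directions of a bi-directional flow.

    Assumption: sorting the two (address, port) endpoints is one simple
    way to make the key independent of packet direction.
    """
    a = (key.src_ip, key.src_port)
    b = (key.dst_ip, key.dst_port)
    lo, hi = sorted([a, b])
    return (lo, hi, key.protocol)

def group_packets(packets):
    """Group hypothetical (FlowKey, length_bytes) records by bi-directional flow."""
    flows = {}
    for key, length in packets:
        flows.setdefault(bidirectional_key(key), []).append((key, length))
    return flows

client_to_server = FlowKey("10.0.0.5", "192.0.2.10", 51000, 80, 6)  # 6 = TCP
packets = [
    (client_to_server, 60),
    (reverse(client_to_server), 1500),
    (FlowKey("10.0.0.7", "192.0.2.10", 51000, 80, 6), 60),
]
flows = group_packets(packets)
```

A full-flow in the sense defined above would additionally require tracking connection establishment and teardown, which this sketch does not attempt.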
A class usually indicates IP traffic caused by (or belonging to) an application or group of applications. Instances are usually multiple packets belonging to the same flow. Features are typically numerical attributes calculated over multiple packets belonging to individual flows. Examples include mean packet lengths, standard deviation of inter-packet arrival times, total flow lengths (in bytes and/or packets), Fourier transform of packet inter-arrival time, and so on. As previously noted not all features are equally useful, so practical ML classifiers choose the smallest set of features that lead to efficient differentiation between members of a class and other traffic outside the class.

1) Training and testing a supervised ML traffic classifier: Figures 2, 3 and 4 illustrate the steps involved in building a traffic classifier using a supervised learning (or supervised ML) algorithm. In this example, the traffic classifier is intended to recognise a particular class of applications (real-time online game traffic) in amongst the usual mix of traffic seen on an IP network.

Figure 2 captures the overall training and testing process that results in a classification model. As noted earlier, the optimal approach to training a supervised ML algorithm is to provide previously classified examples of two types of IP traffic: traffic matching the class of traffic that one wishes later to identify in the network (in this case online game traffic), and representative traffic of entirely different applications one would expect to see in future (often referred to as interfering traffic).

Fig. 2. Training and testing for a two-class supervised ML traffic classifier

Fig. 3. Training the supervised ML traffic classifier
Figure 3 expands on the sequence of events involved in training a supervised ML traffic classifier. First a mix of ‘traffic traces’ are collected that contain both instances of the application of interest (e.g. online game traffic) and instances of other interfering applications (such as HTTP, DNS, SSH and/or peer2peer file sharing). The ‘flow statistics processing’ step involves calculating the statistical properties of these flows (such as mean packet inter-arrival time, median packet length and/or flow duration) as a prelude to generating features.

An optional next step is ‘data sampling’, designed to narrow down the search space for the ML algorithm when faced with extremely large training datasets (traffic traces). The sampling step extracts statistics from a subset of instances of various application classes, and passes these along to the classifier to be used in the training process.

As noted in section III-A7, a feature filtering/selection step is desirable to limit the number of features actually used to train the supervised ML classifier and thus create the classification model. The output of Figure 3 is a classification model.

Cross-validation (or stratified cross-validation) may be used to generate accuracy evaluation results during the training phase. However, if the source dataset consists of IP packets collected at the same time and the same network measurement point, the cross-validation results are likely to over-estimate the classifier’s accuracy. (Ideally the source dataset would contain a mix of traffic collected at different times and measurement points, or use entirely independently collected training and testing datasets.)

Fig. 4. Data flow within an operational supervised ML traffic classifier
Figure 4 illustrates data flow within an operational traffic classifier using the model built in Figure 3. Traffic captured in real-time is used to construct flow statistics from which features are determined and then fed into the classification model. (Here we presume that the set of features calculated from captured traffic is limited to the optimal feature set determined during training.) The classifier’s output indicates which flows are deemed to be members of the class of interest (as defined by the model). Certain implementations may optionally allow the model to be updated in real-time (performing a similar data sampling and training process to that shown in Figure 3). (For controlled testing and evaluation purposes offline traffic traces can be used instead of live traffic capture.)

2) Supervised versus unsupervised learning: As previously noted, IP traffic classification is usually about identifying traffic belonging to known applications (classes of interest) within previously unseen streams of IP packets. The key challenge is to determine the relationship(s) between classes of IP traffic (as differentiated by ML features) and the applications causing the IP traffic.

Supervised ML schemes require a training phase to cement the link between classes and applications. Training requires a-priori classification (or labeling) of the flows within the training datasets. For this reason, supervised ML may be attractive for the identification of a particular application (or group of applications) of interest. However, as noted in section III-A3, the supervised ML classifier works best when trained on examples of all the classes it expects to see in practice. Consequently, its performance may be degraded or skewed if it is not trained on a representative mix of traffic, or if the network link(s) being monitored start seeing traffic of previously unknown applications. (For example, Park et al. [44] showed that accuracy is sensitive to site-dependent training datasets, while Erman et al. [45] showed different accuracy results between the two data traces studied for the same ML algorithms.)
When evaluating supervised ML schemes in an operational context it is worthwhile considering how the classifier will be supplied with adequate supervised training examples, when it will be necessary to re-train, and how the user will detect a new type of application.

It might appear that one advantage of unsupervised ML schemes is the automatic discovery of classes through the recognition of ‘natural’ patterns (clusters) in the dataset. However, resulting clusters still need to be labeled (for example, through direct inspection by a human expert) in order that new instances may be properly mapped to applications. (A related benefit is that traffic from previously unknown applications may be detected by noting when new clusters emerge - sometimes the emergence of new application flows is noteworthy even before the actual identity of the application has been determined.)

Another issue for unsupervised ML schemes is that clusters do not necessarily map 1:1 to applications. It would be ideal if the number of clusters formed were equal to the number of application classes to be identified, and each application dominated one cluster group. However, in practice, the number of clusters is often greater than the number of application classes [46] [47]. One application might spread over and dominate a number of clusters, or conversely an application might spread over but not dominate any of the clusters. Mapping back from a cluster to a source application can become a great challenge.

When evaluating unsupervised ML schemes in an operational context it is worthwhile considering how clusters will be labeled (mapped to specific applications), how labels will be updated as new applications are detected, and the optimal number of clusters (balancing accuracy, cost of labeling and label lookup, and computational complexity).

C. Challenges for operational deployment

Section II-A noted some important IP traffic classification scenarios where classification must normally occur as the traffic is flowing (or within some fairly short period of time after the traffic occurred). This creates some particular requirements for timely classification as traffic is in transit across a network.
1) Timely and continuous classification: A timely classifier should reach its decision using as few packets as possible from each flow (rather than waiting until each flow completes before reaching a decision). Reducing the number of packets required for classification also reduces the memory required to buffer packets during feature calculations. This is an important consideration for situations where the classifier is calculating features for (tens of) thousands of concurrent flows. Depending on the business reason for performing classification, it may be unacceptable to sample the available flows in order to reduce the memory consumption. Instead, one aims to use fewer packets from each flow.

However, it is not sufficient to simply classify based on the first few packets of a flow. For example, malicious attacks might disguise themselves with the statistical properties of a trusted application early in their flow’s lifetime. Or the classifier itself might have been started (or restarted) whilst hundreds or thousands of flows were already active through a network monitoring point (thereby missing the starts of these active flows). Consequently the classifier should ideally perform continuous classification - recomputing its classification decision throughout the lifetime of every flow.

Timely and continuous ML classification must also address the fact that many applications change their statistical properties over time, yet a flow should ideally be correctly classified as being the same application throughout the flow’s lifetime.

2) Directional neutrality: Application flows are often assumed to be bi-directional, and the application’s statistical features are calculated separately in the forward and reverse directions. Many applications (such as multiplayer online games or streaming media) exhibit different (asymmetric) statistical properties in the client-to-server and server-to-client directions. Consequently, the classifier must either ‘know’ the direction of a previously unseen flow (for example, which end is the server and which is the client) or be trained to recognise an application of interest without relying on external indications of directionality.
Inferring the server and client ends of a flow is fraught with practical difficulties. As a real-world classifier should not presume that it has seen the first packet of every flow currently being evaluated, it cannot be sure whether the first packet it sees (of any new bi-directional flow of packets) is heading in the ‘forward’ or ‘reverse’ direction. Furthermore, the semantics of the TCP or UDP port fields should be considered unreliable (either due to encryption obscuring the real value, or the application using unpredictable ports), so it becomes difficult to justify using ‘well known’ server-side port numbers to infer a flow’s direction.

3) Efficient use of memory and processors: Another important criterion for operational deployment is the classification system’s use of computational resources (such as CPU time and memory consumption). The classifier’s efficiency impacts on the financial cost to build, purchase and operate large scale traffic classification systems. An inefficient classifier may be inappropriate for operational use regardless of how quickly it can be trained and how accurately it identifies flows.

Minimising CPU cycles and memory consumption is advantageous whether the classifier is expected to sit in the middle of an ISP network (where a small number of large, powerful devices may see hundreds of thousands of concurrent flows at multi-gigabit rates) or out toward the edges (where the traffic load is substantially smaller, but the CPU power and memory resources of individual devices are also diminished).

4) Portability and Robustness: A model may be considered portable if it can be used in a variety of network locations, and robust if it provides consistent accuracy in the face of network layer perturbations such as packet loss, traffic shaping, packet fragmentation, and jitter. A classifier is also robust if it can efficiently identify the emergence of new traffic applications.

IV. A REVIEW OF MACHINE LEARNING BASED IP TRAFFIC CLASSIFICATION TECHNIQUES

In this section we create four broad categories to review significant works published on ML-based IP traffic classification to date:

• Clustering Approaches: Works whose main approach centers around unsupervised learning techniques.
• Supervised Learning Approaches: Works whose main approach centers around supervised learning techniques.
• Hybrid Approaches: Works whose approach combines supervised and unsupervised learning techniques.
• Comparisons and Related Work: Works that compare and contrast different ML algorithms, or consider non-ML approaches that could be considered in conjunction with ML approaches.

The key points of each work are discussed in the following subsections and summarised in Tables I, II and III.

A. Clustering Approaches

1) Flow clustering using Expectation Maximization: In 2004 McGregor et al. [48] published one of the earliest works that applied ML in IP traffic classification using the Expectation Maximization (EM) algorithm [49]. The approach clusters traffic with similar observable properties into different application types.

The work studies HTTP, FTP, SMTP, IMAP, NTP and DNS traffic. Packets in a 6-hour Auckland-VI trace are divided into bi-directional flows. Flow features (listed in Table I) are calculated on a full-flow basis. Flows are not timed out, except when they exceed the length of the traffic trace.

Based on these features, the EM algorithm is used to group the traffic flows into a small number of clusters and then create classification rules from the clusters. From these rules, features that do not have a large impact on the classification are identified and removed from the input to the learning machine and the process is repeated. The work’s implementation of EM has an option to allow the number of clusters to be found automatically via cross-validation. The resulting estimation of performance was used to select the best competing model (hence the number of clusters).

The algorithm was found to separate traffic into a number of classes based on traffic type (bulk transfer, small transactions, multiple transactions etc.). However, current results are limited in identifying individual applications of interest. Nonetheless, it may be suitable to apply this approach as the first step of classification where the traffic is completely unknown, as it may give a hint about the groups of applications that have similar traffic characteristics.
2) Automated application identification using AutoClass: The work of Zander et al. [46], proposed in 2005, uses AutoClass [50], an unsupervised Bayesian classifier that uses the EM algorithm to determine the best set of clusters from the training data. EM is guaranteed to converge to a local maximum. To find the global maximum, AutoClass repeats EM searches starting from pseudo-random points in the parameter space. The model with the parameter set having the highest probability is considered the best.

AutoClass can be preconfigured with the number of classes (if known) or it can try to estimate the number of classes itself. Firstly packets are classified into bi-directional flows and flow characteristics are computed using NetMate [51]. A number of features are calculated for each flow, in each direction (listed in Table I). Feature values are calculated on a full-flow basis. A flow timeout of 60 seconds is used.

Sampling is used to select a subset of the flow data for the learning process. Once the classes (clusters) have been learnt, new flows are classified. The results of the learning and classification are exported for evaluation. The approach is evaluated based on random samples of flows obtained from three 24-hour traffic traces (Auckland-VI, NZIX-II and Leipzig-II traces from NLANR [52]).

Taking a further step from [48], the work proposed a method for cluster evaluation. A metric called intra-class homogeneity, H, is introduced for assessing the quality of the resulting classes and classification. H of a class is defined as the largest fraction of flows of one application in the class. The overall homogeneity H of a set of classes is the mean of the class homogeneities. The goal is to maximise H to achieve a good separation between different applications.

The results show that some separation between the different applications can be achieved, especially for certain particular applications (such as Half-Life online game traffic) in comparison with the others. With different sets of features used, the authors show that H increases with the number of features used. H reaches a maximum value of between 85% and 89%, depending on the trace. However, the work has not addressed the trade-off between the number of features used and the resulting computational overhead.
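The homogeneity metric defined above is simple enough to state in a few lines. The sketch below follows the definition as given in this survey (per-class H is the largest fraction of flows from one application; overall H is the mean over classes). The function names and the input layout, a mapping from class id to the true application labels of its flows, are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List


def class_homogeneity(app_labels: List[str]) -> float:
    """Largest fraction of flows in one class that belong to a single application."""
    counts = Counter(app_labels)
    return max(counts.values()) / len(app_labels)


def overall_homogeneity(classes: Dict[int, List[str]]) -> float:
    """Mean of the per-class homogeneities over all non-empty classes."""
    values = [class_homogeneity(labels) for labels in classes.values() if labels]
    return sum(values) / len(values)
```

For instance, a class holding three HTTP flows and one FTP flow has H = 0.75; together with a pure DNS class (H = 1.0) the overall homogeneity is 0.875.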
To compute the accuracy for each application the authors map each class to the application that is dominating the class (by having the largest fraction of flows in that class). The authors used accuracy (Recall) as an evaluation metric. Median accuracy is ≥ 80% for all applications across all traces. However, there are some exceptional cases: for example, for the Napster application there is one trace where it is not dominating any of the classes (hence the accuracy is 0%). The results also show that FTP, Web and Telnet seem to have the most diverse traffic characteristics and are spread across many classes.

In general, although the mapping of class to application shows promising results in separating the different applications, the number of classes resulting from the clustering algorithm is high (approximately 50 classes for 8 selected applications). For class and application mapping, it is a challenge to identify applications that do not dominate any of the classes.

3) TCP-based application identification using Simple K-Means: In 2006 Bernaille et al. [53] proposed a technique using an unsupervised ML (Simple K-Means) algorithm that classified different types of TCP-based applications using the first few packets of the traffic flow.

In contrast to the previously published work, the method proposed in this paper allowed early detection of traffic flow by looking at only the first few packets of a TCP flow. The intuition behind the method is that the first few packets capture the application’s negotiation phase, which is usually a pre-defined sequence of messages and is distinct among applications.

The training phase is performed offline. The input is a one-hour packet trace of TCP flows from a mix of applications. Flows are grouped into clusters based on the values of their first P packets. Flows are represented by points in a P-dimensional space, where each packet is associated with a dimension; the coordinate on dimension p is the size of packet p in the flow. Bi-directional flows are used. Packets sent by the TCP-server are distinguished from packets sent by the TCP-client by having a negative coordinate.
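The spatial representation described above can be sketched as follows. This is only an illustration of the idea as summarised in this survey: the input layout (a list of size/direction pairs) and the function name are assumptions, and details such as which packets of the connection are counted are not specified here.

```python
from typing import List, Tuple


def flow_to_point(packets: List[Tuple[int, bool]], p: int) -> List[int]:
    """Map the first `p` packets of a TCP flow to a point in p-dimensional space.

    `packets` holds (size_bytes, sent_by_server) pairs in arrival order
    (an assumed layout). Packets sent by the server receive a negative
    coordinate, packets sent by the client a positive one.
    """
    if len(packets) < p:
        raise ValueError("flow has fewer than p packets")
    return [-size if from_server else size for size, from_server in packets[:p]]
```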
Similarity between flows is measured by the Euclidean distance between their associated spatial representations. After natural clusters are formed, the modeling step defines a rule to assign a new flow to a cluster. (The number of clusters was chosen by trial with different numbers of clusters for the K-means algorithm.) The classification rule is simple: the Euclidean distance between the new flow and the centre of each pre-defined cluster is computed, and the new flow belongs to the cluster for which the distance is a minimum. The training set also includes payload, so that flows in each cluster can be labeled with their source application. The learning output consists of two sets: one with the description of each cluster (the centre of the cluster) and the other with the composition of its applications. Both sets are used to classify flows online.

In the classification phase, packets are formed into a bi-directional flow. The sizes of the first P packets of the connection are captured and used to map the new flow to a spatial representation. After the cluster is determined, the flow is associated with the application that is the most prevalent in the cluster.

The results show that more than 80% of total flows are correctly identified for a number of applications by using the first five packets of each TCP flow. One exceptional case is the POP3 application. The classifier labels 86% of POP3 flows as NNTP and 12.6% as SMTP, because POP3 flows always belong to clusters where POP3 is not the dominant application.

The results of this work are encouraging for early detection of the traffic flow. However, it assumes that the classifier can always capture the start of each flow. The effectiveness of the approach when the classifier misses the first few packets of the traffic flow has not been discussed or addressed. Also, with the use of an unsupervised algorithm and its classification technique, the proposal faces the challenge of classifying an application when it does not dominate any of the clusters found.
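The classification rule described above (assign a new flow to the cluster with the nearest centre, then report that cluster's most prevalent application) might look like the sketch below. The dictionaries `centres` and `dominant_app` stand in for the two learning outputs mentioned in the text; their names and layout are assumptions.

```python
import math
from typing import Dict, Sequence


def nearest_cluster(point: Sequence[float], centres: Dict[int, Sequence[float]]) -> int:
    """Return the id of the cluster whose centre is closest in Euclidean distance."""
    return min(centres, key=lambda cid: math.dist(point, centres[cid]))


def classify(point: Sequence[float],
             centres: Dict[int, Sequence[float]],
             dominant_app: Dict[int, str]) -> str:
    """Label a flow with the most prevalent application of its nearest cluster."""
    return dominant_app[nearest_cluster(point, centres)]
```

This rule also makes the POP3 failure mode easy to see: an application that is never the most prevalent one in any cluster can never appear as an output label.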
4) Identifying Web and P2P traffic in the network core: The work of Erman et al. [47] in early 2007 addressed the challenge of traffic classification at the core of the network, where the available information about the flows and their contributors might be limited. The work proposed to classify a flow using only uni-directional flow information. While showing that, for a TCP connection, the server-to-client direction might provide more useful statistics and better accuracy than the reverse direction, the authors note that it may not always be feasible to obtain traffic in this direction. They also developed and evaluated an algorithm that could estimate missing statistics from a uni-directional packet trace.

The approach proposed makes use of clustering machine learning techniques, demonstrated using the K-Means algorithm. Similar to other clustering approaches, Euclidean distance is used to measure the similarity between two flow vectors.

Uni-directional traffic flows are described by a full-flow based feature set (listed in Table II). Possible traffic classes include Web, P2P, FTP ... For the training phase, it is assumed that labels for all training flows are available (manually classified based on payload content and protocol signatures), and a cluster is mapped back to the traffic class that makes up the majority of flows in that cluster. An unseen flow is mapped to the nearest cluster based on its distance to the clusters’ centroids.

The approach is evaluated with flow accuracy and byte accuracy as performance metrics. Three kinds of datasets are considered: datasets containing only client-to-server packets, datasets containing only server-to-client packets, and datasets containing a random mixture of each direction. The K-Means algorithm requires the number of clusters as an input; it has been shown that both flow and byte accuracies improved as K increased from 25 to 400. Overall, the server-to-client datasets consistently give the best accuracy (95% and 79% in terms of flows and bytes respectively). With the random datasets, the average flow and byte accuracy is 91% and 67% respectively. For the client-to-server datasets, 94% of the flows and 57% of the bytes are correctly classified.
The algorithm to estimate the missing flow statistics is based on the syntax and semantics of the TCP protocol, so it only works with TCP, not with other transport protocol traffic. The flow statistics are divided into three general categories: duration, number of bytes, and number of packets. The flow duration in the missing direction is estimated as the duration calculated from the first and the last packet seen in the observed direction. The number of bytes transmitted is estimated according to information contained in ACK packets. The number of packets sent is estimated by tracking the last sequence number and acknowledgement number seen in the flow, with regard to the MSS. A number of assumptions have been made: for example, a common MSS value of 1460 bytes, a simple acknowledgment strategy of an ACK (40-byte data header with no payload) for every data packet, and no packet loss or retransmission. An evaluation of the estimation algorithm is reported; the results were promising for flow duration and bytes estimation, with a relatively larger error range for number of packets estimation.

The work addressed an interesting issue of the possibility of using uni-directional flow statistics for traffic classification and proposed a method to estimate the missing statistics. A related issue of directionality in the use of bi-directional traffic flows was addressed in the work of [54].

B. Supervised Learning Approaches

1) Statistical signature-based approach using NN, LDA and QDA algorithms: In 2004 Roughan et al. [18] proposed to use the nearest neighbours (NN), Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA) ML algorithms to map different network applications to predetermined QoS traffic classes.

The authors list a number of possible features, and classify them into five categories:

• Packet Level: e.g. packet length (mean and variance, root mean square)
• Flow Level: flow duration, data volume per flow, number of packets per flow (all with mean and variance values) etc. Uni-directional flow is used.
• Connection Level: e.g. advertised TCP window sizes, throughput distribution and the symmetry of the connection.
• Intra-flow/connection features: e.g. packet inter-arrival times between packets in flows.
• Multi-flow: e.g. multiple concurrent connections between the same set of end-systems.

Of the features considered, the pair of most value was the average packet length and flow duration. These features are computed per full-flow, then per aggregate of flows within 24-hour periods (an aggregate is a collection of statistics indexed by server port and server IP address).

Three cases of classification are considered. The three-class classification looks at three types of application: Bulk data (FTP-data), Interactive (Telnet), and Streaming (RealMedia); the four-class classification looks at four types of applications: Interactive (Telnet), Bulk data (FTP-data), Streaming (RealMedia) and Transactional (DNS); and the seven-class classification looks at seven applications: DNS, FTP-data, HTTPS, Kazaa, RealMedia, Telnet and WWW.

The classification process is evaluated using 10-times cross validation. The classification error rates are shown to vary based on the number of classes the classifier tried to identify. The three-class classification has the lowest error rate, varying from 2.5% to 3.4% for different algorithms, while the four-class classification had an error rate in the range of 5.1% to 7.9%, and the seven-class one had the highest error rate of 9.4% to 12.6%.

2) Classification using Bayesian analysis techniques: In 2005 Moore and Zuev [14] proposed to apply the supervised ML Naive Bayes technique to categorise Internet traffic by application. Traffic flows in the dataset used are manually classified (based upon flow content) allowing accurate evaluation. 248 full-flow based features were used to train the classifier (a summary is listed in Table I). Selected traffic for Internet applications was grouped into different categories for classification, e.g. bulk data transfer, database, interactive, mail, services, www, p2p, attack, games and multimedia.
To evaluate the classifier’s performance, the authors used Accuracy and Trust (equivalent to Recall) as evaluation metrics. The results showed that with the simple Naive Bayes technique, using the whole population of flow features, approximately 65% flow accuracy could be achieved in classification. Two refinements for the classifier were performed, with the use of Naive Bayes Kernel Estimation (NBKE) and Fast Correlation-Based Filter (FCBF) methods.² These refinements helped to reduce the feature space and improved the classifier performance to a flow accuracy better than 95% overall. With the best combination technique, the Trust value for individual classes of application ranged, for instance, from 98% for www, to 90% for bulk data transfer, to approximately 44% for services traffic and 55% for P2P.

The work is extended with the application of a Bayesian neural network approach in [55]. It has been demonstrated that accuracy is further improved compared to the Naive Bayes technique. The Bayesian trained neural network approach is able to classify flows with up to 99% accuracy for data trained and tested on the same day, and 95% accuracy for data trained and tested eight months apart. The paper also presents a list of features with their descriptions and a ranking of their importance.

² The NBKE method is a generalisation of Naive Bayes. It addresses the problem of approximating every feature by a normal distribution. Instead of using a normal distribution with parameters estimated from the data, it uses kernel estimation methods. FCBF is a feature selection and redundancy reduction technique. In FCBF, goodness of a feature is measured by its correlation with the class and other good features. A feature is considered good if it is highly correlated with the class, yet is not correlated with any other good features [14].
3) Real-time traffic classification using Multiple Sub-Flows features: As noted in section III-C timely and continuous classification is an important constraint for the practical employment of a traffic classifier. In 2006 Nguyen and Armitage [56] proposed to address the issue by classifying based on only the most recent N packets of a flow - called a classification sliding window. The use of a small number of packets for classification ensures the timeliness of classification and reduces the buffer space required to store packets’ information for the classification process. The approach does not require the classifier to capture the start of each traffic flow (as required in [53] and [57]). This allows classification to be initiated at any point in time when traffic flows are already in progress. It offers the potential of monitoring a traffic flow during its lifetime in a timely manner within the constraints of physical resources.

The work proposes training ML classifiers on multiple sub-flows features. First, extract two or more sub-flows (of N packets) from every flow that represents the class of traffic one wishes to identify in the future. Each sub-flow should be taken from places in the original flow having noticeably different statistical properties (for example, the start and middle of the flow). Each sub-flow would result in a set of instances with feature values derived from its N packets. Then train the ML classifier with the combination of these sub-flows rather than the original full flows.

This optimisation is demonstrated using the Naive Bayes algorithm. Bi-directional flows were used. Different training and testing datasets were constructed from the two separate month-long traces collected during May and September 2005 at a public online game server in Australia and two 24-hour periods collected by the University of Twente, Netherlands [58]. With the feature set used (listed in Table I), a classifier built on full-flow features is demonstrated to perform poorly when the classifier missed the start of a traffic flow. However, with the application of the proposed method, results show the classifier maintains more than 95% Recall and 98% Precision (flow accuracy) even when classification is initiated mid-way through a flow using only a total of 25 packets in both directions.
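The classification sliding window can be illustrated with a bounded buffer that holds only the most recent N packets of a flow and recomputes features as each new packet arrives. The sketch below is an assumption-laden illustration: the class name is hypothetical, and the two features computed (mean and maximum packet length) are placeholders rather than the feature set of [56].

```python
from collections import deque
from statistics import mean
from typing import Deque, Dict, Optional


class SlidingWindow:
    """Keeps only the N most recent packet lengths of one flow."""

    def __init__(self, n: int) -> None:
        self.n = n
        # With maxlen set, the oldest packet is dropped automatically.
        self.lengths: Deque[int] = deque(maxlen=n)

    def add(self, length: int) -> Optional[Dict[str, float]]:
        """Record a packet; return window features once N packets are buffered."""
        self.lengths.append(length)
        if len(self.lengths) < self.n:
            return None  # not enough packets observed yet
        return {
            "mean_len": mean(self.lengths),
            "max_len": float(max(self.lengths)),
        }
```

Because the buffer is bounded, per-flow memory stays constant regardless of flow length, and the features can be handed to a trained classifier after every packet, which is what makes continuous re-classification possible.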
However, the work has only been demonstrated with an example of identifying an online game application (UDP-based First Person Shooter game - Enemy Territory [59]). Interference traffic included a range of other Internet applications (Web, DNS, NTP, SMTP, SSH, Telnet, P2P ...). The authors also suggested the potential benefits of using clustering algorithms in automating the sub-flows selection process.

4) Real-time traffic classification using Multiple Synthetic Sub-Flows Pairs: As noted in section III-C directional neutrality is an important constraint for the practical employment of a traffic classifier. In 2006 Nguyen and Armitage [54] further extended their work in [56] to overcome this problem. The authors propose training the ML classifier using statistical features calculated over multiple short sub-flows extracted from full-flows generated by the target application and their mirror-imaged replicas as if the flow is in the reverse direction.

The optimisation is demonstrated when applied to the Naive Bayes and Decision Tree algorithms with an example of identifying traffic of a UDP-based First Person Shooter game - Enemy Territory [59]. Using the same datasets as specified in [56] they demonstrate that both classifiers perform poorly when trained with bi-directional flow features that make assumptions about the forward and backward direction. However, training on Synthetic Sub-Flows Pairs results in significant improvement to classification performance (with up to 99% Recall and 98% Precision (flow accuracy) for the example application) even when classification is initiated mid-way through a flow, without prior knowledge of the flow’s direction and using windows as small as 25 packets long.

5) GA-based classification techniques: In 2006 Park et al.
[60] made use of a feature selection technique based on a Genetic Algorithm (GA). Using the same feature set specified in [44] (listed in Table II), three classifiers were tested and compared: the Naive Bayesian classifier with Kernel Estimation (NBKE), Decision Tree J48 and the Reduced Error Pruning Tree (REPTree) classifier. Their results suggest that the two decision tree classifiers provide more accurate classification results than the NBKE classifier. The work also points to the impact of using training and testing data from different measurement points.

Early flow classification is also briefly mentioned. Accuracy as a function of the number of packets used for classification is presented for the J48 and REPTree classifiers. Using the first 10 packets for classification seems to provide the most accurate result. However, the accuracy result is provided only as an overall result. It is not clear how it would differ for different types of Internet applications.

6) Simple statistical protocol fingerprint method: Crotti et al. [61] in early 2007 proposed a flow classification mechanism based on three properties of the captured IP packets: packet length, inter-arrival time and packet arrival order. They defined a structure called protocol fingerprints which expresses the three traffic properties in a compact way and used an algorithm based on normalised thresholds for flow classification.
There are two phases in the classification process: training and classifying. In the training phase, pre-labeled flows from the application to be classified (the training dataset) are analysed to build the protocol fingerprints. A protocol fingerprint is a PDF vector, estimated from a set of flows of the same protocol from the training dataset. The $PDF_i$ is built on all the $i$-th pairs $P_i = \{s_i, \Delta t_i\}$, where $s_i$ represents the size of packet $i$ and $\Delta t_i$ represents the inter-arrival time between packet $i$ and packet $(i-1)$. In order to classify an unknown traffic flow given a set of different PDFs, the authors check whether the behaviour of the flow is statistically compatible with the description given by at least one of the PDFs, and choose which PDF describes it better. An anomaly score that gives a value between 0 and 1 is used to indicate how ‘statistically distant’ an unknown flow is from a given protocol PDF. It shows the correlation between the unknown flow’s $i$-th packet and the application layer protocol described by the specific PDF used; the higher the value, the higher the probability that the flow was generated by that protocol.

Their results show flow accuracy of more than 91% for classifying three applications: HTTP, SMTP and POP3, using the first few packets of each application’s traffic flow.

In a similar way to the work of Bernaille et al. [53] reviewed above, this approach demonstrates advanced results for timeliness of the classification. However, it has the same limitation in assuming that the classifier can always capture the start of each flow, and is aware of the locations of client and server (for constructing the PDF of client-server and server-client directions). The effectiveness of the approach when the classifier misses the first few packets of the traffic flow (assumed to carry the protocol fingerprint), or suffers from packet loss and packet re-ordering, has not been addressed.
C. Hybrid Approaches

Erman et al. [62] in early 2007 proposed a semi-supervised traffic classification approach which combines unsupervised and supervised methods. The proposal is motivated by two main reasons. Firstly, labeled examples are scarce and difficult to obtain, while supervised learning methods do not generalise well when being trained with few examples in the dataset. Secondly, new applications may appear over time, and not all of them are known a priori; traditional supervised methods map unseen flow instances into one of the known classes, without the ability to detect new types of flows [62].

To overcome these challenges, the proposed classification method consists of two steps. First, a training dataset consisting of labeled flows combined with unlabeled flows is fed into a clustering algorithm. Second, the available labeled flows are used to obtain a mapping from the clusters to the different known classes. This step allows some clusters to remain unmapped. To map a cluster with labeled flows back to an application type, a probabilistic assignment is used. The probability is estimated by the maximum likelihood estimate, $n_{jk}/n_k$, where $n_{jk}$ is the number of flows that were assigned to cluster $k$ with label $j$, and $n_k$ is the total number of labeled flows that were assigned to cluster $k$. Clusters without any labeled flows assigned to them are labeled ‘Unknown’ as application type. Finally a new unseen flow is assigned to the nearest cluster using the distance metric chosen in the clustering step.

The proposed approach shows promising results. Preliminary results are shown in [62] with the employment of the K-Means clustering algorithm. The classifier is provided with 64,000 unlabeled flows. Once the flows are clustered, a fixed number of random flows in each cluster are labeled. Results show that with two labeled flows per cluster and K = 400, the approach results in 94% flow accuracy. The increase in classification accuracy is marginal when five or more flows are labeled per cluster. More detailed results can be found in [63].
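The cluster-to-application mapping used in this semi-supervised scheme can be sketched directly from the maximum likelihood estimate n_jk / n_k described above. The function name and the input layout, an iterable of (cluster id, label) pairs with `None` for unlabeled flows, are assumptions for illustration; choosing the single most likely label per cluster is one straightforward reading of the probabilistic assignment.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple


def map_clusters(
    assignments: Iterable[Tuple[int, Optional[str]]]
) -> Dict[int, Tuple[str, float]]:
    """Map each cluster k to (label j, n_jk / n_k) for its most likely label.

    Clusters that contain no labeled flows are mapped to 'Unknown'.
    """
    counts: Dict[int, Counter] = defaultdict(Counter)
    for cluster, label in assignments:
        counter = counts[cluster]  # registers clusters holding only unlabeled flows
        if label is not None:
            counter[label] += 1

    mapping: Dict[int, Tuple[str, float]] = {}
    for cluster, counter in counts.items():
        n_k = sum(counter.values())  # labeled flows in cluster k
        if n_k == 0:
            mapping[cluster] = ("Unknown", 0.0)
        else:
            label, n_jk = counter.most_common(1)[0]
            mapping[cluster] = (label, n_jk / n_k)
    return mapping
```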
As claimed by the authors [63], the proposal has advantages in terms of faster training time with a small number of labeled flows mixed with a large number of unlabeled flows, the ability to handle previously unseen applications and variation in the characteristics of existing applications, and the possibility of enhancing the classifier's performance by adding unlabeled flows for iterative classifier training. However, an evaluation of these advantages has not been demonstrated in the current paper.

D. Comparisons and Related Work

1) Comparison of different clustering algorithms: In 2006 Erman et al. [45] compared three unsupervised clustering algorithms: K-Means, DBSCAN and AutoClass. The comparison is performed on two empirical data traces: one public trace from the University of Auckland and one self-collected trace from the University of Calgary.

The effectiveness of each algorithm is evaluated using overall accuracy and the number of clusters it produces. The overall accuracy measurement determines how well the clustering algorithm is able to create clusters that contain only a single traffic category. A cluster is labeled by the traffic class that makes up the majority of its total connections (bi-directional traffic flows). Any connection that has not been assigned to a cluster is labeled as noise. Overall accuracy is then determined by the portion of the total TP for all clusters out of the total number of connections to be classified. As with any clustering algorithm, the number of clusters produced is an important evaluation factor, as it affects the performance of the algorithm in the classification stage.
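The overall accuracy measure described above can be written down in a few lines. The sketch below assumes a hypothetical convention in which unclustered (noise) connections carry the cluster id -1; the function name is illustrative.

```python
from collections import Counter, defaultdict

NOISE = -1  # assumed marker for connections not assigned to any cluster


def overall_accuracy(cluster_ids, true_classes):
    """Return (overall accuracy, number of clusters).

    Each cluster is labeled with its majority class; the connections of that
    class are the cluster's true positives (TP). Noise connections count
    toward the total but never toward TP.
    """
    members = defaultdict(list)
    for cid, cls in zip(cluster_ids, true_classes):
        if cid != NOISE:
            members[cid].append(cls)
    true_positives = sum(Counter(classes).most_common(1)[0][1]
                         for classes in members.values())
    return true_positives / len(true_classes), len(members)
```

Note that this measure can only rise as clusters are split further, which is one reason the number of clusters is reported alongside it.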
Their results show that the AutoClass algorithm produces the best overall accuracy. On average, AutoClass is 92.4% and 88.7% accurate on the Auckland and Calgary datasets respectively. It produces on average 167 clusters for the Auckland dataset (for fewer than 10 groups of applications) and 247 clusters for the Calgary dataset (for 4 groups of applications). For K-Means, the number of clusters can be set, and the overall accuracy steadily improves as the number of clusters (K) increases. When K is around 100, overall accuracy is 79% and 84% on average for the Auckland and Calgary datasets respectively. Accuracy improves only slightly with greater values of K. The DBSCAN algorithm produces lower overall accuracy (up to 75.6% for the Auckland and 72% for the Calgary datasets); however, it places the majority of the connections in a small subset of the clusters. Looking at the accuracy for particular traffic class categories, the DBSCAN algorithm has the highest precision value for P2P, POP3 and SMTP (lower than AutoClass for HTTP traffic).

The work briefly mentions a comparison of model build time, but has not looked at other performance evaluation measurements, such as processing speed, CPU and memory usage, or the timeliness of classification.

2) Comparison of clustering vs. supervised techniques: Erman et al. [64] evaluate the effectiveness of the supervised Naive Bayes algorithm and the clustering AutoClass algorithm. Three accuracy metrics were used for evaluation: recall, precision and overall accuracy (overall accuracy is defined in the same way as in [45], reviewed in the previous section).

The classification method using the supervised Naive Bayes algorithm is straightforward. For classification using AutoClass, once AutoClass comes up with the most probable set of clusters from the training data, the clustering is transformed into a classifier. A cluster is labeled with the most common traffic category of the flows in it. If two or more categories are tied, then a label is chosen randomly amongst the tied category labels. A new flow is then classified with the traffic class label of the cluster it is most similar to [64].

The evaluation was performed on two 72-hour data traces provided by the University of Auckland (NLANR). A connection is defined as a bi-directional flow. The feature set is shown in Table III.
The paper shows that with the dataset used and nine application classes (HTTP, SMTP, DNS, SOCKS, IRC, FTP control, FTP data, POP3 and LIMEWIRE), AutoClass has an average overall accuracy of 91.2% whereas the Naive Bayes classifier has an overall accuracy of 82.5%. AutoClass also performs better in terms of precision and recall for individual traffic classes. On average, for Naive Bayes, the precision and recall for six out of nine classes were above 80%; whereas for AutoClass, all classes have precision and recall values above 80%, six out of the nine classes have average precision values above 90%, and seven have average recall values above 90%. However, in terms of the time taken to build the classification model, AutoClass takes much longer than the Naive Bayes algorithm (2070 seconds vs. 0.06 seconds for the algorithm implementation, data and equipment used).

The conclusion that the unsupervised AutoClass outperforms the supervised Naive Bayes in terms of overall accuracy might seem counterintuitive to some readers. While the testing methodology of the paper is sound, the results might be affected by the size of the training datasets (the current work uses 1000 samples per application), the specific dataset used, how the Naive Bayes classifier is trained (classifying a single application at a time, or multiple applications at a time), and the specific feature set used.

An issue with clustering approaches is real-time classification speed, as the number of clusters resulting from the training phase is typically larger than the number of application classes. However, this has not been evaluated in the paper.

3) Comparison of different supervised ML algorithms: Williams et al. [65] provide insights into the performance aspects of ML traffic classification. The work looks at a number of supervised ML algorithms: Naive Bayes with Discretisation (NBD), Naive Bayes with Kernel Density Estimation (NBK), C4.5 Decision Tree, Bayesian Network, and Naive Bayes Tree. These algorithms' computational performance is evaluated in terms of classification speed (number of classifications per second) and the time taken to build the associated classification model.
Results are collected by experiments on three public NLANR traces. Features used for analysis include the full set of 22 features, and two best reduced feature sets selected by correlation-based feature selection (CFS) and consistency-based feature selection (CON) algorithms. The feature set is shown in Table III.

The results show that most algorithms achieve high flow accuracy with the full set of 22 features (only the NBK algorithm achieves > 80% accuracy and the rest of the algorithms achieve greater than 95% accuracy). With the reduced sets of 8 (CFS) and 9 (CON) features, the results achieved by cross-validation show only slight changes in the overall accuracy compared to the use of the full feature set. The largest reductions in accuracy were 2-2.5% for NBD and NBK with the use of the CON reduced feature set.

Despite the similarity in classification accuracy, the paper shows significant differences in classification computational performance. The C4.5 algorithm was seen to be the fastest algorithm when using any of the feature sets (with a maximum of 54,700 classifications per second on a 3.4GHz Pentium 4 workstation running SUSE Linux 9.3 with the WEKA implementation). Algorithms ranked in descending order of classification speed are: C4.5, NBD, Bayesian Network, Naive Bayes Tree, NBK.

In terms of model build time, Naive Bayes Tree takes significantly longer than the remaining algorithms. Algorithms ranked in descending order of model build time are: Naive Bayes Tree, C4.5, Bayesian Network, NBD, NBK.

The results of the paper also show that feature reduction greatly improves the model build time and classification speed of most algorithms.
4) ACAS: Classification using machine learning techniques on application signatures: Haffner et al. [57] in 2005 proposed an approach for automated construction of application signatures using machine learning techniques. Unlike the other works, this work makes use of the first n bytes of a data stream as features. Though it has the same limitation as other works that require access to packet payload, we include it in the survey as it is also machine learning based, and its interesting results may be useful in a composite machine learning based approach that combines different information such as statistical characteristics, contents, and communication patterns.

Three learning algorithms (Naive Bayes, AdaBoost and Maximum Entropy) have been investigated for constructing application signatures for a range of network applications: ftp control, smtp, pop3, imap, https, http and ssh. A flow instance is characterised by n bytes represented as binary values, ordered by the position of each byte in the flow stream. The collection of flow instances with binary features is used as input by the machine learning algorithms.

Using the first 64 bytes of each TCP unidirectional flow, the overall error rate is below 0.51% for all applications considered. AdaBoost and Maximum Entropy provide the best results, with more than 99% of all flows classified correctly. Precision is above 99% for all applications and the recall rate is above 94% for all applications except ssh (86.6%). (The poor performance on the ssh application was suspected to be due to the small number of sample instances in the training dataset.)

5) Unsupervised approach for protocol inference using flow content: Closely related to the work of Haffner et al. [57], in 2006 Ma et al. [66] introduced and analysed alternative mechanisms for automatic identification of traffic, based solely on flow content. Unsupervised learning was applied in three different modeling techniques for capturing statistical and structural aspects of messages exchanged in a protocol, namely product distribution, Markov processes, and common substring graphs (CSG).
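Both content-based approaches above start from the first n bytes of a flow. One plausible reading of a position-ordered binary encoding is a one-hot vector per byte offset, sketched below. The function name and the exact layout (256 binary features per position) are assumptions made for illustration and are not taken from [57] or [66].

```python
def encode_first_bytes(payload, n=64):
    """Encode the first n payload bytes as n * 256 binary features.

    Feature index (pos * 256 + value) is 1 when the byte at offset `pos`
    has the given value. Offsets beyond the end of a short payload stay 0.
    """
    features = [0] * (n * 256)
    for pos, value in enumerate(payload[:n]):
        features[pos * 256 + value] = 1
    return features
```

With n = 64 this yields 16,384 sparse binary features per unidirectional flow, which any of the learners mentioned above could consume; a sparse representation would be the natural choice in practice.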
Unlike other work that made use of flow classification, this work focused on protocol inference, in which a protocol was defined as 'a pair of distributions on flows': one was a byte sequence from the initiator to the responder and one was a byte sequence from the responder to the initiator (which does not include packet-level information such as inter-arrival time, frame size or header fields).

The product distribution model treats each n-byte flow distribution as a product of n independent byte distributions. Each byte offset in a flow is represented by its own byte distribution that describes the distribution of bytes at that offset in the flow. The Markov process is described as a random walk on a weighted directed graph. The nodes of the graph are labeled with unique byte values. Each edge is weighted with a transition probability such that, for any node, the sum of all its out-edge weights is 1. The next node is chosen according to the weights of the edges from the current node to its neighbors. Finally, common substring graphs capture the common structural information about the flows from which they are built.

Detailed model descriptions, including how to construct each model and how to merge, compare and classify new instances, are given in [66]. Overall, the product distribution resulted in the lowest total misclassification error (1.68-4.15%), while Markov processes had the highest (3.33-9.97%) and CSGs were in the middle (2.08-6.19%).

6) BLINC: Multilevel traffic classification in the dark: Karagiannis et al. [15] developed an application classification method based on the behaviours of the source host at the transport layer, divided into three different levels. The social level captures and analyses the interactions of the examined host with other hosts, in terms of the number of hosts it communicates with. The host's popularity and that of other hosts in its community's circle are considered. The role of the host, acting as a provider or a consumer of a service, is classified at the functional level. Finally, transport layer information is used, such as the 4-tuple of the traffic (source and destination IP addresses, and source and destination ports), flow characteristics such as the transport protocol, and the average packet size.
A range of application types was studied in this work, including web, p2p, data transfer, network management traffic, mail, chat, media streaming, and gaming. By analysing the social activities of the host, the authors conclude that among the host's communities, neighbouring IPs may offer the same service (a server farm) if they use the same service port, exact communities might indicate attacks, while partial communities may signify p2p or gaming applications. In addition, most IPs acting as clients have a minimum number of destination IPs. Thus, focusing on the identification of that small number of servers can help client identification, leading to the classification of a large amount of traffic. Classification at the functional level shows that a host is likely to be providing a service if, over a period of time, it uses a small number of source ports, normally less than or equal to two, for all of its flows. Typical client behaviour is normally indicated when the number of source ports is equal to the number of distinct flows. The consistency of average packet size per flow across all flows at the application level is suggested to be a good property for identifying certain applications, such as gaming and malware.

Completeness and accuracy are the two metrics used for the classification approach. Completeness is defined as the ratio of the number of flows (bytes) classified by BLINC over the total number of flows (bytes), indicated by payload analysis. The results show that BLINC can classify 80% to 90% of traffic flows with more than 95% flow accuracy (70% to 90% for byte accuracy).

This method has to gather information from several flows for each host before it can decide on the role of one host. Such requirements might prevent the employment of this method in real-time operational networks.
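The functional-level heuristic and the completeness metric described for BLINC can be illustrated with a small sketch. The function names, the role labels, and the extra requirement that a server has more flows than the port threshold are assumptions added here to separate the two rules; they are not part of [15].

```python
def functional_role(flows, host, max_server_ports=2):
    """Guess a host's role from the source ports of its own flows.

    flows: iterable of (src_ip, src_port, dst_ip, dst_port) 4-tuples
           observed over some period of time.
    """
    own = {f for f in flows if f[0] == host}
    if not own:
        return "unknown"
    ports = {src_port for _, src_port, _, _ in own}
    # Few source ports shared across many flows suggests a service provider.
    if len(own) > max_server_ports and len(ports) <= max_server_ports:
        return "server"
    # One distinct source port per flow is typical client behaviour.
    if len(ports) == len(own):
        return "client"
    return "unclear"


def completeness(n_classified, n_total):
    """Ratio of flows (or bytes) classified to the total identified by payload analysis."""
    return n_classified / n_total
```

The sketch also makes the real-time concern concrete: `functional_role` needs several flows per host before either rule can fire.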
7) Pearson's Chi-Square test and Naive Bayes classifier: Bonfiglio et al. [67] recently proposed a framework based on two techniques to identify Skype traffic in real time. The first, based on Pearson's Chi-Square test, detects Skype's fingerprint through analysis of the message content randomness introduced by the encryption process. The second, based on the Naive Bayes theorem, detects Skype's traffic from message size and arrival rate characteristics.

Using two specific test datasets, the authors compared the performance of each technique relative to classification using deep-packet inspection. They showed their Naive Bayes technique to be effective in identifying voice traffic over IP regardless of source application. Their Pearson's Chi-Square test effectively identified Skype traffic (including Skype voice/video/chat/data traffic) over UDP and all encrypted or compressed traffic for TCP flows. When used in combination the two techniques detected Skype voice traffic (UDP flows) with 0% false positives and 9.82% false negatives for one test dataset, and 0.11% false positives and 2.40% false negatives for the other. These false positive rates are an improvement compared to each technique being used individually. However, the false negative rates are slightly worse. It is also important to note the great imbalance between the amount of Skype traffic and other traffic in the test datasets. The results should also be evaluated in terms of precision and recall, to reflect the classifiers' performance per traffic class, instead of only the overall false positives and false negatives.

Both techniques offer real-time traffic classification. The Chi-Square technique looks at the first few bytes of the message. The Naive Bayes technique looks at the statistical characteristics of each window of 30 consecutive packets.

E. Challenges for operational deployment

We wrap up our survey with a qualitative look at the extent to which the reviewed works overlap Section III-C's additional constraints and requirements for using ML techniques inside real-time IP traffic classifiers.
1) Timely and continuous classification: Most of the reviewed work has evaluated the efficacy of different ML algorithms when applied to entire datasets of IP traffic, trained and tested over full flows consisting of thousands of packets (such as [14] [18] [46] [48] [64] and [65]).

Some ([53] and [61]) have explored the performance of ML classifiers that utilise only the first few packets of a flow, but they cannot cope with missing the flow's initial packets. Others ([56]) have explored techniques for continuous classification of flows using a small sliding window across time, without needing to see the initial packets of a flow.

2) Directional neutrality: The assumption that application flows are bi-directional, and that the application's direction may be inferred prior to classification, permeates many of the works published to date ([14] [48] [53] [46] [68]). Most work has assumed that the classifier will see the first packet of each bi-directional flow, and that this initial packet is from a client to a server. The classification model is trained using this assumption, and subsequent evaluations have presumed the ML classifier can calculate features with the correct sense of forward and reverse direction.

As getting the direction wrong will degrade classification accuracy, [54] explores the creation of classifier models that do not rely on external indications of directionality.

3) Efficient use of memory and processors: There are definite trade-offs to be made between the classification performance of a classifier and the resource consumption of the actual implementation. For example, [14] and [55] reveal excellent potential for classification accuracy. However, they use a large number of features, many of which are computationally challenging. The overhead of computing complex features (such as effective bandwidth based upon entropy, or the Fourier Transform of the packet inter-arrival time) must be considered against the potential loss of accuracy if one simply did without those features.

Williams et al. [65] provide some pertinent warnings about the trade-off between training time and classification speed. (For example, among five ML algorithms studied, Naive Bayes with Kernel Estimation took the shortest time to build classification models, yet performed slowest in terms of classification speed.)
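Since several of the works discussed in this section compute features over a sliding window of packets, a small sketch may help make the memory and timeliness costs concrete. The class name, the `step` parameter and the choice of statistics are illustrative assumptions; the default window of 25 packets echoes the example size listed for [56] in Table I.

```python
from collections import deque
import statistics


def summarize(values):
    """Return (min, max, mean, population std dev) of a non-empty list."""
    return (min(values), max(values),
            statistics.mean(values), statistics.pstdev(values))


class SlidingWindowFeatures:
    """Compute packet-length and inter-arrival statistics over a sliding window."""

    def __init__(self, window=25, step=1):
        # window must be at least 2 so that one inter-arrival gap exists
        self.window = window
        self.step = step
        self.packets = deque(maxlen=window)  # buffered state is bounded by the window
        self.seen = 0

    def add_packet(self, timestamp, length):
        """Feed one packet; return a feature dict when a window is due, else None."""
        self.packets.append((timestamp, length))
        self.seen += 1
        if self.seen < self.window or (self.seen - self.window) % self.step:
            return None
        times = [t for t, _ in self.packets]
        lengths = [n for _, n in self.packets]
        gaps = [b - a for a, b in zip(times, times[1:])]
        return {"length": summarize(lengths), "inter_arrival": summarize(gaps)}
```

A larger `window` delays the first decision and raises the per-flow buffer, while a larger `step` lowers the CPU cost at the price of less frequent decisions.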
Techniques for timely and continuous classification have tended to suggest a sliding window over which features are calculated. Increasing the length of this window ([56] [54] and [57]) might increase classification accuracy. However, depending on the particular implementation (opportunities for pipelining, step size with which the window slides across the incoming packet streams, etc.) this may decrease the timeliness with which classification decisions are made (and increase the memory required to buffer packets during feature calculations). Most of the reviewed work has not, to date, closely investigated this aspect.

4) Portability and Robustness: None of the reviewed works seriously considered or addressed the issue of classification model portability mentioned in Section III-C.

None of the reviewed works has addressed and evaluated their model's robustness in terms of classification performance with the introduction of packet loss, packet fragmentation, delay and jitter. Unsupervised approaches have the potential to detect the emergence of new types of traffic. However, this issue has not been evaluated in most of the works. It was briefly mentioned in [62].

5) Qualitative summary: Table IV provides a qualitative summary of the reviewed works against the following criteria:

• Real-time Classification
  – No: The work makes use of features that require flow completion to compute (e.g. flow duration, total flow bytes count)
  – Yes: The work requires the capture of a small number of packets/bytes of a flow to do classification
• Feature Computation Overhead
  – Low: The work makes use of a small number of features (e.g. sizes of the first few packets, binary encoding of the first few bytes of a unidirectional flow)
  – Average: The work makes use of an average set of features (such as packet length and inter-arrival time statistics, flow duration, bytes count)
  – High: The work makes use of a large number of features (compared with other work in the area), including computationally complex features (such as the Fourier transform of packet inter-arrival time)
• Continuous Classification
  – Not addressed: The issue was not considered in the work
  – Yes: The issue was considered and solved in the work
• Directional Neutrality
  – No: The work makes use of bi-directional flow and feature calculations, but did not consider the issue
  – Yes: The work makes use of bi-directional flow and feature calculations, addressed the issue and proposed a solution
  – N/A: The work makes use of uni-directional flow and the issue is not applicable
  – Not clear: Not clearly stated in the paper

TABLE I
A SUMMARY OF RESEARCH REVIEWED IN SECTION IV

| Work | ML Algorithms | Features | Data Traces | Traffic Considered | Classification Level |
|---|---|---|---|---|---|
| McGregor et al. [48] | Expectation Maximization | Packet length statistics (min, max, quartiles, ...); inter-arrival statistics; byte counts; connection duration; number of transitions between transaction mode and bulk transfer mode; idle time. Calculated on full flows | NLANR and Waikato trace | A mixture of HTTP, SMTP, FTP (control), NTP, IMAP, DNS ... | Coarse grained (bulk transfer, small transactions, multiple transactions ...) |
| Zander et al. [46] | AutoClass | Packet length statistics (mean and variance in forward and backward directions); inter-arrival time statistics (mean and variance in forward and backward directions); flow size (bytes); flow duration. Calculated on full flows | Auckland-VI, NZIX-II and Leipzig-II from NLANR | Half-Life, Napster, AOL, HTTP, DNS, SMTP, Telnet, FTP (data) | Fine grained (8 applications studied) |
| Roughan et al. [18] | Nearest Neighbour, Linear Discriminant Analysis and Quadratic Discriminant Analysis | Packet level; flow level; connection level; intra-flow/connection features; multi-flow features. Calculated on full flows | Waikato trace and section logs from a commercial streaming service | Telnet, FTP (data), Kazaa, Real Media Streaming, DNS, HTTPS | Fine grained (three, four and seven classes of individual applications) |
| Moore and Zuev [14] | Bayesian techniques (Naive Bayes and Naive Bayes with Kernel Estimation and Fast Correlation-Based Filter method) | Total of 248 features, among them: flow duration; TCP port; packet inter-arrival time statistics; payload size statistics; effective bandwidth based upon entropy; Fourier transform of packet inter-arrival time. Calculated on full flows | Proprietary hand-classified traces | A large range of Database, P2P, Bulk, Mail, Services, ... traffic | Coarse grained |
| Bernaille et al. [53] | Simple K-Means | Packet lengths of the first few packets of bi-directional traffic flows | Proprietary traces | eDonkey, FTP, HTTP, Kazaa, NTP, POP3, SMTP, SSH, HTTPS, POP3S | Fine grained (10 applications studied) |
| Park et al. [44] | Naive Bayes with Kernel Estimation, Decision Tree J48 and Reduced Error Pruning Tree | Flow duration; initial advertised window bytes; number of actual data packets; number of packets with the option of PUSH; packet lengths; advertised window bytes; packet inter-arrival time; size of total burst packets | NLANR, USC/ISI, CAIDA | WWW, Telnet, Chat (Messenger), FTP, P2P (Kazaa, Gnutella), Multimedia, SMTP, POP, IMAP, NDS, Oracle, X11 | N/A (comparison work) |
| Nguyen and Armitage [56] | Supervised Naive Bayes | Packet lengths (min, max, mean, standard deviation); inter-packet length statistics (min, max, mean, standard deviation); packet inter-arrival time statistics (min, max, mean, std dev.). Calculated over a small number (e.g. 25 packets) of consecutive packets (classification windows) taken at various points of the flow lifetime, where the changes in the flow's characteristics are significant | Traces collected at an online game server in Australia and provided by University of Twente, Netherlands | Online Game (Enemy Territory) traffic, Others (HTTP, HTTPS, DNS, NTP, SMTP, Telnet, SSH, P2P ...) | Application specific (Online Game, UDP based, First Person Shooter, Enemy Territory traffic) |

TABLE II
A SUMMARY OF RESEARCH REVIEWED IN SECTION IV (CONTINUED)

| Work | ML Algorithms | Features | Data Traces | Traffic Considered | Classification Level |
|---|---|---|---|---|---|
| Nguyen and Armitage [54] | Naive Bayes and Decision Tree in combination with clustering algorithms for automated sub-flow selection | Packet length statistics (min, max, mean, std dev.); inter-packet length statistics (min, max, mean, std dev.); packet inter-arrival time statistics (min, max, mean, std dev.). Calculated over a small number (e.g. 25 packets) of consecutive packets (classification windows) taken at various points of the flow lifetime, where the changes in the flow's characteristics are significant. Further extension with synthetic mirroring features | Traces collected at an online game server in Australia and provided by University of Twente, Netherlands | Online Game (Enemy Territory) traffic, Others (HTTP, HTTPS, DNS, NTP, SMTP, Telnet, SSH, P2P ...) | Application specific (Online Game, UDP based, First Person Shooter, Enemy Territory traffic) |
| Erman et al. [47] | K-Means | Total number of packets; mean packet length; mean payload length excluding headers; number of bytes transferred; flow duration; mean inter-arrival time | Self-collected 8 1-hour campus traces between April 6-9, 2006 | Web, P2P, FTP, Others | Coarse grained (29 different protocols grouped into a number of application categories for studies) |
| Crotti et al. [61] | Protocol fingerprints (Probability Density Function vectors) and anomaly score (from protocol PDFs to protocol fingerprints) | Packet lengths; inter-arrival time; packet arrival order | 6-month self-collected traces at the edge gateway of the University of Brescia data centre network | TCP applications (HTTP, SMTP, POP3, SSH) | Fine grained (four TCP protocols) |
| Haffner et al. [57] | Naive Bayes, AdaBoost, Regularized Maximum Entropy | Discrete byte encoding of the first n bytes of payload of a TCP unidirectional flow | Proprietary | FTP (control), SMTP, POP3, IMAP, HTTPS, HTTP, SSH | Fine grained |
| Ma et al. [66] | Unsupervised learning (product distribution, Markov processes, and common substring graphs) | Discrete byte encoding of the first n bytes of payload of a TCP unidirectional flow | Proprietary | FTP (control), SMTP, POP3, IMAP, HTTPS, HTTP, SSH | Fine grained |
| Auld et al. [55] | Bayesian Neural Network | 246 features in total, including: flow metrics (duration, packet count, total bytes); packet inter-arrival time statistics; size of TCP/IP control fields; total packets in each direction and total for bi-directional flow; payload size; effective bandwidth based upon entropy; top-ten Fourier transform components of packet inter-arrival times for each direction; numerous TCP-specific values derived from tcptrace (e.g. total payload bytes transmitted, total number of PUSHED packets, total number of ACK packets carrying SACK information etc.) | Proprietary hand-classified traces | A large range of Database, P2P, Bulk, Mail, Services, Multimedia, Web ... traffic | Coarse grained |

TABLE III
A SUMMARY OF RESEARCH REVIEWED IN SECTION IV (CONTINUED)

| Work | ML Algorithms | Features | Data Traces | Traffic Considered | Classification Level |
|---|---|---|---|---|---|
| Williams et al. [65] | Naive Bayes with Discretisation, Naive Bayes with Kernel Estimation, C4.5 Decision Tree, Bayesian Network and Naive Bayes Tree | Protocol; flow duration; flow volume in bytes and packets; packet length (minimum, mean, maximum and standard deviation); inter-arrival time between packets (minimum, mean, maximum and standard deviation) | NLANR | FTP (data), Telnet, SMTP, DNS, HTTP | N/A (comparison work) |
| Erman et al. [45] | K-Means, DBSCAN and AutoClass | Total number of packets; mean packet length; mean payload length excluding headers; number of bytes transferred (in each direction and combined); mean packet inter-arrival time | NLANR and a self-collected 1-hour trace from the University of Calgary | HTTP, P2P, SMTP, IMAP, POP3, MSSQL, Other | N/A (comparison work) |
| Erman et al. [64] | Naive Bayes and AutoClass | Total number of packets; mean packet length (in each direction and combined); flow duration; mean data packet length; mean packet inter-arrival time | NLANR | HTTP, SMTP, DNS, SOCKS, FTP (control), FTP (data), POP3, Limewire | N/A (comparison work) |
| Bonfiglio et al. [67] | Naive Bayes and Pearson's Chi-Square test | Message size (the length of the message encapsulated into the transport layer protocol segment); average inter-packet gap | Two self-collected datasets | Skype traffic | Application specific |

TABLE IV
REVIEWED WORK IN LIGHT OF CONSIDERATIONS FOR OPERATIONAL TRAFFIC CLASSIFICATION

| Work | Real-time Classification | Feature Computation Overhead | Classify Flows In Progress | Directional neutrality |
|---|---|---|---|---|
| McGregor et al. [48] | No | Average | Not addressed | No |
| Zander et al. [46] | No | Average | Not addressed | No |
| Roughan et al. [18] | No | Average | Not addressed | N/A |
| Moore and Zuev [14] | No | High | Not addressed | No |
| Bernaille et al. [53] | Yes | Low | Not addressed | No |
| Park et al. [44] | No | Average | Not addressed | Not clear |
| Nguyen and Armitage [56] | Yes | Average | Yes | Yes |
| Nguyen and Armitage [54] | Yes | Average | Yes | Yes |
| Erman et al. [47] | No | Average | Not addressed | No |
| Crotti et al. [61] | Yes | Average | Not addressed | No |
| Haffner et al. [57] | Yes | Average | Not addressed | N/A |
| Ma et al. [66] | No | Average | Not addressed | No |
| Auld et al. [55] | No | High | Not addressed | No |
| Williams et al. [65] | N/A | Average | N/A | N/A |
| Erman et al. [45] | N/A | Average | N/A | N/A |
| Erman et al. [64] | N/A | Average | N/A | N/A |
| Bonfiglio et al. [67] | Yes | Average | Not addressed | Not clear |

V. CONCLUSION

This paper surveys significant works in the field of machine learning based IP traffic classification during the peak period of 2004 to early 2007. The field is motivated by a desire to move away from port-based or payload-based traffic classification, and it is clear that ML can be applied well to the task. The use of a number of different ML algorithms for offline analysis, such as AutoClass, Expectation Maximisation, Decision Tree, Naive Bayes etc., has demonstrated high accuracy (up to 99%) for a range of Internet application traffic. Early ML techniques relied on static, offline analysis of previously captured traffic. More recent work is beginning to address the requirements for practical, ML-based real-time IP traffic classification in operational networks. In this survey paper, we have outlined a number of critical operational requirements for real-time classifiers and qualitatively critiqued the reviewed works against these requirements.

There is still a lot of room for further research in the field. While most of the approaches build their classification models based on sample data collected at certain points of the Internet, those models' usability needs to be carefully evaluated. The accuracy evaluated on a test dataset collected at the same point of measurement might not hold when the model is applied at a different point of measurement. There are still open questions as to how well they can maintain their performance in the presence of packet loss, latency jitter, and packet fragmentation. Each ML algorithm may perform differently toward different Internet applications, and may require different parameter configurations. The use of a combination of classification models is worth investigating. Parallel processing for real-time classification at traffic aggregation points in the network may be useful when the classifiers need to cope with millions of concurrent flows. And the application of ML algorithms to newer applications (such as Skype, video streaming, voice over IP and peer to peer file-sharing) is still an interesting open field.

Nevertheless, the promising results of ML-based IP traffic classification may open many new avenues for related research areas, such as the application of ML in intrusion detection, anomaly detection in user data and control, routing traffic, and building network profiles for proactive real-time network monitoring and management. The classification of traffic in greynet or darknet networks is also an interesting possible extension of the research field.

ACKNOWLEDGMENTS

We would like to thank the anonymous reviewers for their very helpful comments and feedback to improve the manuscript.

REFERENCES

[1] Snort - The de facto standard for intrusion detection/prevention, http://www.snort.org, as of August 14, 2007.
[2] Bro intrusion detection system - Bro overview, http://bro-ids.org, as of August 14, 2007.
[3] V. Paxson, “Bro: A system for detecting network intruders in real-time,” Computer Networks, no. 31(23-24), pp. 2435–2463, 1999.
[4] L. Stewart, G. Armitage, P. Branch, and S. Zander, “An architecture for automated network control of QoS over consumer broadband links,” in IEEE International Region 10 Conference (TENCON 05), Melbourne, Australia, November 2005.
[5] F. Baker, B. Foster, and C. Sharp, “Cisco architecture for lawful intercept in IP networks,” Internet Engineering Task Force, RFC 3924, 2004.
[6] T. Karagiannis, A. Broido, N. Brownlee, and K. Claffy, “Is P2P dying or just hiding?” in Proc. 47th annual IEEE Global Telecommunications Conference (Globecom 2004), Dallas, Texas, USA, November/December 2004.
[7] S. Sen, O. Spatscheck, and D. Wang, “Accurate, scalable in network identification of P2P traffic using application signatures,” in WWW 2004, New York, NY, USA, May 2004.
[8] R. Braden, D. Clark, and S. Shenker, “Integrated services in the Internet architecture: an overview,” RFC 1633, IETF, 1994.
[9] S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, and W. Weiss, “An architecture for differentiated services,” RFC 2475, IETF, 1998.
[10] G. Armitage, Quality of Service In IP Networks: Foundations for a Multi-Service Internet. Macmillan Technical Publishing, April 2000.
[11] L. Burgstahler, K. Dolzer, C. Hauser, J. Jahnert, S. Junghans, C. Macian, and W. Payer, “Beyond technology: the missing pieces for QoS success,” in RIPQoS ’03: Proc. ACM Special Interest Group on Data Communication (SIGCOMM) workshop on Revisiting IP QoS. New York, NY, USA: ACM Press, August 2003, pp. 121–130.
[12] Ubicom Inc., Solving Performance Problems with Interactive Applications in a Broadband Environment using StreamEngine Technology, http://www.ubicom.com/pdfs/whitepapers/StreamEngine-WP-20041031.pdf, as of August 14, 2007.
[13] J. But, N. Williams, S. Zander, L. Stewart, and G. Armitage, “ANGEL - Automated Network Games Enhancement Layer,” in Proc. of 5th Annual Workshop on Network and Systems Support for Games (Netgames) 2006, Singapore, October 2006.
[14] A. Moore and D. Zuev, “Internet traffic classification using Bayesian analysis techniques,” in ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS) 2005, Banff, Alberta, Canada, June 2005.
[15] T. Karagiannis, K. Papagiannaki, and M. Faloutsos, “Blinc: Multilevel traffic classification in the dark,” in Proc. of the Special Interest Group on Data Communication conference (SIGCOMM) 2005, Philadelphia, PA, USA, August 2005.
[16] J. Erman, A. Mahanti, and M. Arlitt, “Byte me: a case for byte accuracy in traffic classification,” in MineNet ’07: Proc. 3rd annual ACM workshop on Mining network data. New York, NY, USA: ACM Press, June 2007, pp. 35–38.
[17] Internet Assigned Numbers Authority (IANA), http://www.iana.org/assignments/port-numbers, as of August 14, 2007.
[18] M. Roughan, S. Sen, O. Spatscheck, and N. Duffield, “Class-of-service mapping for QoS: A statistical signature-based approach to IP traffic classification,” in Proc. ACM/SIGCOMM Internet Measurement Conference (IMC) 2004, Taormina, Sicily, Italy, October 2004.
[19] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, “RTP: A Transport Protocol for Real-Time Applications,” RFC 1889, IETF, 1996.
[20] A. Moore and K. Papagiannaki, “Toward the accurate identification of network applications,” in Proc. Passive and Active Measurement Workshop (PAM 2005), Boston, MA, USA, March/April 2005.
[21] A. Madhukar and C. Williamson, “A longitudinal study of P2P traffic classification,” in 14th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, September 2006.
[22] V. Paxson, “Empirically derived analytic models of wide-area TCP connections,” IEEE/ACM Trans. Networking, vol. 2, no. 4, pp. 316–336, 1994.
[23] C. Dewes, A. Wichmann, and A. Feldmann, “An analysis of Internet chat systems,” in ACM/SIGCOMM Internet Measurement Conference 2003, Miami, Florida, USA, October 2003.
[24] K. Claffy, “Internet traffic characterisation,” PhD Thesis, University of California, San Diego, 1994.
[25] T. Lang, G. Armitage, P. Branch, and H.-Y. Choo, “A synthetic traffic model for Half-life,” in Proc. Australian Telecommunications Networks and Applications Conference 2003 (ATNAC 2003), Melbourne, Australia, December 2003.
[26] T. Lang, P. Branch, and G. Armitage, “A synthetic traffic model for Quake 3,” in Proc. ACM SIGCHI International Conference on Advances in computer entertainment technology (ACE 2004), Singapore, June 2004.
[27] Z. Shi, Principles of Machine Learning. International Academic Publishers, 1992.
[28] H. Simon, “Why should machines learn?” in R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (editors), Machine Learning: An Artificial Intelligence Approach. Morgan Kaufmann, 1983.
[29] I. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations (Second Edition). Morgan Kaufmann Publishers, 2005.
[30] B. Silver, “Netman: A learning network traffic controller,” in Proc. Third International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, Association for Computing Machinery, 1990.
[31] J. Frank, “Machine learning and intrusion detection: Current and future directions,” in Proc. National 17th Computer Security Conference, Washington, D.C., October 1994.
[32] Y. Reich and J. S. Fenves, “The formation and use of abstract concepts in design,” in D. H. Fisher and M. J. Pazzani (editors), Concept Formation: Knowledge and Experience in Unsupervised Learning. Morgan Kaufmann, 1991.
[33] D. H. Fisher, M. J. Pazzani, and P. Langley, Concept Formation: Knowledge and Experience in Unsupervised Learning. Morgan Kaufmann, 1991.
[34] R. Duda, P. Hart, and D. Stork, Pattern Classification (2nd edition). Wiley-Interscience, 2001.
[35] O. Carmichael and M. Hebert, “Shape-based recognition of wiry objects,” IEEE Trans. Pattern Anal. Machine Intell., vol. 26, no. 12, pp. 1537–1552, 2004.
[36] W. Rand, “Objective criteria for the evaluation of clustering methods,” J. American Statistical Association, vol. 66, no. 336, pp. 846–850, 1971.
[37] M. Halkidi, Y. Batistakis, and M. Vazirgiannis, “Cluster validity methods: part I,” SIGMOD Rec., vol. 31, no. 2, pp. 40–45, 2002.
[38] R. Xu and D. Wunsch, “Survey of clustering algorithms,” IEEE Trans. Neural Networks, vol. 16, no. 3, pp. 645–678, May 2005.
[39] M. Halkidi, Y. Batistakis, and M. Vazirgiannis, “Clustering validity checking methods: part II,” SIGMOD Rec., vol. 31, no. 3, pp. 19–27, 2002.
[40] M. Hall and G. Holmes, “Benchmarking attribute selection techniques for discrete class data mining,” IEEE Trans. Knowledge Data Eng., vol. 15, no. 6, pp. 1437–1447, 2003.
[41] D. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1989.
[42] R. Kohavi and G. H. John, “Wrappers for feature subset selection,” Artificial Intelligence, vol. 97, no. 1-2, pp. 273–324, 1997.
[43] P. H. Winston, Artificial intelligence (2nd ed.). Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1984.
[44] J. Park, H.-R. Tyan, and C.-C. J. Kuo, “Internet traffic classification for scalable QoS provision,” in IEEE International Conference on Multimedia and Expo, Toronto, Ontario, Canada, July 2006.
[45] J. Erman, M. Arlitt, and A. Mahanti, “Traffic classification using clustering algorithms,” in MineNet ’06: Proc. 2006 SIGCOMM workshop on Mining network data. New York, NY, USA: ACM Press, 2006, pp. 281–286.
[46] S. Zander, T. Nguyen, and G. Armitage, “Automated traffic classification and application identification using machine learning,” in IEEE 30th Conference on Local Computer Networks (LCN 2005), Sydney, Australia, November 2005.
[47] J. Erman, A. Mahanti, M. Arlitt, and C. Williamson, “Identifying and discriminating between web and peer-to-peer traffic in the network core,” in WWW ’07: Proc. 16th international conference on World Wide Web. Banff, Alberta, Canada: ACM Press, May 2007, pp. 883–892.
[48] A. McGregor, M. Hall, P. Lorier, and J. Brunskill, “Flow clustering using machine learning techniques,” in Proc. Passive and Active Measurement Workshop (PAM 2004), Antibes Juan-les-Pins, France, April 2004.
[49] A. Dempster, N. Laird, and D. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” J. Royal Statistical Society, vol. 39, no. 1, 1977.
[50] P. Cheeseman and J. Stutz, “Bayesian classification (AutoClass): Theory and results,” in Advances in Knowledge Discovery and Data Mining, 1996.
[51] NetMate, http://sourceforge.net/projects/netmate-meter/, as of August 14, 2007.
[52] The National Laboratory for Applied Network Research (NLANR), Traffic Measurement Data Repository, http://pma.nlanr.net/Special/, as of August 14, 2007.
[53] L. Bernaille, R. Teixeira, I. Akodkenou, A. Soule, and K. Salamatian, “Traffic classification on the fly,” ACM Special Interest Group on Data Communication (SIGCOMM) Computer Communication Review, vol. 36, no. 2, 2006.
[54] T. Nguyen and G. Armitage, “Synthetic sub-flow pairs for timely and stable IP traffic identification,” in Proc. Australian Telecommunication Networks and Application Conference, Melbourne, Australia, December 2006.
[55] T. Auld, A. W. Moore, and S. F. Gull, “Bayesian neural networks for Internet traffic classification,” IEEE Trans. Neural Networks, no. 1, pp. 223–239, January 2007.
[56] T. Nguyen and G. Armitage, “Training on multiple sub-flows to optimise the use of Machine Learning classifiers in real-world IP networks,” in Proc. IEEE 31st Conference on Local Computer Networks, Tampa, Florida, USA, November 2006.
[57] P. Haffner, S. Sen, O. Spatscheck, and D. Wang, “ACAS: automated construction of application signatures,” in MineNet ’05: Proceedings of the 2005 ACM SIGCOMM workshop on Mining network data. Philadelphia, Pennsylvania, USA: ACM Press, August 2005, pp. 197–202.
[58] The University of Twente, Traffic Measurement Data Repository, http://arch.cs.utwente.nl/projects/m2c/m2c-D15.pdf, as of 17th August 2007.
[59] Wolfenstein Enemy Territory, http://www.enemyterritory.com/, as of 17th August 2007.
[60] J. Park, H.-R. Tyan, and C.-C. J. Kuo, “GA-Based Internet Traffic Classification Technique for QoS Provisioning,” in Proc. 2006 International Conference on Intelligent Information Hiding and Multimedia Signal Processing, Pasadena, California, December 2006.
[61] M. Crotti, M. Dusi, F. Gringoli, and L. Salgarelli, “Traffic classification through simple statistical fingerprinting,” SIGCOMM Comput. Commun. Rev., vol. 37, no. 1, pp. 5–16, 2007.
[62] J. Erman, A. Mahanti, M. Arlitt, I. Cohen, and C. Williamson, “Semi-supervised network traffic classification,” ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS) Performance Evaluation Review, vol. 35, no. 1, pp. 369–370, 2007.
[63] J. Erman, A. Mahanti, M. Arlitt, I. Cohen, and C. Williamson, “Offline/realtime network traffic classification using semi-supervised learning,” Department of Computer Science, University of Calgary, Tech. Rep., February 2007.
[64] J. Erman, A. Mahanti, and M. Arlitt, “Internet traffic identification using machine learning techniques,” in Proc. of 49th IEEE Global Telecommunications Conference (GLOBECOM 2006), San Francisco, USA, December 2006.
[65] N. Williams, S. Zander, and G. Armitage, “A preliminary performance comparison of five machine learning algorithms for practical IP traffic flow classification,” Special Interest Group on Data Communication (SIGCOMM) Computer Communication Review, vol. 36, no. 5, pp. 5–16, 2006.
[66] J. Ma, K. Levchenko, C. Kreibich, S. Savage, and G. Voelker, “Unexpected means of protocol inference,” in IMC ’06: Proc. 6th ACM SIGCOMM on Internet measurement. Rio de Janeiro, Brazil: ACM Press, October 2006, pp. 313–326.
[67] D. Bonfiglio, M. Mellia, M. Meo, D. Rossi, and P. Tofanelli, “Revealing Skype traffic: when randomness plays with you,” in SIGCOMM ’07: Proc. 2007 conference on Applications, technologies, architectures, and protocols for computer communications. New York, NY, USA: ACM, August 2007, pp. 37–48.
[68] N. Williams, S. Zander, and G. Armitage, “Evaluating machine learning methods for online game traffic identification,” Centre for Advanced Internet Architectures, http://caia.swin.edu.au/reports/060410C/CAIA-TR-060410C.pdf, Tech. Rep. 060410C, as of August 14, 2007.
Thuy T. T. Nguyen ([email protected]) received the B.Eng. Dip.Prac. in telecommunications engineering from the University of Technology, Sydney in 2002 with First Class Honours. She is now a PhD candidate at the Centre for Advanced Internet Architectures, Swinburne University of Technology, Melbourne, Australia. Her research interests include Internet traffic characterization and classification, Internet pricing and charging systems, and QoS and performance evaluation of broadband and wireless networks. She is a member of the IEEE Communications Society.

Grenville Armitage [M03] ([email protected]) earned a B.Eng (Elec)(Hons) in 1988 and a PhD in electronic engineering in 1994, both from the University of Melbourne, Australia. Since 2002 he has been Associate Professor of Telecommunications Engineering and Director of the Centre for Advanced Internet Architectures at Swinburne University of Technology, Melbourne, Australia. He authored Quality of Service In IP Networks: Foundations for a Multi-Service Internet (Macmillan Technical Publishing, April 2000) and co-authored Networking and Online Games - Understanding and Engineering Multiplayer Internet Games (John Wiley & Sons, UK, April 2006). Associate Professor Armitage is also a member of ACM and ACM SIGCOMM.