Abstract: Effective Website Fingerprinting for Evolving Website Traffic

Website fingerprinting is a set of techniques used to discover patterns in the sequence of network packets generated while a user accesses different websites. Internet users (such as online activists or journalists) may wish to hide their identity and online activity to protect their privacy. Typically, an anonymity network is used for this purpose. These anonymity networks are built using technologies such as proxy servers with SSL/TLS encryption, or Tor (The Onion Router). They add layers of encryption and route data through multiple proxy servers before it reaches the Internet. Website fingerprinting studies have applied various traffic analysis and statistical techniques to traffic over anonymity networks. Most studies use a similar set of features, including packet size, packet direction, total packet count, and other packet-level summaries. Moreover, various defense mechanisms have been proposed to counteract these features, thereby reducing prediction accuracy. In this talk, we propose a new set of features obtained from bidirectional packet sequences by considering the size and request/response time of bursts of packets to perform an attack. Furthermore, we not only consider traditional learning techniques for prediction, but also use semantic vector space models (VSMs) of language, where each word (packet) is represented as a real-valued vector. We evaluate our approach in both closed-world and open-world settings. The evaluation is performed with defense mechanisms in place to illustrate the attack’s resilience to such defenses. We study the effect of the evolving nature of website traffic and propose a method to overcome the challenge of distribution changes in website traffic over time. Our evaluation shows superior performance over competing techniques.


Dr. Latifur Khan is currently a tenured full Professor in the Computer Science department at the University of Texas at Dallas, USA, where he has been teaching and conducting research since September 2000. He received his Ph.D. and M.S. degrees in Computer Science from the University of Southern California (USC) in August 2000 and December 1996, respectively. Dr. Khan is an ACM Distinguished Scientist. He has received prestigious awards, including the IEEE Technical Achievement Award for Intelligence and Security Informatics.

Dr. Khan has published over 200 papers in prestigious journals and peer-reviewed conference proceedings. His current research focuses on big data management and analytics, data mining, and complex data management, including geo-spatial and multimedia data. More details can be found at: www.utdallas.edu/~lkhan/