On the Compression of Unknown Sources

Date
2018-12
Authors
Hosseinitoushmanlouei, Maryamsadat
Contributor
Advisor
Santhanam, Narayana Prasad
Department
Electrical Engineering
Instructor
Depositor
Speaker
Researcher
Consultant
Interviewer
Annotator
Journal Title
Journal ISSN
Volume Title
Publisher
Volume
Number/Issue
Starting Page
Ending Page
Alternative Title
Abstract
Usually, data is considered as the outcome of probability sources and to get insight from data, we need to know more about its underlying distribution. Although it is very helpful to know the precise characterization of the source, most of the time such information is not available. However, we usually know that the underlying distribution is not completely arbitrary and belongs to a general class of models, such as the class of \iid or Markov distributions. While universal compression of finite support distributions has been well studied, we look into more involved classes---in particular, distributions over countable supports, as well as the relations between compression and estimation in Markov setups without mixing assumptions. In the first part of the dissertation, we investigate ``compressibility" of a class of distributions. The exact identity of each distribution in the class is unknown, so we aim to find a universal encoder to compress all distributions in the class. But since the universal encoder does not match exactly to the underlying distribution, the average number of bits we use is higher, and the excess bits used over the entropy is the redundancy. We study the redundancy of universal encodings of strings generated by independent identically distributed (i.i.d.) sampling from a set of distributions over a countable support. We show that universal compression of length-n i.i.d. sequences is characterized by how well the tails of distributions in the collection can be universally described, and we formalize the later as the tail-redundancy of the collection. We show that per symbol redundancy converges to tail redundancy asymptotically and therefore characterize a necessary and sufficient condition for a collection of distributions to be ``strongly compressible". We also consider the redundancy of universally compressing strings generated by a binary Markov source without any bound on the memory. We prove asymptotically matching (in order) upper and lower bounds on the redundancy. Apart from the abstract analysis of a collection of unknown distributions, we adapt and implement an algorithm to compress the data obtained from an unknown source. Compression can be lossless or lossy. For lossless compression, Lempel and Ziv proposed a universal implementable algorithm and prove that their algorithm achieves the theoretical bound asymptotically. However, many applications can tolerate such amount of distortion which may allow for additional compression. We adapt Codelet parsing, a lossy Lempel-Ziv type algorithm. It sequentially parses a sequence to phrases which we call sourcelet and maps them to codelet in the dictionary. We develop concept ``strong match" and use Cycle Lemma to make sure that strong match method does not remove most of the possible matches from the tree. We study different properties of this dictionary and monitor how the leaves of the dictionary evolve. In the last chapter of the dissertation, we use rate-distortion theory to formulate a problem in cyber-physical systems. Consider a process being controlled remotely by a controller. Let an attacker have access to the communication channel so that she is able to replace the signal transmitted by the controller with any signal she wishes. The attacker wishes to degrade the control performance maximally without being detected. The controller wishes to detect the presence of the attacker by watermarking signaling information in the control input without degrading the control performance. We show that in the one-step version of the problem, if the watermark is a Gaussian distributed random variable, then the maximal performance degradation for any given level of stealthiness for the attacker is achieved when the attacker replaces the control input with the realization of a Gaussian random variable. Conversely, we show the watermark signal that minimizes the stealthiness of a Gaussian attacker is also Gaussian.
Description
Keywords
Electrical engineering
Citation
Extent
151 pages
Format
Geographic Location
Time Period
Related To
Table of Contents
Rights
All UHM dissertations and theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission from the copyright owner.
Rights Holder
Local Contexts
Email libraryada-l@lists.hawaii.edu if you need this content in ADA-compliant format.