Flexible parsing algorithm, a two-steps-greedy parsing algorithm for text factorisation, has been proved to be an optimal parsing for LZ78-like compressors in the case of constant-cost phrases Whilst in early implementations of LZ78-like compressors the phrases have constant cost, in common modern implementations the cost of the k-th phrase is (left tortoise shell bracket)log2 k+C(right tortoise shell bracket) where C is a real constant Indeed we show examples where Flexible parsing is not optimal under the above more realistic setting. In this paper we prove that, under the assumption that the cost of a phrase is block-wise constant and non-decreasing, the Flexible parsing is almost optimal. For almost optimal we mean that, for any text T, the difference between the sizes of the compressed text obtained by using a Flexible parsing and an optimal parsing is bounded by the maximal cost of a phrase in T, i.e. it is logarithmic in practical cases.Furthermore we investigate how an optimal parsing, and hence an almost optimal parsing, affects the rate of convergence to the entropy of LZ78-like compressors. We discuss some experimental results considering the ratio between the speed of convergence to the entropy of compressors with and without an optimal parsing. This ratio presents a kind of wave effect that increases as the entropy of a memoryless source decreases but it seems always to slowly converge to one. According to the theory, this wave can be a tsunami for some families of highly compressible strings and, although the optimal (and the almost optimal) parsing does not improve the asymptotical speed of convergence to the entropy, it can improve compression ratio, and hence the decoding speed, in many practical cases.

On optimal parsing for LZ78-like compressors

MARZI, FRANCESCA;MIGNOSI, FILIPPO
2018-01-01

Abstract

Flexible parsing algorithm, a two-steps-greedy parsing algorithm for text factorisation, has been proved to be an optimal parsing for LZ78-like compressors in the case of constant-cost phrases Whilst in early implementations of LZ78-like compressors the phrases have constant cost, in common modern implementations the cost of the k-th phrase is (left tortoise shell bracket)log2 k+C(right tortoise shell bracket) where C is a real constant Indeed we show examples where Flexible parsing is not optimal under the above more realistic setting. In this paper we prove that, under the assumption that the cost of a phrase is block-wise constant and non-decreasing, the Flexible parsing is almost optimal. For almost optimal we mean that, for any text T, the difference between the sizes of the compressed text obtained by using a Flexible parsing and an optimal parsing is bounded by the maximal cost of a phrase in T, i.e. it is logarithmic in practical cases.Furthermore we investigate how an optimal parsing, and hence an almost optimal parsing, affects the rate of convergence to the entropy of LZ78-like compressors. We discuss some experimental results considering the ratio between the speed of convergence to the entropy of compressors with and without an optimal parsing. This ratio presents a kind of wave effect that increases as the entropy of a memoryless source decreases but it seems always to slowly converge to one. According to the theory, this wave can be a tsunami for some families of highly compressible strings and, although the optimal (and the almost optimal) parsing does not improve the asymptotical speed of convergence to the entropy, it can improve compression ratio, and hence the decoding speed, in many practical cases.
File in questo prodotto:
Non ci sono file associati a questo prodotto.
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11697/112341
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 1
  • ???jsp.display-item.citation.isi??? 0
social impact