Skip to content

Commit 5cdf342

Browse files
committed
Add a document with plans for 2.0 release
1 parent 1db1485 commit 5cdf342

1 file changed

Lines changed: 314 additions & 0 deletions

File tree

2-0-Plans.md

Lines changed: 314 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,314 @@
1+
# 2.0 Plans
2+
3+
In this document I am going to outline changes that I am planning to implement in the next major update of BitByteData,
4+
version 2.0. I am writing this document with two goals in mind. First, to provide an opportunity to give feedback on
5+
the proposed changes. Secondly, for historical reasons: if at some point in the future I am wondering what were the
6+
reasons for doing any particular change, I will be able to open this document and remind myself of them. Finally, I
7+
would like to mention that all these changes are more or less breaking (according to SemVer, which I am trying to follow
8+
in this project) and this is why I considering them only for the major update.
9+
10+
__Note:__ For breviety, in this text the phrase "`LsbBitReader` and `MsbBitReader`" is combined into `L/MsbBitReader`.
11+
12+
## Improve class and protocol structure of BitByteData
13+
14+
### Motivation
15+
16+
The key idea that was always supposed to be emphasized in the class-protocol structure of BitByteData is that all the
17+
bit readers are also byte readers. Currently, this idea is expressed via `L/MsbBitReader` subclassing `ByteReader`. The
18+
disadvantage of this approach is that the `BitReader` protocol, which both `L/MsbBitReader` conform to, cannot inherit
19+
from `ByteReader` class. This is simply impossible within Swift for a protocol to inherit from a class. The problem with
20+
that arises when you have a variable of the protocol type `BitReader` instead of the concrete class `L/MsbBitReader`.
21+
In this situation you cannot use any byte-reading methods, despite `L/MsbBitReader` being subclasses of `ByteReader`. To
22+
solve this problem, I had to add all byte-reading methods to `BitReader` protocol.
23+
24+
The second motivating problem is the evident hole in the API of BitByteData: absence of any means to read bytes in the
25+
Big Endian order. The proposed restructuring is a good opportunity to introduce a new class for this purpose, which may
26+
be useful in some (though, admittedly, niche) situations.
27+
28+
### Proposed solution
29+
30+
1. Add a new _protocol_ `ByteReader` (sic!) with the following API:
31+
32+
```swift
33+
public protocol ByteReader {
34+
35+
var size: Int { get }
36+
37+
var data: Data { get }
38+
39+
var offset: Int { get set }
40+
41+
var isFinished: Bool { get }
42+
43+
var bytesLeft: Int { get }
44+
45+
var bytesRead: Int { get }
46+
47+
init(data: Data)
48+
49+
func byte() -> UInt8
50+
51+
func bytes(count: Int) -> [UInt8]
52+
53+
func int(fromBytes count: Int) -> Int
54+
55+
func uint64(fromBytes count: Int) -> UInt64
56+
57+
func uint32(fromBytes count: Int) -> UInt32
58+
59+
func uint16(fromBytes count: Int) -> UInt16
60+
61+
}
62+
```
63+
64+
__Note:__ This protocol differs from the byte reading APIs of the `BitReader` protocol's current version. These
65+
changes are discussed in the separate sections of this document.
66+
67+
2. Rename the `ByteReader` class to `LittleEndianByteReader`.
68+
69+
3. Add the conformance to the new `ByteReader` protocol to the renamed `LittleEndianByteReader` class.
70+
71+
4. Add a new class, `BigEndianByteReader`, which also conforms to the `ByteReader` protocol.
72+
73+
5. Make the `BitReader` protocol extend the `ByteReader` protocol.
74+
75+
6. Make both `L/MsbBitReader` no longer subclass the `ByteReader` class (which will be renamed to `LittleEndianByteReader`).
76+
77+
7. Change implementation of both `L/MsbBitReader` to explicitly read bytes in the Little Endian order (currently, they use
78+
superclass's implementation).
79+
80+
8. Mark all classes (`Little/BigEndianByteReader` and `L/MsbBitReader`) as `final`.
81+
82+
These changes should solve the issues from "Motivation" section.
83+
84+
### Alternatives considered
85+
86+
#### Add a `Self` constraint requirement to the `BitReader` protocol
87+
88+
```swift
89+
public protocol BitReader where Self: ByteReader {
90+
// ...
91+
}
92+
```
93+
94+
Ideally, this should express the idea that only something that is `ByteReader` (e.g. something that subclasses it) can
95+
conform to the `BitReader` protocol. Additionally, it naturally allows to remove any `ByteReader`'s methods and
96+
properties from the `BitReader` protocol. Unfortunatelly, the problem with this approach is that it is really unstable:
97+
when I first tried it (early 2018) for some Swift versions the compiler was crashing during compilation, and for others
98+
the resulting binary was experiencing undefined behavior. With this in mind, it is safe to say that this idea is off the
99+
table.
100+
101+
__Note (TODO):__ I haven't tested neither Swift 4.2 compiler nor development snapshots of Swift 5. I will update this
102+
document with the final results for them, but it is likely that it still won't work (and, frankly, I am not even sure if
103+
it is supposed to).
104+
105+
#### Use `ByteReaderProtocol` as a name for the new protocol
106+
107+
The proposed soultions is, probably, the most obvious one. The only hard part in it is, as always, naming:
108+
what would be the name of a protocol if we already have a class with `ByteReader` name? One can find inspiration in
109+
Swift standard library (particularly, `IteratorProtocol`) and come up with with an alternative protocol name,
110+
`ByteReaderProtocol`. While I haven't tested it, I have a feeling that this name would incur more source code changes on
111+
BitByteData's users. The reason for this feeling is that I expect that the most common usage scenario is to initialize a
112+
byte (bit) reader once and then pass it as an argument to other functions:
113+
114+
```swift
115+
func f(_ reader: ByteReader) {
116+
// ...
117+
}
118+
119+
let reader = LsbBitReader(data: data)
120+
f(reader)
121+
```
122+
123+
Since `L/MsbBitReader` are no longer subclasses of `ByteReader` the users would be required to
124+
change function declaration to:
125+
126+
```swift
127+
func (f_ reader: ByteReaderProtocol) {
128+
//...
129+
}
130+
```
131+
132+
And if users have a lot of functions with `ByteReader` arguments this could quickly get out of control. Additionally, if
133+
we were to introduce a new byte reader for Big Endian byte order (as proposed), we would likely still need to change the
134+
name of `ByteReader` class for a symmetry with the new Big Endian byte reader, which makes this alternative even worse.
135+
136+
#### Use `Le/BeByteReader` as names for the new and renamed classes
137+
138+
I don't think that these names are much better than the full spelling of Little and Big Endian. In practice, as was
139+
stated above, I don't expect the class names of byte readers to be used extensively, thus, I don't think that the
140+
tradeoff ratio between clarity and brevity is good in this case. On the contrast, the abbreviations were used for
141+
`L/MsbBitReader` because their full names are ridiculously long: `LeastSignificantBit` and `MostSignificantBit`
142+
respectively.
143+
144+
#### Add Big-Endian alternatives for `L/MsbBitReader`
145+
146+
As stated above, both `L/MsbBitReader` are going to explicitly support only Little Endian byte order. In theory, we could
147+
add classes with tentative names like `LsbBigEndianBitReader` (`LsbBeBitReader`?), etc. to support Big Endian byte order,
148+
but this would be extremely narrow feature for the price of the increased project size (2 more classes). Finally, it
149+
could be proposed to add a new enum for byte order, add a byte order argument to initializers, and then switch at runtime
150+
on byte order. Unfortunately, this would be detrimental to the overall performance, and thus, this is not considered.
151+
152+
#### Don't mark classes as `final`
153+
154+
This would allow to create new subclasses of byte and bit readers, but honestly, I can't imagine any use case for this.
155+
On the other hand marking classes as `final` enables some compiler optimizations.
156+
157+
## Add (more) means of conversion between readers
158+
159+
Currently, there exist converting initializers from ByteReader to `L/MsbBitReader`. With 2.0 update I am planning to add
160+
several more: from `L/MsbBitReader` to byte reader(s), and between `LsbBitReader` and `MsbBitReader`. While, I don't
161+
currently have any use case for these conversions, with the proposed above plan for restructuring BitByteData, these
162+
initializers would be extremely easy to implement and, potentially, even possible to implement as extension methods to
163+
corresponding protocols
164+
165+
## Add missing methods to protocols
166+
167+
In 1.x updates there were several new methods introduced both to readers and writers. For the reasons of backwards
168+
compatibility and SemVer, these new methods haven't been added to any of the protocols. The major 2.0 update is the
169+
perfect opportunity to introduce them as new protocol requirements. It is also possible that some of the new and old
170+
protocols' methods could be even provided with default implementation, but I haven't yet assessed that.
171+
172+
List of currently planned additions to the protocols:
173+
174+
1. `BitReader.advance(by:)`
175+
2. `BitWriter.write(unsignedNumber:bitsCount:)`
176+
3. Anything else that is added in 2.0 update
177+
178+
## Remove no argument versions of `uintX(fromBytes:)` methods
179+
180+
Currently, there are two versions of methods for reading bytes into an unsigned integer: with an argument, which allows
181+
to specify a number of bytes to read, and without the argument. The version without the argument reads the maximum
182+
possible amount of bytes possible to store in unsigned integer. These two versions exist, because at the time they were
183+
added it was possible to implement without-argument version in a more efficient way. But since the new implementation
184+
strategy of byte reading was introduced in 1.4.0, these manual optimizations became redundant and currently these
185+
no-argument methods just call their counterparts with arguments. In 2.0 update I want to remove these no-argument
186+
methods and add natural default argument values to the remaining methods, e.g.:
187+
188+
```swift
189+
func uint16(fromBytes count: Int = 2) -> UInt16
190+
```
191+
192+
## Check if a bit reader is aligned in `offset`'s `willSet` observer
193+
194+
The current philosophy for byte reading in bit readers is that it is only possible to read _bytes_ (and, probably, the
195+
only case when it makes sense) when a bit reader is "aligned" to the byte boundary. This is a carefully maintained
196+
invariant in all _methods_ of both `L/MsbBitReader`, but not in the `offset`'s setter. In other words, it is currently
197+
possible to change `L/MsbBitReader.offset` property when said bit reader is not aligned. This can lead to surprising and
198+
unpredictable behavior, and I am going to explicitly ban this by adding a precondition to the `offset`'s `willSet`
199+
property observer.
200+
201+
## Add methods for reading and writing bits of a negative number
202+
203+
The recent addition of `L/MsbBitWriter.write(unsignedNumber:bitsCount:)` methods made me realize that there is currently
204+
no way to read and write negative signed integer from and to bits (and bytes, for that matter). To resolve this problem,
205+
I am going to modify methods for reading and writing integers from and to bits as following:
206+
207+
```swift
208+
// Before:
209+
// func write(number: Int, bitsCount: Int)
210+
func write(number: Int, bitsCount: Int, signed: SignedNumberRepresentation = .none)
211+
212+
// Before:
213+
// func int(fromBits count: Int) -> Int
214+
func int(fromBits count: Int, signed: SignedNumberRepresentation = .none) -> Int
215+
```
216+
217+
`SignedNumerRepresenation` is a new enum which allows to choose an encoding scheme for a signed integer (see the
218+
[wikipedia article](https://en.wikipedia.org/wiki/Signed_number_representations)). The default value for `signed`
219+
argument is supposed to preserve current behavior with slight modification: it will write and read an integer disregarding
220+
its sign. This will be the same as the current behavior for positive integers. (Un)fortunately, it will be different for
221+
negative integers but in a good way, because its current behavior is more or less undefined. It depends on the values
222+
of the arguments (e.g. `number` and `bitsCount`), on the platform-specific bit width of `Int` type, and, possibly,
223+
something else.
224+
225+
Proposed cases for the new `SignedNumberRepresentation` enum:
226+
227+
```swift
228+
public enum SignedNumberRepresentation {
229+
case none
230+
case signMagnitude
231+
case oneComplement
232+
case twoComplement
233+
case biased(offset: Int) // Other options: offsetBinary, excessK
234+
case negativeBase
235+
}
236+
```
237+
238+
This addition will likely have negative impact on performance of the changed methods (because of the introduced switching
239+
on the value of the `signed` argument), but I believe this is an acceptable tradeoff (we get much more in terms of
240+
correctness and functionality). That said, if turns out to be a big enough problem, I will add a no-argument version of
241+
these methods, which will provide `.none` behavior and, at the same time remove `.none` case from the new enum and the
242+
default values for the `signed` argument.
243+
244+
## Other crazy ideas
245+
246+
This is the list of ideas that came to my mind, but they either are too breaking or I haven't assessed them at all. For
247+
these reasons they are very unlikely to be implemented in 2.0 update (or at all, FWIW), but I still decided to briefly
248+
mention them for completeness.
249+
250+
### Make `offset` property zero-based
251+
252+
Currently, `offset` property of all the readers mirrors the `Data`'s behavior: it is not necessarily zero-based. This
253+
"feature" is somewhat inconvenient from the implementation point of view. We could change it to be zero-based, but this
254+
would be extremely breaking change, and in a very subtle way. On the other hand, current behavior is consistent with the
255+
behavior of `Data` which maybe is a good thing.
256+
257+
### Remove `ByteReader.data` property
258+
259+
This property is not really used in BitByteData (apart from converting initializers, but they can be reimplemented
260+
without this property) and its supposed usage scenarios aren't immediately clear in theory. This makes it a candidate for
261+
removal, but there are two concerns:
262+
263+
1. We have to make sure that under no circumstances the internal pointer inside the readers becomes invalid. The presence
264+
of `data` property adds some guarantees for the validity of this pointer.
265+
2. My [other project](https://github.com/tsolomko/SWCompression) heavily uses this `data` property.
266+
267+
### Remove `ByteReader.offset` property
268+
269+
In theory, this property is an implementation detail of all readers and currently it is even true in practice: there is
270+
a "true" internal `_offset` property and public `offset` property is computed using it. We could remove it and instead
271+
directly increment internal pointer inside the readers, but there are still some semi-open questions remaining:
272+
273+
1. Changing the offset is an extremely valuable functionality. To replace it, we could provide a new method, `seek(by:)`,
274+
for this, but it would be extremely source-breaking.
275+
2. What about offset's getter? There are some situations where it is important to know the current offset of the reader.
276+
277+
The last question actually highlights another issue: indices in general in Swift world aren't necessarily zero-based. This
278+
leads to the problem that the value of the reader's offset is nonsensical without knowing what the base value is. This
279+
may be another argument in favor of removing `offset` property.
280+
281+
### Combine `bitMask` and `offset` properties of _bit_ readers
282+
283+
It is possible to add another property which represents continuous _bit_ offset, and compute _byte_ `offset` property
284+
from it (via `/ 8`). With this change it is possible to eliminate internal `bitMask` property and we won't have to check
285+
and update _byte_ offset on every bit read. Unfortunately, there is a problem: the maximum _byte_ offset in this case
286+
would be `Int.max / 8` and this is less than currently possible (`Int.max`).
287+
288+
### More ideas for BitByteData's architecture
289+
290+
#### Inclusion instead of inheritance
291+
292+
One other approach for the design of BitByteData is to use inclusion instead of inheritance to connect bit and byte
293+
readers. I haven't assessed this idea at all, and with the changes proposed above it is very unlikely that this idea
294+
is ever implemented, but it would be still an interesting thought experiment.
295+
296+
#### `ByteReader` as `Data` subclass
297+
298+
One could try to make `ByteReader` a subclass of `Data`. The only problem here, is that `Data` is a struct and it is
299+
impossible to subclass structs in Swift. As an alternative one could also try to sublcass `NSData` instead, but there is
300+
another set of completely unexplored questions (what about `NSMutableData`? what is the future for `NSData`?
301+
what about its Objective-C legacy? what about relationship between Linux and `NSData`?)
302+
303+
### Make `ByteReader` conform to `Sequence`/`IteratorProtocol`
304+
305+
It seems like byte and bit readers do have certain features of a `Sequence` (or maybe some other stdlib's protocol).
306+
In theory it may be useful to conform the readers to these protocols, or even make BitByteData's protocols to extend
307+
these protocols. This idea is actually one of the least crazy ones and likely to be considered at one point in the
308+
(distant?) future.
309+
310+
### Chage `offset` property's type to `UInt64`
311+
312+
In theory there is a problem that some extremely big data instances can lead to undesirable behavior, since `offset` is
313+
of type `Int` and `Int.max < UInt64.max`. Luckily, such big datas are very rare in practice (I haven't seen any). The
314+
other problem is that `Data.Index == Int`. Overall, this is problem to tackle in very-very distant future.

0 commit comments

Comments
 (0)