|
| 1 | +# 2.0 Plans |
| 2 | + |
| 3 | +In this document I am going to outline changes that I am planning to implement in the next major update of BitByteData, |
| 4 | +version 2.0. I am writing this document with two goals in mind. First, to provide an opportunity to give feedback on |
| 5 | +the proposed changes. Secondly, for historical reasons: if at some point in the future I am wondering what were the |
| 6 | +reasons for doing any particular change, I will be able to open this document and remind myself of them. Finally, I |
| 7 | +would like to mention that all these changes are more or less breaking (according to SemVer, which I am trying to follow |
| 8 | +in this project) and this is why I considering them only for the major update. |
| 9 | + |
| 10 | +__Note:__ For breviety, in this text the phrase "`LsbBitReader` and `MsbBitReader`" is combined into `L/MsbBitReader`. |
| 11 | + |
| 12 | +## Improve class and protocol structure of BitByteData |
| 13 | + |
| 14 | +### Motivation |
| 15 | + |
| 16 | +The key idea that was always supposed to be emphasized in the class-protocol structure of BitByteData is that all the |
| 17 | +bit readers are also byte readers. Currently, this idea is expressed via `L/MsbBitReader` subclassing `ByteReader`. The |
| 18 | +disadvantage of this approach is that the `BitReader` protocol, which both `L/MsbBitReader` conform to, cannot inherit |
| 19 | +from `ByteReader` class. This is simply impossible within Swift for a protocol to inherit from a class. The problem with |
| 20 | +that arises when you have a variable of the protocol type `BitReader` instead of the concrete class `L/MsbBitReader`. |
| 21 | +In this situation you cannot use any byte-reading methods, despite `L/MsbBitReader` being subclasses of `ByteReader`. To |
| 22 | +solve this problem, I had to add all byte-reading methods to `BitReader` protocol. |
| 23 | + |
| 24 | +The second motivating problem is the evident hole in the API of BitByteData: absence of any means to read bytes in the |
| 25 | +Big Endian order. The proposed restructuring is a good opportunity to introduce a new class for this purpose, which may |
| 26 | +be useful in some (though, admittedly, niche) situations. |
| 27 | + |
| 28 | +### Proposed solution |
| 29 | + |
| 30 | +1. Add a new _protocol_ `ByteReader` (sic!) with the following API: |
| 31 | + |
| 32 | +```swift |
| 33 | +public protocol ByteReader { |
| 34 | + |
| 35 | + var size: Int { get } |
| 36 | + |
| 37 | + var data: Data { get } |
| 38 | + |
| 39 | + var offset: Int { get set } |
| 40 | + |
| 41 | + var isFinished: Bool { get } |
| 42 | + |
| 43 | + var bytesLeft: Int { get } |
| 44 | + |
| 45 | + var bytesRead: Int { get } |
| 46 | + |
| 47 | + init(data: Data) |
| 48 | + |
| 49 | + func byte() -> UInt8 |
| 50 | + |
| 51 | + func bytes(count: Int) -> [UInt8] |
| 52 | + |
| 53 | + func int(fromBytes count: Int) -> Int |
| 54 | + |
| 55 | + func uint64(fromBytes count: Int) -> UInt64 |
| 56 | + |
| 57 | + func uint32(fromBytes count: Int) -> UInt32 |
| 58 | + |
| 59 | + func uint16(fromBytes count: Int) -> UInt16 |
| 60 | + |
| 61 | +} |
| 62 | +``` |
| 63 | + |
| 64 | +__Note:__ This protocol differs from the byte reading APIs of the `BitReader` protocol's current version. These |
| 65 | +changes are discussed in the separate sections of this document. |
| 66 | + |
| 67 | +2. Rename the `ByteReader` class to `LittleEndianByteReader`. |
| 68 | + |
| 69 | +3. Add the conformance to the new `ByteReader` protocol to the renamed `LittleEndianByteReader` class. |
| 70 | + |
| 71 | +4. Add a new class, `BigEndianByteReader`, which also conforms to the `ByteReader` protocol. |
| 72 | + |
| 73 | +5. Make the `BitReader` protocol extend the `ByteReader` protocol. |
| 74 | + |
| 75 | +6. Make both `L/MsbBitReader` no longer subclass the `ByteReader` class (which will be renamed to `LittleEndianByteReader`). |
| 76 | + |
| 77 | +7. Change implementation of both `L/MsbBitReader` to explicitly read bytes in the Little Endian order (currently, they use |
| 78 | + superclass's implementation). |
| 79 | + |
| 80 | +8. Mark all classes (`Little/BigEndianByteReader` and `L/MsbBitReader`) as `final`. |
| 81 | + |
| 82 | +These changes should solve the issues from "Motivation" section. |
| 83 | + |
| 84 | +### Alternatives considered |
| 85 | + |
| 86 | +#### Add a `Self` constraint requirement to the `BitReader` protocol |
| 87 | + |
| 88 | +```swift |
| 89 | +public protocol BitReader where Self: ByteReader { |
| 90 | + // ... |
| 91 | +} |
| 92 | +``` |
| 93 | + |
| 94 | +Ideally, this should express the idea that only something that is `ByteReader` (e.g. something that subclasses it) can |
| 95 | +conform to the `BitReader` protocol. Additionally, it naturally allows to remove any `ByteReader`'s methods and |
| 96 | +properties from the `BitReader` protocol. Unfortunatelly, the problem with this approach is that it is really unstable: |
| 97 | +when I first tried it (early 2018) for some Swift versions the compiler was crashing during compilation, and for others |
| 98 | +the resulting binary was experiencing undefined behavior. With this in mind, it is safe to say that this idea is off the |
| 99 | +table. |
| 100 | + |
| 101 | +__Note (TODO):__ I haven't tested neither Swift 4.2 compiler nor development snapshots of Swift 5. I will update this |
| 102 | +document with the final results for them, but it is likely that it still won't work (and, frankly, I am not even sure if |
| 103 | +it is supposed to). |
| 104 | + |
| 105 | +#### Use `ByteReaderProtocol` as a name for the new protocol |
| 106 | + |
| 107 | +The proposed soultions is, probably, the most obvious one. The only hard part in it is, as always, naming: |
| 108 | +what would be the name of a protocol if we already have a class with `ByteReader` name? One can find inspiration in |
| 109 | +Swift standard library (particularly, `IteratorProtocol`) and come up with with an alternative protocol name, |
| 110 | +`ByteReaderProtocol`. While I haven't tested it, I have a feeling that this name would incur more source code changes on |
| 111 | +BitByteData's users. The reason for this feeling is that I expect that the most common usage scenario is to initialize a |
| 112 | +byte (bit) reader once and then pass it as an argument to other functions: |
| 113 | + |
| 114 | +```swift |
| 115 | +func f(_ reader: ByteReader) { |
| 116 | + // ... |
| 117 | +} |
| 118 | + |
| 119 | +let reader = LsbBitReader(data: data) |
| 120 | +f(reader) |
| 121 | +``` |
| 122 | + |
| 123 | +Since `L/MsbBitReader` are no longer subclasses of `ByteReader` the users would be required to |
| 124 | +change function declaration to: |
| 125 | + |
| 126 | +```swift |
| 127 | +func (f_ reader: ByteReaderProtocol) { |
| 128 | + //... |
| 129 | +} |
| 130 | +``` |
| 131 | + |
| 132 | +And if users have a lot of functions with `ByteReader` arguments this could quickly get out of control. Additionally, if |
| 133 | +we were to introduce a new byte reader for Big Endian byte order (as proposed), we would likely still need to change the |
| 134 | +name of `ByteReader` class for a symmetry with the new Big Endian byte reader, which makes this alternative even worse. |
| 135 | + |
| 136 | +#### Use `Le/BeByteReader` as names for the new and renamed classes |
| 137 | + |
| 138 | +I don't think that these names are much better than the full spelling of Little and Big Endian. In practice, as was |
| 139 | +stated above, I don't expect the class names of byte readers to be used extensively, thus, I don't think that the |
| 140 | +tradeoff ratio between clarity and brevity is good in this case. On the contrast, the abbreviations were used for |
| 141 | +`L/MsbBitReader` because their full names are ridiculously long: `LeastSignificantBit` and `MostSignificantBit` |
| 142 | +respectively. |
| 143 | + |
| 144 | +#### Add Big-Endian alternatives for `L/MsbBitReader` |
| 145 | + |
| 146 | +As stated above, both `L/MsbBitReader` are going to explicitly support only Little Endian byte order. In theory, we could |
| 147 | +add classes with tentative names like `LsbBigEndianBitReader` (`LsbBeBitReader`?), etc. to support Big Endian byte order, |
| 148 | +but this would be extremely narrow feature for the price of the increased project size (2 more classes). Finally, it |
| 149 | +could be proposed to add a new enum for byte order, add a byte order argument to initializers, and then switch at runtime |
| 150 | +on byte order. Unfortunately, this would be detrimental to the overall performance, and thus, this is not considered. |
| 151 | + |
| 152 | +#### Don't mark classes as `final` |
| 153 | + |
| 154 | +This would allow to create new subclasses of byte and bit readers, but honestly, I can't imagine any use case for this. |
| 155 | +On the other hand marking classes as `final` enables some compiler optimizations. |
| 156 | + |
| 157 | +## Add (more) means of conversion between readers |
| 158 | + |
| 159 | +Currently, there exist converting initializers from ByteReader to `L/MsbBitReader`. With 2.0 update I am planning to add |
| 160 | +several more: from `L/MsbBitReader` to byte reader(s), and between `LsbBitReader` and `MsbBitReader`. While, I don't |
| 161 | +currently have any use case for these conversions, with the proposed above plan for restructuring BitByteData, these |
| 162 | +initializers would be extremely easy to implement and, potentially, even possible to implement as extension methods to |
| 163 | +corresponding protocols |
| 164 | + |
| 165 | +## Add missing methods to protocols |
| 166 | + |
| 167 | +In 1.x updates there were several new methods introduced both to readers and writers. For the reasons of backwards |
| 168 | +compatibility and SemVer, these new methods haven't been added to any of the protocols. The major 2.0 update is the |
| 169 | +perfect opportunity to introduce them as new protocol requirements. It is also possible that some of the new and old |
| 170 | +protocols' methods could be even provided with default implementation, but I haven't yet assessed that. |
| 171 | + |
| 172 | +List of currently planned additions to the protocols: |
| 173 | + |
| 174 | +1. `BitReader.advance(by:)` |
| 175 | +2. `BitWriter.write(unsignedNumber:bitsCount:)` |
| 176 | +3. Anything else that is added in 2.0 update |
| 177 | + |
| 178 | +## Remove no argument versions of `uintX(fromBytes:)` methods |
| 179 | + |
| 180 | +Currently, there are two versions of methods for reading bytes into an unsigned integer: with an argument, which allows |
| 181 | +to specify a number of bytes to read, and without the argument. The version without the argument reads the maximum |
| 182 | +possible amount of bytes possible to store in unsigned integer. These two versions exist, because at the time they were |
| 183 | +added it was possible to implement without-argument version in a more efficient way. But since the new implementation |
| 184 | +strategy of byte reading was introduced in 1.4.0, these manual optimizations became redundant and currently these |
| 185 | +no-argument methods just call their counterparts with arguments. In 2.0 update I want to remove these no-argument |
| 186 | +methods and add natural default argument values to the remaining methods, e.g.: |
| 187 | + |
| 188 | +```swift |
| 189 | +func uint16(fromBytes count: Int = 2) -> UInt16 |
| 190 | +``` |
| 191 | + |
| 192 | +## Check if a bit reader is aligned in `offset`'s `willSet` observer |
| 193 | + |
| 194 | +The current philosophy for byte reading in bit readers is that it is only possible to read _bytes_ (and, probably, the |
| 195 | +only case when it makes sense) when a bit reader is "aligned" to the byte boundary. This is a carefully maintained |
| 196 | +invariant in all _methods_ of both `L/MsbBitReader`, but not in the `offset`'s setter. In other words, it is currently |
| 197 | +possible to change `L/MsbBitReader.offset` property when said bit reader is not aligned. This can lead to surprising and |
| 198 | +unpredictable behavior, and I am going to explicitly ban this by adding a precondition to the `offset`'s `willSet` |
| 199 | +property observer. |
| 200 | + |
| 201 | +## Add methods for reading and writing bits of a negative number |
| 202 | + |
| 203 | +The recent addition of `L/MsbBitWriter.write(unsignedNumber:bitsCount:)` methods made me realize that there is currently |
| 204 | +no way to read and write negative signed integer from and to bits (and bytes, for that matter). To resolve this problem, |
| 205 | +I am going to modify methods for reading and writing integers from and to bits as following: |
| 206 | + |
| 207 | +```swift |
| 208 | +// Before: |
| 209 | +// func write(number: Int, bitsCount: Int) |
| 210 | +func write(number: Int, bitsCount: Int, signed: SignedNumberRepresentation = .none) |
| 211 | + |
| 212 | +// Before: |
| 213 | +// func int(fromBits count: Int) -> Int |
| 214 | +func int(fromBits count: Int, signed: SignedNumberRepresentation = .none) -> Int |
| 215 | +``` |
| 216 | + |
| 217 | +`SignedNumerRepresenation` is a new enum which allows to choose an encoding scheme for a signed integer (see the |
| 218 | +[wikipedia article](https://en.wikipedia.org/wiki/Signed_number_representations)). The default value for `signed` |
| 219 | +argument is supposed to preserve current behavior with slight modification: it will write and read an integer disregarding |
| 220 | +its sign. This will be the same as the current behavior for positive integers. (Un)fortunately, it will be different for |
| 221 | +negative integers but in a good way, because its current behavior is more or less undefined. It depends on the values |
| 222 | +of the arguments (e.g. `number` and `bitsCount`), on the platform-specific bit width of `Int` type, and, possibly, |
| 223 | +something else. |
| 224 | + |
| 225 | +Proposed cases for the new `SignedNumberRepresentation` enum: |
| 226 | + |
| 227 | +```swift |
| 228 | +public enum SignedNumberRepresentation { |
| 229 | + case none |
| 230 | + case signMagnitude |
| 231 | + case oneComplement |
| 232 | + case twoComplement |
| 233 | + case biased(offset: Int) // Other options: offsetBinary, excessK |
| 234 | + case negativeBase |
| 235 | +} |
| 236 | +``` |
| 237 | + |
| 238 | +This addition will likely have negative impact on performance of the changed methods (because of the introduced switching |
| 239 | +on the value of the `signed` argument), but I believe this is an acceptable tradeoff (we get much more in terms of |
| 240 | +correctness and functionality). That said, if turns out to be a big enough problem, I will add a no-argument version of |
| 241 | +these methods, which will provide `.none` behavior and, at the same time remove `.none` case from the new enum and the |
| 242 | +default values for the `signed` argument. |
| 243 | + |
| 244 | +## Other crazy ideas |
| 245 | + |
| 246 | +This is the list of ideas that came to my mind, but they either are too breaking or I haven't assessed them at all. For |
| 247 | +these reasons they are very unlikely to be implemented in 2.0 update (or at all, FWIW), but I still decided to briefly |
| 248 | +mention them for completeness. |
| 249 | + |
| 250 | +### Make `offset` property zero-based |
| 251 | + |
| 252 | +Currently, `offset` property of all the readers mirrors the `Data`'s behavior: it is not necessarily zero-based. This |
| 253 | +"feature" is somewhat inconvenient from the implementation point of view. We could change it to be zero-based, but this |
| 254 | +would be extremely breaking change, and in a very subtle way. On the other hand, current behavior is consistent with the |
| 255 | +behavior of `Data` which maybe is a good thing. |
| 256 | + |
| 257 | +### Remove `ByteReader.data` property |
| 258 | + |
| 259 | +This property is not really used in BitByteData (apart from converting initializers, but they can be reimplemented |
| 260 | +without this property) and its supposed usage scenarios aren't immediately clear in theory. This makes it a candidate for |
| 261 | +removal, but there are two concerns: |
| 262 | + |
| 263 | +1. We have to make sure that under no circumstances the internal pointer inside the readers becomes invalid. The presence |
| 264 | + of `data` property adds some guarantees for the validity of this pointer. |
| 265 | +2. My [other project](https://github.com/tsolomko/SWCompression) heavily uses this `data` property. |
| 266 | + |
| 267 | +### Remove `ByteReader.offset` property |
| 268 | + |
| 269 | +In theory, this property is an implementation detail of all readers and currently it is even true in practice: there is |
| 270 | +a "true" internal `_offset` property and public `offset` property is computed using it. We could remove it and instead |
| 271 | +directly increment internal pointer inside the readers, but there are still some semi-open questions remaining: |
| 272 | + |
| 273 | +1. Changing the offset is an extremely valuable functionality. To replace it, we could provide a new method, `seek(by:)`, |
| 274 | + for this, but it would be extremely source-breaking. |
| 275 | +2. What about offset's getter? There are some situations where it is important to know the current offset of the reader. |
| 276 | + |
| 277 | +The last question actually highlights another issue: indices in general in Swift world aren't necessarily zero-based. This |
| 278 | +leads to the problem that the value of the reader's offset is nonsensical without knowing what the base value is. This |
| 279 | +may be another argument in favor of removing `offset` property. |
| 280 | + |
| 281 | +### Combine `bitMask` and `offset` properties of _bit_ readers |
| 282 | + |
| 283 | +It is possible to add another property which represents continuous _bit_ offset, and compute _byte_ `offset` property |
| 284 | +from it (via `/ 8`). With this change it is possible to eliminate internal `bitMask` property and we won't have to check |
| 285 | +and update _byte_ offset on every bit read. Unfortunately, there is a problem: the maximum _byte_ offset in this case |
| 286 | +would be `Int.max / 8` and this is less than currently possible (`Int.max`). |
| 287 | + |
| 288 | +### More ideas for BitByteData's architecture |
| 289 | + |
| 290 | +#### Inclusion instead of inheritance |
| 291 | + |
| 292 | +One other approach for the design of BitByteData is to use inclusion instead of inheritance to connect bit and byte |
| 293 | +readers. I haven't assessed this idea at all, and with the changes proposed above it is very unlikely that this idea |
| 294 | +is ever implemented, but it would be still an interesting thought experiment. |
| 295 | + |
| 296 | +#### `ByteReader` as `Data` subclass |
| 297 | + |
| 298 | +One could try to make `ByteReader` a subclass of `Data`. The only problem here, is that `Data` is a struct and it is |
| 299 | +impossible to subclass structs in Swift. As an alternative one could also try to sublcass `NSData` instead, but there is |
| 300 | +another set of completely unexplored questions (what about `NSMutableData`? what is the future for `NSData`? |
| 301 | +what about its Objective-C legacy? what about relationship between Linux and `NSData`?) |
| 302 | + |
| 303 | +### Make `ByteReader` conform to `Sequence`/`IteratorProtocol` |
| 304 | + |
| 305 | +It seems like byte and bit readers do have certain features of a `Sequence` (or maybe some other stdlib's protocol). |
| 306 | +In theory it may be useful to conform the readers to these protocols, or even make BitByteData's protocols to extend |
| 307 | +these protocols. This idea is actually one of the least crazy ones and likely to be considered at one point in the |
| 308 | +(distant?) future. |
| 309 | + |
| 310 | +### Chage `offset` property's type to `UInt64` |
| 311 | + |
| 312 | +In theory there is a problem that some extremely big data instances can lead to undesirable behavior, since `offset` is |
| 313 | +of type `Int` and `Int.max < UInt64.max`. Luckily, such big datas are very rare in practice (I haven't seen any). The |
| 314 | +other problem is that `Data.Index == Int`. Overall, this is problem to tackle in very-very distant future. |
0 commit comments