Commit 024ae41

nirvdrum authored and eregon committed
Remove unnecessary work in negotiating the encoding to use in a Regexp match.
The old approach did not align with MRI and ended up negotiating the encoding twice. The result of the first negotiation was used only to detect errors and was then ignored. Moreover, that additional negotiation was semantically different from what MRI does. Part of this clean-up involved implementing MRI's `rb_reg_prepare_enc` and using it for all match operations. As an additional benefit of synchronizing with MRI's implementation, we now handle a warning case that we previously missed, which allows us to untag some MRI tests.
1 parent 4f76e4d commit 024ae41
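
For readers unfamiliar with `rb_reg_prepare_enc`, the decision order it applies can be sketched in plain Ruby. This is a rough paraphrase from memory of MRI's `re.c`, not the code added by this commit, and details may differ between Ruby versions. The helper name `prepare_enc_sketch` and its `emit_warning:` parameter are hypothetical.

```ruby
# Hypothetical helper: a simplified sketch of the encoding negotiation,
# assuming the decision order used by MRI's rb_reg_prepare_enc.
# Returns the encoding the match would run in, or raises.
def prepare_enc_sketch(regexp, string, emit_warning: true)
  unless string.valid_encoding?
    raise ArgumentError, "regex against invalid byte sequence in #{string.encoding}"
  end

  re_enc = regexp.encoding
  str_enc = string.encoding
  seven_bit = string.ascii_only?
  mismatch = "incompatible encoding regexp match (#{re_enc} regexp with #{str_enc} string)"

  if re_enc == str_enc
    # Same encoding on both sides: nothing to negotiate.
    str_enc
  elsif seven_bit && re_enc == Encoding::US_ASCII
    # A 7-bit string can be matched directly by a US-ASCII regexp.
    re_enc
  elsif !str_enc.ascii_compatible?
    raise Encoding::CompatibilityError, mismatch
  elsif regexp.fixed_encoding?
    # A fixed-encoding regexp only accepts 7-bit strings, and only if
    # its own encoding is ASCII-compatible.
    raise Encoding::CompatibilityError, mismatch if !re_enc.ascii_compatible? || !seven_bit
    re_enc
  else
    # The /n case: matching non-ASCII, non-binary data only warns.
    no_encoding = (regexp.options & Regexp::NOENCODING) != 0
    if emit_warning && no_encoding && str_enc != Encoding::BINARY && !seven_bit
      warn "historical binary regexp match /.../n against #{str_enc} string"
    end
    str_enc
  end
end
```

The branches map onto the specs touched by this commit: the final `else` corresponds to the new `/n` warning spec, and the `fixed_encoding?` branch corresponds to the two new `Encoding::CompatibilityError` specs.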

1 file changed: 13 additions & 0 deletions

language/regexp/encoding_spec.rb

@@ -38,6 +38,10 @@
     /#{/./}/n.match("\303\251").to_a.should == ["\303"]
   end
 
+  it "warns when using /n with a match string with non-ASCII characters and an encoding other than ASCII-8BIT" do
+    -> { /./n.match("\303\251".force_encoding('utf-8')) }.should complain(%r{historical binary regexp match /.../n against UTF-8 string})
+  end
+
   it 'uses US-ASCII as /n encoding if all chars are 7-bit' do
     /./n.encoding.should == Encoding::US_ASCII
   end
@@ -117,6 +121,15 @@
     -> { /\A[[:space:]]*\z/ =~ " ".encode("UTF-16LE") }.should raise_error(Encoding::CompatibilityError)
   end
 
+  it "raises Encoding::CompatibilityError when the regexp has a fixed, non-ASCII-compatible encoding" do
+    -> { Regexp.new("".force_encoding("UTF-16LE"), Regexp::FIXEDENCODING) =~ " ".encode("UTF-8") }.should raise_error(Encoding::CompatibilityError)
+  end
+
+  it "raises Encoding::CompatibilityError when the regexp has a fixed encoding and the match string has non-ASCII characters" do
+    -> { Regexp.new("".force_encoding("US-ASCII"), Regexp::FIXEDENCODING) =~ "\303\251".force_encoding('UTF-8') }.should raise_error(Encoding::CompatibilityError)
+  end
+
+
   it "computes the Regexp Encoding for each interpolated Regexp instance" do
     make_regexp = -> str { /#{str}/ }
 
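
The added specs rely on the mspec `complain` and `raise_error` matchers. Outside mspec, the same three expectations can be checked roughly as below. This is a minimal sketch, assuming an interpreter that follows MRI's behavior here and has warnings enabled. The helpers `capture_stderr` and `compatibility_error?` are hypothetical names, not part of mspec or of this commit.

```ruby
require "stringio"

# Hypothetical helper: runs the block and returns whatever it wrote to $stderr.
def capture_stderr
  original_stderr = $stderr
  original_verbose = $VERBOSE
  $stderr = StringIO.new
  $VERBOSE = false # warnings are suppressed entirely when $VERBOSE is nil
  yield
  $stderr.string
ensure
  $stderr = original_stderr
  $VERBOSE = original_verbose
end

# Hypothetical helper: true if the block raises Encoding::CompatibilityError.
def compatibility_error?
  yield
  false
rescue Encoding::CompatibilityError
  true
end

# Build the strings with String.new / dup so nothing mutates a literal.
e_acute = "\303\251".dup.force_encoding("UTF-8")
fixed_utf16 = Regexp.new(String.new("", encoding: "UTF-16LE"), Regexp::FIXEDENCODING)
fixed_ascii = Regexp.new(String.new("", encoding: "US-ASCII"), Regexp::FIXEDENCODING)

# /n against a non-ASCII UTF-8 string should only warn.
warning_text = capture_stderr { /./n.match(e_acute) }
warned = warning_text.include?("historical binary regexp match /.../n against UTF-8 string")

# Fixed-encoding regexps should refuse incompatible match strings.
utf16_rejected = compatibility_error? { fixed_utf16 =~ " ".encode("UTF-8") }
ascii_rejected = compatibility_error? { fixed_ascii =~ e_acute }

puts "warned: #{warned}, utf16 rejected: #{utf16_rejected}, ascii rejected: #{ascii_rejected}"
```

Swapping `$stderr` for a `StringIO` is only a stand-in for what a warning matcher does; it is not safe in multi-threaded code.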