|
1 | | -= LibXML Ruby |
2 | | - |
3 | | -== Overview |
4 | | -The libxml gem provides Ruby language bindings for GNOME's Libxml2 |
5 | | -XML toolkit. It is free software, released under the MIT License. |
6 | | - |
7 | | -We think libxml-ruby is the best XML library for Ruby because: |
8 | | - |
9 | | -* Speed - Its much faster than REXML and Hpricot |
10 | | -* Features - It provides an amazing number of featues |
11 | | -* Conformance - It passes all 1800+ tests from the OASIS XML Tests Suite |
12 | | - |
13 | | -== Requirements |
14 | | -libxml-ruby requires Ruby 3.0.0 or higher. It depends on libxml2 to |
15 | | -function properly. libxml2, in turn, depends on: |
16 | | - |
17 | | -* libm (math routines: very standard) |
18 | | -* libz (zlib) |
19 | | -* libiconv |
20 | | - |
21 | | -If you are running Linux or Unix you'll need a C compiler so the |
22 | | -extension can be compiled when it is installed. If you are running |
23 | | -Windows, then install the x64-mingw-ucr gem or build it yourself using (Ruby |
24 | | -for Windows)[https://rubyinstaller.org/] or directly with msys2[https://msys2.github.io/] |
25 | | -and ucrt64. |
26 | | - |
27 | | -== Installation |
28 | | -The easiest way to install libxml-ruby is via RubyGems. To install: |
29 | | - |
30 | | -<tt>gem install libxml-ruby</tt> |
31 | | - |
32 | | -If the extension compile process cannot find libxml2, you may need to indicate |
33 | | -the location of the libxml2 configuration utility as it is used to find the |
34 | | -required header and include files. (If you need to indicate a location for the |
35 | | -libxml2 library or header files different than reported by <tt>xml2-config</tt>, |
36 | | -see the additional configuration options.) |
37 | | - |
38 | | -This may be done with RubyGems: |
39 | | - |
40 | | -<tt>gem install libxml-ruby -- --with-xml2-dir=/path/to/xml2-config</tt> |
41 | | - |
42 | | -Or bundler: |
43 | | - |
44 | | -<tt>bundle config build.libxml-ruby --with-xml2-config=/path/to/xml2-config</tt> |
45 | | - |
46 | | -<tt>bundle install libxml-ruby</tt> |
47 | | - |
48 | | -If you are running Windows, then install the libxml-ruby-x64-mingw32 gem. |
49 | | -The gem includes prebuilt extensions for Ruby 3.2 and 3.3. |
50 | | - |
51 | | -The gem also includes a Microsoft VC++ solution and XCode project - these |
52 | | -are very useful for debugging. |
53 | | - |
54 | | -libxml-ruby's source codes lives on GitHub[https://github.com/xml4r/libxml-ruby]. |
55 | | - |
56 | | -== Getting Started |
57 | | -Using libxml is easy. First decide what parser you want to use: |
58 | | - |
59 | | -* Generally you'll want to use the LibXML::XML::Parser which provides a tree based API. |
60 | | -* For larger documents that don't fit into memory, or if you prefer an input based API, use the LibXML::XML::Reader. |
61 | | -* To parse HTML files use LibXML::XML::HTMLParser. |
62 | | -* If you are masochistic, then use the LibXML::XML::SaxParser, which provides a callback API. |
63 | | - |
64 | | -Once you have chosen a parser, choose a datasource. Libxml can parse files, strings, URIs |
65 | | -and IO streams. For each data source you can specify an LibXML::XML::Encoding, a base uri and |
66 | | -various parser options. For more information, refer the LibXML::XML::Parser.document, |
67 | | -LibXML::XML::Parser.file, LibXML::XML::Parser.io or LibXML:::XML::Parser.string methods (the |
68 | | -same methods are defined on all four parser classes). |
69 | | - |
70 | | -== Advanced Functionality |
71 | | -Beyond the basics of parsing and processing XML and HTML documents, |
72 | | -libxml provides a wealth of additional functionality. |
73 | | - |
74 | | -Most commonly, you'll want to use its LibXML::XML::XPath support, which makes |
75 | | -it easy to find data inside an XML document. Although not as popular, |
76 | | -LibXML::XML::XPointer provides another API for finding data inside an XML document. |
77 | | - |
78 | | -Often times you'll need to validate data before processing it. For example, |
79 | | -if you accept user generated content submitted over the Web, you'll |
80 | | -want to verify that it does not contain malicious code such as embedded scripts. |
81 | | -This can be done using libxml's powerful set of validators: |
82 | | - |
83 | | -* DTDs (LibXML::XML::Dtd) |
84 | | -* Relax Schemas (LibXML::XML::RelaxNG) |
85 | | -* XML Schema (LibXML::XML::Schema) |
86 | | - |
87 | | -Finally, if you'd like to use XSL Transformations to process data, then install |
88 | | -the {libxslt gem}[https://github.com/xml4r/libxslt-rubygem]. |
89 | | - |
90 | | -== Usage |
91 | | -For information about using libxml-ruby please refer to its |
92 | | -documentation[https://xml4r.github.io/libxml-ruby]. Some tutorials are also |
93 | | -available[https://github.com/xml4r/libxml-ruby/wiki]. |
94 | | - |
95 | | -All libxml classes are in the LibXML::XML module. The easiest |
96 | | -way to use libxml is to <tt>require 'xml'</tt>. This will mixin |
97 | | -the LibXML module into the global namespace, allowing you to |
98 | | -write code like this: |
99 | | - |
100 | | - require 'xml' |
101 | | - document = XML::Document.new |
102 | | - |
103 | | -However, when creating an application or library you plan to |
104 | | -redistribute, it is best to not add the LibXML module to the global |
105 | | -namespace, in which case you can either write your code like this: |
106 | | - |
107 | | - require 'libxml' |
108 | | - document = LibXML::XML::Document.new |
109 | | - |
110 | | -Or you can utilize a namespace for your own work and include LibXML into it. |
111 | | -For example: |
112 | | - |
113 | | - require 'libxml' |
114 | | - |
115 | | - module MyApplication |
116 | | - include LibXML |
117 | | - |
118 | | - class MyClass |
119 | | - def some_method |
120 | | - document = XML::Document.new |
121 | | - end |
122 | | - end |
123 | | - end |
124 | | - |
125 | | -For simplicity's sake, the documentation uses the xml module in its examples. |
126 | | - |
127 | | -== Tests |
128 | | - |
129 | | -To run tests you first need to build the shared libary: |
130 | | - |
131 | | - rake compile |
132 | | - |
133 | | -Once you have build the shared libary, you can then run tests using rake: |
134 | | - |
135 | | - rake test |
136 | | - |
137 | | -+Build status: {rdoc-image:https://github.com/xml4r/libxml-ruby/actions/workflows/mri.yml/badge.svg}[https://github.com/xml4r/libxml-ruby/actions/workflows/mri.yml] |
138 | | - |
139 | | -== Performance |
140 | | - |
141 | | -In addition to being feature rich and conformation, the main reason |
142 | | -people use libxml-ruby is for performance. Here are the results |
143 | | -of a couple simple benchmarks recently blogged about on the |
144 | | -Web (you can find them in the benchmark directory of the |
145 | | -libxml distribution). |
146 | | - |
147 | | -From http://depixelate.com/2008/4/23/ruby-xml-parsing-benchmarks |
148 | | - |
149 | | - user system total real |
150 | | - libxml 0.032000 0.000000 0.032000 ( 0.031000) |
151 | | - Hpricot 0.640000 0.031000 0.671000 ( 0.890000) |
152 | | - REXML 1.813000 0.047000 1.860000 ( 2.031000) |
153 | | - |
154 | | -From https://svn.concord.org/svn/projects/trunk/common/ruby/xml_benchmarks/ |
155 | | - |
156 | | - user system total real |
157 | | - libxml 0.641000 0.031000 0.672000 ( 0.672000) |
158 | | - hpricot 5.359000 0.062000 5.421000 ( 5.516000) |
159 | | - rexml 22.859000 0.047000 22.906000 ( 23.203000) |
160 | | - |
161 | | - |
162 | | -== Documentation |
163 | | -Documentation is available via rdoc, and is installed automatically with the |
164 | | -gem. |
165 | | - |
166 | | -libxml-ruby's {online |
167 | | -documentation}[https://xml4r.github.io/libxml-ruby/rdoc/index.html] is generated |
168 | | -using Hanna, which is a development gem dependency. |
169 | | - |
170 | | -Note that older versions of Rdoc, which ship with Ruby 1.8.x, will report |
171 | | -a number of errors. To avoid them, install Rdoc 2.1 or higher. Once you have |
172 | | -installed the gem, you'll have to disable the version of Rdoc that Ruby 1.8.x |
173 | | -includes. An easy way to do that is rename the directory |
174 | | -<tt>ruby/lib/ruby/1.8/rdoc</tt> to |
175 | | -<tt>ruby/lib/ruby/1.8/rdoc_old</tt>. |
176 | | - |
177 | | -== Support |
178 | | -If you have any questions about using libxml-ruby, please report an issue |
179 | | -on GitHub[https://github.com/xml4r/libxml-ruby/issues]. |
180 | | - |
181 | | -== Memory Management |
182 | | -libxml-ruby automatically manages memory associated with the |
183 | | -underlying libxml2 library. The bindings create a one-to-one mapping between |
184 | | -Ruby objects and libxml documents and libxml parent nodes (ie, nodes that do not |
185 | | -have a parent and do not belong to a document). In these cases, |
186 | | -the bindings manage the memory. They do this by installing a free |
187 | | -function and storing a back pointer to the Ruby object from the xmlnode |
188 | | -using the _private member on libxml structures. When the Ruby object |
189 | | -goes out of scope, the underlying libxml structure is freed. Libxml |
190 | | -itself then frees all child nodes (recursively). |
191 | | - |
192 | | -For all other nodes (the vast majority), the bindings create temporary |
193 | | -Ruby objects that get freed once they go out of scope. Thus there can be |
194 | | -more than one Ruby object pointing to the same xml node. To mostly hide |
195 | | -this from a programmer on the Ruby side, the <tt>#eql?</tt> and <tt>#==</tt> methods are |
196 | | -overriden to check if two Ruby objects wrap the same xmlnode. If they do, |
197 | | -then the methods return true. During the mark phase, each of these temporary |
198 | | -objects marks its owning document, thereby keeping the Ruby document object |
199 | | -alive and thus the xmldoc tree. |
200 | | - |
201 | | -In the sweep phase of the garbage collector, or when a program ends, |
202 | | -there is no order to how Ruby objects are freed. In fact, the Ruby document |
203 | | -object is almost always freed before any Ruby objects that wrap child nodes. |
204 | | -However, this is ok because those Ruby objects do not have a free function |
205 | | -and are no longer in scope (since if they were the document would not be freed). |
206 | | - |
207 | | -== License |
208 | | -See LICENSE for license information. |
| 1 | +# LibXML Ruby |
| 2 | + |
| 3 | +## Overview |
| 4 | +The libxml gem provides Ruby language bindings for GNOME's Libxml2 |
| 5 | +XML toolkit. It is free software, released under the MIT License. |
| 6 | + |
| 7 | +We think libxml-ruby is the best XML library for Ruby because: |
| 8 | + |
| 9 | +* Speed - It's much faster than REXML |
| 10 | +* Features - It provides an amazing number of features |
| 11 | +* Conformance - It passes all 1800+ tests from the OASIS XML Test Suite |
| 12 | + |
| 13 | +## Requirements |
| 14 | +libxml-ruby requires Ruby 3.2 or higher. It depends on libxml2 to |
| 15 | +function properly. libxml2, in turn, depends on: |
| 16 | + |
| 17 | +* libm (math routines: very standard) |
| 18 | +* libz (zlib) |
| 19 | +* libiconv |
| 20 | + |
| 21 | +If you are running Linux or Unix you'll need a C compiler so the |
| 22 | +extension can be compiled when it is installed. If you are running |
| 23 | +Windows, then install the x64-mingw-ucr gem or build it yourself using |
| 24 | +[Ruby for Windows](https://rubyinstaller.org/) or directly with |
| 25 | +[msys2](https://msys2.github.io/) and ucrt64. |
| 26 | + |
| 27 | +## Installation |
| 28 | +The easiest way to install libxml-ruby is via RubyGems. To install: |
| 29 | + |
| 30 | +``` |
| 31 | +gem install libxml-ruby |
| 32 | +``` |
| 33 | + |
| 34 | +If the extension compile process cannot find libxml2, you may need to indicate |
| 35 | +the location of the libxml2 configuration utility as it is used to find the |
| 36 | +required header and include files. (If you need to indicate a location for the |
| 37 | +libxml2 library or header files different than reported by `xml2-config`, |
| 38 | +see the additional configuration options.) |
| 39 | + |
| 40 | +This may be done with RubyGems: |
| 41 | + |
| 42 | +``` |
| 43 | +gem install libxml-ruby -- --with-xml2-config=/path/to/xml2-config |
| 44 | +``` |
| 45 | + |
| 46 | +Or bundler: |
| 47 | + |
| 48 | +``` |
| 49 | +bundle config build.libxml-ruby --with-xml2-config=/path/to/xml2-config |
| 50 | +bundle install libxml-ruby |
| 51 | +``` |
| 52 | + |
| 53 | +If you are running Windows, then install the libxml-ruby-x64-mingw32 gem. |
| 54 | +The gem includes prebuilt extensions for Ruby 3.2 and 3.3. |
| 55 | + |
| 56 | +The gem also includes a Microsoft VC++ solution and XCode project - these |
| 57 | +are very useful for debugging. |
| 58 | + |
| 59 | +libxml-ruby's source code lives on [GitHub](https://github.com/xml4r/libxml-ruby). |
| 60 | + |
| 61 | +## Getting Started |
| 62 | +Using libxml is easy. First decide what parser you want to use: |
| 63 | + |
| 64 | +* Generally you'll want to use the `LibXML::XML::Parser` which provides a tree based API. |
| 65 | +* For larger documents that don't fit into memory, or if you prefer an input based API, use the `LibXML::XML::Reader`. |
| 66 | +* To parse HTML files use `LibXML::XML::HTMLParser`. |
| 67 | +* If you are masochistic, then use the `LibXML::XML::SaxParser`, which provides a callback API. |
| 68 | + |
| 69 | +Once you have chosen a parser, choose a datasource. Libxml can parse files, strings, URIs |
| 70 | +and IO streams. For each data source you can specify a `LibXML::XML::Encoding`, a base URI and |
| 71 | +various parser options. For more information, refer to the `LibXML::XML::Parser.document`, |
| 72 | +`LibXML::XML::Parser.file`, `LibXML::XML::Parser.io` or `LibXML::XML::Parser.string` methods (the |
| 73 | +same methods are defined on all four parser classes). |
| 74 | + |
| 75 | +## Advanced Functionality |
| 76 | +Beyond the basics of parsing and processing XML and HTML documents, |
| 77 | +libxml provides a wealth of additional functionality. |
| 78 | + |
| 79 | +Most commonly, you'll want to use its `LibXML::XML::XPath` support, which makes |
| 80 | +it easy to find data inside an XML document. Although not as popular, |
| 81 | +`LibXML::XML::XPointer` provides another API for finding data inside an XML document. |
| 82 | + |
| 83 | +Often times you'll need to validate data before processing it. For example, |
| 84 | +if you accept user generated content submitted over the Web, you'll |
| 85 | +want to verify that it does not contain malicious code such as embedded scripts. |
| 86 | +This can be done using libxml's powerful set of validators: |
| 87 | + |
| 88 | +* DTDs (`LibXML::XML::Dtd`) |
| 89 | +* Relax Schemas (`LibXML::XML::RelaxNG`) |
| 90 | +* XML Schema (`LibXML::XML::Schema`) |
| 91 | + |
| 92 | +Finally, if you'd like to use XSL Transformations to process data, then install |
| 93 | +the [libxslt gem](https://github.com/xml4r/libxslt-ruby). |
| 94 | + |
| 95 | +## Usage |
| 96 | +For information about using libxml-ruby please refer to its |
| 97 | +[documentation](https://xml4r.github.io/libxml-ruby/). |
| 98 | + |
| 99 | +All libxml classes are in the `LibXML::XML` module. The easiest |
| 100 | +way to use libxml is to `require 'xml'`. This will mix |
| 101 | +the LibXML module into the global namespace, allowing you to |
| 102 | +write code like this: |
| 103 | + |
| 104 | +```ruby |
| 105 | +require 'xml' |
| 106 | +document = XML::Document.new |
| 107 | +``` |
| 108 | + |
| 109 | +However, when creating an application or library you plan to |
| 110 | +redistribute, it is best to not add the LibXML module to the global |
| 111 | +namespace, in which case you can either write your code like this: |
| 112 | + |
| 113 | +```ruby |
| 114 | +require 'libxml' |
| 115 | +document = LibXML::XML::Document.new |
| 116 | +``` |
| 117 | + |
| 118 | +Or you can utilize a namespace for your own work and include LibXML into it. |
| 119 | +For example: |
| 120 | + |
| 121 | +```ruby |
| 122 | +require 'libxml' |
| 123 | + |
| 124 | +module MyApplication |
| 125 | + include LibXML |
| 126 | + |
| 127 | + class MyClass |
| 128 | + def some_method |
| 129 | + document = XML::Document.new |
| 130 | + end |
| 131 | + end |
| 132 | +end |
| 133 | +``` |
| 134 | + |
| 135 | +For simplicity's sake, the documentation uses the xml module in its examples. |
| 136 | + |
| 137 | +## Tests |
| 138 | + |
| 139 | +To run tests you first need to build the shared library: |
| 140 | + |
| 141 | +``` |
| 142 | +rake compile |
| 143 | +``` |
| 144 | + |
| 145 | +Once you have built the shared library, you can then run tests using rake: |
| 146 | + |
| 147 | +``` |
| 148 | +rake test |
| 149 | +``` |
| 150 | + |
| 151 | +[](https://github.com/xml4r/libxml-ruby/actions/workflows/mri.yml) |
| 152 | + |
| 153 | +## Documentation |
| 154 | +Documentation is available at [xml4r.github.io/libxml-ruby](https://xml4r.github.io/libxml-ruby/). |
| 155 | + |
| 156 | +API reference documentation is generated via rdoc and is available at |
| 157 | +[xml4r.github.io/libxml-ruby/reference](https://xml4r.github.io/libxml-ruby/reference/). |
| 158 | + |
| 159 | +## Support |
| 160 | +If you have any questions about using libxml-ruby, please report an issue |
| 161 | +on [GitHub](https://github.com/xml4r/libxml-ruby/issues). |
| 162 | + |
| 163 | +## License |
| 164 | +See [LICENSE](LICENSE) for license information. |
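
A note on the "Getting Started" and "Advanced Functionality" sections in the new README: they describe choosing a parser, choosing a data source, and then querying with XPath, but show no end-to-end example. The following is a minimal sketch of that flow, not text from the commit. It assumes the libxml-ruby gem is installed and loads through `require 'libxml'` as the Usage section states. The sample document, the `SAMPLE_XML` and `LIBXML_AVAILABLE` constants, and the `book_titles` helper are all hypothetical names invented for illustration.

```ruby
# Sketch only: assumes the libxml-ruby gem is installed and loadable via
# `require 'libxml'`, as the README's Usage section describes.
begin
  require 'libxml'
  LIBXML_AVAILABLE = true
rescue LoadError
  # Fall back gracefully so the script still runs without the gem.
  LIBXML_AVAILABLE = false
end

# Hypothetical sample document, invented for illustration.
SAMPLE_XML = <<~DOC
  <library>
    <book><title>Dune</title></book>
    <book><title>Emma</title></book>
  </library>
DOC

# Parse a string with the tree-based parser, then collect the text of every
# <title> element using XPath. Returns nil when the gem is not available.
def book_titles(xml_string)
  return nil unless LIBXML_AVAILABLE

  document = LibXML::XML::Parser.string(xml_string).parse
  document.find('//book/title').map(&:content)
end

titles = book_titles(SAMPLE_XML)
puts(titles ? titles.inspect : 'libxml-ruby is not installed')
```

The fully qualified `LibXML::XML::Parser` name is used here so that nothing is mixed into the global namespace, which matches the README's advice for redistributable code.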