Commit e9f8e02

Added the md files for TableExtractor
1 parent 41f04e9 commit e9f8e02

5 files changed

Lines changed: 401 additions & 0 deletions

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
1+
---
2+
title: Extract tables from PDF and image documents in .NET | Syncfusion
3+
description: Syncfusion® Smart Table Extractor is a .NET library that extracts tabular data from PDF documents. It detects table regions, header rows, columns, cell spans (merged cells), and provides structured JSON.
4+
platform: SmartTableExtractor
5+
control: PDF
6+
documentation: UG
7+
keywords: Assemblies
8+
---
9+
# Assemblies Required to work with Smart Table Extractor
10+
11+
The following assemblies need to be referenced in your application based on the platform.
12+
<table>
13+
<thead>
14+
<tr>
15+
<th>Platform(s)</th>
16+
<th>Assembly</th>
17+
</tr>
18+
</thead>
19+
<tbody>
20+
<tr>
21+
<td>
22+
{{'[WPF](https://help.syncfusion.com/document-processing/pdf/pdf-library/net/create-pdf-file-in-wpf)'| markdownify }},
23+
{{'[Windows Forms](https://help.syncfusion.com/document-processing/pdf/pdf-library/net/create-pdf-file-in-windows-forms)'| markdownify }} and {{'[ASP.NET MVC](https://help.syncfusion.com/document-processing/pdf/pdf-library/net/create-pdf-file-in-asp-net-mvc)'| markdownify }}
24+
</td>
25+
<td>
26+
Syncfusion.Compression.Base<br/>
27+
Syncfusion.ImagePreProcessor.Base<br/>
28+
Syncfusion.OCRProcessor.Base<br/>
29+
Syncfusion.Pdf.Base<br/>
30+
Syncfusion.PdfToImageConverter.Base<br/>
31+
</td>
32+
</tr>
33+
<tr>
34+
<td>
35+
{{'[Blazor](https://help.syncfusion.com/document-processing/pdf/pdf-library/net/create-pdf-document-in-blazor)'| markdownify }},
36+
{{'[.NET Core](https://help.syncfusion.com/document-processing/pdf/pdf-library/net/create-pdf-file-in-asp-net-core)'| markdownify }}
37+
and {{'[.NET Platforms](https://help.syncfusion.com/document-processing/pdf/pdf-library/net/create-pdf-file-in-asp-net-mvc)'| markdownify }}
38+
</td>
39+
<td>
40+
Syncfusion.Compression.Portable<br/>
41+
Syncfusion.ImagePreProcessor.Portable<br/>
42+
Syncfusion.OCRProcessor.Portable<br/>
43+
Syncfusion.Pdf.Imaging.Portable<br/>
44+
Syncfusion.Pdf.Portable<br/>
45+
Syncfusion.PdfToImageConverter.Portable<br/>
46+
</td>
47+
</tr>
48+
<tr>
49+
<td>
50+
{{'[Windows UI library (WinUI)](https://help.syncfusion.com/document-processing/pdf/pdf-library/net/create-pdf-file-in-winui)'| markdownify }},
51+
{{'[.NET Multi-platform App UI (.NET MAUI)](https://help.syncfusion.com/document-processing/pdf/pdf-library/net/create-pdf-file-in-maui)'| markdownify }}
52+
</td>
53+
<td>
54+
Syncfusion.Compression.NET<br/>
55+
Syncfusion.ImagePreProcessor.NET<br/>
56+
Syncfusion.OCRProcessor.NET<br/>
57+
Syncfusion.Pdf.Imaging.NET<br/>
58+
Syncfusion.Pdf.NET<br/>
59+
Syncfusion.PdfToImageConverter.NET<br/>
60+
</td>
61+
</tr>
62+
</tbody>
63+
</table>
64+
65+
66+
Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
1+
---
2+
title: Extract tables from PDF and image documents in .NET | Syncfusion
3+
description: Syncfusion® Smart Table Extractor is a .NET library that extracts tabular data from PDF documents. It detects table regions, header rows, columns, cell spans (merged cells), and provides structured JSON.
4+
platform: documentProcessing
5+
control: SmartTableExtractor
6+
documentation: UG
7+
---
8+
9+
# How to resolve the “ONNX file missing” error in Smart Table Extractor
10+
11+
Problem:
12+
13+
When running Smart Table Extractor you may see an exception similar to the following:
14+
15+
```
16+
Microsoft.ML.OnnxRuntime.OnnxRuntimeException: '[ErrorCode:NoSuchFile] Load model from <path>\runtimes\models\syncfusion_doclayout.onnx failed. File doesn't exist'
17+
```
18+
19+
Cause:
20+
21+
This error occurs because the required ONNX model files (used internally for layout and data extraction) are not present in the application's build output (the project's `bin` runtime folder). The extractor expects the models under `runtimes\models` so the runtime can load them.
22+
23+
Solution:
24+
25+
1. Run a build so the application output is generated under `bin\Debug\netX.X\runtimes` (or your configured build configuration and target framework).
26+
2. Locate the project's build output `bin` path (for example: `bin\Debug\net6.0\runtimes`).
27+
3. Place all required ONNX model files into a `runtimes\models` folder inside that bin path (for example: `bin\Debug\net6.0\runtimes\models\syncfusion_doclayout.onnx`).
28+
4. In Visual Studio, for each ONNX file set **Properties → Copy to Output Directory → Copy always** so the model is included on every build.
29+
5. Rebuild and run your project. The extractor should now find the ONNX models and operate correctly.
30+
31+
Screenshot placeholder
32+
33+
Add a screenshot showing the exception or the `runtimes\models` folder layout. Save the image in the repo (suggested path relative to this file): `../images/onnx-missing.png` and include it here:
34+
35+
Notes:
36+
37+
- If you publish your application, ensure the `runtimes\models` folder and the ONNX files are included in the publish output. You may need to mark the files as content in the project file, for example with a `<Content>` item.
38+
- If you prefer an automated approach, add the ONNX files to your project with `CopyToOutputDirectory` set, or create a post-build step to copy the models into the runtime folder.
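
As a rough illustration of the automated approach in the notes above, the following project-file fragment is a minimal sketch. It assumes the ONNX files are kept in a `runtimes\models` folder inside the project directory; that source location is an assumption for illustration, not something the library prescribes. It uses the standard MSBuild `Content` item so the files are copied to the same relative path in the build and publish output.

```xml
<ItemGroup>
  <!-- Assumption: the model files live under runtimes\models in the project folder. -->
  <Content Include="runtimes\models\*.onnx">
    <!-- Copy to bin\<Configuration>\<TargetFramework>\runtimes\models on build. -->
    <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
    <!-- Also include the files when the application is published. -->
    <CopyToPublishDirectory>PreserveNewest</CopyToPublishDirectory>
  </Content>
</ItemGroup>
```

`PreserveNewest` copies a file only when it is newer than the copy in the output folder; use `Always` instead if you want the behavior of the **Copy always** setting described in the steps above.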
39+
40+
If the problem persists after adding the model files, verify file permissions and the correctness of the model file names.
Lines changed: 207 additions & 0 deletions
@@ -0,0 +1,207 @@
1+
---
2+
title: Extract tables from PDF and image documents in .NET | Syncfusion
3+
description: Syncfusion® Smart Table Extractor is a .NET library that extracts tabular data from PDF documents. It detects table regions, header rows, columns, cell spans (merged cells), and provides structured JSON.
4+
platform: SmartTableExtractor
5+
control: PDF
6+
documentation: UG
7+
---
8+
9+
# SmartTableExtractor Features
10+
11+
## Extract Tables from a PDF Document
12+
13+
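To extract structured table data from a PDF document, use the **ExtractTableAsJson** method of the **TableExtractor** class. The following code example sets all three extraction options (borderless table detection, page range, and confidence threshold) together: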
14+
15+
{% tabs %}
16+
17+
{% highlight c# tabtitle="C# [Cross-platform]" %}
18+
19+
using System.IO;
20+
using System.Text;
21+
using Syncfusion.SmartTableExtractor;
22+
23+
// Open the input PDF file as a stream.
24+
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
25+
{
26+
// Initialize the Smart Table Extractor
27+
TableExtractor extractor = new TableExtractor();
28+
29+
// Set all three options together
30+
TableExtractionOptions options = new TableExtractionOptions();
31+
options.DetectBorderlessTables = true;
32+
options.PageRange = new int[,] { { 1, 5 } };
33+
options.ConfidenceThreshold = 0.75;
34+
35+
extractor.TableExtractionOptions = options;
36+
37+
// Extract and save
38+
string data = extractor.ExtractTableAsJson(stream);
39+
File.WriteAllText("TableOutput_AllOptions.json", data, Encoding.UTF8);
40+
}
41+
42+
{% endhighlight %}
43+
44+
{% endtabs %}
45+
46+
## Extract Tables with Borderless Table Detection
47+
48+
To extract structured table data from a PDF document that contains tables without visible borders, enable **DetectBorderlessTables** and call the **ExtractTableAsJson** method of the **TableExtractor** class, as in the following code example:
49+
50+
{% tabs %}
51+
52+
{% highlight c# tabtitle="C# [Cross-platform]" %}
53+
54+
using System.IO;
55+
using System.Text;
56+
using Syncfusion.SmartTableExtractor;
57+
58+
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
59+
{
60+
// Initialize the Smart Table Extractor
61+
TableExtractor extractor = new TableExtractor();
62+
63+
// Set DetectBorderlessTables
64+
TableExtractionOptions options = new TableExtractionOptions();
65+
options.DetectBorderlessTables = true;
66+
67+
extractor.TableExtractionOptions = options;
68+
69+
// Extract and save
70+
string data = extractor.ExtractTableAsJson(stream);
71+
File.WriteAllText("Output.json", data, Encoding.UTF8);
72+
}
73+
74+
{% endhighlight %}
75+
76+
{% endtabs %}
77+
78+
## Extract Tables Within a Specific Page Range
79+
80+
To extract structured table data from a specific range of pages in a PDF document using the **ExtractTableAsJson** method of the **TableExtractor** class, refer to the following code example:
81+
82+
{% tabs %}
83+
84+
{% highlight c# tabtitle="C# [Cross-platform]" %}
85+
86+
using System.IO;
87+
using System.Text;
88+
using Syncfusion.SmartTableExtractor;
89+
90+
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
91+
{
92+
// Initialize the Smart Table Extractor
93+
TableExtractor extractor = new TableExtractor();
94+
95+
// Set only PageRange
96+
TableExtractionOptions options = new TableExtractionOptions();
97+
options.PageRange = new int[,] { { 2, 4 } };
98+
99+
extractor.TableExtractionOptions = options;
100+
101+
// Extract and save
102+
string data = extractor.ExtractTableAsJson(stream);
103+
File.WriteAllText("Output.json", data, Encoding.UTF8);
104+
}
105+
106+
{% endhighlight %}
107+
108+
{% endtabs %}
109+
110+
## Apply a Confidence Threshold to Table Extraction
111+
112+
To apply confidence thresholding when extracting table data from a PDF document using the **ExtractTableAsJson** method of the **TableExtractor** class, refer to the following code example:
113+
114+
{% tabs %}
115+
116+
{% highlight c# tabtitle="C# [Cross-platform]" %}
117+
118+
using System.IO;
119+
using System.Text;
120+
using Syncfusion.SmartTableExtractor;
121+
122+
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
123+
{
124+
// Initialize the Smart Table Extractor
125+
TableExtractor extractor = new TableExtractor();
126+
127+
// Set ConfidenceThreshold
128+
TableExtractionOptions options = new TableExtractionOptions();
129+
options.ConfidenceThreshold = 0.6;
130+
131+
extractor.TableExtractionOptions = options;
132+
133+
// Extract and save
134+
string data = extractor.ExtractTableAsJson(stream);
135+
File.WriteAllText("Output.json", data, Encoding.UTF8);
136+
}
137+
138+
{% endhighlight %}

{% endtabs %}
139+
140+
## Extract table data asynchronously from a PDF document
141+
142+
To extract table data asynchronously with cancellation support using the **ExtractTableAsJsonAsync** method of the **TableExtractor** class, refer to the following code example:
143+
144+
{% tabs %}
145+
146+
{% highlight c# tabtitle="C# [Cross-platform]" %}
147+
148+
using System;
using System.IO;
149+
using System.Text;
150+
using System.Threading;
151+
using Syncfusion.SmartTableExtractor;
152+
153+
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
154+
{
155+
// Declare and configure the extractor and options
156+
TableExtractionOptions extractionOptions = new TableExtractionOptions();
157+
extractionOptions.DetectBorderlessTables = true;
158+
extractionOptions.ConfidenceThreshold = 0.6;
159+
160+
TableExtractor tableExtractor = new TableExtractor();
161+
tableExtractor.TableExtractionOptions = extractionOptions;
162+
163+
// Create cancellation token with timeout
164+
var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));
165+
166+
// Call the async extraction API
167+
string data = await tableExtractor.ExtractTableAsJsonAsync(stream, cts.Token);
168+
169+
// Save the extracted data as JSON
170+
File.WriteAllText("TableOutput.json", data, Encoding.UTF8);
171+
}
172+
173+
174+
{% endhighlight %}
175+
176+
{% highlight c# tabtitle="C# [Windows-specific]" %}
177+
178+
using System;
using System.IO;
179+
using System.Text;
180+
using System.Threading;
181+
using Syncfusion.SmartTableExtractor;
182+
183+
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
184+
{
185+
// Declare and configure the extractor and options
186+
TableExtractionOptions extractionOptions = new TableExtractionOptions();
187+
extractionOptions.DetectBorderlessTables = true;
188+
extractionOptions.ConfidenceThreshold = 0.6;
189+
190+
TableExtractor tableExtractor = new TableExtractor();
191+
tableExtractor.TableExtractionOptions = extractionOptions;
192+
193+
// Create cancellation token with timeout
194+
var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));
195+
196+
// Call the async extraction API
197+
string data = await tableExtractor.ExtractTableAsJsonAsync(stream, cts.Token);
198+
199+
// Save the extracted data as JSON
200+
File.WriteAllText("TableOutput.json", data, Encoding.UTF8);
201+
}
202+
203+
204+
{% endhighlight %}
205+
206+
{% endtabs %}
207+
Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
1+
---
2+
title: Extract tables from PDF and image documents in .NET | Syncfusion
3+
description: Syncfusion® Smart Table Extractor is a .NET library that extracts tabular data from PDF documents. It detects table regions, header rows, columns, cell spans (merged cells), and provides structured JSON.
4+
platform: SmartTableExtractor
5+
control: PDF
6+
documentation: UG
7+
---
8+
# NuGet Packages Required for SmartTableExtractor
9+
10+
## Extract tables from PDF documents
11+
12+
To work with SmartTableExtractor, the following NuGet packages need to be installed in your application.
13+
14+
<table>
15+
<thead>
16+
<tr>
17+
<th><b>Platform(s)</b></th>
18+
<th><b>NuGet Package</b></th>
19+
</tr>
20+
</thead>
21+
<tr>
22+
<td>
23+
Windows Forms<br/>
24+
Console Application (Targeting .NET Framework)
25+
</td>
26+
<td>
27+
{{'[Syncfusion.SmartTableExtractor.WinForms.nupkg](https://www.nuget.org/packages/Syncfusion.SmartTableExtractor.WinForms/)'| markdownify }}
28+
</td>
29+
</tr>
30+
<tr>
31+
<td>
32+
WPF
33+
</td>
34+
<td>
35+
{{'[Syncfusion.SmartTableExtractor.Wpf.nupkg](https://www.nuget.org/packages/Syncfusion.SmartTableExtractor.Wpf/)'| markdownify }}
36+
</td>
37+
</tr>
38+
<tr>
39+
<td>
40+
ASP.NET MVC5
41+
</td>
42+
<td>
43+
{{'[Syncfusion.SmartTableExtractor.AspNet.Mvc5.nupkg](https://www.nuget.org/packages/Syncfusion.SmartTableExtractor.AspNet.Mvc5/)'| markdownify }}
44+
</td>
45+
</tr>
46+
<tr>
47+
<td>
48+
ASP.NET Core (Targeting .NET Core) <br/>
49+
Console Application (Targeting .NET Core) <br/>
50+
Blazor
51+
</td>
52+
<td>
53+
{{'[Syncfusion.SmartTableExtractor.Net.Core.nupkg](https://www.nuget.org/packages/Syncfusion.SmartTableExtractor.Net.Core/)'| markdownify }}
54+
</td>
55+
</tr>
56+
<tr>
57+
<td>
58+
Windows UI (WinUI) <br/>
59+
.NET Multi-platform App UI (.NET MAUI)
60+
</td>
61+
<td>
62+
{{'[Syncfusion.SmartTableExtractor.NET.nupkg](https://www.nuget.org/packages/Syncfusion.SmartTableExtractor.NET/)'| markdownify }}
63+
</td>
64+
</tr>
65+
</table>
66+
