|
1218 | 1218 | Transaction Failure |
1219 | 1219 | </a> |
1220 | 1220 |
|
| 1221 | + <nav class="md-nav" aria-label="Transaction Failure"> |
| 1222 | + <ul class="md-nav__list"> |
| 1223 | + |
| 1224 | + <li class="md-nav__item"> |
| 1225 | + <a href="#recurring-transaction-recovery" class="md-nav__link"> |
| 1226 | + Recurring Transaction Recovery |
| 1227 | + </a> |
| 1228 | + |
| 1229 | +</li> |
| 1230 | + |
| 1231 | + </ul> |
| 1232 | + </nav> |
| 1233 | + |
1221 | 1234 | </li> |
1222 | 1235 |
|
1223 | 1236 | <li class="md-nav__item"> |
|
2009 | 2022 | Transaction Failure |
2010 | 2023 | </a> |
2011 | 2024 |
|
| 2025 | + <nav class="md-nav" aria-label="Transaction Failure"> |
| 2026 | + <ul class="md-nav__list"> |
| 2027 | + |
| 2028 | + <li class="md-nav__item"> |
| 2029 | + <a href="#recurring-transaction-recovery" class="md-nav__link"> |
| 2030 | + Recurring Transaction Recovery |
| 2031 | + </a> |
| 2032 | + |
| 2033 | +</li> |
| 2034 | + |
| 2035 | + </ul> |
| 2036 | + </nav> |
| 2037 | + |
2012 | 2038 | </li> |
2013 | 2039 |
|
2014 | 2040 | <li class="md-nav__item"> |
@@ -2074,17 +2100,71 @@ <h2 id="transaction-failure">Transaction Failure</h2> |
2074 | 2100 | caused. It is suggested to run the transaction repair process on a |
2075 | 2101 | separate machine connected to the cluster to isolate failures. Configure |
2076 | 2102 | a separately controlled process to run the following where the start |
2077 | | -time specifies the time since epoch where the recovery process should |
2078 | | -start reading from the write-ahead log. |
2079 | | -<div class="highlight"><pre><span></span><code><span class="n">recovery</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">JanusGraphFactory</span><span class="o">.</span><span class="na">startTransactionRecovery</span><span class="o">(</span><span class="n">graph</span><span class="o">,</span><span class="w"> </span><span class="n">startTime</span><span class="o">,</span><span class="w"> </span><span class="n">TimeUnit</span><span class="o">.</span><span class="na">MILLISECONDS</span><span class="o">);</span> |
| 2103 | +time (a Java Instant, that is, a point in time since the epoch) specifies where the recovery |
| 2104 | +process should start reading from the write-ahead log. |
| 2105 | +<div class="highlight"><pre><span></span><code><span class="n">recovery</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">JanusGraphFactory</span><span class="o">.</span><span class="na">startTransactionRecovery</span><span class="o">(</span><span class="n">graph</span><span class="o">,</span><span class="w"> </span><span class="n">startTime</span><span class="o">);</span> |
2080 | 2106 | </code></pre></div></p> |
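<p>A minimal sketch of how a <code>startTime</code> value could be derived, assuming the goal is to look back a fixed amount of time from now. The class and method names are hypothetical and not part of JanusGraph; only <code>java.time</code> calls are used, and the resulting <code>Instant</code> would then be passed to the recovery call shown above.</p>

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper for illustration only; not part of the JanusGraph API.
final class StartTimeExample {

    // Returns the point in time that lies `lookBack` before `now`.
    static Instant startTimeFor(Instant now, Duration lookBack) {
        return now.minus(lookBack);
    }

    public static void main(String[] args) {
        // Look for write-ahead log entries from the last twenty hours.
        Instant startTime = startTimeFor(Instant.now(), Duration.ofHours(20));
        System.out.println("Recovery would start reading from: " + startTime);
    }
}
```

<p>Passing <code>now</code> as a parameter instead of calling <code>Instant.now()</code> inside the helper keeps the computation deterministic and easy to check.</p>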
| 2107 | +<p>Once started, the recovery process runs indefinitely and stops |
| 2108 | +only if:</p> |
| 2109 | +<ol> |
| 2110 | +<li>it is stopped manually by calling <code>recovery.shutdown()</code></li> |
| 2111 | +<li>it encounters an error and fails with an exception</li> |
| 2112 | +<li>the graph is closed</li> |
| 2113 | +</ol> |
| 2114 | +<p>While the recovery process runs, calling <code>recovery.getStatistics()</code> reports |
| 2115 | +its progress by returning three numbers:</p> |
| 2116 | +<ol> |
| 2117 | +<li>the first number shows how many secondary persistence transactions succeeded</li> |
| 2118 | +<li>the second number shows how many secondary persistence transactions failed |
| 2119 | +and for which a recovery was attempted</li> |
| 2120 | +<li>the third number shows how many failed secondary persistence transactions |
| 2121 | +could not be recovered</li> |
| 2122 | +</ol> |
| 2123 | +<p>Depending on the <code>startTime</code> value and the configured <code>log.tx.read-interval</code> |
| 2124 | +option, the recovery process might need to run for hours in |
| 2125 | +order to process all relevant entries from the write-ahead log. |
| 2126 | +<code>startTime</code> defines the point in time from which the write-ahead |
| 2127 | +log should be read. The log is read once every <code>log.tx.read-interval</code> |
| 2128 | +milliseconds, and a chunk of approximately 100 seconds is processed in |
| 2129 | +each iteration.</p> |
| 2130 | +<p>For example, if <code>startTime</code> is set to look for log entries from |
| 2131 | +the last twenty hours (72 000 seconds) and <code>log.tx.read-interval</code> is set to |
| 2132 | +5 000 ms (5 seconds), it takes at least about one hour |
| 2133 | +(72 000 / 100 * 5 = 3 600 seconds = 1 hour) until all log entries are processed. |
| 2134 | +Note: if there are many failed secondary persistence transactions, the recovery |
| 2135 | +process might take much longer because repairing those transactions takes time.</p> |
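<p>The estimate above can be sketched as a small calculation. This is only an illustration under the assumptions stated in the text (roughly 100 seconds of log processed per read iteration, one iteration per read interval); the class, method, and constant names are hypothetical and not part of JanusGraph, and the result is a lower bound that ignores the time spent repairing transactions.</p>

```java
// Hypothetical helper for illustration only; not part of the JanusGraph API.
final class RecoveryEstimate {

    // Assumption taken from the text: each iteration covers about 100 seconds of log.
    static final long CHUNK_SECONDS = 100;

    // Lower-bound estimate, in seconds, for reading a log window of the given length.
    static long estimateSeconds(long logWindowSeconds, long readIntervalMillis) {
        // Number of iterations needed to cover the window, rounded up.
        long iterations = (logWindowSeconds + CHUNK_SECONDS - 1) / CHUNK_SECONDS;
        // One iteration is started per read interval.
        return iterations * readIntervalMillis / 1000;
    }

    public static void main(String[] args) {
        // Twenty hours of log with a 5 000 ms read interval.
        long seconds = estimateSeconds(72_000, 5_000);
        System.out.println(seconds + " seconds"); // prints "3600 seconds"
    }
}
```

<p>Under the same assumptions, lowering the read interval shortens the estimate proportionally.</p>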
| 2136 | +<p>When a write-ahead log entry that needs to be repaired is found, |
| 2137 | +the following INFO level message appears in JanusGraph's |
| 2138 | +log output, with the transaction ID between the |
| 2139 | +square brackets:</p> |
| 2140 | +<div class="highlight"><pre><span></span><code>Attempting to repair partially failed transaction [...] |
| 2141 | +</code></pre></div> |
| 2142 | +<p>Even if the recovery process was stopped by calling <code>recovery.shutdown()</code>, |
| 2143 | +starting it again for the same graph results in the error <code>Provided read marker is not compatible |
| 2144 | +with existing read marker for previously registered readers</code>. |
| 2145 | +To avoid the error, close the graph with <code>graph.close()</code> before |
| 2146 | +starting the process again.</p> |
2081 | 2147 | <p>Enabling the transaction write-ahead log causes an additional write |
2082 | 2148 | operation for mutating transactions which increases the latency. Also |
2083 | 2149 | note that additional space is required to store the log. The |
2084 | 2150 | transaction write-ahead log has a configurable time-to-live of 2 days |
2085 | 2151 | which means that log entries expire after that time to keep the storage |
2086 | 2152 | overhead small. Refer to <a href="../../configs/configuration-reference/">Configuration Reference</a> for a complete list of all |
2087 | 2153 | log related configuration options to fine tune logging behavior.</p> |
| 2154 | +<h3 id="recurring-transaction-recovery">Recurring Transaction Recovery</h3> |
| 2155 | +<p>With daily data ingestion, transaction recovery needs to run on a recurring basis to ensure that |
| 2156 | +both primary and secondary persistence (e.g. indexing data by the mixed index backend) |
| 2157 | +of the data succeeds each day.</p> |
| 2158 | +<p>Since <code>JanusGraphFactory.startTransactionRecovery()</code> is not meant to be executed |
| 2159 | +repeatedly, JanusGraph provides a dedicated way to run transaction recovery multiple |
| 2160 | +times on the same graph:</p> |
| 2161 | +<div class="highlight"><pre><span></span><code><span class="n">recovery</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">JanusGraphFactory</span><span class="o">.</span><span class="na">startRecurringTransactionRecovery</span><span class="o">(</span><span class="n">graph</span><span class="o">,</span><span class="w"> </span><span class="n">startTime</span><span class="o">);</span> |
| 2162 | +</code></pre></div> |
| 2163 | +<p>Like the normal transaction recovery process, the recurring transaction recovery |
| 2164 | +process takes the <code>graph</code> and <code>startTime</code> parameters and provides the same <code>getStatistics()</code> |
| 2165 | +and <code>shutdown()</code> methods.</p> |
| 2166 | +<p>Once the process is stopped, <code>startRecurringTransactionRecovery()</code> can be used |
| 2167 | +to start the process again from the same or from another start time.</p> |
2088 | 2168 | <h2 id="janusgraph-instance-failure">JanusGraph Instance Failure</h2> |
2089 | 2169 | <p>JanusGraph is robust against individual instance failure in that other |
2090 | 2170 | instances of the JanusGraph cluster are not impacted by such failure and |
|