Overview
A user has been archiving SenSage AP event data to Centera devices in different data centers. After some servers were cleaned up or decommissioned, queries against a certain date range now fail, because the archived data on the decommissioned nearline storage is no longer available:
| Showing all categories of error messages
|
| *ERROR*
| ERROR_MSG : Internal Error: Retrieval of object 'object_name.gz' from near line storage <nearline_storage_identifier/storage_id> failed NSS Reason: Exception: CentCommon.cpp:63: 0x7f486df0bb28 CenteraAPI(FPClip_Open| FP_SERVER_ERR - The server reported an error from the operation
. . .
| ERROR_CODE : 0080022
| ERROR_TYPE : SYSTEM_NONRECOVERABLE
. . .
| STACK_TRACE : [Host:<hostname>; IP:<ip_address>; Port:8072; App:XMLRPC; RPC:XMLRPC_METHOD; PID:19889; Function:CDSMPhysicalNode::findPhysicalManifestation; Locus:DSMPhysicalNode.cpp@2225]
A similar error may also appear in the frontend.
Information
If the data was moved rather than lost, can the references in the database that point to the archived data be updated to point to its new location?
The Nearline Storage Service (NSS) uses the NSI (Nearline Storage Identifier) and Centera clip IDs to retrieve the data from the Nearline Storage Device.
If the data was moved to a new device, a new NSI pointing to it should be configured. The NODE.dat files of all the corresponding archived leaves, in both the Primary and Secondary SLS dsroot of every SLS node for that table, would then need to be edited manually to point to the new NSI.
There is no Sensage command available to change the NSI of archived data, as it is set during archival and is not designed to be changed afterwards:
<NodeMetaData_v1>
<NodeType>LEAF</NodeType>
<Compression>HIGH</Compression>
<FragmentCount>1</FragmentCount>
<RecordCount>165776</RecordCount>
<SmallestTimestamp>2001-08-26T11:02:00.000000Z</SmallestTimestamp>
<LargestTimestamp>2001-09-26T23:51:25.000000Z</LargestTimestamp>
<NearLine>
<Fragment>
<Number>0</Number>
<NSI>centera1</NSI>
<StorageID>DBLT4OB5CLV3Ue9F6A2HOEVP3P8G41BUCF4A0R0Q14JP5RHBL47Q7</StorageID>
<ExpirationDate>1510099526</ExpirationDate>
<Tampered>0</Tampered>
</Fragment>
</NearLine>
<AltPath>../1970-01-01_00h00m00.000000sGMT-18.d</AltPath>
<TransactionId>549E9797D534F432836ABDA9B1F4FDA1</TransactionId>
<LogicalChildren>
</LogicalChildren>
<UID>2883CA65DC668E2A929DBCA09502A60D</UID>
</NodeMetaData_v1>
If moving the data causes the Centera to assign new clip IDs, the data cannot be recovered, because the StorageID recorded in the metadata would no longer match the data the NSS requests. The attached Administration Guide explains how archiving works in more detail.
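The manual NODE.dat edit described above can be sketched in Python. This is only an illustration under assumptions: it treats NODE.dat as plain XML shaped like the sample shown, and the function names `list_fragments` and `repoint_nsi` are hypothetical helpers, not part of any Sensage tooling. It works on the file contents as a string, so reading, backing up, and writing the actual files on each SLS node is left to the operator.

```python
import re
import xml.etree.ElementTree as ET


def list_fragments(node_xml: str):
    """Return (NSI, StorageID) pairs for every archived fragment.

    Assumes the NodeMetaData_v1 layout shown in the sample above.
    """
    root = ET.fromstring(node_xml)
    return [
        (frag.findtext("NSI"), frag.findtext("StorageID"))
        for frag in root.iter("Fragment")
    ]


def repoint_nsi(node_xml: str, old_nsi: str, new_nsi: str) -> str:
    """Swap <NSI>old_nsi</NSI> for <NSI>new_nsi</NSI>.

    A targeted text substitution is used instead of re-serializing the
    XML so that the rest of the file stays exactly as it was.
    The StorageID (clip ID) is deliberately left unchanged.
    """
    pattern = re.compile(r"<NSI>" + re.escape(old_nsi) + r"</NSI>")
    return pattern.sub(lambda _m: "<NSI>" + new_nsi + "</NSI>", node_xml)


if __name__ == "__main__":
    # Trimmed-down stand-in for a NODE.dat file (hypothetical values).
    sample = (
        "<NodeMetaData_v1>\n"
        "<NearLine>\n<Fragment>\n<Number>0</Number>\n"
        "<NSI>centera1</NSI>\n<StorageID>EXAMPLECLIPID</StorageID>\n"
        "</Fragment>\n</NearLine>\n"
        "</NodeMetaData_v1>\n"
    )
    print(list_fragments(sample))
    print(repoint_nsi(sample, "centera1", "centera2"))
```

Because only the NSI element is touched, the edit can only work if the clip IDs on the new device are unchanged, which matches the limitation stated above.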
If the data is found, can it be re-ingested into the Sensage database? Is there a way to recover the data?
Nearline storage keeps the data in a different format that cannot be used for re-ingestion, so the archived data cannot be recovered outside of the NSS.
If the data cannot be recovered, is there a way to get Sensage to ignore that section of data?
The error occurs because the SLS tries to access archived data that is unreachable at the location configured for the NSI. If the archived data cannot be found, it can be ignored by retiring the data with the FORCE option. For example:
atquery ... -e "RETIRE from syslog BEFORE _time('FEB 01 00:00:00 2020') FORCE";
This clears out the leaves pointing to the missing archived data. The same command can also be used to delete data within a certain time range.
Because the data was archived, retiring it removes the references to the unreachable archived data, so queries on that table and time range no longer report an error when the reports run. Instead, the reports come back empty for the time ranges that were archived and return results as normal for data still in the main SLS.
Note: Removing the NSI does not help. The SLS would still reference it in the metadata inside the leaves, and the error would merely change to report a missing NSI. The best option is therefore to retire the archived data so that the references to it are removed completely. Once this is done, the reports should no longer fail.
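When several tables or cutoff dates are involved, the RETIRE statement can be generated rather than typed by hand. The following Python sketch only builds the statement text; `build_retire_statement` is a hypothetical helper, the timestamp layout simply mirrors the single example above, and the atquery connection arguments (elided as `...` in the example) are not filled in here.

```python
from datetime import datetime

# Month abbreviations are spelled out so the result does not depend on locale.
_MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
           "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def build_retire_statement(table: str, cutoff: datetime) -> str:
    """Build a RETIRE ... FORCE statement for data before `cutoff`.

    The layout 'MON DD HH:MM:SS YYYY' is copied from the example above;
    other layouts accepted by _time() are not assumed.
    """
    stamp = _MONTHS[cutoff.month - 1] + " " + cutoff.strftime("%d %H:%M:%S %Y")
    return "RETIRE from " + table + " BEFORE _time('" + stamp + "') FORCE"


if __name__ == "__main__":
    print(build_retire_statement("syslog", datetime(2020, 2, 1)))
```

The generated string is what would be passed to `atquery` with the `-e` option, as in the example above.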
The NSS Cache
Partial recovery may be possible from the NSS cache and from cached report results, but both only hold data from reports that were executed while the NSI was still available.
The Nearline Storage Server (NSS) that controls the archival of the data has its own cache. Some reports and queries may execute successfully even though the archived data no longer exists. This happens when those reports were executed shortly before the decommissioning, so their data was saved to the cache. You can confirm this by disabling the NSS cache, after which the reports fail.
The NSS cache might therefore help to recover some of the data, namely whatever is returned by reports that still run thanks to the cache. This is hit or miss, however: the cache uses a format we cannot track, and it is only queried when reports try to access the archived data.