Bug 22891868 OHASD does not restart CRSD when crsd.bin is hanging

This note gives a brief overview of bug 22891868.

The content was last updated on: 28-JUN-2018


Affects

Product (Component): Oracle Server (PCW)

Range of versions believed to be affected: Versions BELOW 12.2

Versions confirmed as being affected

12.1.0.2 (Server Patch Set)

Platforms affected: Generic (all / most platforms affected)

Fixed

The fix for 22891868 is first included in

12.2.0.1 (Base Release)

12.1.0.2.170418 (Apr 2017) Grid Infrastructure Patch Set Update (GI PSU)

12.1.0.2.170418 (Apr 2017) Bundle Patch for Windows Platforms

Interim patches may be available for earlier versions.
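The affected and fixed versions above reduce to a simple rule: anything below 12.2 is believed to be affected unless it is 12.1.0.2 at or after the 170418 (Apr 2017) GI PSU or Windows bundle patch. The following is a minimal Python sketch of that rule. It assumes version strings such as `12.1.0.2.170418`, where the fifth field is the patch date, and assumes later PSUs are cumulative. The function names are hypothetical, and interim patches are not detected.

```python
from __future__ import annotations

# Assumption: for 12.1.0.2 the fifth version field is the GI PSU / Windows
# bundle patch date (YYMMDD). Interim (one-off) patches are NOT detected.
FIX_PSU_12102 = 170418  # 12.1.0.2.170418 (Apr 2017)


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '12.1.0.2.170418' into (12, 1, 0, 2, 170418)."""
    return tuple(int(part) for part in version.strip().split("."))


def includes_fix_22891868(version: str) -> bool:
    """Return True if this version (per the note's fix list) includes the fix."""
    v = parse_version(version)
    if v[:2] >= (12, 2):
        # First included in the 12.2.0.1 base release.
        return True
    if v[:4] == (12, 1, 0, 2) and len(v) >= 5:
        # Assumes PSUs/bundle patches are cumulative, so later dates qualify too.
        return v[4] >= FIX_PSU_12102
    # Everything else below 12.2 is believed to be affected.
    return False


if __name__ == "__main__":
    for ver in ["11.2.0.4", "12.1.0.2.160719", "12.1.0.2.170418", "12.2.0.1"]:
        print(ver, "fixed" if includes_fix_22891868(ver) else "possibly affected")
```

A result of "possibly affected" only means the version predates the fixes listed in this note; an interim patch may still be installed.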

Symptoms

(None Specified)

Related To

Cluster Ready Services / Parallel Server Management

Description

This bug is only relevant when using Real Application Clusters (RAC).

OHASD may not restart CRSD when crsd.bin is hanging.

Rediscovery Notes

1. The following sequence of events may occur:

Time 1. CRSD hangs.

Time 2. OHASD is terminated.

Time 3. CRSD is still hanging.

Time 4. OHASD restarts.

2. A call stack of OHASD shows that a check action on ASM is running:

....
clsn_agent::CrsCmd::ClscrsCmdData::stat(clsagfw_aectx const*, std::map<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&, CLSCRS_STATFLAG, bool) ()
#16 0x00000000006fe968 in clsn_agent::CrsCmd::ClscrsCmdData::stat(clsagfw_aectx const*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, CLSCRS_STATFLAG, bool) ()
#17 0x00000000006f805f in clsn_agent::CrsCmd::stat(clsagfw_aectx const*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, CLSCRS_FLAG, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, CLSCRS_STATFLAG, bool) ()
#18 0x000000000048104f in clsn_agent::AsmAgent::checkCbk(clsagfw_aectx const*, clsn_agent::Gimh*, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&) ()
#19 0x0000000000554d16 in clsn_agent::InstAgent::checkState(clsagfw_aectx const*) ()
#20 0x0000000000551fbb in clsn_agent::InstAgent::check(clsagfw_aectx const*) ()
#21 0x000000000045f7a1 in clsn_agent::Agent::commonCheck(clsagfw_aectx const*) ()
#22 0x0000000000508510 in clsn_agent::check(clsagfw_aectx const*) ()
#23 0x000000000098cf80 in cls_agfw::Cmd::execute() ()
#24 0x0000000000990cfd in cls_agfw::CmdEx::executeCmd(cls::Message*) ()
#25 0x0000000000990b4f in cls_agfw::CmdEx::clsRequestHdlr(cls::Message*) ()
#26 0x00000000009fd333 in cls::ThreadModel::processQueue(sltstid*) ()
#27 0x00000000009fbe54 in cls::ThreadModel::runTM(void*) ()
#28 0x0000000000a098fb in CLS_Threading::CLSthreadMain::cppStart(void*) ()

3. Only when the check action on ASM completes (or is aborted) does OHASD try to start CRSD. This can take around 20 minutes.
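To help with rediscovery, the following is a minimal Python sketch that scans a captured OHASD stack for the two frames seen above, `clsn_agent::AsmAgent::checkCbk` and `clsn_agent::CrsCmd::stat`. It assumes the stack text is in the gdb-style `#N 0x... in function(...)` form shown in this note; the function names and the matching rule are hypothetical heuristics, not an Oracle tool.

```python
from __future__ import annotations

import re

# Frame names copied from the OHASD stack in this note; their presence
# together suggests the ASM check action is still running.
SIGNATURE_FRAMES = (
    "clsn_agent::AsmAgent::checkCbk",
    "clsn_agent::CrsCmd::stat",
)

# Matches lines such as "#18 0x000000000048104f in clsn_agent::AsmAgent::checkCbk(".
FRAME_RE = re.compile(r"^#(\d+)\s+0x[0-9a-fA-F]+\s+in\s+([\w:~]+)\(")


def frame_functions(stack_text: str) -> list[str]:
    """Return the function name from each numbered frame line."""
    names = []
    for line in stack_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            names.append(match.group(2))
    return names


def asm_check_in_progress(stack_text: str) -> bool:
    """True if every signature frame appears somewhere in the stack."""
    names = set(frame_functions(stack_text))
    return all(frame in names for frame in SIGNATURE_FRAMES)


if __name__ == "__main__":
    sample = (
        "#17 0x00000000006f805f in clsn_agent::CrsCmd::stat(clsagfw_aectx const*) ()\n"
        "#18 0x000000000048104f in clsn_agent::AsmAgent::checkCbk(clsagfw_aectx const*) ()\n"
        "#19 0x0000000000554d16 in clsn_agent::InstAgent::checkState(clsagfw_aectx const*) ()"
    )
    print(asm_check_in_progress(sample))  # True
```

Requiring both frames, rather than either one, keeps the check close to the pattern described in this note; it does not by itself prove that this bug is the cause.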

Workaround

Start the CRSD resource by running:

crsctl start res ora.crsd -init
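If the workaround has to be applied from a script, the following is a minimal Python sketch that runs exactly the command above and returns its exit status. It assumes `crsctl` from the Grid Infrastructure home is on `PATH` and that the script runs with sufficient privileges (usually root). The function names are hypothetical, and the runner is injectable so that the command can be checked without a cluster.

```python
import subprocess
from typing import Callable, Optional, Sequence

# The command documented in the workaround.
WORKAROUND_CMD = ("crsctl", "start", "res", "ora.crsd", "-init")


def _run_real(cmd: Sequence[str]) -> int:
    """Execute cmd and return its exit status (non-zero exit does not raise)."""
    return subprocess.run(list(cmd), check=False).returncode


def start_crsd(run: Optional[Callable[[Sequence[str]], int]] = None) -> int:
    """Start the ora.crsd init resource and return the command's exit status."""
    runner = run if run is not None else _run_real
    return runner(WORKAROUND_CMD)
```

On an affected node, calling `start_crsd()` runs the real command. Passing a fake runner instead lets you verify the command line before using it on a production cluster.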

Note: This fix depends on the fix for bug 8934841.

DBRECOVER Recovery Options

For Oracle incidents, start with the DBRECOVER for Oracle trial to verify table visibility, row previews, and export readiness on copied datafiles. For MySQL and InnoDB incidents, DBRECOVER for MySQL is free software and can inspect .ibd files, ibdata1, and database directories locally.

When the case is urgent, preserve the original files first, work from copies, and contact paid emergency support with the database version, platform, error messages, file list, and recovery objective.
