Troubleshooting in MAS vs Maximo 7.6: The Complete Comparison Guide
Series: MAS ADMIN | Part 7 of 8
Read Time: 22-28 minutes
Who this is for: Maximo administrators transitioning from 7.6 to MAS who need a practical translation of their troubleshooting instincts into MAS-native workflows. Also valuable for new MAS admins who want to understand what veteran admins already know about diagnosing Maximo issues -- and how that knowledge maps to the new platform.
Introduction: Two Admins, Same Problem, Different Worlds
Picture this. It is a Tuesday at 2:15 PM. The integration between Maximo and the ERP system has stopped processing purchase requisitions. Procurement is on the phone. Accounts payable is escalating. The plant manager wants answers.
In 2019, the Maximo 7.6 admin would have been SSH'd into the server within 90 seconds, tailing logs, querying the database, and restarting the integration framework -- all before the second phone call came in.
In 2026, the MAS admin faces the same urgency but reaches for entirely different tools. There is no SSH. There is no direct database. The application runs across a cluster of containers orchestrated by Kubernetes. The integration is managed by an API-first service layer.
We have guided hundreds of administrators through this transition. The good news is that every troubleshooting instinct you developed over years of 7.6 administration remains valuable. The workflows change, but the diagnostic thinking does not. This guide is your Rosetta Stone.
The Mental Model Shift
Before we dive into specific scenarios, we need to name the fundamental shift.
Maximo 7.6 Power Model: CONTROL
You had root access. You could restart any service, query any table, modify any configuration file, and watch any log in real time. Your troubleshooting speed was limited only by your knowledge of the system internals.
MAS Power Model: OBSERVE and COORDINATE
You have namespace-scoped access. You can read logs, inspect pod states, review API responses, and analyze metrics. For issues outside your control boundary, you collect evidence and coordinate with IBM Support or your platform team.
Key insight: The shift is not a loss of capability -- it is a change in leverage. A 7.6 admin who could restart a JVM in 30 seconds now needs to understand why the pod restarted itself. That understanding, once acquired, is far more powerful than a manual restart ever was.
This does not mean MAS admins are less capable. It means the skill profile has changed. The best MAS admins we have worked with combine deep Maximo domain knowledge (what the application should be doing) with cloud-native observability skills (how to see what it is actually doing).
Scenario 1: Integration Failure
The Situation
Outbound purchase requisitions from Maximo to the ERP system have stopped flowing. The integration queue is backing up. Users are reporting that approved PRs are "stuck."
The 7.6 Workflow
Step 1: Check the MIF Outbound Queue
-- Direct database query to check queue status
SELECT
IFACENAME, STATUS, COUNT(*) as MSG_COUNT
FROM
MAXINTMSGTRK
WHERE
IFACENAME = 'MXPRInterface'
AND STATUS IN ('ERROR', 'RETRY')
GROUP BY
IFACENAME, STATUS
ORDER BY
MSG_COUNT DESC;
Step 2: Review systemout.log on WebSphere
# SSH to the application server
ssh maxadmin@maximo-app-01
# Tail the WebSphere system log
tail -500f /opt/IBM/WebSphere/AppServer/profiles/ctgAppSrv01/logs/server1/systemout.log \
| grep -i "MXPRInterface\|integration\|ERROR"
You would see something like:
[2/6/26 14:15:02:341 CST] 0000024a IntegrationSe E BMXAA7634E -
Integration error for MXPRInterface: Connection refused to endpoint
https://erp-system.company.com:8443/api/purchase-req
[2/6/26 14:15:02:342 CST] 0000024a IntegrationSe E BMXAA7635E -
Message placed in error queue. Transaction ID: PR-2026-00847
Root cause: java.net.ConnectException: Connection refused
Step 3: Test Connectivity
# From the Maximo server, test the endpoint directly
curl -v https://erp-system.company.com:8443/api/purchase-req
# Check if the ERP endpoint is reachable
telnet erp-system.company.com 8443
Step 4: Check and Restart the Integration Framework
# In WebSphere admin console, or via wsadmin scripting
# Restart the MIF JMS resources
# Or restart the entire JVM if needed
/opt/IBM/WebSphere/AppServer/profiles/ctgAppSrv01/bin/stopServer.sh server1
/opt/IBM/WebSphere/AppServer/profiles/ctgAppSrv01/bin/startServer.sh server1
Step 5: Reprocess Failed Messages
Navigate to Integration > Message Reprocessing in the Maximo UI, select the failed messages, and click Reprocess.
Total time for experienced 7.6 admin: 15-25 minutes.
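When the error queue is deep, it helps to pull the stuck transaction IDs straight out of systemout.log before heading to Message Reprocessing. A minimal, hedged sketch, run here against illustrative sample lines rather than a live log (the real BMXAA7635E message may wrap across lines, so adjust the pattern):

```shell
# Illustrative systemout.log excerpt; real entries may span multiple lines
log_excerpt='[2/6/26 14:15:02:342 CST] 0000024a IntegrationSe E BMXAA7635E - Message placed in error queue. Transaction ID: PR-2026-00847
[2/6/26 14:16:11:018 CST] 0000024a IntegrationSe E BMXAA7635E - Message placed in error queue. Transaction ID: PR-2026-00848'

# Keep only the error-queue lines and strip everything before the transaction ID
stuck_ids=$(printf '%s\n' "$log_excerpt" \
  | grep 'Transaction ID:' \
  | sed 's/.*Transaction ID: //')

printf '%s\n' "$stuck_ids"
```

The resulting list can be cross-checked against Message Reprocessing in Step 5.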
The MAS Workflow
Step 1: Check Integration Service Pod Health
# List pods in the Manage namespace
oc get pods -n mas-inst1-manage -l app=manage \
--sort-by='.status.startTime'
# Check for recently restarted or failing pods
oc get pods -n mas-inst1-manage | grep -E "Error|CrashLoop|0/"
Step 2: Stream Application Logs
# Find the specific Manage pod
oc get pods -n mas-inst1-manage -l app.kubernetes.io/component=manage-server
# Stream logs and filter for integration errors
oc logs -f manage-server-pod-abc123 -n mas-inst1-manage \
| grep -i "integration\|MXPRInterface\|ERROR\|BMXAA"
You would see output similar to:
2026-02-06T14:15:02.341Z ERROR [IntegrationService] BMXAA7634E -
Integration error for MXPRInterface: Connection refused to endpoint
https://erp-system.company.com:8443/api/purchase-req
2026-02-06T14:15:02.342Z ERROR [IntegrationService] BMXAA7635E -
Outbound message failed. Interface: MXPRInterface,
Transaction: PR-2026-00847,
HTTP Status: N/A (connection refused)
Step 3: Review API Response Codes and Tokens
# Check if the integration endpoint external service is configured correctly
# In Maximo UI: Integration > End Points
# If using API keys or OAuth tokens, verify they have not expired
# Check Keycloak or the identity provider for token validity
# For on-prem MAS, check the Suite API key
oc get secret inst1-credentials-superuser -n mas-inst1-core \
-o jsonpath='{.data.api_key}' | base64 -d
Step 4: Verify Network Policies and Egress
# Check if network policies are blocking outbound traffic
oc get networkpolicy -n mas-inst1-manage
# Test outbound connectivity from within the pod
oc exec manage-server-pod-abc123 -n mas-inst1-manage -- \
curl -v https://erp-system.company.com:8443/api/purchase-req
Step 5: Reprocess Failed Messages
Navigate to Integration > Message Reprocessing in the MAS Manage UI (this part has not changed from 7.6), select failed messages, and reprocess.
Total time for experienced MAS admin: 20-35 minutes.
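The same triage can be scripted. The sketch below counts ERROR lines per interface name in a log excerpt; the sample lines and field positions are illustrative, so adapt the pattern to your actual log format:

```shell
# Illustrative MAS-style log lines, mimicking the excerpt above
sample_log='2026-02-06T14:15:02.341Z ERROR [IntegrationService] BMXAA7634E - Integration error for MXPRInterface: Connection refused
2026-02-06T14:15:09.120Z ERROR [IntegrationService] BMXAA7634E - Integration error for MXPRInterface: Connection refused
2026-02-06T14:16:40.007Z ERROR [IntegrationService] BMXAA7634E - Integration error for MXPOInterface: Connection refused'

# Count ERROR lines per MX*Interface token, stripping the trailing colon
error_summary=$(printf '%s\n' "$sample_log" \
  | awk '/ERROR/ { for (i = 1; i <= NF; i++) if ($i ~ /^MX.*Interface:?$/) { gsub(":", "", $i); count[$i]++ } } END { for (k in count) print k, count[k] }' \
  | sort)

printf '%s\n' "$error_summary"
```

A skewed count (one interface failing, others clean) usually points at an endpoint problem rather than a framework problem.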
Side-by-Side Comparison: Integration Failure
Aspect — Maximo 7.6 — MAS
Log access — SSH + tail systemout.log — oc logs -f pod-name
Database query — Direct SQL via DB client — Admin UI or API queries
Connectivity test — curl/telnet from server — oc exec into pod, then curl
Restart — Stop/start WebSphere JVM — Delete pod (operator recreates) or restart deployment
Credential check — Properties files on disk — Kubernetes secrets, Keycloak tokens
Network diagnosis — traceroute, netstat — oc get networkpolicy, oc exec
Message reprocessing — Maximo UI (unchanged) — MAS Manage UI (unchanged)
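To make the restart row concrete, here is a hedged sketch of the two MAS restart styles. A stubbed oc function stands in for the real CLI so the snippet runs anywhere; on a real cluster you would remove the stub. The pod and deployment names are illustrative:

```shell
oc() { echo "oc $*"; }   # stub: prints the command instead of calling a cluster

restart_cmds=$(
  # Option 1: delete a single pod; the operator/deployment recreates it
  oc delete pod manage-server-pod-abc123 -n mas-inst1-manage
  # Option 2: rolling restart of the whole deployment (no outage if replicas > 1)
  oc rollout restart deployment/manage-server -n mas-inst1-manage
)
printf '%s\n' "$restart_cmds"
```

Prefer the rolling restart when every replica needs the change; delete a single pod only after you have captured its logs.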
Scenario 2: Performance Degradation
The Situation
Users are reporting that the Work Order Tracking application takes 12-15 seconds to load, compared to the usual 2-3 seconds. The system feels "sluggish" across all modules.
The 7.6 Workflow
Step 1: Check JVM Heap and Garbage Collection
# SSH to the server
ssh maxadmin@maximo-app-01
# Check JVM memory usage via WebSphere
# Navigate to WAS Admin Console > Servers > Server1 > Runtime tab
# Or use command line:
/opt/IBM/WebSphere/AppServer/bin/wsadmin.sh -lang jython -c \
"AdminControl.invoke(AdminControl.queryNames('type=JVM,process=server1,*'), 'getHeapInfo')"
# Check GC logs
tail -100 /opt/IBM/WebSphere/AppServer/profiles/ctgAppSrv01/logs/server1/native_stderr.log \
| grep -i "GC\|heap\|pause"
Step 2: Check Database Performance
-- Check for long-running queries (Oracle example)
SELECT
sql_id, elapsed_time/1000000 as elapsed_sec,
sql_text
FROM
v$sql
WHERE
parsing_schema_name = 'MAXIMO'
AND elapsed_time/1000000 > 5
ORDER BY
elapsed_time DESC
FETCH FIRST 20 ROWS ONLY;
-- Check for table locks
SELECT
s.sid, s.serial#, s.username, o.object_name,
l.locked_mode
FROM
v$locked_object l
JOIN dba_objects o ON l.object_id = o.object_id
JOIN v$session s ON l.session_id = s.sid
WHERE
s.username = 'MAXIMO';
Step 3: Check Thread Pool Utilization
# WebSphere thread pool monitoring
# WAS Console > Servers > server1 > Thread Pools > WebContainer
# Look for MaxSize vs CurrentSize
# Or via wsadmin
/opt/IBM/WebSphere/AppServer/bin/wsadmin.sh -lang jython -c \
"print AdminControl.getAttribute(AdminControl.queryNames('type=ThreadPool,name=WebContainer,*'), 'stats')"
Step 4: Review Escalation and Workflow Processing
-- Check if escalations are overwhelming the system
SELECT
ESCALATION, STATUS, COUNT(*) as ACTIVE_COUNT
FROM
ESCALATION
WHERE
STATUS = 'ACTIVE'
GROUP BY
ESCALATION, STATUS
ORDER BY
ACTIVE_COUNT DESC;
Step 5: Restart and Tune
If garbage collection pauses were the culprit, increase heap size in WebSphere JVM settings. If slow queries were the issue, update database statistics or add indexes. Restart the JVM to clear thread pool exhaustion.
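If you suspect GC pauses, totaling the pause time in a log excerpt gives a quick signal. Verbose-GC formats vary widely by JVM version and options, so the lines below are illustrative only:

```shell
# Illustrative verbose-GC lines; adjust the parsing to your JVM's format
gc_excerpt='[GC pause (G1 Evacuation Pause) 2048M->1536M(4096M), 0.3542 secs]
[GC pause (G1 Evacuation Pause) 2100M->1600M(4096M), 1.2100 secs]
[Full GC (Allocation Failure) 3900M->3800M(4096M), 4.0300 secs]'

# Sum the number that precedes each "secs]" token
total_pause=$(printf '%s\n' "$gc_excerpt" \
  | awk '{ for (i = 1; i <= NF; i++) if ($i == "secs]") sum += $(i-1) } END { printf "%.4f", sum }')

echo "Total GC pause in excerpt: ${total_pause}s"
```

Several seconds of pause inside a short window is a strong hint that heap tuning, not the database, is the place to start.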
The MAS Workflow
Step 1: Check Pod Resource Utilization
# Check CPU and memory across all Manage pods
oc adm top pods -n mas-inst1-manage
# Example output:
# NAME CPU(cores) MEMORY(bytes)
# manage-server-pod-abc123 1850m 3841Mi
# manage-server-pod-def456 920m 2100Mi
# manage-server-pod-ghi789 1780m 3790Mi
# Check if pods are hitting resource limits
oc describe pod manage-server-pod-abc123 -n mas-inst1-manage \
| grep -A 5 "Limits\|Requests\|Last State\|OOMKilled"
Step 2: Review Pod Events and Restarts
# Check for OOMKilled events or throttling
oc get events -n mas-inst1-manage --sort-by='.lastTimestamp' \
| grep -i "oom\|kill\|evict\|unhealthy\|backoff"
# Check restart counts (high restarts suggest memory leaks or crashes)
oc get pods -n mas-inst1-manage -o wide \
| awk '{print $1, $4}' | sort -k2 -n -r | head -10
Step 3: Check Database Connection Pools via Logs
# Look for connection pool exhaustion in application logs
oc logs manage-server-pod-abc123 -n mas-inst1-manage \
| grep -i "connection pool\|DSRA\|datasource\|timeout\|pool size"
# Typical pool exhaustion log:
# DSRA9400W: Connection pool {pool-name} reached maximum size of 40
# DSRA9110E: Connection not available, request timed out
Step 4: Check Horizontal Pod Autoscaler (if configured)
# See if autoscaling is active and if more replicas are needed
oc get hpa -n mas-inst1-manage
# Example output:
# NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS
# manage-server Deployment/manage 82%/70% 2 6 4
Step 5: Collect Diagnostics for IBM Support
# If the issue is beyond your tuning scope, collect comprehensive data
oc adm must-gather --dest-dir=/tmp/mas-perf-issue \
-- /usr/bin/gather_audit_logs
# Also gather Maximo-specific diagnostics
oc exec manage-server-pod-abc123 -n mas-inst1-manage -- \
/opt/IBM/SMP/maximo/tools/maximo/log/collectlogs.sh
Side-by-Side Comparison: Performance Degradation
Aspect — Maximo 7.6 — MAS
Memory monitoring — WAS Admin Console, JVM heap dumps — oc adm top pods, oc describe pod
Database performance — Direct SQL against v$sql, AWR reports — Application logs for pool errors, DBA coordination
Thread analysis — WAS thread pool stats, thread dumps — Pod CPU metrics, replica scaling
Tuning — JVM args, WAS thread pools, DB indexes — Resource limits in CRs, HPA config, pod replicas
Root cause access — Full stack visibility — Application-layer visibility, platform via IBM
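A quick way to act on oc adm top output is to flag pods running close to their CPU limit. The sample data and the 2000m limit below are illustrative; substitute the actual limit from your pod spec:

```shell
# Illustrative "oc adm top pods" output (name, CPU millicores, memory)
top_output='manage-server-pod-abc123 1850m 3841Mi
manage-server-pod-def456 920m 2100Mi
manage-server-pod-ghi789 1780m 3790Mi'

cpu_limit_m=2000  # assumed CPU limit in millicores; check your pod spec

# Flag pods above 85% of the limit, printing their utilization percentage
hot_pods=$(printf '%s\n' "$top_output" \
  | awk -v limit="$cpu_limit_m" '{ cpu = $2; sub("m", "", cpu); if (cpu / limit > 0.85) print $1, int(cpu * 100 / limit) "%" }')

printf '%s\n' "$hot_pods"
```

Pods consistently near their limit are throttling candidates: either raise the limit in the ManageWorkspace CR or add replicas.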
Scenario 3: Authentication and SSO Issues
The Situation
Users are receiving "401 Unauthorized" errors when trying to log into Maximo. Some users can log in; others cannot. The SSO configuration appears to have broken after a recent change.
The 7.6 Workflow
Step 1: Check WebSphere Security Configuration
# SSH to the server
ssh maxadmin@maximo-app-01
# Check WAS security settings
/opt/IBM/WebSphere/AppServer/bin/wsadmin.sh -lang jython -c \
"sec = AdminConfig.list('Security'); print AdminConfig.show(sec)"
# Check TAI (Trust Association Interceptor) configuration
cat /opt/IBM/WebSphere/AppServer/profiles/ctgAppSrv01/config/cells/*/security.xml \
| grep -i "TAI\|interceptor\|SAML"
Step 2: Review LDAP Connectivity
# Test LDAP connection
ldapsearch -H ldap://ldap-server.company.com:389 \
-D "cn=maxadmin,ou=serviceaccounts,dc=company,dc=com" \
-w $LDAP_PASSWORD \
-b "ou=users,dc=company,dc=com" \
"(uid=failing-user)"
# Check WAS LDAP registry configuration
grep -r "LDAP\|ldap" /opt/IBM/WebSphere/AppServer/profiles/ctgAppSrv01/config/cells/*/security.xml
Step 3: Check User Sync Status
-- Verify the user exists and is active in Maximo
SELECT
USERID, STATUS, LOGINID, PERSONID,
LASTLOGINTIME, FAILEDLOGINS
FROM
MAXUSER
WHERE
USERID = 'FAILING_USER';
-- Check user group membership
SELECT
g.GROUPNAME, g.DESCRIPTION
FROM
GROUPUSER gu
JOIN MAXGROUP g ON gu.GROUPNAME = g.GROUPNAME
WHERE
gu.USERID = 'FAILING_USER';
Step 4: Review Login Traces
# Enable trace for security components
# WAS Console > Troubleshooting > Logs and Trace > server1
# Set trace: com.ibm.ws.security.*=all
# Then watch the trace
tail -f /opt/IBM/WebSphere/AppServer/profiles/ctgAppSrv01/logs/server1/trace.log \
| grep -i "authenticate\|FAILING_USER\|denied\|401"
Step 5: Fix and Test
Reset user password, clear failed login counts, repair LDAP sync, or reconfigure the TAI interceptor. Restart WebSphere to pick up security configuration changes.
The MAS Workflow
Step 1: Check Keycloak (or Your Identity Provider) Status
# Check if the Keycloak pods are healthy
oc get pods -n ibm-common-services | grep keycloak
# Review Keycloak logs for authentication failures
oc logs keycloak-0 -n ibm-common-services \
| grep -i "auth\|failed\|invalid\|401\|FAILING_USER"
# Typical Keycloak error:
# WARN [org.keycloak.events] type=LOGIN_ERROR, realmId=mas,
# clientId=manage, userId=null, error=invalid_user_credentials,
# username=FAILING_USER
Step 2: Verify Identity Provider Configuration
# Check the identity provider (LDAP/SAML) configuration in Keycloak
# Access Keycloak Admin Console at:
# https://keycloak-ibm-common-services.apps.cluster.company.com/auth/admin
# Or check via API
oc exec keycloak-0 -n ibm-common-services -- \
/opt/jboss/keycloak/bin/kcadm.sh get identity-provider/instances \
-r mas --server http://localhost:8080/auth \
--realm master --user admin --password $(oc get secret credential-keycloak-initial-admin \
-n ibm-common-services -o jsonpath='{.data.password}' | base64 -d)
Step 3: Check User Synchronization
# Check if the user sync job ran successfully
oc logs -l app=usersync -n mas-inst1-core --tail=200 \
| grep -i "sync\|complete\|error\|FAILING_USER"
# Verify the user exists in the MAS user registry
# Navigate to MAS Suite Administration > Users
# Or via API:
curl -s -H "Authorization: Bearer $API_KEY" \
"https://admin.mas.company.com/api/v1/users/FAILING_USER" | jq .
Step 4: Review Suite-Level Authentication Logs
# Check the core namespace for authentication-related errors
oc logs -l app.kubernetes.io/component=api -n mas-inst1-core \
| grep -i "auth\|token\|jwt\|401\|403"
# Check if the OAuth token exchange is working
oc get route -n ibm-common-services | grep keycloak
Step 5: Verify Certificate Chain
# SSO issues frequently trace back to certificate problems
oc get secret router-certs-default -n openshift-ingress \
-o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -dates
# Check for certificate expiration in the MAS namespace
oc get certificates -n mas-inst1-core
oc get certificates -n ibm-common-services
Side-by-Side Comparison: Authentication and SSO
Aspect — Maximo 7.6 — MAS
Identity system — WebSphere security + LDAP registry — Keycloak (or SAML IdP) + LDAP federation
User lookup — Direct SQL against MAXUSER — Keycloak Admin Console, MAS Admin API
Login trace — WAS security trace logs — Keycloak pod logs, Core API logs
Credential reset — Direct DB update or WAS console — Keycloak user management, MAS Admin UI
Certificate check — Keystore files on filesystem — oc get certificates, OpenShift secret inspection
SSO configuration — TAI/SAML in WAS — Keycloak realm and client configuration
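When a 401 points at an expired token, decoding the JWT payload shows the expiry directly. The token below is fabricated for illustration; in practice you would paste the token from the failing request:

```shell
# Fabricate a throwaway token so the sketch is self-contained
header=$(printf '{"alg":"RS256"}' | base64 | tr -d '=')
payload=$(printf '{"exp":1767225600}' | base64 | tr -d '=')
token="${header}.${payload}.fake-signature"

# Take the second dot-separated segment; map base64url chars back to base64
claims=$(printf '%s' "$token" | cut -d. -f2 | tr '_-' '/+')
# Restore padding to a multiple of 4 before decoding
while [ $(( ${#claims} % 4 )) -ne 0 ]; do claims="${claims}="; done
# Decode and pull out the exp (expiry epoch) claim
exp=$(printf '%s' "$claims" | base64 -d | sed 's/.*"exp"://; s/[^0-9].*//')

echo "Token expires at epoch: $exp"
```

Compare the epoch against `date +%s`: an exp in the past means the client must refresh its token, not that Keycloak is broken.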
Scenario 4: Cron Task and Scheduled Job Failures
The Situation
Preventive Maintenance (PM) work orders are not being generated. The PMWoGenCronTask that runs every night at 11 PM has stopped producing results. Maintenance planners are raising alarms.
The 7.6 Workflow
Step 1: Check Cron Task Manager Status
-- Check the cron task instance configuration
SELECT
INSTANCENAME, CRONNAME, SCHEDULE,
ACTIVE, RUNASUSERID, LASTRUNDATE,
HASLD
FROM
CRONTASKINSTANCE
WHERE
CRONNAME = 'PMWOGENCRONTASK';
Navigate to System Configuration > Platform Configuration > Cron Task Setup in the Maximo UI. Verify the cron task instance is Active and the schedule is correct.
Step 2: Check the Cron Task Log
# The cron task log is written to the Maximo log directory
tail -200 /opt/IBM/SMP/maximo/logs/maximo.log \
| grep -i "PMWoGen\|crontask\|PMWOGENCRONTASK"
# Typical failure log:
# [2/6/26 23:00:01:102] BMXAA6780I - Cron Task PMWOGENCRONTASK:PMWOGEN01
# started on server MXServer
# [2/6/26 23:00:15:445] BMXAA4187E - Error executing cron task PMWOGENCRONTASK:
# BMXAA4188E - PM 1047: Next date calculation failed.
# java.lang.NullPointerException at psdi.app.pm.PMService.generateWorkOrders
Step 3: Investigate the Specific PM
-- Find the PM that is causing the failure
SELECT
PMNUM, DESCRIPTION, SITEID,
NEXTDATE, FREQUENCY, FREQUNIT,
STATUS
FROM
PM
WHERE
PMNUM = '1047'
AND SITEID = 'BEDFORD';
-- Check for data quality issues
SELECT
PMNUM, SITEID, NEXTDATE
FROM
PM
WHERE
NEXTDATE IS NULL
AND STATUS = 'ACTIVE'
AND SITEID = 'BEDFORD';
Step 4: Fix and Rerun
Correct the data issue on PM 1047 (perhaps a null next date or invalid frequency), then manually trigger the cron task from the Cron Task Setup application by clicking "Reload Request."
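If you can export the active PMs (for example to CSV), the null-NEXTDATE check is easy to script and catches the next PM 1047 before the nightly run fails. The data below is illustrative:

```shell
# Illustrative CSV export of active PMs (PMNUM,SITEID,STATUS,NEXTDATE)
pm_export='PMNUM,SITEID,STATUS,NEXTDATE
1045,BEDFORD,ACTIVE,2026-02-10
1047,BEDFORD,ACTIVE,
1052,BEDFORD,ACTIVE,2026-02-12'

# Flag ACTIVE rows whose NEXTDATE column is empty
bad_pms=$(printf '%s\n' "$pm_export" \
  | awk -F, 'NR > 1 && $3 == "ACTIVE" && $4 == "" { print $1 }')

echo "PMs with null NEXTDATE: $bad_pms"
```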
The MAS Workflow
Step 1: Check Cron Task Status in the Manage UI
Navigate to System Configuration > Platform Configuration > Cron Task Setup in the MAS Manage application. This interface is largely unchanged from 7.6 -- one of the familiar touchpoints.
Step 2: Check Manage Pod Logs for Cron Execution
# Find the Manage server pods (cron tasks run inside the Manage JVM)
oc get pods -n mas-inst1-manage -l app.kubernetes.io/component=manage-server
# Stream logs filtering for cron task activity
oc logs manage-server-pod-abc123 -n mas-inst1-manage --tail=500 \
| grep -i "PMWoGen\|crontask\|BMXAA6780\|BMXAA4187"
# If cron tasks run on a dedicated pod (ServerBundle configuration),
# check the cron-specific pod
oc get pods -n mas-inst1-manage -l app.kubernetes.io/component=manage-cron
oc logs manage-cron-pod-xyz789 -n mas-inst1-manage --tail=500 \
| grep -i "PMWoGen\|crontask"
Step 3: Check if the Cron Pod is Even Running
# In MAS, cron tasks may run in a separate ServerBundle
# If that pod is down, no cron tasks execute at all
oc get pods -n mas-inst1-manage | grep cron
# If no cron pods exist, check the ManageWorkspace CR
oc get ManageWorkspace -n mas-inst1-manage -o yaml \
| grep -A 20 "serverBundles"
Step 4: Review Cron Task History via API
# Use the Maximo REST API to query cron task history
curl -s -H "Authorization: Bearer $API_KEY" \
"https://manage.mas.company.com/maximo/oslc/os/mxapicrontask?\
oslc.where=crontaskname=%22PMWOGENCRONTASK%22&\
oslc.select=instancename,schedule,active,lastrundate" | jq .
Step 5: Check for Resource Constraints Affecting Cron
# Cron task failures in MAS often trace to pod resource limits
oc describe pod manage-cron-pod-xyz789 -n mas-inst1-manage \
| grep -A 10 "Resources\|Limits\|OOMKilled"
# Check if the pod was evicted or restarted during cron execution
oc get events -n mas-inst1-manage --sort-by='.lastTimestamp' \
| grep "cron\|evict\|oom\|kill"
Side-by-Side Comparison: Cron Task Failures
Aspect — Maximo 7.6 — MAS
Cron task UI — Cron Task Setup (unchanged) — Cron Task Setup (same UI in Manage)
Log location — maximo.log on filesystem — oc logs from Manage or Cron pod
Database investigation — Direct SQL against PM, CRONTASKINSTANCE — REST API queries or Manage UI
Cron execution host — MXServer JVM on known host — Manage pod (or dedicated cron ServerBundle pod)
Resource issues — JVM heap, server memory — Pod resource limits, OOMKilled events
Manual trigger — Reload Request in Cron Task Setup — Reload Request in Cron Task Setup (unchanged)
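Since cron history is reachable via the API, a staleness check is easy to automate. The instance names and epoch timestamps below are illustrative; in practice the lastrundate values would come from the mxapicrontask response:

```shell
now=1770000000            # fixed "current" epoch so the example is reproducible
max_age=$(( 24 * 3600 ))  # alert if no run in 24 hours

# Illustrative "instance lastrun-epoch" pairs
cron_runs='PMWOGEN01 1769900000
LDAPSYNC01 1769996400
ESCALATION01 1769999100'

# Flag any instance whose last run is older than the threshold
stale=$(printf '%s\n' "$cron_runs" \
  | awk -v now="$now" -v max="$max_age" 'now - $2 > max { print $1 }')

printf '%s\n' "$stale"
```

Run from a scheduled job, this turns "planners raising alarms" into an alert you see first.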
The MAS Troubleshooting Decision Tree
When an issue occurs in MAS, follow this structured approach before reaching for any tool.
Step 1: Classify the Symptom (First 2 Minutes)
- Who is affected? One user, one site, all users, or all sites?
- What is the symptom? Error message, slow response, no response, wrong data?
- When did it start? After a deployment, after a configuration change, or seemingly random?
- Where does it occur? One application, one module, all applications, or the login page?
Step 2: Determine the Layer (Next 3 Minutes)
Work through each layer systematically until you identify where the issue lives:
Layer — Symptoms — Investigation Tools — If Found Here
Application Logic (automation scripts, workflows, business rules, data) — Functional errors, wrong data, business process failures — Manage UI, pod logs, API queries — Fix in Manage application or via API
MAS Platform (authentication, API gateway, Suite administration) — Login failures, API errors, admin dashboard issues — Core namespace pods, Keycloak logs — Check IdP config, restart core pods, or escalate to IBM
Infrastructure (OpenShift, networking, storage, databases) — Timeouts, pod scheduling failures, storage errors — oc get nodes, oc adm top, DBA tools — Coordinate with platform team or IBM
External (third-party integrations, network, DNS, certificates) — Connection refused, SSL errors, timeout to external systems — oc exec + curl from inside pod, certificate inspection — Fix external endpoint, update certs, or contact partner
Step 3: Collect Evidence (Next 10 Minutes)
# Standard evidence collection script for any MAS issue
# 1. Pod status snapshot
mkdir -p /tmp/evidence
oc get pods -n mas-inst1-manage -o wide > /tmp/evidence/pods.txt
oc get pods -n mas-inst1-core -o wide >> /tmp/evidence/pods.txt
# 2. Recent events
oc get events -n mas-inst1-manage --sort-by='.lastTimestamp' \
> /tmp/evidence/events.txt
# 3. Application logs (last 30 minutes)
oc logs manage-server-pod-abc123 -n mas-inst1-manage \
--since=30m > /tmp/evidence/manage-logs.txt
# 4. Pod descriptions for resource info
oc describe pod manage-server-pod-abc123 -n mas-inst1-manage \
> /tmp/evidence/pod-describe.txt
# 5. Operator logs (if relevant)
oc logs -l app.kubernetes.io/name=ibm-mas-operator -n mas-inst1-core \
--tail=500 > /tmp/evidence/operator-logs.txt
Step 4: Act or Escalate (Based on Findings)
- Application-layer issue you can fix: Fix it in the Manage UI or via API.
- Platform-layer issue requiring operator action: File IBM Support case with collected evidence.
- Infrastructure-layer issue: Coordinate with your OpenShift platform team.
Key insight: The decision tree is not about finding the answer -- it is about finding the right layer to investigate. In 7.6, all layers lived on the same server. In MAS, each layer is a separate concern with separate tools and separate teams.
Common Mistakes New MAS Admins Make
We have seen these patterns repeatedly across organizations transitioning to MAS. Every one of them is avoidable.
Mistake 1: Deleting Pods as a First Response
In 7.6, restarting the JVM was a legitimate first step. In MAS, deleting a pod triggers the operator to recreate it -- but if the root cause is a configuration issue, the new pod will fail the same way. Worse, you lose the logs from the original pod.
Instead: Always collect logs before restarting. Use oc logs --previous if a pod has already crashed.
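That habit can be wrapped in a small helper that snapshots the evidence before the restart. A stubbed oc makes the sketch runnable anywhere; drop the stub on a real cluster. The pod and namespace names are illustrative:

```shell
oc() { echo "oc $*"; }   # stub: echoes commands instead of calling a cluster

snapshot_then_restart() {
  pod="$1"; ns="$2"
  dir="/tmp/evidence-$pod"
  mkdir -p "$dir"
  # Capture current and previous-instance logs plus the pod description
  oc logs "$pod" -n "$ns" > "$dir/logs.txt"
  oc logs --previous "$pod" -n "$ns" > "$dir/logs-previous.txt" 2>/dev/null
  oc describe pod "$pod" -n "$ns" > "$dir/describe.txt"
  # Only now is it safe to let the operator recreate the pod
  oc delete pod "$pod" -n "$ns"
}

snapshot_then_restart manage-server-pod-abc123 mas-inst1-manage
```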
Mistake 2: Ignoring Operator Logs
The MAS operators (ibm-mas, ibm-sls, ibm-mas-manage) are responsible for reconciling the desired state with the actual state. When something goes wrong at the platform level, the operator logs contain the explanation. Many admins only look at the application pod logs and miss the real story.
# Always check operator logs when platform-level issues occur
oc logs -l control-plane=ibm-mas-operator -n mas-inst1-core --tail=200
Mistake 3: Filing Vague IBM Support Cases
A case that says "Maximo is slow" will sit in the queue. A case that says "Manage pod manage-server-pod-abc123 in namespace mas-inst1-manage is experiencing 12-second response times for the WOTRACK application since 2026-02-06 14:00 UTC, with pod CPU at 1850m/2000m and attached logs showing DSRA9400W connection pool exhaustion" will get immediate attention.
The evidence checklist for IBM Support:
- Exact timestamps (UTC)
- Affected namespace and pod names
- Specific error codes (BMXAA, DSRA, etc.)
- Pod logs covering the time window
- Events from the namespace
- What changed recently (deployments, config changes)
- Business impact statement
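A small generator keeps the checklist honest by forcing every field into the case description. All values below are illustrative placeholders:

```shell
# Illustrative values; fill these in from your actual incident
ns="mas-inst1-manage"
pod="manage-server-pod-abc123"
error_code="DSRA9400W"
started_utc="2026-02-06T14:00:00Z"
recent_change="Manage operator upgraded the day before"
impact="PR integration to ERP halted; procurement blocked"

case_summary=$(cat <<EOF
Namespace: $ns
Pod: $pod
Error code: $error_code
Issue started (UTC): $started_utc
Recent change: $recent_change
Business impact: $impact
Attachments: pod logs, namespace events, pod describe output
EOF
)
printf '%s\n' "$case_summary"
```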
Mistake 4: Not Understanding ServerBundle Architecture
In MAS Manage, different ServerBundles handle different workloads: UI requests, cron tasks, integration processing, and reporting. If you are troubleshooting a cron task failure but only looking at the UI pod logs, you are looking in the wrong place.
# Understand your ServerBundle layout
oc get pods -n mas-inst1-manage -l app.kubernetes.io/component \
--show-labels | grep "component="
Mistake 5: Treating MAS Like a Single Server
In 7.6, "the Maximo server" was one (or a few) JVMs on known hosts. In MAS, the application is distributed across multiple pods, namespaces, and services. An issue in the ibm-common-services namespace (Keycloak) can look like a Maximo problem in the mas-inst1-manage namespace. Always check adjacent namespaces.
# Quick health check across all MAS-related namespaces
for ns in mas-inst1-core mas-inst1-manage ibm-common-services ibm-sls; do
echo "=== $ns ==="
oc get pods -n $ns | grep -v "Running\|Completed"
done
When to Escalate to IBM Support
Not every issue requires a support case. But when you do escalate, timing matters. Here is our experience-based guidance.
Escalate Immediately (Severity 1):
- All users unable to log in and you have verified Keycloak/IdP pods are unhealthy
- Data loss or data corruption suspected
- Operator is in a crash loop and reconciliation has failed for more than 15 minutes
- Suite administration console is unreachable
Escalate After 30 Minutes of Investigation (Severity 2):
- Performance degradation affecting business operations with no application-level cause found
- Integration failures where connectivity is confirmed but messages are not processing
- Certificate-related errors in namespaces you do not manage
- Pod restarts occurring without application-level errors in the logs
Investigate Fully Before Escalating (Severity 3-4):
- Single-user issues (likely permission or data issues you can fix)
- Cron task failures caused by data quality problems
- Slow queries that can be addressed with application-level tuning
- Configuration questions about MAS deployment options
Key insight: IBM Support resolution time correlates directly with the quality of your initial evidence. We have seen identical issues resolved in 2 hours (with excellent evidence) and 5 days (with vague descriptions). The 30 minutes you spend collecting evidence before filing the case will save you days of back-and-forth.
The Essential oc Commands Every MAS Admin Must Know
We recommend memorizing these ten commands. They cover 90% of MAS troubleshooting scenarios.
# 1. List pods with status in a namespace
oc get pods -n {namespace}
# 2. Stream live logs from a pod
oc logs -f {pod-name} -n {namespace}
# 3. Get logs from a crashed container's previous instance
oc logs --previous {pod-name} -n {namespace}
# 4. Describe a pod (events, resource limits, conditions)
oc describe pod {pod-name} -n {namespace}
# 5. Check resource utilization
oc adm top pods -n {namespace}
# 6. Get recent events sorted by time
oc get events -n {namespace} --sort-by='.lastTimestamp'
# 7. Execute a command inside a running pod
oc exec {pod-name} -n {namespace} -- {command}
# 8. Get secrets (e.g., API keys, credentials)
oc get secret {secret-name} -n {namespace} -o jsonpath='{.data.{key}}' | base64 -d
# 9. Check network policies
oc get networkpolicy -n {namespace}
# 10. Collect comprehensive diagnostics
oc adm must-gather --dest-dir=/tmp/diag
Key Takeaways
- The troubleshooting fundamentals have not changed. Integration failures, performance issues, authentication problems, and cron task failures are the same categories you have always dealt with. The diagnostic thinking is the same. Only the tools are different.
- Your 7.6 experience is an asset, not a liability. Knowing what Maximo should be doing (the application behavior) is more valuable in MAS than knowing how the server works. Operators handle the server. You handle the application intelligence.
- Learn the `oc` CLI before you need it. Practice the ten essential commands in a non-production environment. When production is down at 2 AM, you do not want to be reading documentation.
- Evidence quality is your superpower. In the 7.6 world, you could brute-force a fix by restarting services and tweaking configs. In MAS, structured evidence collection and precise communication (with IBM Support or your platform team) is what drives fast resolution.
- The mental model shift is real but manageable. From "I control the server" to "I observe and coordinate." Once you internalize this shift, MAS troubleshooting becomes natural -- and in many ways, more systematic than the ad-hoc approaches many of us used in 7.6.
References
- IBM MAS 9 Documentation: Troubleshooting
- Red Hat OpenShift CLI Reference: oc commands
- IBM MAS Must-Gather Procedures
- IBM Support: How to Open an Effective Case
- Kubernetes Troubleshooting Guide
Series Navigation:
Previous: Part 6 — The Daily Toolset of MAS Admins
Next: Part 8 — The Future MAS SysAdmin: AI, Automation, and Autonomous Monitoring
View the full MAS ADMIN series index →
Part 7 of the "MAS ADMIN" series | Published by TheMaximoGuys



