Showing posts with label 2012. Show all posts

Wednesday, March 12, 2014

DPM 2012 and Beyond Frustration

All of our Hyper-V clusters (Server 2008 R2 hosts) started having failed backups in our two independent Data Protection Manager servers. The problem began with one node consistently failing backups of virtual machines while the other hosts kept backing up, and progressed until none of our nodes could successfully back up any virtual machine. Our standalone backups via DPM had no issues. These hosts had been configured and unchanged for well over a year; the only changes were Windows patches applied months prior and continuously loading anti-virus updates.

For the failed backups, DPM kept stating "The VSS application writer or the VSS provider is in a bad state ... ID 30111: VssError:A function call was made when the object was in an incorrect state for that function (0x80042301)", and the local nodes wrote VSS 12362 Application Log event errors ("A Shadow Copy LUN was not detected in the system and did not arrive") and VSS 12363 Application Log event errors ("An expected hidden volume arrival did not complete because this LUN was not detected") whenever we attempted to run a full virtual machine backup via a consistency check.

Here is what we tried that didn't work...
  • Power cycling all of the equipment involved: Hyper-V servers (PowerEdge R710s), the iSCSI SAN (EqualLogic PS4000vx's), the switches connecting them (Catalyst 3750Xs), and our DPM server
  • Unregistering and Registering the EqualLogic VSS provider (eqlvss /unregserver and eqlvss /regserver)
  • Removing virtual machines from a protection group (deleting disk data) and adding them back
  • Moving virtual machines to a new protection group
  • Upgrading the EqualLogic Windows Host Integration Toolkits (HIT kits) on the Hyper-V nodes - upgraded from 4.0 to 4.6
  • Installing the EqualLogic HIT kit on one of the virtual machines
  • Patching the Hyper-V nodes with all of the latest Windows Updates, even yesterday's release of KB 2908783, which resolves issues with corruption of iSCSI LUNs in Windows Server 2008 R2 and 2012
... and still no success.

After much time wasted on what seemed to be magic potions and DPM's hatred of backing up critical data, a random thought to try disabling the anti-virus on the cluster nodes resolved the issue! Yeah, I know everything you read says to disable anti-virus everywhere, but we have had Microsoft Forefront Client Security configured and running on these systems since we set up these servers 2+ years ago. Apparently, some change in the definitions (or just its mood) made it start messing with the iSCSI VSS Hardware process... and messing with my sleep over the last two days.

Good luck!


 

 

Thursday, May 30, 2013

DPM 2012 - Backup Protection Group to Tape

Update 2014-03-18: It appears there have been quite a few more revisions by Wilson (now at version 1.6) since I first posted his version 1.0. I've updated the listing below to reflect the latest version.

We've had a few instances where a backup job (long term to tape) fails. When we try to resume the backups by selecting "Resume tape backups...", it causes every data source in the protection group to back up to individual tapes rather than consolidating the backups onto one tape (as it normally does in the scheduled jobs).

We found this PowerShell script, posted by Mike Jacquet, that works great to overcome the multiple-tape issue! http://social.technet.microsoft.com/Forums/en-US/dpmtapebackuprecovery/thread/4fafbcb0-ac2c-4867-8434-31f1f5e532e0/#7b40ef6e-d8bd-4a24-aecd-5f1605e80225

After modifying the original version 1.0 to work with DPM 2012 by adjusting line 25 to DPM 2012's SQL instance, $instance = '.\msdpm2012', we were able to run it to re-trigger one of our failed weekly tape backups. (According to its change log, the script has looked up the DPM database location in the registry since version 1.3, so version 1.6 should not need this edit.) Hopefully this post will be a bit easier to find, but credit still goes to Mike and Wilson; I'm only the messenger. We did notice that you need to be sure to inactivate any alerts that are still active, and that it takes a minute or two to see the jobs kick off.


#                                                                                              
# This script will list all currently scheduled backup to tape jobs                            
# Then you can select from that list which backup to tape job you want to run                  
#                                                                                              
# Author        : Wilson Souza                                                                 
# Date Created  : 07/13/2012                                                                   
# Last modified : 01/21/2013
# Version       : 1.6                                                                          
#                                                                                              
# Change log                                                                                   
# ==========                                                                                   
#                           
#       Ver 1.6 - Tested script with DPM 2012/DPM2012SP1/DPM2012R2                                                                   
#                 Fixed issue when DPM database is running within a SQL Default Instance
#       Ver 1.5 - Changed text related to where you can monitor the job after triggered        
#       Ver 1.4 - Found an issue for variable $Result when only a single row is returned       
#       Ver 1.3 - Added PowerShell variable to hide snap-in load error                         
#                 Script query registry to check where DPM database is located                 
#       Ver 1.2 - Added support for Copy tape configuration (up to 7 copies)                   
#       Ver 1.1 - Added Verbose switch to show more output information                         
#                 Added Short Term/Long Term Information
#                 Added Recovery goal information
#                                                                                              
#                              
#                                                                                              

param([string] $verbose)
$ErrorActionPreference = "silentlycontinue"
add-pssnapin sqlservercmdletsnapin100
Add-PSSnapin -Name Microsoft.DataProtectionManager.PowerShell
$ConfirmPreference = 'None'
cls
$instance = Get-itemproperty "hklm:\SOFTWARE\Microsoft\Microsoft Data Protection Manager\DB\"
$dpmdb = $instance.databasename

if ($instance.instancename -eq 'MSSQLSERVER')
{
    $instance = $instance.SqlServer
}
else
{
   $instance = $instance.SqlServer + '\' + $instance.instancename
}


$query = "CREATE FUNCTION label (@GUID varchar(36), @kindred varchar(4), @vault varchar(8))
returns varchar (1024)
as
Begin
   declare @result varchar (1024)
   select @result = vaUltlabel from tbl_mm_vaultlabel where mediapoolid = @GUID and generation = 
      case  @kindred
          when 'Fath' Then '2'
          when 'Gran' then '1'
          when 'grea' Then '0'
      end and
      vault =
      case @vault
    when 'Offsite1' then '3'
    when 'Offsite2' then '4'
    when 'Offsite3' then '5'
    when 'Offsite4' then '6'
    when 'Offsite5' then '7'
    when 'Offsite6' then '8'
    when 'Offsite7' then '9'
   else
       '1'
   end
   RETURN @result
END
go
 
select ScheduleId as name
       ,def.JobDefinitionId as JD
       ,FriendlyName as PG
       ,SUBSTRING (CONVERT(VARCHAR(10),active_start_date),5,2) + '-' + SUBSTRING (CONVERT(VARCHAR(10),active_start_date),7,2) + '-' + SUBSTRING (CONVERT(VARCHAR(10),active_start_date),1,4) as SD
       ,jobs.date_created as SCD
       ,SUBSTRING (CONVERT(VARCHAR(10),last_run_date),5,2) + '-' + SUBSTRING (CONVERT(VARCHAR(10),last_run_date),7,2) + '-' + SUBSTRING (CONVERT(VARCHAR(10),last_run_date),1,4) + '  ' +
        SUBSTRING (CONVERT(VARCHAR(6),last_run_time),1,2) + ':' + SUBSTRING (CONVERT(VARCHAR(6),last_run_time),3,2) + ':' + SUBSTRING (CONVERT(VARCHAR(6),last_run_time),5,2) as LRD
       ,SUBSTRING (CONVERT(VARCHAR(10),next_run_date),5,2) + '-' + SUBSTRING (CONVERT(VARCHAR(10),next_run_date),7,2) + '-' + SUBSTRING (CONVERT(VARCHAR(10),next_run_date),1,4) + '  ' +
        SUBSTRING (CONVERT(VARCHAR(6),next_run_time),1,2) + ':' + SUBSTRING (CONVERT(VARCHAR(6),next_run_time),3,2) + ':' + SUBSTRING (CONVERT(VARCHAR(6),next_run_time),5,2) as NRD
       ,dbo.label ((substring(xml,(patindex('%MediaPoolId%',Xml))+13,36)), (substring(xml,(patindex('%generation%',Xml))+12,4)), (substring(xml,(patindex('%vault%',Xml))+7,8))) as TL
       ,case 
   when substring(xml,(patindex('%vault%',Xml))+7,3) = 'off'  then 'Long-Term' 
   else 'Short-term'
       end as STLT
       ,case
  when substring(xml,(patindex('%generation%',Xml))+12,4) = 'Fath' then 'Recovery Goal 1'
  when substring(xml,(patindex('%generation%',Xml))+12,4) = 'Gran' then 'Recovery Goal 2'
  when substring(xml,(patindex('%generation%',Xml))+12,4) = 'Grea' then 'Recovery Goal 3'
 end as RG
from    tbl_SCH_ScheduleDefinition sch 
       ,msdb.dbo.sysjobs jobs
       ,tbl_JM_JobDefinition def
       ," + $DPMDB + ".dbo.tbl_IM_ProtectedGroup prot
       ,msdb.dbo.sysjobschedules jobsch
       ,msdb.dbo.sysjobsteps jobsteps
       ,msdb.dbo.sysschedules syssch
where CAST(sch.ScheduleId as NCHAR (128)) = jobs.name
and def.JobDefinitionId = sch.JobDefinitionId
and def.ProtectedGroupId = prot.ProtectedGroupId
and jobs.job_id = jobsch.job_id
and jobs.job_id = jobsteps.job_id
and jobsch.schedule_id = syssch.schedule_id
and (def.Type = '913afd2d-ed74-47bd-b7ea-d42055e5c2f1' or def.Type = 'B5A3D25C-8EB2-4032-9428-C852DA5CE2C5')
and sch.IsDeleted = '0' and def.ProtectedGroupId is not null
order by FriendlyName, next_run_date, next_run_time
go
 
drop function label
go"

[array]$result = Invoke-Sqlcmd -ServerInstance $instance -Query $query -Database $dpmdb
$count = 1
write-host " The list below shows all scheduled backup to tape jobs (short term and long term)" -f green
write-host

if ($verbose.ToLower() -eq 'verbose')
{
 write-host " For optimum output, set PowerShell Width for screen buffer size to at least 300" -f yellow; write-host
 write-host
 write-host "     Protection Group               SQL Agent Name                       JobDefinitionID                      Creation Date Schedule Creation Date Last Run Date        Next Sched Run Date  Term       Goal            Tape Label"
 write-host "     ------------------------------ ------------------------------------ ------------------------------------ ------------- ---------------------- -------------------- -------------------- ---------- --------------- --------------" 
 foreach ($result1 in $result)
 {
  if ($color -eq 'white') {$color = 'cyan'} else {$color = 'white'}
  write-host ("{0,2}"-f $count) -foreground green -nonewline
  write-host ( " - {0,-30} {1,36} {2,36} {3,-13} {4,-22} {5,-20} {6,-20} {7,-10} {8,15} " -f $result1.PG, $result1.name, $result1.jd, $result1.SD, $result1.SCD, $result1.LRD, $result1.NRD, $result1.STLT, $result1.RG) -nonewline -f $color
  write-host $result1.TL -f yellow
  $count++
 }
}
else
{
 write-host " For optimum output, set PowerShell Width for screen buffer size to at least 110" -f yellow; write-host
 write-host "     Protection Group               Term       Goal            Tape Label"
 write-host "     ------------------------------ ---------- --------------- --------------" 
 foreach ($result1 in $result)
 {
  if ($color -eq 'white') {$color = 'cyan'} else {$color = 'white'}
  write-host ("{0,2}"-f $count) -foreground green -nonewline
  write-host ( " - {0,-30} {1,-10} {2,15} " -f $result1.PG, $result1.STLT, $result1.RG) -nonewline -f $color
  write-host $result1.TL -f yellow
  $count++
 }
}

write-host
write-host "Which job(s) do you want to run? If running more than one job, enter numbers separated by a space: " -f green -nonewline
$runjob = read-host
$runjob = $runjob -split " "
$executingjob = 0
if ($runjob)
{
 foreach ($startjob in $runjob)
 {
  $firejob = [int]$startjob
  if ($firejob -gt 0 -and $firejob -lt $count)
  {
   $query = "EXEC msdb.dbo.sp_start_job '{0}'" -f $result[$firejob-1].name
   Invoke-Sqlcmd -ServerInstance $instance -Query $query -Database $dpmdb
   $executingjob++
  }
 }
}
write-host
if ($executingjob -gt 0)
{
 write-host "You selected to run $executingjob job(s). You can monitor job(s) progress via DPM Administrator Console" -f green
}
else
{
  write-host "Due to the selection entered, no jobs will run" -f red
}



 

Wednesday, May 15, 2013

DPM 2012 Not Generating E-mail Reports after Upgrading to SP1

We have been using DPM 2012 for quite a while now, with reports set to be delivered daily/weekly. After we upgraded to SP1, we noticed it was no longer e-mailing us the reports, even though the alerts for errors continued to arrive. We could also run reports manually, but got no automatic e-mails.

We went to clear and recreate the report schedule and set it to e-mail us, and we got this awesome nondescript error, ID 3014: "An error occurred causing the reporting job on to fail. The system files may be corrupt. Retry the reporting task. If the problem persists, repair your DPM installation using the steps described in the System Center 2012 Service Pack 1 DPM Deployment Guide. ID: 3014"

 
I checked out the guide, and the basic idea of a "repair" is to uninstall and reinstall. I don't know about you all, but risking losing backup data just to fix reporting didn't sit well with me. So, I proceeded to evaluate what was occurring using SQL Server Profiler on our system, comparing it to our secondary server.
 
After playing around with it for hours, I narrowed it down to an issue with permissions for the Reporting Services predefined database role called RSExecRole. I went through the guide Create the RSExecRole (http://technet.microsoft.com/en-us/library/cc281308.aspx), which is used to recreate permissions during a report database move, and we were then able to recreate the e-mail subscriptions. It looks like there must have been some undetected failure during the SP1 upgrade.

Monday, May 14, 2012

Backup Linux using Microsoft DPM 2010

Update (2012-07-18)
Today we transitioned from Microsoft DPM 2010 to Microsoft DPM 2012. It was no problem, as the structure remained the same: DPM still backs up the files copied off our Linux clients from the Windows host's NFS volume.
-Justin

We had a conundrum on a project over the winter in my department. We’d been moving toward Microsoft’s Data Protection Manager 2010 to take over all backups for our systems, but a new system coming online was a Red Hat server. Data Protection Manager doesn’t have a native client that supports Red Hat, so the server can’t be added to a protection group unless you buy an expensive Data Protection Manager appliance that runs a proprietary client.
http://www.evault.com/products/data-backup-software/microsoft-backup-recovery/index.html

We put our heads together and came up with a cheaper alternative that required some initial labor and ongoing overhead to verify backups. We ended up with some overelaborate scripts to suit our taste, but I’ve simplified the approach here for easy reading.

First, we carved out some backup storage space on both the Red Hat server for the initial local backups and on the Data Protection Manager’s server for an NFS mirror.


Second, we added an NFS root on the Data Protection Manager’s server with the NFS mapping to a mount on the Red Hat server.
http://support.microsoft.com/?kbid=324089
http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/3/html/System_Administration_Guide/s1-nfs-mount.html
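
As a rough illustration of the Red Hat side of this step, the sketch below shows a guard a backup script could run before touching the mount folder, so that it never writes into an empty local directory when the share is down. The host name dpmserver, the export /backup, and the mount point /mnt/dpm-backup are hypothetical placeholders, not our actual configuration.

```shell
#!/usr/bin/env bash
# Hypothetical example: the share would be mounted on the Red Hat server with
# something like
#   mount -t nfs dpmserver:/backup /mnt/dpm-backup
# (host, export, and mount point are placeholders).

# File listing the current mounts; overridable so the check can be exercised
# without a real NFS mount.
MOUNTS_FILE="${MOUNTS_FILE:-/proc/mounts}"

# Succeeds only if the given path is currently mounted with an NFS file system.
# Fields in /proc/mounts are: device, mount point, fs type, options, ...
is_nfs_mounted() {
    local mount_point="$1"
    awk -v mp="$mount_point" \
        '$2 == mp && $3 ~ /^nfs/ { found = 1 } END { exit !found }' \
        "$MOUNTS_FILE"
}
```

A backup script could then begin with a line such as: is_nfs_mounted /mnt/dpm-backup || exit 1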


Third, a bash script, run by a daily cron job, creates tar files on the Red Hat server. Once the tar files are created, we use rsync to mirror the local backups from the Red Hat server to the Data Protection Manager server’s storage via the NFS mount folder.
http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/4/html/Step_by_Step_Guide/s1-managing-compressing-archiving.html#S2-MANAGING-ARCHIVING
http://rsync.samba.org/
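
The third step could look roughly like the sketch below. This is a simplified illustration rather than our production script: the three directory paths, the archive naming scheme, and the function names are all assumptions made for the example.

```shell
#!/usr/bin/env bash
# Sketch of the daily backup job: archive a source tree, then copy the
# archive directory to the NFS mount exported by the DPM server.
# All paths below are hypothetical placeholders.

SOURCE_DIR="${SOURCE_DIR:-/srv/data}"                  # data to protect (assumed)
LOCAL_BACKUP_DIR="${LOCAL_BACKUP_DIR:-/backup/local}"  # local staging area (assumed)
NFS_BACKUP_DIR="${NFS_BACKUP_DIR:-/mnt/dpm-backup}"    # NFS mount folder (assumed)

# Create a dated, gzip-compressed tar archive of SOURCE_DIR and print its path.
create_archive() {
    local stamp archive
    stamp="$(date +%Y-%m-%d)"
    archive="${LOCAL_BACKUP_DIR}/backup-${stamp}.tar.gz"
    mkdir -p "$LOCAL_BACKUP_DIR" || return 1
    # -C stores paths relative to the parent of the source directory.
    tar -czf "$archive" -C "$(dirname "$SOURCE_DIR")" "$(basename "$SOURCE_DIR")" || return 1
    echo "$archive"
}

# Copy the local staging area to the NFS mount folder.
# No --delete here: in this sketch, old files are removed by a separate
# pruning job on each side.
mirror_to_nfs() {
    [ -d "$NFS_BACKUP_DIR" ] || return 1
    rsync -a "${LOCAL_BACKUP_DIR}/" "${NFS_BACKUP_DIR}/"
}

# Run both stages and report the outcome.
run_backup() {
    local archive
    archive="$(create_archive)" || { echo "archive creation failed" >&2; return 1; }
    mirror_to_nfs || { echo "mirror to NFS failed" >&2; return 1; }
    echo "backup complete: $archive"
}

# Only do real work when invoked as: <script> run
if [ "${1:-}" = "run" ]; then
    run_backup
fi
```

Assuming the script were saved as /usr/local/bin/linux-backup.sh (a made-up path), a crontab entry such as "0 2 * * * /usr/local/bin/linux-backup.sh run" would start it daily at 02:00.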


Fourth, the Data Protection Manager server’s NFS root is added to a DPM protection group.
http://technet.microsoft.com/en-us/library/cc161486.aspx


Lastly, we added scripts on both systems that pruned files based on our retention requirements and added log file outputs for verification and diagnostics.
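
A pruning script along these lines might look like the following sketch. The 14-day retention period, the log file location, and the backup-*.tar.gz naming pattern are assumptions for illustration; substitute whatever your retention requirements dictate.

```shell
#!/usr/bin/env bash
# Sketch of a retention job: delete archives older than a cutoff and log
# every removal. All names and values here are hypothetical.

RETENTION_DAYS="${RETENTION_DAYS:-14}"             # assumed retention period
LOG_FILE="${LOG_FILE:-/var/log/backup-prune.log}"  # assumed log location

# Append a timestamped message to the log file.
log() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$LOG_FILE"
}

# Remove archives in the given directory whose modification time is older
# than the retention period. Usage: prune_old_backups DIR [DAYS]
prune_old_backups() {
    local dir="$1" days="${2:-$RETENTION_DAYS}"
    [ -d "$dir" ] || { log "ERROR: $dir does not exist"; return 1; }
    # -print lists each match so it can be logged; -exec removes the matches.
    find "$dir" -type f -name 'backup-*.tar.gz' -mtime +"$days" \
        -print -exec rm -f -- {} + |
        while IFS= read -r removed; do
            log "pruned $removed"
        done
    log "prune of $dir finished (retention: ${days} days)"
}
```

Calling prune_old_backups /backup/local 14 from a daily cron job would keep roughly two weeks of archives, and the log gives a record to check when verifying backups.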



It was a bit of an exercise to get this configured and learn the technologies, but I think it was well worth the learning experience to challenge ourselves, fight limitations, and save some hard costs of equipment with soft costs of labor.