Solution Guide
DocMigrator

Migration Concepts and Strategies


Migrating content from a source to a docuflow Target with DocMigrator is a combination of DocManager business rules (SAP Business Objects and DocTypes), RFC destinations and their connections, and SAP background jobs.

All of these, plus consideration of day-to-day docuflow processing and how quickly you want to migrate your content, determine the constellation of these components.

The diagram below provides a high-level overview:

Document image


Typically there is a timeline for a migration, and together with the volume of content to be migrated, a seconds-per-document or documents-per-second target is determined.
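As an illustration of this sizing, the required rate can be derived from the volume and the timeline; all numbers below are hypothetical:

```python
import math

# Hypothetical migration sizing: derive a docs/sec target from volume and window.
documents = 2_000_000        # total documents to migrate (example)
window_days = 30             # migration window (example)
hours_per_day = 20           # processing hours/day, leaving headroom for maintenance

seconds_available = window_days * hours_per_day * 3600
docs_per_second = documents / seconds_available
seconds_per_doc = seconds_available / documents

# If one background job sustains a measured rate (assumed here), the gap
# tells you how many parallel jobs to carve the migration into.
single_stream_rate = 0.2     # measured docs/sec of one job (assumption)
jobs_needed = math.ceil(docs_per_second / single_stream_rate)

print(f"target: {docs_per_second:.2f} docs/sec ({seconds_per_doc:.2f} sec/doc), "
      f"parallel jobs needed: {jobs_needed}")
```

With these example figures the target works out to roughly 0.93 documents per second, which a single 0.2 docs/sec stream cannot meet, pointing toward parallel processing.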

The sections below outline the components, considerations, and best practices involved in achieving a successful migration.

Along with the overall best practices, each content repository may have its own notes or best-practices section; please ensure you reference it if applicable.

Sharing of DocManager Rules and RFCs

The first determination is whether the content to be migrated will share DocManager rules (which are at the SAP Business Object and/or DocType level) with day-to-day docuflow processing.

Things that will force separate DocManager rules:

  • If there is a need for separate folder locations in Box
  • If there is a need for separate SAP security
  • If there is a need for different metadata

In short, RFC destinations (and the number of RFC connections per destination) are how docuflow can be load balanced to allow:

  1. multiple docuflow Migrator SAP background jobs to be run in parallel
  2. sharing or separation of RFC processing between migration processing and day-to-day processing

In typical day-to-day docuflow processing, the DocManager rules determine the RFC destination; however, DocMigrator can override this at runtime to force the use of dedicated RFC destinations (more on this below).

Parallel vs. Single Stream Processing

The second determination is whether your volume versus your timeline/milestones requires multiple migration processes to run in parallel, or whether a single processing stream will suffice; in other words, how fast do you want or need your content to be migrated?

Single Stream

  • No additional consideration for carving and chunking the migration is required
  • The only consideration is whether to share existing RFCs with day-to-day processing using a higher number of RFC connections (more on this below)

Parallel Processing (most common and recommended)

  • Ideally executed using RFC destinations separate from day-to-day processing, and more ideally over separate docuflow middleware connectors
  • The number of dedicated SAP background jobs available for migration dictates how many parallel processes can be executed
    • For Parallel Processing, there are two ways to carve and chunk a migration within the docuflow DocMigrator; both leverage the input/selection criteria of the DocMigrator:
    • ArchiveLink Link parameters (most common and recommended)
      • All fields in TOA01 are available, both for “source” and “target” link entries in the migrator table once loaded/staged
      • Typical and easiest levels of carving are SAP Business Object and/or DocType
      • The advantage of using these two elements is that it typically prevents jobs running in parallel from processing the same migration load table entry at the exact same time
      • The other fields in a link entry can be used (such as an SAP record/object_id range), but extra caution must be taken not to process the same load table entry at the same time across multiple jobs
    • Loading with custom ABAP to assign unique tags (requires development)
      • This method typically has the chunks defined by tag name at load time, and migration of the load table is then done at the tag level

*Note: it is imperative that parallel jobs do not have input criteria that allow them to process the same entries at the same time; they must be “unique” jobs to avoid runtime conflicts. Running a migration for a document that is already migrated is OK: docuflow will not migrate or process a document twice.
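A pre-flight check of this uniqueness rule can be sketched as follows. The job names and the (Business Object, DocType) carving sets are illustrative, not actual DocMigrator configuration:

```python
from itertools import combinations

# Hypothetical carving: each parallel background job owns a disjoint set of
# (SAP Business Object, DocType) combinations from the migration load table.
jobs = {
    "MIG_JOB_01": {("BUS2012", "ZPOATT")},
    "MIG_JOB_02": {("BUS2012", "ZPOINV")},
    "MIG_JOB_03": {("BKPF", "ZFIATT")},
}

def overlapping_jobs(jobs):
    """Return pairs of jobs whose carving criteria intersect (must be empty)."""
    return [
        (name_a, name_b)
        for (name_a, crit_a), (name_b, crit_b) in combinations(jobs.items(), 2)
        if crit_a & crit_b
    ]

# No two jobs may be able to select the same load table entries.
assert not overlapping_jobs(jobs), "parallel jobs must carve unique entries"
```

Running such a check against your planned job variants before scheduling them makes the "unique jobs" requirement explicit rather than relying on convention.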

RFC Destinations and Connections

Regardless of the processing style chosen above, but especially imperative for Parallel Processing, is how to load balance the migration process over RFC destinations and RFC connections.

Sharing DocManager Rules

If sharing the same DocManager rule as day-to-day processing is not a concern, and load is not a concern, DocMigrator can be run to leverage the DocManager rule in its entirety, including which RFC destination to use.

In this scenario, it is recommended to increase the number of RFC connections at the middleware level for each shared RFC destination, thus reducing the chance of migration activity and day-to-day processing waiting for an available connection.

Forcing RFC Destination(s)

To take full control over which RFC destination is used, DocMigrator can be run explicitly specifying the RFC destination for that background job by using the "Force RFC" input parameter (most common and recommended).

  • In this scenario, ideally use 1 RFC destination per background job
  • If background jobs are to share an RFC destination, increase the number of RFC connections to match the number of SAP jobs running against that RFC at the same time (assuming no day-to-day processing is using that same RFC)
  • Ideally each background job uses 1 RFC destination with 1 RFC connection
  • It is recommended to have a maximum of 10 RFC destinations per docuflow middleware connectors
  • It is recommended to have a maximum of 10 RFC connections per RFC destination
  • In theory, that is 100 RFC destination/connection combinations registered in your SAP Gateway, so mind the maximum number of logged-in clients allowed in SMGW
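As a sanity check, the destination/connection arithmetic above can be sketched against a gateway limit; all numbers are illustrative, so verify the actual limits in your own gateway configuration (SMGW):

```python
# Illustrative sizing check for SAP Gateway logons (numbers are examples).
connectors = 1                    # docuflow middleware connectors
destinations_per_connector = 10   # recommended maximum from the guide
connections_per_destination = 10  # recommended maximum from the guide

gateway_clients = (connectors
                   * destinations_per_connector
                   * connections_per_destination)

smgw_max_clients = 300            # assumed gateway limit; check your system
assert gateway_clients <= smgw_max_clients, "raise gateway limits or reduce RFCs"
print(gateway_clients)            # 100 registered clients for a single connector
```

Remember that day-to-day processing consumes gateway clients from the same pool, so budget for both workloads together.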

Additional Best Practices

  • SAP Background Jobs
    • Never have migration table entries shared between background jobs for parallel processing. A collision could occur where the same entry is processed at the same time across more than one background job
  • Unique Filenames from Source Content
    • If your source files may contain non-unique filenames, and there is a chance they will be migrated to the same repository location in a CMS that does not allow same-named files, versioning must be prevented. This can be handled in two ways:
      • 1) In the DocMigrator, choose "Force Unique Filename". This will append a unique UUID stamp to the source filename.
      • 2) In the docuflow middleware, at the RFC destination level, enable "Error On Duplicate". This will result in an error for documents attempted to be migrated to the same folder when a file with that name already exists. These errors can then be triaged after the fact: either modify the filename in the source system, or re-run the docuflow migration for those errors with "Force Unique Filename" (above) selected.
  • RFCs and docuflow Middleware
    • Ideally avoid sharing docuflow migration RFC destinations with production day-to-day docuflow processing RFC destinations
    • Ideally avoid sharing docuflow migration with production day-to-day docuflow processing across the same middleware instance
    • Ideally have one or more dedicated docuflow middleware instances for migration (unless doctypes and DocManager rules are shared between migration and day-to-day processing)
    • Consider the total number of RFC destinations/RFC connections against your SAP Gateway's maximum user count
    • For production day-to-day docuflow processing and migration processing, servers hosting docuflow middleware should be dedicated to docuflow only
    • Often one non-prod docuflow middleware instance/server is enough; ideally one per SAP instance (DEV/QUAL/PROD/TRAINING), but that is optional/nice to have. This only affects inbound-to-SAP processing/testing: when you add a document to Box, how does the connector know which SAP system to go to? (Hint for the fancy pants out there: yes, you can add more than one metadata field at the SAPLink rule level in the mdfilters string.)
    • If your source data is also from a docuflow endpoint, ensure the CMS Use Tag in the DocMigrator configuration is separate from DocManager business rules and/or is configured to a dedicated RFC destination, as a docuflow READ will be performed on that RFC
      • It is also important to ensure there are multiple RFC connections for a READ RFC destination to ensure maximum throughput and performance
  • Repository IDs and DocTypes
    • Mapping to new Repository ID’s is almost always required and highly recommended
    • Mapping to new document types is optional. Remember that new document types typically require updated SAP Security roles/profiles.
      • theory recap: ArchiveLink uses the S_WFAR_OBJ authorization object, which covers repository ID, SAP business object, and doctype
      • thus if repository ID is already part of your security model, you will have to accommodate the new repository ID at a minimum
      • recap the section "Sharing of DocManager Rules and RFCs" above to determine whether you need new doctypes (in addition to any reasons you may have as well)
  • Deleting Links / Attachments List
    • During migration of SAP Business Object/DocType buckets, if the source content is to remain intact, and if using the docuflow DocManager attachment list ONLY, consider “hiding” those entries from the attachment list until migration for that combination is complete (DocManager config)
    • After migration buckets of SAP Business Objects/DocTypes are completed, and if using the docuflow DocManager attachment list ONLY, consider “hiding” the original entries from the attachment list until final verification is complete (DocManager config)
    • Deletion of source links is only recommended after a post-migration verification window. This is also optional if using the DocManager attachment list, as you can simply hide the original links instead.
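The "Force Unique Filename" option described in the best practices above amounts to stamping each filename before migration. A minimal sketch, assuming a UUID suffix inserted before the extension (the exact stamp format docuflow uses may differ):

```python
import uuid
from pathlib import PurePosixPath

def force_unique_filename(filename: str) -> str:
    """Append a UUID stamp before the extension so same-named source files
    cannot collide in the target folder (illustrative, not docuflow's code)."""
    p = PurePosixPath(filename)
    return f"{p.stem}_{uuid.uuid4().hex}{p.suffix}"

# Two source documents named "invoice.pdf" land as distinct target files.
a = force_unique_filename("invoice.pdf")
b = force_unique_filename("invoice.pdf")
assert a != b and a.endswith(".pdf")
```

The trade-off is that target filenames no longer match the source exactly, which matters if downstream users search by filename.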

Performance Notes

  • Set logging in the docuflow middleware to ERROR only: less logging means less writing to disk
  • Increase the JVM size for the docuflow middleware server to 2048 MB initial / 4096 MB max
  • Keep the docuflow Profile Wizard closed during processing
  • Including metadata as part of the migration requires an additional API call as part of the “create”, but allows the migration to process in one enchilada. However, if migration milestones are short and metadata isn’t required Day 1, or if API call maximums are a concern, consider transferring metadata as a second wave, using docuflow MDPro in SAP to execute a mass push of metadata after the fact.
  • Use Box folder numbers instead of names; this reduces the number of API calls and increases performance significantly
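The folder-number recommendation can be illustrated with a toy call counter: addressing a target folder by name costs an extra lookup per document, while a numeric folder ID goes straight to the upload. This is a simplified model of call counts, not the real Box API:

```python
# Toy model of per-document API call counts: folder name vs. folder ID.
api_calls = 0

def lookup_folder_id(name):
    """Simulated search call that resolves a folder name to its ID."""
    global api_calls
    api_calls += 1
    return "123456789"            # hypothetical folder number

def upload(folder_id, doc):
    """Simulated upload call into a folder addressed by ID."""
    global api_calls
    api_calls += 1

docs = ["a.pdf", "b.pdf", "c.pdf"]

# By name: one lookup plus one upload per document.
api_calls = 0
for d in docs:
    upload(lookup_folder_id("Invoices 2024"), d)
by_name = api_calls

# By ID: the folder number is configured once, so only uploads remain.
api_calls = 0
for d in docs:
    upload("123456789", d)
by_id = api_calls

print(by_name, by_id)             # name-based addressing doubles the calls
```

Over millions of documents, halving the call count both speeds up the migration and keeps you further from API rate limits.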