top of page

Generate sample data with Python and push it into your Marketing Cloud Data Extension


You may be implementing a new solution, building a personalized email or CloudPage with Ampscript, or you've just discovered the SQL Query activity in Automation Studio and need some sample data to validate your work.


Or perhaps you need sample data to put together a demo, and you don't know where to start.


Here is a step-by-step article on generating sample data and two methods for importing it to a Data Extension.


First, let's introduce the tools we need to accomplish this:


1. Anaconda & Jupyter Notebook


Anaconda is a distribution platform that provides many tools for Python/R data science and machine learning.


Jupyter Notebook is a popular tool among data scientists and machine learning professionals. You can download Anaconda here.


Once installed, open the Anaconda Navigator and launch Jupyter Notebook.



Next, create a new folder.


Now we can move on and create our Python 3 notebook.



2. Python and Faker


The Python 3 notebook is now up and running, so it's time to generate our sample data. And here comes Faker, a Python package that generates fake data.


In addition, we'll also be using the Random module to generate random decimals for the LTV (Lifetime value) field.


The snippet above will generate a Python list with 10000 records and populate the following fields:

  • Title

  • FirstName

  • LastName

  • EmailAddress

  • SubscriberKey

  • Phone

  • JobTitle

  • City

  • County

  • DOB

  • LastPurchaseDate

  • LTV

The number of records created is dictated by the argument inside the range() function in our last line of code. So if you wish to generate a different number of records, change that figure and re-run the code.


Paste the snippet below into the first cell of your Jupyter Notebook and hit run.

from faker import Faker
import random
f = Faker(["en_GB", "en_US"]) # Localized providers
sampleData = [{'Title': f.prefix(),
           'FirstName': f.first_name(),
           'LastName': f.last_name(),
           'EmailAddress': f.email(),
           'SubscriberKey': f.bothify(text="?#???##?-?#???##?-?#???##?"),
           'Phone': f.phone_number(),
           'JobTitle': f.job(),
           'City': f.city(),
           'County': f.county(),
           'DOB': f.date_between(start_date='-85y', end_date='-18y').isoformat(),
           'LastPurchaseDate': f.date_between(start_date='-2y', end_date='today').isoformat(),
           'LTV': round(random.uniform(9.99, 9999.99), 2) 
           } for x in range(10000)]
print("Sample Data Generated")

Now let's check if our sampleData list contains 10000 records and preview the first one since lists are iterable objects.

Add the code below into a new cell and hit Run. If everything is okay, you should see that the length is 10000 and a preview of the first record.

print(len(sampleData))
print(sampleData[0])

So now we've got our sample data. What's next?


First, let's create our Data Extension, and then we will explore a couple of options for populating it with the sample data.

Name

Data Type

Length

Primary Key

Nullable

Title

Text

50

N

Y

FirstName

Text

100

N

Y

LastName

Text

100

N

Y

EmailAddress

EmailAddress

254

N

Y

SubscriberKey

Text

50

N

Y

Phone

Text

100

N

Y

JobTitle

Text

100

N

Y

City

Text

100

N

Y

County

Text

100

N

Y

DOB

Date

​

N

Y

LastPurchaseDate

Date

​

N

Y

LTV

Decimal

6

N

Y


Option 1 - CSV Import


For this option, we need to convert our sampleData list into a DataFrame and export it as a CSV.


import pandas as pd
import csv

df = pd.DataFrame(sampleData, columns=['Title', 'FirstName', 'LastName', 'EmailAddress', 'SubscriberKey', 'Phone',
                                    'JobTitle', 'City', 'County', 'DOB', 'LastPurchaseDate', 'LTV'])
df.to_csv('sampleData.csv', index=False)

If there are no errors, you should see a new file sampleData.csv inside the folder you have created after launching the Jupyter Notebook.


With the sampleData.csv available, now we're ready to log into Marketing Cloud and import the file into the Data Extension.


Option 2 - Insert the data using the Asynchronous Insert Data Extension API


To insert the data in our Data Extension via the API option, we need to have an Installed Package and add an API Integration and Server-to-Server component, and set its scope to write into Data Extensions.


Next, we need to generate an Access Token and use it in our API request.

import json, requests
from requests.structures import CaseInsensitiveDict

clientId = 'xxxxxxxxxxxxxxxxxx'  # Replace with your Client Id
clientSecret = 'xxxxxxxxxxxxxxxxxx' # Replace with your Client Secret
mid = 'xxxxxxxxx' # Replace with your MID
authBaseURI = 'https://xxxxxxxxxxxxxxxxxx.auth.marketingcloudapis.com/' # Replace with your Auth Base URI


payload = {
    'client_id': clientId,
    'client_secret': clientSecret,
    'grant_type': 'client_credentials',
    'account_id': mid
}

url = authBaseURI + '/v2/token';

r = requests.post(url,
    data=payload)

body = json.loads(r.content)
token = body['access_token']
print(token)

DE = 'xxxxxxxxxxxxxxxxxx' # Replace with your Data Extension customer key
restEndPoint = 'https://xxxxxxxxxxxxxxxxxx.rest.marketingcloudapis.com' # Replace with your REST Base URI
asyncApi = '/data/v1/async/dataextensions/key:' + DE + '/rows' 
endPoint = restEndPoint + asyncApi

headers = CaseInsensitiveDict()
headers["Accept"] = "application/json"
headers["Content-Type"] = "application/json"
headers["Authorization"] = "Bearer " + token

payload = json.dumps({  
    "items": sampleData
});

r = requests.post(endPoint, headers=headers, 
    data=payload)
body = json.loads(r.content)
print(body)

If everything executes correctly, your Access Token will be printed, and the API request id alongside the result messages array.


Job done. Now let's check our Data Extension and ensure that we've got our records.


Wrapping up


Faker is a powerful library that can help you generate sample data for your use cases.


In this article, the data was generated using Standard functions such as f.prefix() and f.first_name(). Several Standard Providers that support the generation of random data ranging from banking attributes (IBAN, SWIFT codes etc) to user agents (Android, iOS, Chrome, Linux IE, etc).


Regarding the methodology used when importing the sample data into Marketing Cloud, always consider your set size before selecting your method, as the API call might return an error if many rows are qued for insertion.







256 views0 comments

Comments


bottom of page