You may be implementing a new solution, building a personalized email or CloudPage with AMPscript, or you may have just discovered the SQL Query activity in Automation Studio and need some sample data to validate your work.
Or perhaps you need sample data to put together a demo, and you don't know where to start.
Here is a step-by-step guide to generating sample data, along with two methods for importing it into a Data Extension.
First, let's introduce the tools we need to accomplish this:
1. Anaconda & Jupyter Notebook
Anaconda is a distribution platform that provides many tools for Python/R data science and machine learning.
Jupyter Notebook is a popular tool among data scientists and machine learning professionals. You can download Anaconda from the Anaconda website.
Once installed, open the Anaconda Navigator and launch Jupyter Notebook.
Next, create a new folder.
Now we can move on and create our Python 3 notebook.
2. Python and Faker
The Python 3 notebook is now up and running, so it's time to generate our sample data. And here comes Faker, a Python package that generates fake data.
In addition, we'll also be using the Random module to generate random decimals for the LTV (Lifetime value) field.
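Faker is not part of the Python standard library, so if it is not already available in your environment, install it first by running the command below in a notebook cell (the Random module ships with Python, so nothing extra is needed there):

!pip install Faker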
The snippet below will generate a Python list with 10000 records and populate the following fields:
Title
FirstName
LastName
EmailAddress
SubscriberKey
Phone
JobTitle
City
County
DOB
LastPurchaseDate
LTV
The number of records created is dictated by the argument inside the range() function in our last line of code. So if you wish to generate a different number of records, change that figure and re-run the code.
Paste the snippet below into the first cell of your Jupyter Notebook and hit Run.
from faker import Faker
import random

f = Faker(["en_GB", "en_US"])  # Localized providers

sampleData = [{'Title': f.prefix(),
               'FirstName': f.first_name(),
               'LastName': f.last_name(),
               'EmailAddress': f.email(),
               # '?' becomes a random letter, '#' a random digit - a unique-looking key
               'SubscriberKey': f.bothify(text="?#???##?-?#???##?-?#???##?"),
               'Phone': f.phone_number(),
               'JobTitle': f.job(),
               'City': f.city(),
               'County': f.county(),
               # Dates of birth for subscribers aged 18 to 85
               'DOB': f.date_between(start_date='-85y', end_date='-18y').isoformat(),
               # A purchase date within the last two years
               'LastPurchaseDate': f.date_between(start_date='-2y', end_date='today').isoformat(),
               'LTV': round(random.uniform(9.99, 9999.99), 2)
               } for _ in range(10000)]

print("Sample Data Generated")
Since lists support len() and indexing, let's confirm that our sampleData list contains 10000 records and preview the first one.
Add the code below into a new cell and hit Run. If everything is okay, you should see a length of 10000 followed by a preview of the first record.
print(len(sampleData))
print(sampleData[0])
So now we've got our sample data. What's next?
First, let's create our Data Extension, and then we will explore a couple of options for populating it with the sample data.
Name             | Data Type    | Length | Primary Key | Nullable
-----------------|--------------|--------|-------------|---------
Title            | Text         | 50     | N           | Y
FirstName        | Text         | 100    | N           | Y
LastName         | Text         | 100    | N           | Y
EmailAddress     | EmailAddress | 254    | N           | Y
SubscriberKey    | Text         | 50     | N           | Y
Phone            | Text         | 100    | N           | Y
JobTitle         | Text         | 100    | N           | Y
City             | Text         | 100    | N           | Y
County           | Text         | 100    | N           | Y
DOB              | Date         | -      | N           | Y
LastPurchaseDate | Date         | -      | N           | Y
LTV              | Decimal      | 6      | N           | Y
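Before importing, it is worth confirming the generated values actually fit the field lengths above. A minimal sanity check, assuming the sampleData list from earlier is still in memory:

# Longest generated value per text field, to compare against the DE lengths above
textFields = ['Title', 'FirstName', 'LastName', 'EmailAddress',
              'SubscriberKey', 'Phone', 'JobTitle', 'City', 'County']
for field in textFields:
    print(field, max(len(record[field]) for record in sampleData))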
Option 1 - CSV Import
For this option, we need to convert our sampleData list into a DataFrame and export it as a CSV.
import pandas as pd

df = pd.DataFrame(sampleData, columns=['Title', 'FirstName', 'LastName', 'EmailAddress',
                                       'SubscriberKey', 'Phone', 'JobTitle', 'City',
                                       'County', 'DOB', 'LastPurchaseDate', 'LTV'])
df.to_csv('sampleData.csv', index=False)  # Write without the DataFrame index column
If there are no errors, you should see a new file, sampleData.csv, inside the folder you created after launching Jupyter Notebook.
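To double-check the export before uploading, you can read the file back and preview the first few rows:

print(pd.read_csv('sampleData.csv').head())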
With the sampleData.csv available, now we're ready to log into Marketing Cloud and import the file into the Data Extension.
Option 2 - Insert the data using the Asynchronous Insert Data Extension API
To insert the data into our Data Extension via the API, we need an Installed Package with an API Integration (Server-to-Server) component whose scope includes Write access to Data Extensions.
Next, we need to generate an Access Token and use it in our API request.
import json, requests
from requests.structures import CaseInsensitiveDict

clientId = 'xxxxxxxxxxxxxxxxxx'  # Replace with your Client Id
clientSecret = 'xxxxxxxxxxxxxxxxxx'  # Replace with your Client Secret
mid = 'xxxxxxxxx'  # Replace with your MID
authBaseURI = 'https://xxxxxxxxxxxxxxxxxx.auth.marketingcloudapis.com/'  # Replace with your Auth Base URI

payload = {
    'client_id': clientId,
    'client_secret': clientSecret,
    'grant_type': 'client_credentials',
    'account_id': mid
}

# Request an Access Token (rstrip avoids a double slash if the base URI ends with '/')
url = authBaseURI.rstrip('/') + '/v2/token'
r = requests.post(url, data=payload)
body = json.loads(r.content)
token = body['access_token']
print(token)

DE = 'xxxxxxxxxxxxxxxxxx'  # Replace with your Data Extension customer key
restEndPoint = 'https://xxxxxxxxxxxxxxxxxx.rest.marketingcloudapis.com'  # Replace with your REST Base URI
asyncApi = '/data/v1/async/dataextensions/key:' + DE + '/rows'
endPoint = restEndPoint + asyncApi

headers = CaseInsensitiveDict()
headers["Accept"] = "application/json"
headers["Content-Type"] = "application/json"
headers["Authorization"] = "Bearer " + token

# Queue all sample records for asynchronous insertion
payload = json.dumps({
    "items": sampleData
})

r = requests.post(endPoint, headers=headers, data=payload)
body = json.loads(r.content)
print(body)
If everything executes correctly, your Access Token will be printed, followed by the API request id and the result messages array.
Job done. Now let's check our Data Extension and ensure that we've got our records.
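You can verify this in Contact Builder, or poll the async API itself. The sketch below assumes the request-status endpoint of the asynchronous Data Extension API (/data/v1/async/{requestId}/status) and that the insert response printed above exposes the request id under requestId; it reuses token, headers, and restEndPoint from the previous cells:

# Poll the status of the asynchronous insert request
requestId = body['requestId']
statusUrl = restEndPoint + '/data/v1/async/' + requestId + '/status'
r = requests.get(statusUrl, headers=headers)
print(json.loads(r.content))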
Wrapping up
Faker is a powerful library that can help you generate sample data for your use cases.
In this article, the data was generated using standard functions such as f.prefix() and f.first_name(). Faker also ships with several standard providers that generate random data ranging from banking attributes (IBAN, SWIFT codes, etc.) to user agents (Android, iOS, Chrome, Linux, IE, etc.).
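For example, the bank and user-agent providers expose methods like the ones below (a quick illustration using the f instance from earlier; these fields are not part of this article's Data Extension):

print(f.iban())        # Random IBAN from the bank provider
print(f.swift())       # Random SWIFT/BIC code
print(f.user_agent())  # Random browser user agent string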
Regarding the method used to import the sample data into Marketing Cloud, always consider the size of your data set first, as the API call might return an error if too many rows are queued for insertion in a single request.
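If you run into that limit, one option is to split the payload into smaller batches and post them one at a time. A minimal sketch, reusing endPoint and headers from Option 2 (the batch size of 1000 is an arbitrary assumption, not a documented limit):

batchSize = 1000  # Assumed batch size; tune to your account's limits
for i in range(0, len(sampleData), batchSize):
    batch = sampleData[i:i + batchSize]
    r = requests.post(endPoint, headers=headers, data=json.dumps({"items": batch}))
    print(i, r.status_code)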