Populating a WordPress product directory (e.g. a pet tech directory) can be streamlined by scraping product data from websites like Amazon and importing it into WordPress. This guide provides a step-by-step walkthrough for using Octoparse to extract product details (title, price, images, description, links) from Amazon, and then using WP All Import to import that data into WordPress. We will also cover Octoparse’s free plan limitations (and workarounds) and best practices for WP All Import (handling categories, images, and avoiding duplicates). The process is divided into two parts:
Part 1: Scraping Product Data with Octoparse
Octoparse is a no-code web scraping tool that lets you build “tasks” (crawlers) to fetch data from webpages into a structured format (What is a task in Octoparse? | Octoparse). Below are the steps to create a scraping task for Amazon product data and extract the desired fields.
Creating a Scraping Task in Octoparse
- Set Up and Start a New Task: Download and install Octoparse (Windows application) and sign up for a free account. Launch Octoparse and click the “+New” button (on the sidebar). Choose Advanced Mode to create a custom task (How to Scrape Data from a List of URLs | Octoparse). This allows you to build a crawler from scratch with full control (as opposed to using a pre-built template (What is a task in Octoparse? | Octoparse)).
- Enter the Target URL: Copy the URL of the Amazon page you want to scrape. This could be a product listing page (search results or category page showing multiple products) or a specific product detail page. Paste the URL into Octoparse’s interface and load it. Octoparse will automatically attempt to detect the page structure and data fields (“Auto-detect” mode starts) (No-Coding Steps to Scrape Amazon Product Data | Octoparse). For example, if you paste an Amazon search results URL, Octoparse may detect the list of products on the page and their details.
- Refine the Data Extraction (Select Fields): After auto-detection, Octoparse will generate a preliminary workflow with guessed data fields. You should review and modify these to ensure you’re capturing all required information:
- Product Title & Price: Click on a product title or price on the webpage in Octoparse’s built-in browser. In the Tips panel that appears, choose the option to “Extract the text of the element” (How To Scrape Amazon Product Data Without Coding – DEV Community). This adds a field (e.g. “Title” or “Price”) to your data extraction. Repeat this for the price if it wasn’t auto-detected. You can rename the fields for clarity (e.g. “Product_Title”, “Product_Price”).
- Product Link: If you need the URL of the product’s page (for example, to link back to Amazon), click on the product title or image which contains the link. In the Tips panel, choose to extract the href attribute (usually Octoparse will show an option like Extract URL for link elements). This will capture the product’s link.
- Product Image URL: To get product images, click on the product image thumbnail on the page. In the Tips panel, click the “>” arrow to drill down, and select the IMG element or its src attribute (How To Scrape Amazon Product Data Without Coding – DEV Community). This ensures you extract the image URL (the link to the image file). Octoparse can even download images in bulk if needed, but for our use (WordPress import) it’s usually enough to extract the image URLs (No-Coding Steps to Scrape Amazon Product Data | Octoparse).
- Product Description/Details: If you’re on a listing page, you might only get a snippet or need to click each product to get full details. Octoparse allows multi-level scraping: you can set up a workflow where it clicks each product link to open the detail page and then extracts additional info like the description or specifications. To do this, use a Loop Click action on the list of product URLs. However, if you want a simpler setup, you might limit to data available on the listing itself (title, price, etc.) and later enrich the data manually or in another run. On a product detail page, you can similarly click the description text or features and select “Extract text” to capture the description. Ensure the field is added to your data list (e.g. “Description”).
- Handle Lists & Pagination (Multiple Products): If you are scraping a page with multiple products, Octoparse should have detected the list and created a Loop Item in the workflow (iterating through each product entry). Verify that it’s selecting all products. If not, you may need to manually create a loop: right-click the element representing the product list and choose “Loop click each element” or a similar option. If the products span multiple pages, also set up a Pagination action: click the “Next” page button in Octoparse’s built-in browser and configure it as a loop that repeats until there are no more pages. Octoparse’s interface allows adding a Pagination loop and combining it with the product loop, all through point-and-click, with no coding needed (No-Coding Steps to Scrape Amazon Product Data | Octoparse). This way, the task will step through each results page and collect products from all pages.
- Run the Scraping Task: Once your workflow is configured with the desired fields (and pagination if applicable), it’s time to run the crawler. Click the “Run” button. Octoparse will prompt you to choose a running mode – Local Run or Cloud. On the free plan, you’ll use local run (cloud run is a paid feature). Start the run, and Octoparse will begin to browse the pages and extract data according to your setup (How To Scrape Amazon Product Data Without Coding – DEV Community). You can watch it navigate each page and item. If it’s a large job, be patient as it iterates through all pages/products.
- Export the Data (CSV/JSON): After the run completes, you will have a dataset of all the extracted product info. Octoparse allows you to export this data in various formats. Click on “Export Data” in Octoparse and choose a format such as CSV, JSON, Excel, etc. (How To Scrape Amazon Product Data Without Coding – DEV Community). For example, select CSV if you plan to import into WordPress using WP All Import (WP All Import handles CSV and XML). Octoparse supports exporting to CSV/Excel by default, and even JSON, HTML, or database formats (Pricing | Octoparse). Save the exported file to your computer. This file will contain columns like Title, Price, Image_URL, Description, Product_Link (whatever fields you defined).
Tip: When scraping from Amazon (or any site), try to extract a unique identifier for each product if possible (e.g., ASIN or the product URL itself) – this can help later when importing to avoid duplicates. In Amazon’s case, the product URL or ASIN (which is part of the URL) can serve as an identifier.
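As a rough illustration of the tip above, the ASIN can be pulled out of a scraped product URL with a small script. This is a minimal sketch, not an official Amazon or Octoparse method: it assumes the ASIN is a 10-character code of uppercase letters and digits that follows "/dp/" or "/gp/product/" in the URL, and the function name and example URL are hypothetical.

```python
import re
from typing import Optional

# Assumption: the ASIN is 10 uppercase letters/digits and appears right after
# "/dp/" or "/gp/product/" in the product URL.
ASIN_PATTERN = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:[/?#]|$)")


def extract_asin(url: str) -> Optional[str]:
    """Return the ASIN found in a product URL, or None if no match is found."""
    match = ASIN_PATTERN.search(url)
    return match.group(1) if match else None


# Hypothetical URL, for illustration only.
example_url = "https://www.amazon.com/Example-Pet-Camera/dp/B000000000/ref=sr_1_1"
print(extract_asin(example_url))
```

Storing the result in its own CSV column (for example "ASIN") gives you a short, stable value to use as an identifier even when the rest of the URL carries tracking parameters.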
Octoparse Free Plan Limitations and Workarounds
Octoparse’s free version is quite powerful for getting started, but it does have some limitations to be aware of:
- Number of Tasks & Concurrency: The free plan lets you create up to 10 tasks (crawlers) and run up to 2 of them concurrently (Octoparse Review – Proxyway). This is usually enough for a small project. If you need more than 10 different scrapers, you’d have to upgrade or replace tasks you have finished with. (A “task” in Octoparse is essentially one scraper configuration for a site (What is a task in Octoparse? | Octoparse).)
- Local Only, No Cloud Features: Free users can only run scrapes locally on their PC. Features like Cloud extraction, scheduling, IP rotation, and CAPTCHA solving are not available on the free plan (A Full Guide on Web Scraping Costs | Octoparse). This means you cannot set Octoparse to automatically run on a schedule or use Octoparse’s built-in rotating proxy pool on the free tier.
- Data Export Limits: The free plan allows exporting a maximum of 10,000 rows per export and up to 50,000 total rows per month (Pricing | Octoparse). If you try to scrape more than 10k items in one go, you’ll need to run multiple exports (the software will prompt when you hit the limit). Likewise, extremely large-scale scraping (tens of thousands of items per month) would require an upgrade.
Workarounds: For most small directories, these limits are manageable. You can scrape data in batches (e.g. scrape one category or subcategory at a time to stay under 10k rows, then merge the CSVs). If the data needs to stay current, you can re-run the Octoparse task manually from time to time. Although Octoparse’s cloud scheduling is not available on the free plan, you could schedule a local run by using your operating system’s scheduler to open Octoparse and trigger a task (this is a bit technical, so often it’s easier to just run it by hand when needed). To deal with IP bans (since the free plan has no automatic IP rotation), you may integrate your own proxies in the task settings or keep the scrape slow enough to avoid triggering Amazon’s anti-scraping measures. Also, if a CAPTCHA appears, on the free plan you’ll have to solve it manually to continue.
Overall, Octoparse’s free edition is sufficient for getting Amazon product data for a directory project, as long as you stay within these limits or use creative workarounds. The free plan does not limit the number of pages you can scrape in a single run aside from the export row count (Octoparse Review – Proxyway), so you can still gather a lot of data without paying. If your project grows, Octoparse Standard (paid) removes many of these limits and adds convenience features like scheduling and API access (Octoparse Review – Proxyway).
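The batch workaround mentioned above (scrape one category at a time, then merge the CSVs) could be handled with a short script like the following. It is only a sketch under assumptions: the folder name octoparse_exports, the output name merged_products.csv, and the Product_Link column used to drop repeated products are all hypothetical and should be changed to match your own exports.

```python
import csv
from pathlib import Path
from typing import Dict, Iterable, List

Row = Dict[str, str]


def read_export(path: Path) -> List[Row]:
    """Load one exported CSV file as a list of row dictionaries."""
    # utf-8-sig tolerates a byte-order mark at the start of the file, if present.
    with open(path, newline="", encoding="utf-8-sig") as handle:
        return list(csv.DictReader(handle))


def merge_batches(batches: Iterable[List[Row]], key: str = "Product_Link") -> List[Row]:
    """Combine several batches, keeping the first row seen for each key value."""
    seen = set()
    merged: List[Row] = []
    for batch in batches:
        for row in batch:
            identifier = (row.get(key) or "").strip()
            if not identifier or identifier in seen:
                continue  # skip rows with no identifier and repeated products
            seen.add(identifier)
            merged.append(row)
    return merged


def write_merged(rows: List[Row], destination: Path) -> None:
    """Write the merged rows using the column order of the first row."""
    if not rows:
        return
    fieldnames = [name for name in rows[0] if name is not None]
    with open(destination, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    export_dir = Path("octoparse_exports")  # hypothetical folder of batch exports
    batches = [read_export(path) for path in sorted(export_dir.glob("*.csv"))]
    write_merged(merge_batches(batches), Path("merged_products.csv"))
```

Keeping the first occurrence of each product is an arbitrary choice; if later batches hold fresher prices, you may prefer to keep the last one instead.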
Exporting Scraped Data to CSV/JSON
As mentioned, you can export the results from Octoparse in multiple formats for use in other applications:
- CSV (Comma-Separated Values): Ideal for WP All Import. After running the task, click Export and choose CSV. Octoparse will save all the extracted data into a .csv file where each row is a product and each column is a field (title, price, etc.) (How To Scrape Amazon Product Data Without Coding – DEV Community). CSV is human-readable and easily opened in Excel or Google Sheets if you want to inspect or clean the data first.
- JSON: If you prefer JSON (JavaScript Object Notation) for any reason (say, for developers or another tool), Octoparse can output JSON as well (Pricing | Octoparse). You would get a .json file with an array of product objects. WP All Import can also import JSON, but in this case CSV is simpler.
- Excel, HTML, Database: Octoparse also supports direct export to Excel, HTML, XML, or even pushing to a database or Google Sheets (No-Coding Steps to Scrape Amazon Product Data | Octoparse). For our purpose, stick to CSV or XML, since WP All Import handles those formats well.
Make sure to save your exported file in a known location. We will now move on to importing this data into WordPress.
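If you exported JSON but would rather import a CSV, the conversion is straightforward. The sketch below assumes the JSON export is a flat array of product objects, as described above; nested values would need extra handling, and the function name is hypothetical.

```python
import csv
import io
import json


def json_to_csv(json_text: str) -> str:
    """Convert a JSON array of flat objects into CSV text with a header row."""
    records = json.loads(json_text)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of product objects")

    # Collect column names in the order they first appear across all records.
    fieldnames = []
    for record in records:
        for name in record:
            if name not in fieldnames:
                fieldnames.append(name)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)  # fields missing from a record are left empty
    return buffer.getvalue()


sample = '[{"Title": "Smart Collar", "Price": "$49"}, {"Title": "Pet Camera"}]'
print(json_to_csv(sample))
```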
Part 2: Importing the Data into WordPress with WP All Import
Now that you have a CSV (or XML/JSON) file of product data, you can bulk import it into your WordPress site using the WP All Import plugin. WP All Import is a powerful tool that can read your file and create WordPress posts (or any post type) from each record. In this section, we’ll cover how to set up the import file, map the data to the right fields in WordPress, and ensure everything (categories, images, etc.) is handled properly. We’ll use the example of a pet tech product directory, where each imported item becomes a post on the site.
Uploading the CSV/XML and Configuring the Import
- Prepare the Import File: Ensure your CSV has a header row with identifiable column names (Octoparse usually uses the field names you set). For example: Title, Price, Description, Image_URL, Product_Link, Category, etc. If you scraped multiple categories or attributes, make sure each type of data is in its own column or in a consistent format.
- Launch a New Import in WP All Import: In your WordPress admin, go to All Import → New Import. WP All Import will ask how you want to upload the data file. Choose “Upload a file” and select your CSV from your computer. (Alternatively, WP All Import allows fetching from a URL or using a file already on the server. If you plan to update data regularly, using a URL here can be beneficial – see Automated Updates below.) (Bulk Importing Content to WordPress: WP All Import Guide)
- Choose Import Type: WP All Import will then ask what you want to import into. Select “Posts” (or your custom post type if you have one for products). If you are using a directory plugin or a custom post type (like “Products” or “Listings”), choose that post type in the dropdown. For a simple setup, importing as Posts (with a specific category for “pet tech”) can work too. After selecting the post type, click “Continue to Step 2”.
- Review Data & Filtering (optional): WP All Import will preview the data it found from your file. It typically shows a few records with their columns. You can optionally set filters here – for example, you might only want to import items where Price is above a certain amount, etc., using WP All Import’s filtering rules. For most cases, you can skip filtering and import all items (Bulk Importing Content to WordPress: WP All Import Guide). Proceed to the next step.
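Before uploading, it can help to confirm that the file really has the header row and values you expect. The following is a minimal sketch of such a check; the list of required columns mirrors the example column names used in this guide and is an assumption about your file, not something WP All Import requires.

```python
import csv
import io
from typing import List, Sequence

# Assumption: these are the columns you defined in Octoparse; adjust as needed.
REQUIRED_COLUMNS = ["Title", "Price", "Description", "Image_URL", "Product_Link"]


def check_csv(csv_text: str, required: Sequence[str] = REQUIRED_COLUMNS) -> List[str]:
    """Return a list of human-readable problems found in the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    problems = [f"missing column: {name}" for name in required if name not in header]

    # Flag records where a required column exists but holds no value.
    for record_number, row in enumerate(reader, start=1):
        for name in required:
            if name in header and not (row.get(name) or "").strip():
                problems.append(f"record {record_number}: empty {name}")
    return problems


sample = "Title,Price\nSmart Collar,$49\n,$25\n"
for problem in check_csv(sample, required=["Title", "Price", "Image_URL"]):
    print(problem)
```

To check a real export, read the file with open(path, newline="", encoding="utf-8-sig") and pass its contents to check_csv.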
Mapping Scraped Fields to WordPress Fields
Now comes the core step: mapping the fields from your CSV to the appropriate fields in WordPress. WP All Import provides a drag-and-drop interface for this, listing WordPress fields on the left and your CSV columns on the right (Importing WordPress Data with WP All Import: In-depth Review). You simply drag a data element from the right and drop it onto the desired field box on the left.
Here are the common mappings for a product directory:
- Post Title: Drag the Product Title field from the right panel and drop it into the “Title” field on the left. This will set the WordPress post title to the product’s title.
- Post Content/Description: You have options here. If you want the product description or details to appear in the main body of the post, drag the Description field into the “Content” area. You can combine multiple fields here if needed (WP All Import allows you to drag several fields and even add static text or HTML). For example, you might drop the description, and maybe also include the price or a link within the content. However, since we’ll map price and link separately as well, you might keep content to just the description text for now. You can always format it with HTML in the import template if needed.
- Custom Fields (e.g. Price, External URL): If your WordPress setup (or theme/plugin like a directory theme) has custom fields for price, product URL, etc., you can map to those. In WP All Import, scroll down to “Custom Fields” section. For each custom field:
- Click “+ Add Custom Field”. In the Name, enter the meta key (for example, it could be price or _price depending on your theme or plugin, or you create one). In the Value, drag the Price field from the right. This will store the price in a custom field.
- Similarly, add another custom field for the Amazon link (if you want to save it). For example, Name: external_url or product_link (you decide a key or use one required by your theme), and drag the Product_Link (the URL) into the value. This way the Amazon link is stored in WordPress. Later, you could use that for an affiliate button or reference.
- You can repeat for any other custom data (rating, etc. if scraped) – just ensure you use consistent meta keys. If you’re using a specific plugin (say ACF or a directory plugin), make sure the meta keys match what that plugin expects.
- Categories (Taxonomy): To assign categories or tags, use the Taxonomies section in the WP All Import mapping screen. WordPress uses categories/tags to organize content (Bulk Importing Content to WordPress: WP All Import Guide). If your CSV has a column for category (e.g. all your products might have “Pet Tech” or more specific categories like “Smart Collars”), you can drag that field to the Category mapping. WP All Import can import into categories, tags, or any custom taxonomy associated with the post type (Import Categories, Tags, and Custom Taxonomies). For example:
- If you have a “Category” column in the CSV, drag it into the “Categories” box. If the category name doesn’t exist in WP yet, WP All Import will create it. If it exists, it will assign the post to that category.
- You can also assign a fixed category for all items (e.g. have all imported products go under a “Pet Tech Products” category) by simply typing that category name into the category field instead of dragging a column.
- If you have multiple categories for a product in one CSV field, separate them with a comma or pipe in the CSV. WP All Import will split them and assign all relevant categories.
- The same concept applies to tags or custom taxonomies (like a custom “Product Type” taxonomy): you map the CSV field to the respective taxonomy in the import template.
- Images: Handling images requires a special step. We have the image URLs (from Octoparse) in our CSV (e.g. an “Image_URL” column). WP All Import can download these images into the Media Library and attach them to the posts:
- Find the Images section (often labeled “Download images & attachments”). Drag the Image_URL field from the right into this section. This tells WP All Import to fetch that image from the URL. If you have multiple image URLs (e.g. multiple columns or a single column with URLs separated by commas), you can supply them as well – WP All Import can import multiple images per post (Bulk Importing Content to WordPress: WP All Import Guide).
- Check the option “Set the first image as the Featured Image” if you want the first image to be the post’s featured image (Bulk Importing Content to WordPress: WP All Import Guide). This is usually desired for a product listing, so the product thumbnail shows up.
- Note: The ability to import images by URL is available in WP All Import Pro (Bulk Importing Content to WordPress: WP All Import Guide). Ensure you have the Pro version if you need this. The free version of WP All Import doesn’t support image downloading or custom fields, so Pro is generally needed to import complex data like this.
- Other Post Options: WP All Import also lets you set things like post status (publish or draft), post date, slug, etc. By default, imported posts will be Published immediately. You can adjust these in the Other Post Options section if needed (for example, set all to draft first if you want to review them). You might also assign an author or allow comments, etc., but those are optional settings (Bulk Importing Content to WordPress: WP All Import Guide).
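When a product has several categories or several image URLs, they need to end up in a single CSV cell with a consistent separator, as described in the Categories and Images items above. The helper below is a sketch of one way to build such cells before import; the function name and sample values are hypothetical, and it assumes the separator you pick is the same one you configure in the import template.

```python
from typing import Iterable


def join_values(values: Iterable[str], separator: str) -> str:
    """Join values into one cell: trimmed, without blanks or repeats, order kept."""
    cleaned = [value.strip() for value in values if value and value.strip()]
    unique = list(dict.fromkeys(cleaned))  # dict keys preserve first-seen order
    for value in unique:
        if separator in value:
            raise ValueError(f"value {value!r} contains the separator {separator!r}")
    return separator.join(unique)


# Hypothetical sample data for one product.
categories = ["Pet Tech", " Smart Collars ", "Pet Tech"]
image_urls = ["https://example.com/a.jpg", "https://example.com/b.jpg"]

row = {
    "Category": join_values(categories, "|"),
    "Image_URL": join_values(image_urls, ","),
}
print(row)
```

Raising an error when a value already contains the separator is deliberate: a category name with a comma in it would otherwise be split into two categories on import.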
After mapping all the fields, use the Preview feature to double-check one example – WP All Import can show you a preview of a post with the data filled in. This helps ensure your template looks right (e.g. the content has the description, the title is correct, etc.). Once satisfied, proceed to the next step.
Running the Import and Verifying
Before finalizing, WP All Import will prompt you to confirm settings and run the import. One important setting here is the Unique Identifier. WP All Import automatically suggests a unique key (you can also set one manually). This unique identifier is how WP All Import knows if a record has been imported before or not (Bulk Importing Content to WordPress: WP All Import Guide). Typically, you can use a field that is unique per product: the product URL or ASIN is a good choice for Amazon data (since no two products share the same URL). WP All Import might auto-detect the URL as the unique key.
- Ensure the unique identifier is something like Product_Link (URL) or a combination that uniquely identifies each product. Why? Because if you run this import again later with an updated file, the plugin will use this key to match existing posts and prevent duplicates. If an incoming record has the same unique ID as an existing post, WP All Import will update that post instead of creating a new one (Bulk Importing Content to WordPress: WP All Import Guide).
- You can leave the import setting as ‘New items and updates’ (the default), which means it will create new posts for new records and update existing posts if it finds the same unique ID again.
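Because the unique identifier only prevents duplicates if it is actually unique in your file, it is worth checking the CSV before running the import. This sketch counts repeated and missing identifier values; the column name Product_Link is an assumption taken from the example field names in this guide.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def duplicate_ids(rows: Iterable[Mapping[str, str]], key: str = "Product_Link") -> Dict[str, int]:
    """Return identifier values that appear more than once, with their counts."""
    counts = Counter((row.get(key) or "").strip() for row in rows)
    return {value: count for value, count in counts.items() if value and count > 1}


def missing_ids(rows: Iterable[Mapping[str, str]], key: str = "Product_Link") -> int:
    """Count rows that have no identifier value at all."""
    return sum(1 for row in rows if not (row.get(key) or "").strip())


# Hypothetical rows, as csv.DictReader would produce them.
rows = [
    {"Title": "Smart Collar", "Product_Link": "https://example.com/p/1"},
    {"Title": "Pet Camera", "Product_Link": "https://example.com/p/2"},
    {"Title": "Smart Collar (copy)", "Product_Link": "https://example.com/p/1"},
    {"Title": "No link", "Product_Link": ""},
]
print(duplicate_ids(rows))
print(missing_ids(rows))
```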
Now, run the import. Click “Confirm & Run Import”. WP All Import will begin processing each record and creating posts. You’ll see a progress bar; the time depends on how many items (for a few dozen products it’s quick, for hundreds or thousands it may take a few minutes). Once done, it will report the number of created posts (and updated, if any).
Go to Posts (or your custom post type) in WordPress and verify the new content. You should see a post for each product, with the title, content, categories, etc. all in place. Edit one to check:
- The title is set,
- The content/description is there,
- The featured image is set (image downloaded successfully),
- The custom fields (price, link) are stored (you can see them in the custom fields meta box or wherever your theme shows them).
If everything looks good, congratulations – you’ve populated your directory with the scraped data!
Best Practices for Automated Updates and Avoiding Duplicates
One of the advantages of this setup is that you can update your WordPress listings when the source data changes (e.g., price updates on Amazon or new products added). Here are some best practices to manage updates and prevent duplicate content:
- Use a Stable Unique Identifier: As mentioned, choose a unique identifier field in WP All Import that will remain the same for each product across imports (the product URL or a product ID). This ensures that if you import the file again later, WP All Import recognizes existing entries. If the same ID is found, WP All Import will not create a duplicate – it will update the existing post or skip it, depending on your settings (Bulk Importing Content to WordPress: WP All Import Guide). This is crucial to avoid duplicate posts for the same product.
- Schedule Regular Imports (Automation): If you want to keep the directory updated automatically, WP All Import Pro offers scheduling. A common strategy is to host your CSV/XML file at a URL (for example, in Dropbox or on your server) and have WP All Import periodically fetch that URL. In the New Import step, instead of uploading from your computer, you would choose “Download from URL” and provide the file link. WP All Import can then be set to check that URL on a schedule (say daily or weekly) and import changes. According to documentation, this allows WP All Import to “continuously monitor the file for updates, creating, deleting, and updating posts on your website as necessary.” (Bulk Importing Content to WordPress: WP All Import Guide). In our case, you could set Octoparse to output to a file on a schedule (if you upgrade for scheduling or do it manually and upload the file), and WP All Import will sync those changes to WordPress.
- Avoiding SEO Duplicate Content: Since you are copying content from Amazon, be aware of duplicate content issues on the web. It’s often best to add some unique value. For instance, you might write a custom snippet or review for each product in addition to the scraped description. This isn’t a technical requirement, but a content strategy tip so your site isn’t just an exact copy of Amazon text. You can use WP All Import’s ability to combine fields or add static text to incorporate such notes if needed, or edit after import.
- Review and Clean Imported Content: Sometimes the scraped data might include unwanted bits (e.g., currency symbols, HTML tags, or Amazon-specific text). WP All Import has options like find-replace or using PHP functions during import to clean data (Importing WordPress Data with WP All Import: In-depth Review). Utilize these if necessary. For example, you might remove the “$” from prices or any trademark symbols in titles.
- Run Updates in Import Mode: When running an update, use WP All Import’s “Import Settings -> Choose Existing Items Import” if you have a separate import template for updates, or simply re-run the same import with an updated file. With the unique ID set, WP All Import will update fields for existing posts. You can also configure whether it deletes posts for items missing from the new file (products no longer in the source) or keeps them; which is right depends on whether your directory should retain discontinued products.
- Backup Your Site: Before running large imports or updates, it’s wise to have a backup. Bulk operations can potentially mess up content if configured incorrectly, so backup ensures you can roll back if needed.
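As an alternative to cleaning data inside WP All Import, you can tidy the CSV values before importing. The sketch below shows the two clean-ups mentioned above (stripping the currency symbol from prices and trademark symbols from titles). It assumes US-style price formatting such as "$1,299.99", and the function names are hypothetical.

```python
import re


def clean_price(raw: str) -> str:
    """Extract a plain number from a price string, or return '' if none is found.

    Assumption: commas are thousands separators and the dot is the decimal mark.
    """
    match = re.search(r"\d[\d,]*(?:\.\d+)?", raw)
    if not match:
        return ""
    return match.group(0).replace(",", "")


def clean_title(raw: str) -> str:
    """Remove trademark-style symbols and collapse repeated whitespace."""
    without_marks = re.sub(r"[™®©]", "", raw)
    return re.sub(r"\s+", " ", without_marks).strip()


print(clean_price("$1,299.99"))
print(clean_title("PetCam™  Pro "))
```

Applying these functions to the Price and Title columns of each row before writing the final CSV keeps the import template itself simple.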
Using the combination of Octoparse and WP All Import, you can automate the population of your WordPress pet tech directory. Octoparse handles extracting the latest product info from Amazon (which you can do periodically), and WP All Import handles merging that data into your site. By following the above steps and tips, you’ll minimize manual data entry and avoid duplicate content issues. With a clear structure (titles, descriptions, prices, images, categories all mapped correctly), your product listings will be up in no time and easy to maintain.