141

I have two worksheets in two different Excel files. They both contain a list of names, id numbers, and associated data. One is a master list that includes general demographic fields, and the other is a list that only includes name and id, and an address. This list was pared down from the master list by another office.

I want to use the 2nd list to filter the first. Additionally, I want the results to include other fields from the master worksheet alongside the address fields from the second worksheet. I know how I could do this very easily with a database inner join, but I'm less clear on how to do this efficiently in Excel. How can join two worksheets in Excel? Bonus points for showing how to do outer joins as well, and I would greatly prefer knowing how to do this without needing a macro.

Joel Coehoorn
  • 28,098
  • 14
  • 88
  • 133
  • 1
    I find the following page very usefull: http://www.randomwok.com/excel/how-to-use-index-match/ – Tommy Bravo Mar 04 '14 at 14:19
  • randomwok.com/excel/how-to-use-index-match => especially the easy "reminder" at the end: `=INDEX ( Column_I_want_a_return_value_from , ( MATCH ( My_Lookup_Value , Column_I_want_to_Lookup_against , 0 ))` – Tommy Bravo Mar 04 '14 at 14:25
  • 1
    A little Python will do as well `import pandas as pd; file1 = pd.read_csv("in_1.csv", sep=","); # alternatively read_excel() file2 = pd.read_csv("in_2.csv", sep=";"); merged = file1.merge(file2, left_on='ID', right_on='OTHER_ID', how='outer'); merged.to_excel("out.xlsx")`; – Boern Oct 21 '19 at 11:31

10 Answers10

162

For 2007+ use Data > From Other Sources > From Microsoft Query:

  1. choose Excel File and select your 1st excel
  2. choose columns
    (if you don't see any list of columns, make sure to check Options > System Tables)
  3. go to Data > Connections > [choose the connection just created] > Properties > Definition > Command text

You can now edit this Command text as SQL. Not sure what syntax is suported, but I tried implicit joins, "inner join", "left join" and unions which all work. Here is a sample query:

SELECT *
FROM `C:\Users\Peter\Documents\Excel-to-excel\Source_1.xlsx`.`Sheet1$` a
LEFT JOIN `C:\Users\Peter\Documents\Excel-to-excel\Source_2.xlsx`.`Sheet1$` b
ON a.col2 = b.col2
Aprillion
  • 2,013
  • 1
  • 12
  • 19
  • 1
    Is there a way to avoid hardcoding the path? Would relative path work? – Rekin Nov 14 '12 at 12:57
  • 35
    As an added bonus, Microsoft Query lets you experience 16-bit-style Common Controls for opening the Excel file and for message boxes. That way you can remember when you were young. :-) – Edward Brey Dec 08 '12 at 05:10
  • Does this work with CSV files? I'm using MS Office Professional Plus 2010 and don't see how to follow step 3 - do I leave the dialog in step 2 hanging? There is no "choose your new connection" in my Connections dialog. – John Freeman Aug 20 '14 at 18:01
  • @JohnFreeman because you have to **choose** one of the available options according to your needs (e.g. 'Add...') – Aprillion Aug 21 '14 at 10:32
  • 4
    Note that you might get a dialog box saying 'This data source contains no visible tables', followed by a 'Query Wizard - Choose Columns' dialog. It is from *this* dialog that you want to click on the 'Options' button, then check the 'System Tables' checkbox to see the data that you want to query. – Tola Odejayi Oct 14 '16 at 17:26
  • Can you make it work in Microsoft Excel 2016 for the Mac? – juandesant Nov 07 '17 at 19:22
  • @juandesant I don't have any experience with installing "ODBC Excel Driver" on Mac... – Aprillion Nov 14 '17 at 11:06
14

Support the accepted answer. I just want to emphasize on "choose columns (if you don't see any list of columns, make sure to check Options > System Tables)"

Once you select the excel file, very likely you will see this data source contains no visible tables prompt, and the available tabs and columns are none. Microsoft admitted that is a bug that the tabs in the excel files are treated as "System Tables", and the option for "System Tables" is not selected by default. So don't panic at this step, you just need to click "option" and check "System Tables", then you see the columns available.

Ben Lin
  • 303
  • 2
  • 7
11

You can use Microsoft Power Query, available for newer versions of Excel (similar to the accepted answer, but much simpler and easier). Power Query calls joins 'merges'.

The easiest way is to have your 2 Excel sheets as Excel tables. Then in Excel, go to the Power Query ribbon tab, and click the 'From Excel' button. Once you have imported both tables into Power Query, select one and click 'Merge'.

Update: Power Query is now built into Excel as 'Get and transform data', accessed from the Data tab on the ribbon.

anotherfred
  • 256
  • 2
  • 8
11

VLOOKUP and HLOOKUP could be used to search for matching primary keys (stored vertically or horizontally) and return values from 'attribute' columns/rows.

Reuben L.
  • 1,052
  • 6
  • 15
  • very usefull for sheets on the same workbook (but i find INDEX+MATCH functions even more usefull), a little more complicated when updating data from closed external workbooks... – Aprillion May 07 '12 at 09:51
5

While I think Aprillion's answer using Microsoft Query is excellent, it inspired me to use Microsoft Access to join the datasheets which I found much easier.

You need to have MS Access installed of course.

Steps:

  • Create a new Access database (or use a scratch DB).
  • Use Get External Data to import your Excel data as new tables.
  • Use Relationships to show how your tables are joined.
  • Set the Relationship type to match what you want (representing left join etc.)
  • Create a new query which joins your tables.
  • Use External Data->Export to Excel to generate your results.

I really couldn't have done that without Aprillion's great answer.

Dizzley
  • 1,041
  • 2
  • 12
  • 19
  • 1
    I've never used access before today, but this only took me 10 minutes. I copy/pasted the excel columns into 2 tables instead of importing. – jiggunjer Aug 19 '19 at 09:41
3

You can't preform SQL style joins on Excel tables from within Excel. That said, there are multiple ways to accomplish what you are trying to do.

In Excel, like Reuben says, the formulas that will probably work the best are VLOOKUP and HLOOKUP. In both cases, you match on a unique row and it returns the value of the given column\row to the left\down from the found id.

If you only want to add a couple extra fields to the second list, then add the formulas to the second list. If you want an "outer join" style table, then add the VLOOKUP formula to the first list with ISNA to test if the lookup was found. If Excel's Help doesn't give you enough details on how to use these in your particular instance, let us know.

If you prefer to use SQL then link the data into your database program, create your query, and export the results back to Excel. (In Access you can import Excel Worksheets or Named Ranges in as a Linked Table.)

mischab1
  • 1,312
  • 8
  • 10
3

At XLTools.net we have created a good alternative for MS Query to work especially with SQL Queries against Excel tables. It is called XLTools SQL Queries. It is a lot easier to use rather than MS Query and works really well if you just need to create and run SQL - no VBA, no complex manipulations with MS Query...

With this tool you can create any SQL query against tables in Excel workbook(s) using embedded SQL editor and run it immediately with the option to put result on a new or any existing worksheet.

You can use almost any type of join including LEFT OUTER JOIN (only RIGHT OUTER JOIN and FULL OUTER JOIN is not supported).

Here is an example:

XLTools SQL Queries - Query Builder

Peter Liapin
  • 121
  • 3
2

For Excel 2007 User: Data > From Other Sources > From Microsoft Query > browse to the Excel file

According to this article, querying from XLS version 2003 might result a "This data source contains no visible tables." error because your worksheets are treated as SYSTEM table. So check the "System Tables" in the "Query Wizard -- Choose Columns" dialog's options when you create the query will work round.

To define your join: Microsoft Query dialog > Table menu > Joins...

To return data to your original Excel sheet, choose "Return data to Excel Sheet" from Microsoft Query dialog > File menu.

clarkttfu
  • 141
  • 3
0

Looking for the same problem I came across RDBMerge, which I think is a user friendly way to Merge Data from Multiple Excel Workbooks, csv and xml files into a Summary Workbook.

lalthomas
  • 317
  • 2
  • 9
0

If you're familiar enough with databases, you could use SQL Server to connect both worksheets as Linked Servers and then use T-SQL to do your back-end data work. Then finish by connecting Excel back to SQL and pull the data into a table (regular or pivot). You can also consider using Powerpivot; it will allow joins between any database sources-including Excel used as flat databases.

dav
  • 10,021
  • 5
  • 30
  • 52
  • 2
    You're right, but the initial question is based upon using a spreadsheet (or two) to do a database function, so I'm not sure of any really efficient way to accomplish the task. – dav May 07 '12 at 11:54